is it possible?

well, in eclipse it works. I added some encoding code to Content.java using 
HtmlParser (from the parse-html plugin). it runs successfully in eclipse (I have 
tested it with SegmentReader only, not with any unit tests).

but when compiling using ant I get compile errors.

 
here is the modification to Content.java in the nutch-0.9.tar.gz release 
(not trunk).
I have replaced the line:
   buffer.append(new String(content)); // try default encoding
with
   // parse the content with the parse-html plugin to find the page's encoding
   Configuration conf = NutchConfiguration.create();
   HtmlParser parser = new HtmlParser();
   parser.setConf(conf);
   Parse parse = parser.getParse(this);
   String encoding = parse.getData().getParseMeta().get("OriginalCharEncoding");
   // placeholder text that is appended if decoding fails
   String localEncodedString = "java incompatible encoding";
   try {
       localEncodedString = new String(content, encoding);
   } catch (Exception e) {
       e.printStackTrace();
   }
   buffer.append(localEncodedString);
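
the decoding step on its own, without any Nutch classes, can be sketched roughly as below. this is only an illustration: EncodingSketch and decode are hypothetical names, not part of Nutch, and unlike my change above the sketch falls back to the platform-default charset instead of the placeholder string when the encoding name is missing or unsupported.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;

// Hypothetical helper for illustration only; it is not part of Nutch.
class EncodingSketch {

    // Decode raw bytes with the named charset. If the name is null, malformed,
    // or not supported by this JVM, use the platform-default charset instead.
    static String decode(byte[] content, String encodingName) {
        Charset charset = Charset.defaultCharset();
        if (encodingName != null) {
            try {
                charset = Charset.forName(encodingName);
            } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
                // keep the platform default chosen above
            }
        }
        return new String(content, charset);
    }

    public static void main(String[] args) {
        // "caf\u00e9" encoded as ISO-8859-1, standing in for fetched page bytes
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(decode(latin1, "ISO-8859-1")); // decoded with the named charset
        System.out.println(decode(latin1, null));         // decoded with the platform default
    }
}
```

in Content.java the encodingName argument would be the "OriginalCharEncoding" value taken from the parse metadata, assuming the parser has set it.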

here are the compile errors:
compile-core:
    [javac] Compiling 165 source files to /home/onur/nutch-0.9/build/classes
    [javac] /home/onur/nutch-0.9/src/java/org/apache/nutch/protocol/Content.java:39: package org.apache.nutch.parse.html does not exist
    [javac] import org.apache.nutch.parse.html.HtmlParser;
    [javac]                                   ^
    [javac] /home/onur/nutch-0.9/src/java/org/apache/nutch/protocol/Content.java:240: cannot find symbol
    [javac] symbol  : class HtmlParser
    [javac] location: class org.apache.nutch.protocol.Content
    [javac]         HtmlParser parser = new HtmlParser();
    [javac]         ^
    [javac] /home/onur/nutch-0.9/src/java/org/apache/nutch/protocol/Content.java:240: cannot find symbol
    [javac] symbol  : class HtmlParser
    [javac] location: class org.apache.nutch.protocol.Content
    [javac]         HtmlParser parser = new HtmlParser();
    [javac]                                 ^
    [javac] Note: Some input files use or override a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.
    [javac] Note: Some input files use unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.
    [javac] 3 errors

BUILD FAILED
/home/onur/nutch-0.9/build.xml:106: Compile failed; see the compiler error output for details.


do I need to change any other configuration to fix this? (parse-html is already 
listed in the plugin.includes property in nutch-default.xml; I also tried adding 
it in nutch-site.xml, but that did not help.)
or are plugins not meant to be used from core code?

any ideas?

(by the way, what I'm trying to do here is make the -get functionality decode 
content with the page's own encoding. it normally returns the content in the 
platform-default encoding (utf-8 here).)

thanks


onur deniz



      
