Hi,

Two days ago I tried to change the names of two fields in the index-basic plugin:
I renamed the fields 'url' and 'content' to 'web.url' and 'web.content' in
BasicIndexingFilter.java (the changed code is at the end of this message).



After that I ran 'ant' to build the project.

I copied the plugin folder 'index-basic' from nutch-1.0/build/plugins/ to
/nutch-1.0/plugins.



Since those changes I get this error when crawling:



2009-10-02 15:15:44,145 INFO  indexer.DeleteDuplicates - Dedup: starting
2009-10-02 15:15:44,147 INFO  indexer.DeleteDuplicates - Dedup: adding indexes in: crawl_dc/indexes
2009-10-02 15:15:44,153 WARN  mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2009-10-02 15:15:45,518 WARN  mapred.LocalJobRunner - job_local_0013
java.lang.NullPointerException
        at org.apache.hadoop.io.Text.encode(Text.java:388)
        at org.apache.hadoop.io.Text.set(Text.java:178)
        at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:191)
        at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:157)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:192)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:176)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:138)





Even after rolling back the changes I still have the same problem.

What should I do, please?



public class BasicIndexingFilter implements IndexingFilter {

  // ...

    doc.add("web.url", reprUrlString == null ? urlString : reprUrlString);
    doc.add("web.content", parse.getText());

  // ...

  public void addIndexBackendOptions(Configuration conf) {

    // ...

    // url is both stored and indexed, so it's both searchable and returned
    LuceneWriter.addFieldOptions("web.url", LuceneWriter.STORE.YES,
        LuceneWriter.INDEX.TOKENIZED, conf);

    // content is indexed, so that it's searchable, but not stored in index
    LuceneWriter.addFieldOptions("web.content", LuceneWriter.STORE.NO,
        LuceneWriter.INDEX.TOKENIZED, conf);

    // ...
  } // end of method addIndexBackendOptions

  // ...
} // end of class
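
One possible reading of the stack trace, written as a minimal sketch: if the
dedup step looks up a stored field under its old name 'url' (an assumption on
my part, I have not confirmed it in the Nutch source), the lookup would return
null for documents indexed with 'web.url', and passing that null on to be
encoded would throw a NullPointerException. The sketch uses a plain Map as a
stand-in for the stored fields of one index document; the class and method
names are hypothetical and are not Nutch, Hadoop or Lucene APIs.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class RenamedFieldSketch {

    // Stand-in for encoding a key into bytes. Like any method call on a
    // null String, getBytes throws NullPointerException for null input.
    static byte[] encode(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical reader step: look up a stored field by name and encode it.
    static byte[] readKey(Map<String, String> storedFields, String fieldName) {
        return encode(storedFields.get(fieldName));
    }

    // Returns true if reading the given field fails with a NullPointerException.
    static boolean failsWithNpe(Map<String, String> storedFields, String fieldName) {
        try {
            readKey(storedFields, fieldName);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Only 'web.url' is stored after the rename; 'web.content' is indexed
        // but not stored, so it would not be retrievable either way.
        Map<String, String> doc = new HashMap<>();
        doc.put("web.url", "http://example.com/");

        System.out.println("lookup by 'web.url' fails: " + failsWithNpe(doc, "web.url"));
        System.out.println("lookup by 'url' fails: " + failsWithNpe(doc, "url"));
    }
}
```

If this reading is right, any index that was written while the fields were
renamed would still lack a stored 'url' field after the code is rolled back.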



Thanks

                                          
