Hi, I forgot to mention that when the errors happen and the crawl stops, it creates the folder 'dedup-urls-485515157'. Can someone tell me what to do after that when using 'ant', concerning the jars, the build, etc.?
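For context, a typical rebuild-and-redeploy cycle with 'ant' for a modified Nutch 1.0 plugin looks roughly like the sketch below. This is not authoritative; the paths assume a stock nutch-1.0 source checkout, and the numeric suffix of the dedup temp folder varies per run.

```shell
# Sketch of a rebuild/redeploy cycle for a modified Nutch 1.0 plugin.
# Paths assume a stock nutch-1.0 checkout; adjust to your layout.
cd nutch-1.0

# Rebuild the project (plugins included) with the bundled Ant build files.
ant clean
ant

# Copy the rebuilt plugin into the runtime plugins directory.
cp -r build/plugins/index-basic plugins/

# Remove the leftover temp folder from the failed dedup job before re-running.
# ('dedup-urls-485515157' was one example; the suffix changes every run.)
rm -rf dedup-urls-*
```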
thx

> From: mbel...@msn.com
> To: nutch-user@lucene.apache.org
> Subject: RE: problem ending crawl nutch 1.0 - DeleteDuplicates
> Date: Sun, 4 Oct 2009 16:21:13 +0000
>
> Hi,
> any idea?
>
> > From: mbel...@msn.com
> > To: nutch-user@lucene.apache.org
> > Subject: problem ending crawl nutch 1.0 - DeleteDuplicates
> > Date: Fri, 2 Oct 2009 19:36:06 +0000
> >
> > Hi,
> >
> > Two days ago I tried to change the names of two meta fields in the
> > index-basic plugin: I renamed the fields 'url' and 'content' to
> > 'web.url' and 'web.content' in BasicIndexingFilter.java.
> >
> > After that I ran 'ant' to build the project, and I copied the plugin
> > folder 'index-basic' from nutch-1.0/build/plugins/ to /nutch-1.0/plugins.
> >
> > Since that change I get this error when crawling:
> >
> > 2009-10-02 15:15:44,145 INFO  indexer.DeleteDuplicates - Dedup: starting
> > 2009-10-02 15:15:44,147 INFO  indexer.DeleteDuplicates - Dedup: adding indexes in: crawl_dc/indexes
> > 2009-10-02 15:15:44,153 WARN  mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> > 2009-10-02 15:15:45,518 WARN  mapred.LocalJobRunner - job_local_0013
> > java.lang.NullPointerException
> >     at org.apache.hadoop.io.Text.encode(Text.java:388)
> >     at org.apache.hadoop.io.Text.set(Text.java:178)
> >     at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:191)
> >     at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:157)
> >     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:192)
> >     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:176)
> >     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
> >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
> >     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:138)
> >
> > Even after rolling the change back I still have the same problem.
> > What should I do, please?
> >
> > public class BasicIndexingFilter implements IndexingFilter {
> >
> >   .....
> >
> >   doc.add("web.url", reprUrlString == null ? urlString : reprUrlString);
> >   doc.add("web.content", parse.getText());
> >
> >   ....
> >
> >   public void addIndexBackendOptions(Configuration conf) {
> >
> >     ....
> >
> >     // url is both stored and indexed, so it's both searchable and returned
> >     LuceneWriter.addFieldOptions("web.url", LuceneWriter.STORE.YES,
> >         LuceneWriter.INDEX.TOKENIZED, conf);
> >
> >     // content is indexed, so that it's searchable, but not stored in index
> >     LuceneWriter.addFieldOptions("web.content", LuceneWriter.STORE.NO,
> >         LuceneWriter.INDEX.TOKENIZED, conf);
> >
> >     .....
> >   } // end of method addIndexBackendOptions
> >
> > ....
> > } // end of class
> >
> > thx
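For what it's worth, the stack trace suggests org.apache.hadoop.io.Text.set() was handed a null string: the dedup job reads each document's 'url' field back from the indexes already on disk, and documents indexed under the renamed fields no longer have a field called 'url', so the lookup likely returns null. That would also explain why rolling the source back doesn't help while the old indexes remain. Below is a minimal standalone sketch of that failure mode with a defensive guard; it is plain Java with illustrative names, not Nutch's actual code.

```java
// Illustrative sketch only: mimics why a null "url" field value would
// blow up when passed to Text.set() in the dedup job.
import java.util.HashMap;
import java.util.Map;

public class DedupFieldSketch {
    // Stand-in for a Lucene document whose stored fields were written
    // under the renamed key ("web.url" instead of "url").
    static final Map<String, String> doc = new HashMap<>();
    static {
        doc.put("web.url", "http://example.com/");
    }

    // Reading the old field name yields null, which is what the
    // stack trace indicates Text.set() received.
    static String readUrl(String fieldName) {
        return doc.get(fieldName); // null when the field was renamed
    }

    // Defensive guard: never hand a null downstream.
    static String safeUrl(String fieldName) {
        String url = readUrl(fieldName);
        return url == null ? "" : url;
    }

    public static void main(String[] args) {
        System.out.println(readUrl("url"));     // null
        System.out.println(safeUrl("url"));     // empty string, no NPE later
        System.out.println(safeUrl("web.url")); // http://example.com/
    }
}
```

The practical implication of the sketch: either the old indexes must be rebuilt so their field names match what the dedup job reads, or the field names must stay consistent between indexing and dedup.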