ok :)

I found the problem! I deleted all the files from the build folder and ran Ant 
again, and now it works.
Still, the Nutch error messages are not very expressive :) I hope some effort 
will go into improving them.
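
For illustration, here is the kind of message that would have helped. In the stack trace quoted below, the NullPointerException comes from Text.set being called inside DDRecordReader.next, which suggests a null string was passed in. One plausible cause (an assumption, not confirmed anywhere in this thread) is that the dedup step looks up a stored field named 'url', which indexes written with the renamed fields no longer contain. The sketch below is plain Java with no Nutch, Hadoop or Lucene classes; `FieldGuard`, `requireField` and the map standing in for an index document are all hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper, not part of Nutch: fails with a descriptive message
// instead of letting a null field value travel further down the call chain.
final class FieldGuard {

    // Returns the value of `field`, or throws an exception that names the
    // missing field and lists the fields the document actually has.
    static String requireField(Map<String, String> doc, String field) {
        String value = doc.get(field);
        if (value == null) {
            throw new IllegalStateException(
                "Document has no stored field '" + field
                + "'; available fields: " + new TreeSet<>(doc.keySet()));
        }
        return value;
    }

    public static void main(String[] args) {
        // Stand-in for a document indexed after the rename described in
        // this thread: only 'web.url' is stored, 'url' is gone.
        Map<String, String> doc = new HashMap<>();
        doc.put("web.url", "http://example.com/");

        try {
            requireField(doc, "url");
        } catch (IllegalStateException e) {
            // The message names the field that was expected but not found.
            System.out.println(e.getMessage());
        }
    }
}
```

A message like this would point straight at a mismatch between the field names written by the indexing filter and the ones read later, which a stale build folder can hide even after the source has been rolled back.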

Thanks to all.



> From: mbel...@msn.com
> To: nutch-user@lucene.apache.org
> Subject: RE: problem ending crawl nutch 1.0 - DeleteDuplicates
> Date: Tue, 6 Oct 2009 13:59:16 +0000
> 
> 
> Hi,
> 
> I forgot to say that when the errors happen and the crawl stops, it 
> creates the folder 'dedup-urls-485515157'.
> Can someone tell me what to do after running 'ant', 
> concerning the jars, the build, etc.?
> 
> Thanks
> 
> 
> 
> > From: mbel...@msn.com
> > To: nutch-user@lucene.apache.org
> > Subject: RE: problem ending crawl nutch 1.0 - DeleteDuplicates
> > Date: Sun, 4 Oct 2009 16:21:13 +0000
> > 
> > 
> > Hi,
> > any ideas?
> > 
> > 
> > 
> > > From: mbel...@msn.com
> > > To: nutch-user@lucene.apache.org
> > > Subject: problem ending crawl nutch 1.0 - DeleteDuplicates
> > > Date: Fri, 2 Oct 2009 19:36:06 +0000
> > > 
> > > 
> > > 
> > > Hi,
> > > 
> > > Two days ago I tried to change the names of two fields in the index-basic 
> > > plugin:
> > > I renamed the fields 'url' and 'content' to 'web.url' and 
> > > 'web.content' in BasicIndexingFilter.java (excerpt at the end of this message).
> > > 
> > > After that I ran 'ant' to build the project.
> > > 
> > > I copied the plugin folder 'index-basic' from nutch-1.0/build/plugins/ 
> > > to /nutch-1.0/plugins
> > > 
> > > Since those changes I get this error when crawling:
> > > 
> > > 
> > > 
> > > 2009-10-02 15:15:44,145 INFO  indexer.DeleteDuplicates - Dedup: starting
> > > 2009-10-02 15:15:44,147 INFO  indexer.DeleteDuplicates - Dedup: adding indexes in: crawl_dc/indexes
> > > 2009-10-02 15:15:44,153 WARN  mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> > > 2009-10-02 15:15:45,518 WARN  mapred.LocalJobRunner - job_local_0013
> > > java.lang.NullPointerException
> > >         at org.apache.hadoop.io.Text.encode(Text.java:388)
> > >         at org.apache.hadoop.io.Text.set(Text.java:178)
> > >         at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:191)
> > >         at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:157)
> > >         at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:192)
> > >         at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:176)
> > >         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
> > >         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
> > >         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:138)
> > > 
> > > 
> > > 
> > > 
> > > 
> > > Even after rolling back the changes I still have the same problem.
> > > 
> > > What should I do, please?
> > > 
> > > 
> > > 
> > > public class BasicIndexingFilter implements IndexingFilter {
> > > 
> > > .....
> > > 
> > >  doc.add("web.url", reprUrlString == null ? urlString : reprUrlString);
> > >  doc.add("web.content", parse.getText());
> > > 
> > > ....
> > > 
> > > public void addIndexBackendOptions(Configuration conf) {
> > > 
> > > ....
> > > 
> > >     // url is both stored and indexed, so it's both searchable and returned
> > >     LuceneWriter.addFieldOptions("web.url", LuceneWriter.STORE.YES,
> > >         LuceneWriter.INDEX.TOKENIZED, conf);
> > > 
> > >     // content is indexed, so that it's searchable, but not stored in index
> > >     LuceneWriter.addFieldOptions("web.content", LuceneWriter.STORE.NO,
> > >         LuceneWriter.INDEX.TOKENIZED, conf);
> > > 
> > > 
> > > .....} // end of method addIndexBackendOptions
> > > 
> > > 
> > > 
> > > ....} //end of class
> > > 
> > > 
> > > 
> > > thx
> > > 
