Hi Doğacan,

I am using the Nutch trunk from last night, checked out only about 10 hours ago.
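In case it helps narrow things down: the trace bottoms out in ConcurrentHashMap.get(), which never accepts null keys, so ReflectionUtils.newInstance was apparently handed a null class (a class id MapWritable could not resolve). A minimal standalone sketch of just that failure mode, if I read the trace right (the class NullKeyDemo and its cache are made up for illustration, not actual Hadoop code):

    import java.util.concurrent.ConcurrentHashMap;

    public class NullKeyDemo {
        // Stand-in for a constructor cache like the one in
        // org.apache.hadoop.util.ReflectionUtils (hypothetical here).
        private static final ConcurrentHashMap<Class<?>, Object> CACHE =
            new ConcurrentHashMap<Class<?>, Object>();

        public static void main(String[] args) {
            Class<?> unresolved = null; // e.g. an unknown class id during deserialization
            CACHE.get(unresolved);      // throws java.lang.NullPointerException
        }
    }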
Best regards,
Felix.

-----Original Message-----
From: Doğacan Güney [mailto:[email protected]]
Sent: Thursday, 29 January 2009 11:34
To: [email protected]
Subject: Re: mergedb (hadoop) malfunction?

On Thu, Jan 29, 2009 at 11:56 AM, Felix Zimmermann <[email protected]> wrote:
> Hi,
>
> I use "mergedb" to filter URLs before indexing with "solrindex".
> Instead of indexing, I get the error log message below.
> The same happens if I do not use the "-filter" statement.
> When indexing without "mergedb", everything works fine.

Can you try with a newer trunk? I think I fixed this error in
https://issues.apache.org/jira/browse/NUTCH-676

> The commands:
>
> [.]
>
> /progs/nutch/bin/nutch mergedb /data/nutch/crawldata/crawldb_new /data/nutch/crawldata/crawldb
>
> segment=`ls -d /data/nutch/crawldata/segments/*`
>
> /progs/nutch/bin/nutch solrindex http://127.0.0.1:8080/solr1 /data/nutch/crawldata/crawldb_new /data/nutch/crawldata/linkdb $segment
>
> The error log:
>
> 2009-01-29 10:19:57,952 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
> 2009-01-29 10:19:57,954 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
> 2009-01-29 10:19:57,957 WARN mapred.LocalJobRunner - job_local_0001
> java.lang.RuntimeException: java.lang.NullPointerException
>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:81)
>     at org.apache.hadoop.io.MapWritable.readFields(MapWritable.java:164)
>     at org.apache.nutch.crawl.CrawlDatum.readFields(CrawlDatum.java:262)
>     at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
>     at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
>     at org.apache.hadoop.io.SequenceFile$Reader.deserializeValue(SequenceFile.java:1817)
>     at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1790)
>     at org.apache.hadoop.mapred.SequenceFileRecordReader.getCurrentValue(SequenceFileRecordReader.java:103)
>     at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:78)
>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:186)
>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:170)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:138)
> Caused by: java.lang.NullPointerException
>     at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:796)
>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:73)
>     ... 13 more
> 2009-01-29 10:19:58,459 FATAL solr.SolrIndexer - SolrIndexer: java.io.IOException: Job failed!
>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1217)
>     at org.apache.nutch.indexer.solr.SolrIndexer.indexSolr(SolrIndexer.java:57)
>     at org.apache.nutch.indexer.solr.SolrIndexer.run(SolrIndexer.java:79)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.nutch.indexer.solr.SolrIndexer.main(SolrIndexer.java:88)
>
> Is it a bug or am I doing something wrong?
>
> I use the latest trunk, Ubuntu 8.10 server and java-6-openjdk.
>
> Best regards and thanks for help!
> Felix.
--
Doğacan Güney
