On Thu, Jan 29, 2009 at 1:00 PM, Doğacan Güney <[email protected]> wrote:
> On Thu, Jan 29, 2009 at 12:41 PM, Felix Zimmermann <[email protected]> wrote:
>> Hi Doğacan,
>>
>> I use the nutch trunk of last night, only about 10 h ago.
>>
>
> Then that means, I broke something :)
>
> How big is your crawldb? If it is small, maybe you can send it to me
> and I can take a look.
>
No need, I figured out the bug. Can you try with this patch:

http://www.ceng.metu.edu.tr/~e1345172/crawldbmerger.patch

>> Best regards,
>> Felix.
>>
>> -----Original Message-----
>> From: Doğacan Güney [mailto:[email protected]]
>> Sent: Thursday, January 29, 2009 11:34
>> To: [email protected]
>> Subject: Re: mergedb (hadoop) malfunction?
>>
>> On Thu, Jan 29, 2009 at 11:56 AM, Felix Zimmermann <[email protected]> wrote:
>>> Hi,
>>>
>>> I use "mergedb" to filter urls before indexing with "solrindex".
>>> Instead of indexing, I got the error log message below.
>>> The same happens if I do not use the "-filter" option.
>>> When indexing without "mergedb", everything works fine.
>>>
>>
>> Can you try with a newer trunk? I think I fixed this error in
>>
>> https://issues.apache.org/jira/browse/NUTCH-676
>>
>>>
>>> The commands:
>>>
>>> [.]
>>>
>>> /progs/nutch/bin/nutch mergedb /data/nutch/crawldata/crawldb_new /data/nutch/crawldata/crawldb
>>>
>>> segment=`ls -d /data/nutch/crawldata/segments/*`
>>>
>>> /progs/nutch/bin/nutch solrindex http://127.0.0.1:8080/solr1 /data/nutch/crawldata/crawldb_new /data/nutch/crawldata/linkdb $segment
>>>
>>> The error log:
>>>
>>> 2009-01-29 10:19:57,952 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
>>> 2009-01-29 10:19:57,954 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
>>> 2009-01-29 10:19:57,957 WARN mapred.LocalJobRunner - job_local_0001
>>> java.lang.RuntimeException: java.lang.NullPointerException
>>>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:81)
>>>     at org.apache.hadoop.io.MapWritable.readFields(MapWritable.java:164)
>>>     at org.apache.nutch.crawl.CrawlDatum.readFields(CrawlDatum.java:262)
>>>     at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
>>>     at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
>>>     at org.apache.hadoop.io.SequenceFile$Reader.deserializeValue(SequenceFile.java:1817)
>>>     at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1790)
>>>     at org.apache.hadoop.mapred.SequenceFileRecordReader.getCurrentValue(SequenceFileRecordReader.java:103)
>>>     at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:78)
>>>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:186)
>>>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:170)
>>>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
>>>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:138)
>>> Caused by: java.lang.NullPointerException
>>>     at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:796)
>>>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:73)
>>>     ... 13 more
>>> 2009-01-29 10:19:58,459 FATAL solr.SolrIndexer - SolrIndexer: java.io.IOException: Job failed!
>>>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1217)
>>>     at org.apache.nutch.indexer.solr.SolrIndexer.indexSolr(SolrIndexer.java:57)
>>>     at org.apache.nutch.indexer.solr.SolrIndexer.run(SolrIndexer.java:79)
>>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>>     at org.apache.nutch.indexer.solr.SolrIndexer.main(SolrIndexer.java:88)
>>>
>>> Is it a bug or am I doing something wrong?
>>>
>>> I use the latest trunk, ubuntu 8.10 server and java-6-openjdk.
>>>
>>> Best regards and thanks for help!
>>>
>>> Felix.
>>>
>>
>> --
>> Doğacan Güney
>>
>
> --
> Doğacan Güney
>

--
Doğacan Güney
