On Thu, Jan 29, 2009 at 12:41 PM, Felix Zimmermann <[email protected]> wrote:
> Hi Doğacan,
>
> I use the nutch trunk of last night, only about 10 h ago.
>
Then that means I broke something :) How big is your crawldb? If it is
small, maybe you can send it to me and I can take a look.

> Best regards,
> Felix.
>
> -----Original Message-----
> From: Doğacan Güney [mailto:[email protected]]
> Sent: Thursday, 29 January 2009 11:34
> To: [email protected]
> Subject: Re: mergedb (hadoop) malfunction?
>
> On Thu, Jan 29, 2009 at 11:56 AM, Felix Zimmermann <[email protected]> wrote:
>> Hi,
>>
>> I use "mergedb" to filter URLs before indexing with "solrindex".
>> Instead of indexing, I get the error logged below.
>> The same happens if I do not use the "-filter" option.
>> When indexing without "mergedb", everything works fine.
>>
>
> Can you try with a newer trunk? I think I fixed this error in
>
> https://issues.apache.org/jira/browse/NUTCH-676
>
>> The commands:
>>
>> [...]
>>
>> /progs/nutch/bin/nutch mergedb /data/nutch/crawldata/crawldb_new /data/nutch/crawldata/crawldb
>> segment=`ls -d /data/nutch/crawldata/segments/*`
>> /progs/nutch/bin/nutch solrindex http://127.0.0.1:8080/solr1 /data/nutch/crawldata/crawldb_new /data/nutch/crawldata/linkdb $segment
>>
>> The error log:
>>
>> 2009-01-29 10:19:57,952 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
>> 2009-01-29 10:19:57,954 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
>> 2009-01-29 10:19:57,957 WARN  mapred.LocalJobRunner - job_local_0001
>> java.lang.RuntimeException: java.lang.NullPointerException
>>         at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:81)
>>         at org.apache.hadoop.io.MapWritable.readFields(MapWritable.java:164)
>>         at org.apache.nutch.crawl.CrawlDatum.readFields(CrawlDatum.java:262)
>>         at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
>>         at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
>>         at org.apache.hadoop.io.SequenceFile$Reader.deserializeValue(SequenceFile.java:1817)
>>         at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1790)
>>         at org.apache.hadoop.mapred.SequenceFileRecordReader.getCurrentValue(SequenceFileRecordReader.java:103)
>>         at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:78)
>>         at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:186)
>>         at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:170)
>>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
>>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:138)
>> Caused by: java.lang.NullPointerException
>>         at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:796)
>>         at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:73)
>>         ... 13 more
>> 2009-01-29 10:19:58,459 FATAL solr.SolrIndexer - SolrIndexer: java.io.IOException: Job failed!
>>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1217)
>>         at org.apache.nutch.indexer.solr.SolrIndexer.indexSolr(SolrIndexer.java:57)
>>         at org.apache.nutch.indexer.solr.SolrIndexer.run(SolrIndexer.java:79)
>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>         at org.apache.nutch.indexer.solr.SolrIndexer.main(SolrIndexer.java:88)
>>
>> Is it a bug or am I doing something wrong?
>>
>> I use the latest trunk, Ubuntu 8.10 server and java-6-openjdk.
>>
>> Best regards and thanks for your help!
>> Felix.
>>
>
> --
> Doğacan Güney

--
Doğacan Güney
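
For anyone hitting the same trace, a note on the likely mechanism:
ConcurrentHashMap rejects null keys, so the NPE inside
ReflectionUtils.newInstance (which keeps a ConcurrentHashMap-backed
constructor cache) means the class it was asked to instantiate was null.
MapWritable.readFields resolves each value's class from a byte id stored
with the data, so a crawldb written by one trunk revision and read by
another can yield an id with no registered class, i.e. a null lookup.
Below is a minimal standalone sketch of that failure mode; the class and
method names are illustrative stand-ins, not the actual Hadoop source:

import java.util.concurrent.ConcurrentHashMap;

// Standalone sketch of the suspected failure mode; NOT Hadoop/Nutch code.
// ReflectionUtils.newInstance caches constructors in a ConcurrentHashMap,
// and ConcurrentHashMap.get(null) throws NullPointerException.
public class NullClassNpeSketch {

    // Stand-in for the constructor cache in ReflectionUtils.
    private static final ConcurrentHashMap<Class<?>, Object> CONSTRUCTOR_CACHE =
            new ConcurrentHashMap<Class<?>, Object>();

    // Stand-in for MapWritable's byte-id -> class table. An id written by a
    // different version may have no entry, so the lookup returns null.
    private static Class<?> getClassForId(byte id) {
        return null; // unknown id: this is the hypothesized trigger
    }

    // Stand-in for ReflectionUtils.newInstance(theClass, conf).
    private static void newInstance(Class<?> theClass) {
        CONSTRUCTOR_CACHE.get(theClass); // NPE here when theClass == null
    }

    public static void main(String[] args) {
        // Simulates MapWritable.readFields() handing a null class onward,
        // reproducing "Caused by: java.lang.NullPointerException at
        // java.util.concurrent.ConcurrentHashMap.get(...)".
        newInstance(getClassForId((byte) 42));
    }
}

If that is indeed what is happening here, it squares with Doğacan's
suggestion: rebuilding against a trunk that includes the NUTCH-676 fix
(or regenerating the crawldb with one consistent version) should make the
id table and the stored data agree again.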
