Hi,
I use "mergedb" to filter urls before indexing with "solrindex". Instead of Indexing, I got the error log message below. The same happens, if I do not use the "-filter"-statement. When Indexing without "mergedb", everything works fine. The commands: [.] /progs/nutch/bin/nutch mergedb /data/nutch/crawldata/crawldb_new /data/nutch/crawldata/crawldb segment=`ls -d /data/nutch/crawldata/segments/*` /progs/nutch/bin/nutch solrindex http://127.0.0.1:8080/solr1 /data/nutch/crawldata/crawldb_new /data/nutch/crawldata/linkdb $segment The error log: 2009-01-29 10:19:57,952 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter 2009-01-29 10:19:57,954 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter 2009-01-29 10:19:57,957 WARN mapred.LocalJobRunner - job_local_0001 java.lang.RuntimeException: java.lang.NullPointerException at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:81) at org.apache.hadoop.io.MapWritable.readFields(MapWritable.java:164) at org.apache.nutch.crawl.CrawlDatum.readFields(CrawlDatum.java:262) at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.d eserialize(WritableSerialization.java:67) at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.d eserialize(WritableSerialization.java:40) at org.apache.hadoop.io.SequenceFile$Reader.deserializeValue(SequenceFile.java: 1817) at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1 790) at org.apache.hadoop.mapred.SequenceFileRecordReader.getCurrentValue(SequenceFi leRecordReader.java:103) at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordRea der.java:78) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java :186) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:170) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:138) Caused by: java.lang.NullPointerException at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:796) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:73) ... 13 more 2009-01-29 10:19:58,459 FATAL solr.SolrIndexer - SolrIndexer: java.io.IOException: Job failed! at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1217) at org.apache.nutch.indexer.solr.SolrIndexer.indexSolr(SolrIndexer.java:57) at org.apache.nutch.indexer.solr.SolrIndexer.run(SolrIndexer.java:79) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.nutch.indexer.solr.SolrIndexer.main(SolrIndexer.java:88) Is it a bug or am I doing something wrong? I use the latest trunk, ubuntu 8.10 server and java-6-openjdk. Best regards and thanks for help! Felix.

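P.P.S. If a dump of the merged crawldb would help with debugging, I can provide one. This is what I would run, assuming readdb accepts the merged db like any other crawldb:

  # Print summary statistics for the merged crawldb.
  /progs/nutch/bin/nutch readdb /data/nutch/crawldata/crawldb_new -stats

  # Dump the merged crawldb entries to plain text for inspection.
  /progs/nutch/bin/nutch readdb /data/nutch/crawldata/crawldb_new \
      -dump /data/nutch/crawldata/crawldb_new_dump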