Hi Koch,

Sorry, I thought that would have fixed your problem.
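If I'm reading the trace right, the NullPointerException itself is
fairly mundane: MapWritable.readFields() maps each serialized class id
back to a Class, an id it doesn't recognize comes back as null, and
ReflectionUtils.newInstance() then uses that null class as a key into
its constructor cache, a ConcurrentHashMap, which rejects null keys.
A minimal stand-alone illustration (plain Java, not Nutch code):

  import java.util.concurrent.ConcurrentHashMap;

  // Demo of the failure mode in your traces: ConcurrentHashMap throws
  // NullPointerException for a null key, which is what happens when
  // ReflectionUtils.newInstance() is asked to instantiate a null class.
  public class NullKeyDemo {
    public static void main(String[] args) {
      ConcurrentHashMap<Class<?>, Object> constructorCache =
          new ConcurrentHashMap<Class<?>, Object>();
      Class<?> unresolved = null; // stands in for an unknown MapWritable class id
      constructorCache.get(unresolved); // NPE, like the 'Caused by' frames below
    }
  }

If that is what is happening, the question is why the class ids stored
in your old crawldb are no longer registered, which is the part the
NUTCH-683 patch is meant to fix.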
How big is your crawldb? If it is small, would you mind sending it to
me so I can have a look?

On Wed, Feb 11, 2009 at 10:24 AM, Koch Martina <[email protected]> wrote:
> Hi Doğacan,
>
> thanks for your reply!
>
> I applied the patch, but I still get the same error message.
> I also tried to merge the old crawldb into a new one and then to run
> readdb on the result, but even the merge step fails with the following
> error message:
>
> 2009-02-11 08:35:31,520 INFO jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
> 2009-02-11 08:35:31,707 INFO mapred.FileInputFormat - Total input paths to process : 1
> 2009-02-11 08:35:32,004 INFO mapred.JobClient - Running job: job_local_0001
> 2009-02-11 08:35:32,004 INFO mapred.FileInputFormat - Total input paths to process : 1
> 2009-02-11 08:35:32,082 INFO mapred.MapTask - numReduceTasks: 1
> 2009-02-11 08:35:32,082 INFO mapred.MapTask - io.sort.mb = 100
> 2009-02-11 08:35:32,191 INFO mapred.MapTask - data buffer = 79691776/99614720
> 2009-02-11 08:35:32,191 INFO mapred.MapTask - record buffer = 262144/327680
> 2009-02-11 08:35:32,222 WARN mapred.LocalJobRunner - job_local_0001
> java.lang.RuntimeException: java.lang.NullPointerException
>         at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:81)
>         at org.apache.hadoop.io.MapWritable.readFields(MapWritable.java:164)
>         at org.apache.nutch.crawl.CrawlDatum.readFields(CrawlDatum.java:262)
>         at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
>         at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
>         at org.apache.hadoop.io.SequenceFile$Reader.deserializeValue(SequenceFile.java:1817)
>         at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1790)
>         at org.apache.hadoop.mapred.SequenceFileRecordReader.getCurrentValue(SequenceFileRecordReader.java:103)
>         at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:78)
>         at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:186)
>         at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:170)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:138)
> Caused by: java.lang.NullPointerException
>         at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768)
>         at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:73)
>         ... 13 more
> 2009-02-11 08:35:33,003 FATAL crawl.CrawlDbMerger - CrawlDb merge: java.io.IOException: Job failed!
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1217)
>         at org.apache.nutch.crawl.CrawlDbMerger.merge(CrawlDbMerger.java:119)
>         at org.apache.nutch.crawl.CrawlDbMerger.run(CrawlDbMerger.java:178)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.nutch.crawl.CrawlDbMerger.main(CrawlDbMerger.java:150)
>
> I ran the merge step in debug mode and saw that the new code lines of
> CrawlDbMerger are never reached. The error occurs earlier, somewhere
> in the merge method.
>
> Kind regards,
> Martina
>
>
> -----Original Message-----
> From: Doğacan Güney [mailto:[email protected]]
> Sent: Tuesday, 10 February 2009 22:54
> To: [email protected]
> Subject: Re: "old" crawldb not readable with current trunk
>
> On Tue, Feb 10, 2009 at 4:47 PM, Koch Martina <[email protected]> wrote:
>> Hi,
>>
>> I just upgraded from trunk version 28.12.2008 to trunk version 04.02.2009.
>> Now I'm trying to read my old crawldbs, e.g. by using the command
>> "bin/nutch readdb <crawldb> -stats", but I always get the following error:
>>
>> 2009-02-10 15:41:05,541 DEBUG mapred.MapTask - Writing local split to /tmp/CRAWLNAME.default.xyz/mapred/local/localRunner/split.dta
>> 2009-02-10 15:41:05,588 DEBUG mapred.TaskRunner - attempt_local_0001_m_000000_0 Progress/ping thread started
>> 2009-02-10 15:41:05,588 INFO mapred.MapTask - numReduceTasks: 1
>> 2009-02-10 15:41:05,588 INFO mapred.MapTask - io.sort.mb = 100
>> 2009-02-10 15:41:05,698 INFO mapred.MapTask - data buffer = 79691776/99614720
>> 2009-02-10 15:41:05,698 INFO mapred.MapTask - record buffer = 262144/327680
>> 2009-02-10 15:41:05,713 DEBUG mapred.Counters - Creating group org.apache.hadoop.mapred.Task$Counter with bundle
>> 2009-02-10 15:41:05,713 DEBUG mapred.Counters - Adding MAP_OUTPUT_BYTES
>> 2009-02-10 15:41:05,713 DEBUG mapred.Counters - Adding MAP_OUTPUT_RECORDS
>> 2009-02-10 15:41:05,713 DEBUG mapred.Counters - Adding COMBINE_INPUT_RECORDS
>> 2009-02-10 15:41:05,713 DEBUG mapred.Counters - Adding COMBINE_OUTPUT_RECORDS
>> 2009-02-10 15:41:05,713 DEBUG mapred.Counters - Adding MAP_INPUT_RECORDS
>> 2009-02-10 15:41:05,713 DEBUG mapred.Counters - Adding MAP_INPUT_BYTES
>> 2009-02-10 15:41:05,729 WARN mapred.LocalJobRunner - job_local_0001
>> java.lang.RuntimeException: java.lang.NullPointerException
>>         at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:81)
>>         at org.apache.hadoop.io.MapWritable.readFields(MapWritable.java:164)
>>         at org.apache.nutch.crawl.CrawlDatum.readFields(CrawlDatum.java:262)
>>         at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
>>         at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
>>         at org.apache.hadoop.io.SequenceFile$Reader.deserializeValue(SequenceFile.java:1817)
>>         at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1790)
>>         at org.apache.hadoop.mapred.SequenceFileRecordReader.getCurrentValue(SequenceFileRecordReader.java:103)
>>         at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:78)
>>         at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:186)
>>         at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:170)
>>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
>>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:138)
>> Caused by: java.lang.NullPointerException
>>         at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768)
>>         at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:73)
>>         ... 13 more
>>
>> With the older version of the trunk I can read the crawldb without
>> difficulty.
>>
>> Are the old files no longer readable with the new trunk version since
>> the upgrade to Lucene 2.4?
>> Is there anything I can do to re-use my old data with the new version?
>>
>
> Try again in a couple of days. This is a known bug (NUTCH-683). I will
> commit that patch very soon.
> Meanwhile, you can apply the patch there manually.
>
>> Kind regards,
>> Martina
>>
>
>
> --
> Doğacan Güney

--
Doğacan Güney
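P.S. If you want to pin down which record breaks before sending
anything over, a small probe like the untested sketch below should stop
at the first CrawlDatum that no longer deserializes. It assumes the
0.19-era SequenceFile.Reader API from your traces and the usual part
layout (e.g. crawldb/current/part-00000/data); adjust the path to match
your setup.

  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.SequenceFile;
  import org.apache.hadoop.io.Text;
  import org.apache.nutch.crawl.CrawlDatum;

  // Hypothetical diagnostic, not part of Nutch: walk one crawldb part
  // file record by record and report the first entry whose CrawlDatum
  // can no longer be deserialized.
  public class CrawlDbProbe {
    public static void main(String[] args) throws IOException {
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(conf);
      Path part = new Path(args[0]); // e.g. crawldb/current/part-00000/data
      SequenceFile.Reader reader = new SequenceFile.Reader(fs, part, conf);
      Text url = new Text();
      CrawlDatum datum = new CrawlDatum();
      long count = 0;
      try {
        while (reader.next(url)) {        // the key (URL) still reads fine
          reader.getCurrentValue(datum);  // old metadata blows up here
          count++;
        }
        System.out.println("All " + count + " records read cleanly.");
      } catch (RuntimeException e) {
        System.out.println("Record " + (count + 1) + " (key '" + url
            + "') failed: " + e);
      } finally {
        reader.close();
      }
    }
  }

If readdb dies on the very first record, the key printed there should
at least tell us which URL's metadata to look at.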
