I wanted to try last night's nightly for the new freegen command. My test case is:

rm -rf crawl
bin/nutch inject crawl/crawldb urls/ # a single URL is in urls/urls
bin/nutch generate crawl/crawldb crawl/segments
bin/nutch fetch crawl/segments/2007...
bin/nutch updatedb crawl/crawldb crawl/segments/2007...
# generate a new segment with 5 URIs
bin/nutch generate crawl/crawldb crawl/segments -topN 10
bin/nutch fetch crawl/segments/2007... # new segment
bin/nutch updatedb crawl/crawldb crawl/segments/2007... # new segment
# merge the segments and index
bin/nutch mergesegs crawl/merged -dir crawl/segments
..

We get a crash in the mergesegs. This crash, with the exact same script and start URI, configuration and plugins, does not happen on a nightly from a week ago.

2007-01-18 14:57:11,411 INFO segment.SegmentMerger - Merging 2 segments to crawl/merged_07_01_18_14_56_22/20070118145711
2007-01-18 14:57:11,482 INFO segment.SegmentMerger - SegmentMerger: adding crawl/segments/20070118145628
2007-01-18 14:57:11,489 INFO segment.SegmentMerger - SegmentMerger: adding crawl/segments/20070118145641
2007-01-18 14:57:11,495 INFO segment.SegmentMerger - SegmentMerger: using segment data from: content crawl_generate crawl_fetch crawl_parse parse_data parse_text
2007-01-18 14:57:11,594 INFO mapred.InputFormatBase - Total input paths to process : 12
2007-01-18 14:57:11,819 INFO mapred.JobClient - Running job: job_5ug2ip
2007-01-18 14:57:12,073 WARN mapred.LocalJobRunner - job_5ug2ip
java.io.EOFException
    at java.io.DataInputStream.readFully(DataInputStream.java:178)
    at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:57)
    at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:91)
    at org.apache.hadoop.io.UTF8.readChars(UTF8.java:212)
    at org.apache.hadoop.io.UTF8.readString(UTF8.java:204)
    at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:173)
    at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:61)
    at org.apache.nutch.metadata.MetaWrapper.readFields(MetaWrapper.java:100)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.spill(MapTask.java:427)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpillToDisk(MapTask.java:385)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$200(MapTask.java:239)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:188)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:109)

--
http://variogr.am/
[EMAIL PROTECTED]