I wanted to try last night's nightly for the new freegen command.
My test case is:

rm -rf crawl
bin/nutch inject crawl/crawldb urls/  # a single URL is in urls/urls
bin/nutch generate crawl/crawldb crawl/segments
bin/nutch fetch crawl/segments/2007...
bin/nutch updatedb crawl/crawldb crawl/segments/2007...

# generate a new segment with up to 10 URIs
bin/nutch generate crawl/crawldb crawl/segments -topN 10
bin/nutch fetch crawl/segments/2007... # new segment
bin/nutch updatedb crawl/crawldb crawl/segments/2007... # new segment

# merge the segments and index
bin/nutch mergesegs crawl/merged -dir crawl/segments
..
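
For anyone re-running this, the elided segment paths above can be picked up automatically instead of being typed by hand. A minimal sketch, assuming segment directories are named by timestamp (as in the log below) so that lexical order matches creation order; latest_segment is a hypothetical helper, not part of Nutch:

```shell
# Hypothetical helper (not part of Nutch): print the newest segment directory.
# Assumes segment names are timestamps, so lexical order is creation order.
latest_segment() {
    ls -d "$1"/2* | sort | tail -n 1
}

# Possible usage in the script above:
#   seg=$(latest_segment crawl/segments)
#   bin/nutch fetch "$seg"
#   bin/nutch updatedb crawl/crawldb "$seg"
```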

The mergesegs step crashes. With the exact same script, start URI,
configuration and plugins, the crash does not happen on a nightly from
a week ago.


2007-01-18 14:57:11,411 INFO  segment.SegmentMerger - Merging 2 segments to crawl/merged_07_01_18_14_56_22/20070118145711
2007-01-18 14:57:11,482 INFO  segment.SegmentMerger - SegmentMerger:   adding crawl/segments/20070118145628
2007-01-18 14:57:11,489 INFO  segment.SegmentMerger - SegmentMerger:   adding crawl/segments/20070118145641
2007-01-18 14:57:11,495 INFO  segment.SegmentMerger - SegmentMerger: using segment data from: content crawl_generate crawl_fetch crawl_parse parse_data parse_text
2007-01-18 14:57:11,594 INFO  mapred.InputFormatBase - Total input paths to process : 12
2007-01-18 14:57:11,819 INFO  mapred.JobClient - Running job: job_5ug2ip
2007-01-18 14:57:12,073 WARN  mapred.LocalJobRunner - job_5ug2ip
java.io.EOFException
         at java.io.DataInputStream.readFully(DataInputStream.java:178)
         at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:57)
         at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:91)
         at org.apache.hadoop.io.UTF8.readChars(UTF8.java:212)
         at org.apache.hadoop.io.UTF8.readString(UTF8.java:204)
         at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:173)
         at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:61)
         at org.apache.nutch.metadata.MetaWrapper.readFields(MetaWrapper.java:100)
         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.spill(MapTask.java:427)
         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpillToDisk(MapTask.java:385)
         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$200(MapTask.java:239)
         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:188)
         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:109)






--
http://variogr.am/




_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
