Hello,

When merging segments with

nutch mergesegs $crawldir/MERGEDsegments $crawldir/segments/* -slice 50000 ...

I got the following message in hadoop.log:

-------------------------
.
.
2010-03-03 03:27:01,849 INFO  segment.SegmentMerger - Slice size: 50000 URLs.
2010-03-03 22:47:43,130 INFO  segment.SegmentMerger - Slice size: 50000 URLs.
2010-03-03 22:47:43,149 INFO  segment.SegmentMerger - Slice size: 50000 URLs.
2010-03-03 23:47:00,029 WARN  mapred.LocalJobRunner - job_local_0001
java.lang.ThreadDeath
        at java.lang.Thread.stop(Thread.java:715)
        at org.apache.hadoop.mapred.LocalJobRunner.killJob(LocalJobRunner.java:310)
        at org.apache.hadoop.mapred.JobClient$NetworkedJob.killJob(JobClient.java:315)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1239)
        at org.apache.nutch.segment.SegmentMerger.merge(SegmentMerger.java:620)
        at org.apache.nutch.segment.SegmentMerger.main(SegmentMerger.java:665)
-------------------------

The folder $crawldir/MERGEDsegments also contains a subdirectory named "_temporary".

What am I doing wrong? Can I use the generated segments for indexing and continue the crawling process later?

Thanks for your help.

Pat