Hello Lewis, thanks for the response, but in both cases I got an error.
First case:
ERROR: No input segments.

Second case:
Exception in thread "main" java.io.IOException: No input paths specified in job
	at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:152)
	at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:44)
	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:201)
	at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
	at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
	at org.apache.nutch.segment.SegmentMerger.merge(SegmentMerger.java:638)
	at org.apache.nutch.segment.SegmentMerger.main(SegmentMerger.java:683)

Regards,
Patricio

________________________________
From: lewis john mcgibbney <[email protected]>
To: [email protected]; Patricio Galeas <[email protected]>
Cc: nutch-user <[email protected]>
Sent: Monday, 31 October 2011, 12:10
Subject: Re: error by merging segments

Hi Patricio,

Try this

bin/nutch mergesegs /user/nutch/crawl_al/MERGEDsegments -dir /user/nutch/my_crawl/segments/* -filter -slice 50000

or

bin/nutch mergesegs /user/nutch/crawl_al/MERGEDsegments -dir /user/nutch/my_crawl/segments/seg1 /user/nutch/my_crawl/segments/seg2 /user/nutch/my_crawl/segments/seg3, etc -filter -slice 50000

If this works then we need to edit the wiki to accommodate the '/*' which is required to refer to ALL segments in any given directory.

HTH

On Fri, Oct 28, 2011 at 11:09 PM, Patricio Galeas <[email protected]> wrote:

> Hello,
>
> When I try to merge segments using
>
> bin/nutch mergesegs /user/nutch/crawl_al/MERGEDsegments -dir /user/nutch/my_crawl/segments -filter -slice 50000
>
> I get the following error. What am I doing wrong?
>
> Thanks,
> Patricio
>
> java.io.EOFException
> 	at java.io.DataInputStream.readByte(DataInputStream.java:250)
> 	at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
> 	at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
> 	at org.apache.hadoop.io.Text.readString(Text.java:400)
> 	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2901)
> 	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2826)
> 	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
> 	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)
>
> attempt_201110152026_0003_r_000001_1: log4j:WARN No appenders could be found for logger (org.apache.hadoop.hdfs.DFSClient).
> attempt_201110152026_0003_r_000001_1: log4j:WARN Please initialize the log4j system properly.
>
> attempt_201110152026_0003_r_000000_1: log4j:WARN No appenders could be found for logger (org.apache.hadoop.hdfs.DFSClient).
> attempt_201110152026_0003_r_000000_1: log4j:WARN Please initialize the log4j system properly.
>
> Exception in thread "main" java.io.IOException: Job failed!
> 	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
> 	at org.apache.nutch.segment.SegmentMerger.merge(SegmentMerger.java:638)
> 	at org.apache.nutch.segment.SegmentMerger.main(SegmentMerger.java:683)

--
Lewis
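A rough POSIX-sh sketch of Lewis's second suggestion: enumerate the segment directories explicitly and pass each one as its own argument, rather than relying on `-dir` plus a glob. The function name `build_mergesegs_cmd` is illustrative, the example assumes the segments are visible on the local filesystem (on HDFS you would list them with `hadoop fs -ls` instead), and the command is only echoed so it can be inspected before actually running it:

```shell
#!/bin/sh
# Build a mergesegs invocation from an output dir and a parent directory
# containing one subdirectory per segment. Prints the command instead of
# running it; drop the `echo` once the argument list looks right.
build_mergesegs_cmd() {
    out_dir=$1
    seg_parent=$2

    # One path per segment subdirectory (local listing; use `hadoop fs -ls`
    # for segments stored on HDFS).
    segs=$(ls -d "$seg_parent"/* 2>/dev/null)

    if [ -z "$segs" ]; then
        echo "no segments found under $seg_parent" >&2
        return 1
    fi

    # $segs is deliberately unquoted so each segment becomes its own argument.
    echo bin/nutch mergesegs "$out_dir" $segs -filter -slice 50000
}

# Dry run with the paths from the thread (prints a warning here, since the
# directory only exists on the cluster):
build_mergesegs_cmd /user/nutch/crawl_al/MERGEDsegments /user/nutch/my_crawl/segments || true
```

Printing the assembled command first makes it easy to spot the "No input segments" case: if the listing comes back empty, the script says so instead of submitting a job with no input paths.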

