It seems that the getStats code ignores the nocontent and nogenerate options:
  public void getStats(Path segment, final SegmentReaderStats stats) throws Exception {
    SequenceFile.Reader[] readers = SequenceFileOutputFormat.getReaders(
        getConf(), new Path(segment, CrawlDatum.GENERATE_DIR_NAME));
    long cnt = 0L;
    Text key = new Text();
    for (int i = 0; i < readers.length; i++) {
      while (readers[i].next(key)) cnt++;
      readers[i].close();
    }
    ...
But then it should not work in local mode either. Or does Hadoop just silently
ignore it in local mode?
It seems we should open an issue so that getStats no longer ignores the -no* flags.
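
As a rough illustration of what honoring the flags could look like, here is a
standalone sketch. It uses plain java.io.File rather than the Hadoop API, and
the class name, method name and boolean parameter are all made up for the
example; it is not the actual SegmentReader code. The idea is to skip a
subdirectory when its -no* flag is set, and to check for a missing listing
before sorting it:

```java
import java.io.File;
import java.util.Arrays;

// Hypothetical stand-in for the stats logic; NOT the Nutch SegmentReader.
public class SegmentStatsSketch {

    // Counts the entries in one segment subdirectory.
    // Returns -1 when the caller asked to skip it or when it cannot be listed.
    static long countParts(File segment, String subdir, boolean enabled) {
        if (!enabled) {
            return -1L; // the matching -no* flag was given: do not touch the directory
        }
        File[] parts = new File(segment, subdir).listFiles();
        if (parts == null) {
            return -1L; // listFiles() returns null for a missing directory
        }
        Arrays.sort(parts); // Arrays.sort throws NullPointerException on a null array
        return parts.length;
    }
}
```

With something along these lines, a skipped or missing subdirectory is
reported as -1 instead of raising an exception.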
-----Original message-----
> From:Markus Jelsma <[email protected]>
> Sent: Thursday 2nd January 2014 17:34
> To: [email protected]
> Subject: SegmentReader broken in distributed mode
>
> Hi,
>
> We can read segments fine from the local disk like this:
>
> bin/nutch readseg -list 20140102161258 -nocontent -nogenerate
> NAME            GENERATED  FETCHER START        FETCHER END          FETCHED  PARSED
> 20140102161258  0          2014-01-02T16:13:09  2014-01-02T16:20:39  1227     1096
>
> But we get the following exception when reading the same segment in
> distributed mode:
>
> bin/nutch readseg -list 20140102161258 -nocontent -nogenerate
> Exception in thread "main" java.lang.NullPointerException
> at java.util.ComparableTimSort.sort(ComparableTimSort.java:146)
> at java.util.Arrays.sort(Arrays.java:472)
> at org.apache.hadoop.mapred.SequenceFileOutputFormat.getReaders(SequenceFileOutputFormat.java:85)
> at org.apache.nutch.segment.SegmentReader.getStats(SegmentReader.java:463)
> at org.apache.nutch.segment.SegmentReader.list(SegmentReader.java:441)
> at org.apache.nutch.segment.SegmentReader.main(SegmentReader.java:587)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
>
> Can anyone confirm this issue?
>
> Thanks,
> Markus
>