Hello nutchers!
I am trying to compute linkrank scores without spending excessive time on the 
task. My version of the crawl script contains the following line, which is 
similar to a commented-out line in the bin/crawl script in the 1.12 
distribution.
__bin_nutch webgraph $commonOptions -filter -normalize -segmentDir "$CRAWL_PATH"/segments/ -webgraphdb "$CRAWL_PATH"
I notice that it specifies -segmentDir rather than -segment. Does that mean it 
re-computes the outlinkdb and the other webgraph data over every existing 
segment each time a new segment is added, or does it check and avoid re-doing 
work it has already done?
If I change it to say -segment "$CRAWL_PATH"/segments/$SEGMENT, will it process 
only the new segment? As it stands, the script spends a lot of time computing 
the outlinkdb.
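
For reference, the change I am considering would look like this (a sketch only; 
whether webgraph then updates the existing outlinkdb incrementally rather than 
rebuilding it is exactly what I am unsure about):

```shell
# Current form: points at the whole segments directory.
__bin_nutch webgraph $commonOptions -filter -normalize \
  -segmentDir "$CRAWL_PATH"/segments/ -webgraphdb "$CRAWL_PATH"

# Proposed form: pass only the newly fetched segment ($SEGMENT is the
# segment name set earlier in the crawl script).
__bin_nutch webgraph $commonOptions -filter -normalize \
  -segment "$CRAWL_PATH"/segments/$SEGMENT -webgraphdb "$CRAWL_PATH"
```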
Thanks for any light you may shed.