Hello,

Start by disabling filtering and normalizing; that work was already done in the parser. Re-enable them only once, if you have changed your filters and/or normalizers. You can use -segment to update an existing graph. By the way, is building the graph actually a performance problem? What about computing the LinkRank, which is much more costly?
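In your crawl script that would look roughly like this (untested sketch; I'm assuming $SEGMENT holds the name of the segment just fetched and parsed, as elsewhere in bin/crawl):

# update the existing webgraph with only the newest segment;
# -filter and -normalize are omitted because the parser already did that work
__bin_nutch webgraph $commonOptions -segment "$CRAWL_PATH"/segments/$SEGMENT -webgraphdb "$CRAWL_PATH"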
Markus

-----Original message-----
> From: Michael Coffey <[email protected]>
> Sent: Thursday 2nd March 2017 2:07
> To: [email protected]
> Subject: webgraph speed
>
> Hello nutchers!
> I am trying to compute linkrank scores without spending excessive time on the
> task. My version of the crawl script contains the following line, which is
> similar to a commented-out line in the bin/crawl script in the 1.12
> distribution.
>
> __bin_nutch webgraph $commonOptions -filter -normalize -segmentDir
> "$CRAWL_PATH"/segments/ -webgraphdb "$CRAWL_PATH"
>
> I notice that it specifies -segmentDir, rather than -segment. Does that mean
> it re-computes the outlinkdb and other information for every existing
> segment every time it does a new segment, or does it check and avoid re-doing
> things it did before?
> If I change it to say -segment "$CRAWL_PATH"/segments/$SEGMENT, will it do
> just what needs doing? The way I have it now, it spends a lot of time
> computing outlinkdb.
> Thanks for any light you may shed.

