Is it possible to determine which domain(s) an outlink was found on?
The only way I know of is to limit the crawl to a single domain (so I
would know where the outlink came from). Also, I am having difficulty
figuring out how, in 0.9 (probably the same in 0.8), to easily get the
outlinks for my segments.  In Nutch 0.7.* we used to do something like:

<snippet>

segmentReader = createSegmentReader(segment);

// Reusable records that next() fills in for each entry in the segment
final FetcherOutput fetcherOutput = new FetcherOutput();
final Content content = new Content();
final ParseData indexParseData = new ParseData();
final ParseText parseText = new ParseText();

// Walk every entry in the segment and collect its outlinks
while (segmentReader.next(fetcherOutput, content, parseText, indexParseData)) {
    extractOutlinksFromParseData(indexParseData, outlinks);
}

</snippet>

<snippet>
private void extractOutlinksFromParseData(final ParseData indexParseData,
                                          final Set<String> outlinks) {
    for (final Outlink outlink : indexParseData.getOutlinks()) {
        if (null != outlink && outlink.getToUrl() != null) {
            outlinks.add(outlink.getToUrl());
        }
    }
}
</snippet>
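
To make the first question concrete, this is roughly the mapping I am
after: for each outlink, the set of hosts it was found on. It is only a
sketch in plain Java with no Nutch classes at all; OutlinkSources and
record() are made-up names, and it assumes I can somehow get hold of the
URL of the page each outlink came from, which is exactly the part I do
not know how to do in 0.9.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Nutch: remembers which hosts link to which URL.
class OutlinkSources {

    // Records that the page at sourceUrl contains a link to outlinkUrl.
    // Entries with a missing source, a missing outlink, or a source URL
    // without a host are skipped. URI.create() throws
    // IllegalArgumentException on a malformed source URL.
    static void record(final Map<String, Set<String>> sourcesByOutlink,
                       final String sourceUrl, final String outlinkUrl) {
        if (sourceUrl == null || outlinkUrl == null) {
            return;
        }
        final String host = URI.create(sourceUrl).getHost();
        if (host == null) {
            return;
        }
        Set<String> hosts = sourcesByOutlink.get(outlinkUrl);
        if (hosts == null) {
            hosts = new TreeSet<String>();
            sourcesByOutlink.put(outlinkUrl, hosts);
        }
        // Host names are case-insensitive, so normalise before storing
        hosts.add(host.toLowerCase(Locale.ROOT));
    }

    public static void main(final String[] args) {
        final Map<String, Set<String>> sources = new HashMap<String, Set<String>>();
        record(sources, "http://a.example.com/index.html", "http://target.example.org/");
        record(sources, "http://b.example.net/page", "http://target.example.org/");
        System.out.println(sources);
    }
}
```

With something like this, the loop above would call record() with the
URL of the page being read in place of just adding to the outlinks set.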

I am finally taking the plunge and attempting to get this thing (my
application) up to date with the latest and greatest!

Thanks for your time!  And once I really get through this code I
promise to start posting answers.

Briggs.

--
"Conscious decisions by conscious minds are what make reality real"
