Kevin MacDonald wrote:
See the code snippet below from org.apache.nutch.crawl.Crawl. I think parsing happens opposite to what the nutch-site.xml config file indicates.

    public static void main(...) {
      ...
      if (!Fetcher.isParsing(job)) {
        parseSegment.parse(segment); // parse it, if needed
      }
      ...
    }
What do you mean? This snippet simply shows that if you set the Fetcher to non-parsing mode, we need to run the parsing as a separate, explicit step. In any case, you need to parse the content in order to collect links and update the db.
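To make the control flow concrete, here is a minimal, self-contained sketch that does not use Nutch itself. It assumes the switch read by Fetcher.isParsing(job) is the boolean property `fetcher.parse` with a default of `true` (check nutch-default.xml for your version). The class `CrawlParseSketch`, its methods and the step names are hypothetical and only illustrate the logic: when the fetcher parses, no extra step runs; when it does not, a separate parse step runs; and in both cases parsed content feeds the db update.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical illustration of the Crawl control flow; not actual Nutch code. */
public class CrawlParseSketch {

    // Assumed property name and default (true); verify against nutch-default.xml.
    static final String FETCHER_PARSE = "fetcher.parse";

    /** Analogue of Fetcher.isParsing(job): true means the fetcher parses while fetching. */
    static boolean isParsing(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty(FETCHER_PARSE, "true"));
    }

    /** Returns the (hypothetical) steps run for one segment, in order. */
    static List<String> stepsForSegment(Properties conf) {
        List<String> steps = new ArrayList<>();
        steps.add(isParsing(conf) ? "fetch+parse" : "fetch");
        if (!isParsing(conf)) {
            // The fetcher skipped parsing, so parsing must run as an explicit step.
            steps.add("parse");
        }
        // Either way, parsed content (outlinks) is needed before updating the db.
        steps.add("updatedb");
        return steps;
    }

    public static void main(String[] args) {
        Properties parsing = new Properties();      // property unset: default true
        Properties nonParsing = new Properties();
        nonParsing.setProperty(FETCHER_PARSE, "false");
        System.out.println(stepsForSegment(parsing));    // [fetch+parse, updatedb]
        System.out.println(stepsForSegment(nonParsing)); // [fetch, parse, updatedb]
    }
}
```

In other words, the `if (!Fetcher.isParsing(job))` branch is not inverted: it fires exactly when the config has turned in-fetcher parsing off.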
-- 
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
