I'm sure it's just my ignorance of some Nutch basics. The way I read that
code, it said to me: "if I'm not supposed to parse, go ahead and parse".
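
A minimal sketch of that branch, written out as a standalone toy rather than
real Nutch code. The class name CrawlFlowSketch, the method runCycle and the
step labels are all made up for illustration; the boolean simply stands in for
whatever Fetcher.isParsing(job) returns. It shows that the content gets parsed
exactly once either way, and the check only decides whether that happens
inside the fetch or as a separate step before the db update.

```java
import java.util.ArrayList;
import java.util.List;

public class CrawlFlowSketch {

    // fetcherParses is a hypothetical stand-in for Fetcher.isParsing(job):
    // true means the fetcher already parses content while fetching.
    public static List<String> runCycle(boolean fetcherParses) {
        List<String> steps = new ArrayList<>();

        // The fetch step either parses as it goes or only stores raw content.
        steps.add(fetcherParses ? "fetch+parse" : "fetch");

        // Same shape as the check in the quoted snippet: parse separately
        // only when the fetcher did not do it.
        if (!fetcherParses) {
            steps.add("parse");
        }

        // Links come from parsed content, so the db update follows parsing.
        steps.add("updatedb");
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(runCycle(true));   // prints [fetch+parse, updatedb]
        System.out.println(runCycle(false));  // prints [fetch, parse, updatedb]
    }
}
```

Read this way, the "!" is not a contradiction of the config: it is the
fallback for the case where the fetcher was told not to parse.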
On Thu, Sep 18, 2008 at 2:33 PM, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
> Kevin MacDonald wrote:
>
>> See the code snippet below from org.apache.nutch.crawl.Crawl. I think
>> parsing happens opposite to what the nutch-site.xml config file indicates.
>>
>> public static void main(...) {
>>   ...
>>
>>   if (!Fetcher.isParsing(job)) {
>>     parseSegment.parse(segment); // parse it, if needed
>>   }
>>
>>   ...
>> }
>>
>
> What do you mean? This snippet simply shows that if you set the Fetcher to
> non-parsing mode we need to run the parsing as a separate explicit step. In
> any case you need to parse the content in order to collect links and update
> the db.
>
>
>
> --
> Best regards,
> Andrzej Bialecki <><
> ___. ___ ___ ___ _ _ __________________________________
> [__ || __|__/|__||\/| Information Retrieval, Semantic Web
> ___|||__|| \| || | Embedded Unix, System Integration
> http://www.sigram.com Contact: info at sigram dot com