"Again, this procedure does NOT work when using HDFS - you won't even see
the partial output (without some serious hacking)"

Got it!

"You can simply set the fetcher.parsing config option to false."

Found it!
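
For the archives, this is roughly what the override looks like in
conf/nutch-site.xml (property name as given in the reply below; in some
Nutch versions it appears as fetcher.parse instead, so check
conf/nutch-default.xml for the exact name your release uses):

  <property>
    <name>fetcher.parsing</name>
    <value>false</value>
  </property>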

Thanks for the help!
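
P.S. In case it helps anyone else who finds this thread: with parsing
disabled at fetch time, the fetched segments still have to be parsed in
a separate step later. In the Nutch releases I have looked at, this can
be done with the ParseSegment tool, e.g. (the segment path here is just
a placeholder):

  bin/nutch parse crawl/segments/<segment_dir>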


2010/5/3 Andrzej Bialecki <a...@getopt.org>

> On 2010-05-03 22:58, Emmanuel de Castro Santana wrote:
> > "The first method of recovering that is mentioned there works only
> > (sometimes) for crawls performed using LocalJobTracker and local file
> > system. It does not work in any other case."
> >
> > If I stop the crawling process, take the crawled content from the dfs
> > into my local disk, do the fix and then put it back into hdfs, would
> > it work? Or would there be a problem about dfs replication of the new
> > files?
>
> Again, this procedure does NOT work when using HDFS - you won't even see
> the partial output (without some serious hacking).
>
> >
> > "Yes, but if the parsing fails you still have the downloaded content,
> > which you can re-parse again after you fixed the config or the code..."
> >
> > Interesting ... I did not see any option like a -noParsing in the
> "bin/nutch
> > crawl" command, that means I will have to code my own .sh for crawling,
> one
> > that uses the -noparsing option of the fetcher right ?
>
> You can simply set the fetcher.parsing config option to false.
>
>
> --
> Best regards,
> Andrzej Bialecki     <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
>


-- 
Emmanuel de Castro Santana
