Hello,

Can I parse already-fetched segments more than once without having to
fetch everything again?

When I first tried to run the "./bin/nutch parse
./path/to/an/already/parsed/segment" command, I got a Java exception
explaining that the segment had already been parsed. Indeed, the
following subdirectories could be found under the segment directory:

segment/content
segment/crawl_fetch
segment/crawl_generate
segment/crawl_parse
segment/parse_data
segment/parse_text

To try to force parsing, I renamed the last three subdirectories
(crawl_parse, parse_data and parse_text) to something else and
re-launched the "./bin/nutch parse" command. It has been running for
more than 24 hours... and it still has not finished.

Afterwards, my plan is to rebuild the index from the newly parsed segment.

Is this the right way to do it? Isn't there a simpler, and maybe
quicker, way to reparse segments?

Thank you,

David
