I have a question about segslice. Is there no way to use it on a segment that has been fetched but not parsed? It complains of a missing fetcher directory; the output is in fetcher_output, not fetcher. I created a large fetchlist (2 million entries), fetched it, and now I want to split it before parsing. I tried just parsing the whole segment, but it seemed to slow way down after parsing for a while.
Not yet... This still needs to be added to the SegmentReader/Writer API. Patches are welcome :-)
As I said before, I believe that the default mode of operation for the fetcher should be to create non-parsed segments, and to run parsers only after the merge/dedup/slice/whatever steps are completed. From this point of view, support for non-parsed segments is a must.
-- Best regards, Andrzej Bialecki
-------------------------------------------------
Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
-------------------------------------------------
FreeBSD developer (http://www.freebsd.org)
_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
