Hi,
I just committed a high-level API for working with segment data. The classes are located in the net.nutch.segment.* package.
I also just realized that SegmentReader currently doesn't work with segments created by Fetcher in -noparse mode. I'll fix that in a day or two.
I have a similar issue with the new version of SegmentMergeTool, but there I'm not sure how to react if I find mixed-mode segments on input, i.e. some segments created in full-parse mode and some created in -noparse mode. In that case, should the tool do one of the following:
* assume that you don't want the existing parse data, and that you will re-create it anyway for all data in the output segment. This means the tool should merge all input from both "fetcher" and "fetcher_output" into a single output "fetcher", and at the same time discard all data in parse_data and parse_text.
* assume you want only parsed segments, and skip all non-parsed segments, issuing a warning
* assume you want only parsed segments, and run ParseSegment on any input segment where parse_text is missing.
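
To make the three options easier to compare, here is a rough sketch of them as a per-segment decision. None of this is in the committed code: the class, the Policy and Action enums, and the decide() method are all hypothetical names invented for illustration, and "hasParseData" simply stands for "this input segment contains parse_data and parse_text".

```java
// Hypothetical sketch only; not part of net.nutch.segment.*.
class MixedModeMergeSketch {

    /** One constant per option in the list above (names are made up). */
    enum Policy { DISCARD_PARSE_DATA, SKIP_UNPARSED, PARSE_MISSING }

    /** What the merge tool would do with a single input segment. */
    enum Action {
        MERGE_FETCH_DATA_ONLY,  // merge fetcher data, ignore parse_data and parse_text
        MERGE_WITH_PARSE_DATA,  // merge everything, parse data included
        SKIP_WITH_WARNING,      // leave the segment out and warn the user
        PARSE_THEN_MERGE        // run ParseSegment first, then merge everything
    }

    /**
     * Picks the action for one input segment.
     * hasParseData: assumed to mean parse_data and parse_text are present.
     */
    static Action decide(boolean hasParseData, Policy policy) {
        switch (policy) {
            case DISCARD_PARSE_DATA:
                // Option 1: parse data is dropped for every segment, parsed or not.
                return Action.MERGE_FETCH_DATA_ONLY;
            case SKIP_UNPARSED:
                // Option 2: only parsed segments make it into the output.
                return hasParseData ? Action.MERGE_WITH_PARSE_DATA
                                    : Action.SKIP_WITH_WARNING;
            case PARSE_MISSING:
                // Option 3: unparsed segments are parsed on the fly.
                return hasParseData ? Action.MERGE_WITH_PARSE_DATA
                                    : Action.PARSE_THEN_MERGE;
            default:
                throw new IllegalArgumentException("Unknown policy: " + policy);
        }
    }
}
```

Seen this way, the first option treats all inputs the same, while the other two only differ in what happens to the -noparse segments.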
Any suggestions?
-- Best regards, Andrzej Bialecki
------------------------------------------------- Software Architect, System Integration Specialist CEN/ISSS EC Workshop, ECIMF project chair EU FP6 E-Commerce Expert/Evaluator ------------------------------------------------- FreeBSD developer (http://www.freebsd.org)
_______________________________________________ Nutch-developers mailing list https://lists.sourceforge.net/lists/listinfo/nutch-developers
