Andrzej Bialecki wrote:
Hi,

I just committed a high-level API for working with segment data. The classes are located in the net.nutch.segment.* package.

I just realized that SegmentReader currently doesn't work with segments created by Fetcher in -noparse mode. I'll fix it in a day or two.


I have a similar issue with the new version of SegmentMergeTool, but here I'm not sure how to react if I discover mixed-mode segments on input, i.e. some segments created in full-parse mode and some created in -noparse mode. In such a case, should the tool do one of the following:

* assume that you don't want the parse data, and that you will re-create it anyway for all data in the output segment. This means that it should merge all input from both "fetcher" and "fetcher_output" into a single output "fetcher", and at the same time discard all data in parse_data and parse_text.

* assume you want only parsed segments, and skip all non-parsed segments, issuing a warning.

* assume you want only parsed segments, and run ParseSegment on any segment where parse_text is missing.
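
To make the three options easier to compare, here is a rough sketch of how the choice could be expressed as a per-segment decision. This is only an illustration, not the committed code: the class MixedModePlanner, the Policy and Action enums, and the rule "a segment is parsed if it has a parse_text subdirectory" are all assumptions of mine and do not exist in net.nutch.segment.*. Only the directory name parse_text comes from the description above.

```java
import java.io.File;

// Hypothetical sketch: none of these types are part of net.nutch.segment.*.
class MixedModePlanner {

    /** The three candidate reactions to mixed-mode input, in the order listed above. */
    enum Policy { DISCARD_PARSE_DATA, SKIP_UNPARSED, PARSE_MISSING }

    /** What the merge tool would do with one input segment. */
    enum Action { MERGE_FETCH_DATA_ONLY, MERGE_ALL, SKIP_WITH_WARNING, PARSE_THEN_MERGE }

    /** Assumption: a segment counts as parsed if it contains a parse_text subdirectory. */
    static boolean isParsed(File segmentDir) {
        return new File(segmentDir, "parse_text").isDirectory();
    }

    /** Maps a segment and a policy to the action the tool would take for that segment. */
    static Action plan(File segmentDir, Policy policy) {
        boolean parsed = isParsed(segmentDir);
        switch (policy) {
            case DISCARD_PARSE_DATA:
                // Option 1: keep only fetch data, drop parse_data and parse_text everywhere.
                return Action.MERGE_FETCH_DATA_ONLY;
            case SKIP_UNPARSED:
                // Option 2: unparsed segments are left out of the merge, with a warning.
                return parsed ? Action.MERGE_ALL : Action.SKIP_WITH_WARNING;
            case PARSE_MISSING:
                // Option 3: unparsed segments are parsed first, then merged like the rest.
                return parsed ? Action.MERGE_ALL : Action.PARSE_THEN_MERGE;
            default:
                throw new IllegalArgumentException("Unknown policy: " + policy);
        }
    }
}
```

If something like this were used, the policy could be exposed as a command-line switch, so the tool would not have to guess what the user wants.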


Any suggestions?

--
Best regards,
Andrzej Bialecki

-------------------------------------------------
Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
-------------------------------------------------
FreeBSD developer (http://www.freebsd.org)


