On Fri, Nov 12, 2004 at 11:15:18PM +0100, Andrzej Bialecki wrote:
> Andrzej Bialecki wrote:
> >Hi,
> >
> >I just committed a high-level API for working with segment data. The
> >classes are located in net.nutch.segment.* package.
>
> I just realized that SegmentReader doesn't work now with segments
> created by Fetcher in -noparse mode. I'll fix it in a day or two.
>
> However, I have a similar issue for the new version of SegmentMergeTool,
> but here I'm not sure how to react if I discover mixed-mode segments on
> input, i.e. some segments created in full-parse mode, and some created
> in -noparse mode... Should the tool in such case do one of the following:
>
> * assume that you don't want the parse data, and you will re-create it
> anyway for all data in the output segment. This means that it should
> merge all input from both "fetcher" and "fetcher_output" into a single
> output "fetcher", and at the same time discard all data in parse_data
> and parse_text.

One thorny issue is how to deal with the various FetcherOutput states.
Before parsing was separated from fetching, failed parsing was logged as
NOT_FOUND. Now it will be marked as CANT_PARSE. We may have to increase
VERSION in FetcherOutput from 4 to 5, so that "old" ./fetcher can be
easily distinguished from new ./fetcher and ./fetcher_output. I did not
do that because I did not feel compelled to at the time.

John

> * assume you want only parsed segments, and skip all non-parsed
> segments, issuing a warning
>
> * assume you want only parsed segments, and run ParseSegment if
> parse_text is missing.
>
>
> Any suggestions?
>
> --
> Best regards,
> Andrzej Bialecki

_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
