I see a need to record the "Content-Type" (MIME type) together with downloaded pages. It would also be useful for indexing and searching by MIME type. It is just a matter of adding a member variable to either FetcherOutput.java or FetcherContent.java. I will prepare a patch for it, but would like to know which class you prefer.
Thanks! This would be a great contribution.
I can think of cases where you might wish to know the content type without retrieving the content, so I guess I'd vote for putting it in FetcherOutput, along with other meta-information, rather than in FetcherContent, the raw bits.
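To illustrate the split suggested above, here is a minimal sketch in which the content type lives in a small metadata record that can be read without touching the raw bytes. The class name PageMeta and its fields are hypothetical stand-ins chosen for illustration; this is not the actual FetcherOutput code or its on-disk format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a per-page metadata record (not the real FetcherOutput).
final class PageMeta {
    final String url;
    final String contentType; // value of the Content-Type header, e.g. "text/html"

    PageMeta(String url, String contentType) {
        this.url = url;
        // Store an empty string when the server sent no Content-Type.
        this.contentType = contentType == null ? "" : contentType;
    }

    // Serialize only the metadata; the raw page bytes would live in a separate record.
    byte[] toBytes() {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(url);
            out.writeUTF(contentType);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Read the metadata back without needing the page content.
    static PageMeta fromBytes(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            String url = in.readUTF();
            String type = in.readUTF();
            return new PageMeta(url, type);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a layout like this, a tool that only needs to filter pages by type can scan the metadata records and skip the much larger content records entirely.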
It seems fetcher_text is saved only for indexing. Since indexing can be done from fetcher_content directly, we might cut storage space in half by not saving fetcher_text. Is there any other use for fetcher_text?
The text is also used to build the query-specific text snippets displayed with hits. Without it we'd have to do format conversion at search time before each hit could be displayed.
Doug
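As a rough illustration of the snippet point above: when plain text is already stored, a query-specific snippet can be cut straight out of it at search time, with no format conversion. The class SnippetSketch and its snippet method are hypothetical names for this sketch and do not reflect how Nutch actually builds its summaries.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; assumes the page's extracted plain text is already available.
final class SnippetSketch {
    // Returns the first case-insensitive match of `term` with up to `radius`
    // characters of context on each side, or the start of the text if no match.
    static String snippet(String text, String term, int radius) {
        Matcher m = Pattern
                .compile(Pattern.quote(term), Pattern.CASE_INSENSITIVE)
                .matcher(text);
        if (!m.find()) {
            return text.substring(0, Math.min(text.length(), 2 * radius));
        }
        int start = Math.max(0, m.start() - radius);
        int end = Math.min(text.length(), m.end() + radius);
        return text.substring(start, end);
    }
}
```

Doing the same from raw content would mean parsing HTML, PDF, or another format for every displayed hit, which is the cost the stored text avoids.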
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
