Re: [Nutch-dev] recording Content-Type

Doug Cutting Mon, 01 Mar 2004 12:35:46 -0800

[EMAIL PROTECTED] wrote:

> I see a need to record the "Content-Type" (MIME type)
> together with downloaded pages. It would also be useful for indexing/searching
> by MIME type. It is just a matter of adding a member variable
> to either FetcherOutput.java or FetcherContent.java.
> I will prepare a patch for it, but would like to know which class you prefer.

Thanks! This would be a great contribution.

I can think of cases where you might wish to know the content type without retrieving the content, so I guess I'd vote for putting it in FetcherOutput, along with other meta-information, rather than in FetcherContent, the raw bits.
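To illustrate keeping the type with the other meta-information, here is a minimal sketch of a metadata record that carries the content type alongside the URL, so it can be read without touching the raw bits. The class name `PageMetadata`, its fields, and the `mimeTypeOf` helper are hypothetical simplifications, not the actual FetcherOutput API.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.Locale;

/**
 * Hypothetical, simplified stand-in for a per-page metadata record such as
 * FetcherOutput. Only the fields needed for this illustration are shown.
 */
class PageMetadata {
    private String url;
    private String contentType; // bare MIME type, e.g. "text/html"; null if unknown

    PageMetadata() {}

    PageMetadata(String url, String contentType) {
        this.url = url;
        this.contentType = contentType;
    }

    String getUrl() { return url; }
    String getContentType() { return contentType; }

    /** Strips parameters such as "; charset=..." so the bare MIME type can be indexed. */
    static String mimeTypeOf(String headerValue) {
        if (headerValue == null) return null;
        int semi = headerValue.indexOf(';');
        String type = (semi >= 0 ? headerValue.substring(0, semi) : headerValue).trim();
        return type.isEmpty() ? null : type.toLowerCase(Locale.ROOT);
    }

    /** Serializes the record; the type travels with the metadata, not the content. */
    void write(DataOutput out) throws IOException {
        out.writeUTF(url);
        out.writeUTF(contentType == null ? "" : contentType); // "" encodes "unknown"
    }

    /** Reads a record written by write(). */
    void readFields(DataInput in) throws IOException {
        url = in.readUTF();
        String type = in.readUTF();
        contentType = type.isEmpty() ? null : type;
    }
}
```

Storing the normalized type (lowercase, without parameters) keeps later filtering by MIME type a simple string comparison.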

> It seems fetcher_text is saved only for indexing purposes.
> Since indexing can be done using fetcher_content directly, we might
> save half the storage space by not saving fetcher_text.
> Is there any other use of fetcher_text?

The text is also used to build the query-specific text snippets displayed with hits. Without it we'd have to do format conversion at search time before each hit could be displayed.
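To show why stored plain text helps at display time, here is a minimal sketch that cuts a window of text around the first query-term match. It is a simple illustration under stated assumptions, not Nutch's actual summarizer; `SnippetSketch` and `snippet` are hypothetical names. With the text already extracted, this is cheap string work; without it, each hit's raw content would first need format conversion (e.g. HTML or PDF to text).

```java
import java.util.Locale;

/** Hypothetical helper: builds a query-specific snippet from stored plain text. */
final class SnippetSketch {
    private SnippetSketch() {}

    /**
     * Returns up to 2*radius characters around the earliest occurrence of any
     * query term, or the start of the text if no term occurs. Matching is
     * case-insensitive and assumes lowercasing does not change string length
     * (true for ASCII text).
     */
    static String snippet(String text, String[] queryTerms, int radius) {
        String lower = text.toLowerCase(Locale.ROOT);
        int hit = -1;
        for (String term : queryTerms) {
            if (term.isEmpty()) continue; // an empty term would match at index 0
            int i = lower.indexOf(term.toLowerCase(Locale.ROOT));
            if (i >= 0 && (hit < 0 || i < hit)) hit = i;
        }
        if (hit < 0) hit = 0; // no match: fall back to the beginning of the text
        int start = Math.max(0, hit - radius);
        int end = Math.min(text.length(), hit + radius);
        StringBuilder sb = new StringBuilder();
        if (start > 0) sb.append("... ");
        sb.append(text, start, end);
        if (end < text.length()) sb.append(" ...");
        return sb.toString();
    }
}
```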

Doug


_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
