[Nutch-dev] fetch unparse-able content

Stefan Groschupf Wed, 19 May 2004 17:01:29 -0700

Hi, something came to mind just as I was turning off the light. If we just 'stupidly' fetch everything and extract the content in a second process, we also fetch files we cannot handle. Since traffic is expensive, we should only fetch files we can handle, right?

I see two solutions. The first is that the fetcher asks the content extractor factory which MIME types are actually supported. Since content extraction can be done on a second machine that shares network storage with the fetcher, this may be tricky. The other solution is to configure which MIME types are allowed to be fetched.
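
To make the second option a bit more concrete, here is a minimal sketch of a configured allow-list. Everything in it is an assumption for illustration: the class name MimeTypeFilter and its methods are hypothetical and not part of Nutch, and the sketch assumes the fetcher can see the server's Content-Type header before it downloads the body.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical helper (not an existing Nutch class): holds the MIME types
 * the fetcher is configured to download and checks a Content-Type header
 * value against them.
 */
class MimeTypeFilter {
    private final Set<String> allowed = new HashSet<>();

    /** Build the allow-list from configured MIME types, e.g. "text/html". */
    MimeTypeFilter(String... mimeTypes) {
        for (String type : mimeTypes) {
            String normalized = normalize(type);
            if (!normalized.isEmpty()) {
                allowed.add(normalized);
            }
        }
    }

    /** Strip parameters such as "; charset=UTF-8", trim, and lower-case. */
    static String normalize(String contentType) {
        if (contentType == null) {
            return "";
        }
        int semicolon = contentType.indexOf(';');
        String base = semicolon >= 0 ? contentType.substring(0, semicolon) : contentType;
        return base.trim().toLowerCase(Locale.ROOT);
    }

    /** True only if the header names a type on the allow-list. */
    boolean isAllowed(String contentTypeHeader) {
        return allowed.contains(normalize(contentTypeHeader));
    }
}
```

A fetcher could build the filter once from its configuration, for example new MimeTypeFilter("text/html", "text/plain", "application/pdf"), and skip any response for which isAllowed(...) returns false. In this sketch a missing Content-Type header is treated as not allowed, which is only one possible policy.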

Maybe I'm wrong and we don't have such a problem, and I'm overlooking something that already exists. Any comments?

Good night!
Stefan


---------------------------------------------------------------
open technology:   http://www.media-style.com
open source:           http://www.weta-group.net
open discussion:    http://www.text-mining.org

