Re: caching - filetypes

Ernesto De Santis Wed, 13 Sep 2006 14:43:37 -0700

Hi Steven

I'm not sure I completely understand your email.
What do you mean by "cache"?


If you want to crawl PDFs, you need to remove the URL filter rule that excludes them.

In your crawl-urlfilter.txt, there should be a line starting with a minus sign, followed by a list of file extensions. Delete the pdf extension from that list.
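
To show the effect of that change, here is a rough sketch in Python. It is not Nutch's own code: the function url_allowed is a hypothetical helper, and it assumes the filter works on a "first matching rule decides, no match means skip" basis. The rule lines are made up for illustration, so the extension list in your own crawl-urlfilter.txt will probably look different.

```python
import re


def url_allowed(rules, url):
    """Simplified emulation of a +/- regex URL filter (an assumption, not Nutch code).

    Each rule is a string whose first character is '+' (accept) or '-' (reject),
    followed by a regular expression. The first rule whose pattern is found in
    the URL decides the outcome; if no rule matches, the URL is rejected.
    """
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if re.search(pattern, url):
            return sign == "+"
    return False


# Illustrative rules only, not copied from any Nutch release.
before = [r"-\.(gif|jpg|png|css|zip|pdf)$", r"+."]  # pdf is excluded
after = [r"-\.(gif|jpg|png|css|zip)$", r"+."]       # pdf removed from the list

pdf_url = "http://example.com/docs/manual.pdf"
print(url_allowed(before, pdf_url))  # rejected while pdf is in the minus line
print(url_allowed(after, pdf_url))   # accepted once pdf is deleted from it
```

Other extensions that remain in the minus line (gif, zip and so on in this sketch) keep being skipped, so only the entry you delete is affected.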


Good luck
Ernesto.

PS: I'm a Nutch beginner, but since nobody else has replied to you, I'm trying to help.



steven shingler wrote:

Hi all,

I'm trying to find out which filetypes nutch will cache.

for example: it does html, but not pdf.

Is there any documentation on how different filetypes are handled?

Is it possible to configure nutch to cache pdfs etc?

Any advice very gratefully received.
Thanks,
Steve



        
        
                
