Kashif
 
In the regex-urlfilter.txt file, allow only URLs ending in .pdf:
 
+\.pdf$
-.
 
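This will only allow files ending in .pdf and ignore everything else. The first rule accepts URLs that end in .pdf; the second rejects any URL that did not match the first.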
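
To see how the two rules interact, here is a rough Python sketch of the matching order, assuming the filter takes the first pattern that matches a URL and rejects the URL if none matches. The function name `accepts` and the example.com URLs are made up for illustration; this is not Nutch's own code.

```python
import re

# Rules in the order they appear in regex-urlfilter.txt.
# '+' accepts, '-' rejects; the first pattern that matches decides.
RULES = [
    ("+", re.compile(r"\.pdf$")),
    ("-", re.compile(r".")),
]


def accepts(url):
    """Return True if the first rule matching `url` is a '+' rule."""
    for sign, pattern in RULES:
        if pattern.search(url):
            return sign == "+"
    # Assumption: a URL that matches no rule is treated as rejected.
    return False


if __name__ == "__main__":
    for url in (
        "http://example.com/docs/report.pdf",
        "http://example.com/index.html",
        "http://example.com/report.pdf?download=1",
    ):
        print(url, accepts(url))
```

In this sketch a URL with a query string after .pdf, or with an uppercase .PDF extension, falls through to the second rule and is rejected, so the pattern may need adjusting if your site serves PDFs that way.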

 
 
 

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Kashif Khadim
Sent: Saturday, February 26, 2005 6:26 AM
To: [email protected]
Subject: [Nutch-dev] Indexing only PDF files

Hi,
 
I just want to index PDF files from my website using the intranet crawl. I don't want HTML or other files. How can I do this?
 
Thanks.
 
Kashif.

