Kashif
In the regex-urlfilter.txt file, only allow .pdf:
+\.pdf$
-.
This will only allow files ending in .pdf and ignore everything else.
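To see why those two lines behave that way, here is a rough Python sketch of how I understand the filter to evaluate them: rules are tried in file order, the first pattern found anywhere in the URL decides, and a URL that matches nothing is ignored. The names RULES and accepts are made up for illustration and are not part of Nutch, and Python's re module stands in for the Java regex engine Nutch actually uses.

```python
import re

# The two rules from regex-urlfilter.txt, in file order.
# '+' means accept the URL, '-' means reject it.
# RULES and accepts() are hypothetical names used only for this sketch.
RULES = [
    ("+", re.compile(r"\.pdf$")),  # URL ends in ".pdf"
    ("-", re.compile(r".")),       # any URL with at least one character
]

def accepts(url, rules=RULES):
    """Return True if the first rule whose pattern is found in url is a '+' rule."""
    for sign, pattern in rules:
        if pattern.search(url):
            return sign == "+"
    # Assumed behaviour: a URL that matches no rule is ignored.
    return False

if __name__ == "__main__":
    for url in ("http://example.com/docs/report.pdf",
                "http://example.com/index.html",
                "http://example.com/report.pdf?download=1"):
        print(url, accepts(url))
```

Note that the match is case-sensitive and anchored at the end of the URL, so something like report.PDF or report.pdf?download=1 would be rejected. Since everything else is rejected too, this may also keep the crawler from fetching the HTML pages that link to the PDFs, so it is worth checking that the PDFs are actually reached.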
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Kashif Khadim
Sent: Saturday, February 26, 2005 6:26 AM
To: [email protected]
Subject: [Nutch-dev] Indexing only PDF files
Hi,
I just want to index PDF files from my website using an intranet crawl. I
don't want HTML or other files. How can I do this?
Thanks.
Kashif.
