Re: indexing only special documents

Naess, Ronny Thu, 07 Jun 2007 01:19:03 -0700

 
Hi.

Configure crawl-urlfilter.txt. You want to add something like +\.pdf$.
I guess another way would be to exclude all other file types.


Try expanding the line below with html, doc, xls, ppt, etc.:
-\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|jpeg|JPEG|bmp|BMP|js|JS|dojo|DOJO|jsp|JSP)$
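To check what an extended exclusion line would catch before running a crawl, here is a minimal Python sketch. It assumes Nutch's regex URL filter looks for each pattern anywhere in the URL (modelled here with re.search); the added "html|doc" alternatives and the example.com URLs are illustrative only.

```python
import re

# The exclusion rule from crawl-urlfilter.txt (without its leading "-"),
# extended with html and doc as an illustrative example.
EXCLUDE = re.compile(
    r"\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm"
    r"|tgz|mov|MOV|exe|jpeg|JPEG|bmp|BMP|js|JS|dojo|DOJO|jsp|JSP|html|doc)$"
)

def excluded(url: str) -> bool:
    # Assumption: the filter searches for the pattern anywhere in the URL;
    # the trailing $ anchors it to the end, so only the suffix matters.
    return EXCLUDE.search(url) is not None

for url in ["http://example.com/report.pdf",
            "http://example.com/index.html",
            "http://example.com/sheet.xls",
            "http://example.com/"]:
    print(url, "rejected" if excluded(url) else "passes this rule")
```

With this rule alone, any URL whose suffix is not in the list (a .pdf file, but also a directory URL like http://example.com/) still passes, so the list has to name every type you want to keep out.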

Or try including

+\.pdf$
#
-\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|jpeg|JPEG|bmp|BMP|js|JS|dojo|DOJO|jsp|JSP)$

followed by a catch-all that rejects everything else:

-.
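A minimal Python sketch of how the rule list above would be evaluated. It assumes Nutch-style semantics: rules are tried top to bottom, the first pattern found anywhere in the URL decides (+ accepts, - rejects), and a URL matching no rule is rejected. The example.com URLs are hypothetical.

```python
import re

# Rules in the order they would appear in crawl-urlfilter.txt.
# Each entry is (sign, pattern): "+" accepts, "-" rejects.
RULES = [
    ("+", r"\.pdf$"),
    ("-", r"\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm"
          r"|tgz|mov|MOV|exe|jpeg|JPEG|bmp|BMP|js|JS|dojo|DOJO|jsp|JSP)$"),
    ("-", r"."),  # catch-all: rejects anything not accepted above
]
COMPILED = [(sign, re.compile(pattern)) for sign, pattern in RULES]

def accept(url: str) -> bool:
    """Return True if the first matching rule is '+'.

    Assumes first-match-wins with a search anywhere in the URL,
    and that a URL matching no rule is rejected.
    """
    for sign, pattern in COMPILED:
        if pattern.search(url):
            return sign == "+"
    return False

for url in ["http://example.com/paper.pdf",
            "http://example.com/logo.gif",
            "http://example.com/index.html",
            "http://example.com/"]:
    print(f"{url}: {'accepted' if accept(url) else 'rejected'}")
```

Under these assumptions only URLs ending in .pdf get through; the HTML pages (and a seed like http://example.com/) that link to the PDFs are rejected too, which may keep the crawler from ever reaching the PDFs. That is worth checking when you experiment.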

Haven't tried it myself, but experiment a bit and I guess you'll figure it
out pretty soon.

Regards,
Ronny 

-----Original Message-----
From: Martin Kammerlander [mailto:[EMAIL PROTECTED]
Sent: 6 June 2007 20:30
To: [email protected]
Subject: indexing only special documents



hi!

I have a question. Say I have some seed URLs and run a crawl based on
those seeds. If I then want to index only pages that contain, for
example, PDF documents, how can I do that?

cheers
martin


