Ronny, your way is probably better.  See, I was only dealing with the
fetched properties.  But in your case, you don't fetch it at all, which gets
rid of all that wasted bandwidth.

For dealing with types that can be dealt with via the file extension, this
would probably work better.


On 6/7/07, Naess, Ronny <[EMAIL PROTECTED]> wrote:


Hi.

Configure crawl-urlfilter.txt.
You want to add something like +\.pdf$. Another way, I guess, would be
to exclude all the others.

Try expanding the line below with html, doc, xls, ppt, etc
-\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|jpeg|JPEG|bmp|BMP|js|JS|dojo|DOJO|jsp|JSP)$

Or try including
+\.pdf$
#
-\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|jpeg|JPEG|bmp|BMP|js|JS|dojo|DOJO|jsp|JSP)$
Followed by
-.
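To see how those rules combine, here is a hedged sketch in Python of how I believe Nutch's regex URL filter evaluates crawl-urlfilter.txt: rules are tried top-down, the first matching rule wins, and '+' means include while '-' means exclude. The rule list below is a shortened illustration, not the full exclude line from above.

```python
import re

# Assumed semantics (sketch): rules applied in order, first match wins.
RULES = [
    ("+", re.compile(r"\.pdf$")),            # include PDFs
    ("-", re.compile(r"\.(gif|jpg|png)$")),  # exclude some image types
    ("-", re.compile(r".")),                 # final catch-all: exclude the rest
]

def accepts(url):
    """Return True if the first matching rule is an include ('+')."""
    for sign, pattern in RULES:
        if pattern.search(url):
            return sign == "+"
    return False  # no rule matched

print(accepts("http://example.com/report.pdf"))  # True
print(accepts("http://example.com/logo.gif"))    # False
print(accepts("http://example.com/index.html"))  # False
```

Note that without the final `-.` catch-all, URLs matching no rule (like the .html one) would fall through, so the order and the catch-all both matter.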

Haven't tried it myself, but experiment a bit and I guess you'll figure it
out pretty soon.

Regards,
Ronny

-----Original message-----
From: Martin Kammerlander [mailto:[EMAIL PROTECTED]

Sent: June 6, 2007 20:30
To: [email protected]
Subject: indexing only special documents



hi!

I have a question. Say I have some seed URLs and do a crawl based on
those seeds. If I then want to index only pages that are, for example,
PDF documents, how can I do that?

cheers
martin







--
"Conscious decisions by conscious minds are what make reality real"
