Ronny, your way is probably better. I was only dealing with the fetched properties, but in your case you never fetch the file at all, which gets rid of all that wasted bandwidth.

For types that can be identified by their file extension, this should work better.

On 6/7/07, Naess, Ronny <[EMAIL PROTECTED]> wrote:
Hi.

Configure crawl-urlfilter.txt. You want to add something like

  +\.pdf$

I guess another way would be to exclude all the others. Try expanding the line below with html, doc, xls, ppt, etc.:

  -\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|jpeg|JPEG|bmp|BMP|js|JS|dojo|DOJO|jsp|JSP)$

Or try including

  +\.pdf$
  # -\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|jpeg|JPEG|bmp|BMP|js|JS|dojo|DOJO|jsp|JSP)$

followed by

  -.

Haven't tried it myself, but experiment some and I guess you'll figure it out pretty soon.

Regards,
Ronny

-----Original Message-----
From: Martin Kammerlander [mailto:[EMAIL PROTECTED]]
Sent: 6 June 2007 20:30
To: [email protected]
Subject: indexing only special documents

hi!

I have a question. Suppose I have some seed URLs and do a crawl based on those seeds. If I then want to index only pages that contain, for example, PDF documents, how can I do that?

cheers
martin
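Putting Ronny's second suggestion together, a minimal crawl-urlfilter.txt along these lines might do it. This is a sketch, not tested; the first rule is my addition so the crawler can still follow ordinary HTML pages to reach the PDF links, since a filter that admits only *.pdf would also block the pages linking to them:

  # crawl-urlfilter.txt (order matters: first matching rule wins)

  # skip URLs with common non-document query/session noise (optional, an assumption)
  -[?*[email protected]=]

  # accept PDF files
  +\.pdf$

  # accept extensionless or .html/.htm pages so the crawler can reach the PDFs
  # (remove these two lines if your seeds link to the PDFs directly)
  +\.html?$
  +/[^./]*$

  # reject everything else
  -.

Note this only controls what gets fetched; if you fetch HTML pages to discover PDF links but want only PDFs in the index, you still need to filter at indexing time (e.g. by MIME type).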
-- "Conscious decisions by conscious minds are what make reality real"
