Hello,

I do not want any JavaScript files to be indexed, that is, any URL ending 
in ".js". To do this, I edited the crawl-urlfilter.txt file and added the js 
extension to the default list of suffixes to skip, so that the rule now 
reads:

# skip image and other suffixes we can't yet parse
-\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|jpeg|JPEG|bmp|BMP|js)$

Unfortunately, JavaScript files are still being indexed. What exactly does 
this mean? Is crawl-urlfilter.txt not being applied? Did I miss something?
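
A minimal sketch for checking the pattern in isolation, outside Nutch. It 
assumes the filter applies each rule as an unanchored regular-expression 
search against the full URL, which is not confirmed here. Nutch uses Java 
regular expressions, but this particular pattern should behave the same way 
in Python. The function name and the example.com URLs are hypothetical.

```python
import re

# Suffix pattern copied from the rule above, without the leading "-"
# (the "-" only marks the rule as an exclusion).
SKIP_SUFFIXES = re.compile(
    r"\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls"
    r"|gz|rpm|tgz|mov|MOV|exe|jpeg|JPEG|bmp|BMP|js)$"
)


def is_skipped(url: str) -> bool:
    """Return True if the URL matches the skip rule (assumes an unanchored search)."""
    return SKIP_SUFFIXES.search(url) is not None


# Hypothetical URLs, for illustration only.
for url in [
    "http://example.com/app.js",
    "http://example.com/app.js?v=2",
    "http://example.com/APP.JS",
    "http://example.com/index.html",
]:
    print(url, is_skipped(url))
```

Under that assumption, the rule only catches URLs whose last characters are 
exactly ".js", so a URL with a query string after the suffix, or with an 
upper-case ".JS", would not be skipped by it.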

I would also like to know what the difference is between these two files:

crawl-urlfilter.txt
regex-urlfilter.txt

Many thanks
Regards


      
