Re: How to effectively stop indexing javascript pages ending with .js

Dennis Kubes Tue, 02 Dec 2008 08:51:46 -0800

ML mail wrote:

Hello,


I would like to avoid indexing any JavaScript files, that is, any URLs ending
with ".js". For this purpose I edited the crawl-urlfilter.txt file and added
the js extension to the default list of suffixes that are skipped, so that it
now looks like this:

# skip image and other suffixes we can't yet parse
-\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|jpeg|JPEG|bmp|BMP|js)$
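As a quick sanity check on the rule itself, here is a minimal Python sketch that applies the same pattern to a few URLs. It assumes the filter applies the pattern with search semantics and that the leading "-" only marks the rule as an exclusion; the example.com URLs are hypothetical. Python's regex syntax is close enough to Java's for this pattern, but this is an illustration, not the filter plugin's own code.

```python
import re

# The rule quoted above, minus the leading "-" (assumed to mark an exclusion rule).
SKIP_SUFFIXES = re.compile(
    r"\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz"
    r"|mov|MOV|exe|jpeg|JPEG|bmp|BMP|js)$"
)


def is_skipped(url: str) -> bool:
    """Return True if the exclusion pattern matches at the end of the URL."""
    return SKIP_SUFFIXES.search(url) is not None


# Hypothetical URLs, for illustration only.
for url in (
    "http://example.com/static/app.js",
    "http://example.com/static/app.js?v=2",
    "http://example.com/static/APP.JS",
    "http://example.com/index.html",
):
    print(url, "->", "skipped" if is_skipped(url) else "kept")
```

Because the pattern is anchored with "$" and lists only lowercase "js", a URL carrying a query string after ".js", or one using an uppercase ".JS", is not matched by this rule.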

The easiest way IMO is to use the prefix and suffix urlfilters instead of the regex urlfilter. Change plugin.includes and replace urlfilter-regex with urlfilter-(prefix|suffix). Then in the suffix-urlfilter.txt file add .js under .css in web formats.

Also change plugin.includes from parse-(text|html|js) to be parse-(text|html).
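To make the two plugin.includes changes concrete, here is a small Python sketch that applies them to a sample value. The sample string is an assumption modelled on a typical default and will differ between Nutch versions, so start from whatever your own configuration contains; the helper name drop_js_support is hypothetical.

```python
# Assumed sample value; the real default differs between Nutch versions,
# so use the plugin.includes value from your own configuration instead.
SAMPLE_PLUGIN_INCLUDES = (
    "protocol-http|urlfilter-regex|parse-(text|html|js)|index-basic"
    "|query-(basic|site|url)|summary-basic|scoring-opic"
    "|urlnormalizer-(pass|regex|basic)"
)


def drop_js_support(plugin_includes: str) -> str:
    """Apply the two substitutions described above to a plugin.includes value."""
    # Swap the regex URL filter for the prefix and suffix filters.
    updated = plugin_includes.replace("urlfilter-regex", "urlfilter-(prefix|suffix)")
    # Stop loading the JavaScript parser.
    updated = updated.replace("parse-(text|html|js)", "parse-(text|html)")
    return updated


print(drop_js_support(SAMPLE_PLUGIN_INCLUDES))
```

Only the urlfilter and parse entries change; everything else in the value is left as it was.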

Unfortunately I noticed that JavaScript pages are still getting indexed. So what exactly does this mean? Is crawl-urlfilter.txt not working? Did I miss something?
I was also wondering what is the difference between these two files:

crawl-urlfilter.txt
regex-urlfilter.txt

The crawl-urlfilter.txt file is used by the crawl command. The regex, suffix, prefix, and other urlfilter files and plugins are used when calling the commands of the various tools manually.


Dennis



Many thanks
Regards
