Dear Dennis,

Many thanks for your quick response. Now everything is clear and I understand why it didn't work...

I will still use the urlfilter-regex plugin, as I would like to crawl only domains from a single top-level domain, but as suggested I have added the urlfilter-suffix plugin to avoid indexing javascript pages. I had already deactivated the parse-js plugin in the past. So I am now looking forward to the next crawls being freed of stupid file formats like js ;-)

Greetings

--- On Tue, 12/2/08, Dennis Kubes <[EMAIL PROTECTED]> wrote:

> From: Dennis Kubes <[EMAIL PROTECTED]>
> Subject: Re: How to effectively stop indexing javascript pages ending with .js
> To: [email protected]
> Date: Tuesday, December 2, 2008, 8:50 AM
>
> ML mail wrote:
> > Hello,
> >
> > I would definitely like not to index any javascript
> > pages, this means any pages ending with ".js". So
> > for this purpose I simply edited the crawl-urlfilter.txt
> > file and changed the default suffix list not to be parsed to
> > add the .js extension so that it looks like this now:
> >
> > # skip image and other suffixes we can't yet parse
> > -\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|jpeg|JPEG|bmp|BMP|js)$
>
> The easiest way IMO is to use prefix and suffix urlfilters
> instead of the regex urlfilter. Change plugin.includes and replace
> urlfilter-regex with urlfilter-(prefix|suffix). Then in the
> suffix-urlfilter.txt file add .js under .css in web formats.
>
> Also change plugin.includes from parse-(text|html|js) to be
> parse-(text|html).
>
> > Unfortunately I noticed that javascript pages are
> > still getting indexed. So what does this exactly mean ? Is
> > crawl-urlfilter.txt not working ? Did I miss something maybe
> > ?
> >
> > I was also wondering what is the difference between
> > these two files:
> >
> > crawl-urlfilter.txt
> > regex-urlfilter.txt
>
> crawl-urlfilter.txt file is used by the crawl command. The
> regex, suffix, prefix, and other urlfilter files and plugins
> are used when calling commands manually in various tools.
>
> Dennis
>
> >
> > Many thanks
> > Regards
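
For anyone who wants to check the suffix rule quoted above outside of Nutch, here is a minimal Python sketch. It only tests the regular expression itself and assumes the rule is applied as a "find anywhere in the URL" match; it does not reproduce how Nutch loads or applies its filter files. The function name and the sample URLs are made up for illustration.

```python
import re

# Exclusion pattern copied from the quoted crawl-urlfilter.txt rule,
# with the leading "-" (the "skip" marker) dropped.
SKIP_SUFFIXES = re.compile(
    r"\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz"
    r"|mov|MOV|exe|jpeg|JPEG|bmp|BMP|js)$"
)


def is_skipped(url: str) -> bool:
    """Return True if the URL ends with one of the listed suffixes."""
    return SKIP_SUFFIXES.search(url) is not None


# Hypothetical sample URLs, for illustration only.
samples = [
    "http://www.example.com/scripts/menu.js",
    "http://www.example.com/index.html",
    "http://www.example.com/scripts/menu.js?v=2",
]

for url in samples:
    print(url, "->", "skip" if is_skipped(url) else "keep")
```

Because the pattern is anchored with `$`, a URL such as `menu.js?v=2` does not match it, and an upper-case `.JS` would not match either, so such URLs would need their own rule.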
