Dear Dennis,

Many thanks for your quick response. Now everything is clear and I understand 
why it didn't work...

I will keep using the urlfilter-regex plugin, since I want to crawl only 
domains from a single top-level domain, but as suggested I have added the 
urlfilter-suffix plugin to avoid indexing JavaScript files. I had already 
deactivated the parse-js plugin some time ago.
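
In case it helps anyone who finds this thread later, here is a rough Python
sketch of how I picture the two filters combining. It is not Nutch code: the
"example" TLD, the suffix list and all function names are placeholders I made
up. I am also assuming that a URL must pass every active filter, that the
first matching regex rule decides, and that the suffix check is
case-insensitive and looks only at the URL path.

```python
import re
from urllib.parse import urlparse

# Hypothetical values, for illustration only (not taken from my real config).
ALLOWED_TLD = "example"              # stand-in for the single TLD I crawl
SKIPPED_SUFFIXES = (".css", ".js")   # stand-in for entries in suffix-urlfilter.txt

# Rules in the style of regex-urlfilter.txt: '+' accepts, '-' rejects.
# Assumption: the first rule whose pattern is found in the URL decides.
REGEX_RULES = [
    ("+", re.compile(r"^https?://([a-z0-9-]+\.)+"
                     + re.escape(ALLOWED_TLD) + r"(:\d+)?(/|$)")),
    ("-", re.compile(r".")),         # reject everything else
]


def passes_regex(url):
    """Return True if the first matching regex rule is a '+' rule."""
    for sign, pattern in REGEX_RULES:
        if pattern.search(url):
            return sign == "+"
    return False  # no rule matched, so the URL is dropped


def passes_suffix(url):
    """Return True unless the URL path ends with a skipped suffix."""
    path = urlparse(url).path.lower()
    return not path.endswith(SKIPPED_SUFFIXES)


def accept(url):
    """A URL is kept only if both filters let it through."""
    return passes_regex(url) and passes_suffix(url)
```

With these placeholders, accept("http://www.site.example/index.html") is
true, while a URL ending in menu.js or a page on a .com host is rejected.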

So I am now looking forward to the next crawls being freed of stupid file 
formats like js ;-)

Greetings 


--- On Tue, 12/2/08, Dennis Kubes <[EMAIL PROTECTED]> wrote:

> From: Dennis Kubes <[EMAIL PROTECTED]>
> Subject: Re: How to effectively stop indexing javascript pages ending with .js
> To: [email protected]
> Date: Tuesday, December 2, 2008, 8:50 AM
> ML mail wrote:
> > Hello,
> > 
> > I would definitely like not to index any javascript
> > pages, this means any pages ending with ".js". So
> > for this purpose I simply edited the crawl-urlfilter.txt
> > file and changed the default suffix list not to be parsed to
> > add the .js extension so that it looks like this now:
> > 
> > # skip image and other suffixes we can't yet parse
> > -\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|jpeg|JPEG|bmp|BMP|js)$
> 
> The easiest way IMO is to use prefix and suffix urlfilters
> instead of the regex urlfilter.  Change plugin.includes and replace
> urlfilter-regex with urlfilter-(prefix|suffix).  Then in the
> suffix-urlfilter.txt file add .js under .css in web formats.
> 
> Also change plugin.includes from parse-(text|html|js) to be
> parse-(text|html).
> 
> > 
> > Unfortunately I noticed that javascript pages are
> > still getting indexed. So what does this exactly mean? Is
> > crawl-urlfilter.txt not working? Did I miss something maybe?
> > I was also wondering what is the difference between
> > these two files:
> > 
> > crawl-urlfilter.txt
> > regex-urlfilter.txt
> 
> The crawl-urlfilter.txt file is used by the crawl command.  The
> regex, suffix, prefix, and other urlfilter files and plugins
> are used when running the individual tools manually.
> 
> Dennis
> > 
> > ?
> > 
> > Many thanks
> > Regards
> > 
> > 
> >


      
