Just removing these lines should be enough.

# skip URLs containing certain characters as probable queries, etc.
[EMAIL PROTECTED]

I notice that your domain is 'realdomain'. Make sure you set that right, or it won't match what you *do* want.

Thanks,

Steve Betts
[EMAIL PROTECTED]
937-477-1797

-----Original Message-----
From: Andy Morris [mailto:[EMAIL PROTECTED]
Sent: Thursday, February 02, 2006 3:31 PM
To: [email protected]
Subject: RE: Still not processing asp files

So do I just add the + to the files I want crawled? Here is my crawl-urlfilter file, I just want my local intranet site crawled....

# The url filter file used by the crawl command.

# Better for intranet crawling.
# Be sure to change MY.DOMAIN.NAME to your domain name.

# Each non-comment, non-blank line contains a regular expression
# prefixed by '+' or '-'. The first matching pattern in the file
# determines whether a URL is included or ignored. If no pattern
# matches, the URL is ignored.

# skip file:, ftp:, & mailto: urls
+^(file|ftp|mailto):

# skip image and other suffixes we can't yet parse
+\.(gif|GIF|jpg|JPG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|png)$

# skip URLs containing certain characters as probable queries, etc.
[EMAIL PROTECTED]

# skip URLs with slash-delimited segment that repeats 3+ times, to break loops
-.*(/.+?)/.*?\1/.*?\1/

# accept hosts in MY.DOMAIN.NAME
+^http://([a-z0-9]*\.)*realdomain/

# skip everything else
-.

Thanks,
Andy

-----Original Message-----
From: Ivan Sekulovic [mailto:[EMAIL PROTECTED]
Sent: Thursday, February 02, 2006 10:28 AM
To: [email protected]
Subject: Re: Still not processing asp files

You should also check the same regex for '=' sign.

Best Regards,
Sekula
http://www.ifimages.com/

Steve Betts wrote:

>Does your url filter (I use regex) remove all urls with a '?' in them?
>That would remove most of your dynamic content.
>
>Thanks,
>
>Steve Betts
>[EMAIL PROTECTED]
>937-477-1797
>
>
>-----Original Message-----
>From: Andy Morris [mailto:[EMAIL PROTECTED]
>Sent: Thursday, February 02, 2006 9:54 AM
>To: [email protected]
>Subject: Still not processing asp files
>
> I have version "nutch-nightly" running from january 26. I am still
>not able to process the asp files, the htm, html files work great. Any
>options I need to set for this to work?
>
>Andy

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
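
For anyone reading this thread later, here is a rough sketch of the first-match-wins behaviour that the filter file's comments describe, and of why a rule matching '?' or '=' would drop a dynamic .asp URL. It is written in Python rather than taken from Nutch, so it only approximates how the real filter evaluates its rules. Two things in it are assumptions: the rule `-[?*!@=]` stands in for the line the archive shows as "[EMAIL PROTECTED]" (that is what the stock file ships, as far as I recall), and the host `intranet.example` is a made-up placeholder.

```python
import re

# Assumption: '-[?*!@=]' stands in for the line the archive shows as
# "[EMAIL PROTECTED]". The host 'intranet.example' is hypothetical.
FILTER_TEXT = r"""
# skip URLs containing certain characters as probable queries, etc.
-[?*!@=]
# accept hosts in the intranet domain
+^http://([a-z0-9]*\.)*intranet\.example/
# skip everything else
-.
"""


def parse_rules(text):
    """Return (include, compiled_regex) pairs for each non-comment, non-blank line."""
    rules = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError(f"rule must start with '+' or '-': {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """First matching pattern decides; a URL that matches nothing is ignored."""
    for include, regex in rules:
        if regex.search(url):
            return include
    return False


with_query_rule = parse_rules(FILTER_TEXT)
# Same file with the query-character rule removed, as suggested in the reply.
without_query_rule = parse_rules(FILTER_TEXT.replace("-[?*!@=]\n", ""))

asp_url = "http://www.intranet.example/report.asp?id=3"
print(accepts(with_query_rule, asp_url))     # False
print(accepts(without_query_rule, asp_url))  # True
```

Because the first matching line decides, the order of the rules matters as much as their signs: the query-character rule sits above the domain rule, so it rejects the URL before the '+' line is ever tried.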
