Thanks Marko. The "URLFilter not found" error was occurring when I ran the crawl command from Eclipse in a debug environment.
When I run it from the command line (Cygwin), I don't get the error. Maybe I am missing something; I will get it fixed.

Now, coming back to crawling the intranet and the internet: I just tried crawling the intranet by modifying crawl-urlfilter.txt, and it seems to be working. For the internet I still have to try, but will have to do it from my home computer. The URL I am trying to fetch is as follows:

search_results.html?country1=USA&search_type_form=quick&updated_since=sixtydays&basicsearch=0&advancedsearch=0&keywords_all=motel&search=Search&metro_area=1&kw=motel

Should I be changing anything in the urls file list?

Thanks


Do you crawl the intranet or do you crawl the web? If you crawl the web then you must edit regex-urlfilter.txt and not crawl-urlfilter.txt.
In your first mail you said you get an exception like "org.apache.nutch.net.URLFilter not found". Does the exception still occur?

Marko

On 3/9/06, Vertical Search <[EMAIL PROTECTED]> wrote:
>
> Okay, I have noticed that for URLs containing "?", "&" and "=" I cannot
> crawl.
> I have tried all combinations of modifying crawl-urlfilter.txt and
> # skip URLs containing certain characters as probable queries, etc.
> [EMAIL PROTECTED]
>
> But in vain. I have hit a roadblock.. that is terrible.. :(
>
>
>
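For context, the rule being discussed is the stock "probable query" filter shipped in the Nutch conf directory. A minimal sketch of the relevant lines and a relaxed variant, assuming a Nutch 0.7/0.8-era crawl-urlfilter.txt (the same edit applies to regex-urlfilter.txt for whole-web crawls):

```
# Stock rule from conf/crawl-urlfilter.txt: skip URLs containing
# certain characters as probable queries, etc.
-[?*!@=]

# To allow dynamic URLs such as
# search_results.html?country1=USA&...&kw=motel,
# relax the rule so '?', '&' and '=' no longer cause rejection:
-[*!@]
```

Rules are applied top to bottom; the first matching pattern wins, so the relaxed rule must replace (not follow) the stock one.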

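Why the URL above is never fetched can be seen by applying the skip pattern to it directly. A minimal sketch in plain Java (the host name is hypothetical; the query string is the one from the thread):

```java
import java.util.regex.Pattern;

public class UrlFilterCheck {
    public static void main(String[] args) {
        // Hypothetical host; the query string is the one from the thread.
        String url = "http://www.example.com/search_results.html"
                + "?country1=USA&search_type_form=quick&updated_since=sixtydays"
                + "&basicsearch=0&advancedsearch=0&keywords_all=motel"
                + "&search=Search&metro_area=1&kw=motel";

        // The stock "probable query" character class from crawl-urlfilter.txt.
        Pattern skip = Pattern.compile("[?*!@=]");
        // A relaxed class that still rejects '*', '!' and '@'
        // but lets query URLs through.
        Pattern relaxed = Pattern.compile("[*!@]");

        System.out.println(skip.matcher(url).find());    // true  -> URL rejected
        System.out.println(relaxed.matcher(url).find()); // false -> URL accepted
    }
}
```

Because the URL contains both '?' and '=', the stock pattern matches and the URL is excluded before fetching; the relaxed pattern does not match, so the URL passes the filter.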