Thanks Marko.  The "URLFilter not found" error was occurring when I ran the crawl
command from Eclipse in a debug environment.

When I run from the command line (Cygwin), I don't get the error. Maybe I am missing
something; I will get it fixed.

Now, coming back to crawling the intranet and the Internet: I just tried crawling
the intranet by modifying crawl-urlfilter.txt, and it seems to be working.
For the Internet I have yet to try; I will have to do it from my home computer.

The URL I am trying to fetch is as follows:
search_results.html?country1=USA&search_type_form=quick&updated_since=sixtydays&basicsearch=0&advancedsearch=0&keywords_all=motel&search=Search&metro_area=1&kw=motel

Should I be changing anything in the urls file list?
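(For what it's worth, the stock crawl-urlfilter.txt that ships with Nutch contains a rule that rejects exactly this kind of URL. A quick sketch of why, assuming the shipped default skip pattern `-[?*!@=]` — verify against your own copy of the file:)

```python
import re

# The stock Nutch crawl-urlfilter.txt skip rule "-[?*!@=]" rejects any URL
# containing ?, *, !, @ or = (assumed default; check your conf/ directory).
skip = re.compile(r"[?*!@=]")

url = ("search_results.html?country1=USA&search_type_form=quick"
       "&updated_since=sixtydays&basicsearch=0&advancedsearch=0"
       "&keywords_all=motel&search=Search&metro_area=1&kw=motel")

# The ?, & and = characters all match the skip class, so the URL is filtered out.
print("rejected" if skip.search(url) else "accepted")  # prints "rejected"
```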

Thanks


Do you crawl the intranet or do you crawl the web? If you crawl the
web then you must edit regex-urlfilter.txt and not crawl-urlfilter.txt.
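A minimal sketch of what that edit might look like (based on the stock Nutch conf files; the skip rule shown is my reading of the shipped default, so verify it against your own copy before changing anything):

```
# In conf/crawl-urlfilter.txt (intranet crawl) or conf/regex-urlfilter.txt (web crawl):

# skip URLs containing certain characters as probable queries, etc.
# -[?*!@=]        <-- comment out (or narrow) this default rule so that
#                     query-string URLs containing ?, & and = are not rejected

# accept anything else
+.
```

Note that rules are applied top to bottom and the first match wins, so an accept rule for query URLs would need to appear before any rule that rejects them.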
In your first mail you said you get an exception like
"org.apache.nutch.net.URLFilter not found". Does the exception still
occur?


Marko



On 3/9/06, Vertical Search <[EMAIL PROTECTED]> wrote:
>
>  Okay, I have noticed that I cannot crawl URLs containing "?", "&" and
> "=".
> I have tried all combinations of modifying crawl-urlfilter.txt, e.g. the
> rule under:
> # skip URLs containing certain characters as probable queries, etc.
> -[?*!@=]
>
> But in vain. I have hit a roadblock... that is terrible. :(
>
