[email protected] wrote:
 Hello,

I use nutch-0.9 and am trying to index URLs that contain the ? and & symbols. I have
commented out this line: -[...@=] in the conf/crawl-urlfilter.txt, conf/automaton-urlfilter and
conf/regex-urlfilter.txt files.
However, nutch still ignores these URLs.

Does anyone know how this can be fixed?

Thanks in advance.
A.





Hi,

If you commented out those lines, it should be fine. That part is correct, so the problem is somewhere else.
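
To illustrate why commenting out the rule should be enough, here is a rough sketch of how I understand the regex URL filter to read such a file: each rule is a + or - followed by a regex, the first rule that matches decides, and a URL that matches no rule is dropped. This is Python for brevity, not Nutch's Java code. The -[?&] rule is a hypothetical stand-in for the rule quoted above, example.com is a placeholder domain, and load_rules/accepts are made-up helper names.

```python
import re

# Hypothetical filter files; "-[?&]" stands in for the query-character rule.
RULES_ACTIVE = r"""
-[?&]
+^http://([a-z0-9]*\.)*example\.com/
-.
"""

RULES_COMMENTED = r"""
#-[?&]
+^http://([a-z0-9]*\.)*example\.com/
-.
"""


def load_rules(text):
    """Parse '+regex' / '-regex' lines, skipping blanks and # comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with + or -: " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """First matching rule wins; no match at all means the URL is rejected."""
    for accept, pattern in rules:
        if pattern.search(url):
            return accept
    return False


url = "http://www.example.com/page?id=1&x=2"
print(accepts(load_rules(RULES_ACTIVE), url))     # False: "-[?&]" matches first
print(accepts(load_rules(RULES_COMMENTED), url))  # True: the domain rule matches
```

Note that even with the rule commented out, a URL on a domain that no + rule covers still falls through to the final "-." and is rejected, which is why the domain question below matters.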

You must give us more information, such as:
- does your nutch crawl and index "normal" URLs (without ? and &)?
- are you crawling domains that are NOT blocked in crawl-urlfilter?
- does robots.txt on this domain block your URLs?
- are you talking about one specific domain or many different ones?
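
For the robots.txt point, a quick check outside of Nutch might look like the sketch below. It uses Python's standard urllib.robotparser, which is not the parser Nutch uses, so treat the result only as a hint. The robots.txt content, the agent name MyCrawler and the example.com URLs are all made up for illustration, and allowed_by_robots is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content; paste the real file from the site here.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def allowed_by_robots(robots_txt, agent, url):
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# A URL under a disallowed path, and one that is not covered by any rule.
print(allowed_by_robots(ROBOTS_TXT, "MyCrawler",
                        "http://www.example.com/search?q=nutch"))
print(allowed_by_robots(ROBOTS_TXT, "MyCrawler",
                        "http://www.example.com/page?id=1"))
```

If the URLs with ? and & all sit under a disallowed path, the crawler will skip them no matter what the URL filters say.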

Thanks,
Bartosz
