Re: Nutch 0.6 and Nutch 0.7 crawlers

Andrzej Bialecki Wed, 12 Apr 2006 13:48:04 -0700

eric park wrote:

Hello, the problem is that they are not unwanted URLs.
I crawled the site 'www.qmind.co.kr'. I found that the Nutch 0.7 crawler
works just fine at the first depth. However, at the second depth it filters out
any links that start with 'www.qmind.co.kr' and only crawls URLs starting with
'qmind.co.kr'. I can't figure out why it filters out URLs starting with
'www' at the second depth. Nutch 0.6 works just fine. Are there any known bugs
in the Nutch 0.7 crawler?

Could you please show us your URL filter configuration? I presume you are using regex-urlfilter, in which case it is the regex-urlfilter.txt file.
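
To illustrate why the filter file matters here, below is a minimal sketch in Python (not Nutch code) of how a rule file of this kind is commonly evaluated. It assumes each rule is a line starting with '+' (accept) or '-' (reject) followed by a regex, that the first matching rule decides, and that a URL matching no rule is dropped. The two rule sets and the helper names parse_rules and accepts are hypothetical and are not taken from the poster's configuration.

```python
import re


def parse_rules(text):
    """Parse '+regex' / '-regex' lines; skip blank lines and '#' comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with '+' or '-': " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """Return the verdict of the first rule whose regex is found in the URL.

    Assumption: a URL that matches no rule at all is rejected.
    """
    for allow, pattern in rules:
        if pattern.search(url):
            return allow
    return False


# Hypothetical rule set that only accepts the bare host name.
NARROW = r"""
+^http://qmind\.co\.kr/
-.
"""

# Hypothetical rule set that also accepts any subdomain, including 'www'.
WIDE = r"""
+^http://([a-z0-9]*\.)*qmind\.co\.kr/
-.
"""

if __name__ == "__main__":
    for name, text in (("NARROW", NARROW), ("WIDE", WIDE)):
        rules = parse_rules(text)
        for url in ("http://qmind.co.kr/a", "http://www.qmind.co.kr/a"):
            print(name, url, accepts(rules, url))
```

Under these assumptions, a pattern anchored on the bare host would reject every 'www.' link, which is one possible way to get the behaviour described above; the actual file is needed to confirm it.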


--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
