eric park wrote:
Hello, the problem is that they are not unwanted URLs. I crawled the site 'www.qmind.co.kr' and found that the Nutch 7.0 crawler works just fine at the first depth. At the second depth, however, it filters out any links that start with 'www.qmind.co.kr' and only crawls URLs starting with 'qmind.co.kr'. I can't figure out why it filters out URLs starting with 'www' at the second depth. Nutch 6.0 works just fine. Are there any known bugs in the Nutch 7.0 crawler?
Could you please show us your URL filter configuration? (I presume you are using the regex-urlfilter plugin, in which case it's the regex-urlfilter.txt file.)
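
For context on why the filter file matters: regex-urlfilter.txt holds one rule per line, each a '+' (accept) or '-' (reject) followed by a regular expression. The first rule that matches a URL decides its fate, and a URL that matches no rule is dropped. The Python sketch below only imitates that behaviour to show how a rule written for the bare host could silently drop the 'www' variant. The rule sets are hypothetical, not the poster's actual file, and parse_rules and accepts are illustrative helper names, not Nutch APIs. This is one possible cause, not a diagnosis.

```python
import re


def parse_rules(text):
    """Parse '+regex' / '-regex' lines; blank lines and '#' comments are skipped."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError(f"rule must start with '+' or '-': {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """First matching rule wins; a URL that matches nothing is rejected."""
    for allow, pattern in rules:
        if pattern.search(url):  # partial match, like a regex "find"
            return allow
    return False


# Hypothetical rule set: accepts only the bare host.
NARROW = parse_rules(r"""
-^(file|ftp|mailto):
+^http://qmind\.co\.kr/
""")

# Hypothetical alternative: accepts the host and any subdomain, including 'www'.
WIDE = parse_rules(r"""
-^(file|ftp|mailto):
+^http://([a-z0-9-]+\.)*qmind\.co\.kr/
""")

for url in ("http://qmind.co.kr/page", "http://www.qmind.co.kr/page"):
    print(url, accepts(NARROW, url), accepts(WIDE, url))
```

With the narrow rule only the bare-host URL passes, while the wider pattern lets both through, so comparing the accept rule in the real file against both host forms is a quick first check.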
--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
