eric park wrote:
Hello, the problem is that they are not unwanted URLs. I crawled the site 'www.qmind.co.kr' and found that the Nutch 7.0 crawler works fine at the first depth. At the second depth, however, it filters out every link that starts with 'www.qmind.co.kr' and only crawls URLs starting with 'qmind.co.kr'. I can't figure out why it filters out URLs starting with 'www' at the second depth. Nutch 6.0 works fine. Are there any known bugs in the Nutch 7.0 crawler?
Could you please show us your URL filter configuration? I presume you are using the regex-urlfilter plugin, in which case it is the regex-urlfilter.txt file.
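To illustrate why the filter file matters here, the following is a minimal Python sketch of how a rule list in the style of regex-urlfilter.txt could produce the symptom described. It assumes first-match-wins semantics, a leading '+' or '-' on each rule, partial (search) matching, and rejection when nothing matches. The rules shown are hypothetical and are not the poster's actual configuration; the function names are made up for this example.

```python
import re


def load_rules(text):
    """Parse '+regex' / '-regex' lines; blank lines and '#' comments are skipped."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in ("+", "-"):
            raise ValueError("rule must start with '+' or '-': " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """Return the verdict of the first rule whose pattern is found in the URL."""
    for accept, pattern in rules:
        if pattern.search(url):
            return accept
    return False  # assumption: a URL that matches no rule is rejected


# Hypothetical rule set that only admits the bare host name.
NARROW_RULES = r"""
# skip some binary suffixes
-\.(gif|jpg|png|css|zip)$
+^http://qmind\.co\.kr/
-.
"""

# Hypothetical variant that also admits any subdomain, including 'www'.
WIDE_RULES = r"""
-\.(gif|jpg|png|css|zip)$
+^http://([a-z0-9-]+\.)*qmind\.co\.kr/
-.
"""

narrow = load_rules(NARROW_RULES)
wide = load_rules(WIDE_RULES)

for url in ("http://qmind.co.kr/a.html", "http://www.qmind.co.kr/a.html"):
    print(url, accepts(narrow, url), accepts(wide, url))
```

If the real file contains an accept pattern anchored on the bare host like the narrow one above, links on 'www.qmind.co.kr' would be dropped by the filter rather than by a crawler bug. Whether that is the case here can only be confirmed from the actual configuration.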
-- Best regards, Andrzej Bialecki
