[ https://issues.apache.org/jira/browse/NUTCH-381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrzej Bialecki closed NUTCH-381.
-----------------------------------

       Resolution: Won't Fix
    Fix Version/s: 0.9.0
         Assignee: Andrzej Bialecki

This was caused by following redirected pages immediately in Fetcher. Set http.redirect.max to 0 to avoid this problem.

> Ignore external link not work as expected
> -----------------------------------------
>
>                 Key: NUTCH-381
>                 URL: https://issues.apache.org/jira/browse/NUTCH-381
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 0.8.1
>            Reporter: Uros Gruber
>         Assigned To: Andrzej Bialecki
>            Priority: Critical
>             Fix For: 0.9.0
>
>
> Currently there is no way to properly limit the fetcher without regexp rules, so we
> use the ignore.external.link option, but it seems that it doesn't work in all
> cases.
> Here are example URLs I'm seeing, even though
> cat urls1 urls2 urls3 urls/urls | grep yahoo.com
> doesn't return any hit.
> fetching http://help.yahoo.com/help/sports
> fetching http://www.turkish-xxx.com/adult-traffic-trade.php
> fetching http://help.yahoo.com/help/us/astr/
> fetching http://www.polish-xxx.com/de-index.html
> fetching http://www.driversplanet.com/Articles/Software/SpareBackup2.4.aspx
> fetching http://help.yahoo.com/help/groups
> fetching http://help.yahoo.com/help/fin/
> fetching http://www.driversplanet.com/Articles/Software/WindowsStorageServer2003R2.aspx
> fetching http://help.yahoo.com/help/us/edit/
> fetching http://www.polish-xxx.com/es-index.html
> Has anyone else noticed this?
> I assume it has something to do with expired domains whose pages are
> generated randomly. But still, why were URLs from other domains added? Maybe
> urlregexp filter +* exclude.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
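The workaround named in the resolution comment above can be written as a configuration override. This is only a minimal sketch: the property name and value come from the comment, while placing the override in conf/nutch-site.xml is an assumption about a typical Nutch setup and is not stated in this message.

```xml
<!-- Assumed location: conf/nutch-site.xml, inside the <configuration> element. -->
<property>
  <name>http.redirect.max</name>
  <!-- 0 is the value suggested in the resolution comment. -->
  <value>0</value>
</property>
```

With this value the Fetcher should no longer follow a redirect immediately, which is the behaviour the comment identifies as the cause of the unexpected external fetches.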
_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers