[ https://issues.apache.org/jira/browse/NUTCH-120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12633278#action_12633278 ]
Andrzej Bialecki commented on NUTCH-120:
-----------------------------------------

This has been fixed as part of another commit.

> one "bad" link on a page kills parsing
> --------------------------------------
>
>                 Key: NUTCH-120
>                 URL: https://issues.apache.org/jira/browse/NUTCH-120
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 0.7
>         Environment: ubuntu 5.10
>            Reporter: Earl Cahill
>             Fix For: 1.0.0
>
> Because the try block in the getOutlinks method of
> src/java/org/apache/nutch/parse/OutlinkExtractor.java wraps the whole
> while (matcher.contains(input, pattern)) {
> ...
> }
> loop, if one URL causes an exception, no more links will be extracted.
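To make the failure mode concrete, here is a minimal Java sketch contrasting the two placements of the try block. It is not Nutch's code: the reported loop uses a `matcher.contains(input, pattern)` call from a separate regex library, while this sketch swaps in `java.util.regex`. The class `OutlinkSketch`, its methods, and the URL pattern are hypothetical, and `URI.create(...).toURL()` merely stands in for whatever per-URL step threw in the real extractor.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OutlinkSketch {
    // Hypothetical, deliberately loose URL pattern: scheme, "://", then non-space characters.
    private static final Pattern URL_PATTERN =
            Pattern.compile("[a-zA-Z][a-zA-Z0-9+.-]*://\\S+");

    // Checks one candidate; throws if it cannot become a java.net.URL
    // (e.g. "unknown protocol" for a scheme with no handler).
    private static String validate(String candidate) throws MalformedURLException {
        URI.create(candidate).toURL();
        return candidate;
    }

    // Buggy shape from the report: one try wraps the whole loop,
    // so the first bad URL ends extraction for the entire page.
    static List<String> extractAllOrNothing(String input) {
        List<String> links = new ArrayList<>();
        Matcher m = URL_PATTERN.matcher(input);
        try {
            while (m.find()) {
                links.add(validate(m.group()));
            }
        } catch (MalformedURLException | IllegalArgumentException e) {
            // The loop has already exited; any later matches are silently lost.
        }
        return links;
    }

    // Fixed shape: the try sits inside the loop, so a bad URL is skipped
    // and matching continues with the next candidate.
    static List<String> extractSkippingBad(String input) {
        List<String> links = new ArrayList<>();
        Matcher m = URL_PATTERN.matcher(input);
        while (m.find()) {
            try {
                links.add(validate(m.group()));
            } catch (MalformedURLException | IllegalArgumentException e) {
                // Drop only this candidate.
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String page = "http://a.example/x foo://bad http://b.example/y";
        System.out.println(extractAllOrNothing(page)); // [http://a.example/x]
        System.out.println(extractSkippingBad(page));  // [http://a.example/x, http://b.example/y]
    }
}
```

With that sample input, the first method stops after `http://a.example/x` because `foo://bad` has no protocol handler, while the second skips only that match and still returns `http://b.example/y`. Catching per match keeps one malformed link from discarding the rest of the page's outlinks.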