[ https://issues.apache.org/jira/browse/NUTCH-436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467883 ]
Andrew Groh commented on NUTCH-436:
-----------------------------------

This is a bug in java.net.URL, specifically in the URLStreamHandler class that it uses.
new URL(new URL("http://a/b/c/d;p?q#f"), "?y") creates a URL object with an incorrect URL.

> Incorrect handling of relative paths when the embedded URL path is empty
> ------------------------------------------------------------------------
>
>                 Key: NUTCH-436
>                 URL: https://issues.apache.org/jira/browse/NUTCH-436
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>            Reporter: Andrew Groh
>            Priority: Critical
>
> If you have a base URL of the form:
> http://a/b/c/d;p?q#f
> Embedded URL: ?y
> Correct Absolute URL: http://a/b/c/d;p?y
> Nutch Generated URL: http://a/b/c/?y
> Embedded URL: ;x
> Correct Absolute URL: http://a/b/c/d;x
> Nutch Generated URL: http://a/b/c/;x
> See section 4, steps 5-7 of RFC 1808 for the definition of the correct set of
> steps, and section 5.1 for examples.
> http://www.ietf.org/rfc/rfc1808.txt
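
For illustration, here is a minimal sketch of the two empty-path cases from the issue, following RFC 1808 section 4, step 5 (inherit the base path; inherit the base params only when the embedded URL has none; inherit the base query only when the embedded URL has neither params nor a query). The class name EmptyPathResolver and its resolve method are hypothetical and are not part of Nutch or the JDK. The sketch works on plain strings, assumes an absolute base URL of the form scheme://host/path, and is not a full RFC 1808 resolver.

```java
// Hypothetical helper, not part of Nutch or java.net: handles only the
// "empty embedded path" cases ("?query" and ";params") from NUTCH-436.
class EmptyPathResolver {

    /**
     * Resolves an embedded URL that starts with '?' or ';' against a base URL.
     * Returns null for any other embedded URL, so the caller can fall back
     * to its normal resolution code.
     */
    public static String resolve(String base, String embedded) {
        if (!embedded.startsWith("?") && !embedded.startsWith(";")) {
            return null; // not an empty-path reference
        }
        // The base fragment and base query are never carried over here.
        String withoutFragment = cutAt(base, '#');
        String withoutQuery = cutAt(withoutFragment, '?');

        if (embedded.startsWith("?")) {
            // Query only: keep the base path and the base params.
            return withoutQuery + embedded;
        }

        // Params given: keep the base path only, drop the base params.
        int schemeEnd = withoutQuery.indexOf("://");
        int pathStart = withoutQuery.indexOf('/', schemeEnd < 0 ? 0 : schemeEnd + 3);
        if (pathStart < 0) {
            pathStart = withoutQuery.length(); // base has no path at all
        }
        int semi = withoutQuery.indexOf(';', pathStart);
        String pathOnly = semi < 0 ? withoutQuery : withoutQuery.substring(0, semi);
        return pathOnly + embedded;
    }

    // Returns s truncated just before the first occurrence of c, or s unchanged.
    private static String cutAt(String s, char c) {
        int i = s.indexOf(c);
        return i < 0 ? s : s.substring(0, i);
    }

    public static void main(String[] args) {
        String base = "http://a/b/c/d;p?q#f";
        System.out.println(resolve(base, "?y")); // prints http://a/b/c/d;p?y
        System.out.println(resolve(base, ";x")); // prints http://a/b/c/d;x
    }
}
```

A caller could try this helper first and, when it returns null, fall back to the existing java.net.URL-based resolution. Whether that is the right place for a fix in Nutch is an open question; the sketch only shows the expected results for the two cases reported above.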