[ https://issues.apache.org/jira/browse/NUTCH-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Pablo Aragón updated NUTCH-802:
-------------------------------
    Attachment: ParseOutputFormat.patch

> Problems managing outlinks with large url length
> ------------------------------------------------
>
>                 Key: NUTCH-802
>                 URL: https://issues.apache.org/jira/browse/NUTCH-802
>             Project: Nutch
>          Issue Type: Bug
>          Components: parser
>            Reporter: Pablo Aragón
>         Attachments: ParseOutputFormat.patch
>
>
> Nutch can stall while collecting outlinks if the URL of an outlink is
> too long.
> The maximum URL sizes for the main web servers are:
> * Apache: 4,000 bytes
> * Microsoft Internet Information Server (IIS): 16,384 bytes
> * Perl HTTP::Daemon: 8,000 bytes
> URLs larger than 4,000 bytes are problematic, so the limit should
> be set in the nutch-default.xml configuration file.
> I attached a patch.
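
For illustration only, here is a minimal sketch of the kind of check the issue asks for: dropping outlinks whose URL exceeds a configurable size before they are processed any further. This is not the content of ParseOutputFormat.patch and it does not use the Nutch API. The class name, the method name, and the 4,000-byte default are assumptions based on the figures quoted above, and the size is measured in UTF-8 bytes because the limits are stated in bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Nutch.
class OutlinkLengthFilter {

    // Assumed default, taken from the 4,000-byte figure in the issue text.
    // In Nutch this value would be read from configuration instead.
    static final int DEFAULT_MAX_URL_BYTES = 4000;

    /**
     * Returns the outlink URLs whose UTF-8 encoded size is at most maxBytes.
     * Null entries are skipped. A negative maxBytes disables the size check.
     */
    static List<String> filterByLength(List<String> outlinks, int maxBytes) {
        List<String> kept = new ArrayList<>();
        for (String url : outlinks) {
            if (url == null) {
                continue;
            }
            int size = url.getBytes(StandardCharsets.UTF_8).length;
            if (maxBytes < 0 || size <= maxBytes) {
                kept.add(url);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Build one short URL and one URL well above the assumed limit.
        StringBuilder longUrl = new StringBuilder("http://example.org/?q=");
        for (int i = 0; i < 5000; i++) {
            longUrl.append('a');
        }
        List<String> outlinks = new ArrayList<>();
        outlinks.add("http://example.org/");
        outlinks.add(longUrl.toString());

        List<String> kept = filterByLength(outlinks, DEFAULT_MAX_URL_BYTES);
        System.out.println("kept " + kept.size() + " of " + outlinks.size() + " outlinks");
    }
}
```

Treating a negative limit as "no limit" is a design choice made for this sketch, so that the check can be switched off from configuration without a separate flag.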