[jira] [Updated] (NUTCH-1270) some of Deflate encoded pages not fetched
[ https://issues.apache.org/jira/browse/NUTCH-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julien Nioche updated NUTCH-1270:
    Fix Version/s: (was: 1.9)

some of Deflate encoded pages not fetched

    Key: NUTCH-1270
    URL: https://issues.apache.org/jira/browse/NUTCH-1270
    Project: Nutch
    Issue Type: Bug
    Components: protocol
    Affects Versions: 1.4
    Environment: software
    Reporter: behnam nikbakht
    Labels: fetch, processDeflateEncoded
    Attachments: NUTCH-1270.patch

Some web pages are fetched, but their content cannot be retrieved. The following change to lib-http/src/java/org/apache/nutch/protocol/http/api/HttpBase.java fixes this error. When the first inflateBestEffort call, limited by getMaxContent(), returns null, the patch retries it with a fixed size limit of 20:

 public byte[] processDeflateEncoded(byte[] compressed, URL url)
     throws IOException {
   if (LOGGER.isTraceEnabled()) {
     LOGGER.trace("inflating");
   }

   byte[] content = DeflateUtils.inflateBestEffort(compressed, getMaxContent());
+  if (content == null)
+    content = DeflateUtils.inflateBestEffort(compressed, 20);

   if (content == null)
     throw new IOException("inflateBestEffort returned null");

   if (LOGGER.isTraceEnabled()) {
     LOGGER.trace("fetched " + compressed.length
         + " bytes of compressed content (expanded to "
         + content.length + " bytes) from " + url);
   }
   return content;
 }

--
This message was sent by Atlassian JIRA
(v6.2#6252)
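The control flow of the patch (inflate up to the configured limit; if that yields null, inflate again with a limit of 20; if that is still null, throw) can be illustrated with a small self-contained Java sketch. This is not Nutch's DeflateUtils: the class InflateFallbackSketch, the constant FALLBACK_LIMIT, the helper deflateRaw, and the rule that inflateBestEffort returns null for a negative limit or when nothing can be inflated are all hypothetical, chosen only so the fallback path can be exercised. The sketch also assumes raw deflate data (no zlib header).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;
import java.util.zip.Inflater;
import java.util.zip.InflaterInputStream;

public class InflateFallbackSketch {

  // Fallback limit copied from the patch; the constant name is hypothetical.
  static final int FALLBACK_LIMIT = 20;

  // Best-effort raw-deflate inflation (nowrap = true, i.e. no zlib header).
  // Returns at most sizeLimit bytes, or null if sizeLimit is negative or
  // nothing could be inflated (assumed semantics for this sketch only).
  static byte[] inflateBestEffort(byte[] in, int sizeLimit) {
    if (sizeLimit < 0) {
      return null;
    }
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    Inflater inflater = new Inflater(true);
    try (InflaterInputStream inStream =
        new InflaterInputStream(new ByteArrayInputStream(in), inflater)) {
      byte[] buf = new byte[4096];
      int size;
      while (out.size() < sizeLimit && (size = inStream.read(buf)) > 0) {
        // Never write past sizeLimit; any remaining output is dropped.
        out.write(buf, 0, Math.min(size, sizeLimit - out.size()));
      }
    } catch (IOException e) {
      // Truncated or corrupt input: keep whatever was inflated so far.
    } finally {
      inflater.end(); // a caller-supplied Inflater is not ended by close()
    }
    return out.size() == 0 ? null : out.toByteArray();
  }

  // Mirrors the patched method: first try, fixed-limit retry, then fail.
  static byte[] processDeflateEncoded(byte[] compressed, int maxContent)
      throws IOException {
    byte[] content = inflateBestEffort(compressed, maxContent);
    if (content == null) {
      content = inflateBestEffort(compressed, FALLBACK_LIMIT);
    }
    if (content == null) {
      throw new IOException("inflateBestEffort returned null");
    }
    return content;
  }

  // Hypothetical test helper: raw-deflate some bytes.
  static byte[] deflateRaw(byte[] data) {
    Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
    deflater.setInput(data);
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    while (!deflater.finished()) {
      int n = deflater.deflate(buf);
      out.write(buf, 0, n);
    }
    deflater.end();
    return out.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] page = "<html><body>hello, deflate</body></html>"
        .getBytes(StandardCharsets.UTF_8);
    byte[] compressed = deflateRaw(page);
    // Normal path: the whole page (40 bytes) is inflated.
    System.out.println(processDeflateEncoded(compressed, 65536).length);
    // Negative limit makes the first call return null, so the fallback runs.
    System.out.println(processDeflateEncoded(compressed, -1).length);
  }
}
```

Note that the fallback returns at most 20 bytes, so in this sketch it trades a fetch failure for heavily truncated content.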
[jira] [Updated] (NUTCH-1270) some of Deflate encoded pages not fetched
[ https://issues.apache.org/jira/browse/NUTCH-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julien Nioche updated NUTCH-1270:
    Component/s: protocol (was: fetcher)
[jira] [Updated] (NUTCH-1270) some of Deflate encoded pages not fetched
[ https://issues.apache.org/jira/browse/NUTCH-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney updated NUTCH-1270:
    Patch Info: Patch Available
    Fix Version/s: 1.7
[jira] [Updated] (NUTCH-1270) some of Deflate encoded pages not fetched
[ https://issues.apache.org/jira/browse/NUTCH-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

behnam nikbakht updated NUTCH-1270:
    Attachment: NUTCH-1270.patch