[jira] [Commented] (NUTCH-1270) some of Deflate encoded pages not fetched

2012-04-03 Thread behnam nikbakht (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245259#comment-13245259
 ] 

behnam nikbakht commented on NUTCH-1270:


for example, with the site:
http://www.noormags.com/view/fa/default
when fetch the first page, and dump from segment, see that there is a problem 
with fetch,
when i replace 
byte[] content = DeflateUtils.inflateBestEffort(compressed, getMaxContent());
with
byte[] content = DeflateUtils.inflateBestEffort(compressed, 999);
it's work

 some of Deflate encoded pages not fetched
 -

 Key: NUTCH-1270
 URL: https://issues.apache.org/jira/browse/NUTCH-1270
 Project: Nutch
  Issue Type: Bug
  Components: fetcher
Affects Versions: 1.4
 Environment: software
Reporter: behnam nikbakht
  Labels: fetch, processDeflateEncoded
 Attachments: NUTCH-1270.patch


 it is a problem with some of web pages that fetched but their content can not 
 retrived
 after this change, this error fixed
 we change lib-http/src/java/org/apache/nutch/protocol/http/api/HttpBase.java
   public byte[] processDeflateEncoded(byte[] compressed, URL url) throws 
 IOException {
 if (LOGGER.isTraceEnabled()) { LOGGER.trace(inflating); }
 byte[] content = DeflateUtils.inflateBestEffort(compressed, 
 getMaxContent());
 +if(content==null)
 + content = DeflateUtils.inflateBestEffort(compressed, 20);
 if (content == null)
   throw new IOException(inflateBestEffort returned null);
 if (LOGGER.isTraceEnabled()) {
   LOGGER.trace(fetched  + compressed.length
  +  bytes of compressed content (expanded to 
  + content.length +  bytes) from  + url);
 }
 return content;
   }

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (NUTCH-1270) some of Deflate encoded pages not fetched

2012-02-08 Thread Lewis John McGibbney (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13203515#comment-13203515
 ] 

Lewis John McGibbney commented on NUTCH-1270:
-

Hi Benham, again thanks for opening this ticket, but could you possibly patch 
this against trunk (1.5)? Thankyou

 some of Deflate encoded pages not fetched
 -

 Key: NUTCH-1270
 URL: https://issues.apache.org/jira/browse/NUTCH-1270
 Project: Nutch
  Issue Type: Bug
  Components: fetcher
Affects Versions: 1.4
 Environment: software
Reporter: behnam nikbakht
  Labels: fetch, processDeflateEncoded

 it is a problem with some of web pages that fetched but their content can not 
 retrived
 after this change, this error fixed
 we change lib-http/src/java/org/apache/nutch/protocol/http/api/HttpBase.java
   public byte[] processDeflateEncoded(byte[] compressed, URL url) throws 
 IOException {
 if (LOGGER.isTraceEnabled()) { LOGGER.trace(inflating); }
 byte[] content = DeflateUtils.inflateBestEffort(compressed, 
 getMaxContent());
 +if(content==null)
 + content = DeflateUtils.inflateBestEffort(compressed, 20);
 if (content == null)
   throw new IOException(inflateBestEffort returned null);
 if (LOGGER.isTraceEnabled()) {
   LOGGER.trace(fetched  + compressed.length
  +  bytes of compressed content (expanded to 
  + content.length +  bytes) from  + url);
 }
 return content;
   }

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira