Aah, thanks Lewis. We're still on 1.15. Glad to see this was fixed already,
and that I would have patched it in exactly the same way.
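
For anyone else still on an older release, here is a minimal sketch of the
idea: store the raw headers whether or not the response was compressed. This
is not the actual Nutch code. The class, method and key names are made up for
illustration, and a plain Map stands in for Nutch's metadata object.

```java
import java.util.HashMap;
import java.util.Map;

public class HeaderStoreSketch {
    // Hypothetical key, standing in for Response.RESPONSE_HEADERS.
    static final String RESPONSE_HEADERS = "sketch.response.headers";

    // Builds the metadata for a fetched response. The 'compressed' flag is
    // deliberately ignored: the verbatim headers are kept either way, even
    // though Content-Length may then not match the decoded content.
    static Map<String, String> buildMetadata(String rawHeaders, boolean compressed) {
        Map<String, String> headers = new HashMap<>();
        if (rawHeaders != null) {
            headers.put(RESPONSE_HEADERS, rawHeaders);
        }
        return headers;
    }

    public static void main(String[] args) {
        String raw = "HTTP/1.1 200 OK\r\nContent-Encoding: gzip\r\nContent-Length: 123\r\n";
        Map<String, String> metadata = buildMetadata(raw, true);
        System.out.println("headers stored: " + metadata.containsKey(RESPONSE_HEADERS));
    }
}
```

A WARC exporter can then pick up the stored headers for every response, at the
cost of a possibly incorrect length header.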

Thanks!

On Tue, 30 Jul 2024 at 18:42, lewis john mcgibbney <lewi...@apache.org> wrote:

> Hi Markus,
>
> Which version of Nutch are you referring to? I'm not seeing this exact
> code in master branch.
> Is this roughly the code you are referencing?
>
> https://github.com/apache/nutch/blob/master/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java#L304-L318
>
> Thanks
> lewismc
>
> On Tue, Jul 30, 2024 at 8:14 AM <user-digest-h...@nutch.apache.org> wrote:
>
> > ---------- Forwarded message ----------
> > From: Markus Jelsma <markus.jel...@openindex.io>
> > To: user <user@nutch.apache.org>
> > Cc:
> > Bcc:
> > Date: Tue, 30 Jul 2024 17:13:01 +0200
> > Subject: Protocol-http not storing response headers
> > Hi,
> >
> > Protocol-http does this (not storing HTTP response headers if the
> > response is compressed):
> >
> >           // store the headers verbatim only if the response was not compressed
> >           // as the content length reported does not match otherwise
> >           if (httpHeaders != null) {
> >             headers.add(Response.RESPONSE_HEADERS, httpHeaders.toString());
> >           }
> >           if (Http.LOG.isTraceEnabled()) {
> >             Http.LOG.trace("fetched " + content.length + " bytes from " + url);
> >           }
> >
> > I do not agree with this. Almost all content is compressed now, so the
> > headers are almost never stored. We need the headers and response code
> > stored for WARC export and do not care about an incorrect length header.
> >
> > Before patching this and moving that code out of the compression
> > condition, I do ask myself: is that a good idea? I don't see okhttp
> > having the same condition.
> >
> > Markus
>
> --
> http://people.apache.org/keys/committer/lewismc
>
