Ah, thanks Lewis. We're still on 1.15; glad to see this was already fixed, and that I would have patched it in exactly the same way.
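A minimal sketch of the change discussed below: the raw headers are stored outside the compression condition instead of only for uncompressed responses. This is a hypothetical, self-contained illustration, not Nutch code. The class, the two methods and the map standing in for Nutch's metadata object are assumptions, and the value of the `RESPONSE_HEADERS` key is illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, self-contained illustration; not actual Nutch code.
final class HeaderStorageSketch {

    // Stand-in for Nutch's Response.RESPONSE_HEADERS metadata key (value is illustrative).
    static final String RESPONSE_HEADERS = "_response.headers_";

    // Behaviour described in the thread: raw headers are kept only for uncompressed responses.
    static Map<String, String> storeHeadersOriginal(String rawHeaders, boolean compressed) {
        Map<String, String> metadata = new LinkedHashMap<>();
        if (!compressed && rawHeaders != null) {
            metadata.put(RESPONSE_HEADERS, rawHeaders);
        }
        return metadata;
    }

    // Proposed behaviour: storage no longer depends on compression, so the headers survive
    // for WARC export. The compressed flag is intentionally ignored here. For compressed
    // bodies the stored Content-Length describes the encoded transfer, not the decoded content.
    static Map<String, String> storeHeadersPatched(String rawHeaders, boolean compressed) {
        Map<String, String> metadata = new LinkedHashMap<>();
        if (rawHeaders != null) {
            metadata.put(RESPONSE_HEADERS, rawHeaders);
        }
        return metadata;
    }

    public static void main(String[] args) {
        String raw = "HTTP/1.1 200 OK\r\nContent-Encoding: gzip\r\nContent-Length: 512\r\n";
        System.out.println(storeHeadersOriginal(raw, true).containsKey(RESPONSE_HEADERS)); // false
        System.out.println(storeHeadersPatched(raw, true).containsKey(RESPONSE_HEADERS));  // true
    }
}
```

The trade-off is the one named in the original code comment: a consumer that compares the stored Content-Length against the decompressed content length will see a mismatch, which is acceptable if, as in this case, only the headers and the response code matter.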
Thanks!

On Tue, 30 Jul 2024 at 18:42, lewis john mcgibbney <lewi...@apache.org> wrote:

> Hi Markus,
>
> Which version of Nutch are you referring to? I'm not seeing this exact
> code in the master branch.
> Is this roughly the code you are referencing?
>
> https://github.com/apache/nutch/blob/master/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java#L304-L318
>
> Thanks
> lewismc
>
> On Tue, Jul 30, 2024 at 8:14 AM <user-digest-h...@nutch.apache.org> wrote:
>
> > ---------- Forwarded message ----------
> > From: Markus Jelsma <markus.jel...@openindex.io>
> > To: user <user@nutch.apache.org>
> > Cc:
> > Bcc:
> > Date: Tue, 30 Jul 2024 17:13:01 +0200
> > Subject: Protocol-http not storing response headers
> > Hi,
> >
> > Protocol-http does this (not storing HTTP response headers if the
> > response is compressed):
> >
> >     // store the headers verbatim only if the response was not compressed
> >     // as the content length reported does not match otherwise
> >     if (httpHeaders != null) {
> >       headers.add(Response.RESPONSE_HEADERS, httpHeaders.toString());
> >     }
> >     if (Http.LOG.isTraceEnabled()) {
> >       Http.LOG.trace("fetched " + content.length + " bytes from " + url);
> >     }
> >
> > And I do not agree with it. Almost all content is compressed now, so
> > this will practically never store anything. We need the headers and
> > response code stored for WARC export and do not care about an incorrect
> > length header.
> >
> > Before patching this up and breaking that code out of the compression
> > condition, I do ask myself: is that a good idea? I don't see okhttp
> > having the same condition.
> >
> > Markus
>
> --
> http://people.apache.org/keys/committer/lewismc