Re: [xml] problem with gzip decoding in nanohttp

Gary Coady Sat, 28 Jan 2006 11:26:02 -0800

Liron wrote:
> You're absolutely right about the "don't fix it if it ain't broken"
> approach
> but after reviewing the code I think that it's badly broken and I'll
> explain:
> It's true that those functions do different things but the common thing for
> all of them is reading from a socket. As I explained, the bug I found will
> occur in every function that reads content (as opposed to headers) from the
> socket and not using "inflate" when the content is gzip encoded. The only
> function that decompresses the content is xmlNanoHTTPRead BUT it's not the
> only function that delivers content.


This was an oversight on my part.

> xmlNanoHTTPFetch is a function that opens a URL and saves the content to
> file, if you're telling me that this file should contain encoded data in
> case that gzip is used then we don't have a problem (I think it's wrong but
> if it's by design...). But if you do expect the output file to contain html
> for example then it won't work.

It probably should, I think.

> Like I said, I didn't review the code enough to make the changes (or get
> permission to do so yet ;)) but it looks like xmlNanoHTTPRecv and ONLY
> xmlNanoHTTPRecv should be changed to support the decompression since every
> function that reads from the socket calls it (but not every function that
> calls it decompresses the content)

I don't like the idea of pushing this functionality into
xmlNanoHTTPRecv(), because it would increase memory usage for all
internal users of nanohttp. At the moment, the xmlNanoHTTPCtxt structure
holds the compressed content of the response, and for most purposes
xmlNanoHTTPRead() expands this content, chunk by chunk.
With this change, the expanded content needs to be stored. In addition,
a buffer of 4Kb would need to be kept available to pass to recv() -
probably stored in the context.

With this in mind, I think it would be preferable to keep the current
behaviour in xmlNanoHTTPRead(), and add similar functionality into
xmlNanoHTTPFetch() or xmlNanoHTTPFetchContent().

> Another issue I wanted to ask you about is the content-length when the
> content is compressed. If content-length header exists it'll return the
> length of the compressed data so I applied a new function that simply
> returns whether the content is gziped or not. This is good for applications
> that depends on the content-length header for buffer allocations so they
> should be aware if this field contains the right number or not (Just a
> helper function).

External users should loop around xmlNanoHTTPRead() until it returns a
negative value - and this will do the right thing both for gzipped and
non-gzipped resources.

I'm not sure that users would need the helper function - they still need
to write this loop for gzipped content anyway, since we can't predict
the eventual size until inflation is complete. I'm not sure how to flag
incorrect usage to applications - but I don't think this function will
help...

Gary.

signature.asc
Description: OpenPGP digital signature

_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
[email protected]
http://mail.gnome.org/mailman/listinfo/xml

Re: [xml] problem with gzip decoding in nanohttp

Reply via email to