Hi Nick,
On 11/02/2012 07:25 PM CEST +02:00, Nick Kew wrote:
>> just debugged a case where Apache used as reverse proxy filters a
>> text/javascript file through mod_proxy_html and mod_xml2enc. As
>> mod_proxy_html sees no business in filtering that file, it removes
>> itself from the filter chain, but mod_xml2enc still tries to do its job.
> That looks like a logic bug you've found!
Yes, that's also possible.
> It looks like an edge case: one you'll only see when the charset coming
> from the backend is not supported by libxml2 on your platform, so that
> mod_xml2enc converts it using apr_iconv.
No, not exactly this edge case. The backend server sends the response
header Content-Type: text/javascript, i.e. without any information
about the charset used. From what I've seen in GDB, mod_xml2enc seems to
fall back to assuming that the server sends ISO-8859-1 and converts that
to UTF-8 without reporting an error (even though the content actually
seems to be a mix of ISO-8859-1 and UTF-8).
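
To illustrate why such a conversion matters for Content-Length, here is
a minimal sketch of an ISO-8859-1 to UTF-8 conversion. It is a
hand-written stand-in, not the code path mod_xml2enc actually uses
(that goes through libxml2 or apr_iconv), and the function names
latin1_utf8_length() and latin1_to_utf8() are hypothetical. Every byte
above 0x7F becomes two bytes, so the body grows, and bytes that were
already UTF-8 get encoded a second time.

```c
#include <stddef.h>

/* Hypothetical helper: number of bytes needed to hold `len` bytes of
 * ISO-8859-1 input once encoded as UTF-8. */
size_t latin1_utf8_length(const unsigned char *in, size_t len)
{
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        /* ASCII stays one byte; 0x80..0xFF needs two bytes in UTF-8. */
        out += (in[i] < 0x80) ? 1 : 2;
    }
    return out;
}

/* Hypothetical helper: convert ISO-8859-1 to UTF-8.
 * `out` must have room for latin1_utf8_length(in, len) bytes.
 * Returns the number of bytes written. */
size_t latin1_to_utf8(const unsigned char *in, size_t len,
                      unsigned char *out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = in[i];
        if (c < 0x80) {
            out[n++] = c;
        } else {
            /* Code points U+0080..U+00FF: 110000xx 10xxxxxx */
            out[n++] = (unsigned char)(0xC0 | (c >> 6));
            out[n++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    return n;
}
```

For example, the four ISO-8859-1 bytes 'c' 'a' 'f' 0xE9 come out as
five bytes, and an already UTF-8 encoded 0xC3 0xA9 comes out as four
bytes, so a Content-Length computed by the backend no longer matches
the filtered body.
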
>> The attached patch based on httpd-trunk fixes that issue by removing the
>> Content-Length header entirely. Please review it. I would appreciate it
>> if it could get applied to trunk and then backported to the httpd-2.4.x
>> branch.
> Your patch fixes the immediate bug (thanks!), but the fact that
> mod_xml2enc is doing anything at all in the case you describe is a
> bigger bug.
OK, I also wondered whether mod_xml2enc staying active was a bug, but I
was not sure, so I only fixed the immediate bug. If you consider this a
bug, I assume the Content-Type check just needs to be unified. In
mod_proxy_html, check_filter_init() checks for text/html or
application/xhtml+xml, whereas in mod_xml2enc, xml2enc_ffunc() checks
for the prefix text/ or for xml anywhere in the content type string.
This is inconsistent and causes mod_proxy_html to skip text/javascript
(or text/css) files while mod_xml2enc still processes them.
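
The mismatch can be sketched as two standalone predicates. This is only
an illustration of the two checks as described above, not the modules'
actual code: the names proxy_html_would_filter(),
xml2enc_would_filter() and contains_nocase() are hypothetical, and
treating a missing Content-Type as "do not filter" is an assumption.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Case-insensitive substring test, written out by hand so the sketch
 * does not depend on the non-standard strcasestr(). */
static int contains_nocase(const char *hay, const char *needle)
{
    size_t n = strlen(needle);
    for (; *hay; hay++) {
        if (strncasecmp(hay, needle, n) == 0)
            return 1;
    }
    return 0;
}

/* Check as described for mod_proxy_html's check_filter_init():
 * only text/html or application/xhtml+xml is filtered. */
int proxy_html_would_filter(const char *ctype)
{
    if (ctype == NULL)
        return 0;   /* assumption: no Content-Type, no filtering */
    return strncasecmp(ctype, "text/html", 9) == 0
        || strncasecmp(ctype, "application/xhtml+xml", 21) == 0;
}

/* Check as described for mod_xml2enc's xml2enc_ffunc():
 * anything starting with text/ or containing xml is filtered. */
int xml2enc_would_filter(const char *ctype)
{
    if (ctype == NULL)
        return 0;   /* assumption: no Content-Type, no filtering */
    return strncasecmp(ctype, "text/", 5) == 0
        || contains_nocase(ctype, "xml");
}
```

With these two predicates, text/javascript and text/css are rejected by
the first check but accepted by the second, which is exactly the
situation where mod_proxy_html removes itself and mod_xml2enc stays in
the chain. text/html and application/xhtml+xml are accepted by both.
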
> There's no easy solution: mod_proxy_html delays some of the checks
> until it has a first chunk of data, to allow for cases where an earlier
> filter (e.g. XSLT) might affect Content-Type. But by that time it's
> too late to insert or uninsert the xml2enc filter, as that needs to go
> in front of the proxy_html filter.
Yes, the delayed checks also seem necessary for the charset guessing
when no charset is specified.
But what about making the Content-Type check consistent?
Regards,
Micha