[Bug 64339] mod_proxy_html changing docx header / file content, leading to corrupt documents
https://bz.apache.org/bugzilla/show_bug.cgi?id=64339

Joe Orton changed:

What   |Removed  |Added
Status |RESOLVED |CLOSED

--- Comment #17 from Joe Orton ---
Merged to 2.4.x in r1916412

--
You are receiving this mail because:
You are the assignee for the bug.
- To unsubscribe, e-mail: bugs-unsubscr...@httpd.apache.org
For additional commands, e-mail: bugs-h...@httpd.apache.org
[Bug 64339] mod_proxy_html changing docx header / file content, leading to corrupt documents
https://bz.apache.org/bugzilla/show_bug.cgi?id=64339

Joe Orton changed:

What       |Removed  |Added
Status     |NEEDINFO |RESOLVED
Resolution |---      |FIXED

--- Comment #16 from Joe Orton ---
Second change merged in r1915625; I will propose it for backport. Thanks to Joseph and others who've provided input here.
[Bug 64339] mod_proxy_html changing docx header / file content, leading to corrupt documents
https://bz.apache.org/bugzilla/show_bug.cgi?id=64339

--- Comment #15 from Joe Orton ---
So in fact I should have been testing on 2.4 rather than trunk and I would not have wasted half a day on this! :)

https://github.com/apache/httpd/commit/c57a036dc3e116f5a397bd6a97da77dd6b503a83 significantly changes the behaviour of mod_xml2enc on trunk (this patch is not in 2.4.x) which led me wildly astray. It is indeed simple to reproduce the content transformation suggested above.

I am still not sure how the module is "supposed" to behave, which seems quite ill-defined.
[Bug 64339] mod_proxy_html changing docx header / file content, leading to corrupt documents
https://bz.apache.org/bugzilla/show_bug.cgi?id=64339

--- Comment #14 from Joe Orton ---
Thanks. I think what I was missing here is that mod_proxy_html/mod_xml2enc is not actually transforming the content of the responses, which can really only happen for HTML documents if I'm following the code correctly. The bug is "merely" adding "charset=utf-8" unnecessarily, which then changes the client interpretation of the response data so that it appears corrupt when it would otherwise have been correctly interpreted as (e.g.) ISO-8859-1.

I think with the latest patch in https://github.com/apache/httpd/pull/409 the behaviour is not changed for most cases compared to the current 2.4.58 code.
[Bug 64339] mod_proxy_html changing docx header / file content, leading to corrupt documents
https://bz.apache.org/bugzilla/show_bug.cgi?id=64339

--- Comment #13 from Joseph Heenan ---
@Joe - your patch looks sensible to me.

I don't have a minimum config; the corruption happened on a pretty lightly configured apache2 instance on Ubuntu 20.04.6. The important part of the config is likely to be this:

    ProxyPass http://intranet.example.com/
    ProxyPassReverse http://intranet.example.com/
    ProxyHTMLEnable On
    ProxyHTMLExtended On
    ProxyHTMLURLMap / /intranet/ ce
[Bug 64339] mod_proxy_html changing docx header / file content, leading to corrupt documents
https://bz.apache.org/bugzilla/show_bug.cgi?id=64339

--- Comment #12 from Joe Orton ---
Rereading this again, maybe Nick was just referring to application/xml which is wrongly excluded after 1884505. I still think we should follow RFC 7303 here rather than matching "xml" at any point in the content-type string.

I have another PR here which does this: https://github.com/apache/httpd/pull/409/commits/19ed165e19cf142e65715d1b71c68da14f7873c4

I am trying to write a test case to exercise this, but it's not obvious what minimal configuration forces mod_proxy_html to transform response data (i.e. corrupting it by trying to transform ISO-8859-1 to UTF-8) - does anybody have a minimal config?
[Bug 64339] mod_proxy_html changing docx header / file content, leading to corrupt documents
https://bz.apache.org/bugzilla/show_bug.cgi?id=64339

--- Comment #11 from Joe Orton ---
A user who hit this also confirmed that not loading mod_xml2enc fixes the issue.

RFC 7303 "XML Media Types" - https://www.rfc-editor.org/rfc/rfc7303.html

I think the module should follow the rules there: anything which isn't text/xml, application/xml or contains "+xml" should be ignored.

Nick Kew - any chance you could expand on your comment about what types you expect to match which aren't?
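The RFC 7303 rule described in comment #11 can be sketched in C roughly as follows. This is only an illustration, not the code from mod_xml2enc or from any patch in this thread: the names is_xml_media_type and ieq are hypothetical. The sketch assumes that "+xml" should be matched as a suffix of the subtype, that parameters such as "; charset=..." are ignored, and that the comparison is case-insensitive. It does not model the module's separate handling of "text/" types.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: case-insensitive comparison of the first n
 * bytes of a and b. Both must be at least n bytes long. */
static int ieq(const char *a, const char *b, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    }
    return 1;
}

/* Hypothetical check: returns 1 if ctype is text/xml, application/xml,
 * or has a subtype ending in "+xml"; returns 0 otherwise. */
int is_xml_media_type(const char *ctype)
{
    static const char text_xml[] = "text/xml";
    static const char app_xml[] = "application/xml";
    static const char suffix[] = "+xml";
    size_t len;

    assert(ctype != NULL);

    /* Length of the bare type/subtype, stopping before any parameters. */
    len = strcspn(ctype, "; \t");

    if (len == sizeof(text_xml) - 1 && ieq(ctype, text_xml, len))
        return 1;
    if (len == sizeof(app_xml) - 1 && ieq(ctype, app_xml, len))
        return 1;
    if (len > sizeof(suffix) - 1
        && ieq(ctype + len - (sizeof(suffix) - 1), suffix, sizeof(suffix) - 1))
        return 1;
    return 0;
}
```

Under these assumptions, is_xml_media_type("application/xhtml+xml") returns 1, while the OOXML type application/vnd.openxmlformats-officedocument.wordprocessingml.document returns 0, because "xml" appears there only inside "openxmlformats".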
[Bug 64339] mod_proxy_html changing docx header / file content, leading to corrupt documents
https://bz.apache.org/bugzilla/show_bug.cgi?id=64339

--- Comment #10 from Joseph Heenan ---
(In reply to Nick Kew from comment #7)
> Github 150 won't work: it precludes xml doctypes that should be processed.

@Nick Can you please be very explicit and give a few examples of mime types that should be processed and now aren't?
[Bug 64339] mod_proxy_html changing docx header / file content, leading to corrupt documents
https://bz.apache.org/bugzilla/show_bug.cgi?id=64339

--- Comment #9 from Joseph Heenan ---
(In reply to Karlo from comment #8)
> Data point: disabling mod_xml2enc fixes this problem for me.

Disabling mod_xml2enc fixed it for me too.
[Bug 64339] mod_proxy_html changing docx header / file content, leading to corrupt documents
https://bz.apache.org/bugzilla/show_bug.cgi?id=64339

--- Comment #8 from Karlo ---
Data point: disabling mod_xml2enc fixes this problem for me.
[Bug 64339] mod_proxy_html changing docx header / file content, leading to corrupt documents
https://bz.apache.org/bugzilla/show_bug.cgi?id=64339

Nick Kew changed:

What   |Removed |Added
Status |NEW     |NEEDINFO

--- Comment #7 from Nick Kew ---
Github 150 won't work: it precludes xml doctypes that should be processed.

Giovanni Bechis's patch looks like a possible candidate for this particular issue. Though I'm not sure you need to test both the "application" and the vnd.openxml. And shouldn't the test be case-insensitive?

It does, however, open the question of whether *ALL* vnd.openxml types should be excluded. Is there no use case for running any of them through a markup-aware filter in which mod_xml2enc is required for i18n support?

A second question arises here: if mod_xml2enc risks trashing your docs, then surely so does mod_proxy_html. Can those who reported or reproduced the problem tell us what happens if you apply the identical configuration but don't load mod_xml2enc at all, so mod_proxy_html runs without i18n support? mod_proxy_html should remove itself from the filter chain: does your debug output confirm this?

Marking this NEEDINFO in the hope of feedback on those last two paragraphs.
[Bug 64339] mod_proxy_html changing docx header / file content, leading to corrupt documents
https://bz.apache.org/bugzilla/show_bug.cgi?id=64339

Joseph Heenan changed:

What |Removed |Added
CC   |        |jos...@heenan.me.uk

--- Comment #6 from Joseph Heenan ---
Thanks Joe for merging this!

I also hadn't seen Giovanni's comment. I've re-read the code in my patch and it still looks right to me. The comments in the code also appear to match the behaviour.
[Bug 64339] mod_proxy_html changing docx header / file content, leading to corrupt documents
https://bz.apache.org/bugzilla/show_bug.cgi?id=64339

--- Comment #5 from Joe Orton ---
Committed in r1884505, but oh, I missed the comment from Giovanni, sorry.

> Excel OOXML mime type starts with "application" so it won't match that condition,

If the type starts with "application/", then strncmp(ctype, "text/", 5) will be true (non-zero), so the change looks right, or am I missing something?
[Bug 64339] mod_proxy_html changing docx header / file content, leading to corrupt documents
https://bz.apache.org/bugzilla/show_bug.cgi?id=64339

--- Comment #4 from Giovanni Bechis ---
Created attachment 37603
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=37603&action=edit
fix for openxml documents

Excel OOXML mime type starts with "application" so it won't match that condition; this diff should work for your use-case.
[Bug 64339] mod_proxy_html changing docx header / file content, leading to corrupt documents
https://bz.apache.org/bugzilla/show_bug.cgi?id=64339

--- Comment #3 from Joseph Heenan ---
I submitted a possible fix here: https://github.com/apache/httpd/pull/150
[Bug 64339] mod_proxy_html changing docx header / file content, leading to corrupt documents
https://bz.apache.org/bugzilla/show_bug.cgi?id=64339

--- Comment #2 from Joseph Heenan ---
I've just run into this same problem.

The issue is that the file is being utf8 encoded - notice how the character 0xb1 is being turned into 0xc2 0xb1, which is its utf8 encoding ( https://www.compart.com/en/unicode/U+00B1 ).

The problem is not the file extension, but the mime type. In my case I was testing with an Excel OOXML file, which has the content-type:

    application/vnd.openxmlformats-officedocument.spreadsheetml.sheet

Note how this contains 'xml' in it.

The problem seems to actually be xml2enc (which is automatically enabled when you do 'ProxyHTMLEnable On') - in particular this line of code:

https://github.com/apache/httpd/blob/trunk/modules/filters/mod_xml2enc.c#L347

namely:

    /* only act if starts-with "text/" or contains "xml" */
    if (strncmp(ctype, "text/", 5) && !strstr(ctype, "xml")) {

The 'strstr' is matching any content-type that contains xml. I'm unclear on the original intent of this line. It might make sense to look for +xml rather than just xml, which would definitely fix this bug.

I appear to have been able to work around this bug by disabling the xml2enc module. (I think I don't need it in my use case.)
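To make the effect of the quoted condition concrete, here is a small C sketch that isolates it. The function names acts_on_substring_xml and acts_on_plus_xml are hypothetical and do not exist in mod_xml2enc. The first function mirrors the quoted test (returning 1 when the filter would act on the type), and the second shows the "+xml" variant suggested in comment #2, as an assumption about how that suggestion might look.

```c
#include <assert.h>
#include <string.h>

/* Mirrors the quoted condition: the filter acts when the content type
 * starts with "text/" or contains "xml" anywhere in the string. */
int acts_on_substring_xml(const char *ctype)
{
    assert(ctype != NULL);
    /* The quoted "if" is the skip condition, so negate it. */
    return !(strncmp(ctype, "text/", 5) && !strstr(ctype, "xml"));
}

/* Hypothetical variant: require "+xml" instead of a bare "xml". */
int acts_on_plus_xml(const char *ctype)
{
    assert(ctype != NULL);
    return !(strncmp(ctype, "text/", 5) && !strstr(ctype, "+xml"));
}
```

With the OOXML content type from this comment, acts_on_substring_xml returns 1 because "xml" occurs inside "openxmlformats", whereas acts_on_plus_xml returns 0. Both functions still return 1 for "text/html" and "application/xhtml+xml".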
[Bug 64339] mod_proxy_html changing docx header / file content, leading to corrupt documents
https://bz.apache.org/bugzilla/show_bug.cgi?id=64339

--- Comment #1 from Karlo ---
Was able to cleanly reproduce:

1) New VM "loadbalancer", default centos7 install, yum install httpd mod_proxy_html, config:

    # tail -n 18 conf/httpd.conf
    ProxyHTMLEnable On
    ProxyHTMLInterp On
    ProxyPreserveHost Off
    ProxyPass / http://10.0.60.188:80/
    ProxyPassReverse / http://10.0.60.188:80/
    ProxyHTMLURLMap http://10.0.60.188:80/ /
    DocumentRoot /defaultdir/
    # Supplemental configuration
    #
    # Load config files in the "/etc/httpd/conf.d" directory, if any.
    IncludeOptional conf.d/*.conf

2) Install vm "webserver", on 10.0.60.188, default install (yum install httpd). Add an OpenOffice generated newdoc.docx file to /var/www/html/newdoc.docx . Make copy newdoc.XXX [!!]

3) Request files
   http://loadbalancer/newdoc.docx
   http://loadbalancer/newdoc.XXX

4) Notice headers are different:

    [desktop]$ wget http://10.0.60.189/newdoc.docx
    [desktop]$ wget http://10.0.60.189/newdoc.XXX

    [desktop]$ xxd newdoc.docx | head -n3
    : 504b 0304 1400 0808 0800 c2b1 c29c c28a PK..
    0010: 5000 000b P...
    0020: 005f 7265 6c73 2f2e 7265 6c73 c2ad c292 ._rels/.rels

    [desktop]$ xxd newdoc.XXX | head -n3
    : 504b 0304 1400 0808 0800 b19c 8a50 PK...P..
    0010: 0b00 5f72 .._r
    0020: 656c 732f 2e72 656c 73ad 924d 4b03 410c els/.rels..MK.A.

    [desktop]$ file *
    newdoc.docx: Zip archive data, at least v2.0 to extract
    newdoc.XXX: Microsoft Word 2007+
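The difference between the two dumps (b1 9c 8a becoming c2b1 c29c c28a, and ad 92 becoming c2ad c292) is consistent with each byte at or above 0x80 being treated as an ISO-8859-1 character and re-encoded as UTF-8. The C sketch below reproduces that byte-level effect. It is an illustration only: the function name latin1_to_utf8 is hypothetical and this is not the conversion code used by mod_xml2enc.

```c
#include <assert.h>
#include <stddef.h>

/* Treat each input byte as an ISO-8859-1 character and write its UTF-8
 * encoding to out. The out buffer must hold at least 2 * n bytes.
 * Returns the number of bytes written. */
size_t latin1_to_utf8(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t i, o = 0;

    assert(in != NULL && out != NULL);

    for (i = 0; i < n; i++) {
        if (in[i] < 0x80) {
            /* ASCII bytes are unchanged. */
            out[o++] = in[i];
        } else {
            /* U+0080..U+00FF take two bytes: 110000xx 10xxxxxx. */
            out[o++] = (unsigned char)(0xC0 | (in[i] >> 6));
            out[o++] = (unsigned char)(0x80 | (in[i] & 0x3F));
        }
    }
    return o;
}
```

Feeding the bytes 08 00 b1 9c 8a through this function yields 08 00 c2 b1 c2 9c c2 8a, which matches the start of the corrupted newdoc.docx dump: ASCII-range bytes of the ZIP header pass through, and every high byte gains a 0xc2 prefix.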
[Bug 64339] mod_proxy_html changing docx header / file content, leading to corrupt documents
https://bz.apache.org/bugzilla/show_bug.cgi?id=64339

Karlo changed:

What |Removed |Added
CC   |        |karlo_bzapache@luiten.family