Hi,

On Thu, Mar 05, 2015 at 06:37:21PM -0500, Alex Vandiver wrote:
> On Fri, 6 Mar 2015 00:06:32 +0100 Václav Ovsík <vaclav.ov...@i.cz>
> wrote:
> > https://issues.bestpractical.com/Ticket/Display.html?id=29735
>
> Aha -- thanks for digging that out! I thought I vaguely recalled
> something in this area previously.
> https://issues.bestpractical.com/Ticket/Attachment/286095/157750/utf8-encoding.patch
> looks to be functionally fairly similar to the branch.

Thanks for your attention to this.

> There are a few other, orthogonal fixes in there that may still be
> interesting to tease out into their own commits. It looks like I see
> changes to:
>
>  * Fix the computed max size of base64'd attachments; I'd need to
>    squint at it harder, but seems eminently reasonable.
>
>  * Attempt to gracefully deal with TruncateLongAttachments truncating
>    mid-byte of UTF-8 data. As above; the decode/encode is an interesting
>    trick to attempt to ensure that the byte stream is consistent. I'd
>    like to test it a bit, but seems not unreasonable.

It may not be very efficient, but it is simple, and safety first :)

>  * Choose base64 vs QP based on which is shorter; I'm less convinced by
>    this, since it means that for large data, it gets QP'd, base64'd, and
>    then one of those _again_ -- which isn't terribly efficient. I'm less
>    convinced by the tradeoff of computation time to stored in-database
>    size.

You are right. My intention was to preserve as much readable text as
possible. A text may contain some invalid characters while the rest of it
is still readable. QP is more appropriate in that case, because it leaves
most of the text readable. Comparing the lengths of the Base64 and QP
encodings therefore shows roughly how much of the data is ASCII:

  len(Base64) < len(QP): mostly binary data, maybe some octet stream
  len(QP) < len(Base64): mostly ASCII characters, probably text

But this is probably a corner case and not very interesting. Most text
should be valid UTF-8 these days, and the rest is not interesting.

> If you're interested in reworking the patch into a 2-3 commit series,
> I'm happy to apply for 4.2-trunk.
>
> - Alex

https://github.com/bestpractical/rt/compare/stable...zito:4.2-zito-encodelob-utf8-fix

This is a slightly newer version that I'm running on a production
instance of rt-4.2.9. I would be happy if some part of it is usable for
RT mainline. Thanks for the fine software!

Cheers
--
Zito
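This is a rough Python sketch of the three encoding points discussed in this thread. It is for illustration only: RT itself is written in Perl, and this is not the code from the branch or the patch linked above. All function names are hypothetical.

Some assumptions to note:

* The size calculation assumes unwrapped base64, with no line breaks. Perl's MIME::Base64 `encode_base64` inserts a newline every 76 characters by default.
* The truncation decodes with errors ignored. This also silently drops any other invalid bytes in the data, not just a partial sequence at the end.

```python
import base64
import quopri
from typing import Tuple


def base64_encoded_length(n: int) -> int:
    """Length of unwrapped base64 output for n input bytes: 4 * ceil(n / 3)."""
    return 4 * ((n + 2) // 3)


def max_raw_size_for_base64(limit: int) -> int:
    """Largest input size (in bytes) whose unwrapped base64 encoding
    still fits within `limit` characters."""
    return (limit // 4) * 3


def truncate_utf8(data: bytes, max_bytes: int) -> bytes:
    """Cut `data` to at most `max_bytes` without leaving a partial UTF-8
    sequence at the end. The decode/encode round trip drops the incomplete
    trailing sequence (and any other invalid bytes), so the result is a
    consistent UTF-8 byte stream."""
    return data[:max_bytes].decode("utf-8", errors="ignore").encode("utf-8")


def shorter_encoding(data: bytes) -> Tuple[str, bytes]:
    """Encode with both quoted-printable and base64 and keep the shorter one.
    Mostly-ASCII text tends to come out shorter as QP, binary data as base64."""
    b64 = base64.b64encode(data)
    qp = quopri.encodestring(data)
    if len(qp) <= len(b64):
        return "quoted-printable", qp
    return "base64", b64
```

`shorter_encoding` encodes every input twice. That is exactly the computation-time versus stored-size trade-off questioned above.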