Hello,

We have found that messages from one particular sender are declared as UTF-8 but contain byte sequences which are not valid in UTF-8; in particular 0xb2, 0xb3 and 0xb9. They appear to relate to particularly brain-dead renderings of various quotation marks: <http://www.memoryhole.net/kyle/2007/08/superscriptone.html> (although that page doesn't cover the extra breakage of inserting those particular bytes into a UTF-8 encoded document).

With PostgreSQL at least, the attachments are stored internally as Unicode characters, so PostgreSQL not unreasonably refuses to store such an attachment. Of course, it's then impossible to create a ticket.

In an ideal world, the correspondent would receive the error message, enquire further, be told why his/her message wasn't usable, and fix his/her software. In practice, this is unlikely to happen in this particular case, and the messages are considered of high value to the organisation.

So, what to do? I've thought of four possibilities:

One: validate all data received via RT and pass anything invalid to a heuristic routine which would replace each invalid sequence with some number of U+FFFD characters before storing the message. This might be controversial behaviour if the expectation is that RT stores what was supplied to it.

Two: alter the database schema to allow for an attachment with unknown or invalid encoding; the binary data would be stored unmodified, and the web interface would offer the raw data for download, to be interpreted at the user's whim.

Three: filter the incoming message outside of RT; this might be the most practical way to achieve the behaviour we desire, especially since it could easily be confined to individual queues.

Four: a much smaller modification to notify the queue owners, as well as the correspondent, that a message failed to be stored. This would be an acceptable workaround.

Our logs indicate we've had 9 such occurrences in 37,000 tickets (although some may relate to a separate UTF-8 related bug fixed in 3.8.8, which we've only just installed), so it's not a particularly common problem. I would be interested to hear of anyone else encountering this issue, and of any work done to improve the situation for the unfortunate recipient of highly important garbage emails.

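To make the substitution idea concrete, here is a minimal sketch in Python of what such a routine might do, whether it runs inside RT or as an external filter. It is an illustration only, not RT code: the function name sanitize_utf8 is made up for this example, and it assumes that one U+FFFD per invalid sequence is an acceptable substitution. It also returns a count, so a caller could log or notify when a message had to be altered.

```python
from typing import Tuple


def sanitize_utf8(raw: bytes) -> Tuple[str, int]:
    """Decode raw as UTF-8, replacing each invalid sequence with U+FFFD.

    Returns the decoded text and the number of replacements made.
    Hypothetical helper for illustration; not part of RT.
    """
    pieces = []
    replaced = 0
    pos = 0
    while pos < len(raw):
        try:
            # Strict decode of whatever is left; succeeds if the rest is valid.
            pieces.append(raw[pos:].decode("utf-8"))
            break
        except UnicodeDecodeError as err:
            # err.start and err.end are offsets into the slice raw[pos:].
            pieces.append(raw[pos:pos + err.start].decode("utf-8"))
            pieces.append("\ufffd")
            replaced += 1
            pos += err.end
    return "".join(pieces), replaced


# Sample input using the stray bytes mentioned above, plus one valid
# multi-byte character (e-acute) that must survive untouched.
text, count = sanitize_utf8(b"\xb2quoted\xb3 caf\xc3\xa9\xb9")
print(repr(text), count)
```

If the count is not needed, Python's built-in bytes.decode("utf-8", errors="replace") performs the same kind of substitution in a single call.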
When it comes down to both user expectations and the oft-quoted principle of being liberal in what one accepts, there is clearly some room for improvement here.

Cheers,
Dominic.

-- 
Dominic Hargreaves, Systems Development and Support Team
Computing Services, University of Oxford
