Julian Bradfield wrote at 14:54 +0000 on Jan 19, 2012:
> On 2012-01-19, John Hein <[email protected]> wrote:
> > I think we should mark the new saved message as 'edited' since it's
> > different than the original. It will also be interesting when
> > "illegal" characters appear in the decoding. We could only allow the
> > re-coding for text/ mime, but wrong mime type hints are known to
> > happen. Maybe re-coding to quoted-printable? Or refuse to recode
> > when non-printable characters show up, but that may be hard to do.
>
> What do you mean by an illegal character? Why would you want to stop
> decoding of, say, PDFs to binary? It would save time and space later.
> The main problem with textual search is that the character encoding
> may vary from message to message, and even from part to part within a
> MIME message. Because VM folders are, and have to be, binary, you
> can't search for non-ASCII characters within a folder. I don't see a
> good solution to this, excepting transcoding everything to utf-8
> before saving.

Short answer: I'm not sure what the best solution might be either.
And I'm not suffering from any delusions that this would be a simple
task with a one-size-fits-all solution.

I guess an 'illegal' character would be a character that does not
belong in an email message per RFC. Perhaps put another way - no
character that would choke a mail reader such as vm or other mail
handler.

Longer...

That said, when "exporting" a message, one may have plans to use it
outside a mail reader, but that's beyond the scope of what I was
thinking. And we more or less have a tool for that already (to save
mime parts) - vm-mime-save-all-attachments. That doesn't re-save the
parts in place in the message, of course.

So transcoding to utf-8 would probably be out, since you can't have raw
utf-8 in an email message and expect all email handling tools to be
happy with it. That said, grep (and emacs?) can be told to search
utf-8 input, so it would be useful at some level. Can vm handle
messages with raw utf-8 in the body?

Changing a plain text base64 part that only has 7-bit ascii in the
decoded stream to 7-bit ascii encoding, and resaving the mime part with
the appropriate encoding hint, would be one example of the sort of
"legal" transformation I had in my head. Transforming to
quoted-printable if possible seems legal as well and opens up the space
beyond 7-bit ascii. Grepping through encoded quoted-printable may be
useful in many cases as long as you set your expectations appropriately
(e.g., = is =3D).

Generally, I only care about re-encoding to something that will make it
easier to use grep or the like. So saving a pdf as a hunk of binary
within the message is not something I want. If I want to search a pdf
I generally have to use some other tool anyway (from strings(1) at a
minimum for certain pdfs, to pdftotext, to interactive pdf reading
tools), so re-saving a pdf mime part as binary and then using the tool
isn't really better than having to add a mime decoder in front of that
tool.

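To make the "set your expectations" point concrete, here is a minimal
sketch of turning a search string into the form it would take inside a
quoted-printable body. It is in Python purely for illustration, and
qp_search_term is a made-up helper name, not anything that exists in
vm. It assumes you already know (or guess) the charset of the part.

```python
import quopri


def qp_search_term(text: str, charset: str = "utf-8") -> str:
    """Return `text` as it would appear in a quoted-printable body.

    Hypothetical helper for illustration only. The charset matters:
    the same character encodes differently per part, which is the
    per-part encoding problem mentioned in the quoted text.
    """
    return quopri.encodestring(text.encode(charset)).decode("ascii")


print(qp_search_term("a=b"))              # a=3Db
print(qp_search_term("café"))             # caf=C3=A9
print(qp_search_term("café", "latin-1"))  # caf=E9
```

Even with this, a grep can miss: an encoder is free to insert a soft
line break ("=" at end of line) in the middle of the phrase, and may
escape more characters than strictly necessary.
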
I sometimes find myself using vm-edit-message 'mimencode -u -b' on a
mime part for various needs. This imagined re-encoder would help with
that in addition to times when I use $ | with various tools to muck
with decoded mime parts of a message.
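
For what it's worth, the imagined re-encoder could look roughly like the
sketch below. It uses Python's standard email package rather than
elisp, so it is only an illustration of the idea, not a proposal for
how vm would do it. The function names (looks_textual, fits_7bit,
recode_text_parts) and the policy choices are my assumptions: only
text/* parts that are currently base64 are touched, anything with
control bytes is refused on the theory that the mime type hint is
wrong, pure ascii becomes 7bit, and everything else becomes
quoted-printable.

```python
import base64
import quopri
from email import message_from_bytes

MAX_LINE = 998  # RFC 5322 limit on line length, excluding the CRLF


def looks_textual(data: bytes) -> bool:
    """False if data has control bytes other than tab, LF or CR."""
    return not any(
        (b < 0x20 and b not in (0x09, 0x0A, 0x0D)) or b == 0x7F
        for b in data
    )


def fits_7bit(data: bytes) -> bool:
    """True if data is pure ASCII and has no over-long lines."""
    return all(b < 0x80 for b in data) and all(
        len(line) <= MAX_LINE for line in data.splitlines()
    )


def recode_text_parts(raw: bytes) -> bytes:
    """Re-encode base64 text/* parts in place as 7bit or QP."""
    msg = message_from_bytes(raw)
    for part in msg.walk():
        if part.is_multipart() or part.get_content_maintype() != "text":
            continue
        cte = str(part.get("Content-Transfer-Encoding", "")).strip().lower()
        if cte != "base64":
            continue
        decoded = part.get_payload(decode=True)
        if not decoded or not looks_textual(decoded):
            continue  # probably a mislabelled binary part; leave it alone
        if fits_7bit(decoded):
            new_cte, body = "7bit", decoded.decode("ascii")
        else:
            new_cte = "quoted-printable"
            body = quopri.encodestring(decoded).decode("ascii")
        part.replace_header("Content-Transfer-Encoding", new_cte)
        part.set_payload(body)
    return msg.as_bytes()


if __name__ == "__main__":
    raw = (
        b"From: someone@example.com\n"
        b"MIME-Version: 1.0\n"
        b"Content-Type: text/plain; charset=utf-8\n"
        b"Content-Transfer-Encoding: base64\n"
        b"\n" + base64.b64encode(b"hello world\n") + b"\n"
    )
    print(recode_text_parts(raw).decode("ascii"))
```

The Content-Type header, including its charset parameter, is left
alone, since only the transfer encoding changes. Marking the message
as 'edited' and coping with "From " lines in an mbox-style folder are
not handled here.
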
