Re: [VM] recoding mime parts (was: searching in mime encoded email)

John Hein Thu, 19 Jan 2012 08:45:04 -0800

Julian Bradfield wrote at 14:54 +0000 on Jan 19, 2012:
 > On 2012-01-19, John Hein <[email protected]> wrote:
 > > I think we should mark the new saved message as 'edited' since it's
 > > different than the original.  It will also be interesting when
 > > "illegal" characters appear in the decoding.  We could only allow the
 > > re-coding for text/ mime, but wrong mime type hints are known to
 > > happen.  Maybe re-coding to quoted-printable?  Or refuse to recode
 > > when non-printable characters show up, but that may be hard to do.
 > 
 > What do you mean by an illegal character? Why would you want to stop
 > decoding of, say, PDFs to binary? It would save time and space later.
 > The main problem with textual search is that the character encoding
 > may vary from message to message, and even from part to part within a
 > MIME message. Because VM folders are, and have to be, binary, you
 > can't search for non-ASCII characters within a folder. I don't see a
 > good solution to this, excepting transcoding everything to utf-8
 > before saving.


Short answer: I'm not sure what the best solution might be either.
And I'm not suffering from any delusions that this would be a simple
task with a one-size-fits-all solution.

I guess an 'illegal' character would be one that does not belong in an
email message per the RFCs (RFC 5322 for the message format, RFC 2045
for MIME).  Put another way: any character that would choke a mail
reader such as VM or some other mail handler.

Longer...
That said, when "exporting" a message, one may plan to use it outside
a mail reader, but that's beyond the scope of what I was thinking.
And we more or less already have a tool for that (saving MIME parts
to files): vm-mime-save-all-attachments.  That doesn't re-save the
parts in place in the message, of course.

So transcoding to UTF-8 would probably be out, since you can't put raw
UTF-8 in an email message and expect every email-handling tool to be
happy with it.  Still, grep (and Emacs?) can be told to search UTF-8
input, so it would be useful at some level.  Can VM handle messages
with raw UTF-8 in the body?

One example of the sort of "legal" transformation I had in mind: take
a base64-encoded text/plain part whose decoded stream contains only
7-bit ASCII, and re-save it as plain 7bit with the
Content-Transfer-Encoding header changed to match.
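For what it's worth, here is a rough sketch of that base64-to-7bit case
in Python, using only the standard library email package (the
compat32 Message API).  The helper names (recode_base64_if_ascii,
recode_message, recode_raw) are made up for illustration; this is not
anything VM does today, and it ignores details such as marking the
message as edited.

```python
from email import message_from_string
from email.message import Message


def recode_base64_if_ascii(part: Message) -> bool:
    """Hypothetical helper: re-save a base64 text/* part as 7bit when the
    decoded body is plain 7-bit ASCII.  Returns True if the part changed."""
    if part.is_multipart() or part.get_content_maintype() != "text":
        return False
    cte = str(part.get("Content-Transfer-Encoding", "")).strip().lower()
    if cte != "base64":
        return False
    data = part.get_payload(decode=True)  # base64-decoded bytes
    if data is None:
        return False
    try:
        text = data.decode("ascii")
    except UnicodeDecodeError:
        return False  # 8-bit content: leave it alone
    # RFC 2045 7bit rules: no NUL, CR only as part of CRLF, lines <= 998 octets.
    text = text.replace("\r\n", "\n")
    if "\x00" in text or "\r" in text:
        return False
    if any(len(line) > 998 for line in text.split("\n")):
        return False
    part.set_payload(text)
    part.replace_header("Content-Transfer-Encoding", "7bit")
    return True


def recode_message(msg: Message) -> int:
    """Walk every part of a parsed message; return how many parts changed."""
    return sum(recode_base64_if_ascii(part) for part in msg.walk())


def recode_raw(raw: str) -> str:
    """Hypothetical wrapper: parse raw message text, recode, serialize."""
    msg = message_from_string(raw)
    recode_message(msg)
    return msg.as_string()
```

Parts whose decoded bytes are not pure ASCII are left untouched, which
is where the quoted-printable idea comes in.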

Transcoding to quoted-printable where possible seems legal as well,
and it opens up the space beyond 7-bit ASCII.  Grepping through
quoted-printable text may be useful in many cases, as long as you set
your expectations appropriately (e.g., = is encoded as =3D, and long
lines are split with soft line breaks, so a match can straddle one).
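Along the same lines, a sketch of the quoted-printable variant, again
hypothetical Python rather than VM code (recode_base64_to_qp and
recode_raw_to_qp are made-up names).  Because the decoded bytes are
kept as-is, the existing charset parameter stays valid; the cost is
that '=' and anything outside printable ASCII show up as =XX escapes.

```python
import quopri
from email import message_from_string
from email.message import Message


def recode_base64_to_qp(part: Message) -> bool:
    """Hypothetical helper: re-save a base64 text/* part as quoted-printable.
    The decoded bytes are preserved, so the charset parameter stays valid."""
    if part.is_multipart() or part.get_content_maintype() != "text":
        return False
    cte = str(part.get("Content-Transfer-Encoding", "")).strip().lower()
    if cte != "base64":
        return False
    data = part.get_payload(decode=True)  # base64-decoded bytes
    if data is None:
        return False
    # quopri escapes '=' as =3D and non-ASCII bytes as =XX (uppercase hex),
    # and splits lines longer than 76 chars with soft breaks (a trailing '=').
    part.set_payload(quopri.encodestring(data).decode("ascii"))
    part.replace_header("Content-Transfer-Encoding", "quoted-printable")
    return True


def recode_raw_to_qp(raw: str) -> str:
    """Hypothetical wrapper: parse raw message text, recode, serialize."""
    msg = message_from_string(raw)
    for part in msg.walk():
        recode_base64_to_qp(part)
    return msg.as_string()
```

To grep for a non-ASCII word in such a part you would search for its
escaped form (for example caf=C3=A9 for UTF-8 "café"), keeping in mind
that a soft line break can split the match.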

Generally, I only care about re-encoding to something that makes it
easier to use grep or the like.  So saving a PDF as a hunk of binary
within the message is something I wouldn't want.  If I want to search
a PDF, I generally have to use some other tool anyway (anything from
strings(1), at a minimum for certain PDFs, to pdftotext to interactive
PDF readers), so re-saving a PDF MIME part as binary and then running
the tool isn't really better than putting a MIME decoder in front of
that tool.

I sometimes find myself using vm-edit-message and running
'mimencode -u -b' over a MIME part for various needs.  This imagined
re-encoder would help with that, as well as with the times when I use
$ | with various tools to muck with the decoded MIME parts of a
message.
