>>>>> On Fri, 20 Jan 2012, Uday Reddy wrote:

> Julian Bradfield writes:
>> On 2012-01-19, Uday Reddy <[email protected]> wrote:
>> > More generally, I am thinking that there is no reason why we
>> > can't have VM folders stored in some other character set, other
>> > than US-ASCII, e.g., UTF-8. Those folders won't be interoperable
>> > with other mail clients, but do
>>
>> VM folders are simply binary files. The character set of a given
>> message - or subpart of a message - is determined by its MIME
>> charset.
>>
>> If you wanted, you could transcode all non-utf-8 parts to utf-8,
>> but the folder would still be a binary file; it would just be a
>> binary file that happened also to be valid utf-8 as a whole.
> Oh, perhaps you are saying they are "binary" as opposed to "ASCII".
> I think that is a matter of view point.

I think that Julian is right here. Folders don't have any specific
character encoding, they are simply a stream of bytes. (In terms of
coding systems, it's "raw-text".) Character sets come into play on the
level of individual messages, and they're specified by the message's
(or part's in case of multipart messages) MIME headers.

There's another aspect why general recoding of saved messages might
not be a good idea: A message can be PGP signed, and any change of
encoding will destroy the message's integrity and therefore render the
signature invalid.

> [...]
> You are probably thinking of the folder as being made up of bytes as
> opposed to characters. That is a fine view point to take as long as
> you don't care to search. But searching is what this thread is
> about!

Seems like the search function must MIME-decode each message then.
I've no idea though if doing so would be fast enough.

Before reinventing the wheel, maybe it would be worthwhile to look at
dedicated search tools like mairix. There's also an Emacs interface
for it, see <http://randomsample.de/mairix-el-doc/>.

Ulrich
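
To make the "MIME-decode each message, then search" idea concrete,
here is a minimal sketch in Python rather than Emacs Lisp. It is not
how VM works. The helper names (decoded_text_parts, message_matches)
and the sample message are made up for illustration, and only text
parts of the body are searched, not headers or attachments.

```python
import email
from email import policy


def decoded_text_parts(raw_bytes):
    """Yield the text of every text/* part of one raw message as str.

    Hypothetical helper: the folder is treated as bytes, and the charset
    is taken from each part's own MIME headers.
    """
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skips multipart containers and non-text attachments
        # decode=True undoes base64 / quoted-printable and returns bytes
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "us-ascii"
        try:
            yield payload.decode(charset, errors="replace")
        except LookupError:
            # Unknown charset label: fall back to a lossless byte mapping
            yield payload.decode("latin-1")


def message_matches(raw_bytes, needle):
    """Case-insensitive substring search over the decoded text parts."""
    needle = needle.casefold()
    return any(needle in text.casefold()
               for text in decoded_text_parts(raw_bytes))


# Sample message (an assumption): ISO-8859-1 text, quoted-printable encoded.
raw_latin1 = (
    b"From: a@example.org\r\n"
    b"Subject: test\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=iso-8859-1\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"Gr=FC=DFe aus M=FCnchen\r\n"
)

if __name__ == "__main__":
    # A search on the raw bytes misses the word, the decoded search finds it.
    print("M\u00fcnchen".encode("utf-8") in raw_latin1)
    print(message_matches(raw_latin1, "M\u00fcnchen"))
```

The stored bytes are never modified, so a PGP signature over the
message stays valid. Whether decoding every message on each search is
fast enough for a large folder is exactly the open question above; an
indexer such as mairix avoids repeating that work.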
