I RTFS'd mg a while back looking to add UTF-8 support, and the biggest source of bloat and complexity seemed to be the layer that manages the screen contents for refresh optimization. Basic UTF-8 support would require enlarging that layer's buffer quite a bit, and full support (including nonspacing marks) would require a system where the lines stored in this buffer could grow without bound.
Thus, the potentially unpopular fix I'd propose for consideration is to rip out the whole refresh-optimizing layer and have screen updates called directly from the code that modifies the buffer contents, as the modifications take place. This would eliminate a whole layer of bloat in terms of performance, memory usage, and code size, so that even after adding UTF-8 support the result would still be a net win. Seeing as this sort of curses-style refresh optimization was one of the original "technologies" emacs pioneered, some folks might be sad to see it go, but I suspect killing it is the right choice for a _light_ emacs anyway.

Without the display management structures getting in the way, UTF-8 support (or preferably multibyte character support based on whatever encoding the configured locale uses) should not be too difficult. It's just a matter of updating the editing functions to treat whole UTF-8 sequences as a unit for deletion, etc., and of making sure the new display code handles them and correctly computes column widths.

BTW: Trent's recommendations of other Emacsen are somewhat inaccurate. XEmacs will silently and severely corrupt UTF-8 text containing any characters not present in the old mule charsets when loading and resaving a file, even if no changes are made. GNU Emacs 22 is somewhat usable but still has column width issues for CJK-wide and nonspacing characters.
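
To make the point about treating whole UTF-8 sequences as a unit a bit more concrete, here is a rough C sketch of the two pieces mentioned above: finding where the previous character starts (for backward deletion) and summing column widths for display. None of this is taken from the mg source; the names utf8_prev_start, utf8_decode, toy_width and utf8_columns are hypothetical. In particular, toy_width is a deliberately tiny stand-in that only knows about the combining diacritical marks (U+0300..U+036F, width 0) and the CJK unified ideographs (U+4E00..U+9FFF, width 2). Real code would more likely call the POSIX wcwidth() under the configured locale, or carry a complete width table.

```c
#include <stddef.h>
#include <stdint.h>

/* True if b is a UTF-8 continuation byte (bit pattern 10xxxxxx). */
static int utf8_is_cont(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/*
 * Return the offset at which the character ending just before `pos`
 * starts. Backs up over at most three continuation bytes, so stray
 * continuation bytes in invalid input cannot drag it arbitrarily far.
 */
static size_t utf8_prev_start(const unsigned char *s, size_t pos)
{
    size_t start, backed = 0;

    if (pos == 0)
        return 0;
    start = pos - 1;
    while (start > 0 && backed < 3 && utf8_is_cont(s[start])) {
        start--;
        backed++;
    }
    return start;
}

/*
 * Decode one code point from s[0..len), with len >= 1.
 * Returns the number of bytes consumed. Invalid, truncated, or
 * overlong input yields U+FFFD and consumes a single byte.
 */
static size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    size_t need, i;
    uint32_t c = s[0], min;

    if (c < 0x80) { *cp = c; return 1; }
    else if ((c & 0xE0) == 0xC0) { need = 1; c &= 0x1F; min = 0x80; }
    else if ((c & 0xF0) == 0xE0) { need = 2; c &= 0x0F; min = 0x800; }
    else if ((c & 0xF8) == 0xF0) { need = 3; c &= 0x07; min = 0x10000; }
    else { *cp = 0xFFFD; return 1; }

    if (len < need + 1) { *cp = 0xFFFD; return 1; }
    for (i = 1; i <= need; i++) {
        if (!utf8_is_cont(s[i])) { *cp = 0xFFFD; return 1; }
        c = (c << 6) | (uint32_t)(s[i] & 0x3F);
    }
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) {
        *cp = 0xFFFD;
        return 1;
    }
    *cp = c;
    return need + 1;
}

/*
 * ASSUMPTION: toy width table, for illustration only. It is not a
 * substitute for wcwidth() or a full Unicode width table, and it
 * treats everything else (control characters included) as width 1.
 */
static int toy_width(uint32_t cp)
{
    if (cp >= 0x0300 && cp <= 0x036F)
        return 0;               /* combining diacritical marks */
    if (cp >= 0x4E00 && cp <= 0x9FFF)
        return 2;               /* CJK unified ideographs */
    return 1;
}

/* Number of screen columns occupied by the UTF-8 text s[0..len). */
static size_t utf8_columns(const unsigned char *s, size_t len)
{
    size_t i = 0, cols = 0;
    uint32_t cp;

    while (i < len) {
        i += utf8_decode(s + i, len - i, &cp);
        cols += (size_t)toy_width(cp);
    }
    return cols;
}
```

With something along these lines, a backward-delete command would remove the bytes from utf8_prev_start(buf, dot) up to dot rather than a single byte, and the display code would advance the cursor by utf8_columns() rather than by byte count. Note that this only keeps multibyte sequences intact; deleting a base character together with its trailing nonspacing marks would need an extra loop that keeps backing up while the decoded character has width 0.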