I RTFS'd mg a while back, looking to add UTF-8 support, and the biggest
source of bloat and complexity seemed to be the layer that
manages the screen contents for refresh optimization. Basic UTF-8
support would require enlarging it quite a bit, and full support
(including nonspacing marks) would require a system where the lines
stored in this buffer could grow without bound, since any number of
combining marks can stack on a single screen column.

Thus, the potentially unpopular fix I'd propose for consideration is
to rip out the whole refresh-optimizing layer and have the code that
modifies the buffer contents update the screen directly, as the
modifications take place. This would eliminate a whole layer of bloat
in performance, memory usage, and code size, enough that the editor
would still come out ahead after UTF-8 support is added. Seeing as
this sort of curses-style refresh optimization was one of the original
"technologies" Emacs pioneered, some folks might be sad to see it go,
but I suspect killing it is the right choice for a _light_ Emacs
anyway.

Without display management structures getting in the way,
UTF-8 support (or preferably multibyte character support based on
whatever the configured locale uses) should not be too difficult. It's
just a matter of updating the editing functions to treat whole UTF-8
sequences as a unit for deletion, etc., and of making sure the new
display code handles them and correctly computes column widths.
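For what it's worth, here's roughly the kind of helper I mean, as a minimal C sketch and not anything taken from mg: the function names are hypothetical, the buffer is assumed to be a plain UTF-8 byte array, and cp_width() is a crude stand-in for wcwidth(3) that only knows a few combining-mark and East Asian wide ranges. A locale-driven version would use mbrtowc(3) and wcwidth(3) instead.

```c
#include <stddef.h>
#include <stdint.h>

/* True if b is a UTF-8 continuation byte (bit pattern 10xxxxxx). */
static int is_cont(unsigned char b) { return (b & 0xC0) == 0x80; }

/* Hypothetical helper: byte offset where the character ending just before
 * pos starts. A backward delete at pos removes bytes [utf8_prev(buf, pos), pos). */
size_t utf8_prev(const char *buf, size_t pos)
{
    if (pos == 0)
        return 0;
    pos--;
    while (pos > 0 && is_cont((unsigned char)buf[pos]))
        pos--;
    return pos;
}

/* Hypothetical helper: byte offset just past the character starting at pos. */
size_t utf8_next(const char *buf, size_t len, size_t pos)
{
    if (pos >= len)
        return len;
    pos++;
    while (pos < len && is_cont((unsigned char)buf[pos]))
        pos++;
    return pos;
}

/* Decode one sequence from s (n bytes available) into *cp and return the
 * bytes consumed. Malformed or truncated input yields U+FFFD and consumes
 * one byte; overlong forms and surrogates are NOT rejected in this sketch. */
static size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    size_t len;
    uint32_t v;

    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; v = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; v = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; v = s[0] & 0x07; }
    else { *cp = 0xFFFD; return 1; }

    if (len > n) { *cp = 0xFFFD; return 1; }
    for (size_t i = 1; i < len; i++) {
        if (!is_cont(s[i])) { *cp = 0xFFFD; return 1; }
        v = (v << 6) | (s[i] & 0x3F);
    }
    *cp = v;
    return len;
}

/* Crude stand-in for wcwidth(3): 0 for two combining-mark blocks, 2 for
 * common East Asian wide ranges, 1 otherwise. Tabs and control characters
 * are not handled here. */
static int cp_width(uint32_t cp)
{
    if ((cp >= 0x0300 && cp <= 0x036F) ||   /* Combining Diacritical Marks */
        (cp >= 0x20D0 && cp <= 0x20FF))     /* Combining Marks for Symbols */
        return 0;
    if ((cp >= 0x1100 && cp <= 0x115F) ||   /* Hangul Jamo initial consonants */
        (cp >= 0x2E80 && cp <= 0xA4CF && cp != 0x303F) || /* CJK ... Yi */
        (cp >= 0xAC00 && cp <= 0xD7A3) ||   /* Hangul Syllables */
        (cp >= 0xF900 && cp <= 0xFAFF) ||   /* CJK Compatibility Ideographs */
        (cp >= 0xFF00 && cp <= 0xFF60) ||   /* Fullwidth Forms */
        (cp >= 0xFFE0 && cp <= 0xFFE6) ||   /* Fullwidth signs */
        (cp >= 0x20000 && cp <= 0x2FFFD) || /* Supplementary Ideographic Plane */
        (cp >= 0x30000 && cp <= 0x3FFFD))   /* Tertiary Ideographic Plane */
        return 2;
    return 1;
}

/* Hypothetical helper: screen columns occupied by the first len bytes of buf. */
int utf8_columns(const char *buf, size_t len)
{
    int col = 0;
    size_t i = 0;

    while (i < len) {
        uint32_t cp;
        i += utf8_decode((const unsigned char *)buf + i, len - i, &cp);
        col += cp_width(cp);
    }
    return col;
}
```

The editing side only needs utf8_prev()/utf8_next() to move and delete by whole sequences, and the display side only needs utf8_columns() to place the cursor. Note that these step over individual code points, so a base character and its combining marks would still be deleted one at a time; grouping them is a further step.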

BTW: Trent's recommendations of other Emacsen are somewhat inaccurate.
XEmacs will silently and severely corrupt UTF-8 text containing any
characters not present in the old mule charsets when loading and
resaving a file, even if no changes are made. GNU Emacs 22 is somewhat
usable but still gets column widths wrong for CJK wide and nonspacing
characters.