> From: Philippe Verdy <verd...@wanadoo.fr>
> Date: Sun, 9 Sep 2018 19:35:47 +0200
> Cc: Richard Wordingham <richard.wording...@ntlworld.com>, 
>       unicode Unicode Discussion <unicode@unicode.org>
> 
>  In Emacs, buffer text is a character string with a gap, actually.
> 
> A text buffer with gaps is a complex structure, not just a plain string.

The difference is very small, and a couple of macros allow you to
almost forget about the gap.
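
To illustrate how little the gap intrudes, here is a minimal sketch in Python (not Emacs's actual C macros; `char_at`, `store`, `gap_start` and `gap_end` are hypothetical names). One index translation is enough to make the gapped storage look like a plain string:

```python
# Hypothetical layout: text before the gap, the unused gap, text after the gap.
def char_at(store, gap_start, gap_end, pos):
    """Return the character at logical position pos, skipping over the gap."""
    if pos < gap_start:
        return store[pos]
    # Positions at or after the gap are shifted by the gap's size.
    return store[pos + (gap_end - gap_start)]

# "Hello world" stored with a 4-slot gap after "Hello".
store = list("Hello") + [None] * 4 + list(" world")
gap_start, gap_end = 5, 9
logical_length = len(store) - (gap_end - gap_start)
text = "".join(char_at(store, gap_start, gap_end, i) for i in range(logical_length))
```

Callers that go through such an accessor never need to know where the gap currently is.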

> I doubt it constantly uses a single gap at the end (insertions and
> deletions in the middle would constantly move large blocks and use
> excessive CPU and memory bandwidth, with very slow response: users do
> not want to see what they type appearing on the screen at one
> keystroke every few seconds because each typed key causes massive
> block moves and excessive memory paging from/to disk while this move
> is being performed).

In Emacs, the gap is always where the text is inserted or deleted, be
it in the middle of text or at its end.
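
A rough sketch of that behaviour, assuming a simple list-backed store (this is illustrative Python, not Emacs's implementation; the class and method names are hypothetical). The gap is moved to the edit position once, and subsequent typing at the same place fills the gap without moving any text:

```python
class GapBuffer:
    """Toy gap buffer: one contiguous store with a movable gap."""

    def __init__(self, text="", gap_size=16):
        self.store = list(text) + [None] * gap_size
        self.gap_start = len(text)      # first unused slot
        self.gap_end = len(self.store)  # first used slot after the gap

    def move_gap(self, pos):
        # Shift characters across the gap until it starts at pos.
        while self.gap_start > pos:
            self.gap_start -= 1
            self.gap_end -= 1
            self.store[self.gap_end] = self.store[self.gap_start]
        while self.gap_start < pos:
            self.store[self.gap_start] = self.store[self.gap_end]
            self.gap_start += 1
            self.gap_end += 1

    def _grow(self, extra=16):
        # Open up fresh unused slots at the gap when it has been filled.
        self.store[self.gap_end:self.gap_end] = [None] * extra
        self.gap_end += extra

    def insert(self, pos, s):
        self.move_gap(pos)
        for ch in s:
            if self.gap_start == self.gap_end:
                self._grow()
            self.store[self.gap_start] = ch
            self.gap_start += 1

    def delete(self, pos, count):
        # Deleting forward just widens the gap over the removed text.
        self.move_gap(pos)
        self.gap_end = min(self.gap_end + count, len(self.store))

    def text(self):
        return "".join(self.store[:self.gap_start] + self.store[self.gap_end:])


buf = GapBuffer("Hello world", gap_size=2)
buf.insert(5, ",")      # gap moves from the end to position 5
buf.insert(6, " big")   # gap is already at 6: no text is moved
after_insert = buf.text()
buf.delete(0, 7)        # gap moves to 0 and swallows "Hello, "
after_delete = buf.text()
```

The cost of a block move is paid only when the edit location changes, which is why typing in the middle of the text does not cause a move on every keystroke.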

> All editors I have seen treat the text as ordered collections of
> small buffers (these small buffers may still have small gaps), which
> are occasionally merged or split when needed (merging does not cause
> any reallocation but may free one of the buffers), some of them being
> paged out to temporary files when memory is stressed. There are some
> heuristics in the editor's code to decide when maintenance of the
> collection is really needed and useful for the performance.

My point was to say that Emacs is not one of these editors you
describe.

> But besides this, the performance cost of UTF indexing of the
> codepoints is invisible: each buffer will only need to avoid breaking
> text anywhere but at codepoint boundaries, if the current encoding of
> the edited text is a UTF. An editor may also avoid breaking buffers
> in the middle of clusters if it renders clusters (including ligatures
> if they are supported): clusters are still small in size in every
> encoding, and reasonable buffer sizes can hold at least hundreds of
> clusters (even the largest ones, which occur rarely). How editors
> manage clusters to make them editable is dependent on the
> implementation, but even the UTF or codepoint boundaries are not
> enough to handle that. In all cases the logical text buffer is
> structured with a complex backing store, where parts may be paged out
> (and will also include more than just the current text, notably it
> will include parts of the indexes, possibly in another temporary
> working file).

You ignore or disregard the need to represent raw bytes in editor
buffers.  That is when the encoding stops being "invisible".
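
As an illustration of the problem (not of how Emacs solves it), here is a small Python sketch. It uses Python's own `surrogateescape` error handler, which is just one possible way to keep undecodable bytes alive in a text string; the sample bytes are made up:

```python
# Valid UTF-8 ("café") followed by two bytes that are not valid UTF-8.
raw = b"caf\xc3\xa9 \xff\xfe end"

# A strict decode cannot represent the stray bytes at all.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape maps each undecodable byte 0xNN to the lone
# surrogate U+DCNN, so the bytes survive inside the decoded string.
text = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_trip = text.encode("utf-8", errors="surrogateescape")
```

An editor that must save a file byte-for-byte as it was read needs some such escape mechanism in its buffer representation, and code that walks the buffer has to be aware of it.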
