[scintilla] Improving performance of per-line data, separating cell buffer

Neil Hodgson Mon, 01 Jan 2007 21:57:24 -0800

  Scintilla currently uses simple expandable arrays to store per-line
data. Per-line data includes line start positions, folding level,
markers and lexer line state. When a line is inserted or deleted, each
line after that is moved. Whenever any text is inserted or removed,
each line after that has to have the text length added or subtracted
from its start position. This is reasonably fast when interactively
editing source code of normal length but can be slow for intensive
editing or when editing large files.


  Changing the per-line data to use split vectors, similar to the
text+style buffers already used, makes this more efficient: only the
lines between the current insertion/deletion and the previous one are
copied, and modifications are often close to the previous modification.
The markers data is now only allocated for all the lines when the
first marker is added so applications that do not use markers will use
less memory.
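
  As a rough illustration of why a split vector (gap buffer) helps,
here is a minimal sketch. It is not the Scintilla class: the name
SplitVectorSketch, the int element type and the growth policy are all
assumptions made for the example. The point is that GapTo copies only
the elements between the old and new gap positions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal split vector (gap buffer); not Scintilla's class.
class SplitVectorSketch {
    std::vector<int> body;    // elements with an unused gap somewhere inside
    std::size_t part1 = 0;    // number of elements stored before the gap
    std::size_t gapLen = 0;   // number of unused slots in the gap

    // Move the gap to start at position. Only the elements between the
    // old and new gap positions are copied.
    void GapTo(std::size_t position) {
        if (position < part1) {
            // Shift [position, part1) up over the gap, highest first.
            for (std::size_t i = part1; i > position; --i)
                body[i - 1 + gapLen] = body[i - 1];
        } else {
            // Shift elements after the gap down into it, lowest first.
            for (std::size_t i = part1; i < position; ++i)
                body[i] = body[i + gapLen];
        }
        part1 = position;
    }

    // Ensure the gap has at least n free slots (growth policy is arbitrary).
    void RoomFor(std::size_t n) {
        if (gapLen >= n)
            return;
        GapTo(Length());                       // put the gap at the end
        const std::size_t grow = n + Length() + 8;
        body.resize(body.size() + grow);
        gapLen += grow;
    }

public:
    std::size_t Length() const { return body.size() - gapLen; }

    int At(std::size_t i) const {
        return i < part1 ? body[i] : body[i + gapLen];
    }

    void Insert(std::size_t position, int value) {
        assert(position <= Length());
        RoomFor(1);
        GapTo(position);
        body[part1] = value;   // fill the first slot of the gap
        ++part1;
        --gapLen;
    }

    void Delete(std::size_t position) {
        assert(position < Length());
        GapTo(position);
        ++gapLen;              // the element just after the gap joins it
    }
};
```

  With this layout, a run of insertions at or near the same line moves
few or no elements, while a simple expandable array moves every
following element on each insertion.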

  To minimize the cost of maintaining the line start positions when
inserting and removing text, a step is included so that all line
starts after the step line have the step value added. Thus if the step
is on line 10 and a character is added to line 10 then the step value
is incremented. If a character is added to line 20 the step is moved
there (by adding the step value to intervening lines) before
incrementing the step value. This data structure has been part of
SinkWorld for a couple of years, so it has received some testing.
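
  The step idea can be sketched roughly as follows. This is an
illustration only: LineStartsSketch and its members are hypothetical
names, a plain std::vector stands in for the split vector, and the
handling of an edit before the step line (subtracting the step value
from the intervening lines) is my assumption since only the forward
case is described above.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical line-start table with a pending "step"; names are illustrative.
class LineStartsSketch {
    std::vector<int> starts;    // stored start position of each line
    std::size_t stepLine = 0;   // lines after stepLine have stepValue pending
    int stepValue = 0;          // amount still to be added to those lines

    // Move the step to line, fixing up only the intervening lines.
    void MoveStepTo(std::size_t line) {
        if (line > stepLine) {
            // Forward: fold the step into lines (stepLine, line].
            for (std::size_t i = stepLine + 1; i <= line; ++i)
                starts[i] += stepValue;
        } else {
            // Backward (assumed): lines (line, stepLine] become "after the
            // step", so remove the step from their stored values.
            for (std::size_t i = line + 1; i <= stepLine; ++i)
                starts[i] -= stepValue;
        }
        stepLine = line;
    }

public:
    explicit LineStartsSketch(std::vector<int> initial)
        : starts(std::move(initial)) {}

    // True start position: stored value plus the step if after the step line.
    int StartOf(std::size_t line) const {
        return starts[line] + (line > stepLine ? stepValue : 0);
    }

    // delta bytes were inserted (or removed, if negative) within line.
    void InsertText(std::size_t line, int delta) {
        assert(line < starts.size());
        if (stepValue != 0)
            MoveStepTo(line);
        else
            stepLine = line;    // nothing pending, so the step can just jump
        stepValue += delta;
    }
};
```

  For example, starting from line starts {0, 10, 20, 30, 40}, adding
one byte and then two bytes to line 1 touches no stored values and
just leaves a step of 3 after line 1. Adding one byte to line 3 then
updates only lines 2 and 3 before the step grows to 4, so line 4
reports a start of 44.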

  The lexer line state is left as a simple expandable vector since it
is appended to in order during each lex and there are no insertions or
deletions.

  There was a performance problem caused by folding when inserting a
large piece of text onto a blank line. The level of the blank line
(including its whitespace flag) was copied onto each newly inserted
line. Each new line was therefore considered subordinate (whitespace
lines are always subordinate), which caused large blocks to be
processed by the folding code. This was exacerbated by
ContractionState::SetVisible invalidating the whole ContractionState
even if the lines being made visible were already visible.
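
  The kind of check that avoids the unnecessary invalidation can be
sketched like this. VisibilitySketch is a hypothetical stand-in and
not the real ContractionState; the cached display line count and the
invalidations counter exist only to show the effect.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a contraction state; not Scintilla's class.
class VisibilitySketch {
    std::vector<bool> visible;     // one flag per document line
    long cachedDisplayLines = -1;  // -1 means the cache is invalid

public:
    int invalidations = 0;         // counts cache invalidations (demo only)

    explicit VisibilitySketch(std::size_t lines) : visible(lines, true) {}

    // Show or hide lines first..last inclusive. The cache is invalidated
    // only when at least one line really changes state.
    bool SetVisible(std::size_t first, std::size_t last, bool show) {
        bool changed = false;
        for (std::size_t line = first;
             line <= last && line < visible.size(); ++line) {
            if (visible[line] != show) {
                visible[line] = show;
                changed = true;
            }
        }
        if (changed) {
            cachedDisplayLines = -1;
            ++invalidations;
        }
        return changed;
    }

    // Number of visible lines, recomputed only after an invalidation.
    long DisplayLines() {
        if (cachedDisplayLines < 0) {
            cachedDisplayLines = 0;
            for (bool v : visible)
                if (v)
                    ++cachedDisplayLines;
        }
        return cachedDisplayLines;
    }
};
```

  Making already visible lines visible again then returns false and
leaves the cached data alone, which matters when the folding code
calls it for every line of a large inserted block.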

  The character bytes and style bytes are now separated into two
objects (substance and style) inside CellBuffer. I won't be
implementing different sized characters or styling information, but
this change should make it easier for others who want to make these
changes.
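
  A very small sketch of the separation, assuming std::string storage
for brevity (the real buffers are split vectors) and hypothetical
method names. Only the member names substance and style come from
the description above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch of a cell buffer with separate text and style storage.
class CellBufferSketch {
    std::string substance;   // character bytes
    std::string style;       // one style byte per character byte

public:
    std::size_t Length() const { return substance.size(); }

    // Insert text; the matching style bytes start out as 0.
    void InsertString(std::size_t position, const std::string &text) {
        substance.insert(position, text);
        style.insert(position, text.size(), '\0');
    }

    void SetStyleAt(std::size_t position, char styleValue) {
        style[position] = styleValue;
    }

    char CharAt(std::size_t position) const { return substance[position]; }
    char StyleAt(std::size_t position) const { return style[position]; }
};
```

  Because the two arrays are only kept in step by InsertString, either
one could later be swapped for a type with wider elements without
touching the other.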

  These modifications have changed very fundamental parts of
Scintilla and are likely to have caused new bugs and to have changed
performance so it would be good to see them tested and any bugs
reported.

  Available from CVS and from
http://scintilla.sourceforge.net/scite.zip Source
http://scintilla.sourceforge.net/wscite.zip Windows executable

  Neil
