On Sun, 13 Oct 2013 04:22:02 +0100, Baz Walter <[email protected]> wrote:
> On 12/10/13 23:29, Baz Walter wrote:
>> On 12/10/13 11:52, Phil Thompson wrote:
>>> So I need to call SCI_SETWORDCHARS when a lexer is set using the value
>>> returned by the lexer's wordCharacters() method.
>>>
>>> Is this likely to cause any unforeseen problems?
>>
>> As usual with Scintilla, the main source of potential problems is
>> single-byte vs multi-byte encodings. For latin-1, any byte in the range
>> 0-255 can be set as a word character. But for utf-8, only the ascii
>> range is relevant - all unicode characters above 127 are always treated
>> as word characters, regardless of what has been set using
>> SCI_SETWORDCHARS.
>>
>> However, Scintilla's default set of word characters (i.e. those set via
>> SCI_SETCHARSDEFAULT) includes the standard alphanumerics and underscore,
>> *plus* all the characters in the range 128-255 (regardless of the
>> code-page setting).
>>
>> So, assuming the current lexer wordCharacters functions only ever return
>> ascii, there is some potential for changes in behaviour if QScintilla is
>> being used in *latin-1* mode (utf-8 mode should be unaffected).
>>
>> The only other potential issue I can think of at the moment, is that
>> setting the word characters automatically resets the whitespace and
>> punctuation characters to their default values.
>>
>
> One area that I didn't consider was auto-completion. I concocted my own
> implementation of this a long time ago, and so I haven't used
> QScintilla's version of it much.
>
> After having a look at the source, I'm wondering whether things may be
> more complicated than I thought.
>
> It seems the lexer wordCharacters method *must* return ascii, because
> auto-completion only ever looks at *single bytes*. Things could break in
> utf-8 mode if wordCharacters included some random non-ascii bytes and a
> multi-byte character was encountered. (For example, if the lead byte of
> a multi-byte sequence was included in the word characters, but not its
> continuation bytes, it might result in an attempt to insert text at an
> invalid position).
>
> On top of that, auto-completion also uses Scintilla's search apis to
> find the start of words (which in turn depends on Scintilla's definition
> of word characters). What happens if the lexer's definition of word
> characters conflicts with Scintilla's? Possibly there are some
> edge-cases where this might matter, but I confess I'm not sure.

...in other words a can of worms. I won't change it then.

Maybe QScintilla should be a fork of Scintilla rather than a port.

Thanks for looking into this,
Phil
_______________________________________________
QScintilla mailing list
[email protected]
http://www.riverbankcomputing.com/mailman/listinfo/qscintilla
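
For anyone reading this thread later, the lead-byte problem Baz describes can be reproduced without Scintilla at all. The following is only a rough sketch in plain Python, not QScintilla's actual auto-completion code: `word_end`, `is_char_boundary` and `ASCII_WORD` are hypothetical names made up for the illustration, and the byte-at-a-time scan is an assumption about how a single-byte word check would behave on a utf-8 buffer.

```python
# Hypothetical model of a word scan that tests one byte at a time.
# None of these names come from Scintilla or QScintilla.

ASCII_WORD = frozenset(
    b"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_"
)


def word_end(buf: bytes, pos: int, word_bytes: frozenset) -> int:
    """Advance from pos while each single byte is in word_bytes."""
    while pos < len(buf) and buf[pos] in word_bytes:
        pos += 1
    return pos


def is_char_boundary(buf: bytes, pos: int) -> bool:
    """True unless pos points at a utf-8 continuation byte (10xxxxxx)."""
    if pos == 0 or pos == len(buf):
        return True
    return (buf[pos] & 0xC0) != 0x80


# "é" is encoded as the two bytes 0xC3 0xA9 in utf-8.
text = "café_x".encode("utf-8")

# ascii-only word characters: the scan stops in front of the whole "é".
safe_pos = word_end(text, 0, ASCII_WORD)

# Lead byte 0xC3 added, continuation byte 0xA9 not added:
# the scan stops between the two bytes of "é".
with_lead = ASCII_WORD | {0xC3}
bad_pos = word_end(text, 0, with_lead)

print(safe_pos, is_char_boundary(text, safe_pos))
print(bad_pos, is_char_boundary(text, bad_pos))
```

In this model the ascii-only set always stops on a character boundary, whereas the set containing just the lead byte stops inside the two-byte sequence, which is the kind of position where inserting text would corrupt the buffer.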
