Re: LSP: cursor positioning on a multi-byte character with composing characters

Bram Moolenaar Sat, 10 Jun 2023 06:26:39 -0700


Yegappan wrote:


> > > I am updating the Vim9 LSP plugin to support various position
> > > encodings (utf-8, utf-16 and utf-32).
> > > I ran into a problem with positioning the cursor on a multibyte
> > > character with composing characters.
> > >
> > > The LSP plugin uses the Vim function setcursorcharpos() to position
> > > the cursor.  This function ignores composing characters.  The LSP
> > > server counts the composing characters separately from the base
> > > character.  So when using the character index returned by the LSP
> > > server to position the cursor, the cursor is placed in an incorrect
> > > column.
> > >
> > > e.g.:
> > >
> > > void fn(int aVar)
> > > {
> > >     printf("aVar = %d\n", aVar);
> > >     printf("😊😊😊😊 = %d\n", aVar);
> > >     printf("áb́áb́ = %d\n", aVar);
> > >     printf("ą́ą́ą́ą́ = %d\n", aVar);
> > > }
> > >
> > > I have tried this test with the clangd, pyright and gopls language
> > > servers and all of them count the composing characters as separate
> > > characters.
> > >
> > > One approach to solve this issue is to add an optional argument to the
> > > setcursorcharpos() function that either counts or ignores composing
> > > characters.  The default is to ignore the composing characters.
> > > Another approach is to add a function that computes the character
> > > offset ignoring the composing characters from a character offset that
> > > includes the composing characters.
> > >
> > > Any suggestions?
> >
> > Whether to count composing characters separately or not applies to many
> > functions.  Adding a flag to each function to specify how composing
> > characters are to be handled is going to require a lot of changes.  And
> > even for setcursorcharpos() I don't see a good way to add this flag.
> >
> > Assuming we have the text, using a separate function to ignore composing
> > characters would be a separate step and a universal solution.  I suppose
> > it could be something like:
> >
> >         idx_without = charpos_dropcomposing({text}, {idx_with})
> >
> > It may not be needed now, but the opposite should be possible:
> >
> >         idx_with = charpos_addcomposing({text}, {idx_without})
> >
> > Hopefully we can think of better (shorter) names.
> >
> 
> I have created PR https://github.com/vim/vim/pull/12513 to add these
> two new functions.  Should we merge these two functions into a single
> function with an argument to specify whether to count or not count
> combining characters?

Thanks for working on this.  My main concern at first is that the user
will be confused by seeing three functions:

        charidx({string}, {idx} [, {countcc} [, {utf16}]])
        charidx_addcc({string}, {idx})
        charidx_dropcc({string}, {idx})

Only when reading the details can we find out that the {idx} of
charidx() is a byte index, while for the other two it is a character index.
Changing the argument name to {byteidx} would help.  We may have to do
that for other functions as well, to keep consistency.

Having the {countcc} argument for charidx() and a separate function name
for the other two is confusing.  Also, "addcc" and "dropcc" can be seen
as an alternative for {countcc} (and that's not really incorrect), but
there is no hint that the {idx} argument is used differently.

Alternatively there would be a function that does have the {countcc}
argument and the name indicating that {idx} is a character index:

        charidx_XXX({string}, {idx}, {countcc})

However, is this {countcc} argument really doing the same thing?  The
help for charidx() says:

        When {countcc} is omitted or |FALSE|, then composing characters
        are not counted separately, their byte length is added to the
        preceding base character.
        When {countcc} is |TRUE|, then composing characters are
        counted as separate characters.
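
To make the {countcc} behavior described in that help text concrete,
here is a rough Python sketch.  It is not Vim's implementation: the
name char_index() is made up for illustration, and
unicodedata.combining() is only assumed to be a close enough stand-in
for Vim's own idea of a composing character.

```python
import unicodedata


def char_index(text, byte_idx, count_cc=False):
    """Map a UTF-8 byte index to a character index (hypothetical helper)."""
    if byte_idx < 0:
        return -1
    offset = 0  # byte offset where the current code point starts
    count = 0   # number of characters started so far
    for ch in text:
        # A composing character only starts a new character when
        # count_cc is set; otherwise its bytes belong to the base.
        if count_cc or count == 0 or unicodedata.combining(ch) == 0:
            count += 1
        width = len(ch.encode("utf-8"))
        if byte_idx < offset + width:
            return count - 1
        offset += width
    # Exactly at the end: return the length in characters.
    return count if byte_idx == offset else -1


text = "a\u0301b\u0301c\u0301"          # three letters, each with an acute
print(char_index(text, 3))              # 1
print(char_index(text, 6, True))        # 4
print(char_index(text, 16))             # -1
```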

We can't use exactly the same wording for charidx_XXX(), since the index
is not in bytes.  And when using a character index, we would have to
mention whether composing characters are counted separately.  This gets
confusing: an argument {countcc} which actually means something else,
depending on whether you look at the input or the result.

It's probably better to use two separate functions.  I hope we find
better names though.
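
As a rough illustration of what two separate functions would compute,
here is a Python sketch.  It is not the code from the PR: the names
drop_composing() and add_composing() are hypothetical, and
unicodedata.combining() is assumed to approximate Vim's test for a
composing character.  Both the argument and the result are character
indexes, never byte indexes.

```python
import unicodedata


def is_base(text, i):
    # The first code point always starts a character, even when it is a
    # combining mark with nothing to attach to.
    return i == 0 or unicodedata.combining(text[i]) == 0


def drop_composing(text, idx_with):
    """Index counting composing chars -> index counting base chars only."""
    if not 0 <= idx_with < len(text):
        return -1
    return sum(is_base(text, i) for i in range(idx_with + 1)) - 1


def add_composing(text, idx_without):
    """Index counting base chars only -> index counting composing chars."""
    if idx_without < 0:
        return -1
    seen = -1
    for i in range(len(text)):
        if is_base(text, i):
            seen += 1
            if seen == idx_without:
                return i
    return -1


text = "a\u0301b\u0301"                 # two letters, each with an acute
print(drop_composing(text, 2))          # 1
print(add_composing(text, 1))           # 2
```

An index that points at a composing character maps to its base
character, so drop_composing(text, 3) also gives 1 here.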

The help for the new functions should be extra clear, since it's easy to
misunderstand.  We can discuss that on the PR.

-- 
Drink wet cement and get really stoned.

 /// Bram Moolenaar -- [email protected] -- http://www.Moolenaar.net   \\\
///                                                                      \\\
\\\        sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ ///
 \\\            help me help AIDS victims -- http://ICCF-Holland.org    ///
