Hi Carlos,

Thanks for reviewing!

On Tue, 17 Jul 2018 19:18:36 +0200
Carlos Garnacho <[email protected]> wrote:

> Hi!,
> 
> (Way way late, trying to revive the conversation...)
> 
> On Thu, May 3, 2018 at 9:22 PM, Dorota Czaplejewicz
> <[email protected]> wrote:
> > On Thu, 3 May 2018 20:47:27 +0200
> > Silvan Jegen <[email protected]> wrote:
> >  
> >> Hi Dorota
> >>
> >> Some comments and typo fixes below.
> >>
> >> On Thu, May 03, 2018 at 05:41:21PM +0200, Dorota Czaplejewicz wrote:  
> >> > This new protocol description is a simplification over v2.
> >> >
> >> > - All pre-edit text styling is gone.
> >> > - Pre-edit cursor can span characters.
> >> > - No events regarding input panel (OSK) state nor covered rectangle.
> >> >   Compositors are still free to handle situations where the keyboard
> >> >   focus rectangle is covered by the input panel.
> >> > - No set_preferred_language request for clients.
> >> > - There is no event to send keysyms. Compositors can use wl_keyboard
> >> >   interface instead.
> >> > - All state is double-buffered, with specified state.
> >> > - Use Unicode codepoints to measure strings.
> >> >
> >> > Signed-off-by: Dorota Czaplejewicz <[email protected]>
> >> > Signed-off-by: Carlos Garnacho <[email protected]>
> >> > ---
> >> > This is the next update coming from Purism to perfect the text input 
> >> > protocol.
> >> >
> >> > The following changes added on top of PATCHv3:
> >> >
> >> > - Fixed whitespaces.
> >> > - Removed enable flags - the same information can be gathered from the 
> >> > first requests after enter.
> >> > - Changed offsets inside UTF-8 strings to use Unicode character counts 
> >> > in order to remove the possibility of communicating invalid state.
> >> > - Specified the exact lifetime of double-buffered state, and initial 
> >> > values.
> >> > - Made changes requested by the IM double-buffered.
> >> >
> >> > Some questions remain open. One is: how to specify how much text to 
> >> > capture in set_surrounding_text, and how often to update?  
> 
> IMHO the only reason to state it here is that it's more likely that a
> lazy implementation will try to squeeze a full book here, than eg. an
> application setting an insanely long title. But certainly other
> messages across protocols may hit this limit (the long title issue
> wasn't made up :).
> 
> As for how much, I think it ultimately depends on the IM behind. Text
> correction probably just wants the current word, any sort of
> prediction will probably require phrases to paragraphs, char
> composition can probably do without. Sounds like this could be some
> sort of hint, but I don't think IMs can tell you today how much text
> do they want...
> 
> >> >
> >> > A possible change that I decided against for now is to replace 
> >> > enable/disable events by create/destroy of a new object, which would 
> >> > make more state lifetimes encoded in the protocol.
> >> >
> >> > After reading a blog post on fcitx [0], I got the impression that 
> >> > letting the compositor know some persistent ID of a text edit instance 
> >> > could be useful, however I'm not sure what the use cases are.
> >> >
> >> > As always, I'm happy to hear feedback.
> >> >
> >> > Cheers,
> >> > Dorota Czaplejewicz
> >> >
> >> > [0] 
> >> > https://www.csslayer.info/wordpress/fcitx-dev/gaps-between-wayland-and-fcitx-or-all-input-methods/
> >> >
> >> >  Makefile.am                                    |   1 +
> >> >  unstable/text-input/text-input-unstable-v3.xml | 362 +++++++++++++++++++++++++
> >> >  2 files changed, 363 insertions(+)
> >> >  create mode 100644 unstable/text-input/text-input-unstable-v3.xml
> >> >
> >> > diff --git a/Makefile.am b/Makefile.am
> >> > index 4b9a901..86d7ca9 100644
> >> > --- a/Makefile.am
> >> > +++ b/Makefile.am
> >> > @@ -3,6 +3,7 @@ unstable_protocols = \
> >> >     unstable/fullscreen-shell/fullscreen-shell-unstable-v1.xml \
> >> >     unstable/linux-dmabuf/linux-dmabuf-unstable-v1.xml \
> >> >     unstable/text-input/text-input-unstable-v1.xml \
> >> > +   unstable/text-input/text-input-unstable-v3.xml \
> >> >     unstable/input-method/input-method-unstable-v1.xml \
> >> >     unstable/xdg-shell/xdg-shell-unstable-v5.xml \
> >> >     unstable/xdg-shell/xdg-shell-unstable-v6.xml \
> >> > diff --git a/unstable/text-input/text-input-unstable-v3.xml b/unstable/text-input/text-input-unstable-v3.xml
> >> > new file mode 100644
> >> > index 0000000..ed5204f
> >> > --- /dev/null
> >> > +++ b/unstable/text-input/text-input-unstable-v3.xml
> >> > @@ -0,0 +1,362 @@
> >> > +<?xml version="1.0" encoding="UTF-8"?>
> >> > +
> >> > +<protocol name="text_input_unstable_v3">
> >> > +  <copyright>
> >> > +    Copyright © 2012, 2013 Intel Corporation
> >> > +    Copyright © 2015, 2016 Jan Arne Petersen
> >> > +    Copyright © 2017, 2018 Red Hat, Inc.
> >> > +    Copyright © 2018 Purism SPC
> >> > +
> >> > +    Permission to use, copy, modify, distribute, and sell this
> >> > +    software and its documentation for any purpose is hereby granted
> >> > +    without fee, provided that the above copyright notice appear in
> >> > +    all copies and that both that copyright notice and this permission
> >> > +    notice appear in supporting documentation, and that the name of
> >> > +    the copyright holders not be used in advertising or publicity
> >> > +    pertaining to distribution of the software without specific,
> >> > +    written prior permission.  The copyright holders make no
> >> > +    representations about the suitability of this software for any
> >> > +    purpose.  It is provided "as is" without express or implied
> >> > +    warranty.
> >> > +
> >> > +    THE COPYRIGHT HOLDERS DISCLAIM ALL WARRANTIES WITH REGARD TO THIS
> >> > +    SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
> >> > +    FITNESS, IN NO EVENT SHALL THE COPYRIGHT HOLDERS BE LIABLE FOR ANY
> >> > +    SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
> >> > +    WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN
> >> > +    AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
> >> > +    ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
> >> > +    THIS SOFTWARE.
> >> > +  </copyright>
> >> > +
> >> > +  <interface name="zwp_text_input_v3" version="1">
> >> > +    <description summary="text input">
> >> > +      The zwp_text_input_v3 interface represents text input and input methods
> >> > +      associated with a seat. It provides enter/leave events to follow the
> >> > +      text input focus for a seat.
> >> > +
> >> > +      Requests are used to enable/disable the text-input object and set
> >> > +      state information like surrounding and selected text or the content type.
> >> > +      The information about the entered text is sent to the text-input object
> >> > +      via the pre-edit and commit_string events.
> >> > +
> >> > +      Text is valid UTF-8 encoded, indices and lengths are in code points. If a
> >> > +      grapheme is made up of multiple code points, an index pointing to any of
> >> > +      them should be interpreted as pointing to the first one.
> >>
> >> That way we make sure we don't put the cursor/anchor between bytes that
> >> belong to the same UTF-8 encoded Unicode code point which is nice. It
> >> also means that the client has to parse all the UTF-8 encoded strings
> >> into Unicode code points up to the desired cursor/anchor position
> >> on each "preedit_string" event. For each "delete_surrounding_text" event
> >> the client has to parse the UTF-8 sequences before and after the cursor
> >> position up to the requested Unicode code point.
> >>
> >> I feel like we are processing the UTF-8 string already in the
> >> input-method. So I am not sure that we should parse it again on the
> >> client side. Parsing it again would also mean that the client would need
> >> to know about UTF-8 which would be nice to avoid.
> >>
> >> Thoughts?  
> >
> > The client needs to know about Unicode, but not necessarily about UTF-8. 
> > Specifying code points is actually an advantage here, because byte offsets 
> > are inherently expressed relative to UTF-8. By counting with code points, 
> > client's internal representation can be UTF-16 or maybe even something 
> > else.  
> 
> I personally think byte offsets are more handy than codepoints:
> pointer math is O(1) and str*() functions are "sensible" (on UTF-8 at
> least, and past the bytes!=chars gotchas), it's relatively simple to
> find out whether you are in the middle of a UTF-8 char, it seems
> simpler to deal with than the other way around if utf16/codepoints are
> used in either side; and this might even be moot as all parties are
> interested in chopping strings between word/char boundaries.
> 
> As for using UTF-8 specifically, other protocols do use it for
> exchange of strings (eg. xdg_surface.set_title). It's the perfect fit
> for glib/pango/etc, so it wouldn't be me who objects, either :).
> 
> Cheers,
>   Carlos

I think you're tipping the scales here. In the interest of moving the protocol
forward, I'm changing the offsets from code points to bytes, since I don't
think the choice makes a huge difference in practice. v5 incoming!
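
To make the trade-off concrete, here is a rough sketch of converting between
the two kinds of offsets, including the check for an offset that lands in the
middle of a UTF-8 character. It is in Python purely for illustration; the
helper names and the sample string are made up and are not part of the
protocol or of any existing implementation.

```python
def codepoints_to_bytes(text: str, cp_index: int) -> int:
    """Return the UTF-8 byte offset that corresponds to a code point index."""
    if not 0 <= cp_index <= len(text):
        raise ValueError("code point index out of range")
    # Encode only the prefix; its encoded length is the byte offset.
    return len(text[:cp_index].encode("utf-8"))


def bytes_to_codepoints(data: bytes, byte_offset: int) -> int:
    """Return the code point index for a byte offset into UTF-8 data."""
    if not 0 <= byte_offset <= len(data):
        raise ValueError("byte offset out of range")
    # UTF-8 continuation bytes always have the bit pattern 10xxxxxx, so a
    # single mask tells us whether the offset splits a character.
    if byte_offset < len(data) and (data[byte_offset] & 0xC0) == 0x80:
        raise ValueError("byte offset falls inside a multi-byte character")
    return len(data[:byte_offset].decode("utf-8"))


# Hypothetical surrounding text: "naïve café" with precomposed characters.
text = "na\u00efve caf\u00e9"
data = text.encode("utf-8")

cursor_cp = 5  # hypothetical cursor position, just after "naïve"
cursor_bytes = codepoints_to_bytes(text, cursor_cp)
print(cursor_cp, cursor_bytes, bytes_to_codepoints(data, cursor_bytes))
```

With byte offsets on the wire, a client that stores its text as UTF-8 can use
the offset directly, while a client with a different internal encoding does
one conversion like the above per offset.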

Cheers,
Dorota


_______________________________________________
wayland-devel mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/wayland-devel
