Re: [PATCHv4 wayland-protocols] text-input: Add v3 of the text-input protocol
Hi Carlos, thanks for reviewing! On Tue, 17 Jul 2018 19:18:36 +0200 Carlos Garnacho wrote: > Hi!, > > (Way way late, trying to revive the conversation...) > > On Thu, May 3, 2018 at 9:22 PM, Dorota Czaplejewicz > wrote: > > On Thu, 3 May 2018 20:47:27 +0200 > > Silvan Jegen wrote: > > > >> Hi Dorota > >> > >> Some comments and typo fixes below. > >> > >> On Thu, May 03, 2018 at 05:41:21PM +0200, Dorota Czaplejewicz wrote: > >> > This new protocol description is a simplification over v2. > >> > > >> > - All pre-edit text styling is gone. > >> > - Pre-edit cursor can span characters. > >> > - No events regarding input panel (OSK) state nor covered rectangle. > >> > Compositors are still free to handle situations where the keyboard > >> > focus rectangle is covered by the input panel. > >> > - No set_preferred_language request for clients. > >> > - There is no event to send keysyms. Compositors can use wl_keyboard > >> > interface instead. > >> > - All state is double-buffered, with specified state. > >> > - Use Unicode codepoints to measure strings. > >> > > >> > Signed-off-by: Dorota Czaplejewicz > >> > Signed-off-by: Carlos Garnacho > >> > --- > >> > This is the next update coming from Purism to perfect the text input > >> > protocol. > >> > > >> > The following changes added on top of PATCHv3: > >> > > >> > - Fixed whitespaces. > >> > - Removed enable flags - the same information can be gathered from the > >> > first requests after enter. > >> > - Changed offsets inside UTF-8 strings to use Unicode character counts > >> > in order to remove the possibility of communicating invalid state. > >> > - Specified the exact lifetime of double-buffered state, and initial > >> > values. > >> > - Made changes requested by the IM double-buffered. > >> > > >> > Some questions remain open. One is: how to specify how much text to > >> > capture in set_surrounding_text, and how often to update? 
> > IMHO the only reason to state it here is that it's more likely that a > lazy implementation will try to squeeze a full book here, than eg. an > application setting an insanely long title. But certainly other > messages across protocols may hit this limit (the long title issue > wasn't made up :). > > As for how much, I think it ultimately depends on the IM behind. Text > correction probably just wants the current word, any sort of > prediction will probably require phrases to paragraphs, char > composition can probably do without. Sounds like this could be some > sort of hint, but I don't think IMs can tell you today how much text > do they want... > > >> > > >> > A possible change that I decided against for now is to replace > >> > enable/disable events by create/destroy of a new object, which would > >> > make more state lifetimes encoded in the protocol. > >> > > >> > After reading a blog post on fcitx [0], I got the impression that > >> > letting the compositor know some persistent ID of a text edit instance > >> > could be useful, however I'm not sure what the use cases are. > >> > > >> > As always, I'm happy to hear feedback. 
> >> > > >> > Cheers, > >> > Dorota Czaplejewicz > >> > > >> > [0] > >> > https://www.csslayer.info/wordpress/fcitx-dev/gaps-between-wayland-and-fcitx-or-all-input-methods/ > >> > > >> > Makefile.am| 1 + > >> > unstable/text-input/text-input-unstable-v3.xml | 362 > >> > + > >> > 2 files changed, 363 insertions(+) > >> > create mode 100644 unstable/text-input/text-input-unstable-v3.xml > >> > > >> > diff --git a/Makefile.am b/Makefile.am > >> > index 4b9a901..86d7ca9 100644 > >> > --- a/Makefile.am > >> > +++ b/Makefile.am > >> > @@ -3,6 +3,7 @@ unstable_protocols = > >> >\ > >> > unstable/fullscreen-shell/fullscreen-shell-unstable-v1.xml > >> >\ > >> > unstable/linux-dmabuf/linux-dmabuf-unstable-v1.xml > >> >\ > >> > unstable/text-input/text-input-unstable-v1.xml > >> >\ > >> > + unstable/text-input/text-input-unstable-v3.xml > >> >\ > >> > unstable/input-method/input-method-unstable-v1.xml > >> >\ > >> > unstable/xdg-shell/xdg-shell-unstable-v5.xml > >> >\ > >> > unstable/xdg-shell/xdg-shell-unstable-v6.xml > >> >\ > >> > diff --git a/unstable/text-input/text-input-unstable-v3.xml > >> > b/unstable/text-input/text-input-unstable-v3.xml > >> > new file mode 100644 > >> > index 000..ed5204f > >> > --- /dev/null > >> > +++ b/unstable/text-input/text-input-unstable-v3.xml > >> > @@ -0,0 +1,362 @@ > >> > + > >> > + > >> > + > >> > + > >> > +Copyright © 2012, 2013 Intel Corporation > >> > +Copyright © 2015, 2016 Jan Arne Petersen > >> > +Copyright © 2017, 2018 Red Hat, Inc. > >> > +Copyright © 2018 Purism SPC > >> > + >
Re: [PATCHv4 wayland-protocols] text-input: Add v3 of the text-input protocol
Hi!, (Way way late, trying to revive the conversation...) On Thu, May 3, 2018 at 9:22 PM, Dorota Czaplejewicz wrote: > On Thu, 3 May 2018 20:47:27 +0200 > Silvan Jegen wrote: > >> Hi Dorota >> >> Some comments and typo fixes below. >> >> On Thu, May 03, 2018 at 05:41:21PM +0200, Dorota Czaplejewicz wrote: >> > This new protocol description is a simplification over v2. >> > >> > - All pre-edit text styling is gone. >> > - Pre-edit cursor can span characters. >> > - No events regarding input panel (OSK) state nor covered rectangle. >> > Compositors are still free to handle situations where the keyboard >> > focus rectangle is covered by the input panel. >> > - No set_preferred_language request for clients. >> > - There is no event to send keysyms. Compositors can use wl_keyboard >> > interface instead. >> > - All state is double-buffered, with specified state. >> > - Use Unicode codepoints to measure strings. >> > >> > Signed-off-by: Dorota Czaplejewicz >> > Signed-off-by: Carlos Garnacho >> > --- >> > This is the next update coming from Purism to perfect the text input >> > protocol. >> > >> > The following changes added on top of PATCHv3: >> > >> > - Fixed whitespaces. >> > - Removed enable flags - the same information can be gathered from the >> > first requests after enter. >> > - Changed offsets inside UTF-8 strings to use Unicode character counts in >> > order to remove the possibility of communicating invalid state. >> > - Specified the exact lifetime of double-buffered state, and initial >> > values. >> > - Made changes requested by the IM double-buffered. >> > >> > Some questions remain open. One is: how to specify how much text to >> > capture in set_surrounding_text, and how often to update? IMHO the only reason to state it here is that it's more likely that a lazy implementation will try to squeeze a full book here, than eg. an application setting an insanely long title. 
But certainly other messages across protocols may hit this limit (the long title issue wasn't made up :). As for how much, I think it ultimately depends on the IM behind. Text correction probably just wants the current word, any sort of prediction will probably require phrases to paragraphs, char composition can probably do without. Sounds like this could be some sort of hint, but I don't think IMs can tell you today how much text do they want... >> > >> > A possible change that I decided against for now is to replace >> > enable/disable events by create/destroy of a new object, which would make >> > more state lifetimes encoded in the protocol. >> > >> > After reading a blog post on fcitx [0], I got the impression that letting >> > the compositor know some persistent ID of a text edit instance could be >> > useful, however I'm not sure what the use cases are. >> > >> > As always, I'm happy to hear feedback. >> > >> > Cheers, >> > Dorota Czaplejewicz >> > >> > [0] >> > https://www.csslayer.info/wordpress/fcitx-dev/gaps-between-wayland-and-fcitx-or-all-input-methods/ >> > >> > Makefile.am| 1 + >> > unstable/text-input/text-input-unstable-v3.xml | 362 >> > + >> > 2 files changed, 363 insertions(+) >> > create mode 100644 unstable/text-input/text-input-unstable-v3.xml >> > >> > diff --git a/Makefile.am b/Makefile.am >> > index 4b9a901..86d7ca9 100644 >> > --- a/Makefile.am >> > +++ b/Makefile.am >> > @@ -3,6 +3,7 @@ unstable_protocols = >> > \ >> > unstable/fullscreen-shell/fullscreen-shell-unstable-v1.xml >> > \ >> > unstable/linux-dmabuf/linux-dmabuf-unstable-v1.xml >> > \ >> > unstable/text-input/text-input-unstable-v1.xml >> > \ >> > + unstable/text-input/text-input-unstable-v3.xml >> > \ >> > unstable/input-method/input-method-unstable-v1.xml >> > \ >> > unstable/xdg-shell/xdg-shell-unstable-v5.xml >> > \ >> > unstable/xdg-shell/xdg-shell-unstable-v6.xml >> > \ >> > diff --git a/unstable/text-input/text-input-unstable-v3.xml >> > 
b/unstable/text-input/text-input-unstable-v3.xml >> > new file mode 100644 >> > index 000..ed5204f >> > --- /dev/null >> > +++ b/unstable/text-input/text-input-unstable-v3.xml >> > @@ -0,0 +1,362 @@ >> > + >> > + >> > + >> > + >> > +Copyright © 2012, 2013 Intel Corporation >> > +Copyright © 2015, 2016 Jan Arne Petersen >> > +Copyright © 2017, 2018 Red Hat, Inc. >> > +Copyright © 2018 Purism SPC >> > + >> > +Permission to use, copy, modify, distribute, and sell this >> > +software and its documentation for any purpose is hereby granted >> > +without fee, provided that the above copyright notice appear in >> > +all copies and that both that copyright notice and this permission >> > +notice appear in supporting
Re: [PATCHv4 wayland-protocols] text-input: Add v3 of the text-input protocol
On Thu, 17 May 2018 18:05:34 +0100
Daniel Stone wrote:

> Hi Dorota,
>
> On 3 May 2018 at 16:41, Dorota Czaplejewicz wrote:
> > - There is no event to send keysyms. Compositors can use wl_keyboard
> >   interface instead.
>
> The reason we explicitly chose to have a keysym (really, 'Unicode
> codepoint') event, is to support characters which don't appear in any
> keymap. As a trivial example, emoji keyboards will want to send
> symbols which appear in no sane keymap. Similarly, CJK input methods
> may offer streams of characters pre-composed from component runs; it
> is not practical to insert the entire CJK unicode space into a keymap.
>
> Cheers,
> Daniel

Hi Daniel,

I think that anyone wanting to support inserting arbitrary Unicode
characters should use the text composition requests instead
(commit_string and friends). Input methods, especially CJK ones, will
make use of that functionality anyway. If removing keysyms makes
something impossible, I would rather fix the text composition portion
of the protocol.

Cheers,
Dorota

___
wayland-devel mailing list
wayland-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/wayland-devel
Re: [PATCHv4 wayland-protocols] text-input: Add v3 of the text-input protocol
On Thu, May 10, 2018 at 11:46:32AM +0200, Dorota Czaplejewicz wrote:
> On Thu, 10 May 2018 11:43:12 +0200
> Dorota Czaplejewicz wrote:
>
> > On Tue, 08 May 2018 07:07:24 +
> > Silvan Jegen wrote:
> >
> > > On Mon, May 7, 2018 at 5:11 AM Joshua Watt wrote:
> > > > IMHO, if you are doing UTF-8 (which you should), you should *always*
> > > > specify any offset in the string as a byte offset. I have a few
> > > > reasons for this justification:
> > >
> > > I agree with this as well. I thought some more about how to spell out my
> > > gut feeling on this matter in more technical terms.
> > >
> > > UTF-8 is a byte (sequence) representation of Unicode code points. This
> > > indicates to me that an offset within an UTF-8-encoded string should also
> > > be given in bytes. Specifying the offset in Unicode points mixes the
> > > abstraction of the Unicode code point with (one of) its representations as
> > > a byte sequence. This is reflected in the fact that an offset in Unicode
> > > code points is not applicable to the UTF-8 string without first processing
> > > the string.
> > >
> > > Unicode code points do not give us that much either since what we most
> > > likely want are grapheme clusters anyway (which, like any more advanced
> > > Unicode processing, should be handled by a specialised library):
> > > http://utf8everywhere.org/#myth.strlen
> > >
> > >
> > > Cheers,
> > >
> > > Silvan
> >
> > This message made me feel obliged to turn my own gut feeling into
> > words. This is not to be construed as an argument, but more of an
> > explanation.
> >
> > I view wayland protocols as rather high level: their responsibility
> > is to specify the type and the purpose of the data they are
> > transporting. In this case, the data is a Unicode string, and the
> > purpose is display. Or, the data is a number and the purpose is
> > indexing.
> >
> > I think that when a protocol starts to specify the type and purpose,
> > it can no longer be thought as high level. In this view, indexing a
> > Unicode string in terms of bytes would be akin to indexing any other
> > vector of Foo in bytes. (I didn't actually check if there is any
> > other vector, or bytes type available in wayland).
> >
> > As you noted, there is some mixing between abstraction levels in
> > the protocol. Hardcoding that it's not *just* Unicode, but also the
> > particular encoding (UTF-8) eliminates problems with byte indexing
> > we would have encountered if we decided to use things like Punycode
> > (München => Mnchen-3ya). Knowing that it's always UTF-8 allows the
> > protocol to use a tailoring indexing scheme. While I consider this a
> > layer-breaking hack, nevertheless, this property partially counters
> > the above reasoning.
> >
> > * * *
> >
> > To be honest, neither Unicode code points nor graphemes nor clusters
> > are what we're truly looking for here. To understand what I mean, I
> > recommend to play with this grapheme cluster:
> >
> > नमस्ते
> >
> > According to the Rust book [0], it's composed of 6 code points:
> > ['न', 'म', 'स', '्', 'त', 'े'], but moving the cursor
> > around, I would be led to believe it's 4 "pieces" long only.
> >
> > Cheers,
> > Dorota
> >
> > [0] https://doc.rust-lang.org/book/second-edition/ch08-02-strings.html
>
> On a second thought, perhaps graphemes are actually the relevant thing here...

Yes, that's also mentioned in the rust book:

https://doc.rust-lang.org/book/second-edition/ch08-02-strings.html#bytes-and-scalar-values-and-grapheme-clusters-oh-my

and what I mentioned in my mail.

I agree with what is mentioned in

http://utf8everywhere.org/#myth.strlen

which is that Unicode code points are almost never what people making
use of the protocol would want:

"Yet, the number of code points in it is irrelevant to almost any
software engineering task, with perhaps the only exception of
converting the string to UTF-32"

So instead just specifying a byte offset (thus not mixing layers of
abstraction) and leaving more specialized Unicode handling (if desired
by the client) to specialized libraries seems like the best way to go.

Cheers,

Silvan
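One concern raised earlier in the thread about byte offsets is that they can communicate invalid state, i.e. an offset that lands inside a multi-byte UTF-8 sequence. As a rough sketch (the function name `is_code_point_boundary` is made up for this illustration and is not part of any protocol or library), a receiver could reject such offsets by inspecting a single byte, without decoding the whole string:

```python
def is_code_point_boundary(data: bytes, offset: int) -> bool:
    """Return True if `offset` does not split a UTF-8 encoded code point.

    Hypothetical helper for illustration; assumes `data` is valid UTF-8.
    """
    if offset < 0 or offset > len(data):
        return False
    if offset == len(data):
        # The position just past the last byte is always a valid boundary.
        return True
    # Continuation bytes have the bit pattern 0b10xxxxxx. An offset is a
    # boundary exactly when the byte it points at is not a continuation byte.
    return (data[offset] & 0xC0) != 0x80


encoded = "æþ".encode("utf-8")  # two code points, two bytes each
for offset in range(len(encoded) + 1):
    print(offset, is_code_point_boundary(encoded, offset))
```

The check is constant time per offset. It only guards code point boundaries, so an offset can still fall in the middle of a grapheme cluster.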
Re: [PATCHv4 wayland-protocols] text-input: Add v3 of the text-input protocol
On Thu, 10 May 2018 11:43:12 +0200
Dorota Czaplejewicz wrote:

> On Tue, 08 May 2018 07:07:24 +
> Silvan Jegen wrote:
>
> > On Mon, May 7, 2018 at 5:11 AM Joshua Watt wrote:
> > > IMHO, if you are doing UTF-8 (which you should), you should *always*
> > > specify any offset in the string as a byte offset. I have a few
> > > reasons for this justification:
> >
> > I agree with this as well. I thought some more about how to spell out my
> > gut feeling on this matter in more technical terms.
> >
> > UTF-8 is a byte (sequence) representation of Unicode code points. This
> > indicates to me that an offset within an UTF-8-encoded string should also
> > be given in bytes. Specifying the offset in Unicode points mixes the
> > abstraction of the Unicode code point with (one of) its representations as
> > a byte sequence. This is reflected in the fact that an offset in Unicode
> > code points is not applicable to the UTF-8 string without first processing
> > the string.
> >
> > Unicode code points do not give us that much either since what we most
> > likely want are grapheme clusters anyway (which, like any more advanced
> > Unicode processing, should be handled by a specialised library):
> > http://utf8everywhere.org/#myth.strlen
> >
> >
> > Cheers,
> >
> > Silvan
>
> This message made me feel obliged to turn my own gut feeling into words. This
> is not to be construed as an argument, but more of an explanation.
>
> I view wayland protocols as rather high level: their responsibility is to
> specify the type and the purpose of the data they are transporting. In this
> case, the data is a Unicode string, and the purpose is display. Or, the data
> is a number and the purpose is indexing.
>
> I think that when a protocol starts to specify the type and purpose, it can
> no longer be thought as high level. In this view, indexing a Unicode string
> in terms of bytes would be akin to indexing any other vector of Foo in bytes.
> (I didn't actually check if there is any other vector, or bytes type
> available in wayland).
>
> As you noted, there is some mixing between abstraction levels in the
> protocol. Hardcoding that it's not *just* Unicode, but also the particular
> encoding (UTF-8) eliminates problems with byte indexing we would have
> encountered if we decided to use things like Punycode (München =>
> Mnchen-3ya). Knowing that it's always UTF-8 allows the protocol to use a
> tailoring indexing scheme. While I consider this a layer-breaking hack,
> nevertheless, this property partially counters the above reasoning.
>
> * * *
>
> To be honest, neither Unicode code points nor graphemes nor clusters are what
> we're truly looking for here. To understand what I mean, I recommend to play
> with this grapheme cluster:
>
> नमस्ते
>
> According to the Rust book [0], it's composed of 6 code points: ['न', 'म',
> 'स', '्', 'त', 'े'], but moving the cursor around, I would be led to believe
> it's 4 "pieces" long only.
>
> Cheers,
> Dorota
>
> [0] https://doc.rust-lang.org/book/second-edition/ch08-02-strings.html

On a second thought, perhaps graphemes are actually the relevant thing here...
Re: [PATCHv4 wayland-protocols] text-input: Add v3 of the text-input protocol
On Tue, 08 May 2018 07:07:24 +
Silvan Jegen wrote:

> On Mon, May 7, 2018 at 5:11 AM Joshua Watt wrote:
> > IMHO, if you are doing UTF-8 (which you should), you should *always*
> > specify any offset in the string as a byte offset. I have a few
> > reasons for this justification:
>
> I agree with this as well. I thought some more about how to spell out my
> gut feeling on this matter in more technical terms.
>
> UTF-8 is a byte (sequence) representation of Unicode code points. This
> indicates to me that an offset within an UTF-8-encoded string should also
> be given in bytes. Specifying the offset in Unicode points mixes the
> abstraction of the Unicode code point with (one of) its representations as
> a byte sequence. This is reflected in the fact that an offset in Unicode
> code points is not applicable to the UTF-8 string without first processing
> the string.
>
> Unicode code points do not give us that much either since what we most
> likely want are grapheme clusters anyway (which, like any more advanced
> Unicode processing, should be handled by a specialised library):
> http://utf8everywhere.org/#myth.strlen
>
>
> Cheers,
>
> Silvan

This message made me feel obliged to turn my own gut feeling into words.
This is not to be construed as an argument, but more of an explanation.

I view wayland protocols as rather high level: their responsibility is to
specify the type and the purpose of the data they are transporting. In this
case, the data is a Unicode string, and the purpose is display. Or, the data
is a number and the purpose is indexing.

I think that when a protocol starts to specify the type and purpose, it can
no longer be thought as high level. In this view, indexing a Unicode string
in terms of bytes would be akin to indexing any other vector of Foo in bytes.
(I didn't actually check if there is any other vector, or bytes type
available in wayland).

As you noted, there is some mixing between abstraction levels in the
protocol. Hardcoding that it's not *just* Unicode, but also the particular
encoding (UTF-8) eliminates problems with byte indexing we would have
encountered if we decided to use things like Punycode (München =>
Mnchen-3ya). Knowing that it's always UTF-8 allows the protocol to use a
tailoring indexing scheme. While I consider this a layer-breaking hack,
nevertheless, this property partially counters the above reasoning.

* * *

To be honest, neither Unicode code points nor graphemes nor clusters are what
we're truly looking for here. To understand what I mean, I recommend to play
with this grapheme cluster:

नमस्ते

According to the Rust book [0], it's composed of 6 code points: ['न', 'म',
'स', '्', 'त', 'े'], but moving the cursor around, I would be led to believe
it's 4 "pieces" long only.

Cheers,
Dorota

[0] https://doc.rust-lang.org/book/second-edition/ch08-02-strings.html
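For readers who want to poke at the example word themselves, here is a small Python sketch. It counts code points and UTF-8 bytes, and then, as a crude stand-in for the "pieces" a cursor stops at, counts the code points that are not combining marks. That last count is only a heuristic chosen for this illustration: it is not the grapheme cluster segmentation defined by Unicode, for which the Python standard library has no implementation. The name `describe` is made up here.

```python
import unicodedata

# The example word, spelled out as escapes so the code points are explicit.
WORD = "\u0928\u092e\u0938\u094d\u0924\u0947"


def describe(text: str):
    """Return (code point labels, UTF-8 byte length, non-mark count)."""
    code_points = [f"U+{ord(ch):04X}" for ch in text]
    utf8_len = len(text.encode("utf-8"))
    # General categories starting with "M" (Mn, Mc, Me) are combining marks.
    # Counting everything else is a rough heuristic, not real segmentation.
    non_marks = sum(
        1 for ch in text if not unicodedata.category(ch).startswith("M")
    )
    return code_points, utf8_len, non_marks


code_points, utf8_len, non_marks = describe(WORD)
print(code_points)
print(len(code_points), "code points,", utf8_len, "bytes,", non_marks, "non-mark code points")
```

For this word the three counts differ: 6 code points, 18 UTF-8 bytes, and 4 non-mark code points. So the same cursor position has a different number in each unit.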
Re: [PATCHv4 wayland-protocols] text-input: Add v3 of the text-input protocol
On Mon, May 7, 2018 at 5:11 AM Joshua Watt wrote:
> IMHO, if you are doing UTF-8 (which you should), you should *always*
> specify any offset in the string as a byte offset. I have a few
> reasons for this justification:

I agree with this as well. I thought some more about how to spell out my
gut feeling on this matter in more technical terms.

UTF-8 is a byte (sequence) representation of Unicode code points. This
indicates to me that an offset within an UTF-8-encoded string should also
be given in bytes. Specifying the offset in Unicode points mixes the
abstraction of the Unicode code point with (one of) its representations as
a byte sequence. This is reflected in the fact that an offset in Unicode
code points is not applicable to the UTF-8 string without first processing
the string.

Unicode code points do not give us that much either since what we most
likely want are grapheme clusters anyway (which, like any more advanced
Unicode processing, should be handled by a specialised library):
http://utf8everywhere.org/#myth.strlen


Cheers,

Silvan
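To illustrate the point that a code point offset cannot be applied to a UTF-8 buffer without first processing it, here is a minimal Python sketch. The helper name `codepoint_to_byte_offset` is invented for this example. It has to scan the bytes from the start to find where the N-th code point begins, whereas a byte offset could be used to index the buffer directly.

```python
def codepoint_to_byte_offset(data: bytes, cp_offset: int) -> int:
    """Return the byte index at which code point number `cp_offset` starts.

    Hypothetical helper for illustration; assumes `data` is valid UTF-8.
    An offset equal to the number of code points maps to len(data).
    """
    seen = 0
    for index, byte in enumerate(data):
        # Bytes of the form 0b10xxxxxx continue a multi-byte sequence;
        # any other byte starts a new code point.
        if (byte & 0xC0) != 0x80:
            if seen == cp_offset:
                return index
            seen += 1
    if seen == cp_offset:
        return len(data)
    raise ValueError("code point offset is outside the string")


# Cursor after two code points in "æþ|" (æ and þ take two bytes each).
buffer = "æþ|".encode("utf-8")
print(codepoint_to_byte_offset(buffer, 2))
```

The scan is linear in the length of the string. That is the extra client-side pass over the UTF-8 data that the argument above objects to.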
Re: [PATCHv4 wayland-protocols] text-input: Add v3 of the text-input protocol
On Sun, May 06, 2018 at 10:37:57PM +0200, Dorota Czaplejewicz wrote:
> On Sat, 5 May 2018 13:37:44 +0200
> Silvan Jegen wrote:
>
> > On Sat, May 05, 2018 at 11:09:10AM +0200, Dorota Czaplejewicz wrote:
> > > On Fri, 4 May 2018 22:32:15 +0200
> > > Silvan Jegen wrote:
> > >
> > > > On Thu, May 03, 2018 at 10:46:47PM +0200, Dorota Czaplejewicz wrote:
> > > > > On Thu, 3 May 2018 21:55:40 +0200
> > > > > Silvan Jegen wrote:
> > >
> > > [...]
> > >
> > > In the end, I'm not an expert in that area either - perhaps treating
> > > client side strings as UTF-8 buffers makes sense, but at the moment
> > > I'm still leaning towards the code point abstraction.
> >
> > Someone (™) should probably implement a client making use of the protocol
> > to see what the real world impact of this protocol change would be.
> >
> > The editor in the weston project uses pango for its text layout:
> >
> > https://cgit.freedesktop.org/wayland/weston/tree/clients/editor.c#n824
> >
> > so it would have to parse the UTF-8 string twice. The same is most likely
> > true for all programs using GTK...
> >
>
> I made an attempt to dig deeper, and while I stopped short of becoming
> this Someone for now, I gathered what I think are some important
> results.
>
> First, the state of the libraries. There's a lot of data I gathered,
> so I'll keep this section rather dense. First, another contender
> for the title of text layout library, and that one uses code points
> exclusively:
>
> https://github.com/silnrsi/graphite/blob/master/include/graphite2/Segment.h
> `gr_make_seg`
>
> https://github.com/silnrsi/graphite/blob/master/tests/examples/simple.c
>
> Afterwards, I focused on GTK and Qt. As an input method plugin
> developer, I looked at the IM interfaces and internal data structures
> they expose. The results were not that clear - no mention of "code
> points", some references to "bytes", many to "characters" (not
> "chars"). What is certain is that there's a lot of converting going on

Yes, it's very unfortunate that a lot of developers do not strive for
more clarity and precision in terminology when processing text.

> behind the scenes anyway. First off, GTK seems to be moving away from
> bytes, judging by the comments:
>
> gtk 3.22 (`gtkimcontext.c`)
>
> `gtk_im_context_delete_surrounding`
>
> > * Asks the widget that the input context is attached to to delete
> > * characters around the cursor position by emitting the
> > * GtkIMContext::delete_surrounding signal. Note that @offset and @n_chars
> > * are in characters not in bytes which differs from the usage other
> > * places in #GtkIMContext.
>
> `gtk_im_context_get_preedit_string`
>
> > * @cursor_pos: (out): location to store position of cursor (in characters)
> > * within the preedit string.
>
> `gtk_im_context_get_surrounding`
>
> > * @cursor_index: (out): location to store byte index of the insertion
> > *cursor within @text.
>
> gtkEntry seems to store things internally as characters.

They mention "characters" but what they most likely mean are Unicode
code points. One would think they would try to keep their APIs consistent
but that doesn't seem to be the case.

> While GTK using code points internally is not a proof of anything,
> it's a suggestion that there is a reason not to use bytes.
>
> Then, Qt, from https://doc.qt.io/qt-5/qinputmethodevent.html#setCommitString
>
> > replaceLength specifies the number of characters to be replaced
>
> a confirmation that "characters" means "code points" comes from
> https://doc.qt.io/qt-5/qlineedit.html#cursorPosition-prop . The value
> reported when "æþ|" is displayed is 2.

https://doc.qt.io/qt-5/qstring.html

Qt uses UTF-16 internally so they *could* also be counting "QChars" which
are 16-bit (assuming the position is 0 indexed):

    Python 3.6.5 (default, Apr 14 2018, 13:17:30)
    [GCC 7.3.1 20180406] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> "æþ"
    'æþ'
    >>> "æþ".encode("utf-16")
    b'\xff\xfe\xe6\x00\xfe\x00'

If they are really doing that you would only notice it with characters
outside of the BMP because:

"(Unicode characters with code values above 65535 are stored using
surrogate pairs, i.e., two consecutive QChars.)"

I think everybody agrees that (Unicode) text handling is a mess in
general...

> I also spent more time than I should writing a demo implementation
> of an input method and a client connecting to it to check out the
> proposed interfaces. Predictably, it gave me a lot of trouble
> on the edges between bytes and code points, but I blame it on
> Rust's scarcity of UTF handling functions. The hack is available at
> https://code.puri.sm/dorota.czaplejewicz/impoc

Thanks for taking the time! I compiled and ran it but my rust is weak...

Rust has an interesting String type:

https://doc.rust-lang.org/std/string/struct.String.html#utf-8

It's UTF-8 encoded but you are
Re: [PATCHv4 wayland-protocols] text-input: Add v3 of the text-input protocol
On Sun, 6 May 2018 22:11:32 -0500 Joshua Wattwrote: > On Sun, May 6, 2018 at 3:37 PM, Dorota Czaplejewicz > wrote: > > On Sat, 5 May 2018 13:37:44 +0200 > > Silvan Jegen wrote: > > > >> On Sat, May 05, 2018 at 11:09:10AM +0200, Dorota Czaplejewicz wrote: > >> > On Fri, 4 May 2018 22:32:15 +0200 > >> > Silvan Jegen wrote: > >> > > >> > > On Thu, May 03, 2018 at 10:46:47PM +0200, Dorota Czaplejewicz wrote: > >> > > > On Thu, 3 May 2018 21:55:40 +0200 > >> > > > Silvan Jegen wrote: > >> > > > > >> > > > > On Thu, May 03, 2018 at 09:22:46PM +0200, Dorota Czaplejewicz > >> > > > > wrote: > >> > > > > > On Thu, 3 May 2018 20:47:27 +0200 > >> > > > > > Silvan Jegen wrote: > >> > > > > > > >> > > > > > > Hi Dorota > >> > > > > > > > >> > > > > > > Some comments and typo fixes below. > >> > > > > > > > >> > > > > > > On Thu, May 03, 2018 at 05:41:21PM +0200, Dorota Czaplejewicz > >> > > > > > > wrote: > >> > > > > > > > + Text is valid UTF-8 encoded, indices and lengths are > >> > > > > > > > in code points. If a > >> > > > > > > > + grapheme is made up of multiple code points, an index > >> > > > > > > > pointing to any of > >> > > > > > > > + them should be interpreted as pointing to the first > >> > > > > > > > one. > >> > > > > > > > >> > > > > > > That way we make sure we don't put the cursor/anchor between > >> > > > > > > bytes that > >> > > > > > > belong to the same UTF-8 encoded Unicode code point which is > >> > > > > > > nice. It > >> > > > > > > also means that the client has to parse all the UTF-8 encoded > >> > > > > > > strings > >> > > > > > > into Unicode code points up to the desired cursor/anchor > >> > > > > > > position > >> > > > > > > on each "preedit_string" event. For each > >> > > > > > > "delete_surrounding_text" event > >> > > > > > > the client has to parse the UTF-8 sequences before and after > >> > > > > > > the cursor > >> > > > > > > position up to the requested Unicode code point. 
> >> > > > > > > > >> > > > > > > I feel like we are processing the UTF-8 string already in the > >> > > > > > > input-method. So I am not sure that we should parse it again > >> > > > > > > on the > >> > > > > > > client side. Parsing it again would also mean that the client > >> > > > > > > would need > >> > > > > > > to know about UTF-8 which would be nice to avoid. > >> > > > > > > > >> > > > > > > Thoughts? > >> > > > > > > >> > > > > > The client needs to know about Unicode, but not necessarily about > >> > > > > > UTF-8. Specifying code points is actually an advantage here, > >> > > > > > because > >> > > > > > byte offsets are inherently expressed relative to UTF-8. By > >> > > > > > counting > >> > > > > > with code points, client's internal representation can be UTF-16 > >> > > > > > or > >> > > > > > maybe even something else. > >> > > > > > >> > > > > Maybe I am misunderstanding something but the protocol specifies > >> > > > > that > >> > > > > the strings are valid UTF-8 encoded and the cursor/anchor offsets > >> > > > > into > >> > > > > the strings are specified in Unicode points. To me that indicates > >> > > > > that > >> > > > > the application *has to parse* the UTF-8 string into Unicode points > >> > > > > when receiving the event otherwise it doesn't know after which > >> > > > > Unicode > >> > > > > point to draw the cursor. Of course the application can then > >> > > > > decide to > >> > > > > convert the UTF-8 string into another encoding like UTF-16 for > >> > > > > internal > >> > > > > processing (for whatever reason) but that doesn't change the fact > >> > > > > that > >> > > > > it still would have to parse the incoming UTF-8 (and thus know > >> > > > > about > >> > > > > UTF-8). > >> > > > > > >> > > > Can you see any way to avoid parsing UTF-8 in order to draw the > >> > > > cursor? 
I tried to come up with a way to do that, but even with > >> > > > specifying byte strings, I believe that calculating the position of > >> > > > the cursor - either in pixels or in glyphs - requires full parsing of > >> > > > the input string. > >> > > > >> > > Yes, I don't think it's avoidable either. You just don't have to do > >> > > it twice if your text rendering library consumes UTF-8 strings with > >> > > byte-offsets though. See my response below. > >> > > > >> > > > >> > > > > > There's no avoiding the parsing either. What the application > >> > > > > > cares > >> > > > > > about is that the cursor falls between glyphs. The application > >> > > > > > cannot > >> > > > > > know that in all cases. Unicode allows the same sequence to be > >> > > > > > displayed in multiple ways (fallback): > >> > > > > > > >> > > > > > https://unicode.org/emoji/charts/emoji-zwj-sequences.html > >> > > > > > > >> > > > > > One could make an argument that byte offsets should never be > >> > > > > > close
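The remark quoted above that a client's internal representation can be UTF-16 raises the question of what an offset means in that representation. The following Python sketch (with a made-up helper name, `utf16_code_units`) compares the number of code points in a string with the number of 16-bit UTF-16 code units. The two agree for text inside the Basic Multilingual Plane and diverge once a character needs a surrogate pair, so a client that counts in UTF-16 units still has to translate code point offsets.

```python
def utf16_code_units(text: str) -> int:
    """Number of 16-bit code units needed to store `text` as UTF-16.

    Uses the "utf-16-le" codec so that no byte order mark is prepended.
    """
    return len(text.encode("utf-16-le")) // 2


samples = {
    "BMP only": "\u00e6\u00fe",      # æþ
    "outside BMP": "\U0001F600",     # one emoji code point, U+1F600
}

for label, text in samples.items():
    # len() on a Python str counts code points.
    print(label, "code points:", len(text), "UTF-16 units:", utf16_code_units(text))
```

Python's plain "utf-16" codec prepends a byte order mark, which is why this sketch uses "utf-16-le" when counting units.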
Re: [PATCHv4 wayland-protocols] text-input: Add v3 of the text-input protocol
On Sun, May 6, 2018 at 3:37 PM, Dorota Czaplejewicz wrote: > On Sat, 5 May 2018 13:37:44 +0200 > Silvan Jegen wrote: > >> On Sat, May 05, 2018 at 11:09:10AM +0200, Dorota Czaplejewicz wrote: >> > On Fri, 4 May 2018 22:32:15 +0200 >> > Silvan Jegen wrote: >> > >> > > On Thu, May 03, 2018 at 10:46:47PM +0200, Dorota Czaplejewicz wrote: >> > > > On Thu, 3 May 2018 21:55:40 +0200 >> > > > Silvan Jegen wrote: >> > > > >> > > > > On Thu, May 03, 2018 at 09:22:46PM +0200, Dorota Czaplejewicz wrote: >> > > > > > On Thu, 3 May 2018 20:47:27 +0200 >> > > > > > Silvan Jegen wrote: >> > > > > > >> > > > > > > Hi Dorota >> > > > > > > >> > > > > > > Some comments and typo fixes below. >> > > > > > > >> > > > > > > On Thu, May 03, 2018 at 05:41:21PM +0200, Dorota Czaplejewicz >> > > > > > > wrote: >> > > > > > > > + Text is valid UTF-8 encoded, indices and lengths are in >> > > > > > > > code points. If a >> > > > > > > > + grapheme is made up of multiple code points, an index >> > > > > > > > pointing to any of >> > > > > > > > + them should be interpreted as pointing to the first one. >> > > > > > > >> > > > > > > That way we make sure we don't put the cursor/anchor between >> > > > > > > bytes that >> > > > > > > belong to the same UTF-8 encoded Unicode code point which is >> > > > > > > nice. It >> > > > > > > also means that the client has to parse all the UTF-8 encoded >> > > > > > > strings >> > > > > > > into Unicode code points up to the desired cursor/anchor position >> > > > > > > on each "preedit_string" event. For each >> > > > > > > "delete_surrounding_text" event >> > > > > > > the client has to parse the UTF-8 sequences before and after the >> > > > > > > cursor >> > > > > > > position up to the requested Unicode code point. >> > > > > > > >> > > > > > > I feel like we are processing the UTF-8 string already in the >> > > > > > > input-method. So I am not sure that we should parse it again on >> > > > > > > the >> > > > > > > client side. 
Parsing it again would also mean that the client >> > > > > > > would need >> > > > > > > to know about UTF-8 which would be nice to avoid. >> > > > > > > >> > > > > > > Thoughts? >> > > > > > >> > > > > > The client needs to know about Unicode, but not necessarily about >> > > > > > UTF-8. Specifying code points is actually an advantage here, >> > > > > > because >> > > > > > byte offsets are inherently expressed relative to UTF-8. By >> > > > > > counting >> > > > > > with code points, client's internal representation can be UTF-16 or >> > > > > > maybe even something else. >> > > > > >> > > > > Maybe I am misunderstanding something but the protocol specifies that >> > > > > the strings are valid UTF-8 encoded and the cursor/anchor offsets >> > > > > into >> > > > > the strings are specified in Unicode points. To me that indicates >> > > > > that >> > > > > the application *has to parse* the UTF-8 string into Unicode points >> > > > > when receiving the event otherwise it doesn't know after which >> > > > > Unicode >> > > > > point to draw the cursor. Of course the application can then decide >> > > > > to >> > > > > convert the UTF-8 string into another encoding like UTF-16 for >> > > > > internal >> > > > > processing (for whatever reason) but that doesn't change the fact >> > > > > that >> > > > > it still would have to parse the incoming UTF-8 (and thus know about >> > > > > UTF-8). >> > > > > >> > > > Can you see any way to avoid parsing UTF-8 in order to draw the >> > > > cursor? I tried to come up with a way to do that, but even with >> > > > specifying byte strings, I believe that calculating the position of >> > > > the cursor - either in pixels or in glyphs - requires full parsing of >> > > > the input string. >> > > >> > > Yes, I don't think it's avoidable either. You just don't have to do >> > > it twice if your text rendering library consumes UTF-8 strings with >> > > byte-offsets though. See my response below. 
>> > > >> > > >> > > > > > There's no avoiding the parsing either. What the application cares >> > > > > > about is that the cursor falls between glyphs. The application >> > > > > > cannot >> > > > > > know that in all cases. Unicode allows the same sequence to be >> > > > > > displayed in multiple ways (fallback): >> > > > > > >> > > > > > https://unicode.org/emoji/charts/emoji-zwj-sequences.html >> > > > > > >> > > > > > One could make an argument that byte offsets should never be close >> > > > > > to ZWJ characters, but I think this decision is better left to the >> > > > > > application, which knows what exactly it is presenting to the user. >> > > > > >> > > > > The idea of the previous version of the protocol (from my >> > > > > understanding) >> > > > > was to make sure that only valid UTF-8 and valid byte-offsets (== not >> > > > > falling between bytes of a Unicode code point) into the string
Re: [PATCHv4 wayland-protocols] text-input: Add v3 of the text-input protocol
On Sat, 5 May 2018 13:37:44 +0200 Silvan Jegen wrote: > On Sat, May 05, 2018 at 11:09:10AM +0200, Dorota Czaplejewicz wrote: > > On Fri, 4 May 2018 22:32:15 +0200 > > Silvan Jegen wrote: > > > > > On Thu, May 03, 2018 at 10:46:47PM +0200, Dorota Czaplejewicz wrote: > > > > On Thu, 3 May 2018 21:55:40 +0200 > > > > Silvan Jegen wrote: > > > > > > > > > On Thu, May 03, 2018 at 09:22:46PM +0200, Dorota Czaplejewicz wrote: > > > > > > > > > > > On Thu, 3 May 2018 20:47:27 +0200 > > > > > > Silvan Jegen wrote: > > > > > > > > > > > > > Hi Dorota > > > > > > > > > > > > > > Some comments and typo fixes below. > > > > > > > > > > > > > > On Thu, May 03, 2018 at 05:41:21PM +0200, Dorota Czaplejewicz > > > > > > > wrote: > > > > > > > > + Text is valid UTF-8 encoded, indices and lengths are in > > > > > > > > code points. If a > > > > > > > > + grapheme is made up of multiple code points, an index > > > > > > > > pointing to any of > > > > > > > > + them should be interpreted as pointing to the first one. > > > > > > > > > > > > > > > > > > > > > > That way we make sure we don't put the cursor/anchor between > > > > > > > bytes that > > > > > > > belong to the same UTF-8 encoded Unicode code point which is > > > > > > > nice. It > > > > > > > also means that the client has to parse all the UTF-8 encoded > > > > > > > strings > > > > > > > into Unicode code points up to the desired cursor/anchor position > > > > > > > on each "preedit_string" event. For each > > > > > > > "delete_surrounding_text" event > > > > > > > the client has to parse the UTF-8 sequences before and after the > > > > > > > cursor > > > > > > > position up to the requested Unicode code point. > > > > > > > > > > > > > > I feel like we are processing the UTF-8 string already in the > > > > > > > input-method. So I am not sure that we should parse it again on > > > > > > > the > > > > > > > client side. 
Parsing it again would also mean that the client > > > > > > > would need > > > > > > > to know about UTF-8 which would be nice to avoid. > > > > > > > > > > > > > > Thoughts? > > > > > > > > > > > > The client needs to know about Unicode, but not necessarily about > > > > > > UTF-8. Specifying code points is actually an advantage here, because > > > > > > byte offsets are inherently expressed relative to UTF-8. By counting > > > > > > with code points, client's internal representation can be UTF-16 or > > > > > > maybe even something else. > > > > > > > > > > Maybe I am misunderstanding something but the protocol specifies that > > > > > the strings are valid UTF-8 encoded and the cursor/anchor offsets into > > > > > the strings are specified in Unicode points. To me that indicates that > > > > > the application *has to parse* the UTF-8 string into Unicode points > > > > > when receiving the event otherwise it doesn't know after which Unicode > > > > > point to draw the cursor. Of course the application can then decide to > > > > > convert the UTF-8 string into another encoding like UTF-16 for > > > > > internal > > > > > processing (for whatever reason) but that doesn't change the fact that > > > > > it still would have to parse the incoming UTF-8 (and thus know about > > > > > UTF-8). > > > > > > > > > Can you see any way to avoid parsing UTF-8 in order to draw the > > > > cursor? I tried to come up with a way to do that, but even with > > > > specifying byte strings, I believe that calculating the position of > > > > the cursor - either in pixels or in glyphs - requires full parsing of > > > > the input string. > > > > > > Yes, I don't think it's avoidable either. You just don't have to do > > > it twice if your text rendering library consumes UTF-8 strings with > > > byte-offsets though. See my response below. > > > > > > > > > > > > There's no avoiding the parsing either. What the application cares > > > > > > about is that the cursor falls between glyphs. 
The application > > > > > > cannot > > > > > > know that in all cases. Unicode allows the same sequence to be > > > > > > displayed in multiple ways (fallback): > > > > > > > > > > > > https://unicode.org/emoji/charts/emoji-zwj-sequences.html > > > > > > > > > > > > One could make an argument that byte offsets should never be close > > > > > > to ZWJ characters, but I think this decision is better left to the > > > > > > application, which knows what exactly it is presenting to the user. > > > > > > > > > > > > > > > > The idea of the previous version of the protocol (from my > > > > > understanding) > > > > > was to make sure that only valid UTF-8 and valid byte-offsets (== not > > > > > falling between bytes of a Unicode code point) into the string will be > > > > > sent to the client. If you just get a byte-offset into a UTF-8 encoded > > > > > string you trust the sender to honor the protocol and
Re: [PATCHv4 wayland-protocols] text-input: Add v3 of the text-input protocol
On Sat, May 05, 2018 at 11:09:10AM +0200, Dorota Czaplejewicz wrote: > On Fri, 4 May 2018 22:32:15 +0200 > Silvan Jegen wrote: > > > On Thu, May 03, 2018 at 10:46:47PM +0200, Dorota Czaplejewicz wrote: > > > On Thu, 3 May 2018 21:55:40 +0200 > > > Silvan Jegen wrote: > > > > > > > On Thu, May 03, 2018 at 09:22:46PM +0200, Dorota Czaplejewicz wrote: > > > > > On Thu, 3 May 2018 20:47:27 +0200 > > > > > Silvan Jegen wrote: > > > > > > > > > > > Hi Dorota > > > > > > > > > > > > Some comments and typo fixes below. > > > > > > > > > > > > On Thu, May 03, 2018 at 05:41:21PM +0200, Dorota Czaplejewicz > > > > > > wrote: > > > > > > > + Text is valid UTF-8 encoded, indices and lengths are in > > > > > > > code points. If a > > > > > > > + grapheme is made up of multiple code points, an index > > > > > > > pointing to any of > > > > > > > + them should be interpreted as pointing to the first one. > > > > > > > > > > > > > > > > > > > That way we make sure we don't put the cursor/anchor between bytes > > > > > > that > > > > > > belong to the same UTF-8 encoded Unicode code point which is nice. > > > > > > It > > > > > > also means that the client has to parse all the UTF-8 encoded > > > > > > strings > > > > > > into Unicode code points up to the desired cursor/anchor position > > > > > > on each "preedit_string" event. For each "delete_surrounding_text" > > > > > > event > > > > > > the client has to parse the UTF-8 sequences before and after the > > > > > > cursor > > > > > > position up to the requested Unicode code point. > > > > > > > > > > > > I feel like we are processing the UTF-8 string already in the > > > > > > input-method. So I am not sure that we should parse it again on the > > > > > > client side. Parsing it again would also mean that the client would > > > > > > need > > > > > > to know about UTF-8 which would be nice to avoid. > > > > > > > > > > > > Thoughts? 
> > > > > > > > > > The client needs to know about Unicode, but not necessarily about > > > > > UTF-8. Specifying code points is actually an advantage here, because > > > > > byte offsets are inherently expressed relative to UTF-8. By counting > > > > > with code points, client's internal representation can be UTF-16 or > > > > > maybe even something else. > > > > > > > > Maybe I am misunderstanding something but the protocol specifies that > > > > the strings are valid UTF-8 encoded and the cursor/anchor offsets into > > > > the strings are specified in Unicode points. To me that indicates that > > > > the application *has to parse* the UTF-8 string into Unicode points > > > > when receiving the event otherwise it doesn't know after which Unicode > > > > point to draw the cursor. Of course the application can then decide to > > > > convert the UTF-8 string into another encoding like UTF-16 for internal > > > > processing (for whatever reason) but that doesn't change the fact that > > > > it still would have to parse the incoming UTF-8 (and thus know about > > > > UTF-8). > > > > > > > Can you see any way to avoid parsing UTF-8 in order to draw the > > > cursor? I tried to come up with a way to do that, but even with > > > specifying byte strings, I believe that calculating the position of > > > the cursor - either in pixels or in glyphs - requires full parsing of > > > the input string. > > > > Yes, I don't think it's avoidable either. You just don't have to do > > it twice if your text rendering library consumes UTF-8 strings with > > byte-offsets though. See my response below. > > > > > > > > > There's no avoiding the parsing either. What the application cares > > > > > about is that the cursor falls between glyphs. The application cannot > > > > > know that in all cases. 
Unicode allows the same sequence to be > > > > > displayed in multiple ways (fallback): > > > > > > > > > > https://unicode.org/emoji/charts/emoji-zwj-sequences.html > > > > > > > > > > One could make an argument that byte offsets should never be close > > > > > to ZWJ characters, but I think this decision is better left to the > > > > > application, which knows what exactly it is presenting to the user. > > > > > > > > > > > > > The idea of the previous version of the protocol (from my understanding) > > > > was to make sure that only valid UTF-8 and valid byte-offsets (== not > > > > falling between bytes of a Unicode code point) into the string will be > > > > sent to the client. If you just get a byte-offset into a UTF-8 encoded > > > > string you trust the sender to honor the protocol and thus you can just > > > > pass the UTF-8 encoded string unprocessed to your text rendering library > > > > (provided that the library supports UTF-8 strings which is what I am > > > > assuming) without having to parse the UTF-8 string into Unicode code > > > > points. > > > > > > > > Of course the Unicode code points will have to be parsed at
Re: [PATCHv4 wayland-protocols] text-input: Add v3 of the text-input protocol
On Fri, 4 May 2018 22:32:15 +0200 Silvan Jegen wrote: > On Thu, May 03, 2018 at 10:46:47PM +0200, Dorota Czaplejewicz wrote: > > On Thu, 3 May 2018 21:55:40 +0200 > > Silvan Jegen wrote: > > > > > On Thu, May 03, 2018 at 09:22:46PM +0200, Dorota Czaplejewicz wrote: > > > > On Thu, 3 May 2018 20:47:27 +0200 > > > > Silvan Jegen wrote: > > > > > > > > > Hi Dorota > > > > > > > > > > Some comments and typo fixes below. > > > > > > > > > > On Thu, May 03, 2018 at 05:41:21PM +0200, Dorota Czaplejewicz wrote: > > > > > > > > > > > + Text is valid UTF-8 encoded, indices and lengths are in code > > > > > > points. If a > > > > > > + grapheme is made up of multiple code points, an index > > > > > > pointing to any of > > > > > > + them should be interpreted as pointing to the first one. > > > > > > > > > > > > > > > > That way we make sure we don't put the cursor/anchor between bytes > > > > > that > > > > > belong to the same UTF-8 encoded Unicode code point which is nice. It > > > > > also means that the client has to parse all the UTF-8 encoded strings > > > > > into Unicode code points up to the desired cursor/anchor position > > > > > on each "preedit_string" event. For each "delete_surrounding_text" > > > > > event > > > > > the client has to parse the UTF-8 sequences before and after the > > > > > cursor > > > > > position up to the requested Unicode code point. > > > > > > > > > > I feel like we are processing the UTF-8 string already in the > > > > > input-method. So I am not sure that we should parse it again on the > > > > > client side. Parsing it again would also mean that the client would > > > > > need > > > > > to know about UTF-8 which would be nice to avoid. > > > > > > > > > > Thoughts? > > > > > > > > The client needs to know about Unicode, but not necessarily about > > > > UTF-8. Specifying code points is actually an advantage here, because > > > > byte offsets are inherently expressed relative to UTF-8. 
By counting > > > > with code points, client's internal representation can be UTF-16 or > > > > maybe even something else. > > > > > > Maybe I am misunderstanding something but the protocol specifies that > > > the strings are valid UTF-8 encoded and the cursor/anchor offsets into > > > the strings are specified in Unicode points. To me that indicates that > > > the application *has to parse* the UTF-8 string into Unicode points > > > when receiving the event otherwise it doesn't know after which Unicode > > > point to draw the cursor. Of course the application can then decide to > > > convert the UTF-8 string into another encoding like UTF-16 for internal > > > processing (for whatever reason) but that doesn't change the fact that > > > it still would have to parse the incoming UTF-8 (and thus know about > > > UTF-8). > > > > > Can you see any way to avoid parsing UTF-8 in order to draw the > > cursor? I tried to come up with a way to do that, but even with > > specifying byte strings, I believe that calculating the position of > > the cursor - either in pixels or in glyphs - requires full parsing of > > the input string. > > Yes, I don't think it's avoidable either. You just don't have to do > it twice if your text rendering library consumes UTF-8 strings with > byte-offsets though. See my response below. > > > > > > There's no avoiding the parsing either. What the application cares > > > > about is that the cursor falls between glyphs. The application cannot > > > > know that in all cases. Unicode allows the same sequence to be > > > > displayed in multiple ways (fallback): > > > > > > > > https://unicode.org/emoji/charts/emoji-zwj-sequences.html > > > > > > > > One could make an argument that byte offsets should never be close > > > > to ZWJ characters, but I think this decision is better left to the > > > > application, which knows what exactly it is presenting to the user. 
> > > > > > The idea of the previous version of the protocol (from my understanding) > > > was to make sure that only valid UTF-8 and valid byte-offsets (== not > > > falling between bytes of a Unicode code point) into the string will be > > > sent to the client. If you just get a byte-offset into a UTF-8 encoded > > > string you trust the sender to honor the protocol and thus you can just > > > pass the UTF-8 encoded string unprocessed to your text rendering library > > > (provided that the library supports UTF-8 strings which is what I am > > > assuming) without having to parse the UTF-8 string into Unicode code > > > points. > > > > > > Of course the Unicode code points will have to be parsed at some point > > > if you want to render them. Using byte-offsets just lets you do that at > > > a later stage if your libraries support UTF-8. > > > > > > > > Doesn't that chiefly depend on what kind of the text rendering library > > though? As far as I understand, passing text to
Re: [PATCHv4 wayland-protocols] text-input: Add v3 of the text-input protocol
On Thu, May 03, 2018 at 10:46:47PM +0200, Dorota Czaplejewicz wrote: > On Thu, 3 May 2018 21:55:40 +0200 > Silvan Jegen wrote: > > > On Thu, May 03, 2018 at 09:22:46PM +0200, Dorota Czaplejewicz wrote: > > > On Thu, 3 May 2018 20:47:27 +0200 > > > Silvan Jegen wrote: > > > > > > > Hi Dorota > > > > > > > > Some comments and typo fixes below. > > > > > > > > On Thu, May 03, 2018 at 05:41:21PM +0200, Dorota Czaplejewicz wrote: > > > > > + Text is valid UTF-8 encoded, indices and lengths are in code > > > > > points. If a > > > > > + grapheme is made up of multiple code points, an index pointing > > > > > to any of > > > > > + them should be interpreted as pointing to the first one. > > > > > > > > That way we make sure we don't put the cursor/anchor between bytes that > > > > belong to the same UTF-8 encoded Unicode code point which is nice. It > > > > also means that the client has to parse all the UTF-8 encoded strings > > > > into Unicode code points up to the desired cursor/anchor position > > > > on each "preedit_string" event. For each "delete_surrounding_text" event > > > > the client has to parse the UTF-8 sequences before and after the cursor > > > > position up to the requested Unicode code point. > > > > > > > > I feel like we are processing the UTF-8 string already in the > > > > input-method. So I am not sure that we should parse it again on the > > > > client side. Parsing it again would also mean that the client would need > > > > to know about UTF-8 which would be nice to avoid. > > > > > > > > Thoughts? > > > > > > The client needs to know about Unicode, but not necessarily about > > > UTF-8. Specifying code points is actually an advantage here, because > > > byte offsets are inherently expressed relative to UTF-8. By counting > > > with code points, client's internal representation can be UTF-16 or > > > maybe even something else. 
> > > > Maybe I am misunderstanding something but the protocol specifies that > > the strings are valid UTF-8 encoded and the cursor/anchor offsets into > > the strings are specified in Unicode points. To me that indicates that > > the application *has to parse* the UTF-8 string into Unicode points > > when receiving the event otherwise it doesn't know after which Unicode > > point to draw the cursor. Of course the application can then decide to > > convert the UTF-8 string into another encoding like UTF-16 for internal > > processing (for whatever reason) but that doesn't change the fact that > > it still would have to parse the incoming UTF-8 (and thus know about > > UTF-8). > > > Can you see any way to avoid parsing UTF-8 in order to draw the > cursor? I tried to come up with a way to do that, but even with > specifying byte strings, I believe that calculating the position of > the cursor - either in pixels or in glyphs - requires full parsing of > the input string. Yes, I don't think it's avoidable either. You just don't have to do it twice if your text rendering library consumes UTF-8 strings with byte-offsets though. See my response below. > > > There's no avoiding the parsing either. What the application cares > > > about is that the cursor falls between glyphs. The application cannot > > > know that in all cases. Unicode allows the same sequence to be > > > displayed in multiple ways (fallback): > > > > > > https://unicode.org/emoji/charts/emoji-zwj-sequences.html > > > > > > One could make an argument that byte offsets should never be close > > > to ZWJ characters, but I think this decision is better left to the > > > application, which knows what exactly it is presenting to the user. > > > > The idea of the previous version of the protocol (from my understanding) > > was to make sure that only valid UTF-8 and valid byte-offsets (== not > > falling between bytes of a Unicode code point) into the string will be > > sent to the client. 
If you just get a byte-offset into a UTF-8 encoded > > string you trust the sender to honor the protocol and thus you can just > > pass the UTF-8 encoded string unprocessed to your text rendering library > > (provided that the library supports UTF-8 strings which is what I am > > assuming) without having to parse the UTF-8 string into Unicode code > > points. > > > > Of course the Unicode code points will have to be parsed at some point > > if you want to render them. Using byte-offsets just lets you do that at > > a later stage if your libraries support UTF-8. > > > > > Doesn't that chiefly depend on what kind of the text rendering library > though? As far as I understand, passing text to rendering is necessary > to calculate the cursor position. At the same time, it doesn't matter > much for the calculations whether the cursor offset is in bytes or > code points - the library does the parsing in the last step anyway. > > I think you mean that if the rendering library accepts byte offsets > as the only format, the application would
Re: [PATCHv4 wayland-protocols] text-input: Add v3 of the text-input protocol
On Thu, 3 May 2018 21:55:40 +0200 Silvan Jegen wrote: > On Thu, May 03, 2018 at 09:22:46PM +0200, Dorota Czaplejewicz wrote: > > On Thu, 3 May 2018 20:47:27 +0200 > > Silvan Jegen wrote: > > > > > Hi Dorota > > > > > > Some comments and typo fixes below. > > > > > > On Thu, May 03, 2018 at 05:41:21PM +0200, Dorota Czaplejewicz wrote: > > > > + Text is valid UTF-8 encoded, indices and lengths are in code > > > > points. If a > > > > + grapheme is made up of multiple code points, an index pointing > > > > to any of > > > > + them should be interpreted as pointing to the first one. > > > > > > That way we make sure we don't put the cursor/anchor between bytes that > > > belong to the same UTF-8 encoded Unicode code point which is nice. It > > > also means that the client has to parse all the UTF-8 encoded strings > > > into Unicode code points up to the desired cursor/anchor position > > > on each "preedit_string" event. For each "delete_surrounding_text" event > > > the client has to parse the UTF-8 sequences before and after the cursor > > > position up to the requested Unicode code point. > > > > > > I feel like we are processing the UTF-8 string already in the > > > input-method. So I am not sure that we should parse it again on the > > > client side. Parsing it again would also mean that the client would need > > > to know about UTF-8 which would be nice to avoid. > > > > > > Thoughts? > > > > The client needs to know about Unicode, but not necessarily about > > UTF-8. Specifying code points is actually an advantage here, because > > byte offsets are inherently expressed relative to UTF-8. By counting > > with code points, client's internal representation can be UTF-16 or > > maybe even something else. > > Maybe I am misunderstanding something but the protocol specifies that > the strings are valid UTF-8 encoded and the cursor/anchor offsets into > the strings are specified in Unicode points. 
To me that indicates that > the application *has to parse* the UTF-8 string into Unicode points > when receiving the event otherwise it doesn't know after which Unicode > point to draw the cursor. Of course the application can then decide to > convert the UTF-8 string into another encoding like UTF-16 for internal > processing (for whatever reason) but that doesn't change the fact that > it still would have to parse the incoming UTF-8 (and thus know about > UTF-8). > Can you see any way to avoid parsing UTF-8 in order to draw the cursor? I tried to come up with a way to do that, but even with specifying byte strings, I believe that calculating the position of the cursor - either in pixels or in glyphs - requires full parsing of the input string. > > > There's no avoiding the parsing either. What the application cares > > about is that the cursor falls between glyphs. The application cannot > > know that in all cases. Unicode allows the same sequence to be > > displayed in multiple ways (fallback): > > > > https://unicode.org/emoji/charts/emoji-zwj-sequences.html > > > > One could make an argument that byte offsets should never be close > > to ZWJ characters, but I think this decision is better left to the > > application, which knows what exactly it is presenting to the user. > > The idea of the previous version of the protocol (from my understanding) > was to make sure that only valid UTF-8 and valid byte-offsets (== not > falling between bytes of a Unicode code point) into the string will be > sent to the client. If you just get a byte-offset into a UTF-8 encoded > string you trust the sender to honor the protocol and thus you can just > pass the UTF-8 encoded string unprocessed to your text rendering library > (provided that the library supports UTF-8 strings which is what I am > assuming) without having to parse the UTF-8 string into Unicode code > points. > > Of course the Unicode code points will have to be parsed at some point > if you want to render them. 
Using byte-offsets just lets you do that at > a later stage if your libraries support UTF-8. > > Doesn't that chiefly depend on the kind of text rendering library, though? As far as I understand, passing text to rendering is necessary to calculate the cursor position. At the same time, it doesn't matter much for the calculations whether the cursor offset is in bytes or code points - the library does the parsing in the last step anyway. I think you mean that if the rendering library accepts byte offsets as the only format, the application would have to parse the UTF-8 unnecessarily. I agree with this, but I'm not sure we should optimize for this case. Other libraries may support only code points instead. Did I understand you correctly? Cheers, Dorota
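To make the trade-off in this exchange concrete, here is a minimal Python sketch of the work a client might do when it receives a valid UTF-8 string together with a cursor offset counted in code points. The function names `codepoint_to_byte_offset` and `codepoint_to_utf16_offset` are hypothetical, chosen for illustration only; they are not part of the protocol or of any Wayland library. The sketch assumes the input is valid UTF-8, as the protocol text requires.

```python
def codepoint_to_byte_offset(text_utf8: bytes, cp_index: int) -> int:
    """Map a code point index to a byte offset in a valid UTF-8 string.

    Hypothetical helper: what a client would need if its rendering
    library only accepts byte offsets.
    """
    if cp_index < 0:
        raise ValueError("code point index must be non-negative")
    seen = 0
    for byte_pos, byte in enumerate(text_utf8):
        # Continuation bytes have the form 0b10xxxxxx; any other byte
        # starts a new code point.
        if byte & 0xC0 != 0x80:
            if seen == cp_index:
                return byte_pos
            seen += 1
    if seen == cp_index:
        # The cursor sits after the last code point.
        return len(text_utf8)
    raise ValueError("code point index is past the end of the string")


def codepoint_to_utf16_offset(text: str, cp_index: int) -> int:
    """Map a code point index to a UTF-16 code unit offset.

    Hypothetical helper: what a client with a UTF-16 internal
    representation would need.
    """
    if not 0 <= cp_index <= len(text):
        raise ValueError("code point index out of range")
    # Code points above U+FFFF occupy two UTF-16 code units.
    return sum(2 if ord(ch) > 0xFFFF else 1 for ch in text[:cp_index])
```

Either way the client walks the string once up to the cursor, which matches the point made above that the parsing cannot be avoided, only moved. A client whose library takes code point indices directly would need neither helper.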
Re: [PATCHv4 wayland-protocols] text-input: Add v3 of the text-input protocol
On Thu, May 03, 2018 at 09:22:46PM +0200, Dorota Czaplejewicz wrote: > On Thu, 3 May 2018 20:47:27 +0200 > Silvan Jegen wrote: > > > Hi Dorota > > > > Some comments and typo fixes below. > > > > On Thu, May 03, 2018 at 05:41:21PM +0200, Dorota Czaplejewicz wrote: > > > This new protocol description is a simplification over v2. > > > > > > - All pre-edit text styling is gone. > > > - Pre-edit cursor can span characters. > > > - No events regarding input panel (OSK) state nor covered rectangle. > > > Compositors are still free to handle situations where the keyboard > > > focus rectangle is covered by the input panel. > > > - No set_preferred_language request for clients. > > > - There is no event to send keysyms. Compositors can use wl_keyboard > > > interface instead. > > > - All state is double-buffered, with specified state. > > > - Use Unicode codepoints to measure strings. > > > > > > Signed-off-by: Dorota Czaplejewicz > > > Signed-off-by: Carlos Garnacho > > > --- > > > This is the next update coming from Purism to perfect the text input > > > protocol. > > > > > > The following changes added on top of PATCHv3: > > > > > > - Fixed whitespaces. > > > - Removed enable flags - the same information can be gathered from > > > the first requests after enter. > > > - Changed offsets inside UTF-8 strings to use Unicode character > > > counts in order to remove the possibility of communicating invalid > > > state. > > > - Specified the exact lifetime of double-buffered state, and initial > > > values. > > > - Made changes requested by the IM double-buffered. > > > > > > Some questions remain open. One is: how to specify how much text > > > to capture in set_surrounding_text, and how often to update? > > > > > > A possible change that I decided against for now is to replace > > > enable/disable events by create/destroy of a new object, which > > > would make more state lifetimes encoded in the protocol. 
> > > > > > After reading a blog post on fcitx [0], I got the impression that > > > letting the compositor know some persistent ID of a text edit > > > instance could be useful, however I'm not sure what the use cases > > > are. > > > > > > As always, I'm happy to hear feedback. > > > > > > Cheers, > > > Dorota Czaplejewicz > > > > > > [0] > > > https://www.csslayer.info/wordpress/fcitx-dev/gaps-between-wayland-and-fcitx-or-all-input-methods/ > > > > > > Makefile.am| 1 + > > > unstable/text-input/text-input-unstable-v3.xml | 362 > > > + > > > 2 files changed, 363 insertions(+) > > > create mode 100644 unstable/text-input/text-input-unstable-v3.xml > > > > > > diff --git a/Makefile.am b/Makefile.am > > > index 4b9a901..86d7ca9 100644 > > > --- a/Makefile.am > > > +++ b/Makefile.am > > > @@ -3,6 +3,7 @@ unstable_protocols = > > > \ > > > unstable/fullscreen-shell/fullscreen-shell-unstable-v1.xml > > > \ > > > unstable/linux-dmabuf/linux-dmabuf-unstable-v1.xml > > > \ > > > unstable/text-input/text-input-unstable-v1.xml > > > \ > > > + unstable/text-input/text-input-unstable-v3.xml > > > \ > > > unstable/input-method/input-method-unstable-v1.xml > > > \ > > > unstable/xdg-shell/xdg-shell-unstable-v5.xml > > > \ > > > unstable/xdg-shell/xdg-shell-unstable-v6.xml > > > \ > > > diff --git a/unstable/text-input/text-input-unstable-v3.xml > > > b/unstable/text-input/text-input-unstable-v3.xml > > > new file mode 100644 > > > index 000..ed5204f > > > --- /dev/null > > > +++ b/unstable/text-input/text-input-unstable-v3.xml > > > @@ -0,0 +1,362 @@ > > > + > > > + > > > + > > > + > > > +Copyright © 2012, 2013 Intel Corporation > > > +Copyright © 2015, 2016 Jan Arne Petersen > > > +Copyright © 2017, 2018 Red Hat, Inc. 
> > > +Copyright © 2018 Purism SPC
> > > +
> > > +Permission to use, copy, modify, distribute, and sell this
> > > +software and its documentation for any purpose is hereby granted
> > > +without fee, provided that the above copyright notice appear in
> > > +all copies and that both that copyright notice and this permission
> > > +notice appear in supporting documentation, and that the name of
> > > +the copyright holders not be used in advertising or publicity
> > > +pertaining to distribution of the software without specific,
> > > +written prior permission. The copyright holders make no
> > > +representations about the suitability of this software for any
> > > +purpose. It is provided "as is" without express or implied
> > > +warranty.
> > > +
> > > +THE COPYRIGHT HOLDERS DISCLAIM ALL WARRANTIES WITH REGARD TO THIS
> > > +SOFTWARE,
Re: [PATCHv4 wayland-protocols] text-input: Add v3 of the text-input protocol
On Thu, 3 May 2018 20:47:27 +0200
Silvan Jegen wrote:

> Hi Dorota
>
> Some comments and typo fixes below.
>
> On Thu, May 03, 2018 at 05:41:21PM +0200, Dorota Czaplejewicz wrote:
> > This new protocol description is a simplification over v2.
> >
> > - All pre-edit text styling is gone.
> > - Pre-edit cursor can span characters.
> > - No events regarding input panel (OSK) state nor covered rectangle.
> >   Compositors are still free to handle situations where the keyboard
> >   focus rectangle is covered by the input panel.
> > - No set_preferred_language request for clients.
> > - There is no event to send keysyms. Compositors can use wl_keyboard
> >   interface instead.
> > - All state is double-buffered, with specified state.
> > - Use Unicode codepoints to measure strings.
> >
> > Signed-off-by: Dorota Czaplejewicz
> > Signed-off-by: Carlos Garnacho
> > ---
> > This is the next update coming from Purism to perfect the text input
> > protocol.
> >
> > The following changes added on top of PATCHv3:
> >
> > - Fixed whitespaces.
> > - Removed enable flags - the same information can be gathered from the
> >   first requests after enter.
> > - Changed offsets inside UTF-8 strings to use Unicode character counts in
> >   order to remove the possibility of communicating invalid state.
> > - Specified the exact lifetime of double-buffered state, and initial values.
> > - Made changes requested by the IM double-buffered.
> >
> > Some questions remain open. One is: how to specify how much text to capture
> > in set_surrounding_text, and how often to update?
> >
> > A possible change that I decided against for now is to replace
> > enable/disable events by create/destroy of a new object, which would make
> > more state lifetimes encoded in the protocol.
> >
> > After reading a blog post on fcitx [0], I got the impression that letting
> > the compositor know some persistent ID of a text edit instance could be
> > useful, however I'm not sure what the use cases are.
> >
> > As always, I'm happy to hear feedback.
> >
> > Cheers,
> > Dorota Czaplejewicz
> >
> > [0]
> > https://www.csslayer.info/wordpress/fcitx-dev/gaps-between-wayland-and-fcitx-or-all-input-methods/
> >
> > Makefile.am | 1 +
> > unstable/text-input/text-input-unstable-v3.xml | 362 +
> > 2 files changed, 363 insertions(+)
> > create mode 100644 unstable/text-input/text-input-unstable-v3.xml
> >
> > diff --git a/Makefile.am b/Makefile.am
> > index 4b9a901..86d7ca9 100644
> > --- a/Makefile.am
> > +++ b/Makefile.am
> > @@ -3,6 +3,7 @@ unstable_protocols = \
> >   unstable/fullscreen-shell/fullscreen-shell-unstable-v1.xml \
> >   unstable/linux-dmabuf/linux-dmabuf-unstable-v1.xml \
> >   unstable/text-input/text-input-unstable-v1.xml \
> > + unstable/text-input/text-input-unstable-v3.xml \
> >   unstable/input-method/input-method-unstable-v1.xml \
> >   unstable/xdg-shell/xdg-shell-unstable-v5.xml \
> >   unstable/xdg-shell/xdg-shell-unstable-v6.xml \
> > diff --git a/unstable/text-input/text-input-unstable-v3.xml b/unstable/text-input/text-input-unstable-v3.xml
> > new file mode 100644
> > index 000..ed5204f
> > --- /dev/null
> > +++ b/unstable/text-input/text-input-unstable-v3.xml
> > @@ -0,0 +1,362 @@
> > +
> > +
> > +
> > +
> > +Copyright © 2012, 2013 Intel Corporation
> > +Copyright © 2015, 2016 Jan Arne Petersen
> > +Copyright © 2017, 2018 Red Hat, Inc.
> > +Copyright © 2018 Purism SPC
> > +
> > +Permission to use, copy, modify, distribute, and sell this
> > +software and its documentation for any purpose is hereby granted
> > +without fee, provided that the above copyright notice appear in
> > +all copies and that both that copyright notice and this permission
> > +notice appear in supporting documentation, and that the name of
> > +the copyright holders not be used in advertising or publicity
> > +pertaining to distribution of the software without specific,
> > +written prior permission. The copyright holders make no
> > +representations about the suitability of this software for any
> > +purpose. It is provided "as is" without express or implied
> > +warranty.
> > +
> > +THE COPYRIGHT HOLDERS DISCLAIM ALL WARRANTIES WITH REGARD TO THIS
> > +SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
> > +FITNESS, IN NO EVENT SHALL THE COPYRIGHT HOLDERS BE LIABLE FOR ANY
> > +SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
> > +WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN
> > +AN ACTION
Re: [PATCHv4 wayland-protocols] text-input: Add v3 of the text-input protocol
Hi Dorota

Some comments and typo fixes below.

On Thu, May 03, 2018 at 05:41:21PM +0200, Dorota Czaplejewicz wrote:
> This new protocol description is a simplification over v2.
>
> - All pre-edit text styling is gone.
> - Pre-edit cursor can span characters.
> - No events regarding input panel (OSK) state nor covered rectangle.
>   Compositors are still free to handle situations where the keyboard
>   focus rectangle is covered by the input panel.
> - No set_preferred_language request for clients.
> - There is no event to send keysyms. Compositors can use wl_keyboard
>   interface instead.
> - All state is double-buffered, with specified state.
> - Use Unicode codepoints to measure strings.
>
> Signed-off-by: Dorota Czaplejewicz
> Signed-off-by: Carlos Garnacho
> ---
> This is the next update coming from Purism to perfect the text input protocol.
>
> The following changes added on top of PATCHv3:
>
> - Fixed whitespaces.
> - Removed enable flags - the same information can be gathered from the first
>   requests after enter.
> - Changed offsets inside UTF-8 strings to use Unicode character counts in
>   order to remove the possibility of communicating invalid state.
> - Specified the exact lifetime of double-buffered state, and initial values.
> - Made changes requested by the IM double-buffered.
>
> Some questions remain open. One is: how to specify how much text to capture
> in set_surrounding_text, and how often to update?
>
> A possible change that I decided against for now is to replace enable/disable
> events by create/destroy of a new object, which would make more state
> lifetimes encoded in the protocol.
>
> After reading a blog post on fcitx [0], I got the impression that letting the
> compositor know some persistent ID of a text edit instance could be useful,
> however I'm not sure what the use cases are.
>
> As always, I'm happy to hear feedback.
>
> Cheers,
> Dorota Czaplejewicz
>
> [0]
> https://www.csslayer.info/wordpress/fcitx-dev/gaps-between-wayland-and-fcitx-or-all-input-methods/
>
> Makefile.am | 1 +
> unstable/text-input/text-input-unstable-v3.xml | 362 +
> 2 files changed, 363 insertions(+)
> create mode 100644 unstable/text-input/text-input-unstable-v3.xml
>
> diff --git a/Makefile.am b/Makefile.am
> index 4b9a901..86d7ca9 100644
> --- a/Makefile.am
> +++ b/Makefile.am
> @@ -3,6 +3,7 @@ unstable_protocols = \
>   unstable/fullscreen-shell/fullscreen-shell-unstable-v1.xml \
>   unstable/linux-dmabuf/linux-dmabuf-unstable-v1.xml \
>   unstable/text-input/text-input-unstable-v1.xml \
> + unstable/text-input/text-input-unstable-v3.xml \
>   unstable/input-method/input-method-unstable-v1.xml \
>   unstable/xdg-shell/xdg-shell-unstable-v5.xml \
>   unstable/xdg-shell/xdg-shell-unstable-v6.xml \
> diff --git a/unstable/text-input/text-input-unstable-v3.xml b/unstable/text-input/text-input-unstable-v3.xml
> new file mode 100644
> index 000..ed5204f
> --- /dev/null
> +++ b/unstable/text-input/text-input-unstable-v3.xml
> @@ -0,0 +1,362 @@
> +
> +
> +
> +
> +Copyright © 2012, 2013 Intel Corporation
> +Copyright © 2015, 2016 Jan Arne Petersen
> +Copyright © 2017, 2018 Red Hat, Inc.
> +Copyright © 2018 Purism SPC
> +
> +Permission to use, copy, modify, distribute, and sell this
> +software and its documentation for any purpose is hereby granted
> +without fee, provided that the above copyright notice appear in
> +all copies and that both that copyright notice and this permission
> +notice appear in supporting documentation, and that the name of
> +the copyright holders not be used in advertising or publicity
> +pertaining to distribution of the software without specific,
> +written prior permission. The copyright holders make no
> +representations about the suitability of this software for any
> +purpose. It is provided "as is" without express or implied
> +warranty.
> +
> +THE COPYRIGHT HOLDERS DISCLAIM ALL WARRANTIES WITH REGARD TO THIS
> +SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
> +FITNESS, IN NO EVENT SHALL THE COPYRIGHT HOLDERS BE LIABLE FOR ANY
> +SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
> +WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN
> +AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
> +ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
> +THIS SOFTWARE.
> +
> +
> +
> +
> + The zwp_text_input_v3 interface represents text input and input methods
> + associated with a seat. It provides
[PATCHv4 wayland-protocols] text-input: Add v3 of the text-input protocol
From: Carlos Garnacho

This new protocol description is a simplification over v2.

- All pre-edit text styling is gone.
- Pre-edit cursor can span characters.
- No events regarding input panel (OSK) state nor covered rectangle.
  Compositors are still free to handle situations where the keyboard
  focus rectangle is covered by the input panel.
- No set_preferred_language request for clients.
- There is no event to send keysyms. Compositors can use wl_keyboard
  interface instead.
- All state is double-buffered, with specified state.
- Use Unicode codepoints to measure strings.

Signed-off-by: Dorota Czaplejewicz
Signed-off-by: Carlos Garnacho
---
This is the next update coming from Purism to perfect the text input protocol.

The following changes added on top of PATCHv3:

- Fixed whitespaces.
- Removed enable flags - the same information can be gathered from the first
  requests after enter.
- Changed offsets inside UTF-8 strings to use Unicode character counts in
  order to remove the possibility of communicating invalid state.
- Specified the exact lifetime of double-buffered state, and initial values.
- Made changes requested by the IM double-buffered.

Some questions remain open. One is: how to specify how much text to capture
in set_surrounding_text, and how often to update?

A possible change that I decided against for now is to replace enable/disable
events by create/destroy of a new object, which would make more state
lifetimes encoded in the protocol.

After reading a blog post on fcitx [0], I got the impression that letting the
compositor know some persistent ID of a text edit instance could be useful,
however I'm not sure what the use cases are.

As always, I'm happy to hear feedback.

Cheers,
Dorota Czaplejewicz

[0]
https://www.csslayer.info/wordpress/fcitx-dev/gaps-between-wayland-and-fcitx-or-all-input-methods/

Makefile.am | 1 +
unstable/text-input/text-input-unstable-v3.xml | 362 +
2 files changed, 363 insertions(+)
create mode 100644 unstable/text-input/text-input-unstable-v3.xml

diff --git a/Makefile.am b/Makefile.am
index 4b9a901..86d7ca9 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -3,6 +3,7 @@ unstable_protocols = \
  unstable/fullscreen-shell/fullscreen-shell-unstable-v1.xml \
  unstable/linux-dmabuf/linux-dmabuf-unstable-v1.xml \
  unstable/text-input/text-input-unstable-v1.xml \
+ unstable/text-input/text-input-unstable-v3.xml \
  unstable/input-method/input-method-unstable-v1.xml \
  unstable/xdg-shell/xdg-shell-unstable-v5.xml \
  unstable/xdg-shell/xdg-shell-unstable-v6.xml \
diff --git a/unstable/text-input/text-input-unstable-v3.xml b/unstable/text-input/text-input-unstable-v3.xml
new file mode 100644
index 000..ed5204f
--- /dev/null
+++ b/unstable/text-input/text-input-unstable-v3.xml
@@ -0,0 +1,362 @@
+
+
+
+
+Copyright © 2012, 2013 Intel Corporation
+Copyright © 2015, 2016 Jan Arne Petersen
+Copyright © 2017, 2018 Red Hat, Inc.
+Copyright © 2018 Purism SPC
+
+Permission to use, copy, modify, distribute, and sell this
+software and its documentation for any purpose is hereby granted
+without fee, provided that the above copyright notice appear in
+all copies and that both that copyright notice and this permission
+notice appear in supporting documentation, and that the name of
+the copyright holders not be used in advertising or publicity
+pertaining to distribution of the software without specific,
+written prior permission. The copyright holders make no
+representations about the suitability of this software for any
+purpose. It is provided "as is" without express or implied
+warranty.
+
+THE COPYRIGHT HOLDERS DISCLAIM ALL WARRANTIES WITH REGARD TO THIS
+SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
+FITNESS, IN NO EVENT SHALL THE COPYRIGHT HOLDERS BE LIABLE FOR ANY
+SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN
+AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
+ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
+THIS SOFTWARE.
+
+
+
+
+ The zwp_text_input_v3 interface represents text input and input methods
+ associated with a seat. It provides enter/leave events to follow the
+ text input focus for a seat.
+
+ Requests are used to enable/disable the text-input object and set
+ state information like surrounding and selected text or the content type.
+ The information about the entered text is sent to the