Re: Composition, IME, etc.
On Jun 23, 2014, at 8:45 AM, Robin Berjon wrote:

> On 06/06/2014 19:13 , Ryosuke Niwa wrote:
>> On Jun 6, 2014, at 7:24 AM, Robin Berjon wrote:
>>> In order to handle them you have two basic options:
>>>
>>> a) Let the browser handle them for you (possibly calling up some
>>> platform functionality). This works as closely to user expectations
>>> as a Web app can hope to get but how do you render it? If it
>>> touches your DOM then you lose the indirection you need for
>>> sensible editing; if it doesn't I don't know how you show it.
>>>
>>> b) Provide the app with enough information to do the right thing.
>>> This gives you the indirection, but "doing the right thing" can be
>>> pretty hard.
>>>
>>> I am still leaning towards (b) being the approach to follow, but
>>> I'll admit that that's mostly because I can't see how to make (a)
>>> actually work. If (b) is the way, then we need to make sure that
>>> it's not so hard that everyone gets it wrong as soon as the input
>>> is anything other than basic English.
>>
>> I'm not convinced b is the right approach.
>
> As I said though, it's better than (a) which is largely unusable.
>
> That said, I have a proposal that improves on (b) and I believe addresses
> your concerns (essentially by merging both approaches into a single one).
>
>>> If the browser doesn't know because the platform can't tell the
>>> difference between Korean and Japanese (a problem with which
>>> Unicode doesn't help) then there really isn't much that we can do
>>> to help the Web app.
>>
>> This is predicated on using approach b. I'm not convinced that that's
>> the right thing to do here.
>
> No, it doesn't. If the browser has no clue whatsoever how to present
> composition then it can't offer the right UI itself any more than it can help
> the application do things well. I am merely ruling that situation, which you
> mentioned, out as unsolvable (by us).
>
>>> However if the browser knows, it can provide the app with
>>> information. I don't have enough expertise to know how much
>>> information it needs to convey; if it's mostly style, that can be
>>> done (it might be unwieldy to handle but we can look at it).
>>
>> The problem here is that we don't know if underlining is the only
>> difference input methods ever need. We could imagine future new UI
>> paradigms would require other styling such as bolding text, enlarging
>> the text for easier readability while typing, etc...
>
> I never said that the browser would only provide underlining information. I
> said it can convey *style*. If it knows that the specific composition being
> carried out requires bolding, then it could provide the matching CSS
> declaration. If there is an alien composition method that requires red
> blinking with a green top border, it could convey that.
>
> Having said that, having the browser convey style information to the script
> with the expectation that the script would create the correct Range for the
> composition in progress and apply that style to it, even though possible,
> seems like a lot of hoops to jump through that are essentially guaranteed to
> be exactly the same in every single instance.
>
> I think we can do better. It's a complicated-sounding solution but the
> problem is itself complex, and I *think* that it is doable and the best of
> all options I can think of.
>
> To restate the problem:
>
> • We don't want the browser editing the DOM directly because that just
> creates madness
> • We want to enable any manner of text composition, from a broad array of
> options, while showing the best UI for the user.
>
> These two requirements are at odds because rich, powerful composition that is
> great for the user *has* to rely on the browser, but the logical way for the
> browser to expose that is to use the DOM.
>
> The idea to reconcile both is to use a "shadow text insertion point". Basically,
> it is a small DOM tree injected as a shadow at the insertion point (with
> author styles applied to it). The browser can do *anything* it wants in there
> in order to create a correct editing UI. While composition is ongoing, the
> script still receives composition events but can safely just ignore them for
> the vast majority of cases (since you can't generally usefully validate
> composition in progress anyway). When the composition terminates, the input
> event contains the *text* content of the shadow DOM, which is reclaimed.

That's an interesting idea. It does work around the issue of the UA having to
draw the composition text while still allowing authors to style it.

> I guess that the shadow text insertion point would participate in the tree in
> the same way that a pseudo-element does. (Yes, I realise this basically means
> "magic".)
>
> I believe this works well for the insertion of new text; I need to mull it
> over further to think about editing existing content (notably the case that
> happens in autocorrect, predictive, and I believe Kot
Re: Composition, IME, etc.
On 06/06/2014 19:13 , Ryosuke Niwa wrote:
> On Jun 6, 2014, at 7:24 AM, Robin Berjon wrote:
>> In order to handle them you have two basic options:
>>
>> a) Let the browser handle them for you (possibly calling up some
>> platform functionality). This works as closely to user expectations
>> as a Web app can hope to get but how do you render it? If it
>> touches your DOM then you lose the indirection you need for
>> sensible editing; if it doesn't I don't know how you show it.
>>
>> b) Provide the app with enough information to do the right thing.
>> This gives you the indirection, but "doing the right thing" can be
>> pretty hard.
>>
>> I am still leaning towards (b) being the approach to follow, but
>> I'll admit that that's mostly because I can't see how to make (a)
>> actually work. If (b) is the way, then we need to make sure that
>> it's not so hard that everyone gets it wrong as soon as the input
>> is anything other than basic English.
>
> I'm not convinced b is the right approach.

As I said though, it's better than (a) which is largely unusable.

That said, I have a proposal that improves on (b) and I believe addresses
your concerns (essentially by merging both approaches into a single one).

>> If the browser doesn't know because the platform can't tell the
>> difference between Korean and Japanese (a problem with which
>> Unicode doesn't help) then there really isn't much that we can do
>> to help the Web app.
>
> This is predicated on using approach b. I'm not convinced that that's
> the right thing to do here.

No, it doesn't. If the browser has no clue whatsoever how to present
composition then it can't offer the right UI itself any more than it can help
the application do things well. I am merely ruling that situation, which you
mentioned, out as unsolvable (by us).

>> However if the browser knows, it can provide the app with
>> information. I don't have enough expertise to know how much
>> information it needs to convey; if it's mostly style, that can be
>> done (it might be unwieldy to handle but we can look at it).
>
> The problem here is that we don't know if underlining is the only
> difference input methods ever need. We could imagine future new UI
> paradigms would require other styling such as bolding text, enlarging
> the text for easier readability while typing, etc...

I never said that the browser would only provide underlining information. I
said it can convey *style*. If it knows that the specific composition being
carried out requires bolding, then it could provide the matching CSS
declaration. If there is an alien composition method that requires red
blinking with a green top border, it could convey that.

Having said that, having the browser convey style information to the script
with the expectation that the script would create the correct Range for the
composition in progress and apply that style to it, even though possible,
seems like a lot of hoops to jump through that are essentially guaranteed to
be exactly the same in every single instance.

I think we can do better. It's a complicated-sounding solution but the
problem is itself complex, and I *think* that it is doable and the best of
all options I can think of.

To restate the problem:

• We don't want the browser editing the DOM directly because that just
creates madness
• We want to enable any manner of text composition, from a broad array of
options, while showing the best UI for the user.

These two requirements are at odds because rich, powerful composition that is
great for the user *has* to rely on the browser, but the logical way for the
browser to expose that is to use the DOM.

The idea to reconcile both is to use a "shadow text insertion point". Basically,
it is a small DOM tree injected as a shadow at the insertion point (with
author styles applied to it). The browser can do *anything* it wants in there
in order to create a correct editing UI. While composition is ongoing, the
script still receives composition events but can safely just ignore them for
the vast majority of cases (since you can't generally usefully validate
composition in progress anyway). When the composition terminates, the input
event contains the *text* content of the shadow DOM, which is reclaimed.

I guess that the shadow text insertion point would participate in the tree in
the same way that a pseudo-element does. (Yes, I realise this basically means
"magic".)

I believe this works well for the insertion of new text; I need to mull it
over further to think about editing existing content (notably the case that
happens in autocorrect, predictive, and I believe Kotoeri where you place a
cursor mid-word and it will take into account what's before it but not
after). But I think it's worth giving it some thought; particularly because I
don't see how we can solve this problem properly otherwise.

This has the advantage that it is also a lot simpler to handle for authors.

-- 
Robin Berjon - http://berjon.com/ - @robinberjon
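The flow described in the message above can be sketched as a small model. This is only an illustration under stated assumptions: `ShadowInsertionPoint`, `CommittedInput`, `applyInput`, and `demoShadowComposition` are hypothetical names invented here, not part of any browser API, and a plain string stands in for the author's DOM. The point it tries to show is that intermediate composition states live only in the browser-owned shadow, and the script sees one piece of committed text at the end.

```typescript
// Hypothetical model of the "shadow text insertion point" idea.
// Nothing here is a real browser API; a string stands in for the DOM.

// What the final input event would carry in this sketch.
interface CommittedInput {
  offset: number; // where the shadow sat in the document text
  text: string;   // text content of the shadow when composition ended
}

// Browser-owned scratch area. The script never edits it directly.
class ShadowInsertionPoint {
  private text = "";
  private active = true;
  private offset: number;

  constructor(offset: number) {
    this.offset = offset;
  }

  // Called by the (imaginary) browser for each intermediate state.
  update(text: string): void {
    if (!this.active) throw new Error("composition already ended");
    this.text = text;
  }

  // Ends the composition: the shadow is reclaimed and only text survives.
  commit(): CommittedInput {
    if (!this.active) throw new Error("composition already ended");
    this.active = false;
    const result: CommittedInput = { offset: this.offset, text: this.text };
    this.text = "";
    return result;
  }
}

// Script side: the only place the author's document is modified.
function applyInput(documentText: string, input: CommittedInput): string {
  return (
    documentText.slice(0, input.offset) +
    input.text +
    documentText.slice(input.offset)
  );
}

function demoShadowComposition(): string {
  const documentText = "Hello !";
  const shadow = new ShadowInsertionPoint(6);
  shadow.update("せ");     // intermediate states stay inside the shadow
  shadow.update("せかい");
  shadow.update("世界");
  return applyInput(documentText, shadow.commit());
}
```

In this model the script could ignore every `update` call and still end up with a correct document, which is the property the proposal is after. It does not address the open question about editing existing content.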
Re: Composition, IME, etc. (was: contentEditable=minimal)
On Jun 6, 2014, at 10:13 AM, Ryosuke Niwa wrote:
>
> On Jun 6, 2014, at 7:24 AM, Robin Berjon wrote:
>
>> On 05/06/2014 09:09 , Ryosuke Niwa wrote:
>>> On May 23, 2014, at 1:37 PM, Robin Berjon wrote:
>>>> Semantically, autocorrect and compositing really are the same
>>>> thing.
>>>
>>> They are not. Word substitutions and input method compositions are
>>> semantically different operations.
>>
>> Ok, I'll accept that depending on the level of abstraction at which you're
>> looking at the problem they may or may not be the same thing.
>>
>> The core of the problem is this: there is a wide array of situations in
>> which some form of "indirect text input" (deliberately going for a new term
>> with no baggage) takes place. This includes (but is not limited to):
>>
>> • dead key composition (Alt-N, N -> ñ)
>> • assumed international composition (',e -> é, if you just want an
>> apostrophe you have to compose ',space)
>> • inline composition for pretty much everything
>> • popup composition
>> • autocorrect
>> • speed-typing input (T9, swiping inputs)
>>
>> In order to handle them you have two basic options:
>>
>> a) Let the browser handle them for you (possibly calling up some platform
>> functionality). This works as closely to user expectations as a Web app can
>> hope to get but how do you render it? If it touches your DOM then you lose
>> the indirection you need for sensible editing; if it doesn't I don't know
>> how you show it.
>>
>> b) Provide the app with enough information to do the right thing. This gives
>> you the indirection, but "doing the right thing" can be pretty hard.
>>
>> I am still leaning towards (b) being the approach to follow, but I'll admit
>> that that's mostly because I can't see how to make (a) actually work. If (b)
>> is the way, then we need to make sure that it's not so hard that everyone
>> gets it wrong as soon as the input is anything other than basic English.
>
> I'm not convinced b is the right approach.
>
>>>> Note that if there is a degree of refinement such that we may want
>>>> to make it possible for authors to style compositing-for-characters
>>>> and compositing-for-autocorrect, then that ought to go into the
>>>> styling system.
>>>
>>> In older versions of Windows, for example, the browser itself can't
>>> figure out what kind of style is used by IME. Korean and Japanese
>>> IME on Windows, for example, use bolded lines and dotted lines for
>>> opposite purposes. And we get bug reports saying that WebKit's
>>> rendering for Korean IME is incorrect because we decided to follow
>>> Japanese IME's convention.
>>
>> Right. In this case we need to distinguish between the browser not knowing
>> and the Web app not knowing.
>>
>> If the browser doesn't know because the platform can't tell the difference
>> between Korean and Japanese (a problem with which Unicode doesn't help) then
>> there really isn't much that we can do to help the Web app.
>
> This is predicated on using approach b. I'm not convinced that that's the right
> thing to do here.
>
>> However if the browser knows, it can provide the app with information. I
>> don't have enough expertise to know how much information it needs to convey;
>> if it's mostly style, that can be done (it might be unwieldy to handle but
>> we can look at it).
>
> The problem here is that we don't know if underlining is the only difference
> input methods ever need. We could imagine future new UI paradigms would
> require other styling such as bolding text, enlarging the text for easier
> readability while typing, etc...
>
>>>> We /could/ consider adding a field to compositing events that would
>>>> capture some form of ontology of input systems. But I think that's
>>>> sort of far-fetched and we can get by with the above. (And yes, I'm
>>>> using "ontology" on purpose. It wouldn't look good :)
>>>
>>> In my opinion, it's a requirement that input methods work and look
>>> native on editors that use this new API. IME is not a nice-to-have
>>> feature. It's a feature required for billions of people to type any
>>> text.
>>
>> That is *exactly* my point. At this point I believe that if we just added
>> something like a compositionType = deadkey | kr | jp | t9 | autocorrect |
>> ... field and leave it at that we're not helping anyone. The script will
>> need to know not just how to render all of these but how they are supposed
>> to look on each platform. That's why I am arguing for primitives that enable
>> the script to do the right thing *without* having to know everything about
>> all the possible IMEs.
>
> Right. We need a primitive to support all without having to explicitly
> support each.
>
>> Having said that, I was initially hoping that a mixture of composition
>> events plus IME API would cover a lot of ground already. Thinking about it
>> some more, it's not enough.
>>
>> Can you help me come up with a list of aspects that need to be captured in
>> order to en
Re: Composition, IME, etc. (was: contentEditable=minimal)
On Jun 6, 2014, at 7:24 AM, Robin Berjon wrote:

> On 05/06/2014 09:09 , Ryosuke Niwa wrote:
>> On May 23, 2014, at 1:37 PM, Robin Berjon wrote:
>>> Semantically, autocorrect and compositing really are the same
>>> thing.
>>
>> They are not. Word substitutions and input method compositions are
>> semantically different operations.
>
> Ok, I'll accept that depending on the level of abstraction at which you're
> looking at the problem they may or may not be the same thing.
>
> The core of the problem is this: there is a wide array of situations in which
> some form of "indirect text input" (deliberately going for a new term with no
> baggage) takes place. This includes (but is not limited to):
>
> • dead key composition (Alt-N, N -> ñ)
> • assumed international composition (',e -> é, if you just want an
> apostrophe you have to compose ',space)
> • inline composition for pretty much everything
> • popup composition
> • autocorrect
> • speed-typing input (T9, swiping inputs)
>
> In order to handle them you have two basic options:
>
> a) Let the browser handle them for you (possibly calling up some platform
> functionality). This works as closely to user expectations as a Web app can
> hope to get but how do you render it? If it touches your DOM then you lose
> the indirection you need for sensible editing; if it doesn't I don't know how
> you show it.
>
> b) Provide the app with enough information to do the right thing. This gives
> you the indirection, but "doing the right thing" can be pretty hard.
>
> I am still leaning towards (b) being the approach to follow, but I'll admit
> that that's mostly because I can't see how to make (a) actually work. If (b)
> is the way, then we need to make sure that it's not so hard that everyone
> gets it wrong as soon as the input is anything other than basic English.

I'm not convinced b is the right approach.

>>> Note that if there is a degree of refinement such that we may want
>>> to make it possible for authors to style compositing-for-characters
>>> and compositing-for-autocorrect, then that ought to go into the
>>> styling system.
>>
>> In older versions of Windows, for example, the browser itself can't
>> figure out what kind of style is used by IME. Korean and Japanese
>> IME on Windows, for example, use bolded lines and dotted lines for
>> opposite purposes. And we get bug reports saying that WebKit's
>> rendering for Korean IME is incorrect because we decided to follow
>> Japanese IME's convention.
>
> Right. In this case we need to distinguish between the browser not knowing
> and the Web app not knowing.
>
> If the browser doesn't know because the platform can't tell the difference
> between Korean and Japanese (a problem with which Unicode doesn't help) then
> there really isn't much that we can do to help the Web app.

This is predicated on using approach b. I'm not convinced that that's the right
thing to do here.

> However if the browser knows, it can provide the app with information. I
> don't have enough expertise to know how much information it needs to convey;
> if it's mostly style, that can be done (it might be unwieldy to handle but we
> can look at it).

The problem here is that we don't know if underlining is the only difference
input methods ever need. We could imagine future new UI paradigms would
require other styling such as bolding text, enlarging the text for easier
readability while typing, etc...

>>> We /could/ consider adding a field to compositing events that would
>>> capture some form of ontology of input systems. But I think that's
>>> sort of far-fetched and we can get by with the above. (And yes, I'm
>>> using "ontology" on purpose. It wouldn't look good :)
>>
>> In my opinion, it's a requirement that input methods work and look
>> native on editors that use this new API. IME is not a nice-to-have
>> feature. It's a feature required for billions of people to type any
>> text.
>
> That is *exactly* my point. At this point I believe that if we just added
> something like a compositionType = deadkey | kr | jp | t9 | autocorrect | ...
> field and leave it at that we're not helping anyone. The script will need to
> know not just how to render all of these but how they are supposed to look on
> each platform. That's why I am arguing for primitives that enable the script
> to do the right thing *without* having to know everything about all the
> possible IMEs.

Right. We need a primitive to support all without having to explicitly
support each.

> Having said that, I was initially hoping that a mixture of composition events
> plus IME API would cover a lot of ground already. Thinking about it some
> more, it's not enough.
>
> Can you help me come up with a list of aspects that need to be captured in
> order to enable the app to render the right UI? Or do you have another
> proposal?

The biggest difference between European alphabet substitution (e.g. e -> é) and CJK input meth
Composition, IME, etc. (was: contentEditable=minimal)
On 05/06/2014 09:09 , Ryosuke Niwa wrote:
> On May 23, 2014, at 1:37 PM, Robin Berjon wrote:
>> Semantically, autocorrect and compositing really are the same
>> thing.
>
> They are not. Word substitutions and input method compositions are
> semantically different operations.

Ok, I'll accept that depending on the level of abstraction at which you're
looking at the problem they may or may not be the same thing.

The core of the problem is this: there is a wide array of situations in which
some form of "indirect text input" (deliberately going for a new term with no
baggage) takes place. This includes (but is not limited to):

• dead key composition (Alt-N, N -> ñ)
• assumed international composition (',e -> é, if you just want an
apostrophe you have to compose ',space)
• inline composition for pretty much everything
• popup composition
• autocorrect
• speed-typing input (T9, swiping inputs)

In order to handle them you have two basic options:

a) Let the browser handle them for you (possibly calling up some platform
functionality). This works as closely to user expectations as a Web app can
hope to get but how do you render it? If it touches your DOM then you lose
the indirection you need for sensible editing; if it doesn't I don't know how
you show it.

b) Provide the app with enough information to do the right thing. This gives
you the indirection, but "doing the right thing" can be pretty hard.

I am still leaning towards (b) being the approach to follow, but I'll admit
that that's mostly because I can't see how to make (a) actually work. If (b)
is the way, then we need to make sure that it's not so hard that everyone
gets it wrong as soon as the input is anything other than basic English.

>> Note that if there is a degree of refinement such that we may want
>> to make it possible for authors to style compositing-for-characters
>> and compositing-for-autocorrect, then that ought to go into the
>> styling system.
>
> In older versions of Windows, for example, the browser itself can't
> figure out what kind of style is used by IME. Korean and Japanese
> IME on Windows, for example, use bolded lines and dotted lines for
> opposite purposes. And we get bug reports saying that WebKit's
> rendering for Korean IME is incorrect because we decided to follow
> Japanese IME's convention.

Right. In this case we need to distinguish between the browser not knowing
and the Web app not knowing.

If the browser doesn't know because the platform can't tell the difference
between Korean and Japanese (a problem with which Unicode doesn't help) then
there really isn't much that we can do to help the Web app.

However if the browser knows, it can provide the app with information. I
don't have enough expertise to know how much information it needs to convey;
if it's mostly style, that can be done (it might be unwieldy to handle but we
can look at it).

>> We /could/ consider adding a field to compositing events that would
>> capture some form of ontology of input systems. But I think that's
>> sort of far-fetched and we can get by with the above. (And yes, I'm
>> using "ontology" on purpose. It wouldn't look good :)
>
> In my opinion, it's a requirement that input methods work and look
> native on editors that use this new API. IME is not a nice-to-have
> feature. It's a feature required for billions of people to type any
> text.

That is *exactly* my point. At this point I believe that if we just added
something like a compositionType = deadkey | kr | jp | t9 | autocorrect | ...
field and leave it at that we're not helping anyone. The script will need to
know not just how to render all of these but how they are supposed to look on
each platform. That's why I am arguing for primitives that enable the script
to do the right thing *without* having to know everything about all the
possible IMEs.

Having said that, I was initially hoping that a mixture of composition events
plus IME API would cover a lot of ground already. Thinking about it some
more, it's not enough.

Can you help me come up with a list of aspects that need to be captured in
order to enable the app to render the right UI? Or do you have another
proposal?

-- 
Robin Berjon - http://berjon.com/ - @robinberjon
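The objection to a bare `compositionType` field can be made concrete with a rough sketch. It assumes a script that has to pick a rendering for every combination of composition type and platform. The type names are the ones listed in the message; the platform names are placeholders, and `requiredStyleEntries` is a hypothetical helper. No real rendering conventions are encoded here, only how many cases the script would have to know about.

```typescript
// Composition types as enumerated in the message above.
const compositionTypes = ["deadkey", "kr", "jp", "t9", "autocorrect"] as const;

// Placeholder platform names, purely illustrative.
const platforms = ["platform-a", "platform-b", "platform-c"] as const;

// Lists every (platform, type) pair a script would need a rendering rule for
// if all it received was a compositionType value.
function requiredStyleEntries(
  types: readonly string[],
  plats: readonly string[],
): string[] {
  const entries: string[] = [];
  for (const p of plats) {
    for (const t of types) {
      entries.push(`${p}/${t}`);
    }
  }
  return entries;
}

const entries = requiredStyleEntries(compositionTypes, platforms);
console.log(entries.length); // 5 types x 3 platforms = 15 rules to maintain
```

Each new input method or platform multiplies the table, which is the burden a primitive that works "without having to know everything about all the possible IMEs" is meant to avoid.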