Re: Composition, IME, etc.

2014-06-30 Thread Ryosuke Niwa

On Jun 23, 2014, at 8:45 AM, Robin Berjon  wrote:

> On 06/06/2014 19:13 , Ryosuke Niwa wrote:
>> On Jun 6, 2014, at 7:24 AM, Robin Berjon  wrote:
>>> In order to handle them you have two basic options:
>>> 
>>> a) Let the browser handle them for you (possibly calling up some
>>> platform functionality). This works as closely to user expectations
>>> as a Web app can hope to get but how do you render it? If it
>>> touches your DOM then you lose the indirection you need for
>>> sensible editing; if it doesn't I don't know how you show it.
>>> 
>>> b) Provide the app with enough information to do the right thing.
>>> This gives you the indirection, but "doing the right thing" can be
>>> pretty hard.
>>> 
>>> I am still leaning towards (b) being the approach to follow, but
>>> I'll admit that that's mostly because I can't see how to make (a)
>>> actually work. If (b) is the way, then we need to make sure that
>>> it's not so hard that everyone gets it wrong as soon as the input
>>> is anything other than basic English.
>> 
>> I'm not convinced b is the right approach.
> 
> As I said though, it's better than (a) which is largely unusable.
> 
> That said, I have a proposal that improves on (b) and I believe addresses 
> your concerns (essentially by merging both approaches into a single one).
> 
>>> If the browser doesn't know because the platform can't tell the
>>> difference between Korean and Japanese (a problem with which
>>> Unicode doesn't help) then there really isn't much that we can do
>>> to help the Web app.
>> 
>> This is predicated on using approach b.  I'm not convinced that that's
>> the right thing to do here.
> 
> No, it doesn't. If the browser has no clue whatsoever how to present 
> composition then it can't offer the right UI itself any more than it can help 
> the application do things well. I am merely ruling that situation, which you 
> mentioned, out as unsolvable (by us).
> 
>>> However if the browser knows, it can provide the app with
>>> information. I don't have enough expertise to know how much
>>> information it needs to convey — if it's mostly style that can be
>>> done (it might be unwieldy to handle but we can look at it).
>> 
>> The problem here is that we don't know if underlining is the only
>> difference input methods ever need.  We could imagine future new UI
>> paradigms would require other styling such as bolding text, enlarging
>> the text for easier readability while typing, etc...
> 
> I never said that the browser would only provide underlining information. I 
> said it can convey *style*. If it knows that the specific composition being 
> carried out requires bolding, then it could provide the matching CSS 
> declaration. If there is an alien composition method that requires red 
> blinking with a green top border, it could convey that.
> 
> Having said that, having the browser convey style information to the script 
> with the expectation that the script would create the correct Range for the 
> composition in progress and apply that style to it, even though possible, 
> seems like a lot of hoops to jump through that are essentially guaranteed to 
> be exactly the same in every single instance.
> 
> I think we can do better. It's a complicated-sounding solution but the 
> problem is itself complex, and I *think* that it is doable and the best of 
> all options I can think of.
> 
> To restate the problem:
> 
>  • We don't want the browser editing the DOM directly because that just 
> creates madness
>  • We want to enable any manner of text composition, from a broad array of 
> options, while showing the best UI for the user.
> 
> These two requirements are at odds because rich, powerful composition that is 
> great for the user *has* to rely on the browser, but the logical way for the 
> browser to expose that is to use the DOM.
> 
> The idea to ally both is to use a "shadow text insertion point". Basically, 
> it is a small DOM tree injected as a shadow at the insertion point (with 
> author styles applied to it). The browser can do *anything* it wants in there 
> in order to create a correct editing UI. While composition is ongoing, the 
> script still receives composition events but can safely just ignore them for 
> the vast majority of cases (since you can't generally usefully validate 
> composition in progress anyway). When the composition terminates, the input 
> event contains the *text* content of the shadow DOM, which is reclaimed.

That's an interesting idea. It does work around the issue of the UA having to
draw the composition text while still allowing authors to style it.

> I guess that the shadow text insertion point would participate in the tree in 
> the same way that a pseudo-element does. (Yes, I realise this basically means 
> "magic".)
> 
> I believe this works well for the insertion of new text; I need to mull it 
> over further to think about editing existing content (notably the case that 
> happens in autocorrect, predictive, and I believe Kot

Re: Composition, IME, etc.

2014-06-23 Thread Robin Berjon

On 06/06/2014 19:13 , Ryosuke Niwa wrote:

> On Jun 6, 2014, at 7:24 AM, Robin Berjon  wrote:
>> In order to handle them you have two basic options:
>>
>> a) Let the browser handle them for you (possibly calling up some
>> platform functionality). This works as closely to user expectations
>> as a Web app can hope to get but how do you render it? If it
>> touches your DOM then you lose the indirection you need for
>> sensible editing; if it doesn't I don't know how you show it.
>>
>> b) Provide the app with enough information to do the right thing.
>> This gives you the indirection, but "doing the right thing" can be
>> pretty hard.
>>
>> I am still leaning towards (b) being the approach to follow, but
>> I'll admit that that's mostly because I can't see how to make (a)
>> actually work. If (b) is the way, then we need to make sure that
>> it's not so hard that everyone gets it wrong as soon as the input
>> is anything other than basic English.
>
> I'm not convinced b is the right approach.


As I said though, it's better than (a) which is largely unusable.

That said, I have a proposal that improves on (b) and I believe 
addresses your concerns (essentially by merging both approaches into a 
single one).



>> If the browser doesn't know because the platform can't tell the
>> difference between Korean and Japanese (a problem with which
>> Unicode doesn't help) then there really isn't much that we can do
>> to help the Web app.
>
> This is predicated on using approach b.  I'm not convinced that that's
> the right thing to do here.


No, it doesn't. If the browser has no clue whatsoever how to present 
composition then it can't offer the right UI itself any more than it can 
help the application do things well. I am merely ruling that situation, 
which you mentioned, out as unsolvable (by us).



>> However if the browser knows, it can provide the app with
>> information. I don't have enough expertise to know how much
>> information it needs to convey — if it's mostly style that can be
>> done (it might be unwieldy to handle but we can look at it).
>
> The problem here is that we don't know if underlining is the only
> difference input methods ever need.  We could imagine future new UI
> paradigms would require other styling such as bolding text, enlarging
> the text for easier readability while typing, etc...


I never said that the browser would only provide underlining 
information. I said it can convey *style*. If it knows that the specific 
composition being carried out requires bolding, then it could provide 
the matching CSS declaration. If there is an alien composition method 
that requires red blinking with a green top border, it could convey that.


Having said that, having the browser convey style information to the 
script with the expectation that the script would create the correct 
Range for the composition in progress and apply that style to it, even 
though possible, seems like a lot of hoops to jump through that are 
essentially guaranteed to be exactly the same in every single instance.
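
To make those hoops concrete, here is a minimal TypeScript sketch of what a
script would have to do under this style-conveying variant of (b). Everything
in it is an assumption for illustration: `CompositionHint`, `StyledRun` and
`splitForComposition` are hypothetical names, and no browser exposes such a
payload today.

```typescript
// Hypothetical payload a browser might hand to the script under approach (b).
interface CompositionHint {
  start: number;   // offset where the in-progress composition begins
  end: number;     // offset just past the in-progress composition
  cssText: string; // style the platform asks for, e.g. "font-weight: bold"
}

// One run of text in the app's own rendering of the line.
interface StyledRun {
  text: string;
  cssText: string | null; // null means "use the author's normal style"
}

// Split a line of text into runs so that only the composition range
// carries the platform-provided style.
function splitForComposition(text: string, hint: CompositionHint): StyledRun[] {
  const runs: StyledRun[] = [
    { text: text.slice(0, hint.start), cssText: null },
    { text: text.slice(hint.start, hint.end), cssText: hint.cssText },
    { text: text.slice(hint.end), cssText: null },
  ];
  // Drop empty runs so the caller never has to create empty nodes.
  return runs.filter((run) => run.text.length > 0);
}
```

Every editor would end up writing some version of this function and then
mapping the runs onto its own DOM, which is the repetition described above.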


I think we can do better. It's a complicated-sounding solution but the 
problem is itself complex, and I *think* that it is doable and the best 
of all options I can think of.


To restate the problem:

  • We don't want the browser editing the DOM directly because that 
just creates madness
  • We want to enable any manner of text composition, from a broad 
array of options, while showing the best UI for the user.


These two requirements are at odds because rich, powerful composition 
that is great for the user *has* to rely on the browser, but the logical 
way for the browser to expose that is to use the DOM.


The idea to ally both is to use a "shadow text insertion point". 
Basically, it is a small DOM tree injected as a shadow at the insertion 
point (with author styles applied to it). The browser can do *anything* 
it wants in there in order to create a correct editing UI. While 
composition is ongoing, the script still receives composition events but 
can safely just ignore them for the vast majority of cases (since you 
can't generally usefully validate composition in progress anyway). When 
the composition terminates, the input event contains the *text* content 
of the shadow DOM, which is reclaimed.
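
A rough TypeScript sketch of the script's side of this model, under stated
assumptions: the event shapes (`EditorEvent`) only loosely mirror DOM
composition and input events, `Model` and `applyEvent` are hypothetical, and
the shadow tree itself is not modelled because the browser would own it.

```typescript
// Hypothetical event shapes, loosely modelled on composition and input events.
type EditorEvent =
  | { type: "compositionstart" }
  | { type: "compositionupdate"; data: string }
  | { type: "compositionend" }
  | { type: "input"; data: string };

// The app's own document model: here just a string and a caret offset.
interface Model {
  text: string;
  caret: number;
}

// Apply one event to the model. Composition events are deliberately ignored:
// in this sketch the browser renders in-progress text in the shadow tree, so
// the app's model only changes when the final text arrives with "input".
function applyEvent(model: Model, ev: EditorEvent): Model {
  if (ev.type !== "input") {
    return model;
  }
  const before = model.text.slice(0, model.caret);
  const after = model.text.slice(model.caret);
  return { text: before + ev.data + after, caret: model.caret + ev.data.length };
}

// A made-up event sequence for one composition session.
const sessionEvents: EditorEvent[] = [
  { type: "compositionstart" },
  { type: "compositionupdate", data: "に" },
  { type: "compositionupdate", data: "にほん" },
  { type: "compositionend" },
  { type: "input", data: "日本" },
];

const startModel: Model = { text: "Hello ", caret: 6 };
const finalModel = sessionEvents.reduce(applyEvent, startModel);
// finalModel.text is "Hello 日本" and finalModel.caret is 8.
```

The point of the sketch is that the handler has a single interesting branch;
everything that makes composition hard stays on the browser's side.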


I guess that the shadow text insertion point would participate in the 
tree in the same way that a pseudo-element does. (Yes, I realise this 
basically means "magic".)


I believe this works well for the insertion of new text; I need to mull 
it over further to think about editing existing content (notably the 
case that happens in autocorrect, predictive, and I believe Kotoeri 
where you place a cursor mid-word and it will take into account what's 
before it but not after). But I think it's worth giving it some thought; 
particularly because I don't see how we can solve this problem properly 
otherwise.


This has the advantage that it is also a lot simpler to handle for authors.

--
Robin Berjon - http://berjon.com/ - @robinberjon



Re: Composition, IME, etc. (was: contentEditable=minimal)

2014-06-06 Thread Ryosuke Niwa

On Jun 6, 2014, at 10:13 AM, Ryosuke Niwa  wrote:

> 
> On Jun 6, 2014, at 7:24 AM, Robin Berjon  wrote:
> 
>> On 05/06/2014 09:09 , Ryosuke Niwa wrote:
>>> On May 23, 2014, at 1:37 PM, Robin Berjon  wrote:
>>>> Semantically, autocorrect and compositing really are the same
>>>> thing.
>>> 
>>> They are not.  Word substitutions and input method compositions are
>>> semantically different operations.
>> 
>> Ok, I'll accept that depending on the level of abstraction at which you're 
>> looking at the problem they may or may not be the same thing.
>> 
>> The core of the problem is this: there is a wide array of situations in 
>> which some form of "indirect text input" (deliberately going for a new term 
>> with no baggage) takes place. This includes (but is not limited to):
>> 
>> • dead key composition (Alt-N, N -> ñ)
>> • assumed international composition (',e -> é, if you just want an 
>> apostrophe you have to compose ',space)
>> • inline composition for pretty much everything
>> • popup composition
>> • autocorrect
>> • speed-typing input (T9, swiping inputs)
>> 
>> In order to handle them you have two basic options:
>> 
>> a) Let the browser handle them for you (possibly calling up some platform 
>> functionality). This works as closely to user expectations as a Web app can 
>> hope to get but how do you render it? If it touches your DOM then you lose 
>> the indirection you need for sensible editing; if it doesn't I don't know 
>> how you show it.
>> 
>> b) Provide the app with enough information to do the right thing. This gives 
>> you the indirection, but "doing the right thing" can be pretty hard.
>> 
>> I am still leaning towards (b) being the approach to follow, but I'll admit 
>> that that's mostly because I can't see how to make (a) actually work. If (b) 
>> is the way, then we need to make sure that it's not so hard that everyone 
>> gets it wrong as soon as the input is anything other than basic English.
> 
> I'm not convinced b is the right approach.
> 
>>>> Note that if there is a degree of refinement such that we may want
>>>> to make it possible for authors to style compositing-for-characters
>>>> and compositing-for-autocorrect, then that ought to go into the
>>>> styling system.
>>> 
>>> In older versions of Windows, for example, the browser itself can't
>>> figure out what kind of style is used by IME.  Korean and Japanese
>>> IME on Windows, for example, use bolded lines and dotted lines for
>>> opposite purposes.  And we get bug reports saying that WebKit's
>>> rendering for Korean IME is incorrect because we decided to follow
>>> Japanese IME's convention.
>> 
>> Right. In this case we need to distinguish between the browser not knowing 
>> and the Web app not knowing.
>> 
>> If the browser doesn't know because the platform can't tell the difference 
>> between Korean and Japanese (a problem with which Unicode doesn't help) then 
>> there really isn't much that we can do to help the Web app.
> 
> This is predicated on using approach b.  I'm not convinced that that's the right 
> thing to do here.
> 
>> However if the browser knows, it can provide the app with information. I 
>> don't have enough expertise to know how much information it needs to convey 
>> — if it's mostly style that can be done (it might be unwieldy to handle but 
>> we can look at it).
> 
> The problem here is that we don't know if underlining is the only difference 
> input methods ever need.  We could imagine future new UI paradigms would 
> require other styling such as bolding text, enlarging the text for easier 
> readability while typing, etc... 
> 
>>>> We /could/ consider adding a field to compositing events that would
>>>> capture some form of ontology of input systems. But I think that's
>>>> sort of far-fetched and we can get by with the above. (And yes, I'm
>>>> using "ontology" on purpose. It wouldn't look good :)
>>> 
>>> In my opinion, it's a requirement that input methods work and look
>>> native on editors that use this new API.  IME is not a nice-to-have
>>> feature.  It's a feature required for billions of people to type any
>>> text.
>> 
>> That is *exactly* my point. At this point I believe that if we just added 
>> something like a compositionType = deadkey | kr | jp | t9 | autocorrect | 
>> ... field and leave it at that we're not helping anyone. The script will 
>> need to know not just how to render all of these but how they are supposed 
>> to look on each platform. That's why I am arguing for primitives that enable 
>> the script to do the right thing *without* having to know everything about 
>> all the possible IMEs.
> 
> Right.  We need a primitive to support all without having to explicitly 
> support each.
> 
>> Having said that, I was initially hoping that a mixture of composition 
>> events plus IME API would cover a lot of ground already. Thinking about it 
>> some more, it's not enough.
>> 
>> Can you help me come up with a list of aspects that need to be captured in 
>> order to en

Re: Composition, IME, etc. (was: contentEditable=minimal)

2014-06-06 Thread Ryosuke Niwa

On Jun 6, 2014, at 7:24 AM, Robin Berjon  wrote:

> On 05/06/2014 09:09 , Ryosuke Niwa wrote:
>> On May 23, 2014, at 1:37 PM, Robin Berjon  wrote:
>>> Semantically, autocorrect and compositing really are the same
>>> thing.
>> 
>> They are not.  Word substitutions and input method compositions are
>> semantically different operations.
> 
> Ok, I'll accept that depending on the level of abstraction at which you're 
> looking at the problem they may or may not be the same thing.
> 
> The core of the problem is this: there is a wide array of situations in which 
> some form of "indirect text input" (deliberately going for a new term with no 
> baggage) takes place. This includes (but is not limited to):
> 
>  • dead key composition (Alt-N, N -> ñ)
>  • assumed international composition (',e -> é, if you just want an 
> apostrophe you have to compose ',space)
>  • inline composition for pretty much everything
>  • popup composition
>  • autocorrect
>  • speed-typing input (T9, swiping inputs)
> 
> In order to handle them you have two basic options:
> 
>  a) Let the browser handle them for you (possibly calling up some platform 
> functionality). This works as closely to user expectations as a Web app can 
> hope to get but how do you render it? If it touches your DOM then you lose 
> the indirection you need for sensible editing; if it doesn't I don't know how 
> you show it.
> 
>  b) Provide the app with enough information to do the right thing. This gives 
> you the indirection, but "doing the right thing" can be pretty hard.
> 
> I am still leaning towards (b) being the approach to follow, but I'll admit 
> that that's mostly because I can't see how to make (a) actually work. If (b) 
> is the way, then we need to make sure that it's not so hard that everyone 
> gets it wrong as soon as the input is anything other than basic English.

I'm not convinced b is the right approach.

>>> Note that if there is a degree of refinement such that we may want
>>> to make it possible for authors to style compositing-for-characters
>>> and compositing-for-autocorrect, then that ought to go into the
>>> styling system.
>> 
>> In older versions of Windows, for example, the browser itself can't
>> figure out what kind of style is used by IME.  Korean and Japanese
>> IME on Windows, for example, use bolded lines and dotted lines for
>> opposite purposes.  And we get bug reports saying that WebKit's
>> rendering for Korean IME is incorrect because we decided to follow
>> Japanese IME's convention.
> 
> Right. In this case we need to distinguish between the browser not knowing 
> and the Web app not knowing.
> 
> If the browser doesn't know because the platform can't tell the difference 
> between Korean and Japanese (a problem with which Unicode doesn't help) then 
> there really isn't much that we can do to help the Web app.

This is predicated on using approach b.  I'm not convinced that that's the right 
thing to do here.

> However if the browser knows, it can provide the app with information. I 
> don't have enough expertise to know how much information it needs to convey — 
> if it's mostly style that can be done (it might be unwieldy to handle but we 
> can look at it).

The problem here is that we don't know if underlining is the only difference 
input methods ever need.  We could imagine future new UI paradigms would 
require other styling such as bolding text, enlarging the text for easier 
readability while typing, etc... 

>>> We /could/ consider adding a field to compositing events that would
>>> capture some form of ontology of input systems. But I think that's
>>> sort of far-fetched and we can get by with the above. (And yes, I'm
>>> using "ontology" on purpose. It wouldn't look good :)
>> 
>> In my opinion, it's a requirement that input methods work and look
>> native on editors that use this new API.  IME is not a nice-to-have
>> feature.  It's a feature required for billions of people to type any
>> text.
> 
> That is *exactly* my point. At this point I believe that if we just added 
> something like a compositionType = deadkey | kr | jp | t9 | autocorrect | ... 
> field and leave it at that we're not helping anyone. The script will need to 
> know not just how to render all of these but how they are supposed to look on 
> each platform. That's why I am arguing for primitives that enable the script 
> to do the right thing *without* having to know everything about all the 
> possible IMEs.

Right.  We need a primitive to support all without having to explicitly support 
each.

> Having said that, I was initially hoping that a mixture of composition events 
> plus IME API would cover a lot of ground already. Thinking about it some 
> more, it's not enough.
> 
> Can you help me come up with a list of aspects that need to be captured in 
> order to enable the app to render the right UI? Or do you have another 
> proposal?

The biggest difference between European alphabet substitution (e.g. e -> é) and 
CJK input meth

Composition, IME, etc. (was: contentEditable=minimal)

2014-06-06 Thread Robin Berjon

On 05/06/2014 09:09 , Ryosuke Niwa wrote:

> On May 23, 2014, at 1:37 PM, Robin Berjon  wrote:
>> Semantically, autocorrect and compositing really are the same
>> thing.
>
> They are not.  Word substitutions and input method compositions are
> semantically different operations.


Ok, I'll accept that depending on the level of abstraction at which 
you're looking at the problem they may or may not be the same thing.


The core of the problem is this: there is a wide array of situations in 
which some form of "indirect text input" (deliberately going for a new 
term with no baggage) takes place. This includes (but is not limited to):


  • dead key composition (Alt-N, N -> ñ)
  • assumed international composition (',e -> é, if you just want an 
apostrophe you have to compose ',space)
  • inline composition for pretty much everything
  • popup composition
  • autocorrect
  • speed-typing input (T9, swiping inputs)
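
As an illustration of the first two cases in the list above, dead-key style
composition can be modelled as a two-step state machine. This is a sketch
only: the key names, the `DEAD_KEYS` table and the `press`/`typeKeys` helpers
are invented for the example, and real keyboard layouts are defined by the
platform rather than by a table like this.

```typescript
// Hypothetical dead-key table: dead key -> (next key -> committed text).
const DEAD_KEYS: Record<string, Record<string, string>> = {
  "Alt-N": { n: "ñ" },
  "'": { e: "é", " ": "'" },
};

// Own-property check, so inherited names like "toString" never match.
const has = (obj: object, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key);

interface ComposeState {
  pending: string | null; // dead key waiting for its next key, if any
  output: string;         // text committed so far
}

function press(state: ComposeState, key: string): ComposeState {
  if (state.pending === null) {
    if (has(DEAD_KEYS, key)) {
      // A dead key commits nothing yet; it only arms the composition.
      return { pending: key, output: state.output };
    }
    return { pending: null, output: state.output + key };
  }
  const table = DEAD_KEYS[state.pending];
  // Simplification: an unknown combination commits just the second key.
  const committed = has(table, key) ? table[key] : key;
  return { pending: null, output: state.output + committed };
}

function typeKeys(keys: string[]): string {
  const start: ComposeState = { pending: null, output: "" };
  return keys.reduce(press, start).output;
}

// typeKeys(["Alt-N", "n"]) gives "ñ"; typeKeys(["'", " "]) gives "'".
```

Even this smallest case has an intermediate state in which a key has been
pressed but no text exists yet, which is what every item in the list shares.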

In order to handle them you have two basic options:

  a) Let the browser handle them for you (possibly calling up some 
platform functionality). This works as closely to user expectations as a 
Web app can hope to get but how do you render it? If it touches your DOM 
then you lose the indirection you need for sensible editing; if it 
doesn't I don't know how you show it.


  b) Provide the app with enough information to do the right thing. 
This gives you the indirection, but "doing the right thing" can be 
pretty hard.


I am still leaning towards (b) being the approach to follow, but I'll 
admit that that's mostly because I can't see how to make (a) actually 
work. If (b) is the way, then we need to make sure that it's not so hard 
that everyone gets it wrong as soon as the input is anything other than 
basic English.



>> Note that if there is a degree of refinement such that we may want
>> to make it possible for authors to style compositing-for-characters
>> and compositing-for-autocorrect, then that ought to go into the
>> styling system.
>
> In older versions of Windows, for example, the browser itself can't
> figure out what kind of style is used by IME.  Korean and Japanese
> IME on Windows, for example, use bolded lines and dotted lines for
> opposite purposes.  And we get bug reports saying that WebKit's
> rendering for Korean IME is incorrect because we decided to follow
> Japanese IME's convention.


Right. In this case we need to distinguish between the browser not 
knowing and the Web app not knowing.


If the browser doesn't know because the platform can't tell the 
difference between Korean and Japanese (a problem with which Unicode 
doesn't help) then there really isn't much that we can do to help the 
Web app.


However if the browser knows, it can provide the app with information. I 
don't have enough expertise to know how much information it needs to 
convey — if it's mostly style that can be done (it might be unwieldy to 
handle but we can look at it).



>> We /could/ consider adding a field to compositing events that would
>> capture some form of ontology of input systems. But I think that's
>> sort of far-fetched and we can get by with the above. (And yes, I'm
>> using "ontology" on purpose. It wouldn't look good :)
>
> In my opinion, it's a requirement that input methods work and look
> native on editors that use this new API.  IME is not a nice-to-have
> feature.  It's a feature required for billions of people to type any
> text.


That is *exactly* my point. At this point I believe that if we just 
added something like a compositionType = deadkey | kr | jp | t9 | 
autocorrect | ... field and leave it at that we're not helping anyone. 
The script will need to know not just how to render all of these but how 
they are supposed to look on each platform. That's why I am arguing for 
primitives that enable the script to do the right thing *without* having 
to know everything about all the possible IMEs.


Having said that, I was initially hoping that a mixture of composition 
events plus IME API would cover a lot of ground already. Thinking about 
it some more, it's not enough.


Can you help me come up with a list of aspects that need to be captured 
in order to enable the app to render the right UI? Or do you have 
another proposal?


--
Robin Berjon - http://berjon.com/ - @robinberjon