Re: Intent to implement and ship: Improved ruby parsing in HTML with new tag omission rules
I feel a lot less embarrassed about not finding that bug number now that I know how long this thread has been running. :)

Eric Shepherd
Developer Documentation Lead
Mozilla
http://www.bitstampede.com/

On Dec 30, 2014, at 12:25 PM, L. David Baron dba...@dbaron.org wrote:
> From the message at the start of the thread (six months ago):
> https://bugzilla.mozilla.org/show_bug.cgi?id=664104

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
Re: Intent to implement and ship: Improved ruby parsing in HTML with new tag omission rules
Is there a bug for the changes being discussed here, and is it marked with dev-doc-needed? Sounds like there will be, at a minimum, a few tweaks to the discussion about how this stuff works. Thanks!

Eric Shepherd
Developer Documentation Lead
Mozilla
http://www.bitstampede.com/

On Dec 27, 2014, at 8:50 PM, L. David Baron dba...@dbaron.org wrote:
> On Sunday 2014-12-28 03:04 +0900, Michael[tm] Smith wrote:
> > So as long as the spec is going to require UAs to resort to magic behavior, I think the magic could instead just be autohide any ruby annotations for kana characters. And then you could just have simpler markup like this:
> >
> > <ruby>振り仮名<rt>ふりがな</rt></ruby>
> >
> > ...and UAs would display as expected -- with no annotation for the り.
>
> I don't see how UAs could determine which kana to eliminate. What if the markup were instead:
>
> <ruby>振り仮名<rt>ふりりがな</rt></ruby>
>
> (After all, many characters need more than one kana for their ruby.)
>
> How would the browser know whether to center ふり over 振, hide the second り, and center がな over 仮名, or whether to center ふ over 振, hide the first り, and center りがな over 仮名?
>
> The split between container and annotation is what gives the browser the information to do that separation correctly.
>
> -David
>
> --
> 𝄞 L. David Baron http://dbaron.org/ 𝄂
> 𝄢 Mozilla https://www.mozilla.org/ 𝄂
> Before I built a wall I'd ask to know
> What I was walling in or walling out,
> And to whom I was like to give offense.
> - Robert Frost, Mending Wall (1914)
Re: Intent to implement and ship: Improved ruby parsing in HTML with new tag omission rules
On Tuesday 2014-12-30 12:14 -0500, Eric Shepherd wrote:
> Is there a bug for the changes being discussed here, and is it marked with dev-doc-needed? Sounds like there will be, at a minimum, a few tweaks to the discussion about how this stuff works.

From the message at the start of the thread (six months ago):
https://bugzilla.mozilla.org/show_bug.cgi?id=664104

-David

--
𝄞 L. David Baron http://dbaron.org/ 𝄂
𝄢 Mozilla https://www.mozilla.org/ 𝄂
Re: Intent to implement and ship: Improved ruby parsing in HTML with new tag omission rules
Xidorn Quan quanxunz...@gmail.com, 2014-12-27 10:12 +1100:
> On Sat, Dec 27, 2014 at 12:23 AM, Michael[tm] Smith m...@w3.org wrote:
> > ...
> > Xidorn Quan quanxunz...@gmail.com, 2014-12-26 04:41 -0800:
> > > ...
> > > The difference in expression ability becomes more important when there are words mixed with kanji and kana, such as 振り仮名. For this word, you won't even have the second option above, because I don't think people want to write something like
> > >
> > > <ruby>振り仮名<rt>ふりがな</rt></ruby>
> >
> > What would be the right way to mark that up with rb? In particular, what would be the right way if the author wants to switch between the inline form and ruby?
>
> It would be
>
> <ruby><rb>振<rb>り<rb>仮<rb>名<rt>ふ<rt>り<rt>が<rt>な</ruby>
>
> The rt for り here could be individually hidden in ruby form by stylesheets. In fact, in CSS Ruby, we currently have an autohide rule which automatically hides the annotation when it is equal to the base.

Thanks, from looking at the current CSS Ruby draft, I see you must mean this:

http://drafts.csswg.org/css-ruby/#autohide

And maybe I'm missing something, but from that I see this autohide thing seems to be magic the UA does without exposing any means for Web content to cleanly override it -- neither through CSS nor script. ("Future levels of CSS Ruby may add controls for auto-hiding, but in this level it is always forced.")

If so, I think that kind of thing is something that a lot of web devs have said they'd rather browsers quit doing -- and that most new specs these days seem to try to avoid doing. But again, maybe I'm missing something.

But anyway it makes me wonder why it's specced this way to begin with. Other than the case where a base is kana, I don't know what other real-world case there might be where an annotation might be equal to its base. Further, I don't know of any typical case where, if a base character is kana, you'd ever want to display furigana/yomigana for it.

So as long as the spec is going to require UAs to resort to magic behavior, I think the magic could instead just be autohide any ruby annotations for kana characters. And then you could just have simpler markup like this:

<ruby>振り仮名<rt>ふりがな</rt></ruby>

...and UAs would display as expected -- with no annotation for the り.

It doesn't seem like that magic would be any more difficult for UAs to implement, and it wouldn't be any worse than the "hide the annotation when it is equal to the base" magic the CSS Ruby spec currently requires UAs to do.

So anyway, to get back to the "Could you elaborate on why we are using the more complicated W3C rules here instead of the simpler WHATWG rules, given that the WHATWG rules also address the same use cases?" question that Hixie had originally asked and that you responded to in your earlier message at

https://lists.mozilla.org/pipermail/dev-platform/2014-December/008123.html

...from the above it seems the base-consisting-of-kanji-mixed-with-kana case may not be such a compelling case for illustrating the need for rb to be included in HTML. At least it's not as long as UAs are just doing magic autohiding without exposing any way for Web content to override it.

--Mike

--
Michael[tm] Smith https://people.w3.org/mike
Re: Intent to implement and ship: Improved ruby parsing in HTML with new tag omission rules
On 2014/12/28 3:04, Michael[tm] Smith wrote:
> Further, I don't know of any typical case where if a base character is kana, why you'd ever want to display furigana/yomigana for it.

Ruby is not used only for furigana/yomigana. I know one example from a very popular Japanese novel:

<ruby>赤眼の魔王<rt>ルビーアイ</rt></ruby>

This is not the only example. I'm confident we could find many cases in Japanese novels. Probably your next words are "It is not typical." or "Statistics, please." But unlike Xidorn Quan, I'm not interested in what WHATWG people are doing because I know they are not serious about ruby at all. Feel free to mess around with the ruby spec.

> So as long as the spec is going to require UAs to resort to magic behavior, I think the magic could instead just be autohide any ruby annotations for kana characters.

How to determine what ruby annotation corresponds to what base character if the character count does not match? (Again, I'm not interested in your answer. I know whatever case the WHATWG spec cannot deal with is "not typical".)

--
vyv03...@nifty.ne.jp
Re: Intent to implement and ship: Improved ruby parsing in HTML with new tag omission rules
On Sun, Dec 28, 2014 at 5:04 AM, Michael[tm] Smith m...@w3.org wrote:
> Xidorn Quan quanxunz...@gmail.com, 2014-12-27 10:12 +1100:
> > On Sat, Dec 27, 2014 at 12:23 AM, Michael[tm] Smith m...@w3.org wrote:
> > > ...
> > > Xidorn Quan quanxunz...@gmail.com, 2014-12-26 04:41 -0800:
> > > > ...
> > > > The difference in expression ability becomes more important when there are words mixed with kanji and kana, such as 振り仮名. For this word, you won't even have the second option above, because I don't think people want to write something like
> > > >
> > > > <ruby>振り仮名<rt>ふりがな</rt></ruby>
> > >
> > > What would be the right way to mark that up with rb? In particular, what would be the right way if the author wants to switch between the inline form and ruby?
> >
> > It would be
> >
> > <ruby><rb>振<rb>り<rb>仮<rb>名<rt>ふ<rt>り<rt>が<rt>な</ruby>
> >
> > The rt for り here could be individually hidden in ruby form by stylesheets. In fact, in CSS Ruby, we currently have an autohide rule which automatically hides the annotation when it is equal to the base.
>
> Thanks, from looking at the current CSS Ruby draft, I see you must mean this:
>
> http://drafts.csswg.org/css-ruby/#autohide
>
> And maybe I'm missing something, but from that I see this autohide thing seems to be magic the UA does without exposing any means for Web content to cleanly override it -- neither through CSS nor script. ("Future levels of CSS Ruby may add controls for auto-hiding, but in this level it is always forced.")
>
> If so, I think that kind of thing is something that a lot of web devs have said they'd rather browsers quit doing -- and that most new specs these days seem to try to avoid doing. But again, maybe I'm missing something.
>
> But anyway it makes me wonder why it's specced this way to begin with. Other than the case where a base is kana, I don't know what other real-world case there might be where an annotation might be equal to its base. Further, I don't know of any typical case where, if a base character is kana, you'd ever want to display furigana/yomigana for it.
>
> So as long as the spec is going to require UAs to resort to magic behavior, I think the magic could instead just be autohide any ruby annotations for kana characters. And then you could just have simpler markup like this:
>
> <ruby>振り仮名<rt>ふりがな</rt></ruby>
>
> ...and UAs would display as expected -- with no annotation for the り.
>
> It doesn't seem like that magic would be any more difficult for UAs to implement, and it wouldn't be any worse than the "hide the annotation when it is equal to the base" magic the CSS Ruby spec currently requires UAs to do.

There are two problems if the UA wants to remove an individual kana in the annotation.

The first is, how do you display this ruby after removing that character? There are three options for you:

(1) <ruby>振り仮名<rt>ふがな</rt></ruby>
(2) <ruby>振<rt>ふ</rt>り<rt></rt>仮名<rt>がな</rt></ruby>
(3) <ruby>振<rt>ふ</rt>り<rt></rt>仮<rt>が</rt>名<rt>な</rt></ruby>

(1) is completely wrong. Of the three options, (3) is preferable, but it is hard, if not impossible, for UAs to decide between (2) and (3). They can do so only if they have a Japanese dictionary integrated.

The second is, how do you know the り in the annotation matches the り in the base? In this case it might seem obvious, but Japanese also has words like 言い訳 (いいわけ) and 聞き手 (ききて). There is also a more complex use case, which uses the inline form to mark a novel title, such as 電波女と青春男 (でんぱおんなとせいしゅんおとこ).

In addition, IIRC, CSS operates more on the box level, not the character level, right? It would be much harder for UAs to implement if a style affected individual characters.

In conclusion, yes, I would admit that everything is also possible under the current WHATWG rules, provided UAs know every bit of magic in Japanese. But in that case the advantage of the WHATWG rules, that they are simpler, no longer holds. The W3C rules are much simpler in handling these use cases.

> So anyway, to get back to the "Could you elaborate on why we are using the more complicated W3C rules here instead of the simpler WHATWG rules, given that the WHATWG rules also address the same use cases?" question that Hixie had originally asked and that you responded to in your earlier message at
>
> https://lists.mozilla.org/pipermail/dev-platform/2014-December/008123.html
>
> ...from the above it seems the base-consisting-of-kanji-mixed-with-kana case may not be such a compelling case for illustrating the need for rb to be included in HTML. At least it's not as long as UAs are just doing magic autohiding without exposing any way for Web content to override it.

Although I don't think authors would want to in most cases, it is still possible for them to suppress this behavior by, for example, inserting a zero-width space in that annotation. I know it is a bit awkward, but it is possible, if one really wants to. But it would be impossible to do so in your model with WHATWG rules.

Anyway, the other case still makes the W3C rules preferable.

- Xidorn
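The autohide rule and the zero-width-space workaround mentioned in the message above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `apply_autohide` is hypothetical, and plain string equality stands in for whatever comparison a real layout engine performs when it decides that an annotation "is equal to the base".

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE

def apply_autohide(pairs):
    """Return (base, annotation) pairs, with None for hidden annotations.

    Assumption: an annotation is hidden exactly when its text equals the
    text of its base. A real UA may compare differently.
    """
    return [(base, None if annotation == base else annotation)
            for base, annotation in pairs]

# Per-character pairing, as in <rb>振<rb>り<rb>仮<rb>名<rt>ふ<rt>り<rt>が<rt>な
pairs = [("振", "ふ"), ("り", "り"), ("仮", "が"), ("名", "な")]
print(apply_autohide(pairs))

# Appending a zero-width space makes the annotation differ from its base,
# so under this sketch it is no longer hidden.
overridden = [("振", "ふ"), ("り", "り" + ZWSP), ("仮", "が"), ("名", "な")]
print(apply_autohide(overridden))
```

Because each base has its own annotation box here, hiding is a per-box decision and never requires inspecting individual characters inside a longer annotation.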
Re: Intent to implement and ship: Improved ruby parsing in HTML with new tag omission rules
On Sunday 2014-12-28 03:04 +0900, Michael[tm] Smith wrote:
> So as long as the spec is going to require UAs to resort to magic behavior, I think the magic could instead just be autohide any ruby annotations for kana characters. And then you could just have simpler markup like this:
>
> <ruby>振り仮名<rt>ふりがな</rt></ruby>
>
> ...and UAs would display as expected -- with no annotation for the り.

I don't see how UAs could determine which kana to eliminate. What if the markup were instead:

<ruby>振り仮名<rt>ふりりがな</rt></ruby>

(After all, many characters need more than one kana for their ruby.)

How would the browser know whether to center ふり over 振, hide the second り, and center がな over 仮名, or whether to center ふ over 振, hide the first り, and center りがな over 仮名?

The split between container and annotation is what gives the browser the information to do that separation correctly.

-David

--
𝄞 L. David Baron http://dbaron.org/ 𝄂
𝄢 Mozilla https://www.mozilla.org/ 𝄂
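The ambiguity described in the message above can be made concrete with a small sketch. It is not how any browser works; it assumes a deliberately naive matcher (the helper names `is_kana`, `runs_of` and `alignments` are hypothetical) in which each kana run in the base must appear literally in the annotation and each non-kana run takes one or more annotation characters.

```python
def is_kana(ch):
    # Hiragana and katakana blocks, U+3041 through U+30FF.
    return "\u3041" <= ch <= "\u30ff"

def runs_of(base):
    """Split the base text into alternating kana / non-kana runs."""
    runs = []
    for ch in base:
        if runs and is_kana(runs[-1][-1]) == is_kana(ch):
            runs[-1] += ch
        else:
            runs.append(ch)
    return runs

def alignments(runs, reading):
    """Yield every way to distribute `reading` over `runs`."""
    if not runs:
        if not reading:
            yield []
        return
    head, rest = runs[0], runs[1:]
    if is_kana(head[0]):
        # A kana run in the base must match the annotation literally.
        if reading.startswith(head):
            for tail in alignments(rest, reading[len(head):]):
                yield [(head, head)] + tail
    else:
        # A non-kana run may take any non-empty prefix of the reading.
        for n in range(1, len(reading) + 1):
            for tail in alignments(rest, reading[n:]):
                yield [(head, reading[:n])] + tail

for reading in ("ふりがな", "ふりりがな"):
    found = list(alignments(runs_of("振り仮名"), reading))
    print(reading, len(found), found)
```

Under these assumptions ふりがな has a single alignment, while ふりりがな has two (ふ + り + りがな, or ふり + り + がな), and nothing in the markup tells the UA which one the author meant.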
Re: Intent to implement and ship: Improved ruby parsing in HTML with new tag omission rules
On Tuesday, July 8, 2014 3:34:50 AM UTC+10, ian.h...@gmail.com wrote:
> On Tuesday, July 1, 2014 12:58:45 PM UTC-7, Koji Ishii wrote:
> > Summary: Two recent HTML changes improve ruby support: 1) Addition of the rb and rtc elements (but not rbc); and 2) Matching update to the tag omission rules to make ruby authoring easier. By implementing these changes, Gecko supports the parsing side of all the ruby use cases required for the internationalization of HTML (see use cases document below for details). It also enables the implementation of the CSS Ruby Layout. The Japanese education market strongly requires this and a Mozilla developer has already started working on it.
>
> Could you elaborate on why we are using the more complicated W3C rules here instead of the simpler WHATWG rules, given that the WHATWG rules also address the same use cases?

IMO, the main reason is that the W3C rules provide more flexibility for authors to make the document more semantic and stylable.

Please note that the inline form is not limited to providing compatibility. You can see an example in JLREQ Fig. 3.9. It is a use case that includes inline kana.

If you want the word 明朝体 to be marked in ruby in separate form, with the WHATWG rules, you must write it as:

<ruby>明<rt>みん</rt>朝<rt>ちょう</rt>体<rt>たい</rt></ruby>

It is incompatible with the inline form, which means, if an author wants to switch between the inline form and ruby, there are only two options:
1. provide a different document for each form;
2. drop the separate form and use only the collapsed form for ruby.
Neither of them perfectly matches the requirement.

But with the W3C rules, it can be written as:

<ruby><rb>明<rb>朝<rb>体<rp>(<rt>みん<rt>ちょう<rt>たい<rp>)</ruby>

which is obviously compatible with the inline form.

The difference in expression ability becomes more important when there are words mixed with kanji and kana, such as 振り仮名. For this word, you won't even have the second option above, because I don't think people want to write something like

<ruby>振り仮名<rt>ふりがな</rt></ruby>

In conclusion, I think the WHATWG rules are not flexible enough for multi-pair rubies, which limits both the semantization and the stylability of documents. In other words, I don't think the two rule sets address the same use cases, especially from the perspective of semantics. The W3C rules are much more powerful, though also more complicated, than the WHATWG rules.

- Xidorn
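The inline-form compatibility argument in the message above can be checked with a short sketch. The helper names (`interleaved_form`, `rb_form`, `strip_tags`) are hypothetical, and stripping tags with a regular expression is only a rough stand-in for what a reader sees when ruby is rendered inline or not supported at all.

```python
import re

def interleaved_form(pairs):
    """Each base is immediately followed by its own <rt> annotation."""
    body = "".join(f"{base}<rt>{reading}</rt>" for base, reading in pairs)
    return f"<ruby>{body}</ruby>"

def rb_form(pairs):
    """All <rb> bases first, then all <rt> annotations inside <rp> parentheses.

    End tags for rb, rt and rp are omitted, as in the markup quoted above.
    """
    bases = "".join(f"<rb>{base}" for base, _ in pairs)
    readings = "".join(f"<rt>{reading}" for _, reading in pairs)
    return f"<ruby>{bases}<rp>({readings}<rp>)</ruby>"

def strip_tags(markup):
    """Crude approximation of the text left when the tags are ignored."""
    return re.sub(r"<[^>]*>", "", markup)

pairs = [("明", "みん"), ("朝", "ちょう"), ("体", "たい")]
print(interleaved_form(pairs))
print(strip_tags(interleaved_form(pairs)))
print(rb_form(pairs))
print(strip_tags(rb_form(pairs)))
```

With the tags ignored, the second form reads as the base word followed by its parenthesized reading, which is the inline form; the first form interleaves bases and readings, so the same document cannot serve both presentations.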
Re: Intent to implement and ship: Improved ruby parsing in HTML with new tag omission rules
Hi Xidorn,

Xidorn Quan quanxunz...@gmail.com, 2014-12-26 04:41 -0800:
> ...
> If you want the word 明朝体 to be marked in ruby in separate form, with the WHATWG rules, you must write it as:
>
> <ruby>明<rt>みん</rt>朝<rt>ちょう</rt>体<rt>たい</rt></ruby>
>
> It is incompatible with the inline form, which means, if an author wants to switch between the inline form and ruby, there are only two options:
> 1. provide a different document for each form;
> 2. drop the separate form and use only the collapsed form for ruby.
> Neither of them perfectly matches the requirement.
>
> But with the W3C rules, it can be written as:
>
> <ruby><rb>明<rb>朝<rb>体<rp>(<rt>みん<rt>ちょう<rt>たい<rp>)</ruby>
>
> which is obviously compatible with the inline form.
>
> The difference in expression ability becomes more important when there are words mixed with kanji and kana, such as 振り仮名. For this word, you won't even have the second option above, because I don't think people want to write something like
>
> <ruby>振り仮名<rt>ふりがな</rt></ruby>

What would be the right way to mark that up with rb? In particular, what would be the right way if the author wants to switch between the inline form and ruby?

--Mike

> In conclusion, I think the WHATWG rules are not flexible enough for multi-pair rubies, which limits both the semantization and the stylability of documents. In other words, I don't think the two rule sets address the same use cases, especially from the perspective of semantics. The W3C rules are much more powerful, though also more complicated, than the WHATWG rules.

--
Michael[tm] Smith https://people.w3.org/mike
Re: Intent to implement and ship: Improved ruby parsing in HTML with new tag omission rules
On Sat, Dec 27, 2014 at 12:23 AM, Michael[tm] Smith m...@w3.org wrote:
> Hi Xidorn,
>
> Xidorn Quan quanxunz...@gmail.com, 2014-12-26 04:41 -0800:
> > ...
> > If you want the word 明朝体 to be marked in ruby in separate form, with the WHATWG rules, you must write it as:
> >
> > <ruby>明<rt>みん</rt>朝<rt>ちょう</rt>体<rt>たい</rt></ruby>
> >
> > It is incompatible with the inline form, which means, if an author wants to switch between the inline form and ruby, there are only two options:
> > 1. provide a different document for each form;
> > 2. drop the separate form and use only the collapsed form for ruby.
> > Neither of them perfectly matches the requirement.
> >
> > But with the W3C rules, it can be written as:
> >
> > <ruby><rb>明<rb>朝<rb>体<rp>(<rt>みん<rt>ちょう<rt>たい<rp>)</ruby>
> >
> > which is obviously compatible with the inline form.
> >
> > The difference in expression ability becomes more important when there are words mixed with kanji and kana, such as 振り仮名. For this word, you won't even have the second option above, because I don't think people want to write something like
> >
> > <ruby>振り仮名<rt>ふりがな</rt></ruby>
>
> What would be the right way to mark that up with rb? In particular, what would be the right way if the author wants to switch between the inline form and ruby?

It would be

<ruby><rb>振<rb>り<rb>仮<rb>名<rt>ふ<rt>り<rt>が<rt>な</ruby>

The rt for り here could be individually hidden in ruby form by stylesheets. In fact, in CSS Ruby, we currently have an autohide rule which automatically hides the annotation when it is equal to the base.

- Xidorn
Re: Intent to implement and ship: Improved ruby parsing in HTML with new tag omission rules
On Tuesday, July 1, 2014 12:58:45 PM UTC-7, Koji Ishii wrote:
> Summary: Two recent HTML changes improve ruby support: 1) Addition of the rb and rtc elements (but not rbc); and 2) Matching update to the tag omission rules to make ruby authoring easier. By implementing these changes, Gecko supports the parsing side of all the ruby use cases required for the internationalization of HTML (see use cases document below for details). It also enables the implementation of the CSS Ruby Layout. The Japanese education market strongly requires this and a Mozilla developer has already started working on it.

Could you elaborate on why we are using the more complicated W3C rules here instead of the simpler WHATWG rules, given that the WHATWG rules also address the same use cases?

See:
https://bugzilla.mozilla.org/show_bug.cgi?id=9#c110

--
Ian Hickson
Re: Intent to implement and ship: Improved ruby parsing in HTML with new tag omission rules
On Tue, Jul 1, 2014 at 10:58 PM, Koji Ishii kojii...@gluesoft.co.jp wrote:
> Platform coverage: all platforms (parsing only, layout will be in separate intents)

The parsing change is the easy part. Is there a plan to get the layout part implemented?

My general take on this issue is:

1) As far as assigning the time of core developers goes, it seems that there's always higher-priority stuff to work on instead of complex ruby layout.

2) If someone else has different priorities, really values complex ruby working and can develop an implementation that truly just takes normal review time from the core developers, I think it makes sense to let someone other than the core developers implement complex ruby.

3) I think the HTML parsing algorithm shouldn't be used as a way to block point 2 from happening.

But is point 2 happening?

--
Henri Sivonen
hsivo...@hsivonen.fi
https://hsivonen.fi/
Re: Intent to implement and ship: Improved ruby parsing in HTML with new tag omission rules
On 02/07/14 17:05, Henri Sivonen wrote:
> On Tue, Jul 1, 2014 at 10:58 PM, Koji Ishii kojii...@gluesoft.co.jp wrote:
> > Platform coverage: all platforms (parsing only, layout will be in separate intents)
>
> The parsing change is the easy part. Is there a plan to get the layout part implemented?
>
> My general take on this issue is:
>
> 1) As far as assigning the time of core developers goes, it seems that there's always higher-priority stuff to work on instead of complex ruby layout.
>
> 2) If someone else has different priorities, really values complex ruby working and can develop an implementation that truly just takes normal review time from the core developers, I think it makes sense to let someone other than the core developers implement complex ruby.
>
> 3) I think the HTML parsing algorithm shouldn't be used as a way to block point 2 from happening.
>
> But is point 2 happening?

Some work on ruby layout was recently begun by sgbowen in bug 1021952.
Re: Intent to implement and ship: Improved ruby parsing in HTML with new tag omission rules
On Wednesday 2014-07-02 10:05 +0300, Henri Sivonen wrote:
> On Tue, Jul 1, 2014 at 10:58 PM, Koji Ishii kojii...@gluesoft.co.jp wrote:
> > Platform coverage: all platforms (parsing only, layout will be in separate intents)
>
> The parsing change is the easy part. Is there a plan to get the layout part implemented?
>
> My general take on this issue is:
>
> 1) As far as assigning the time of core developers goes, it seems that there's always higher-priority stuff to work on instead of complex ruby layout.
>
> 2) If someone else has different priorities, really values complex ruby working and can develop an implementation that truly just takes normal review time from the core developers, I think it makes sense to let someone other than the core developers implement complex ruby.
>
> 3) I think the HTML parsing algorithm shouldn't be used as a way to block point 2 from happening.
>
> But is point 2 happening?

We have a summer intern working on ruby this summer; I'm reasonably optimistic that she'll get much of css-ruby implemented, although maybe omitting some of the harder bits like 'ruby-position: inter-character' (which really isn't so much hard as different from the rest and therefore requiring separate code).

See the 7-digit dependencies of
https://bugzilla.mozilla.org/showdependencytree.cgi?id=256274&maxdepth=1&hide_resolved=0

-David

--
𝄞 L. David Baron http://dbaron.org/ 𝄂
𝄢 Mozilla https://www.mozilla.org/ 𝄂