Re: Comments on internationalization API
You make some good points (and many that I agree with), but the main issue is that we have to produce a model that all the browser vendors can sign up to. That necessitates some compromises, including some areas where we can't have a concrete specification because the implementors want the freedom to implement the functionality in different ways.

If you want to engage more, there is an F2F next week. Cira can get you details.

Mark

*— Il meglio è l’inimico del bene —*

On Thu, Jul 21, 2011 at 17:14, Norbert Lindenberg ecmascr...@norbertlindenberg.com wrote:

Hi Mark,

Thanks for your comments! Replies to some of them below.

I also noticed some additional issues:

19. DateTimeFormat.prototype.getMonths needs a second parameter {boolean} standalone, default value false.

20. There needs to be a way to determine the actual language, region, and options of a Collator, NumberFormat, or DateTimeFormat. E.g., if I request ar-MA-u-ca-islamic, did I get exactly what I requested, or ar-MA-u-ca-islamicc, ar-MA-u-ca-gregory, ar-u-ca-gregory, or yet something else?

Best regards,
Norbert

On Jul 20, 2011, at 9:46, Mark Davis ☕ wrote:

I have comments on some of these.

Mark

— Il meglio è l’inimico del bene —

On Tue, Jul 19, 2011 at 01:29, Norbert Lindenberg ecmascr...@norbertlindenberg.com wrote:

Hi all,

I'm sorry for not having been able to contribute to the internationalization API earlier. I have finally reviewed the straw man [1], and am pleased to see that it contains a good subset of internationalization functionality to start with. Number and date formatting and collation are issues that most applications have to deal with. Collation especially, but also date formatting with support for multiple time zones and calendars, are hard to implement as downloadable libraries.

I have some comments on the details though:

1. In the background section, it might be useful to add that with Node.js, server-side JavaScript is seeing a rebound, and applications don't really want to have to call out to a non-JavaScript server in order to handle basic internationalization.

2. In the goals section, I'd qualify the reuse of objects goal as a reuse of implementation data structures, or even better replace it with measurable performance goals. Reuse of objects that are visible to applications has security and privacy implications, especially when loading third-party code (apps or ads) onto pages [2]. I'd recommend letting applications freely construct Collator, NumberFormat, and DateTimeFormat objects, but have these objects share implementation objects (such as ICU objects) as much as possible. If the API does return shared objects, the security issues need to be dealt with, e.g., by specifying that the shared objects are immutable.

I think it is reasonable to rephrase this as implementation data structures.

3. I'm very uncomfortable with the LocaleInfo class. It seems to pretend to be the central source of all locale-related information, but can't live up to that claim because its design is limited to number and date formatting and collation. Developers will need to create other functionality such as text segmentation, spelling checking, message lookup, shoe size conversion, etc. LocaleInfo appears to perform some magic to derive regions, currencies, and possibly time zones, but doesn't specify it, and makes none of it available to other internationalization classes. It also does duty as a namespace, which looks odd in an ECMAScript standard that otherwise doesn't know namespaces.

I don't think it is ideal; I share some of your qualms about it. However, it is what we were able to compromise on. Because the LocaleInfo class does do the resolution, and that information is available after creation, the information is available for other services. And people could (being ES) hang services off of their own LocaleInfo class.

So is this the current recommendation? A library that provides word break and line break functionality should be based on a class MyLocaleInfo, which provides WordBreak and LineBreak classes whose constructors clients should not call, and wordBreak and lineBreak functions that return objects of these classes. An application that uses multiple such libraries (providing different sets of internationalized functionality) has to create objects of all their LocaleInfo classes so that it can request objects of the classes that it actually needs. What value do these LocaleInfo classes add, compared to having constructors of the actually needed classes that can be called directly?

Also, the LocaleInfo API, as currently documented, doesn't provide any information that a third-party internationalization library could use. Some comments sound like there should be a property options, but this property and the derivation of its values aren't actually documented.

Other internationalization libraries have a core
Re: Comments on internationalization API
Hi Mark,

Thanks for your comments! Replies to some of them below.

I also noticed some additional issues:

19. DateTimeFormat.prototype.getMonths needs a second parameter {boolean} standalone, default value false.

20. There needs to be a way to determine the actual language, region, and options of a Collator, NumberFormat, or DateTimeFormat. E.g., if I request ar-MA-u-ca-islamic, did I get exactly what I requested, or ar-MA-u-ca-islamicc, ar-MA-u-ca-gregory, ar-u-ca-gregory, or yet something else?

Best regards,
Norbert

On Jul 20, 2011, at 9:46, Mark Davis ☕ wrote:

I have comments on some of these.

Mark

— Il meglio è l’inimico del bene —

On Tue, Jul 19, 2011 at 01:29, Norbert Lindenberg ecmascr...@norbertlindenberg.com wrote:

Hi all,

I'm sorry for not having been able to contribute to the internationalization API earlier. I have finally reviewed the straw man [1], and am pleased to see that it contains a good subset of internationalization functionality to start with. Number and date formatting and collation are issues that most applications have to deal with. Collation especially, but also date formatting with support for multiple time zones and calendars, are hard to implement as downloadable libraries.

I have some comments on the details though:

1. In the background section, it might be useful to add that with Node.js, server-side JavaScript is seeing a rebound, and applications don't really want to have to call out to a non-JavaScript server in order to handle basic internationalization.

2. In the goals section, I'd qualify the reuse of objects goal as a reuse of implementation data structures, or even better replace it with measurable performance goals. Reuse of objects that are visible to applications has security and privacy implications, especially when loading third-party code (apps or ads) onto pages [2]. I'd recommend letting applications freely construct Collator, NumberFormat, and DateTimeFormat objects, but have these objects share implementation objects (such as ICU objects) as much as possible. If the API does return shared objects, the security issues need to be dealt with, e.g., by specifying that the shared objects are immutable.

I think it is reasonable to rephrase this as implementation data structures.

3. I'm very uncomfortable with the LocaleInfo class. It seems to pretend to be the central source of all locale-related information, but can't live up to that claim because its design is limited to number and date formatting and collation. Developers will need to create other functionality such as text segmentation, spelling checking, message lookup, shoe size conversion, etc. LocaleInfo appears to perform some magic to derive regions, currencies, and possibly time zones, but doesn't specify it, and makes none of it available to other internationalization classes. It also does duty as a namespace, which looks odd in an ECMAScript standard that otherwise doesn't know namespaces.

I don't think it is ideal; I share some of your qualms about it. However, it is what we were able to compromise on. Because the LocaleInfo class does do the resolution, and that information is available after creation, the information is available for other services. And people could (being ES) hang services off of their own LocaleInfo class.

So is this the current recommendation? A library that provides word break and line break functionality should be based on a class MyLocaleInfo, which provides WordBreak and LineBreak classes whose constructors clients should not call, and wordBreak and lineBreak functions that return objects of these classes. An application that uses multiple such libraries (providing different sets of internationalized functionality) has to create objects of all their LocaleInfo classes so that it can request objects of the classes that it actually needs. What value do these LocaleInfo classes add, compared to having constructors of the actually needed classes that can be called directly?

Also, the LocaleInfo API, as currently documented, doesn't provide any information that a third-party internationalization library could use. Some comments sound like there should be a property options, but this property and the derivation of its values aren't actually documented.

Other internationalization libraries have a core that anybody can build on to create internationalization functionality. In Java, for example, the Locale and Currency classes handle a variety of identifier mappings, while the ResourceBundle class handles loading of localized data with fallbacks [3]. In the Yahoo User Interface library, the Intl module does language negotiation and collaborates with the YUI loader in loading localized data [4]. I'd suggest separating similar functionality in LocaleInfo from the formatting and collation functionality and making it available to all. I suspect
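Issue 20 above asks how an application can find out what it actually got when it requests something like ar-MA-u-ca-islamic. As a rough illustration only: the sketch below uses the `Intl` API as it was later standardized in ECMA-402, not the straw man discussed in this thread. `describeResolution` is a hypothetical helper name, and the resolved values depend on the locale data available to the engine running it.

```javascript
// Hypothetical helper: compare a requested language tag with what the
// formatter actually resolved to. Uses the later-standardized
// Intl.DateTimeFormat, which is an assumption relative to the straw man.
function describeResolution(requestedTag) {
  const format = new Intl.DateTimeFormat(requestedTag);
  const resolved = format.resolvedOptions();
  return {
    requested: requestedTag,
    locale: resolved.locale,     // the locale the implementation settled on
    calendar: resolved.calendar, // the calendar actually in use
    exact: resolved.locale.toLowerCase() === requestedTag.toLowerCase(),
  };
}

// The output varies with the engine's locale data, so none is shown here.
console.log(describeResolution("ar-MA-u-ca-islamic"));
```

An application can inspect `locale` and `calendar` to detect a fallback, for example to a Gregorian calendar or to a less specific locale.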
Re: Comments on internationalization API
I have comments on some of these.

Mark

*— Il meglio è l’inimico del bene —*

On Tue, Jul 19, 2011 at 01:29, Norbert Lindenberg ecmascr...@norbertlindenberg.com wrote:

Hi all,

I'm sorry for not having been able to contribute to the internationalization API earlier. I have finally reviewed the straw man [1], and am pleased to see that it contains a good subset of internationalization functionality to start with. Number and date formatting and collation are issues that most applications have to deal with. Collation especially, but also date formatting with support for multiple time zones and calendars, are hard to implement as downloadable libraries.

I have some comments on the details though:

1. In the background section, it might be useful to add that with Node.js, server-side JavaScript is seeing a rebound, and applications don't really want to have to call out to a non-JavaScript server in order to handle basic internationalization.

2. In the goals section, I'd qualify the reuse of objects goal as a reuse of implementation data structures, or even better replace it with measurable performance goals. Reuse of objects that are visible to applications has security and privacy implications, especially when loading third-party code (apps or ads) onto pages [2]. I'd recommend letting applications freely construct Collator, NumberFormat, and DateTimeFormat objects, but have these objects share implementation objects (such as ICU objects) as much as possible. If the API does return shared objects, the security issues need to be dealt with, e.g., by specifying that the shared objects are immutable.

I think it is reasonable to rephrase this as implementation data structures.

3. I'm very uncomfortable with the LocaleInfo class. It seems to pretend to be the central source of all locale-related information, but can't live up to that claim because its design is limited to number and date formatting and collation. Developers will need to create other functionality such as text segmentation, spelling checking, message lookup, shoe size conversion, etc. LocaleInfo appears to perform some magic to derive regions, currencies, and possibly time zones, but doesn't specify it, and makes none of it available to other internationalization classes. It also does duty as a namespace, which looks odd in an ECMAScript standard that otherwise doesn't know namespaces.

I don't think it is ideal; I share some of your qualms about it. However, it is what we were able to compromise on. Because the LocaleInfo class does do the resolution, and that information is available after creation, the information is available for other services. And people could (being ES) hang services off of their own LocaleInfo class.

Other internationalization libraries have a core that anybody can build on to create internationalization functionality. In Java, for example, the Locale and Currency classes handle a variety of identifier mappings, while the ResourceBundle class handles loading of localized data with fallbacks [3]. In the Yahoo User Interface library, the Intl module does language negotiation and collaborates with the YUI loader in loading localized data [4]. I'd suggest separating similar functionality in LocaleInfo from the formatting and collation functionality and making it available to all. I suspect though that some of the current magic will turn out to be misguided when looked at in the clear light of a specification and will need to be discarded.

4. Language IDs in the library should be those of BCP 47, not of Unicode LDML. The two are similar, but there are subtle differences, as described in the LDML spec: LDML excludes some BCP 47 tags and subtags, adds a separator and the root locale, and changes the semantics of some tags [5]. Since BCP 47 is the dominant standard for language identification, internationalized applications have to support it. If an implementation of the internationalization API is based on LDML, it should handle the mapping from/to BCP 47 itself rather than burdening applications with it.

Every LDML language ID is also a BCP 47 language tag. LDML eliminates some of the deadwood in BCP 47 (the old irregular forms) but has the same expressive power and somewhat more. There are some codes that are not defined in BCP 47 that turn out to be very important for implementations, like the Unknown region. I'm well familiar with both, being an author of each.

5. The specification mentions that a few Unicode extensions in BCP 47 (-u-ca-, -u-co-, can be used for specific purposes, but is silent on whether other extensions are encouraged/allowed/ignored/illegal. This should be clarified.

Agreed. What it should add is one line saying that the implementation of any other BCP 47 extensions is implementation dependent.

6. Region IDs should be those of ISO 3166. The straw man references LDML region subtags instead; I haven't been able to find a definition
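To make comment 5 concrete, here is a minimal sketch of how an implementation or application might pull the -u- extension keywords out of a language tag and separate the ones it recognizes from the rest. `unicodeExtensionKeywords` and the `recognized` set are hypothetical names for illustration; only -u-ca- and -u-co- are taken from the thread. The parsing assumes a well-formed tag and the usual layout of two-character keys followed by longer type subtags.

```javascript
// Hypothetical helper: extract key/type pairs from the -u- extension of a
// BCP 47 language tag. Assumes a well-formed tag; does not validate it.
function unicodeExtensionKeywords(tag) {
  const subtags = tag.toLowerCase().split("-");
  const keywords = {};

  // Find the "u" singleton, ignoring anything inside a private-use (-x-) tail.
  let start = -1;
  for (let i = 1; i < subtags.length; i++) {
    if (subtags[i] === "x") break;
    if (subtags[i] === "u") { start = i; break; }
  }
  if (start === -1) return keywords;

  let key = null;
  for (let i = start + 1; i < subtags.length; i++) {
    const subtag = subtags[i];
    if (subtag.length === 1) break;   // another singleton ends the -u- extension
    if (subtag.length === 2) {        // two-character subtags are keys
      key = subtag;
      keywords[key] = "";
    } else if (key !== null) {        // longer subtags extend the current type
      keywords[key] = keywords[key] ? keywords[key] + "-" + subtag : subtag;
    }
    // Longer subtags before the first key are attributes; ignored in this sketch.
  }
  return keywords;
}

// Keys named in the thread; what happens to any others is the open question.
const recognized = new Set(["ca", "co"]);
const parsed = unicodeExtensionKeywords("de-DE-u-co-phonebk-nu-latn");
const unrecognized = Object.keys(parsed).filter((k) => !recognized.has(k));
console.log(parsed);       // { co: 'phonebk', nu: 'latn' }
console.log(unrecognized); // [ 'nu' ]
```

Whether the unrecognized keys are ignored, rejected, or passed through is exactly what the comment asks the specification to state.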
Comments on internationalization API
Hi all,

I'm sorry for not having been able to contribute to the internationalization API earlier. I have finally reviewed the straw man [1], and am pleased to see that it contains a good subset of internationalization functionality to start with. Number and date formatting and collation are issues that most applications have to deal with. Collation especially, but also date formatting with support for multiple time zones and calendars, are hard to implement as downloadable libraries.

I have some comments on the details though:

1. In the background section, it might be useful to add that with Node.js, server-side JavaScript is seeing a rebound, and applications don't really want to have to call out to a non-JavaScript server in order to handle basic internationalization.

2. In the goals section, I'd qualify the reuse of objects goal as a reuse of implementation data structures, or even better replace it with measurable performance goals. Reuse of objects that are visible to applications has security and privacy implications, especially when loading third-party code (apps or ads) onto pages [2]. I'd recommend letting applications freely construct Collator, NumberFormat, and DateTimeFormat objects, but have these objects share implementation objects (such as ICU objects) as much as possible. If the API does return shared objects, the security issues need to be dealt with, e.g., by specifying that the shared objects are immutable.

3. I'm very uncomfortable with the LocaleInfo class. It seems to pretend to be the central source of all locale-related information, but can't live up to that claim because its design is limited to number and date formatting and collation. Developers will need to create other functionality such as text segmentation, spelling checking, message lookup, shoe size conversion, etc. LocaleInfo appears to perform some magic to derive regions, currencies, and possibly time zones, but doesn't specify it, and makes none of it available to other internationalization classes. It also does duty as a namespace, which looks odd in an ECMAScript standard that otherwise doesn't know namespaces.

Other internationalization libraries have a core that anybody can build on to create internationalization functionality. In Java, for example, the Locale and Currency classes handle a variety of identifier mappings, while the ResourceBundle class handles loading of localized data with fallbacks [3]. In the Yahoo User Interface library, the Intl module does language negotiation and collaborates with the YUI loader in loading localized data [4]. I'd suggest separating similar functionality in LocaleInfo from the formatting and collation functionality and making it available to all. I suspect though that some of the current magic will turn out to be misguided when looked at in the clear light of a specification and will need to be discarded.

4. Language IDs in the library should be those of BCP 47, not of Unicode LDML. The two are similar, but there are subtle differences, as described in the LDML spec: LDML excludes some BCP 47 tags and subtags, adds a separator and the root locale, and changes the semantics of some tags [5]. Since BCP 47 is the dominant standard for language identification, internationalized applications have to support it. If an implementation of the internationalization API is based on LDML, it should handle the mapping from/to BCP 47 itself rather than burdening applications with it.

5. The specification mentions that a few Unicode extensions in BCP 47 (-u-ca-, -u-co-, can be used for specific purposes, but is silent on whether other extensions are encouraged/allowed/ignored/illegal. This should be clarified.

6. Region IDs should be those of ISO 3166. The straw man references LDML region subtags instead; I haven't been able to find a definition of this term. If ZZ is really necessary for the API, then it should be called out directly in the API spec. But what information does ZZ convey that ECMAScript's undefined doesn't?

7. The priority list matching algorithm is not well specified. It doesn't seem to match the BCP 47 Lookup algorithm [6], however, and I'd expect that algorithm to be available at least as a baseline (enhancements might be offered as well).

8. The specifications of NumberFormat and DateTimeFormat list several optional features: support for scientific notation in NumberFormat; support for various styles and skeletons in DateTimeFormat. How can applications find out which of these optional features are supported by an actual implementation?

9. Currency formatting should require applications to explicitly specify the currency, using an ISO 4217 currency code, when constructing a currency number format. Currencies are really part of the value; they're not a presentation preference. Imagine a European e-commerce site calculating its prices in euro, but then displaying the values with the Korean won symbol just
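For reference on comment 7, the Lookup algorithm of BCP 47 (RFC 4647, section 3.4) can be sketched in a few lines: each range in the priority list is progressively truncated from the end until it matches an available tag, and a default is returned if nothing matches. The function name `lookup` and its parameters are assumptions for illustration, not part of the straw man, and the sketch skips validation of the tags it is given.

```javascript
// Simplified sketch of RFC 4647 section 3.4 "Lookup".
// priorityList: language ranges in order of preference.
// availableTags: language tags the implementation supports.
// defaultTag: returned when no range matches.
function lookup(priorityList, availableTags, defaultTag) {
  // Match case-insensitively, but return the tag as the caller spelled it.
  const available = new Map(availableTags.map((tag) => [tag.toLowerCase(), tag]));

  for (const range of priorityList) {
    if (range === "*") continue; // the wildcard range is skipped in lookup
    let candidate = range.toLowerCase();
    while (candidate.length > 0) {
      if (available.has(candidate)) return available.get(candidate);
      const cut = candidate.lastIndexOf("-");
      if (cut === -1) break; // nothing left to truncate; try the next range
      candidate = candidate.slice(0, cut);
      // If truncation leaves a single-character subtag at the end
      // (such as "-u" or "-x"), remove that subtag as well.
      if (candidate.length >= 2 && candidate.charAt(candidate.length - 2) === "-") {
        candidate = candidate.slice(0, -2);
      }
    }
  }
  return defaultTag;
}

console.log(lookup(["ar-MA-u-ca-islamic", "fr"], ["ar", "ar-MA", "en"], "en")); // "ar-MA"
console.log(lookup(["zh-Hant-CN-x-private1"], ["zh", "zh-Hant"], "en"));        // "zh-Hant"
console.log(lookup(["de-CH"], ["fr", "en"], "en"));                              // "en"
```

An enhanced matcher could be layered on top of this baseline, as the comment suggests, while applications that need predictable behavior rely on the plain lookup.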