Re: Comments on internationalization API

2011-07-22 Thread Mark Davis ☕
You make some good points (and many that I agree with), but the main issue
is that we are having to produce a model that all the browser vendors can
sign up to. That necessitates some compromises, including some areas where
we can't have a concrete specification because the implementors want the
freedom to implement the functionality in different ways.

If you want to engage more, there is a F2F next week. Cira can get you
details.

Mark
*— Il meglio è l’inimico del bene —*


On Thu, Jul 21, 2011 at 17:14, Norbert Lindenberg 
ecmascr...@norbertlindenberg.com wrote:

 Hi Mark,

 Thanks for your comments! Replies to some of them below. I also noticed
 some additional issues:

 19. DateTimeFormat.prototype.getMonths needs a second parameter {boolean}
 standalone, default value false.

 20. There needs to be a way to determine the actual language, region, and
 options of a Collator, NumberFormat, or DateTimeFormat. E.g., if I request
 ar-MA-u-ca-islamic, did I get exactly what I requested, or
 ar-MA-u-ca-islamicc, ar-MA-u-ca-gregory, ar-u-ca-gregory, or yet something
 else?

 Best regards,
 Norbert


 On Jul 20, 2011, at 9:46 , Mark Davis ☕ wrote:

  I have comments on some of these.
 
  Mark
  — Il meglio è l’inimico del bene —
 
 
  On Tue, Jul 19, 2011 at 01:29, Norbert Lindenberg 
 ecmascr...@norbertlindenberg.com wrote:
  Hi all,
 
  I'm sorry for not having been able to contribute to the
 internationalization API earlier. I finally have reviewed the straw man [1],
 and am pleased to see that it contains a good subset of internationalization
 functionality to start with. Number and date formatting and collation are
 issues that most applications have to deal with. Collation especially, but
 also date formatting with support for multiple time zones and calendars are
 hard to implement as downloadable libraries.
 
  I have some comments on the details though:
 
  1. In the background section, it might be useful to add that with
 Node.js server-side JavaScript is seeing a rebound, and applications don't
 really want to have to call out to a non-JavaScript server in order to
 handle basic internationalization.
 
  2. In the goals section, I'd qualify the reuse of objects goal as a
 reuse of implementation data structures, or even better replace it with
 measurable performance goals. Reuse of objects that are visible to
 applications has security and privacy implications, especially when loading
 third party code (apps or ads) onto pages [2]. I'd recommend letting
 applications freely construct Collator, NumberFormat, and DateTimeFormat
 objects, but have these objects share implementation objects (such as ICU
 objects) as much as possible. If the API does return shared objects, the
 security issues need to be dealt with, e.g., by specifying that the shared
 objects are immutable.
 
  I think it is reasonable to rephrase this as implementation data
 structures.
 
  3. I'm very uncomfortable with the LocaleInfo class. It seems to pretend
 to be the central source of all locale-related information, but can't live
 up to that claim because its design is limited to number and date formatting
 and collation. Developers will need to create other functionality such as
 text segmentation, spelling checking, message lookup, shoe size conversion,
 etc. LocaleInfo appears to perform some magic to derive regions, currencies,
 and possibly time zones, but doesn't specify it, and makes none of it
 available to other internationalization classes. It also does duty as a
 namespace, which looks odd in an EcmaScript standard that otherwise doesn't
 know namespaces.
 
  I don't think it is ideal; I share some of your qualms about it. However,
 it is what we were able to compromise on. Because the LocaleInfo class does
 do the resolution, and that information is available after creation, the
 information is available for other services. And people could (being ES)
 hang services off of their own LocaleInfo class.

 So is this the current recommendation? A library that provides word break
 and line break functionality should be based on a class MyLocaleInfo, which
 provides WordBreak and LineBreak classes whose constructors clients should
 not call, and wordBreak and lineBreak functions that return objects of these
 classes. An application that uses multiple such libraries (providing
 different sets of internationalized functionality) has to create objects of
 all their LocaleInfo classes so that it can request objects of the classes
 that it actually needs.

 What value do these LocaleInfo classes add, compared to having constructors
 of the actually needed classes that can be called directly?

 Also, the LocaleInfo API, as currently documented, doesn't provide any
 information that a third party internationalization library could use. Some
 comments sound like there should be a property options, but this property
 and the derivation of its values aren't actually documented.

  Other internationalization libraries have a core 

Re: Comments on internationalization API

2011-07-21 Thread Norbert Lindenberg
Hi Mark,

Thanks for your comments! Replies to some of them below. I also noticed some 
additional issues:

19. DateTimeFormat.prototype.getMonths needs a second parameter {boolean} 
standalone, default value false.

20. There needs to be a way to determine the actual language, region, and 
options of a Collator, NumberFormat, or DateTimeFormat. E.g., if I request 
ar-MA-u-ca-islamic, did I get exactly what I requested, or ar-MA-u-ca-islamicc, 
ar-MA-u-ca-gregory, ar-u-ca-gregory, or yet something else?
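
To make item 20 concrete: the API as later standardized in ECMA-402 exposes a resolvedOptions() method on Collator, NumberFormat, and DateTimeFormat objects. The sketch below assumes a runtime that provides that later Intl object, which is not the straw man API discussed in this thread. describeResolution is a hypothetical helper name, and the values it reports depend on the locale data the runtime ships.

```javascript
// Hypothetical helper: report what a DateTimeFormat actually resolved to.
// Assumes the ECMA-402 Intl object, not the 2011 straw man API.
function describeResolution(requestedTag) {
  const format = new Intl.DateTimeFormat(requestedTag);
  const resolved = format.resolvedOptions();
  return {
    requested: requestedTag,
    locale: resolved.locale,     // tag the implementation settled on
    calendar: resolved.calendar, // calendar actually in use
    exact: resolved.locale === requestedTag,
  };
}

console.log(describeResolution("ar-MA-u-ca-islamic"));
```

Comparing the requested tag with the reported one is only a rough check; an implementation may canonicalize a tag without changing its meaning.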

Best regards,
Norbert


On Jul 20, 2011, at 9:46 , Mark Davis ☕ wrote:

 I have comments on some of these.
 
 Mark
 — Il meglio è l’inimico del bene —
 
 
 On Tue, Jul 19, 2011 at 01:29, Norbert Lindenberg 
 ecmascr...@norbertlindenberg.com wrote:
 Hi all,
 
 I'm sorry for not having been able to contribute to the internationalization 
 API earlier. I finally have reviewed the straw man [1], and am pleased to 
 see that it contains a good subset of internationalization functionality to 
 start with. Number and date formatting and collation are issues that most 
 applications have to deal with. Collation especially, but also date 
 formatting with support for multiple time zones and calendars are hard to 
 implement as downloadable libraries.
 
 I have some comments on the details though:
 
 1. In the background section, it might be useful to add that with Node.js 
 server-side JavaScript is seeing a rebound, and applications don't really 
 want to have to call out to a non-JavaScript server in order to handle basic 
 internationalization.
 
 2. In the goals section, I'd qualify the reuse of objects goal as a reuse 
 of implementation data structures, or even better replace it with measurable 
 performance goals. Reuse of objects that are visible to applications has 
 security and privacy implications, especially when loading third party code 
 (apps or ads) onto pages [2]. I'd recommend letting applications freely 
 construct Collator, NumberFormat, and DateTimeFormat objects, but have these 
 objects share implementation objects (such as ICU objects) as much as 
 possible. If the API does return shared objects, the security issues need to 
 be dealt with, e.g., by specifying that the shared objects are immutable.
 
 I think it is reasonable to rephrase this as implementation data structures.
 
 3. I'm very uncomfortable with the LocaleInfo class. It seems to pretend
 to be the central source of all locale-related information, but can't live
 up to that claim because its design is limited to number and date formatting 
 and collation. Developers will need to create other functionality such as 
 text segmentation, spelling checking, message lookup, shoe size conversion, 
 etc. LocaleInfo appears to perform some magic to derive regions, currencies, 
 and possibly time zones, but doesn't specify it, and makes none of it 
 available to other internationalization classes. It also does duty as a 
 namespace, which looks odd in an EcmaScript standard that otherwise doesn't 
 know namespaces.
 
 I don't think it is ideal; I share some of your qualms about it. However, it 
 is what we were able to compromise on. Because the LocaleInfo class does do 
 the resolution, and that information is available after creation, the 
 information is available for other services. And people could (being ES) hang 
 services off of their own LocaleInfo class.

So is this the current recommendation? A library that provides word break and
line break functionality should be based on a class MyLocaleInfo, which 
provides WordBreak and LineBreak classes whose constructors clients should not 
call, and wordBreak and lineBreak functions that return objects of these 
classes. An application that uses multiple such libraries (providing different 
sets of internationalized functionality) has to create objects of all their 
LocaleInfo classes so that it can request objects of the classes that it 
actually needs.

What value do these LocaleInfo classes add, compared to having constructors of 
the actually needed classes that can be called directly?

Also, the LocaleInfo API, as currently documented, doesn't provide any 
information that a third party internationalization library could use. Some 
comments sound like there should be a property options, but this property and 
the derivation of its values aren't actually documented.

 Other internationalization libraries have a core that anybody can build on 
 to create internationalization functionality. In Java, for example, the 
 Locale and Currency classes handle a variety of identifier mappings, while
 the ResourceBundle class handles loading of localized data with fallbacks 
 [3]. In the Yahoo User Interface library, the Intl module does language 
 negotiation and collaborates with the YUI loader in loading localized data 
 [4]. I'd suggest separating similar functionality in LocaleInfo from the 
 formatting and collation functionality and making it available to all. I 
 suspect 

Re: Comments on internationalization API

2011-07-20 Thread Mark Davis ☕
I have comments on some of these.

Mark
*— Il meglio è l’inimico del bene —*


On Tue, Jul 19, 2011 at 01:29, Norbert Lindenberg 
ecmascr...@norbertlindenberg.com wrote:

 Hi all,

 I'm sorry for not having been able to contribute to the
 internationalization API earlier. I finally have reviewed the straw man [1],
 and am pleased to see that it contains a good subset of internationalization
 functionality to start with. Number and date formatting and collation are
 issues that most applications have to deal with. Collation especially, but
 also date formatting with support for multiple time zones and calendars are
 hard to implement as downloadable libraries.

 I have some comments on the details though:

 1. In the background section, it might be useful to add that with Node.js
 server-side JavaScript is seeing a rebound, and applications don't really
 want to have to call out to a non-JavaScript server in order to handle basic
 internationalization.

 2. In the goals section, I'd qualify the reuse of objects goal as a reuse
 of implementation data structures, or even better replace it with measurable
 performance goals. Reuse of objects that are visible to applications has
 security and privacy implications, especially when loading third party code
 (apps or ads) onto pages [2]. I'd recommend letting applications freely
 construct Collator, NumberFormat, and DateTimeFormat objects, but have these
 objects share implementation objects (such as ICU objects) as much as
 possible. If the API does return shared objects, the security issues need to
 be dealt with, e.g., by specifying that the shared objects are immutable.


I think it is reasonable to rephrase this as implementation data
structures.


 3. I'm very uncomfortable with the LocaleInfo class. It seems to pretend
 to be the central source of all locale-related information, but can't live
 up to that claim because its design is limited to number and date formatting
 and collation. Developers will need to create other functionality such as
 text segmentation, spelling checking, message lookup, shoe size conversion,
 etc. LocaleInfo appears to perform some magic to derive regions, currencies,
 and possibly time zones, but doesn't specify it, and makes none of it
 available to other internationalization classes. It also does duty as a
 namespace, which looks odd in an EcmaScript standard that otherwise doesn't
 know namespaces.


I don't think it is ideal; I share some of your qualms about it. However, it
is what we were able to compromise on. Because the LocaleInfo class does do
the resolution, and that information is available after creation, the
information is available for other services. And people could (being ES)
hang services off of their own LocaleInfo class.



 Other internationalization libraries have a core that anybody can build on
 to create internationalization functionality. In Java, for example, the
 Locale and Currency classes handle a variety of identifier mappings, while
 the ResourceBundle class handles loading of localized data with fallbacks
 [3]. In the Yahoo User Interface library, the Intl module does language
 negotiation and collaborates with the YUI loader in loading localized data
 [4]. I'd suggest separating similar functionality in LocaleInfo from the
 formatting and collation functionality and making it available to all. I
 suspect though that some of the current magic will turn out to be misguided
 when looked at in the clear light of a specification and will need to be
 discarded.

 4. Language IDs in the library should be those of BCP 47, not of Unicode
 LDML. The two are similar, but there are subtle differences, as described in
 the LDML spec: LDML excludes some BCP 47 tags and subtags, adds a separator
 and the root locale, and changes the semantics of some tags [5]. Since BCP
 47 is the dominant standard for language identification, internationalized
 applications have to support it. If an implementation of the
 internationalization API is based on LDML, it should handle the mapping
 from/to BCP 47 itself rather than burdening applications with it.


Every LDML language ID is also a BCP 47 language tag. LDML eliminates some
of the deadwood in BCP47 (the old irregular forms) but has the same
expressive power and somewhat more. There are some codes that are not
defined in BCP47 that turn out to be very important for implementations,
like the Unknown region.

I'm well familiar with both, being an author of each.


 5. The specification mentions that a few Unicode extensions in BCP 47
 (-u-ca-, -u-co-) can be used for specific purposes, but is silent on whether
 other extensions are encouraged/allowed/ignored/illegal. This should be
 clarified.


Agreed. What it should add is one line saying that the implementation of any
other BCP47 extensions is implementation dependent.



 6. Region IDs should be those of ISO 3166. The straw man references LDML
 region subtags instead; I haven't been able to find a definition 

Comments on internationalization API

2011-07-19 Thread Norbert Lindenberg
Hi all,

I'm sorry for not having been able to contribute to the internationalization 
API earlier. I finally have reviewed the straw man [1], and am pleased to see 
that it contains a good subset of internationalization functionality to start 
with. Number and date formatting and collation are issues that most 
applications have to deal with. Collation especially, but also date formatting 
with support for multiple time zones and calendars are hard to implement as 
downloadable libraries.

I have some comments on the details though:

1. In the background section, it might be useful to add that with Node.js 
server-side JavaScript is seeing a rebound, and applications don't really want 
to have to call out to a non-JavaScript server in order to handle basic 
internationalization.

2. In the goals section, I'd qualify the reuse of objects goal as a reuse of 
implementation data structures, or even better replace it with measurable 
performance goals. Reuse of objects that are visible to applications has 
security and privacy implications, especially when loading third party code 
(apps or ads) onto pages [2]. I'd recommend letting applications freely 
construct Collator, NumberFormat, and DateTimeFormat objects, but have these 
objects share implementation objects (such as ICU objects) as much as possible. 
If the API does return shared objects, the security issues need to be dealt 
with, e.g., by specifying that the shared objects are immutable.
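
A minimal sketch of the arrangement suggested in item 2, under stated assumptions: SketchNumberFormat, getImplementation, and loadImplementationData are hypothetical names, and the "implementation data" is a toy stand-in for something like an ICU object. Each application-visible object is constructed fresh, while the expensive record behind it is cached, frozen, and kept out of reach of scripts.

```javascript
// Hypothetical sketch: one cached, frozen implementation record per locale,
// wrapped by application-visible objects that are never shared.
const implementationCache = new Map();

function loadImplementationData(locale) {
  // Toy stand-in for an expensive lookup (for example, opening an ICU object).
  const decimalSeparator = locale.startsWith("de") ? "," : ".";
  return Object.freeze({ locale, decimalSeparator });
}

function getImplementation(locale) {
  if (!implementationCache.has(locale)) {
    implementationCache.set(locale, loadImplementationData(locale));
  }
  return implementationCache.get(locale);
}

class SketchNumberFormat {
  #impl; // private field: the shared record is not reachable from outside

  constructor(locale) {
    this.#impl = getImplementation(locale);
  }

  format(value) {
    return String(value).replace(".", this.#impl.decimalSeparator);
  }
}

const a = new SketchNumberFormat("de");
const b = new SketchNumberFormat("de");
a.note = "visible only on a"; // does not leak to b
console.log(a === b, b.note, a.format(1.5));
```

Because a and b are distinct objects, a property that one script attaches to its formatter cannot be observed through another script's formatter, yet both reuse the same cached record.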

3. I'm very uncomfortable with the LocaleInfo class. It seems to pretend to be
the central source of all locale-related information, but can't live up to that
claim because its design is limited to number and date formatting and 
collation. Developers will need to create other functionality such as text 
segmentation, spelling checking, message lookup, shoe size conversion, etc. 
LocaleInfo appears to perform some magic to derive regions, currencies, and 
possibly time zones, but doesn't specify it, and makes none of it available to 
other internationalization classes. It also does duty as a namespace, which 
looks odd in an EcmaScript standard that otherwise doesn't know namespaces.

Other internationalization libraries have a core that anybody can build on to 
create internationalization functionality. In Java, for example, the Locale and 
Currency classes handle a variety of identifier mappings, while the
ResourceBundle class handles loading of localized data with fallbacks [3]. In 
the Yahoo User Interface library, the Intl module does language negotiation and 
collaborates with the YUI loader in loading localized data [4]. I'd suggest 
separating similar functionality in LocaleInfo from the formatting and 
collation functionality and making it available to all. I suspect though that 
some of the current magic will turn out to be misguided when looked at in the 
clear light of a specification and will need to be discarded.

4. Language IDs in the library should be those of BCP 47, not of Unicode LDML. 
The two are similar, but there are subtle differences, as described in the LDML 
spec: LDML excludes some BCP 47 tags and subtags, adds a separator and the root 
locale, and changes the semantics of some tags [5]. Since BCP 47 is the 
dominant standard for language identification, internationalized applications 
have to support it. If an implementation of the internationalization API is 
based on LDML, it should handle the mapping from/to BCP 47 itself rather than 
burdening applications with it.
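
As a rough illustration of the mapping an implementation could do internally, the sketch below covers only two of the differences named in item 4: the "_" separator and the root locale, which corresponds to the BCP 47 language subtag "und". ldmlToBcp47 is a hypothetical helper name; the excluded tags and the changed semantics of some tags are not handled here.

```javascript
// Hypothetical helper covering only the separator and the root locale.
// It does not handle excluded tags or tags whose semantics differ.
function ldmlToBcp47(ldmlId) {
  const subtags = ldmlId.split(/[_-]/);
  if (subtags[0].toLowerCase() === "root") {
    subtags[0] = "und"; // BCP 47 spells the undetermined language "und"
  }
  return subtags.join("-");
}

console.log(ldmlToBcp47("zh_Hant_TW"), ldmlToBcp47("root"));
```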

5. The specification mentions that a few Unicode extensions in BCP 47 (-u-ca-,
-u-co-) can be used for specific purposes, but is silent on whether other
extensions are encouraged/allowed/ignored/illegal. This should be clarified.
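
To show what is at stake in item 5, the sketch below pulls the keywords of the "u" extension out of a tag and lists the keys other than ca and co. Both function names are hypothetical, the input is assumed to be a well-formed tag (no validation is done), and the sketch takes no position on whether the extra keys should be honored, ignored, or rejected.

```javascript
// Hypothetical helper: collect the keywords of the BCP 47 "u" extension.
// Assumes a well-formed tag; it does no validation.
function unicodeExtensionKeywords(tag) {
  const subtags = tag.toLowerCase().split("-");
  const keywords = {};
  let inUnicodeExtension = false;
  let currentKey = null;
  for (const subtag of subtags.slice(1)) {
    if (subtag.length === 1) {
      if (subtag === "x") break;            // private use runs to the end of the tag
      inUnicodeExtension = subtag === "u";  // another singleton starts another extension
      currentKey = null;
    } else if (inUnicodeExtension) {
      if (subtag.length === 2) {
        currentKey = subtag;                // two-character subtags are keys
        keywords[currentKey] = "";
      } else if (currentKey !== null) {
        // longer subtags extend the value of the current key
        keywords[currentKey] += (keywords[currentKey] ? "-" : "") + subtag;
      }
    }
  }
  return keywords;
}

// Keys present in the tag that are not in the recognized list.
function unrecognizedKeys(tag, recognized = ["ca", "co"]) {
  return Object.keys(unicodeExtensionKeywords(tag)).filter(
    (key) => !recognized.includes(key)
  );
}

console.log(unicodeExtensionKeywords("ar-MA-u-ca-islamic"));
console.log(unrecognizedKeys("de-u-co-phonebk-nu-latn"));
```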

6. Region IDs should be those of ISO 3166. The straw man references LDML 
region subtags instead; I haven't been able to find a definition of this term. 
If ZZ is really necessary for the API, then it should be called out directly 
in the API spec. But what information does ZZ convey that EcmaScript's 
undefined doesn't?

7. The priority list matching algorithm is not well specified. It doesn't seem 
to match the BCP 47 Lookup algorithm however [6], and I'd expect that algorithm 
to be available at least as a baseline (enhancements might be offered as well).
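
For reference, the baseline asked for in item 7 can be sketched in a few lines. This follows the Lookup scheme of RFC 4647 (part of BCP 47): each range in the priority list is progressively truncated from the end, and a single-character subtag is dropped together with the subtag that followed it. The function name and parameters are hypothetical, and as a simplification the wildcard range "*" is always skipped.

```javascript
// Sketch of RFC 4647 Lookup. priorityList holds the user's language ranges,
// available holds the tags the implementation supports.
function lookup(priorityList, available, defaultTag) {
  // Compare case-insensitively, but return the tag as the implementation spells it.
  const canonical = new Map(available.map((tag) => [tag.toLowerCase(), tag]));
  for (const range of priorityList) {
    if (range === "*") continue; // simplification: the wildcard carries no preference
    let candidate = range.toLowerCase();
    while (candidate.length > 0) {
      if (canonical.has(candidate)) return canonical.get(candidate);
      const cut = candidate.lastIndexOf("-");
      if (cut < 0) break;
      candidate = candidate.slice(0, cut);
      // A trailing single-character subtag (such as "x" or "u") goes too.
      if (candidate.length >= 2 && candidate.charAt(candidate.length - 2) === "-") {
        candidate = candidate.slice(0, -2);
      }
    }
  }
  return defaultTag;
}

console.log(lookup(["de-CH-1996", "fr"], ["de", "fr", "en"], "en"));
```

The first range is exhausted before the second is considered, so in this call "de-CH-1996" falls back through "de-CH" to "de" without "fr" ever being tried.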

8. The specifications of NumberFormat and DateTimeFormat list several optional 
features: Support for scientific notation in NumberFormat; support for various 
styles and skeletons in DateTimeFormat. How can applications find out which of 
these optional features are supported by an actual implementation?

9. Currency formatting should require applications to explicitly specify the 
currency, using an ISO 4217 currency code, when constructing a currency number 
format. Currencies are really part of the value; they're not a presentation 
preference. Imagine a European e-commerce site calculating its prices in euro, 
but then displaying the values with the Korean won symbol just