RE: [IndexedDB] Spec changes for international language support

2011-03-22 Thread Pablo Castro

From: keean.schu...@googlemail.com [mailto:keean.schu...@googlemail.com] On 
Behalf Of Keean Schupke
Sent: Friday, March 18, 2011 8:17 PM

 On 18 March 2011 19:29, Pablo Castro pablo.cas...@microsoft.com wrote:

 From: keean.schu...@googlemail.com [mailto:keean.schu...@googlemail.com] On 
 Behalf Of Keean Schupke
 Sent: Friday, March 18, 2011 1:53 AM

  See my proposal in another thread. The basic idea is to copy BDB. Have a 
  primary index that is based on an integer, something primitive and fast. 
  Allow secondary indexes which use a callback to generate a binary index 
  key. IDB shifts the complexity out into a library. Common use cases can 
  be provided (a hash of all fields in the object, internationalised 
  bidirectional lexicographic etc...), but the user is free to write their 
  own for less usual cases (for example indexing by the last word in a name 
  string to order by surname).
I agree with Jeremy's comments on the other thread for this. Having the 
callback mechanism definitely sounds interesting but there are a ton of common 
cases that we can solve by just taking a language identifier, I'm not sure we 
want to make people work hard to get something that's already supported in most 
systems. The idea of having a callback to compute the index value feels 
incremental to this, so we could take it on later without disrupting the 
explicit international collation stuff.

 The idea would be to provide pre-defined implementations of the callback for 
 common use cases, then it is just as simple to register a callback as set 
 any other option. All this means to the API is you pass a function instead 
 of a string. It also is better for modularity as all the code relating to 
 the sort order is kept in the callback functions.

 The difference comes down to something like:

 index.set_order_lexicographic('us');

 vs

 index.set_order_method(order_lexicographic('us'));

 So more than just setting a property like the first case, where presumably 
 all the ordering code is mixed in with the indexing code, the second case 
 encapsulates all the ordering code in the function returned from the 
 execution of order_lexicographic('us'). This function would represent a 
 mapping from the object being indexed to a binary blob that is the actual 
 stored index data.

 So doing it this way does not necessarily make things harder, and it 
 improves the encapsulation, type safety, and flexibility of the API.

Yep, we talked about supporting callbacks already in the other threads and in 
this one. As I mentioned before, I think this is incremental to the basic 
feature of taking a collation name. I do realize you can just pass a 
pre-implemented function, but that opens the door to a bunch of things we'd 
need to handle, including possibly storing code in the database (such 
that proper updates don't depend on each page re-registering all the index 
callbacks), handling scripts with the appropriate context to run during index 
updates, etc.  I would much rather have basic functionality in place and then 
expand as needed once we have users using the API.

Thanks
-pablo




Re: [IndexedDB] Spec changes for international language support

2011-03-22 Thread Jonas Sicking
On Tue, Mar 22, 2011 at 6:13 PM, Pablo Castro
pablo.cas...@microsoft.com wrote:


 From: keean.schu...@googlemail.com [mailto:keean.schu...@googlemail.com] On 
 Behalf Of Keean Schupke
 Sent: Tuesday, March 22, 2011 5:34 PM

 IMHO not the job of Idb to store the callbacks, so I don't see this 
 complexity as a reason not to implement the API using callbacks. I think 
 having one consistent API is more important.
 Specifying the collation 'name' has all the same problems as callbacks 
 (needs to be re-done on every page, possibility of using different 
 collations on different pages).
 Really a 'function' is just a symbol for a collation. A function name is a 
 better symbol for a collation than a string. Functions have a uniqueness 
 property strings do not. So specifying a function as the collation 
 instead of a string really is the same thing. Consider below:

 I don't think it's the same. If we don't store the callbacks in the database 
 it means every page has to have full knowledge of the database schema (at 
 least all the indexes) all the time, instead of just pulling that in on 
 demand when needed. It also means we can never allow browser developer tools 
 or generic dev-tool-webpages to modify the database because indexes would 
 become invalid (not sure allowing tools to mess with the database in general 
 is a good idea, but I thought it illustrated the point well).

 I wonder if the overall issue we're discussing has to do with how embedded 
 the database is. In BDB scenarios where the database is completely invisible 
 outside of an application many of these decisions make more sense. I don't 
 think of web applications that way. I think of them more as a number of 
 building blocks (pages, pieces within pages, tool pages added on the side) 
 that are authored and sometimes even versioned independently, and the 
 interface between those building blocks and the store is public and visible 
 to tools and generic data browsers. All that changes the assumptions in the 
 overall picture.

Yup. I agree with Pablo here.

/ Jonas



Re: [IndexedDB] Spec changes for international language support

2011-03-18 Thread Keean Schupke
See my proposal in another thread. The basic idea is to copy BDB. Have a
primary index that is based on an integer, something primitive and fast.
Allow secondary indexes which use a callback to generate a binary index key.
IDB shifts the complexity out into a library. Common use cases can be
provided (a hash of all fields in the object, internationalised
bidirectional lexicographic etc...), but the user is free to write their own
for less usual cases (for example indexing by the last word in a name string
to order by surname).


Cheers,
Keean.


On 18 March 2011 02:19, Jonas Sicking jo...@sicking.cc wrote:

 2011/3/17 Pablo Castro pablo.cas...@microsoft.com:
 
  From: Jonas Sicking [mailto:jo...@sicking.cc]
  Sent: Tuesday, March 08, 2011 1:11 PM
 
  All in all, is there anything preventing adding the API Pablo suggests
  in this thread to the IndexedDB spec drafts?
 
  I wanted to propose a couple of specific tweaks to the initial proposal
 and then unless I hear pushback start editing this into the spec.
 
  From reading the details on this thread I'm starting to realize that
 per-database collations won't do it. What did it for me was the example that
 has a fuzzier matching mode (case/accent insensitive). This is exactly the
 kind of index I would want to sort people's names in my address book, but
 most likely not the index I'll want to use for my primary key.
 
  Refactoring the API to accommodate for this would mean to move the
 setCollation() method and the collation property to the object store and
 index objects. If we were willing to live without the ability to change them
 we could take collation as one of the optional parameters to
 createObjectStore()/createIndex() and reduce a bit of surface area...

 Unfortunately I think you bring up good use cases for
 per-objectStore/index collations. It's definitely tempting to just add
 it as an optional parameter to createObjectStore/createIndex. The
 downside is obviously pushing more complexity onto web developers.
 Complexity which will be duplicated across sites.

 However there is another problem to consider here. Can switching
 collation on an objectStore or a unique index affect its validity?
 I.e. if you switch from a case sensitive to a case insensitive
 collation, does that mean that if you have two entries with the
 primary keys Sweden and sweden they collide and thus the change of
 collation must result in an error (or aborted transaction)?

 I do seem to recall that there are ways to do at least case
 sensitivity such that you generally don't take case into account when
 sorting, unless two entries are exactly the same, in which case you do
 look at casing to differentiate them. However I don't really know a
 whole lot about this and so defer to people that know
 internationalization better.

  I don't have a strong preference there. In any case both would use BCP47
 names as discussed in this thread (as Jonas pointed out, implementations can
 also do their thing as long as they don't interfere with BCP47).
 
  Another piece of feedback I heard consistently as I discussed this with
 various folks at Microsoft is the need to be able to pick up what the UA
 would consider the collation that's most appropriate for the user
 environment (derived from settings, page language or whatever). We could
 support this by introducing a special value that  you can pass to
 setCollation that indicates pick whatever is the right for the
 environment's language right now. Given that there is no other way for
 people to discover the user preference on this, I think this is pretty
 important.

 I would be fine with this as long as it's an explicit opt-in. There is
 definitely a risk that people will do this and then only do testing in
 one language, but it seems to me like a useful use case to support,
 and I don't see a way of supporting this while completely avoiding the
 risk of internationalization bugs.

 / Jonas




RE: [IndexedDB] Spec changes for international language support

2011-03-18 Thread Pablo Castro

From: keean.schu...@googlemail.com [mailto:keean.schu...@googlemail.com] On 
Behalf Of Keean Schupke
Sent: Friday, March 18, 2011 1:53 AM

 See my proposal in another thread. The basic idea is to copy BDB. Have a 
 primary index that is based on an integer, something primitive and fast. 
 Allow secondary indexes which use a callback to generate a binary index key. 
 IDB shifts the complexity out into a library. Common use cases can be 
 provided (a hash of all fields in the object, internationalised 
 bidirectional lexicographic etc...), but the user is free to write their own 
 for less usual cases (for example indexing by the last word in a name string 
 to order by surname).

I agree with Jeremy's comments on the other thread for this. Having the 
callback mechanism definitely sounds interesting but there are a ton of common 
cases that we can solve by just taking a language identifier, I'm not sure we 
want to make people work hard to get something that's already supported in most 
systems. The idea of having a callback to compute the index value feels 
incremental to this, so we could take it on later without disrupting the 
explicit international collation stuff.

 On 18 March 2011 02:19, Jonas Sicking jo...@sicking.cc wrote:
 2011/3/17 Pablo Castro pablo.cas...@microsoft.com:
 
  From: Jonas Sicking [mailto:jo...@sicking.cc]
  Sent: Tuesday, March 08, 2011 1:11 PM
 
  All in all, is there anything preventing adding the API Pablo suggests
  in this thread to the IndexedDB spec drafts?
 
  I wanted to propose a couple of specific tweaks to the initial proposal 
  and then unless I hear pushback start editing this into the spec.
 
  From reading the details on this thread I'm starting to realize that 
  per-database collations won't do it. What did it for me was the example 
  that has a fuzzier matching mode (case/accent insensitive). This is 
  exactly the kind of index I would want to sort people's names in my 
  address book, but most likely not the index I'll want to use for my 
  primary key.
 
  Refactoring the API to accommodate for this would mean to move the 
  setCollation() method and the collation property to the object store and 
  index objects. If we were willing to live without the ability to change 
  them we could take collation as one of the optional parameters to 
  createObjectStore()/createIndex() and reduce a bit of surface area...
 Unfortunately I think you bring up good use cases for
 per-objectStore/index collations. It's definitely tempting to just add
 it as an optional parameter to createObjectStore/createIndex. The
 downside is obviously pushing more complexity onto web developers.
 Complexity which will be duplicated across sites.

 However there is another problem to consider here. Can switching
 collation on an objectStore or a unique index affect its validity?
 I.e. if you switch from a case sensitive to a case insensitive
 collation, does that mean that if you have two entries with the
 primary keys Sweden and sweden they collide and thus the change of
 collation must result in an error (or aborted transaction)?

 I do seem to recall that there are ways to do at least case
 sensitivity such that you generally don't take case into account when
 sorting, unless two entries are exactly the same, in which case you do
 look at casing to differentiate them. However I don't really know a
 whole lot about this and so defer to people that know
 internationalization better.

This is a good point. It makes me lean toward not allowing changing the 
collation of an index or store. That means we could just have an optional 
parameter (in the generic parameter object thingy we have now) on 
createObjectStore and createIndex that indicates the collation name. It seems 
minimally disruptive, it doesn't tax people that don't care about it, and since 
there is no setCollation we don't have the problem of not being able to 
re-index the data.
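
As a rough illustration of the shape discussed above, the sketch below uses a small in-memory stand-in rather than the real IndexedDB API. `SketchStore`, `keysInOrder` and the `collation` option are assumptions made for this example only; the thread proposes such an option but nothing here reflects a shipped interface. The collation is fixed when the index is created, and a missing or null value means binary ordering.

```javascript
// Hypothetical in-memory stand-in for an object store; NOT the IndexedDB API.
// The `collation` option mirrors the parameter proposed in this thread.
function binaryCompare(a, b) {
  // Plain JavaScript string comparison orders by UTF-16 code units.
  return a < b ? -1 : a > b ? 1 : 0;
}

class SketchStore {
  constructor() {
    this.records = [];
    this.indexes = new Map();
  }

  // options.collation: a BCP47 tag, or null/omitted for binary ordering.
  createIndex(name, keyPath, options = {}) {
    const collation =
      options.collation === undefined ? null : options.collation;
    const compare =
      collation === null
        ? binaryCompare
        : new Intl.Collator(collation).compare;
    // Frozen: the collation is chosen once, there is no setCollation.
    const index = Object.freeze({ name, keyPath, collation, compare });
    this.indexes.set(name, index);
    return index;
  }

  put(record) {
    this.records.push(record);
  }

  // Return the indexed key values in the order the index defines.
  keysInOrder(indexName) {
    const { keyPath, compare } = this.indexes.get(indexName);
    return this.records.map((r) => r[keyPath]).sort(compare);
  }
}

const store = new SketchStore();
store.createIndex("byNameBinary", "name");
store.createIndex("byNameEnglish", "name", { collation: "en" });
for (const name of ["b", "a", "B"]) store.put({ name });

console.log(store.keysInOrder("byNameBinary"));
console.log(store.keysInOrder("byNameEnglish"));
```

Callers that never pass `collation` see no difference, which is the "doesn't tax people that don't care about it" property mentioned above.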

  Another piece of feedback I heard consistently as I discussed this with 
  various folks at Microsoft is the need to be able to pick up what the UA 
  would consider the collation that's most appropriate for the user 
  environment (derived from settings, page language or whatever). We could 
  support this by introducing a special value that  you can pass to 
  setCollation that indicates pick whatever is the right for the 
  environment's language right now. Given that there is no other way for 
  people to discover the user preference on this, I think this is pretty 
  important.
 I would be fine with this as long as it's an explicit opt-in. There is
 definitely a risk that people will do this and then only do testing in
 one language, but it seems to me like a useful use case to support,
 and I don't see a way of supporting this while completely avoiding the
 risk of internationalization bugs.

I agree, it should be opt-in. I still assume we'll default to binary collation 
(same if you specify the collation value as null). I was 

Re: [IndexedDB] Spec changes for international language support

2011-03-18 Thread Jonas Sicking
On Fri, Mar 18, 2011 at 12:29 PM, Pablo Castro
pablo.cas...@microsoft.com wrote:

 From: keean.schu...@googlemail.com [mailto:keean.schu...@googlemail.com] On 
 Behalf Of Keean Schupke
 Sent: Friday, March 18, 2011 1:53 AM

 See my proposal in another thread. The basic idea is to copy BDB. Have a 
 primary index that is based on an integer, something primitive and fast. 
 Allow secondary indexes which use a callback to generate a binary index 
 key. IDB shifts the complexity out into a library. Common use cases can be 
 provided (a hash of all fields in the object, internationalised 
 bidirectional lexicographic etc...), but the user is free to write their 
 own for less usual cases (for example indexing by the last word in a name 
 string to order by surname).

 I agree with Jeremy's comments on the other thread for this. Having the 
 callback mechanism definitely sounds interesting but there are a ton of 
 common cases that we can solve by just taking a language identifier, I'm not 
 sure we want to make people work hard to get something that's already 
 supported in most systems. The idea of having a callback to compute the index 
 value feels incremental to this, so we could take it on later without 
 disrupting the explicit international collation stuff.

 On 18 March 2011 02:19, Jonas Sicking jo...@sicking.cc wrote:
 2011/3/17 Pablo Castro pablo.cas...@microsoft.com:
 
  From: Jonas Sicking [mailto:jo...@sicking.cc]
  Sent: Tuesday, March 08, 2011 1:11 PM
 
  All in all, is there anything preventing adding the API Pablo suggests
  in this thread to the IndexedDB spec drafts?
 
  I wanted to propose a couple of specific tweaks to the initial proposal 
  and then unless I hear pushback start editing this into the spec.
 
  From reading the details on this thread I'm starting to realize that 
  per-database collations won't do it. What did it for me was the example 
  that has a fuzzier matching mode (case/accent insensitive). This is 
  exactly the kind of index I would want to sort people's names in my 
  address book, but most likely not the index I'll want to use for my 
  primary key.
 
  Refactoring the API to accommodate for this would mean to move the 
  setCollation() method and the collation property to the object store and 
  index objects. If we were willing to live without the ability to change 
  them we could take collation as one of the optional parameters to 
  createObjectStore()/createIndex() and reduce a bit of surface area...
 Unfortunately I think you bring up good use cases for
 per-objectStore/index collations. It's definitely tempting to just add
 it as an optional parameter to createObjectStore/createIndex. The
 downside is obviously pushing more complexity onto web developers.
 Complexity which will be duplicated across sites.

 However there is another problem to consider here. Can switching
 collation on an objectStore or a unique index affect its validity?
 I.e. if you switch from a case sensitive to a case insensitive
 collation, does that mean that if you have two entries with the
 primary keys Sweden and sweden they collide and thus the change of
 collation must result in an error (or aborted transaction)?

 I do seem to recall that there are ways to do at least case
 sensitivity such that you generally don't take case into account when
 sorting, unless two entries are exactly the same, in which case you do
 look at casing to differentiate them. However I don't really know a
 whole lot about this and so defer to people that know
 internationalization better.

 This is a good point. It makes me lean toward not allowing changing the 
 collation of an index or store. That means we could just have an optional 
 parameter (in the generic parameter object thingy we have now) on 
 createObjectStore and createIndex that indicates the collation name. It seems 
 minimally disruptive, it doesn't tax people that don't care about it, and 
 since there is no setCollation we don't have the problem of not being able to 
 re-index the data.

So there is no way to specify things such that the collation doesn't
affect unique-ness? If so, I tend to agree.

  Another piece of feedback I heard consistently as I discussed this with 
  various folks at Microsoft is the need to be able to pick up what the UA 
  would consider the collation that's most appropriate for the user 
  environment (derived from settings, page language or whatever). We could 
  support this by introducing a special value that  you can pass to 
  setCollation that indicates pick whatever is the right for the 
  environment's language right now. Given that there is no other way for 
  people to discover the user preference on this, I think this is pretty 
  important.
 I would be fine with this as long as it's an explicit opt-in. There is
 definitely a risk that people will do this and then only do testing in
 one language, but it seems to me like a useful use case to support,
 and I don't see a way of supporting 

RE: [IndexedDB] Spec changes for international language support

2011-03-18 Thread Pablo Castro

From: Jonas Sicking [mailto:jo...@sicking.cc] 
Sent: Friday, March 18, 2011 1:57 PM

  However there is another problem to consider here. Can switching
  collation on an objectStore or a unique index affect its validity?
  I.e. if you switch from a case sensitive to a case insensitive
  collation, does that mean that if you have two entries with the
  primary keys Sweden and sweden they collide and thus the change of
  collation must result in an error (or aborted transaction)?
 
  I do seem to recall that there are ways to do at least case
  sensitivity such that you generally don't take case into account when
  sorting, unless two entries are exactly the same, in which case you do
  look at casing to differentiate them. However I don't really know a
  whole lot about this and so defer to people that know
  internationalization better.
 
  This is a good point. It makes me lean toward not allowing changing the 
  collation of an index or store. That means we could just have an optional 
  parameter (in the generic parameter object thingy we have now) on 
  createObjectStore and createIndex that indicates the collation name. It 
  seems minimally disruptive, it doesn't tax people that don't care about 
  it, and since there is no setCollation we don't have the problem of not 
  being able to re-index the data.

 So there is no way to specify things such that the collation doesn't
 affect unique-ness? If so, I tend to agree.

The problem is that different collations will consider different things unique. 
This is bound to be variable across languages and such, so I'm not sure we want 
to be in the business of fine-tuning this. It seems that being a bit more 
restrictive could result in a more robust result overall. If someone really 
needs to change the collation they can copy the table manually...not great, but 
if we think it's a corner case it's probably fine.
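
To make the uniqueness point concrete, here is a minimal sketch using the ECMAScript `Intl.Collator` API. That API is an assumption of this example and is not something the thread proposes. The helper `findCollisions` is hypothetical; it lists keys that a given collator considers equal, which is what would invalidate a unique index after a collation change.

```javascript
// Report pairs of keys that a given collator treats as equal.
function findCollisions(keys, collator) {
  // Keys that compare equal end up adjacent after sorting.
  const sorted = [...keys].sort(collator.compare);
  const collisions = [];
  for (let i = 1; i < sorted.length; i++) {
    if (collator.compare(sorted[i - 1], sorted[i]) === 0) {
      collisions.push([sorted[i - 1], sorted[i]]);
    }
  }
  return collisions;
}

const keys = ["Sweden", "sweden", "resume", "résumé"];

// Ignores both case and accents.
const base = new Intl.Collator("en", { sensitivity: "base" });
// Ignores case, distinguishes accents.
const accent = new Intl.Collator("en", { sensitivity: "accent" });
// Distinguishes case and accents.
const variant = new Intl.Collator("en", { sensitivity: "variant" });

console.log(findCollisions(keys, base).length);
console.log(findCollisions(keys, accent).length);
console.log(findCollisions(keys, variant).length);
```

The same four keys are all distinct under one collator and collide in pairs under another, so a collation that can be switched after creation would need a validity check of this kind.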

   Another piece of feedback I heard consistently as I discussed this 
   with various folks at Microsoft is the need to be able to pick up what 
   the UA would consider the collation that's most appropriate for the 
   user environment (derived from settings, page language or whatever). 
   We could support this by introducing a special value that  you can 
   pass to setCollation that indicates pick whatever is the right for 
   the environment's language right now. Given that there is no other 
   way for people to discover the user preference on this, I think this 
   is pretty important.
  I would be fine with this as long as it's an explicit opt-in. There is
  definitely a risk that people will do this and then only do testing in
  one language, but it seems to me like a useful use case to support,
  and I don't see a way of supporting this while completely avoiding the
  risk of internationalization bugs.
 
  I agree, it should be opt-in. I still assume we'll default to binary 
  collation (same if you specify the collation value as null). I was reading 
  the BCP 47 [1] and in section 4.1 Choice of Language Tag the item #7 
  seems to describe what we're looking for. The value i-default seems to 
  match our needs close enough, so callers could use that value. 
  Discoverability is not great, but we avoid having to specify something 
  new, and arguably they'll need to read somewhere that this argument is a 
  BCP47-compatible value, and we could put a comment about i-default right 
  there.

 Sounds good to me. Though you seem to have forgotten to include the
 [1] reference.

Oops, here it goes:
 [1] http://tools.ietf.org/html/bcp47
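
The opt-in described in this message could be sketched as follows. `resolveCollation` is a hypothetical helper, and using `Intl.Collator` with no arguments to stand in for "whatever the UA considers right for the environment" is an assumption of this example. The `i-default` value is treated as a sentinel and never handed to `Intl`, since a runtime need not accept it as a locale.

```javascript
// Sentinel values assumed for this sketch: null means binary ordering,
// "i-default" is the explicit opt-in to the environment's collation.
const BINARY = null;
const USER_DEFAULT = "i-default";

function binaryCompare(a, b) {
  // Plain JavaScript string comparison orders by UTF-16 code units.
  return a < b ? -1 : a > b ? 1 : 0;
}

// Hypothetical helper: turn a collation option into a comparison function.
function resolveCollation(collation = BINARY) {
  if (collation === BINARY) {
    return { locale: null, compare: binaryCompare };
  }
  const collator =
    collation === USER_DEFAULT
      ? new Intl.Collator() // no argument: the runtime's default locale
      : new Intl.Collator(collation);
  return {
    locale: collator.resolvedOptions().locale,
    compare: collator.compare,
  };
}

console.log(resolveCollation().locale);
console.log(resolveCollation("i-default").locale);
console.log(resolveCollation("en").locale);
```

Because the default stays binary, a page only becomes sensitive to the user's environment when it asks for that explicitly.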





Re: [IndexedDB] Spec changes for international language support

2011-03-18 Thread Keean Schupke
On 18 March 2011 19:29, Pablo Castro pablo.cas...@microsoft.com wrote:


 From: keean.schu...@googlemail.com [mailto:keean.schu...@googlemail.com]
 On Behalf Of Keean Schupke
 Sent: Friday, March 18, 2011 1:53 AM

  See my proposal in another thread. The basic idea is to copy BDB. Have a
 primary index that is based on an integer, something primitive and fast.
 Allow secondary indexes which use a callback to generate a binary index key.
 IDB shifts the complexity out into a library. Common use cases can be
 provided (a hash of all fields in the object, internationalised
 bidirectional lexicographic etc...), but the user is free to write their own
 for less usual cases (for example indexing by the last word in a name string
 to order by surname).

 I agree with Jeremy's comments on the other thread for this. Having the
 callback mechanism definitely sounds interesting but there are a ton of
 common cases that we can solve by just taking a language identifier, I'm not
 sure we want to make people work hard to get something that's already
 supported in most systems. The idea of having a callback to compute the
 index value feels incremental to this, so we could take it on later
 without disrupting the explicit international collation stuff.


The idea would be to provide pre-defined implementations of the callback for
common use cases, then it is just as simple to register a callback as set
any other option. All this means to the API is you pass a function instead
of a string. It also is better for modularity as all the code relating to
the sort order is kept in the callback functions.

The difference comes down to something like:

index.set_order_lexicographic('us');

vs

index.set_order_method(order_lexicographic('us'));

So more than just setting a property like the first case, where presumably
all the ordering code is mixed in with the indexing code, the second case
encapsulates all the ordering code in the function returned from the
execution of order_lexicographic('us'). This function would represent a
mapping from the object being indexed to a binary blob that is the actual
stored index data.

So doing it this way does not necessarily make things harder, and it
improves the encapsulation, type safety, and flexibility of the API.
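
A minimal sketch of the callback idea, assuming JavaScript and a hypothetical factory named `orderBySurname` (the thread only names `order_lexicographic`, whose implementation is not given). The returned function maps a record to the bytes that would be stored as the index key, here the surname example from earlier in the thread. Lower-casing plus UTF-8 encoding is a stand-in for a real collation key, not a proper internationalised ordering.

```javascript
const encoder = new TextEncoder();

// Hypothetical factory: returns a function mapping a record to the bytes
// that would be stored as its index key.
function orderBySurname(field) {
  return (record) => {
    const words = record[field].trim().split(/\s+/);
    const surname = words[words.length - 1].toLowerCase();
    return encoder.encode(surname);
  };
}

// Bytewise comparison: the only ordering the store itself needs to know.
function compareBytes(a, b) {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return a.length - b.length;
}

const people = [
  { name: "Keean Schupke" },
  { name: "Pablo Castro" },
  { name: "Jonas Sicking" },
];

const keyOf = orderBySurname("name");
const ordered = people
  .map((p) => ({ key: keyOf(p), record: p }))
  .sort((x, y) => compareBytes(x.key, y.key))
  .map((entry) => entry.record.name);

console.log(ordered); // [ 'Pablo Castro', 'Keean Schupke', 'Jonas Sicking' ]
```

All ordering logic lives in the key function, and the store only ever compares opaque byte strings.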


Cheers,
Keean.


Re: [IndexedDB] Spec changes for international language support

2011-03-17 Thread Jeremy Orlow
FWIW, this maybe would have been better off as its own thread.  :-)

On Thu, Mar 17, 2011 at 3:37 PM, Pablo Castro pablo.cas...@microsoft.com wrote:


 From: Jonas Sicking [mailto:jo...@sicking.cc]
 Sent: Tuesday, March 08, 2011 1:11 PM

  All in all, is there anything preventing adding the API Pablo suggests
  in this thread to the IndexedDB spec drafts?

 I wanted to propose a couple of specific tweaks to the initial proposal and
 then unless I hear pushback start editing this into the spec.

 From reading the details on this thread I'm starting to realize that
 per-database collations won't do it. What did it for me was the example that
 has a fuzzier matching mode (case/accent insensitive). This is exactly the
 kind of index I would want to sort people's names in my address book, but
 most likely not the index I'll want to use for my primary key.

 Refactoring the API to accommodate for this would mean to move the
 setCollation() method and the collation property to the object store and
 index objects. If we were willing to live without the ability to change them
 we could take collation as one of the optional parameters to
 createObjectStore()/createIndex() and reduce a bit of surface area...I don't
 have a strong preference there. In any case both would use BCP47 names as
 discussed in this thread (as Jonas pointed out, implementations can also do
 their thing as long as they don't interfere with BCP47).


I'm fine with this.  Another (I believe) related use case I ran into today
is wanting collation to be case insensitive.


 Another piece of feedback I heard consistently as I discussed this with
 various folks at Microsoft is the need to be able to pick up what the UA
 would consider the collation that's most appropriate for the user
 environment (derived from settings, page language or whatever). We could
 support this by introducing a special value that  you can pass to
 setCollation that indicates pick whatever is the right for the
 environment's language right now. Given that there is no other way for
 people to discover the user preference on this, I think this is pretty
 important.


This seems useful even outside of the context of IndexedDB.  It should
probably be added to some other spec.  I'm fine adding it to ours for now
and adding an issue along with it.  But if so, please do shop it around.

J


Re: [IndexedDB] Spec changes for international language support

2011-03-17 Thread Jonas Sicking
2011/3/17 Pablo Castro pablo.cas...@microsoft.com:

 From: Jonas Sicking [mailto:jo...@sicking.cc]
 Sent: Tuesday, March 08, 2011 1:11 PM

 All in all, is there anything preventing adding the API Pablo suggests
 in this thread to the IndexedDB spec drafts?

 I wanted to propose a couple of specific tweaks to the initial proposal and 
 then unless I hear pushback start editing this into the spec.

 From reading the details on this thread I'm starting to realize that 
 per-database collations won't do it. What did it for me was the example that 
 has a fuzzier matching mode (case/accent insensitive). This is exactly the 
 kind of index I would want to sort people's names in my address book, but 
 most likely not the index I'll want to use for my primary key.

 Refactoring the API to accommodate for this would mean to move the 
 setCollation() method and the collation property to the object store and 
 index objects. If we were willing to live without the ability to change them 
 we could take collation as one of the optional parameters to 
 createObjectStore()/createIndex() and reduce a bit of surface area...

Unfortunately I think you bring up good use cases for
per-objectStore/index collations. It's definitely tempting to just add
it as an optional parameter to createObjectStore/createIndex. The
downside is obviously pushing more complexity onto web developers.
Complexity which will be duplicated across sites.

However there is another problem to consider here. Can switching
collation on an objectStore or a unique index affect its validity?
I.e. if you switch from a case sensitive to a case insensitive
collation, does that mean that if you have two entries with the
primary keys Sweden and sweden they collide and thus the change of
collation must result in an error (or aborted transaction)?

I do seem to recall that there are ways to do at least case
sensitivity such that you generally don't take case into account when
sorting, unless two entries are exactly the same, in which case you do
look at casing to differentiate them. However I don't really know a
whole lot about this and so defer to people that know
internationalization better.
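
The tie-breaking approach recalled above can be sketched with two `Intl.Collator` instances. That API is an assumption of this example, and `compareCaseLast` is a hypothetical name. Case is ignored for the main ordering and only consulted when two entries are otherwise equal, so "Sweden" and "sweden" sort next to each other but still count as different keys.

```javascript
// First pass ignores case; second pass looks at case only to break ties.
const ignoreCase = new Intl.Collator("en", { sensitivity: "accent" });
const withCase = new Intl.Collator("en", { sensitivity: "variant" });

function compareCaseLast(a, b) {
  // `||` falls through to the case-sensitive comparison on a tie (0).
  return ignoreCase.compare(a, b) || withCase.compare(a, b);
}

const sorted = ["sweden", "Norway", "Sweden", "norway"].sort(compareCaseLast);

console.log(sorted);
console.log(compareCaseLast("Sweden", "sweden") !== 0);
```

With a comparator like this, a unique index never sees two differently cased keys as the same entry.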

 I don't have a strong preference there. In any case both would use BCP47 
 names as discussed in this thread (as Jonas pointed out, implementations can 
 also do their thing as long as they don't interfere with BCP47).

 Another piece of feedback I heard consistently as I discussed this with 
 various folks at Microsoft is the need to be able to pick up what the UA 
 would consider the collation that's most appropriate for the user environment 
 (derived from settings, page language or whatever). We could support this by 
 introducing a special value that  you can pass to setCollation that indicates 
 pick whatever is the right for the environment's language right now. Given 
 that there is no other way for people to discover the user preference on 
 this, I think this is pretty important.

I would be fine with this as long as it's an explicit opt-in. There is
definitely a risk that people will do this and then only do testing in
one language, but it seems to me like a useful use case to support,
and I don't see a way of supporting this while completely avoiding the
risk of internationalization bugs.

/ Jonas



Re: [IndexedDB] Spec changes for international language support

2011-03-08 Thread Jonas Sicking
2011/2/23 Pablo Castro pablo.cas...@microsoft.com:

 From: jungs...@google.com [mailto:jungs...@google.com] On Behalf Of Jungshik 
 Shin
 Sent: Tuesday, February 22, 2011 2:08 PM


 On Fri, Feb 18, 2011 at 2:34 AM, Bjoern Hoehrmann derhoe...@gmx.net wrote:
 * Pablo Castro wrote:
 We discussed international language support last time at the TPAC and I
 said I'd propose spec text for it. Please find the patch below, the
 changes mirror exactly the proposal described in the bug we have for
 tracking this: http://www.w3.org/Bugs/Public/show_bug.cgi?id=9903
 You should anticipate objections to that; collation is not a property of
 language, for instance, for de-de you typically have dictionary sorting
 and phone book sorting (and of course you have de-de, de-ch, and so
 on, so de alone would be rather meaningless). So far the W3C and the
 IETF have used resource identifiers to specify collations (see XPath 2.0
 and RFC 4790) where the IETF allows shorthands like i;ascii-casemap.

 I agree that simply specifying that 'language' be used without saying what 
 it means is not sufficient. However, your examples (German phonebook vs 
 dictionary) can be  covered with language identifier framework laid out 
 in BCP47 (with 'u' extension).

 Fair enough. I'll adjust this part of the write up to discuss this in terms 
 of collation identifier or language identifier.

 I do understand that Microsoft uses an extension of language tags for
 the `CultureInfo` in the .NET Framework, where, say, `de-DE_phoneb` is
 used to refer to german phone book sorting, but BCP 47 does not allow
 for that,

 There's a way to specify alternate sorting orders (e.g. German phonebook, 
 Chinese pinyin, stroke count, radical-stroke count order, etc) under the 
 BCP 47 framework  because it has a mechanism for defining an extension 
 and registering it. The Unicode consortium uses that mechanism to define 
 'u' extension and a set of subtags that can  be used with 'u'.
 For instance, German phonebook sorting can be identified with 
 'de-DE-u-co-phonebk'. See

 https://tools.ietf.org/html/bcp47
 https://tools.ietf.org/html/rfc6067
 http://unicode.org/reports/tr35/#Unicode_Language_and_Locale_Identifiers

 Also, see Bug 9903 comment 6 by Mark Davis for more examples. Well, I'm 
 just copying his comment directly here:


 To add to what Jungshik said, BCP47 defines standard extensions. The 
 extension
 defined by the Unicode consortium
 (http://cldr.unicode.org/index/bcp47-extension) provides for fine-grained

 specifications of collation behavior.
 Examples for German:
 de-u-co-phonebk // phonebook order
 de-u-kn-true // numeric sorting, eg Tom2 comes before Tom12
 de-u-ks-level1 // ignore accents, case differences
 de-u-ks-level2 // ignore case differences
 de-u-ks-level1-kc-true // ignore accents, but not case
 These can be combined, such as:
 de-u-co-phonebk-kn-true-ks-level1-kc-true

 neither could you devise a language tag to define something
 like i;ascii-casemap (which simply defines A-Z = a-z).


 I'm not sure how specific we want to get on this. In particular, would it 
 be better if we specified it all the way (including which extensions UAs need 
 to support) or if we used BCP47 as the starting point and allowed UAs to 
 support additional extensions as needed?

I think for now we should allow implementations to support additional
collations in addition to whatever set we specify. It seems to me
that this is an area that is heavily in flux and I'd hate to paint
ourselves into a corner.

 I would expect that if browsers offer collations, there would be an
 interface for that so you can use them in other places, as such it might
 be wiser to accept something other than a language identifier string.

 There's an on-going effort to expose a 'rich' set of I18N API to 
 client-side development using Javascript ( 
 http://wiki.ecmascript.org/doku.php?id=strawman:i18n_api : The API used to be 
 much more extensive than now, but has been scaled down significantly to get 
 more browsers on board in its 1st iteration). There we're likely to use BCP 
 47 with 'u' extension (see above). So, I think it'd be better if IndexedDB 
 matches what ECMAScript plans to do.

 This is interesting; do you know how far along this is?

And does someone have a link to drafts?

I suspect we don't want to wait for this work to finish, but we should
definitely track it and seek inspiration. And there are probably
people there that can review whatever we're doing.

 I also note that collation often involves equivalence testing, but it
 is not clear from your proposal whether that is the case here. It might
 also be a good idea to clearly spell out interoperability expectations;
 if two implementations support some collation, will they behave the same
 for any and all inputs as far as collation is concerned, or should one
 be prepared for slight differences among implementations?

 I think it's more practical to assume that users should be prepared for 
 slight differences among implementations.

RE: [IndexedDB] Spec changes for international language support

2011-02-23 Thread Pablo Castro

From: jungs...@google.com [mailto:jungs...@google.com] On Behalf Of Jungshik 
Shin (신정식, 申政湜)
Sent: Tuesday, February 22, 2011 2:08 PM


 On Fri, Feb 18, 2011 at 2:34 AM, Bjoern Hoehrmann derhoe...@gmx.net wrote:
 * Pablo Castro wrote:
 We discussed international language support last time at the TPAC and I
 said I'd propose spec text for it. Please find the patch below, the
 changes mirror exactly the proposal described in the bug we have for
 tracking this: http://www.w3.org/Bugs/Public/show_bug.cgi?id=9903
 You should anticipate objections to that; collation is not a property of
 language, for instance, for de-de you typically have dictionary sorting
 and phone book sorting (and of course you have de-de, de-ch, and so
 on, so de alone would be rather meaningless). So far the W3C and the
 IETF have used resource identifiers to specify collations (see XPath 2.0
 and RFC 4790) where the IETF allows shorthands like i;ascii-casemap.

 I agree that simply specifying that 'language' be used without saying what 
 it means is not sufficient. However, your examples (German phonebook vs 
 dictionary) can be  covered with language identifier framework laid out in 
 BCP47 (with 'u' extension). 

Fair enough. I'll adjust this part of the write up to discuss this in terms of 
collation identifier or language identifier.

 I do understand that Microsoft uses an extension of language tags for
 the `CultureInfo` in the .NET Framework, where, say, `de-DE_phoneb` is
 used to refer to german phone book sorting, but BCP 47 does not allow
 for that, 

 There's a way to specify alternate sorting orders (e.g. German phonebook, 
 Chinese pinyin, stroke count, radical-stroke count order, etc) under the BCP 
 47 framework  because it has a mechanism for defining an extension and 
 registering it. The Unicode consortium uses that mechanism to define 'u' 
 extension and a set of subtags that can  be used with 'u'. 
 For instance, German phonebook sorting can be identified with 
 'de-DE-u-co-phonebk'. See 

 https://tools.ietf.org/html/bcp47
 https://tools.ietf.org/html/rfc6067
 http://unicode.org/reports/tr35/#Unicode_Language_and_Locale_Identifiers

 Also, see Bug 9903 comment 6 by Mark Davis for more examples. Well, I'm just 
 copying his comment directly here:


 To add to what Jungshik said, BCP47 defines standard extensions. The 
 extension
 defined by the Unicode consortium
 (http://cldr.unicode.org/index/bcp47-extension) provides for fine-grained

 specifications of collation behavior.
 Examples for German:
 de-u-co-phonebk // phonebook order
 de-u-kn-true // numeric sorting, eg Tom2 comes before Tom12
 de-u-ks-level1 // ignore accents, case differences
 de-u-ks-level2 // ignore case differences
 de-u-ks-level1-kc-true // ignore accents, but not case
 These can be combined, such as:
 de-u-co-phonebk-kn-true-ks-level1-kc-true
 
 neither could you devise a language tag to define something
 like i;ascii-casemap (which simply defines A-Z = a-z).


I'm not sure how specific we want to get on this. In particular, would it be 
better if we specified it all the way (including which extensions UAs need to 
support) or if we used BCP47 as the starting point and allowed UAs to support 
additional extensions as needed?

 I would expect that if browsers offer collations, there would be an
 interface for that so you can use them in other places, as such it might
 be wiser to accept something other than a language identifier string. 

 There's an on-going effort to expose a 'rich' set of I18N API to client-side 
 development using Javascript ( 
 http://wiki.ecmascript.org/doku.php?id=strawman:i18n_api : The API used to be 
 much more extensive than now, but has been scaled down significantly to get 
 more browsers on board in its 1st iteration). There we're likely to use BCP 
 47 with 'u' extension (see above). So, I think it'd be better if IndexedDB 
 matches what ECMAScript plans to do. 

This is interesting; do you know how far along this is?


 I also note that collation often involves equivalence testing, but it
 is not clear from your proposal whether that is the case here. It might
 also be a good idea to clearly spell out interoperability expectations;
 if two implementations support some collation, will they behave the same
 for any and all inputs as far as collation is concerned, or should one
 be prepared for slight differences among implementations?

I think it's more practical to assume that users should be prepared for slight 
differences among implementations.

Thanks
-pablo



Re: [IndexedDB] Spec changes for international language support

2011-02-22 Thread 신정식, 申政湜
On Fri, Feb 18, 2011 at 2:34 AM, Bjoern Hoehrmann derhoe...@gmx.net wrote:

 * Pablo Castro wrote:
 We discussed international language support last time at the TPAC and I
 said I'd propose spec text for it. Please find the patch below, the
 changes mirror exactly the proposal described in the bug we have for
 tracking this: http://www.w3.org/Bugs/Public/show_bug.cgi?id=9903

 You should anticipate objections to that; collation is not a property of
 language, for instance, for de-de you typically have dictionary sorting
 and phone book sorting (and of course you have de-de, de-ch, and so
 on, so de alone would be rather meaningless). So far the W3C and the
 IETF have used resource identifiers to specify collations (see XPath 2.0
 and RFC 4790) where the IETF allows shorthands like i;ascii-casemap.


I agree that simply specifying that 'language' be used without saying what
it means is not sufficient. However, your examples (German phonebook vs
dictionary) can be covered with language identifier framework laid out in
BCP47 (with 'u' extension).




 I do understand that Microsoft uses an extension of language tags for
 the `CultureInfo` in the .NET Framework, where, say, `de-DE_phoneb` is
 used to refer to german phone book sorting, but BCP 47 does not allow
 for that,


There's a way to specify alternate sorting orders (e.g. German phonebook,
Chinese pinyin, stroke count, radical-stroke count order, etc) under the BCP
47 framework because it has a mechanism for defining an extension and
registering it. The Unicode consortium uses that mechanism to define 'u'
extension and a set of subtags that can be used with 'u'.
For instance, German phonebook sorting can be identified with
'de-DE-u-co-phonebk'. See

https://tools.ietf.org/html/bcp47
https://tools.ietf.org/html/rfc6067
http://unicode.org/reports/tr35/#Unicode_Language_and_Locale_Identifiers

Also, see Bug 9903 comment 6 by Mark Davis
(http://www.w3.org/Bugs/Public/show_bug.cgi?id=9903#c6) for
more examples. Well, I'm just copying his comment directly here:

To add to what Jungshik said, BCP47 defines standard extensions. The extension
defined by the Unicode consortium
(http://cldr.unicode.org/index/bcp47-extension) provides for fine-grained
specifications of collation behavior.
Examples for German:
de-u-co-phonebk // phonebook order
de-u-kn-true // numeric sorting, eg Tom2 comes before Tom12
de-u-ks-level1 // ignore accents, case differences
de-u-ks-level2 // ignore case differences
de-u-ks-level1-kc-true // ignore accents, but not case
These can be combined, such as:
de-u-co-phonebk-kn-true-ks-level1-kc-true
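
To make the tags above concrete, here is a minimal sketch that feeds some of them to `Intl.Collator`. This assumes the ECMAScript Internationalization API as it was later standardized, not anything in the IndexedDB proposal. As far as that API goes, only some `u` keys (such as `co` and `kn`) are read from the tag; strength is set through the `sensitivity` option instead of `ks`.

```javascript
// BCP 47 tags with the Unicode 'u' extension, passed to Intl.Collator.
const numericOrder = new Intl.Collator('de-u-kn-true');
const plainOrder = new Intl.Collator('de');

console.log(numericOrder.compare('Tom2', 'Tom12') < 0); // numeric: 2 sorts before 12
console.log(plainOrder.compare('Tom2', 'Tom12') > 0);   // digit by digit: "1" sorts before "2"

// Strength keys such as 'ks' are not read from the tag by Intl.Collator;
// the closest equivalent to ks-level1 is the `sensitivity: 'base'` option.
const baseOnly = new Intl.Collator('de', { sensitivity: 'base' });
console.log(baseOnly.compare('a', 'A') === 0); // case difference ignored
console.log(baseOnly.compare('a', 'ä') === 0); // accent difference ignored

// Whether 'phonebk' is honoured depends on the locale data the runtime ships,
// so inspect what was actually resolved instead of assuming it.
const phonebook = new Intl.Collator('de-u-co-phonebk');
console.log(phonebook.resolvedOptions().collation);
```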



 neither could you devise a language tag to define something
 like i;ascii-casemap (which simply defines A-Z = a-z).





 I would expect that if browsers offer collations, there would be an
 interface for that so you can use them in other places, as such it might
 be wiser to accept something other than a language identifier string.


There's an on-going effort to expose a 'rich' set of I18N API to client-side
development using Javascript (
http://wiki.ecmascript.org/doku.php?id=strawman:i18n_api : The API used to be
much more extensive than now, but has been scaled down significantly to get
more browsers on board in its 1st iteration). There we're likely to use BCP
47 with 'u' extension (see above). So, I think it'd be better if IndexedDB
matches what ECMAScript plans to do.

Jungshik




 As
 above, URIs, or RFC 4790 values plus URIs, or, in anticipation of some
 such interface, some other object, might be a better choice. And the
 method and attribute should probably not use language in their names.

 I also note that collation often involves equivalence testing, but it
 is not clear from your proposal whether that is the case here. It might
 also be a good idea to clearly spell out interoperability expectations;
 if two implementations support some collation, will they behave the same
 for any and all inputs as far as collation is concerned, or should one
 be prepared for slight differences among implementations?
 --
 Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
 Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
 25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/




Re: [IndexedDB] Spec changes for international language support

2011-02-18 Thread Bjoern Hoehrmann
* Pablo Castro wrote:
We discussed international language support last time at the TPAC and I
said I'd propose spec text for it. Please find the patch below, the
changes mirror exactly the proposal described in the bug we have for
tracking this: http://www.w3.org/Bugs/Public/show_bug.cgi?id=9903

You should anticipate objections to that; collation is not a property of
language, for instance, for de-de you typically have dictionary sorting
and phone book sorting (and of course you have de-de, de-ch, and so
on, so de alone would be rather meaningless). So far the W3C and the
IETF have used resource identifiers to specify collations (see XPath 2.0
and RFC 4790) where the IETF allows shorthands like i;ascii-casemap.

I do understand that Microsoft uses an extension of language tags for
the `CultureInfo` in the .NET Framework, where, say, `de-DE_phoneb` is
used to refer to german phone book sorting, but BCP 47 does not allow
for that, neither could you devise a language tag to define something
like i;ascii-casemap (which simply defines A-Z = a-z).
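
For readers unfamiliar with `i;ascii-casemap`, the following is a rough sketch of its ordering as RFC 4790 describes it: fold ASCII `a`-`z` to `A`-`Z`, then compare octets. The function names are hypothetical, and encoding the input as UTF-8 is an assumption made for this illustration.

```javascript
// Sketch of i;ascii-casemap ordering (RFC 4790): fold ASCII a-z to A-Z,
// then compare the resulting octets one by one.
const utf8 = new TextEncoder();

function asciiCasemapKey(text) {
  // Subtract 32 from octets in the range 'a' (97) .. 'z' (122) only;
  // every other octet, including non-ASCII bytes, is left untouched.
  return utf8.encode(text).map((octet) => (octet >= 97 && octet <= 122 ? octet - 32 : octet));
}

function compareAsciiCasemap(a, b) {
  const x = asciiCasemapKey(a);
  const y = asciiCasemapKey(b);
  const shared = Math.min(x.length, y.length);
  for (let i = 0; i < shared; i++) {
    if (x[i] !== y[i]) return x[i] < y[i] ? -1 : 1;
  }
  // Equal prefix: the shorter string sorts first.
  return Math.sign(x.length - y.length);
}

console.log(compareAsciiCasemap('abc', 'ABC')); // equal under the case map
console.log(compareAsciiCasemap('ä', 'Ä'));     // non-ASCII letters are not folded
```

This shows why no language tag can express it: the rule is defined on octets, not on any language's alphabet.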

I would expect that if browsers offer collations, there would be an
interface for that so you can use them in other places, as such it might
be wiser to accept something other than a language identifier string. As
above, URIs, or RFC 4790 values plus URIs, or, in anticipation of some
such interface, some other object, might be a better choice. And the
method and attribute should probably not use language in their names.

I also note that collation often involves equivalence testing, but it
is not clear from your proposal whether that is the case here. It might
also be a good idea to clearly spell out interoperability expectations;
if two implementations support some collation, will they behave the same
for any and all inputs as far as collation is concerned, or should one
be prepared for slight differences among implementations?
-- 
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 



[IndexedDB] Spec changes for international language support

2011-02-17 Thread Pablo Castro
We discussed international language support last time at the TPAC and I said 
I'd propose spec text for it. Please find the patch below, the changes mirror 
exactly the proposal described in the bug we have for tracking this:
http://www.w3.org/Bugs/Public/show_bug.cgi?id=9903

btw - the bug is assigned to Nikunj right now but I think that's just because 
of an editing glitch. Nikunj please let me know if you were working on it, 
otherwise I'll just submit the changes once I hear some feedback from this 
group.

Thanks
-pablo


Left file: \IndexedDB 
Specs\20110217\Speclet_023_IDB_API_Asynchronous_APIs.original.html
Right file: \IndexedDB Specs\20110217\Speclet_023_IDB_API_Asynchronous_APIs.html
copy 6
add 7
<dt>readonly attribute DOMString language</dt>
<dd>
On getting, this attribute MUST return the <a title="database language">language</a>
that is configured in this database for string collation. If no collation has been
configured for a database this value is <code>null</code> and the database will
use binary collation.
</dd>
copy 6
copy 6
add 24
<dt>IDBRequest setLanguage()</dt>
<dd>
<p>
This method changes the <a title="database language">language</a> used by the database
for string collation. Note that this method must only
be called from a <a><code>VERSION_CHANGE</code></a> <a>transaction</a> callback.
</p>
<p class="note">
Changing the language in a database that already contains data typically involves
reading and re-writing the entire database and thus can be a time consuming operation.
</p>
<dl class="parameters">
<dt>optional DOMString language</dt>
<dd>The language to be used in the database specified as a language identifier as
described in [[!BCP47]].</dd>
</dl>
<dl class="exception" title="IDBDatabaseException">
<dt>NOT_ALLOWED_ERR</dt>
<dd>This method was not called from a
<a><code>VERSION_CHANGE</code></a> <a>transaction</a> callback.</dd>
<dt>DATA_ERR</dt>
<dd>The language parameter contained a string that was not
a valid language identifier or was a language
identifier not supported by the system.</dd>
</dl>
</dd>
copy 6



Left file: \IndexedDB 
Specs\20110217\Speclet_022_IDB_API_Synchronous_APIs.original.html
Right file: \IndexedDB Specs\20110217\Speclet_022_IDB_API_Synchronous_APIs.html
copy 6
add 7
<dt>readonly attribute DOMString language</dt>
<dd>
On getting, this attribute MUST return the <a title="database language">language</a>
that is configured in this database for string collation. If no collation has been
configured for a database this value is <code>null</code> and the database will
use binary collation.
</dd>
copy 6
copy 6
add 24
<dt>void setLanguage()</dt>
<dd>
<p>
This method changes the <a title="database language">language</a> used by the database
for string collation. Note that this method must only
be called from a <a><code>VERSION_CHANGE</code></a> <a>transaction</a> callback.
</p>
<p class="note">
Changing the language in a database that already contains data typically involves
reading and re-writing the entire database and thus can be a time consuming operation.
</p>
<dl class="parameters">
<dt>optional DOMString language</dt>
<dd>The language to be used in the database specified as a language identifier as
described in [[!BCP47]].</dd>
</dl>
<dl class="exception" title="IDBDatabaseException">
<dt>NOT_ALLOWED_ERR</dt>
<dd>This method was not called from a
<a><code>VERSION_CHANGE</code></a> <a>transaction</a> callback.</dd>
<dt>DATA_ERR</dt>
<dd>The language parameter contained a string that was not
a valid language identifier or was a language
identifier not supported by the system.</dd>
</dl>
</dd>
copy 6
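
The contract in the patch above (a `language` that defaults to `null`, a `setLanguage()` that is only legal during a `VERSION_CHANGE` transaction, and the two error cases) can be modelled with a toy in-memory object. This is only a sketch: `ModelDatabase`, `ModelError` and `versionChange` are hypothetical names, nothing here is the real IndexedDB API, and tag validation borrows `Intl.Collator.supportedLocalesOf` from the later ECMAScript Internationalization API. The patch marks the parameter optional without saying what omitting it means, so this model simply treats a missing value as invalid.

```javascript
// Toy in-memory model of the proposed contract; not the real IndexedDB API.
class ModelError extends Error {
  constructor(name) {
    super(name);
    this.name = name; // 'NOT_ALLOWED_ERR' or 'DATA_ERR', as named in the patch
  }
}

class ModelDatabase {
  constructor() {
    this.language = null; // null means binary collation
    this.inVersionChange = false;
  }

  // Stand-in for running a VERSION_CHANGE transaction callback.
  versionChange(callback) {
    this.inVersionChange = true;
    try {
      callback(this);
    } finally {
      this.inVersionChange = false;
    }
  }

  setLanguage(language) {
    if (!this.inVersionChange) throw new ModelError('NOT_ALLOWED_ERR');
    let supported;
    try {
      // Throws when the tag is not a well-formed BCP 47 language tag.
      supported = Intl.Collator.supportedLocalesOf([language]);
    } catch (e) {
      throw new ModelError('DATA_ERR');
    }
    // Well-formed, but no collation data for it on this system.
    if (supported.length === 0) throw new ModelError('DATA_ERR');
    this.language = language;
  }
}

const modelDb = new ModelDatabase();
modelDb.versionChange((db) => db.setLanguage('en-u-kn-true'));
console.log(modelDb.language);
```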



Left file: \IndexedDB 
Specs\20110217\Speclet_020_IDB_API_Constructs.original.html
Right file: \IndexedDB Specs\20110217\Speclet_020_IDB_API_Constructs.html
copy 6
add 4
Every <a>database</a> also has a <dfn title="database language">language</dfn> that indicates the
language that should be used for collating strings when comparing keys.
  </p>
  <p>
copy 6
copy 6
delete 1
add 2
value with no need to separate them by type. When comparing a 
<code>DOMString</code> with another <code>DOMString</code>, the <a>database
  

Re: [IndexedDB] Spec changes for international language support

2011-02-17 Thread Nikunj Mehta
Hi Pablo,

I will reassign this bug to Eliott.

Nikunj
On Feb 17, 2011, at 6:38 PM, Pablo Castro wrote:

 btw - the bug is assigned to Nikunj right now but I think that's just because 
 of an editing glitch. Nikunj please let me know if you were working on it, 
 otherwise I'll just submit the changes once I hear some feedback from this 
 group.