RE: [IndexedDB] Spec changes for international language support

Pablo Castro Fri, 18 Mar 2011 12:32:04 -0700

From: [email protected] [mailto:[email protected]] On 
Behalf Of Keean Schupke
Sent: Friday, March 18, 2011 1:53 AM


>> See my proposal in another thread. The basic idea is to copy BDB. Have a 
>> primary index that is based on an integer, something primitive and fast. 
>> Allow secondary indexes which use a callback to generate a binary index key. 
>> IDB shifts the complexity out into a library. Common use cases can be 
>> provided (a hash of all fields in the object, internationalised 
>> bidirectional lexicographic etc...), but the user is free to write their own 
>> for less usual cases (for example indexing by the last word in a name string 
>> to order by surname).

I agree with Jeremy's comments on the other thread for this. The callback 
mechanism definitely sounds interesting, but there are a ton of common cases 
that we can solve by just taking a language identifier. I'm not sure we want to 
make people work hard to get something that's already supported in most 
systems. The idea of having a callback to compute the index value feels 
incremental to this, so we could take it on later without disrupting the 
explicit international collation stuff.
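
To make the contrast concrete, here is a rough sketch of the two ideas side by 
side: a caller-supplied callback that computes the index key (the "last word in 
a name" case from the quoted proposal) and a plain language identifier that 
selects the ordering. The names KeyExtractor, surnameKey and sortByKey are 
hypothetical and not part of any IndexedDB draft. The quoted proposal talks 
about binary keys; this sketch returns strings for brevity and uses the 
standard Intl.Collator for the language-aware comparison.

```typescript
// Hypothetical callback type: computes the value an index would be ordered by.
type KeyExtractor<T> = (record: T) => string;

interface Person {
  name: string;
}

// Index by the last word of the name, i.e. order by surname.
const surnameKey: KeyExtractor<Person> = (p) => {
  const words = p.name.trim().split(/\s+/);
  return words[words.length - 1];
};

// The language identifier picks the collation; the callback picks the key.
function sortByKey<T>(records: T[], extract: KeyExtractor<T>, locale: string): T[] {
  const collator = new Intl.Collator(locale);
  return records
    .slice()
    .sort((a, b) => collator.compare(extract(a), extract(b)));
}

const people: Person[] = [
  { name: "Keean Schupke" },
  { name: "Jonas Sicking" },
  { name: "Pablo Castro" },
];

// Ordered by surname: Castro, Schupke, Sicking
const sorted = sortByKey(people, surnameKey, "en");
```

The two pieces are independent here, which is why the callback can be layered 
on later without touching the collation parameter.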

>> On 18 March 2011 02:19, Jonas Sicking <[email protected]> wrote:
>> 2011/3/17 Pablo Castro <[email protected]>:
>> >
>> > From: Jonas Sicking [mailto:[email protected]]
>> > Sent: Tuesday, March 08, 2011 1:11 PM
>> >
>> >>> All in all, is there anything preventing adding the API Pablo suggests
>> >>> in this thread to the IndexedDB spec drafts?
>> >
>> > I wanted to propose a couple of specific tweaks to the initial proposal 
>> > and then unless I hear pushback start editing this into the spec.
>> >
>> > From reading the details on this thread I'm starting to realize that 
>> > per-database collations won't do it. What did it for me was the example 
>> > that has a fuzzier matching mode (case/accent insensitive). This is 
>> > exactly the kind of index I would want to sort people's names in my 
>> > address book, but most likely not the index I'll want to use for my 
>> > primary key.
>> >
>> > Refactoring the API to accommodate this would mean moving the 
>> > setCollation() method and the collation property to the object store and 
>> > index objects. If we were willing to live without the ability to change 
>> > them we could take collation as one of the optional parameters to 
>> > createObjectStore()/createIndex() and reduce a bit of surface area...
>> Unfortunately I think you bring up good use cases for
>> per-objectStore/index collations. It's definitely tempting to just add
>> it as an optional parameter to createObjectStore/createIndex. The
>> downside is obviously pushing more complexity onto web developers.
>> Complexity which will be duplicated across sites.
>>
>> However there is another problem to consider here. Can switching
>> collation on an objectStore or a unique index affect its validity?
>> I.e. if you switch from a case sensitive to a case insensitive
>> collation, does that mean that if you have two entries with the
>> primary keys "Sweden" and "sweden" they collide and thus the change of
>> collation must result in an error (or aborted transaction)?
>>
>> I do seem to recall that there are ways to do at least case
>> sensitivity such that you generally don't take case into account when
>> sorting, unless two entries are exactly the same, in which case you do
>> look at casing to differentiate them. However I don't really know a
>> whole lot about this and so defer to people that know
>> internationalization better.

This is a good point. It makes me lean toward not allowing changing the 
collation of an index or store. That means we could just have an optional 
parameter (in the generic parameter object thingy we have now) on 
createObjectStore and createIndex that indicates the collation name. It seems 
minimally disruptive, it doesn't tax people that don't care about it, and since 
there is no setCollation we don't have the problem of not being able to 
re-index the data.
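
The collision Jonas describes can be reproduced outside IndexedDB with a small 
sketch. It assumes "binary" means UTF-16 code unit order and models the fuzzy 
collation with Intl.Collator at "base" sensitivity (case and accent 
insensitive). findCollisions and tieBreakCompare are hypothetical helpers; the 
latter illustrates the "only look at case when otherwise equal" approach he 
mentions.

```typescript
type Comparator = (a: string, b: string) => number;

// "Binary" ordering, approximated here by UTF-16 code unit comparison.
const binaryCompare: Comparator = (a, b) => (a < b ? -1 : a > b ? 1 : 0);

// Case- and accent-insensitive ordering.
const fuzzy = new Intl.Collator("en", { sensitivity: "base" });
const fuzzyCompare: Comparator = (a, b) => fuzzy.compare(a, b);

// Ignore case when ordering, but fall back to binary order to split ties.
const tieBreakCompare: Comparator = (a, b) =>
  fuzzyCompare(a, b) || binaryCompare(a, b);

// Returns adjacent pairs that the comparator treats as the same key.
function findCollisions(keys: string[], compare: Comparator): Array<[string, string]> {
  const ordered = keys.slice().sort(compare);
  const collisions: Array<[string, string]> = [];
  for (let i = 1; i < ordered.length; i++) {
    if (compare(ordered[i - 1], ordered[i]) === 0) {
      collisions.push([ordered[i - 1], ordered[i]]);
    }
  }
  return collisions;
}

const primaryKeys = ["Sweden", "sweden", "Norway"];

const underBinary = findCollisions(primaryKeys, binaryCompare);     // no collisions
const underFuzzy = findCollisions(primaryKeys, fuzzyCompare);       // "Sweden" and "sweden" collide
const underTieBreak = findCollisions(primaryKeys, tieBreakCompare); // no collisions
```

If the collation can only be given at creation time, a check like this never 
has to run against existing data, because uniqueness is enforced under a single 
collation from the first insert.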

>> > Another piece of feedback I heard consistently as I discussed this with 
>> > various folks at Microsoft is the need to be able to pick up what the UA 
>> > would consider the collation that's most appropriate for the user 
>> > environment (derived from settings, page language or whatever). We could 
>> > support this by introducing a special value that you can pass to 
>> > setCollation that indicates "pick whatever is right for the 
>> > environment's language right now". Given that there is no other way for 
>> > people to discover the user preference on this, I think this is pretty 
>> > important.
>> I would be fine with this as long as it's an explicit opt-in. There is
>> definitely a risk that people will do this and then only do testing in
>> one language, but it seems to me like a useful use case to support,
>> and I don't see a way of supporting this while completely avoiding the
>> risk of internationalization bugs.

I agree, it should be opt-in. I still assume we'll default to binary collation 
(the same as if you specify the collation value as null). I was reading BCP 47 
[1], and in section 4.1 "Choice of Language Tag" item #7 seems to describe what 
we're looking for. The value "i-default" seems to match our needs closely 
enough, so callers could use that value. Discoverability is not great, but we 
avoid having to specify something new, and arguably they'll need to read 
somewhere that this argument is a BCP47-compatible value, and we could put a 
comment about "i-default" right there.
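
A minimal sketch of how those three cases could be told apart, assuming the 
behavior described above: null (or no value) means binary, "i-default" means 
"whatever the environment prefers", and anything else is treated as a BCP 47 
language tag. resolveCollation is a hypothetical helper, not spec text, and 
binary order is approximated by UTF-16 code unit comparison. The "i-default" 
case is handled explicitly rather than passed to Intl.Collator, since calling 
the constructor without a locale already selects the runtime default.

```typescript
type Comparator = (a: string, b: string) => number;

// Assumed meaning of "binary collation": UTF-16 code unit order.
const binaryCompare: Comparator = (a, b) => (a < b ? -1 : a > b ? 1 : 0);

function resolveCollation(collation: string | null): Comparator {
  if (collation === null) {
    return binaryCompare; // default: nobody pays for collation unless they ask
  }
  const collator =
    collation.toLowerCase() === "i-default"
      ? new Intl.Collator() // explicit opt-in to the environment's locale
      : new Intl.Collator(collation); // throws RangeError on a malformed tag
  return (a, b) => collator.compare(a, b);
}

const fruit = ["apple", "Banana"];

// Binary order puts uppercase first: ["Banana", "apple"]
const binaryOrder = fruit.slice().sort(resolveCollation(null));

// English collation ignores case for ordering: ["apple", "Banana"]
const englishOrder = fruit.slice().sort(resolveCollation("en"));

// Result depends on the user's environment, so no ordering is assumed here.
const environmentOrder = fruit.slice().sort(resolveCollation("i-default"));
```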

Thanks
-pablo
