Re: [IndexedDB] Spec changes for international language support

Jonas Sicking Fri, 18 Mar 2011 14:04:05 -0700

On Fri, Mar 18, 2011 at 12:29 PM, Pablo Castro
<[email protected]> wrote:
>
> From: [email protected] [mailto:[email protected]] On 
> Behalf Of Keean Schupke
> Sent: Friday, March 18, 2011 1:53 AM
>
>>> See my proposal in another thread. The basic idea is to copy BDB. Have a 
>>> primary index that is based on an integer, something primitive and fast. 
>>> Allow secondary indexes which use a callback to generate a binary index 
>>> key. IDB shifts the complexity out into a library. Common use cases can be 
>>> provided (a hash of all fields in the object, internationalised 
>>> bidirectional lexicographic etc...), but the user is free to write their 
>>> own for less usual cases (for example indexing by the last word in a name 
>>> string to order by surname).
>
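A minimal sketch of the BDB-style design Keean describes: integer primary keys, plus secondary indexes whose keys are produced by a user callback and kept sorted on every write. Everything here is hypothetical and is not part of the IndexedDB API: `CallbackIndexedStore`, `createIndex`, `put`, `idsInIndexOrder` and the helpers are illustrative names only. As a simplifying assumption, index keys are strings compared by UTF-16 code unit rather than the arbitrary binary keys of the proposal.

```typescript
// Hypothetical sketch: integer primary keys plus callback-driven secondary
// indexes. None of these names are part of the IndexedDB API.

type KeyExtractor<T> = (record: T) => string;

interface IndexEntry {
  key: string; // value produced by the index callback
  id: number;  // primary key of the record it points to
}

// Order entries by key (binary comparison), then by primary key.
function compareEntries(a: IndexEntry, b: IndexEntry): number {
  if (a.key !== b.key) return a.key < b.key ? -1 : 1;
  return a.id - b.id;
}

// Linear-time sorted insertion; a real store would use a B-tree.
function insertSorted(entries: IndexEntry[], entry: IndexEntry): void {
  let i = 0;
  while (i < entries.length && compareEntries(entries[i], entry) <= 0) i++;
  entries.splice(i, 0, entry);
}

class CallbackIndexedStore<T> {
  private records = new Map<number, T>();
  private nextId = 1;
  private indexes = new Map<string, { extract: KeyExtractor<T>; entries: IndexEntry[] }>();

  // Register a secondary index and backfill it from existing records.
  createIndex(name: string, extract: KeyExtractor<T>): void {
    const entries: IndexEntry[] = [];
    for (const [id, record] of this.records) insertSorted(entries, { key: extract(record), id });
    this.indexes.set(name, { extract, entries });
  }

  // Store a record under a fresh integer key and update every index.
  put(record: T): number {
    const id = this.nextId++;
    this.records.set(id, record);
    for (const { extract, entries } of this.indexes.values()) {
      insertSorted(entries, { key: extract(record), id });
    }
    return id;
  }

  // Primary keys in the order defined by the named index.
  idsInIndexOrder(name: string): number[] {
    const index = this.indexes.get(name);
    if (!index) throw new Error(`no index named ${name}`);
    return index.entries.map((e) => e.id);
  }
}

// Usage: order contacts by the last word of their name (a rough surname).
const contacts = new CallbackIndexedStore<{ name: string }>();
contacts.createIndex("surname", (c) => c.name.trim().split(/\s+/).pop() ?? "");
contacts.put({ name: "Ada Lovelace" }); // id 1
contacts.put({ name: "Alan Turing" });  // id 2
contacts.put({ name: "Grace Hopper" }); // id 3
console.log(contacts.idsInIndexOrder("surname")); // → [3, 1, 2]
```

The store itself only ever compares opaque keys; all knowledge of what a "surname" is lives in the callback, which is the sense in which this design moves complexity out into a library.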
> I agree with Jeremy's comments on the other thread for this. Having the
> callback mechanism definitely sounds interesting, but there are a ton of
> common cases that we can solve by just taking a language identifier; I'm not
> sure we want to make people work hard to get something that's already
> supported in most systems. The idea of having a callback to compute the index
> value feels incremental to this, so we could take it on later without
> disrupting the explicit international collation stuff.
>
>>> On 18 March 2011 02:19, Jonas Sicking <[email protected]> wrote:
>>> 2011/3/17 Pablo Castro <[email protected]>:
>>> >
>>> > From: Jonas Sicking [mailto:[email protected]]
>>> > Sent: Tuesday, March 08, 2011 1:11 PM
>>> >
>>> >>> All in all, is there anything preventing adding the API Pablo suggests
>>> >>> in this thread to the IndexedDB spec drafts?
>>> >
>>> > I wanted to propose a couple of specific tweaks to the initial proposal 
>>> > and then unless I hear pushback start editing this into the spec.
>>> >
>>> > From reading the details on this thread I'm starting to realize that 
>>> > per-database collations won't do it. What did it for me was the example 
>>> > that has a fuzzier matching mode (case/accent insensitive). This is 
>>> > exactly the kind of index I would want to sort people's names in my 
>>> > address book, but most likely not the index I'll want to use for my 
>>> > primary key.
>>> >
>>> > Refactoring the API to accommodate this would mean moving the
>>> > setCollation() method and the collation property to the object store and
>>> > index objects. If we were willing to live without the ability to change
>>> > them, we could take collation as one of the optional parameters to
>>> > createObjectStore()/createIndex() and reduce a bit of surface area...
>>>
>>> Unfortunately I think you bring up good use cases for
>>> per-objectStore/index collations. It's definitely tempting to just add
>>> it as an optional parameter to createObjectStore/createIndex. The
>>> downside is obviously pushing more complexity onto web developers,
>>> complexity which will be duplicated across sites.
>>>
>>> However there is another problem to consider here. Can switching the
>>> collation on an objectStore or a unique index affect its validity?
>>> I.e. if you switch from a case sensitive to a case insensitive
>>> collation, does that mean that if you have two entries with the
>>> primary keys "Sweden" and "sweden" they collide and thus the change of
>>> collation must result in an error (or aborted transaction)?
>>>
>>> I do seem to recall that there are ways to do at least case
>>> sensitivity such that you generally don't take case into account when
>>> sorting, unless two entries are exactly the same, in which case you do
>>> look at casing to differentiate them. However I don't really know a
>>> whole lot about this and so defer to people that know
>>> internationalization better.
>
> This is a good point. It makes me lean toward not allowing changing the 
> collation of an index or store. That means we could just have an optional 
> parameter (in the generic parameter object thingy we have now) on 
> createObjectStore and createIndex that indicates the collation name. It seems 
> minimally disruptive, it doesn't tax people that don't care about it, and 
> since there is no setCollation we don't have the problem of not being able to 
> re-index the data.


So there is no way to specify things such that the collation doesn't
affect uniqueness? If so, I tend to agree.
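The technique recalled above (ignore case when sorting, but look at case to tell otherwise-equal entries apart) can be sketched as a two-level comparison: a primary pass that ignores case and accents, then a binary tie-break so that only identical strings compare equal. `tieBreakingCompare` is a hypothetical name, and `Intl.Collator` is used only as a stand-in for whatever collation engine a UA would actually use.

```typescript
// Primary comparison ignores case and accents; ties are broken by a binary
// (UTF-16 code unit) comparison, so only identical strings compare equal.
const baseCollator = new Intl.Collator("en", { sensitivity: "base" });

function tieBreakingCompare(a: string, b: string): number {
  const primary = baseCollator.compare(a, b);
  if (primary !== 0) return primary;
  return a < b ? -1 : a > b ? 1 : 0;
}

const names = ["sweden", "Norway", "Sweden", "norway"];
names.sort(tieBreakingCompare);
console.log(names); // → ["Norway", "norway", "Sweden", "sweden"]

// Distinct strings never compare equal, so a unique index accepts both.
console.log(tieBreakingCompare("Sweden", "sweden") !== 0); // → true
```

Under such a comparator, lists still group "Norway"/"norway" and "Sweden"/"sweden" together as a case-insensitive sort would, but uniqueness is decided exactly as with binary collation. If collations were defined this way, changing them could not invalidate a unique index.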

>>> > Another piece of feedback I heard consistently as I discussed this with 
>>> > various folks at Microsoft is the need to be able to pick up what the UA 
>>> > would consider the collation that's most appropriate for the user 
>>> > environment (derived from settings, page language or whatever). We could 
>>> > support this by introducing a special value that you can pass to
>>> > setCollation that indicates "pick whatever is right for the
>>> > environment's language right now". Given that there is no other way for
>>> > people to discover the user preference on this, I think this is pretty 
>>> > important.
>>> I would be fine with this as long as it's an explicit opt-in. There is
>>> definitely a risk that people will do this and then only do testing in
>>> one language, but it seems to me like a useful use case to support,
>>> and I don't see a way of supporting this while completely avoiding the
>>> risk of internationalization bugs.
>
> I agree, it should be opt-in. I still assume we'll default to binary
> collation (the same applies if you specify the collation value as null). I
> was reading BCP 47 [1], and in section 4.1, "Choice of Language Tag", item #7
> seems to describe what we're looking for. The value "i-default" seems to
> match our needs closely enough, so callers could use that value.
> Discoverability is not great, but we avoid having to specify something new,
> and arguably they'll need to read somewhere that this argument is a
> BCP 47-compatible value, and we could put a comment about "i-default" right
> there.
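A sketch of how an implementation might interpret the proposed collation argument, under the rules discussed here: `null` (or an omitted value) means binary collation, "i-default" is the explicit opt-in to the environment's preferred collation, and anything else is treated as a BCP 47 language tag. `resolveCollation` and the returned shape are hypothetical, and `Intl.Collator` stands in for the UA's collation support; with no locale argument it uses the host's default locale.

```typescript
type Comparator = (a: string, b: string) => number;

// Binary collation: compare UTF-16 code units (the default when no collation is given).
const binaryCompare: Comparator = (a, b) => (a < b ? -1 : a > b ? 1 : 0);

// Hypothetical resolution of the proposed `collation` option:
//   null / undefined -> binary collation
//   "i-default"      -> the environment's preferred locale (explicit opt-in)
//   any other value  -> treated as a BCP 47 language tag
function resolveCollation(
  collation: string | null | undefined,
): { locale: string | null; compare: Comparator } {
  if (collation == null) return { locale: null, compare: binaryCompare };
  // "i-default" is an irregular grandfathered tag rather than a Unicode locale
  // identifier, so it is handled here instead of being passed to Intl.Collator.
  const collator = collation === "i-default" ? new Intl.Collator() : new Intl.Collator(collation);
  return { locale: collator.resolvedOptions().locale, compare: collator.compare };
}

console.log(resolveCollation(null).compare("B", "a") < 0); // → true: "B" (U+0042) sorts before "a" (U+0061)
console.log(resolveCollation("en").compare("a", "B") < 0); // → true: alphabetical, not code-unit, order
console.log(resolveCollation("i-default").locale);         // the host's default locale, e.g. "en-US"
```

Recording the resolved locale matters for the opt-in case: the environment's preference can differ between users, so a store created with "i-default" would need to remember which collation it actually got.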

Sounds good to me. Though you seem to have forgotten to include the
[1] reference.

/ Jonas
