On Fri, Mar 18, 2011 at 12:29 PM, Pablo Castro <[email protected]> wrote:
>
> From: [email protected] [mailto:[email protected]] On
> Behalf Of Keean Schupke
> Sent: Friday, March 18, 2011 1:53 AM
>
>>> See my proposal in another thread. The basic idea is to copy BDB. Have a
>>> primary index that is based on an integer, something primitive and fast.
>>> Allow secondary indexes which use a callback to generate a binary index
>>> key. IDB shifts the complexity out into a library. Common use cases can be
>>> provided (a hash of all fields in the object, internationalised
>>> bidirectional lexicographic etc...), but the user is free to write their
>>> own for less usual cases (for example indexing by the last word in a name
>>> string to order by surname).
>
> I agree with Jeremy's comments on the other thread for this. Having the
> callback mechanism definitely sounds interesting but there are a ton of
> common cases that we can solve by just taking a language identifier, I'm not
> sure we want to make people work hard to get something that's already
> supported in most systems. The idea of having a callback to compute the index
> value feels incremental to this, so we could take on it later on without
> disrupting the explicit international collation stuff.
>
>>> On 18 March 2011 02:19, Jonas Sicking <[email protected]> wrote:
>>> 2011/3/17 Pablo Castro <[email protected]>:
>>> >
>>> > From: Jonas Sicking [mailto:[email protected]]
>>> > Sent: Tuesday, March 08, 2011 1:11 PM
>>> >
>>> >>> All in all, is there anything preventing adding the API Pablo suggests
>>> >>> in this thread to the IndexedDB spec drafts?
>>> >
>>> > I wanted to propose a couple of specific tweaks to the initial proposal
>>> > and then unless I hear pushback start editing this into the spec.
>>> >
>>> > From reading the details on this thread I'm starting to realize that
>>> > per-database collations won't do it.
>>> > What did it for me was the example
>>> > that has a fuzzier matching mode (case/accent insensitive). This is
>>> > exactly the kind of index I would want to sort people's names in my
>>> > address book, but most likely not the index I'll want to use for my
>>> > primary key.
>>> >
>>> > Refactoring the API to accommodate for this would mean to move the
>>> > setCollation() method and the collation property to the object store and
>>> > index objects. If we were willing to live without the ability to change
>>> > them we could take collation as one of the optional parameters to
>>> > createObjectStore()/createIndex() and reduce a bit of surface area...
>>> Unfortunately I think you bring up good use cases for
>>> per-objectStore/index collations. It's definitely tempting to just add
>>> it as an optional parameter to createObjectStore/createIndex. The
>>> downside is obviously pushing more complexity onto web developers.
>>> Complexity which will be duplicated across sites.
>>>
>>> However there is another problem to consider here. Can switching
>>> collation on an objectStore or a unique index affect its validity?
>>> I.e. if you switch from a case sensitive to a case insensitive
>>> collation, does that mean that if you have two entries with the
>>> primary keys "Sweden" and "sweden" they collide and thus the change of
>>> collation must result in an error (or aborted transaction)?
>>>
>>> I do seem to recall that there are ways to do at least case
>>> sensitivity such that you generally don't take case into account when
>>> sorting, unless two entries are exactly the same, in which case you do
>>> look at casing to differentiate them. However I don't really know a
>>> whole lot about this and so defer to people that know
>>> internationalization better.
>
> This is a good point. It makes me lean toward not allowing changing the
> collation of an index or store.
> That means we could just have an optional
> parameter (in the generic parameter object thingy we have now) on
> createObjectStore and createIndex that indicates the collation name. It seems
> minimally disruptive, it doesn't tax people that don't care about it, and
> since there is no setCollation we don't have the problem of not being able to
> re-index the data.
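
As a rough illustration of the uniqueness concern quoted above (a sketch only, not part of any spec draft), the snippet below uses the standard `Intl.Collator` to show how keys that are distinct under a case-sensitive comparison can collide under a case-insensitive one. The helper `findCollisions` is hypothetical and nothing here is IndexedDB API; it only assumes that a unique index would reject two keys its collation treats as equal.

```typescript
// Hypothetical helper (not part of IndexedDB): returns the pairs of keys that
// a given comparator treats as equal, i.e. keys that would break uniqueness.
function findCollisions(
  keys: string[],
  compare: (a: string, b: string) => number
): Array<[string, string]> {
  const collisions: Array<[string, string]> = [];
  for (let i = 0; i < keys.length; i++) {
    for (let j = i + 1; j < keys.length; j++) {
      if (compare(keys[i], keys[j]) === 0) {
        collisions.push([keys[i], keys[j]]);
      }
    }
  }
  return collisions;
}

const keys = ["Sweden", "sweden", "Norway"];

// Case-sensitive collation: "Sweden" and "sweden" remain distinct keys.
const caseSensitive = new Intl.Collator("en", { sensitivity: "variant" });
// Case- and accent-insensitive collation: the two keys compare as equal.
const caseInsensitive = new Intl.Collator("en", { sensitivity: "base" });

const strictCollisions = findCollisions(keys, caseSensitive.compare);
const fuzzyCollisions = findCollisions(keys, caseInsensitive.compare);

console.log(strictCollisions.length, fuzzyCollisions);
```

If the collation is fixed when the store or index is created, a check like this never has to run against existing data, which is the appeal of dropping setCollation.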

So there is no way to specify things such that the collation doesn't affect
uniqueness? If so, I tend to agree.

>>> > Another piece of feedback I heard consistently as I discussed this with
>>> > various folks at Microsoft is the need to be able to pick up what the UA
>>> > would consider the collation that's most appropriate for the user
>>> > environment (derived from settings, page language or whatever). We could
>>> > support this by introducing a special value that you can pass to
>>> > setCollation that indicates "pick whatever is right for the
>>> > environment's language right now". Given that there is no other way for
>>> > people to discover the user preference on this, I think this is pretty
>>> > important.
>>> I would be fine with this as long as it's an explicit opt-in. There is
>>> definitely a risk that people will do this and then only do testing in
>>> one language, but it seems to me like a useful use case to support,
>>> and I don't see a way of supporting this while completely avoiding the
>>> risk of internationalization bugs.
>
> I agree, it should be opt-in. I still assume we'll default to binary
> collation (same if you specify the collation value as null). I was reading
> the BCP 47 [1] and in section 4.1 "Choice of Language Tag" the item #7 seems
> to describe what we're looking for. The value "i-default" seems to match our
> needs close enough, so callers could use that value. Discoverability is not
> great, but we avoid having to specify something new, and arguably they'll
> need to read somewhere that this argument is a BCP47-compatible value, and we
> could put a comment about "i-default" right there.

Sounds good to me. Though you seem to have forgotten to include the [1]
reference.

/ Jonas
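
For readers following along, here is a small sketch of the difference between a default binary ordering and a collation selected by a BCP 47 language tag. It uses the standard `Intl.Collator`; the helper `comparatorFor` and the `"default"` sentinel are hypothetical stand-ins for the opt-in "use the environment's language" value discussed above, and how a value such as "i-default" would actually map onto this is an assumption, not something the thread settles.

```typescript
// Binary (code unit) ordering: the assumed meaning of a null/absent collation.
function binaryCompare(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Hypothetical helper: choose a comparator from an optional collation name.
//   null      -> binary ordering (the assumed default)
//   "default" -> illustrative sentinel for "whatever the environment prefers"
//   otherwise -> treated as a BCP 47 language tag
function comparatorFor(
  collation: string | null
): (a: string, b: string) => number {
  if (collation === null) return binaryCompare;
  if (collation === "default") return new Intl.Collator().compare;
  return new Intl.Collator(collation).compare;
}

const names = ["cherry", "Banana", "apple"];

// Uppercase letters have lower code units than lowercase ones, so "Banana"
// sorts first under binary ordering.
const binaryOrder = [...names].sort(comparatorFor(null));
// A language-aware collation ignores case at the primary level.
const englishOrder = [...names].sort(comparatorFor("en"));

console.log(binaryOrder, englishOrder);
```

The environment-dependent branch is deliberately left out of the comparison, since its result varies with the user's settings, which is exactly the testing risk raised above.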
