RE: [IndexedDB] Closing on bug 9903 (collations)

2011-06-17 Thread Pablo Castro

From: [] On 
Behalf Of Keean Schupke
Sent: Tuesday, May 31, 2011 11:51 PM

>> On 1 June 2011 01:37, Pablo Castro  wrote:
>> -Original Message-
>> From: [] On Behalf Of Aryeh 
>> Gregor
>> Sent: Tuesday, May 31, 2011 3:49 PM
>> >> On Tue, May 31, 2011 at 6:39 PM, Pablo Castro
>> >>  wrote:
>> >> > No, that was poor wording on my part, I keep using "locale" in the 
>> >> > wrong context. I meant to have the API take a proper collation 
>> >> > identifier. The identifier can be as specific as the caller wants it to 
>> >> > be. The implementation could choose to not honor some specific detail 
>> >> > if it can't handle it (to the extent that doing so is allowed by the 
>> >> > specification of collation names), or fail because it considers that 
>> >> > not handling a particular aspect of the collation identifier would 
>> >> > severely deviate from the caller's expectations.
>> >>
>> >> I'm not sure I understand you.  My personal opinion is that there
>> >> should be no undefined behavior here.  If authors are allowed to pass
>> >> collation identifiers, the spec needs to say exactly how they're to be
>> >> interpreted, so the same identifier passed to two different browsers
>> >> will result in the same collation, i.e., the same strings need to sort
>> >> the same cross-browser.  Having only binary collation is better than
>> >> having non-binary collations but not defining them, IMO.
>> I thought BCP47 allowed implementations to drop subtags if needed. I just 
>> re-read the spec and it seems that it only allows doing that in constrained 
>> cases where you can't fit the whole name in your buffer (which wouldn't 
>> apply to the context discussed here). My first instinct is that this is 
>> quite a bit to guarantee (full consistency in collation), but it seems that 
>> that's what the spec is shooting for.
>> >> > Given the amount of debate on this, could we at least agree that we can 
>> >> > do binary for v1? We can then have an open item for v2 on taking 
>> >> > collation names and sort according to UCA or taking callbacks and such.
>> >>
>> >> I'm okay with supporting only binary to start with.
>> Great. I'll still wait a bit to see what other folks think, and then update 
>> the bug one way or the other.
>> Thanks
>> -pablo
>> The discussion sounds like it is headed in the right direction. Are there 
>> any issues with non-Unicode encodings that need to be dealt with (HTTP 
>> headers default to ISO-8859 I think)? Would people be expected to convert on 
>> read into UTF-16 strings or use typed-arrays?

I asked around here and folks actually pointed out that the JavaScript spec 
seems to describe exactly what we need. Looking here [1], section 
11.8.5, the relevant fragment starting at step 4 goes:

Else, both px and py are Strings
a. If py is a prefix of px, return false. (A String value p is a prefix of 
String value q if q can be the result of concatenating p and some other String 
r. Note that any String is a prefix of itself, because r may be the empty 
String.)
b. If px is a prefix of py, return true.
c. Let k be the smallest nonnegative integer such that the character at 
position k within px is different from the character at position k within py. 
(There must be such a k, for neither String is a prefix of the other.)
d. Let m be the integer that is the code unit value for the character at 
position k within px.
e. Let n be the integer that is the code unit value for the character at 
position k within py.
f. If m < n, return true. Otherwise, return false.

It also has a note below indicating:

NOTE 2 The comparison of Strings uses a simple lexicographic ordering on 
sequences of code unit values. There is no attempt to use the more complex, 
semantically oriented definitions of character or string equality and collating 
order defined in the Unicode specification. Therefore String values that are 
canonically equal according to the Unicode standard could test as unequal. In 
effect this algorithm assumes that both Strings are already in normalised form. 
Also, note that for strings containing supplementary characters, lexicographic 
ordering on sequences of UTF-16 code unit values differs from that on sequences 
of code point values.

Which is very much in line with what we've been discussing, and has the extra 
feature of being compatible with JavaScript order. 

So it looks like we could reference (or inline) this in the spec and have a 
fully specified order for keys with string content.
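
To make the quoted steps concrete, here is a rough sketch (not spec text) of the same comparison written out by hand. The names compareCodeUnits and compareCodePoints are made up for this illustration; the second function exists only to show the difference NOTE 2 mentions for supplementary characters.

```javascript
// Sketch of the ES5 11.8.5 string steps (a-f), returning -1/0/1 instead of a boolean.
function compareCodeUnits(px, py) {
  const shared = Math.min(px.length, py.length);
  for (let k = 0; k < shared; k++) {
    const m = px.charCodeAt(k); // code unit value at position k within px
    const n = py.charCodeAt(k); // code unit value at position k within py
    if (m !== n) return m < n ? -1 : 1;
  }
  // No differing position: one string is a prefix of the other, or they are equal.
  if (px.length === py.length) return 0;
  return px.length < py.length ? -1 : 1;
}

// For contrast only: order by Unicode code point rather than UTF-16 code unit.
function compareCodePoints(a, b) {
  const pa = Array.from(a, (ch) => ch.codePointAt(0));
  const pb = Array.from(b, (ch) => ch.codePointAt(0));
  const shared = Math.min(pa.length, pb.length);
  for (let k = 0; k < shared; k++) {
    if (pa[k] !== pb[k]) return pa[k] < pb[k] ? -1 : 1;
  }
  if (pa.length === pb.length) return 0;
  return pa.length < pb.length ? -1 : 1;
}

const bmp = "\uFF5E";                 // U+FF5E, a single code unit
const supplementary = "\uD83D\uDE00"; // U+1F600, stored as a surrogate pair
console.log(compareCodeUnits(bmp, supplementary));  // 1: 0xFF5E > 0xD83D
console.log(compareCodePoints(bmp, supplementary)); // -1: 0xFF5E < 0x1F600
```

The code unit version agrees with the built-in < operator on strings, which is the property that makes this ordering attractive for keys.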




RE: [IndexedDB] Evictable stores

2011-06-07 Thread Pablo Castro

From: [] On Behalf Of David Grogan
Sent: Tuesday, June 07, 2011 1:01 PM

>> We (chrome) are still having internal discussions about evictable vs 
>> non-evictable storage; we're on board with worrying about this in v2.
>> On Tue, May 31, 2011 at 5:33 PM, Jonas Sicking  wrote:
>> On Tue, May 31, 2011 at 3:46 PM, Pablo Castro
>>  wrote:
>> >> > We discussed evictable stores some time ago and captured it in bug 
>> >> > 11350 [1], but I haven't seen further discussion on it and it hasn't 
>> >> > gone into the spec. I'm curious on where folks are with this? Should we 
>> >> > move it to v2? Should we just allow UAs to have their own policy around 
>> >> > eviction (back at TPAC it seemed folks had reasonable but different 
>> >> > strategies for handling when to allow websites to use storage already).
>> >> I think this is a very interesting feature, but one that I'd prefer to
>> >> move to a version 2 as it isn't a required feature and is one that
>> >> seems easy to "retrofit".
>> >>
>> >> / Jonas

Got it. I postponed the bug.

RE: [IndexedDB] Evictable stores

2011-06-07 Thread Pablo Castro

From: Jonas Sicking [] 
Sent: Tuesday, May 31, 2011 5:34 PM

>> On Tue, May 31, 2011 at 3:46 PM, Pablo Castro
>>  wrote:
>> > We discussed evictable stores some time ago and captured it in bug 11350 
>> > [1], but I haven't seen further discussion on it and it hasn't gone into 
>> > the spec. I'm curious on where folks are with this? Should we move it to 
>> > v2? Should we just allow UAs to have their own policy around eviction 
>> > (back at TPAC it seemed folks had reasonable but different strategies for 
>> > handling when to allow websites to use storage already).
>> I think this is a very interesting feature, but one that I'd prefer to
>> move to a version 2 as it isn't a required feature and is one that
>> seems easy to "retrofit".
>> / Jonas

The feature is already captured in the wiki page that tracks future features 
[1]. So I guess we can just resolve the bug as "later". 

Jeremy, the bug is currently assigned to you, were you doing work on it or 
should I just resolve it?



RE: [IndexedDB] Closing on bug 9903 (collations)

2011-05-31 Thread Pablo Castro

-Original Message-
From: [] On Behalf Of Aryeh 
Sent: Tuesday, May 31, 2011 3:49 PM

>> On Tue, May 31, 2011 at 6:39 PM, Pablo Castro
>>  wrote:
>> > No, that was poor wording on my part, I keep using "locale" in the wrong 
>> > context. I meant to have the API take a proper collation identifier. The 
>> > identifier can be as specific as the caller wants it to be. The 
>> > implementation could choose to not honor some specific detail if it can't 
>> > handle it (to the extent that doing so is allowed by the specification of 
>> > collation names), or fail because it considers that not handling a 
>> > particular aspect of the collation identifier would severely deviate from 
>> > the caller's expectations.
>> I'm not sure I understand you.  My personal opinion is that there
>> should be no undefined behavior here.  If authors are allowed to pass
>> collation identifiers, the spec needs to say exactly how they're to be
>> interpreted, so the same identifier passed to two different browsers
>> will result in the same collation, i.e., the same strings need to sort
>> the same cross-browser.  Having only binary collation is better than
>> having non-binary collations but not defining them, IMO.

I thought BCP47 allowed implementations to drop subtags if needed. I just 
re-read the spec and it seems that it only allows doing that in constrained 
cases where you can't fit the whole name in your buffer (which wouldn't apply 
to the context discussed here). My first instinct is that this is quite a bit 
to guarantee (full consistency in collation), but it seems that that's what the 
spec is shooting for. 

>> > Given the amount of debate on this, could we at least agree that we can do 
>> > binary for v1? We can then have an open item for v2 on taking collation 
>> > names and sort according to UCA or taking callbacks and such.
>> I'm okay with supporting only binary to start with.

Great. I'll still wait a bit to see what other folks think, and then update the 
bug one way or the other.


[IndexedDB] Evictable stores

2011-05-31 Thread Pablo Castro
We discussed evictable stores some time ago and captured it in bug 11350 [1], 
but I haven't seen further discussion on it and it hasn't gone into the spec. 
I'm curious where folks are with this. Should we move it to v2? Should we 
just allow UAs to have their own policy around eviction (back at TPAC it seemed 
folks had reasonable but different strategies for handling when to allow 
websites to use storage already).



RE: [IndexedDB] Closing on bug 9903 (collations)

2011-05-31 Thread Pablo Castro

-Original Message-
From: [] On Behalf Of Aryeh 
Sent: Friday, May 06, 2011 10:05 AM

>> On Fri, May 6, 2011 at 5:18 AM, Jonas Sicking  wrote:
>> > Based on that, my conclusion is that we should go with what Pablo is
>> > proposing. And I think we should do it for v1.
>> If I understand correctly, Pablo's proposal is that the author be
>> allowed to specify a locale, and the browser can collate in some
>> undefined way based on that locale.  That sounds like a really bad
>> idea for interop.  If non-binary collation is supported in a first
>> version, it should be either

No, that was poor wording on my part, I keep using "locale" in the wrong 
context. I meant to have the API take a proper collation identifier. The 
identifier can be as specific as the caller wants it to be. The implementation 
could choose to not honor some specific detail if it can't handle it (to the 
extent that doing so is allowed by the specification of collation names), or 
fail because it considers that not handling a particular aspect of the 
collation identifier would severely deviate from the caller's expectations.

>> 1) Two choices, binary or UCA 6.0.0.  (AFAIK, UCA gives fairly good
>> results for most languages even without tailoring, so it might be just
>> fine for v1.  It's vastly better than binary, for sure.)

Given the amount of debate on this, could we at least agree that we can do 
binary for v1? We can then have an open item for v2 on taking collation names 
and sort according to UCA or taking callbacks and such.

>> 2) In addition to binary and UCA 6.0.0, allow UCA 6.0.0 tailored by
>> any of the locales defined by CLDR 1.9.1.
>> There also needs to be some thought put into how to handle version
>> updates, since browsers cannot update their UCA or CLDR implementation
>> without rebuilding all existing indexes that used it (unless they keep
>> the old implementation forever).  It might be that browsers should
>> just stick to a fixed version for the time being (like 6.0.0 and
>> 1.9.1), and we might decide that no further APIs are needed now to
>> accommodate possible future switches, but at least some thought needs
>> to be given to it.

I wonder if the API (independently of when we get to this) should include the 
version either as part of the collation identifier or as a separate argument. 
This would allow UAs to support a version or two for a while, and then phase 
them out as they fall out of use in favor of newer ones.

>> On consideration, I don't think user-specified sortkey functions are
>> necessary at this stage.  If collations are to be identified by
>> strings for now, we could always overload the value to accept a
>> function at some later date if we wanted to support that.  So I
>> wouldn't worry about that further.

I agree.


[IndexedDB] Closing on bug 9903 (collations)

2011-04-29 Thread Pablo Castro
We've had quite a bit of debate on this but I don't think we've reached 
closure. At this point I would be fine with either one of a) postpone to v2 and 
agree that for now we'll just do binary collation everywhere or b) the last 
form of the proposal sent around: extra "collation" argument (following BCP47 
plus whatever the UA wants to allow) in createObjectStore/createIndex, plus a 
collation property to interrogate it; no way to change the collation of a 
store/index once created.

Given that this turned out to be a more elaborate topic than I had originally 
expected and that it doesn't seem to have a lot of traction right now, my 
preference would be to postpone to v2. Thoughts? Once we make a call I'll make 
sure the spec reflects it.


[IndexedDB] Exceptions in IDB and the DOMException

2011-04-19 Thread Pablo Castro
This came up today that I didn't remember having a conversation about it with 

We currently have IDBDatabaseException with some error codes as constants and 
code/message properties. Looking at DOMException as defined in DOM Core [1], it 
turns out that a) the pattern of the class is identical, but instead of 
code/message it has code/name and b) there are some errors present in both or 
that are very close (e.g. NOT_FOUND_ERR, DATA_CLONE_ERR, QUOTA_EXCEEDED_ERR). 

Would it be worth trying to use the constants of DOMException when there's 
one already there that matches the need? If that were the case, would it be the 
constants that we would be reusing, or would we have to throw a DOMException 
instead of an IDBDatabaseException?

Separately, in reference to a) above, should we change 
IDBDatabaseException.message to for consistency?



RE: [WebSQL] Any future plans, or has IndexedDB replaced WebSQL?

2011-04-05 Thread Pablo Castro

From: [] On 
Behalf Of Keean Schupke
Sent: Monday, April 04, 2011 10:17 PM

>> Something like RelationalDB gives you the power of a relational-db with no 
>> dependence on a specific implementation of SQL, so it would be compatible 
>> enough for the web.  It fixes all the problems with the standardisation of 
>> WebSQL that have been talked about so far.  I think it would find no 
>> technical issues that block its standardisation.  As a high level DB API it 
>> does not need all the low-level features of IndexedDB, so its API can be 
>> much simpler and cleaner. RelationalDB can at least be provided as a library 
>> on top of IndexedDB, and it can use WebSQL where it is supported. My concern 
>> with the library approach is performance when implemented on top of 
>> IndexedDB.

The goal of IndexedDB has always been to enable things like RelationalDB and 
CouchDB to be built on top, while maintaining a reasonable level of 
functionality for those that wanted to use it directly. I really like the idea 
of thinking of RelationalDB as something that's built as a library on top of 
IndexedDB. Are there specific tweaks we can make to IndexedDB so it can be a 
good lower-layer for RelationalDB, such that RelationalDB could be built as a 
pure JavaScript library?


RE: [IndexedDB] Design Flaws: Not Stateless, Not Treating Objects As Opaque

2011-03-31 Thread Pablo Castro

From: [] On Behalf Of Jeremy Orlow
Sent: Thursday, March 31, 2011 11:36 AM

>> I can find a lot of stuff on collation, but not a lot about why it could not 
>> be done in a library. Could you summarise the reasons why this needs to be 
>> core functionality for me?
>> Sorry, but that stuff is paged out of my brain.  Pablo, can you?
>> A library could choose to use an object store as meta-data to store the 
>> collation orders that it is using for various indexes for example.

- Currently there are no APIs in JavaScript to compare strings using specific 
collations. There are folks that are looking into this, but it will need time.
- I'm far from an expert in the topic, but from talking to folks that 
understand this well it seems that implementing this entirely in JavaScript 
would mean you have to download collation tables and apply them as needed in 
callbacks. Not only does this mean a hit in download size/time for the app, but 
also that callbacks have to either download stuff or inline collation 
rules/tables in the callback itself. 
- In pure practical terms, I suspect the 80% scenario can be covered by 
implementing this natively, having it be fast and simple to use for common 
cases. Not pushing back on the callback stuff, just saying that I find it 
valuable to have users simply say "en-US" and get what they wanted.
- Also from the practical perspective, simple cases that don't require the 
flexibility can avoid having to take care of making the callbacks perfectly 
consistent even as you roll out updates that may hit only some of the pages, 
use components written by someone else, etc.
- By default we would still do binary collation (there was a question in the 
thread, I forget exactly where).


RE: [IndexedDB] Spec changes for international language support

2011-03-22 Thread Pablo Castro

From: [] On 
Behalf Of Keean Schupke
Sent: Tuesday, March 22, 2011 5:34 PM

>> IMHO not the job of Idb to store the callbacks, so I don't see this 
>> complexity as a reason not to implement the API using callbacks. I think 
>> having one consistent API is more important.
>> Specifying the collation 'name' has all the same problems as callbacks 
>> (needs to be re-done on every page, possibility of using different 
>> collations on different pages).
>> Really a 'function' is just a symbol for a collation. A function name is a 
>> better symbol for a collation than a string. Functions have a uniqueness 
>> property strings do not. So specifying a function as the collation 
>> instead of a string really is the same thing. Consider below:

I don't think it's the same. If we don't store the callbacks in the database it 
means every page has to have full knowledge of the database schema (at least 
all the indexes) all the time, instead of just pulling that in on demand when 
needed. It also means we can never allow browser developer tools or generic 
dev-tool-webpages to modify the database because indexes would become invalid 
(not sure allowing tools to mess with the database in general is a good idea, 
but I thought it illustrated the point well). 

I wonder if the overall issue we're discussing has to do with "how embedded" 
the database is. In BDB scenarios where the database is completely invisible 
outside of an application many of these decisions make more sense. I don't 
think of web applications that way. I think of them more as a number of 
building blocks (pages, pieces within pages, tool pages added on the side) that 
are authored and sometimes even versioned independently, and the interface 
between those building blocks and the store is public and visible to tools and 
generic data browsers. All that changes the assumptions in the overall picture. 


RE: [IndexedDB] Spec changes for international language support

2011-03-22 Thread Pablo Castro

From: [] On 
Behalf Of Keean Schupke
Sent: Friday, March 18, 2011 8:17 PM

>> On 18 March 2011 19:29, Pablo Castro  wrote:
>> From: [] On 
>> Behalf Of Keean Schupke
>> Sent: Friday, March 18, 2011 1:53 AM
>> >> See my proposal in another thread. The basic idea is to copy BDB. Have a 
>> >> primary index that is based on an integer, something primitive and fast. 
>> >> Allow secondary indexes which use a callback to generate a binary index 
>> >> key. IDB shifts the complexity out into a library. Common use cases can 
>> >> be provided (a hash of all fields in the object, internationalised 
>> >> bidirectional lexicographic etc...), but the user is free to write their 
>> >> own for less usual cases (for example indexing by the last word in a name 
>> >> string to order by surname).
I agree with Jeremy's comments on the other thread for this. Having the 
callback mechanism definitely sounds interesting but there are a ton of common 
cases that we can solve by just taking a language identifier, I'm not sure we 
want to make people work hard to get something that's already supported in most 
systems. The idea of having a callback to compute the index value feels 
incremental to this, so we could take it on later without disrupting the 
explicit international collation stuff.
>> The idea would be to provide pre-defined implementations of the callback for 
>> common use cases, then it is just as simple to register a callback as set 
>> any other option. All this means to the API is you pass a function instead 
>> of a string. It also is better for modularity as all the code relating to 
>> the sort order is kept in the callback functions.
>> The difference comes down to something like:
>> index.set_order_lexicographic('us');
>> vs
>> index.set_order_method(order_lexicographic('us'));
>> So more than just setting a property like the first case, where presumably 
>> all the ordering code is mixed in with the indexing code, the second case 
>> encapsulates all the ordering code in the function returned from the 
>> execution of order_lexicographic('us'). This function would represent a 
>> mapping from the object being indexed to a binary blob that is the actual 
>> stored index data.
>> So doing it this way does not necessarily make things harder, and it 
>> improves encapsulation, the type-safety, and the flexibility of the API.

Yep, we talked about supporting callbacks already in the other threads and in 
this one. As I mentioned before, I think this is incremental to the basic 
feature of taking a collation name. I do realize you can just pass a 
pre-implemented function, but that opens the door to a bunch of things we'd 
need to handle, including possibly storing code in the database (such 
that proper updates don't depend on each page re-registering all the index 
callbacks), handling scripts with the appropriate context to run during index 
updates, etc.  I would much rather have basic functionality in place and then 
expand as needed once we have users using the API.
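
For readers following the callback idea, here is a minimal sketch of what a key-generating callback could look like, using the "order by surname" case from the quoted proposal. The names surnameKey and buildIndex are hypothetical and are not part of IndexedDB or of any proposal text; the index is just an in-memory array.

```javascript
// Hypothetical key-generating callback: index people by the last word of `name`.
function surnameKey(record) {
  const words = record.name.trim().split(/\s+/);
  return words[words.length - 1];
}

// Hypothetical helper: build a sorted index (array of {key, value}) from records,
// ordering keys with the plain code unit comparison that < and > provide.
function buildIndex(records, keyFn) {
  return records
    .map((value) => ({ key: keyFn(value), value }))
    .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

const people = [
  { name: "Pablo Castro" },
  { name: "Jonas Sicking" },
  { name: "Keean Schupke" },
];
const bySurname = buildIndex(people, surnameKey);
console.log(bySurname.map((entry) => entry.key)); // [ 'Castro', 'Schupke', 'Sicking' ]
```

Note that the index is only correct if every writer supplies the same surnameKey function, which is the re-registration concern raised above.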


RE: [IndexedDB] Any particular reason built-in properties are not indexable?

2011-03-22 Thread Pablo Castro

-Original Message-
From: Jonas Sicking [] 
Sent: Monday, March 21, 2011 2:54 PM

>> On Mon, Mar 21, 2011 at 11:51 AM, Pablo Castro
>>  wrote:
>> > The spec today requires that properties key paths point at need to be
>> > enumerated (see 3.1.2 "Object Store"). Any particular reason for that? It
>> > would be reasonable to allow an index on say the "length" property of a
>> > string. Perhaps we're opening the door for too much, so I wanted to double
>> > check so we make an explicit call one way or the other. Thoughts?
>> The structured clone algorithm only copies enumerable properties,
>> given how we currently do indexes it would be sort of strange if you
>> could add an index on a property that isn't stored.
>> This is generally not a problem though. Before ES5 there wasn't even a
>> way to create non-enumerated properties. They only appeared on host
>> objects which you can't structured-clone anyway.
>> One exception to this is Array.length which you mention. While that
>> property isn't copied by the structured-clone algorithm, it's
>> recreated by it since a new array object is created which contains a
>> length property computed according to the same rules.
>> We could special-case the array.length property in the keyPath
>> evaluation algorithm. We might want to do the same for other
>> host-object properties such as Blob.size and Blob.type since they
>> aren't actually structured-cloned since they live on the prototype
>> chain rather than the objects themselves.

I'm fine not supporting it, I just wanted to bring it up because it came up 
here and wanted to make sure we made an explicit call. I'd rather not one-off 
Array.length, so it seems it would be best to just not do it across the board.
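
As an illustration of the rule discussed here, the following sketch resolves a key path while following only own, enumerable properties. The helper name evaluateKeyPath is made up for this example, and the code is an approximation of the behaviour described in this thread, not the spec's algorithm.

```javascript
// Sketch: resolve a dotted key path, following only own enumerable properties.
function evaluateKeyPath(value, keyPath) {
  let current = value;
  for (const step of keyPath.split(".")) {
    const isObject = current !== null && typeof current === "object";
    if (!isObject || !Object.prototype.propertyIsEnumerable.call(current, step)) {
      return undefined; // no key can be extracted from this record
    }
    current = current[step];
  }
  return current;
}

const record = { title: "notes", tags: ["a", "b", "c"] };
console.log(evaluateKeyPath(record, "title"));        // "notes"
console.log(evaluateKeyPath(record, "tags.length"));  // undefined: length is not enumerable
console.log(evaluateKeyPath(record, "title.length")); // undefined: "notes" is a primitive string
```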


[IndexedDB] Any particular reason built-in properties are not indexable?

2011-03-21 Thread Pablo Castro
The spec today requires that properties key paths point at need to be 
enumerated (see 3.1.2 "Object Store"). Any particular reason for that? It would 
be reasonable to allow an index on say the "length" property of a string. 
Perhaps we're opening the door for too much, so I wanted to double check so we 
make an explicit call one way or the other. Thoughts?


RE: [IndexedDB] Spec changes for international language support

2011-03-18 Thread Pablo Castro

From: Jonas Sicking [] 
Sent: Friday, March 18, 2011 1:57 PM

>> >>> However there is another problem to consider here. Can switching
>> >>> collation on an objectStore or a unique index affect its validity?
>> >>> I.e. if you switch from a case sensitive to a case insensitive
>> >>> collation, does that mean that if you have two entries with the
>> >>> primary keys "Sweden" and "sweden" they collide and thus the change of
>> >>> collation must result in an error (or aborted transaction)?
>> >>>
>> >>> I do seem to recall that there are ways to do at least case
>> >>> sensitivity such that you generally don't take case into account when
>> >>> sorting, unless two entries are exactly the same, in which case you do
>> >>> look at casing to differentiate them. However I don't really know a
>> >>> whole lot about this and so defer to people that know
>> >>> internationalization better.
>> >
>> > This is a good point. It makes me lean toward not allowing changing the 
>> > collation of an index or store. That means we could just have an optional 
>> > parameter (in the generic parameter object thingy we have now) on 
>> > createObjectStore and createIndex that indicates the collation name. It 
>> > seems minimally disruptive, it doesn't tax people that don't care about 
>> > it, and since there is no setCollation we don't have the problem of not 
>> > being able to re-index the data.
>> So there is no way to specify things such that the collation doesn't
>> affect unique-ness? If so, I tend to agree.

The problem is that different collations will consider different things unique. 
This is bound to be variable across languages and such, so I'm not sure we want 
to be in the business of fine-tuning this. It seems that being a bit more 
restrictive could result in a more robust result overall. If someone really 
needs to change the collation they can copy the table manually...not great, but 
if we think it's a corner case it's probably fine.
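
A small sketch of the uniqueness problem, using the "Sweden"/"sweden" example from the quoted text. Intl.Collator, a later addition to JavaScript, is used here only as a stand-in for a non-binary collation; it is not part of any IndexedDB proposal in this thread, and findCollisions is a hypothetical helper.

```javascript
// Hypothetical helper: find pairs of keys that a given comparator treats as equal.
function findCollisions(keys, compare) {
  const sorted = [...keys].sort(compare);
  const collisions = [];
  for (let i = 1; i < sorted.length; i++) {
    if (compare(sorted[i - 1], sorted[i]) === 0) {
      collisions.push([sorted[i - 1], sorted[i]]);
    }
  }
  return collisions;
}

// Binary ordering: compares UTF-16 code units, so case differences matter.
const binary = (a, b) => (a < b ? -1 : a > b ? 1 : 0);
// Sensitivity "base" ignores case and accent differences.
const caseInsensitive = new Intl.Collator("en", { sensitivity: "base" }).compare;

const keys = ["Sweden", "sweden", "Norway"];
console.log(findCollisions(keys, binary).length);          // 0
console.log(findCollisions(keys, caseInsensitive).length); // 1
```

Keys that are distinct under binary ordering collide under the case-insensitive comparator, so a unique index valid under one collation can become invalid under another.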

>> >>> > Another piece of feedback I heard consistently as I discussed this 
>> >>> > with various folks at Microsoft is the need to be able to pick up what 
>> >>> > the UA would consider the collation that's most appropriate for the 
>> >>> > user environment (derived from settings, page language or whatever). 
>> >>> > We could support this by introducing a special value that  you can 
>> >>> > pass to setCollation that indicates "pick whatever is the right for 
>> >>> > the environment's language right now". Given that there is no other 
>> >>> > way for people to discover the user preference on this, I think this 
>> >>> > is pretty important.
>> >>> I would be fine with this as long as it's a explicit opt-in. There is
>> >>> definitely a risk that people will do this and then only do testing in
>> >>> one language, but it seems to me like a useful use case to support,
>> >>> and I don't see a way of supporting this while completely avoiding the
>> >>> risk of internationalization bugs.
>> >
>> > I agree, it should be opt-in. I still assume we'll default to binary 
>> > collation (same if you specify the collation value as null). I was reading 
>> > the BCP 47 [1] and in section 4.1 "Choice of Language Tag" the item #7 
>> > seems to describe what we're looking for. The value "i-default" seems to 
>> > match our needs close enough, so callers could use that value. 
>> > Discoverability is not great, but we avoid having to specify something 
>> > new, and arguably they'll need to read somewhere that this argument is a 
>> > BCP47-compatible value, and we could put a comment about "i-default" right 
>> > there.
>> Sounds good to me. Though you seem to have forgotten to include the
>> [1] reference.

Oops, here it goes:

RE: [IndexedDB] Spec changes for international language support

2011-03-18 Thread Pablo Castro

From: [] On 
Behalf Of Keean Schupke
Sent: Friday, March 18, 2011 1:53 AM

>> See my proposal in another thread. The basic idea is to copy BDB. Have a 
>> primary index that is based on an integer, something primitive and fast. 
>> Allow secondary indexes which use a callback to generate a binary index key. 
>> IDB shifts the complexity out into a library. Common use cases can be 
>> provided (a hash of all fields in the object, internationalised 
>> bidirectional lexicographic etc...), but the user is free to write their own 
>> for less usual cases (for example indexing by the last word in a name string 
>> to order by surname).

I agree with Jeremy's comments on the other thread for this. Having the 
callback mechanism definitely sounds interesting but there are a ton of common 
cases that we can solve by just taking a language identifier, I'm not sure we 
want to make people work hard to get something that's already supported in most 
systems. The idea of having a callback to compute the index value feels 
incremental to this, so we could take it on later without disrupting the 
explicit international collation stuff.

>> On 18 March 2011 02:19, Jonas Sicking  wrote:
>> 2011/3/17 Pablo Castro :
>> >
>> > From: Jonas Sicking []
>> > Sent: Tuesday, March 08, 2011 1:11 PM
>> >
>> >>> All in all, is there anything preventing adding the API Pablo suggests
>> >>> in this thread to the IndexedDB spec drafts?
>> >
>> > I wanted to propose a couple of specific tweaks to the initial proposal 
>> > and then unless I hear pushback start editing this into the spec.
>> >
>> > From reading the details on this thread I'm starting to realize that 
>> > per-database collations won't do it. What did it for me was the example 
>> > that has a fuzzier matching mode (case/accent insensitive). This is 
>> > exactly the kind of index I would want to sort people's names in my 
>> > address book, but most likely not the index I'll want to use for my 
>> > primary key.
>> >
>> > Refactoring the API to accommodate this would mean moving the 
>> > setCollation() method and the collation property to the object store and 
>> > index objects. If we were willing to live without the ability to change 
>> > them we could take collation as one of the optional parameters to 
>> > createObjectStore()/createIndex() and reduce a bit of surface area...
>> Unfortunately I think you bring up good use cases for
>> per-objectStore/index collations. It's definitely tempting to just add
>> it as a optional parameter to createObjectStore/createIndex. The
>> downside is obviously pushing more complexity onto web developers.
>> Complexity which will be duplicated across sites.
>> However there is another problem to consider here. Can switching
>> collation on an objectStore or a unique index affect its validity?
>> I.e. if you switch from a case sensitive to a case insensitive
>> collation, does that mean that if you have two entries with the
>> primary keys "Sweden" and "sweden" they collide and thus the change of
>> collation must result in an error (or aborted transaction)?
>> I do seem to recall that there are ways to do at least case
>> sensitivity such that you generally don't take case into account when
>> sorting, unless two entries are exactly the same, in which case you do
>> look at casing to differentiate them. However I don't really know a
>> whole lot about this and so defer to people that know
>> internationalization better.

This is a good point. It makes me lean toward not allowing changing the 
collation of an index or store. That means we could just have an optional 
parameter (in the generic parameter object thingy we have now) on 
createObjectStore and createIndex that indicates the collation name. It seems 
minimally disruptive, it doesn't tax people that don't care about it, and since 
there is no setCollation we don't have the problem of not being able to 
re-index the data.
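
To make the "Sweden"/"sweden" collision concrete, here is a minimal sketch. It assumes the ECMAScript Intl.Collator API, which is not part of IndexedDB or of this proposal, and findCollisions is a hypothetical helper name used only for illustration.

```javascript
// Hypothetical helper: report keys that are distinct strings but compare
// equal under the given collator (and so would collide in a unique index).
function findCollisions(keys, collator) {
  const sorted = [...keys].sort(collator.compare);
  const collisions = [];
  for (let i = 1; i < sorted.length; i++) {
    // After sorting, keys that compare equal sit next to each other.
    if (collator.compare(sorted[i - 1], sorted[i]) === 0) {
      collisions.push([sorted[i - 1], sorted[i]]);
    }
  }
  return collisions;
}

// sensitivity "variant": case and accent differences both count.
const caseSensitive = new Intl.Collator("en", { sensitivity: "variant" });
// sensitivity "accent": accents count, case differences are ignored.
const caseInsensitive = new Intl.Collator("en", { sensitivity: "accent" });

const primaryKeys = ["Sweden", "sweden", "Norway"];
const strictCollisions = findCollisions(primaryKeys, caseSensitive);
const looseCollisions = findCollisions(primaryKeys, caseInsensitive);

console.log(strictCollisions.length); // no collisions under the strict collation
console.log(looseCollisions);         // the Sweden/sweden pair collides
```

If the collation is fixed at creation time, a check like this never has to run against existing data, which is the attraction of dropping setCollation.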

>> > Another piece of feedback I heard consistently as I discussed this with 
>> > various folks at Microsoft is the need to be able to pick up what the UA 
>> > would consider the collation that's most appropriate for the user 
>> > environment (derived from settings, page language or whatever). We could 
>> > support this by introducing a special value that you can pass to 
>> > setCollation that indicates "pick whatever is

RE: [IndexedDB] Spec changes for international language support

2011-03-17 Thread Pablo Castro

From: Jonas Sicking [] 
Sent: Tuesday, March 08, 2011 1:11 PM

>> All in all, is there anything preventing adding the API Pablo suggests
>> in this thread to the IndexedDB spec drafts?

I wanted to propose a couple of specific tweaks to the initial proposal and 
then unless I hear pushback start editing this into the spec.

From reading the details on this thread I'm starting to realize that 
per-database collations won't do it. What did it for me was the example that 
has a fuzzier matching mode (case/accent insensitive). This is exactly the kind 
of index I would want to sort people's names in my address book, but most 
likely not the index I'll want to use for my primary key. 

Refactoring the API to accommodate this would mean moving the 
setCollation() method and the collation property to the object store and index 
objects. If we were willing to live without the ability to change them we could 
take collation as one of the optional parameters to 
createObjectStore()/createIndex() and reduce a bit of surface area...I don't 
have a strong preference there. In any case both would use BCP47 names as 
discussed in this thread (as Jonas pointed out, implementations can also do 
their thing as long as they don't interfere with BCP47).

Another piece of feedback I heard consistently as I discussed this with various 
folks at Microsoft is the need to be able to pick up what the UA would consider 
the collation that's most appropriate for the user environment (derived from 
settings, page language or whatever). We could support this by introducing a 
special value that you can pass to setCollation that indicates "pick whatever 
is right for the environment's language right now". Given that there is no 
other way for people to discover the user preference on this, I think this is 
pretty important.
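
As a rough illustration of what "pick whatever is right for the environment" could look like, the sketch below uses the ECMAScript Intl.Collator API as a stand-in. This is an assumption for illustration only: Intl is a separate API, and nothing here is a proposed IndexedDB method.

```javascript
// Constructing a collator without a locale asks the implementation
// for its default, which is derived from the user environment.
const defaultCollator = new Intl.Collator();

// resolvedOptions() reports which locale and collation were actually chosen,
// so a page could record the identifier alongside the data it sorts.
const resolved = defaultCollator.resolvedOptions();
console.log(resolved.locale, resolved.collation);
```

Recording the resolved identifier matters because the environment default can differ between users and over time, while stored index order has to stay stable.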


RE: Indexed Database API

2011-03-17 Thread Pablo Castro

From: [] On 
Behalf Of Jeremy Orlow
Sent: Tuesday, March 15, 2011 3:08 PM

>> Filed:

I'm not sure if this is a lot more valuable than just creating an index over 
whatever index key you want plus the primary key, and then seeking to the 
compound key of the last row in the previous page to resume scanning the next 
page of records. No strong pushback, just not sure this is worth the extra 
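
A minimal sketch of that paging approach, using an in-memory sorted array in place of a real index. The names compareCompound and nextPage, and the sample data, are hypothetical.

```javascript
// Compare two compound keys of the form [indexKey, primaryKey],
// component by component.
function compareCompound(a, b) {
  for (let i = 0; i < a.length; i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return 0;
}

// Return up to pageSize entries strictly after lastKey (or from the start
// when lastKey is null). `entries` stands in for an index that is already
// sorted by [indexKey, primaryKey].
function nextPage(entries, lastKey, pageSize) {
  const start = lastKey === null
    ? 0
    : entries.findIndex((e) => compareCompound(e.key, lastKey) > 0);
  return start === -1 ? [] : entries.slice(start, start + pageSize);
}

const byNameThenId = [
  { key: ["Orlow", 1] },
  { key: ["Orlow", 4] },
  { key: ["Sicking", 2] },
  { key: ["Sicking", 5] },
  { key: ["Turner", 6] },
];

const page1 = nextPage(byNameThenId, null, 2);
const page2 = nextPage(byNameThenId, page1[page1.length - 1].key, 2);
const page3 = nextPage(byNameThenId, page2[page2.length - 1].key, 2);
console.log(page2.map((e) => e.key));
```

Including the primary key in the compound key is what makes the resume point unambiguous when several records share the same index key.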


RE: [IndexedDB] Compound and multiple keys

2011-03-08 Thread Pablo Castro

From: [] On 
Behalf Of Keean Schupke
Sent: Tuesday, March 08, 2011 3:03 PM

>> No objections here.
>> Keean.
>> On 8 March 2011 21:14, Jonas Sicking  wrote:
>> On Mon, Mar 7, 2011 at 10:43 PM, Jeremy Orlow  wrote:
>> > On Fri, Jan 21, 2011 at 1:41 AM, Jeremy Orlow  wrote:
>> >>
>> >> On Thu, Jan 20, 2011 at 6:29 PM, Tab Atkins Jr. 
>> >> wrote:
>> >>>
>> >>> On Thu, Jan 20, 2011 at 10:12 AM, Keean Schupke  wrote:
>> >>> > Compound primary keys are commonly used afaik.
>> >>>
>> >>> Indeed.  It's one of the common themes in the debate between natural
>> >>> and synthetic keys.
>> >>
>> >> Fair enough.
>> >> Should we allow explicit compound keys?  I.e. myOS.put({...}, ['first
>> >> name', 'last name'])?  I feel pretty strongly that if we do, we should
>> >> require this be specified up-front when creating the objectStore.  I.e.
>> >> add some additional parameter to the optional options object.  Otherwise,
>> >> we'll force implementations to handle variable compound keys for just this
>> >> one case, which seems kind of silly.
>> >> The other option is to just disallow them.
>> >
>> > After thinking about it a bunch and talking to others, I'm actually leaning
>> > towards both option A and B.  Although this will be a little harder for
>> > implementors, it seems like there are solid reasons why some users would
>> > want to use A and solid reasons why others would want to use B.
>> > Any objections to us going that route?
>> Not from me. If I don't hear objections I'll write up a spec draft and
>> attach it here before committing to the spec.

Option A is pretty well understood, I like that one.

For option B, at some point we had a debate on whether when indexing an array 
value we should consider it a single key value or we should unfold it into 
multiple index records. The first option makes it very similar to A in that an 
array is just a composite value (it is quite a bit more painful to 
implement...), the second option is interesting in that it allows for new 
scenarios such as objects with an array for tags, where you want to look up by 
tag (even after doing options A and B as currently defined, in order to support 
multiple tags you'd need a second store that keeps the tags + key for the 
objects you want to tag). Is there any interest in that scenario?
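
The difference between the two interpretations can be sketched with plain arrays standing in for index records. All function names and the sample records below are hypothetical.

```javascript
// Interpretation 1: an array value is one composite key.
function indexAsSingleKey(records, keyPath) {
  return records.map((r) => ({ key: r.value[keyPath], primaryKey: r.primaryKey }));
}

// Interpretation 2: an array value is unfolded into one index record
// per element, all pointing at the same object.
function indexUnfolded(records, keyPath) {
  return records.flatMap((r) => {
    const v = r.value[keyPath];
    const keys = Array.isArray(v) ? v : [v];
    return keys.map((key) => ({ key, primaryKey: r.primaryKey }));
  });
}

// Exact-match lookup returning the primary keys of matching objects.
function lookup(indexRecords, key) {
  return indexRecords.filter((e) => e.key === key).map((e) => e.primaryKey);
}

const store = [
  { primaryKey: 1, value: { title: "Collations", tags: ["db", "web"] } },
  { primaryKey: 2, value: { title: "Key ranges", tags: ["web"] } },
];

const single = indexAsSingleKey(store, "tags");
const unfolded = indexUnfolded(store, "tags");
console.log(lookup(unfolded, "web")); // both objects are reachable by tag
console.log(lookup(single, "web"));   // nothing: the keys are whole arrays
```

With the unfolded form the tag lookup needs no second store, which is the scenario described above.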


RE: [IndexedDB] Spec changes for international language support

2011-02-23 Thread Pablo Castro

From: [] On Behalf Of Jungshik Shin
Sent: Tuesday, February 22, 2011 2:08 PM

>> On Fri, Feb 18, 2011 at 2:34 AM, Bjoern Hoehrmann  wrote:
>> * Pablo Castro wrote:
>> >We discussed international language support last time at the TPAC and I
>> >said I'd propose spec text for it. Please find the patch below, the
>> >changes mirror exactly the proposal described in the bug we have for
>> >tracking this:
>> You should anticipate objections to that; collation is not a property of
>> language, for instance, for de-de you typically have dictionary sorting
>> and phone book sorting (and of course you have "de-de", "de-ch", and so
>> on, so "de" alone would be rather meaningless). So far the W3C and the
>> IETF have used resource identifiers to specify collations (see XPath 2.0
>> and RFC 4790) where the IETF allows shorthands like "i;ascii-casemap".
>> I agree that simply specifying that 'language' be used without saying what 
>> it means is not sufficient. However, your examples (German phonebook vs 
>> dictionary) can be covered with the language identifier framework laid out in 
>> BCP47 (with 'u' extension). 

Fair enough. I'll adjust this part of the write up to discuss this in terms of 
"collation identifier" or "language identifier".

>> I do understand that Microsoft uses an extension of language tags for
>> the `CultureInfo` in the .NET Framework, where, say, `de-DE_phoneb` is
>> used to refer to german phone book sorting, but BCP 47 does not allow
>> for that, 
>> There's a way to specify alternate sorting orders (e.g. German phonebook, 
>> Chinese pinyin, stroke count, radical-stroke count order, etc) under the BCP 
>> 47 framework because it has a mechanism for defining an extension and 
>> registering it. The Unicode consortium uses that mechanism to define the 'u' 
>> extension and a set of subtags that can be used with 'u'. 
>> For instance, German phonebook sorting can be identified with 
>> 'de-DE-u-co-phonebk'. See 
>> Also, see Bug 9903 comment 6 by Mark Davis for more examples. Well, I'm just 
>> copying his comment directly here:
>> To add to what Jungshik said, BCP47 defines standard extensions. The 
>> extension
>> defined by the Unicode consortium
>> ( provides for fine-grained
>> specifications of collation behavior.
>> Examples for German:
>> de-u-co-phonebk // phonebook order
>> de-u-kn-true // numeric sorting, eg Tom2 comes before Tom12
>> de-u-ks-level1 // ignore accents, case differences
>> de-u-ks-level2 // ignore case differences
>> de-u-ks-level1-kc-true // ignore accents, but not case
>> These can be combined, such as:
>> de-u-co-phonebk-kn-true-ks-level1-kc-true
>> neither could you devise a language tag to define something
>> like "i;ascii-casemap" (which simply defines A-Z = a-z).

I'm not sure how specific we want to get here. In particular, would it be 
better if we specified it all the way (including which extensions UAs need to 
support) or if we used BCP47 as the starting point and allowed UAs to support 
additional extensions as needed?
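
For a feel of how such identifiers behave, the sketch below feeds two of the tags quoted above to the ECMAScript Intl.Collator API. That API is an assumption here (it is the i18n effort mentioned in this thread, not IndexedDB). Note that Intl.Collator only reads the co, kn and kf keys from the tag; comparison strength (the ks key) is expressed through its separate sensitivity option instead.

```javascript
const names = ["Tom12", "Tom2", "Tom1"];

// Plain German collation: digits are compared character by character.
const plain = new Intl.Collator("de");
// 'u' extension with kn-true: runs of digits are compared as numbers.
const numeric = new Intl.Collator("de-u-kn-true");

const plainOrder = [...names].sort(plain.compare);
const numericOrder = [...names].sort(numeric.compare);
console.log(plainOrder);   // Tom12 sorts before Tom2
console.log(numericOrder); // Tom2 sorts before Tom12

// co-phonebk requests German phonebook order; resolvedOptions() shows
// whether the implementation actually honored the request.
const phonebook = new Intl.Collator("de-u-co-phonebk");
console.log(phonebook.resolvedOptions().collation);
```

The resolvedOptions() call is one way a UA could report which parts of an identifier it supports, which bears on the question of how much the spec should mandate.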

>> I would expect that if browsers offer collations, there would be an in-
>> terface for that so you can use them in other places, as such it might
>> be wiser to accept something other than a language identifier string. 
>> There's an on-going effort to expose a 'rich' set of I18N API to client-side 
>> development using Javascript ( 
>> : The API used to be 
>> much more extensive than now, but has been scaled down significantly to get 
>> more browsers on board in its 1st iteration). There we're likely to use BCP 
>> 47 with 'u' extension (see above). So, I think it'd be better if IndexedDB 
>> matches what ECMAScript plans to do. 

This is interesting. Do you know how far along this is?

>> I also note that collation often involves equivalence testing, but it
>> is not clear from your proposal whether that is the case here. It might
>> also be a good idea to clearly spell out interoperability expectations;
>> if two implementations support some collation, will they behave the same
>> for any and all inputs as far as collation is concerned, or should one
>> be prepared for slight differences among implementations?

I think it's more practical to assume that users should be prepared for slight 
differences among implementations.


[IndexedDB] Spec changes for international language support

2011-02-17 Thread Pablo Castro
We discussed international language support last time at the TPAC and I said 
I'd propose spec text for it. Please find the patch below, the changes mirror 
exactly the proposal described in the bug we have for tracking this:

btw - the bug is assigned to Nikunj right now but I think that's just because 
of an editing glitch. Nikunj please let me know if you were working on it, 
otherwise I'll just submit the changes once I hear some feedback from this 


Left file: \IndexedDB 
Right file: \IndexedDB Specs\20110217\Speclet_023_IDB_API_Asynchronous_APIs.html
copy 6
add 7
readonly attribute DOMString language

On getting, this attribute MUST return the language
that is configured in this database for string collation. If no 
collation has been
configured for a database this value is null and 
the database will
use binary collation.

copy 6
copy 6
add 24
IDBRequest setLanguage()

This method changes the language used by the database
for string collation. Note that this method must only
be called from a VERSION_CHANGE 
transaction callback.

Changing the language in a database that already contains data 
typically involves reading and 
re-writing the entire database and thus can be a time consuming 

optional DOMString language
The language to be used in the database specified as a 
language identifier as
described in [[!BCP47]].

This method was not called from a 
VERSION_CHANGE transaction callback.
The language parameter contained a string that was not 
a valid language identifier or was a language
identifier not supported by the system.

copy 6

Left file: \IndexedDB 
Right file: \IndexedDB Specs\20110217\Speclet_022_IDB_API_Synchronous_APIs.html
copy 6
add 7
readonly attribute DOMString language

On getting, this attribute MUST return the language
that is configured in this database for string collation. If no 
collation has been
configured for a database this value is null and 
the database will
use binary collation.

copy 6
copy 6
add 24
void setLanguage()

This method changes the language used by the database
for string collation. Note that this method must only
be called from a VERSION_CHANGE 
transaction callback.

Changing the language in a database that already contains data 
typically involves reading and 
re-writing the entire database and thus can be a time consuming 

optional DOMString language
The language to be used in the database specified as a 
language identifier as
described in [[!BCP47]].

This method was not called from a 
VERSION_CHANGE transaction callback.
The language parameter contained a string that was not 
a valid language identifier or was a language
identifier not supported by the system.

copy 6

Left file: \IndexedDB 
Right file: \IndexedDB Specs\20110217\Speclet_020_IDB_API_Constructs.html
copy 6
add 4
Every database also has a language that indicates the 
language that should be used for collating strings when comparing 
copy 6
copy 6
delete 1
add 2
value with no need to separate them by type. When comparing a 
DOMString with another DOMString, the database
language should be used to determine the specific collation 
rules to be used.
copy 6

RE: [IndexedDB] More questions about IDBRequests always firing (WAS: Reason for aborting transactions)

2011-02-17 Thread Pablo Castro

From: [] On 
Behalf Of Jeremy Orlow
Sent: Thursday, February 17, 2011 11:51 AM

>> On Thu, Feb 17, 2011 at 11:12 AM, Jonas Sicking  wrote:
>> On Thu, Feb 17, 2011 at 11:02 AM, ben turner  wrote:
>> >>> Also, what should we do when you enqueue a setVersion transaction and 
>> >>> then
>> >>> close the database handle?  Maybe an ABORT_ERR there too?
>> >>
>> >> Yeah, that'd make sense to me. Just like if you enqueue any other
>> >> transaction and then close the db handle.
>> >
>> > We don't abort transactions that are already in progress when you call
>> > db.close()... We just set a flag and prevent further transactions from
>> > being created.
>> Doh! Of course.
>> If the setVersion transaction has started then we should definitely
>> allow it finish, just like all other transactions. I don't have a
>> strong opinion on if we should let the setVersion transaction start if
>> it hasn't yet. Seems most consistent to let it, but if there's a
>> strong reason not to I could be convinced.
>> What if you have two database connections open and both do a setVersion 
>> transaction and one calls .close (to yield to the other)?  Neither can start 
>> until one or the other actually is closed.  If a database is closed (not 
>> just close pending) then I think we need to abort any blocked setVersion 
>> calls.  If one is already running, it should certainly be allowed to finish 
>> before we close the database.

This sounds reasonable to me (special case and abort the transaction only for 
blocked setVersion transactions). We should capture it explicitly in the spec, 
it's the kind of little detail that's easy to forget. 
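
A toy model of that special case, to show the intended outcome rather than the spec algorithm. The ToyDatabase class and its state names are hypothetical, and close() is treated as immediate instead of "close pending".

```javascript
// Toy model: a setVersion request stays "blocked" until every other
// connection to the database has closed.
class ToyDatabase {
  constructor() {
    this.connections = new Set();
  }
  open() {
    const conn = { pending: null };
    this.connections.add(conn);
    return conn;
  }
  setVersion(conn, version) {
    conn.pending = { version, state: "blocked" };
    this.tryStart();
    return conn.pending;
  }
  close(conn) {
    this.connections.delete(conn);
    // Special case: abort only a setVersion that is still blocked.
    // One that is already running is left alone so it can finish.
    if (conn.pending && conn.pending.state === "blocked") {
      conn.pending.state = "aborted"; // would surface as ABORT_ERR
    }
    this.tryStart();
  }
  tryStart() {
    // A blocked request may start once its connection is the only one left.
    for (const conn of this.connections) {
      if (conn.pending && conn.pending.state === "blocked" && this.connections.size === 1) {
        conn.pending.state = "running";
      }
    }
  }
}

const toyDb = new ToyDatabase();
const connA = toyDb.open();
const connB = toyDb.open();
const requestA = toyDb.setVersion(connA, "2");
const requestB = toyDb.setVersion(connB, "2");
toyDb.close(connA); // A yields to B
console.log(requestA.state, requestB.state);
```

Without the abort in close(), both requests in this scenario would wait on each other indefinitely.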


RE: [Bug 11351] New: [IndexedDB] Should we have a maximum key size (or something like that)?

2011-02-14 Thread Pablo Castro
(sorry for my random out-of-timing previous email on this thread. please see 
below for an actually up to date reply)

-Original Message-
From: Jonas Sicking [] 
Sent: Monday, February 07, 2011 3:31 PM

On Mon, Feb 7, 2011 at 3:07 PM, Jeremy Orlow  wrote:
> On Mon, Feb 7, 2011 at 2:49 PM, Jonas Sicking  wrote:
>> On Sun, Feb 6, 2011 at 11:41 PM, Jeremy Orlow  wrote:
>> > On Sun, Feb 6, 2011 at 11:38 PM, Jonas Sicking  wrote:
>> >>
>> >> On Sun, Feb 6, 2011 at 2:31 PM, Jeremy Orlow 
>> >> wrote:
>> >> > On Sun, Feb 6, 2011 at 2:03 PM, Shawn Wilsher 
>> >> > wrote:
>> >> >>
>> >> >> On 2/6/2011 12:42 PM, Jeremy Orlow wrote:
>> >> >>>
>> >> >>> My current thinking is that we should have some relatively large
>> >> >>> limit, maybe on the order of 64k?  It seems like it'd be very
>> >> >>> difficult to hit such a limit with any sort of legitimate use case,
>> >> >>> and the chances of some subtle data-dependent error would be much
>> >> >>> less.  But a 1GB key is just not going to work well in any
>> >> >>> implementation (if it doesn't simply oom the process!).  So despite
>> >> >>> what I said earlier, I guess I think we should have some limit...but
>> >> >>> keep it an order of magnitude or two larger than what we expect any
>> >> >>> legitimate usage to hit just to keep the system as flexible as
>> >> >>> possible.
>> >> >>>
>> >> >>> Does that sound reasonable to people?
>> >> >>
>> >> >> Are we thinking about making this a MUST requirement, or a SHOULD?
>> >> >> I'm hesitant to spec an exact size as a MUST given how technology has
>> >> >> a way of changing in unexpected ways that makes old constraints
>> >> >> obsolete.  But then, I may just be overly concerned about this too.
>> >> >
>> >> > If we put a limit, it'd be a MUST for sure.  Otherwise people would
>> >> > develop against one of the implementations that don't place a limit and
>> >> > then their app would break on the others.
>> >> > The reason that I suggested 64K is that it seems outrageously big for
>> >> > the data types that we're looking at.  But it's too small to do much
>> >> > with base64 encoding binary blobs into it or anything else like that
>> >> > that I could see becoming rather large.  So it seems like a limit
>> >> > that'd avoid major abuses (where someone is probably approaching the
>> >> > problem wrong) but would not come close to limiting any practical use I
>> >> > can imagine.
>> >> > With our architecture in Chrome, we will probably need to have some
>> >> > limit.  We haven't decided what that is yet, but since I remember
>> >> > others saying similar things when we talked about this at TPAC, it
>> >> > seems like it might be best to standardize it--even though it does feel
>> >> > a bit dirty.
>> >>
>> >> One problem with putting a limit is that it basically forces
>> >> implementations to use a specific encoding, or pay a hefty price. For
>> >> example if we choose a 64K limit, is that of UTF8 data or of UTF16
>> >> data? If it is of UTF8 data, and the implementation uses something
>> >> else to store the data, you risk having to convert the data just to
>> >> measure the size. Possibly this would be different if we measured size
>> >> using UTF16 as javascript more or less enforces that the source string
>> >> is UTF16 which means that you can measure utf16 size on the cheap,
>> >> even if the stored data uses a different format.
>> >
>> > That's a very good point.  What's your suggestion then?  Spec unlimited
>> > storage and have non-normative text saying that most implementations will
>> > likely have some limit?  Maybe we can at least spec a minimum limit in
>> > terms of a particular character encoding?  (Implementations could
>> > translate this into the worst case size for their own native encoding and
>> > then ensure their limit is higher.)
>> I'm fine with relying on UTF16 encoding size and specifying a 64K
>> limit. Like Shawn points out, this API is fairly geared towards
>> JavaScript anyway (and I personally don't think that's a bad thing).
>> One thing that I just thought of is that even if implementations use
>> other encodings, you can in the vast majority of cases do a worst-case
>> estimate and easily see that the key that is used is below 64K.
>> That said, does having a 64K limit really help anyone? In SQLite we
>> can easily store vastly more than that, enough that we don't have to
>> specify a limit. And my understanding is that in the Microsoft
>> implementation, the limits for what they can store without resorting
>> to various tricks, is much lower. So since that implementation will
>> have to implement special handling of long keys anyway, is there a
>> difference between say

RE: [Bug 11351] New: [IndexedDB] Should we have a maximum key size (or something like that)?

2011-02-14 Thread Pablo Castro

>> From: [] On Behalf Of Jeremy Orlow
>> Sent: Sunday, February 06, 2011 12:43 PM
>> On Tue, Dec 14, 2010 at 4:26 PM, Pablo Castro  
>> wrote:
>> From: [] On Behalf Of Jeremy Orlow
>> Sent: Tuesday, December 14, 2010 4:23 PM
>> >> On Wed, Dec 15, 2010 at 12:19 AM, Pablo Castro 
>> >>  wrote:
>> >>
>> >> From: 
>> >> [] On Behalf Of Jonas Sicking
>> >> Sent: Friday, December 10, 2010 1:42 PM
>> >>
>> >> >> On Fri, Dec 10, 2010 at 7:32 AM, Jeremy Orlow  
>> >> >> wrote:
>> >> >> > Any more thoughts on this?
>> >> >>
>> >> >> I don't feel strongly one way or another. Implementation wise I don't
>> >> >> really understand why implementations couldn't use keys of unlimited
>> >> >> size. I wouldn't imagine implementations would want to use fixed-size
>> >> >> allocations for every key anyway, right (which would be a strong
>> >> >> reason to keep maximum size down).
>> >> I don't have a very strong opinion either. I don't quite agree with the 
>> >> guideline of "having something working slowly is better than not working 
>> >> at all"; having something not work at all sometimes may help 
>> >> developers hit a wall and think differently about their approach for a 
>> >> given problem. That said, if folks think this is an instance where we're 
>> >> better off not having a limit I'm fine with it.
>> >>
>> >> My only concern is that the developer might not hit this wall, but then 
>> >> some user (doing things the developer didn't fully anticipate) could hit 
>> >> that wall.  I can definitely see both sides of the argument though.  And 
>> >> elsewhere we've headed more in the direction of forcing the developer to 
>> >> think about performance, but this case seems a bit more non-deterministic 
>> >> than any of those.
>> Yeah, that's a good point for this case, avoiding data-dependent errors is 
>> probably worth the perf hit.
>> My current thinking is that we should have some relatively large 
>> limit, maybe on the order of 64k?  It seems like it'd be very difficult to 
>> hit such a limit with any sort of legitimate use case, and the chances of 
>> some subtle data-dependent error would be much less.  But a 1GB key is just 
>> not going to work well in any implementation (if it doesn't simply oom the 
>> process!).  So despite what I said earlier, I guess I think we should have 
>> some limit...but keep it an order of magnitude or two larger than what we 
>> expect any legitimate usage to hit just to keep the system as flexible as 
>> possible.
>> Does that sound reasonable to people?

I thought we were trying to avoid data-dependent errors and thus shooting for 
having no limit (which may translate into having very large limits in actual 
implementations but not the kind of thing you'd typically hit).  

Specifying an exact size may be a bit weird...I guess an alternative could be 
to spec the minimum size UAs need to support. A related problem is what 
units this is specified in; if it's bytes, then developers need to 
make assumptions about how strings are stored or something.
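
The units question can be made concrete with a short sketch. The 64K figure comes from this thread, but MAX_KEY_BYTES and the helper names are hypothetical, and no limit has been agreed.

```javascript
// Size of a string key if the limit is counted in UTF-16 bytes.
// JavaScript strings expose their UTF-16 code unit count directly,
// so this needs no conversion.
function utf16ByteLength(key) {
  return key.length * 2;
}

// Size of the same key if the limit is counted in UTF-8 bytes.
// This requires actually encoding the string.
function utf8ByteLength(key) {
  return new TextEncoder().encode(key).length;
}

// Cheap upper bound on the UTF-8 size: one UTF-16 code unit never
// needs more than 3 UTF-8 bytes.
function utf8WorstCase(key) {
  return key.length * 3;
}

const MAX_KEY_BYTES = 64 * 1024; // hypothetical limit, counted in UTF-16 bytes

function fitsLimit(key) {
  return utf16ByteLength(key) <= MAX_KEY_BYTES;
}

for (const key of ["hello", "héllo", "😀"]) {
  console.log(key, utf16ByteLength(key), utf8ByteLength(key), utf8WorstCase(key));
}
```

The same key has different sizes under the two encodings, so a limit stated in plain "bytes" would behave differently across implementations unless the encoding is named.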


RE: [IndexedDB] Reason for aborting transactions

2011-02-09 Thread Pablo Castro

From: [] On 
Behalf Of Jeremy Orlow
Sent: Wednesday, February 09, 2011 6:47 PM

>> On Wed, Feb 9, 2011 at 5:54 PM, Jonas Sicking  wrote:
>> On Wed, Feb 9, 2011 at 5:43 PM, Jeremy Orlow  wrote:
>> > On Wed, Feb 9, 2011 at 5:37 PM, ben turner  wrote:
>> >>
>> >> > Normal exceptions have error messages that are not consistent across
>> >> > implementations and are not localized.  What's the difference?
>> >>
>> >> These messages aren't part of any exception though, it's just some
>> >> property on a transaction object. (None of our DOM exceptions, IDB or
>> >> otherwise, have message properties btw, they're only converted to some
>> >> message if they make it to the error console).
>> >>
>> >> > For stuff like internal errors, they seem especially important.
>> >>
>> >> You're thinking of having multiple messages for the INTERAL_ERROR_ABORT
>> >> code?
>> >
>> > I think that'd be ideal, yes.  Since internal errors will be UA specific,
>> > string matching wouldn't be so bad there.
>> > If no one likes this idea, I'm happy hiding away the message in some
>> > webkitAbortMessage attribute so it's super clear it's just us who 
>> > implements
>> > this.  (Speaking of which, maybe you guys should do that with getAll.)
>> We'll definitely put getAll under a vendor prefix once we drop the
>> "front door" prefix on .indexeddb.
>> I'm with Ben here. I'd prefer to hide the message away under a vendor
>> prefix (either now or once you drop the front door one) for now to
>> gather feedback on how it'll be used.

I'm not sure about this. As I was catching up on the thread, I understood this 
more as a debugging helper feature. In the end if we didn't have this you could 
just have a database-wide error handler and stash errors as they come in a 
global array or something, and that's okay for diagnostics. If we want to make 
it easier to just look at the transaction and see what happened, we may as well 
let UAs include a descriptive string so you can really find out on the spot. I 
don't have a strong opinion about excluding (or vendor-prefixing) the property, 
but it seems it would come in handy.
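
The "stash errors in a global array" fallback could look roughly like this. A bare EventTarget stands in for an open database object so the sketch runs outside a browser; recordErrors and errorLog are hypothetical names.

```javascript
// Global diagnostics log, filled by a single database-wide handler.
const errorLog = [];

// Attach one listener that records every error event reaching `target`.
// With a real database object, error events fired at requests bubble up
// through the transaction to the database, so one handler sees them all.
function recordErrors(target, log) {
  target.addEventListener("error", (event) => {
    log.push({ when: Date.now(), type: event.type, source: event.target });
  });
}

// Stand-in for an open database connection.
const fakeDb = new EventTarget();
recordErrors(fakeDb, errorLog);

// Simulate a failure reaching the database-level handler.
fakeDb.dispatchEvent(new Event("error"));
console.log(errorLog.length);
```

This works for diagnostics, but it only tells you that something failed, not why a particular transaction aborted, which is where a descriptive string on the transaction would help.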


[IndexedDB] KeyRange factory methods

2010-12-16 Thread Pablo Castro
I was going to file a bug on this but wanted to make sure I'm not missing 
something first.

All the factory methods for ranges (e.g. bound, lowerBound, etc.) are in the 
IDBKeyRangeConstructors interface now, but I don't see the interface referenced 
anywhere. Who implements this interface, the Window object, IDBFactory[Sync], 
something else?


RE: [IndexedDB] Do we need a timeout for VERSION_CHANGE?

2010-12-16 Thread Pablo Castro

From: [] On Behalf Of Jeremy Orlow
Sent: Thursday, December 16, 2010 2:35 AM

>>In another thread (in the last couple days) we actually decided to remove 
>>timeouts from normal transactions since they can be implemented as a 
>>But I agree that we need a way to abort setVersion transactions before 
>>getting the callback (so that we implement timeouts for them as well).  
>>Unfortunately, I don't immediately have any good ideas on how to do that 

Sorry, forgot to qualify it, context == sync api. I assume that the sync 
versions of the API will truly block, so setTimeout won't do as code won't just 
reenter into the timeout callback while blocked on a sync IndexedDB call, are 
we all on the same page on that? If that's the case, then I don't think we can 
remove the timeout parameter from the sync versions of transaction() and 
setVersion(). Does that sound reasonable? I'll add them for now, we can adjust 
if somebody comes up with a better approach.

As for setVersion in async...that's actually a problem as well now that I think 
about it because you don't have access to the (version) transaction object 
until it actually was able to start. One option besides having a timeout 
parameter in the method would be to have an abort() method in 


[IndexedDB] Do we need a timeout for VERSION_CHANGE?

2010-12-15 Thread Pablo Castro
Regular transactions take a timeout parameter when started, which ensures that 
we eventually make progress one way or the other if there's an un-cooperating 
script that won't let go of an object store or something like that.

I'm not sure if we discussed this before, it seems that we need to add a 
similar thing for setVersion(), and it's basically a way of starting a 

I was thinking we could have an optional timeout argument in setVersion with a 
UA-specific default. In the async case we would fire the onerror event and in 
the sync case just throw, both with TIMEOUT_ERR.


RE: [Bug 11553] New: Ensure indexedDBSync is on the right worker interface

2010-12-15 Thread Pablo Castro

From: [] On 
Behalf Of Jeremy Orlow
Sent: Wednesday, December 15, 2010 3:21 AM
>> I believe the instance of WorkerUtils is much like window in a page.  I.e. 
>> you put stuff on there that you want in the global scope.  Thus I'm pretty 
>> sure that WorkerUtils is the right place for both.

Yeah, I read the workers spec too quickly yesterday. You're right, WorkerUtils 
is what we need, I'll make it implement both IDBEnvironment and 


[IndexedDB] versionchange event gone?

2010-12-14 Thread Pablo Castro
Just noticed that the algorithm for updating versions refers to the 
"versionchange" event but the event is actually not defined in IDBDatabase or 
IDBDatabaseSync. Just an omission?

On a related note, I'm updating the sync API and changing the setVersion method 
so that it does all the version change notification dance synchronously and 
returns a transaction object that's the "version change" transaction. Given 
this behavior we probably don't need anything similar to the "blocked" event 
for the sync API. Any concerns?


RE: [Bug 11351] New: [IndexedDB] Should we have a maximum key size (or something like that)?

2010-12-14 Thread Pablo Castro

From: [] On Behalf Of Jeremy Orlow
Sent: Tuesday, December 14, 2010 4:23 PM

>> On Wed, Dec 15, 2010 at 12:19 AM, Pablo Castro  
>> wrote:
>> From: [] 
>> On Behalf Of Jonas Sicking
>> Sent: Friday, December 10, 2010 1:42 PM
>> >> On Fri, Dec 10, 2010 at 7:32 AM, Jeremy Orlow  wrote:
>> >> > Any more thoughts on this?
>> >>
>> >> I don't feel strongly one way or another. Implementation wise I don't
>> >> really understand why implementations couldn't use keys of unlimited
>> >> size. I wouldn't imagine implementations would want to use fixed-size
>> >> allocations for every key anyway, right (which would be a strong
>> >> reason to keep maximum size down).
>> I don't have a very strong opinion either. I don't quite agree with the 
>> guideline of "having something working slowly is better than not working at 
>> all"; having something not work at all sometimes may help developers hit 
>> a wall and think differently about their approach for a given problem. That 
>> said, if folks think this is an instance where we're better off not having a 
>> limit I'm fine with it.
>> My only concern is that the developer might not hit this wall, but then some 
>> user (doing things the developer didn't fully anticipate) could hit that 
>> wall.  I can definitely see both sides of the argument though.  And 
>> elsewhere we've headed more in the direction of forcing the developer to 
>> think about performance, but this case seems a bit more non-deterministic 
>> than any of those.
Yeah, that's a good point for this case, avoiding data-dependent errors is 
probably worth the perf hit.


RE: [Bug 11351] New: [IndexedDB] Should we have a maximum key size (or something like that)?

2010-12-14 Thread Pablo Castro

From: [] On 
Behalf Of Jonas Sicking
Sent: Friday, December 10, 2010 1:42 PM

>> On Fri, Dec 10, 2010 at 7:32 AM, Jeremy Orlow  wrote:
>> > Any more thoughts on this?
>> I don't feel strongly one way or another. Implementation wise I don't
>> really understand why implementations couldn't use keys of unlimited
>> size. I wouldn't imagine implementations would want to use fixed-size
>> allocations for every key anyway, right (which would be a strong
>> reason to keep maximum size down).

I don't have a very strong opinion either. I don't quite agree with the 
guideline of "having something working slowly is better than not working at 
all"; having something not work at all sometimes may help developers hit a 
wall and think differently about their approach for a given problem. That said, 
if folks think this is an instance where we're better off not having a limit 
I'm fine with it. 

>> Pablo, do you know why the back ends you were looking at had such
>> relatively low limits?

Mostly an implementation thing. Keys (and all other non-blob columns) typically 
need to fit in a page.  Predictable perf is also nice (no linked lists, high 
density/locality, etc.), but not as fundamental as page size.


RE: [Bug 11375] New: [IndexedDB] Error codes need to be assigned new numbers

2010-12-14 Thread Pablo Castro

From: [] On 
Behalf Of Jeremy Orlow
Sent: Friday, December 10, 2010 5:03 AM

>> I noticed that QUOTA_ERR is commented out.  I can't remember when or why and 
>> the blame history is a bit mangled.  Does anyone else?  In Chromium we 
>> currently use UNKNOWN_ERR for whenever we have issues writing stuff to disk. 
>>  We could probably tease quota related issues out into their own error.  
>> And/or we should probably create or find a good existing error for such uses.

It sounds like a good idea to keep QUOTA_ERR separated from other general 
errors that come up when writing stuff to disk.

>> Speaking of which, we use UNKNOWN_ERR for a bunch of other 
>> internal consistency issues.  Is this OK by everyone, should we use another, 
>> or should we create a new one?  (Ideally these issues will be few and far 
>> between as we make things more robust.)

That sounds reasonable to me. 

>> We also use UNKNOWN_ERR for when things are not yet implemented.  Any 
>> concerns?

I don't think it's a big deal, but are we going to have a bunch of 
unimplemented stuff across browsers? If this becomes common, I wonder if we 
should have a separate error so calling code can choose to compensate or 

>> What error code should we use for IDBCursor.update/delete when the cursor is 
>> not currently on an item (or that item has been deleted)?


>> TRANSIENT_ERR doesn't seem to be used anywhere in the spec.  Should it be 
>> removed?


>> As for the numbering: does anyone object to me just starting from 1 and 
>> going sequentially?  I.e. does anyone have a problem with them all getting 
>> new numbers, or should I keep the numbers the same when possible.  (i.e. 
>> would change number, but the ordering of those on the page would change.)

I'm fine with that.


RE: [Bug 11398] New: [IndexedDB] Methods that take multiple optional parameters should instead take an options object

2010-12-10 Thread Pablo Castro

From: [] On 
Behalf Of Jeremy Orlow
Sent: Friday, December 10, 2010 7:27 AM
>> In addition to createObjectStore, I also intend to convert the following 
>> over:
>> IDBObjectStore.createIndex
>> IDBObjectStore.openCursor
>> IDBIndex.openCursor
>> IDBIndex.openKeyCursor
>> IDBKeyRange.bound

Sounds great.

>> We did all of these two weeks ago in Chromium and have gotten some feedback. 
>>  The main downside is that typos are silently ignored by JavaScript.  We 
>> considered throwing if someone passed in an option we didn't recognize, but 
>> this would make it impossible to add more options later (which is one of the 
>> main reasons for doing this change).  I think what we might do is just log 
>> something in the console when this happens.  (Should the spec actually make 
>> a recommendation to this effect?)  Besides that, I think overall we're happy 
>> with the change.

I'm not sure what the problem is with throwing. Can't each implementation throw 
if it receives a parameter that has no meaning for it? Given that we can't know 
if future options will have substantial impact on the behavior of the function 
when they are present, it looks safer to go that route.

Is there prior art in some other webapps API that takes JavaScript objects as 
parameters? What do they do?
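
To make the "throw on unrecognized options" idea concrete, here is a minimal sketch in plain JavaScript. The helper name readOptions and the option names are hypothetical, used only for illustration; nothing here is taken from the spec or from any implementation.

```javascript
// Hypothetical helper: validates an options object against the list of
// option names this implementation understands.
function readOptions(options, known) {
  const result = {};
  for (const name of Object.keys(options || {})) {
    if (!known.includes(name)) {
      // Throwing surfaces typos such as "autoIncrment" right away,
      // instead of silently ignoring them.
      throw new TypeError("Unrecognized option: " + name);
    }
    result[name] = options[name];
  }
  return result;
}

// Hypothetical option names for a createObjectStore-style call.
const KNOWN_OPTIONS = ["keyPath", "autoIncrement"];
const accepted = readOptions({ keyPath: "id" }, KNOWN_OPTIONS);

let typoError = null;
try {
  readOptions({ keyPath: "id", autoIncrment: true }, KNOWN_OPTIONS);
} catch (e) {
  typoError = e;
}
```

The trade-off Jeremy raises still applies: with this approach an option added by a later spec revision would throw in an older implementation, so callers would have to feature-detect before passing it.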


RE: [Bug 11351] New: [IndexedDB] Should we have a maximum key size (or something like that)?

2010-11-19 Thread Pablo Castro

-Original Message-
From: [] On 
Behalf Of
Sent: Friday, November 19, 2010 4:16 AM

>> Just looking at this list, I guess I'm leaning towards _not_ limiting the
>> maximum key size and instead pushing it onto implementations to do the hard
>> work here.  If so, we should probably have some normative text about how 
>> bigger
>> keys will probably not be handled very efficiently.

I was trying to make up my mind on this, and I'm not sure this is a good idea. 
What would be the options for an implementation? Hashing keys into smaller 
values is pretty painful because of sorting requirements (we'd have to index 
the data twice, once for the key prefix that fits within limits, and a second 
one for a hash plus some sort of discriminator for collisions). Just storing a 
prefix as part of the key under the covers obviously won't work. Am I missing 
some other option?

Clearly consistency in these things is important so people don't get caught off 
guard. I wonder if we just pick a "reasonable" limit, say 1 K characters (yeah, 
trying to do something weird to avoid details of how stuff is actually stored), 
and run with it. I looked around at a few databases (from a single vendor :)), 
and they seem to all be well over this but not by orders of magnitude (2KB to 
8KB seems to be the range of upper limits for this in practice).
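
As a rough sketch of what a character-based limit could look like (the 1024 figure is just the "1 K characters" number floated above, and checkKeySize is a hypothetical name, not proposed spec text):

```javascript
// Assumed limit: the "1 K characters" figure from the message above,
// not a value taken from any spec.
const MAX_KEY_CHARS = 1024;

// Hypothetical check an implementation could run before storing a key.
function checkKeySize(key) {
  // String length counts UTF-16 code units, which avoids depending on
  // how the backend actually encodes the key on disk.
  if (typeof key === "string" && key.length > MAX_KEY_CHARS) {
    throw new RangeError("Key exceeds " + MAX_KEY_CHARS + " characters");
  }
  return key;
}
```

Counting characters rather than bytes keeps the limit the same across implementations regardless of storage encoding, which is the consistency point made above.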


RE: [Bug 11270] New: Interaction between in-line keys and key generators

2010-11-10 Thread Pablo Castro

From: Jonas Sicking [] 
Sent: Wednesday, November 10, 2010 2:08 PM

>> On Wed, Nov 10, 2010 at 1:50 PM, Tab Atkins Jr.  wrote:
>> > On Wed, Nov 10, 2010 at 1:43 PM, Pablo Castro
>> >  wrote:
>> >>
>> >> From: 
>> >> [] On Behalf Of 
>> >>
>> >> Sent: Monday, November 08, 2010 5:07 PM
>> >>
>> I'm fine with either solution here. My database experience is too weak
>> to have strong opinions on this matter.
>> What do databases usually do with columns that use autoincrement but a
>> value is still supplied? My recollection is that that is generally
>> allowed?

It does happen in practice that sometimes you need to use explicit keys. The 
typical case is when you're initializing a database with base data and you want 
to have known keys. 

As for what databases do, I'll use SQL Server as an example (for no particular 
reason :) ). In SQL Server by default if you try to insert a row with a value 
in an "identity" column you get an error and the operation is aborted; however, 
developers can issue a command (SET IDENTITY_INSERT  ON) to turn it off 
temporarily and insert rows with an explicitly provided primary key. Usually 
when you do this you have to be careful to use keys that are either way out of 
the range of keys the generator will use (or you may not be able to insert keys 
anymore) or you have to reset the next key (using an obscure DBCC CHECKIDENT 
(, RESEED, ) command). 

I don't know much about Oracle, but I believe the typical pattern is still to 
use a sequence object and set the default value for the key column to 
<sequence>.nextval, thus allowing callers to override the next value in the 
sequence by just providing one, and if necessary they may need to go and fix up 
the sequence. 

From writing the above paragraph I'm realizing one more detail we need to be 
explicit about: the fact that you do an add() with an explicit key does not 
mean the implementation will fix up the next key it'll assign. You'll still 
get the value that comes after the one generated last, and if you inserted 
that value in the store explicitly you just made the store unable to add new 
objects with generated keys until you delete it.

If that's too much fine-print then we should just disallow it. I like the 
ability to set explicit key values, but it does come with some extra care that 
both implementers and users will have to have.
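
The fine print above can be sketched with a small in-memory stand-in. SketchStore and its methods are hypothetical names, and the behavior shown is only the one described in this message (an explicit key does not fix up the generator), not settled spec behavior.

```javascript
// Hypothetical in-memory store; class and method names are illustrative only.
class SketchStore {
  constructor() {
    this.records = new Map();
    this.lastGenerated = 0; // last key handed out by the generator
  }

  add(value, explicitKey) {
    // An explicit key does NOT advance the generator.
    const key = explicitKey !== undefined ? explicitKey : this.lastGenerated + 1;
    if (this.records.has(key)) {
      throw new Error("CONSTRAINT_ERR: key " + key + " already exists");
    }
    if (explicitKey === undefined) {
      this.lastGenerated = key;
    }
    this.records.set(key, value);
    return key;
  }

  delete(key) {
    this.records.delete(key);
  }
}

const sketchStore = new SketchStore();
const firstKey = sketchStore.add({ name: "a" });   // generated key
sketchStore.add({ name: "b" }, firstKey + 1);      // explicit key, right after the last generated one

let blocked = false;
try {
  sketchStore.add({ name: "c" });                  // generator collides with the explicit key
} catch (e) {
  blocked = true;
}

sketchStore.delete(firstKey + 1);
const keyAfterDelete = sketchStore.add({ name: "c" }); // works again once the record is gone
```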


RE: [Bug 11270] New: Interaction between in-line keys and key generators

2010-11-10 Thread Pablo Castro

From: Tab Atkins Jr. [] 
Sent: Wednesday, November 10, 2010 1:50 PM

>> On Wed, Nov 10, 2010 at 1:43 PM, Pablo Castro
>>  wrote:
>> >
>> > From: [] 
>> > On Behalf Of
>> > Sent: Monday, November 08, 2010 5:07 PM
>> >
>> >>> So what happens if trying save in an object store which has the following
>> >>> keypath, the following value. (The generated key is 4):
>> >>>
>> >>> ""
>> >>> { foo: {} }
>> >>>
>> >>> Here the resulting object is clearly { foo: { bar: 4 } }
>> >>>
>> >>> But what about
>> >>>
>> >>> ""
>> >>> { foo: { bar: 10 } }
>> >>>
>> >>> Does this use the value 10 rather than generate a new key, does it throw 
>> >>> an
>> >>> exception or does it store the value { foo: { bar: 4 } }?
>> >
>> > I suspect that all options are somewhat arbitrary here. I'll just propose 
>> > that we error out to ensure that nobody has the wrong expectations about 
>> > the implementation preserving the initial value. I would be open to other 
>> > options except silently overwriting the initial value with a generated 
>> > one, as that's likely to confuse folks.
>> It's relatively common for me to need to supply a manual value for an
>> id field that's automatically generated when working with databases,
>> and I don't see any particular reason that my situation would change
>> if using IndexedDB.  So I think that a manually-supplied key should be
>> kept.

That would be okay with me. One bit of fine-print on this one is that if you're 
calling store.add() with an explicit key then you may get a unique constraint 
error (which would never happen with a generator if you never provided your own 
keys). Also, did we settle on having put() never add a new record if one 
didn't exist? If put() can create a record, then things still work but become a 
bit more elaborate in that put() would create a new record either if the key is 
not present in the object or if it's present but the value doesn't exist in the 
database, while it would update a record if the value was present and it 
existed as a key in the store.
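
Spelled out as code, and assuming put() is allowed to create records, the cases above come down to something like the following. resolveWrite and the returned labels are hypothetical, made up for this sketch.

```javascript
// Hypothetical decision helper; returns which action a store would take.
// hasKey: whether the object carries a key at the key path.
// exists: whether a record with that key is already in the store.
function resolveWrite(method, hasKey, exists) {
  if (!hasKey) {
    return "insert-with-generated-key";
  }
  if (method === "add") {
    // add() with an explicit key can hit a unique constraint error.
    return exists ? "constraint-error" : "insert-with-explicit-key";
  }
  // put(): create the record when missing, update it when present.
  return exists ? "update" : "insert-with-explicit-key";
}
```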


RE: [Bug 11270] New: Interaction between in-line keys and key generators

2010-11-10 Thread Pablo Castro

From: [] On 
Behalf Of
Sent: Monday, November 08, 2010 5:07 PM

>> So what happens if trying save in an object store which has the following
>> keypath, the following value. (The generated key is 4):
>> ""
>> { foo: {} }
>> Here the resulting object is clearly { foo: { bar: 4 } }
>> But what about
>> ""
>> { foo: { bar: 10 } }
>> Does this use the value 10 rather than generate a new key, does it throw an
>> exception or does it store the value { foo: { bar: 4 } }?

I suspect that all options are somewhat arbitrary here. I'll just propose that 
we error out to ensure that nobody has the wrong expectations about the 
implementation preserving the initial value. I would be open to other options 
except silently overwriting the initial value with a generated one, as that's 
likely to confuse folks.

>> What happens if the property is missing several parents, such as
>> ""
>> { zip: {} }
>> Does this throw or does it store { zip: {}, foo: { bar: { baz: 4 } } }

We should just complete the object with all the missing parents.
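
A minimal sketch of both answers so far (error out when a value is already present, complete missing parents otherwise). The key paths in the quoted message did not survive, so the dotted paths "foo.bar" and "foo.bar.baz" used below are assumptions inferred from the resulting objects, and injectGeneratedKey is a hypothetical helper name.

```javascript
// Hypothetical helper: writes a generated key into `value` at a dotted
// key path, creating any missing parent objects along the way.
// Parents that exist but are not objects are not handled in this sketch.
function injectGeneratedKey(value, keyPath, generatedKey) {
  const parts = keyPath.split(".");
  let target = value;
  for (const part of parts.slice(0, -1)) {
    if (target[part] === undefined) {
      target[part] = {}; // complete the missing parent
    }
    target = target[part];
  }
  const last = parts[parts.length - 1];
  if (target[last] !== undefined) {
    // Error out rather than silently overwrite the caller's value.
    throw new Error("Value already has a key at " + keyPath);
  }
  target[last] = generatedKey;
  return value;
}

// Assumed key path "foo.bar.baz"; the generated key is 4 as in the example.
const completed = injectGeneratedKey({ zip: {} }, "foo.bar.baz", 4);

let rejected = false;
try {
  injectGeneratedKey({ foo: { bar: 10 } }, "foo.bar", 4);
} catch (e) {
  rejected = true;
}
```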

>> If we end up allowing array indexes in key paths (like "foo[1].bar") what 
>> does
>> the following keypath/object result in?

I think we can live without array indexing in keys for this round, it's 
probably best to just leave them out and only allow paths.


RE: IndexedDB TPAC agenda

2010-11-02 Thread Pablo Castro
To hit the ground running on this, here is a consolidated list of issues coming 
both from the thread below and various pending bugs/discussions we've had. I 
picked an arbitrary order and grouping, feel free to tweak in any way.

- keys (arrays as keys, compound keys, general keypath restrictions)
- index keys (arrays as keys, empty values, general keypath restrictions)
- internationalization (collation specification, collation algorithm)
- quotas (how do apps request more storage, is there a temp/permanent 
- error handling (propagation, relationship to window.error, db scoped event 
handlers, errors vs return empty values)
- blobs (be explicit about behavior of blobs in indexeddb objects)
- transactions error modes (abort-on-unwind in error conditions; what happens 
when user leaves the page with pending transactions?)
- transactions isolation/concurrent aspects
- transactions scopes (dynamic support)
- synchronous api


RE: IndexedDB TPAC agenda

2010-11-01 Thread Pablo Castro
A few other items to add to the list to discuss tomorrow:

- Blobs support: have we discussed explicitly how things work when an object 
has a blob (file, array, etc.) as one of its properties?
- Close on collation and international support
- How do applications request that they need more storage? And related to this, 
at some point we discussed temporary vs permanent stores. Close on the whole 
story of how space is managed.
- Database-wide exception handlers

Looking forward to the discussion tomorrow.


From: [] On 
Behalf Of Jeremy Orlow
Sent: Monday, November 01, 2010 1:34 PM
To: Jonas Sicking
Subject: Re: IndexedDB TPAC agenda

On Mon, Nov 1, 2010 at 12:23 PM, Jonas Sicking  wrote:
On Mon, Nov 1, 2010 at 5:13 AM, Jeremy Orlow  wrote:
> On Mon, Nov 1, 2010 at 11:53 AM, Jonas Sicking  wrote:
>> On Mon, Nov 1, 2010 at 4:40 AM, Jeremy Orlow  wrote:
>> > What items should we try to cover during the f2f?
>> > On Mon, Nov 1, 2010 at 11:08 AM, Jonas Sicking  wrote:
>> >>
>> >> > P.S. I'm happy to discuss all of this f2f tomorrow rather than over
>> >> > email
>> >> > now.
>> >>
>> >> Speaking of which, would be great to have an agenda. Some of the
>> >> bigger items are:
>> >>
>> >> * Dynamic transactions
>> >> * Arrays-as-keys
>> >> * Arrays and indexes (what to do if the keyPath for an index evaluates
>> >> to an array)
>> >> * Synchronous API
>> >
>> > * Compound keys.
>> > * What should be allowed in a keyPath.
>> Aren't "compound keys" same as "arrays-as-keys"?
> Sorry, I meant to say compound indexes.
> We've talked about using indexes in many different ways--including compound
> indexes and allowing keys to include indexes.  I assumed you meant the
> latter?
I'm lost as to what you're saying here. Could you elaborate? Are you
saying "index" when you mean "array" anywhere?

oops.  Yes, I meant to say: "We've talked about using arrays in many different 
ways--including compound indexes and allowing keys to include arrays.  I 
assumed you meant the latter?"
>> * What should happen if an index's keyPath points to a property which
>> doesn't exist or which isn't a valid key-value? (same general topic as
>> "arrays and indexes" above)
> We've talked about this several times.  It'd be great to settle on something
> once and for all.

>> * What happens if the user leaves a page in the middle of a
>> transaction? (this might be nice to tackle since there'll be lots of
>> relevant people in the room)
> I'm pretty sure this is simple: if there's an onsuccess/onerror handler that
> has not yet fired (or we're in the middle of firing), then you abort the
> transaction.  If not, the behavior is undefined (because there's no way the
> app could have observed the difference anyway).  The aborting behavior is
> necessary since the user could have planned to execute additional commands
> atomically in the handler.
There is also the option to let the transaction finish. They should be
short-lived so it shouldn't be too bad.

I.e. keep the page alive for a bit longer in the background or something that 
blocks page unload?  Is there precedent for this elsewhere?  This sounds pretty 
complicated to get right both in terms of implementation and speccing.  Let's 
chat about it though.
>> * Error handling
> What do you mean by this?
How to handle exceptions in various places. Where (error) events
propagate. How does it relate to window.onerror. What happens if you
do/don't call preventDefault on the error event?

Sounds good.

RE: Seeking agenda items for WebApps' Nov 1-2 f2f meeting

2010-10-04 Thread Pablo Castro
Are these slots more or less frozen at this point? Just wanted to confirm to 
make travel arrangements.


-Original Message-
From: Arthur Barstow [] 
Sent: Wednesday, September 29, 2010 5:41 AM
To: ext Eric Uhrhane; Jonas Sicking; Jeremy Orlow; Pablo Castro; 
public-webapps; Arun Ranganathan
Subject: Re: Seeking agenda items for WebApps' Nov 1-2 f2f meeting

  I added the following slots for November 2:


13:30-15:00: Indexed DB
15:30-16:30: Indexed DB
16:30-18:00: File * APIs

Of course we can fine-tune the times as needed.

Arun - we reserved a speaker phone for remote participants for both days.

-Art Barstow

On 9/28/10 5:45 PM, ext Eric Uhrhane wrote:
> Works fine for me.  I'll be there all of Monday and Tuesday.  Due to
> jetlag morning vs. afternoon's probably irrelevant to me, as I won't
> have any idea what time it is ;'>.
> On Tue, Sep 28, 2010 at 2:30 PM, Jonas Sicking  wrote:
>> The later the better for me. If we can make it after noon I'll be
>> there for sure.
>> / Jonas
>> On Tue, Sep 28, 2010 at 1:37 PM, Jeremy Orlow  wrote:
>>> I'm OK with any time slot.
>>> On Tue, Sep 28, 2010 at 8:57 PM, Arthur Barstow
>>> wrote:
>>>>   Hi All,
>>>> Currently, no one has requested a specific day + time slot for any of the
>>>> proposed topics:
>>>> When our IndexedDB participants agree on a time slot on Tuesday the 2nd,
>>>> I'll add it to the agenda. Pablo, Jonas, Jeremy - please propose a time.
>>>> Day + time slot proposals for the agenda topics already proposed are also
>>>> welcome (as are proposals for additional topics).
>>>> -Art Barstow
>>>> On 9/28/10 3:28 PM, ext Pablo Castro wrote:
>>>>> It looks like there will be good critical mass for IndexedDB discussions,
>>>>> so I'll try to make it as well. Tuesday would be best for me as well for 
>>>>> an
>>>>> IndexedDB meeting so I can travel on Sunday/Monday.
>>>>> -pablo
>>>>> -Original Message-
>>>>> From: Jonas Sicking []
>>>>> Sent: Tuesday, September 28, 2010 10:53 AM
>>>>> To: Jeremy Orlow
>>>>> Cc: Pablo Castro;; public-webapps
>>>>> Subject: Re: Seeking agenda items for WebApps' Nov 1-2 f2f meeting
>>>>> I'm not 100% sure that I'll make TPAC this year, but if I do, I likely
>>>>> won't make monday. So a tuesday schedule would fit me better too.
>>>>> / Jonas
>>>>> On Tue, Sep 28, 2010 at 8:36 AM, Jeremy Orlow wrote:
>>>>>> Is it possible to schedule IndexedDB for Tuesday?  I'm pretty sure that
>>>>>> I
>>>>>> can be there then, but Monday is more up in the air at this moment.
>>>>>> Thanks!
>>>>>> Jeremy
>>>>>>> On Thu, Sep 2, 2010 at 3:28 AM, Jonas Sicking wrote:
>>>>>>> I'm hoping to be there yes. Especially if we'll get a critical mass of
>>>>>>> IndexedDB contributors.
>>>>>>> / Jonas
>>>>>>> On Wed, Sep 1, 2010 at 7:18 PM, Pablo
>>>>>>> Castro
>>>>>>> wrote:
>>>>>>>> -Original Message-
>>>>>>>> From:
>>>>>>>> [] On Behalf Of Arthur Barstow
>>>>>>>> Sent: Tuesday, August 31, 2010 4:32 AM
>>>>>>>>>> The WebApps WG will meet face-to-face November 1-2 as part of the
>>>>>>>>>> W3C's
>>>>>>>>>> 2010 TPAC meeting week [TPAC].
>>>>>>>>>> I created a stub agenda item page and seek input to flesh out
>>>>>>>>>> agenda:
>>>>>>>>>> [TPAC] includes a link to the Registration page, a detailed schedule
>>>>>>>>>> of
>>>>>>>>>> the group meetings, and other useful information.
>>>>>>>>>> The registration fee is 40€ per day and will increase to 120€ per
>>>>>>>>>> day
>>>>>>>>>> after October 22.
>>>>>>>>>> -Art Barstow
>>>>>>>>>> [TPAC]
>>>>>>>> For folks working on IndexedDB, are you guys planning on attending the
>>>>>>>> TPAC? Given the timing of the event it may be a great opportunity to
>>>>>>>> get
>>>>>>>> together and iron out a whole bunch of issues at once. It would be
>>>>>>>> good to
>>>>>>>> know ahead of time so we can all make plans if we have critical mass.
>>>>>>>> Thanks
>>>>>>>> -pablo

Re: [IndexedDB] Explicitly stablishing the timing of clone creation

2010-10-04 Thread Pablo Castro

On Mon, Aug 16, 2010 at 12:11 AM, Jonas Sicking  wrote:

>> > On Fri, Aug 13, 2010 at 1:43 PM, Pablo Castro
>> >  wrote:
>> > > The spec for the asynchronous "put" and "add" methods in object store as
>> > > well as "update" in cursors don't explicitly state when clones are created,
>> > > and can even be read as if clones should be created after the function call
>> > > returned, when the queued up task is executed. This leads to problems where
>> > > the clone may be modified after the call to put/add/update happens.
>> > > Wouldn't it be more reasonable to require implementations to always create
>> > > a clone of the object before returning (i.e. synchronously) and perform the
>> > > rest of the operation asynchronously?
>> >
>> > Yes.
>> >
>> > > If we agree on this I'll file a bug and later follow up with some text
>> > > for the spec.
>> >
>> > Please do.
>> >
>> Agreed.

Closing the loop on this one. Proposed text is below, any feedback is welcome. 
I also updated the bug with it.


Proposed text changes for this:

In section "3.2.5 Object Store", the description for the "add" method says:
This method returns immediately and stores the given value in this object store
by following the steps for storing a record into an object store with the
no-overwrite flag set. If the record can be successfully stored in the object
store, then a success event is fired on this method's returned object using the
IDBTransactionEvent interface with its result set to the key for the stored
record and transaction set to the transaction in which this object store is
opened. If a record exists in this object store for the key key parameter, then
an error event is fired on this method's returned object with its code set to

We should change it to:
This method stores the given value in this object store by first synchronously
creating a copy of the value following steps 1 through 4 of the algorithm
described in "4.2 Object Store Storage steps", then returning immediately and
asynchronously performing the remaining steps for the algorithm that actually
store the object in the object store, with the no-overwrite flag set. If the
record can be successfully stored in the object store, then a success event is
fired on this method's returned object using the IDBTransactionEvent interface
with its result set to the key for the stored record and transaction set to the
transaction in which this object store is opened. If a record exists in this
object store for the key key parameter, then an error event is fired on this
method's returned object with its code set to CONSTRAINT_ERR.

In section "3.2.5 Object Store", the description for the "put" method says:
This method returns immediately and stores the given value in this object store
by following the steps for storing a record into an object store. If the record
can be successfully stored in the object store, then a success event is fired
on this method's returned object using the IDBTransactionEvent interface with
its result set to the key for the stored record and transaction set to the
transaction in which this object store is opened.

We should change it to:
This method stores the given value in this object store by first synchronously
creating a copy of the value following steps 1 through 4 of the algorithm
described in "4.2 Object Store Storage steps", then returning immediately and
asynchronously performing the remaining steps for the algorithm that actually
store the object in the object store. If the record can be successfully stored
in the object store, then a success event is fired on this method's returned
object using the IDBTransactionEvent interface with its result set to the key
for the stored record and transaction set to the transaction in which this
object store is opened.

In section "3.2.7 Cursor" the description of the "update" method says:
This method returns immediately and sets the value for the record at the
cursor's position.

We should change it to:
This method sets the value for the record at the cursor's position by first
synchronously creating a copy of the value using the structured clone
algorithm, then returning immediately and asynchronously updating the record in
the underlying store.
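
To illustrate why the clone has to happen before the call returns, here is a small sketch. CloningStore is a hypothetical stand-in with a manual task queue in place of the real asynchronous machinery; structuredClone is the standard global where available, with a JSON round-trip as a rough fallback for plain data.

```javascript
// Use the standard structuredClone when the runtime has it; the JSON
// round-trip is only a rough stand-in that works for plain data.
const cloneValue =
  typeof structuredClone === "function"
    ? structuredClone
    : (v) => JSON.parse(JSON.stringify(v));

// Hypothetical store with a manual task queue standing in for the
// asynchronous part of put().
class CloningStore {
  constructor() {
    this.records = new Map();
    this.pending = [];
  }

  put(key, value) {
    // Synchronous step: copy the value before returning to the caller.
    const copy = cloneValue(value);
    // Asynchronous step: the actual store happens later.
    this.pending.push(() => this.records.set(key, copy));
  }

  // Runs the queued tasks; a real implementation would schedule this itself.
  flush() {
    for (const task of this.pending.splice(0)) {
      task();
    }
  }
}

const cloningStore = new CloningStore();
const candy = { name: "lollipop", quantity: 5 };
cloningStore.put(1, candy);
candy.quantity = 99; // mutation after put() returned
cloningStore.flush();
const storedQuantity = cloningStore.records.get(1).quantity;
```

With the proposed wording the stored record reflects the object as it was at the time of the call; if the clone were instead taken when the queued task runs, the later mutation would leak into the store.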

RE: [IndexedDB] Languages for collation

2010-09-28 Thread Pablo Castro

From: Jungshik Shin (신정식, 申政湜) [] 
Sent: Tuesday, August 24, 2010 10:34 PM

>> As for the locale identifiers, my understanding is that Windows APIs (newer 
>> 'name-based' locale APIs) more or less follows BCP 47. 

Picking this back up from this August thread. I went around and asked Windows 
folks about this. Locale identifiers based on BCP 47 sound good.

On the other hand, we probably wouldn't do UCA. I heard various worries from 
folks that work in this space, including the fact that it seems it's still 
changing so it would be a moving target (which btw means that collisions could 
still happen) and that we don't support it in a number of places today. Given 
that feedback, I would rather leave this open and let implementations choose 
the algorithm for collation (still need to do language-sensitive collation, of 
course). Would that work?
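
For a feel of what "BCP 47 identifier in, implementation-chosen algorithm out" looks like to calling code, here is a sketch using the standard ECMAScript Intl.Collator API. This is not something proposed in the thread; it is only a convenient existing API with the same shape, and the function name sortForLocale is hypothetical.

```javascript
// Sorts strings with a language-sensitive collation selected by a BCP 47 tag.
// The exact ordering comes from the engine's collation data, so it can
// differ between implementations, which is the open point discussed above.
function sortForLocale(strings, localeTag) {
  const collator = new Intl.Collator(localeTag);
  return [...strings].sort(collator.compare);
}

const sampleNames = ["Zoe", "Ärla", "Adam"];
const swedishOrder = sortForLocale(sampleNames, "sv");
const germanOrder = sortForLocale(sampleNames, "de");
```

The same input can come back in a different order for "sv" and "de", since the two languages place "Ä" differently; whether two browsers agree for the same tag depends on the collation data each one ships.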


RE: [IndexedDB] IDBCursor.update for cursors returned from IDBIndex.openCursor

2010-09-28 Thread Pablo Castro
I agree with Jonas on this. I think accessing the index values is an important 
feature (in addition to joins you can imagine adding an extra property or two to 
the index key* to create a covering index and avoid fetching the object in a 
perf-critical path).

That said, to me it's just about allowing retrieval. For update/delete it would 
be perfectly reasonable to have to go to the store in my opinion.
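
A small in-memory sketch of the retrieval-only case, using trimmed data from the employees/sales example in the quoted message below. The nameIndex array is a hypothetical stand-in for an index on "name" and does not use the IndexedDB API itself.

```javascript
// Trimmed data from the quoted employees/sales example.
const employees = [
  { id: 1, name: "Sven" },
  { id: 2, name: "Bert" },
  { id: 3, name: "Adam" },
];
const sales = [
  { seller: 1, candyName: "lollipop", quantity: 5 },
  { seller: 1, candyName: "swedish fish", quantity: 12 },
  { seller: 2, candyName: "jelly belly", quantity: 3 },
  { seller: 3, candyName: "heath bar", quantity: 3 },
];

// Hypothetical stand-in for an index on "name": entries of
// (index key, primary key), sorted by index key.
const nameIndex = employees
  .map((e) => ({ key: e.name, primaryKey: e.id }))
  .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));

// Walk the index entries only; the employee objects are never fetched.
const salesReport = nameIndex.map((entry) => ({
  name: entry.key,
  total: sales
    .filter((s) => s.seller === entry.primaryKey)
    .reduce((sum, s) => sum + s.quantity, 0),
}));
```

Everything the report needs from "employees" (the name and the primary key) is already in the index entry, which is the point of allowing cursors over index values.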


-Original Message-
From: [] On 
Behalf Of Jonas Sicking
Sent: Friday, September 17, 2010 3:15 PM

On Fri, Sep 17, 2010 at 2:46 AM, Jeremy Orlow  wrote:
> On Fri, Sep 17, 2010 at 1:06 AM, Jonas Sicking  wrote:
>> On Thu, Sep 16, 2010 at 2:23 PM, Jeremy Orlow  wrote:
>> > On Thu, Sep 16, 2010 at 8:53 PM, Jonas Sicking  wrote:
>> >>
>> >> On Thu, Sep 16, 2010 at 2:15 AM, Jeremy Orlow 
>> >> wrote:
>> >> > Wait a sec.  What are the use cases for non-object cursors anyway?
>> >> >  They
>> >> > made perfect sense back when we allowed explicit index management,
>> >> > but
>> >> > now
>> >> > they kind of seem like a premature optimization or possibly even dead
>> >> > weight.  Maybe we should just remove them altogether?
>> >>
>> >> They are still useful for joins. Consider an objectStore "employees":
>> >>
>> >> { id: 1, name: "Sven", employed: "1-1-2010" }
>> >> { id: 2, name: "Bert", employed: "5-1-2009" }
>> >> { id: 3, name: "Adam", employed: "6-6-2008" }
>> >> And objectStore "sales"
>> >>
>> >> { seller: 1, candyName: "lollipop", quantity: 5, date: "9-15-2010" }
>> >> { seller: 1, candyName: "swedish fish", quantity: 12, date: "9-15-2010"
>> >> }
>> >> { seller: 2, candyName: "jelly belly", quantity: 3, date: "9-14-2010" }
>> >> { seller: 3, candyName: "heath bar", quantity: 3, date: "9-13-2010" }
>> >> If you want to display the amount of sales per person, sorted by names
>> >> of sales person, you could do this by first creating and index for
>> >> "employees" with keyPath "name". You'd then use IDBIndex.openCursor to
>> >> iterate that index, and for each entry find all entries in the "sales"
>> >> objectStore where "seller" matches the cursors .value.
>> >>
>> >> So in this case you don't actually need any data from the "employees"
>> >> objectStore, all the data is available in the index. Thus it is
>> >> sufficient, and faster, to use openCursor than openObjectCursor.
>> >>
>> >> In general, it's a common optimization to stick enough data in an
>> >> index that you don't have to actually look up in the objectStore
>> >> itself. This is slightly less commonly doable since we have relatively
>> >> simple indexes so far. But still doable as the example above shows.
>> >> Once we add support for arrays as keys this will be much more common
>> >> as you can then stick arbitrary data into the index by simply adding
>> >> additional entries to all key arrays. And even more so once we
>> >> (probably in a future version) add support for computed indexes.
>> >
>> >
>> > On Thu, Sep 16, 2010 at 8:57 PM, Jonas Sicking  wrote:
>> >>
>> >> On Thu, Sep 16, 2010 at 4:08 AM, Jeremy Orlow 
>> >> wrote:
>> >> > Actually, for that matter, are remove and update needed at all?  I
>> >> > think
>> >> > they may just be more cruft left over from the explicit index days.
>> >> >  As
>> >> > far
>> >> > as I can tell, any .delete or .remove should be doable via an
>> >> > objectCursor +
>> >> > .puts/.removes on the objectStore.
>> >>
>> >> They are not strictly needed, but they are a decent convenience
>> >> feature, and with a proper implementation they can even be a
>> >> performance optimization. With a cursor iterating a b-tree you can let
>> >> the cursor keep a pointer to the b-tree entry. That way .delete and
>> >> .update don't have to do a b-tree lookup at all.
>> >>
>> >> We're currently not able to do this since our backend (sqlite) doesn't
>> >> have good enough cursor support, but I suspect that this will change
>> >> at some point in the future. In the meantime it seems like a good
>> >> thing to allow people to use API that will be faster in the future.
>> >
>> > All your arguments revolve around what the spec and implementations
>> > might do in the future.
>> I disagree. The IDBIndex.openCursor example I included uses only
>> existing API, and is a performance improvement in at least our current
>> implementation. Would be interested to hear if it's not a performance
>> improvement in others.
> It's not in ours because we join to the ObjectStore's data table either way.
>  But that's not at all why I'm bringing this up.


>> > Typically we add API surface area only for use cases that
>> > are currently impossible to satisfy or proven performance bottlenecks. I
>> > agree that it's likely implementations will want to do optimizations
>> > like this in the future, but until they do, it'll be hard to really
>> > understand the implications and complications that might arise.
>> That's not entire

RE: [IndexedDB] setVersion with multiple IDBDatabase objects

2010-09-28 Thread Pablo Castro

-Original Message-
From: [] On 
Behalf Of ben turner
Sent: Tuesday, September 28, 2010 8:19 AM

>> Yes, let's have it tied to the instance on which setVersion() was called.
>> As Shawn pointed out that is consistent with the behavior that
>> database instances from different windows will observe. As Jeremy
>> pointed out that is consistent with the way object stores and indexes
>> are tied to a transaction instance. Also, the |event.source| will be
>> db1 in the given example, so it seems natural to allow changes only to
>> the database we pass in the event and no other.
>> -Ben

+1, let's tie it to the instance and make it consistent with stores/indexes.


RE: Seeking agenda items for WebApps' Nov 1-2 f2f meeting

2010-09-28 Thread Pablo Castro
It looks like there will be good critical mass for IndexedDB discussions, so 
I'll try to make it as well. Tuesday would be best for me as well for an 
IndexedDB meeting so I can travel on Sunday/Monday.


-Original Message-
From: Jonas Sicking [] 
Sent: Tuesday, September 28, 2010 10:53 AM
To: Jeremy Orlow
Cc: Pablo Castro;; public-webapps
Subject: Re: Seeking agenda items for WebApps' Nov 1-2 f2f meeting

I'm not 100% sure that I'll make TPAC this year, but if I do, I likely
won't make monday. So a tuesday schedule would fit me better too.

/ Jonas

On Tue, Sep 28, 2010 at 8:36 AM, Jeremy Orlow  wrote:
> Is it possible to schedule IndexedDB for Tuesday?  I'm pretty sure that I
> can be there then, but Monday is more up in the air at this moment.
> Thanks!
> Jeremy
> On Thu, Sep 2, 2010 at 3:28 AM, Jonas Sicking  wrote:
>> I'm hoping to be there yes. Especially if we'll get a critical mass of
>> IndexedDB contributors.
>> / Jonas
>> On Wed, Sep 1, 2010 at 7:18 PM, Pablo Castro 
>> wrote:
>> >
>> > -Original Message-
>> > From:
>> > [] On Behalf Of Arthur Barstow
>> > Sent: Tuesday, August 31, 2010 4:32 AM
>> >
>> >>> The WebApps WG will meet face-to-face November 1-2 as part of the
>> >>> W3C's
>> >>> 2010 TPAC meeting week [TPAC].
>> >>>
>> >>> I created a stub agenda item page and seek input to flesh out agenda:
>> >>>
>> >>>
>> >>>
>> >>> [TPAC] includes a link to the Registration page, a detailed schedule
>> >>> of
>> >>> the group meetings, and other useful information.
>> >>>
>> >>> The registration fee is 40€ per day and will increase to 120€ per day
>> >>> after October 22.
>> >>>
>> >>> -Art Barstow
>> >>>
>> >>> [TPAC]
>> >
>> > For folks working on IndexedDB, are you guys planning on attending the
>> > TPAC? Given the timing of the event it may be a great opportunity to get
>> > together and iron out a whole bunch of issues at once. It would be good to
>> > know ahead of time so we can all make plans if we have critical mass.
>> >
>> > Thanks
>> > -pablo
>> >
>> >

RE: Seeking agenda items for WebApps' Nov 1-2 f2f meeting

2010-09-01 Thread Pablo Castro

-Original Message-
From: [] On 
Behalf Of Arthur Barstow
Sent: Tuesday, August 31, 2010 4:32 AM

>> The WebApps WG will meet face-to-face November 1-2 as part of the W3C's 
>> 2010 TPAC meeting week [TPAC].
>> I created a stub agenda item page and seek input to flesh out agenda:
>> [TPAC] includes a link to the Registration page, a detailed schedule of 
>> the group meetings, and other useful information.
>> The registration fee is 40€ per day and will increase to 120€ per day 
>> after October 22.
>> -Art Barstow
>> [TPAC]

For folks working on IndexedDB, are you guys planning on attending the TPAC? 
Given the timing of the event it may be a great opportunity to get together and 
iron out a whole bunch of issues at once. It would be good to know ahead of 
time so we can all make plans if we have critical mass.


RE: [IndexedDB] Let's remove IDBDatabase.objectStore()

2010-08-24 Thread Pablo Castro

From: [] On 
Behalf Of Jeremy Orlow
Sent: Tuesday, August 24, 2010 12:40 AM

>> On Tue, Aug 24, 2010 at 12:43 AM, ben turner  wrote:
>> Hi folks,
>> We originally included IDBDatabase.objectStore() as a convenience
>> function because we figured that everyone would hate typing
>> |myDatabase.transaction('myObjectStore').objectStore('myObjectStore')|.
>> Unfortunately I think we should remove it - too many developers have
>> used the function without realizing that the returned object was tied
>> to a particular transaction. Any objections?
>> It does seem like it could be confusing and it doesn't seem to save all that 
>> many characters.  So I'm fine with it.


[IndexedDB] Avoiding reader/writer starvation

2010-08-13 Thread Pablo Castro
In the context of transactions, readers using READ_ONLY and writers using 
READ_WRITE may block each other when starting transactions, at least for cases 
where the underlying implementation uses locking for isolation. Since we allow 
multiple readers and they can start while other readers were already running, 
it's possible that readers end up starving writers in a concurrent setting. It 
seems it would be a good idea to add some minimum guarantees to the spec that 
ensures some amount of fairness to concurrent activities against a given 

We could either include a loose recommendation or try to mandate a strict 
behavior. It seems the loose recommendation is more practical, the questions 
are a) is there a risk of incompatible behavior because of under-specification, 
and b) will we risk that some implementations will just ignore this aspect if 
it's specified too informally.

The loose recommendation could just be a sentence in the transactions section:

"UAs need to ensure a reasonable level of fairness across readers and writers 
to prevent starvation."

If we wanted to be more specific, we could go with something like this (we'd 
probably spell it out as rules if we decide to put this strict version in the 

"All readers can run concurrently, but once a writer tries to start a 
transaction we stop allowing new readers to start and queue up the writer and 
any subsequent reader/writer. Once the existing readers are drained the writer 
runs, and after that whatever is queued up next runs, which can be another 
writer or all the remaining readers (depending upon what came first, another 
writer or another reader; readers are released all simultaneously since they 
run concurrently)."

Given that not all implementations will have to deal with this and that 
different implementations may want to have different strategies, it seems that 
just having the recommendation around starvation is the best option.
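For illustration only, the stricter queueing rule quoted above could be modelled roughly as follows. This is a sketch, not spec text: the FairScheduler class and its request/finish methods are invented names, and "all the remaining readers" is read here as the run of readers at the head of the queue.

```javascript
// Hypothetical model of the strict policy; only the mode names READ_ONLY and
// READ_WRITE come from the message, everything else is invented for the sketch.
class FairScheduler {
  constructor() {
    this.running = []; // transactions currently active
    this.queue = [];   // transactions waiting, in arrival order
  }

  // Request a transaction; returns true if it starts immediately.
  request(id, mode) {
    const tx = { id, mode };
    const writerActive = this.running.some(t => t.mode === "READ_WRITE");
    // A non-empty queue means a writer arrived earlier and is still waiting,
    // so nothing new may start. Otherwise readers start unless a writer is
    // running, and a writer starts only when nothing at all is running.
    const canStart = this.queue.length === 0 &&
      (mode === "READ_ONLY" ? !writerActive : this.running.length === 0);
    if (canStart) {
      this.running.push(tx);
      return true;
    }
    this.queue.push(tx);
    return false;
  }

  // Mark a transaction finished; returns the ids that start as a result.
  finish(id) {
    this.running = this.running.filter(t => t.id !== id);
    if (this.running.length > 0 || this.queue.length === 0) return [];
    const started = [];
    if (this.queue[0].mode === "READ_WRITE") {
      started.push(this.queue.shift()); // one writer runs alone
    } else {
      // Release the readers at the head of the queue together.
      while (this.queue.length > 0 && this.queue[0].mode === "READ_ONLY") {
        started.push(this.queue.shift());
      }
    }
    this.running.push(...started);
    return started.map(t => t.id);
  }
}
```

With two readers running, a writer request is queued, later readers queue up behind it, and the writer starts as soon as the two original readers finish.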


[IndexedDB] Explicitly establishing the timing of clone creation

2010-08-13 Thread Pablo Castro
The spec for the asynchronous "put" and "add" methods in object store as well 
as "update" in cursors doesn't explicitly state when clones are created, and can 
even be read as if clones should be created after the function call returned, 
when the queued up task is executed. This leads to problems where the clone may 
be modified after the call to put/add/update happens. Wouldn't it be more 
reasonable to require implementations to always create a clone of the object 
before returning (i.e. synchronously) and perform the rest of the operation 

If we agree on this I'll file a bug and later follow up with some text for the 


RE: [IndexedDB] Languages for collation

2010-08-12 Thread Pablo Castro

From: [] On Behalf Of Jeremy Orlow
Sent: Thursday, August 12, 2010 3:36 AM

>> On Thu, Aug 12, 2010 at 11:19 AM, Jonas Sicking  wrote:
>> On Wed, Aug 11, 2010 at 11:28 PM, Pablo Castro
>>  wrote:
>> > We had some discussions about collation algorithms and such in the past, 
>> > but I don't think we have settled on the language aspect of it. In order 
>> > to have stores and indexes sort character-based keys in a way that is 
>> > consistent with users' expectations we'll have to take indication in the 
>> > API of what language we should use to collate strings.
>> >
>> > Trying to take a minimalist approach, we could add an optional parameter 
>> > on the database open call that indicates the language to use (e.g. "en" or 
>> > "en-UK", etc.). If the language is not specified and the database does not 
>> > exist, then we can use the current browser/OS language to create the 
>> > database. If not specified and database already exists, then use the one 
>> > that's already there (this accommodates the fact that a user may be able to 
>> > change their default language in the browser/OS after the database has 
>> > been created using the default). If the language is specified and the 
>> > database already exists and the specified language is not the one the 
>> > database has then we'll throw an exception (same behavior as with 
>> > "description", although we have that one in flight right now as well).
>> >
>> > We should probably also add a read-only attribute to the database object 
>> > that exposes the language.
>> >
>> > If this works for folks I can write a proposal for the specific changes to 
>> > the spec.
>> If we make it part of the database open call, then that makes it
>> impossible to change the sorting order of an existing database, no?
>> This seems like it could be a problem. I.e. it quite possible that an
>> application will want to allow the user to change the sorting
>> language, for example when changing the language of the UI.
>> One solution would be to allow language to be set as part of the
>> setVersion call.
>> Whether it's per-database or more fine grained I think it absolutely must be 
>> part of setVersion.  Changing the language will be a very heavyweight 
>> operation that'll require a similar level of isolation to "schema" changes 
>> of the database.  (Not sure how I missed this point of Pablo's original 
>> email.)

Yes, changing the collation would effectively mean re-creating all the stores 
and indexes. At a very minimum it needs to be a setVersion thing. I also don't 
think it would be too crazy to not support changing collations period. In the 
unusual case where a user must absolutely do this, it can be done by creating a 
separate database and copying the data over using the APIs.

RE: [IndexedDB] Languages for collation

2010-08-12 Thread Pablo Castro

From: [] On Behalf Of Jeremy Orlow
Sent: Thursday, August 12, 2010 2:18 AM

>> I think we should first break down the use cases and look at how many of 
>> them just need _a_ sort order, how many of them a per-database sort order is 
>> ok, and how many of them would need something finer grained (like a per-key 
>> ordering).

That's reasonable. What I was thinking is that any case where you'll use the 
order of items in a store/index to display things to the user (e.g. a list of 
contacts) you'd want the items to be in proper order for the user's language. 
That will not only match users' expectations but also match other applications 
(or even other parts of the UA) that display data based on the current OS 
language or the users' choice of language. 

That covers a very broad spectrum of scenarios that need language-specific sort 

I find it unlikely that a single web app will need more than one language per 
database (or even per origin/OS account), given that most applications operate 
in a single language at any one point in time. 

>> Are there work-arounds for getting a UCA-ordered data structure to hold 
>> data in another language's order?  For example, I could imagine it'd be possible 
>> to do some sort of encode step on the data before insertion (and decode on 
>> removal) that would make UCA work.  I have no idea, but if such algorithms 
>> existed and were well understood, then it'd definitely make me lean towards 
>> punting language specification to v2.

I'm not sure I understand this paragraph. "UCA ordered" may not mean much more 
than just ordering using a binary collation if the language is not specified. 
While this is typically not an issue in English, in other languages this 
introduces a varying level of deviation from users' expectations. Given that 
different languages have conflicting rules for collation, I'm not sure how this 
can be generalized independently of the language. Even in the UCA specification 
[1] the aspect of input language is mentioned as the most important feature of 


[IndexedDB] READ_ONLY vs SNAPSHOT_READ transactions

2010-08-12 Thread Pablo Castro
We currently have two read-only transaction modes, READ_ONLY and SNAPSHOT_READ. 
As we map this out to implementation we ran into various questions that made me 
wonder whether we have the right set of modes. 

It seems that READ_ONLY and SNAPSHOT_READ are identical in every aspect 
(point-in-time consistency for readers, allow multiple concurrent readers, 
etc.), except that they have different concurrency characteristics, with 
READ_ONLY blocking writers and SNAPSHOT_READ allowing concurrent writers to come 
and go while readers are active. Does that match everybody's interpretation?

Assuming that interpretation, then I'm not sure if we need both. Should we 
consider having only READ_ONLY, where transactions are guaranteed a stable view 
of the world regardless of the implementation strategy, and then let 
implementations either block writers or version the data? I understand that 
this introduces variability in the reader-writer interaction. On the other 
hand, I also suspect that the cost of SNAPSHOT_READ will also vary a lot across 
implementations (e.g. mvcc-based stores versus non-mvcc stores that will have 
to make copies of all stores included in a transaction to support this mode). 


RE: [IndexedDB] question about description argument of IDBFactory::open()

2010-08-12 Thread Pablo Castro

From: [] On 
Behalf Of Jeremy Orlow
Sent: Thursday, August 12, 2010 3:59 AM

>> On Thu, Aug 12, 2010 at 11:55 AM, Jonas Sicking  wrote:
>> On Thu, Aug 12, 2010 at 3:41 AM, Jeremy Orlow  wrote:
>> >
>> > One question though: if they pass in null or undefined, do we want to
>> > interpret this as the argument not being passed in or simply let them
>> > convert to "undefined" and "null" (which is the default behavior in WebIDL,
>> > I believe).  I feel somewhat strongly we should do the former.  Especially
>> > since the latter would make it impossible to add additional parameters to
>> > .open() in the future.
>> I don't understand why it would make it impossible to add optional
>> parameters in the future. Wouldn't it be a matter of people writing
>>"mydatabase", "", SOME_OTHER_PARAM);
>> vs.
>>"mydatabase", null, SOME_OTHER_PARAM);
>> So "" is assumed to mean "don't update"?  My assumption was that "" meant 
>> empty description.
>> It seems silly to make someone replace the description with a space (or 
>> something like that) if they truly want to zero it out.  And it seems silly 
>> to ever make your description be "null".  So it seemed natural to make 
>> null and/or undefined be such a signal.

Given that open() is one of those functions that are likely to grow in 
parameters over time, I wonder if we should consider taking an object as the 
second argument with names/values (e.g. open("mydatabase", { description: "foo" 
}); ). That would allow us to keep the minimum specification small and easily 
add more parameters later without resulting in hard-to-read code that has a 
bunch of "undefined" in arguments. The only thing I'm not sure about is if there is 
precedent of doing this in one of the standard APIs.
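To make the options-object idea concrete, here is a rough sketch. normalizeOpenOptions is a hypothetical helper, not part of any draft; it assumes that a missing, undefined, or null description means "not passed", while "" means an explicitly empty description, which is one possible resolution of the question discussed in the quoted thread.

```javascript
// Hypothetical helper that normalizes the arguments of an open()-style call
// taking (name, optionsObject). Names and behavior are assumptions.
function normalizeOpenOptions(name, options) {
  if (typeof name !== "string") {
    throw new TypeError("name must be a string");
  }
  const opts = options ?? {};
  const result = { name };
  // null/undefined are treated as "argument not passed"; "" is kept as an
  // explicit empty description.
  if (opts.description !== undefined && opts.description !== null) {
    result.description = String(opts.description);
  }
  return result;
}

// New options can be added later without positional placeholders:
const plain = normalizeOpenOptions("mydatabase");
const described = normalizeOpenOptions("mydatabase", { description: "foo" });
const emptied = normalizeOpenOptions("mydatabase", { description: "" });
```

The design point is that callers never need to write a run of "undefined" arguments to reach a later parameter.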


RE: [IndexedDB] Languages for collation

2010-08-12 Thread Pablo Castro

From: Mikeal Rogers [] 
Sent: Wednesday, August 11, 2010 11:35 PM

>> Why not just use the unicode collation algorithm?
>> Then you won't have to hint the locale.

Unless I'm missing something, the UCA defines the general algorithm for 
collating strings but you still need to know the language in order to sort 
strings properly in that language. For example, in Spanish the letters "c" and 
"h"  together (e.g. in "chau" (bye)) sort as a single letter, causing the 
expected sort order to be different from English where they are always two 
independent letters (e.g. so "chau" comes before "cuando" (when) when sorted in 
English, but after when sorted in Spanish).
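The "chau" / "cuando" example can be reproduced with a toy comparator. This is only a sketch: spanishKey is an invented helper that handles lowercase ASCII input and nothing else, and a real implementation would rely on a proper collation library rather than this trick.

```javascript
// Binary (code-unit) comparison, standing in for language-agnostic ordering.
function compareBinary(a, b) {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Toy "traditional Spanish" sort key: treats the digraph "ch" as a single
// letter that sorts after every other sequence starting with "c".
// Assumes lowercase ASCII input.
function spanishKey(s) {
  return s.replace(/ch/g, "c\uffff");
}

function compareSpanishTraditional(a, b) {
  return compareBinary(spanishKey(a), spanishKey(b));
}

const words = ["cuando", "chau", "dado", "casa"];
const englishOrder = [...words].sort(compareBinary);
const spanishOrder = [...words].sort(compareSpanishTraditional);
// englishOrder: casa, chau, cuando, dado
// spanishOrder: casa, cuando, chau, dado
```

The same four keys produce two different index orders, so the store has to know which language's rules it is expected to follow.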

>> CouchDB uses some definitions around sorting complex types like arrays and 
>> objects but when it comes down to sorting strings it just defaults to the 
>> unicode collation algorithm and all the locales are happy.
>> -Mikeal
>> On Wed, Aug 11, 2010 at 11:28 PM, Pablo Castro  
>> wrote:
>> We had some discussions about collation algorithms and such in the past, but 
>> I don't think we have settled on the language aspect of it. In order to have 
>> stores and indexes sort character-based keys in a way that is consistent 
>> with users' expectations we'll have to take indication in the API of what 
>> language we should use to collate strings.
>> Trying to take a minimalist approach, we could add an optional parameter on 
>> the database open call that indicates the language to use (e.g. "en" or 
>> "en-UK", etc.). If the language is not specified and the database does not 
>> exist, then we can use the current browser/OS language to create the 
>> database. If not specified and database already exists, then use the one 
>> that's already there (this accommodates the fact that a user may be able to 
>> change their default language in the browser/OS after the database has been 
>> created using the default). If the language is specified and the database 
>> already exists and the specified language is not the one the database has 
>> then we'll throw an exception (same behavior as with "description", although 
>> we have that one in flight right now as well).
>> We should probably also add a read-only attribute to the database object 
>> that exposes the language.
>> If this works for folks I can write a proposal for the specific changes to 
>> the spec.
>> Thanks
>> -pablo

[IndexedDB] Languages for collation

2010-08-11 Thread Pablo Castro
We had some discussions about collation algorithms and such in the past, but I 
don't think we have settled on the language aspect of it. In order to have 
stores and indexes sort character-based keys in a way that is consistent with 
users' expectations we'll have to take indication in the API of what language 
we should use to collate strings.

Trying to take a minimalist approach, we could add an optional parameter on the 
database open call that indicates the language to use (e.g. "en" or "en-UK", 
etc.). If the language is not specified and the database does not exist, then 
we can use the current browser/OS language to create the database. If not 
specified and database already exists, then use the one that's already there 
(this accommodates the fact that a user may be able to change their default 
language in the browser/OS after the database has been created using the 
default). If the language is specified and the database already exists and the 
specified language is not the one the database has then we'll throw an 
exception (same behavior as with "description", although we have that one in 
flight right now as well). 

We should probably also add a read-only attribute to the database object that 
exposes the language.

If this works for folks I can write a proposal for the specific changes to the 
spec.

RE: CfC: to publish new WD of Indexed Database API; deadline August 17

2010-08-11 Thread Pablo Castro
We support this as well.


-Original Message-
From: [] On 
Behalf Of Jonas Sicking
Sent: Tuesday, August 10, 2010 8:06 AM
To: Jeremy Orlow
Cc:; public-webapps
Subject: Re: CfC: to publish new WD of Indexed Database API; deadline August 17

I support this.

On Tue, Aug 10, 2010 at 4:38 AM, Jeremy Orlow  wrote:
> On Tue, Aug 10, 2010 at 12:04 PM, Arthur Barstow 
> wrote:
>> All - the Editors of the Indexed Database API would like to publish a new
>> Working Draft:
>> If you have any comments or concerns about this proposal, please send them
>> to public-webapps by August 10 at the latest.
> I assume you mean the 17th?
>> As with all of our CfCs, positive response is preferred and encouraged and
>> silence will be assumed to be assent.
> We support.

RE: [IndexedDB] Need a method to remove a database

2010-08-09 Thread Pablo Castro

From: [] On Behalf Of Jeremy Orlow
Sent: Friday, August 06, 2010 2:34 AM

>> On Fri, Aug 6, 2010 at 12:37 AM, Jonas Sicking  wrote:
>> On Thu, Aug 5, 2010 at 4:02 PM, Pablo Castro  
>> wrote:
>> >
>> > -Original Message-
>> > From: [] 
>> > On Behalf Of Jonas Sicking
>> > Sent: Thursday, August 05, 2010 2:12 PM
>> >
>> >>> >> I suggest we make removeDatabase (or whatever we call it) schedule a
>> >>> >> database to be deleted, but doesn't actually delete it until all
>> >>> >> existing connections to it are closed (through either explicit calls to
>> >>> >> IDBDatabase.close(), or through the tab being closed).
>> >>> >>
>> >>> >> Any calls to with the same name will hold the callback
>> >>> >> until the removeDatabase() operation is finished. I.e. after all
>> >>> >> existing connections are closed and the database is removed.
>> >>> >>
>> >>> >> This is similar to how setVersion works.
>> >>> >
>> >>> > If we're not going to keep it simple, then we should match the 
>> >>> > setVersion
>> >>> > semantics as much as is possible.  I.e. add the blocked event and 
>> >>> > stuff like
>> >>> > that.
>> >>>
>> >>> The "blocked" event fires on the IDBDatabase object. Do we want to
>> >>> require that the database is opened before it can be removed? I don't
>> >>> really feel strongly either way.
>> >>>
>> >>> The other question is if we should fire a "versionchange" event on
>> >>> other open IDBDatabases, like setVersion does. Or should we fire a
>> >>> "holy hell, your database is about to get nuked!" event? The former
>> >>> would keep things simpler since there is just one event to listen to.
>> >>> The latter might be more correct.
>> >>>
>> >>> / Jonas
>> >
>> > I like the idea of just scheduling the database to be deleted once the 
>> > last connection to it closes, and also preventing any new connection from 
>> > being established once the database has been scheduled for deletion. 
>> > This adds as little surface area as possible to the API.
>> >
>> > If we find that that's not a good idea for some reason, I wonder if we 
>> > should unify the "versionchange" event and this into a single "stuff 
>> > seriously changed" event where subscribers need to close their handles and 
>> > let go of any assumptions they had about the database. Once they can 
>> > re-open, they need to re-establish all their context (this is already true 
>> > for a version change, we may as well extend it to database deletes and any 
>> > other future big changes to the database schema, options, etc.)
>> Here's my proposal, please poke holes in it:
>> interface IDBFactory {
>> ...
>> IDBRequest deleteDatabase(in DOMString name);
>> ...
>> };
>> When deleteDatabase is called, the given database is scheduled for
>> deletion. If any IDBDatabase objects are opened to the database fire a
>> "versionchange" event on those IDBDatabase objects, with a .version
>> set to null. If any calls to occur, stall those until
>> after this algorithm is finished. Note that this generally won't mean
>> that those open calls will fail. They'll generally receive a
>> newly created database instead.
>> Once all existing IDBDatabase are closed (implicitly or explicitly),
>> the database is removed. At this point any calls are
>> fulfilled and a "success" event is fired on the returned IDBRequest.
>> So no "blocked" event is fired as I'm not sure where to fire it. I'm
>> also not sure that this is a big problem. I'm not even sure that
>> returning an IDBRequest is worth it. The only value I can see is
>> wanting to display to a user when a database is for sure deleted as to
>> allow the user to for example safely shut down the computer without
>> worrying that sensitive data is still in the database.
>> All of this sounds good to me.  I'd probably still return an IDBRequest 
>> for consistency and so that the app can get a confirmation when it's really 
>> gone.  On success would fire with a "null" result field, I'd think.

This looks good to me too. I agree with still having deleteDatabase return an 
IDBRequest so the caller can tell when the operation is done.
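As a rough model of the deleteDatabase semantics agreed on above (schedule the delete, notify open connections with a "versionchange" whose version is null, stall new opens, finish once the last connection closes), consider the sketch below. SketchFactory is entirely hypothetical and synchronous; it records events on plain objects instead of dispatching real DOM events, and it ignores edge cases such as overlapping delete calls.

```javascript
// Hypothetical in-memory model; not an IndexedDB implementation.
class SketchFactory {
  constructor() {
    // name -> { connections, pendingDelete, deleteRequest, stalledOpens }
    this.dbs = new Map();
  }

  open(name) {
    let db = this.dbs.get(name);
    if (!db) {
      db = { connections: new Set(), pendingDelete: false, deleteRequest: null, stalledOpens: [] };
      this.dbs.set(name, db);
    }
    const conn = { name, opened: false, events: [] };
    if (db.pendingDelete) {
      db.stalledOpens.push(conn); // held until the delete has completed
    } else {
      conn.opened = true;
      db.connections.add(conn);
    }
    return conn;
  }

  close(conn) {
    const db = this.dbs.get(conn.name);
    if (!db) return;
    db.connections.delete(conn);
    conn.opened = false;
    this._maybeFinishDelete(conn.name);
  }

  deleteDatabase(name) {
    const request = { done: false, result: undefined };
    const db = this.dbs.get(name);
    if (!db) {
      request.done = true;
      request.result = null;
      return request;
    }
    db.pendingDelete = true;
    db.deleteRequest = request;
    // Tell every open connection that the database is going away.
    for (const conn of db.connections) {
      conn.events.push({ type: "versionchange", version: null });
    }
    this._maybeFinishDelete(name);
    return request;
  }

  _maybeFinishDelete(name) {
    const db = this.dbs.get(name);
    if (!db || !db.pendingDelete || db.connections.size > 0) return;
    this.dbs.delete(name);
    db.deleteRequest.done = true; // "success" with a null result
    db.deleteRequest.result = null;
    if (db.stalledOpens.length > 0) {
      // Stalled opens do not fail; they receive a newly created database.
      const fresh = { connections: new Set(), pendingDelete: false, deleteRequest: null, stalledOpens: [] };
      this.dbs.set(name, fresh);
      for (const conn of db.stalledOpens) {
        conn.opened = true;
        fresh.connections.add(conn);
      }
    }
  }
}
```

The request object lets the caller tell when the data is really gone, which is the reason given above for returning an IDBRequest.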


RE: [IndexedDB] Need a method to remove a database

2010-08-05 Thread Pablo Castro

-Original Message-
From: [] On 
Behalf Of Jonas Sicking
Sent: Thursday, August 05, 2010 2:12 PM

>> >> I suggest we make removeDatabase (or whatever we call it) schedule a
>> >> database to be deleted, but doesn't actually delete it until all
>> >> existing connections to it are closed (through either explicit calls to
>> >> IDBDatabase.close(), or through the tab being closed).
>> >>
>> >> Any calls to with the same name will hold the callback
>> >> until the removeDatabase() operation is finished. I.e. after all
>> >> existing connections are closed and the database is removed.
>> >>
>> >> This is similar to how setVersion works.
>> >
>> > If we're not going to keep it simple, then we should match the setVersion
>> > semantics as much as is possible.  I.e. add the blocked event and stuff 
>> > like
>> > that.
>> The "blocked" event fires on the IDBDatabase object. Do we want to
>> require that the database is opened before it can be removed? I don't
>> really feel strongly either way.
>> The other question is if we should fire a "versionchange" event on
>> other open IDBDatabases, like setVersion does. Or should we fire a
>> "holy hell, your database is about to get nuked!" event? The former
>> would keep things simpler since there is just one event to listen to.
>> The latter might be more correct.
>> / Jonas

I like the idea of just scheduling the database to be deleted once the last 
connection to it closes, and also preventing any new connection from being 
established once the database has been scheduled for deletion. This adds as 
little surface area as possible to the API.

If we find that that's not a good idea for some reason, I wonder if we should 
unify the "versionchange" event and this into a single "stuff seriously 
changed" event where subscribers need to close their handles and let go of any 
assumptions they had about the database. Once they can re-open, they need to 
re-establish all their context (this is already true for a version change, we 
may as well extend it to database deletes and any other future big changes to 
the database schema, options, etc.)


RE: [IndexedDB] Need a method to clear an object store

2010-08-04 Thread Pablo Castro

From: [] On 
Behalf Of Jonas Sicking
Sent: Tuesday, August 03, 2010 12:21 PM

>> On Tue, Aug 3, 2010 at 12:09 PM, ben turner  wrote:
>> > Hi folks,
>> >
>> > Currently there are only two ways to clear an object store of all
>> > data: (i) remove the object store and recreate it, or (ii) open a
>> > cursor and call remove for all entries. I propose a third, simpler
>> > approach:
>> >
>> > interface IDBObjectStore
>> > {
>> >  ...
>> >  void clear();
>> >  ...
>> > };
>> >
>> > Any thoughts?
>> Some background. At least in our implementation, removing each
>> individual item is significantly slower than removing and recreating
>> the objectStore. It's also significantly slower than a 'clear'
>> function is. And while tearing down and recreating the objectStore
>> works, it's fairly complex if there are multiple indexes on the store.
>> Adding a clear() function, while redundant, should make things easier
>> for developers while adding very little work in the implementation.
>> I think there is a bug in the above proposal though. clear() should
>> return an IDBRequest. However the .result of the request should likely
>> be null.
>> / Jonas

+1 on having clear(). We ran into the need also while playing with samples and 


RE: [IndexedDB] Need a method to remove a database

2010-08-04 Thread Pablo Castro

From: [] On 
Behalf Of Jeremy Orlow
Sent: Wednesday, August 04, 2010 2:56 AM

>> On Tue, Aug 3, 2010 at 11:26 PM, Jonas Sicking  wrote:
>> On Tue, Aug 3, 2010 at 3:20 PM, Shawn Wilsher  wrote:
>> > Hey all,
>> >
>> > Some of the feedback I've been seeing on the web is that there is no way to
>> > remove a database.  Examples seem to be "web page wants to allow the user 
>> > to
>> > remove the data they stored".  A site can almost accomplish this now by
>> > removing all object stores, but we still end up storing some meta data
>> > (version number).  Does this seem like a legit request to everyone?
>> Sounds legit to me. Feel somewhat embarrassed that I've missed this so far :)
>> Agreed.
>> What should the semantics be for open database connections?  We could do 
>> something like setVersion, but I'd just as soon nuke any existing connection 
>> (i.e. make all future operations fail).  This seems reasonable since the 
>> reasons we didn't do this for setVersion (data loss) don't really seem to 
>> apply here.
>> J


Nuking is fine...another option would be to queue up the delete until all 
database sessions are gone, but probably will complicate things and not add 
much. The only thing I wonder is if we'll create a bunch of pain for 
implementations where nuking is tricky (thinking of multi-process scenarios 
where maybe files are locked or something).


RE: [IndexedDB] Current editor's draft

2010-07-22 Thread Pablo Castro

From: Jonas Sicking [] 
Sent: Thursday, July 22, 2010 5:30 PM

>> On Thu, Jul 22, 2010 at 5:26 PM, Pablo Castro
>>  wrote:
>> >
>> > From: Jonas Sicking []
>> > Sent: Thursday, July 22, 2010 5:18 PM
>> >
>> >>> > The author doesn't explicitly specify which rows to lock. All rows 
>> >>> > that you "see" become locked (e.g. through get(), put(), scanning with 
>> >>> > a cursor, etc.). If you start the transaction as read-only then 
>> >>> > they'll all have shared locks. If you start the transaction as 
>> >>> > read-write then we can choose whether the implementation should always 
>> >>> > attempt to take exclusive locks or if it should take shared locks on 
>> >>> > read, and attempt to upgrade to an exclusive lock on first write (this 
>> >>> > affects failure modes a bit).

>> >
>> >>> What counts as "see"? If you iterate using an index-cursor all the
>> >>> rows that have some value between "A" and "B", but another, not yet
>> >>> committed, transaction changes a row such that its value now is
>> >>> between "A" and "B", what happens?
>> >
>> > We need to design something a bit more formal that covers the whole 
>> > spectrum. As a short answer, assuming we want to have "serializable" as 
>> > our isolation level, then we'd have a range lock that goes from the start 
>> > of a cursor to the point you've reached, so if you were to start another 
>> > cursor you'd be guaranteed the exact same view of the world. In that case 
>> > it wouldn't be possible for another transaction to insert a row between two 
>> > rows you scanned through with a cursor.
>> How would you prevent that? Would a call to .modify() or .put() block
>> until the other transaction finishes? With appropriate timeouts on
>> deadlocks of course.

That's right, calls would block if they need to acquire a lock for a key or a 
range and there is an incompatible lock present that overlaps somehow with that.
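
To make the blocking rule concrete, here is a minimal sketch of the check an implementation might run before granting a lock. The lock table shape and the names (`compatible`, `overlaps`, `wouldBlock`) are assumptions made up for illustration; nothing like this is in the spec draft.

```javascript
// Hypothetical lock table for illustration; none of these names come from the spec.
const SHARED = "shared";
const EXCLUSIVE = "exclusive";

// Two lock modes are compatible only when both are shared.
function compatible(modeA, modeB) {
  return modeA === SHARED && modeB === SHARED;
}

// Closed key ranges [lo, hi]; a single key k is the range [k, k].
function overlaps(a, b) {
  return a.lo <= b.hi && b.lo <= a.hi;
}

// True if `request` would have to wait behind a lock already in `held`.
function wouldBlock(held, request) {
  return held.some(
    (lock) =>
      lock.owner !== request.owner &&
      overlaps(lock, request) &&
      !compatible(lock.mode, request.mode)
  );
}

// tx1 scanned an index from "A" to "B" and holds a shared range lock on it.
const held = [{ owner: "tx1", mode: SHARED, lo: "A", hi: "B" }];

// tx2 reading inside the range is fine: shared is compatible with shared.
console.log(wouldBlock(held, { owner: "tx2", mode: SHARED, lo: "Alice", hi: "Alice" })); // false
// tx2 writing a key inside the scanned range has to wait.
console.log(wouldBlock(held, { owner: "tx2", mode: EXCLUSIVE, lo: "Alice", hi: "Alice" })); // true
// tx2 writing outside the range is unaffected.
console.log(wouldBlock(held, { owner: "tx2", mode: EXCLUSIVE, lo: "C", hi: "C" })); // false
```

In a real engine the "true" case would suspend the request until the conflicting lock is released or a timeout fires.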


RE: [IndexedDB] Current editor's draft

2010-07-22 Thread Pablo Castro

From: Jonas Sicking [] 
Sent: Thursday, July 22, 2010 5:25 PM

>> >> Regarding deadlocks, that's right, the implementation cannot determine if
>> >> a deadlock will occur ahead of time. Sophisticated implementations could
>> >> track locks/owners and do deadlock detection, although a simple
>> >> timeout-based mechanism is probably enough for IndexedDB.
>> >
>> > Simple implementations will not deadlock because they're only doing object
>> > store level locking in a constant locking order.

Well, it's not really simple vs sophisticated, but whether they do dynamically 
scoped transactions or not, isn't it? If you do dynamic transactions, then 
regardless of the granularity of your locks, code will grow the lock space in a 
way that you cannot predict so you can't use a well-known locking order, so 
deadlocks are not avoidable. 
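
A small sketch of why the scoping model matters more than lock granularity. It assumes a made-up convention where static transactions lock their stores in alphabetical order; `acquisitionOrder` and `hasOrderInversion` are hypothetical helpers, not anything from the draft.

```javascript
// Static scope: all stores are known up front, so every transaction can
// acquire its locks in one agreed order (alphabetical here, by assumption).
function acquisitionOrder(scope) {
  return [...scope].sort();
}

// True if A takes some lock x before y while B takes y before x.
// That inversion is the precondition for a two-party deadlock.
function hasOrderInversion(orderA, orderB) {
  const posB = new Map(orderB.map((name, i) => [name, i]));
  const common = orderA.filter((name) => posB.has(name));
  for (let i = 0; i < common.length; i++) {
    for (let j = i + 1; j < common.length; j++) {
      if (posB.get(common[i]) > posB.get(common[j])) return true;
    }
  }
  return false;
}

// Static: both transactions end up locking customers first, then orders.
const staticA = acquisitionOrder(["orders", "customers"]);
const staticB = acquisitionOrder(["customers", "orders"]);
console.log(hasOrderInversion(staticA, staticB)); // false

// Dynamic: locks are taken in whatever order the code touches things.
const dynamicA = ["orders", "customers"];
const dynamicB = ["customers", "orders"];
console.log(hasOrderInversion(dynamicA, dynamicB)); // true
```

The same inversion shows up whether the locked things are stores or individual keys, which is the point being made above.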

>> >  Sophisticated implementations will be doing key level (IndexedDB's analog
>> > to row level) locking with deadlock detection or using methods to 
>> > completely
>> > avoid it.  I'm not sure I'm comfortable with having one or two in-between
>> > implementations relying on timeouts to resolve deadlocks.

Deadlock detection is quite a bit to ask from the storage engine. From the 
developer's perspective, the difference between deadlock detection and timeouts 
for deadlocks is the fact that the timeout approach will take a bit longer, and 
the error won't be as definitive. I don't think this particular difference is 
enough to require deadlock detection.
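
For a sense of what "deadlock detection" would ask of an engine, here is a sketch of the usual approach: keep a wait-for graph and look for a cycle. The graph representation and the name `hasDeadlock` are assumptions for illustration only.

```javascript
// waitsFor maps a transaction id to the ids of the transactions it is blocked behind.
// Depth-first search; reaching a node that is still on the current path means a cycle.
function hasDeadlock(waitsFor) {
  const onPath = new Set();
  const finished = new Set();

  function visit(tx) {
    if (onPath.has(tx)) return true;
    if (finished.has(tx)) return false;
    onPath.add(tx);
    for (const next of waitsFor.get(tx) || []) {
      if (visit(next)) return true;
    }
    onPath.delete(tx);
    finished.add(tx);
    return false;
  }

  for (const tx of waitsFor.keys()) {
    if (visit(tx)) return true;
  }
  return false;
}

// tx1 waits for tx2 and tx2 waits for tx1: a deadlock.
console.log(hasDeadlock(new Map([["tx1", ["tx2"]], ["tx2", ["tx1"]]]))); // true
// tx1 waits for tx2, which is running freely: just ordinary blocking.
console.log(hasDeadlock(new Map([["tx1", ["tx2"]], ["tx2", []]]))); // false
```

The search itself is small; the cost is in maintaining the graph on every lock wait and release, which a timeout-only engine never has to do.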

>> > Of course, if we're breaking deadlocks that means that web developers need
>> > to handle this error case on every async request they make.  As such, I'd
>> > rather that we require implementations to make deadlocks impossible.  This
>> > means that they either need to be conservative about locking or to do MVCC
>> > (or something similar) so that transactions can continue on even beyond the
>> > point where we know they can't be serialized.  This would 
>> > be consistent with
>> > our usual policy of trying to put as much of the burden as is practical on
>> > the browser developers rather than web developers.

Same as above...MVCC is quite a bit to mandate from all implementations. For 
example, I'm not sure but from my basic understanding of SQLite I think it 
always does straight up locking and doesn't have support for versioning.

>> >>
>> >> As for locking only existing rows, that depends on how much isolation we
>> >> want to provide. If we want "serializable", then we'd have to put in 
>> >> things
>> >> such as range locks and locks on non-existing keys so reads are consistent
>> >> w.r.t. newly created rows.
>> >
>> > For the record, I am completely against anything other than "serializable"
>> > being the default.  Everything a web developer deals with follows run to
>> > completion.  If you want to have optional modes that relax things in terms
>> > of serializability, maybe we should start a new thread?
>> Agreed.
>> I was against dynamic transactions even when they used
>> whole-objectStore locking. So I'm even more so now that people are
>> proposing row-level locking. But I'd like to understand what people
>> are proposing, and make sure that what is being proposed is a coherent
>> solution, so that we can correctly evaluate its risks versus
>> benefits.

The way I see the risk/benefit tradeoff of dynamic transactions: they bring 
better concurrency and more flexibility at the cost of new failure modes. I 
think that weighing them in those terms is more important than the specifics 
such as whether it's okay to have timeouts versus explicit deadlock errors. 


RE: [IndexedDB] Current editor's draft

2010-07-22 Thread Pablo Castro

From: Jonas Sicking [] 
Sent: Thursday, July 22, 2010 5:18 PM

>> > The author doesn't explicitly specify which rows to lock. All rows that 
>> > you "see" become locked (e.g. through get(), put(), scanning with a 
>> > cursor, etc.). If you start the transaction as read-only then they'll all 
>> > have shared locks. If you start the transaction as read-write then we can 
>> > choose whether the implementation should always attempt to take exclusive 
>> > locks or if it should take shared locks on read, and attempt to upgrade to 
>> > an exclusive lock on first write (this affects failure modes a bit).

>> What counts as "see"? If you iterate using an index-cursor all the
>> rows that have some value between "A" and "B", but another, not yet
>> committed, transaction changes a row such that its value now is
>> between "A" and "B", what happens?

We need to design something a bit more formal that covers the whole spectrum. 
As a short answer, assuming we want to have "serializable" as our isolation 
level, then we'd have a range lock that goes from the start of a cursor to the 
point you've reached, so if you were to start another cursor you'd be 
guaranteed the exact same view of the world. In that case it wouldn't be 
possible for another transaction to insert a row between two rows you scanned 
through with a cursor.


RE: [IndexedDB] Current editor's draft

2010-07-22 Thread Pablo Castro

From: Jonas Sicking [] 
Sent: Thursday, July 22, 2010 11:27 AM

>> On Thu, Jul 22, 2010 at 3:43 AM, Nikunj Mehta  wrote:
>> >
>> > On Jul 16, 2010, at 5:41 AM, Pablo Castro wrote:
>> >
>> >>
>> >> From: [] On Behalf Of Jeremy 
>> >> Orlow
>> >> Sent: Thursday, July 15, 2010 8:41 AM
>> >>
>> >> On Thu, Jul 15, 2010 at 4:30 PM, Andrei Popescu  
>> >> wrote:
>> >> On Thu, Jul 15, 2010 at 3:24 PM, Jeremy Orlow  wrote:
>> >>> On Thu, Jul 15, 2010 at 3:09 PM, Andrei Popescu  
>> >>> wrote:
>> >>>>
>> >>>> On Thu, Jul 15, 2010 at 9:50 AM, Jeremy Orlow  
>> >>>> wrote:
>> >>>>>>>> Nikunj, could you clarify how locking works for the dynamic
>> >>>>>>>> transactions proposal that is in the spec draft right now?
>> >>>>>>>
>> >>>>>>> I'd definitely like to hear what Nikunj originally intended here.
>> >>>>>>>>
>> >>>>>>
>> >>>>>> Hmm, after re-reading the current spec, my understanding is that:
>> >>>>>>
>> >>>>>> - Scope consists in a set of object stores that the transaction 
>> >>>>>> operates
>> >>>>>> on.
>> >>>>>> - A connection may have zero or one active transactions.
>> >>>>>> - There may not be any overlap among the scopes of all active
>> >>>>>> transactions (static or dynamic) in a given database. So you cannot
>> >>>>>> have two READ_ONLY static transactions operating simultaneously over
>> >>>>>> the same object store.
>> >>>>>> - The granularity of locking for dynamic transactions is not specified
>> >>>>>> (all the spec says about this is "do not acquire locks on any database
>> >>>>>> objects now. Locks are obtained as the application attempts to access
>> >>>>>> those objects").
>> >>>>>> - Using dynamic transactions can lead to deadlocks.
>> >>>>>>
>> >>>>>> Given the changes in 9975, here's what I think the spec should say for
>> >>>>>> now:
>> >>>>>>
>> >>>>>> - There can be multiple active static transactions, as long as their
>> >>>>>> scopes do not overlap, or the overlapping objects are locked in modes
>> >>>>>> that are not mutually exclusive.
>> >>>>>> - [If we decide to keep dynamic transactions] There can be multiple
>> >>>>>> active dynamic transactions. TODO: Decide what to do if they start
>> >>>>>> overlapping:
>> >>>>>>   -- proceed anyway and then fail at commit time in case of
>> >>>>>> conflicts. However, I think this would require implementing MVCC, so
>> >>>>>> implementations that use SQLite would be in trouble?
>> >>>>>
>> >>>>> Such implementations could just lock more conservatively (i.e. not 
>> >>>>> allow
>> >>>>> other transactions during a dynamic transaction).
>> >>>>>
>> >>>> Umm, I am not sure how useful dynamic transactions would be in that
>> >>>> case...Ben Turner made the same comment earlier in the thread and I
>> >>>> agree with him.
>> >>>>
>> >>>> Yes, dynamic transactions would not be useful on those implementations, 
>> >>>> but the point is that you could still implement the spec without a MVCC 
>> >>>> backend--though it would limit the concurrency that's possible.  
>> >>>> Thus "implementations that use SQLite would" NOT necessarily "be in 
>> >>>> trouble".
>> >>
>> >> Interesting, I'm glad this conversation came up so we can sync up on 
>> >> assumptions...mine were:
>> >> - There can be multiple transactions of any kind active against a given 
>> >> database session (see note below)
>> >> - Multiple static transactions may overlap as long as they have 
>> >> compatible modes, which in practice means they are all READ_ONLY
>> >> - D

RE: [IndexedDB] Cursors and modifications

2010-07-15 Thread Pablo Castro

From: Jonas Sicking [] 
Sent: Thursday, July 15, 2010 11:59 AM

On Thu, Jul 15, 2010 at 11:02 AM, Pablo Castro
>> >
>> > From: [] On Behalf Of Jeremy 
>> > Orlow
>> > Sent: Thursday, July 15, 2010 2:04 AM
>> >
>> > On Thu, Jul 15, 2010 at 2:44 AM, Jonas Sicking  wrote:
>> > On Wed, Jul 14, 2010 at 6:20 PM, Pablo Castro  
>> > wrote:
>> >
>> >>> > If it's accurate, as a side note, for the async API it seems that this 
>> >>> > makes it more interesting to enforce callback order, so we can more 
>> >>> > easily explain what we mean by "before".
>> >>> Indeed.
>> >>>
>> >>> What do you mean by enforce callback order?  Are you saying that 
>> >>> callbacks should be done in the order the requests are made (rather than 
>> >>> prioritizing cursor callbacks)?  (That's how I read it, but Jonas' 
>> >>> "Indeed" makes me suspect I missed something. :-)
>> >
>> > That's right. If changes are visible as they are made within a 
>> > transaction, then reordering the callbacks would have a visible effect. In 
>> > particular if we prioritize the cursor callbacks then you'll tend to see a 
>> > callback for a cursor move before you see a callback for say an 
>> > add/modify, and it's not clear at that point whether the add/modify 
>> > happened already and is visible (but the callback didn't land yet) or if 
>> > the change hasn't happened yet. If callbacks are in order, you see changes 
>> > within your transaction strictly in the order that each request is made, 
>> > avoiding surprises in cursor callbacks.

>> Oh, I took what you said just as that we need to have a defined
>> callback order. Not anything in particular what that definition should
>> be.
>> Regarding when a modification happens, I think the design should be
>> that changes logically happen as soon as the 'success' call is fired.
>> Any success calls after that will see the modified values.

Yep, I agree with this, a change happened "for sure" when you see the success 
callback. Before that you may or may not observe the change if you do a get or 
open a cursor to look at the record.
>> I still think given the quite substantial speedups gained from
>> prioritizing cursor callbacks, that it's the right thing to do. It
>> arguably also has some benefits from a practical point of view when it
>> comes to the very topic we're discussing. If we prioritize cursor
>> callbacks, that makes it much easier to iterate a set of entries and
>> update them, without having to worry about those updates messing up
>> your iterator.

I hear you on the perf implications, but I'm worried that non-sequential order 
for callbacks will be completely non-intuitive for users. In particular, if 
you're changing things as you scan a cursor and then cursor through the 
changes, you're not sure whether you'll see them (because the callback is 
the only "definitive" point where the change is visible). That seems quite 


RE: [IndexedDB] Current editor's draft

2010-07-15 Thread Pablo Castro

From: [] On Behalf Of Jeremy Orlow
Sent: Thursday, July 15, 2010 8:41 AM

On Thu, Jul 15, 2010 at 4:30 PM, Andrei Popescu  wrote:
On Thu, Jul 15, 2010 at 3:24 PM, Jeremy Orlow  wrote:
> On Thu, Jul 15, 2010 at 3:09 PM, Andrei Popescu  wrote:
>> On Thu, Jul 15, 2010 at 9:50 AM, Jeremy Orlow  wrote:
>> >> >> Nikunj, could you clarify how locking works for the dynamic
>> >> >> transactions proposal that is in the spec draft right now?
>> >> >
>> >> > I'd definitely like to hear what Nikunj originally intended here.
>> >> >>
>> >>
>> >> Hmm, after re-reading the current spec, my understanding is that:
>> >>
>> >> - Scope consists in a set of object stores that the transaction operates
>> >> on.
>> >> - A connection may have zero or one active transactions.
>> >> - There may not be any overlap among the scopes of all active
>> >> transactions (static or dynamic) in a given database. So you cannot
>> >> have two READ_ONLY static transactions operating simultaneously over
>> >> the same object store.
>> >> - The granularity of locking for dynamic transactions is not specified
>> >> (all the spec says about this is "do not acquire locks on any database
>> >> objects now. Locks are obtained as the application attempts to access
>> >> those objects").
>> >> - Using dynamic transactions can lead to deadlocks.
>> >>
>> >> Given the changes in 9975, here's what I think the spec should say for
>> >> now:
>> >>
>> >> - There can be multiple active static transactions, as long as their
>> >> scopes do not overlap, or the overlapping objects are locked in modes
>> >> that are not mutually exclusive.
>> >> - [If we decide to keep dynamic transactions] There can be multiple
>> >> active dynamic transactions. TODO: Decide what to do if they start
>> >> overlapping:
>> >>   -- proceed anyway and then fail at commit time in case of
>> >> conflicts. However, I think this would require implementing MVCC, so
>> >> implementations that use SQLite would be in trouble?
>> >
>> > Such implementations could just lock more conservatively (i.e. not allow
>> > other transactions during a dynamic transaction).
>> >
>> Umm, I am not sure how useful dynamic transactions would be in that
>> case...Ben Turner made the same comment earlier in the thread and I
>> agree with him.
>> Yes, dynamic transactions would not be useful on those implementations, but 
>> the point is that you could still implement the spec without a MVCC 
>> backend--though it would limit the concurrency that's possible.  Thus 
>> "implementations that use SQLite would" NOT necessarily "be in trouble".

Interesting, I'm glad this conversation came up so we can sync up on 
assumptions...mine were:
- There can be multiple transactions of any kind active against a given 
database session (see note below)
- Multiple static transactions may overlap as long as they have compatible 
modes, which in practice means they are all READ_ONLY
- Dynamic transactions have arbitrary granularity for scope (implementation 
specific, down to row-level locking/scope)
- Overlapping between statically and dynamically scoped transactions follows 
the same rules as static-static overlaps; they can only overlap on compatible 
scopes. The only difference is that dynamic transactions may need to block 
mid-flight until they can grab the resources they need to proceed.

Note: for some databases having multiple transactions active on a single 
connection may be an unsupported thing. This could probably be handled in the 
IndexedDB layer though by using multiple connections under the covers.
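
The static-static rule in the list above can be written down as a tiny predicate. This is only a sketch of my assumptions; the transaction shape (`scope`, `mode`) and the name `canRunConcurrently` are made up for illustration.

```javascript
const READ_ONLY = "READ_ONLY";
const READ_WRITE = "READ_WRITE";

// Two static transactions may be active together if their scopes are
// disjoint, or if every overlap is held in compatible modes. In practice
// "compatible" means both transactions are READ_ONLY.
function canRunConcurrently(txA, txB) {
  const overlap = txA.scope.filter((store) => txB.scope.includes(store));
  if (overlap.length === 0) return true;
  return txA.mode === READ_ONLY && txB.mode === READ_ONLY;
}

const reader1 = { scope: ["words"], mode: READ_ONLY };
const reader2 = { scope: ["words", "stats"], mode: READ_ONLY };
const writer = { scope: ["words"], mode: READ_WRITE };
const other = { scope: ["settings"], mode: READ_WRITE };

console.log(canRunConcurrently(reader1, reader2)); // true: overlapping but both read-only
console.log(canRunConcurrently(reader1, writer)); // false: overlap with a writer
console.log(canRunConcurrently(writer, other)); // true: disjoint scopes
```

A dynamic transaction would apply the same test, just lazily: each time it touches something new it either passes the check or blocks until it can.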


RE: [IndexedDB] Cursors and modifications

2010-07-15 Thread Pablo Castro

From: [] On Behalf Of Jeremy Orlow
Sent: Thursday, July 15, 2010 2:04 AM

On Thu, Jul 15, 2010 at 2:44 AM, Jonas Sicking  wrote:
On Wed, Jul 14, 2010 at 6:20 PM, Pablo Castro  

>> > If it's accurate, as a side note, for the async API it seems that this 
>> > makes it more interesting to enforce callback order, so we can more easily 
>> > explain what we mean by "before".
>> Indeed.
>> What do you mean by enforce callback order?  Are you saying that callbacks 
>> should be done in the order the requests are made (rather than prioritizing 
>> cursor callbacks)?  (That's how I read it, but Jonas' "Indeed" makes me 
>> suspect I missed something. :-)

That's right. If changes are visible as they are made within a transaction, 
then reordering the callbacks would have a visible effect. In particular if we 
prioritize the cursor callbacks then you'll tend to see a callback for a cursor 
move before you see a callback for say an add/modify, and it's not clear at 
that point whether the add/modify happened already and is visible (but the 
callback didn't land yet) or if the change hasn't happened yet. If callbacks 
are in order, you see changes within your transaction strictly in the order 
that each request is made, avoiding surprises in cursor callbacks. 
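
Here is a toy model of the difference, reusing the four-record "words" example from Jonas' original mail. The request queue is entirely hypothetical (it is not how any implementation is structured, as far as I know); `simulate` and `prioritizeCursor` are names invented for this sketch.

```javascript
// Runs the scenario: a cursor over four records, plus a get("delta") whose
// success callback issues a put that adds myModifiedValue to "delta".
// Returns what the cursor saw for each record.
function simulate(prioritizeCursor) {
  const store = new Map([
    ["alpha", { name: "alpha" }],
    ["bravo", { name: "bravo" }],
    ["charlie", { name: "charlie" }],
    ["delta", { name: "delta" }],
  ]);
  const keys = [...store.keys()];
  const queue = [];
  const seen = [];

  function enqueue(request) {
    // Prioritizing cursor callbacks is modeled as jumping the queue.
    if (prioritizeCursor && request.isCursor) queue.unshift(request);
    else queue.push(request);
  }

  function cursorStep(index) {
    return {
      isCursor: true,
      execute() {
        if (index >= keys.length) return; // cursor is exhausted
        seen.push({ ...store.get(keys[index]) }); // snapshot of what the cursor sees
        enqueue(cursorStep(index + 1)); // the callback calls continue()
      },
    };
  }

  enqueue(cursorStep(0));
  enqueue({
    isCursor: false,
    execute() {
      // get("delta") succeeded; its callback issues the put.
      enqueue({
        isCursor: false,
        execute() {
          store.set("delta", { name: "delta", myModifiedValue: 17 });
        },
      });
    },
  });

  while (queue.length > 0) queue.shift().execute();
  return seen;
}

console.log(simulate(false)[3]); // request order: { name: 'delta', myModifiedValue: 17 }
console.log(simulate(true)[3]); // cursor priority: { name: 'delta' }
```

In this model strict request order gives one predictable answer, while with cursor priority the result depends on how far the cursor gets before the other callbacks land.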


RE: [IndexedDB] Cursors and modifications

2010-07-14 Thread Pablo Castro
Making sure I get the essence of this thread: we're saying that cursors see 
live changes as they happen on objects that are "after" the object you're 
currently standing on; and of course, any other activity within a transaction 
sees all the changes that happened before that activity took place. Is that 
accurate?

If it's accurate, as a side note, for the async API it seems that this makes it 
more interesting to enforce callback order, so we can more easily explain what 
we mean by "before".


From: [] On Behalf Of Jeremy Orlow
Sent: Wednesday, July 14, 2010 9:27 AM

On Wed, Jul 14, 2010 at 5:17 PM, Jonas Sicking  wrote:
On Wed, Jul 14, 2010 at 5:12 AM, Jeremy Orlow  wrote:
> On Thu, Jul 8, 2010 at 8:42 PM, Jonas Sicking  wrote:
>> On Mon, Jul 5, 2010 at 9:45 AM, Andrei Popescu  wrote:
>> > On Sat, Jul 3, 2010 at 2:09 AM, Jonas Sicking  wrote:
>> >> On Fri, Jul 2, 2010 at 5:44 PM, Andrei Popescu 
>> >> wrote:
>> >>> On Sat, Jul 3, 2010 at 1:14 AM, Jonas Sicking 
>> >>> wrote:
>> >>>> On Fri, Jul 2, 2010 at 4:40 PM, Pablo Castro
>> >>>>  wrote:
>> >>>>>
>> >>>>> From:
>> >>>>> [] On Behalf Of Jonas Sicking
>> >>>>> Sent: Friday, July 02, 2010 4:00 PM
>> >>>>>
>> >>>>>>> We ran into a complicated issue while implementing IndexedDB. In
>> >>>>>>> short, what should happen if an object store is modified while a
>> >>>>>>> cursor is iterating it? Note that the modification can be done
>> >>>>>>> within the same transaction, so the read/write locks preventing
>> >>>>>>> several transactions from accessing the same table aren't helping
>> >>>>>>> here.
>> >>>>>>>
>> >>>>>>> Detailed problem description (this assumes the API proposed by
>> >>>>>>> mozilla):
>> >>>>>>>
>> >>>>>>> Consider an objectStore "words" containing the following objects:
>> >>>>>>> { name: "alpha" }
>> >>>>>>> { name: "bravo" }
>> >>>>>>> { name: "charlie" }
>> >>>>>>> { name: "delta" }
>> >>>>>>>
>> >>>>>>> and the following program (db is a previously opened IDBDatabase):
>> >>>>>>>
>> >>>>>>> var trans = db.transaction(["words"], READ_WRITE);
>> >>>>>>> var cursor;
>> >>>>>>> var result = [];
>> >>>>>>> trans.objectStore("words").openCursor().onsuccess = function(e) {
>> >>>>>>>   cursor = e.result;
>> >>>>>>>   result.push(cursor.value);
>> >>>>>>>   cursor.continue();
>> >>>>>>> }
>> >>>>>>> trans.objectStore("words").get("delta").onsuccess = function(e) {
>> >>>>>>>   trans.objectStore("words").put({ name: "delta", myModifiedValue: 17 });
>> >>>>>>> }
>> >>>>>>>
>> >>>>>>> When the cursor reads the "delta" entry, will it see the
>> >>>>>>> 'myModifiedValue' property? Since we have so far defined the
>> >>>>>>> callback order to be the request order, the put request will be
>> >>>>>>> finished before the "delta" entry is iterated by the cursor.
>> >>>>>>>
>> >>>>>>> The problem is even more serious with cursors that iterate
>> >>>>>>> indexes. Here a modification can even affect the position of the
>> >>>>>>> currently iterated object in the index, and the modification can
>> >>>>>>> (if I'm reading the spec correctly) come from the cursor itself.
>> >>>>>>>

RE: [IndexedDB] Current editor's draft

2010-07-14 Thread Pablo Castro

From: Jonas Sicking [] 
Sent: Wednesday, July 14, 2010 5:43 PM

On Wed, Jul 14, 2010 at 5:03 PM, Pablo Castro
> From: Jonas Sicking []
> Sent: Wednesday, July 14, 2010 12:07 AM

>> I think what I'm struggling with is how dynamic transactions will help
>> since they are still doing whole-objectStore locking. I'm also curious
>> how you envision people dealing with deadlock hazards. Nikunj's
>> examples in the beginning of this thread simply throw up their hands
>> and report an error if there was a deadlock. That is obviously not
>> good enough for an actual application.
>> So in short, looking forward to an example :)

I'll try to come up with one, although I doubt the code itself will be very 
interesting in this particular case. Not sure what you mean by "they are still 
doing whole-objectStore locking". The point of dynamic transactions is that 
they *don't* lock the whole store, but instead have the freedom to choose the 
granularity (e.g. you could do row-level locking). 

As for deadlocks, whenever you're doing an operation you need to be ready to 
handle errors (out of disk, timeout, etc.). I'm not sure why deadlocks are 
different. If the underlying implementation has deadlock detection then you may 
get a specific error, otherwise you'll just get a timeout. 

>> >>> This will likely be extra bad for transactions where no write
>> >>> operations are done. In this case failure to call a 'commit()'
>> >>> function won't result in any broken behavior. The transaction will
>> >>> just sit open for a long time and eventually "rolled back", though
>> >>> since no changes were done, the rollback is transparent, and the only
>> >>> noticeable effect is that the application halts for a while while the
>> >>> transaction is waiting to time out.
>> >>>
>> >>> I should add that the WebSQLDatabase uses automatically committing
>> >>> transactions very similar to what we're proposing, and it seems to
>> >>> have worked fine there.
>> >
>> > I find this a bit scary, although it could be that I'm permanently tainted 
>> > with traditional database stuff. Typical databases follow a presumed abort 
>> > protocol, where if your code is interrupted by an exception, a process 
>> > crash or whatever, you can always assume transactions will be rolled back 
>> > if you didn't reach an explicit call to commit. The implicit commit here 
>> > takes that away, and I'm not sure how safe that is.
>> >
>> > For example, if I don't have proper exception handling in place, an 
>> > illegal call to some other non-indexeddb related API may throw an 
>> > exception causing the whole thing to unwind, at which point nothing will 
>> > be pending to do in the database and thus the currently active transaction 
>> > will be committed.
>> >
>> > Using the same line of thought we used for READ_ONLY, forgetting to call 
>> > commit() is easy to detect the first time you try out your code. Your 
>> > changes will simply not stick. It's not as clear as the READ_ONLY example 
>> > because there is no opportunity to throw an explicit exception with an 
>> > explanation, but the data not being around will certainly prompt 
>> > developers to look for the issue :)

>> Ah, I see where we are differing in thinking. My main concern has been
>> that of rollbacks, and associated dataloss, in the non-error case. For
>> example people forget to call commit() in some branch of their code,
>> thus causing dataloss when the transaction is rolled back.
>> Your concern seems to be that of lack of rollback in the error case,
>> for example when an exception is thrown and not caught somewhere in
>> the code. In this case you'd want to have the transaction rolled back.
>> One way to handle this is to try to detect unhandled errors and
>> implicitly roll back the transaction. Two situations where we could do
>> this are:
>> 1. When an 'error' event is fired, but .preventDefault() is not
>> called by any handler. The result is that if an error is ever
>> fired, but no one explicitly handles it, we roll back the transaction.
>> See also below.
>> 2. When a success handler is called, but the handler throws an exception.
>> The second is a bit of a problem from a spec point of view. I'm not
>> sure it is allowed by the DOM Events spec, or by all existi

RE: [IndexedDB] IDBRequest.abort on writing requests

2010-07-14 Thread Pablo Castro
From my perspective cancelling is not something that happens that often, and 
when it happens it's probably ok to cancel the whole transaction. If we can 
spec abort() in the transaction object such that it tries to cancel all pending 
operations and then rolls back any work that has been done so far, then we 
probably don't need abort on individual operations (with the added value that 
it's uniform across read and write operations).
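
A rough sketch of the semantics I have in mind, using an in-memory Map as a stand-in for an object store. `SketchTransaction`, `step()` and the undo log are all invented for illustration; this is not the proposed API surface, only the behavior of a transaction-level abort().

```javascript
// Hypothetical transaction: writes are queued, executed one at a time via
// step(), and each executed write records how to undo itself.
class SketchTransaction {
  constructor(store) {
    this.store = store; // Map standing in for an object store
    this.pending = []; // operations queued but not yet executed
    this.undoLog = []; // functions that reverse executed operations
    this.aborted = false;
  }

  put(key, value) {
    if (this.aborted) throw new Error("transaction aborted");
    this.pending.push(() => {
      const had = this.store.has(key);
      const old = this.store.get(key);
      this.store.set(key, value);
      this.undoLog.push(() => {
        if (had) this.store.set(key, old);
        else this.store.delete(key);
      });
    });
  }

  // Execute the next queued operation, if any.
  step() {
    const op = this.pending.shift();
    if (op) op();
  }

  // Cancel everything still pending, then undo executed work newest-first.
  abort() {
    this.aborted = true;
    this.pending.length = 0;
    while (this.undoLog.length > 0) this.undoLog.pop()();
  }
}

const store = new Map([[12, { name: "original" }]]);
const tx = new SketchTransaction(store);
tx.put(12, { name: "Benny Andersson" });
tx.put(13, { name: "second write" });
tx.step(); // only the first put has reached the store
tx.abort(); // drops the second put, undoes the first
console.log(store.get(12).name, store.has(13)); // original false
```

Because the undo happens for the transaction as a whole, the question of what to do with operations layered on top of an aborted request never comes up.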


From: [] On 
Behalf Of Jeremy Orlow
Sent: Wednesday, July 14, 2010 1:57 AM

On Wed, Jul 14, 2010 at 9:14 AM, Jonas Sicking  wrote:
On Wed, Jul 14, 2010 at 1:02 AM, Jeremy Orlow  wrote:
> On Wed, Jul 14, 2010 at 8:53 AM, Jonas Sicking  wrote:
>> On Tue, Jul 13, 2010 at 11:33 PM, Jeremy Orlow 
>> wrote:
>> > On Wed, Jul 14, 2010 at 7:28 AM, Jonas Sicking  wrote:
>> >>
>> >> On Tue, Jul 13, 2010 at 11:12 PM, Jeremy Orlow 
>> >> wrote:
>> >> > On Tue, Jul 13, 2010 at 9:41 PM, Jonas Sicking 
>> >> > wrote:
>> >> >>
>> >> >> On Tue, Jul 13, 2010 at 1:17 PM, Jeremy Orlow 
>> >> >> wrote:
>> >> >> > On Tue, Jul 13, 2010 at 8:25 PM, Jonas Sicking 
>> >> >> > wrote:
>> >> >> >>
>> >> >> >> Hi All,
>> >> >> >>
>> >> >> >> Sorry if this is something that I've brought up before. I know I
>> >> >> >> meant
>> >> >> >> to bring this up in the past, but I couldn't find any actual
>> >> >> >> emails.
>> >> >> >>
>> >> >> >> One thing that we discussed while implementing IndexedDB was what
>> >> >> >> to
>> >> >> >> do for IDBRequest.abort() on "writing" requests. For example on
>> >> >> >> the
>> >> >> >> request object returned from IDBObjectStore.remove() or
>> >> >> >> IDBCursor.update().
>> >> >> >>
>> >> >> >> Ideal would of course be if it would cancel the write operation,
>> >> >> >> however this isn't always possible. If the call to .abort() comes
>> >> >> >> after the write operation has already executed in the database,
>> >> >> >> but
>> >> >> >> before the 'success' event has had a chance to fire. What's worse
>> >> >> >> is
>> >> >> >> that other write operations might already have been performed on
>> >> >> >> top
>> >> >> >> of the aborted request. Consider for example the following code:
>> >> >> >>
>> >> >> >> req1 = myObjectStore.remove(12);
>> >> >> >> req2 = myObjectStore.add({ id: 12, name: "Benny Andersson" });
>> >> >> >>  do other stuff 
>> >> >> >> req1.abort();
>> >> >> >>
>> >> >> >> In this case, even if the database supported aborting a specific
>> >> >> >> operation, it's very hard to say what the correct thing to do
>> >> >> >> with
>> >> >> >> operations performed after it. As far as I know, databases
>> >> >> >> generally
>> >> >> >> don't support rolling back a given operation, only rolling back
>> >> >> >> to a
>> >> >> >> specific point, i.e. rolling back a given operation and all
>> >> >> >> operations
>> >> >> >> performed after it.
>> >> >> >>
>> >> >> >> We could say that abort() signals some sort of error if the
>> >> >> >> operation
>> >> >> >> has already been performed in the database, however that makes
>> >> >> >> abort()
>> >> >> >> very racy.
>> >> >> >>
>> >> >> >> Instead we concluded that the best thing to do was to specify
>> >> >> >> that
>> >> >> >> IDBRequest.abort() should throw if called on a modifying request.
>> >> >> >> If
>> >> >> >> this sounds good I'll make this change to the spec.
>> >> >> >
>> >> >> > I'd be fine with that.
>> >> >> > Or we could remove abort altogether.  I can't really think of
>> >> >> > what
>> >> >> > types
>> >> >> > of operations you'd really want to abort until (at least) we have
>> >> >> > some
>> >> >> > sort
>> >> >> > of join language or other mechanism to do really expensive
>> >> >> > read-only
>> >> >> > calls.
>> >> >>
>> >> >> I think there are expensive-ish read-only calls. Indexes are
>> >> >> effectively a join mechanism since you'll hit one b-tree to do the
>> >> >> index lookup, and then a second b-tree to look up the full object in
>> >> >> the objectStore.
>> >> >
>> >> > But each individual call (the scope of canceling an IDBRequest) is
>> >> > pretty
>> >> > short.
>> >> >
>> >> >>
>> >> >> I don't really feel strongly either way. I think abort() isn't too
>> >> >> hard to implement, but also doesn't provide a ton of value. At least
>> >> >> not, like you say, until we add expensive calls like getAll or
>> >> >> multi-step joins.
>> >> >
>> >> > I agree that when we look at adding such calls we may want to add an
>> >> > abort
>> >> > on just IDBRequest, but until then I don't think it's a very useful
>> >> > feature.
>> >> >  And being easy to add is not a good reason to lock ourselves into
>> >> > a particular design in the future.  I think we should remove it until
>> >> > there's a good reason for it to exist.
>> >> >
>> >> >>
>> >> >> > Or we could take abort off IDBRequest and instead put a rollback
>> >> >> > on
>> >> >> > transactions (and not do the modify limitation).
>> >> >>
>> >> >> I 

RE: [IndexedDB] Current editor's draft

2010-07-14 Thread Pablo Castro

From: [] On Behalf Of Jeremy Orlow
Sent: Wednesday, July 14, 2010 12:10 AM

On Wed, Jul 14, 2010 at 3:52 AM, Pablo Castro  

From: [] On 
Behalf Of Andrei Popescu
Sent: Monday, July 12, 2010 5:23 AM

>> >> Dynamic transactions:
>> >> I see that most folks would like to see these going away. While I like 
>> >> the predictability and simplifications that we're able to make by using 
>> >> static scopes for transactions, I worry that we'll close the door for two 
>> >> scenarios: background tasks and query processors. Background tasks such 
>> >> as synchronization and post-processing of content would seem to be almost 
>> >> impossible with the static scope approach, mostly due to the granularity 
>> >> of the scope specification (whole stores). Are we okay with saying that 
>> >> you can't for example sync something in the background (e.g. in a worker) 
>> >> while your app is still working? Am I missing something that would enable 
>> >> this class of scenarios? Query processors are also tricky because you 
>> >> usually take the query specification in some form after the transaction 
>> >> started (especially if you want to execute multiple queries with later 
>> >> queries depending on the outcome of the previous ones). The background 
>> >> tasks issue in particular looks pretty painful to me if we don't have a 
>> >> way to achieve it without freezing the application while it happens.

>> Well, the application should never freeze in terms of the UI locking up, but 
>> in what you described I could see it taking a while for data to show up on 
>> the screen.  This is something that can be fixed by doing smaller updates on 
>> the background thread, sending a message to the background thread that it 
>> should abort for now, doing all database access on the background thread, 
>> etc.

This is an issue regardless, isn't it? Let's say you have a worker churning on 
the database somehow. The worker has no UI or user to wait for, so it'll run in 
a tight loop at full speed. If it splits the work in small transactions, in 
cases where it doesn't have to wait for something external there will still be 
a small gap between transactions. That could easily starve the UI thread that 
needs to find an opportunity to get in and do a quick thing against the 
database. As you say the difference between freezing and locking up at this 
point is not that critical, as the end user in the end is just waiting.

>> One point that I never saw made in the thread that I think is really 
>> important is that dynamic transactions can make concurrency worse in some 
>> cases.  For example, with dynamic transactions you can get into live-lock 
>> situations.  Also, using Pablo's example, you could easily get into a 
>> situation where the long running transaction on the worker keeps hitting 
>> serialization issues and thus it's never able to make progress.

While it could certainly happen, I don't remember seeing something like a 
live-lock in a long, long time. Deadlocks are common, but a simple timeout will 
kill one of the transactions and let the other make progress. A bit violent, 
but always effective. 

>> I do see that there are use cases where having dynamic transactions would be 
>> much nicer, but the amount of non-determinism they add (including to 
>> performance) has me pretty worried.  I pretty firmly believe we should look 
>> into adding them in v2 and remove them for now.  If we do leave them in, it 
>> should definitely be in its own method to make it quite clear that the 
>> semantics are more complex.
Let's explore a bit more and see where we land. I'm not pushing for dynamic 
transactions themselves, but more for the scenarios they enable (background 
processing and such). If we find other ways of doing that, then all the better. 
Having different entry points is reasonable.

>> >> Nested transactions:
>> >> Not sure why we're considering this an advanced scenario. To be clear 
>> >> about what the feature means to me: make it legal to start a transaction 
>> >> when one is already in progress, and the nested one is effectively a 
>> >> no-op, just refcounts the transaction, so you need equal amounts of 
>> >> commit()'s, implicit or explicit, and an abort() cancels all nested 
>> >> transactions. The purpose of this is to allow composition, where a piece 
>> >> of code that needs a transaction can start one locally, independe

RE: [IndexedDB] Current editor's draft

2010-07-14 Thread Pablo Castro

From: Jonas Sicking [] 
Sent: Wednesday, July 14, 2010 12:07 AM

>> > Dynamic transactions:
>> > I see that most folks would like to see these going away. While I like the 
>> > predictability and simplifications that we're able to make by using static 
>> > scopes for transactions, I worry that we'll close the door for two 
>> > scenarios: background tasks and query processors. Background tasks such as 
>> > synchronization and post-processing of content would seem to be almost 
>> > impossible with the static scope approach, mostly due to the granularity 
>> > of the scope specification (whole stores). Are we okay with saying that 
>> > you can't for example sync something in the background (e.g. in a worker) 
>> > while your app is still working? Am I missing something that would enable 
>> > this class of scenarios? Query processors are also tricky because you 
>> > usually take the query specification in some form after the transaction 
>> > started (especially if you want to execute multiple queries with later 
>> > queries depending on the outcome of the previous ones). The background 
>> > tasks issue in particular looks pretty painful to me if we don't have a 
>> > way to achieve it without freezing the application while it happens.

>> I don't understand enough of the details here to be able to make a
>> decision. The use cases you are bringing up I definitely agree are
>> important, but I would love to look at even a rough draft of what code
>> you are expecting people will need to write.

I'll try and hack up an example. In general any scenario that has a worker and 
the UI thread working on the same database will be quite a challenge, because 
the worker will have to a) split the work in small pieces, even if it was 
naturally a bigger chunk and b) consider interleaving implications with the UI 
thread, otherwise even when split in chunks you're not guaranteed that one of 
the two won't starve the other one (the worker running in a tight loop will 
effectively always have an active transaction, it'll be just changing the 
actual transaction from time to time). This can certainly happen with dynamic 
transactions as well, the only difference is that since the locking granularity 
is different, it may be that what you're working on in the worker and in the UI 
threads is independent enough that they don't interfere too much, allowing for 
some more concurrency.

>> What I suggest is that we keep dynamic transactions in the spec for
>> now, but separate the API from static transactions, start a separate
>> thread and try to hammer out the details and see what we arrive at. I
>> do want to clarify that I don't think dynamic transactions are
>> particularly hard to implement, I just suspect they are hard to use
>> correctly.

Sounds reasonable.

>> > Implicit commit:
>> > Does this really work? I need to play with sample app code more, it may 
>> > just be that I'm old-fashioned. For example, if I'm downloading a bunch of 
>> > data from somewhere and pushing rows into the store within a transaction, 
>> > wouldn't it be reasonable to do the whole thing in a transaction? In that 
>> > case I'm likely to have to unwind while I wait for the next callback from 
>> > XmlHttpRequest with the next chunk of data.

>> You definitely want to do it in a transaction. In our proposal there
>> is no way to even call .get or .put if you aren't inside a
>> transaction. For the case you are describing, you'd download the data
>> using XMLHttpRequest first. Once the data has been downloaded you
>> start a transaction, parse the data, and make the desired
>> modifications. Once that is done the transaction is automatically
>> committed.
>> The idea here is to avoid keeping transactions open for long periods
>> of time, while at the same time making the API easier to work with.
>> I'm very concerned that any API that requires people to do:
>>   startOperation();
>>   ... do lots of stuff here ...
>>   endOperation();
>> people will forget to do the endOperation call. This is especially
>> true if the startOperation/endOperation calls are spread out over
>> multiple different asynchronously called functions, which seems to be
>> the use case you're concerned about above. One very easy way to
>> "forget" to call endOperation is if something in between the two
>> function calls throws an exception.

Fair enough, maybe I need to think of this scenario differently, and if someone 
needs to download a bunch of data and then put it in the database atomically 
the right way is to download to work tables first over a long time and 
independent transactions, and then use a transaction only to move the data 
around into its final spot.

>> This will likely be extra bad for transactions where no write
>> operations are done. In this case failure to call a 'commit()'
>> function won't result in any broken behavior. The transaction will
>> just sit open for a long time and eventually "rolled back", though

RE: [IndexedDB] Current editor's draft

2010-07-13 Thread Pablo Castro

From: [] On 
Behalf Of Andrei Popescu
Sent: Monday, July 12, 2010 5:23 AM

Sorry I disappeared for a while. Catching up with this discussion was an 
interesting exercise...there is no particular message in this thread I can 
respond to, so I thought I'd just reply to the last one. Overall I think the 
new proposal is shaping up well and is being effective in simplifying 
scenarios. I do have a few suggestions and questions for things I'm not sure I 
see all the way.

READ_ONLY vs READ_WRITE as defaults for transactions:
To be perfectly honest, I think this discussion went really deep over an issue 
that won't be a huge deal for most people. My perspective, trying to avoid 
performance or usage frequency speculation, is around what's easier to detect. 
Concurrency issues are hard to see. On the other hand, whenever we can throw an 
exception and give explicit guidance, that unblocks people right away. For this 
case I suspect it's best to default to READ_ONLY, because if someone doesn't 
read or think about it and just uses the stuff and tries to change something 
they'll get a clear error message saying "if you want to change stuff, use 
READ_WRITE please". The error is not data- or context-dependent, so it'll fail 
on first try at most once per developer and once they fix it they'll know for 
all future cases.
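
To make the "easier to detect" argument concrete, here is a minimal sketch. It uses a hypothetical in-memory stand-in, not the IndexedDB API: `openTransaction` is an invented helper, and `READ_ONLY` / `READ_WRITE` are plain strings that only mimic the constants discussed above. The point is that a write under the default mode fails on the first attempt, regardless of data or timing.

```javascript
// Hypothetical in-memory stand-in for a transaction; NOT the IndexedDB API.
const READ_ONLY = "READ_ONLY";
const READ_WRITE = "READ_WRITE";

// Assumed helper: wraps a Map and enforces the transaction mode on writes.
function openTransaction(store, mode = READ_ONLY) {
  return {
    get(key) {
      return store.get(key);
    },
    put(key, value) {
      if (mode !== READ_WRITE) {
        // Deterministic failure: does not depend on data or on concurrency.
        throw new Error("if you want to change stuff, use READ_WRITE please");
      }
      store.set(key, value);
    },
  };
}

const store = new Map([["a", 1]]);

// A developer who never thought about modes gets an explicit error right away.
let message = null;
try {
  openTransaction(store).put("a", 2);
} catch (err) {
  message = err.message;
}

// Once the mode is fixed, the write goes through.
openTransaction(store, READ_WRITE).put("a", 2);
```

With the opposite default the same oversight would silently take a write lock, and the cost would only show up later as reduced concurrency.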

Dynamic transactions:
I see that most folks would like to see these going away. While I like the 
predictability and simplifications that we're able to make by using static 
scopes for transactions, I worry that we'll close the door for two scenarios: 
background tasks and query processors. Background tasks such as synchronization 
and post-processing of content would seem to be almost impossible with the 
static scope approach, mostly due to the granularity of the scope specification 
(whole stores). Are we okay with saying that you can't for example sync 
something in the background (e.g. in a worker) while your app is still working? 
Am I missing something that would enable this class of scenarios? Query 
processors are also tricky because you usually take the query specification in 
some form after the transaction started (especially if you want to execute 
multiple queries with later queries depending on the outcome of the previous 
ones). The background tasks issue in particular looks pretty painful to me if 
we don't have a way to achieve it without freezing the application while it happens.

Implicit commit:
Does this really work? I need to play with sample app code more, it may just be 
that I'm old-fashioned. For example, if I'm downloading a bunch of data from 
somewhere and pushing rows into the store within a transaction, wouldn't it be 
reasonable to do the whole thing in a transaction? In that case I'm likely to 
have to unwind while I wait for the next callback from XmlHttpRequest with the 
next chunk of data. I understand that avoiding it results in nicer patterns 
(e.g. db.objectStores("foo").get(123).onsuccess = ...), but in practice I'm not 
sure if that will hold given that you still need error callbacks and such.

Nested transactions:
Not sure why we're considering this an advanced scenario. To be clear about 
what the feature means to me: make it legal to start a transaction when one is 
already in progress, and the nested one is effectively a no-op, just refcounts 
the transaction, so you need equal amounts of commit()'s, implicit or explicit, 
and an abort() cancels all nested transactions. The purpose of this is to allow 
composition, where a piece of code that needs a transaction can start one 
locally, independently of whether the caller had already one going.
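
A rough sketch of the refcounting semantics described above, assuming a hypothetical `RefCountedTransaction` class (nothing like it exists in the draft; the class, its method names and its state strings are all made up for illustration):

```javascript
// Hypothetical wrapper illustrating refcounted "nested" transactions.
class RefCountedTransaction {
  constructor() {
    this.depth = 0;
    this.state = "new"; // "new" | "active" | "committed" | "aborted"
  }

  begin() {
    if (this.state === "committed" || this.state === "aborted") {
      throw new Error("transaction already finished");
    }
    // The outermost begin() starts the transaction; nested ones only count.
    this.state = "active";
    this.depth += 1;
  }

  commit() {
    if (this.state !== "active") return; // e.g. a nested abort() already ran
    this.depth -= 1;
    if (this.depth === 0) this.state = "committed"; // only the last commit is real
  }

  abort() {
    if (this.state !== "active") return;
    this.depth = 0; // cancels every nesting level at once
    this.state = "aborted";
  }
}

// A reusable piece of code that starts a transaction locally,
// without knowing whether the caller already has one going.
function saveRecord(tx) {
  tx.begin();
  // ... work against the store would go here ...
  tx.commit();
}

const tx = new RefCountedTransaction();
tx.begin();
saveRecord(tx);
const stateAfterHelper = tx.state; // the helper's commit did not end the outer one
tx.commit();
const finalState = tx.state;

const failing = new RefCountedTransaction();
failing.begin();
failing.begin();
failing.abort();
failing.commit();
const abortedState = failing.state;
```

The helper composes with or without an enclosing transaction, which is the whole purpose of the feature as described.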

Schema versioning:
It's unfortunate that we need to have explicit elements in the page for the 
versioning protocol to work, but the fact that we can have a reliable mechanism 
for pages to coordinate a version bump is really nice. For folks that don't 
know about this the first time they build it, an explicit error message on the 
schema change timeout can explain where to start. I do think that there may be 
a need for non-breaking changes to the schema to happen without a "version 
dance". For example, query processors regularly create temporary tables during 
sorts and such. Those shouldn't require any coordination (maybe we allow 
non-versioned additions, or we just introduce temporary, unnamed tables that 
evaporate on commit() or database close()...).


RE: [IndexedDB] Cursors and modifications

2010-07-02 Thread Pablo Castro

From: [] On 
Behalf Of Jonas Sicking
Sent: Friday, July 02, 2010 4:00 PM

>> We ran into a complicated issue while implementing IndexedDB. In short, 
>> what should happen if an object store is modified while a cursor is 
>> iterating it? Note that the modification can be done within the same 
>> transaction, so the read/write locks preventing several transactions from 
>> accessing the same table aren't helping here.
>> Detailed problem description (this assumes the API proposed by Mozilla):
>> Consider an objectStore "words" containing the following objects:
>> { name: "alpha" }
>> { name: "bravo" }
>> { name: "charlie" }
>> { name: "delta" }
>> and the following program (db is a previously opened IDBDatabase):
>> var trans = db.transaction(["words"], READ_WRITE);
>> var cursor;
>> var result = [];
>> trans.objectStore("words").openCursor().onsuccess = function(e) {
>>   cursor = e.result;
>>   result.push(cursor.value);
>>   cursor.continue();
>> }
>> trans.objectStore("words").get("delta").onsuccess = function(e) {
>>   trans.objectStore("words").put({ name: "delta", myModifiedValue: 17 });
>> }
>> When the cursor reads the "delta" entry, will it see the 'myModifiedValue' 
>> property? Since we have so far defined the callback order to be the 
>> request order, the put request will be finished before the "delta" entry 
>> is iterated by the cursor.
>> The problem is even more serious with cursors that iterate indexes.
>> Here a modification can even affect the position of the currently iterated 
>> object in the index, and the modification can (if I'm reading the spec 
>> correctly) come from the cursor itself.
>> Consider the following objectStore "people" with keyPath "name"
>> containing the following objects:
>> { name: "Adam", count: 30 }
>> { name: "Bertil", count: 31 }
>> { name: "Cesar", count: 32 }
>> { name: "David", count: 33 }
>> { name: "Erik", count: 35 }
>> and an index "countIndex" with keyPath "count". What would the following 
>> code do?
>> results = [];
>> db.objectStore("people",
>> READ_WRITE).index("countIndex").openObjectCursor().onsuccess = function (e) {
>>   cursor = e.result;
>>   if (!cursor) {
>>     alert(results);
>>     return;
>>   }
>>   if ( == "Bertil") {
>>     cursor.update({name: "Bertil", count: 34 });
>>   }
>>   results.push(;
>>   cursor.continue();
>> };
>> What does this alert? Would it alert "Adam,Bertil,Erik" as the cursor would 
>> stay on the "Bertil" object as it is moved in the index? Or would it alert 
>> "Adam,Bertil,Cesar,David,Bertil,Erik" as we would iterate "Bertil" again at 
>> its new position in the index?

My first reaction is that both from the expected-behavior perspective 
(transaction is the scope of isolation) and from the implementation perspective 
it would be better to see live changes if they happened in the same transaction 
as the cursor (over a store or index). So in your example you would iterate one 
of the rows twice. Maintaining order and membership stable would mean creating 
another scope of isolation within the transaction, which to me would be unusual 
and it would be probably quite painful to implement without spilling a copy of 
the records to disk (at least a copy of the keys/order if you don't care about 
protecting from changes that don't affect membership/order; some databases call 
these keyset cursors).
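
To illustrate what "seeing live changes" would mean for the "people" example, here is a toy in-memory model. It is not IndexedDB: `nextByCount` is a hypothetical helper that re-reads the store on every step and returns the record with the smallest `count` above the cursor's last position, and it assumes the counts are unique.

```javascript
// In-memory stand-in for the "people" store, keyed by name.
const people = new Map([
  ["Adam", { name: "Adam", count: 30 }],
  ["Bertil", { name: "Bertil", count: 31 }],
  ["Cesar", { name: "Cesar", count: 32 }],
  ["David", { name: "David", count: 33 }],
  ["Erik", { name: "Erik", count: 35 }],
]);

// Hypothetical "live" index step: scan current data for the next count
// strictly greater than the last position (assumes unique counts).
function nextByCount(lastCount) {
  let best = null;
  for (const record of people.values()) {
    if (record.count > lastCount && (best === null || record.count < best.count)) {
      best = record;
    }
  }
  return best;
}

const results = [];
let position = -Infinity;
for (let rec = nextByCount(position); rec !== null; rec = nextByCount(position)) {
  position = rec.count; // remember the index key the cursor is sitting on
  results.push(rec.name);
  if (rec.name === "Bertil" && rec.count === 31) {
    // Update made in the same transaction while the cursor is on this record.
    people.set("Bertil", { name: "Bertil", count: 34 });
  }
}
// results is now Adam, Bertil, Cesar, David, Bertil, Erik
```

Under this model "Bertil" is visited twice, once at each index position. A keyset-style cursor would instead need a stable copy of the keys taken when the cursor was opened.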

>> We could say that cursors always iterate snapshots, however this introduces 
>> MVCC. Though it seems to me that SNAPSHOT_READ already does that.

Actually, even with MVCC you'd see your own changes, because they happen in the 
same transaction so the buffer pool will use the same version of the page. 
While it may be possible to reuse the MVCC infrastructure, it would still 
require the introduction of a second scope for stability. 

>> We could also say that cursors iterate live data though that can be pretty 
>> confusing and forces the implementation to deal with entries being added and 
>> removed during iteration, and it'd be tricky to define all edge cases.

Would this be any different from the implementation perspective than dealing 
with changes that happen through other transactions once they are committed? 
Typically at least in non-MVCC systems committed changes that are "further 
ahead" in a cursor scan end up showing up even when the cursor was opened 
before the other transaction committed.

>> It's certainly debatable how much of a problem any of these edge cases are 
>> for users. Note that all of this is only an issue if you modify and read 
>> from the same records *in the same transaction*. I can't think of a case 
>> where it isn't trivial to avoid these problems by separating things into 
>> separate transactions. However it'd be nice to avoid creating foot-guns 
>> for people to play with (think of the childre

RE: [IndexedDB] Multi-value keys

2010-06-18 Thread Pablo Castro
+1 on composite keys in general. The alternative to the proposal below would be 
to have the actual key path specification include multiple members (e.g. 
db.createObjectStore("foo", ["a", "b"])). I like the proposal below as well, I 
just wonder if having the key path specification (that's external to the 
object) indicate which members are keys would be less invasive for scenarios 
where you already have javascript objects you're getting from a web service or 
something and want to store them "as is". 


From: [] On 
Behalf Of Jonas Sicking
Sent: Friday, June 18, 2010 4:08 PM

Hi All,

One thing that (if I'm reading the spec correctly) is currently impossible is 
to create multi-valued keys. Consider for example an object store containing 
objects like:

{ firstName: "Sven", lastName: "Svensson", age: 57 }
{ firstName: "Benny", lastName: "Andersson", age: 63 }
{ firstName: "Benny", lastName: "Bedrup", age: 9 }

It is easy to create an index which lets you quickly find everyone with a given 
firstName or a given lastName. However it doesn't seem possible to create an 
index that finds everyone with a given firstName
*and* lastName, or sort the list of people based on firstName and then lastName.

The best thing you could do is to concatenate the firstName and lastName with 
an ASCII null character in between and then use that as a key in the 
index. However this doesn't work if firstName or lastName can contain null 
characters. Also, if you want to be able to sort by firstName and then age 
there is no good way to put all the information into a single string while 
having sorting work.
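
Both problems with the concatenation workaround can be seen in plain JavaScript. This is only an illustrative sketch; `joinKey` and `SEP` are made-up names.

```javascript
// Hypothetical helper: build a single string key from several parts.
const SEP = "\u0000";
const joinKey = (...parts) => parts.join(SEP);

// Problem 1: if a part may itself contain the separator, two different
// (firstName, lastName) pairs can produce the same key.
const collision =
  joinKey("a" + SEP + "b", "c") === joinKey("a", "b" + SEP + "c");

// Problem 2: numbers embedded in a string sort as text, so age 63
// ends up before age 9 ("6" sorts before "9").
const sortedByString = [joinKey("Benny", 9), joinKey("Benny", 63)].sort();
const youngestFirst = sortedByString[0] === joinKey("Benny", 9);
```

Here `collision` is true and `youngestFirst` is false, which is why a structured key type is needed rather than a string encoding.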

Generally the way this is done in SQL is that you can create an index on 
multiple columns. That way each row has multiple values as the key, and sorting 
is first done on the first value, then the second, then the third etc.

However since we don't really have columns we can't use that exact solution. 
Instead, the way we could allow multiple values is to add an additional type as 
keys: Arrays.

That way you can use ["Sven",  57], ["Benny", 63] and ["Benny", 9] as keys for 
the respective objects above. This would allow sorting and searching on 
firstName and age.

The way that array keys would be compared is that we'd first compare the first 
item in both arrays. If they are different the arrays are ordered the same way 
as the two first values are ordered. If they are the same you look at the second 
value and so on. If you reach the end of one array before finding a difference 
then that array is sorted before the other.

We'd also have to define the order if an array is compared to a non-array 
value. It doesn't really matter what we say here, but I propose that we put all 
arrays after all non-arrays.

Note that I don't think we need to allow arrays to contain arrays.
That just seems to add complication without adding additional functionality.
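
The comparison rules above could be sketched roughly as follows. `compareKeys` and `compareScalars` are hypothetical names, and the sketch assumes that the two values compared at any given position are of the same type (both strings or both numbers), since ordering across scalar types isn't covered by this proposal.

```javascript
// Assumption: a and b are both numbers or both strings.
function compareScalars(a, b) {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Sketch of the proposed ordering for keys that may be flat arrays.
function compareKeys(a, b) {
  const aIsArray = Array.isArray(a);
  const bIsArray = Array.isArray(b);
  if (aIsArray !== bIsArray) {
    return aIsArray ? 1 : -1; // all arrays sort after all non-arrays
  }
  if (!aIsArray) {
    return compareScalars(a, b);
  }
  const shared = Math.min(a.length, b.length);
  for (let i = 0; i < shared; i++) {
    const c = compareScalars(a[i], b[i]);
    if (c !== 0) return c; // first differing item decides
  }
  return a.length - b.length; // the array that ends first sorts first
}

const keys = [["Sven", 57], ["Benny", 63], ["Benny", 9], "Zed", ["Benny"]];
const sortedKeys = keys.slice().sort(compareKeys);
// sortedKeys: "Zed", ["Benny"], ["Benny", 9], ["Benny", 63], ["Sven", 57]
```

Note that ["Benny", 9] now sorts before ["Benny", 63] because the ages are compared as numbers, which the string-concatenation approach could not do.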

Let me know what you think.

/ Jonas

RE: Seeking pre-LCWD comments for Indexed Database API; deadline February 2

2010-06-14 Thread Pablo Castro

From: Jonas Sicking [] 
Sent: Friday, June 11, 2010 3:20 PM

>> >> >> So there is a real likelihood of a browser implementation that 
>> >> >> will predate its associated JS engine's upgrade to ES5? 
>> >> >> Feeling a "concern" isn't really much of a technical argument on 
>> >> >> its own, and designing for outdated technology is a poor approach.
>> >> I don't think there is, just wanted to avoid imposing it. If you 
>> >> think it's really important then let's change it back to delete 
>> >> assuming other folks are good with it.
>> >> I had the same concerns Pablo did, but I don't feel strongly 
>> >> either way.

Besides the maneuvering we'll have to do on the C++ side of things to avoid 
clashes with language keywords, the question is whether we expect plugins and 
such to add support for IndexedDB in existing browsers that don't do ES5. For 

>> Before we close on this, let me validate one more thing independently 
>> of the JS version. Are we going to have trouble when trying to expose 
>> these interfaces in C++? Not sure about other compilers and IDL 
>> processing tools, but I'm playing around with Visual Studio 2010 and 
>> while the COM IDL compiler will take "delete" as an interface member, 
>> my C++ compiler really doesn't like it. As far as I know there is no 
>> standard syntax to indicate that a symbol wasn't meant to be a 
>> keyword in C++, so having "delete" (or other C++ keywords for that 
>> matter) would be problematic. Am I missing something?
> Good point.  Does anyone have a strong opinion on how much we should 
> care about reserved word conflicts in languages other than JavaScript?  
> It seems like a slippery slope.
> As an example, "IDBDatabase.description" is actually used by the 
> ObjectiveC base object class and so this caused some problems 
> initially.  We worked around it by having the ObjectiveC bindings 
> generator add a suffix whenever an attribute named "description" is 
> hit.  (Something similar was done for "hash" and "id" in other APIs.) 
> To be honest, I hadn't even considered bringing this up and asking for 
> it to be changed, but if we're going to avoid delete because it's a 
> reserved word in JavaScript (pre v5) and/or because it's a reserved 
> word in C++, perhaps we should consider changing description as well?

>> We've had to do this a few times in the past already. One example was 
>> Window.postMessage where we couldn't use the name "PostMessage" in C++ 
>> because it was a predefined macro on some platform (windows iirc, not to 
>> point fingers ;) ).


>> We developed a similar trick where we can indicate in the IDL that different 
>> names are used for scripted languages and for compiled languages.

>> So all in all I believe this problem can be overcome. I prefer to focus on 
>> making the JS API be the best it can be, and let other languages take a back 
>> seat. As long as it's solvable without too much of an issue (such as large 
>> performance penalties) in other languages.

I agree we can sort this out and certainly limitations on the implementation 
language shouldn't surface here. The issue is more whether folks care about a 
C++ binding (or some other language with a similar issue) where we'll have to 
have a different name for this method.

Even though I've been bringing this up I'm ok with keeping delete(), I just 
want to make sure we understand all the implications that come with that.


RE: Seeking pre-LCWD comments for Indexed Database API; deadline February 2

2010-06-11 Thread Pablo Castro

From: [] On Behalf Of Jeremy Orlow
Sent: Friday, June 11, 2010 3:20 AM
Subject: Re: Seeking pre-LCWD comments for Indexed Database API; deadline 
February 2

On Fri, Jun 11, 2010 at 1:54 AM, Pablo Castro  

From: Kris Zyp []
Sent: Thursday, June 10, 2010 4:38 PM
Subject: Re: Seeking pre-LCWD comments for Indexed Database API; deadline 
February 2

>> >> So there is a real likelihood of a browser implementation that will
>> >> predate its associated JS engine's upgrade to ES5? Feeling a
>> >> "concern" isn't really much of a technical argument on its own, and
>> >> designing for outdated technology is a poor approach.
>> I don't think there is, just wanted to avoid imposing it. If you think it's 
>> really important then let's change it back to delete assuming other folks 
>> are good with it.

>> I had the same concerns Pablo did, but I don't feel strongly either way.

Before we close on this, let me validate one more thing independently of the JS 
version. Are we going to have trouble when trying to expose these interfaces in 
C++? Not sure about other compilers and IDL processing tools, but I'm playing 
around with Visual Studio 2010 and while the COM IDL compiler will take 
"delete" as an interface member, my C++ compiler really doesn't like it. As far 
as I know there is no standard syntax to indicate that a symbol wasn't meant to 
be a keyword in C++, so having "delete" (or other C++ keywords for that matter) 
would be problematic. Am I missing something?


RE: Seeking pre-LCWD comments for Indexed Database API; deadline February 2

2010-06-10 Thread Pablo Castro

From: Kris Zyp [] 
Sent: Thursday, June 10, 2010 4:38 PM
Subject: Re: Seeking pre-LCWD comments for Indexed Database API; deadline 
February 2

>> On 6/10/2010 4:15 PM, Pablo Castro wrote:
>> >
>> >>> From:
>> >>> [] On Behalf Of Kris Zyp
>> >>> Sent: Thursday, June 10, 2010 9:49 AM Subject: Re: Seeking
>> >>> pre-LCWD comments for Indexed Database API; deadline February
>> >>> 2
>> >
>> >>> I see that in the trunk version of the spec [1] that delete()
>> >>> was changed to remove(). I thought we had established that
>> >>> there is no reason to make this change. Is anyone seriously
>> >>> expecting to have an implementation prior to or without ES5's
>> >>> contextually unreserved keywords? I would greatly prefer
>> >>> delete(), as it is much more consistent with standard DB and
>> >>> REST terminology.
>> >
>> > My concern is that it seems like taking an unnecessary risk. I
>> > understand the familiarity aspect (and I like delete() better as
>> > well), but to me that's not a strong enough reason to use it and
>> > potentially cause trouble in some browser.
>> >
>> So there is a real likelihood of a browser implementation that will
>> predate its associated JS engine's upgrade to ES5? Feeling a
>> "concern" isn't really much of a technical argument on its own, and
>> designing for outdated technology is a poor approach.

I don't think there is, just wanted to avoid imposing it. If you think it's 
really important then let's change it back to delete assuming other folks are 
good with it.


RE: Seeking pre-LCWD comments for Indexed Database API; deadline February 2

2010-06-10 Thread Pablo Castro

>> From: [] 
>> On Behalf Of Kris Zyp
>> Sent: Thursday, June 10, 2010 9:49 AM
>> Subject: Re: Seeking pre-LCWD comments for Indexed Database API; deadline 
>> February 2

>> I see that in the trunk version of the spec [1] that delete() was
>> changed to remove(). I thought we had established that there is no
>> reason to make this change. Is anyone seriously expecting to have an
>> implementation prior to or without ES5's contextually unreserved
>> keywords? I would greatly prefer delete(), as it is much more
>> consistent with standard DB and REST terminology.

My concern is that it seems like taking an unnecessary risk. I understand the 
familiarity aspect (and I like delete() better as well), but to me that's not a 
strong enough reason to use it and potentially cause trouble in some browser.


RE: [IndexedDB] Event on commits (WAS: Proposal for async API changes)

2010-06-10 Thread Pablo Castro

From: [] On 
Behalf Of Jonas Sicking
Sent: Thursday, June 10, 2010 1:27 PM
Subject: Re: [IndexedDB] Event on commits (WAS: Proposal for async API changes)

>> >> >>> One of the things that will entail is a by-sequence index for all
>> >> >>> the changes in a given "database" (in my case a database will be
>> >> >>> scoped to more than one ObjectStore). In order to accomplish this
>> >> >>> I'll need to keep the last known sequence around so that each new
>> >> >>> write can create a new entry in the by-sequence index. The problem
>> >> >>> is that if another tab/window writes to the database it'll
>> >> >>> increment that sequence and I won't be notified so I would have to
>> >> >>> start every transaction with a check on the sequence index for the
>> >> >>> last sequence which seems like a lot of extra cursor calls.
>> >> >>
>> >> >> It would be a lot of extra calls, but I'm a bit hesitant to add much
>> >> >> more API surface area to v1, and the fall back plan doesn't seem too
>> >> >> unreasonable.
>> >> >>
>> >> >>>
>> >> >>> What I really need is an event listener on an ObjectStore that
>> >> >>> fires after a transaction is committed to the store but before the
>> >> >>> next transaction is run that gives me information about the commits
>> >> >>> to the ObjectStore.
>> >> >>>
>> >> >>> Thoughts?
>> >> >>
>> >> >> To do this, we could specify an
>> >> >> IndexedDatabaseRequest.ontransactioncommitted event that would
>> >> >> be guaranteed to fire after every commit and before we started the
>> >> >> next transaction.  I think that'd meet your needs and not add too
>> >> >> much additional surface area...  What do others think?
>> >> >
>> >> > It sounds reasonable but, to clarify, it seems to me that
>> >> > 'ontransactioncommitted' can only be guaranteed to fire after every
>> >> > commit and before the next transaction starts in the current window.
>> >> > Other transactions may have already started in other windows.
>> >>
>> >> We could technically enforce that other transactions won't be allowed
>> >> to start until the event has fired in all windows that have the
>> >> database open.
>> >
>> > Sure, but I can't think of any reason you'd want such semantics.  Can
>> > you?
>> I'm not entirely sure what the requirements are, so not sure.
>> If the requirement is that you are always notified about changes to a
>> table before those changes start affecting reads, so that you can keep
>> some separate information in sync, then we need to block further
>> transactions until the event has been fired in all relevant windows.
> This would only make sense if all the oncommit handlers were started in
> their own transaction so that you could at least read data.  Otherwise all
> you know is that something changed--so you wouldn't really have much to go
> on for the goal of "keep[ing] some separate information in sync".  Or you'd
> then have to schedule a transaction which wouldn't necessarily run before
> other stuff is updated which means there was no point for us to block
> transactions on everyone being notified anyway.  The only reason I can think
> of why it'd matter is if your app was doing synchronization via other means
> as well, but I can't immediately think of any places where waiting on all to
> be notified would save you, even then.
>> Possibly it would be ok to allow windows that have already received the
>> transaction to start reading the updated data though. That should make
>> this have virtually no performance impact.
> We should think VERY carefully about anything that has a perf impact.  But
> what I originally suggested should have a small one at worst, I would think.
>> But yes, we should definitely figure out what the actual requirements are.
> Agreed.
>> >> Either way though, I'm wondering how this relates to the fact that you
>> >> can (in our proposal, I'm unclear what the current draft allows) have
>> >> several writing transactions at the same time, as long as they operate
>> >> on different tables. Here another transaction might have already
>> >> started by the time another transaction is committed. Be that in this
>> >> window or another.
>> >
>> > That's only true of dynamic transactions.
>> That isn't true in at least our proposal (again, I'm unclear what the
>> current draft allows). In our proposal you can have multiple static
>> write transactions in progress at the same time. As long as they don't
>> overlap in which objectStores they use.
> I was assuming the sequence number would be stored in a single objectStore.

>Ah, I see what you mean. Good point.

>Mikeal, could you describe in detail how you were planning on using this event.

We should drill more into the actual requirements. I would be really wary of 
introducing constructs that require cross-process coo

RE: [IndexDB] Collation Algorithm?

2010-06-10 Thread Pablo Castro

>> From: [] 
>> On Behalf Of Mikeal Rogers
>> Sent: Wednesday, June 09, 2010 2:42 PM
>> Subject: [IndexDB] Collation Algorithm?

>> One of the things I noticed that seems to be missing from the IndexDB
>> specification is the collation algorithm used for sorting the index
>> keys.

>> There are lots of collation differences between databases, if left
>> unspecified I'm afraid this would negatively affect interoperability
>> between IndexDB implementations.

>> CouchDB has a good collation specification for rich keys (any JSON
>> type) and defers to the Unicode Collation Algorithm once it hits
>> string comparisons. This might be a good starting point.



>> -Mikeal

We've touched on this in the past but haven't closed on a plan. I agree that 
this needs to be specified. I suspect that this will mean we'll have to take a 
collation name at some level (database, index) if we want to allow apps to get 
proper order for strings for different languages. 
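
To illustrate why a collation name matters, here is a small sketch that compares the same two strings under code-unit order and under two language collations. It uses the standard ECMAScript `Intl.Collator`, which is only a stand-in here: nothing in this thread says IndexedDB would expose collation this way, and the variable names are made up for the example.

```typescript
// "\u00e4" is a-umlaut. Compare it with "z" under three orderings.

// Plain code-unit ("binary") order: U+00E4 is greater than U+007A.
const binary = "\u00e4" < "z" ? -1 : 1;

// German collation treats a-umlaut as a variant of "a", so it sorts before "z".
const german = new Intl.Collator("de").compare("\u00e4", "z");

// Swedish collation places a-umlaut after "z" (requires Swedish locale data
// in the runtime; without it the collator falls back to a default ordering).
const swedish = new Intl.Collator("sv").compare("\u00e4", "z");
```

An index built under one of these orderings would have to be rebuilt to be read under another, which is why the collation has to be pinned at the database or index level.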

I filed a bug to make sure we track this.


RE: Can IndexedDB depend on JavaScript? (WAS: [Bug 9793] New: Allow dates and floating point numbers in keys)

2010-06-03 Thread Pablo Castro

From: Jeremy Orlow
Sent: Tuesday, May 25, 2010 6:54 AM

>> On Mon, May 24, 2010 at 9:21 PM, Jonas Sicking  wrote:
>> On Sat, May 22, 2010 at 3:58 AM, Jeremy Orlow  wrote:
>> > On Fri, May 21, 2010 at 11:42 PM,  wrote:
>> >>
>> >>
>> >>
>> >>           Summary: Allow dates and floating point numbers in keys
>> >>           Product: WebAppsWG
>> >>           Version: unspecified
>> >>          Platform: All
>> >>        OS/Version: All
>> >>            Status: NEW
>> >>          Severity: normal
>> >>          Priority: P2
>> >>         Component: Indexed Database API
>> >>        AssignedTo:
>> >>        ReportedBy:
>> >>         QAContact:
>> >>                CC:,
>> >>
>> >>
>> >> Currently the spec requires the values referenced by the key path to be
>> >> integers or strings. I strongly believe that we should also allow dates
>> >> and floating point numbers (am I missing any other important types?).
>> >> While dates and floating point numbers alone are not good for a primary
>> >> key, they are important for non-unique indexes and as part of a composite
>> >> key, allowing for things such as scanning in temporal order.
>> >>
>> >> This is the change I'd like to propose:
>> >>
>> >> Section "3.1.1 Keys" of the currently published draft reads:
>> >>
>> >> -
>> >> In order to efficiently retrieve records stored in an indexed database, a
>> >> user agent needs to organize each record by its key. Conforming user
>> >> agents must support the use of values of IDL data types [WEBIDL]
>> >> DOMString and long as well as the value null as keys.
>> >>
>> >> For purposes of comparison, a DOMString key is always evaluated higher
>> >> than any long key. Moreover, null always evaluates lower than any
>> >> DOMString or long key.
>> >> -
>> >>
>> >> New proposed text:
>> >>
>> >> -
>> >> In order to efficiently retrieve records stored in an indexed database, a
>> >> user agent needs to organize each record by its key. Conforming user
>> >> agents must support the use of values of IDL data types [WEBIDL]
>> >> DOMString, long, float, and the Date JavaScript object
>> >
>> > We really need to decide, once and for all, whether or not IndexedDB is
>> > going to be tied to JavaScript or not.  The two major reasons to do so are
>> > the lack of date in WebIDL and keyPath.
>> > KeyPath may be tricky to spec in a way that would work for any language
>> > without cutting out a lot of flexibility.  In order to keep what we're
>> > speccing sane, it will probably need to be a pretty small subset of what's
>> > possible in JavaScript and thus even browsers will likely need to roll
>> > their own parser and such to support it.  (If we do decide to depend on
>> > JavaScript, it should enable some really neat things with the keyPath as
>> > well.)
>> > The HTML spec defines its own date type, but does not specify sort order at
>> > all.  I started a thread on this a bit ago (subject: "[IndexedDB/WebIDL]
>> > Dates + Sorting (WAS: Detailed comments for the current draft)") but it
>> > only got one response [3].
>> Note that a Date type for WebIDL doesn't really affect things a whole
>> lot for the interfaces in IndexedDB though. The relevant functions all
>> take 'any' as type though, so we'll still have to describe in prose
>> what types are permitted. I don't think this makes IndexedDB depend on
>> javascript though.

Closing the loop on this one. Now that we agreed to add some language to WebIDL 
for the Date type [1], should we go ahead and make this change to the spec? I 
can ask Eliot to do it so we can close this one if folks feel it makes sense.
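
For reference, the cross-type ordering in the quoted 3.1.1 text (null below everything, any long below any DOMString) can be sketched as a comparator. This is an illustration only: `Key`, `typeRank` and `compareKeys` are names invented here, strings are compared by code unit as a placeholder since collation is not settled, and Date and float are left out because the proposed text quoted above is cut off before it says where they rank.

```typescript
// Hypothetical key type limited to what the published draft allows.
type Key = null | number | string;

// Rank by type first: null < long < DOMString.
function typeRank(k: Key): number {
  if (k === null) return 0;
  return typeof k === "number" ? 1 : 2;
}

function compareKeys(a: Key, b: Key): number {
  const byType = typeRank(a) - typeRank(b);
  if (byType !== 0) return byType;
  // Same type from here on.
  if (typeof a === "number" && typeof b === "number") return a - b;
  if (typeof a === "string" && typeof b === "string") {
    // Code-unit order, an assumption made only for this sketch.
    return a < b ? -1 : a > b ? 1 : 0;
  }
  return 0; // both null
}

const ordered: Key[] = ["b", 10, null, "a", 2];
ordered.sort(compareKeys);
```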



RE: [IndexedDB] Proposal for async API changes

2010-05-20 Thread Pablo Castro
(still catching up on the rest of the long thread of API changes, will get back 
to that a bit later)

From: [] On 
Behalf Of Jeremy Orlow
Sent: Thursday, May 20, 2010 3:34 PM

>> >> On Thu, May 20, 2010 at 11:25 PM, Shawn Wilsher  
>> >> wrote:
>> >> On 5/20/2010 7:34 AM, Shawn Wilsher wrote:
>> >> So far it's really just that joins are painful in IndexedDB. I'm working
>> >> on a blog post on this very topic though, and I'll be sure to point
>> >> everyone in this thread to it (I figure this is useful stuff to get out
>> >> to a wider audience).
>> >> And honestly, I thought that we had discussed joins on this list, but I 
>> >> only see a thread from Pablo mentioning it, but no real discussions. 
>> >> Should we start that?

>> Joins were actually in the original spec but taken out during the effort to 
>> simplify the API greatly.  IIRC, the main reason why Nikunj took them out is 
>> that we believed you could fairly efficiently join yourself if you had 2 
>> sorted lists and because we didn't see a simple way to do them without 
>> introducing a lot of API surface area or creating (or borrowing) some sort 
>> of syntax for the joins.  (Now that I think about it, though, maybe doing 
>> this is not that big of a leap from what we're going to need to do to spec 
>> keyPaths.  I'm starting to wonder if we need to rethink that as well)

>> Anyway, the decision was made so long ago that maybe it's worth re-opening 
>> the discussion.  I'll hunt through my mail archives tomorrow and start a new 
>> thread with references to any original bits of info I can find.

My main concern with joins, besides API surface, was that in order to implement 
joins you need to choose an actual strategy. Depending on whether you have 
indexes or not and other circumstances you could choose to do range 
scans/lookups, a merge join, etc. So at least for fancier libraries this would 
only be of partial help, as they would probably want to do their own joins 

I'm happy to explore again though. It's certainly the case that for simpler 
cases it might help users pull off tasks without depending on a library. I do 
wonder if we should try and land the async API first.
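
As a concrete picture of the "join it yourself from two sorted lists" approach mentioned above, here is a minimal merge join sketch. It assumes both inputs are already sorted ascending by a numeric key (as two cursors over indexes would deliver them); `Row` and `mergeJoin` are illustrative names and not part of any proposed API.

```typescript
interface Row<V> {
  key: number;
  value: V;
}

// Inner merge join over two inputs sorted ascending by key.
function mergeJoin<L, R>(left: Row<L>[], right: Row<R>[]): Array<[L, R]> {
  const out: Array<[L, R]> = [];
  let i = 0;
  let j = 0;
  while (i < left.length && j < right.length) {
    if (left[i].key < right[j].key) {
      i++;
    } else if (left[i].key > right[j].key) {
      j++;
    } else {
      // Pair this left row with every right row sharing the key, then
      // advance left only, so duplicate left keys rescan the same run.
      for (let k = j; k < right.length && right[k].key === left[i].key; k++) {
        out.push([left[i].value, right[k].value]);
      }
      i++;
    }
  }
  return out;
}

const customers: Row<string>[] = [
  { key: 1, value: "ann" },
  { key: 2, value: "bob" },
  { key: 4, value: "cy" },
];
const orders: Row<string>[] = [
  { key: 2, value: "o1" },
  { key: 2, value: "o2" },
  { key: 3, value: "o3" },
  { key: 4, value: "o4" },
];
const joined = mergeJoin(customers, orders);
```

This is only one strategy. With an index on one side and a small input on the other, per-row lookups may be cheaper, which is the point about a built-in join having to pick a plan.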



[IndexedDB] Interaction between transactions and objects that allow multiple operations

2010-05-04 Thread Pablo Castro
The interaction between transactions and objects that allow multiple operations 
is giving us trouble. I need to elaborate a little to explain the problem.

You can perform operations in IndexedDB with or without an explicitly started 
transaction. When no transaction is present, you get an implicit one that is 
there for the duration of the operation and is committed at the end (or 
rolled-back if an error occurs).

There are a number of operations in IndexedDB that are a single step. For 
example, store.put() occurs either entirely in the current transaction (if the 
user started one explicitly) or in an implicit transaction if there isn't one 
active at the time the operation starts. The interaction between the operation 
and transactions is straightforward in this case.

On the other hand, other operations in IndexedDB return an object that then 
allows multiple operations on it. For example, when you open a cursor over a 
store, you can then move to the next row, update a row, delete a row, etc. The 
question is, what is the interaction between these operations and transactions? 
Are all interactions with a given cursor supposed to happen within the 
transaction that was active (implicit or explicit) when the cursor was opened? 
Or should each interaction happen in its own transaction (unless there is a 
long-lived active transaction, of course)?

We have a few options:
a) make multi-step objects bound to the transaction that was present when the 
object is first created (or an implicit one if none was present). This requires 
new APIs to mark cursors and such as "done" so implicit transactions can 
commit/abort, and has issues around use of the database object while a cursor 
with an implicit transaction is open.

b) make each interaction happen in its own transaction (explicit or implicit). 
This is quite unusual and means you'll get inconsistent reads from row to row 
while scanning unless you wrap cursor/index scans in transactions. It also 
probably poses interesting implementation challenges depending on what you're 
using as your storage engine.

c) require an explicit transaction always, along the lines Nikunj's original 
proposal had it. We would move most methods from database to transaction 
(except a few properties such as version and such, which it may still be ok to 
handle implicitly from the transaction's perspective). This eliminates this 
whole problem altogether at the cost of an extra step required always.

We would prefer to go with option c) and always require explicit transactions. 
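
To make option c) concrete, here is a small in-memory model in which data is reachable only through an explicit transaction object. It is a sketch under loud assumptions: `MemDatabase`, `Txn` and their methods are hypothetical names, writes are simply buffered until commit, and there is no locking or isolation between concurrent transactions.

```typescript
class Txn {
  private pending = new Map<string, unknown>();
  private finished = false;

  constructor(private committed: Map<string, unknown>) {}

  put(key: string, value: unknown): void {
    this.assertOpen();
    this.pending.set(key, value);
  }

  // Reads see this transaction's own uncommitted writes first.
  get(key: string): unknown {
    this.assertOpen();
    return this.pending.has(key) ? this.pending.get(key) : this.committed.get(key);
  }

  commit(): void {
    this.assertOpen();
    this.pending.forEach((value, key) => this.committed.set(key, value));
    this.finished = true;
  }

  abort(): void {
    this.assertOpen();
    this.pending.clear();
    this.finished = true;
  }

  private assertOpen(): void {
    if (this.finished) throw new Error("transaction already finished");
  }
}

class MemDatabase {
  private committed = new Map<string, unknown>();

  // The only entry point to data: callers must start a transaction first.
  transaction(): Txn {
    return new Txn(this.committed);
  }
}

const db = new MemDatabase();
const t1 = db.transaction();
t1.put("a", 1);
t1.abort();                                   // buffered write is discarded

const t2 = db.transaction();
const afterAbort = t2.get("a");               // nothing was committed by t1
t2.put("a", 2);
t2.commit();
const afterCommit = db.transaction().get("a");
```

A cursor opened from such a transaction object has an unambiguous owner, which is the question options a) and b) struggle with.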


RE: [IndexedDB] Dynamic Transactions (WAS: Lots of small nits and clarifying questions)

2010-04-22 Thread Pablo Castro
On Apr 21, 2010, 11:18 PM Nikunj Mehta wrote:

On Apr 21, 2010, at 5:11 PM, Jeremy Orlow wrote:

On Mon, Apr 19, 2010 at 11:44 PM, Nikunj Mehta  wrote:

On Mar 15, 2010, at 10:45 AM, Jeremy Orlow wrote:

On Mon, Mar 15, 2010 at 3:14 PM, Jeremy Orlow  wrote:
On Sat, Mar 13, 2010 at 9:02 AM, Nikunj Mehta  wrote:
On Feb 18, 2010, at 9:08 AM, Jeremy Orlow wrote:
>> 2) In the spec, dynamic transactions and the difference between static and 
>> dynamic are not very well explained.
>> Can you propose spec text?
>> In 3.1.8, in the first paragraph, adding a sentence would probably be
>> good enough.  "If the scope 
>> is dynamic, the transaction may use any object stores or indexes in the 
>> database, but if another transaction touches any of the resources in a 
>> manner that could not be serialized by the implementation, a RECOVERABLE_ERR 
>> exception will be thrown on commit." maybe?
>> By the way, are there strong use cases for Dynamic transactions?  The more 
>> that I think about them, the more out of place they seem.
>> Dynamic transactions are commonplace in server applications. It 
>> follows naturally that client applications would want to use them. 
>> There are a LOT of things that are commonplace in server applications that 
>> are not in v1 of IndexedDB.
>> Consider the use case where you want to view records in entityStore A, 
>> while, at the same time, modifying another entityStore B using the records 
>> in entityStore A. Unless you use dynamic transactions, you will not be able 
>> to perform the two together.
>> ...unless you plan ahead.  The only thing dynamic transactions buy you is not 
>> needing to plan ahead about using resources.
>> The dynamic transaction case is particularly important when dealing with 
>> asynchronous update processing while keeping the UI updated with data.

I strongly agree that dynamic transactions are important. Funnily enough we 
were considering proposing the other extreme, and drop all the static modes in 
favor of dynamic. This is not only about being able to transport server-service 
code to the client, but more generally about supporting modes of operation 
where the complete set of objects you'll use in a transaction is dependent upon 
things you'll only find out as you process the transaction; this includes the 
particular case where your application will make decisions based on data on the 
same database, so there is no way to plan ahead short of locking the whole 

>> 1) Treat Dynamic transactions as "lock everything".
>> This is not consistent with the spec behavior. Locking everything is the 
>> static global scope.
>> I don't understand what you're trying to say in the second sentence.  And I 
>> don't understand how this is inconsistent with spec behavior--it's simply 
>> more conservative.

One of my main concerns around being overly conservative, and with the static 
locking model in general, is its impact on concurrency. While the client 
scenarios of IndexedDB don't have the same pressure for concurrency as server 
databases, things like synchronization and other background processing tasks do 
need a base level of concurrency to operate in a user-friendly way. 

>> 2) Implement MVCC so that dynamic transactions can operate on 
>> a consistent view of data.  (At times, we'll know a transaction is doomed 
>> long before commit, but we'll need to let it keep running since only 
>> .commit() can raise the proper error.)
>> MVCC is not required for dynamic transactions. MVCC is only required to open 
>> a database in the DETACHED_READ mode.
>> Since locks are acquired in the order in which they are requested, a failure 
>> could occur when an object store is being opened, but it is locked by 
>> another transaction. One doesn't have to wait until commit is invoked.
>> Am I missing something here?
>> If we really expect UAs to implement MVCC (or something else along those 
>> lines), I would expect other more advanced transaction concepts to be 
>> exposed. 
>> What precisely are you referring to? Why are these other more advanced 
>> transaction concepts required?
>> If we expect most v1 implementations to just use objectStore locks and thus 
>> use option 1, then is there any reason to include Dynamic transactions?
>> Why do you conclude that most implementations just use object store locks?

We were actually favoring use of the dynamic pattern. Note that other than the 
failure mode (which is a separate discussion we should have), you can do 
dynamic using regular locks instead of versioning if you follow the two-phase 
protocol[1]; that still results in a serializable schedule, although not with 
point-in-time consistency.
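
A minimal sketch of the two-phase idea with plain exclusive locks per object store: a transaction takes locks as it discovers which stores it needs (growing phase) and gives them all up together when it ends (shrinking phase). `StoreLocks` and its methods are invented for this example, and what happens on a conflict (wait, fail, abort) is left as a boolean return because the failure mode is the separate discussion noted above.

```typescript
class StoreLocks {
  // object store name -> id of the transaction holding its lock
  private owners = new Map<string, number>();

  // Growing phase: acquire a lock at the moment the store is first touched.
  acquire(txnId: number, storeName: string): boolean {
    const owner = this.owners.get(storeName);
    if (owner !== undefined && owner !== txnId) return false; // held by someone else
    this.owners.set(storeName, txnId);
    return true;
  }

  // Shrinking phase: release everything at once on commit or abort.
  // A transaction that has released must not acquire again.
  releaseAll(txnId: number): void {
    this.owners.forEach((owner, storeName) => {
      if (owner === txnId) this.owners.delete(storeName);
    });
  }
}

const locks = new StoreLocks();
const t1GotA = locks.acquire(1, "A");   // granted
const t2GotB = locks.acquire(2, "B");   // granted, different store
const t2GotA = locks.acquire(2, "A");   // refused while transaction 1 holds "A"
locks.releaseAll(1);                    // transaction 1 finishes
const t2RetryA = locks.acquire(2, "A"); // now granted
```

Because no lock is released before the end, the resulting schedule is serializable without keeping multiple versions of the data.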

More generally, I'm a bit worried about the number of options around 
transactions. I understand the goal of creating an "error free" model where 
once you succeed at starting a transaction you know you won't

RE: [IndexedDB] Lots of small nits and clarifying questions

2010-03-30 Thread Pablo Castro
On Tue, March 30, 2010 at 2:53 AM, Jeremy Orlow wrote:

>> On Tue, Mar 30, 2010 at 9:10 AM, Pablo Castro  
>> wrote:
>> Sorry for having disappeared for a while, "odata" was keeping me busy. I 
>> agree with all the clarifications listed in this thread that are required, 
>> so I won't redundantly mark each with "same here", but I have a few comments 
>> on one or two of them below.

>> On Mon, Mar 15, 2010 at 8:14 AM, Jeremy Orlow wrote:

>> On Sat, Mar 13, 2010 at 9:02 AM, Nikunj Mehta  wrote:
>> Thanks for your patience. Most questions below don't seem to need new spec 
>> text.

>> On Feb 18, 2010, at 9:08 AM, Jeremy Orlow wrote:

>> >> 1) Structured clone is going to change over time.  And, realistically, 
>> >> UAs won't support every type right away anyway.  What do we do when a 
>> >> value is inserted that we do not support?

>> >> We will evolve the text as and when the same evolves in WebStorage.

>> >> I don't know of any implementations which have moved away from only 
>> >> allowing strings within WebStorage.  I suspect that not 
>> >> fully supporting the structured clone algorithm as specced is one of the 
>> >> reasons for this.

>> >> As far as I can tell, you're essentially saying that fully supporting the 
>> >> structured clone algorithm is a pre-req for IndexedDB?  I guess I can't 
>> >> argue too much with that, but I'm not sure how realistic it is.  I know 
>> >> we only half support it at the moment in Chromium.
I have the same worry about structured clone. It's right in principle but I 
can't see implementations converging and that will just hurt interoperability. 
Unfortunately there doesn't seem to be a well-known middle-ground. JSON is way 
too restrictive (e.g. no Date). Should we consider defining a subset of 
structured clones that work (maybe something like Javascript primitives plus 
Date plus whatever extra we feel we should include such as perhaps File 

>> There is some precedent for what you suggest: the spec for LocalStorage 
>> already specifies that storing ImageData isn't allowed.  
>> ( see setItem 
>> section.)

>> On the other hand, I'm not sure I like the idea of each API supporting 
>> different subsets of the structured clone algorithm.  Even if all UAs 
>> support the same subset for each API, it still seems fairly confusing to web 
>> developers.  And I'm guessing that UAs won't be too keen on adding more 
>> complex control flow to their structured clone implementations to disallow 
>> different parts of the algorithm based on what it's using.  Thus any specced 
>> subset of the algorithm will probably need to be a MAY not a MUST.

>> I still think we should spec an error to be returned when the UA doesn't 
>> fully support the structured clone algorithm and thus can't handle the data 
>> provided.  I agree it's sub-optimal, but I think it's the pragmatic choice.  
>> Especially if the structured clone algorithm ever changes (and thus 
>> implementations can fall out of compliance with the spec).

I agree with that concern, but I also worry that we'll end up with UAs 
implementing different subsets and then developers having to settle for the 
lowest common denominator or doing a bunch of guesswork. Maybe we use 
structured clone but have some non-normative text that recommends a reasonable 
subset that we can all agree to implement consistently?
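
As a sketch of what such a recommended subset could look like in practice, the check below accepts primitives, Date, arrays and plain objects, and rejects everything else. The exact membership is an assumption made for illustration (nothing in this thread fixes the list), and `inSubset` is a made-up helper name.

```typescript
// Returns true when `value` only contains types from the hypothetical subset.
function inSubset(value: unknown, seen: Set<object> = new Set<object>()): boolean {
  if (value === null) return true;
  switch (typeof value) {
    case "string":
    case "number":
    case "boolean":
    case "undefined":
      return true;
    case "object":
      break;
    default:
      return false; // functions, symbols, bigint
  }
  const obj = value as object;
  if (obj instanceof Date) return true;
  if (seen.has(obj)) return true; // already being checked (cycle or shared reference)
  seen.add(obj);
  if (Array.isArray(obj)) return obj.every((item) => inSubset(item, seen));
  const proto = Object.getPrototypeOf(obj);
  if (proto !== Object.prototype && proto !== null) return false; // e.g. RegExp, Map
  const record = obj as Record<string, unknown>;
  return Object.keys(record).every((k) => inSubset(record[k], seen));
}

const okValue = inSubset({ subject: "hi", sent: new Date(0), tags: ["a", 1] });
const badValue = inSubset({ pattern: /x/ });
```

A UA could run a check like this before storing and raise the error discussed above when it fails, so applications see the same behavior everywhere for values inside the subset.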


RE: [IndexedDB] Promises (WAS: Seeking pre-LCWD comments for Indexed Database API; deadline February 2)

2010-03-30 Thread Pablo Castro
On Fri, Mar 12, 2010 at 7:26 AM, Jeremy Orlow wrote:

On Fri, Mar 12, 2010 at 3:23 PM, Jeremy Orlow  wrote:
On Fri, Mar 12, 2010 at 3:04 PM, Kris Zyp  wrote:

>> I believe computer science has clearly
>> observed the fragility of passing callbacks to the initial function
>> since it conflates the concerns of the operation with the asynchronous
>> notifications and consequently greatly complicates composability.

>> I don't understand this sentence.  I'm pretty sure that you can wrap any 
>> callback based API in JavaScript with a promise, deferred, etc. based API.  
>> As Nikunj mentioned earlier, we're more concerned about creating a small 
>> API surface area and sticking with well understood API designs rather than 
>> eliminating the need for libraries that wrap IndexedDB.
Trying to digest this thread, I think we've sort of gone full-circle with the 
whole promises thing. When looking at the code with the chained "then" pattern 
I just love the result, but it seems that we can't get all the way there (and 
nesting instead of chaining stuff kind of lacks the magic). My take is that 
either we get the really nice pattern by going all the way or we create a more 
traditional callback/events-based API and then we build promises on top. Things 
seem to indicate that frameworks are still cooking on promises, so it may be 
safe to stay with callbacks/events and just build libraries on top (I would 
have loved to have this be the thing that saved us from needing a library 
always...but it seems we'll fall just a bit short).

As for callbacks versus events, while now I'm starting to get used to the 
events hooked up to the result object after the call, the callbacks may be a 
more natural mechanism for this particular usage. I'm not sure why this is 
fundamentally broken...would love to see examples or reference. If that's the 
case, then events are the obvious choice.
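
For what "build promises on top" could look like, here is a minimal sketch of a library-side adapter over a callback-style call. `getAsync` is a hypothetical stand-in for an asynchronous store read (it is not a proposed IndexedDB method), and the example uses the Promise type that later became standard in ECMAScript rather than any of the promise libraries discussed in this thread.

```typescript
type Callback<T> = (error: Error | null, result?: T) => void;

// Hypothetical callback-based operation; completes on a later turn.
function getAsync(key: string, cb: Callback<string>): void {
  setTimeout(() => {
    if (key === "missing") cb(new Error("not found"));
    else cb(null, "value-for-" + key);
  }, 0);
}

// Generic adapter: turns an (arg, callback) function into one returning a Promise.
function promisify<A, T>(fn: (arg: A, cb: Callback<T>) => void): (arg: A) => Promise<T> {
  return (arg: A) =>
    new Promise<T>((resolve, reject) => {
      fn(arg, (error, result) => {
        if (error) reject(error);
        else resolve(result as T);
      });
    });
}

const getP = promisify(getAsync);

// Chained "then" style, layered on the callback API without changing it.
const chained = getP("a").then((v) => v.toUpperCase());
```

The underlying API stays small and callback/event based; the chaining lives entirely in the wrapper.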


RE: [IndexedDB] Lots of small nits and clarifying questions

2010-03-30 Thread Pablo Castro
Sorry for having disappeared for a while, "odata" was keeping me busy. I agree 
with all the clarifications listed in this thread that are required, so I won't 
redundantly mark each with "same here", but I have a few comments on one or two 
of them below. 

On Mon, Mar 15, 2010 at 8:14 AM, Jeremy Orlow wrote:

On Sat, Mar 13, 2010 at 9:02 AM, Nikunj Mehta  wrote:
Thanks for your patience. Most questions below don't seem to need new spec text.

On Feb 18, 2010, at 9:08 AM, Jeremy Orlow wrote:

>> 6) The specific ordering of elements should probably be specced including a 
>> mix of types.
>> Can you propose spec text for this? What do you think about the text 
>> in
>> If we're only adding long long for v1, then I think language similar to 
>> what's there now is probably OK.  But now that I think about it, I'm a bit 
>> concerned that we might be backing ourselves into a corner for the future.  
>> I also noticed that the sort order of JavaScript seems to order it numbers, 
>> strings, and then nulls (not strings, numbers, nulls).

>> I wonder if there is some other spec on sort order we can cite rather than 
>> rolling our own.

I really think that just doing long/strings won't do, even for v1. For 
non-primary-key indexes we'll need at least Date and number (not just integers) 
in addition to long/string. Without that there is no ordering by "date sent" 
for emails or "list price" for products or lots of other scenarios where you're 
caching data coming from a server.

>> 2) What happens when data mutates while you're iterating via a cursor?
>> This is covered by
>> That applies to two separate transactions.  As far as I can tell, it should 
>> be possible to have a cursor open and then delete an element that the cursor 
>> is currently traversing all within the same transaction.  Am I missing 
>> something?
I was assuming that within the same transaction you could change rows and those 
changes would be observable from open cursors. If it happens to be the current 
row then you won't be able to fetch it anymore but you can still move to the 
next one and continue scanning (and seeing any new changes that happened since 
you last moved).
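
A small in-memory sketch of the behavior I have in mind: the cursor remembers only its current key and re-seeks on every move, so deletes and inserts made in the same transaction are observed. `MemStore` and `MemCursor` are illustrative names and the linear scan in `nextKey` is for brevity, not a suggestion for implementations.

```typescript
class MemStore {
  rows = new Map<number, string>();

  // Smallest key strictly greater than `after`, or undefined at the end.
  nextKey(after: number): number | undefined {
    let best: number | undefined = undefined;
    this.rows.forEach((_value, k) => {
      if (k > after && (best === undefined || k < best)) best = k;
    });
    return best;
  }
}

class MemCursor {
  private position = -Infinity;

  constructor(private store: MemStore) {}

  // Seeks from the last position each time, so changes made since the
  // previous move are visible.
  moveNext(): boolean {
    const k = this.store.nextKey(this.position);
    if (k === undefined) return false;
    this.position = k;
    return true;
  }

  get key(): number {
    return this.position;
  }

  // Undefined when the current row was deleted underneath the cursor.
  get value(): string | undefined {
    return this.store.rows.get(this.position);
  }
}

const store = new MemStore();
store.rows.set(1, "a");
store.rows.set(2, "b");
store.rows.set(4, "d");

const cursor = new MemCursor(store);
cursor.moveNext();                       // at key 1
cursor.moveNext();                       // at key 2
store.rows.delete(2);                    // delete the current row
const currentGone = cursor.value === undefined;
store.rows.set(3, "c");                  // insert ahead of the cursor
cursor.moveNext();
const keyAfterDelete = cursor.key;       // the newly inserted row is seen
</gr_insert_placeholder>
```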

>> 1) Structured clone is going to change over time.  And, realistically, UAs 
>> won't support every type right away anyway.  What do we do when a value is 
>> inserted that we do not support?

>> We will evolve the text as and when the same evolves in WebStorage.

>> I don't know of any implementations which have moved away from only allowing 
>> strings within WebStorage.  I suspect that not fully supporting the 
>> structured clone algorithm as specced is one of the reasons for this.

>> As far as I can tell, you're essentially saying that fully supporting the 
>> structured clone algorithm is a pre-req for IndexedDB?  I guess I can't 
>> argue too much with that, but I'm not sure how realistic it is.  I know we 
>> only half support it at the moment in Chromium.

I have the same worry about structured clone. It's right in principle but I 
can't see implementations converging and that will just hurt interoperability. 
Unfortunately there doesn't seem to be a well-known middle-ground. JSON is way 
too restrictive (e.g. no Date). Should we consider defining a subset of 
structured clones that work (maybe something like Javascript primitives plus 
Date plus whatever extra we feel we should include such as perhaps File 


RE: [IndexedDB] Detailed comments for the current draft

2010-02-02 Thread Pablo Castro

On Mon, Feb 1, 2010 at 1:30 AM, Jeremy Orlow  wrote:

> > > 1. Keys and sorting

> > > a.       3.1.1:  it would seem that having also date/time values as keys 
> > > would be important and it's a common sorting criteria (e.g. as part of a 
> > > composite primary key or in general as an index key).

> > The Web IDL spec does not support a Date/Time data type. Could your use 
> > case be supported by storing the underlying time with millisecond precision 
> > using an IDL long long type? I am willing to change the spec so that it 
> > allows long long instead of long IDL type, which will provide adequate 
> > support for Date and time sorting.

> Can the spec not be augmented?  It seems like other specs like WebGL have 
> created their own types.  If not, I suppose your suggested change would 
> suffice as well.  This does seem like an important use case.
I agree, either we could augment the spec or we could describe it in terms of 
Javascript object values. That is, we can say something specific about the 
treatment of Javascript's Date object. Would that be possible? E.g. we could 
require implementations to provide full order for dates if they find an 
instance of that type in a path.
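
As a sketch of what a full order for Date values could mean, the comparison below orders dates by their time value (milliseconds since the epoch, via the standard `Date.prototype.getTime`). That is the same number the "long long" encoding suggested above would store, so both approaches give the same order. The `Email` shape and sample data are invented for the example.

```typescript
// Order Date keys by their time value in milliseconds since the epoch.
function compareDates(a: Date, b: Date): number {
  return a.getTime() - b.getTime();
}

interface Email {
  subject: string;
  sent: Date;
}

const inbox: Email[] = [
  { subject: "third", sent: new Date(Date.UTC(2010, 1, 2)) },
  { subject: "first", sent: new Date(Date.UTC(2010, 0, 26)) },
  { subject: "second", sent: new Date(Date.UTC(2010, 1, 1)) },
];

// What an index on the "sent" path would give a cursor, oldest first.
const bySent = inbox.slice().sort((x, y) => compareDates(x.sent, y.sent));
const subjects = bySent.map((e) => e.subject);
```

Describing this in terms of the Javascript Date object spares applications from converting to and from numbers on every read and write.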

> > > b.      3.1.1: similarly, sorting on number in general (not just 
> > > integers/longs) would be important (e.g. price lists, scores, etc.)

> > I am once again hampered by Web IDL spec. Is it possible to leave this for 
> > future versions of the spec?

Actually Web IDL does define the "double" type and its Javascript binding. Can 
we add double to the list of types an index can be applied to?

> > > c.       3.1.1: cross type sorting and sorting of long values are clear. 
> > > Sorting of strings however needs more elaboration. In particular, which 
> > > collation do we use? Does the user or developer get to choose a 
> > > collation? If we pick up a collation from the environment (e.g. the OS), 
> > > if the collation changes we'd have to re-index all the databases.

> > I propose to use Unicode collation algorithm, which was also suggested by 
> > Jonas during a conversation.

I don't think this is specific enough, in that it still doesn't say which 
collation tables to use and how to specify them. A single collation strategy 
won't do for all languages (it'll range from slightly wrong to nonsense 
depending on the target language). This is a trickier area than I had 
initially thought. We'll bake on this a bit and get back to this group with 

> > > d.      3.1.3: spec reads ".key path must be the name of an enumerated 
> > > property."; how about composite keys (would make the related APIs take a 
> > > DOMString or DOMStringList)

> > I prefer to leave composite keys to a future version.

I don't think we can get away with this. For indexes this is quite common (if 
nothing else, to have stable ordering when the prefix of the index has 
repeats). Once we have it for indexes the delta for having it for primary keys 
as well is pretty small (although I wouldn't oppose leaving out composite 
primary keys if that would help scope the feature).
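
To show the stable-ordering point, here is a sketch of comparing composite keys component by component. `Part` and `compareComposite` are made-up names, and ranking numbers below strings when components differ in type is an assumption for this example only.

```typescript
type Part = number | string;

// Compare component by component; the first difference decides.
function compareComposite(a: Part[], b: Part[]): number {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    const x = a[i];
    const y = b[i];
    if (x === y) continue;
    if (typeof x === "number" && typeof y === "number") return x - y;
    if (typeof x === "string" && typeof y === "string") return x < y ? -1 : 1;
    return typeof x === "number" ? -1 : 1; // mixed types: numbers first (assumption)
  }
  return a.length - b.length; // a key that is a prefix of another sorts first
}

// Index on [lastName, id]: the id breaks ties between equal last names.
const entries: Part[][] = [
  ["smith", 7],
  ["jones", 3],
  ["smith", 2],
];
entries.sort(compareComposite);
```

Without the second component, the two "smith" rows would have no defined relative order, so a scan could return them differently from run to run.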

> > > b.      Query processing libraries will need temporary stores, which need 
> > > temporary names. Should we introduce an API for the creation of temporary 
> > > stores with transaction lifetime and no name?

> > Firstly, I think we can leave this safely to a future version. Secondly, my 
> > suggestion would be to provide a parameter to the create call to indicate 
> > that an object store being created is a transient one, i.e., not backed by 
> > durable storage. They could be available across different transactions. If 
> > your intention is to make these object stores unavailable across 
> > connections, then we can also offer a connection-specific transient object 
> > store.

> > In general, it requires us to introduce the notion of create params, which 
> > would simplify the evolution of the API. This is also similar to how 
> > Berkeley DB handles various options, not just those related to creation of 
> > a Berkeley "database".

Let's see how we progress on this one, and maybe revisit it a bit later. I'm 
worried about code that wants to do things such as a block-sort that needs to 
spill to disk, as it would have to either use some pattern or ask the user for 
temp table names.

> > > c.      It would be nice to have an estimate row count on each store. 
> > > This comes at an implementation and runtime cost. Strong opinions? 
> > > Lacking everything else, this would be the only statistic to base 
> > > decisions on for a query processor.

> > I believe we need to have a general way of estimating the number of records 
> > in a cursor once a key range has been specified. Kris Zyp also brings this 
> > up in a separate email. I am willing to add an estimateCount attribute to 
> > IDBCursor for this.

EstimateCount sounds good.

> > > d.      The draft does not touch on how applications would do optimistic 
> > > concurrency. A common way 

RE: Seeking pre-LCWD comments for Indexed Database API; deadline February 2

2010-02-01 Thread Pablo Castro
A few comments inline marked with [PC].

From: [] On 
Behalf Of Nikunj Mehta
Sent: Sunday, January 31, 2010 11:37 PM
To: Kris Zyp
Cc: Arthur Barstow; public-webapps
Subject: Re: Seeking pre-LCWD comments for Indexed Database API; deadline 
February 2

On Jan 27, 2010, at 1:46 PM, Kris Zyp wrote:


A few comments I've been meaning to suggest:

* count on KeyRange - Previously I had asked if there would be a way
to get a count of the number of objects within a given key range. The
addition of the KeyRange interface seems to be a step towards that,
but the cursor generated with a KeyRange still only provides a "count"
property that returns "the total number of objects that share the
current key". There is still no way to determine how many objects are
within a range. Was the intent to make "count" return the number of
objects in a KeyRange and the wording is just not up to date?
Otherwise could we add such a count property (countForRange maybe, or
have a count and countForKey, I think Pablo suggested something like

I agree with the concept. I have doubts about implementation success. However, 
I will include this in the editor's draft.

[PC] I agree with Nikunj. I suspect that implementations will have to just 
compute the count, as it's unlikely that updating intermediate nodes in the 
tree for each update would be desired (to try to maintain extra information 
for fast range size computation). At that point it's almost the same as user 
code iterating over the range (modulo the Javascript interface overhead). I'm 
also not sure how often you'd use this, as it would only work on simple 
conditions (no composite expressions, no functions in expressions) that happen 
to have an index.
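
[PC] For comparison, this is roughly what the user-code version looks like: walk the keys in order and count the ones inside the range. The sorted array stands in for a cursor over an index, and `countInRange` is a made-up helper name, not a proposed API.

```typescript
// Count keys in the closed range [lower, upper] by walking them in order.
function countInRange(keysInOrder: number[], lower: number, upper: number): number {
  let count = 0;
  for (const k of keysInOrder) {
    if (k > upper) break; // sorted input: nothing further can match
    if (k >= lower) count++;
  }
  return count;
}

const priceIndex = [5, 10, 10, 20, 35, 50];
const inRange = countInRange(priceIndex, 10, 35); // 4
```

The cost is proportional to the number of keys visited, which is what an implementation that just computes the count would pay as well.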

* Use promises for async interfaces - In server side JavaScript, most
projects are moving towards using promises for asynchronous interfaces
instead of trying to define the specific callback parameters for each
interface. I believe the advantages of using promises over callbacks
are pretty well understood in terms of decoupling async semantics from
interface definitions, and improving encapsulation of concerns. For
the indexed database API this would mean that sync and async
interfaces could essentially look the same except sync would return
completed values and async would return promises. I realize that
defining a promise interface would have implications beyond the
indexed database API, as the goal of promises is to provide a
consistent interface for asynchronous interaction across components,
but perhaps this would be a good time for the W3C to define such an
API. It seems like the indexed database API would be a perfect
interface to leverage promises. If you are interested in a proposal,
there is one from CommonJS here [1] (the get() and call() wouldn't
apply here). With this interface, a promise.then(callback,
errorHandler) function is the only function a promise would need to

Thanks for the pointer. I will look in to this as even Pablo had related 
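
For illustration only, a minimal sketch of how a callback-style request could be adapted to a then(callback, errorHandler) shape. It uses the Promise built into today's JavaScript engines, which postdates this thread and is not the CommonJS proposal itself; makeFakeRequest and toPromise are hypothetical helpers.

```javascript
// Hypothetical request object in the style discussed on this list:
// the caller assigns onsuccess/onerror and the implementation calls one later.
function makeFakeRequest(value, shouldFail) {
  var request = { result: undefined, onsuccess: null, onerror: null };
  setTimeout(function () {
    if (shouldFail) {
      if (request.onerror) request.onerror(new Error("lookup failed"));
    } else {
      request.result = value;
      if (request.onsuccess) request.onsuccess();
    }
  }, 0);
  return request;
}

// Adapt such a request to a promise, so callers only need then().
function toPromise(request) {
  return new Promise(function (resolve, reject) {
    request.onsuccess = function () { resolve(request.result); };
    request.onerror = function (err) { reject(err); };
  });
}

toPromise(makeFakeRequest({ name: "Ann" }, false)).then(
  function (record) { console.log("got", record.name); },
  function (err) { console.log("failed", err.message); }
);
```

With an adapter like this, the interface definition no longer has to spell out callback parameters per method; each operation just returns something with a then() function.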


and a comment on this:
On 1/26/2010 1:47 PM, Pablo Castro wrote:
> 11. API Names
> a.   "transaction" is really non-intuitive (particularly given
> the existence of currentTransaction in the same class).
> "beginTransaction" would capture semantics more accurately.
> b.   ObjectStoreSync.delete: delete is a Javascript keyword, can we use
> "remove" instead?
I'd prefer to keep both of these as is. Since commit and abort are
part of the transaction interface, using transaction() to denote the
transaction creator seems brief and appropriate. As far as
ObjectStoreSync.delete, most JS engines have or should be contextually
reserving "delete". I certainly prefer delete in preserving the
familiarity of REST terminology.
[PC] I understand the term familiarity aspect, but this seems to be something 
that would just cause trouble. From a quick check with the browsers I had at 
hand, both IE8 and Safari 4 reject scripts where you try to add a method called 
"delete" to an object's prototype. Natively-implemented objects may be able to 
work-around this but I see no reason to push it. remove()  is probably equally 
intuitive. Note that the method "continue" on async cursors is likely to have 
the same issue, as continue is also a Javascript keyword.
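
A small sketch of the naming issue, using a hypothetical ObjectStore object that is not the draft's interface. ECMAScript 3 parsers reject a reserved word after the dot in a property access, while bracket notation is accepted everywhere; engines that implement ECMAScript 5 accept both forms.

```javascript
// A hypothetical object store, used only to show the naming issue.
function ObjectStore() { this.records = {}; }

// Bracket notation is accepted even by ES3-era parsers, which reject
// "ObjectStore.prototype.delete = ..." because delete is a reserved word.
ObjectStore.prototype["delete"] = function (key) {
  var existed = Object.prototype.hasOwnProperty.call(this.records, key);
  delete this.records[key];
  return existed;
};

// A non-reserved alias avoids the problem in every engine.
ObjectStore.prototype.remove = ObjectStore.prototype["delete"];

var store = new ObjectStore();
store.records["k1"] = { name: "Ann" };
console.log(store.remove("k1"));    // true
console.log(store["delete"]("k1")); // false, already gone
```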


--
Kris Zyp
(503) 806-1841




[IndexedDB] Detailed comments for the current draft

2010-01-26 Thread Pablo Castro
These are notes that we collected both from reviewing the spec (editor's draft 
up to Jan 24th) and from a prototype implementation that we are working on. I 
didn't realize we had this many notes, otherwise I would have been sending 
intermediate notes earlier. Will do so next round.

1. Keys and sorting

a.   3.1.1: it would seem that also having date/time values as keys would 
be important, as it's a common sorting criterion (e.g. as part of a composite 
primary key or in general as an index key).
b.  3.1.1: similarly, sorting on number in general (not just 
integers/longs) would be important (e.g. price lists, scores, etc.)
c.   3.1.1: cross type sorting and sorting of long values are clear. 
Sorting of strings however needs more elaboration. In particular, which 
collation do we use? Does the user or developer get to choose a collation? If 
we pick up a collation from the environment (e.g. the OS), if the collation 
changes we'd have to re-index all the databases.
d.  3.1.3: spec reads "…key path must be the name of an enumerated 
property…"; how about composite keys (would make the related APIs take a 
DOMString or DOMStringList) 
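
To make the collation question in 1c concrete, here is a small sketch comparing binary ordering with locale-aware ordering. It assumes an engine that provides Intl.Collator; the locale tags "de" and "sv" are just examples, and the locale-aware results depend on the collation data the engine ships, so they are not stated here.

```javascript
var names = ["z", "ä", "a", "Z"];

// Binary (code unit) ordering: fully defined by the language, same everywhere.
function binaryCompare(a, b) { return a < b ? -1 : a > b ? 1 : 0; }
var binaryOrder = names.slice().sort(binaryCompare);
console.log(binaryOrder.join(", ")); // Z, a, z, ä

// Locale-aware ordering: depends on the collation chosen and on the
// collation data available to the engine.
var germanOrder = names.slice().sort(new Intl.Collator("de").compare);
var swedishOrder = names.slice().sort(new Intl.Collator("sv").compare);
console.log(germanOrder.join(", "));
console.log(swedishOrder.join(", "));
```

An index built under one of these orderings is not valid under another, which is why a collation picked up from the environment would force a re-index whenever it changes.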

2. Values

a.   3.1.2: isn't the requirement for "structured clones" too much? It 
would mean implementations would have to be able to store and retrieve File 
objects and such. Would it be more appropriate to say it's just graphs of 
Javascript primitive objects/values (object, string, number, date, arrays, 

3. Object store

a.   3.1.3: do we really need in-line + out-of-line keys? Besides the 
concept-count increase, we wonder whether out-of-line keys would cause trouble 
to generic libraries, as the values for the keys wouldn't be part of the values 
iterated when doing a "foreach" over the table.
b.  Query processing libraries will need temporary stores, which need 
temporary names. Should we introduce an API for the creation of temporary 
stores with transaction lifetime and no name?
c.  It would be nice to have an estimated row count on each store. This 
comes at an implementation and runtime cost. Strong opinions? Lacking 
anything else, this would be the only statistic to base decisions on for a 
query processor. 
d.  The draft does not touch on how applications would do optimistic 
concurrency. A common way of doing this is to use a timestamp value that's 
automatically updated by the system every time someone touches the row. While 
we don't feel it's a must have, it certainly supports common scenarios.

4. Indexes

a.   3.1.4 mentions "auto-populated" indexes, but then there is no mention 
of other types. We suggest that we remove this and in the algorithms section 
describe side-effecting operations as always updating the indexes as well.
b.  If during insert/update the value of the key is not present (i.e. 
undefined as opposed to null or a value), is that a failure, does the row not 
get indexed, or is it indexed as null? Failure would probably cause a lot of 
trouble to users; the other two have correctness problems. An option is to 
index them as undefined, but now we have undefined and null as indexable keys. 
We lean toward this last option. 
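
A sketch of the last option in 4b, under the assumption that key paths are dotted property names. evaluateKeyPath and indexKeyFor are hypothetical helpers, and the { kind: ... } wrapper is only one way to keep the two "empty" keys apart.

```javascript
// Resolve a dotted key path such as "address.city" against a value.
// Returns undefined when any step of the path is missing.
function evaluateKeyPath(value, keyPath) {
  var parts = keyPath.split(".");
  var current = value;
  for (var i = 0; i < parts.length; i++) {
    if (current === null || current === undefined) return undefined;
    current = current[parts[i]];
  }
  return current;
}

// Index every row, keeping undefined and null as two distinct indexable keys.
function indexKeyFor(row, keyPath) {
  var key = evaluateKeyPath(row, keyPath);
  if (key === undefined) return { kind: "undefined" };
  if (key === null) return { kind: "null" };
  return { kind: "value", value: key };
}

console.log(indexKeyFor({ name: "Ann", email: null }, "email").kind); // null
console.log(indexKeyFor({ name: "Bob" }, "email").kind);              // undefined
console.log(indexKeyFor({ address: { city: "Oslo" } }, "address.city").value); // Oslo
```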

5. Databases

a.   Not being able to enumerate databases gets in the way of creating good 
tools and frameworks such as database explorers. What was the motivation for 
this? Is it security related?
b.  Clarification on transactions: all database operations that affect the 
schema (create/remove store/index, setVersion, etc.) as well as data 
modification operations are assumed to be auto-commit by default, correct? 
Furthermore, all those operations (both schema and data) can happen within a 
transaction, including mixing schema and data changes. Does that line up with 
others' expectations? If so we should find a spot to articulate this explicitly.
c.   No way to delete a database? It would be reasonable for applications 
to want to do that and let go of the user data (e.g. a "forget me" feature in a 
web site)

6. Transactions

a.   While we understand the goal of simplifying developers' life with an 
error-free transactional model, we're not sure if we're doing more harm by 
introducing more concepts into this space. Wouldn't it be better to use regular 
transactions with a well-known failure mode (e.g. either deadlocks or 
optimistic concurrency failure on commit)?
b.  In auto-commit mode, if two cursors are opened at the same time (e.g. 
to scan them in an interleaved way), are they in independent transactions 
simultaneously active in the same connection?

7. Algorithms

a.   3.2.2: steps 4 and 5 are inverted in order.
b.  3.2.2: when there is a key generator and the store uses in-line keys, 
should the generated key value be propagated to the original object (in 
addition to the clone), such that both are in sync after the put operation?
c.   3.2.3: step 2, probably editorial mistake? Wouldn't all indexes have 

RE: IndexedDB and MVCC

2010-01-18 Thread Pablo Castro
Hi Chris,

> -Original Message-
> From: [mailto:public-webapps-
>] On Behalf Of Chris Anderson
> Sent: Friday, January 15, 2010 11:14 AM
> To: public-webapps WG
> Subject: IndexedDB and MVCC
> Hi,
> I've been reading the new IndexedDB spec as published here:
> My first impression is that this is simpler than WebSimpleDB, but not too
> simple. I'm happy to see detached readers being mentioned.
> There's one other piece of the concurrency story that could be useful.
> In section 3.2.2 Object Store Storage steps
> step 7: If the no-overwrite flag was passed to these steps and is set,
> and a record already exists with its key being key, then terminate
> these steps and set error code CONSTRAINT_ERR.
> I think it wouldn't add much complexity to use a compare-and-swap
> pattern, instead of a no-write-if-exists pattern. This would allow for
> better concurrency via optimistic updates, and look a lot like HTTP
> etags.

Wouldn't these be different scenarios? The purpose of the flag is to help in 
scenarios where you only want to create a new item and never overwrite an 
existing one. What you're describing seems to be oriented towards the case 
where you're updating an existing item, have an optimistic concurrency token, 
and want to use it to check for conflicts before the update goes through. 

You definitely make a good point about the fact that the current document 
doesn't touch on how applications would handle optimistic concurrency. One way 
would be to build-in support for it (as you suggest, an optional path for the 
concurrency token, and perhaps also a timestamp sort of thing that gets 
automatically updated). Alternatively application code could do the 
check-and-update-or-fail deal within a transaction. 
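
To show how the two patterns relate, here is a minimal in-memory sketch of a check-and-update-or-fail write keyed on a version token. VersionedStore and putIfMatch are hypothetical names, not part of the draft; against a real store, the read, the comparison and the write would all have to happen inside one transaction.

```javascript
// Hypothetical in-memory store; each record carries a version token
// (playing the role of an ETag or MVCC revision).
function VersionedStore() { this.records = new Map(); }

VersionedStore.prototype.get = function (key) {
  return this.records.get(key); // { value, version } or undefined
};

// Write only if the caller's token matches the stored one.
// expectedVersion === undefined means "create only if absent",
// which is the behavior the no-overwrite flag gives you.
VersionedStore.prototype.putIfMatch = function (key, value, expectedVersion) {
  var current = this.records.get(key);
  var currentVersion = current ? current.version : undefined;
  if (currentVersion !== expectedVersion) {
    return { ok: false, version: currentVersion }; // conflict, caller decides
  }
  var nextVersion = (currentVersion || 0) + 1;
  this.records.set(key, { value: value, version: nextVersion });
  return { ok: true, version: nextVersion };
};

var store = new VersionedStore();
console.log(store.putIfMatch("doc", { n: 1 }, undefined).ok); // true  (created)
console.log(store.putIfMatch("doc", { n: 2 }, undefined).ok); // false (already exists)
console.log(store.putIfMatch("doc", { n: 2 }, 1).ok);         // true  (token matched)
console.log(store.putIfMatch("doc", { n: 3 }, 1).ok);         // false (stale token)
```

Seen this way, no-overwrite is the special case where the expected token is "no record yet", and compare-and-swap generalizes it to any token.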

> It could be accomplished by allowing an object store to take a
> key-path for the update-token. Then subsequent updates could require
> that the key-path match. (Some additional complexity: we'd need the
> ability to check for a matching update-token, then change it, in a
> transaction).
> CouchDB uses an MVCC token that must match to allow updates. This
> allows us to avoid locking. But even more important is the parallels
> we have with HTTP Etags (if-match for idempotence, if-none-match for
> caching).
> The CouchDB style of MVCC can be accomplished by updates in a
> compare-and-swap transaction, so technically I can do what I want in
> the spec as it stands. But I still think the parallels to HTTP etags
> can be instructive.

Out of curiosity: if you were to layer CouchDB on top of IndexedDB, would you 
always just use the dynamic locking mode, or do you actually have use for the 
other options offered?

I ask because I'm seriously concerned that the extra modes will add to the 
overall concept count in an attempt to simplify the use of transactions, and 
don't really simplify the end to end.

> Chris
> --
> Chris Anderson


[WebSimpleDB] Introduce a pause/resume pattern for coordinated access to multiple stores

2009-12-22 Thread Pablo Castro
Whenever we take a callback that's to be called for each item in a set (e.g. 
with a .forEach(callback) pattern), we need a way to indicate to the system 
whether it's ok to move to the next row and invoke the next callback or not. 
Otherwise, in scenarios where the callback itself performs an operation that 
doesn't finish immediately (such as another database async call) the system 
will keep queuing up top-level callbacks, which in turn may queue up more 
callbacks as part of their implementation, and execution will be in "some 
order" that's very hard to predict at best.

This comes up in several contexts. Applications will often need to scan more 
than one object store in coordination. Query processors will also need this 
when implementing physical operators for joins and such. A different context 
would be a system that needs to submit an HTTP request per row, where you may 
want to use an XMLHttpRequest and unwind after calling open. While the HTTP 
request is in flight you don't want to move to the next row.

In most cases one of the key aspects is that we need separate components to 
work cooperatively as they pull rows from one or multiple scans, and there 
needs to be a way of controlling the advance of cursors through the rows.

We would like to introduce "pause" and "resume" functions for scans to support 
this. Since there is no obvious place to put this right now, we could introduce 
an "iterator" object that can be used to control things related to the current 
state of the iteration as of when the callback happens, or maybe this is the 
cursor itself.

The resulting code would look like this (the example uses the 
single-async-level pattern we're playing around with, but these two are 
actually independent things):

async_db.forEachObjectInStore("people", function(person, iteration) {
  iteration.pause(); // we won't be done with 'person' until later...
  var request = async_db.getFromStore("people", person.managerId);
  request.onsuccess = function() {
    var manager = request.result;
    // Do something with both 'person' and 'manager', and now we're ready to
    // process the next person.
    iteration.resume();
  };
});

The nice thing about adding these as methods on the side is that it's 
completely out of sight in simple scenarios where you may be just scanning to 
build some HTML for example. Only if you're doing multiple coordinated, async 
tasks do you need to know about these functions.
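
As a sketch of how a scan could honor pause and resume, here is a small driver over an in-memory array. forEachWithPause, its onDone parameter and the sample data are all hypothetical; a real implementation would pull rows from a cursor rather than an array.

```javascript
// Calls callback(item, iteration) for each item, in order. The callback may
// call iteration.pause() to hold the scan and iteration.resume() to continue.
function forEachWithPause(items, callback, onDone) {
  var index = 0;
  var paused = false;
  var running = false; // true while the loop below is on the stack

  var iteration = {
    pause: function () { paused = true; },
    resume: function () {
      paused = false;
      if (!running) pump(); // restart only if the loop has unwound
    }
  };

  function pump() {
    running = true;
    while (!paused && index < items.length) {
      callback(items[index++], iteration);
    }
    running = false;
    if (!paused && index >= items.length && onDone) {
      var done = onDone;
      onDone = null; // report completion only once
      done();
    }
  }

  pump();
}

// Usage: each row waits for a simulated async lookup before the next is read.
var people = [{ name: "Ann" }, { name: "Bob" }, { name: "Cy" }];
var pending = []; // stands in for in-flight async requests
var log = [];

forEachWithPause(people, function (person, iteration) {
  iteration.pause();
  pending.push(function () {
    log.push(person.name);
    iteration.resume();
  });
}, function () { log.push("done"); });

while (pending.length > 0) pending.shift()(); // complete the lookups one by one
console.log(log.join(", ")); // Ann, Bob, Cy, done
```

The running flag is there so that a callback which pauses and resumes synchronously does not re-enter the loop.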


RE: [WebSimpleDB] Allowing schema operations anywhere

2009-12-22 Thread Pablo Castro
My apologies for my late reply, I've been out for a while.

> -Original Message-
> From: Nikunj R. Mehta []
> Sent: Friday, December 11, 2009 10:47 AM
> To: WG
> Cc: Pablo Castro
> Subject: Re: [WebSimpleDB] Allowing schema operations anywhere
> I have gone ahead and updated the spec to allow option B (only).
> Please take a look.

Option B makes sense, as without it there is a class of algorithms that cannot 
be implemented or it would be quite difficult to do so (e.g. a "sort" type of 
construct a query language might want to support wouldn't be possible without a 
backing index). 

This certainly means versioning becomes the responsibility of the app/library 
and not the user agent. This makes sense to me, given that not all schema 
changes are really version changes (e.g. creation of a spill-to-disk table 
shouldn't bump up the database version).


> Nikunj
> On Dec 8, 2009, at 10:14 AM, Nikunj R. Mehta wrote:
> > Hi Pablo,
> >
> > Sorry for the long delay in responding to your comments. Hopefully, we
> > can continue the discussion now.
> >
> > Schema changes interact with the locking model of the database. As I
> > see it, here are several ways in which the API could be designed and
> > the consequences of doing so:
> >
> > A. Allow schema changes inside a metadata transaction which can only
> > be performed at connection time
> > B. Allow schema changes inside a data transaction, which can be
> > performed any time a connection is open
> > C. Allow schema changes inside a metadata transaction, which can be
> > performed any time a connection is open
> >
> > Option A's disadvantages are that metadata manipulation cannot be
> > combined with data changes. Moreover, version numbers are no longer
> > issued by the application but rather by a user agent.
> >
> > Option A's advantages are that resource acquisition is simplified and
> > deadlocks can be avoided considering that a connection acquires and
> > releases the metadata resource in a consistent sequence. Another
> > upside is that version number maintenance is automated.
> >
> > Option B's main disadvantage is that there is no real notion of
> > version that can be managed by the user agent. Another is that
> > deadlocks could occur because there is no a priori declaration of
> > intent about metadata modification. This could be remedied by
> > including the database itself in the list of objects that are intended
> > to be modified in the transaction.
> >
> > Option B's advantages are closer interleaving of and atomic metadata
> > changes with data changes, and application controlled version numbers
> > used for the database.
> >
> > Option C's disadvantage is that data and metadata changes cannot be
> > interleaved atomically.
> >
> > Option C's advantages are that deadlocks can be avoided and version
> > number management can be performed  by an application.
> >
> > Overall, I think version management and metadata changes are exclusive
> > in some sense. IOW, if we want Option B and Option C, then we have to
> > remove the connection time version check.
> >
> > Hope that helps. Please feel free to add if I missed anything.
> >
> > Nikunj
> >
> > On Nov 22, 2009, at 3:14 PM, Pablo Castro wrote:
> >
> >> We are finding a number of reasons for wanting to create tables on
> >> the fly, and without bumping up the database version. A few examples:
> >> - Packaged components that create side tables to maintain their own
> >> state
> >> - Query processors often need to "spill to disk" during query
> >> execution. For example, sorting large sets requires storing temporary
> >> sets of rows on disk to be merged later.
> >>
> >> So we're thinking it would be better to have these methods directly
> >> in the DatabaseSync/DatabaseAsync objects (with proper corresponding
> >> patterns), instead of their current location in the Upgrade
> >> interface.
> >>
> >> For the common case where several schema changes need to be done
> >> atomically, developers can simply wrap the calls in a transaction,
> >> as they would do for regular data manipulation.
> >>
> >> We would need an extra method to bump up the version explicitly, as
> >> that would no longer be in the upgrade callback.
> >>
> >> Does this seem reasonable?
> >>
> >> Regards,
> >> -pablo
> >>
> >>
> >
> > Nikunj
> >
> >
> >
> >
> >
> Nikunj

[WebSimpleDB] Allowing schema operations anywhere

2009-11-22 Thread Pablo Castro
We are finding a number of reasons for wanting to create tables on the fly, and 
without bumping up the database version. A few examples:
- Packaged components that create side tables to maintain their own state
- Query processors often need to "spill to disk" during query execution. For 
example, sorting large sets requires storing temporary sets of rows on disk to 
be merged later.
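
To illustrate the "spill to disk" case, here is a sketch of an external merge sort that parks sorted runs in temporary stores and merges them afterwards. The tempStores map, createTempStore and the run size are hypothetical stand-ins; in a real implementation each run would live in a store created on the fly, without a version bump.

```javascript
// Hypothetical temporary stores: name -> array of rows.
var tempStores = new Map();
var nextTempId = 0;

function createTempStore(rows) {
  var name = "temp-" + (nextTempId++);
  tempStores.set(name, rows);
  return name;
}

// Phase 1: cut the input into runs that fit in memory, sort each run,
// and spill it to a temporary store.
function spillSortedRuns(rows, runSize, compare) {
  var names = [];
  for (var i = 0; i < rows.length; i += runSize) {
    names.push(createTempStore(rows.slice(i, i + runSize).sort(compare)));
  }
  return names;
}

// Phase 2: merge the runs by repeatedly taking the smallest head row.
function mergeRuns(names, compare) {
  var runs = names.map(function (n) { return tempStores.get(n); });
  var positions = runs.map(function () { return 0; });
  var output = [];
  while (true) {
    var best = -1;
    for (var r = 0; r < runs.length; r++) {
      if (positions[r] >= runs[r].length) continue; // run exhausted
      if (best === -1 ||
          compare(runs[r][positions[r]], runs[best][positions[best]]) < 0) {
        best = r;
      }
    }
    if (best === -1) break; // all runs exhausted
    output.push(runs[best][positions[best]++]);
  }
  names.forEach(function (n) { tempStores.delete(n); }); // drop the temp stores
  return output;
}

function byNumber(a, b) { return a - b; }
var runNames = spillSortedRuns([9, 4, 7, 1, 8, 2, 6], 3, byNumber);
console.log(mergeRuns(runNames, byNumber).join(", ")); // 1, 2, 4, 6, 7, 8, 9
```

The temporary stores exist only for the duration of the query, which is why tying their creation to a database version change would be a poor fit.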

So we're thinking it would be better to have these methods directly in the 
DatabaseSync/DatabaseAsync objects (with proper corresponding patterns), 
instead of their current location in the Upgrade interface.

For the common case where several schema changes need to be done atomically, 
developers can simply wrap the calls in a transaction, as they would do for 
regular data manipulation.

We would need an extra method to bump up the version explicitly, as that would 
no longer be in the upgrade callback.

Does this seem reasonable?


[WebSimpleDB] Flatting APIs to simplify primary cases

2009-11-19 Thread Pablo Castro
We're busy creating experimental implementations of WebSimpleDB to both 
understand what it takes to implement and also to see what the developer 
experience looks like. 

As we started to write "application code" against the API (particularly the 
async one) the first thing that stood out is the fact that you need two levels 
of nested callbacks for everything. While the current factoring of the API makes 
sense on the design board, it's kind of noisy in app code. For example:

// assume you already have a database opened in dbReq 
var html = ""; 
var storeReq = new ObjectStoreRequest(dbReq.database);
storeReq.success = function() {
  var cursorReq = new CursorRequest(;
  cursorReq.callback = function(key, cursor, value) {
    html += "" + value.Name + "";
  };
  cursorReq.onsuccess = function(r) {
    document.getElementById("output").innerHTML = html + "";
  };
};

One option that we would like to explore is to "flatten" the API, so most 
common methods are straight in the database class. This trades off some of the 
factoring in favor of usability for common cases using the async API.

The change would span a couple of aspects:

1. Move operations from object store interface and the index interface into the 
Database interface.

Accessing indexes and stores through specialized objects is problematic for the 
following reasons:
- It's always the case that we need to consider when objects are invalidated 
because something changes from underneath them, for example a schema change. So 
for example, if there is an explicit store object, then when the store is 
dropped we need to consider what is valid/invalid and what its failure points 
and modes are. By not having a standalone store object, we significantly reduce 
the "gotchas" to consider.
- From a usability perspective, it's simpler to work with a store in a single 
step, rather than having to open it first and then work with it (see patterns 
below with a single request and one DBRequest object).
- With no "two-step" access pattern, the API has one less level of 
asynchronicity, as effectively the table lookup + operation are atomic within 
the store. This also consolidates all operations with an async variant in a 
single interface (the Database), which is a great simplification for 

var html = "";
var request = asyncDb.forEachStoreObject("contacts", function(row) {
  html += "" + row.Name + "";
});
request.onsuccess = function(r) {
  document.getElementById("output").innerHTML = html + "";
};

In moving the operations, it's probably best to rename them to something more 
descriptive, so we can have for example 'getFromStore(storeName, key)' and 
'getFromIndex(storeName, indexName, key)'. This also helps in that 'delete' 
won't collide with the Javascript keyword.

Note that the store and index interfaces are still around to provide metadata, 
but at this point they behave as simple read-only snapshots.

2. Generalize the use of DBRequest, add a 'result' member to it and have all 
asynchronous operations be initiated from a DatabaseAsync interface.

As a result of the previous changes, all operations that have an async 
counterpart should now exist on the DatabaseAsync interface. Rather than having 
multiple types of requests depending on the target object, it is possible to 
have operations on a DatabaseAsync interface that provide a uniform invocation 
and handling programming pattern.

This gives a nice pattern for understanding how a sync API maps to an async API.

So for example:

var record = db.getFromStore("store", key);
// use record...

var request = asyncDb.getFromStore("store", key);
request.onsuccess = function(req) {
  var record = req.result;
  // use record...
};

We could include more data in DBRequest or DBRequest.result as needed if in 
some cases a method produces more than just a simple result. Further 
specializations of DBRequest (subtypes) are still possible in the future if we 
need to introduce special cases for specific operations.
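
A minimal sketch of how one uniform request object could carry every async operation, assuming a synchronous implementation underneath. SyncDb, AsyncDb and the schedule parameter are illustrative names only; the request's result, onsuccess and onerror members mirror the pattern described above.

```javascript
// Hypothetical synchronous database: one Map of key -> record per store.
function SyncDb(stores) { this.stores = stores; }
SyncDb.prototype.getFromStore = function (storeName, key) {
  return this.stores[storeName].get(key);
};

// Async facade: every operation returns a request object with the same shape.
// 'schedule' decides when the work runs; setTimeout is used by default.
function AsyncDb(syncDb, schedule) {
  this.syncDb = syncDb;
  this.schedule = schedule || function (task) { setTimeout(task, 0); };
}

AsyncDb.prototype.getFromStore = function (storeName, key) {
  var syncDb = this.syncDb;
  var request = { result: undefined, error: null, onsuccess: null, onerror: null };
  this.schedule(function () {
    try {
      request.result = syncDb.getFromStore(storeName, key);
    } catch (e) {
      request.error = e;
      if (request.onerror) request.onerror(request);
      return;
    }
    if (request.onsuccess) request.onsuccess(request);
  });
  return request;
};

var db = new SyncDb({ contacts: new Map([[1, { Name: "Ann" }]]) });
var asyncDb = new AsyncDb(db);

var request = asyncDb.getFromStore("contacts", 1);
request.onsuccess = function (req) {
  console.log(req.result.Name); // Ann
};
```

Because the handlers are assigned after the call returns, the work has to be deferred until the caller's current code has unwound; the scheduler hook makes that explicit.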

Similarly, we would have something like asyncDb.forEachStoreObject() that 
queues a task to call a callback for each element in a store/index, potentially 
within a range if specified. The pattern scales well to all the other APIs 
present in db/store/index today.

If this seems like a good idea to folks, we'd be happy to write up a more 
complete version that articulates the tweaks across all the WebSimpleDB APIs to 
make this happen.


Web Data APIs

2009-10-31 Thread Pablo Castro
We've been looking at the web database space here at Microsoft, trying to 
understand scenarios and requirements. After assessing what was out there we 
are forming an opinion around this. I wanted to write to this group to share 
how we think about the space, what principles we try to apply, and to discuss 

The short story is that we believe Nikunj's WebSimpleDB proposal, which 
basically describes a minimum-bar web database API and enables a whole set of 
diverse options to be built on top, is the right thing to do.

During the last couple of weeks we have been talking with various folks from 
Mozilla and Oracle and iterating over details of the WebSimpleDB draft. In the 
process it has become clear that we all share the same high-level expectations 
on the scope and capabilities of this API, and Nikunj has been hard at work 
making changes to the draft to keep up with them. I'll touch on a few details 
below, but bear in mind that several of them are already in the process of 
being addressed.

We would love to hear feedback, requirements, specific application scenarios, 
etc. We want to make progress quickly and get experimental implementations 
going to ensure that as we explore we stay grounded, with things that are 

Guiding principles and why we think the ISAM style proposed in WebSimpleDB is a 
good idea

As we try to understand the problem space we formulated a couple of guiding 
principles:
- Get into the standard the key building blocks that are either impossible to 
build on top, or so common that it would be very redundant to do so
- Focus on an API that is simple enough to be reliably specified and 
that can be implemented to follow the spec in a relatively simple manner

We believe that WebSimpleDB sets the stage in this direction. An ISAM layer can 
be used directly or can be a building block for more elaborate layers that can 
be built entirely in Javascript on top. Also, ISAM is simple enough that it can 
be specified in a way that should enable highly interoperable implementations.

Trimming down

There are a number of elements of WebSimpleDB that we can probably live 
without, at least for a first version, such as Queues and Sequences. This may 
help simplify the database API even further.

Also, there are a few simplifying assumptions we can make from the get-go. For 
example, that "paths" as informally mentioned in the spec only reference 
Javascript identifiers (perhaps with dot-notation) and when used for 
index/primary keys they point to Javascript primitive values and not to 


The word "Entity" has a lot of different meanings depending on who you talk to. 
It would be interesting to find a simpler term, perhaps something that matches 
the Javascript terminology better.

Areas where we need to dig deeper and have broader discussions to understand 

Isolation model and its implications in locking: Various isolation models lead 
to different failure modes; for example, regular locks mean that application 
code needs to be ready to deal with deadlocks, or in the case of 
multi-versioning you can see optimistic concurrency violation exceptions during 
commit. There is a tricky balance between not dictating too much to the 
implementation and ensuring that observable behavior across implementations 
really enables interoperability.

What's the sweet spot for the API?: is the primary use for this API to be 
directly consumed by application code? Or is it a building block to create 
various different libraries that present a diversity of styles for query 
formulation and execution? We lean to the side of making it an API that's great 
for libraries to build nice layers on top, but it's still usable directly in 
application code (along the lines of what happens with XMLHttpRequest, where 
most developers will actually use a wrapper that fits the particular 
scenario/library better).
