Re: [IndexedDB] Cursors and modifications

Jonas Sicking Thu, 08 Jul 2010 12:43:43 -0700

On Mon, Jul 5, 2010 at 9:45 AM, Andrei Popescu <[email protected]> wrote:
> On Sat, Jul 3, 2010 at 2:09 AM, Jonas Sicking <[email protected]> wrote:
>> On Fri, Jul 2, 2010 at 5:44 PM, Andrei Popescu <[email protected]> wrote:
>>> On Sat, Jul 3, 2010 at 1:14 AM, Jonas Sicking <[email protected]> wrote:
>>>> On Fri, Jul 2, 2010 at 4:40 PM, Pablo Castro <[email protected]> 
>>>> wrote:
>>>>>
>>>>> From: [email protected] 
>>>>> [mailto:[email protected]] On Behalf Of Jonas Sicking
>>>>> Sent: Friday, July 02, 2010 4:00 PM
>>>>>
>>>>>>> We ran into an complicated issue while implementing IndexedDB. In 
>>>>>>> short, what should happen if an object store is modified while a cursor 
>>>>>>> is iterating it? >> Note that the modification can be done within the 
>>>>>>> same transaction, so the read/write locks preventing several 
>>>>>>> transactions from accessing the same table isn't helping here.
>>>>>>>
>>>>>>> Detailed problem description (this assumes the API proposed by mozilla):
>>>>>>>
>>>>>>> Consider a objectStore "words" containing the following objects:
>>>>>>> { name: "alpha" }
>>>>>>> { name: "bravo" }
>>>>>>> { name: "charlie" }
>>>>>>> { name: "delta" }
>>>>>>>
>>>>>>> and the following program (db is a previously opened IDBDatabase):
>>>>>>>
>>>>>>> var trans = db.transaction(["words"], READ_WRITE); var cursor; var 
>>>>>>> result = []; trans.objectStore("words").openCursor().onsuccess = 
>>>>>>> function(e) {
>>>>>>>   cursor = e.result;
>>>>>>>   result.push(cursor.value);
>>>>>>>   cursor.continue();
>>>>>>> }
>>>>>>> trans.objectStore("words").get("delta").onsuccess = function(e) {
>>>>>>>   trans.objectStore("words").put({ name: "delta", myModifiedValue: 17 
>>>>>>> }); }
>>>>>>>
>>>>>>> When the cursor reads the "delta" entry, will it see the 
>>>>>>> 'myModifiedValue' property? Since we so far has defined that the 
>>>>>>> callback order is defined to be >> the request order, that means that 
>>>>>>> put request will be finished before the "delta" entry is iterated by 
>>>>>>> the cursor.
>>>>>>>
>>>>>>> The problem is even more serious with cursors that iterate indexes.
>>>>>>> Here a modification can even affect the position of the currently 
>>>>>>> iterated object in the index, and the modification can (if i'm reading 
>>>>>>> the spec correctly) >> come from the cursor itself.
>>>>>>>
>>>>>>> Consider the following objectStore "people" with keyPath "name"
>>>>>>> containing the following objects:
>>>>>>>
>>>>>>> { name: "Adam", count: 30 }
>>>>>>> { name: "Bertil", count: 31 }
>>>>>>> { name: "Cesar", count: 32 }
>>>>>>> { name: "David", count: 33 }
>>>>>>> { name: "Erik", count: 35 }
>>>>>>>
>>>>>>> and an index "countIndex" with keyPath "count". What would the 
>>>>>>> following code do?
>>>>>>>
>>>>>>> results = [];
>>>>>>> db.objectStore("people",
>>>>>>> READ_WRITE).index("countIndex").openObjectCursor().onsuccess = function 
>>>>>>> (e) {
>>>>>>>   cursor = e.result;
>>>>>>>   if (!cursor) {
>>>>>>>     alert(results);
>>>>>>>     return;
>>>>>>>   }
>>>>>>>   if (cursor.value.name == "Bertil") {
>>>>>>>     cursor.update({name: "Bertil", count: 34 });
>>>>>>>   }
>>>>>>>   results.push(cursor.value.name);
>>>>>>>   cursor.continue();
>>>>>>> };
>>>>>>>
>>>>>>> What does this alert? Would it alert "Adam,Bertil,Erik" as the cursor 
>>>>>>> would stay on the "Bertil" object as it is moved in the index? Or would 
>>>>>>> it alert "Adam,Bertil,Cesar,David,Bertil,Erik" as we would iterate 
>>>>>>> "Bertil" again at its new position in the index?
>>>>>
>>>>> My first reaction is that both from the expected behavior of perspective 
>>>>> (transaction is the scope of isolation) and from the implementation 
>>>>> perspective it would be better to see live changes if they happened in 
>>>>> the same transaction as the cursor (over a store or index). So in your 
>>>>> example you would iterate one of the rows twice. Maintaining order and 
>>>>> membership stable would mean creating another scope of isolation within 
>>>>> the transaction, which to me would be unusual and it would be probably 
>>>>> quite painful to implement without spilling a copy of the records to disk 
>>>>> (at least a copy of the keys/order if you don't care about protecting 
>>>>> from changes that don't affect membership/order; some databases call 
>>>>> these keyset cursors).
>>>>>
>>>>>>>
>>>>>>> We could say that cursors always iterate snapshots, however this 
>>>>>>> introduces MVCC. Though it seems to me that SNAPSHOT_READ already does 
>>>>>>> that.
>>>>>
>>>>> Actually, even with MVCC you'd see your own changes, because they happen 
>>>>> in the same transaction so the buffer pool will use the same version of 
>>>>> the page. While it may be possible to reuse the MVCC infrastructure, it 
>>>>> would still require the introduction of a second scope for stability.
>>>>
>>>> It's quite implementable using append-only b-trees. Though it might be
>>>> much to ask that implementations are forced to use that.
>>>>
>>>> An alternative to what I suggested earlier is that all read operations
>>>> use "read committed". I.e. they always see the data as it looked at
>>>> the beginning of the transaction. Would this be more compatible with
>>>> existing MVCC implementations?
>>>>
>>>
>>> Hmm, so if you modified the object store and then, later in the same
>>> transaction, used a cursor to iterate the object store, the cursor
>>> would not see the earlier modifications? That's not very intiutive to
>>> me...or did I misunderstand?
>>
>> If we go with "read committed" then yes, your understanding is correct.
>>
>> Out of curiosity, how are you feeling about the "cursors iterate data
>> as it looked when cursor was opened" solution?
>>
>
> I feel that that's the easiest solution to specify although it may
> also be unintuitive if one calls 'put / update' and then expects to
> see the updated value once the cursor gets to the relevant object. My
> other concern was the one brought by Pablo, i.e. is it even more
> complex to implement another scope of isolation for the duration of
> the cursor?
>
>>>> I'd imagine this should be as easy to implement as SNAPSHOT_READ.
>>>>
>>>>>>> We could also say that cursors iterate live data though that can be 
>>>>>>> pretty confusing and forces the implementation to deal with entries 
>>>>>>> being added and >> removed during iteration, and it'd be tricky to 
>>>>>>> define all edge cases.
>>>>>
>>>>> Would this be any different from the implementation perspective than 
>>>>> dealing with changes that happen through other transactions once they are 
>>>>> committed? Typically at least in non-MVCC systems committed changes that 
>>>>> are "further ahead" in a cursor scan end up showing up even when the 
>>>>> cursor was opened before the other transaction committed.
>>>>
>>>> IndexedDB doesn't allow write transactions to a given store to start
>>>> while there are read transactions using that store, so that doesn't
>>>> seem to be a problem. Unless I'm misunderstanding something?
>>>>
>>>
>>> That was my understanding too: while a cursor was open, you can't
>>> allow a write transaction to start.
>>
>> Exactly. Though "while a read transaction was open, you can't allow a
>> write transaction to start" is even more correct.
>>
>>>>>>> It's certainly debatable how much of a problem any of these edgecases 
>>>>>>> are for users. Note that all of this is only an issue if you modify and 
>>>>>>> read from the >> same records *in the same transaction*. I can't think 
>>>>>>> of a case where it isn't trivial to avoid these problems by separating 
>>>>>>> things into separate transactions.
>>>>>>> However it'd be nice to avoid creating foot-guns for people to play 
>>>>>>> with (think of the children!).
>>>>>>>
>>>>>>> However we still need to define *something*. I would suggest that we 
>>>>>>> define that cursors iterate snapshots. It seems the cleanest for users 
>>>>>>> and easiest >> to define. And once implementations add MVCC support it 
>>>>>>> should be easy to implement. I think we've come up with a decent plan 
>>>>>>> for how to do implement it in sqlite even without proper MVCC, so it 
>>>>>>> should be doable even then.
>>>>>
>>>>> Besides the expectations aspects, I worry that doing this means that 
>>>>> opening a cursor means incurring in substantial cost for all cases (e.g. 
>>>>> creating a keyset or something like that).
>>>>
>>>> I agree we definitely don't want that. We are working on an
>>>> implementation which is backed by a SQL database and completely
>>>> incapable of MVCC, so it seems doable. However I don't yet know how
>>>> much of a complexity and performance penalty that carries.
>>>>
>>>
>>> On the other hand, as you said earlier, maybe allowing the live
>>> changes to be visible in the cursor is not such a big problem as apps
>>> can work around these edge cases?
>>
>> I suspect so. Though given that behavior can be very unintuitive, we
>> would have to define things to a very high level of detail to ensure
>> that the same unintuitive behavior is happening in all
>> implementations. Additionally, I suspect implementation can get quite
>> complex in order to deal with all edge cases, for example dealing with
>> data that was read ahead optimistically.
>>
>> We strategy for defining this is to define all mutating operations in
>> terms of which ones are insertions vs. updates vs. removals. Both in
>> the objectStore and in indexes. For example, does a IDBObjectStore.put
>> call that modifies an existing entry cause corresponding entries in
>> indexes to be updated or removed and reinserted? Both in the situation
>> when the put call causes index values to be changed and not changed.
>>
>> Then define how active cursors move around when an entry that they are
>> currently on is removed, or when entries before and after the current
>> entry are inserted. For all types of cursors; forwards, backwards and
>> no_duplicate. Also need to define this in situations when the cursor
>> is currently firmly in one position, or when a .continue() call has
>> been made, but the callback not yet has fired. Likewise when the
>> cursor has not yet fired its first callback and entries are inserted
>> in the beginning, and when a cursor has notified about the last entry,
>> and entries are added after the end.
>>
>> It's certainly possible, but no small feat.
>>
>
> Ok, let me try to come up with some wording for this in the spec. I
> think we should go for this solution for now and see what the
> implementation feedback is.


Actually, I had a somewhat different idea for how this should be
specified, which I think both would be easier to specify, as well as
implement.

When iterating using a cursor, always remember the key-value of the
last returned entry. Whenever .continue() is called, it goes to find
the next entry with a key-value bigger than the one last returned.
This way the cursor isn't affected if the current entry is removed,
we'll still remember and use the key-value that that entry had.

This is simple when iterating objectStores since keys are always
unique. However it's only marginally trickier when iterating indexes
which can contain duplicate key-values. As I suggested in bug 10058, I
think we should defined that duplicate index entries are ordered by
their key-value in the objectStore. So all a index-cursors needs to do
is to remember both the index key-value and the objectStore key-value
of the last returned entry. When .continue() is called it returns the
next entry with the same index key-value and larger objectStore
key-value if one exists, or the next entry with a larger index
key-value otherwise.

Simpler put: For objectStore cursors, the key remembered is the
objectStores key-value. For index cursors, the key remembered is the
<index-key-value, objectStore-key-value> tuple.

Let me know what you think.

/ Jonas

Re: [IndexedDB] Cursors and modifications

Reply via email to