On Mon, Jul 5, 2010 at 9:45 AM, Andrei Popescu <[email protected]> wrote: > On Sat, Jul 3, 2010 at 2:09 AM, Jonas Sicking <[email protected]> wrote: >> On Fri, Jul 2, 2010 at 5:44 PM, Andrei Popescu <[email protected]> wrote: >>> On Sat, Jul 3, 2010 at 1:14 AM, Jonas Sicking <[email protected]> wrote: >>>> On Fri, Jul 2, 2010 at 4:40 PM, Pablo Castro <[email protected]> >>>> wrote: >>>>> >>>>> From: [email protected] >>>>> [mailto:[email protected]] On Behalf Of Jonas Sicking >>>>> Sent: Friday, July 02, 2010 4:00 PM >>>>> >>>>>>> We ran into an complicated issue while implementing IndexedDB. In >>>>>>> short, what should happen if an object store is modified while a cursor >>>>>>> is iterating it? >> Note that the modification can be done within the >>>>>>> same transaction, so the read/write locks preventing several >>>>>>> transactions from accessing the same table isn't helping here. >>>>>>> >>>>>>> Detailed problem description (this assumes the API proposed by mozilla): >>>>>>> >>>>>>> Consider a objectStore "words" containing the following objects: >>>>>>> { name: "alpha" } >>>>>>> { name: "bravo" } >>>>>>> { name: "charlie" } >>>>>>> { name: "delta" } >>>>>>> >>>>>>> and the following program (db is a previously opened IDBDatabase): >>>>>>> >>>>>>> var trans = db.transaction(["words"], READ_WRITE); var cursor; var >>>>>>> result = []; trans.objectStore("words").openCursor().onsuccess = >>>>>>> function(e) { >>>>>>> cursor = e.result; >>>>>>> result.push(cursor.value); >>>>>>> cursor.continue(); >>>>>>> } >>>>>>> trans.objectStore("words").get("delta").onsuccess = function(e) { >>>>>>> trans.objectStore("words").put({ name: "delta", myModifiedValue: 17 >>>>>>> }); } >>>>>>> >>>>>>> When the cursor reads the "delta" entry, will it see the >>>>>>> 'myModifiedValue' property? Since we so far has defined that the >>>>>>> callback order is defined to be >> the request order, that means that >>>>>>> put request will be finished before the "delta" entry is iterated by >>>>>>> the cursor. >>>>>>> >>>>>>> The problem is even more serious with cursors that iterate indexes. >>>>>>> Here a modification can even affect the position of the currently >>>>>>> iterated object in the index, and the modification can (if i'm reading >>>>>>> the spec correctly) >> come from the cursor itself. >>>>>>> >>>>>>> Consider the following objectStore "people" with keyPath "name" >>>>>>> containing the following objects: >>>>>>> >>>>>>> { name: "Adam", count: 30 } >>>>>>> { name: "Bertil", count: 31 } >>>>>>> { name: "Cesar", count: 32 } >>>>>>> { name: "David", count: 33 } >>>>>>> { name: "Erik", count: 35 } >>>>>>> >>>>>>> and an index "countIndex" with keyPath "count". What would the >>>>>>> following code do? >>>>>>> >>>>>>> results = []; >>>>>>> db.objectStore("people", >>>>>>> READ_WRITE).index("countIndex").openObjectCursor().onsuccess = function >>>>>>> (e) { >>>>>>> cursor = e.result; >>>>>>> if (!cursor) { >>>>>>> alert(results); >>>>>>> return; >>>>>>> } >>>>>>> if (cursor.value.name == "Bertil") { >>>>>>> cursor.update({name: "Bertil", count: 34 }); >>>>>>> } >>>>>>> results.push(cursor.value.name); >>>>>>> cursor.continue(); >>>>>>> }; >>>>>>> >>>>>>> What does this alert? Would it alert "Adam,Bertil,Erik" as the cursor >>>>>>> would stay on the "Bertil" object as it is moved in the index? Or would >>>>>>> it alert "Adam,Bertil,Cesar,David,Bertil,Erik" as we would iterate >>>>>>> "Bertil" again at its new position in the index? >>>>> >>>>> My first reaction is that both from the expected behavior of perspective >>>>> (transaction is the scope of isolation) and from the implementation >>>>> perspective it would be better to see live changes if they happened in >>>>> the same transaction as the cursor (over a store or index). So in your >>>>> example you would iterate one of the rows twice. Maintaining order and >>>>> membership stable would mean creating another scope of isolation within >>>>> the transaction, which to me would be unusual and it would be probably >>>>> quite painful to implement without spilling a copy of the records to disk >>>>> (at least a copy of the keys/order if you don't care about protecting >>>>> from changes that don't affect membership/order; some databases call >>>>> these keyset cursors). >>>>> >>>>>>> >>>>>>> We could say that cursors always iterate snapshots, however this >>>>>>> introduces MVCC. Though it seems to me that SNAPSHOT_READ already does >>>>>>> that. >>>>> >>>>> Actually, even with MVCC you'd see your own changes, because they happen >>>>> in the same transaction so the buffer pool will use the same version of >>>>> the page. While it may be possible to reuse the MVCC infrastructure, it >>>>> would still require the introduction of a second scope for stability. >>>> >>>> It's quite implementable using append-only b-trees. Though it might be >>>> much to ask that implementations are forced to use that. >>>> >>>> An alternative to what I suggested earlier is that all read operations >>>> use "read committed". I.e. they always see the data as it looked at >>>> the beginning of the transaction. Would this be more compatible with >>>> existing MVCC implementations? >>>> >>> >>> Hmm, so if you modified the object store and then, later in the same >>> transaction, used a cursor to iterate the object store, the cursor >>> would not see the earlier modifications? That's not very intiutive to >>> me...or did I misunderstand? >> >> If we go with "read committed" then yes, your understanding is correct. >> >> Out of curiosity, how are you feeling about the "cursors iterate data >> as it looked when cursor was opened" solution? >> > > I feel that that's the easiest solution to specify although it may > also be unintuitive if one calls 'put / update' and then expects to > see the updated value once the cursor gets to the relevant object. My > other concern was the one brought by Pablo, i.e. is it even more > complex to implement another scope of isolation for the duration of > the cursor? > >>>> I'd imagine this should be as easy to implement as SNAPSHOT_READ. >>>> >>>>>>> We could also say that cursors iterate live data though that can be >>>>>>> pretty confusing and forces the implementation to deal with entries >>>>>>> being added and >> removed during iteration, and it'd be tricky to >>>>>>> define all edge cases. >>>>> >>>>> Would this be any different from the implementation perspective than >>>>> dealing with changes that happen through other transactions once they are >>>>> committed? Typically at least in non-MVCC systems committed changes that >>>>> are "further ahead" in a cursor scan end up showing up even when the >>>>> cursor was opened before the other transaction committed. >>>> >>>> IndexedDB doesn't allow write transactions to a given store to start >>>> while there are read transactions using that store, so that doesn't >>>> seem to be a problem. Unless I'm misunderstanding something? >>>> >>> >>> That was my understanding too: while a cursor was open, you can't >>> allow a write transaction to start. >> >> Exactly. Though "while a read transaction was open, you can't allow a >> write transaction to start" is even more correct. >> >>>>>>> It's certainly debatable how much of a problem any of these edgecases >>>>>>> are for users. Note that all of this is only an issue if you modify and >>>>>>> read from the >> same records *in the same transaction*. I can't think >>>>>>> of a case where it isn't trivial to avoid these problems by separating >>>>>>> things into separate transactions. >>>>>>> However it'd be nice to avoid creating foot-guns for people to play >>>>>>> with (think of the children!). >>>>>>> >>>>>>> However we still need to define *something*. I would suggest that we >>>>>>> define that cursors iterate snapshots. It seems the cleanest for users >>>>>>> and easiest >> to define. And once implementations add MVCC support it >>>>>>> should be easy to implement. I think we've come up with a decent plan >>>>>>> for how to do implement it in sqlite even without proper MVCC, so it >>>>>>> should be doable even then. >>>>> >>>>> Besides the expectations aspects, I worry that doing this means that >>>>> opening a cursor means incurring in substantial cost for all cases (e.g. >>>>> creating a keyset or something like that). >>>> >>>> I agree we definitely don't want that. We are working on an >>>> implementation which is backed by a SQL database and completely >>>> incapable of MVCC, so it seems doable. However I don't yet know how >>>> much of a complexity and performance penalty that carries. >>>> >>> >>> On the other hand, as you said earlier, maybe allowing the live >>> changes to be visible in the cursor is not such a big problem as apps >>> can work around these edge cases? >> >> I suspect so. Though given that behavior can be very unintuitive, we >> would have to define things to a very high level of detail to ensure >> that the same unintuitive behavior is happening in all >> implementations. Additionally, I suspect implementation can get quite >> complex in order to deal with all edge cases, for example dealing with >> data that was read ahead optimistically. >> >> We strategy for defining this is to define all mutating operations in >> terms of which ones are insertions vs. updates vs. removals. Both in >> the objectStore and in indexes. For example, does a IDBObjectStore.put >> call that modifies an existing entry cause corresponding entries in >> indexes to be updated or removed and reinserted? Both in the situation >> when the put call causes index values to be changed and not changed. >> >> Then define how active cursors move around when an entry that they are >> currently on is removed, or when entries before and after the current >> entry are inserted. For all types of cursors; forwards, backwards and >> no_duplicate. Also need to define this in situations when the cursor >> is currently firmly in one position, or when a .continue() call has >> been made, but the callback not yet has fired. Likewise when the >> cursor has not yet fired its first callback and entries are inserted >> in the beginning, and when a cursor has notified about the last entry, >> and entries are added after the end. >> >> It's certainly possible, but no small feat. >> > > Ok, let me try to come up with some wording for this in the spec. I > think we should go for this solution for now and see what the > implementation feedback is.
Actually, I had a somewhat different idea for how this should be specified, which I think both would be easier to specify, as well as implement. When iterating using a cursor, always remember the key-value of the last returned entry. Whenever .continue() is called, it goes to find the next entry with a key-value bigger than the one last returned. This way the cursor isn't affected if the current entry is removed, we'll still remember and use the key-value that that entry had. This is simple when iterating objectStores since keys are always unique. However it's only marginally trickier when iterating indexes which can contain duplicate key-values. As I suggested in bug 10058, I think we should defined that duplicate index entries are ordered by their key-value in the objectStore. So all a index-cursors needs to do is to remember both the index key-value and the objectStore key-value of the last returned entry. When .continue() is called it returns the next entry with the same index key-value and larger objectStore key-value if one exists, or the next entry with a larger index key-value otherwise. Simpler put: For objectStore cursors, the key remembered is the objectStores key-value. For index cursors, the key remembered is the <index-key-value, objectStore-key-value> tuple. Let me know what you think. / Jonas
