On Wed, Jun 9, 2010 at 11:39 AM, Laxmi Narsimha Rao Oruganti
<laxmi.oruga...@microsoft.com> wrote:
> Inline...
>
> -----Original Message-----
> From: public-webapps-requ...@w3.org [mailto:public-webapps-requ...@w3.org]
> On Behalf Of Jonas Sicking
> Sent: Wednesday, June 09, 2010 11:55 PM
> To: Jeremy Orlow
> Cc: Shawn Wilsher; Webapps WG
> Subject: Re: [IndexDB] Proposal for async API changes
>
> On Wed, Jun 9, 2010 at 7:42 AM, Jeremy Orlow <jor...@chromium.org> wrote:
>> On Tue, May 18, 2010 at 8:34 PM, Jonas Sicking <jo...@sicking.cc> wrote:
>>>
>>> On Tue, May 18, 2010 at 12:10 PM, Jeremy Orlow <jor...@chromium.org>
>>> wrote:
>>> > I'm not sure I like the idea of offering sync cursors either, since
>>> > the UA will either need to load everything into memory before
>>> > starting or risk blocking on disk IO for large data sets. Thus I'm
>>> > not sure I support the idea of synchronous cursors. But, at the same
>>> > time, I'm concerned about the overhead of firing one event per value
>>> > with async cursors. That is why I was suggesting an interface where
>>> > the common case (the data is in memory) is handled synchronously,
>>> > but the uncommon case (we'd block if we had to respond synchronously)
>>> > still has to be handled, since we guarantee that the first access
>>> > will be forced to be asynchronous.
>>> > Like I said, I'm not super happy with what I proposed, but I think
>>> > some hybrid async/sync interface is really what we need. Have you
>>> > guys spent any time thinking about something like this? How dead-set
>>> > are you on synchronous cursors?
>>>
>>> The idea is that synchronous cursors load all the required data into
>>> memory, yes. I think it would help authors a lot to be able to load
>>> small chunks of data into memory and read and write to it
>>> synchronously. Dealing with asynchronous operations constantly is
>>> certainly possible, but a bit of a pain for authors.
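The hybrid interface Jeremy describes can be sketched concretely. This is a hypothetical illustration, not any draft API: the class and method names are invented, and the forced-asynchronous first read invokes its callback immediately here only to keep the sketch self-contained (a real UA would defer it to a task).

```javascript
// Hypothetical sketch of a hybrid sync/async cursor: reads are
// synchronous while the wanted value is already buffered in memory, but
// the first read is always forced down the asynchronous path, so callers
// must handle both cases. Every name here is an assumption.
function HybridCursor(bufferedValues) {
  this._values = bufferedValues; // stands in for pages the UA has in memory
  this._index = -1;
  this._primed = false;          // the first read must take the async path
}

// Returns the next value directly when it can be served from memory.
// Returns undefined and invokes the callback instead when the read is
// forced asynchronous (here, only the very first read). The callback is
// invoked immediately in this sketch; a real UA would defer it.
HybridCursor.prototype.next = function (callback) {
  this._index++;
  var value = this._index < this._values.length
    ? this._values[this._index]
    : null;                      // null signals the end of the range
  if (!this._primed) {
    this._primed = true;
    callback(value);
    return undefined;            // caller must wait for the callback
  }
  return value;                  // common case: synchronous
};

// Usage: the first read is asynchronous, the rest are synchronous.
var cursor = new HybridCursor(["thumb-1", "thumb-2"]);
var first;
cursor.next(function (v) { first = v; });
var second = cursor.next();
var atEnd = cursor.next();       // null: range exhausted
```

The awkward part, as Jeremy concedes, is exactly this dual calling convention: every call site has to handle both the synchronous return and the callback path.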
>>>
>>> I don't think we should obsess too much about not keeping things in
>>> memory; we already have things like canvas and the DOM, which add up
>>> to non-trivial amounts of memory.
>>>
>>> Just because data is loaded from a database doesn't mean it's huge.
>>>
>>> I do note that you're not as concerned about getAll(), which actually
>>> has worse memory characteristics than synchronous cursors, since you
>>> need to create the full JS object graph in memory.
>>
>> I've been thinking about this off and on since the original proposal
>> was made, and I just don't feel right about getAll() or synchronous
>> cursors. You make some good points about there already being many ways
>> to overwhelm RAM with web APIs, but is there any place we make it so
>> easy? You're right that just because it's a database doesn't mean it
>> needs to be huge, but oftentimes they can get quite big. And if a
>> developer doesn't spend time making sure they test their app with the
>> upper end of what users may possibly see, this just seems like a recipe
>> for problems.
>> Here's a concrete example: structured clone allows you to store image
>> data. Let's say I'm building an image hosting site and that I cache all
>> the images along with their thumbnails locally in an IndexedDB entity
>> store. Let's say each thumbnail is a trivial amount, but each image is
>> 1MB, and I have an album with 1000 images. I do
>> |var photos = albumIndex.getAllObjects(albumName);| and then iterate
>> over that to get the thumbnails. But I've just loaded over 1GB of stuff
>> into RAM (assuming no additional inefficiency/blowup). I suppose it's
>> possible JavaScript engines could build mechanisms to fetch this stuff
>> lazily (as you could even with a synchronous cursor), but that will
>> take time/effort and introduce lag in the page (while fetching
>> additional info from disk).
>>
>> I'm not completely against the idea of getAll/sync cursors, but I do
>> think they should be de-coupled from this proposed API. I would also
>> suggest that we re-consider them only after at least one implementation
>> has normal cursors working and there's been some experimentation with
>> them. Until then, we're basing most of our arguments on intuition and
>> assumptions.
>
> I'm not married to the concept of sync cursors. However, I pretty
> strongly feel that getAll is something we need. If we only allow
> cursors for getting multiple results, I think we'll see an extremely
> common pattern of people using a cursor to loop through a result set
> and put the values into an array.
>
> Yes, getAll can be misused, but I don't see a reason why people
> wouldn't misuse a cursor just as much. If they don't think about the
> fact that a range contains lots of data when using getAll, why would
> they think about it when using cursors?
>
> [Laxmi] A cursor is a streaming operator, meaning only the current row
> or page is available in memory while the rest sits on disk. As the
> program moves the cursor through the result, old pages are thrown away
> and new pages are loaded from the result set. With getAll, by contrast,
> everything has to come into memory before returning to the caller; if
> there is not enough memory to hold the whole result at once, we end up
> out of memory. In short, getAll suits a small result/range well, but
> not big databases. That is, with getAll we are expecting people to
> think about the volume/size of the result, whereas with cursors we
> don't.
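Laxmi's contrast between a streaming cursor and a materializing getAll can be illustrated with a self-contained sketch. The row producer and the residency bookkeeping below are invented for illustration and use no IndexedDB API.

```javascript
// Contrast the peak number of rows resident in memory for the two access
// patterns over the same result set. Everything here is illustrative.
var TOTAL_ROWS = 1000;
function fetchRow(i) { return { id: i }; } // stands in for a page load

// Cursor-style streaming: process one row, then let it go.
function cursorScan(handle) {
  var peak = 0;
  for (var i = 0; i < TOTAL_ROWS; i++) {
    var row = fetchRow(i);    // only this row is resident
    peak = Math.max(peak, 1);
    handle(row);              // row becomes collectable after this call
  }
  return peak;                // stays at 1 no matter how big the range is
}

// getAll-style materializing: every row exists before the caller sees any.
function getAllScan(handle) {
  var all = [];
  for (var i = 0; i < TOTAL_ROWS; i++) all.push(fetchRow(i));
  all.forEach(handle);
  return all.length;          // peak residency equals the range size
}

var streamingPeak = cursorScan(function () {});
var materializedPeak = getAllScan(function () {});
```

The streaming guarantee holds only as long as `handle` actually lets each row go, which is precisely the assumption Jonas challenges below.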
I'm well aware of this. My argument is that I think we'll see people
write code like this:

results = [];
db.objectStore("foo").openCursor(range).onsuccess = function(e) {
  var cursor = e.result;
  if (!cursor) {
    weAreDone(results);
    return;
  }
  results.push(cursor.value);
  cursor.continue();
};

While the indexedDB implementation doesn't hold much data in memory at
a time, the webpage will hold just as much as if we had had a getAll
function. Thus we haven't actually improved anything, only forced the
author to write more code.

Put another way: the raised concern is that people won't think about
the fact that getAll can load a lot of data into memory, and the
proposed solution is to remove the getAll function and tell people to
use openCursor. But if they weren't thinking about how much data would
be in memory at one time, why wouldn't they write code like the above,
which results in just as much data being in memory?

/ Jonas
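Jonas's prediction can be exercised against a minimal mock. Only the `openCursor`/`onsuccess`/`continue` shape follows the proposal under discussion; the mock store itself, and its trick of firing success events synchronously from the `onsuccess` setter, are assumptions made solely to keep the sketch runnable.

```javascript
// Minimal mock of the draft async-cursor shape, just enough to run the
// accumulate-into-an-array pattern from the message above. Real IndexedDB
// delivers success events from the event loop; firing them from the
// onsuccess setter keeps this sketch synchronous and self-contained.
function MockStore(rows) { this._rows = rows; }
MockStore.prototype.openCursor = function () {
  var rows = this._rows, i = 0, request = {};
  function fire(handler) {
    var cursor = i < rows.length
      ? { value: rows[i++], continue: function () { fire(handler); } }
      : null;                          // a null cursor means "done"
    handler({ result: cursor });
  }
  Object.defineProperty(request, "onsuccess", { set: fire });
  return request;
};

// The predicted pattern: loop the cursor and push every value.
var results = [];
new MockStore(["a", "b", "c"]).openCursor().onsuccess = function (e) {
  var cursor = e.result;
  if (!cursor) return;                 // done; results now holds the range
  results.push(cursor.value);
  cursor.continue();
};
// results now holds ["a", "b", "c"]: the entire range resident at once,
// which is what a getAll() over the same range would have produced.
```

The store streams one row per event, yet the page ends up holding the full result set anyway, which is the equivalence Jonas is pointing at.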