Re: [IndexDB] Proposal for async API changes
On Tue, Jun 15, 2010 at 5:44 PM, Nikunj Mehta nik...@o-micron.com wrote:
> (specifically answering out of context)
>
> On May 17, 2010, at 6:15 PM, Jonas Sicking wrote:
>> 9. IDBKeyRanges are created using functions on IndexedDatabaseRequest.
>> We couldn't figure out how the old API allowed you to create a range
>> object without first having a range object.
>
> Hey Jonas,
>
> What was the problem in simply creating it like it is shown in the
> examples? The API is intentionally designed that way to be able to use
> constants such as LEFT_BOUND and operations like only() directly from
> the interface. For example:
>
>   IDBKeyRange.LEFT_BOUND;   // this should evaluate to 4
>   IDBKeyRange.only(a).left; // this should evaluate to a

But in http://dvcs.w3.org/hg/IndexedDB/rev/fc747a407817 you added [NoInterfaceObject] to the IDBKeyRange interface. Does the above syntax still work? My understanding is that it doesn't anymore.

Thanks,
Andrei
Re: [IndexDB] Proposal for async API changes
On Jun 22, 2010, at 12:44 AM, Andrei Popescu wrote:
> [...]
> But in http://dvcs.w3.org/hg/IndexedDB/rev/fc747a407817 you added
> [NoInterfaceObject] to the IDBKeyRange interface. Does the above syntax
> still work? My understanding is that it doesn't anymore.

You are right. I will reverse that modifier.

Nikunj
Re: [IndexDB] Proposal for async API changes
(specifically answering out of context)

On May 17, 2010, at 6:15 PM, Jonas Sicking wrote:
> 9. IDBKeyRanges are created using functions on IndexedDatabaseRequest.
> We couldn't figure out how the old API allowed you to create a range
> object without first having a range object.

Hey Jonas,

What was the problem in simply creating it like it is shown in the examples? The API is intentionally designed that way to be able to use constants such as LEFT_BOUND and operations like only() directly from the interface. For example:

  IDBKeyRange.LEFT_BOUND;   // this should evaluate to 4
  IDBKeyRange.only(a).left; // this should evaluate to a

Let me know if you need help with this IDL. Also, it might be a good idea to get the WebIDL experts involved in clarifying such questions rather than changing the API.

Nikunj
Re: [IndexDB] Proposal for async API changes
On Tue, Jun 15, 2010 at 9:44 AM, Nikunj Mehta nik...@o-micron.com wrote:
> [...]
>
>   IDBKeyRange.LEFT_BOUND;   // this should evaluate to 4
>   IDBKeyRange.only(a).left; // this should evaluate to a
>
> Let me know if you need help with this IDL. Also, it might be a good
> idea to get the WebIDL experts involved in clarifying such questions
> rather than changing the API.

If that is the intended syntax then that looks OK to me. What confused me was the IDL. We should definitely have the keyrange stuff as a separate thread though; I'll start one today.

/ Jonas
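For readers following the exchange, a minimal sketch of the syntax under discussion, assuming IDBKeyRange is exposed as an interface object (i.e. without [NoInterfaceObject]). The only() call and LEFT_BOUND constant come from Nikunj's example; leftBound() and bound() follow the spec examples Jeremy quotes later in the thread; the key values are illustrative:

  // Sketch of the intended usage; key values here are illustrative.
  var exact   = IDBKeyRange.only("someKey");      // range matching exactly one key
  var fromKey = IDBKeyRange.leftBound("someKey"); // keys >= "someKey"
  var between = IDBKeyRange.bound(2, 4);          // keys between 2 and 4

  // Constants and accessors hang directly off the interface object:
  IDBKeyRange.LEFT_BOUND;   // a numeric flag (4 in Nikunj's example)
  exact.left;               // the key the range was built from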
Re: [IndexDB] Proposal for async API changes
I've been looking through the current spec and all the proposed changes. Great work.

I'm going to be building a CouchDB-compatible API on top of IndexedDB that can support peer-to-peer replication without other CouchDB instances. One of the things that will entail is a by-sequence index for all the changes in a given database (in my case a database will be scoped to more than one ObjectStore). In order to accomplish this I'll need to keep the last known sequence around so that each new write can create a new entry in the by-sequence index.

The problem is that if another tab/window writes to the database it'll increment that sequence and I won't be notified, so I would have to start every transaction with a check on the sequence index for the last sequence, which seems like a lot of extra cursor calls.

What I really need is an event listener on an ObjectStore that fires after a transaction is committed to the store, but before the next transaction is run, that gives me information about the commits to the ObjectStore.

Thoughts?

-Mikeal

On Wed, Jun 9, 2010 at 11:40 AM, Jeremy Orlow jor...@chromium.org wrote:
> On Wed, Jun 9, 2010 at 7:25 PM, Jonas Sicking jo...@sicking.cc wrote:
> [...]
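To illustrate the workaround Mikeal wants to avoid, here is a rough sketch of re-checking the sequence index at the start of every write, against the proposed async API. The store names, the lastKnownSeq variable, and the record layout are all invented for illustration, and the sketch glosses over transaction scoping across the two stores:

  var lastKnownSeq = 0;

  function writeWithSequence(db, doc, docKey) {
    // Scan for sequence entries other tabs may have added behind our back;
    // leftBound's second argument is assumed to exclude the bound itself.
    var newer = IDBKeyRange.leftBound(lastKnownSeq, true);
    db.objectStore("by-sequence").openCursor(newer).onsuccess = function (e) {
      var cursor = e.result;
      if (cursor) {
        lastKnownSeq = cursor.key;  // catch up on foreign writes
        cursor.continue();
        return;
      }
      // Caught up: record this write under the next sequence number.
      lastKnownSeq += 1;
      db.objectStore("by-sequence", READ_WRITE).put({ doc: docKey }, lastKnownSeq);
      db.objectStore("docs", READ_WRITE).put(doc, docKey);
    };
  }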
Re: [IndexDB] Proposal for async API changes
Hi Jonas,

On Wed, Jun 9, 2010 at 11:27 PM, Jonas Sicking jo...@sicking.cc wrote:
> I'm well aware of this. My argument is that I think we'll see people
> write code like this:
>
> results = [];
> db.objectStore("foo").openCursor(range).onsuccess = function(e) {
>   var cursor = e.result;
>   if (!cursor) {
>     weAreDone(results);
>     return;
>   }
>   results.push(cursor.value);
>   cursor.continue();
> }
>
> While the indexedDB implementation doesn't hold much data in memory at
> a time, the webpage will hold just as much as if we had had a getAll
> function. Thus we haven't actually improved anything, only forced the
> author to write more code.

True, but the difference here is that the author's code is the one that may cause an OOM situation, not the indexedDB implementation. I am afraid that, by allowing getAll(), we are designing an API that may or may not work depending on how large the underlying data set is and what platform the code is running on (e.g. a mobile with a few MB of RAM available or a desktop with a few GB free). To me, that is not ideal.

> Put it another way: The raised concern is that people won't think about
> the fact that getAll can load a lot of data into memory. And the
> proposed solution is to remove the getAll function and tell people to
> use openCursor. However, if they weren't thinking about the fact that a
> lot of data will be in memory at one time, then why wouldn't they write
> code like the above? Which results in just as much data being in
> memory?

If they write code like the above and they run out of memory, I think there's a chance they can trace the problem back to their own code and attempt to fix it. On the other hand, if they trace the problem to the indexedDB implementation, then their only choice is to avoid using getAll().

Like you said, perhaps it's best to leave this method out for now and see what kind of feedback we get from API users. If there is demand, we can add it at that point?

Thanks,
Andrei
Re: [IndexDB] Proposal for async API changes
On Thu, Jun 10, 2010 at 4:46 AM, Andrei Popescu andr...@google.com wrote:
> [...]
> True, but the difference here is that the author's code is the one that
> may cause an OOM situation, not the indexedDB implementation.

I don't see that the two are different. The user likely sees the same behavior, and the action on the part of the website author is the same, i.e. to load the data in chunks rather than all at once. Why does it make a difference on which side of the API the out-of-memory happens?

> If they write code like the above and they run out of memory, I think
> there's a chance they can trace the problem back to their own code and
> attempt to fix it. On the other hand, if they trace the problem to the
> indexedDB implementation, then their only choice is to avoid using
> getAll().

Yes, their only choice is to rewrite the code to read data in chunks. However, you could do that both using getAll (using limits and making several calls to getAll) and using cursors. So again, I don't really see a difference.

/ Jonas
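Jonas's "limits and making several calls to getAll" alternative might look roughly like this. The getAll(range, limit) signature, the CHUNK size, the processChunk handler, and the assumption that records carry their key in an "id" property are all illustrative, since the draft under discussion doesn't pin any of this down:

  var CHUNK = 100;

  function readInChunks(store, fromKey) {
    // leftBound's optional second argument is assumed to mark the bound
    // as open (excluded), so we resume strictly after the last key seen.
    var range = IDBKeyRange.leftBound(fromKey, true);
    store.getAll(range, CHUNK).onsuccess = function (e) {
      var values = e.result;
      processChunk(values);  // app-defined; values can be dropped afterwards
      if (values.length === CHUNK) {
        // Possibly more rows: continue from the last key we saw.
        readInChunks(store, values[values.length - 1].id);
      }
    };
  }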
Re: [IndexDB] Proposal for async API changes
On Thu, Jun 10, 2010 at 5:52 PM, Jonas Sicking jo...@sicking.cc wrote:
> [...]
> I don't see that the two are different. The user likely sees the same
> behavior, and the action on the part of the website author is the same,
> i.e. to load the data in chunks rather than all at once. Why does it
> make a difference on which side of the API the out-of-memory happens?

Yep, you are right in saying that the two situations are identical from the point of view of the user or from the point of view of the action that the website author takes. I just thought that in one case, the website author wrote code to explicitly load the entire store into memory, so when an OOM happens, the culprit may be easy to spot. In the other case, the website author may not have realized how getAll() is implemented and may not know immediately what is going on. On the other hand, getAll() asynchronously returns an Array containing all the requested values, so it should be just as obvious that it may cause an OOM. So OK, this isn't such a big concern after all.

> Yes, their only choice is to rewrite the code to read data in chunks.
> However, you could do that both using getAll (using limits and making
> several calls to getAll) and using cursors. So again, I don't really
> see a difference.

Well, I don't feel very strongly about it, but I personally would lean towards keeping the API simple and, where possible, avoid having multiple ways of doing the same thing until we're sure there's demand for them...

Thanks,
Andrei
Re: [IndexDB] Proposal for async API changes
On Tue, May 18, 2010 at 8:34 PM, Jonas Sicking jo...@sicking.cc wrote:
> On Tue, May 18, 2010 at 12:10 PM, Jeremy Orlow jor...@chromium.org wrote:
>> I'm not sure I like the idea of offering sync cursors either, since
>> the UA will either need to load everything into memory before starting
>> or risk blocking on disk IO for large data sets. Thus I'm not sure I
>> support the idea of synchronous cursors. But, at the same time, I'm
>> concerned about the overhead of firing one event per value with async
>> cursors. Which is why I was suggesting an interface where the common
>> case (the data is in memory) is done synchronously but the uncommon
>> case (we'd block if we had to respond synchronously) has to be
>> handled, since we guarantee that the first time will be forced to be
>> asynchronous. Like I said, I'm not super happy with what I proposed,
>> but I think some hybrid async/sync interface is really what we need.
>> Have you guys spent any time thinking about something like this? How
>> dead-set are you on synchronous cursors?
>
> The idea is that synchronous cursors load all the required data into
> memory, yes. I think it would help authors a lot to be able to load
> small chunks of data into memory and read and write to it
> synchronously. Dealing with asynchronous operations constantly is
> certainly possible, but a bit of a pain for authors.
>
> I don't think we should obsess too much about not keeping things in
> memory; we already have things like canvas and the DOM which add up to
> non-trivial amounts of memory. Just because data is loaded from a
> database doesn't mean it's huge.
>
> I do note that you're not as concerned about getAll(), which actually
> has worse memory characteristics than synchronous cursors, since you
> need to create the full JS object graph in memory.

I've been thinking about this off and on since the original proposal was made, and I just don't feel right about getAll() or synchronous cursors. You make some good points about there already being many ways to overwhelm RAM with web APIs, but is there any place we make it so easy? You're right that just because it's a database doesn't mean it needs to be huge, but oftentimes they can get quite big. And if a developer doesn't spend time making sure they test their app with the upper ends of what users may possibly see, it just seems like this is a recipe for problems.

Here's a concrete example: structured clone allows you to store image data. Let's say I'm building an image hosting site and that I cache all the images along with their thumbnails locally in an IndexedDB entity store. Let's say each thumbnail is a trivial amount, but each image is 1MB. I have an album with 1000 images. I do |var photos = albumIndex.getAllObjects(albumName);| and then iterate over that to get the thumbnails. But I've just loaded over 1GB of stuff into RAM (assuming no additional inefficiency/blowup). I suppose it's possible JavaScript engines could build mechanisms to fetch this stuff lazily (like you could even with a synchronous cursor), but that will take time/effort and introduce lag in the page (while fetching additional info from disk).

I'm not completely against the idea of getAll/sync cursors, but I do think they should be de-coupled from this proposed API. I would also suggest that we re-consider them only after at least one implementation has normal cursors working and there's been some experimentation with it. Until then, we're basing most of our arguments on intuition and assumptions.

J
Re: [IndexDB] Proposal for async API changes
On Wed, Jun 9, 2010 at 7:42 AM, Jeremy Orlow jor...@chromium.org wrote:
> [...]
> I'm not completely against the idea of getAll/sync cursors, but I do
> think they should be de-coupled from this proposed API. I would also
> suggest that we re-consider them only after at least one implementation
> has normal cursors working and there's been some experimentation with
> it. Until then, we're basing most of our arguments on intuition and
> assumptions.

I'm not married to the concept of sync cursors. However, I pretty strongly feel that getAll is something we need. If we just allow cursors for getting multiple results, I think we'll see an extremely common pattern of people using a cursor to loop through a result set and put values into an array.

Yes, it can be misused, but I don't see a reason why people wouldn't misuse a cursor just as much. If they don't think about the fact that a range contains lots of data when using getAll, why would they think about it when using cursors?

/ Jonas
RE: [IndexDB] Proposal for async API changes
Inline...

-----Original Message-----
From: public-webapps-requ...@w3.org [mailto:public-webapps-requ...@w3.org] On Behalf Of Jonas Sicking
Sent: Wednesday, June 09, 2010 11:55 PM
To: Jeremy Orlow
Cc: Shawn Wilsher; Webapps WG
Subject: Re: [IndexDB] Proposal for async API changes

On Wed, Jun 9, 2010 at 7:42 AM, Jeremy Orlow jor...@chromium.org wrote:
> [...]

I'm not married to the concept of sync cursors. However I pretty strongly feel that getAll is something we need. If we just allow cursors for getting multiple results I think we'll see an extremely common pattern of people using a cursor to loop through a result set and put values into an array.

Yes, it can be misused, but I don't see a reason why people wouldn't misuse a cursor just as much. If they don't think about the fact that a range contains lots of data when using getAll, why would they think about it when using cursors?

[Laxmi] Cursor is a streaming operator: only the current row or page is available in memory and the rest sits on the disk. As the program moves the cursor through the result, old pages are thrown away and new pages are loaded from the result set. Whereas with getAll, everything has to come into memory before returning to the caller. If there is not enough memory to keep the whole result at a time, we would end up out of memory. In short, getAll suits small results/ranges well, but not big databases. That is, with getAll
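Laxmi's streaming point, restated as code: with a cursor the page can handle each row as it arrives and keep nothing, so memory stays bounded regardless of result size. A minimal sketch against the proposed async API, where handleRow stands in for application code:

  db.objectStore("foo").openCursor(range).onsuccess = function (e) {
    var cursor = e.result;
    if (!cursor) {
      return;                 // iteration complete
    }
    handleRow(cursor.value);  // process the row, then let it be collected
    cursor.continue();        // ask for the next row asynchronously
  };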
Re: [IndexDB] Proposal for async API changes
On Wed, Jun 9, 2010 at 7:25 PM, Jonas Sicking jo...@sicking.cc wrote:
> [...]
> I'm not married to the concept of sync cursors. However I pretty
> strongly feel that getAll is something we need. If we just allow
> cursors for getting multiple results I think we'll see an extremely
> common pattern of people using a cursor to loop through a result set
> and put values into an array.
>
> Yes, it can be misused, but I don't see a reason why people wouldn't
> misuse a cursor just as much. If they don't think about the fact that a
> range contains lots of data when using getAll, why would they think
> about it when using cursors?

Once again, I feel like there is a lot of speculation (more than normal) happening here. I'd prefer we take the Async API without the sync cursors or getAll and give the rest of the API some time to bake before considering it again. Ideally by then we'd have at least one or two early adopters that can give their perspective on the issue.

J
Re: [IndexDB] Proposal for async API changes
On Wed, Jun 9, 2010 at 11:39 AM, Laxmi Narsimha Rao Oruganti laxmi.oruga...@microsoft.com wrote:
> Inline...
> [...]
> [Laxmi] Cursor is a streaming operator: only the current row or page is
> available in memory and the rest sits on the disk. As the program moves
> the cursor through the result, old pages are thrown away and new pages
> are loaded from the result set. Whereas with getAll, everything has to
> come into memory before returning to the caller. [...]
Re: [IndexDB] Proposal for async API changes
On Wed, Jun 9, 2010 at 3:27 PM, Jonas Sicking jo...@sicking.cc wrote:
> I'm well aware of this. My argument is that I think we'll see people
> write code like this:
>
> results = [];
> db.objectStore("foo").openCursor(range).onsuccess = function(e) {
>   var cursor = e.result;
>   if (!cursor) {
>     weAreDone(results);
>     return;
>   }
>   results.push(cursor.value);
>   cursor.continue();
> }
>
> While the indexedDB implementation doesn't hold much data in memory at
> a time, the webpage will hold just as much as if we had had a getAll
> function. Thus we haven't actually improved anything, only forced the
> author to write more code.
>
> Put it another way: The raised concern is that people won't think about
> the fact that getAll can load a lot of data into memory. And the
> proposed solution is to remove the getAll function and tell people to
> use openCursor. However, if they weren't thinking about the fact that a
> lot of data will be in memory at one time, then why wouldn't they write
> code like the above? Which results in just as much data being in
> memory?

At the very least, explicitly loading things into an honest-to-god array can make it more obvious that you're eating memory in the form of a big array, as opposed to just a "magically transform my blob of data into something more convenient" operation.

(That said, I dislike cursors and explicitly avoid them in my own code. In the PHP db abstraction layer I wrote for myself, every query slurps the results into an array and just returns that - I don't give myself any access to the cursor at all. I probably like this better simply because I can easily foreach through an array, while I can't do the same with a cursor unless I write some moderately more complex code. I hate using while loops when foreach is beckoning to me.)

~TJ
Re: [IndexDB] Proposal for async API changes
On Wed, Jun 9, 2010 at 11:40 AM, Jeremy Orlow jor...@chromium.org wrote:
> [...]
> Once again, I feel like there is a lot of speculation (more than
> normal) happening here. I'd prefer we take the Async API without the
> sync cursors or getAll and give the rest of the API some time to bake
> before considering it again. Ideally by then we'd have at least one or
> two early adopters that can give their perspective on the issue.

If it helps move things forward, we can keep getAll out of the spec for now. I still think that Mozilla will keep the implementation though, so as to allow people to experiment with it. This will also allow us to guess less
Re: [IndexDB] Proposal for async API changes
On 6/9/2010 4:27 PM, Jonas Sicking wrote:
> On Wed, Jun 9, 2010 at 11:39 AM, Laxmi Narsimha Rao Oruganti
> laxmi.oruga...@microsoft.com wrote:
>> [...]
>> [Laxmi] Cursor is a streaming operator: only the current row or page
>> is available in memory and the rest sits on the disk. [...] Whereas
>> with getAll, everything has to come into memory before returning to
>> the caller. [...]
Re: [IndexDB] Proposal for async API changes
On 6/9/2010 3:48 PM, Kris Zyp wrote:
> Another option would be to have cursors essentially implement a JS
> array-like API:
>
> db.objectStore("foo").openCursor(range).forEach(function(object){
>   // do something with each object
> }).onsuccess = function(){
>   // all done
> });
>
> (Or perhaps the cursor with a forEach would be nested inside a
> callback, not sure). The standard "some" function is also useful if you
> know you probably won't need to iterate through everything:
>
> db.objectStore("foo").openCursor(range).some(function(object){
>   return object.name == "John";
> }).onsuccess = function(johnIsInDatabase){
>   if(johnIsInDatabase){ ... }
> });
>
> This allows us to have an async interface (the callbacks can be called
> at any time) and still follows normal JS array patterns, for programmer
> convenience (so programmers wouldn't need to iterate over a cursor and
> push the results into another array). I don't think anyone would miss
> getAll() with this design, since cursors would already be array-like.

To me, this feels like we are basically doing what we expect a library to do: make the syntactic sugar work. I don't see why a library couldn't provide a some or forEach method with the currently proposed API.

Cheers,
Shawn
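Shawn's point can be made concrete: a page-level library can layer forEach over the proposed cursor API without any spec changes. A rough sketch (the function name is invented, and this is not an implementation of Kris's exact proposal, which hangs the methods off the cursor itself):

  // Library-provided sugar over the proposed async cursor API.
  function forEachInCursor(cursorRequest, callback, done) {
    cursorRequest.onsuccess = function (e) {
      var cursor = e.result;
      if (!cursor) {
        if (done) done();     // all rows visited
        return;
      }
      callback(cursor.value); // hand each row to the caller
      cursor.continue();
    };
  }

  // Usage, mirroring Kris's forEach example:
  forEachInCursor(db.objectStore("foo").openCursor(range), function (object) {
    // do something with each object
  }, function () {
    // all done
  });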
Re: [IndexDB] Proposal for async API changes
On Wed, Jun 9, 2010 at 3:36 PM, Tab Atkins Jr. jackalm...@gmail.com wrote:
> On Wed, Jun 9, 2010 at 3:27 PM, Jonas Sicking jo...@sicking.cc wrote:
> [...]
>
> At the very least, explicitly loading things into an honest-to-god
> array can make it more obvious that you're eating memory in the form of
> a big array, as opposed to just a "magically transform my blob of data
> into something more convenient" operation.

I don't fully understand this. getAll also returns an honest-to-god array.

> (That said, I dislike cursors and explicitly avoid them in my own code.
> In the PHP db abstraction layer I wrote for myself, every query slurps
> the results into an array and just returns that - I don't give myself
> any access to the cursor at all. I probably like this better simply
> because I can easily foreach through an array, while I can't do the
> same with a cursor unless I write some moderately more complex code. I
> hate using while loops when foreach is beckoning to me.)

This is what I'd expect many/most people to do.

/ Jonas
Re: [IndexDB] Proposal for async API changes
On 6/9/2010 3:36 PM, Tab Atkins Jr. wrote:
> At the very least, explicitly loading things into an honest-to-god
> array can make it more obvious that you're eating memory in the form of
> a big array, as opposed to just a "magically transform my blob of data
> into something more convenient" operation.

I'm sorry, but if a developer can't figure out that the big array they are given (a proper Array in JavaScript) is the cause of large amounts of memory usage, I don't see how populating it themselves is going to raise any additional flags.

Cheers,
Shawn
Re: [IndexDB] Proposal for async API changes
Hi Jonas,

> A draft of the proposed API is here:
> http://docs.google.com/View?id=dfs2skx2_4g3s5f857

As someone new to this API, I thought the naming used in the current draft is somewhat confusing. Consider the following interfaces:

  IndexedDatabase, IndexedDatabaseRequest, IDBDatabaseRequest, IDBDatabase, IDBRequest

Just by looking at this, it is pretty hard to understand what the relationship between these interfaces really is and what role they play in the API. For instance, I thought that IDBDatabaseRequest is some type of Request when, in fact, it isn't a Request at all. It also isn't immediately obvious what the difference between IndexedDatabase and IDBDatabase really is, etc.

I really don't want to start a "color of the bikeshed" argument, and I fully understand how you reached the current naming convention. However, I thought I'd suggest three small changes that could help other people understand this API more easily:

- I know we need to keep the IDB prefix in order to avoid collisions with other APIs. I would therefore think we should keep the IDB prefix and make sure all the interfaces start with it (right now they don't).

- The Request suffix is now used to denote the asynchronous versions of the API interfaces. These interfaces aren't actually Requests of any kind, so I would like to suggest changing this suffix. In fact, if the primary usage of this API is via its async version, we could even drop this suffix altogether and just add Sync to the synchronous versions.

- Some of the interfaces could have names that more closely reflect their roles in the API. For instance, IDBDatabase could be renamed to IDBConnection, since in the spec it is described as a connection to the database. Likewise, IndexedDatabase could become IDBFactory, since it is used to create database connections or key ranges (see the sketch after this message).

In any case, I want to make it clear that the current naming works once one takes the time to understand it. On the other hand, if we make it easier for people to understand the API, we could hopefully get feedback from more developers.

Thanks,
Andrei
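To make the third suggestion concrete, a hypothetical sketch of how calling code might read under Andrei's renames. None of these names are in the current draft; open()'s arguments follow the draft's openDatabase() example, and only() is the key-range factory method discussed elsewhere in the thread:

  var factory = window.indexedDB;              // an IDBFactory (draft: IndexedDatabase)
  var request = factory.open("myDB", "my db"); // still async; returns an IDBRequest
  request.onsuccess = function (e) {
    var connection = e.result;                 // an IDBConnection (draft: IDBDatabase)
    var range = factory.only("someKey");       // the factory also creates key ranges
  };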
Re: [IndexDB] Proposal for async API changes
On Tue, May 18, 2010 at 2:15 AM, Jonas Sicking jo...@sicking.cc wrote:
> A draft of the proposed API is here:
> http://docs.google.com/View?id=dfs2skx2_4g3s5f857

I just noticed another nit. Your proposal says

  interface IDBIndex {
  }; // Unchanged

but the spec's IDBIndex interface includes

  readonly attribute DOMString storeName;

which is the owning object store's name. This attribute is probably no longer necessary now that indexes hang off of objectStores (and thus it's pretty clear which object store an index is associated with).

J
Re: [IndexDB] Proposal for async API changes
On Tue, May 18, 2010 at 7:20 AM, Jeremy Orlow jor...@chromium.org wrote:
> Overall, I'm pretty happy with these changes. I support making these
> changes to the spec. Additional comments inline...
>
> On Tue, May 18, 2010 at 2:15 AM, Jonas Sicking jo...@sicking.cc wrote:
>> Hi All,
>>
>> I, together with Ben Turner and Shawn Wilsher, have been looking at
>> the asynchronous API defined in the IndexedDB specification and have a
>> set of changes to propose. The main goal of these changes is to
>> simplify the API that we expose to authors, making it easier for them
>> to work with. Another goal has been to reduce the risk that authors
>> misuse the API and use long-running transactions. Finally, it has been
>> a goal to reduce the risk of situations that can race.
>>
>> It has explicitly not been a goal to simplify the implementation. In
>> some cases it is definitely harder to implement the proposed API.
>> However, we believe that the extra complexity in implementation is
>> outweighed by simplicity for users of the API.
>>
>> The main changes are:
>>
>> 1. Once a database has been opened (a database connection has been
>> established) read access to meta-data, such as objectStore and index
>> names, is synchronous. Changes to such meta-data, such as creating
>> objectStores and indexes, are still asynchronous.
>
> I believe this is already how it's specced. The IDBDatabase interface
> already gives you synchronous access to all of this.

The big difference is that the current spec makes openObjectStore() and openIndex() asynchronous. Our proposal makes openObjectStore() and openIndex() (renamed objectStore() and index()) synchronous.

So opening an objectStore, or even starting a transaction, only synchronously accesses metadata. But any requests you make on the transaction will be held until the transaction has managed to grab the requested tables. So when you, in our proposal, call:

  db.objectStore("foo", READ_WRITE).put(...);

the objectStore function synchronously creates a transaction object representing a transaction which only contains the "foo" objectStore. The implementation then fires off an asynchronous request to lock the "foo" objectStore with a write lock. It then returns the synchronously created transaction object.

When the put() function is called, the implementation notices that the lock is not yet acquired. So it simply records what information should be written to the object store. Later, when the write lock is successfully acquired, the implementation executes the recorded operations, and once they finish we call their callbacks.

>> 9. IDBKeyRanges are created using functions on IndexedDatabaseRequest.
>> We couldn't figure out how the old API allowed you to create a range
>> object without first having a range object.
>
> In the spec, I see the following in examples:
>
>   var range = new IDBKeyRange.bound(2, 4);
>
> and
>
>   var range = IDBKeyRange.leftBound(key);
>
> I'm not particularly happy with hanging functions off of
> IndexedDatabaseRequest for this. Can it work something like what I
> listed above? If not, maybe we can find a better place to put them? Or
> just create multiple openCursor functions for each case?

Mostly we were just confused as to what syntax was actually proposed. You are listing two syntaxes (with and without 'new'), neither of which matches the WebIDL in the spec. I personally think that most proposed syntaxes are OK and don't care much which one we choose, as long as it's clearly defined.

>> 10. You are allowed to have multiple transactions per database
>> connection. However, if they use overlapping tables, only the first one
>> will receive events until it is finished (with the usual exceptions of
>> allowing multiple readers of the same table).
>
> Can you please clarify what you mean here? This seems like simply an
> implementation detail to me, so maybe I'm missing something?

The spec currently does explicitly forbid having multiple transactions per database connection. The syntax doesn't even support it, since there is a .currentTransaction property on IDBDatabase. I.e. the following code seems forbidden (though I'm not sure if "forbidden" means that an exception will be thrown somewhere, or if it means that the code will just silently fail to work):

  request = indexedDB.openDatabase("myDB", ...);
  request.onsuccess = function() {
    db = request.result;
    r1 = db.openTransaction(["foo"]);
    r1.onsuccess = function() { ... };
    r2 = db.openTransaction(["bar"]);
    r2.onsuccess = function() { ... };
  };

The spec says that the above is forbidden. In our new proposal the following would be allowed:

  request = indexedDB.openDatabase("myDB", ...);
  request.onsuccess = function() {
    db = request.result;
    t1 = db.transaction(["foo"], READ_WRITE);
    t2 = db.transaction(["bar"], READ_WRITE);
  };

And would allow the two transactions to run concurrently.

> A draft of the proposed API is here:
> http://docs.google.com/View?id=dfs2skx2_4g3s5f857
>
> Comments:
>
> 1) IDBRequest.abort() in IDBRequest needs to be able to raise.

I think in general we haven't indicated
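The lock-queueing behavior Jonas describes might be implemented along these lines. This is a sketch of engine internals, not spec text; every name below (acquireLock, writeToStore, pendingOps) is invented, and the stubs simply pretend the lock always arrives on a later turn of the event loop:

  // Stub standing in for the engine's real asynchronous lock machinery.
  function acquireLock(storeNames, mode, onLocked) {
    setTimeout(onLocked, 0); // pretend the lock always arrives later
  }

  // Stub standing in for the engine's real storage write.
  function writeToStore(value, key, callback) {
    if (callback) callback();
  }

  function Transaction(storeNames, mode) {
    this.lockAcquired = false;
    this.pendingOps = [];    // operations recorded before the lock arrives
    var self = this;
    // Fire off the asynchronous lock request immediately...
    acquireLock(storeNames, mode, function () {
      self.lockAcquired = true;
      // ...then replay everything the page queued in the meantime.
      self.pendingOps.forEach(function (op) { op(); });
      self.pendingOps = [];
    });
  }

  Transaction.prototype.put = function (value, key, callback) {
    var doPut = function () { writeToStore(value, key, callback); };
    if (this.lockAcquired) {
      doPut();                     // lock already held: execute right away
    } else {
      this.pendingOps.push(doPut); // lock pending: record for later
    }
  };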