Re: Futures and transactions
The problem with IndexedDB transactions appears as soon as you need to do any kind of streaming, where there is the potential for the stream write buffer to fill up, e.g. syncing over the network:

1. Get references to objects within a collection within a transaction.
2. Compare these to objects over the network.
3. Start writing objects to the network, waiting for the network to drain (assuming web sockets) before writing more data.

While this is essentially a long-lived read transaction, it won't work with IDB.

Some have argued that the design goal was to avoid long-lived transactions, but there is a difference between long-lived read transactions and long-lived write transactions. For MVCC transactions, which I think IDB was once supposed to be aiming for, there is by definition no problem with long-running readers, since they do not block each other or writers; they simply read the database at a snapshot in time.

The browser is starting to support stream APIs, and I think with that, we need transactions that can be "retained". That is, keep the same semantics as per IDB transactions, but with an additional method "retain(milliseconds)" that would keep the transaction alive for a certain amount of time.

Joran Greef
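To make the proposed "retain(milliseconds)" semantics concrete, here is a minimal sketch. It is purely hypothetical: `RetainableTransaction`, `retain` and `onIdle` are illustrative names, no such API exists in IndexedDB, and the clock is passed in explicitly so the behaviour can be followed without timers.

```typescript
// Hypothetical sketch only: IndexedDB has no retain() method.
class RetainableTransaction {
  private deadline = 0; // time (ms) until which the transaction stays alive
  private finished = false;

  // Keep the transaction alive for `ms` milliseconds counted from `now`.
  retain(ms: number, now: number): void {
    if (this.finished) throw new Error("transaction already finished");
    this.deadline = Math.max(this.deadline, now + ms);
  }

  // Models the point where control returns to the event loop with no
  // pending requests. Returns true if the transaction auto-commits.
  onIdle(now: number): boolean {
    if (!this.finished && now >= this.deadline) this.finished = true;
    return this.finished;
  }

  isActive(): boolean {
    return !this.finished;
  }
}

const tx = new RetainableTransaction();
tx.retain(100, 0);                          // keep alive until t = 100 ms
const committedEarly = tx.onIdle(50);       // false: still retained
const committedAtDeadline = tx.onIdle(100); // true: retention expired
```

Without a call to `retain`, `onIdle` commits immediately, which mirrors today's auto-commit behaviour.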
Re: Sandbox
On 17 Sep 2012, at 2:33 PM, Florian Bösch wrote:

> Security is a pretty serious concern if you're distributing apps without any
> oversight to billions of users automatically upon a single link click.

You are conflating web apps (trusted, installed) with web pages (single link click).

> No TCP.
> Wrong, see websockets which upgrade to plain old TCP after the handshake.

No, WebSockets are not "plain old TCP".

>
> No UDP.
> Coming with WebRTC in the form of unreliable data channels.

WebRTC is above UDP. It's not UDP. WebRTC is a massive conglomeration of protocols and codecs and opinions.

> No POSIX.
> Why would you need cross-OS posix standards and operating system shells when
> you already have a browser which abstracts cross-OS APIs in its own fashion?

How do you fsync in a browser?

> Tim Berners-Lee raised this point first awhile back on Public Web Apps:
> http://lists.w3.org/Archives/Public/public-webapps/2012JanMar/0464.html
> I believe his point was subtly different. He was arguing for vendors to come
> up with ways to solve the usecases he mentioned, not arguing to just blast
> the OS at the JS developer and let the ensuing security armageddon sort
> itself out.

No, not at all. Nowhere did he ask for browser vendors "to solve the use cases he mentioned".
Re: Sandbox
Apps (native/web) need direct access to bare metal. Browser vendors need to move away from the "we do all the thinking and designing and implementing" top-down model of innovation. Browser vendors need to provide minimal core OS APIs and get out of the way and let open source grow around and do the rest. For too long now the typical response to this kind of proposal has been "how do you propose solving the security problems?" That is to say, we should not do any of this unless we can perfectly solve the security problems. As if they can be perfectly solved. And so our most perfect solution has been to completely cripple web apps: No TCP. No UDP. No POSIX. No Hardware. Tim Berners-Lee raised this point first awhile back on Public Web Apps: http://lists.w3.org/Archives/Public/public-webapps/2012JanMar/0464.html As a user, I want to write a web app. I trust it. I want to give it UDP, TCP and POSIX anointing. I want it to use the resources of my machine to act on my behalf and assist me in my work. The browser won't let me. Why?
Drag & Drop Web Apps
Given the advance of HTML 5, and in the interest of developing web apps with average functionality, would it now be possible to: 1. Drag files and folders into a web app? 2. Drag files and folders out of a web app? 3. Drag a spreadsheet out of a web app onto the icon of Excel in the dock and have it open in Excel? 4. Monitor that same spreadsheet's content (originally provided by the web app) for changes when the user edits it and presses CTRL+S? Or is it only possible to drag things into a browser window but not back out and nothing else? Can a user drag a piece of data into a browser window… and then drag it back out? For example, a user may want to use a Contacts web app, and drag a contact out the browser window as a piece of vcard data and land this onto the Contacts app in the dock, which would then import the contact, all in a single mouse gesture? Or is it not possible to provide that kind of user experience? For example, a user may want to use a PDF web app, and transfer a piece of PDF data to the Preview app, but be forced to click a link to download the PDF, click the very small "Keep" button next to the "This type of file can harm your computer. Do you want to keep anyway?" warning, and then drag the PDF onto the Preview app, and then go to the Downloads folder to delete the "download". At least 5 mouse clicks and then a CMD+backspace to accomplish what (from the user's point of view at least) should have only taken one drag and drop? And then this may be vendor specific, but if a user created a piece of PDF data and dragged it into the browser window in the first place, does it still make sense to warn them that "this type of file can harm your computer"? The browser takes on too much responsibility for things it can't possibly reason about, and seeks not enough advice from the user where it could. It often seems that the browser is built to lecture the user, rather than the other way round. 
I use the browser every day at work, and sometimes you have to ask yourself who is serving whom. Does the user serve the browser, or does the browser serve the user?
Non-persistent in-memory storage accessible by same domain tabs
Web applications need a way to communicate between two same-domain tabs without polling LocalStorage and without hitting the disk.

It would be useful to have an in-memory get/set/compare_and_set hash table exposed to scripts running in same-domain tabs, which is discarded by the browser when those tabs are closed.

Use cases:

1. Coordinate replication between tabs for an offline app, i.e. one tab takes responsibility for syncing a user's data to and from IndexedDB.
2. Sign out from one tab triggers sign out from all other tabs.
3. If something like LevelDB were exposed directly to JS, one could implement MVCC on top using the shared hash.
4. Library authors would be able to implement their own cross-tab postMessage.

These use cases are difficult to implement with LocalStorage: polling gives only coarse time resolution, and the lack of a compare-and-set primitive in LocalStorage makes any coordination risky.
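The semantics of the requested table can be sketched in a few lines. This is an in-process stand-in under stated assumptions, not a browser API: the class name `SharedTable` and its method names are hypothetical, and a real implementation would have to make `compareAndSet` atomic across tabs.

```typescript
// Hypothetical in-process stand-in for the proposed cross-tab table.
class SharedTable {
  private entries = new Map<string, string>();

  get(key: string): string | undefined {
    return this.entries.get(key);
  }

  set(key: string, value: string): void {
    this.entries.set(key, value);
  }

  // Replace the value only if it still equals `expected`
  // (`undefined` means "the key must be absent"). Returns true on success.
  compareAndSet(key: string, expected: string | undefined, next: string): boolean {
    if (this.entries.get(key) !== expected) return false;
    this.entries.set(key, next);
    return true;
  }
}

// Use case 1: exactly one tab claims responsibility for syncing.
const table = new SharedTable();
const tabAClaimed = table.compareAndSet("sync-owner", undefined, "tab-A"); // true
const tabBClaimed = table.compareAndSet("sync-owner", undefined, "tab-B"); // false
```

With only get and set, both tabs could read "no owner" and then both write their own id; the compare-and-set step is what removes that race.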
IndexedDB: Binary Keys
IndexedDB supports binary values as per the structured clone algorithm as implemented in Chrome and Firefox.

IndexedDB needs to support binary keys (ArrayBuffer, TypedArrays). Many popular KV stores accept binary keys (BDB, Tokyo, LevelDB). The Chrome implementation of IDB is already serializing keys to binary. JS is moving more and more towards binary data across the board (WebSockets, TypedArrays, FileSystemAPI). IDB is not quite there if it does not support binary keys.

Binary keys are more efficient than Base64-encoded keys, e.g. a 128-bit key in base 256 is 16 bytes, but 22 bytes in base 64.

I am working on a production system storing 3 million keys in IndexedDB. In about 6 months it will be storing 60 million keys in IndexedDB. Without support for binary keys, that's about 360 MB of wasted storage (60,000,000 * (22 - 16) bytes), not to mention the CPU overhead spent Base64 encoding and decoding keys.
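The key-size arithmetic can be checked with a short sketch. It assumes unpadded Base64 (6 bits per character) stored at one byte per character; the function names are illustrative only.

```typescript
// Bytes needed to store a key of `bits` bits as raw binary (base 256).
function binaryKeyBytes(bits: number): number {
  return Math.ceil(bits / 8);
}

// Characters needed for the same key in unpadded Base64 (6 bits per character).
function base64KeyChars(bits: number): number {
  return Math.ceil(bits / 6);
}

const keyBits = 128;
const keyCount = 60000000;

// Assumes one byte per Base64 character.
const overheadPerKey = base64KeyChars(keyBits) - binaryKeyBytes(keyBits); // 22 - 16 = 6
const wastedBytes = keyCount * overheadPerKey; // 360,000,000 bytes

console.log(`${overheadPerKey} extra bytes per key, ${wastedBytes} bytes in total`);
```

If the engine stored key strings as UTF-16, the per-key overhead would be larger still, so this figure is a lower bound under the stated assumption.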
IndexedDB: Retrieving a slice of a record value.
It would be great if there was a way to use IndexedDB to get just a slice of a record value, not the entire value. For example, when storing many large binary values, there may be useful meta or header info at the start or end of each value, which could be retrieved directly. It would be a waste to have to store this data twice, or to read the entire value.
Re: Installing Web Apps
The problem is we're framing the discussion in terms of "installing" web apps. We're answering the wrong question. The real question is whether we want to start seeing powerful applications running in the browser. If we do, then we'll figure out a way to get there. Be it "installing", "permissions", or letting apps use as much storage as they need, but just giving me a way to keep tabs on what they're using so I can uninstall them if I want. Or letting apps use as much bandwidth or CPU or whatever they need, but just giving me a way to keep tabs. Or if I'm really security conscious there could be a firewall to let me as user defend certain "system calls" or whitelist specific apps but only if I want. But none of that is really the issue now. The issue now is that some are unimaginatively saying "what, browser in a browser?". It's the "nobody would ever want a personal computer" attitude and this needs to change so that the next unforeseen innovation can take place. What do you want to build in the browser? 1. Dropbox (e.g. drag and drop files into the browser, click a link in the app to open them in native applications such as Excel, poll the file for changes from the browser and sync the chunks that changed)? 2. Web browser? 3. Proxyless POP and SMTP clients that don't waste server bandwidth and let users go direct? 4. Spotify client? 5. Skype client? I want to be building all of the above.
Re: Enable Compression Of A Blob To .zip File
It would be great to have a native binding to Zlib and Snappy exposed to JavaScript in the browser. Zlib covers the use cases where disk is expensive, Snappy covers the use cases where CPU is expensive. Also a native binding to basic crypto primitives, even if that means just SHA1 to start, and even if the Node.js crypto API is copied verbatim. TypedArrays in current implementations are too slow to help with these, as far as I have tried.
Re: [IndexedDB] Transaction Auto-Commit
> On 03 Aug 2011, at 7:33 PM, Jonas Sicking wrote: > "Note that reads are also blocked if the long-running transaction is a READ_WRITE transaction." >> >> Is it acceptable for a writer to block readers? What if one tab is >> downloading a gigabyte of user data (using a workload-configurable Merkle >> tree scheme), and another tab for the same application needs to show data? > > This is exactly why transactions are auto-committing. We don't want > someone to start a transaction, download a gigabyte of data, write it > to the database, and only after commit the transaction. The > auto-committing behavior forces you to download the data first, only > then can you start a transaction to insert that data into the > database. If someone were syncing a gigabyte of data using a Merkle tree scheme they would probably not consider using a single transaction to persist the data nor would they find it necessary. Rather the point was made to emphasize that a write-intensive task may take place where many write transactions are required, one after the other. For instance, in the previous example, a gigabyte of data may likely consist of a million 1KB text objects, or 250,000 4KB objects, each of which may require a write transaction to update a few parts of the database. Any implementation of IDB where writers blocked readers would perform poorly in this case. But all of this is orthogonal to the question of auto-commit. Are there other reasons in favor of auto-committing transactions? I'm not sure that library writers stand to gain from it, and it forces one to use other methods of concurrency control to match the semantics of server-side databases. > IndexedDB allows MVCC in that it allows writers to start while there > are still reading transactions running. Firefox currently isn't > implementing this though since our underlying storage engine doesn't > permit it. > > IndexedDB does however not allow readers to start once a writing > transaction has started. 
I thought that that was common behavior even > for MVCC databases. Is that not the case? Is it more common that > readers can start whenever and always just see the data that was > committed by the time the reading transaction started? If your database supports MVCC, then by definition there is no reason for writers to block readers.
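A minimal sketch of why MVCC writers need not block readers: each commit appends a new version, and a reader only ever looks at versions committed at or before the point it started. This illustrates the general technique and is not how any browser implements IndexedDB; `MvccStore` and its methods are hypothetical names.

```typescript
// Illustrative MVCC store: versions are appended, never overwritten.
interface Version {
  commitId: number;
  value: string;
}

class MvccStore {
  private versions = new Map<string, Version[]>(); // per key, oldest first
  private lastCommit = 0;

  // A reader just remembers the latest commit it is allowed to see.
  beginRead(): number {
    return this.lastCommit;
  }

  // Return the newest version committed at or before the snapshot.
  read(snapshot: number, key: string): string | undefined {
    const chain = this.versions.get(key) ?? [];
    for (let i = chain.length - 1; i >= 0; i--) {
      if (chain[i].commitId <= snapshot) return chain[i].value;
    }
    return undefined;
  }

  // A writer appends new versions, so open readers are not disturbed.
  commitWrite(changes: Record<string, string>): number {
    const commitId = ++this.lastCommit;
    for (const key of Object.keys(changes)) {
      const chain = this.versions.get(key) ?? [];
      chain.push({ commitId, value: changes[key] });
      this.versions.set(key, chain);
    }
    return commitId;
  }
}

const store = new MvccStore();
store.commitWrite({ a: "1" });
const snapshot = store.beginRead(); // reader starts here
store.commitWrite({ a: "2" });      // writer commits while the reader is open
const seenByOldReader = store.read(snapshot, "a");          // "1"
const seenByNewReader = store.read(store.beginRead(), "a"); // "2"
```

The sketch never reclaims old versions; a real engine would garbage-collect versions that no open reader can still see.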
Re: [IndexedDB] Transaction Auto-Commit
I have been spending time on IDB lately and wanted to give feedback on the transaction auto-commit interface.

I am trying to write a wrapper around IDB to match the interface of my server-side data store, which allows you to:

1. Request a read or write transaction asynchronously.
2. GET, MGET, EXISTS or SET against that transaction asynchronously.
3. COMMIT when done to release and commit the transaction or ABORT to release but not commit the transaction.
4. Have many concurrent read transactions.
5. Have one write transaction at a time (without blocking readers - MVCC).

As you can imagine, IDB does not support this, since it forces you to issue requests against an IDB transaction synchronously (from the viewpoint of the rest of the application). In other words, once you have obtained an IDB transaction, it is automatically released when your code returns control, so there is no way to do something such as get a value from IDB, do something taking a millisecond or two such as reading from WebSQL, and then write a value back to IDB, all within the same IDB transaction. You'd have to use multiple IDB transactions, which would be fine if the user only had your application open in one tab, but not in multiple tabs.

To get around this, I thought one could use optimistic concurrency control to write a nonce to IDB whenever a write transaction is requested from my IDB wrapper, use separate IDB transactions, and when writing, generate a conflict error if the nonce has changed. The problem is it's significantly slower to do each GET, MGET, EXISTS, or SET on a separate IDB transaction. I think it works out to an extra millisecond or two of overhead per operation. If you're doing 10 or 20 operations, however small, that's an extra 10-20ms of wasted overhead.

So then I thought I would request an IDB transaction when a transaction is requested from my wrapper, and then check the active flag when it's needed, and if active is set to false then re-request the transaction.
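The nonce-based optimistic concurrency scheme described above can be sketched as follows. The in-memory `NonceStore` is a hypothetical stand-in for the database, with each method modelling one short IDB transaction; none of these names come from IndexedDB itself.

```typescript
// Hypothetical stand-in: each method models one short, separate IDB transaction.
class NonceStore {
  private data = new Map<string, string>();
  private currentNonce = 0;

  // Requesting a write transaction records a fresh nonce in the store.
  beginWrite(): number {
    this.currentNonce += 1;
    return this.currentNonce;
  }

  get(key: string): string | undefined {
    return this.data.get(key);
  }

  // A write succeeds only if no other writer has requested a transaction since.
  set(nonce: number, key: string, value: string): void {
    if (nonce !== this.currentNonce) {
      throw new Error("conflict: nonce changed since transaction was requested");
    }
    this.data.set(key, value);
  }
}

const db = new NonceStore();
const tabA = db.beginWrite();
db.set(tabA, "x", "1");       // succeeds: tab A holds the current nonce
const tabB = db.beginWrite(); // another tab requests a write transaction

let conflictDetected = false;
try {
  db.set(tabA, "x", "2");     // tab A's nonce is now stale
} catch (error) {
  conflictDetected = true;
}
db.set(tabB, "x", "3");       // tab B holds the current nonce
```

The cost noted in the text comes from every `get` and `set` here being a separate IDB transaction in the real wrapper.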
The trouble is that the active flag does not appear to be exposed to JS as far as I can see. Then I tried using a try/catch whenever an object store is requested from an IDB transaction so as to reset the IDB transaction if it's expired. Chrome returns "NOT_ALLOWED_ERR" instead of "...INACTIVE…" as it should. But I also found that the UA sometimes updates the active flag when my code has not returned control so there's a race condition somewhere in there I think, which may make this trick impossible. It works fine if I schedule a delay between operations of 10ms or more. When it gets down to 1ms though, it starts failing every now and then. I tried the same thing using transaction.oncomplete to set my own active flag, but this did not work either. Throughout, IDB in Chrome performs at least an order of magnitude slower than the same code running against an in-house mvcc database on the same machine. Firefox is significantly slower than Chrome. Would anyone know what the LevelDB benchmark would look like if through IDB on Chrome? >> "Note that reads are also blocked if the long-running transaction is a >> READ_WRITE transaction." Is it acceptable for a writer to block readers? What if one tab is downloading a gigabyte of user data (using a workload-configurable Merkle tree scheme), and another tab for the same application needs to show data? On 25 Jul 2011, at 8:38 PM, Jonas Sicking wrote: On Mon, Jul 25, 2011 at 6:28 AM, Joran Greef wrote: > Regarding transactions in the IndexedDB specification (3.1.7 Transaction): > >>> "Once a transaction no longer can become active, and if the transaction >>> hasn't been aborted, the implementation must automatically attempt to >>> commit it. This usually happens after all requests placed against the >>> transaction has been executed and their returned results handled, but no >>> new requests has been placed against the transaction." > > What does "no longer can become active" mean? Well.. 
generally it's exactly the text you are quoting. "after all requests placed against the transaction has been executed and their returned results handled, but no new requests has been placed against the transaction". If you want the full exact definition, look for all the places that references the "active" flag for transactions. >>> "Authors can still cause transactions to run for a long time, however this >>> is generally not a usage pattern which is recommended and can lead to bad >>> user experience in some implementations." > > How exactly can an author still cause a transaction to span several > asynchronous events? All transactions span all the asynchronously firing events that are fired against the requests placed against the transaction. So as lon
[IndexedDB] Transaction Auto-Commit
Regarding transactions in the IndexedDB specification (3.1.7 Transaction): >> "Once a transaction no longer can become active, and if the transaction >> hasn't been aborted, the implementation must automatically attempt to commit >> it. This usually happens after all requests placed against the transaction >> has been executed and their returned results handled, but no new requests >> has been placed against the transaction." What does "no longer can become active" mean? >> "Authors can still cause transactions to run for a long time, however this >> is generally not a usage pattern which is recommended and can lead to bad >> user experience in some implementations." How exactly can an author still cause a transaction to span several asynchronous events? For example, start a transaction, read a value, use that value to do something asynchronous outside of IDB (perhaps for a millisecond or two or up to a second), and then write the result of that back to the transaction? If it is indeed possible for an author to prolong a transaction, does that mean the UA is implementing a delay to give transactions with asynchronous dependencies the chance to add requests? Surely an explicit commit in this case would be preferable for performance reasons (with a UA timeout protecting against developer forgetfulness)? Then again, if a developer forgot an explicit commit, it would only block writes for his particular application.
Re: [WebSQL] Any future plans, or has IndexedDB replaced WebSQL?
On 06 Apr 2011, at 7:42 PM, Boris Zbarsky wrote: > On 4/6/11 10:30 AM, Joran Greef wrote: >> If Mozilla enjoys using the latest version of SQLite (and I assume they are >> not planning on replacing internal SQLite embeddings with IndexedDB - not at >> this stage at least), then web developers deserve the latest version. > > This is not obvious a priori, for what it's worth. The point was made with reference to Mozilla expecting web developers to run production client code on IndexedDB, when Mozilla themselves run production code on SQLite. Boris, Jonas and Shaun, we could talk round and round in circles. It seems you're not too concerned by any of the performance and design problems re: indexedDB that I have raised. You ask for "proposals" but it's clear you're not sold on these issues. If you were, I am sure you would be among the first to provide them. Do you have real-world experience developing web-based applications, targeting mobile and desktop, with offline support for storing, indexing, migrating and synchronizing several million objects? Or are we all arguing in the realm of conjecture ("it should be able to") without having encountered any of these issues ourselves, or having any basis for our claims?
Re: [WebSQL] Any future plans, or has IndexedDB replaced WebSQL?
On 06 Apr 2011, at 7:24 PM, Tab Atkins Jr. wrote: > When a security bug is encountered, either the browsers update to a > new version of sqlite (if it's already been fixed), thus potentially > breaking sites, or they patch sqlite and then upgrade to the patched > version, thus potentially breaking sites, or they fork sqlite and > patch the error only in their forked version, still potentially > breaking sites but also forking the project. The only thing that is > *not* a valid possibility is the browsers staying on the single fixed > version, thus continuing to expose their users to the security bug. > > ~TJ Browser vendors are moving to shorter and shorter release cycles. People have stopped viewing these things through the "IE6-here-forever" lens. Browsers are starting to update themselves automatically, even nightly. If a security issue were to be found, it would be highly unlikely that its patch would break any SQL interface of SQLite.
Re: [WebSQL] Any future plans, or has IndexedDB replaced WebSQL?
On 06 Apr 2011, at 7:14 PM, Shawn Wilsher wrote:

> On 4/6/2011 9:44 AM, Joran Greef wrote:
>> We only need one fixed version of SQLite to be shipped across Chrome,
>> Safari, Opera, Firefox and IE. That in itself would represent a tremendous
>> goal for IndexedDB to target and to try and achieve. When it actually does,
>> and surpasses the fixed version of SQLite, those developers requiring the
>> raw performance and reliability of SQLite could then switch over.
> I don't believe any browser vendor would be interested in shipping two
> different version of SQLite (one for internal use, and one for the web). I
> can say, with certainty, that Mozilla is not.
>
> Cheers,
>
> Shawn

If Mozilla enjoys using the latest version of SQLite (and I assume they are not planning on replacing internal SQLite embeddings with IndexedDB - not at this stage at least), then web developers deserve the latest version.

Ship the latest version of SQLite (even with the -moz prefix). Developers targeting "HTML 5" are used to API changes, waiting on browsers and trying to reason about broken implementations. The library writers will quickly paper over any SQLite version changes should they ever arise.

Would you run the Mozilla production database on any browser's implementation of IndexedDB? How can you expect developers to run their production client code on IndexedDB? It's simply not ready and will not be for at least a year or two or three. How likely is it that SQLite (given its history) will remove the SELECT, INSERT, UPDATE, DELETE statements before then?
Re: [WebSQL] Any future plans, or has IndexedDB replaced WebSQL?
On 06 Apr 2011, at 6:49 PM, Shawn Wilsher wrote: > On 4/4/2011 10:18 AM, Joran Greef wrote: >> How would you create an index on an existing object store in IndexedDB >> containing more than 50,000 objects on an iPad, without incurring any object >> deserialization/serialization overhead, without being an order of magnitude >> slower than SQLite, and without bringing the iPad to its knees? If you can >> do it with even one IndexedDB implementation out there then kudos and hats >> off to you. :) > You keep bringing this point up, but only a naive implementation of IndexedDB > would bring a device to it's knees (or a poorly implemented thread scheduler, > which I don't expect the iPad to have). The API is asynchronous, which means > it doesn't need to (nor should it) happen on any thread that the UI is being > drawn on. > > You still have a point about it possibly taking longer, but even then, that > will be implementation dependent. > > Cheers, > > Shawn > I bring up the iPad example because I had experience with a LocalStorage implementation (I think it was Safari) loading the contents of LocalStorage into memory synchronously on first access, blocking the UI thread. I am probably wrong on this one but I think I remember reading on Web Apps that this was one of the motivations behind limiting LocalStorage quota to around 10mb. At the time I was one of those who believed that LocalStorage would support storage of at least 10 GB as a matter of course. I hope you can understand my slight distrust of subsequent storage APIs (other than those of proven track record) in this light. It would still take longer (easily 30-50 seconds per 50,000 objects more than an opaque key-value store built on SQLite) even if the IndexedDB implementation was asynchronous. The developer would also have a tough time reasoning about when index migrations would be finished, since IndexedDB offers no control over the migration process and provides no way to modify index memberships directly. 
For those that care about these things, IndexedDB does not provide sufficient low-level storage primitives.
Re: [WebSQL] Any future plans, or has IndexedDB replaced WebSQL?
On 06 Apr 2011, at 6:26 PM, Shawn Wilsher wrote: > On 4/4/2011 8:07 AM, Joran Greef wrote: >> SQLite has a fantastic track record of maintaining backwards compatibility. > Sort of. They didn't between SQLite 2 and SQLite 3. There also have been > some (albeit minor) backwards compatibility issues with SQLite 3.x releases. > The most serious of which deal with performance characteristics changing > because they changed how the optimizer works. > > These type of things are acceptable to deal with in browser code because you > can change your code unlike on the web (unless you want to have different > code for each browser, and then each browser version). It's that, or > browsers can ship one version of SQLite for all eternity. > > Cheers, > > Shawn We only need one fixed version of SQLite to be shipped across Chrome, Safari, Opera, Firefox and IE. That in itself would represent a tremendous goal for IndexedDB to target and to try and achieve. When it actually does, and surpasses the fixed version of SQLite, those developers requiring the raw performance and reliability of SQLite could then switch over. It is too soon to deprecate SQLite in the browser. IndexedDB is only getting started. It is beta and nowhere near the performance and test coverage of SQLite. A fixed version of SQLite across browsers would be helpful at this stage. If Mozilla could lead the way on this it would be fantastic. Perhaps that would satisfy all parties on these issues? It would also give IndexedDB implementors sufficient incentive to optimize their implementations, and developers the safety net of SQLite until such time as they do.
Re: [WebSQL] Any future plans, or has IndexedDB replaced WebSQL?
On 06 Apr 2011, at 8:56 AM, Jonas Sicking wrote: > >> 1. Treat object values as opaque (necessary to avoid >> deserialization/serialization overhead, this is mandatory for storing >> anything over 50,000 objects on a device like an iPad or iPhone). > > Please explain this in more detail as I have no idea what you mean by > "treat as opaque". Are you saying that we should not allow storing > objects but rather only allow storing strings? If not, surely any type > of object needs to be serialized upon storage. If you are simply > suggesting forbidding storing objects, then this doesn't seem like a > blocker. Simply store a string and we won't serialize anything. > > I'm also interested in what you are basing the claim on overhead on. > Have you profiled a IndexedDB implementation? If so, which? And if > Firefox, did you do so before or after we switched away from using a > JSON serializer? Yes, it must accept a string value and store that directly. The "opaque" terminology comes from some of the BDB papers. I tested both Chrome and Firefox implementations 3 weeks ago. Both were an order of magnitude slower than using SQLite as a key-value store (storing strings as blobs). You can use whatever serializer you like, but it will always be slower than avoiding serialization completely (this is possible by the way, my application does not deserialize objects received from the server before storing them). Even if your serializer takes only 1ms per serialize call, that's 50 seconds for 50,000 objects. For my use-case that is unacceptable, considering that SQLite is available in Chrome and Safari. I will encourage my users to use those browsers and continue developing for SQLite until IndexedDB resolves this issue. How would you support indices (see below) if you say "Simply store a string and we won't serialize anything."? >> 2. 
Enable indices to be modified at time of putting/deleting objects (index >> references provided by application at time of putObject/deleteObject call). > > I don't believe that this is a blocker. You can simply modify the > object you are storing to add properties and then index of these > properties. What you are suggesting only has the advantage that it > allows storing objects without modifying them. While that can be > important, it isn't a blocker to at least creating a prototype > implementation. How would you index objects passed to putObject as a string (see above)? Plus you have the unnecessary object creation overhead. How fast is it to create 50,000 objects on an iPad? What would that do to the GC and why would you want to do that if you don't need to? I would like to see Mozilla "do as they say": re-implement a SQLite on IndexedDB themselves, that is just as fast and memory efficient as the original, before suggesting that this is possible, that the web therefore be deprived of SQLite. Furthermore, that Mozilla stop using SQLite for all internal use, and rely solely on IndexedDB instead. That is essentially the request that Mozilla are making of web developers today. It's clear that scores of web developers are upset with the decision to deprecate WebSQL. It's not clear that IndexedDB provides anything close in terms of actual raw performance. This surprised me greatly since I assumed IndexedDB would naturally leverage established indexed key-value ideas (for instance to quote BDB - "In Berkeley DB, the key and value in a record are opaque to Berkeley DB") which would give it an edge over SQLite. Pragmatically speaking, would it really be so hard for Mozilla to join Chrome, Safari and Opera and provide an embedding of SQLite along with IndexedDB? 
If IndexedDB is as good as you suggest it is, then I am sure developers will flock to it, and you won't need to speculate as to whether or not SQLite will take over the web and then break backwards compatibility (despite a stated objective and proven track record of not doing so). And if SQLite did ever break backwards compatibility then developers would have IndexedDB. And if applications relying on SQLite are abandoned by their authors and broken as a result of not upgrading, then arguably those applications should be deprecated and not SQLite.
Re: [WebSQL] Any future plans, or has IndexedDB replaced WebSQL?
On 06 Apr 2011, at 2:53 AM, Pablo Castro wrote:

> The goal of IndexedDB has always been to enable things like RelationalDB and
> CouchDB to be built on top, while maintaining a reasonable level of
> functionality for those that wanted to use it directly. I really like the
> idea of thinking of RelationalDB as something that's built as a library on
> top of IndexedDB. Are there specific tweaks we can make to IndexedDB so it
> can be a good lower-layer for RelationalDB, such that RelationalDB could be
> built as a pure JavaScript library?
>
> Thanks
> -pablo

1. Treat object values as opaque (necessary to avoid deserialization/serialization overhead; this is mandatory for storing anything over 50,000 objects on a device like an iPad or iPhone).
2. Enable indices to be modified at time of putting/deleting objects (index references provided by application at time of putObject/deleteObject call).
3. Provide a simpler, more powerful locking mechanism, opaque to IndexedDB, to provide finer-grained application-specific locking (e.g. have we just entered into a sync process with the master database?).

If I may say so, it does seem odd that some would point to the difficulties of speccing merely the interface of something like SQLite, and then advise others to re-implement it entirely. If there was a specific BTree API in the browser and a powerful asynchronous LocalStorage mechanism this might be something for the brave, but IndexedDB is a little too tightly coupled to its own interface agenda at the moment to make this goal possible.
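Points 1 and 2 can be illustrated with a small in-memory sketch: values are stored as opaque strings and never parsed, while index memberships are supplied by the application on each put. The class `OpaqueStore` and the `lookup` method are hypothetical; only the names putObject and deleteObject are taken from the list above, and their signatures here are assumptions.

```typescript
// Hypothetical store: opaque string values, index entries supplied by the caller.
class OpaqueStore {
  private values = new Map<string, string>();
  private indices = new Map<string, Set<string>>();  // index entry -> primary keys
  private memberships = new Map<string, string[]>(); // primary key -> index entries

  // `indexEntries` come from the application; `value` is never deserialized.
  putObject(key: string, value: string, indexEntries: string[]): void {
    this.deleteObject(key); // drop any stale index memberships first
    this.values.set(key, value);
    this.memberships.set(key, indexEntries);
    for (const entry of indexEntries) {
      const keys = this.indices.get(entry) ?? new Set<string>();
      keys.add(key);
      this.indices.set(entry, keys);
    }
  }

  deleteObject(key: string): void {
    for (const entry of this.memberships.get(key) ?? []) {
      this.indices.get(entry)?.delete(key);
    }
    this.memberships.delete(key);
    this.values.delete(key);
  }

  getObject(key: string): string | undefined {
    return this.values.get(key);
  }

  // Primary keys currently filed under one index entry, in sorted order.
  lookup(entry: string): string[] {
    const keys = this.indices.get(entry);
    return keys ? Array.from(keys).sort() : [];
  }
}

const contacts = new OpaqueStore();
contacts.putObject("c1", '{"name":"Ann"}', ["city:Paris"]);
contacts.putObject("c2", '{"name":"Bob"}', ["city:Paris", "vip"]);
const inParis = contacts.lookup("city:Paris");      // ["c1", "c2"]
contacts.putObject("c1", '{"name":"Ann"}', ["city:Rome"]);
const stillInParis = contacts.lookup("city:Paris"); // ["c2"]
```

Because the store never inspects the value, a string received from the server can be written through unchanged, and changing an object's index membership does not require re-reading it.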
Re: [WebSQL] Any future plans, or has IndexedDB replaced WebSQL?
On 04 Apr 2011, at 7:28 PM, Mikeal Rogers wrote: > the biggest bottleneck here in the current implementation would be the > transaction overhead on a database this size, which is because of performance > problems in sqlite which is underlying the implementation. sqlite can't fix > this, it's currently the problem. the object serialization is not a huge > performance issue, performance issues in databases are almost always do to IO > or transaction locks. You do not have me convinced. I have tried these things (and was once an avid CouchDB user), and one of the first things I learnt was that object deserialization/serialization incurs a massive performance penalty. Just measure the time it takes to JSON.parse/JSON.stringify 50,000 objects on an iPad and then implement an indexing scheme that avoids this overhead and compare the performance times. > you should most definitely be able build sqlite on top of IDB, there would be > a performance penalty of course, which we can address, but you should be able > to do it. if you can't then we need to extend the specification. Trust me on this Mikeal, you cannot build SQLite on top of IDB, the primitives are simply not there. I have been asking for the specification to be extended (namely with regards to schema-less index operation, set operations on indices, and opaque objects) and one or two of the contributors have expressed interest but Mozilla do not appear to be enthralled. Read up on SQLite if you have not yet had the chance to understand the mammoth collective effort it represents: http://www.sqlite.org (it's a stellar project)
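The measurement suggested above can be sketched roughly as follows. This is a minimal illustration, not the benchmark that was actually run: the helper names (`makeObjects`, `roundTrip`, `passThrough`, `timeMs`) and the object shape are assumptions, and timings will differ widely between devices, so no expected numbers are given.

```javascript
// Hypothetical micro-benchmark: cost of a JSON round trip per object versus
// handing the already-serialized string straight through (opaque storage).
function makeObjects(count) {
  const serialized = [];
  for (let i = 0; i < count; i++) {
    serialized.push(
      JSON.stringify({ id: "obj" + i, name: "user" + i, tags: ["a", "b", "c"] })
    );
  }
  return serialized;
}

function timeMs(fn) {
  const start = Date.now();
  fn();
  return Date.now() - start;
}

// What a store has to do if it must look inside every value.
function roundTrip(serialized) {
  return serialized.map((s) => JSON.stringify(JSON.parse(s)));
}

// What an opaque store can do: keep each string exactly as it was handed over.
function passThrough(serialized) {
  return serialized.slice();
}

const serializedObjects = makeObjects(50000);
const roundTripMs = timeMs(() => roundTrip(serializedObjects));
const passThroughMs = timeMs(() => passThrough(serializedObjects));
console.log("round trip: " + roundTripMs + " ms, pass-through: " + passThroughMs + " ms");
```

Running the same script on a desktop and on a tablet gives a rough feel for how much of the cost is serialization rather than IO.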
Re: [WebSQL] Any future plans, or has IndexedDB replaced WebSQL?
On 04 Apr 2011, at 6:04 PM, Tab Atkins Jr. wrote: > It's new. Do you think it would be wise then to advocate doing away with SQLite before IndexedDB has had a chance to prove itself? Surely two competing APIs would be the fastest way to bring IndexedDB up to speed? > Ironically, the poor performance is because it's using sqlite as a > backing-store in the current implementation. That's being fixed by > replacing sqlite. Yes I am aware of this. There are some design flaws in IndexedDB. For instance, it does not regard objects as opaque (as would a typical key-value store), which means that creating an index on an existing object store would require deserializing/serializing every object therein. Doing that for 50,000 objects on an iPad would be breathtaking. I have written object stores on top of SQLite and they are already an order of magnitude faster than IndexedDB with a more powerful and memory efficient API to boot. > Kinda the point, in that the power/complexity of SQL confuses a huge > number of develoeprs, who end up coding something which doesn't > actually use the relational model in any significant way, but still > pays the cost of it in syntax. I was not referring to SQL but to the underlying primitives exposed through the SQL interface. For example, set operations on indices, or the ability to index objects with array values.
Re: [WebSQL] Any future plans, or has IndexedDB replaced WebSQL?
On 04 Apr 2011, at 6:10 PM, Mikeal Rogers wrote: > it's not very hard to write the abstraction you're talking about on top of > IndexedDB, and until you do it i'm going to have a hard time taking you > seriously because it's clearly doable. You assume I have not written the abstraction I am talking about on top of IndexedDB? > the constructs in IndexedDB are pretty low level but sufficient if you know > how to implement databases. performance is definitely an issue, but making > these constructs faster would be much easier than trying to tweak an off the > shelf SQL implementation to your use case. How exactly would you make a schema-enforcing interface faster than a stateless interface? How would you implement application-managed indices on top of IndexedDB without being slower than SQLite? How would you implement set operations on indices in IndexedDB without being slower or less memory efficient than SQLite? How would you create an index on an existing object store in IndexedDB containing more than 50,000 objects on an iPad, without incurring any object deserialization/serialization overhead, without being an order of magnitude slower than SQLite, and without bringing the iPad to its knees? If you can do it with even one IndexedDB implementation out there then kudos and hats off to you. :) I understand your point of view. I once thought the same. You would think that IndexedDB would be more than satisfactory for these things. The question is whether IndexedDB provides adequate and performant database primitives, to the same degree as SQLite (and of course SQL is merely an interface to database storage primitives, I do not recall saying otherwise). You can build IndexedDB on top of SQLite (as some browsers are indeed doing), but you cannot build SQLite on IndexedDB.
Re: [WebSQL] Any future plans, or has IndexedDB replaced WebSQL?
On 04 Apr 2011, at 5:26 PM, Keean Schupke wrote: > This is ignoring the possibility that something like RelationalDB could be > used, where a well defined common subset of SQL can be used (and I use > well-defined in the formal sense). This would allow a relatively thin wrapper > on top of most SQL implementations and would allow SQLite (or BDB) to be used > as the backend. Yes, if an implementation of RelationalDB arrives which is solid and fast with support for set operations that would be great. The important thing is that we have two competing APIs (and preferably a strong API with a great track record).
Re: [WebSQL] Any future plans, or has IndexedDB replaced WebSQL?
On 04 Apr 2011, at 4:39 PM, Jonas Sicking wrote: > Hence it would still be the case that we would be relying on the > SQLite developers to maintain a stable SQL interpretation... SQLite has a fantastic track record of maintaining backwards compatibility. IndexedDB has as yet no track record, no consistent implementations, no widespread deployment, only measurably poor performance and a lukewarm indexing and querying API. If anything it's the other way round. You have yet to convince developers that IndexedDB will be faster, more stable, more powerful, more memory efficient than SQLite and with better test coverage at that.
Re: [WebSQL] Any future plans, or has IndexedDB replaced WebSQL?
> I am incredibly uncomfortable with the idea of putting the > responsibility of the health of the web in the hands of one project. > In fact, one of the main reasons I started working at Mozilla was to > prevent this. > > / Jonas I agree with you. All the more reason to support both WebSQL and IndexedDB. It is not a case of either/or. It would be healthy to have competing APIs.
Re: [WebSQL] Any future plans, or has IndexedDB replaced WebSQL?
On Sat, Apr 2, 2011 at 00:42:40, Glenn Maynard wrote: > You can certainly ask if they're interested in doing so, not for "our" > benefit (whoever "our" means), but for the benefit of the Web as a whole, > and there's nothing at all rude in asking. I'd say the opposite: it's rude > to assume they wouldn't be interested, rather than asking and letting them > come to their own decision. (I don't know where the notion of "forcing" > them to do anything came from.) I have been reading up more on the history of SQLite. It is a stellar implementation, just to highlight a few points: 1. "Most of the SQLite source code is devoted purely to testing and verification. An automated test suite runs millions and millions of test cases involving hundreds of millions of individual SQL statements and achieves 100% branch test coverage." 2. "SQLite can also be made to run in minimal stack space (4KiB) and very little heap (100KiB), making SQLite a popular database engine choice on memory constrained gadgets such as cellphones, PDAs, and MP3 players." 3. "Faster than popular client/server database engines for most common operations." 4. "Supports terabyte-sized databases and gigabyte-sized strings and blobs." 5. "The developers continue to expand the capabilities of SQLite and enhance its reliability and performance while maintaining backwards compatibility with the published interface spec, SQL syntax, and database file format." It is easier to build a performant IndexedDB on SQLite than to build a performant SQLite on IndexedDB. Maybe that is something to think about. Developers need working database primitives, more than they need convenience. There may be conjectural reasons for Mozilla not implementing WebSQL, but the track history of SQLite is hard to ignore. Mozilla is already embedding SQLite for other uses, and appears to be a sponsor of the project. 
SQLite may not be a specification in "our" sense of the word, but in a Web sense of the word, it is so widely deployed already that it would be hard not to call it a standard.
Re: [IndexedDB] Design Flaws: Not Stateless, Not Treating Objects As Opaque
On 31 Mar 2011, at 10:07 PM, Shawn Wilsher wrote: > On 3/31/2011 11:47 AM, Joran Greef wrote: > Let those who introduced these design flaws be among the first to take > responsibility and fix them. > You aren't being constructive, and that's a surefire way to be ignored. You > have yet to convince the working group that these are "design flaws" in the > first place. > > /sdwilsh Agreed. I am actively using the API with real-world data and I am providing feedback. You are welcome to use it or not. It is not for me to convince anyone. As I said, if people think there is a problem, let those who introduced it fix it. Joran Greef
Re: Mail List Etiquette [Was: WebSQL] Any future plans, or has IndexedDB replaced WebSQL?]
Thank you Art. To clarify, I have heard from a contributor to the specification in question who himself referred to LocalStorage as "little more than a toy", expressing his frustrations at the specification. It is well known that most LocalStorage implementations do not support more than 10 MB, some load the entire contents into memory synchronously on first access, and there were some issues around locking that were not addressed as far as I recall. LocalStorage does not work as advertised. Many developers, including myself, got excited, spent hours with it, only to see these issues left unresolved. It would be true to say that most LocalStorage implementations are "crippled" in this sense. No one need be offended since specification and implementation are two separate things. I do wish, however, that the specification would have addressed large quota support, and encouraged certain implementation practices, and in this sense I feel that not enough was done. The same with WebSQL. And recently I learned that IDB prevents applications from managing indices? These things are disappointing to us developers. I think we have a right to be critical on these issues where criticism is due. If the specification is inadequate, or burdened by politics, we should be free to say so (respectfully and professionally of course, but also honestly and directly and with the right measure of urgency), without fear of offending anyone or being policed for it. Joran Greef On 31 Mar 2011, at 9:37 PM, Arthur Barstow wrote: >> This is painful to read. WebSQL development died because SQLite, the most >> widely-deployed database software in the world, was too good? That sounds >> like a catastrophic failure of the W3C process. >> >> -- >> Glenn Maynard > Hear. > > I am starting to think that Mozilla will step up and provide an embedding of > SQLite, even if it has to only think of it as such. It will have to. 
> > People would rather use a working database than something crippled albeit > "specced" (see LocalStorage or IndexedDB). > > It was things like XHR in all their unspecced glory that brought the web to > where it is today. Joran - as one of the moderators of public-webapps, I find your comments above offensive to those that work on the specs you mention. All - this is a reminder that all e-mails on this list are expected to be respectful and professional. Please see the following for more information about the etiquette and usage of this list: http://lists.w3.org/Archives/Public/public-webapps/2009JulSep/1216.html -Regards, Art Barstow
Re: [WebSQL] Any future plans, or has IndexedDB replaced WebSQL?
> This is painful to read. WebSQL development died because SQLite, the most > widely-deployed database software in the world, was too good? That sounds > like a catastrophic failure of the W3C process. > > -- > Glenn Maynard Hear. I am starting to think that Mozilla will step up and provide an embedding of SQLite, even if it has to only think of it as such. It will have to. People would rather use a working database than something crippled albeit "specced" (see LocalStorage or IndexedDB). It was things like XHR in all their unspecced glory that brought the web to where it is today.
Re: [IndexedDB] Design Flaws: Not Stateless, Not Treating Objects As Opaque
On 31 Mar 2011, at 7:41 PM, Jonas Sicking wrote: > So pretty please, with sugar on top, please come up with a proposal > for the full API rather than bits and pieces. Let those who introduced these design flaws be among the first to take responsibility and fix them.
Re: [IndexedDB] Design Flaws: Not Stateless, Not Treating Objects As Opaque
On 31 Mar 2011, at 7:27 PM, Jeremy Orlow wrote: >> 1. Provide the application with a first-class means to manage indexes at >> time of putting/deleting objects. > > I'm OK with doing this for v1 if the others are. It doesn't seem like that > big of an addition and it would give a decent amount of additional > flexibility. Thanks Jeremy that would be great. >> (reduces serialization/deserialization overhead where application already >> has the object as a string) > > I'm not sure why you think this would reduce overhead. How long would it take an iPad to JSON deserialize/serialize 500 / 5,000 / 50,000 / 500,000 / 5,000,000 2KB objects? That's a reasonable device and those are reasonable workloads. In its present state, IndexedDB needs to do this every time setVersion is called with a createIndex in there... you see the problem is there's no way for the application to control this. The application would arguably be able to find better ways of migrating indexes than using key paths which necessitate deserialization/serialization to be performed on the client. For instance, you could use batch jobs on the server to do this on behalf of clients, and this would make sense especially where many clients/devices share the same objects. With IndexedDB this is not possible. With pure storage primitives it would have been possible. This is just one use-case, and for every one of these there will be plenty more. > Like I said above, although I think we should make it possible to operate > more statelessly, I don't see a reason we need to remove stuff like this. > Some users will find it more convenient to work this way. Agreed on both counts. It is clearly too late to remove it now. But it may be a good idea in future to keep the focus on providing low-level primitives rather than convenience features, since the latter often get in the way of the former.
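To make the overhead argument concrete, here is a small in-memory sketch contrasting the two ways of adding an index to a store that already holds serialized objects. It is not IndexedDB code: `buildIndexByKeyPath` and `buildIndexFromReferences` are hypothetical names, the store is a plain `Map`, and the "server-side batch job" is simulated by a hard-coded list of references.

```javascript
// Key-path style: the store must deserialize every value to find the key.
function buildIndexByKeyPath(store, keyPath) {
  const index = new Map();
  let parsed = 0;
  for (const [id, serialized] of store) {
    const value = JSON.parse(serialized)[keyPath]; // store looks inside the object
    parsed++;
    if (!index.has(value)) index.set(value, new Set());
    index.get(value).add(id);
  }
  return { index, parsed };
}

// Application-managed style: [indexValue, id] pairs arrive precomputed
// (for example from a batch job elsewhere), so nothing is deserialized here.
function buildIndexFromReferences(references) {
  const index = new Map();
  for (const [value, id] of references) {
    if (!index.has(value)) index.set(value, new Set());
    index.get(value).add(id);
  }
  return { index, parsed: 0 };
}

const existingStore = new Map([
  ["1", JSON.stringify({ name: "alice", team: "red" })],
  ["2", JSON.stringify({ name: "bob", team: "blue" })],
  ["3", JSON.stringify({ name: "carol", team: "blue" })],
]);

const byKeyPath = buildIndexByKeyPath(existingStore, "team");
const byReference = buildIndexFromReferences([
  ["red", "1"],
  ["blue", "2"],
  ["blue", "3"],
]);
console.log("objects parsed:", byKeyPath.parsed, "versus", byReference.parsed);
```

Both calls end up with the same index contents; the difference is only in how many stored values had to be parsed on the client to get there.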
Re: [IndexedDB] Design Flaws: Not Stateless, Not Treating Objects As Opaque
On 31 Mar 2011, at 12:52 PM, Keean Schupke wrote: > I totally agree with everything so far... > >> 3. This requires an adjustment to the putObject and deleteObject interfaces >> (see previous threads). > > I disagree that a simple API change is the answer. The problem is > architectural, not just a superficial API issue. Yes, for IndexedDB to be stateless with respect to application schema, one would need to: 1. Provide the application with a first-class means to manage indexes at time of putting/deleting objects. 2. Treat objects as opaque (remove key path, structured clone mechanisms, application must provide an id and JSON value to put/delete calls, reduces serialization/deserialization overhead where application already has the object as a string). 3. Remove setVersion (redundant, application migrates objects and indexes using transactions as it needs to). 4. Remove createIndex. This would rip so much from the spec as to reduce it to a bunch of tatters, defining nothing more than an interface for index/key/value primitives in terms of well-established interfaces. Essentially, we need LocalStorage with asynchronous IO (based on Node's callback style), large quota support, and a BTree API. Failing that, a decent FileSystem API on which to build these.
Re: [IndexedDB] Design Flaws: Not Stateless, Not Treating Objects As Opaque
On 31 Mar 2011, at 9:34 AM, Jeremy Orlow wrote: > We have made an effort to understand other "contributions to the field". > > I'm not convinced that these are "essential database concepts" and having > personally spent quite some time working with the API in JS and implementing > it, I feel pretty confident that what we have for v1 is pretty solid. There > are definitely some things I wouldn't mind re-visiting or looking at closer, > possibly even for v1, but they all seem reasonable to study further for v2 as > well. > > We've spent a lot of time over the last year and a half talking about > IndexedDB. But now it's shipping in Firefox 4 and soon Chrome 11. So > realistically v1 is not going to change much unless we are convinced that > what's there is fundamentally broken. > > We intentionally limited the scope of v1, which is why we know there'll be a > v2. We can't solve all the problems at once, and the difficulty of speccing > something is typically exponential to the size of the API. > > Maybe a constructive way to discuss this would be to look at what use cases > will be difficult or impossible to achieve with the current design? Application-managed indices for starters. I would consider that to be essential when designing indexed key/value stores, and I would consider that to be the contribution made by almost every other indexed key/value store to date. If we have to use IDB the way FriendFeed used MySQL to achieve application-managed indices then I would argue that the API is in fact "fundamentally broken" and we would be better off with an embedding of SQLite by Mozilla. Regarding "the difficulty of speccing something is typically exponential to the size of the API", if people want to build a Rube Goldberg device then they must deal with the spec issues of that. 
If we were provided with the primitives for an indexed key/value store with application-managed indices (as Nikunj suggested at the time), we would have been well out of the starting blocks by now, and issues such as "computed indexes", "indexing array values" etc. would have been non-issues. Summary: 1. There's a problem. 2. It can still be fixed with a minimum of fuss. 3. This requires an adjustment to the putObject and deleteObject interfaces (see previous threads).
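As a rough illustration of why computed indexes and array values stop being spec questions once the application supplies the index references, consider the helper below. `indexReferences` is a hypothetical function, the "field=value" string format is borrowed from the putObject example elsewhere in this thread, and the `nameLower` index is an invented example of a computed index.

```javascript
// The application, not the store, decides which indexes an object belongs to.
function indexReferences(user) {
  const refs = ["name=" + user.name];
  // Array value: one index reference per element.
  for (const email of user.emails) {
    refs.push("emails=" + email);
  }
  // Computed index: derived from the object, never stored in it.
  refs.push("nameLower=" + user.name.toLowerCase());
  return refs;
}

const exampleRefs = indexReferences({
  name: "Joran",
  emails: ["first@example.com", "second@example.com"],
});
console.log(exampleRefs);
```

The store would only ever see the resulting strings, so it needs no rules for key paths, arrays, or derived values.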
Re: [IndexedDB] Design Flaws: Not Stateless, Not Treating Objects As Opaque
On 31 Mar 2011, at 9:53 AM, Jonas Sicking wrote: > I previously have asked for a detailed proposal, but so far you have > not supplied one but instead keep referring to other unnamed database > APIs. I have already provided an adequate interface proposal for putObject and deleteObject. I have already referenced at least Redis and Tokyo Cabinet as examples of "stateless" database interfaces, on numerous occasions. > For example, you've asked for callbacks to > implement collations, but what do we do if those callbacks don't > return consistent results? I have not once asked for callbacks, let alone callbacks to implement collations. You have jumped to this conclusion from my previous post, and missed the point of it entirely.
Re: [IndexedDB] Design Flaws: Not Stateless, Not Treating Objects As Opaque
On 31 Mar 2011, at 1:01 AM, Jonas Sicking wrote: > Anyhow, I do think that the idea of passing in index values at the > same time as a entry is created/modified is an interesting idea. And I > have said so in the past on this list. It's definitely something we > should consider for v2. > Oh, and if we did this, I wouldn't really know how to support things > like collations. Neither if you did collations using built in sets of > locales (like in Pablo's recent proposal), nor if you used some sort > of callback to do collation. > > / Jonas That's fine. You don't need to figure it out. Just look at how stateless databases have done it (or not done it) and do likewise. I submit to you that there is inadequate understanding of the concerns raised, hence the lack of urgency in trying to address them. That there is even a need for a "V2" is symptomatic of this. It may be a good idea to start looking at these things not as "interesting ideas" but as essential database concepts. If someone were trying to build some kind of transactional indexed key value store for the web, and they wanted to do a truly great job of it, they would certainly want to learn everything they could from databases that have made contributions to the field.
Re: [IndexedDB] Design Flaws: Not Stateless, Not Treating Objects As Opaque
> On 26 Mar 2011, at 10:14 AM, Nikunj Mehta wrote: > > What is the minimum that can be in IDB? I am guessing the following: > > 1. Sorted key-opaque value transactional store > 2. Lookup of keys by values (or parts thereof) Yes, this is what we need. In programmer speak: objects (opaque strings), sets (hash indexes), sorted sets (range indexes). > I know of no efficient way of doing callbacks with JS. Moreover, avoiding > indices completely seems to miss the point. Callbacks are unnecessary. This is what you would want to do as a developer using the current form of IDB: objectStore.putObject({ name: "Joran", emails: ["jo...@gmail.com", "jo...@ronomon.com"] }, { id: 'arbitraryObjectIdProvidedByTheApplication', indexes: ["emails=jo...@gmail.com", "emails=jo...@ronomon.com", "name=Joran"] }); IDB would then store the user object using the id provided by the application, and make sure it's referenced by this id in the "emails=jo...@gmail.com", "emails=jo...@ronomon.com", "name=Joran" index references provided (creating these indexes along the way if need be). The application is responsible for passing in the extra "id" and "indexes" options to putObject. Supporting range indexes would be a question of expanding the above to let the developer pass in a sort score along with the index reference. > Next, originally, I also had floated the idea of application managed indices, > but implementors thought of it as cruft. I can understand how application managed indices would lead to less work on the part of the spec committee. There seems to be some perverse human characteristic that likes to make easy things difficult. Ships will sail around the world but the Flat Earth Society will flourish. > I, for one, am not enamored by key paths. However, I am also morbidly aware > of the perils in JS land when using callback like mechanisms. Certainly, I > would like to hear from developers like you how you find IDB if you were to > not use any createIndex at all. 
Or at least that you would like to manage > your own indices. I am begging to be able to manage my indices. I know my data. I do not want to use any createIndex to declare indexes in advance of when I may or may not use them. What advantage would that give me? I want to create/update indexes only when I put or delete objects and I want to have control over which indexes to update accordingly. With one small change to the putObject and deleteObject interfaces, in the form of the "indexes" option, we can make that possible. We need these primitives in IDB: opaque strings, sets, sorted sets. Ideally, IDB need simply store these things and provide the standard interfaces (see Redis) to them along with a transactional mechanism. That's the perfect low-level API on which to build almost any database wrapper.
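The proposed behaviour can be modelled in memory as below. This is only a sketch under stated assumptions: `OpaqueStore` is a hypothetical class, not part of IndexedDB; the exact argument shape of `deleteObject` is a guess, since the thread only says it would take an "indexes" option; transactions and sorted sets (index references with a sort score) are left out.

```javascript
// In-memory model of a store with application-managed indexes.
class OpaqueStore {
  constructor() {
    this.objects = new Map(); // id -> opaque serialized value
    this.indexes = new Map(); // index reference -> Set of ids
  }

  // The value is stored as-is; the store never inspects it.
  putObject(value, options) {
    this.objects.set(options.id, value);
    for (const ref of options.indexes || []) {
      if (!this.indexes.has(ref)) this.indexes.set(ref, new Set()); // create on demand
      this.indexes.get(ref).add(options.id);
    }
  }

  // The application names the indexes the object should be removed from.
  deleteObject(options) {
    this.objects.delete(options.id);
    for (const ref of options.indexes || []) {
      const ids = this.indexes.get(ref);
      if (!ids) continue;
      ids.delete(options.id);
      if (ids.size === 0) this.indexes.delete(ref); // drop indexes that become empty
    }
  }

  getObject(id) {
    return this.objects.get(id);
  }

  lookup(ref) {
    return Array.from(this.indexes.get(ref) || []);
  }
}

const db = new OpaqueStore();
db.putObject(JSON.stringify({ name: "Joran" }), {
  id: "user1",
  indexes: ["name=Joran", "emails=first@example.com"],
});
console.log(db.lookup("name=Joran"));
```

Note that no index is declared ahead of time: an index exists exactly as long as some object references it.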
[IndexedDB] Design Flaws: Not Stateless, Not Treating Objects As Opaque
> On 20 Mar 2011, at 4:54 AM, Jonas Sicking wrote: > > I don't understand what you are saying about application state though, > so please do start that as a separate thread. At present, there's no way for an application to tell IDB what indexes to modify w.r.t. an object at the exact moment when putting or deleting that object. That's because this behavior is defined in advance using "createIndex" in a "setVersion" transaction. And then how IDB extracts the referenced value from the object is done using an IDB idea of "key paths". But right there, in defining the indexes in advance (and not when the index is actually modified, which is when the object itself is modified), you've captured application state (data relationships that should be known only to the application) within IDB. Because this is done in advance (because IDB seems to have inherited this assumption that this is just the way MySQL happens to do it), there's a disconnect between when the index is defined and when it's actually used. And because of "key paths" you now need to spec out all kinds of things like how to handle compound keys, multiple values. It's becoming a bit of a spec-fest. That this bubble of state gets captured in IDB, it also means that IDB now needs to provide ways of updating that captured state within IDB when it changes in the application (which will happen, so essentially you now have your indexing logic stuck in the database AND in the application and the application developer now has to try and keep BOTH in sync using this awkward pre-defined indexes interface), thus the need for a setVersion transaction in the first place. None of this would be necessary if the application could reference indexes to be modified (and created if they don't exist, or deleted if they would then become empty) AT THE POINT of putting or deleting an object. 
Things like data migrations would also be better served if this were possible since this is something the application would need to manage anyway. Do you follow? The application is the right place to be handling indexing logic. IDB just needs to provide an interface to the indexing implementation, but not handle extracting values from objects or deciding which indexes to modify. That's the domain of the application. It's a question of encapsulation. IDB is crossing the boundaries by demanding to know ABOUT the data stored, and not just providing a simple way to put an object, and a simple way to put a reference to an object to an index, and a simple way to query an index and intersect or union an index with another. Essentially an object and its index memberships need to be completely opaque to IDB and you are doing the opposite. Take a look at the BDB interface. Do you see a setVersion or createIndex semantic in there? Take a look at Redis and Tokyo and many other things. Do you see a setVersion or createIndex semantic in there? Do these databases have any idea about the contents of objects? Any concept of key paths? No, and that's the whole reason these databases were created in the first place. I'm sure you have read the BDB papers. Obviously this is not the approach of MySQL. But if IDB is trying to be MySQL but saying it wants to be BDB then I don't know. In any event, Firefox would be brave to also embed SQLite. Let the better API win. How much simpler could it be? At the end of the day, it's all objects and sets and sorted sets, and see Redis' epiphany on this point. IDB just needs to provide transactional access to these sets. The application must decide what goes in and out of these sets, and must be able to do it when it wants to, not some time in advance. I bring this up because I once wrote the exact same kind of database that you are writing now (where one thinks it would be good if the database did NOT treat objects as opaque... 
that the database should be smart about the contents of objects and share control for how objects relate to each other etc.) and I have since seen how much better, simpler, faster the alternative is. So unless you have formidable reasons for maintaining the status quo in light of the above, even if you don't understand this concept of application state getting stuck in IDB, and even though you advocate that WebSQL is not deprecated and that we can consider LocalStorage to be an alternative, then it is my hope that you will heed this and make something of it. I'm sorry if this is not the kind of feedback you want to hear at this stage, but IDB needs to be good for more than just HTML 5 todo list demos.
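The "intersect or union an index with another" primitive mentioned above amounts to ordinary set operations over the ids held in each index. A minimal sketch, assuming each index is simply a `Set` of object ids; the index names and ids are invented for illustration, and a real implementation would operate on stored, possibly sorted, sets rather than in-memory ones.

```javascript
// Ids present in both indexes. Iterating the smaller set keeps the work
// proportional to the smaller index.
function intersect(a, b) {
  const [small, large] = a.size <= b.size ? [a, b] : [b, a];
  const result = new Set();
  for (const id of small) {
    if (large.has(id)) result.add(id);
  }
  return result;
}

// Ids present in either index.
function union(a, b) {
  const result = new Set(a);
  for (const id of b) result.add(id);
  return result;
}

// Hypothetical index contents, keyed by index reference.
const indexSets = new Map([
  ["team=blue", new Set(["u1", "u2", "u3"])],
  ["role=admin", new Set(["u2", "u3", "u4"])],
]);

const blueAdmins = intersect(indexSets.get("team=blue"), indexSets.get("role=admin"));
const blueOrAdmin = union(indexSets.get("team=blue"), indexSets.get("role=admin"));
console.log(Array.from(blueAdmins), Array.from(blueOrAdmin));
```

Because both functions return new sets, their results can be fed back into further intersections or unions to compose larger queries.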
Re: [IndexedDB] Compound and multiple keys
> On 16 Mar 2011, at 7:59 PM, Jonas Sicking wrote: > > The best way to do this is likely to start a new thread (as the changes you > are > suggesting isn't limited to "Compound and multiple keys"), and put a > draft proposal there. > > It by no means has to be perfect (it took us a long time to polish IDB > into what it is today), but it needs to be more detailed than what you > are saying above. > More thoughts: Firstly, my proposal for handling compound and multiple keys has already been put forward in a previous thread (i.e. adding the option to specify indexes to be modified when putting/deleting objects) so I see no need to create yet another thread. Secondly, in terms of IDB storing parts of application state, it is clear that this is a problem that needs to be addressed. I think you have said as much yourself? If so, then those drafting the IDB specification must take responsibility for fixing this, since it is an issue they created in the first place. Unless, of course they do not really believe it to be an issue, in which case it would be a filibuster to ask for a "draft proposal".
Re: [IndexedDB] Compound and multiple keys
> On 16 Mar 2011, at 7:59 PM, Jonas Sicking wrote: > > It seems like you are suggesting pretty big changes. The best way to > do this is likely to start a new thread (as the changes you are > suggesting isn't limited to "Compound and multiple keys"), and put a > draft proposal there. Not necessarily. Adding the option to specify indexes to be modified when putting or deleting an object would go a long way already, solving the problem of compound and multiple keys in the process. The next step after that, supporting composable set operations on indexes, would take some work, in terms of figuring out the best interface for doing it, hopefully keeping it fairly tightly coupled to the standard set operations themselves. > It by no means has to be perfect (it took us a long time to polish IDB > into what it is today), but it needs to be more detailed than what you > are saying above. Will do. The proposed changes have the potential to reduce the spec and implementation of IDB. The problem of IDB being exposed to a dose of application state certainly needs to be addressed. > Also, I should mention that time is running out on major changes. We > already have two database APIs, WebSQL and IDB, (three if you count > localStorage), so there both needs to be significant advantages over > the already existing APIs, and you would make yourself a favor by > acting fast as the other specifications are gaining momentum literally > by the day. > > / Jonas Do you really consider LocalStorage to be a database and what do you mean by database then? And how can you say that we "have" a database API in WebSQL if it is currently deprecated? Are there plans afoot to embed SQLite in Firefox? That would be a great idea by the way. As far as I am aware, LocalStorage cannot be used as a database. I have tried. Most browsers do not permit more than 10 MB and do not provide a means for the user to adjust storage quota. 
Browsers provide no locking mechanism (although you could simulate a lock service on top of LocalStorage if you could tolerate the latency) and some implementations (Safari as far as I can recall) load the entire contents of LocalStorage into memory on first access, blocking the UI. As you know, WebSQL is deprecated and only available in WebKit and Opera. Chrome as far as I am aware provides no mechanism to adjust WebSQL quota limits. So that means we actually only have one potential cross-browser database API (and not three as you have stated), and that is IDB. It may be a good idea to slow down and get it right.
Re: [IndexedDB] Compound and multiple keys
> On 3/9/2011 09:45:51 Shawn Wilsher wrote: > > That makes sense since the original proposal was heavily based on BDB. > It's shifted a bit as we have made tweaks to improve it for the web. > > Cheers > > Shawn I agree. If I may add my two cents worth: one thing that IDB has not yet learned from BDB is statelessness. At the moment IDB requires a bit of application state to be mixed up in IDB (i.e. by predefining indexes as opposed to allowing the application to specify indexes to be modified when putting or deleting objects). So it's not a pure data+indexes store, it's actually a data+indexes+application state store. This is making IDB more complex than it needs to be and is making the IDB interface less powerful (things like compound keys etc. would already be possible if IDB were stateless). For instance, if IDB is to store application state, then the spec needs to define what happens when the application state changes. If IDB were stateless, this would not be necessary. After the web having had no options for offline storage for so many years, it is probably safe to say that web applications do not need help with things like migrations, pre-defined schemas or anything fancy or "helpful" like that, they just need a pure data+indexes solution (but they need this to be comprehensive: at least set operations supported on indexes, and indexes defined by the application when putting or deleting objects and NOT before). In my honest opinion, IDB is not yet there and from the discussions does not seem to be headed in that direction. It's trying to make unnecessary things easy when it really needs to be just a powerful low-level data store with first-class indexing. I'm not sure how many users of IDB are actively involved in this discussion, but after spending hours on it over the past few months, and having built databases over LocalStorage and WebSQL, as a real-world user, may I ask that these concerns begin to be addressed?
Re: [IndexedDB] Two Real World Use-Cases
On 08 Mar 2011, at 7:23 AM, Dean Landolt wrote: > This doesn't seem right. Assuming your WebSQL implementation had all the same > indexes isn't it doing pretty much the same things as using separate > objectStores in IDB? Why would it be an order of magnitude slower? I'm sure > whatever implementation you're using hasn't seen much optimization but you > seem to be implying there's something more fundamental? The only thing I can > think of to blame would be the fat in the objectStore interface -- like, for > instance, the index building facilities. It seems to me your proposed > solution is to add yet more fat to the interface (more complex indexing), but > wouldn't it be just as suitable to instead strip down objectStores to their > bare essentials to make them more suitable to act as indexes? Then the > indexing functionality and all the hard decisions could be punted to > libraries where they'd be free to innovate. Exactly. It's not what one would expect, and it is an indication of the poor state of the IDB implementation (which is essentially a wrapper around SQLite anyway). If someone is advising that object stores be used to handle indexes, then may I be the first to raise a red flag and say that IDB is failing us (and it would have been better for the spec team to provide a locking mechanism for LocalStorage so it could be used in that way). The whole point of IDB, as far as I can see, is to provide transactional indexed access to a key-value store. > Why? You wouldn't necessarily have to store the whole object in each index, > just the index key, a value and some pointer to the original source object. > Something to resolve this pointer to the source would need to be spec'd (a la > couchdb's include_docs), but that's simple. 
Even better, say it were possible > to define a link relation on an object store that can resolve to its source > object -- you could define a source link relation and the property to use -- > and this would have the added bonus of being more broadly applicable than > just linking an index record to its source instance. Think of the object creation and JSON serialization/deserialization overhead for putting 50 index records and you have got more than enough waste there already. > We can fix all of this right now very simply: > > 1. Enable objectStore.put and objectStore.delete to accept a setIndexes > option and an unsetIndexes option. The value passed for either option would > be an array (string list) of index references. > > This would only work for indexes arrays of strings, right? Things can get > much more complicated than that, and when they do you'd have to use an > objectStore to do your indexing anyway, right? No, it would work for pretty much anything. The application would be free to determine the indexes, and also to convert query parameters into indexes when querying. It's essentially "computed indexes" without the hassles of IDB trying to do it (there was an interesting thread last year on the challenges of storing an index-computing function in IDB). > Why is it more theoretically performant than using objectStores in the raw? It's a more direct interface. Think about it for a second. Using objectStores in the raw is interpolating O(n) complexity with multiple function calls, to give just one reason. If IDB receives the list of indexes to add an object to and the list to remove it from, then it can also do things like perform a set difference first to save unnecessary IO. I have written a database or two with this technique and it's certainly faster. > I don't necessarily understand the stateful vs. stateless distinction here. I > don't see how your proposed solution removes the requirement for IDB to > enforce constraints when certain indexes are present. 
Developers would > already be able to use IDB statefully (with predefined schemas) -- they'd > just use a library that has a schema mechanism. I doubt such a library for > IDB already exists, but it'd be quite easy to port perstore, for instance, > which is derived from the IDB API and already has this functionality using > json-schema. There will no doubt be many ORM-like libraries that will pop up > as soon as IDB starts to stabilize (or as soon as it gets a node.js > implementation). The trouble is you always think a database would "be quite easy" until you actually try to do it yourself. At first when I dug into IDB I didn't think there would be any problems that could not be handled in some way. I have actually switched back to WebSQL now and will encourage my users to use Safari or Chrome as long as these browsers support WebSQL (and I hope Chrome will at least finish up by adding a quota interface for WebSQL). IDB right now is like a completely neutered slower SQLite without any of the benefits to be expected of a transactional indexed KV store. It's really sad. For examples of stateless databases see the interfaces for Redis (the best example, and a perfect target
Re: [IndexedDB] Two Real World Use-Cases
> On 05 Mar 2011, at 3:50 AM, Jonas Sicking wrote: > > What we do need to do sooner rather than later though is allowing > multiple index values for a given entry using arrays. We also need to > add support for compound keys. But lets deal with those issues in a > separate thread. Multiple index values for a given entry using arrays, as well as compound keys, can be handled by letting the application provide an array of index references when putting or deleting objects. There is no need to make a Rube Goldberg device out of it. Regards Joran Greef
Re: [IndexedDB] Two Real World Use-Cases
Hi Jonas I have been trying out your suggestion of using a separate object store to do manual indexing (and so support compound indexes or index object properties with arrays as values). There are some problems with this approach: 1. It's far too slow. To put an object and insert 50 index records (typical when updating an inverted index) this way takes 100ms using IDB versus 10ms using WebSQL (with a separate indexes table and compound primary key on index name and object key). For instance, my application has a real requirement to replicate 4,000,000 emails between client and server and I would not be prepared to accept latencies of 100ms to store each object. That's more than the network latency. 2. It's a waste of space. Using a separate object store to do manual indexing may work in theory but it does not work in practice. I do not think it can even be remotely suggested as a panacea, however temporary it may be. We can fix all of this right now very simply: 1. Enable objectStore.put and objectStore.delete to accept a setIndexes option and an unsetIndexes option. The value passed for either option would be an array (string list) of index references. 2. The object would first be removed as a member from any indexes referenced by the unsetIndexes option. Any referenced indexes which would be empty thereafter would be removed. 3. The object would then be added as a member to any indexes referenced by the setIndexes option. Any referenced indexes which do not yet exist would be created. This would provide the much-needed indexing capabilities presently lacking in IDB without sacrificing performance. It would also enable developers to use IDB statefully (MySQL-like pre-defined schemas with the DB taking on the complexities of schema migration and data migration) or statelessly (See Berkeley DB with the application responsible for the complexities of data maintenance) rather than enforcing an assumption at such an early stage. Regards Joran Greef
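The three steps proposed above can be sketched in TypeScript against a hypothetical in-memory store. This is not the IndexedDB API: the class name StatelessStore, the explicit key argument, and the members and hasIndex helpers are assumptions made purely for illustration. Only the setIndexes and unsetIndexes option names come from the proposal, and the sketch assumes that delete only needs to honour unsetIndexes.

```typescript
// Hypothetical model of the proposal: the store knows nothing about the
// shape of the objects; the application names the indexes on every call.
interface IndexOptions {
  setIndexes?: string[];   // index references the object must be a member of
  unsetIndexes?: string[]; // index references the object must be removed from
}

class StatelessStore<T> {
  private objects = new Map<string, T>();
  private indexes = new Map<string, Set<string>>(); // index name -> object keys

  put(key: string, value: T, options: IndexOptions = {}): void {
    this.objects.set(key, value);
    // Step 2 of the proposal: remove first, dropping indexes left empty.
    this.removeFrom(key, options.unsetIndexes ?? []);
    // Step 3 of the proposal: then add, creating indexes that do not exist.
    this.addTo(key, options.setIndexes ?? []);
  }

  delete(key: string, options: IndexOptions = {}): void {
    this.objects.delete(key);
    this.removeFrom(key, options.unsetIndexes ?? []);
  }

  get(key: string): T | undefined {
    return this.objects.get(key);
  }

  // Keys of all objects referenced by the named index, sorted.
  members(indexName: string): string[] {
    const index = this.indexes.get(indexName);
    return index ? Array.from(index).sort() : [];
  }

  hasIndex(indexName: string): boolean {
    return this.indexes.has(indexName);
  }

  private removeFrom(key: string, names: string[]): void {
    for (const indexName of names) {
      const index = this.indexes.get(indexName);
      if (index === undefined) continue;
      index.delete(key);
      if (index.size === 0) this.indexes.delete(indexName);
    }
  }

  private addTo(key: string, names: string[]): void {
    for (const indexName of names) {
      let index = this.indexes.get(indexName);
      if (index === undefined) {
        index = new Set<string>();
        this.indexes.set(indexName, index);
      }
      index.add(key);
    }
  }
}

// Usage: file a message under the inbox, then move it to the archive.
const mail = new StatelessStore<{ subject: string }>();
mail.put("m1", { subject: "hello" }, { setIndexes: ["folder:inbox", "word:hello"] });
mail.put("m1", { subject: "hello" }, {
  unsetIndexes: ["folder:inbox"], // becomes empty, so the index is removed
  setIndexes: ["folder:archive"], // does not exist yet, so it is created
});
```

Because the application supplies the index references on every call, compound keys and multiple index values per object reduce to choosing index names such as "folder:archive" in the application layer.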
Re: [IndexedDB] Two Real World Use-Cases
On 02 Mar 2011, at 1:31 PM, Jonas Sicking wrote: > I agree that we are currently enforcing a bit of schema due to the way > indexes work. However I think it's a good approach for an initial > version of this API as it covers the most simple use cases. Note that > the more complex use cases are still very possible by simply using a > separate objectStore as an index and manually add/remove things there. > > I still believe that using a function, which is persisted in the > database, is very doable. And yes, the function needs to be stateless > and it needs to be possible to change the set of functions which > manage the set of indexes associated with a given objectStore > (probably by simply allowing indexes to be created and removed, which > is already the case). > > / Jonas Thank you Jonas, I'm using your multi objectStore trick at the moment to store indexes. It just seems that the most direct way of doing all of this would be to let the application pass in the relevant index references when it makes put or delete calls. IDB is almost becoming a Rube Goldberg device trying to find other ways of doing this. The reason I bring it up is that I just made this same change with my server database, which used to require schema knowledge so it could compute indexes etc., and then I realized this could all be eliminated completely by just passing indexes per put and delete call. I really don't think IDB should try to dip its toes into application state in the first place, let alone try to keep up with application state thereafter. What is the motivation for doing that? It's not absolutely necessary. It's an assumption that is bloating almost every part of the spec. It's not the killer feature of IDB, and it's getting in the way of things that could be, such as indexing and querying. If version 1 is done right, there will be no need for version 2. 
There's been a tremendous amount of discussion regarding IDB and people like yourself and Jeremy have certainly contributed massively, but I do get the feeling (as may you) that version 2 is becoming a stopover for things that have not been thought through completely, for which a solution is not yet clear, something's not right. I only say this from recently re-writing a database after making the same mistake.
Re: [IndexedDB] Two Real World Use-Cases
On 01 Mar 2011, at 7:27 PM, Jeremy Orlow wrote: > 1. Be able to put an object and pass an array of index names which must > reference the object. This may remove the need for a complicated indexing > spec (perhaps the reason why this issue has been pushed into the future) and > give developers all the flexibility they need. > > You're talking about having multiple entries in a single index that point > towards the same primary key? If so, then I strongly agree, and I think > others agree as well. It's mostly a question of syntax. A while ago we > brainstormed a couple possibilities. I'll try to send out a proposal this > week. I think this + compound keys should probably be our last v1 features > though. (Though they almost certainly won't make Chrome 11 or Firefox 4, > unfortunately, hopefully they'll be done in the next version of each, and > hopefully that release with be fairly soon after for both.) Yes, for example this user object { name: "Joran Greef", emails: ["jo...@ronomon.com", "jorangr...@gmail.com"] } with indexes on the "emails" property, would be found in the "jo...@ronomon.com" index as well as in the "jorangr...@gmail.com" index. What I've been thinking though is that the problem even with formally specifying indexes in advance of object put calls, is that this pushes too much application model logic into the database layer, making the database enforce a schema (at least in terms of indexes). Of course IDB facilitates migrations in the form of setVersion, but most schema migrations are also coupled with changes to the data itself, and this would still have to be done by the application in any event. So at the moment IDB takes too much responsibility on behalf of the application (computing indexes, pre-defined indexes, pseudo migrations) and not enough responsibility for pure database operations (index intersections and index unions). 
I would argue that things like migrations and schemas are best handled by the application, even if this is more work for the application, as most people will write wrappers for IDB in any event and IDB is supposed to be a core-level API. The acid test must be that the database is oblivious to schemas or anything pre-defined or application-specific (i.e. stateless). Otherwise IDB risks being a database for newbies who wouldn't use it, and a database that others would treat as a KV anyway (see MySQL at FriendFeed). A suggested interface then for putting or deleting objects, would be: objectStore.put(object, ["indexname1", "indexname2", "indexname3"]) and then IDB would need to ensure that the object would be referenced by the given index names. When removing the object, the application would need to provide the indexes again (or IDB could keep track of the indexes associated with an object). Using a function to compute indexes would not work as this would entrap application-specific schema knowledge within the function (which would need to be persisted) and this knowledge may subsequently change in the application, which would then need a way to modify the function again. The key is that these things must be stateless. The objects must be opaque to IDB (no need for serialization/deserialization overhead at the DB layer). Things like key-paths etc. could be removed and the object id just passed in to put or delete calls. > 2. Be able to intersect and union indexes. This covers a tremendous amount of > ground in terms of authorization and filtering. > > Our plan was to punt some sort of join language to v2. Could you give a more > concrete proposal for what we'd add? It'd make it easier to see if it's > something realistic for v1 or not. If you can perform intersect or union operations (and combinations of these) on indexes (which are essentially sets or sorted sets), then this would be the join language. 
It has the benefit that the interface would then be described in terms of operations on data structures (set operations on sets) rather than a custom language which would take longer to spec out. I've written databases over append-only files, S3, WebSQL and even LocalStorage (!) and from what I've found with my own applications, you could handle everything from multi-tenant authorization to adequate filtering with the following operations: 1. intersect([ index1, index2 ]) 2. union([ index1, index2 ]) 3. intersect([ union([ index1, index2 ]), index3, index4, index5, index6, index7 ]) Hopefully, a join language described in terms of pure set operations would be much simpler to implement and easier to use and reason with. In fact I think if IDB offered only a single object store and an indexing system described above, it would be completely perfect. That's all that's needed. No need for a V2. Just a focus on high-performance thereafter.
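As a rough illustration of the three operations listed above, here is a TypeScript sketch that treats each index as a plain set of object keys. The names intersect and union follow the message; the KeySet alias, the sample index contents, and the choice to return an empty set for an empty argument list are assumptions for illustration only. A real implementation would presumably work on sorted on-disk indexes rather than in-memory sets.

```typescript
// An index is modelled as a set of object keys (an assumption of this sketch).
type KeySet = ReadonlySet<string>;

// Keys present in every index. Iterates the smallest index only, so the cost
// is bounded by the most selective index rather than the largest one.
function intersect(indexes: KeySet[]): Set<string> {
  const result = new Set<string>();
  const bySize = indexes.slice().sort((a, b) => a.size - b.size);
  const smallest = bySize[0];
  if (smallest === undefined) return result; // no indexes given: empty result
  smallest.forEach((key) => {
    if (bySize.every((index) => index.has(key))) result.add(key);
  });
  return result;
}

// Keys present in at least one index.
function union(indexes: KeySet[]): Set<string> {
  const result = new Set<string>();
  indexes.forEach((index) => index.forEach((key) => result.add(key)));
  return result;
}

// Usage, mirroring form 3 above: objects in either of two folders that also
// belong to the given tenant (multi-tenant authorization plus filtering).
const inbox = new Set(["m1", "m2"]);
const archive = new Set(["m3", "m4"]);
const tenantA = new Set(["m2", "m3", "m9"]);
const visible = intersect([union([inbox, archive]), tenantA]);
```

Because both functions take sets and return sets, they compose freely, which is the sense in which the set operations themselves can serve as the join language.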
[IndexedDB] Two Real World Use-Cases
I have been following the development behind IndexedDB with interest. Thank you all for your efforts. I understand that the initial version of IndexedDB will not support indexing array values. May I suggest an alternative derived from my home-brew server database evolved from experience using MySql, WebSql, LocalStorage, CouchDb, Tokyo Cabinet and Redis? 1. Be able to put an object and pass an array of index names which must reference the object. This may remove the need for a complicated indexing spec (perhaps the reason why this issue has been pushed into the future) and give developers all the flexibility they need. 2. Be able to intersect and union indexes. This covers a tremendous amount of ground in terms of authorization and filtering. These two needs are critical. Without them, I will either carry on using WebSql for as long as possible, or be forced to use IndexedDb as a simple key value store and layer my own indexing on top. I am writing an email application and have to deal with secondary indexes of up to 4,000,000 keys. It would not be ideal to do intersects and unions on these indexes in the application layer. Regards Joran Greef
FileSystem API: Avoiding Upload Forms And Temporary Downloads
I have some questions regarding the FileSystem API: 1. It would be great to be able to let the user choose where they want their sandboxed directory located for the web app, i.e. on the desktop for quick access. That way they can drag files directly to the directory, which could be used as a dropbox for synching to a server. Would this be possible (or at least a mechanism to link to the directory wherever the browser may choose to place it)? Otherwise, apps like Dropbox would not be possible in a browser. 2. It seems that dragging a file out of a web app is currently copy-on-write? So you drag a file out the web app into Excel but subsequent changes in Excel would be lost to the web app (it seems like it's already possible for the web app to poll the sandboxed directory for file changes)? If so, it means that the FileSystem API would force the following work flow: the user saves a temp file somewhere (probably on the desktop) then re-uploads using a web form, and then minimizes the browser and deletes the temp file, and then maximizes the browser again? That would be a bad case of Fitts' Law and quickly become a show-stopper if the user needs to frequently edit files using native applications. 3. It must be possible to link to files within the sandboxed directory and have them open in the default native application. I can understand that .exe's need to be neutered, but content files such as .doc and .xls must have a method for opening in the default application. Would this be possible? Otherwise the only solution would be to trigger the Download window, creating a temp file in the Downloads folder for a file that already exists on the filesystem? 4. In my mind, the FileSystem Api has a shot at improving user experience by helping to avoid file upload forms, and temp file downloads. I'm not sure these goals are possible with the current spec? 
These use-cases may prove to be vital building blocks for the next wave of networked applications and it would be great to see them in the new FileSystem API. Regards Joran Greef
Web Storage Mutex
"The use of the storage mutex to avoid race conditions is currently considered by certain implementors to be too high a performance burden, to the point where allowing data corruption is considered preferable. Alternatives that do not require a user-agent-wide per-origin script lock are eagerly sought after." It's not a question of mutex versus data corruption, but of implementation: Database storage is served by SQLite. LocalStorage would be better served by Tokyo Cabinet: http://1978th.net/tokyocabinet/. I doubt the current localStorage implementation is better than the current Tokyo Cabinet implementation. Joran Greef