Re: [IndexedDB] request feedback on IDBKeyRange.inList([]) enhancement
On Fri, May 17, 2013 at 2:37 PM, Ben Kelly bke...@mozilla.com wrote:

Hello all,

Recently I've been working on a mobile application that makes heavy use of IndexedDB. In particular, there are times when this app must query a potentially large, non-consecutive list of keys. Currently (to my knowledge) the IndexedDB API requires that this be done via separate get() calls. Due to some performance issues I investigated enhancing the IndexedDB API to allow the list of keys to be queried in a single request. The resulting changes seem to show significant performance improvement on the Mozilla mobile platform. I would like to get your feedback and input on this API change.

The enhancement essentially adds an inList() function to IDBKeyRange. Similar to the other factory methods on IDBKeyRange, this returns an object which can be used to query a matching set of keys. The inList() function takes an array of keys to match against. In practice it would look like the following:

  var keyRange = IDBKeyRange.inList(['key-1', 'key-2', 'key-3']);
  var request = index.openCursor(keyRange);

Duplicate keys in the list are ignored. The order of the results would be controlled by the normal cursor ordering mechanisms.

I've written a rough proof-of-concept for the Mozilla platform here:

  https://bugzilla.mozilla.org/show_bug.cgi?id=872741

I realize there has been some discussion of this topic in the past. In particular, Ben Turner referred me to:

  https://www.w3.org/Bugs/Public/show_bug.cgi?id=16595
  https://docs.google.com/a/mozilla.com/document/d/1vvC5tFZCZ9T8Cwd2DteUvw5WlU4YJa2NajdkHn6fu-I/edit

From these links it sounds like there has been a lack of interest, but no strong objection. Since there appears to be some legitimate benefit from the API enhancement I thought I would send it out to the list for feedback. I have to admit I'm new to the standardization process, though. I apologize for the noise if this is essentially a non-starter.
I have mixed feelings on this: what you're really talking about is an API that groups the get()s and the get() callbacks into a single call/callback - i.e. a simplification of the flow of control through the async APIs. The fact that it may or may not be a performance optimization is really an implementation detail: it's the implementation's job to figure out if there are multiple parallel get()s in a turn that can be optimized. I think this is really the job of a polyfill or a library (like Kyaw's ydn-db), not an implementation.

When you get down to the control-flow issue, you're really looking at Futures.every() (I know, roll your eyes now and get it over with!) and layers that could be built on top of that:

  Futures.every(objectStore.get(...), objectStore.get(...), objectStore.get(...))
    .then(function(result1, result2, result3) {
      // now you have result1, result2, result3, etc.
    })

But since we don't have futures in IDB yet, we have to decide if it's worth adding this specialized API to IDB, or letting users suffer while they wait for futures.

If it is decided that this should go into IDB, I don't think IDBKeyRange should be used for discontinuous/multiple ranges (because it totally invalidates lower/upper/etc.), so if we had to do this I'd personally go with IDBKeySet/etc. I think this is too tricky to add directly to get() because it changes the semantics of the call - i.e. if you say get([1,2,3]), how do you distinguish between the 3 keys 1, 2, and 3, and the *key* [1,2,3]?

Alec

Any feedback is greatly appreciated. Thank you! Ben Kelly
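The polyfill/library approach described in this thread can be sketched with promises over a thin wrapper. This is a hypothetical illustration, not the actual IDB API: `toyStore` stands in for an object store whose get() has been wrapped to return a promise, and `getAll` is an invented helper name.

```javascript
// Hypothetical helper: batch several get()s into one call/callback,
// assuming `store.get(key)` returns a Promise (a thin wrapper one would
// write around the IDBRequest returned by IDBObjectStore.get()).
function getAll(store, keys) {
  // De-dupe, mirroring the proposal's "duplicate keys are ignored"
  var unique = Array.from(new Set(keys));
  return Promise.all(unique.map(function (key) { return store.get(key); }));
}

// Toy in-memory stand-in for a promisified object store.
var toyStore = {
  data: new Map([['key-1', 'a'], ['key-2', 'b'], ['key-3', 'c']]),
  get: function (key) { return Promise.resolve(this.data.get(key)); }
};

getAll(toyStore, ['key-1', 'key-2', 'key-2', 'key-3']).then(function (results) {
  // results arrive together, one callback, in the order of the unique keys
});
```

Whether the underlying implementation coalesces those parallel get()s into one backend query is then, as Alec says, an implementation detail.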
Re: Collecting real world use cases (Was: Fixing appcache: a proposal to get us started)
I think there are some good use cases for not-quite-offline as well. Sort of a combination of your Twitter and Wikipedia use cases:

Community-content site: Logged-out users have content cached aggressively offline - meaning every page visited should be cached until told otherwise. Intermediate caches / proxies should be able to cache the latest version of a URL. As soon as a user logs in, the same URLs they just used should now have editing controls. (Note that the actual page contents *may* not have changed, just the UI.) Pages now need to be fresh, meaning that users should never edit stale content. In an ideal world, once a logged-in user has edited a page, that page is pushed to users or proxies who have previously cached that page and will likely visit it again soon.

I know this example in particular seems like it could be accomplished with a series of If-Modified-Since / 304s, but connection latency is the killer here, especially for mobile - the fact that you have a white screen while you wait to see if the page has changed. The idea that you could visit a cached page (i.e. avoid hitting the network) and then a few seconds later be told "there is a newer version of this page available" after the fact (or even just silently update the page so the next visit delivers a fresh but network-free page) would be pretty huge. Especially if you could then proactively fetch a select set of pages - i.e.
imagine an in-browser process that says "for each link on this page, if I have a stale copy of the URL, go fetch it in the background so it is ready in the cache". (On this note it would probably be worth reaching out to the Wikimedia Foundation to learn about the hassle they've gone through over the years trying to distribute the load of Wikipedia traffic given the constraints of HTTP caching, broken proxies, CDNs, ISPs, etc.)

Alec

On Tue, Apr 30, 2013 at 9:06 PM, Jonas Sicking jo...@sicking.cc wrote:

On Apr 18, 2013 6:19 PM, Paul Bakaus pbak...@zynga.com wrote:

Hi Jonas, thanks for this - I feel this is heading somewhere, finally! I still need to work on submitting my full feedback, but I'd like to mention this: why did nobody so far in this thread include real world use cases? For a highly complex topic like this in particular, I would think that collecting a large number of user use cases, not only requirements, and furthermore finding the lowest common denominator based on them, would prove very helpful, even if it's just about validation and making people understand your lengthy proposal. I.e. a news reader that needs to sync content, but has an offline UI. Do you have a list collected somewhere?

Sorry for not including the list in the initial email. It was long enough as it was so I decided to stop. Some of the use cases we discussed were:

Small simple game: The game consists of a set of static resources. A few HTML pages, like high score page, start page, in-game page, etc. A larger number of media resources. A few data resources which contain level metadata. Small amount of dynamic data being generated, such as progress on each level, high score, user info. In-game performance is critical, all resources must be guaranteed to be available locally once the game starts. Little need for network connectivity other than to update game resources whenever an update is available.

Advanced game: Same as simple game, but also downloads additional levels dynamically.
Also wants to store game progress on servers so that it can be synced across devices.

Wikipedia: Top level page and its resources are made available offline. Application logic can enable additional pages to be made available offline. When such a page is made available offline, both the page and any media resources that it uses need to be cached. Doesn't need to be updated very aggressively, maybe only upon user request.

Twitter: A set of HTML templates that are used to create a UI for a database of tweets. The same data is visualized in several different ways, for example in the user's default tweet stream, in the page for an individual tweet, and in the conversation thread view. Downloading the actual tweet contents and metadata shouldn't need to happen multiple times in order to support the separate views. The URLs for watching individual tweets need to be the same whether the user is using appcache or not, so that linking to a tweet always works. It is very important that users are upgraded to the latest version of scripts and templates very quickly after they become available. The website likely will want to be able to check for updates on demand rather than relying on implementation logic. If the user is online but has appcached the website it should be able to use the cached version. This should be the case even if the user navigates to a tweet page for a tweet for which the user hasn't yet cached the tweet content or metadata. In this case only the tweet content and metadata
Re: IndexedDB events: misconception?
On Mon, Apr 22, 2013 at 9:56 AM, Michaël Rouges michael.rou...@gmail.com wrote:

Hum ... thank you for this answer, but ... are you sure there is no possibility that the operation is completed before adding events? I find it hard to perceive how it couldn't happen.

Just to close the loop on this concern: the reason there is no possibility is that this is part of the IndexedDB specification - all browsers must guarantee this behavior to have a working IndexedDB - in fact the rest of IndexedDB itself would be unusable if this guarantee was not met. Stuff like this can feel a little awkward if you're used to dealing in a multi-threaded world, but this API is fairly normal for a web API, at least in this respect. In fact XHR is the outlier here in requiring a specific xhrrequest.send() call.

Alec
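The guarantee being described can be modeled in a few lines. This is a toy stand-in, not the real API: like an IDBRequest, the result is only delivered in a later event-loop task, never during the currently running script, so a handler assigned right after the call can never be "missed".

```javascript
// Toy model of why attaching onsuccess *after* calling get() is safe.
// 'fakeGet' is an invented stand-in for an IndexedDB operation.
function fakeGet(value) {
  var request = { onsuccess: null };
  setTimeout(function () {
    // By the time this runs, the calling script has finished, so any
    // handler assigned right after the call is guaranteed to be in place.
    if (request.onsuccess) request.onsuccess({ target: { result: value } });
  }, 0);
  return request;
}

var req = fakeGet(42);
// This assignment always runs before delivery:
req.onsuccess = function (e) { /* always fires */ };
```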
Re: Fixing appcache: a proposal to get us started
This is a tricky problem indeed. The current appcache actually has the behavior that you're advocating, but that's something that a lot of developers have complained about. In fact, that's the second biggest complaint that I've heard, trailing only the confusing master entries behavior.

I personally think the problem with this particular aspect of the existing appcache is that it's so incredibly hard to clear the cache and go online during development - i.e. once you're offline you have to jump through hoops to get back online. A secondary issue was that once you as a developer got used to dealing with that, if your users somehow get stuck in an offline state because of a bug, there is/was no real way to repair them other than telling them to clear their cache.

On the other hand, it creates the "load twice to get latest version" behavior that a lot of developers dislike. I.e. when a user opens a website they end up getting the previous version of the website and have to reload to get the new version.

I think that if there is a programmatic API that is available early on, then at least starting in offline gives the developer the option of going online if they so choose - and it could be done even before the onload handler if they want to avoid flashing the old/deprecated page in the browser. If you require hitting the network first, then I can't think of how you'd write programmatic hooks to bypass that. I personally think that no matter how expressive the declarative syntax is, developers are always going to need to work around it - expiration or staleness is simply too complex to express as an absolute or relative date or time - caching policy in apps can simply depend on things that extend beyond your caching syntax - I mean, imagine a caching policy that depends on it being before or after sunset in your locale.

If you have other ideas for how we can solve this then I'd love to hear it.
If we need to add more knobs to allow authors to choose which policies they want to use then I have no problem with that. It would be particularly interesting to hear what policies people are planning on implementing using NavigationController to see if we can enable those.

A more complex, real example: my last company had a systemwide "horizon" expiration policy that we implemented with a caching proxy. Imagine this: a very large interconnected data set where individuals spent their time editing data in small regions of the corpus. The goal was: if you made an edit, then everything YOU saw would be consistent with that edit. It was perfectly reasonable to have other users see stale versions of any page - a poor man's (i.e. startup with only a few application servers) eventually-consistent solution. The way this worked: if any individual user made changes to a particular dataset that affected a page, they would get a cookie set on their client saying "you have made changes through time T", and all future pages that they visited had to be newer than time T. When the browser would hit the proxy with an If-Modified-Since, the proxy would look at the cookie and say "Hmm.. I have a stale version of this page at time T-6, I'd better regenerate it" or "I have a version of the page at time T+2, so I can give this to you". To make this work we had to set max-age=0, essentially bypassing the entire user's browser cache for every page, even if the server mostly responded with a 304. (So the proxy server sitting in our colo in Santa Clara functioned as your browser's cache, because that was the place we could programmatically write a policy.) That really sucked for performance though, so we increased max-age to maybe 30 seconds, put a generated script in the head that included the time the page was generated, and then compared the cookie to the embedded time.
If the cookie was higher, then we knew the page was served stale (by our definition of stale) from the browser cache, so we forced a refresh. Since this was all in the head, the page didn't even flicker. Something like:

  <head><script>
  var lastWriteTime = 1292830; // generated in-page by the template engine
  if (lastWriteTime < extractLWT(document.cookie)) reload(); // boilerplate cache policy
  </script></head>

But of course the problem there is that it ONLY works on HTML - all other resources had to have a different policy. With a NavigationController model (or some other programmatic model) you can write arbitrary logic to deal with these kinds of cases. I'm not saying the declarative model doesn't fix 80% of the issues, but more that you still need a programmatic model in addition.

If the user is offline, or if we checked for update for the appcache within the last 5 minutes, we use the index.html from the appcache without hitting the network first. If index.html uses index.js or index.css, those will be immediately loaded from the cache.

Is the opposite true? If index.html is loaded from the network will index.js ever come from the cache?
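The horizon check in the story above boils down to a tiny comparison. A sketch with invented names (isStaleForUser is not part of any real API):

```javascript
// A page is stale *for this user* if it was generated before the time
// recorded in their "you have made changes through time T" cookie.
// Times are opaque monotonically increasing numbers here.
function isStaleForUser(pageGeneratedAt, userHorizon) {
  // Users who haven't edited anything carry no horizon cookie:
  // for them, any cached copy is acceptable.
  if (userHorizon === undefined) return false;
  return pageGeneratedAt < userHorizon;
}
```

The proxy and the in-page head script are both running exactly this predicate; the only difference is where the two timestamps come from.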
Re: IndexedDB, what were the issues? How do we stop it from happening again?
Sorry in advance for the long reply here.. TL;DR is: 1) I still don't think transactions are a requirement, and 2) I think fixing the platform's motley crew of async APIs, and giving developers better control over transaction commits (when you do use them), is probably most important.

On Tue, Mar 19, 2013 at 1:52 PM, Jonas Sicking jo...@sicking.cc wrote:

  var a;
  trans = db.transaction();
  trans.get('a').onsuccess = function(e) { a = e.target.result; }
  trans.get('b').onsuccess = function(e) { trans.set('b', e.target.result + a); }

I'll be honest: as a web developer who has been bitten numerous times coordinating setTimeout and XHR callbacks, I'd never trust that the above worked at all, even with explicit transactions. I just don't think you'd ever write code like that, because you're relying on the subtleties of how scoping works in JS, in addition to the order guarantees of IndexedDB. You'd write this instead:

  db.objectStore('foo').get('a').onsuccess = function(e) {
    var a = e.target.result;
    db.objectStore('foo').get('b').onsuccess = function(e) {
      db.objectStore('foo').set('b', e.target.result + a);
    }
  }

But this is still a degenerate, contrived example that I just don't believe is representative of real-world code. We'd be optimizing for a fairly specific pattern, and I think people are far more often bitten by auto-committing transactions than they are by races like this. If anything, XHR and setTimeout (and all the new-fangled HTML5 APIs) have taught people to be careful about races in async APIs.

But I don't think that requires that we get rid of transactions from the simple API. And I suspect that doesn't need to make the simple API that much more complicated.

It seems like there are two kinds of races that we're talking about here: database races (i.e. read/writes not being atomic, the A in ACID) and event races (i.e.
any two arbitrary operations not having guaranteed order, the I in ACID) - I think the latter is often solved with a better asynchronous API abstraction like Futures/Promises - i.e. an async pattern that lets you be explicit about your ordering rather than relying on a containing abstraction like transactions. I mean, imagine your same identical code with XHR (drastically simplified, but hopefully you get the idea):

  xhr1.open('/url?key=key1');
  xhr1.onsuccess = function(e) { a = xhr1.responseText; }
  xhr2.open('/url?key=key2');
  xhr2.onsuccess = function(e) {
    xhr3.open('/update?first=' + a + '&second=' + xhr2.responseText);
  }

In the context of XHR, it's now obvious to everyone watching that this is a race condition.. just a very crude one. I guess I'm doing this to demonstrate why no developer worth their salt would purposefully write the race condition that you're afraid may happen without transactions. No other async API has a notion of transactions to work around races due to async responses. If that's our concern, then we should be focusing on getting to Futures/Promises. Having transactions doesn't solve races between async subsystems, like when using XHR + IDB together. The following pattern is going to be far more common:

  var key = ...
  xhr1.open('/url?key=' + key);
  xhr1.onsuccess = function(e) {
    var xhrValue = xhr1.responseText;
    indexedDB.get(key).onsuccess = function(e) {
      if (xhrValue != e.target.result) {
        // update my cache...
      }
    }
  }

But ultimately this is still ugly because you're serializing your operations, and it's complicated to write code that runs them both in parallel and only compares them when both callbacks have fired.
(Never mind the fact that if we were dealing with our current auto-committing transactions, any open transaction would have committed while we were waiting for the XHR response.) But with futures/promises and nice libraries like q.js you can imagine stuff like:

  Q.all([
    xhr1.open('/get?key=' + key),
    indexedDB.get(key)
  ]).spread(function(responseText, idbValue) {
    if (responseText != idbValue) {
      // update my cache...
    }
  });

Bam. Ordering races are gone.

Alec
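The parallel-fetch-then-compare pattern above can be made runnable with standard promises and mock async sources standing in for the XHR and indexedDB.get(); all names below are invented for illustration:

```javascript
// Stand-in for the XHR to the server.
function fetchFromServer(key) {
  return Promise.resolve('server-value-for-' + key);
}
// Stand-in for a promisified indexedDB.get().
function fetchFromCache(key) {
  return Promise.resolve('cached-value-for-' + key);
}

function cacheIsStale(key) {
  // Both requests run in parallel; we compare only once both have resolved,
  // with no explicit serialization and no transaction needed.
  return Promise.all([fetchFromServer(key), fetchFromCache(key)])
    .then(function (results) {
      return results[0] !== results[1]; // true => update my cache...
    });
}
```

This is the same shape as the Q.all example, just with Promise.all (which passes the results as an array rather than spreading them into arguments).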
Re: IndexedDB, what were the issues? How do we stop it from happening again?
transactions - Also should be optional. Vital to complex apps, but totally not necessary for many.. there should be a default transaction, like db.objectStore('foo').get('bar')

I disagree. This would have made it too trivial to create pages that have race conditions. I.e. people would write code like:

  db.objectStore('foo').get('bar').onsuccess = function(e) {
    db.objectStore('foo').set('bar', e.target.result + 1)
  }

without realizing that that contains a race condition.

It's always possible for users to hang themselves - the web platform has lots of rope. I could name a dozen gotchas like that in the JavaScript language alone. The fact that we introduced shared workers introduces a whole mess of issues like that. Not to criticize either - I think it's just something that happens as you introduce more flexible capabilities into the platform. In the above example, you could approach this with automatic transactions - all operations that run in the callback of another IDB operation run in the same transaction. So the set() and the get() are in the same transaction. When you need explicit transactional control then you use the transaction() API.

transaction scoping - even when you do want transactions, the api is just too verbose and repetitive for "get one key from one object store" - db.transaction('foo').objectStore('foo').get('bar') - there should be implicit (lightweight) transactions like db.objectStore('foo').get('bar')

We used to have this exact syntax, but it got everyone confused about how the implicit transaction actually worked.

This is surprising to me - *shrug* - I assume it was like the automatic transactions I mentioned above.

named object stores - frankly, for many use cases, a single objectStore is all you need. a simple db.get('foo') would be sufficient. Simply naming a default isn't bad - what's bad is all the onupgradeneeded scaffolding required to create the objectStore in the first place.

I think we should do this as part of a simple API.
Similar to something like https://github.com/slightlyoff/async-local-storage

Yes! I mean, that's kind of where this conversation took off... I just don't think there should be a sharp distinction between the API with the transactions and versions and the one without - if anything, presenting them in a unified fashion allows developers to migrate as they need individual features (transactions, versions, etc.)

Alec
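The "automatic transactions" idea floated in this thread - operations issued from the callback of another IDB operation join that operation's transaction - could be modeled roughly like this toy sketch. Everything here is invented for illustration (and synchronous, unlike real IDB callbacks):

```javascript
// Module-level "current transaction" that callbacks inherit.
var currentTxn = null;

function op(name, callback) {
  // Join the caller's transaction if we're inside one, else start fresh.
  var txn = currentTxn || { id: ++op.nextId, ops: [] };
  txn.ops.push(name);
  if (callback) {
    var prev = currentTxn;
    currentTxn = txn; // operations issued from the callback join this txn
    try { callback(); } finally { currentTxn = prev; }
  }
  return txn;
}
op.nextId = 0;
```

A real implementation would have to carry this context across asynchronous callback dispatch rather than a synchronous call stack, which is exactly where the "how does the implicit transaction actually work" confusion comes from.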
Re: IndexedDB, what were the issues? How do we stop it from happening again?
My primary takeaway from both working on IDB and working with IDB for some demo apps is that IDB has just the right amount of complexity for really large, robust database use.. but for a "welcome to noSQL in the browser" it is way too complicated. Specifically:

1. *versioning* - The reason this exists in IDB is to guarantee a schema (read: a fixed set of objectStores + indexes) for a given set of operations. Versioning should be optional. And if versioning is optional, so should *opening* - the only reason you need to open a database is so that you have a handle to a versioned database. You can *almost* implement versioning in JS if you really care about it... (either keep an explicit key, or auto-detect the state of the schema) it's one of those cases where 80% of versioning is dirt simple and the complicated stuff is really about maintaining version changes across multiply-opened windows. (i.e. one window opens an idb, the next window opens it and changes the schema, the first window *may* need to know that and be able to adapt without breaking any in-flight transactions)

2. *transactions* - Also should be optional. Vital to complex apps, but totally not necessary for many.. there should be a default transaction, like db.objectStore('foo').get('bar')

3. *transaction scoping* - even when you do want transactions, the api is just too verbose and repetitive for "get one key from one object store" - db.transaction('foo').objectStore('foo').get('bar') - there should be implicit (lightweight) transactions like db.objectStore('foo').get('bar')

4. *forced versioning* - when versioning is optional, it should then be possible to change the schema during a regular transaction. Yes, this is a lot of rope, but this is actually for much more complex apps, rather than simple ones. In particular, it's not uncommon for more complex database systems to dynamically create indexes based on observed behavior of the API, or observed data (i.e. when data with a particular key becomes prevalent, generate an index for it) and then dynamically use them if present. At the moment you have to do a manual close/open/version change to dynamically bump up the version - effectively rendering fixed-value versions moot (i.e. the schema for version 23 in my browser may look totally different than the schema for version 23 in your browser) and drastically complicating all your code (because if you try to close/open while transactions are in flight, they will be aborted - so you have to temporarily pause all new transactions, wait for all in-flight transactions to finish, do a close/open, then start running all pending/paused transactions.) This last case MIGHT be as simple as adding db.reopen(newVersion) to the existing spec.

5. *named object stores* - frankly, for *many* use cases, a single objectStore is all you need. a simple db.get('foo') would be sufficient. Simply naming a default isn't bad - what's bad is all the onupgradeneeded scaffolding required to create the objectStore in the first place.

I do think that the IDBRequest model needs tweaking, and Futures seem like the obvious direction to head in. FWIW, the sync version of the API is more or less dead - nobody has actually implemented it.

I think there is a very specialized set of applications that absolutely need the features that IDB has right now. Google Docs is a perfect example - a long-lived, complicated application that needs to keep absolute integrity of schema across multiple tabs over a long period of time.. but for 99% of use cases out there, I think they're unnecessary. I think ultimately, a simplified IDB would allow progressive use of the API as your application grows.

  // basic interaction - some objectStore named 'default' gets created under the hood.
  indexedDB.get('mykey');

  // named database, auto-create the 'first' objectStore named 'default', no need to 'close' anything
  indexedDB.database('mydb').get('mykey')

  // now we need multiple objectStores:
  indexedDB.database('mydb').objectStore('default').get('mykey')

  // time for versioning, but using 'default'
  indexedDB.open('mydb', 12).onupgradeneeded(function (db) {...}).get('bar')

etc...

Alec

On Wed, Mar 6, 2013 at 6:01 AM, Alex Russell slightly...@google.com wrote: Comments inline. Adding some folks from the IDB team at Google to the thread as well as public-webapps.

On Sunday, February 17, 2013, Miko Nieminen wrote:

2013/2/15 Shwetank Dixit shweta...@opera.com

Why did you feel it was necessary to write a layer on top of IndexedDB?

I think this is the main issue here. As it stands, IDB is great in terms of features and power it offers, but the feedback I received from other devs was that writing raw IndexedDB requires an uncomfortable amount of verbosity even for some simple tasks (this can be disputed, but those are the views I got from some of the developers I interacted with). Adding that much amount of
Re: IndexedDB, what were the issues? How do we stop it from happening again?
On Wed, Mar 6, 2013 at 11:02 AM, Ian Fette (イアンフェッティ) ife...@google.com wrote:

I seem to recall we contemplated people writing libraries on top of IDB from the beginning. I'm not sure why this is a bad thing. We originally shipped web sql / sqlite, which was a familiar interface for many and relatively easy to use, but had a sufficiently large API surface area that no one felt they wanted to document the whole thing such that we could have an inter-operable standard. (Yes, I'm simplifying a bit.) As a result, we came up with an approach of "what are the fundamental primitives that we need?", spec'd that out, and shipped it. We had discussions at the time that we expected library authors to produce abstraction layers that made IDB easier to use, as the "fundamental primitives" approach was not necessarily intended to produce an API that was as straightforward and easy to use as what we were trying to replace. If that's what is now happening, that seems like a good thing, not a failure.

That's fine for building up, but I guess what I'm saying is that the primitives are too complicated to allow you to get started. There is an excellent HTML5Rocks page on IndexedDB describing a very simple use case. But if I were a web developer, I'd say "screw that, back to localStorage". Most of the HTML5 APIs seem almost too simple to be useful, and then people chain primitives together into useful APIs with libraries. For example, the DOM APIs are woefully verbose and primitive, but people have built much better APIs around them. But fundamentally they are primitive. IndexedDB seems like it's built the other way - we looked at what would be hard for developers to implement themselves (transactions, versions) and built *primitives* that required them. At the same time, I'd argue that versioning and transactions *could* be built upon a transactionless, versionless keystore in JavaScript, using a library.
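That claim - transactions layered in JS over a transactionless keystore - can be sketched with a toy model: buffer writes, apply them all on commit, or throw them away on abort. Everything here is invented for illustration (a Map instead of real storage, single-threaded, no isolation between concurrent transactions):

```javascript
// A bare, transactionless keystore.
function Keystore() { this.data = new Map(); }
Keystore.prototype.get = function (k) { return this.data.get(k); };
Keystore.prototype.set = function (k, v) { this.data.set(k, v); };

// A transaction that buffers writes until commit.
function Transaction(store) {
  this.store = store;
  this.writes = new Map(); // buffered, uncommitted writes
}
Transaction.prototype.get = function (k) {
  // Reads see this transaction's own uncommitted writes first.
  return this.writes.has(k) ? this.writes.get(k) : this.store.get(k);
};
Transaction.prototype.set = function (k, v) { this.writes.set(k, v); };
Transaction.prototype.commit = function () {
  var store = this.store;
  this.writes.forEach(function (v, k) { store.set(k, v); });
};
Transaction.prototype.abort = function () { this.writes.clear(); };
```

As noted in the reply that follows, doing this for real would not be performant - but it illustrates that the primitive underneath could have been simpler.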
Even the notion of multiple objectStores is an abstraction that could be implemented on a single keystore. To be fair, implementing transactions against a transactionless keystore would not be *performant*, but that's a separate issue. Now that we have transactions and versions, I wouldn't eliminate them from IDB by any means, but they're not required.

Alec

-Ian

On Wed, Mar 6, 2013 at 10:14 AM, Alec Flett alecfl...@chromium.org wrote: My primary takeaway from both working on IDB and working with IDB for some demo apps is that IDB has just the right amount of complexity for really large, robust database use.. but for a "welcome to noSQL in the browser" it is way too complicated. [...]
Re: IndexedDB: undefined parameters
On Tue, Oct 9, 2012 at 11:12 AM, Boris Zbarsky bzbar...@mit.edu wrote:

On 10/9/12 1:52 PM, Joshua Bell wrote: The IDB spec does not have [TreatUndefinedAs=Missing] specified on openCursor()'s arguments (or anywhere else), so I believe Chrome's behavior here is correct.

It looks correct as the spec is currently written. It's not clear to me why the spec is written the way it is. It could just as easily define that if the 'any' value is undefined, it's ignored. Or it could use [TreatUndefinedAs=Missing], indeed.

I have to say, as a developer it can be really frustrating to write abstractions on top of APIs that behave this way, when you want to say something like:

  var direction;
  var range;
  if (condition1) direction = 'prev';
  else if (condition2) direction = 'prevuniq';
  if (condition3) range = range1;
  else if (condition4) range = range2;
  return source.openCursor(range, direction);

Alec
Re: IndexedDB: undefined parameters
On Tue, Oct 9, 2012 at 11:37 AM, Alec Flett alecfl...@chromium.org wrote:

On Tue, Oct 9, 2012 at 11:12 AM, Boris Zbarsky bzbar...@mit.edu wrote: On 10/9/12 1:52 PM, Joshua Bell wrote: The IDB spec does not have [TreatUndefinedAs=Missing] specified on openCursor()'s arguments (or anywhere else), so I believe Chrome's behavior here is correct. It looks correct as the spec is currently written. It's not clear to me why the spec is written the way it is. It could just as easily define that if the 'any' value is undefined, it's ignored. Or it could use [TreatUndefinedAs=Missing], indeed. I have to say, as a developer it can be really frustrating to write abstractions on top of APIs that behave this way, when you want to say something like:

Someone asked me to clarify: by "this way" I meant where passing undefined is different than calling without the parameter - meaning that in general, APIs should behave the same if you call foo(undefined) as if you called foo(). Otherwise it's notoriously hard to write anything functional (in the CS sense) around it.

Alec

  var direction;
  var range;
  if (condition1) direction = 'prev';
  else if (condition2) direction = 'prevuniq';
  if (condition3) range = range1;
  else if (condition4) range = range2;
  return source.openCursor(range, direction);

Alec
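The foo(undefined)-should-equal-foo() behavior can be recovered today in a library wrapper by trimming trailing undefined arguments before forwarding. A sketch with invented names (openCursorCompat is not a real API):

```javascript
// Drop trailing undefined values so foo(undefined) forwards as foo().
function dropTrailingUndefined(args) {
  var end = args.length;
  while (end > 0 && args[end - 1] === undefined) end--;
  return args.slice(0, end);
}

// Hypothetical wrapper around IDBObjectStore/IDBIndex openCursor().
function openCursorCompat(source, range, direction) {
  var args = dropTrailingUndefined([range, direction]);
  return source.openCursor.apply(source, args);
}
```

With a wrapper like this, the abstraction-building code in the message above works whether or not the conditions assigned anything to range or direction.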
IndexedDB: ambiguity around createIndex order-of-operations
jsbell and I have been discussing a possible ambiguity in the IndexedDB spec w.r.t. error handling around createIndex calls. In particular, createIndex() is supposed to behave somewhat synchronously, in that the spec says: "the implementation *must* create a new index (http://dvcs.w3.org/hg/IndexedDB/raw-file/tip/Overview.html#dfn-index) and return an IDBIndex (http://dvcs.w3.org/hg/IndexedDB/raw-file/tip/Overview.html#idl-def-IDBIndex) object representing it." - so that this is reasonable:

    objectStore.createIndex('foo', ...);
    objectStore.put(...);
    objectStore.index('foo').get(...);

But at the same time createIndex() behaves somewhat asynchronously - while the metadata for the index needs to be there immediately, the actual indexing data doesn't have to be:

"In some implementations it's possible for the implementation to asynchronously run into problems creating the index after the createIndex function has returned. For example in implementations where metadata about the newly created index is queued up to be inserted into the database asynchronously, or where the implementation might need to ask the user for permission for quota reasons. Such implementations *must* still create and return an IDBIndex object. Instead, once the implementation realizes that creating the index has failed, it *must* abort the transaction using the steps for aborting a transaction (http://dvcs.w3.org/hg/IndexedDB/raw-file/tip/Overview.html#dfn-steps-for-aborting-a-transaction) using the appropriate error as error parameter."

The issue in question is how to handle this:

    objectStore.put({foo: 1, message: 'hello'});
    req = objectStore.put({foo: 1, message: 'goodbye'});
    objectStore.createIndex('foo', 'foo', {unique: true}); // will fail asynchronously

The question is, should req's onerror fire or not?
Depending on the implementation, createIndex() could fully create/index the whole 'foo' index before the puts are serviced, which means that by the time the 2nd put() happens, the index already says that the put is invalid. On the other hand, if the actual indexing happens later (asynchronously), but in the order written (i.e. put(), put(), createIndex()), then the 2nd put() would succeed, and THEN the index gets created. In either case the transaction is aborted.

From a developer's perspective, I feel like making the 2nd put() fail is really confusing, because it seems really strange that a later API call (createIndex) could make an earlier put() fail - you might remove the createIndex() to debug the code and then magically it would succeed! On the other hand, that behavior does allow the caller to call preventDefault() on the error event, which could prevent the transaction from failing.

In either case, I feel like this is a fairly degenerate case, and I feel like we need to optimize this behavior for debuggability, since I don't think normal usage patterns of IndexedDB should be doing this.

Alec
Re: IndexedDB: ambiguity around createIndex order-of-operations
On Mon, Aug 13, 2012 at 12:23 PM, Jonas Sicking jo...@sicking.cc wrote:

I think the two puts need to succeed. Implementation would be very complex and suboptimal otherwise. You need to know that there's a pending index-create operation and wait with firing success values for any requests until all requests have succeeded and the index-create operation has succeeded before you can fire any events.

Yeah - I think this is just easier for developers to wrap their heads around too.

On top of that you can get circular dependencies, I think, since if one of the two put operations failed for reasons unrelated to the index-create, then the index-create operation would succeed. The way we handle this in gecko is that we treat index-create as a normal async operation.

FWIW, this is mostly what we do in Chrome too - and during some code refactoring, we realized we had a choice of behaviors and went to the spec for guidance.

Suggestions for how to clarify this in the spec are welcome. At the very least we need a bug.

https://www.w3.org/Bugs/Public/show_bug.cgi?id=18551

Alec / Jonas
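The behavior the thread converges on - requests serviced strictly in queue order, with the unique-index build failing afterwards and aborting the transaction - can be illustrated with a toy model. This is a sketch of the agreed semantics only, not real IndexedDB; `runTransactionModel` and its op format are invented for illustration:

```javascript
// Toy model: each queued operation runs in call order. Both puts succeed
// before the unique-index build runs, discovers the duplicate, and aborts.
function runTransactionModel(ops) {
  var store = [];   // records written so far
  var log = [];
  for (var i = 0; i < ops.length; i++) {
    var op = ops[i];
    if (op.type === 'put') {
      store.push(op.value);
      log.push('success:put:' + op.value.message);
    } else if (op.type === 'createIndex') {
      // unique-index build: fails if two records share the key path value
      var seen = Object.create(null);
      var ok = store.every(function (v) {
        var k = v[op.keyPath];
        if (seen[k]) return false;
        seen[k] = true;
        return true;
      });
      if (!ok) { log.push('abort:ConstraintError'); break; }
      log.push('success:createIndex');
    }
  }
  return log;
}
```

Running the example from the previous message through this model yields two put successes followed by an abort - i.e. req's onerror does not fire.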
Re: IndexedDB and RegEx search
This is somewhat similar to [1] and something we decided was out-of-scope for v1. But for v2 I definitely think we should look at mechanisms for using JS code to filter/sort/index data in such a way that the JS code is run on the IO thread.

[1] https://www.w3.org/Bugs/Public/show_bug.cgi?id=1

There's a lot of excellent prior art in CouchDB for what you're describing in that bug (or at least parts thereof). I think it's well worth looking at.

If I understand CouchDB correctly, much like the suggestion elsewhere in this thread, CouchDB's views are really indexing primitives that support callbacks - the callbacks are (more or less) run at most once per document as the document is stored (or soon after), rather than every time a cursor is created/iterated. This means that cursor iteration can still be very fast.

I could be wrong, but I theorize that MOST of the use cases for filters are more or less static/stateless, and that if you want to iterate once using a specific stateless callback/filter, then you're probably going to want to iterate again, many times. That particular use case just begs for an index. Meaning, you probably want to have code something like:

    objectStore.openCursor(function(value) {
        return value.foo > value.bar;
    }).onsuccess = ...;
This could be done with a callback-based index:

    objectStore.createIndex('foobigger', function(value) {
        return value.foo > value.bar;
    });
    objectStore.index('foobigger').openCursor(IDBKeyRange.only(true));

The next use case is some kind of semi-static cursor, where the function isn't stateless, but is parameterized by another value:

    var maxDifference = calculateMaxDifference();
    objectStore.openCursor(function(value) {
        return (value.foo - value.bar) < maxDifference;
    }).onsuccess = ...;

This too can be implemented/expressed with a callback-based index, such that the check for maxDifference becomes a range query:

    objectStore.index('difference').openCursor(IDBKeyRange.upperBound(maxDifference));

The final case I see is something where the callback really is stateful:

    objectStore.openCursor(function(value) {
        return model.validate(value);
    }).onsuccess = ...;

assuming model is fairly dynamic and well out of scope of indexing (i.e. validation can't be expressed on some linear scale that can be range-queried with IDBKeyRange). This is a MUCH harder problem that has all sorts of security issues that would need to be thought through... but the other use cases could still be addressed by indexes.

I think part of the overall problem is that it's really rather cumbersome to create/remove indexes in IndexedDB - you need to change the database version to trigger a versionchange event, etc... It would be much nicer if there were ways to dynamically create them on the fly, or add them as needed. This has been brought up here in other contexts... I wonder if in IndexedDB v2 we could support creating indexes on the fly - I think IndexedDB is trying too hard to enforce a kind of schema versioning that is tied to indexes, which handles the very strict use case of lock-step schema changes, but I'm not sure everyone really needs that. I think that's a burden we should leave to consumers of the API.
I'd much rather be able to say, in any transaction:

    if (!objectStore.indexNames.contains('myindex')) {
        objectStore.createIndex('myindex', ...);
    }

Anyway, that's fodder for another thread :)

Alec
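Absent callback-based indexes, the first two cases above can be approximated today by computing derived properties at write time and indexing those. A sketch, where `withDerived` and the property names are made up for illustration:

```javascript
// Hypothetical helper: precompute what the index callbacks above would
// return, and store the results as plain indexed properties.
function withDerived(value) {
  var copy = {};
  for (var k in value) { copy[k] = value[k]; }
  // 1/0 rather than true/false, since booleans aren't valid IndexedDB keys;
  // query with IDBKeyRange.only(1)
  copy.fooBigger = value.foo > value.bar ? 1 : 0;
  // range-queryable via IDBKeyRange.upperBound(maxDifference)
  copy.difference = value.foo - value.bar;
  return copy;
}

// Schema setup (inside a versionchange transaction) would then be:
//   objectStore.createIndex('foobigger', 'fooBigger');
//   objectStore.createIndex('difference', 'difference');
// and every write goes through the helper:
//   objectStore.put(withDerived(record));
```

The cost is that the "callback" is frozen into the stored records, so changing it means rewriting the data - which is exactly the re-indexing work a native callback index would do for you.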
Re: IndexedDB and RegEx search
FWIW it's fairly hard for a database to index arbitrary content for regexes, to the point where it's going to be hard to do MUCH better than simply filtering based on the regex. Out of curiosity, would the free-text search feature on that wiki page that Arthur provided meet your needs? It's more along the lines of what SQL 'LIKE' provides.

On Tue, Jul 31, 2012 at 11:17 AM, Michael Brooks firealwayswo...@gmail.com wrote:

I like IndexedDB and non-relational databases. One feature that is very useful is the ability to search by regular expression: http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-RegularExpressions

From that page:

"An index on the field queried by regexp can increase performance significantly, as follows:

- Simple prefix queries (also called rooted regexps) like /^prefix/ will make efficient use of the index (much like most SQL databases that use indexes for a LIKE 'prefix%' expression). This only works if the expression is left-rooted and the i (case-insensitivity) flag is not used.
- All other queries will not make efficient use of the index: all values in the index will be scanned and tested against the regular expression. While /^a/, /^a.*/, and /^a.*$/ are equivalent, they will have different performance characteristics. The latter two will be slower as they have to scan the whole string. The first format can stop scanning after the prefix is matched."

By not having this feature, I can't port my application to IndexedDB from WebSQL because it relies upon LIKE statements for text search. Text search could easily be replaced by a RegEx search; however, this feature is unfortunately not a part of IndexedDB. It seems like this feature was just forgotten when writing the spec of IndexedDB.

One way to do this is to create a reverse index on the contents of the table that you would normally use 'like' for - this is basically what a free-text search would do. Create an objectStore on the side that maps tokens (i.e. what LIKE would tokenize: words) to keys in another table. This is the classic NoSQL way to solve this, and it's what SQL does under the hood. (But I agree it would be nice for IndexedDB to just do this for you!)

Alec
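A sketch of that side-table approach. `tokenize` and `tokenEntries` are hypothetical helpers; the token store would be created like any other objectStore, with an index on 'token':

```javascript
// Split text into lowercase word tokens - the things LIKE '%word%' would find.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Build one { token, key } entry per distinct token in a record's text.
// An index on 'token' then answers "which records contain this word"
// with a single key lookup instead of a full scan.
function tokenEntries(primaryKey, text) {
  var seen = Object.create(null);
  return tokenize(text).filter(function (t) {
    if (seen[t]) return false;
    seen[t] = true;
    return true;
  }).map(function (t) {
    return { token: t, key: primaryKey };
  });
}

// At write time (sketch):
//   tokenEntries(key, record.body).forEach(function (e) { tokenStore.put(e); });
// At query time, open a cursor on the 'token' index with
// IDBKeyRange.only('someword') and collect the keys.
```

This only matches whole tokens, not arbitrary substrings - the same limitation most free-text search engines accept, and why it can be fast.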
Re: [UndoManager] Re-introduce DOMTransaction interface?
On Thu, Jul 12, 2012 at 12:38 PM, Jonas Sicking jo...@sicking.cc wrote: On Thu, Jul 12, 2012 at 2:07 AM, Yuval Sadan sadan.yu...@gmail.com wrote:

I think we need to realize that a lot of the APIs that have been designed in the past aren't terribly good APIs. The IndexedDB API is rather new, and the manner in which it consistently uses event handlers on returned objects is rather innovative. The DOMTransaction object is more similar to that.

The IndexedDB pattern is rather an old one - see the Python (I know, Ryosuke) Twisted library, which uses 'deferreds', which went on to influence hundreds of projects in many languages. Eventually it migrated into JS libraries as promises (sans a few subtle differences...).

I think the key difference between the two models (DOMTransaction vs IndexedDB's events) is about the creation of the event-owning object. When you pass in a dictionary/DOMTransaction/etc., the caller is responsible for creating that object and doesn't give the callee (i.e. the API) a lot of control over the event propagation/etc. One thing IndexedDB's IDBRequest object has going for it is that the implementation creates it, and therefore controls the behavior and the interaction between success, error, and abort events.

One particularly nice thing the event pattern adds on top of the deferred pattern is the ability to have multiple semantically parallel event handlers, i.e.:

    // wrapper around get() to update something whenever we encounter purple keys
    function myGet(k) {
        var r = objectStore.get(k);
        r.addEventListener('success', dealWithPurpleKeys);
        return r;
    }
    var r = myGet(key);
    r.addEventListener('success', dealWithThisKey);

If the DOMTransaction/dictionary has to be passed in, this complicates this pattern quite a bit, and this is a pretty nice pattern in a dynamic language like JS.

In IndexedDB events work out ok since we can take advantage of the event bubbling mechanism. That's not the case here.
Likewise, in IndexedDB you generally only have to install a single event handler, which means that you can write code like:

    store.get(...).onsuccess = function(event) { ... };

I would argue that you can do this easily in IndexedDB *because* of event bubbling - or at least because there is a parent object (the IDBTransaction) - you can let error propagation be handled by a parent object so that you don't have to install error handlers on every request.

Just watching this conversation, the way I'm seeing it is that it's easier to make a wrapper API around the deferred/event model that looks like Ryosuke's model than it is to do the reverse. It also seems like it would help if you could add some kind of event handling - i.e. if a user had a kind of generic undo-er method that worked on 90% of events, but could then attach specific undoer-s to specific types of transactions.

(I personally come from the school of thought that the DOM APIs are kind of the assembly language of the web - they're very powerful and exact, you can use them in a pinch, but most people won't use them directly and will put some kind of higher-level abstraction on top of them. I don't think this is a failing of the APIs by any means, no more than the proliferation of computer languages is a failing of CPUs.)

Alec

or:

    store.get(...).onsuccess = obj.myhandler.bind(obj);

Neither of which is possible here. Yet, in IndexedDB I would have strongly preferred to use promises rather than DOM-Events-based Request objects. The only reason we don't is because there are no standardized promises that we can use. In other words, I think it's more important to focus on what makes a good API than on what is consistent with other DOM APIs.

Consistency has its value. Even if some is lacking, fixing it in some places and not in others might cause a jumble. Which is my feeling, actually, about the IndexedDB API. Adding more syntactical variations can lead to hectic code. However, I agree that it's not the primary concern.
Indeed. Something that I really liked about the old API was the fact that using it created very intuitive code. Basically you just write a class the way you normally would write a class, and then pass in your object:

    x = {
        someState: 0,
        apply: function() { this.someState++; this.modifyDOM(); },
        unapply: function() { this.someState--; this.modifyDOMOtherWay(); },
        ...
    };
    undoManager.transact(x);

You can even do things like:

    undoManager.transact(createParagraphTransaction(params));

How's that different from:

    function createParagraphTransaction(params) {
        x = new DOMTransaction('Create paragraph');
        x.apply = function() { ... use params ... };
        x.onundo = function() { ... use params ... };
        return x;
    }

Ah, you can, but that still doesn't give you nice class syntax. What I actually meant to say was that you can't do something like:

    undoManager.transact(new ParagraphTransaction(params));

Also, in your example, I think that in the
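The wrapper direction argued for above is indeed straightforward: because the implementation creates the request object, a deferred can be layered on top after the fact. A sketch (this predates any standardized promise, so it assumes an ES6-style Promise; `requestToPromise` is a made-up name):

```javascript
// Turn any IDBRequest-shaped object into a promise. We can attach the
// handlers after the call returns precisely because the API - not the
// caller - creates the request object.
function requestToPromise(request) {
  return new Promise(function (resolve, reject) {
    request.onsuccess = function () { resolve(request.result); };
    request.onerror = function () { reject(request.error); };
  });
}

// Usage sketch:
//   requestToPromise(store.get(key)).then(function (value) { /* ... */ });
```

Going the other way - recovering multiple parallel listeners and event bubbling from a pass-your-own-object API - has no equally small shim, which is the asymmetry the thread is pointing at.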
Re: [UndoManager] Re-introduce DOMTransaction interface?
[trying again with the right e-mail, sorry for the dupes...]
Re: [webcomponents] HTML Parsing and the template element
On Thu, Jun 7, 2012 at 2:45 AM, Henri Sivonen hsivo...@iki.fi wrote: On Wed, Jun 6, 2012 at 7:13 PM, Tab Atkins Jr. jackalm...@gmail.com wrote:

A call like document.querySelectorAll('p') doesn't *want* to get the p inside the template.

I think it's backwards to assume that querySelectorAll() works a particular way, and that that's not what authors want, and to change the DOM in response.

You could make this argument about any assumption anyone makes anywhere which in any way adds features. template is a pretty fundamentally different use of the DOM from existing practices, and it seems like violations of existing rules are likely.

That said, is it a best practice here to put template in the head, if web developers are inclined to inline the template declaration? At least then you can rely on document.body.querySelectorAll(). That seems like the cleanest approach from a developer's perspective, and is consistent with practices like putting embedded CSS in the head.

Alec

There are various solutions that don't involve drastic changes to the correspondence between the markup and the DOM, for example:

* Invoking querySelectorAll() on a wrapper element that's known not to be a parent of the templates on the page.
* Using a selector that fails to match elements whose ancestor chain contains a template element.
* Introducing an API querySelectorNonTemplate(). (Don't say "All" if you don't mean *all*.)

Even though XML has fallen out of favor, I think violations of the DOM Consistency principle and features that don't work with the XHTML serialization should be considered huge red flags indicating faulty design.

-- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: IndexedDB: ambiguity around IDBTransaction.error
Done. https://www.w3.org/Bugs/Public/show_bug.cgi?id=17236

On Tue, May 29, 2012 at 2:31 AM, Jonas Sicking jo...@sicking.cc wrote: On Fri, May 25, 2012 at 1:16 PM, Alec Flett alecfl...@google.com wrote:

I have found what feels like an ambiguity in the spec around IDBTransaction.error and when it is available. In particular, the spec says:

"When the done flag is true, getting this property must return the error of the request that caused the transaction to be aborted. [...] When the done flag is false, getting this property must throw a DOMException of type InvalidStateError."

The ambiguity here is that the 'done flag' is technically something that resides on the request, not the transaction. After the transaction itself is complete, the 'error' attribute should be the error that caused the transaction to abort, if any. So the question is, which 'done' flag does this correspond to - the done flag on the current request, the done flag on the request that caused the abort, or some other 'done' state on the transaction itself. An example:

    transaction = ...;
    transaction.objectStore('foo').put(badValue).onerror = function(event) {
        // can I access transaction.error here?
        // the request has its own error though.
        requestError = event.target.error;
        transaction.objectStore('foo').put(goodValue).onsuccess = function(event) {
            // can I access transaction.error here? If so, is it requestError, or is it null?
        };
    };
    transaction.objectStore('foo').put(goodValue).onsuccess = ...

As a consumer of this interface, I'd expect the transaction's error property to be set early - i.e. available as soon as the error handler fires, above - and then I'd expect it to remain unchanged for the duration of the rest of the transaction. But I can see arguments against that. For instance, what happens if preventDefault() is called? We need to avoid setting the error in that case... I think. So that would argue for some kind of 'done' flag / state for the transaction itself.

Thoughts?
We have the 'finished' flag already, which I think is what we should use here. Unfortunately it's somewhat ambiguous when a transaction becomes finished. Would you mind filing a bug on this?

/ Jonas
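The 'finished'-flag resolution can be illustrated with a toy model in which the error attribute throws until the transaction has finished, and afterwards reports the aborting request's error. Names here are illustrative, not spec text:

```javascript
// Toy model of IDBTransaction.error gated on a transaction-level
// 'finished' flag, per the suggestion above.
function TransactionModel() {
  this.finished = false;
  this._error = null;
}

// Aborting records the causing error and finishes the transaction.
TransactionModel.prototype.abort = function (err) {
  this._error = err;
  this.finished = true;
};

Object.defineProperty(TransactionModel.prototype, 'error', {
  get: function () {
    if (!this.finished) {
      // a real implementation would throw a DOMException of type InvalidStateError
      throw new Error('InvalidStateError');
    }
    return this._error;
  }
});
```

Under this model, transaction.error is unreadable inside the first onerror handler in Alec's example, and only becomes readable once the whole transaction has finished.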
IndexedDB: use of TypeError
So I didn't start working on/implementing IndexedDB until after this message went out: http://lists.w3.org/Archives/Public/public-webapps/2012JanMar/0958.html So I haven't had to face the full implications of using TypeError until this week. I'd like to (hopefully not too late) voice my opposition to TypeError for the following scenarios:

- an invalid value for the 'direction' parameter to open*Cursor() (must be 'next', 'prev', 'nextunique', or 'prevunique')
- advance(count) where count < 1
- open(version) where version < 1

The problem here is that these are not easily enforceable by the WebIDL spec, which means implementations have an extra burden of customizing their WebIDL implementation to custom-annotate the IDL with things like [EnforceRange=Positive], or of introducing JS semantics into their otherwise JS-free code. Now this certainly could just be an implementor's complaint, but I feel like I can retroactively justify it:

The rationale behind the current spec is that these are very similar to a range error, where the API has a very narrow range of values, checked before any higher-level semantic parameter constraints. But the problem is that they still have some semantic value within IndexedDB - i.e. even though 0 is a valid numeric value to JavaScript, it is invalid here because of specific constraints within the IndexedDB spec. Even though 'foo' is a valid string to JavaScript, strings like 'next' and 'prev' are the only valid strings because they have specific meaning in the context of IndexedDB.

So I'd argue that these errors should be IndexedDB-specific errors, and/or reused DOMExceptions. What I miss is the NotAllowedError that I believe was in an earlier version of the spec, but I could also believe that DOMException's InvalidAccessError could be used ("The object does not support the operation or argument").

Alec
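For concreteness, the validation in question looks something like the following sketch - a hypothetical helper showing what an implementation's binding layer would have to do to raise TypeError for advance(0) or open(name, 0), since WebIDL's unsigned long alone doesn't exclude zero:

```javascript
// Hypothetical [EnforceRange=Positive]-style check: reject anything that
// isn't an integer >= 1 with a TypeError, as the current spec requires.
function enforcePositive(value, name) {
  var n = Number(value);
  if (!isFinite(n) || Math.floor(n) !== n || n < 1) {
    throw new TypeError(name + ' must be a positive integer');
  }
  return n;
}

// Alec's preference would be a DOMException (e.g. InvalidAccessError)
// instead of TypeError here, since the constraint is IndexedDB-specific
// rather than a property of the JS type itself.
```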
IndexedDB: ambiguity around IDBTransaction.error
I have found what feels like an ambiguity in the spec around IDBTransaction.error and when it is available. In particular, the spec says:

"When the done (http://www.w3.org/TR/IndexedDB/#dfn-request-done) flag is true, getting this property *must* return the error of the request that caused the transaction (http://www.w3.org/TR/IndexedDB/#dfn-transaction) to be aborted. [...] When the done flag is false, getting this property *must* throw a DOMException of type InvalidStateError."

The ambiguity here is that the 'done flag' is technically something that resides on the request, not the transaction. After the transaction itself is complete, the 'error' attribute should be the error that caused the transaction to abort, if any. So the question is, which 'done' flag does this correspond to - the done flag on the current request, the done flag on the request that caused the abort, or some other 'done' state on the transaction itself. An example:

    transaction = ...;
    transaction.objectStore('foo').put(badValue).onerror = function(event) {
        // can I access transaction.error here?
        // the request has its own error though.
        requestError = event.target.error;
        transaction.objectStore('foo').put(goodValue).onsuccess = function(event) {
            // can I access transaction.error here? If so, is it requestError, or is it null?
        };
    };
    transaction.objectStore('foo').put(goodValue).onsuccess = ...

As a consumer of this interface, I'd expect the transaction's error property to be set early - i.e. available as soon as the error handler fires, above - and then I'd expect it to remain unchanged for the duration of the rest of the transaction. But I can see arguments against that. For instance, what happens if preventDefault() is called? We need to avoid setting the error in that case... I think. So that would argue for some kind of 'done' flag / state for the transaction itself.

Thoughts?

Alec