Re: [IndexedDB] Detailed comments for the current draft
On 1/31/2010 11:33 PM, Nikunj Mehta wrote:
>> a. 3.1.3: do we really need in-line + out-of-line keys? Besides the concept-count increase, we wonder whether out-of-line keys would cause trouble to generic libraries, as the values for the keys wouldn't be part of the values iterated when doing a "foreach" over the table.
>
> Certainly it is a matter of prioritizing among various requirements. Out-of-line keys enable people to store simple persistent hash maps. I think it would be wrong to require that data always be stored as objects. A library can always elide the availability of out-of-line keys if that poses a problem to its users.

What about just supporting out-of-line keys? If somebody wanted the key to be part of what they were iterating, they could still have the object contain the key. With that said, wouldn't a persistent hash map be better done in local or session storage?

I really think we should drop one of these concepts. I don't presently have a strong opinion on which.

Cheers,
Shawn
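The trade-off being discussed can be sketched with plain JavaScript structures. This models the two keying styles only; the store shapes and names below are illustrative, not the draft's API:

```javascript
// In-line key: the key lives inside the stored value (key path "id"),
// so iterating the values alone is enough to recover it.
const inlineStore = [
  { id: 1, name: "alice" },
  { id: 2, name: "bob" },
];

// Out-of-line key: the key is held separately from the value, as in a
// persistent hash map; a "foreach" over just the values never sees it.
const outOfLineStore = new Map([
  [1, { name: "alice" }],
  [2, { name: "bob" }],
]);

// A generic library that iterates values can recover in-line keys...
const inlineKeys = inlineStore.map(v => v.id);
// ...but for out-of-line keys it must be handed the key/value entries.
const outOfLineKeys = Array.from(outOfLineStore.keys());
```

Shawn's suggestion amounts to keeping only the second shape and letting applications that want iterable keys duplicate the key inside the value, as in the first shape.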
Re: [IndexedDB] Detailed comments for the current draft
On Wed, Feb 3, 2010 at 3:37 AM, Pablo Castro wrote:
>> I prefer to leave composite keys to a future version.
>
> I don't think we can get away with this. For indexes this is quite common (if nothing else, to have stable ordering when the prefix of the index has repeats). Once we have it for indexes, the delta for having it for primary keys as well is pretty small (although I wouldn't oppose leaving out composite primary keys if that would help scope the feature).

After talking to some of our web app teams at Google, I'm going to have to strongly agree with this. As far as we can tell, there's no efficient way to emulate this behavior in JavaScript, and it's a pretty common use case.
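To illustrate why emulating composite keys in script is awkward, here is one sketch (the helper names are hypothetical): a naive string join of the key parts collides, so each part has to be escaped, and numeric parts would still compare as strings rather than numbers.

```javascript
// Naive emulation: join the parts into one string key.
// Collides: ["ab", "c"] and ["a", "bc"] both become "abc".
const naive = parts => parts.join("");

// Escaped join: \u0001 separates parts, \u0002 escapes those two
// characters inside a part, so distinct tuples map to distinct strings.
function encodeCompositeKey(parts) {
  return parts
    .map(p => String(p).replace(/[\u0001\u0002]/g, c => "\u0002" + c))
    .join("\u0001");
}
```

Even with escaping, numbers inside a composite key would sort as strings ("10" before "9"), which is part of why native support is being requested.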
Re: [IndexedDB] Detailed comments for the current draft
On 2/2/2010 8:37 PM, Pablo Castro wrote:
>>> d. 3.2.4.2: in our experiments writing application code, the fact that this method throws an exception when an item is not found is quite inconvenient. It would be much more natural to just return undefined, as this can be a primary code path (to not find something) and not an exceptional situation. Same for 3.2.5, step 2 and 3.2.6 step 2.
>>
>> I am not comfortable specifying the API to be dependent on the separation between undefined and null. Since null is a valid return value, it doesn't make sense to return that either. The only safe alternative appears to be to throw an error.
>> What do other folks think about this?
>
> I understand your concern, but it makes writing regular code really noisy as you need try/catch blocks to handle non-exceptional situations.

I agree with returning undefined for non-existent keys. JavaScript objects are key-value sets, and they return undefined when you attempt to access a non-existent key. Consistency suggests that a JavaScript database should do the same. I also agree with Pablo's point that users would be likely to turn to doing an exists() and get() call together, which would most likely be more expensive than a single get().

Kris Zyp
SitePen
(503) 806-1841
http://sitepen.com
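The ergonomic difference between the two error models can be seen with a toy illustration, using a plain Map in place of an object store (this is not the draft API; the function names are made up):

```javascript
const store = new Map([["a", 1]]);

// Model 1: get() throws when the key is absent, so every lookup on a
// possibly-missing key needs a try/catch even though "not found" is a
// normal code path.
function getOrThrow(key) {
  if (!store.has(key)) throw new Error("NOT_FOUND_ERR");
  return store.get(key);
}

let result;
try {
  result = getOrThrow("missing");
} catch (e) {
  result = undefined; // non-exceptional situation handled as an exception
}

// Model 2: get() returns undefined for absent keys, matching plain
// JavaScript objects, where ({}).missing is undefined rather than an error.
function getOrUndefined(key) {
  return store.get(key); // Map.get already returns undefined when absent
}
const result2 = getOrUndefined("missing");
```

The ambiguity Nikunj raises remains: if undefined itself were a storable value, a returned undefined could not distinguish "absent" from "stored undefined" without a second exists-style lookup.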
RE: [IndexedDB] Detailed comments for the current draft
On Mon, Feb 1, 2010 at 1:30 AM, Jeremy Orlow wrote:
>>> 1. Keys and sorting
>>>
>>> a. 3.1.1: it would seem that having also date/time values as keys would be important and it's a common sorting criteria (e.g. as part of a composite primary key or in general as an index key).
>>
>> The Web IDL spec does not support a Date/Time data type. Could your use case be supported by storing the underlying time with millisecond precision using an IDL long long type? I am willing to change the spec so that it allows long long instead of long IDL type, which will provide adequate support for Date and time sorting.
>
> Can the spec not be augmented? It seems like other specs like WebGL have created their own types. If not, I suppose your suggested change would suffice as well. This does seem like an important use case.

I agree, either we could augment the spec or we could describe it in terms of Javascript object values. That is, we can say something specific about the treatment of Javascript's Date object. Would that be possible? E.g. we could require implementations to provide full order for dates if they find an instance of that type in a path.

>>> b. 3.1.1: similarly, sorting on number in general (not just integers/longs) would be important (e.g. price lists, scores, etc.)
>>
>> I am once again hampered by Web IDL spec. Is it possible to leave this for future versions of the spec?

Actually Web IDL does define the "double" type and its Javascript binding. Can we add double to the list of types an index can be applied to?

>>> c. 3.1.1: cross type sorting and sorting of long values are clear. Sorting of strings however needs more elaboration. In particular, which collation do we use? Does the user or developer get to choose a collation? If we pick up a collation from the environment (e.g. the OS), if the collation changes we'd have to re-index all the databases.
>> I propose to use Unicode collation algorithm, which was also suggested by Jonas during a conversation.

I don't think this is specific enough, in that it still doesn't say which collation tables to use and how to specify them. A single collation strategy won't do for all languages (it'll range from slightly wrong to nonsense depending on the target language). This is a trickier area than I had initially thought. We'll bake on this a bit and get back to this group with ideas.

>>> d. 3.1.3: spec reads "…key path must be the name of an enumerated property…"; how about composite keys (would make the related APIs take a DOMString or DOMStringList)
>>
>> I prefer to leave composite keys to a future version.

I don't think we can get away with this. For indexes this is quite common (if nothing else, to have stable ordering when the prefix of the index has repeats). Once we have it for indexes, the delta for having it for primary keys as well is pretty small (although I wouldn't oppose leaving out composite primary keys if that would help scope the feature).

>>> b. Query processing libraries will need temporary stores, which need temporary names. Should we introduce an API for the creation of temporary stores with transaction lifetime and no name?
>>
>> Firstly, I think we can leave this safely to a future version. Secondly, my suggestion would be to provide a parameter to the create call to indicate that an object store being created is a transient one, i.e., not backed by durable storage. They could be available across different transactions. If your intention is to not make these object stores unavailable across connections, then we can also offer a connection-specific transient object store.
>>
>> In general, it requires us to introduce the notion of create params, which would simplify the evolution of the API.
>> This is also similar to how Berkeley DB handles various options, not just those related to creation of a Berkeley "database".

Let's see how we progress on this one, and maybe revisit it a bit later. I'm worried about code that wants to do things such as a block-sort that needs to spill to disk, as it would have to either use some pattern or ask the user for temp table names.

>>> c. It would be nice to have an estimate row count on each store. This comes at an implementation and runtime cost. Strong opinions? Lacking everything else, this would be the only statistic to base decisions on for a query processor.
>>
>> I believe we need to have a general way of estimating the number of records in a cursor once a key range has been specified. Kris Zyp also brings this up in a separate email. I am willing to add an estimateCount attribute to IDBCursor for this.

EstimateCount sounds good.

>>> d. The draft does not touch on how applications would do optimistic concurrency. A common way
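The Date-as-integer suggestion earlier in this thread can be sketched in plain JavaScript (no IndexedDB involved): a Date is encoded as its millisecond offset via getTime(), which sorts correctly as a number and round-trips losslessly at millisecond precision.

```javascript
const dates = [
  new Date("2010-02-03T03:37:00Z"),
  new Date("2010-01-26T12:47:00Z"),
  new Date("2010-01-31T23:33:00Z"),
];

// Encode each Date as an integer key for storage...
const keys = dates.map(d => d.getTime());

// ...numeric sort on the encoded keys matches chronological order...
const sorted = keys.slice().sort((a, b) => a - b);

// ...and the Date round-trips losslessly at millisecond precision.
const roundTripped = new Date(sorted[0]);
```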
Re: [IndexedDB] Detailed comments for the current draft
On Sun, Jan 31, 2010 at 11:33 PM, Nikunj Mehta wrote: > > On Jan 26, 2010, at 12:47 PM, Pablo Castro wrote: > > These are notes that we collected both from reviewing the spec (editor's >> draft up to Jan 24th) and from a prototype implementation that we are >> working on. I didn't realize we had this many notes, otherwise I would have >> been sending intermediate notes early. Will do so next round. >> >> >> 1. Keys and sorting >> >> a. 3.1.1: it would seem that having also date/time values as keys >> would be important and it's a common sorting criteria (e.g. as part of a >> composite primary key or in general as an index key). >> > > The Web IDL spec does not support a Date/Time data type. Could your use > case be supported by storing the underlying time with millisecond precision > using an IDL long long type? I am willing to change the spec so that it > allows long long instead of long IDL type, which will provide adequate > support for Date and time sorting. Can the spec not be augmented? It seems like other specs like WebGL have created their own types. If not, I suppose your suggested change would suffice as well. This does seem like an important use case. > b. 3.1.1: similarly, sorting on number in general (not just >> integers/longs) would be important (e.g. price lists, scores, etc.) >> > > I am once again hampered by Web IDL spec. Is it possible to leave this for > future versions of the spec? > > > c. 3.1.1: cross type sorting and sorting of long values are clear. >> Sorting of strings however needs more elaboration. In particular, which >> collation do we use? Does the user or developer get to choose a collation? >> If we pick up a collation from the environment (e.g. the OS), if the >> collation changes we'd have to re-index all the databases. >> > > I propose to use Unicode collation algorithm, which was also suggested by > Jonas during a conversation. > > > d. 
3.1.3: spec reads "…key path must be the name of an enumerated >> property…"; how about composite keys (would make the related APIs take a >> DOMString or DOMStringList) >> > > I prefer to leave composite keys to a future version. > > > >> >> 2. Values >> >> a. 3.1.2: isn't the requirement for "structured clones" too much? It >> would mean implementations would have to be able to store and retrieve File >> objects and such. Would it be more appropriate to say it's just graphs of >> Javascript primitive objects/values (object, string, number, date, arrays, >> null)? >> > > Your list leaves out File, Blob, FileList, ImageData, and RegExp types. > While I don't feel so strongly about all these types, I believe that support > for Blob/File and ImageData will be beneficial to those who work with > browsers. Instead of profiling this algorithm, I think it is best to just > require the same algorithm. > > > >> >> 3. Object store >> >> a. 3.1.3: do we really need in-line + out-of-line keys? Besides the >> concept-count increase, we wonder whether out-of-line keys would cause >> trouble to generic libraries, as the values for the keys wouldn't be part of >> the values iterated when doing a "foreach" over the table. >> > > Certainly it is a matter of prioritizing among various requirements. > Out-of-line keys enable people to store simple persistent hash maps. I think > it would be wrong to require that data be always stored as objects. A > library can always elide the availability of out-of-line keys if that poses > a problem to its users. > > > b. Query processing libraries will need temporary stores, which need >> temporary names. Should we introduce an API for the creation of temporary >> stores with transaction lifetime and no name? >> > > Firstly, I think we can leave this safely to a future version. 
Secondly, my > suggestion would be to provide a parameter to the create call to indicate > that an object store being created is a transient one, i.e., not backed by > durable storage. They could be available across different transactions. If > your intention is to not make these object stores unavailable across > connections, then we can also offer a connection-specific transient object > store. > > In general, it requires us to introduce the notion of create params, which > would simplify the evolution of the API. This is also similar to how > Berkeley DB handles various options, not just those related to creation of a > Berkeley "database". > > > c. It would be nice to have an estimate row count on each store. This >> comes at an implementation and runtime cost. Strong opinions? Lacking >> everything else, this would be the only statistic to base decisions on for a >> query processor. >> > > I believe we need to have a general way of estimating the number of records > in a cursor once a key range has been specified. Kris Zyp also brings this > up in a separate email. I am willing to add an estimateCount attribute to > IDBCursor for this. > > > d. The draft does not touch on h
Re: [IndexedDB] Detailed comments for the current draft
On Jan 31, 2010, at 11:33 PM, Nikunj Mehta wrote:
>> d. The current draft fails to format in IE; the script that comes with the page fails with an error
>
> I am aware of this and am working with the maintainer of the ReSpec.js tool to publish an editor's draft that displays in IE. Would it be OK if this editor's draft that works in IE is made available at an alternate W3C URL?

http://dev.w3.org/2006/webapi/WebSimpleDB/post-Overview.html has the current static version, which should work in IE. I will try to keep it updated, not too far behind the default URL.
Re: [IndexedDB] Detailed comments for the current draft
On Jan 26, 2010, at 12:47 PM, Pablo Castro wrote:
> These are notes that we collected both from reviewing the spec (editor's draft up to Jan 24th) and from a prototype implementation that we are working on. I didn't realize we had this many notes, otherwise I would have been sending intermediate notes early. Will do so next round.
>
> 1. Keys and sorting
>
> a. 3.1.1: it would seem that having also date/time values as keys would be important and it's a common sorting criteria (e.g. as part of a composite primary key or in general as an index key).

The Web IDL spec does not support a Date/Time data type. Could your use case be supported by storing the underlying time with millisecond precision using an IDL long long type? I am willing to change the spec so that it allows long long instead of long IDL type, which will provide adequate support for Date and time sorting.

> b. 3.1.1: similarly, sorting on number in general (not just integers/longs) would be important (e.g. price lists, scores, etc.)

I am once again hampered by Web IDL spec. Is it possible to leave this for future versions of the spec?

> c. 3.1.1: cross type sorting and sorting of long values are clear. Sorting of strings however needs more elaboration. In particular, which collation do we use? Does the user or developer get to choose a collation? If we pick up a collation from the environment (e.g. the OS), if the collation changes we'd have to re-index all the databases.

I propose to use Unicode collation algorithm, which was also suggested by Jonas during a conversation.

> d. 3.1.3: spec reads "…key path must be the name of an enumerated property…"; how about composite keys (would make the related APIs take a DOMString or DOMStringList)

I prefer to leave composite keys to a future version.

> 2. Values
>
> a. 3.1.2: isn't the requirement for "structured clones" too much? It would mean implementations would have to be able to store and retrieve File objects and such. Would it be more appropriate to say it's just graphs of Javascript primitive objects/values (object, string, number, date, arrays, null)?

Your list leaves out File, Blob, FileList, ImageData, and RegExp types. While I don't feel so strongly about all these types, I believe that support for Blob/File and ImageData will be beneficial to those who work with browsers. Instead of profiling this algorithm, I think it is best to just require the same algorithm.

> 3. Object store
>
> a. 3.1.3: do we really need in-line + out-of-line keys? Besides the concept-count increase, we wonder whether out-of-line keys would cause trouble to generic libraries, as the values for the keys wouldn't be part of the values iterated when doing a "foreach" over the table.

Certainly it is a matter of prioritizing among various requirements. Out-of-line keys enable people to store simple persistent hash maps. I think it would be wrong to require that data be always stored as objects. A library can always elide the availability of out-of-line keys if that poses a problem to its users.

> b. Query processing libraries will need temporary stores, which need temporary names. Should we introduce an API for the creation of temporary stores with transaction lifetime and no name?

Firstly, I think we can leave this safely to a future version. Secondly, my suggestion would be to provide a parameter to the create call to indicate that an object store being created is a transient one, i.e., not backed by durable storage. They could be available across different transactions. If your intention is to not make these object stores unavailable across connections, then we can also offer a connection-specific transient object store.

In general, it requires us to introduce the notion of create params, which would simplify the evolution of the API. This is also similar to how Berkeley DB handles various options, not just those related to creation of a Berkeley "database".

> c. It would be nice to have an estimate row count on each store. This comes at an implementation and runtime cost. Strong opinions? Lacking everything else, this would be the only statistic to base decisions on for a query processor.

I believe we need to have a general way of estimating the number of records in a cursor once a key range has been specified. Kris Zyp also brings this up in a separate email. I am willing to add an estimateCount attribute to IDBCursor for this.

> d. The draft does not touch on how applications would do optimistic concurrency. A common way of doing this is to use a timestamp value that's automatically updated by the system every time someone touches the row. While we don't feel it's a must have, it certainly supports common scenarios.

Do you strongly feel that the manner in which optimistic concurrency is performed needs to be described in this spec? I do
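The timestamp-based optimistic concurrency pattern described above can be sketched with a plain-JavaScript model (the function names are illustrative, not part of the draft): the store stamps each record on every write, and an update only succeeds if the caller's stamp still matches.

```javascript
let clock = 0;
const rows = new Map();

// Every write bumps a system-maintained stamp on the row.
function put(key, value) {
  rows.set(key, { value, stamp: ++clock });
  return rows.get(key).stamp;
}

function get(key) {
  const row = rows.get(key);
  return row && { value: row.value, stamp: row.stamp };
}

// Update succeeds only if nobody touched the row since we read it.
function putIfUnchanged(key, value, expectedStamp) {
  const row = rows.get(key);
  if (!row || row.stamp !== expectedStamp) return false; // conflict detected
  put(key, value);
  return true;
}
```

A stale write fails cleanly instead of silently overwriting a concurrent update, which is the behavior the "timestamp updated on every touch" scheme is meant to provide.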
Re: [IndexedDB] Detailed comments for the current draft
On Wed, Jan 27, 2010 at 10:58 AM, Jeremy Orlow wrote: > On Tue, Jan 26, 2010 at 12:47 PM, Pablo Castro > wrote: >> >> 2. Values >> >> a. 3.1.2: isn't the requirement for "structured clones" too much? It >> would mean implementations would have to be able to store and retrieve File >> objects and such. Would it be more appropriate to say it's just graphs of >> Javascript primitive objects/values (object, string, number, date, arrays, >> null)? > > If LocalStorage is able to store structured clones, then I'm not sure if > there's too much of an additional burden on implementations. I think we > should either change both to "graphs of javascript primitives" or leave both > as "structured clones". > As a data point: does anyone currently plan on implementing the structured > clone requirement of the WebStorage spec? Yes, we do at mozilla. Storing File objects is definitely something that we get requests for and that we see useful for a number of use cases, such as offline mail, resumable background file upload, etc. / Jonas
Re: [IndexedDB] Detailed comments for the current draft
On Tue, Jan 26, 2010 at 12:47 PM, Pablo Castro wrote: > 2. Values > > a. 3.1.2: isn't the requirement for "structured clones" too much? It > would mean implementations would have to be able to store and retrieve File > objects and such. Would it be more appropriate to say it's just graphs of > Javascript primitive objects/values (object, string, number, date, arrays, > null)? > If LocalStorage is able to store structured clones, then I'm not sure if there's too much of an additional burden on implementations. I think we should either change both to "graphs of javascript primitives" or leave both as "structured clones". As a data point: does anyone currently plan on implementing the structured clone requirement of the WebStorage spec?
Re: [IndexedDB] Detailed comments for the current draft
Hi Pablo,

Great work and excellent feedback. I will take a little bit of time to digest and respond.

Nikunj

On Jan 26, 2010, at 12:47 PM, Pablo Castro wrote:
> These are notes that we collected both from reviewing the spec (editor's draft up to Jan 24th) and from a prototype implementation that we are working on. I didn't realize we had this many notes, otherwise I would have been sending intermediate notes early. Will do so next round.
>
> 1. Keys and sorting
>
> a. 3.1.1: it would seem that having also date/time values as keys would be important and it's a common sorting criteria (e.g. as part of a composite primary key or in general as an index key).
> b. 3.1.1: similarly, sorting on number in general (not just integers/longs) would be important (e.g. price lists, scores, etc.)
> c. 3.1.1: cross type sorting and sorting of long values are clear. Sorting of strings however needs more elaboration. In particular, which collation do we use? Does the user or developer get to choose a collation? If we pick up a collation from the environment (e.g. the OS), if the collation changes we'd have to re-index all the databases.
> d. 3.1.3: spec reads "…key path must be the name of an enumerated property…"; how about composite keys (would make the related APIs take a DOMString or DOMStringList)
>
> 2. Values
>
> a. 3.1.2: isn't the requirement for "structured clones" too much? It would mean implementations would have to be able to store and retrieve File objects and such. Would it be more appropriate to say it's just graphs of Javascript primitive objects/values (object, string, number, date, arrays, null)?
>
> 3. Object store
>
> a. 3.1.3: do we really need in-line + out-of-line keys? Besides the concept-count increase, we wonder whether out-of-line keys would cause trouble to generic libraries, as the values for the keys wouldn't be part of the values iterated when doing a "foreach" over the table.
> b. Query processing libraries will need temporary stores, which need temporary names. Should we introduce an API for the creation of temporary stores with transaction lifetime and no name?
> c. It would be nice to have an estimate row count on each store. This comes at an implementation and runtime cost. Strong opinions? Lacking everything else, this would be the only statistic to base decisions on for a query processor.
> d. The draft does not touch on how applications would do optimistic concurrency. A common way of doing this is to use a timestamp value that's automatically updated by the system every time someone touches the row. While we don't feel it's a must have, it certainly supports common scenarios.
>
> 4. Indexes
>
> a. 3.1.4 mentions "auto-populated" indexes, but then there is no mention of other types. We suggest that we remove this and in the algorithms section describe side-effecting operations as always updating the indexes as well.
> b. If during insert/update the value of the key is not present (i.e. undefined as opposed to null or a value), is that a failure, does the row not get indexed, or is it indexed as null? Failure would probably cause a lot of trouble to users; the other two have correctness problems. An option is to index them as undefined, but now we have undefined and null as indexable keys. We lean toward this last option.
>
> 5. Databases
>
> a. Not being able to enumerate databases gets in the way of creating good tools and frameworks such as database explorers. What was the motivation for this? Is it security related?
> b. Clarification on transactions: all database operations that affect the schema (create/remove store/index, setVersion, etc.) as well as data modification operations are assumed to be auto-commit by default, correct? Furthermore, all those operations (both schema and data) can happen within a transaction, including mixing schema and data changes. Does that line up with others' expectations? If so we should find a spot to articulate this explicitly.
> c. No way to delete a database? It would be reasonable for applications to want to do that and let go of the user data (e.g. a "forget me" feature in a web site)
>
> 6. Transactions
>
> a. While we understand the goal of simplifying developers' life with an error-free transactional model, we're not sure if we're doing more harm by introducing more concepts into this space. Wouldn't it be better to use regular transactions with a well-known failure mode (e.g. either deadlocks or optimistic concurrency failure on commit)?
> b. If in auto-commit mode, if two cursors are opened at the same time (e.g. to scan them in an interleaved way), are they in independent transactions simultaneously active in the same connection?
>
> 7. Algorithms
>
> a. 3.2.2: steps 4 and 5 are inverted in order.
> b. 3.2.2: when there is a key generator and the store uses in-line
[IndexedDB] Detailed comments for the current draft
These are notes that we collected both from reviewing the spec (editor's draft up to Jan 24th) and from a prototype implementation that we are working on. I didn't realize we had this many notes, otherwise I would have been sending intermediate notes early. Will do so next round.

1. Keys and sorting

a. 3.1.1: it would seem that having also date/time values as keys would be important and it's a common sorting criteria (e.g. as part of a composite primary key or in general as an index key).
b. 3.1.1: similarly, sorting on number in general (not just integers/longs) would be important (e.g. price lists, scores, etc.)
c. 3.1.1: cross type sorting and sorting of long values are clear. Sorting of strings however needs more elaboration. In particular, which collation do we use? Does the user or developer get to choose a collation? If we pick up a collation from the environment (e.g. the OS), if the collation changes we'd have to re-index all the databases.
d. 3.1.3: spec reads "…key path must be the name of an enumerated property…"; how about composite keys (would make the related APIs take a DOMString or DOMStringList)

2. Values

a. 3.1.2: isn't the requirement for "structured clones" too much? It would mean implementations would have to be able to store and retrieve File objects and such. Would it be more appropriate to say it's just graphs of Javascript primitive objects/values (object, string, number, date, arrays, null)?

3. Object store

a. 3.1.3: do we really need in-line + out-of-line keys? Besides the concept-count increase, we wonder whether out-of-line keys would cause trouble to generic libraries, as the values for the keys wouldn't be part of the values iterated when doing a "foreach" over the table.
b. Query processing libraries will need temporary stores, which need temporary names. Should we introduce an API for the creation of temporary stores with transaction lifetime and no name?
c. It would be nice to have an estimate row count on each store. This comes at an implementation and runtime cost. Strong opinions? Lacking everything else, this would be the only statistic to base decisions on for a query processor.
d. The draft does not touch on how applications would do optimistic concurrency. A common way of doing this is to use a timestamp value that's automatically updated by the system every time someone touches the row. While we don't feel it's a must have, it certainly supports common scenarios.

4. Indexes

a. 3.1.4 mentions "auto-populated" indexes, but then there is no mention of other types. We suggest that we remove this and in the algorithms section describe side-effecting operations as always updating the indexes as well.
b. If during insert/update the value of the key is not present (i.e. undefined as opposed to null or a value), is that a failure, does the row not get indexed, or is it indexed as null? Failure would probably cause a lot of trouble to users; the other two have correctness problems. An option is to index them as undefined, but now we have undefined and null as indexable keys. We lean toward this last option.

5. Databases

a. Not being able to enumerate databases gets in the way of creating good tools and frameworks such as database explorers. What was the motivation for this? Is it security related?
b. Clarification on transactions: all database operations that affect the schema (create/remove store/index, setVersion, etc.) as well as data modification operations are assumed to be auto-commit by default, correct? Furthermore, all those operations (both schema and data) can happen within a transaction, including mixing schema and data changes. Does that line up with others' expectations? If so we should find a spot to articulate this explicitly.
c. No way to delete a database? It would be reasonable for applications to want to do that and let go of the user data (e.g. a "forget me" feature in a web site)

6. Transactions

a. While we understand the goal of simplifying developers' life with an error-free transactional model, we're not sure if we're doing more harm by introducing more concepts into this space. Wouldn't it be better to use regular transactions with a well-known failure mode (e.g. either deadlocks or optimistic concurrency failure on commit)?
b. If in auto-commit mode, if two cursors are opened at the same time (e.g. to scan them in an interleaved way), are they in independent transactions simultaneously active in the same connection?

7. Algorithms

a. 3.2.2: steps 4 and 5 are inverted in order.
b. 3.2.2: when there is a key generator and the store uses in-line keys, should the generated key value be propagated to the original object (in addition to the clone), such that both are in sync after the put operation?
c. 3.2.3: step 2, probably editorial mistake? Wouldn't all indexes have