RE: [WebStorage] Concerns on spec section 'Processing Model'
On Thu, 30 Jul 2009, Laxmi Narsimha Rao Oruganti wrote:

I am unable to understand how we expect spec readers to interpret these spec lines. On the one hand, we are not willing to make it clear that it is a hint/model; on the other, we are fine with it not being implemented that way. Think about a person reading the spec who is *not* following this discussion forum; he will immediately conclude that he *must* implement it exactly as the spec describes. Yes, his implementation will be interoperable, but we are essentially making the whole world implement it the way we want :(. I sincerely hope people put on the reader's hat here rather than the editor's hat. It is sad to see people hesitant to spell out implementation aspects as hints. I now have great respect for IETF specs, which spell everything out clearly.

If we want the spec to be of high quality, then I suggest keeping implementation details out completely. But if we think that is not the way W3C specs are written, then I suggest keeping those implementation details as hints and making the spec text clear, for example: "The database system should allow only one writer at a time. *One way* vendors could ensure this is by taking an exclusive lock on the database file."

The conformance section says:

# Conformance requirements phrased as algorithms or specific steps may be
# implemented in any manner, so long as the end result is equivalent. (In
# particular, the algorithms defined in this specification are intended to
# be easy to follow, and not intended to be performant.)
 -- http://dev.w3.org/html5/webdatabase/#conformance-requirements

Is that not clear enough?

-- 
Ian Hickson
http://ln.hixie.ch/
"Things that are impossible just take longer."
RE: [WebStorage] Concerns on spec section 'Processing Model'
I am unable to understand how we expect spec readers to interpret these spec lines. On the one hand, we are not willing to make it clear that it is a hint/model; on the other, we are fine with it not being implemented that way. Think about a person reading the spec who is *not* following this discussion forum; he will immediately conclude that he *must* implement it exactly as the spec describes. Yes, his implementation will be interoperable, but we are essentially making the whole world implement it the way we want :(. I sincerely hope people put on the reader's hat here rather than the editor's hat. It is sad to see people hesitant to spell out implementation aspects as hints. I now have great respect for IETF specs, which spell everything out clearly.

If we want the spec to be of high quality, then I suggest keeping implementation details out completely. But if we think that is not the way W3C specs are written, then I suggest keeping those implementation details as hints and making the spec text clear, for example: "The database system should allow only one writer at a time. *One way* vendors could ensure this is by taking an exclusive lock on the database file."

Hope you take the criticism in the right spirit.

Thanks,
Laxmi

-----Original Message-----
From: Ian Hickson [mailto:i...@hixie.ch]
Sent: Thursday, July 30, 2009 1:24 AM
To: Laxmi Narsimha Rao Oruganti; Nikunj R. Mehta; Aaron Boodman
Cc: public-webapps@w3.org
Subject: Re: [WebStorage] Concerns on spec section 'Processing Model'

On Thu, 16 Jul 2009, Laxmi Narsimha Rao Oruganti wrote:

Why is the spec mandating a transaction to take an *exclusive write lock on the entire database*?

It is to avoid the possibility that two transactions conflict and one is then forced to roll back. For example, we want to make sure that there is no way a Web page will ever see a failure if it opens a transaction that does:

  read from row A
  read from row B
  read from row C
  write to row A
  write to row B
  write to row C

...even if the page is opened twice, and the two pages both run this transaction at the same time. Since client-side databases have very low contention, taking a lock on the entire database, rather than requiring the author to explicitly lock specific rows or columns, is good enough.

On Fri, 24 Jul 2009, Nikunj R. Mehta wrote:

If you want to provide an application programmer with a limited degree of freedom from a certain class of errors, then there is a different solution. It is called an isolation level [1]. When opening a transaction, just provide the required isolation level. Heck, if you'd like, make SERIALIZABLE the default value.

This doesn't prevent rollback in the above case as far as I can tell.

But don't disallow other possibilities or create the illusion of silver bullets.

Why not? The exclusive lock model described in the spec is just a model; it isn't intended to actually require an exclusive lock. If an implementation can get the same result using some other mechanism, that's fine.

The spec says:

[[ If the mode is read/write, the transaction must have an exclusive write lock over the entire database ]]

Therefore, correct me if I am wrong, but the spec prohibits the following: an implementation of the Database object in which, if more than one transaction tries to write to a database while another transaction has a write lock on the same database, it is a failure.

Assuming you mean that you would cause one of the two transactions above to fail, then this is not a black-box-equivalent implementation of what the spec says, and it is therefore not conforming.

If so, then I want to formally object to that spec text, because it is overly restrictive on implementers as well as on application programmers.

Do you have any alternative proposal that doesn't risk either transaction failing in the above example?

3. A read-only transaction includes inside it a read-write transaction.

This isn't possible with the current asynchronous API as far as I can tell. With the synchronous API, it would hang trying to open the read-write transaction for however long it takes the UA to realise that the script trying to get the read-write transaction is the same one that has an open read-only transaction, and then it would fail with error code 7.

Then again, the spec is too restrictive, because application programmers need the ability to upgrade their lock from read-only to read-write

What are the use cases for this?

and an application should never deadlock itself.

I would expect implementations to catch this case, but even if they didn't, it would time out eventually anyway. I can add explicit text saying that this should check whether there is a synchronous read/write transaction open in the same thread; would that help?

Therefore, I formally object to the spec disallowing
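The guarantee Ian describes above (two pages running the same read-then-write transaction at the same time, with neither ever failing) can be sketched in a few lines of plain JavaScript. This is a toy model of the database-wide exclusive lock only; the `ToyDatabase` class, its row object, and `readThenWrite` are illustrative names, not the Web SQL API or the spec algorithm.

```javascript
// Toy model of a database-wide exclusive write lock: transactions are
// queued on a promise chain, so each one runs only after the previous
// one has committed. Illustration only; not the Web SQL API.
class ToyDatabase {
  constructor() {
    this.rows = { A: 0, B: 0, C: 0 };
    this.tail = Promise.resolve(); // the most recently granted transaction
  }
  transaction(work) {
    const run = this.tail.then(() => work(this.rows));
    this.tail = run.catch(() => {}); // a failed transaction still releases the lock
    return run;
  }
}

// The transaction from the example: read rows A..C, then write them back.
async function readThenWrite(rows) {
  const a = rows.A, b = rows.B, c = rows.C;        // reads
  await new Promise((r) => setImmediate(r));       // another page could run here
  rows.A = a + 1; rows.B = b + 1; rows.C = c + 1;  // writes
  return 'committed';
}

async function demo() {
  const db = new ToyDatabase();
  // Two "pages" start the same transaction at the same time.
  const results = await Promise.all([
    db.transaction(readThenWrite),
    db.transaction(readThenWrite),
  ]);
  return { results, rows: db.rows };
}

demo().then((out) => console.log(JSON.stringify(out)));
// Both commit and both updates survive:
// {"results":["committed","committed"],"rows":{"A":2,"B":2,"C":2}}
```

If the two `readThenWrite` calls were allowed to interleave instead of queueing, the second write would be based on a stale read and one update would be lost; that conflict-then-rollback hazard is exactly what the database-wide lock removes.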
RE: [WebStorage] Concerns on spec section 'Processing Model'
[Being in a different time zone (IST), I was not able to engage actively on this thread :(]

There seems to be an update-loss issue here. Suppose the UI thread, which is supposed to enqueue the statements, is scheduled out in the middle of a transaction block, and the background thread is scheduled in, consumes only the already-queued items as the transaction, and commits. That means we have lost atomicity, right? I am sure the spec did not intend this. Am I missing something?

Thanks,
Laxmi

-----Original Message-----
From: Aaron Boodman [mailto:a...@google.com]
Sent: Saturday, July 25, 2009 6:35 AM
To: Nikunj R. Mehta
Cc: Ian Hickson; public-webapps WG; Laxmi Narsimha Rao Oruganti
Subject: Re: [WebStorage] Concerns on spec section 'Processing Model'

On Fri, Jul 24, 2009 at 5:32 PM, Nikunj R. Mehta <nikunj.me...@oracle.com> wrote:

The spec is also silent about what happens if I introduce a wait by making another asynchronous call inside my transaction callback logic. By inference this would be allowed, since all statements are executed inside callbacks, so why distinguish between transaction and other (non-SQLTransactionErrorCallback) types of callbacks? The processing model in 4.3.2 simply says that the SQL statements are queued up. It is unclear what, if anything, happens if the database runs out of statements to execute because the transaction logic takes time to add another statement to the queue before the database decides to commit. Am I wrong, or is this an ambiguous but correct interpretation?

4.3.2, step 6 says:

  While there are any statements queued up in the transaction, perform the
  following steps for each queued up statement in the transaction, oldest
  first. Each statement has a statement, optionally a result set callback,
  and optionally an error callback.

...then there are the steps to process the statement, handle errors, and so on. And then step 8 is:

  Commit the transaction.

So it seems pretty clear to me that the transaction commits automatically when there are no more statements queued and all have executed successfully. Also, I helped design this, and I can tell you that was the intent.

Those who are worried about throwing the complexity of transaction recovery onto Web programmers should perhaps also be worried about the insane complexity of asynchronous transaction programming, which no one in the world should have to learn. Mainstream database developers don't have to deal with that. Why should poor Web programmers have to suffer this?

That is unfortunate. However, it is a browser-vendor requirement not to allow synchronous IO on the UI thread, because it leads to blocked web pages and browser UI. So there's not really an alternative. This is part of the reason for the introduction of web workers: they can allow synchronous IO because the entire program runs on a separate thread.

Moreover, with an asynchronous database the spec doesn't allow an application to roll back a transaction, should certain application logic require that. This is yet another case of creating a storage API that differs from what traditional database developers use.

It does allow it. If an exception is thrown by a developer inside a statement callback, the transaction will be rolled back. 4.3.2, step 6, substep 6:

  If the callback was invoked and raised an exception, jump to the last
  step in the overall steps.

There seems to be a pattern of ignoring good API practices when interacting with a database, and it appears intentional. Am I wrong in my interpretation?

I think you may be. It is common when working with databases in C++ to use the RAII pattern (http://en.wikipedia.org/wiki/Resource_Acquisition_Is_Initialization) to explicitly scope transactions. This is done to prevent developers from forgetting to close transactions. The analogue to this pattern in JavaScript is callbacks.

You can't use RAII in the same way in Java as in C++, but C# introduced the using statement (http://msdn.microsoft.com/en-us/library/yh598w02.aspx) to allow the same sort of thing. In particular, the .NET IDbTransaction interface inherits IDisposable so that you can use it with the using statement (http://msdn.microsoft.com/en-us/library/system.data.idbtransaction.aspx), and that is the recommended pattern.

It does appear that it is possible to hold a transaction open all day with the DatabaseSync interface (http://dev.w3.org/html5/webdatabase/#databasesync); specifically, SQLTransactionSync has commit/rollback methods.

The DatabaseSync interface was added after I worked on this, so I can't say why it doesn't use callbacks. In any case, I was talking about the async flavor, which is what my example code referred to.

Do you agree it is not possible to hang transactions open from Database (http://dev.w3.org/html5/webdatabase/#database)? If not, what am I missing?

I can't agree, simply because the spec says nothing about it. In fact, if anything, the rest of the spec text around
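The 4.3.2 flow discussed above (queue statements, run them oldest first, commit when the queue drains, roll back if a callback throws) can be reduced to a short synchronous sketch. This is an illustration of the processing model only: `executeSql` here takes plain functions rather than SQL strings, and none of it is the real SQLTransaction interface.

```javascript
// Synchronous sketch of the 4.3.2 processing model: statements are
// queued, executed oldest first (callbacks may queue more), and the
// transaction commits when the queue is empty, or rolls back if a
// callback raises. Illustrative stand-in, not the real SQLTransaction.
function runTransaction(populate) {
  const queue = [];
  const log = [];
  const tx = {
    executeSql(stmt, callback) { queue.push({ stmt, callback }); },
  };
  try {
    populate(tx); // the transaction callback queues the initial statements
    // Step 6: while statements are queued, process them oldest first.
    while (queue.length > 0) {
      const { stmt, callback } = queue.shift();
      const result = stmt();              // "execute" the statement
      log.push(result);
      if (callback) callback(tx, result); // may queue further statements
    }
    log.push('COMMIT'); // step 8: queue drained, commit automatically
  } catch (e) {
    log.push('ROLLBACK'); // a callback threw: jump to the last step
  }
  return log;
}

console.log(runTransaction((tx) => {
  tx.executeSql(() => 'row1', (tx2) => tx2.executeSql(() => 'row2'));
}));
// [ 'row1', 'row2', 'COMMIT' ]

console.log(runTransaction((tx) => {
  tx.executeSql(() => 'row1', () => { throw new Error('bad data'); });
}));
// [ 'row1', 'ROLLBACK' ]
```

Note that in this sketch the commit can never race ahead of the code that queues statements, since new statements can only be queued from within the transaction's own callbacks before the queue is declared empty; this is the property the spec text relies on as well.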
RE: [WebStorage] Concerns on spec section 'Processing Model'
That is all the responsibility of the database system. We don't need to tell database systems how to do it; we just need to tell them what to do. Database systems today have a lock manager that takes care of these responsibilities.

Coming to the question of transactions failing unpredictably: even with the current specification, transactions do fail. For example, if there is already a writer transaction holding an exclusive lock, a new thread would fail to acquire the lock. The failures would still be there.

The next question people would ask is how we make sure that partial changes do not cause problems when a failure occurs in the middle of a sequence of operations. That is the responsibility of the transaction manager. Note that the transaction manager treats the whole sequence as a single atomic unit.

Are we missing something?

Thanks,
Laxmi

-----Original Message-----
From: Ian Hickson [mailto:i...@hixie.ch]
Sent: Friday, July 24, 2009 6:55 AM
To: Nikunj R. Mehta
Cc: public-webapps WG; Laxmi Narsimha Rao Oruganti
Subject: Re: [WebStorage] Concerns on spec section 'Processing Model'

On Thu, 16 Jul 2009, Nikunj R. Mehta wrote:

The spec should not restrict implementations to any one level of concurrency unless there are specific undesirable effects. Restricting the database to a single writer means that if separate workers or background threads are updating non-overlapping portions, they have to wait for the lone current writer. Implementations can certainly compete to produce the level of concurrency that developers need. Specifically, I propose that the following text:

[[
If the mode is read/write, the transaction must have an exclusive write lock over the entire database. If the mode is read-only, the transaction must have a shared read lock over the entire database. The user agent should wait for an appropriate lock to be available.
]]

be replaced with the following text:

[[
Multiple read-only transactions may share the same data as long as there is no transaction attempting to write the data being read. The user agent must wait for transactions that are reading some data before allowing a read/write transaction on the same data to continue.
]]

Since there's no way for the author to say ahead of time which rows or cells the transactions are going to use, how can you do the above without ending up with some transactions failing unpredictably?
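The two lock modes in the quoted spec text can be sketched as a counter-based readers-writer lock: read-only transactions share, a read/write transaction is exclusive over the entire database. This is a toy illustration of the model, not conforming implementation text; `DbLock` and its method names are invented for the example.

```javascript
// Toy readers-writer lock over the whole database: any number of
// read-only transactions may hold it together, while a read/write
// transaction needs it exclusively. Illustration only.
class DbLock {
  constructor() { this.readers = 0; this.writer = false; }
  tryAcquire(mode) {
    if (mode === 'read-only') {
      if (this.writer) return false; // must wait for the writer to commit
      this.readers += 1;             // shared read lock granted
      return true;
    }
    // read/write: exclusive write lock over the entire database
    if (this.writer || this.readers > 0) return false; // must wait
    this.writer = true;
    return true;
  }
  release(mode) {
    if (mode === 'read-only') this.readers -= 1;
    else this.writer = false;
  }
}

const lock = new DbLock();
console.log(lock.tryAcquire('read-only'));  // true: first reader
console.log(lock.tryAcquire('read-only'));  // true: readers share the lock
console.log(lock.tryAcquire('read/write')); // false: must wait for readers
lock.release('read-only');
lock.release('read-only');
console.log(lock.tryAcquire('read/write')); // true: now exclusive
```

In the spec's model the read/write request above would simply queue ("the user agent should wait for an appropriate lock to be available") rather than fail; `tryAcquire` returns false here only to make the waiting visible.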
RE: [WebStorage] Concerns on spec section 'Processing Model'
On Fri, 24 Jul 2009, Laxmi Narsimha Rao Oruganti wrote:

That is all the responsibility of the database system. We don't need to tell database systems how to do it; we just need to tell them what to do. Database systems today have a lock manager that takes care of these responsibilities. Coming to the question of transactions failing unpredictably: even with the current specification, transactions do fail. For example, if there is already a writer transaction holding an exclusive lock, a new thread would fail to acquire the lock. The failures would still be there. The next question people would ask is how we make sure that partial changes do not cause problems when a failure occurs in the middle of a sequence of operations. That is the responsibility of the transaction manager. Note that the transaction manager treats the whole sequence as a single atomic unit.

As I understand it, with what is specced now, if you try to get a write-transaction lock, it will only fail if it times out, which would probably be a symptom of a more serious bug anyway. There is never going to be a forced rollback; once you have a transaction lock, it is never going to fail on you unexpectedly.

I think this is an important invariant, because otherwise script writers _will_ shoot themselves in the foot. These aren't professional database developers; Web authors span the gamut of developer experience, from the novice who is writing code more by luck than by knowledge, all the way to the UI designer who wound up stuck with the task of writing the UI logic but has no professional background in programming, let alone in database concurrency. We can't be firing unexpected exceptions when their users happen to open two tabs to the same application at the same time, leaving data unsaved.
RE: [WebStorage] Concerns on spec section 'Processing Model'
Let me probe this further to get clarity.

[Ian] As I understand it, with what is specced now, if you try to get a write-transaction lock, it will only fail if it times out, which would probably be a symptom of a more serious bug anyway.
[Ian] There is never going to be a forced rollback; once you have a transaction lock, it is never going to fail on you unexpectedly.

My understanding of your requirement is: the database should allow only one active writer transaction. How database systems achieve this need not be explained. Note that this need not be achieved only by acquiring an exclusive lock on the database file. Think about a database implementation that is not based on a single file (the log + checkpoint design model), where there is one data file and a couple of log files. Spec-ing that implementations have to hold an exclusive lock on "the database file" is ambiguous between the data file and the log files. If you take BDB JE as an example, it doesn't even have a data file; its model is a sequence of log files.

I have a question: many database systems today don't ask the user to specify the purpose of use (read/write) when opening a transaction, yet the spec expects the purpose of a transaction to be specified as read/write. If we really want this to fly with multiple database systems, that kind of choice should be offered at connection-opening time, not at transaction-creation time; a connection typically accepts a read/write or read-only mode. Nikunj or others should comment on whether taking the read/write vs. read-only choice at connection time is what their products practice.

Thanks,
Laxmi

-----Original Message-----
From: Ian Hickson [mailto:i...@hixie.ch]
Sent: Friday, July 24, 2009 2:07 PM
To: Laxmi Narsimha Rao Oruganti
Cc: Nikunj R. Mehta; public-webapps WG
Subject: RE: [WebStorage] Concerns on spec section 'Processing Model'

On Fri, 24 Jul 2009, Laxmi Narsimha Rao Oruganti wrote:

That is all the responsibility of the database system. We don't need to tell database systems how to do it; we just need to tell them what to do. Database systems today have a lock manager that takes care of these responsibilities. Coming to the question of transactions failing unpredictably: even with the current specification, transactions do fail. For example, if there is already a writer transaction holding an exclusive lock, a new thread would fail to acquire the lock. The failures would still be there. The next question people would ask is how we make sure that partial changes do not cause problems when a failure occurs in the middle of a sequence of operations. That is the responsibility of the transaction manager. Note that the transaction manager treats the whole sequence as a single atomic unit.

As I understand it, with what is specced now, if you try to get a write-transaction lock, it will only fail if it times out, which would probably be a symptom of a more serious bug anyway. There is never going to be a forced rollback; once you have a transaction lock, it is never going to fail on you unexpectedly.

I think this is an important invariant, because otherwise script writers _will_ shoot themselves in the foot. These aren't professional database developers; Web authors span the gamut of developer experience, from the novice who is writing code more by luck than by knowledge, all the way to the UI designer who wound up stuck with the task of writing the UI logic but has no professional background in programming, let alone in database concurrency. We can't be firing unexpected exceptions when their users happen to open two tabs to the same application at the same time, leaving data unsaved.
Re: [WebStorage] Concerns on spec section 'Processing Model'
On Fri, Jul 24, 2009 at 1:54 PM, Nikunj R. Mehta <nikunj.me...@oracle.com> wrote:

Experience has shown that there is no easy way out when dealing with transactions, and locking at the whole-database level is no solution to failures.

The thing that makes the web browser environment different and interesting is that multiple independent applications can end up having access to the same database if they run in the same origin. This could be multiple instances of the same app (e.g. multiple Gmail windows) or different apps that happen to be on the same origin (many Google apps run on www.google.com).

Because these apps are isolated from each other, they have no way to cooperate to reduce conflicts. They also have no control over whether there are multiple copies of themselves (the user controls this). Therefore, if the platform does not protect against this, basically any statement can fail due to a conflict. This was a big problem with Gears, and led to applications going through crazy contortions to do things like master election. When we designed the HTML5 version of the database API, we specifically tried to avoid it.

I do not agree that database-level locking is a big problem for web applications; they typically serve at most a handful of clients. As long as you can specify read vs. read/write transactions, I think the current design is the correct trade-off in terms of complexity and correctness.

- a
Re: [WebStorage] Concerns on spec section 'Processing Model'
On Fri, Jul 24, 2009 at 2:06 PM, Aaron Boodman <a...@google.com> wrote:

I do not agree that database-level locking is a big problem for web applications.

Preemptive correction: I mean for the client side of web applications. There are usually at most a handful of clients accessing an HTML5 database instance.

- a
Re: [WebStorage] Concerns on spec section 'Processing Model'
On Fri, Jul 24, 2009 at 2:17 PM, Nikunj R. Mehta <nikunj.me...@oracle.com> wrote:

On Jul 24, 2009, at 1:36 AM, Ian Hickson wrote:

On Fri, 24 Jul 2009, Laxmi Narsimha Rao Oruganti wrote:

That is all the responsibility of the database system. We don't need to tell database systems how to do it; we just need to tell them what to do. Database systems today have a lock manager that takes care of these responsibilities. Coming to the question of transactions failing unpredictably: even with the current specification, transactions do fail. For example, if there is already a writer transaction holding an exclusive lock, a new thread would fail to acquire the lock. The failures would still be there. The next question people would ask is how we make sure that partial changes do not cause problems when a failure occurs in the middle of a sequence of operations. That is the responsibility of the transaction manager. Note that the transaction manager treats the whole sequence as a single atomic unit.

As I understand it, with what is specced now, if you try to get a write-transaction lock, it will only fail if it times out, which would probably be a symptom of a more serious bug anyway.

Can you explain "a more serious bug"? The write lock may actually be requested in the middle of a read-only transaction, can't it? I don't see spec text prohibiting that.

It's not clear to me what you're asking about in this paragraph. Write transactions should be exclusive; read transactions can be shared. Write transactions queue until all open read transactions complete. I can't find the actual spec right now (where is its canonical home now?) so I can't point at the exact text, but that is the goal, I believe.

There is never going to be a forced rollback; once you have a transaction lock, it is never going to fail on you unexpectedly.

Even if you have a transaction lock: 1. the application logic could cause an exception; 2. the application finds an unacceptable data condition and needs to roll back the transaction.

In these cases, the developer caused the exception himself, and it only affects his own application. So it's not "unexpected" in the same sense that locking exceptions are.

3. a disk failure occurs

Yes, but there is nothing the platform can do about this case. It is an exception in the truest sense of the word.

4. a bug is encountered in the underlying software

The platform shouldn't carry API just in case the underlying software has bugs. Throwing an exception and unwinding the transaction is the most sensible thing to do, and that is what it does now.

In either of these cases, how would the application code be expected to recover?

It can't, but these are exceptional circumstances that are generally unexpected and indicate bigger problems. This is different from access conflicts, which are expected and do not indicate bigger problems. The platform can and should make those easier.

These aren't professional database developers; Web authors span the gamut of developer experience, from the novice who is writing code more by luck than by knowledge, all the way to the UI designer who wound up stuck with the task of writing the UI logic but has no professional background in programming, let alone in database concurrency.

This is a strong reason to avoid SQL in the front end.

I am also interested in a non-SQL storage API, but I think this is a separate issue.

We can't be firing unexpected exceptions when their users happen to open two tabs to the same application at the same time, leaving data unsaved.

So you'd much rather tell an application user that they should close one of the two tabs, since they can't obtain a read-write lock in both? I still don't understand how the exclusive database lock helps. Would you please elaborate?

I think you are either misunderstanding the spec or it has a bug, because this is not the intent. Requests to obtain read/write transactions are asynchronous and are queued until they can be granted:

myDatabase.transaction(function(tx) {
  // this callback doesn't occur until the caller has exclusive access
});

- a
RE: [WebStorage] Concerns on spec section 'Processing Model'
On Fri, 24 Jul 2009, Laxmi Narsimha Rao Oruganti wrote:

Let me probe this further to get clarity. ... My understanding of your requirement is: the database should allow only one active writer transaction. How database systems achieve this need not be explained.

Sure. So long as the implementation is black-box indistinguishable from what the spec says, it can do whatever it wants.

Note that this need not be achieved only by acquiring an exclusive lock on the database file. Think about a database implementation that is not based on a single file (the log + checkpoint design model), where there is one data file and a couple of log files. Spec-ing that implementations have to hold an exclusive lock on "the database file" is ambiguous between the data file and the log files. If you take BDB JE as an example, it doesn't even have a data file; its model is a sequence of log files.

The exclusive lock model described in the spec is just a model; it isn't intended to actually require an exclusive lock. If an implementation can get the same result using some other mechanism, that's fine.

On Fri, 24 Jul 2009, Nikunj R. Mehta wrote:

Database developers (whether experienced DBAs or newcomer WebApp programmers) identify the data set they are using through the statements they execute (within or outside transactions). It is the database's job to find out which records are being used.

Sure.

The concepts of transaction processing apply no matter the granularity of a data item, whether it is a record, a disk block, or a whole file. There are many kinds of failures (and yes, failures are always unpredictable) [1]. Let's focus on failures arising from concurrency-control enforcement, which is probably the one most people worry about from a programming perspective. In the following discussion I use the term "locking" to mean whatever guarantees serializability, i.e., correct interleaving of concurrent transactions, even though other protocols have been developed and are in use. A knowledgeable database programmer would read the smallest set of data in a transaction so as to avoid locking the entire database against concurrent operations. This approach also minimizes starvation, i.e., the amount of time a program must wait to obtain permission to exclusively access data. Transactions can fail even if locking occurs at the whole-database level. As an example, consider these situations: 1. A read-only transaction times out because some read-write transaction went on for too long. 2. A read-write transaction times out because some read-only transaction went on for too long.

These are the only failure modes currently possible, I believe.

3. A read-only transaction includes inside it a read-write transaction.

This isn't possible with the current asynchronous API as far as I can tell. With the synchronous API, it would hang trying to open the read-write transaction for however long it takes the UA to realise that the script trying to get the read-write transaction is the same one that has an open read-only transaction, and then it would fail with error code 7.

Experience has shown that there is no easy way out when dealing with transactions, and locking at the whole-database level is no solution to failures.

It's not supposed to be a solution to failures; it's supposed to be (and, as far as I can tell, is) a way to turn unpredictable, transient, intermittent, hard-to-debug concurrency errors into guaranteed, easy-to-debug errors.

On Fri, 24 Jul 2009, Nikunj R. Mehta wrote:

There is never going to be a forced rollback; once you have a transaction lock, it is never going to fail on you unexpectedly.

Even if you have a transaction lock: 1. the application logic could cause an exception; 2. the application finds an unacceptable data condition and needs to roll back the transaction.

Sure, but both of those are under the control of the author.

3. a disk failure occurs

This is an exceptional situation from which there is no good recovery. It isn't an expected situation resulting from a complicated API.

4. a bug is encountered in the underlying software

We can't do anything in the spec to prevent these.

In either of these cases, how would the application code be expected to recover?

In the first two and the last one, the author can debug the problem and fix or work around the bug. In the case of hardware failure, there is no sane recovery model. These are very different from concurrency bugs.

I think this is an important invariant, because otherwise script
Re: [WebStorage] Concerns on spec section 'Processing Model'
On Jul 24, 2009, at 2:53 PM, Ian Hickson wrote:

These are very different from concurrency bugs.

There are only three concurrency bugs: 1. the Lost Update problem; 2. the Temporary Update (or Dirty Read) problem; 3. the Incorrect Summary problem. None of these is related to the granularity of locking. All of them are solved through the use of transactions. If an application uses transactions correctly, then it is free from concurrency bugs.

Nikunj
http://o-micron.blogspot.com
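The Lost Update problem from the list above is easy to reproduce in a few lines of plain JavaScript, no database involved: two tasks read a shared value, yield (as any asynchronous API forces them to), and then write back a value computed from the stale read. This is a sketch of the anomaly itself, not Web Database code.

```javascript
// The classic Lost Update: two concurrent read-modify-write tasks with
// a yield between the read and the write. One increment is lost.
async function demo() {
  let balance = 0;
  const addOne = async () => {
    const seen = balance;                      // read
    await new Promise((r) => setImmediate(r)); // the other task runs here
    balance = seen + 1;                        // write based on a stale read
  };
  await Promise.all([addOne(), addOne()]);
  return balance;
}

demo().then((b) => console.log(b)); // 1, not 2: one update was lost
```

Wrapping each read-modify-write in a transaction that is serialized against all others, as the spec's whole-database model does, is precisely what rules out this class of bug.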
Re: [WebStorage] Concerns on spec section 'Processing Model'
On Fri, 24 Jul 2009, Nikunj R. Mehta wrote:

On Jul 24, 2009, at 2:53 PM, Ian Hickson wrote:

These are very different from concurrency bugs.

There are only three concurrency bugs: 1. the Lost Update problem; 2. the Temporary Update (or Dirty Read) problem; 3. the Incorrect Summary problem. None of these is related to the granularity of locking. All of them are solved through the use of transactions. If an application uses transactions correctly, then it is free from concurrency bugs.

If you have two applications in two tabs, and they both need to read row A and then write to row B, and they start doing these two tasks simultaneously, how do you prevent either from failing if you don't have database-wide locking?
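The failure mode behind Ian's question shows up most sharply when each transaction also writes a row it has read (as in his earlier read-rows-then-write-rows example): under row-level locking, both transactions take a shared lock to read, and then both try to upgrade to an exclusive lock to write. A toy lock table (an illustration, not any real engine's lock manager) makes the resulting deadlock concrete:

```javascript
// Toy lock table for a single row: shared locks for readers, with an
// upgrade to exclusive required before writing. Two transactions that
// both read row A and then try to write it deadlock on the upgrade,
// so one of them must be rolled back. Illustration only.
const sharedHolders = new Set(); // transactions holding a shared lock on row A

function acquireShared(tx) {
  sharedHolders.add(tx); // shared locks are compatible with each other
}

function canUpgradeToExclusive(tx) {
  // Exclusive is grantable only when no *other* transaction holds shared.
  for (const holder of sharedHolders) {
    if (holder !== tx) return false; // would have to wait for the other
  }
  return true;
}

acquireShared('T1'); // T1 reads row A
acquireShared('T2'); // T2 reads row A
console.log(canUpgradeToExclusive('T1')); // false: blocked behind T2
console.log(canUpgradeToExclusive('T2')); // false: blocked behind T1
// Each transaction is waiting for the other's shared lock to be
// released: a deadlock a database resolves by rolling one of them back.
```

With the spec's database-wide lock, the second transaction instead queues until the first commits, so neither one ever observes this unpredictable failure.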
Re: [WebStorage] Concerns on spec section 'Processing Model'
On Fri, Jul 24, 2009 at 2:54 PM, Nikunj R. Mehta <nikunj.me...@oracle.com> wrote: On Jul 24, 2009, at 2:06 PM, Aaron Boodman wrote: On Fri, Jul 24, 2009 at 1:54 PM, Nikunj R. Mehta <nikunj.me...@oracle.com> wrote: Experience has shown that there is no easy way out when dealing with transactions, and locking at the whole database level is no solution to failures. The thing that makes the web browser environment different and interesting is that multiple independent applications can end up having access to the same database if they run in the same origin. Applications have the ability to specify which database they want to use. So I don't see problems in apps sharing an origin. Right, but say two Gmail tabs are opened independently. They both say they want to access the messages database. They have no way to know about each other except through shared storage (postMessage does not work across multiple independent tabs). Now they can conflict with each other. There is no way for the developer to deal with this problem other than retrying or implementing another concurrency system on top of shared storage. This seems bad. This could be multiple instances of the same app (e.g., multiple Gmail windows) or just different apps that happen to be on the same origin (many Google apps run on www.google.com). When running multiple instances of the same application, or when different applications share the same data, you are beginning to deal with multi-user applications (even though it may be the same security principal). In multi-user applications, database transactions are the same as what they are on the server. Applications have no choice but to be careful in performing transactions. Let me illustrate this with an example. Say that I had a spreadsheet app. The value of a cell was displayed to the user as X. Now, I go into one tab, A, and say add five to X. I also go into B and say add five to X.
One of those operations will have to fail because it finds that the version of X is not what it was when the transaction started out. Even if you put a lock on the entire database, you can't avoid that problem. The issue of the data changing between the time when it was displayed to the user and the time when an update is started is different than the problem of the data changing while a multi-step update (a transaction) is in progress. The first problem is well known and understood by client-side web developers because the web is stateless and the same can occur between contacts with the server. It's also pretty self-evident that if you copy data out of storage and into the UI that the two can change independently. The second problem is not well known by the same people and would be surprising. Up until recently there was no local storage except cookies and some proprietary things, and both were synchronous. Because all browsers until recently were single-threaded, this effectively meant that clients had storage-wide locks (there were actually bugs with this in Firefox+cookies, I am told, but cookies were not frequently used in a way that exposed it). It seems that the way the spec is written, novice programmers would be led to either 1. face lost updates because they assume the browser locks the entire database, and so they won't bother to do their own analysis of whether data has changed since the last time they saw it. Some very novice users will not realize that their UI and local store can change independently, or will, but won't realize that there can be multiple copies of their apps. I think the current design is a good trade-off because addressing that problem would essentially mean binding the UI to the datastore, which would introduce gigantic API complexity making it basically not workable. And many developers will understand the problem from experience with the web. 2. 
create single-instance-only apps, i.e., hold a write lock on the database forever since they don't want to deal with version checks. I don't think you understand the spec - it isn't actually possible to hold the lock forever. Locks aren't an explicit part of the API, but are implicit and released automatically when functions return. Take a look at the transaction method again: db.transaction(function() { tx.executeSql(strSql, function() { }); }); The transaction is implicitly released when the last SQL statement is completed (or fails). The only way you can keep this transaction open is to execute more SQL. Because these apps are isolated from each other, they have no way to cooperate to reduce conflicts. They also have no control over whether there are multiple copies of themselves (the user controls this). Sorry, but there is postMessage, localStorage, and the database itself. What do you mean these apps are isolated and have no way to cooperate? postMessage can't be used across independent tabs. Even if it could, it is asynchronous and most vendors would be reluctant to put more in the spec
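The point about same-origin tabs only being able to coordinate through shared state (localStorage or the database itself) can be made concrete with a first-claimant leader-election sketch. This is purely illustrative: the key name and protocol are invented, and a plain object stands in for localStorage so the sketch is self-contained.

```javascript
// Toy cross-tab coordination over shared storage. A plain object
// stands in for window.localStorage; the "db-leader" key and the
// protocol are illustrative, not from any spec.
const store = {}; // stand-in for localStorage

function tryBecomeLeader(tabId) {
  // First tab to claim the key wins. Real code would also need a
  // heartbeat/expiry, since a tab that closes cannot clean up after
  // itself, and a storage-event listener to detect leader changes.
  if (store["db-leader"] === undefined) {
    store["db-leader"] = tabId;
  }
  return store["db-leader"] === tabId;
}

console.log(tryBecomeLeader("tab-1")); // true: first claimant wins
console.log(tryBecomeLeader("tab-2")); // false: tab-1 already leads
```

The awkwardness Aaron describes is visible even here: the election itself is a concurrency protocol the developer must build and get right on top of the storage layer.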
Re: [WebStorage] Concerns on spec section 'Processing Model'
On Jul 24, 2009, at 3:11 PM, Ian Hickson wrote: On Fri, 24 Jul 2009, Nikunj R. Mehta wrote: On Jul 24, 2009, at 2:53 PM, Ian Hickson wrote: These are very different from concurrency bugs. There are only three concurrency bugs: 1. The Lost Update Problem 2. The Temporary Update (or Dirty Read) Problem 3. The Incorrect Summary Problem. None of these is related to the granularity of locking. All of these are solved through the use of transactions. If an application uses transactions correctly, then it is free from concurrency bugs. If you have two applications in two tabs, and they both need to read row A, then write to row B, and they start doing these two tasks simultaneously, how do you prevent either from failing if you don't have database-wide locking? First of all, the value of row A never changes in this example. So it is immaterial whether the transaction locked the whole database or just row B. Your application that wrote this kind of a query/update has a concurrency bug, namely lost update. IOW, it is losing the first update because it did not check the value of B before modifying it and didn't modify row A when it modified row B. Therefore, your question itself has a concurrency bug. This is why I said that locking is not a silver bullet and multi-user concurrency should not be taken lightly. For a primer on isolation levels, transactions, and locks, please see [1]. This discussion is an indicator of both the complexity involved in designing standards such as these and the amount of background knowledge required to design a good standard. Proponents of existing spec language have chosen to never explicitly back up their statements with the body of knowledge that exists in the database sciences. Taken together, all this makes it unlikely that a good SQL standard can be developed by this WG in the short period of time that some might be expecting. Nikunj http://o-micron.blogspot.com [1] http://www.oracle.com/technology/oramag/oracle/05-nov/o65asktom.html
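The Lost Update problem at the center of this exchange is easy to see in miniature. The following is an in-memory sketch (none of it is Web Database API code) of the spreadsheet example: two "transactions" each add five to X, and interleaving their read and write steps without any locking loses one of the updates, while serializing them (what an exclusive database-wide lock or SERIALIZABLE isolation effectively does) preserves both.

```javascript
// Hypothetical in-memory "database" used to illustrate the Lost
// Update problem; X stands for the spreadsheet cell in the example.
const db = { X: 10 };

// Interleaved: both transactions read X before either writes it back.
function interleavedRun() {
  db.X = 10;
  const readA = db.X;  // transaction A reads X = 10
  const readB = db.X;  // transaction B reads X = 10, before A writes
  db.X = readA + 5;    // A writes 15
  db.X = readB + 5;    // B writes 15 -- A's update is lost
  return db.X;         // 15, not the expected 20
}

// Serialized: each transaction runs to completion before the next
// starts, so both updates survive.
function serializedRun() {
  db.X = 10;
  db.X = db.X + 5;     // transaction A completes: 15
  db.X = db.X + 5;     // transaction B then runs: 20
  return db.X;
}

console.log(interleavedRun()); // 15 (lost update)
console.log(serializedRun());  // 20
```

Nikunj's point is that the application can also avoid the loss itself, for example by re-checking the stored value inside the transaction before writing; Ian's point is that database-wide locking makes the serialized behavior automatic.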
Re: [WebStorage] Concerns on spec section 'Processing Model'
On Jul 24, 2009, at 3:57 PM, Aaron Boodman wrote: So you are reduced to very awkward ways of cooperating -- using the database itself as a queue or for master election, or designing a separate transaction system between tabs which might be on separate threads, using an asynchronous API. Or you just accept that any statement can fail and retry everything. Or your app is just buggy if multiple instances are open. Did you consider for a moment that all this is merely a result of the SQLite feature to lock the entire database? Nikunj http://o-micron.blogspot.com
Re: [WebStorage] Concerns on spec section 'Processing Model'
On Fri, Jul 24, 2009 at 4:12 PM, Nikunj R. Mehta <nikunj.me...@oracle.com> wrote: On Jul 24, 2009, at 3:57 PM, Aaron Boodman wrote: 2. create single-instance-only apps, i.e., hold a write lock on the database forever since they don't want to deal with version checks. I don't think you understand the spec - it isn't actually possible to hold the lock forever. It is a little insulting for you to say that, but I will not take offense to it. I didn't mean any offense, I really don't think you understand the spec completely :). Locks aren't an explicit part of the API, but are implicit and released automatically when functions return. Take a look at the transaction method again: db.transaction(function(tx) { tx.executeSql(strSql, function() { }); }); The transaction is implicitly released when the last SQL statement is completed (or fails). The only way you can keep this transaction open is to execute more SQL. (corrected a slight typo in the example - it was missing the parameter definition for tx) Thanks for the correction. Code is for conversational purposes only. I also may be forgetting some API details since I haven't looked at this in a while. If I put in a timer or another asynchronous call inside the block and that block used the variable tx, wouldn't it force the implementation to continue holding the database lock? If so, there is no limit to how long I can hold on to variables, and hence I could hold on to the database as an exclusive reader/writer for as long as I wanted to. A novice programmer would probably not even understand what a transaction means, except that they need a handle to change stuff. That programmer could hold on to this handle for the duration of the session. No. The transaction is not closed on GC, it is closed when the last statement that is part of the transaction completes. So holding a reference to the tx variable does nothing one way or the other. The only way to hang the transaction open would be to execute statements over and over.
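Aaron's description of the async transaction lifecycle can be sketched as a toy model. This is not the real Web Database engine: it is a synchronous simulation, under the assumption he states, that statements are queued, each result callback may queue more, and the transaction commits implicitly once the queue drains, so a leaked `tx` reference does not keep it open.

```javascript
// Toy model of the async transaction lifecycle being described.
function runTransaction(body) {
  const queue = [];
  let committed = false;
  const tx = {
    executeSql(sql, callback) {
      if (committed) throw new Error("transaction already committed");
      queue.push({ sql, callback });
    },
  };
  body(tx);                  // callback may queue initial statements
  while (queue.length > 0) { // run statements until the queue drains
    const stmt = queue.shift();
    if (stmt.callback) stmt.callback(tx); // may queue more SQL
  }
  committed = true;          // implicit commit: nothing left to run
  return tx;
}

const leaked = runTransaction((tx) => {
  tx.executeSql("UPDATE t SET x = 1", (tx2) => {
    tx2.executeSql("UPDATE t SET y = 2", null); // extends the transaction one step
  });
});

// Holding the reference is useless: the transaction has committed.
let failed = false;
try { leaked.executeSql("UPDATE t SET z = 3", null); } catch (e) { failed = true; }
console.log(failed); // true
```

In this model, as in Aaron's reading of the spec, the only way to keep the transaction alive is to keep queuing statements from inside its callbacks.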
On Fri, Jul 24, 2009 at 4:13 PM, Nikunj R. Mehta <nikunj.me...@oracle.com> wrote: On Jul 24, 2009, at 3:57 PM, Aaron Boodman wrote: So you are reduced to very awkward ways of cooperating -- using the database itself as a queue or for master election, or designing a separate transaction system between tabs which might be on separate threads, using an asynchronous API. Or you just accept that any statement can fail and retry everything. Or your app is just buggy if multiple instances are open. Did you consider for a moment that all this is merely a result of the SQLite feature to lock the entire database? No, having the database not be able to change out from under a multi-step update was a design goal of the API. Implementing a complex application without exclusive transactions would be very difficult. I do understand your position that there is a tradeoff: you give up some performance because a skilled developer could do finer-grained locking and get better concurrency. And I think you are arguing that there should be an option for non-exclusive write transactions, or at least it should be up to the UA. I still feel that with the number of clients a typical HTML5 database will have, this is a non-issue and the spec makes the correct tradeoff. - a
Re: [WebStorage] Concerns on spec section 'Processing Model'
On Jul 24, 2009, at 3:57 PM, Aaron Boodman wrote: I do not agree that database-level locking is a big problem for web applications. Our problem is not with databases doing database-level locking. Our problem is that such behavior is a MUST. I think it is very desirable for it to appear to the developer that writes to the local datastore are atomic. Lots of complexity falls out if this is not true. It is implicit that transactions give atomicity (that's what the A in ACID stands for). It would be mischaracterizing this discussion to say that we are arguing about atomicity. We are, however, talking about isolation (the I in ACID), or more precisely the degree of isolation. In some models (non-SQL) it may be easier to arrange a large update in the application layer and commit it all at once. In SQL, this is less true, so it is important to provide an API that makes conflicts impossible while a multi-step update is in progress. This problem exists in the WebStorage model [1]. More specifically, there is no way to perform multiple updates atomically in it. The proposal that I have sketched about B-trees [2] does not have this problem since it is possible to work with transactions to get the atomicity as well as a desired isolation level. I take it that there are no issues with that proposal since I have not heard anyone say so. Perhaps your real issue is that the current API does not work well for non-SQL data stores. Not at all! It would be disingenuous to find an ulterior motive in my arguments. Nikunj http://o-micron.blogspot.com [1] http://dev.w3.org/html5/webstorage/ [2] http://www.w3.org/mid/f480f60a-5dae-4b73-922a-93ed401cf...@oracle.com
Re: [WebStorage] Concerns on spec section 'Processing Model'
On Fri, Jul 24, 2009 at 4:30 PM, Nikunj R. Mehta <nikunj.me...@oracle.com> wrote: On Jul 24, 2009, at 3:57 PM, Aaron Boodman wrote: In some models (non-SQL) it may be easier to arrange a large update in the application layer and commit it all at once. In SQL, this is less true, so it is important to provide an API that makes conflicts impossible while a multi-step update is in progress. This problem exists in the WebStorage model [1]. More specifically, there is no way to perform multiple updates atomically in it. I agree. The proposal that I have sketched about B-trees [2] does not have this problem since it is possible to work with transactions to get the atomicity as well as a desired isolation level. I take it that there are no issues with that proposal since I have not heard anyone say so. I haven't reviewed that. I only chimed into this conversation because it looked like there were some misunderstandings and I worked on it early on. - a
Re: [WebStorage] Concerns on spec section 'Processing Model'
On Jul 24, 2009, at 4:25 PM, Aaron Boodman wrote: On Fri, Jul 24, 2009 at 4:12 PM, Nikunj R. Mehta <nikunj.me...@oracle.com> wrote: On Jul 24, 2009, at 3:57 PM, Aaron Boodman wrote: 2. create single-instance-only apps, i.e., hold a write lock on the database forever since they don't want to deal with version checks. I don't think you understand the spec - it isn't actually possible to hold the lock forever. It is a little insulting for you to say that, but I will not take offense to it. I didn't mean any offense, I really don't think you understand the spec completely :). I beg to differ. Au contraire, I really don't think you understand databases at all. Locks aren't an explicit part of the API, but are implicit and released automatically when functions return. This is completely incorrect. Read below for more details. Take a look at the transaction method again: db.transaction(function(tx) { tx.executeSql(strSql, function() { }); }); The transaction is implicitly released when the last SQL statement is completed (or fails). The only way you can keep this transaction open is to execute more SQL. (corrected a slight typo in the example - it was missing the parameter definition for tx) Thanks for the correction. Code is for conversational purposes only. I also may be forgetting some API details since I haven't looked at this in a while. If I put in a timer or another asynchronous call inside the block and that block used the variable tx, wouldn't it force the implementation to continue holding the database lock? If so, there is no limit to how long I can hold on to variables, and hence I could hold on to the database as an exclusive reader/writer for as long as I wanted to. A novice programmer would probably not even understand what a transaction means, except that they need a handle to change stuff. That programmer could hold on to this handle for the duration of the session. No.
The transaction is not closed on GC, it is closed when the last statement that is part of the transaction completes. So holding a reference to the tx variable does nothing one way or the other. The only way to hang the transaction open would be to execute statements over and over. A transaction is not complete until I either commit or roll back the transaction, which I can choose to do as late as I want to, e.g., at window.onclose. Therefore locks on the database will not be released for as long as the application wants to hold on to the transaction. On Fri, Jul 24, 2009 at 4:13 PM, Nikunj R. Mehta <nikunj.me...@oracle.com> wrote: On Jul 24, 2009, at 3:57 PM, Aaron Boodman wrote: So you are reduced to very awkward ways of cooperating -- using the database itself as a queue or for master election, or designing a separate transaction system between tabs which might be on separate threads, using an asynchronous API. Or you just accept that any statement can fail and retry everything. Or your app is just buggy if multiple instances are open. Did you consider for a moment that all this is merely a result of the SQLite feature to lock the entire database? No, having the database not be able to change out from under a multi-step update was a design goal of the API. Implementing a complex application without exclusive transactions would be very difficult. I am not proposing to take away your choice. But please don't take away mine. It would be useful to see an explanation as to why the proposal I made [[ add an isolation level parameter with a default value of SERIALIZABLE and remove the exclusive database-level write lock requirement ]] is worse than the current spec text. You can refer to SQL92 for the meaning of SERIALIZABLE. AFAIK, there are no interoperability problems with transaction isolation levels. Nikunj http://o-micron.blogspot.com
Re: [WebStorage] Concerns on spec section 'Processing Model'
On Fri, Jul 24, 2009 at 4:45 PM, Nikunj R. Mehta <nikunj.me...@oracle.com> wrote: No. The transaction is not closed on GC, it is closed when the last statement that is part of the transaction completes. So holding a reference to the tx variable does nothing one way or the other. The only way to hang the transaction open would be to execute statements over and over. A transaction is not complete until I either commit or roll back the transaction, which I can choose to do as late as I want to, e.g., at window.onclose. Therefore locks on the database will not be released for as long as the application wants to hold on to the transaction. I don't think that this is true, at least in the Database interface: http://dev.w3.org/html5/webdatabase/#asynchronous-database-api There is no explicit commit() method. The commit happens implicitly after all queued statements have been executed successfully. It does appear that it is possible to hold a transaction open all day with the DatabaseSync interface (http://dev.w3.org/html5/webdatabase/#databasesync). Specifically, the SQLTransactionSync interface has commit/rollback methods. The DatabaseSync interface was added after I worked on this, so I can't say why it doesn't use callbacks. In any case, I was talking about the async flavor, which is what my example code referred to. Do you agree it is not possible to hang transactions open from Database (http://dev.w3.org/html5/webdatabase/#database)? If not, what am I missing? - a
Re: [WebStorage] Concerns on spec section 'Processing Model'
On Jul 24, 2009, at 4:58 PM, Aaron Boodman wrote: On Fri, Jul 24, 2009 at 4:45 PM, Nikunj R. Mehta <nikunj.me...@oracle.com> wrote: No. The transaction is not closed on GC, it is closed when the last statement that is part of the transaction completes. So holding a reference to the tx variable does nothing one way or the other. The only way to hang the transaction open would be to execute statements over and over. A transaction is not complete until I either commit or roll back the transaction, which I can choose to do as late as I want to, e.g., at window.onclose. Therefore locks on the database will not be released for as long as the application wants to hold on to the transaction. I don't think that this is true, at least in the Database interface: http://dev.w3.org/html5/webdatabase/#asynchronous-database-api There is no explicit commit() method. The commit happens implicitly after all queued statements have been executed successfully. The spec is also silent about what happens if I introduce a wait by making another asynchronous call inside my transaction callback logic. By inference, this would be allowed since all statements are executed inside callbacks, so why distinguish between transaction and other (non-SQLTransactionErrorCallback) types of callbacks? The processing model in 4.3.2 simply says that the SQL statements are queued up. It is unclear what, if anything, happens if the database runs out of statements to execute because the transaction logic takes time to add another statement to the queue before the database decides to commit. Am I wrong, or is this an ambiguous but correct interpretation? Those who are worried about throwing the complexity of transaction recovery onto Web programmers should perhaps also be worried about the insane complexity of asynchronous transaction programming, which no one in the world should have to learn. The mainstream database developers don't have to deal with that. Why should poor Web programmers have to suffer this?
Moreover, with an asynchronous database the spec doesn't allow an application to roll back a transaction, should certain application logic require that. This is yet another case of creating a storage API that is different from what traditional database developers are used to. There seems to be a pattern of ignoring good API practices when interacting with a database, and it appears intentional. Am I wrong in my interpretation? It does appear that it is possible to hold a transaction open all day with the DatabaseSync interface (http://dev.w3.org/html5/webdatabase/#databasesync). Specifically, the SQLTransactionSync interface has commit/rollback methods. The DatabaseSync interface was added after I worked on this, so I can't say why it doesn't use callbacks. In any case, I was talking about the async flavor, which is what my example code referred to. Do you agree it is not possible to hang transactions open from Database (http://dev.w3.org/html5/webdatabase/#database)? If not, what am I missing? I can't agree simply because the spec says nothing about it. In fact, if anything, the rest of the spec text around asynchronous processing suggests that it is possible to hang transactions indefinitely. Nikunj http://o-micron.blogspot.com
Re: [WebStorage] Concerns on spec section 'Processing Model'
On Fri, Jul 24, 2009 at 4:45 PM, Nikunj R. Mehta <nikunj.me...@oracle.com> wrote: I am not proposing to take away your choice. But please don't take away mine. It would be useful to see an explanation as to why the proposal I made [[ add an isolation level parameter with a default value of SERIALIZABLE and remove the exclusive database-level write lock requirement ]] is worse than the current spec text. You can refer to SQL92 for the meaning of SERIALIZABLE. AFAIK, there are no interoperability problems with transaction isolation levels. I'm personally not opposed to adding more isolation levels in addition to the current single option of SERIALIZABLE. It could be added as an argument to transaction(). I don't think it is a particularly high-value feature, but I also don't see a big problem with it. And I can imagine that some particularly ambitious developers might want to take advantage of it. - a
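A hypothetical shape for the isolation-level argument Aaron mentions might look as follows. Nothing here is in the spec: the enum values, the parameter position, and the stub behavior are all invented for illustration; the stub only validates and records the requested level, where a real engine would choose its locking strategy accordingly.

```javascript
// Hypothetical isolation-level argument for transaction(); names
// are illustrative only, not from the Web Database draft.
const IsolationLevel = Object.freeze({
  READ_COMMITTED: "read-committed",
  SERIALIZABLE: "serializable", // current spec behavior, so the default
});

function transaction(callback, isolation = IsolationLevel.SERIALIZABLE) {
  if (!Object.values(IsolationLevel).includes(isolation)) {
    throw new Error("unknown isolation level: " + isolation);
  }
  // A real engine would acquire locks according to `isolation`;
  // this stub just records what the caller asked for.
  const tx = { isolation };
  callback(tx);
  return tx;
}

const tx1 = transaction(() => {});                               // defaults to SERIALIZABLE
const tx2 = transaction(() => {}, IsolationLevel.READ_COMMITTED); // opt-in weaker level
console.log(tx1.isolation, tx2.isolation);
```

Defaulting to SERIALIZABLE would preserve the current spec's behavior for code that never passes the argument, which is presumably why both sides see it as a low-risk extension.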
RE: [WebStorage] Concerns on spec section 'Processing Model'
+ Ian Hey Ian, Can you please express your opinion on this as an editor of the Web Storage (now Web Database) spec? Thanks, Laxmi From: Nikunj R. Mehta [mailto:nikunj.me...@oracle.com] Sent: Thursday, July 16, 2009 11:45 PM To: public-webapps WG Cc: Laxmi Narsimha Rao Oruganti Subject: Re: [WebStorage] Concerns on spec section 'Processing Model' The spec should not restrict implementations to any one level of concurrency unless there are specific undesirable effects. Restricting the database to a single writer means that if there are separate workers or background threads working to update non-overlapping portions, then they have to wait for the lone current writer. Implementations can certainly compete to produce the level of concurrency that developers need. Specifically, I propose that the following text [[ If the mode is read/write, the transaction must have an exclusive write lock over the entire database. If the mode is read-only, the transaction must have a shared read lock over the entire database. The user agent should wait for an appropriate lock to be available. ]] be replaced with the following text [[ Multiple read-only transactions may share the same data as long as there is no transaction attempting to write the data being read. The user agent must wait for transactions that are reading some data before allowing a read/write transaction on the same data to continue. ]] Nikunj http://o-micron.blogspot.com On Jul 16, 2009, at 6:46 AM, Laxmi Narsimha Rao Oruganti wrote: [Adding the subject, sorry for spam!] Hey folks, I have a few questions on the Web Storage spec. I have checked the content of both the latest published spec (http://www.w3.org/TR/webstorage/) and the latest editors' spec (http://dev.w3.org/html5/webstorage/). And the questions are applicable to both versions of the spec. Section: 4.4.2 Processing model Text: 1.
Open a new SQL transaction to the database, and create a SQLTransaction (http://www.w3.org/TR/webstorage/#sqltransaction) object that represents that transaction. If the mode is read/write, the transaction must have an exclusive write lock over the entire database. If the mode is read-only, the transaction must have a shared read lock over the entire database. The user agent should wait for an appropriate lock to be available. Concerns: - Why is the spec mandating a transaction to take an *exclusive write lock on the entire database*? No database design book mandates it. In fact, many client databases out there don't do this. I guess SQLite does this kind of locking. But that does not mean that all implementations behave this way. I am kind of worried that we are baking one implementation's behavior into the spec. For me the two are separate; there are many ways a database could be designed. Like, log+checkpoint approach, shadow copy, version store, journals, etc. I guess the spec should say what a browser should do and not how. I would be happy to get enlightened. Thanks, Laxmi
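The locking rule being questioned can be reduced to a small bookkeeping sketch. This toy model (the class and method names are invented; it models only the grant/deny decision, not the "wait for an appropriate lock" queuing) shows the behavior the spec text prescribes: read-only transactions share a read lock, while a read/write transaction needs the whole database to itself.

```javascript
// Minimal model of the quoted rule: shared read lock for read-only
// transactions, exclusive lock for read/write transactions.
class DatabaseLock {
  constructor() {
    this.readers = 0;    // count of active read-only transactions
    this.writer = false; // whether a read/write transaction holds the lock
  }
  tryAcquire(mode) {
    if (mode === "read-only") {
      if (this.writer) return false; // a writer excludes all readers
      this.readers += 1;
      return true;
    }
    // read/write: needs exclusivity over the entire database
    if (this.writer || this.readers > 0) return false;
    this.writer = true;
    return true;
  }
  release(mode) {
    if (mode === "read-only") this.readers -= 1;
    else this.writer = false;
  }
}

const lock = new DatabaseLock();
console.log(lock.tryAcquire("read-only"));  // true: shared read lock
console.log(lock.tryAcquire("read-only"));  // true: readers share
console.log(lock.tryAcquire("read/write")); // false: readers still active
lock.release("read-only");
lock.release("read-only");
console.log(lock.tryAcquire("read/write")); // true: now exclusive
console.log(lock.tryAcquire("read-only"));  // false: writer holds it
```

Nikunj's replacement text would narrow the conflict check from the whole database to the specific data being read or written; the structure of the model would stay the same, but the lock would be tracked per data item rather than once per database.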
Re: [WebStorage] Concerns on spec section 'Processing Model'
The spec should not restrict implementations to any one level of concurrency unless there are specific undesirable effects. Restricting the database to a single writer means that if there are separate workers or background threads working to update non-overlapping portions, then they have to wait for the lone current writer. Implementations can certainly compete to produce the level of concurrency that developers need. Specifically, I propose that the following text [[ If the mode is read/write, the transaction must have an exclusive write lock over the entire database. If the mode is read-only, the transaction must have a shared read lock over the entire database. The user agent should wait for an appropriate lock to be available. ]] be replaced with the following text [[ Multiple read-only transactions may share the same data as long as there is no transaction attempting to write the data being read. The user agent must wait for transactions that are reading some data before allowing a read/write transaction on the same data to continue. ]] Nikunj http://o-micron.blogspot.com On Jul 16, 2009, at 6:46 AM, Laxmi Narsimha Rao Oruganti wrote: [Adding the subject, sorry for spam!] Hey folks, I have a few questions on the Web Storage spec. I have checked the content of both the latest published spec and the latest editors' spec. And the questions are applicable to both versions of the spec. Section: 4.4.2 Processing model Text: 1. Open a new SQL transaction to the database, and create a SQLTransaction object that represents that transaction. If the mode is read/write, the transaction must have an exclusive write lock over the entire database. If the mode is read-only, the transaction must have a shared read lock over the entire database. The user agent should wait for an appropriate lock to be available. Concerns: - Why is the spec mandating a transaction to take an *exclusive write lock on the entire database*? No database design book mandates it.
In fact, many client databases out there don't do this. I guess SQLite does this kind of locking. But that does not mean that all implementations behave this way. I am kind of worried that we are baking one implementation's behavior into the spec. For me the two are separate; there are many ways a database could be designed. Like, log+checkpoint approach, shadow copy, version store, journals, etc. I guess the spec should say what a browser should do and not how. I would be happy to get enlightened. Thanks, Laxmi