Re: [sqlite] Bitten by lack of isolation between SELECT and UPDATE on the same connection
On Tue, Jan 31, 2017 at 12:15 PM, Jens Alfkewrote: > > > On Jan 31, 2017, at 9:39 AM, James K. Lowden > wrote: > > > > According the SQL standard, every SQL statement is atomic. SELECT has > > no beginning and no end: the results it returns reflect the state of > > the database as of the moment the statement was executed. If you fetch > > the last row six days after the first, it still belongs to the database > > as it stood when you began. > > That is the behavior I was assuming and desiring, but it’s not what > actually occurs. If there are concurrent mutations in the same connection, > the rows returned by SELECT do _not_ reflect the prior state of the > database, but suffer from “undefined” behavior. In other words, there is a > lack of isolation between the SELECT and the concurrent UPDATEs. > > It’s possible I’m misunderstanding your point, though! > > My immediate workaround (implemented last night) is to iterate over the > statement at the moment the query is run, saving all the rows in memory. > Our enumerator object then just reads and returns successive rows from that > list. > > In the medium term I have ideas for optimizations that can let us avoid > this memory hit in most circumstances (since most queries are not made at > the same time as mutations.) For example, I could use the original > enumerator behavior by default, but when the client requests a mutation I > first notify all in-progress enumerators [on that connection], which will > immediately read the rest of their rows into memory. > I think you said something earlier about a fear that the record set might be too big to fit in memory (or wanting to avoid that possibility). You could select the record set you want to a temp table then select *that* while running updates on the original tables. Probably something you already thought of (or maybe I subconsciously read it from someone else already; sorry if adding noise), but thought I'd toss it out. SDR ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Bitten by lack of isolation between SELECT and UPDATE on the same connection
On 1/31/17, Jens Alfkewrote: > > My immediate workaround (implemented last night) is to iterate over the > statement at the moment the query is run, saving all the rows in memory. Our > enumerator object then just reads and returns successive rows from that > list. That's how client/server SQL database engines do it. They run the query to completion before starting on any of the updates. -- D. Richard Hipp d...@sqlite.org ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Bitten by lack of isolation between SELECT and UPDATE on the same connection
> On Jan 31, 2017, at 9:39 AM, James K. Lowdenwrote: > > According the SQL standard, every SQL statement is atomic. SELECT has > no beginning and no end: the results it returns reflect the state of > the database as of the moment the statement was executed. If you fetch > the last row six days after the first, it still belongs to the database > as it stood when you began. That is the behavior I was assuming and desiring, but it’s not what actually occurs. If there are concurrent mutations in the same connection, the rows returned by SELECT do _not_ reflect the prior state of the database, but suffer from “undefined” behavior. In other words, there is a lack of isolation between the SELECT and the concurrent UPDATEs. It’s possible I’m misunderstanding your point, though! My immediate workaround (implemented last night) is to iterate over the statement at the moment the query is run, saving all the rows in memory. Our enumerator object then just reads and returns successive rows from that list. In the medium term I have ideas for optimizations that can let us avoid this memory hit in most circumstances (since most queries are not made at the same time as mutations.) For example, I could use the original enumerator behavior by default, but when the client requests a mutation I first notify all in-progress enumerators [on that connection], which will immediately read the rest of their rows into memory. —Jens ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Bitten by lack of isolation between SELECT and UPDATE on the same connection
On Mon, 30 Jan 2017 19:29:40 -0800 Jens Alfkewrote: > if I iterate over the the rows in a table using sqlite3_step, and > update each row after it?s returned, Bad Stuff happens. Specifically, > my query is just getting the first row over and over and over again, > and the iteration runs forever. I think you've solved your immediate problem and come to grips with SQLite's behavior. I thought it might be helpful to explain why it works that way, and that what you want is also valid, but goes by another name: a cursor. According the SQL standard, every SQL statement is atomic. SELECT has no beginning and no end: the results it returns reflect the state of the database as of the moment the statement was executed. If you fetch the last row six days after the first, it still belongs to the database as it stood when you began. SQLite in WAL mode gives you that isolation. You weren't bitten by a *lack* of isolation; you were bitten by isolation you didn't expect. The idea of updating rows as they are read -- without completing the transaction -- is supported in SQL with a cursor. Standard SQL has DECLARE CURSOR syntax; some cursors can be declared as FOR UPDATE and have behavior much like what you expected. That syntax, as you know, is not supported by SQLite. Technically speaking the product of ORDER BY is also a cursor, and it's not hard to find references to sqlite3_step as using "a cursor". All meaning stands in context, and those should not be confused with an SQL cursor. --jkl ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Bitten by lack of isolation between SELECT and UPDATE on the same connection
Maybe adding "order by rowid" to your select statement can help avoid "sawing off the branch you are sitting on". Unless you need to update rowids... -Ursprüngliche Nachricht- Von: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im Auftrag von Jens Alfke Gesendet: Dienstag, 31. Jänner 2017 04:30 An: SQLite mailing listBetreff: [sqlite] Bitten by lack of isolation between SELECT and UPDATE on the same connection I’ve just run headlong in to the issues described in "No Isolation Between Operations On The Same Database Connection”. Specifically, I’ve discovered (after some debugging) that if I iterate over the the rows in a table using sqlite3_step, and update each row after it’s returned, Bad Stuff happens. Specifically, my query is just getting the first row over and over and over again, and the iteration runs forever. :( I had been under the impression that, since I’m using the WAL, queries operate on a snapshot of the database as of the time they begin, and are unaffected by subsequent changes. I got this from reading about "snapshot isolation” in a previous section of that document. (Also, another key/value database engine I’ve used recently _does_ behave this way, so it’s what I was expecting.) I now see that the “read transaction” described in that section has to be occurring in a different connection than the write transaction. (Right?) I’m unsure what to do now. I am working on a library whose API exposes iterator objects that run queries; the iterator’s “next()” method internally calls sqlite3_step. Thus the interleaving of the query and updating the database is not under my control; it’s up to the developer using our library, and I do _not_ want to expose inconvenient undefined behavior like this, or tell developers that “you can’t modify the database while you’re iterating it”. I can’t be the first person to run into this. Is there a best practice for enabling concurrent iteration and mutation? I can think of two solutions: A. Batch up all of the query results in memory at the start of the iteration, and have the iterator just read them out of the in-memory list. I’d like to avoid this because of the obvious memory overhead and latency imposed on large queries. Version 1 of our library worked this way, which is why I probably hadn’t noticed the problem until now. B. Create a separate SQLite connection for the query; then it’ll be isolated from any changes being made in the main connection. This seems elegant, but it will of course use more memory for the extra connection (with its own cache.) Moreover, it seems like I’ll need to open an indefinite number of extra connections: if the caller starts a query, makes some changes, and then starts another query (before reading the final row of the first query), I need to open another connection for the second query because it has to see the changes, which aren’t yet visible in the first query's connection … right? —Jens [1]: https://www.sqlite.org/isolation.html ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ Gunter Hick Software Engineer Scientific Games International GmbH FN 157284 a, HG Wien Klitschgasse 2-4, A-1130 Vienna, Austria Tel: +43 1 80100 0 E-Mail: h...@scigames.at This communication (including any attachments) is intended for the use of the intended recipient(s) only and may contain information that is confidential, privileged or legally protected. Any unauthorized use or dissemination of this communication is strictly prohibited. If you have received this communication in error, please immediately notify the sender by return e-mail message and delete all copies of the original communication. Thank you for your cooperation. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Bitten by lack of isolation between SELECT and UPDATE on the same connection
On 31 Jan 2017, at 5:26am, Jens Alfkewrote: > I don’t follow. What’s the “resource” you’re talking about here? In your case, the NSEnumerator . Would the solution I proposed in my post work for you ? Simon. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Bitten by lack of isolation between SELECT and UPDATE on the same connection
> On Jan 30, 2017, at 9:10 PM, Simon Slavinwrote: > > Nope. Cannot do that. Any number of things might happen between the first > _step() and the _finalize(). For all you know someone might delete the > object the iterator is currently on instead of just updating it. Then where > would the iterator be ? As I explained, my assumption was that the iteration operated on a snapshot of the database at the time it was started, i.e. at the first call to sqlite3_step. There are other databases that operate that way, although I now understand SQLite doesn’t. I understand the situation. And I outlined two ways around the problem. So this isn’t a blanket “Cannot do that” situation, unless you’re saying I can’t do it the way I’ve been dong it … but I already know that! That’s why I posted my question. Reiterating it doesn’t help. > How would you know to release the resource ? And once you knew it, how would > you do it ? I don’t follow. What’s the “resource” you’re talking about here? —Jens ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Bitten by lack of isolation between SELECT and UPDATE on the same connection
> On Jan 30, 2017, at 8:03 PM, Rowan Worthwrote: > > If the iterator isn't exhausted, how do you know when to dispose the > sqlite3_stmt? The iterator (which is an Objective-C NSEnumerator object) will be deleted shortly after it exits scope. Some of the refcounting is deferred via the autorelease pool, but basically by the time the thread returns back to its event loop. —Jens ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Bitten by lack of isolation between SELECT and UPDATE on the same connection
On 31 Jan 2017, at 3:29am, Jens Alfkewrote: > I’ve discovered (after some debugging) that if I iterate over the the rows in > a table using sqlite3_step, and update each row after it’s returned, Bad > Stuff happens. Specifically, my query is just getting the first row over and > over and over again, and the iteration runs forever Is your UPDATE command changing a value which is used for the SELECT ? If so, what you reported is expected behaviour. If you’re expecting to execute two statements at the same time you should be using two connections. > I’m unsure what to do now. I am working on a library whose API exposes > iterator objects that run queries; the iterator’s “next()” method internally > calls sqlite3_step. Nope. Cannot do that. Any number of things might happen between the first _step() and the _finalize(). For all you know someone might delete the object the iterator is currently on instead of just updating it. Then where would the iterator be ? How would you know to release the resource ? And once you knew it, how would you do it ? One solution is to make each call to .next() do its own SELECT. So if, for example, it was acceptable to iterate the rows in rowid order then calling .next() would do SELECT rowid FROM MyTable WHERE rowid > [current rowid] ORDER BY rowid LIMIT 1 If this gives a row, that’s your next object. If it doesn’t, you’ve reached the end. Simon. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Bitten by lack of isolation between SELECT and UPDATE on the same connection
The iterator pattern has another caveat when applied to sqlite: foreach (row in statement) { if (isMatch(row)) { return true } } return false If the iterator isn't exhausted, how do you know when to dispose the sqlite3_stmt? There are other ways to manage the statement's lifetime so this isn't a deal breaker, just something to keep in mind. I know that our code base uses step/UPDATE/step/UPDATE/... in a couple of places, without problems. But I guess that is just luck; as you note the documentation clearly says the behaviour is undefined. To repeatedly get the _same_ row over and over seems incredibly unfortunate though! -Rowan On 31 January 2017 at 11:29, Jens Alfkewrote: > I’ve just run headlong in to the issues described in "No Isolation Between > Operations On The Same Database Connection”. Specifically, I’ve discovered > (after some debugging) that if I iterate over the the rows in a table using > sqlite3_step, and update each row after it’s returned, Bad Stuff happens. > Specifically, my query is just getting the first row over and over and over > again, and the iteration runs forever. :( > > I had been under the impression that, since I’m using the WAL, queries > operate on a snapshot of the database as of the time they begin, and are > unaffected by subsequent changes. I got this from reading about "snapshot > isolation” in a previous section of that document. (Also, another key/value > database engine I’ve used recently _does_ behave this way, so it’s what I > was expecting.) I now see that the “read transaction” described in that > section has to be occurring in a different connection than the write > transaction. (Right?) > > I’m unsure what to do now. I am working on a library whose API exposes > iterator objects that run queries; the iterator’s “next()” method > internally calls sqlite3_step. Thus the interleaving of the query and > updating the database is not under my control; it’s up to the developer > using our library, and I do _not_ want to expose inconvenient undefined > behavior like this, or tell developers that “you can’t modify the database > while you’re iterating it”. > > I can’t be the first person to run into this. Is there a best practice for > enabling concurrent iteration and mutation? I can think of two solutions: > > A. Batch up all of the query results in memory at the start of the > iteration, and have the iterator just read them out of the in-memory list. > I’d like to avoid this because of the obvious memory overhead and > latency imposed on large queries. Version 1 of our library worked this way, > which is why I probably hadn’t noticed the problem until now. > > B. Create a separate SQLite connection for the query; then it’ll be > isolated from any changes being made in the main connection. > This seems elegant, but it will of course use more memory for the > extra connection (with its own cache.) Moreover, it seems like I’ll need to > open an indefinite number of extra connections: if the caller starts a > query, makes some changes, and then starts another query (before reading > the final row of the first query), I need to open another connection for > the second query because it has to see the changes, which aren’t yet > visible in the first query's connection … right? > > —Jens > > [1]: https://www.sqlite.org/isolation.html > ___ > sqlite-users mailing list > sqlite-users@mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users