Re: [sqlite] Bitten by lack of isolation between SELECT and UPDATE on the same connection

2017-01-31 Thread Scott Robison
On Tue, Jan 31, 2017 at 12:15 PM, Jens Alfke  wrote:

>
> > On Jan 31, 2017, at 9:39 AM, James K. Lowden wrote:
> >
> > According to the SQL standard, every SQL statement is atomic.  SELECT has
> > no beginning and no end: the results it returns reflect the state of
> > the database as of the moment the statement was executed.  If you fetch
> > the last row six days after the first, it still belongs to the database
> > as it stood when you began.
>
> That is the behavior I was assuming and desiring, but it’s not what
> actually occurs. If there are concurrent mutations in the same connection,
> the rows returned by SELECT do _not_ reflect the prior state of the
> database, but suffer from “undefined” behavior. In other words, there is a
> lack of isolation between the SELECT and the concurrent UPDATEs.
>
> It’s possible I’m misunderstanding your point, though!
>
> My immediate workaround (implemented last night) is to iterate over the
> statement at the moment the query is run, saving all the rows in memory.
> Our enumerator object then just reads and returns successive rows from that
> list.
>
> In the medium term I have ideas for optimizations that can let us avoid
> this memory hit in most circumstances (since most queries are not made at
> the same time as mutations). For example, I could use the original
> enumerator behavior by default, but when the client requests a mutation I
> first notify all in-progress enumerators [on that connection], which will
> immediately read the rest of their rows into memory.
>

I think you said something earlier about a fear that the record set might
be too big to fit in memory (or wanting to avoid that possibility). You
could select the record set you want to a temp table then select *that*
while running updates on the original tables. Probably something you
already thought of (or maybe I subconsciously read it from someone else
already; sorry if adding noise), but thought I'd toss it out.
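A minimal sketch of the temp-table idea. It uses Python's built-in sqlite3 module as a stand-in for the C API (iterating a cursor calls sqlite3_step internally). The table items(id INTEGER PRIMARY KEY, n INTEGER), the snapshot table name and the function name are all hypothetical. The statement being stepped only ever reads the TEMP copy, so the UPDATEs on items never touch the b-tree that the loop is walking.

```python
import sqlite3


def update_via_temp_snapshot(conn):
    """Iterate a TEMP copy of `items` while updating the original table.

    Assumes a hypothetical schema items(id INTEGER PRIMARY KEY, n INTEGER).
    Returns the `n` values as they were when the snapshot was taken.
    """
    conn.execute("DROP TABLE IF EXISTS temp.items_snapshot")
    # Materialize the result set once; the UPDATEs below modify `items`,
    # never the table this loop is reading.
    conn.execute("CREATE TEMP TABLE items_snapshot AS SELECT id, n FROM items")

    seen = []
    cur = conn.execute("SELECT id, n FROM temp.items_snapshot ORDER BY id")
    try:
        for row_id, n in cur:
            seen.append(n)
            conn.execute("UPDATE items SET n = n + 100 WHERE id = ?", (row_id,))
    finally:
        cur.close()  # reset the SELECT so the snapshot table can be dropped

    conn.execute("DROP TABLE temp.items_snapshot")
    return seen


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE items(id INTEGER PRIMARY KEY, n INTEGER)")
    conn.executemany("INSERT INTO items(n) VALUES (?)", [(i,) for i in range(5)])
    print(update_via_temp_snapshot(conn))  # [0, 1, 2, 3, 4]
    print(conn.execute("SELECT n FROM items ORDER BY id").fetchall())
```

The copy costs one full pass over the result set up front. It lives in SQLite's temp database (a temporary file or memory, depending on the temp_store setting) rather than in the application's own data structures.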

SDR
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Bitten by lack of isolation between SELECT and UPDATE on the same connection

2017-01-31 Thread Richard Hipp
On 1/31/17, Jens Alfke  wrote:
>
> My immediate workaround (implemented last night) is to iterate over the
> statement at the moment the query is run, saving all the rows in memory. Our
> enumerator object then just reads and returns successive rows from that
> list.

That's how client/server SQL database engines do it.  They run the
query to completion before starting on any of the updates.
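For comparison, here is a sketch of the run-to-completion approach described above. Python's sqlite3 module again stands in for the C API, and the items(id, n) table and the buffered_query name are hypothetical. fetchall() steps the statement until it is done, and closing the cursor releases it, so no SELECT is pending on the connection by the time any UPDATE runs. The usage filters and sorts on the same column it updates, which is the kind of loop that can revisit rows when the query is streamed instead.

```python
import sqlite3


def buffered_query(conn, sql, params=()):
    """Run `sql` to completion and return an iterator over the saved rows."""
    cur = conn.execute(sql, params)
    try:
        rows = cur.fetchall()  # steps until the statement reports it is done
    finally:
        cur.close()  # nothing is left pending on `conn`
    return iter(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE items(id INTEGER PRIMARY KEY, n INTEGER)")
    conn.execute("CREATE INDEX items_n ON items(n)")
    conn.executemany("INSERT INTO items(n) VALUES (?)", [(i,) for i in range(5)])
    # Each row is visited once even though the UPDATE changes the column
    # that the query filters and sorts on.
    for row_id, n in buffered_query(conn, "SELECT id, n FROM items WHERE n < 100 ORDER BY n"):
        conn.execute("UPDATE items SET n = n + 100 WHERE id = ?", (row_id,))
    print(conn.execute("SELECT n FROM items ORDER BY id").fetchall())
```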
-- 
D. Richard Hipp
d...@sqlite.org


Re: [sqlite] Bitten by lack of isolation between SELECT and UPDATE on the same connection

2017-01-31 Thread Jens Alfke

> On Jan 31, 2017, at 9:39 AM, James K. Lowden  wrote:
> 
> According to the SQL standard, every SQL statement is atomic.  SELECT has
> no beginning and no end: the results it returns reflect the state of
> the database as of the moment the statement was executed.  If you fetch
> the last row six days after the first, it still belongs to the database
> as it stood when you began.  

That is the behavior I was assuming and desiring, but it’s not what actually 
occurs. If there are concurrent mutations in the same connection, the rows 
returned by SELECT do _not_ reflect the prior state of the database, but suffer 
from “undefined” behavior. In other words, there is a lack of isolation between 
the SELECT and the concurrent UPDATEs.

It’s possible I’m misunderstanding your point, though!

My immediate workaround (implemented last night) is to iterate over the 
statement at the moment the query is run, saving all the rows in memory. Our 
enumerator object then just reads and returns successive rows from that list.

In the medium term I have ideas for optimizations that can let us avoid this 
memory hit in most circumstances (since most queries are not made at the same 
time as mutations). For example, I could use the original enumerator behavior 
by default, but when the client requests a mutation I first notify all 
in-progress enumerators [on that connection], which will immediately read the 
rest of their rows into memory.
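One possible shape for that medium-term idea is sketched below with Python's sqlite3 module. LazyRows, Database, query() and mutate() are hypothetical names, and the sketch assumes every write goes through mutate() on the same connection. Each iterator streams rows until a write is requested. At that point every live iterator on the connection reads the rest of its rows into memory and releases its statement. Iterators are tracked in a WeakSet, so abandoned ones drop out once they are garbage-collected.

```python
import sqlite3
import weakref
from collections import deque


class LazyRows:
    """Streams rows from a cursor until asked to buffer the remainder."""

    def __init__(self, cursor):
        self._cursor = cursor
        self._buffer = None  # becomes a deque once drain() has run
        self._done = False   # True once the cursor has been read to the end

    def drain(self):
        """Read all remaining rows into memory and release the statement."""
        if self._buffer is None and not self._done:
            self._buffer = deque(self._cursor.fetchall())
            self._cursor.close()

    def __iter__(self):
        return self

    def __next__(self):
        if self._buffer is not None:  # drained: serve rows from memory
            if self._buffer:
                return self._buffer.popleft()
            raise StopIteration
        if self._done:
            raise StopIteration
        row = self._cursor.fetchone()  # still streaming: steps the statement
        if row is None:
            self._done = True
            self._cursor.close()
            raise StopIteration
        return row


class Database:
    """Routes writes through mutate() so live iterators are drained first."""

    def __init__(self, conn):
        self.conn = conn
        self._live = weakref.WeakSet()  # iterators that may still be streaming

    def query(self, sql, params=()):
        rows = LazyRows(self.conn.execute(sql, params))
        self._live.add(rows)
        return rows

    def mutate(self, sql, params=()):
        for rows in list(self._live):
            rows.drain()
        self.conn.execute(sql, params)


if __name__ == "__main__":
    db = Database(sqlite3.connect(":memory:", isolation_level=None))
    db.conn.execute("CREATE TABLE items(id INTEGER PRIMARY KEY, n INTEGER)")
    db.conn.executemany("INSERT INTO items(n) VALUES (?)", [(i,) for i in range(5)])
    for row_id, n in db.query("SELECT id, n FROM items ORDER BY id"):
        db.mutate("UPDATE items SET n = n + 100 WHERE id = ?", (row_id,))
    print(db.conn.execute("SELECT n FROM items ORDER BY id").fetchall())
```

Queries that never overlap a write stay lazy, so only iterators still open when a mutation arrives pay the memory cost. Writes issued on conn directly would bypass the drain, so a wrapper like this only helps if every statement goes through it.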

—Jens


Re: [sqlite] Bitten by lack of isolation between SELECT and UPDATE on the same connection

2017-01-31 Thread James K. Lowden
On Mon, 30 Jan 2017 19:29:40 -0800
Jens Alfke  wrote:

> if I iterate over the rows in a table using sqlite3_step, and
> update each row after it’s returned, Bad Stuff happens. Specifically,
> my query is just getting the first row over and over and over again,
> and the iteration runs forever.  

I think you've solved your immediate problem and come to grips with
SQLite's behavior.  I thought it might be helpful to explain why it
works that way, and that what you want is also valid, but goes by
another name: a cursor.  

According to the SQL standard, every SQL statement is atomic.  SELECT has
no beginning and no end: the results it returns reflect the state of
the database as of the moment the statement was executed.  If you fetch
the last row six days after the first, it still belongs to the database
as it stood when you began.  

SQLite in WAL mode gives you that isolation.  You weren't bitten by a
*lack* of isolation; you were bitten by isolation you didn't expect.   

The idea of updating rows as they are read -- without completing the
transaction -- is supported in SQL with a cursor.  Standard SQL has
DECLARE CURSOR syntax; some cursors can be declared as FOR UPDATE and
have behavior much like what you expected.  That syntax, as you know, is
not supported by SQLite.  

Technically speaking the product of ORDER BY is also a cursor, and it's
not hard to find references to sqlite3_step as using "a cursor".  All
meaning stands in context, and those should not be confused with an SQL
cursor.  

--jkl




Re: [sqlite] Bitten by lack of isolation between SELECT and UPDATE on the same connection

2017-01-30 Thread Hick Gunter
Maybe adding "order by rowid" to your select statement can help avoid "sawing 
off the branch you are sitting on". Unless you need to update rowids...

-Original Message-
From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On Behalf Of Jens Alfke
Sent: Tuesday, January 31, 2017 04:30
To: SQLite mailing list
Subject: [sqlite] Bitten by lack of isolation between SELECT and UPDATE on the same connection

I’ve just run headlong into the issues described in “No Isolation Between
Operations On The Same Database Connection” [1]. Specifically, I’ve discovered
(after some debugging) that if I iterate over the rows in a table using
sqlite3_step, and update each row after it’s returned, Bad Stuff happens.
Specifically, my query is just getting the first row over and over and over
again, and the iteration runs forever. :(

I had been under the impression that, since I’m using the WAL, queries operate 
on a snapshot of the database as of the time they begin, and are unaffected by 
subsequent changes. I got this from reading about “snapshot isolation” in a 
previous section of that document. (Also, another key/value database engine 
I’ve used recently _does_ behave this way, so it’s what I was expecting.) I now 
see that the “read transaction” described in that section has to be occurring 
in a different connection than the write transaction. (Right?)

I’m unsure what to do now. I am working on a library whose API exposes iterator 
objects that run queries; the iterator’s “next()” method internally calls 
sqlite3_step. Thus the interleaving of the query and updating the database is 
not under my control; it’s up to the developer using our library, and I do 
_not_ want to expose inconvenient undefined behavior like this, or tell 
developers that “you can’t modify the database while you’re iterating it”.

I can’t be the first person to run into this. Is there a best practice for 
enabling concurrent iteration and mutation? I can think of two solutions:

A. Batch up all of the query results in memory at the start of the iteration, 
and have the iterator just read them out of the in-memory list.
I’d like to avoid this because of the obvious memory overhead and 
latency imposed on large queries. Version 1 of our library worked this way, 
which is why I probably hadn’t noticed the problem until now.

B. Create a separate SQLite connection for the query; then it’ll be isolated 
from any changes being made in the main connection.
This seems elegant, but it will of course use more memory for the extra 
connection (with its own cache). Moreover, it seems like I’ll need to open an 
indefinite number of extra connections: if the caller starts a query, makes 
some changes, and then starts another query (before reading the final row of 
the first query), I need to open another connection for the second query 
because it has to see the changes, which aren’t yet visible in the first 
query's connection … right?
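A minimal sketch of option B. It assumes the database file is in WAL mode and uses Python's sqlite3 module with a hypothetical items(id, n) table and hypothetical function names. The reader's SELECT holds a read transaction from its first step until it is reset. It therefore keeps seeing the snapshot it started with while the writer commits changes on a different connection. A query started afterwards on a fresh connection (or in a fresh read transaction) sees the committed changes, which matches the second-query situation described above.

```python
import os
import sqlite3
import tempfile


def make_demo_db(path):
    """Create a hypothetical items table in a WAL-mode database file."""
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("PRAGMA journal_mode=WAL")  # persists in the file for later connections
    conn.execute("CREATE TABLE items(id INTEGER PRIMARY KEY, n INTEGER)")
    conn.executemany("INSERT INTO items(n) VALUES (?)", [(i,) for i in range(5)])
    conn.close()


def iterate_on_separate_connection(path):
    """Step a SELECT on one connection while another connection updates rows."""
    reader = sqlite3.connect(path, isolation_level=None)
    writer = sqlite3.connect(path, isolation_level=None)
    try:
        seen = []
        cur = reader.execute("SELECT id, n FROM items ORDER BY id")
        try:
            for row_id, n in cur:
                seen.append(n)
                # Commits immediately, but `cur` keeps reading the snapshot
                # its read transaction started with.
                writer.execute("UPDATE items SET n = n + 100 WHERE id = ?", (row_id,))
        finally:
            cur.close()  # ends the reader's read transaction
        return seen
    finally:
        writer.close()
        reader.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        make_demo_db(path)
        print(iterate_on_separate_connection(path))  # [0, 1, 2, 3, 4]
        fresh = sqlite3.connect(path)
        print(fresh.execute("SELECT n FROM items ORDER BY id").fetchall())
        fresh.close()
```

In the default rollback-journal mode, the same loop would be expected to fail. The writer's commit needs the reader's shared lock to be released, so it would end in SQLITE_BUSY ("database is locked" in Python) once the busy timeout expires.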

—Jens

[1]: https://www.sqlite.org/isolation.html


___
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: h...@scigames.at





Re: [sqlite] Bitten by lack of isolation between SELECT and UPDATE on the same connection

2017-01-30 Thread Simon Slavin

On 31 Jan 2017, at 5:26am, Jens Alfke  wrote:

> I don’t follow. What’s the “resource” you’re talking about here?

In your case, the NSEnumerator.

Would the solution I proposed in my post work for you?

Simon.


Re: [sqlite] Bitten by lack of isolation between SELECT and UPDATE on the same connection

2017-01-30 Thread Jens Alfke

> On Jan 30, 2017, at 9:10 PM, Simon Slavin  wrote:
> 
> Nope.  Cannot do that.  Any number of things might happen between the first 
> _step() and the _finalize().  For all you know someone might delete the 
> object the iterator is currently on instead of just updating it.  Then where 
> would the iterator be?

As I explained, my assumption was that the iteration operated on a snapshot of 
the database at the time it was started, i.e. at the first call to 
sqlite3_step. There are other databases that operate that way, although I now 
understand SQLite doesn’t.

I understand the situation. And I outlined two ways around the problem. So this 
isn’t a blanket “Cannot do that” situation, unless you’re saying I can’t do it 
the way I’ve been doing it … but I already know that! That’s why I posted my 
question. Reiterating it doesn’t help.

> How would you know to release the resource?  And once you knew it, how would 
> you do it?

I don’t follow. What’s the “resource” you’re talking about here?

—Jens


Re: [sqlite] Bitten by lack of isolation between SELECT and UPDATE on the same connection

2017-01-30 Thread Jens Alfke

> On Jan 30, 2017, at 8:03 PM, Rowan Worth  wrote:
> 
> If the iterator isn't exhausted, how do you know when to dispose the
> sqlite3_stmt? 

The iterator (which is an Objective-C NSEnumerator object) will be deleted 
shortly after it exits scope. Some of the refcounting is deferred via the 
autorelease pool, but basically by the time the thread returns back to its 
event loop.

—Jens



Re: [sqlite] Bitten by lack of isolation between SELECT and UPDATE on the same connection

2017-01-30 Thread Simon Slavin

On 31 Jan 2017, at 3:29am, Jens Alfke  wrote:

> I’ve discovered (after some debugging) that if I iterate over the rows in 
> a table using sqlite3_step, and update each row after it’s returned, Bad 
> Stuff happens. Specifically, my query is just getting the first row over and 
> over and over again, and the iteration runs forever

Is your UPDATE command changing a value which is used for the SELECT ?  If so, 
what you reported is expected behaviour.  If you’re expecting to execute two 
statements at the same time you should be using two connections.

> I’m unsure what to do now. I am working on a library whose API exposes 
> iterator objects that run queries; the iterator’s “next()” method internally 
> calls sqlite3_step.

Nope.  Cannot do that.  Any number of things might happen between the first 
_step() and the _finalize().  For all you know someone might delete the object 
the iterator is currently on instead of just updating it.  Then where would the 
iterator be?  How would you know to release the resource?  And once you knew 
it, how would you do it?

One solution is to make each call to .next() do its own SELECT.  So if, for 
example, it was acceptable to iterate the rows in rowid order then calling 
.next() would do

SELECT rowid FROM MyTable WHERE rowid > [current rowid] ORDER BY rowid LIMIT 1

If this gives a row, that’s your next object.  If it doesn’t, you’ve reached 
the end.
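A sketch of that keyset-style iterator, with Python's sqlite3 module standing in for the C API and a hypothetical items(id INTEGER PRIMARY KEY, n INTEGER) table. RowidIterator is a hypothetical name, and it selects n alongside rowid for convenience. Each call to next() runs, steps and closes its own one-row SELECT, so no statement is pending while the caller modifies the table. That also covers the case of a row being deleted, since the next lookup simply seeks past the last rowid returned. The first call omits the rowid > ? condition, so the starting point does not depend on the range of rowids.

```python
import sqlite3


class RowidIterator:
    """Keyset iteration: each next() runs its own short SELECT."""

    def __init__(self, conn):
        self.conn = conn
        self.last_rowid = None  # None until the first row has been returned

    def __iter__(self):
        return self

    def __next__(self):
        if self.last_rowid is None:
            sql = "SELECT rowid, n FROM items ORDER BY rowid LIMIT 1"
            params = ()
        else:
            sql = "SELECT rowid, n FROM items WHERE rowid > ? ORDER BY rowid LIMIT 1"
            params = (self.last_rowid,)
        cur = self.conn.execute(sql, params)
        try:
            row = cur.fetchone()
        finally:
            cur.close()  # no statement stays pending between calls
        if row is None:
            raise StopIteration
        self.last_rowid = row[0]
        return row


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE items(id INTEGER PRIMARY KEY, n INTEGER)")
    conn.executemany("INSERT INTO items(n) VALUES (?)", [(i,) for i in range(5)])
    for row_id, n in RowidIterator(conn):
        conn.execute("UPDATE items SET n = n + 100 WHERE id = ?", (row_id,))
        if row_id == 1:
            conn.execute("DELETE FROM items WHERE id = 3")  # delete a row ahead of the iterator
    print(conn.execute("SELECT id, n FROM items ORDER BY id").fetchall())
    # [(1, 100), (2, 101), (4, 103), (5, 104)]
```

The trade-off is one index seek per row instead of one continuous scan. Rows inserted ahead of the current position will also be visited, while rows inserted behind it will not.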

Simon.


Re: [sqlite] Bitten by lack of isolation between SELECT and UPDATE on the same connection

2017-01-30 Thread Rowan Worth
The iterator pattern has another caveat when applied to sqlite:

  foreach (row in statement) {
      if (isMatch(row)) {
          return true
      }
  }
  return false

If the iterator isn't exhausted, how do you know when to dispose the
sqlite3_stmt? There are other ways to manage the statement's lifetime so
this isn't a deal breaker, just something to keep in mind.
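One way to handle the early-return case, sketched with Python's sqlite3 module and a hypothetical items table (any_match is likewise a hypothetical name). contextlib.closing() calls cur.close() on every exit path, including the return inside the loop. The statement is therefore released when the search ends, not whenever the iterator object happens to be collected. In C the equivalent would be calling sqlite3_reset() or sqlite3_finalize() on each exit path.

```python
import sqlite3
from contextlib import closing


def any_match(conn, predicate):
    """Return True if any row of the hypothetical `items` table satisfies
    `predicate`, releasing the statement even on an early return."""
    with closing(conn.execute("SELECT id, n FROM items ORDER BY id")) as cur:
        for row in cur:
            if predicate(row):
                return True  # closing() still calls cur.close() here
    return False


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE items(id INTEGER PRIMARY KEY, n INTEGER)")
    conn.executemany("INSERT INTO items(n) VALUES (?)", [(i,) for i in range(5)])
    print(any_match(conn, lambda row: row[1] == 2))  # True
    # No SELECT is left pending, so schema changes on this connection succeed.
    conn.execute("DROP TABLE items")
```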


I know that our code base uses step/UPDATE/step/UPDATE/... in a couple of
places, without problems. But I guess that is just luck; as you note the
documentation clearly says the behaviour is undefined. To repeatedly get
the _same_ row over and over seems incredibly unfortunate though!

-Rowan

