Re: [sqlite] When is data committed on one connection seen on another?

2016-09-19 Thread James K. Lowden
On Wed, 14 Sep 2016 18:29:36 + (UTC)
Alex Ward  wrote:

> Perhaps our schema needs a rework, would one table with a million
> rows be better than 500 tables with 2000 rows each? 

500 tables isn't right or wrong, but *counting* tables is.  Table count
is not a design-quality metric.  

There is no rule of thumb except Boyce-Codd Normal Form.  

Table design reflects the entities you choose to represent your domain
of discourse.  You should strive, as Einstein advised, to make your
model as simple as possible, but no simpler.  

Sometimes we see schemas on this list with sets of identical tables,
each set dedicated to a different client or somesuch.  Those folks buy
themselves trouble, because otherwise identical SQL has to vary by
tablename, adding nothing but complexity to the application layer.  

At the opposite end of the spectrum is the classic
entity-attribute-value design error.  One table conquers all, including
the programmers when they discover how slow things are when SQL is used
before learned.  

Number of columns?  Many widely used and correctly normalized financial
market databases maintain thousands of rows on hundreds of columns.  

Complex models can easily have a hundred tables.  Before I would agree
a design has "too many" tables, I would want an affirmative answer to
one of two questions:

1.  Is there a process in place to dynamically extend the schema?  If
so, that indicates data -- some change over time -- has found its way
into the metadata (the table names).  

2.  Do you frequently find yourself looking in more than one place for
what -- to you -- substantially the same information.  That would
indicate the tables do not reflect your mental model of the problem
domain.  

There are forces that drive intelligent table design other than the
logical model.  Things like performance and convenience inevitably
cause tables that would otherwise pass textbook muster to be split and
combined.  Those things count, too.  One thing that doesn't count is
the count.  

Regards, 

--jkl


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] When is data committed on one connection seen on another?

2016-09-15 Thread R Smith


On 2016/09/14 8:29 PM, Alex Ward wrote:


We currently have 500 tables and 1500 triggers in the schema.   Perhaps that is 
why we didn't have much luck having one connection per thread or opening a 
connection per access.  Perhaps our schema needs a rework, would one table with 
a million rows be better than 500 tables with 2000 rows each?


YES. In 99% of cases it is far more better to have x million rows in a 
single table than trying to use multiple tables. It is the job of the DB 
engine to efficiently handle large tables, and sqlite (like most other 
DB engines) does a brilliant job of it because it is the Alpha use-case 
and the single biggest focus of efforts to enhance during development.
There are cases when a dispersed schema might be superior, but they are 
few and obscure. 1 Table = 1 set of Indices, 1 set of triggers, 1 set of 
views.


Perhaps your case was different, but mostly when people do something 
like this multiple-table thing, it is because the try to think for the 
DB engine, and assume the multiple table way is better (maybe it just 
"feels" better), but upon testing, you will find the amount of effort to 
run multiple tables outweighs any gain in access speed significantly. 
Also, a B-Tree Index works far better on one large table than many 
B-Tree indices on many tables. The increase in time taken to hit a 
specific PK in a large table diminishes rapidly with table growth. (In 
simple binary terms, IIRC, it takes 5 lookup steps to hit a PK in just 
30 rows, yet only 24 look-up steps to hit a PK in 1 million rows, 25 to 
hit it in 2 mil rows, etc. - Law of diminishing returns in action)




We were considering adding a view per table too, that would make it 3000 
elements in the schema, if 100 is considered a lot are we attempting to use 
sqlite in a manner that it is not suited for?  We may need to consider another 
alternative if that's the case, although that would be a shame.


This alone should be enough of a reason to reconsider.


Sadly we are not comparing sqlite with another SQL DB. We are 
attempting to replace an in-house memory resident non-sql database 
with sqlite. So our comparison is between a) reading from memory and 
b) opening a connection and reading from sqlite. 


This is perfect, it's what SQLite was born to do. Just trust SQLite to 
do it's thing and don't try to pre-empt how it will fare and prematurely 
optimize by distributed schemata and the like. Take the simplest route 
first, if that turns out to really be too slow, /then/ perhaps ask what 
can be done to improve, given the data/schema specifics.


(And yes, we are not oblivious to the fact that you may have already 
invested insane amounts of time in doing it the other way, don't throw 
away the code yet, just try the normal way also)!


Good luck!
Ryan


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] When is data committed on one connection seen on another?

2016-09-15 Thread Hick Gunter
>> As already stated, and per my own experience, each thread should have it's
>> own connection and do whatever it needs to do there, without interference
>> from other threads.
>
>I appreciate this point.  Early prototyping indicated that this might not be 
>possible for our system, which makes me a little nervous if that is what most 
>users end up doing.  We will >definitely take another look at not sharing 
>connections.
>

Maybe consider creating a pool of "reader" threads that get created at 
application startup. Each one opens it's very own dedicated connection at the 
beginning. Also have one "listener" thread that accepts requests and farms them 
out to the "readers".

That way you have control over the number of connections being created and the 
overhead of connecting is done once at application startup.


___
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: h...@scigames.at

This communication (including any attachments) is intended for the use of the 
intended recipient(s) only and may contain information that is confidential, 
privileged or legally protected. Any unauthorized use or dissemination of this 
communication is strictly prohibited. If you have received this communication 
in error, please immediately notify the sender by return e-mail message and 
delete all copies of the original communication. Thank you for your cooperation.


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] When is data committed on one connection seen on another?

2016-09-14 Thread Olivier Mascia
Alex,

> Le 14 sept. 2016 à 20:29, Alex Ward  a écrit :
> 
>> Unless you have a very complex schema (I would say at the very least more 
>> than on hundred tables and other create statement), opening a SQLite 
>> connection is lightweight, 
> 
> We currently have 500 tables and 1500 triggers in the schema.   Perhaps that 
> is why we didn't have much luck having one connection per thread or opening a 
> connection per access.  Perhaps our schema needs a rework, would one table 
> with a million rows be better than 500 tables with 2000 rows each? 
> 
> We were considering adding a view per table too, that would make it 3000 
> elements in the schema, if 100 is considered a lot are we attempting to use 
> sqlite in a manner that it is not suited for?  We may need to consider 
> another alternative if that's the case, although that would be a shame. 

My 100 number is nothing definitive.  SQLite stores the text of the schema, and 
parses the schema on connection.  The parser has an impressive speed.  The 
larger/more complex the schema becomes, it clearly will take a little bit 
longer to parse.  But it is very hard to know where to draw the line.  Below 
100 is certainly small, that does not necessarily make > 100 too big.

For sure one million rows (assuming they're not each extraordinary in size) is 
_not_ a large number of rows.  Is it more fitted than 500 tables with 2000 rows 
each?  I can't discuss your schema (and many people here are way more 
proficient than me in SQL design) with so few knowledge of what's the model of 
the data.  I can only say that if you're artificially splitting a single 
logical data set in 500 tables of 2000 rows, then I would keep them in a single 
table.  You would have much less triggers also. But I assume this is not simply 
the case.  (Reviewing this text before posting, I read Simon goes even further 
along the same line. I'd take that as a good incentive to have a closer look at 
your schema.)

This aside, if you have a problem with the time taken by establishing a new 
connection, then building on what I briefly suggested, I would pool threads 
without closing their connection when the thread is done with its work and 
ready to suspend until needed again. This way you would have the benefit of one 
connection per thread, without the full impact of having to re-open the 
database file each time a thread has got to do some work.  It's a pool of both 
threads and connections, to gain on the two sides.

A simple pool of pre-connected connections could be worth considering (probably 
what you had in mind) but I would again strongly advise you to refrain to share 
any of these connections between two (or more) running threads.  You would have 
to either use SQLite in its mode of SERIALIZED (default unless changed at 
runtime or compile time) or add mutual exclusion between threads sharing a 
connection if using the MULTITHREAD mode.  The MULTITHREAD mode is not a magic 
mode that 'makes it work' within threaded applications.  To the contrary, it is 
meant for threaded applications which take the whole responsibility of knowing 
what they're doing.  SQLite will not protect concurrent access to the 
connection state.  So bad things will happen, unless very properly serialized.  
In either solution, threads sharing a same connection are now executing 
serialized.  It might be equivalent and simpler to queue up the requests and 
process them one after each other through one single thread...

If you can achieve/afford one connection per thread, MULTITHREAD mode is then 
easy to use without risks. Along with the database set for WAL journal mode, 
all threads doing reads will really have opportunities to work at the same time.

Will the time needed to establish a new connection per each thread will kill 
the benefits of having threads which can actually work simultaneously?  Only 
you will tell.

(Sorry for these long answers.)

Best,
-- 
Meilleures salutations, Met vriendelijke groeten, Best Regards,
Olivier Mascia, integral.be/om


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] When is data committed on one connection seen on another?

2016-09-14 Thread Simon Slavin

On 14 Sep 2016, at 7:29pm, Alex Ward  wrote:

> Perhaps our schema needs a rework, would one table with a million rows be 
> better than 500 tables with 2000 rows each? 

Yes.  As a guideline, if two tables have the same columns (or even nearly the 
same columns) you should consider merging them and adding one extra column to 
replace the table name.  This is not a 100% rule, but it's definitely the way 
to think.

Fewer tables mean you fewer indexes, fewer views, fewer database pages, better 
re-use of released space and more efficient schema handling (no need to search 
through hashes of 500 tables for each command).

Since we're talking guidelines, another is that you should be able to hold the 
entire column makeup of one table in your head at once.  So a table with more 
than say 20 columns is another sign you might rethink things.  I'm not accusing 
you of making that mistake, I just thought I'd mention the two guidelines 
together.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] When is data committed on one connection seen on another?

2016-09-14 Thread Simon Slavin

> On 14 Sep 2016, at 5:47pm, Alex Ward  wrote:
> 
>> Behalf Of Simon Slavin 
> 
>> It is not enough to lock the _prepare, lock the _step()s, and lock the 
>> _finalize. 
>> If they're sharing a connection with other threads then the lock has to be 
>> placed at the start of the prepare and be released at the end of the 
>> finalize. 
>> Don't forget that they're all part of the same transaction. 
> 
> Got it, it's the set of the 3 operations that should be serialized.  Thanks. 
> 
> The finalize is the end of the implicit transaction, correct?

Right.

> Assuming that it is the finalize for the only statement being executed on 
> that connection.  Then if the same thread did another prepare/step/finalize 
> on that same connection it would be in a new implicit transaction.  Do I have 
> that right? 

The parameter you pass to _step() and _finalize() is a statement handle (and 
the statement knows which connection it should be using).  So yes, it's 
possible to maintain multiple statements for one connection, and SQLite will 
know which statement you're finalizing.

The SQL standard says that if you try doing SELECT without a BEGIN it should 
return an error message.  Because you shouldn't be doing database stuff outside 
a transaction.

Instead of returning an error SQLite automatically wraps BEGIN/COMMIT around 
any statement (even if it just reads) if you haven't already done your own 
BEGIN.  So what you're getting from prepare/step/finalize is actually (I might 
have this wrong I haven't read the source code)

_prepare()
first _step()
BEGIN
SELECT result (first row)
more _step()
SELECT results (more rows)
eventually a SELECT results in SQLITE_DONE
_finalize()
COMMIT

So think about what would happen if you had two of these running 
simultaneously.  The BEGIN on one query would not lock out the BEGIN on another 
query, but it could prevent a INSERT/UPDATE from writing to the database.  This 
is what you're trying to simulate when you do your own locking/mutex, and 
that's why it's so easy to get it wrong.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] When is data committed on one connection seen on another?

2016-09-14 Thread Alex Ward
> From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On 
> Behalf Of Olivier Mascia 

> If I'm permitted: you're wrong. 

Ha, yes, well if our initial threading model is anything to go by, I think 
you're on safe ground with that assertion. 

> Unless you have a very complex schema (I would say at the very least more 
> than on hundred tables and other create statement), opening a SQLite 
> connection is lightweight, 

We currently have 500 tables and 1500 triggers in the schema.   Perhaps that is 
why we didn't have much luck having one connection per thread or opening a 
connection per access.  Perhaps our schema needs a rework, would one table with 
a million rows be better than 500 tables with 2000 rows each? 

We were considering adding a view per table too, that would make it 3000 
elements in the schema, if 100 is considered a lot are we attempting to use 
sqlite in a manner that it is not suited for?  We may need to consider another 
alternative if that's the case, although that would be a shame. 

> is lightweight, especially compared to whatever experience you 
> might have had with quite any other SQL system. 

Sadly we are not comparing sqlite with another SQL DB.  We are attempting to 
replace an in-house memory resident non-sql database with sqlite.  So our 
comparison is between a) reading from memory and b) opening a connection and 
reading from sqlite. 

> you intend to loose on both of them (pooling connections 
> and sharing them across threads at the cost of mutual exclusion contention 
> to get it working). 

Believe me that we would like nothing more than to do what you suggest, we will 
be looking into any way possible to get this implemented in the optimal manner. 
 Thanks a lot for giving your advice. 

Merci 
Alex 
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] When is data committed on one connection seen on another?

2016-09-14 Thread Alex Ward

> From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On 
> Behalf Of Igor Tandetnik 

> Statement execution starts with the first sqlite3_step after sqlite3_prepare 
> or the most recent sqlite3_reset; and ends with sqlite3_reset or 
> sqlite3_finalize. 

Thanks makes sense.  Thanks.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] When is data committed on one connection seen on another?

2016-09-14 Thread Olivier Mascia
> Le 14 sept. 2016 à 18:14, Alex Ward  a écrit :
> 
> I don't think we can afford to have a connection per thread. We have an 
> arbitrarily large number of threads coming and going all the time and a 
> combination of the amount of memory each connection is taking up and how long 
> it takes to connect to the db it looks like limiting the number of 
> connections would be the direction we go if we can get it to work. Maybe a 
> pool of connections. 

If I'm permitted: you're wrong.  This assertion is plain wrong, even when I 
don't have the least idea of your code architecture and the number of threads 
we're talking here.

You, generally, won't get anything, except problems or at least 
'complications', by sharing a SQLite connection between multiple threads.  For 
one thing, please, do _not_ think you will gain anything like better 
performance from this approach.  To get it to work you will basically have to 
get to do so much mutual exclusion between threads competing for the shared 
connection that the workload done by these threads could as well be serialised 
in a single thread.  And this is not what your threading design is intended to 
do, I guess.

Unless you have a very complex schema (I would say at the very least more than 
on hundred tables and other create statement), opening a SQLite connection is 
lightweight, especially compared to whatever experience you might have had with 
quite any other SQL system.

Also please keep in mind that in SQLite, the 'connection' object is the 
'transactional-control' object, which quite often is a distinct object in other 
SQL systems.

My only recommendation would be to take the time to test your software with one 
new connection opened by any thread when it needs it or at start of thread, 
then properly closed when done with it or right before thread end.

The one case where I agree this wouldn't be appropriate is if the database file 
is NON local (network filesystem).  But then a heavily multi-threaded piece of 
software dealing with such a remote file is probably wrong in the first place.

On Windows systems what I do is to pool threads.  Creating a thread is far from 
being costless (in resources and time).  There I get performance benefits.  And 
from using WAL mode along with a nice distinct connection per each thread, I 
get a good level of read concurrency among threads.  I win on two levels where 
you intend to loose on both of them (pooling connections and sharing them 
across threads at the cost of mutual exclusion contention to get it working).

:)

-- 
Meilleures salutations, Met vriendelijke groeten, Best Regards,
Olivier Mascia, integral.be/om


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] When is data committed on one connection seen on another?

2016-09-14 Thread Alex Ward
> From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On 
> Behalf Of Simon Slavin 

> It is not enough to lock the _prepare, lock the _step()s, and lock the 
> _finalize. 
> If they're sharing a connection with other threads then the lock has to be 
> placed at the start of the prepare and be released at the end of the 
> finalize. 
> Don't forget that they're all part of the same transaction. 

Got it, it's the set of the 3 operations that should be serialized.  Thanks. 

The finalize is the end of the implicit transaction, correct?  Assuming that it 
is the finalize for the only statement being executed on that connection.  Then 
if the same thread did another prepare/step/finalize on that same connection it 
would be in a new implicit transaction.  Do I have that right? 

> This is why you don't generally share a connection between simultaneous 
> threads.  If the threads have different connections and you let SQLite handle 
> the locking things work properly. 

Hmm yes, I'm hearing that repeatedly.  We will attempt to identify why using a 
lot of connections is using such a huge amount of memory and maybe we will be 
able to go that route. 

Thanks again 
Alex 
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] When is data committed on one connection seen on another?

2016-09-14 Thread Alex Ward
> From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On 
> Behalf Of Hick Gunter 

> Sharing a connection between threads makes it practically impossible for any 
> one thread to tell when a transaction begins or ends. From the point of view 
> of the database connection, the first statement to begin processing opens a 
> transaction and the last statement to end processing (which could be in a 
> totally different thread) closes it. IT ia all too easy to have a thread open 
> a 
> transaction and forget about it later. This causes all the other threads to 
> see 
> consistent (not stale) data. Enabling "read uncommitted" may alleviate the 
> sypmtoms, but it does not remove the cause. 

Seems like if we are going to share a single connection we would need to ensure 
that only one operation is happening at one time.  I'm hoping that when you say 
above "and the last statement to end processing...closes it" means that if we 
guarantee to have just one set of prepare/step/finalize happening at one time 
on a connection then the next set will always start a new transaction.  Does 
that sound correct? 

> Your assertions "a deleted row on one connection is found by a select on the 
> other" and "BEGIN/DELETE/COMMIT and SELECT is happening in the same 
> thread" is perfectly consistent if BEGIN/DELETE/COMMIT happens on one 
> connection and SELECT on the other. 
  

Yes that is exactly what we were doing.  BEGIN/DELETE/COMMIT happened on one 
connection and SELECT on the other.  Ooops. 

> As already stated, and per my own experience, each thread should have it's 
> own connection and do whatever it needs to do there, without interference 
> from other threads. 

I appreciate this point.  Early prototyping indicated that this might not be 
possible for our system, which makes me a little nervous if that is what most 
users end up doing.  We will definitely take another look at not sharing 
connections. 

Danke sehr 
Alex 
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] When is data committed on one connection seen on another?

2016-09-14 Thread Alex Ward
> From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On 
> Behalf Of Simon Slavin 

> Your description of your fault suggests that at least two of your 
> threads/processes are trying to use the same connection to the database at 
> the same time 

We have a bunch of reads happening at the same time on one connection. The 
writes were serialized with a write lock but the reads were just doing whatever 
they wanted.  It seems like this is our mistake. 

> > We are trying to implement a system where all writes occur on one 
> connection ... Does it 
> sound like we are doing something fundamentally wrong by trying to share 
> connections across threads in this way? 
> 
> If you don't have your own mutex locks, then yes. Don't try to set up a 
> situation where two transactions are happening at the same time with the 
> same connection. The standard way to do that correctly is to give each 
> thread its own connection to the database and let SQLite do all the locking 
> necessary. However it's not difficult to make your own mutex system work, 
> it just seems like your own one isn't working properly. 
> 

Great info.  I don't think we can afford to have a connection per thread. We 
have an arbitrarily large number of threads coming and going all the time and a 
combination of the amount of memory each connection is taking up and how long 
it takes to connect to the db it looks like limiting the number of connections 
would be the direction we go if we can get it to work. Maybe a pool of 
connections. 

We'll try serializing the reads on one connection and see what performance that 
gives us. It seems like we will be giving up a lot of parallelism, but as we 
are not sure yet exactly how the locking in the DB works that may not be as bad 
system impact as we fear. 

Thanks for the info 
Alex 
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] When is data committed on one connection seen on another?

2016-09-14 Thread Hick Gunter
AFAIK there is no "write mark" in the WAL journal. Instead, each read 
transaction (either implicit or explicit) has an associated "read mark" that 
determines which data the transaction will see (i.e. the data from the main 
file plus any pages in the WAL journal file before the read mark). This is the 
"CI" in ACID. Whatever the state of any table in the database, the transaction 
will always see the same state, even when querying the same row more than once 
(Consistent), irrespective of what other transactions may have "concurrently" 
changed (Isolation).

Sharing a connection between threads makes it practically impossible for any 
one thread to tell when a transaction begins or ends. From the point of view of 
the database connection, the first statement to begin processing opens a 
transaction and the last statement to end processing (which could be in a 
totally different thread) closes it. IT ia all too easy to have a thread open a 
transaction and forget about it later. This causes all the other threads to see 
consistent (not stale) data. Enabling "read uncommitted" may alleviate the 
sypmtoms, but it does not remove the cause.

Your assertions "a deleted row on one connection is found by a select on the 
other" and "BEGIN/DELETE/COMMIT and SELECT is happening in the same thread" is 
perfectly consistent if BEGIN/DELETE/COMMIT happens on one connection and 
SELECT on the other.

As already stated, and per my own experience, each thread should have it's own 
connection and do whatever it needs to do there, without interference from 
other threads.

-Ursprüngliche Nachricht-
Von: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im 
Auftrag von Alex Ward
Gesendet: Dienstag, 13. September 2016 22:00
An: sqlite-users@mailinglists.sqlite.org
Betreff: [sqlite] When is data committed on one connection seen on another?

We think we are seeing the case where a read on one WAL mode Sqlite database 
connection using the C API is reading stale data after a transaction has 
committed data changes on another connection.

For instance, a deleted row on one connection is still found by a select on the 
other.  The BEGIN/DELETE/COMMIT then SELECT (prepare/step/finalize for each) is 
happening in the same thread.We expected the commit of the write to be the 
point in time after which any read would read that committed data.  This does 
not seem to be the case here.
What is the standard idiom to avoid stale data while still allowing all but a 
writing thread not to see uncommitted data?

Is there a window of time between a commit on one connection and the data being 
available on another connection?  Is that deterministic?  When is the WAL mode 
"end mark" moved on a connection past other changes?  Is it possible that we 
have an end mark on the read connection that is still behind the write on the 
other connection?  What would trigger it to move?

Would other threads doing reads in parallel on the same connection affect when 
the end mark is moved?  In our test we serialized reads so there could only be 
one happening at a time in an attempt to remove any chance of this.  But we 
still saw stale data.

We are trying to implement a system where all writes occur on one connection 
(in a SQL transaction where a writing thread would see the uncommitted data) 
and all other reads on a second connection.  Does it sound like we are doing 
something fundamentally wrong by trying to share connections across threads in 
this way?

Any advice greatly appreciated
Alex
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: h...@scigames.at

This communication (including any attachments) is intended for the use of the 
intended recipient(s) only and may contain information that is confidential, 
privileged or legally protected. Any unauthorized use or dissemination of this 
communication is strictly prohibited. If you have received this communication 
in error, please immediately notify the sender by return e-mail message and 
delete all copies of the original communication. Thank you for your cooperation.


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] When is data committed on one connection seen on another?

2016-09-13 Thread Simon Slavin
On 13 Sep 2016, at 9:00pm, Alex Ward  wrote:

> What is the standard idiom to avoid stale data while still allowing all but a 
> writing thread not to see uncommitted data?

You should not need to do anything special to arrange this.

> Is there a window of time between a commit on one connection and the data 
> being available on another connection?  Is that deterministic?  When is the 
> WAL mode "end mark" moved on a connection past other changes?  Is it possible 
> that we have an end mark on the read connection that is still behind the 
> write on the other connection?  What would trigger it to move?

It doesn't work quite the way you describe but again you should not need to do 
anything special to make it all work properly.

> Would other threads doing reads in parallel on the same connection affect 
> when the end mark is moved?  In our test we serialized reads so there could 
> only be one happening at a time in an attempt to remove any chance of this.  
> But we still saw stale data.

There's no 'end mark' as such.  A read should, at absolute worst, delay changes 
making it to the database until the read is finalized.  A read done using 
sqlite3_exec() finalizes itself.  A read done using _prepare() and _step() must 
be manually finalized using either _finalize() or _reset().

Your description of your fault suggests that at least two of your 
threads/processes are trying to use the same connection to the database at the 
same time.  (It doesn't really matter whether these threads are reading or 
writing, you can get confusion on the connection either way.)  Another 
possibility is that you're using a PRAGMA which looks like it should speed 
things up but it does so by telling SQLite not to do necessary 
multi-thread/multi-processing checks.

> We are trying to implement a system where all writes occur on one connection 
> (in a SQL transaction where a writing thread would see the uncommitted data) 
> and all other reads on a second connection.  Does it sound like we are doing 
> something fundamentally wrong by trying to share connections across threads 
> in this way?

If you don't have your own mutex locks, then yes.  Don't try to set up a 
situation where two transactions are happening at the same time with the same 
connection.  The standard way to do that correctly is to give each thread its 
own connection to the database and let SQLite do all the locking necessary.  
However it's not difficult to make your own mutex system work, it just seems 
like your own one isn't working properly.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] When is data committed on one connection seen on another?

2016-09-13 Thread Igor Tandetnik

On 9/13/2016 4:00 PM, Alex Ward wrote:

We think we are seeing the case where a read on one WAL mode Sqlite database 
connection using the C API is reading stale data after a transaction has 
committed data changes on another connection.

For instance, a deleted row on one connection is still found by a select on the 
other.  The BEGIN/DELETE/COMMIT then SELECT (prepare/step/finalize for each) is 
happening in the same thread.We expected the commit of the write to be the 
point in time after which any read would read that committed data.  This does 
not seem to be the case here.


Somehow or other, you have an open read transaction on the connection 
doing the SELECT, dating from before the write was committed. Often 
caused by forgetting to reset or finalize a statement.



Would other threads doing reads in parallel on the same connection affect when 
the end mark is moved?


Which connection is "the same" one? The reader or the writer? If you 
have other reads in progress on the reader connection, then they likely 
keep a transaction open, and your SELECT is done within that 
transaction, which goes back in time farther than you thought. If you 
have other reads in progress on the writer connection, then COMMIT 
likely fails.



We are trying to implement a system where all writes occur on one connection 
(in a SQL transaction where a writing thread would see the uncommitted data) 
and all other reads on a second connection.  Does it sound like we are doing 
something fundamentally wrong by trying to share connections across threads in 
this way?


Realize that a transaction is a property of a connection, not a thread 
or a query. Suppose thread A starts a read at time T and ends it at time 
T+20; and thread B starts a read on the same connection at time T+10 and 
ends it at T+30. Both reads are part of the same unbroken transaction 
lasting (at least) from T to T+30. If there's an update committed at T+5 
on a different connection, neither read would see it, even though thread 
B started after it.

--
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users