[sqlite] last_insert_rowid() and changes() per statement instead of per connection?

2015-12-14 Thread Olivier Mascia
> Keith:
> 
> The documentation for sqlite3_last_insert_rowid clearly states that it 
> returns the rowid last inserted on the connection, as does the documentation 
> for sqlite3_changes.  While I agree that it may very well be possible to 
> maintain the data by statement, that is not what the current functions as 
> documented do.
> 
> Perhaps you can/might want to request an enhancement?

Exactly.  Which brings us to my very initial post:

> Why isn't there some:
> 
>   sqlite3_int64 sqlite3_stmt_last_insert_rowid(sqlite3_stmt*);
>   int sqlite3_stmt_changes(sqlite3_stmt*);
> 
> in addition to these:
> 
>   sqlite3_int64 sqlite3_last_insert_rowid(sqlite3*);
>   int sqlite3_changes(sqlite3*);

I was asking for experience from SQLite users, before making it a feature 
request.
I surely wouldn't expect the current functions (taking sqlite3*) to return 
anything else than information maintained by connection. But I would have 
valuable usage of the same functions (taking sqlite3_stmt*), returning such 
information maintained by statement.

I'll check what are the proper way to draw a documented request for enhancement 
for SQLite. We will anyway surely buy some yearly technical support, or at the 
very least maintenance subscription, which I guess will streamline the 
introduction of enhancement requests.

--
Meilleures salutations, Met vriendelijke groeten, Best Regards,
Olivier Mascia, integral.be/om


-- next part --
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 842 bytes
Desc: Message signed with OpenPGP using GPGMail
URL: 



[sqlite] last_insert_rowid() and changes() per statement instead of per connection?

2015-12-13 Thread Olivier Mascia
Thanks.  I'm reading you with attention.

> Clemens:
> Please note that transactions work on the connection level.

That was clear.

> Simon:
> Which, of course, decreases the point of you having competing threads in the 
> first place.  Which is related to the FAQ pointing at 
> .

Those threads which I deal with are there for other purposes than accessing the 
DB, but they happen to have. Besides, the issue is not at first about those 
threads but about the SQLite API which quite clearly *could* easily know and 
keep within the statement state information, the count of changes and the last 
inserted rowid, which could then in turn be happily queried by the application. 
Keeping them at the connection level is essentially useless (since they could 
be wrong or undefined) except in very restrictive conditions. I just would not 
like to fix holes in the road by asking people to stop driving cars. :)

Reading: https://www.sqlite.org/threadsafe.html the default for SQLite, unless 
compiled differently is serialized which means that "SQLite can be safely used 
by multiple threads with no restriction". It obviously implies more contention 
than transaction isolation should dictate, but it states a clear contract for 
the application developer. Contract which is very quickly violated by API like 
last_insert_rowid() and changes() reporting on the connection level rather than 
statement level to the extent of having to document their inability to report 
any useful result in a multi-threaded configuration. 

> Keith:
> INSERT INTO table VALUES ('somedata');
> SELECT rowid FROM table WHERE data='somedata';

My question cited last_insert_rowid and changes. Regarding last_insert_rowid() 
the whole purpose was indeed to know what rowid got assigned by an insert, 
without having to run yet another query for that, not to mention that in a not 
fully normalized world, there might not be other unique combination of values 
to uniquely select the just inserted row, except its rowid (or whatever else 
integer primary key it uses).

Other SQLs have INSERT ... RETURNING ..., solving that case very effectively 
(as the answer is known and does not need to be looked up again after the 
facts).

As I wrote, UPDATE OR INSERT has value when needing to update some attributes 
of a row, if it exist (WHERE clause) and INSERT a new row, when not found. The 
behavior cannot be mimicked by INSERT OR REPLACE which, if it happens to have 
to REPLACE does just that: substitute the whole row (DELETing it, INSERTing 
it).  Doesn't allow ? and that's expected for that statement ? for partial 
update of a row.

-- 
Meilleures salutations, Met vriendelijke groeten, Best Regards,
Olivier Mascia, integral.be/om




[sqlite] last_insert_rowid() and changes() per statement instead of per connection?

2015-12-13 Thread Keith Medcalf
On Sunday, 13 December, 2015 17:04, Olivier Mascia  wrote:

> Thanks.  I'm reading you with attention.

> > Clemens:
> > Please note that transactions work on the connection level.

> That was clear.

> > Simon:

> > Which, of course, decreases the point of you having competing threads in
> the first place.  Which is related to the FAQ pointing at
> .
> 
> Those threads which I deal with are there for other purposes than
> accessing the DB, but they happen to have. Besides, the issue is not at
> first about those threads but about the SQLite API which quite clearly
> *could* easily know and keep within the statement state information, the
> count of changes and the last inserted rowid, which could then in turn be
> happily queried by the application. Keeping them at the connection level
> is essentially useless (since they could be wrong or undefined) except in
> very restrictive conditions. I just would not like to fix holes in the
> road by asking people to stop driving cars. :)

The documentation for sqlite3_last_insert_rowid clearly states that it returns 
the rowid last inserted on the connection, as does the documentation for 
sqlite3_changes.  While I agree that it may very well be possible to maintain 
the data by statement, that is not what the current functions as documented do.

Perhaps you can/might want to request an enhancement?

> Reading: https://www.sqlite.org/threadsafe.html the default for SQLite,
> unless compiled differently is serialized which means that "SQLite can be
> safely used by multiple threads with no restriction". It obviously implies
> more contention than transaction isolation should dictate, but it states a
> clear contract for the application developer. Contract which is very
> quickly violated by API like last_insert_rowid() and changes() reporting
> on the connection level rather than statement level to the extent of
> having to document their inability to report any useful result in a multi-
> threaded configuration.

You are misconstruing the serialization and what threadsafe means.  
"Threadsafe" means there is no distinction between threads and no per-thread 
state information maintained between calls into the engine.  Thus you can 
execute a select on one thread, and "step" on 10,000 different threads to each 
retrieve one row of the result set -- there is no isolation between threads -- 
only between connections.  "Serialized" means "side-by-each" (to use 
newfy-speak).  In other words, as if made from THE SAME THREAD.  The calls are 
interleaved so that only one occurs at a time, but that is it.  Serialized data 
access requires transactions.  Transactions are per connection.  Hence, 
multiple threads performing operations on the same connection (and thus are 
part of the same transaction) which change database state are not isolated from 
each other in any way.

> > Keith:
> > INSERT INTO table VALUES ('somedata');
> > SELECT rowid FROM table WHERE data='somedata';
> 
> My question cited last_insert_rowid and changes. Regarding
> last_insert_rowid() the whole purpose was indeed to know what rowid got
> assigned by an insert, without having to run yet another query for that,
> not to mention that in a not fully normalized world, there might not be
> other unique combination of values to uniquely select the just inserted
> row, except its rowid (or whatever else integer primary key it uses).

> Other SQLs have INSERT ... RETURNING ..., solving that case very
> effectively (as the answer is known and does not need to be looked up
> again after the facts).

> As I wrote, UPDATE OR INSERT has value when needing to update some
> attributes of a row, if it exist (WHERE clause) and INSERT a new row, when
> not found. The behavior cannot be mimicked by INSERT OR REPLACE which, if
> it happens to have to REPLACE does just that: substitute the whole row
> (DELETing it, INSERTing it).  Doesn't allow ? and that's expected for that
> statement ? for partial update of a row.

Then you will have to create your own mutex to control the contention you are 
creating on the database connection/transaction.






[sqlite] last_insert_rowid() and changes() per statement instead of per connection?

2015-12-13 Thread Simon Slavin

On 13 Dec 2015, at 5:34pm, Clemens Ladisch  wrote:

> Olivier Mascia wrote:
>> should the design of competing threads revolve around each one having
>> a distinct connection handle?
> 
> Yes.

Which, of course, decreases the point of you having competing threads in the 
first place.  Which is related to the FAQ pointing at



Simon.


[sqlite] last_insert_rowid() and changes() per statement instead of per connection?

2015-12-13 Thread Clemens Ladisch
Olivier Mascia wrote:
> even if using two distinct sqlite3_stmt* handles (linked to a same
> sqlite3* handle), two competing threads would get unusable answers
> from both these API.

Yes.

Please note that transactions work on the connection level.

> should the design of competing threads revolve around each one having
> a distinct connection handle?

Yes.


Regards,
Clemens


[sqlite] last_insert_rowid() and changes() per statement instead of per connection?

2015-12-13 Thread Olivier Mascia
Dear all,

Why isn't there some:

sqlite3_int64 sqlite3_stmt_last_insert_rowid(sqlite3_stmt*);
int sqlite3_stmt_changes(sqlite3_stmt*);

in addition to these:

sqlite3_int64 sqlite3_last_insert_rowid(sqlite3*);
int sqlite3_changes(sqlite3*);

Reading this:

"If a separate thread makes changes on the same database connection while 
sqlite3_changes() / sqlit3_last_insert_rowid() is running then the value 
returned is unpredictable and not meaningful."

I interpret that (I may be wrong): even if using two distinct sqlite3_stmt* 
handles (linked to a same sqlite3* handle), two competing threads would get 
unusable answers from both these API.

Wouldn't it be more useful to have this info restricted to some statement 
handle?

Or should the design of competing threads revolve around each one having a 
distinct connection handle?

Reason for asking: I was about to consider using sqlite3_changes() after some 
specific UPDATEs (returning 0 changes) to implement an UPDATE OR INSERT similar 
to some other SQL engines offer. But as far as I understand from these API 
descriptions, it looks like I would have to manage a distinct single connection 
per thread, instead of sharing the connection and using only distinct statement 
handles.

I'm really new to SQLite and still need to learn a lot from source code, so I 
can't yet appreciate if sharing connection handle between threads is 
interesting or if a model where each thread have all their private handles is a 
better world, without major performance impact.

-- 
Meilleures salutations, Met vriendelijke groeten, Best Regards,
Olivier Mascia, integral.be/om




[sqlite] last_insert_rowid() and changes() per statement instead of per connection?

2015-12-13 Thread Keith Medcalf

On Sun 2015-12-13 13:47, Simon Slavin  wrote:
> On 13 Dec 2015, at 5:34pm, Clemens Ladisch  wrote:
> > Olivier Mascia wrote:

> >> should the design of competing threads revolve around each one having
> >> a distinct connection handle?

> > Yes.

> Which, of course, decreases the point of you having competing threads in
> the first place.  Which is related to the FAQ pointing at

> 

That fellow merely points out that coders who do not know what they are doing 
should not attempt to use multiple threads.  
Proper use of multiple threads of concurrent execution requires a programmer 
versed in concurrency.

Only co-operating threads can share a single connection betwixt themselves and 
then only for use for the purpose for which they are co-operating.  Competing 
threads will need separate connections per thread.

In other words, if execution in one thread can change the state seen by another 
thread "in undesirable ways" then you made a design error by classifying 
obviously "competing" uses of "shared state" improperly as "co-operating" uses 
of "shared state", in this case a database connection.

You can either (a) stop sharing state; (b) make the sharing of state moot by 
not depending on it.  In this case (b) would mean using a select to get the 
rowid for the target row and not using a function which accesses shared state.  
(a) could mean assigning a separate connection per thread or it could mean 
creating a mutex that serializes "insert and get last insert rowid" operations 
so that only one at a time may affect the "state" of the connection.  (assuming 
that your shared connection does not introduce any other anomalies for which 
you have failed to account).

In other words:

INSERT INTO table VALUES ('somedata');
SELECT LAST_INSERT_ROWID();

presumes that no other thread is permitted to execute *ANY* operation affecting 
the last_insert_rowid between the two statements.  This can be guaranteed by 
not allowing other "things" to issue update/inserts on the connection:

block_waiting_for_mutex('INSERT AND GET ROWID')
INSERT INTO table VALUES ('somedata');
SELECT LAST_INSERT_ROWID();
release_mutex('INSERT AND GET ROWID')


Alternatively,  if you use something like:

INSERT INTO table VALUES ('somedata');
SELECT rowid FROM table WHERE data='somedata';

will always work properly since it is not dependent on shared state (assuming 
that your database is properly normalized, or course).