Re: [sqlite] unexpected sqlite_busy behaviour within transactions

2020-02-22 Thread Andy Bennett

Hi,

A busy_timeout is honored by write commands outside of 
transactions and by single write commands inside transactions, 
but not if preceded by a read command. I did not find this 
behaviour in the documentation, thus it might be a bug.


It's documented in the description of sqlite3_busy_handler: 
https://sqlite.org/c3ref/busy_handler.html . Look for a 
paragraph mentioning "deadlock".


Here's my understanding, as I initially struggled to understand the nuance 
of the supplied documentation link.




   sqlite3 test.db --init sqlitetest.init "BEGIN TRANSACTION; SELECT * FROM t; 
INSERT INTO t VALUES(1); COMMIT;"


This returns SQLITE_BUSY after the SELECT statement and before the INSERT 
statement. This is when the SHARED lock needs to be upgraded to a RESERVED 
lock.


At this time, the IMMEDIATE transaction has a RESERVED lock that it will 
need to upgrade to an EXCLUSIVE lock in order to commit successfully. 
RESERVED locks can only be held by one transaction at a time but allow 
other processes to read the database.


This other process has called "BEGIN IMMEDIATE TRANSACTION" and 
https://sqlite.org/rescode.html#busy says


"The BEGIN IMMEDIATE command might itself return SQLITE_BUSY, but if it 
succeeds, then SQLite guarantees that no subsequent operations on the same 
database through the next COMMIT will return SQLITE_BUSY." 

As I understand it, this means that the only way this transaction will fail 
to commit is if the statements executed within it lead to some kind of 
constraint violation, i.e. it won't abort due to the actions of other 
processes.


So that it can commit successfully this process will need to first upgrade 
its RESERVED lock to a PENDING lock on its way to an EXCLUSIVE lock.


The PENDING lock means that no new processes will be granted SHARED locks.
When all the existing SHARED locks have disappeared, the PENDING lock can 
be upgraded to an EXCLUSIVE lock.



This is how the deadlock happens: the DEFERRED transaction cannot upgrade 
its SHARED lock to a RESERVED lock because the other transaction already 
has a RESERVED lock, so it must wait. The IMMEDIATE transaction can upgrade 
its RESERVED lock to a PENDING lock but will be unable to upgrade that to 
an EXCLUSIVE lock until the DEFERRED transaction has released its SHARED 
lock.


We have a deadlock because the two transactions are waiting for each other.

Therefore, the DEFERRED transaction receives SQLITE_BUSY straight away so 
that the IMMEDIATE transaction does not have to wait for its busy handler 
to expire before getting an EXCLUSIVE lock.
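
Rewriting the failing command so that it declares its write intent up front 
avoids the deadlocking variant of SQLITE_BUSY, because a busy handler can 
then safely wait for the other writer. A sketch, using the same test.db and 
table "t" as above:

   -- BEGIN IMMEDIATE takes the RESERVED lock at the start, so the later
   -- INSERT never has to upgrade a SHARED lock while another writer is
   -- active.
   BEGIN IMMEDIATE TRANSACTION;
   SELECT * FROM t;
   INSERT INTO t VALUES(1);
   COMMIT;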



Of course, this is also good for the integrity of the DEFERRED transaction. 
If the IMMEDIATE transaction ends up committing then it may change the data 
that the DEFERRED transaction has already read from the database and based 
its subsequent calculations on. Allowing this would be bad for the 
integrity of that transaction. This is how the SERIALISABLE property of 
SQLite is provided.





Further reading:

https://sqlite.org/c3ref/busy_handler.html
https://sqlite.org/rescode.html#busy
https://sqlite.org/lang_transaction.html#immediate
https://www.sqlite.org/lockingv3.html





Best wishes,
@ndy

--
andy...@ashurst.eu.org
http://www.ashurst.eu.org/
0x7EBA75FF
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Size limits

2020-01-07 Thread Andy Bennett

Hi Andy,


What are reasonable limits for the size of an SQLite3 database file and for large blobs?


Here are some benchmarks on BLOB performance:

https://sqlite.org/intern-v-extern-blob.html


However, see the note here in section 4.1 about using the incremental BLOB 
I/O routines so that you don't thrash your memory allocator:


https://sqlite.org/malloc.html#_computing_and_controlling_parameters_m_and_n


limits.html gives you the actual hard SQLite limits:

https://sqlite.org/limits.html

The default limit for BLOBs is 1GB, but you can raise or lower that value 
at compile-time. The maximum is 2GiB.



The overall maximum database size is 140TB (128TiB).


Here are some more reports on using SQLite for BLOB storage:

https://sqlite.org/fasterthanfs.html

https://sqlite.org/affcase1.html


In reality you will be limited by your ability to find the BLOBs 
efficiently, i.e. what other metadata you store with them and how you 
index them. If your individual data items are larger than 1GB then you'll 
probably need to look at storing the underlying data across several BLOBs.
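
If you do need to split objects, here is a sketch of one possible layout 
(the schema and names are made up for illustration, not taken from the 
SQLite docs):

   -- Each logical object is stored as an ordered set of chunks, each of
   -- which stays comfortably under the BLOB size limit.
   CREATE TABLE blobs (
     object_id INTEGER NOT NULL,
     chunk_no  INTEGER NOT NULL,
     data      BLOB NOT NULL,
     PRIMARY KEY (object_id, chunk_no)
   );

   -- Reassemble a single object by reading its chunks in order.
   SELECT data FROM blobs WHERE object_id = ? ORDER BY chunk_no;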





Best wishes,
@ndy

--
andy...@ashurst.eu.org
http://www.ashurst.eu.org/
0x7EBA75FF
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Performance vs. memory trade-off question

2019-12-15 Thread Andy Bennett

Hi,


(3)  Each database connection uses about 72 KB less heap space.


How much is this as a percentage of a freshly initialised connection and 
how much as a percentage of some kind of "typical" connection (is there 
even one?)?





andy...@ashurst.eu.org
http://www.ashurst.eu.org/
http://www.gonumber.com/andyjpb
0x7EBA75FF
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Request: Combining skip-scan with 'max' optimization

2019-11-21 Thread Andy Bennett

Hi,

I hadn't seen this thread when I posted my recent thread on optimising 
MAX aggregates but I suspect this could help my case as well.


At the moment I'm trying to limit the amount of data that the aggregate 
query has to visit in order to keep latency low but this optimisation 
would give me a bigger window within my latency target.


I have similarly structured indexes to Jens.



I'm following up on my "Optimizing `SELECT a, max(b) GROUP BY a`"
thread from a few weeks ago, rephrasing it as a clearer enhancement
request.

ACTUAL BEHAVIOR: A query of the form `SELECT a, max(b) GROUP BY a`
runs slowly (O(n) with the number of table rows), even if there is an
index on (a, b DESC). The query plan explanation says "SCAN TABLE ...
USING INDEX". This is in SQLite 3.28.

EXPECTED BEHAVIOR: Query runs faster :-) My big-O fu is not strong
enough to express it that way, but I'd imagine it to be proportional
to the number of distinct `a` values, not the number of rows in the
table.

DIAGNOSIS: According to Keith Medcalf, "it appears that the optimizer
will not utilize a skip-scan *AND* apply the max optimization
concurrently."

According to Keith, a workaround is to rewrite the query as
select name,
       (select max(timestamp)
          from "table"
         where name = o.name)
  from (select distinct name
          from "table") as o;

This is of course a lot more complex. And unfortunately in my case the
query generator my program uses does not (yet) have the capability to
generate nested SELECTs, so the optimization is unavailable to me
until/unless we implement that.




andy...@ashurst.eu.org
http://www.ashurst.eu.org/
http://www.gonumber.com/andyjpb
0x7EBA75FF
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Optimising query with aggregate in subselect.

2019-11-21 Thread Andy Bennett

Hi,

Thanks to everyone who helped with this!

I'll try some stuff out and see if I can get things efficient, fast *and* 
simple.


:-)


"There's a small sidenote (that I'm too lazy too find right 
now) in the select docs that mentions that, in case of using min 
or max as aggregate, the non-aggregate columns will come from 
the row that held the min/max value."



Look in
https://www.sqlite.org/quirks.html
under "6. Aggregate Queries Can Contain Non-Aggregate Result 
Columns That Are Not In The GROUP BY Clause"


and also in
https://www.sqlite.org/lang_select.html
In section 3 search for: "Side note: Bare columns in an aggregate queries."
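
As a sketch of what that guarantee looks like in practice (table and column 
names are made up):

   CREATE TABLE versions (name TEXT, version INTEGER, payload TEXT);

   -- With a single max() aggregate, the bare "payload" column is taken
   -- from the row that held the maximum "version" for each "name".
   SELECT name, max(version), payload
     FROM versions
    GROUP BY name;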


--
Best wishes,
@ndy

--
andy...@ashurst.eu.org
http://www.ashurst.eu.org/
0x7EBA75FF
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Optimising query with aggregate in subselect.

2019-11-20 Thread Andy Bennett

Hi,

In past attempts at improving query performance these have 
been added to encourage it to use an index that it can do a 
SCAN thru' rather than the table that it would need to do a 
SEARCH thru'.


SQLite is not using the PRIMARY INDEX to immediately locate the 
appropriate row, but is actually faster when you fake it into 
using a longer index ?  That's weird.


There is more than one row returned. Potentially several million.

...so it's quicker to do a linear scan thru' something that's in the same 
order (table or index) rather than a series of tree accesses for each and 
every row.




Best wishes,
@ndy

--
andy...@ashurst.eu.org
http://www.ashurst.eu.org/
0x7EBA75FF
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Optimising query with aggregate in subselect.

2019-11-20 Thread Andy Bennett

Hi,

Did you try retrieving the data "directly" or do you need the 
subselect in order to maintain compatibility with other SQL 
dialects that are no longer able to retrieve data from the row 
on which the max was found?


Thanks Keith!

I understood that selecting other columns during an aggregate led to 
ill-specified or undefined values in those columns. Does SQLite make more 
guarantees than the SQL standard here? Do you have a pointer to the docs? I 
tried and failed to find it in there.






CREATE TABLE entrys
(
logid   INTEGER NOT NULL,
entrynumber INTEGER NOT NULL,
region  TEXT NOT NULL,
key TEXT NOT NULL,
timestamp   INTEGER NOT NULL,
PRIMARY KEY (logid, entrynumber)
);

CREATE INDEX a on entrys (region, logid, key, entrynumber);

  SELECT entrys.logid             AS logid,
         max(entrys.entrynumber)  AS entrynumber,
         entrys.region            AS region,
         entrys.key               AS key,
         entrys.timestamp         AS timestamp
    FROM entrys
   WHERE entrys.region = ?
     AND entrys.key > ?
     AND entrys.logid = ?
GROUP BY key
;

NB: I changed the ill-conceived column names to ones that do not require 
quoting, and replaced the identifier-quoted items that are not column names 
with parameter markers.






Best wishes,
@ndy

--
andy...@ashurst.eu.org
http://www.ashurst.eu.org/
0x7EBA75FF
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Optimising query with aggregate in subselect.

2019-11-20 Thread Andy Bennett

Hi,


INNER JOIN "entrys"
ON
1 = "entrys"."log-id" AND
"specific-entrys"."key" = "entrys"."key" AND
"user" = "entrys"."region" AND
"specific-entrys"."entry-number" = "entrys"."entry-number"
AND "entrys"."key" > "G"


I can't solve your problem, but the PRIMARY KEY for "entrys" is

("log-id", "entry-number")

.  You shouldn't need to match so many different fields when 
just two of them, which you already have values for, narrow your 
search down to a single row.  Though I may be missing something.


In past attempts at improving query performance these have been added to 
encourage it to use an index that it can do a SCAN thru' rather than the 
table that it would need to do a SEARCH thru'.


I'm pretty happy with the indexes it's currently choosing (apart from the 
MATERIALIZE). Adding a covering index on timestamp theoretically improves 
things but doesn't seem to make a (measurable) difference in practice with 
current data sizes.





Best wishes,
@ndy

--
andy...@ashurst.eu.org
http://www.ashurst.eu.org/
0x7EBA75FF
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Optimising query with aggregate in subselect.

2019-11-20 Thread Andy Bennett

Hi,


ORDER BY "key" DESC


This should be ASC, not DESC: I've been working on versions of the query 
that can go forwards and backwards and made an editor snafu when writing 
the eMail.




Best wishes,
@ndy

--
andy...@ashurst.eu.org
http://www.ashurst.eu.org/
0x7EBA75FF
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Optimising query with aggregate in subselect.

2019-11-20 Thread Andy Bennett

Hi,

I'm trying to implement a "streaming" version of the classic "select the 
latest version of a record" query.



By "streaming" I mean a query that executes by streaming what it needs out 
of tables and indexes as it needs it rather than using temporary b-trees or 
materializing anything up front.


I'm looking for a query that I can run and then just consume as many 
results as I want without worrying about the size of the entire result set.



Here's the schema I'm working with:

-
CREATE TABLE "entrys" ("log-id" INTEGER NOT NULL , "entry-number" INTEGER 
NOT NULL , "region" TEXT NOT NULL , "key" TEXT NOT NULL , "timestamp" 
INTEGER NOT NULL , PRIMARY KEY ("log-id", "entry-number"))


CREATE UNIQUE INDEX "entrys-log-id-region-key-entry-number" ON "entrys" ( 
"log-id" ASC, "region" ASC, "key" ASC, "entry-number" ASC)

-

There's only a couple of million rows in "entrys" and my query times are 
into 2 or 3 seconds of startup time before the first row is returned.




Here's my query:

-
-- explain query plan
SELECT
"entrys"."log-id"   AS "log-id",
"entrys"."entry-number" AS "entry-number",
"entrys"."region"   AS "region",
"entrys"."key"  AS "key",
"entrys"."timestamp"AS "timestamp"

FROM
(SELECT
MAX("entry-number") AS "entry-number",
"key"
FROM "entrys"
WHERE
"log-id" = 1 AND
"region" = "user" AND
"entry-number" <= 1700108
AND key > "G"
GROUP BY "key"
ORDER BY "key" DESC
limit 20 -- (1)
) AS "specific-entrys"

INNER JOIN "entrys"
ON
1 = "entrys"."log-id" AND
"specific-entrys"."key" = "entrys"."key" AND
"user" = "entrys"."region" AND
"specific-entrys"."entry-number" = "entrys"."entry-number"
AND "entrys"."key" > "G"

WHERE
"entrys"."log-id" = 1

ORDER BY "key" ASC
;
-

...which has this query plan in SQLite version 3.31.0

-
QUERY PLAN
|--MATERIALIZE 1
|  `--SEARCH TABLE entrys USING COVERING INDEX 
entrys-log-id-region-key-entry-number (log-id=? AND region=? AND key>?)
|--SEARCH TABLE entrys USING INDEX entrys-log-id-region-key-entry-number 
(log-id=? AND region=? AND key>?)
`--SEARCH SUBQUERY 1 AS specific-entrys USING AUTOMATIC COVERING INDEX 
(key=?)

-


My problem is with the MATERIALIZE.

The query produces just shy of 2 million rows.

It takes several seconds to start up but is then pretty quick when fetching 
each row.


What I want to do is get rid of the startup costs so that I can paginate it 
efficiently.


If I run the subselect on its own then there is no startup cost. The 
results just get streamed straight out of the index.

Its query plan is

-
QUERY PLAN
`--SEARCH TABLE entrys USING COVERING INDEX 
entrys-log-id-region-key-entry-number (log-id=? AND region=? AND key>?)
-


Is there anything I can do to make the original version of the query stream 
the results in this way?


The best I have come up with is to insert a LIMIT clause at the point 
denoted with "-- (1)". This keeps the subselect small and then 
materialising the subselect and generating the automatic covering index 
becomes cheap.


For pagination I then feed in the key from the last row of the previous 
batch at the points denoted with "-- (2)" and "-- (3)".


If I do this then it seems equally cheap to access batches at the start and 
end of the complete result set. The per-query cost is determined by the 
batch size as set by the LIMIT clause.
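
As a sketch, the paginated form of the subselect looks something like this 
(the bound values are placeholders; the last one is the final "key" of the 
previous batch and the batch size is whatever LIMIT is chosen):

-
SELECT MAX("entry-number") AS "entry-number",
       "key"
  FROM "entrys"
 WHERE "log-id" = ?
   AND "region" = ?
   AND "entry-number" <= ?
   AND "key" > ?  -- last "key" returned by the previous batch
 GROUP BY "key"
 ORDER BY "key" ASC
 LIMIT 20;
-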



However, I have been under the impression that LIMIT is supposed to be a 
"debugging" extension to the language and not recommended for use in 
queries that end up in one's program. LIMIT also only hides the latency; it 
amortises it over each batch, but I still end up with memory requirements 
in each "client" thread that are larger than I'd like; ideally I'd just 
store the current row.




Thanks for any tips!




Best wishes,
@ndy

--
andy...@ashurst.eu.org
http://www.ashurst.eu.org/
0x7EBA75FF
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Getting "chunked" output from a large SELECT operation.

2019-10-16 Thread Andy Bennett

Hi,

I'm having a situation where the results of a large SELECT 
operation are apparently too big to fit in memory.


Obviously I could jerry-rig something to work around this, but 
I have a vague recollection that SQLite provides a nice way to 
get the results of a query in "chunks" so that the memory 
demands can be reduced as much as needed by going back to the 
well a (potentially large) number of times.


Am I remembering this right?  Can anyone refresh my memory on 
how to do it if so?


What are you seeing that leads you to believe that the results are too big 
to fit in memory?


Are you able to show us your query and how you are executing it from your 
language?






Best wishes,
@ndy

--
andy...@ashurst.eu.org
http://www.ashurst.eu.org/
0x7EBA75FF
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [SPAM?] Is WAL mode serializable?

2019-07-10 Thread Andy Bennett

Hi,


Thanks for the detailed explanation of how it works. What I meant to
ask was "is there really a difference in the *semantics*?". i.e. from
the user perspective, can databases in the two different modes end up
in different states?


My understanding is NO, as long as the same transactions complete.


Thanks! That explanation is really great.



In both cases, if a read transaction attempting to upgrade to a write
transaction gets a busy, it needs to end the transaction and re-do its
reads before it can do its write.


This is assuming that you have the default busy handler engaged?




Best wishes,
@ndy

--
andy...@ashurst.eu.org
http://www.ashurst.eu.org/
0x7EBA75FF
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Is WAL mode serializable?

2019-07-10 Thread Andy Bennett

Hi,


Is that last quote just imprecise writing or is there really a
difference between SQLite's Snapshot Isolation in WAL mode and 
its Serializable isolation in ROLLBACK mode?


Yes, there is a difference.


...sorry, it seems it was my turn to do some sloppy writing!

Thanks for the detailed explanation of how it works. What I meant to ask 
was "is there really a difference in the *semantics*?". i.e. from the user 
perspective, can databases in the two different modes end up in different 
states?



... it is looking at the database as it 
was at the time it obtained its "repeatable read" timestamp, and 
cannot be permitted to make changes since it is looking at an 
"old view" that does not represent an updateable state of 
affairs.


Semantically, I understand this to mean that, from the POV of the writer 
and all other transactions, that reader was virtually kicked out in the 
same way as it would have been in ROLLBACK mode but, because enough state 
is available, it's allowed to stay in its bubble until it's finished with 
it.




I guess there's a difference in ordering behaviour under load but that's 
just to do with exactly which order ends up being chosen for the 
transactions, not that it's no longer equivalent to some serial order being 
chosen?


The difference in ordering seems less if all write transactions always 
start with BEGIN IMMEDIATE tho'.





see https://www.sqlite.org/wal.html
especially 2.2 Concurrency


I don't think that the stuff about checkpointing, durability aside, is 
relevant from a semantic POV, yes?



Thanks for your help.




Best wishes,
@ndy

--
andy...@ashurst.eu.org
http://www.ashurst.eu.org/
0x7EBA75FF
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Is WAL mode serializable?

2019-07-09 Thread Andy Bennett

Hi,

I'm reading the docs:

https://sqlite.org/isolation.html

...and I need some clarification!


It is easy to see that ROLLBACK mode is SERIALIZABLE because all concurrent 
readers have to leave before a PENDING lock is upgraded to an EXCLUSIVE 
lock.


However, the wording for WAL mode is confusing me.


isolation.html says 'all transactions in SQLite show "serializable" 
isolation.' but it also says 'In WAL mode, SQLite exhibits "snapshot 
isolation"'.


Snapshot Isolation and Serializable often (in other engines) mean different 
things at commit time ( 
https://blogs.msdn.microsoft.com/craigfr/2007/05/16/serializable-vs-snapshot-isolation-level/ 
), but SQLite seems to say that the snapshot isolation is upgraded to 
serializable by forbidding readers to upgrade to writers if another writer 
got in before them:


'The attempt by X to escalate its transaction from a read transaction to a 
write transaction fails with an SQLITE_BUSY_SNAPSHOT error because the 
snapshot of the database being viewed by X is no longer the latest version 
of the database.'



So far, so good.

However. the definition of SQLITE_BUSY_SNAPSHOT at 
https://sqlite.org/rescode.html#busy_snapshot says:


-
1. Process A starts a read transaction on the database and does one or more
  SELECT statement. Process A keeps the transaction open.
2. Process B updates the database, changing values previous read by process
  A.
3. Process A now tries to write to the database. But process A's view of the
  database content is now obsolete because process B has modified the
  database file after process A read from it. Hence process A gets an
  SQLITE_BUSY_SNAPSHOT error.
-


In particular 'Process B updates the database, changing values previous 
read by process A.' seems to suggest that values read by A have to be 
changed to effect the SQLITE_BUSY_SNAPSHOT error in A.



Is that last quote just imprecise writing or is there really a difference 
between SQLite's Snapshot Isolation in WAL mode and its Serializable 
isolation in ROLLBACK mode?



Thanks for your help!



Best wishes,
@ndy

--
andy...@ashurst.eu.org
http://www.ashurst.eu.org/
0x7EBA75FF
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] wal

2019-06-28 Thread Andy Bennett

Hi,

The mode persists after it has been set.
The file comes and goes as needed. I seem to remember there are some 
caveats in the manual about what directory permissions are required and 
what happens when you can write the journal file and the database file 
but not the directory that contains them.
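
For example (a sketch), running this once against the database file is 
enough; later connections will already find the mode set:

   -- The WAL setting is recorded in the database file itself, so it
   -- persists across connections and restarts of the application.
   PRAGMA journal_mode=WAL;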




"The WAL journaling mode uses a write-ahead log instead of a rollback
journal to implement transactions. The WAL journaling mode is
persistent; after being set it stays in effect across multiple database
connections and after closing and reopening the database."

When using 'single shot' access to the database, with no other
connections, I see a wal file being created and deleted. Just for my
understanding, would it be of advantage to have a second persistent
connection just for keeping the wal alive?

(I have no real world scenario for this, I just saw this happen and
wondered while setting up SQLTools on Sublime for SQLite. It
doesn't/can't create a persistent connection)

Ingo
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



--
andy...@ashurst.eu.org
http://www.ashurst.eu.org/
http://www.gonumber.com/andyjpb
0x7EBA75FF
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Please explain multi-access behaviour to me

2019-06-18 Thread Andy Bennett

Hi,


Actually, you're better off setting a timeout rather than handling
retry in your own code:




As I understand it, SQLITE_BUSY can occur in two situations: one where 
busy looping will fix it and one where it won't. Using the busy_timeout 
handles the first case automatically and means that if you get 
SQLITE_BUSY in user code it's always the deadlocking variant.
To get around those deadlocks you have to abort the whole transaction 
and retry from the beginning. Keith explains some of these scenarios in 
more detail in a message a few moments before your own so I won't repeat 
them here.
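
As a sketch (the 5 second value is arbitrary):

   -- Retry automatically, for up to 5 seconds, on ordinary lock
   -- contention. Any SQLITE_BUSY that still reaches user code is then
   -- the deadlocking variant and the whole transaction needs to be
   -- rolled back and retried from the beginning.
   PRAGMA busy_timeout = 5000;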


Nevertheless, you need to retry because of concurrency contention, which 
can be handled automatically and reasonably efficiently, *AND* you need 
to retry at the business logic layer due to ACID correctness guarantees 
being provided and enforced.



Transactions can also fail for other reasons (such as a UNIQUE 
constraint violation or a read-modify-write hazard that happens when 
another transaction changes some data that you read and are manipulating 
and writing back) and you need to be able to recover from these 
scenarios.


Having said that, SQLITE is nice because its single writer model means 
that read-modify-write hazards are less of a problem than other RDBMSes, 
provided transactions that intend to write use BEGIN IMMEDIATE.




--
andy...@ashurst.eu.org
http://www.ashurst.eu.org/
http://www.gonumber.com/andyjpb
0x7EBA75FF
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Please explain multi-access behaviour to me

2019-06-17 Thread Andy Bennett

Hi,

A connection doesn't need to check locks on every statement - 
only when it tries to spill to disk, most commonly during 
commit.


I think I understand what you wrote.

So the bit of my program can think that its changes were 
written to the database and only later might my program find 
that they weren't ?  By which time it might no longer know what 
they were ?


Should I have used BEGIN EXCLUSIVE ?


The part of your program that performs the transaction and associated 
business logic should be repeatable. That is, you should be able to call it 
one or more times without side effects. After it succeeds you can perform 
any side effects you require.


If I don't have time to engineer all the business logic in a functional 
style I try to do all the reads in the first part of the transaction and 
then all the writes in the second part. I then store all the writes in a 
list and, if the transaction fails, I can retry just the statements in the 
list. This has several caveats:


This assumes that the transaction does eventually succeed and that the 
writes are not dependent on the reads whose data might have been modified 
by other transactions. You can sometimes arrange for this property with 
careful design of the database schema but unless you're very careful you 
can end up corrupting the business state in the database in a way that is 
difficult or impossible to detect or debug. It works best (i.e. most 
safely) with plain INSERTs. i.e. when you don't overwrite any of the 
existing data with UPDATE or UPSERT-style statements.



If this is too much trouble then you can, as you say, use BEGIN EXCLUSIVE 
and fix it when you get to the point where more concurrency is required.
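
That is, something like this sketch (the body is whatever your business 
logic needs):

   -- Take the write lock up front so the transaction cannot be
   -- interrupted by other writers once it has started.
   BEGIN EXCLUSIVE;
   -- ... reads and writes ...
   COMMIT;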






Best wishes,
@ndy

--
andy...@ashurst.eu.org
http://www.ashurst.eu.org/
0x7EBA75FF
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Have SQLite handle values of my own type

2019-05-24 Thread Andy Bennett

Hi,


IP addresses are easy: convert to the form aaa.bbb.ccc.ddd where each
of the four parts is always three digits long.


IPv4 addresses are 32bit unsigned integers internally. The dotted-quad 
notation is 4 8bit unsigned integers that get concatenated together. If 
you store them as a 32bit integer in SQLite then you can use the regular 
arithmetic operators such as < and > to work out if a given address is 
in a particular subnet. You can also use the BETWEEN operator.
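
As a sketch (table and addresses made up for the example):

   CREATE TABLE hosts ("name" TEXT, "addr" INTEGER);  -- IPv4 as a 32bit int

   -- 192.168.1.0/24 spans the range 3232235776 .. 3232236031
   SELECT "name" FROM hosts
    WHERE "addr" BETWEEN 3232235776 AND 3232236031;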



Over Christmas I was working on an encoding for arbitrary struct-style 
datatypes that outputs valid UTF-8. It has the interesting property that 
the encoded values sort in the same order as the decoded values. i.e. 
the integer 2 sorts before 10 rather than what you'd get if you made the 
naive conversion to text. The primitive types it knows about include 
boolean, number, text and blob. The number type is really cool because 
it can represent any rational number, and the sorting property holds. 
The primitive types can be combined into compound types and you can 
control the relative order that different compound types sort in.


I'm using this to implement a database-style system where the data may 
be in different backends (i.e. SQLite, Postgres or CSV) at different 
times but I need the ordering and the available types to be stable.


The downside is that it's not "fast" to encode or decode and it's not as 
compact as a machine native encoding. Having said that, there are 
operations that can be done directly on the encoded data such as 
sorting, comparison and composition.


It's not quite on topic for this list but if anyone's interested I can 
provide more information and links to implementations in Scheme and 
Javascript.




andy...@ashurst.eu.org
http://www.ashurst.eu.org/
http://www.gonumber.com/andyjpb
0x7EBA75FF
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] What is the recommended way to write to views?

2019-02-20 Thread Andy Bennett

Hi,

Like how do I get ROWID of the "just inserted" record in A to insert 
into C

properly. I am thinking "should I make transactions, but we don't have
variables"... and going round-and-round without solution.


Is last_insert_rowid what you're looking for?
https://www.sqlite.org/c3ref/last_insert_rowid.html
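
A sketch (table and column names invented for the example):

   INSERT INTO a ("name") VALUES ('x');
   -- last_insert_rowid() reports the rowid of the most recent INSERT made
   -- on this connection, so it can be fed straight into the next INSERT.
   INSERT INTO c ("a-id", "note")
        VALUES (last_insert_rowid(), 'refers to the row just added to a');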


--
andy...@ashurst.eu.org
http://www.ashurst.eu.org/
http://www.gonumber.com/andyjpb
0x7EBA75FF
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Am I understanding how to use ROLLBACK - SAVEPOINT correctly?

2019-02-13 Thread Andy Bennett

Hi,


For example, would I do this:
Connect to the DB
[ Pass the command to save: ]    SAVEPOINT 'current'
[ User choices are processed: SELECT and UPDATE statements ]
[ if error or user changes their mind ]    ROLLBACK TRANSACTION TO 
SAVEPOINT 'current'

[ otherwise... upon success ]    RELEASE SAVEPOINT 'current'


That looks pretty good. I think you also need to RELEASE the SAVEPOINT 
after you've ROLLed BACK to it (if you no longer want it) otherwise it 
stays on the stack so that you can ROLLBACK to it again.
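
i.e. something like this sketch for the error path:

   SAVEPOINT 'current';
   -- ... SELECT and UPDATE statements ...
   -- ROLLBACK undoes the changes but leaves 'current' on the stack:
   ROLLBACK TRANSACTION TO SAVEPOINT 'current';
   -- so RELEASE it once you no longer want to roll back to it again:
   RELEASE SAVEPOINT 'current';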




I have also recently implemented something using SAVEPOINT TRANSACTIONS 
and I'm unclear (having read the docs) what state one can expect 
statements to be in after a ROLLBACK. One of the mental models says that 
it's as if the transaction was restarted from the beginning, up until 
the SAVEPOINT. However, I assume that SELECT statements stay put, 
returning the next row after the one that was last fetched rather the 
one that was fetched immediately after the first time that the SAVEPOINT 
was passed.


Does anyone know what the guarantees are?




andy...@ashurst.eu.org
http://www.ashurst.eu.org/
http://www.gonumber.com/andyjpb
0x7EBA75FF
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] ISO8601 vs Numeric Timestamp for Date Storage

2019-02-06 Thread Andy Bennett

Hi,


Integer unix timestamps are only accurate to one second, where ISO8601
(at least as implemented by SQLite) can go to 1 millisecond.  Also you
have to know the epoch to interpret a unix timestamp - not everybody
uses 1970-01-01 00:00:00.  Will people be able to figure out what the
field value means when somebody discovers your data in 100 years?

The SQLite implementation is *not* subject to the 2038-01-19 integer
overflow problem.  But other systems that might interact with SQLite
are and so that is something to keep in mind as well.

On the other hand, integer timestamps take up less space in the file.

The third option is a fractional julian day number stored as an 8-byte
floating point value.
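
For reference, the three representations map onto SQLite's built-in date 
functions like this (a sketch; all three read the current time):

   SELECT strftime('%Y-%m-%dT%H:%M:%f', 'now');  -- ISO8601 text, ms precision
   SELECT strftime('%s', 'now');                 -- integer unix timestamp
   SELECT julianday('now');                      -- fractional julian day (REAL)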


Do you want to store timestamps from the future (for example, a 
calendar-style application) or will all your timestamps always represent 
points in the past?


If you want to be able to accurately store future timestamps then the 
integer seconds and julian day number representations require some extra 
metadata.


Whilst past timestamps can always be deterministically converted from their 
local timezone to UTC and back again, this is not the case for timestamps 
in the future.


This is because the timezone and daylight saving rules change from 
time-to-time. In the integer seconds and julian day number representations, 
future timestamps must be stored in local time along with their timezone so 
that information is not lost.


In a perfect world you would store local time and location (rather than 
timezone) as places do also occasionally move timezone. However, this is 
difficult to work with because databases that map from suitably represented 
locations to timezones are more difficult to come by than databases that 
describe the relationships between timezones.



This is not a hypothetical problem or one where the error is small.

Bangladesh cancelled their daylight savings observances in 2010. In 2011 
Russia made their daylight savings time permanent; moving onto it in the 
Spring and never moving back.


If you had prematurely converted timestamps from these places to UTC then 
your error would be measured on the order of an hour.


In 1994 Eastern Kiribati crossed the International Date Line 
(bureaucratically speaking). If you had prematurely converted a timestamp 
from there then your error would be on the order of a day!




Doing consistent arithmetic on future dates is left as an exercise for the 
reader (sorry)!






Regards,
@ndy

--
andy...@ashurst.eu.org
http://www.ashurst.eu.org/
0x7EBA75FF
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite slow when lots of tables

2019-01-29 Thread Andy Bennett

Hi,

what happens if I put all data in a single table and this table 
becomes very huge (for example millions of rows)?


Will I have same performace problems?


The INSERTing profile has been covered by others but I'll just add 
something about SELECT as well.


It depends on how you query it, i.e. what you put in the WHERE clause. If 
it's by primary key ("WHERE <primary key> = <value>") then it 
will remain fast. If it's by other columns then you may need to add an 
index. When you need to do this will depend on the exact nature of the 
query.


If a query becomes slow you can prefix it with "EXPLAIN QUERY PLAN " to see 
what it is doing. You can use that information to add the correct index.
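
For example (table and column names made up):

   EXPLAIN QUERY PLAN
   SELECT * FROM big_table WHERE some_column = 'x';

   -- A full-table SCAN in the output suggests adding something like:
   CREATE INDEX "big_table-some_column" ON big_table (some_column);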


Of course, when that happens, we can help you with interpretation of the 
query plan if you want it.





Regards,
@ndy

--
andy...@ashurst.eu.org
http://www.ashurst.eu.org/
0x7EBA75FF
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query Planning Knowledge

2019-01-23 Thread Andy Bennett

Hi,


I could use the inner join for the "entrys" join and the "items" join
but not the "entry-items" join because each entry can have more than
one item.


  WITH a(id, name) AS (VALUES (1, 'A')),
   b(id, name) AS (VALUES (1, 'B1'), (1, 'B2'))
  SELECT * FROM a INNER JOIN b USING (id);

  1|A|B1
  1|A|B2

The only difference between inner and outer joins is how rows without
any match are handled.


Thanks Clemens! In my case entries can have zero items as well and I still 
want the entry itself to show up.




I started with an OUTER JOIN as I find it easier to show that it's
doing the correct thing because I can search the output for errant
NULLs. Trying to detect missing rows in an INNER JOIN is harder.


If the join columns have the same name, using USING is easier.

And it would be a good idea to enforce the relationships between the
tables with foreign key constraints: 


(However, constraints do not affect how you have to write your queries.)


Ah yes. It might be worth looking at this. I've always avoided it in the 
past because my experience with other engines taught me that it makes 
experimenting at the monitor harder. Are there any efficiency benefits or 
is it just there to enforce data integrity (very important, of course;-))?


It looks like they have to be enabled on a per connection basis. In this 
case I (currently) control all the client code but is it possible for the 
foreign key relationships to get out of sync if one of the connections 
omits to apply the pragma?
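
i.e. presumably every connection that writes would need to run

   PRAGMA foreign_keys = ON;

itself, since the setting is per-connection and off by default rather than 
being stored in the database file?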



Thanks for the tips!




Regards,
@ndy

--
andy...@ashurst.eu.org
http://www.ashurst.eu.org/
0x7EBA75FF
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query Planning Knowledge

2019-01-22 Thread Andy Bennett

Hi David,

Thanks for your thoughtful reply.



Can't go into as much detail as you. But a couple comments.

"primary key unique" is redundant, and will actually create a 
redundant unique index.


Are you refering to the CREATE TABLE clauses for the "items" and 
"registers" tables?



I appear to have indexes named "sqlite_autoindex_items_1", 
"sqlite_autoindex_items_2" (for the blob column) and 
"sqlite_autoindex_registers_1".


If I run

CREATE TABLE "items2" ("item-id" INTEGER PRIMARY KEY  NOT NULL   , "blob" 
BLOB NOT NULL  UNIQUE )


...then I just get "sqlite_autoindex_items2_1" for the blob column.


Thanks for that: it's a good find! I'll do the same for the "registers" 
table as well.




"Do later version of SQLite manage to spot the constant 
propagation opportunity?"
This requires way more algebra and proof than I think you 
realize. "entrys" is on the right side of a LEFT OUTER JOIN, and 
therefore may be null when it comes time to do the next OUTER 
JOIN to "entry-items", so a direct replacement of an "equals" 
constraint isn't possible. And after that it again starts 
becoming algebra as to whether that null can affect things etc.


Yes, you are right. I could use the inner join for the "entrys" join and 
the "items" join but not the "entry-items" join because each entry can have 
more than one item.


Changing the "entrys" join to an INNER JOIN and adding both ORDER BY 
clauses gets me a much better query plan:


-
1|0|0|SEARCH TABLE entrys USING COVERING INDEX 
entrys-log-id-region-key-entry-number (log-id=? AND region=?)
0|0|1|SEARCH TABLE entrys USING INDEX entrys-log-id-region-key-entry-number 
(log-id=? AND region=?)
0|1|0|SEARCH SUBQUERY 1 AS specific-entrys USING AUTOMATIC COVERING INDEX 
(key=?)
0|2|2|SEARCH TABLE entry-items USING COVERING INDEX 
entry-items-log-id-entry-number-item-id (log-id=? AND entry-number=?)

0|3|3|SEARCH TABLE items USING INTEGER PRIMARY KEY (rowid=?)
-

Changing the "items" join to an INNER JOIN doesn't get me any more but I'll 
do it anyway as the OUTER JOIN is not strictly needed.



Thanks for that!

I started with an OUTER JOIN as I find it easier to show that it's doing 
the correct thing because I can search the output for errant NULLs. Trying 
to detect missing rows in an INNER JOIN is harder. I then failed to realise 
that an INNER JOIN would suffice.


It now even does the constant propagation of the log-id at (** 5 **) but 
not the "entry-number" one!




For the ordering, I recommend seeing if you can replace one of 
those "entrys" indexes so that they start with "key" as the 
first field in the index. That would at least give it more 
opportunity to use that index for the ordering rather than 
needing to order things after they're all collected. That and 
explicitly stating the order by in _both_ the sub select and the 
final might make it notice "I can use this _ordered_ sub select 
as the outer table in the joins and get my overall ordering that 
way"... or it may not. Worth a try though.


Yeah. I've had a fiddle around and can't get it to do it. However, 
converting to an INNER JOIN as above seems to get it to work without 
changing the indexes tho'.


In the schema I have,

CREATE UNIQUE INDEX "entrys-log-id-region-key-entry-number" ON "entrys" ( 
"log-id" ASC, "region" ASC, "key" ASC, "entry-number" ASC)


...which seems to get used for the sub-query. It gives it its ordering and 
the "key" part also gets used when I add the extra WHERE constraint on 
"key" (** 2**) so it seems to be having the effect you mention.


If I add an ORDER BY clause to the sub-select (** 3 **) the query plan 
doesn't change so I guess it's happy to use that index to get the ordering.


If I then also add an ORDER BY to the main select it still uses a temporary 
b-tree to confirm the sort. This is the main source of my confusion because 
the query plan shows that the joins are all executed as inner loops so the 
ordering of the outer query should be preserved.




Thanks for your help David and thanks for solving the problem!




Regards,
@ndy

--
andy...@ashurst.eu.org
http://www.ashurst.eu.org/
0x7EBA75FF
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Query Planning Knowledge

2019-01-22 Thread Andy Bennett

Hi,

I'm having some problems understanding what the query planner is doing and 
how to convince it to work in the way I want.


Sorry that this is such a long and involved eMail. I've tried to describe 
my problem and show the steps I have taken in debugging it. I have made 
some effort to present a minimal and basic illustration of the problem. 
Either I'm in error and glad to be assisted, or there's some opportunities 
that the query planner is not taking. Hopefully my detail helps someone who 
knows more than me get to the bottom of the issue!



My GUI monitor is reporting SQLite 3.8.11.1 and my app is using 3.17.0. I 
get the same result with both.


I've included more complete details of my schema at the bottom of this 
message but only introduced the bits I'm talking about as I go through.



I'm aiming to write a query that's good for feeding an iterator and for 
pagination. i.e. I don't want a query that needs to compute the whole 
result up-front, before returning the first row. I want something where I 
can execute the query, consume a few rows for a while, close it and then 
later start a new query with a parameter based on how far I got through and 
then continue where I left off.


The structure of my dataset is such that once rows are written to the 
database they are not changed, so there exist a set of parameters where I 
can do this.



The data is an append only log. The log consists of entries (in the entrys 
table) and they have monotonically increasing "entry-number"s. The table 
can store more than one log so there's a "log-id" as well. These entries 
have some data of their own and each point to zero or more items which are 
stored in the "items" table. Thus there's a join table "entry-items" which 
consists of 3 columns, all parts of the primary key, "log-d", 
"entry-number" and "item-id".


Each entry has a "key". As entries are added to the log, the "latest" entry 
for a particular key describes the current state of that key. The set of 
keys and their current state is called "records" and they only exist as a 
query on the tables. It is this query that I'm trying to plan well.


The idea is for the app user to be able to ask for a cursor on the records 
and then consume them a page at a time. I'm not particularly bothered which 
order they keys come out in but it has to be deterministic so that I can 
restart the query again. It's nice if they come out in "key" order but 
"entry-number" order will do too. The main property I want is to be able to 
pass this app-defined cursor over my API boundary without holding open a 
read transaction on SQLite.


I've started with a log that isn't huge but needs some attention to make it 
work efficiently. This log has 54,272 entries and 42,737 records at the 
latest version. I expect to have logs with a few tens of million active 
records but probably nothing larger than 100 million. Not exactly Big Data 
;-) but worthy of indexing.



So, here's my query, annotated with a few numbered points so that I can 
talk about things. The bits that are commented out get commented in (and 
out again!) as I discuss my problems with the query plan.


-
explain query plan
SELECT
"entrys"."log-id"   AS "log-id",
"entrys"."entry-number" AS "entry-number",
"entrys"."region"   AS "region",
"entrys"."key"  AS "key",
"entrys"."timestamp"AS "timestamp",
"entry-items"."item-id" AS "item-id",
"items"."blob"  AS "blob"

FROM
(SELECT -- ** 1 **
  MAX("entry-number") AS "entry-number",
  "key"
  FROM "entrys"
  WHERE
  "log-id" = 51 AND
  "region" = "user" AND
  "entry-number" <= 54272
-- AND key > "19" -- ** 2 **
-- AND key > "145943"
  GROUP BY "key"
-- order by key -- ** 3 **
) AS "specific-entrys"

LEFT OUTER JOIN "entrys"
ON
"specific-entrys"."entry-number" = "entrys"."entry-number"
AND "specific-entrys"."key" = "entrys"."key" -- ** 4 **
AND "user" = "entrys"."region"

LEFT OUTER JOIN "entry-items"
ON
51 = "entry-items"."log-id" AND -- ** 5 **
"specific-entrys"."entry-number" = "entry-items"."entry-number"

LEFT OUTER JOIN "items"
ON
"items"."item-id" = "entry-items"."item-id"

WHERE
"entrys"."log-id" = 51
  -- order by key -- ** 6 **
  ;
-

Here's the query plan for the query above:

-
1|0|0|SEARCH TABLE entrys USING COVERING INDEX 
entrys-log-id-region-key-entry-number-desc (log-id=? AND region=?)

0|0|0|SCAN SUBQUERY 1 AS specific-entrys
0|1|1|SEARCH TABLE entrys USING INDEX 
entrys-log-id-region-key-entry-number-desc (log-id=? AND region=? AND key=? 
AND entry-number=?)
0|2|2|SEARCH TABLE entry-items