Re: [sqlite] SQLite Application Server Concurrency

2018-01-13 Thread Dinu
James,
You don't need to go to the length of creating a full SQLite proxy RPC, as
has been suggested.
Your service could implement one thing only: a lock on the database file
(external to the SQLite locking mechanism, of course). So it only needs to
serialize concurrency; the client does the RPC (via whatever protocol is
handy to you: HTTP, WCF, whatever) to obtain the lock; then it can safely
write to the DB over NFS. As long as all clients pass through the locking
mechanism, you are safe.
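A minimal sketch of such a lock service in Python (the names and the
hold-the-connection protocol are illustrative assumptions, not an existing
tool): the server grants the lock to one client at a time, and holding the
TCP connection equals holding the lock, so a crashed client releases it
automatically.

```python
import socket
import socketserver
import threading

# One process-wide lock standing in for "the right to write to the DB file".
db_lock = threading.Lock()

class LockHandler(socketserver.StreamRequestHandler):
    """Grants the lock for the lifetime of the client connection."""
    def handle(self):
        with db_lock:                       # queue up until this client may write
            self.wfile.write(b"GRANTED\n")
            self.rfile.readline()           # any line (or EOF) releases the lock

def serve(port=0):
    """Start the lock service on localhost; port 0 picks a free port."""
    srv = socketserver.ThreadingTCPServer(("127.0.0.1", port), LockHandler)
    threading.Thread(target=srv.serve_forever, daemon=True).start()
    return srv

class DbLock:
    """Client side: acquire on __enter__, release on __exit__."""
    def __init__(self, host, port):
        self.addr = (host, port)
    def __enter__(self):
        self.sock = socket.create_connection(self.addr)
        assert self.sock.makefile().readline().strip() == "GRANTED"
        return self
    def __exit__(self, *exc):
        self.sock.sendall(b"DONE\n")
        self.sock.close()
```

A client would then wrap every write in `with DbLock(host, port): ...`; the
NFS write is safe only while the context is held.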



--
Sent from: http://sqlite.1065341.n5.nabble.com/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Efficient ways to maintaining order rank / row_number() in a rather large set ?

2018-01-09 Thread Dinu
Analogous to the percentile solution (it's actually the same thing), you can
use a checkpointing table. This has roughly O(sqrt(n)) complexity for both
read and write.

E.g., say you expect to have 1M records and define order based on value,
then id.

You then make a checkpoint table (first_rank, value, rec_id) holding every
1000th record in sorted order. So row 1 of the checkpoint table corresponds
to the 1000th sorted record.

When you insert/delete a row, you only need to update checkpoints that come
after said row.

When you are searching for row 4521, you do something like:

SELECT <fields>
FROM
    table
JOIN
    checkpoint
WHERE
    (
        (table.value = checkpoint.value AND table.id >= checkpoint.rec_id) OR
        table.value > checkpoint.value
    ) AND
    checkpoint.first_rank = 4500
ORDER BY
    table.value ASC, table.id ASC
LIMIT 21,1
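A runnable sketch of the scheme (table and column names are illustrative,
and a checkpoint is taken every 500th row here instead of every 1000th):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t(id INTEGER PRIMARY KEY, value INTEGER);
    CREATE INDEX t_value ON t(value, id);
    CREATE TABLE checkpoint(first_rank INTEGER PRIMARY KEY,
                            value INTEGER, rec_id INTEGER);
""")
conn.executemany("INSERT INTO t(id, value) VALUES (?, ?)",
                 [(i, (i * 37) % 1000) for i in range(1, 10001)])

# Checkpoint every 500th row of the (value, id) sort order.
STEP = 500
ordered = conn.execute("SELECT value, id FROM t ORDER BY value, id").fetchall()
conn.executemany("INSERT INTO checkpoint VALUES (?, ?, ?)",
                 [(rank, ordered[rank - 1][0], ordered[rank - 1][1])
                  for rank in range(STEP, 10001, STEP)])

def row_at_rank(rank):
    """Seek from the nearest checkpoint at or below `rank` (1-based)."""
    base = (rank // STEP) * STEP
    if base == 0:  # below the first checkpoint: plain scan of the sort order
        return conn.execute(
            "SELECT id, value FROM t ORDER BY value, id LIMIT 1 OFFSET ?",
            (rank - 1,)).fetchone()
    return conn.execute("""
        SELECT t.id, t.value
        FROM t JOIN checkpoint
        WHERE ((t.value = checkpoint.value AND t.id >= checkpoint.rec_id)
               OR t.value > checkpoint.value)
          AND checkpoint.first_rank = ?
        ORDER BY t.value, t.id
        LIMIT 1 OFFSET ?""", (base, rank - base)).fetchone()
```

The offset is `rank - first_rank` because the checkpoint's own row sits at
offset 0 of the range scan.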





Re: [sqlite] Efficient ways to maintaining order rank / row_number() in a rather large set ?

2018-01-09 Thread Dinu
The only way to do this efficiently would be to have counting (range) index
b-trees. Since you don't, you're stuck with an O(n) implementation, on
either the read or the write side. So your solution is as good as it gets,
save maybe some implementation particularities.

However, you may consider a shift in perspective: with this kind of data,
statisticians use percentiles.
That is, instead of querying for ranks 21,000-22,000, you query for the "top
1%", "6-8%", etc., based on either value or rank; this way, you can maintain
a percentile rank table as granular as you like (e.g. every 1% results in a
table with 100 rows). Each row would hold a count, a minimum value and a
maximum value. Such a table is much faster to update, and if you then need
to retrieve the actual records, you query by range (value BETWEEN min AND
max) joined with the percentile table.
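A sketch of such a percentile table, assuming a SQLite build with window
functions (3.25+) for NTILE; all names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t(id INTEGER PRIMARY KEY, value INTEGER)")
conn.executemany("INSERT INTO t(value) VALUES (?)",
                 [(v,) for v in range(1, 1001)])

# Summary table: one row per percentile bucket with count / min / max.
conn.executescript("""
    CREATE TABLE pct(bucket INTEGER PRIMARY KEY,
                     cnt INTEGER, vmin INTEGER, vmax INTEGER);
    INSERT INTO pct
    SELECT bucket, COUNT(*), MIN(value), MAX(value)
    FROM (SELECT value, NTILE(100) OVER (ORDER BY value) AS bucket FROM t)
    GROUP BY bucket;
""")

# "Top 1%": resolve the bucket to a value range, then range-scan the base table.
top = conn.execute("""
    SELECT t.id, t.value
    FROM pct JOIN t ON t.value BETWEEN pct.vmin AND pct.vmax
    WHERE pct.bucket = 100
    ORDER BY t.value DESC""").fetchall()
```

After a write, only the affected bucket rows of `pct` need recomputing,
which is the speed-up over maintaining exact ranks.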





Re: [sqlite] LEFT JOIN + WHERE / OR optimisation

2018-01-08 Thread Dinu
Another reason to do at least the LEFT JOIN + WHERE -> INNER JOIN
optimisation:
if a view is based on a LEFT JOIN, running a WHERE query on it will exhibit
the same poor behavior, and there the query cannot be rewritten.





Re: [sqlite] The performance of indexed select

2018-01-07 Thread Dinu
... and the downside that it's just linear overhead for e.g. a unique index;
it works best for indexes with low cardinality... win some, lose some :)





Re: [sqlite] The performance of indexed select

2018-01-07 Thread Dinu
What I do notice reading https://www.sqlite.org/fileformat.html (if I get it
right) is that index entries take the form (for an index on a,b,c, that is):

   <a1><b1><c1><pk1>
   <a1><b1><c1><pk2>
   <a1><b1><c1><pk3>

whereas they could be represented as:

   [<a1>, <b1>, <c1>](3) -> [pk1, pk2, pk3]

with the [pk_list] being a subtree; reverse lookup from a table record to an
index record would arguably be as fast, if not faster. So this would have
the advantage of more compact indexes (less data) and of a per-key entry
count (no prefix, so there is always just one update involved), with the
downside of the complexity of an added level of indirection.





Re: [sqlite] Why doesn't SQLite optimise this subselect?

2018-01-06 Thread Dinu
If you want just one value (any) from the lookup, you can use:

SELECT
 post_processing_info, 
 (
SELECT
is_json
FROM
tables_lookup
WHERE
tables_lookup.content_hash=webpage_contents.content_hash
LIMIT 1
 ) AS is_json
FROM 
 webpage_contents
WHERE 
 content_hash = 'abc' 

or

SELECT
    post_processing_info,
    x.is_json
FROM
    webpage_contents
JOIN
    (
        SELECT
            content_hash,
            MIN(is_json) AS is_json
        FROM
            tables_lookup
        GROUP BY
            content_hash
    ) AS x ON
    x.content_hash = webpage_contents.content_hash

if you want the full join... I'm using MIN() here as an arbitrary aggregate
to pick just one value from each group.
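A small sketch of both variants against toy versions of the poster's tables
(schemas guessed for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE webpage_contents(content_hash TEXT, post_processing_info TEXT);
    CREATE TABLE tables_lookup(content_hash TEXT, is_json INTEGER);
    INSERT INTO webpage_contents VALUES ('abc', 'info-abc'), ('def', 'info-def');
    INSERT INTO tables_lookup VALUES ('abc', 1), ('abc', 0), ('def', 1);
""")

# Variant 1: correlated subquery; LIMIT 1 picks an arbitrary matching row.
one = conn.execute("""
    SELECT post_processing_info,
           (SELECT is_json FROM tables_lookup
             WHERE tables_lookup.content_hash = webpage_contents.content_hash
             LIMIT 1) AS is_json
    FROM webpage_contents
    WHERE content_hash = 'abc'""").fetchone()

# Variant 2: pre-aggregated join; MIN() deterministically picks one per group.
agg = conn.execute("""
    SELECT post_processing_info, x.is_json
    FROM webpage_contents
    JOIN (SELECT content_hash, MIN(is_json) AS is_json
          FROM tables_lookup GROUP BY content_hash) AS x
      ON x.content_hash = webpage_contents.content_hash
    WHERE webpage_contents.content_hash = 'abc'""").fetchone()
```

Note that with duplicate lookup rows the two variants can disagree: LIMIT 1
without an ORDER BY is arbitrary, while MIN() is deterministic.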





Re: [sqlite] sqlite3_column_decltype and max and min

2018-01-06 Thread Dinu
Keith Medcalf wrote
> Full Schema Tables:

Thanks for this, I will have great use for it too, soon! Actually it's so
nice I think it could go into the documentation.





Re: [sqlite] The performance of indexed select

2018-01-06 Thread Dinu
I think storing index prefix counts would only make sense in a special kind
of "statistical" index, which would store count(x IS NOT NULL), sum(x) and
sum(x^2), so that the usual statistical functions can be computed optimally.

For a plain table count, though, I think it would make sense.





Re: [sqlite] Why doesn't SQLite optimise this subselect?

2018-01-06 Thread Dinu
Could you describe your intent in more detail?

SELECT * FROM ... GROUP BY is unstable at best, so I cannot really grasp
your intention. Which lookup record's is_json do you expect to get?

If I guess right, you might want something along the lines of:

SELECT
 w.post_processing_info, 
 l.is_json 
FROM 
 webpage_contents w 
JOIN 
 (SELECT DISTINCT content_hash, is_json FROM lookup_tables.lookups) l 
 USING (content_hash) 
WHERE 
 content_hash = 'abc' 

(this might return 0-3 rows if lookups contains all 3 values (0, 1 and
NULL) for is_json)

which in turn I think would optimize best if written as:

SELECT DISTINCT
 w.content_hash,
 w.post_processing_info, 
 l.is_json 
FROM 
 webpage_contents w 
JOIN 
 lookup_tables.lookups l 
 USING (content_hash) 
WHERE 
 content_hash = 'abc' 






Re: [sqlite] The performance of indexed select

2018-01-06 Thread Dinu
Richard Hipp-3 wrote
> all the parent b-tree pages must be updated

Yup, no question about it; at best it could be opt-in. But as it is a design
decision, I checked to make sure count() really is O(n), as Jonathan's
question implied.





Re: [sqlite] The performance of indexed select

2018-01-06 Thread Dinu
Clemens Ladisch wrote
> For count(*), the database does not need the actual table rows.

I think this is not true; he has a point here: SELECT COUNT(*) WHERE
<indexed_col>=? needs to examine every index key prefix (excluding at least
the ROWID) that matches. This may mean reading in the whole index.

I think b-trees can store descendant counts in every node to solve this
issue in O(log n), but I don't see anything like it in the SQLite format.
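This can be observed directly: with an index on the constrained column, the
plan reads only the index (a covering index search), but it still visits
every matching entry, so the count costs O(matches), not O(log n). A sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t(id INTEGER PRIMARY KEY, grp INTEGER, payload TEXT)")
conn.execute("CREATE INDEX t_grp ON t(grp)")
conn.executemany("INSERT INTO t(grp, payload) VALUES (?, ?)",
                 [(i % 10, "x" * 50) for i in range(10000)])

# Counting one group: served from the index alone, no table rows touched.
n = conn.execute("SELECT COUNT(*) FROM t WHERE grp = 3").fetchone()[0]

# The plan shows a covering-index search, yet each matching index entry is
# still stepped over to produce the count.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT COUNT(*) FROM t WHERE grp = 3").fetchall()
detail = " ".join(row[-1] for row in plan)
```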






Re: [sqlite] LEFT JOIN + WHERE / OR optimisation

2018-01-05 Thread Dinu
Keith Medcalf wrote
> The primary difference is likely the test on b.c.  

/There was some text here that seems to have been lost, and my browser opened
a "super resume templates" page... maybe I've already been spectre-hacked?
Anyway, I repost:/

The primary difference, from what I see, is that it changes the query plan
to start off with an index-based search on b, not a search (scan) on a. To
me, this can be due to the supplementary condition in the ON clause making
this plan more appealing to the optimizer. So with this in mind, an easy
optimisation would be to just "permeate" any imperative non-null condition
from the WHERE clause to the ON clause. I don't know if the execution plan
is identical to that of an inner join (I never tried to actually convert it
to an inner join, and I'm working on something else at the moment), but it's
satisfactory and it starts off the right way, with a search on b, with an
execution time well within the expected margins.


Keith Medcalf wrote
> I think the query is ill-conceived when written as an outer join.

If anything, to reflect the intent, the query should have been written as
such:

SELECT <fields> FROM (
  SELECT <fields> FROM a LEFT JOIN b ON <cond>
) WHERE b.c=5








Re: [sqlite] LEFT JOIN + WHERE / OR optimisation

2018-01-05 Thread Dinu
To reiterate, Keith: to get the query to execute properly, I didn't change
the LEFT JOIN to an INNER JOIN!
Nope,
I rewrote

SELECT <fields>
FROM
  a
  LEFT JOIN b ON <cond>
WHERE b.c=5

to

SELECT <fields>
FROM
  a
  LEFT JOIN b ON <cond> AND b.c=5
WHERE b.c IS NOT NULL

So I just added a redundant predicate and it runs perfectly, on SQLite!
That's why I said this simple improvement could surely be taken care of in
the optimizer, while the larger discussion of actually changing the outer
join to an inner join, or even tackling the -OR- case, is for sure something
nice to think about, but increasingly more complicated.





Re: [sqlite] LEFT JOIN + WHERE / OR optimisation

2018-01-05 Thread Dinu
Keith Medcalf wrote
> but cannot be re-ordered for OUTER JOINS.

Actually, I think the order requirements (or rather, row grouping
requirements; as far as I can paint it, the requirement is just that all
outer-joined rows come in a bunch for each main join tuple) would not be
violated if the join is made on a unique key on the left side and an index
is used on the right side :) or something similar. I don't know, extensive
algebra must be involved :) Even without index order inference, the main
trunk keys can be sorted in a temp structure to preserve the condition, like
in a GROUP BY query. However, I do see that SQLite seems to actually do it
(scan b before a, I mean) if I rewrite the query as I showed.

select *
  from a, b
 where a.a *= b.a

was replaced by

select *
  from a LEFT JOIN b ON a.a = b.a

Right, right; and with this in mind you can see my problem with the query is
so easy to understand:

My query, on the old format, is:

select *
from a, b
where 
  a.a *= b.a AND
  b.c = 5

My "improved" query, on the old format:

select *
from a, b
where 
  a.a *= b.a AND
  b.c *= 5 AND
  b.c = 5 // (OR IS NOT NULL)

You can see that b.c *= 5 (JOIN ON ... AND b.c=5 ... WHERE b.c IS NOT NULL)
is redundant, because it's just a weaker predicate; I needed to add it just
as an index hint on the join loop, to trigger the right execution plan.


Keith Medcalf wrote
> I would sincerely doubt that there is *any* SQL optimizer or query planner
> that can optimize mis-spoken queries containing OUTER JOINS.

I don't know about mis-spoken; I don't think anything is mis-spoken.
This app is running fine on Maria; I'm in the process of porting it to
SQLite. I wouldn't have picked on this query unless it was lagging behind by
orders of magnitude (2.5 s vs 50 ms). So I think Maria does it (I haven't
bothered to check the execution plan there; I went straight to hacking
SQLite).





Re: [sqlite] LEFT JOIN + WHERE / OR optimisation

2018-01-05 Thread Dinu
Short sum-up:
- The -OR- alternative I provided is not semantically equivalent, I will
work on one that is :)
- The other one, without the -OR- (second post) still stands.





Re: [sqlite] LEFT JOIN + WHERE / OR optimisation

2018-01-05 Thread Dinu
Richard,

Thanks for acknowledging this; you are absolutely right. That's why I stated
that no DB does perfect optimisations, and that computing the alternative
-OR-based closures is probably much harder to tackle. Also, E. Pasma pointed
out that the -OR- queries as I wrote them are not really semantically
equivalent unless the two joins are disjoint.

However, the case of the imperative WHERE NOT NULL implying an INNER JOIN is
just a matter of replacing a predicate with a stronger one, so in all
fairness I imagined it a far smaller overhead than, say, the query
flattener. And I imagine it's a much more common situation, too, especially
when users are adding additional filters via WHERE clauses to a base query,
so it might benefit a lot of users as well. I know it would benefit us, by
not having to rewrite these queries when porting; we are working on x86
servers, and a stick of memory or a hard drive costs less than a
programmer's day for us :)

As for the extra memory: I know that for computing relational closures the
spatial complexity can get big, but only when the structure of the query as
written warrants it in the first place, so it shouldn't weigh heavily on a
query that doesn't have this structure.

This is just my best view on this; obviously it's a policy decision to be
made, so it's no make-it-or-break-it thing. As mentioned before, we are
porting a pretty big system, and when I notice differences with SQLite I jot
them down, in the hope it might benefit you or the millions of users; if not
by changing SQLite, then simply by pointing out the workaround to other
users, such as moving the WHERE condition out to the ON clause. It's not
necessarily a trivial thing for everyone to consider.

Ryan, 

You cannot ask SQL a "method" query; that's where my whole RDBMS
understanding takes me. It nullifies the purpose of queries, as well as all
the efforts you yourselves have put into a lot of things, query flattening
to mention just one. The "same result" is not accidental: the equivalent
queries will produce the same result no matter which data populates the
tables. That is the only definition I know of semantic equivalence. SQL is
declarative, and thus everything that describes the same thing is the same
thing.

E.Pasma,

Thanks for taking the time to make the TC. This is always a huge putdown for
me, because in the end the execution plan depends on the data the indexes
are populated with (via ANALYZE), and our tables are huge, so it is always
hard for me to create a minimal TC.

Indeed, I noticed just now that the 2 queries are not equivalent that way :)
Thanks for pointing that out! I will work on an equivalent -and- optimized
rewrite :)

As for the query plans, here is where the index stats come in: here a
"SCAN a" makes sense, but in our case the number of records in a is many
times that of b and c, and the cardinality of b.d and c.d is also on the
order of 1000; so a "SEARCH b, SEARCH c" works out.

At minimum you should have indexes on b.d, c.d, a.ab, a.ac; but even so and
with adding another 1000 records on a, b and c and running the query:

EXPLAIN QUERY PLAN
SELECT * 
FROM 
  a 
JOIN 
  b ON b=ab AND b.d IN (1,2,3) 
JOIN 
  c ON c=ac AND c.d IN (4,5) 

selectid | order | from | detail
---------|-------|------|----------------------------------------------------
0        | 0     | 0    | SCAN TABLE a
0        | 1     | 1    | SEARCH TABLE b USING INTEGER PRIMARY KEY (rowid=?)
0        | 0     | 0    | EXECUTE LIST SUBQUERY 1
0        | 2     | 2    | SEARCH TABLE c USING INTEGER PRIMARY KEY (rowid=?)

it still plans a "SCAN a" first. So I guess I'll have to backtrack from the
real data to generate a TC.





Re: [sqlite] LEFT JOIN + WHERE / OR optimisation

2018-01-04 Thread Dinu
Thank you for your answer, Keith. I had my problem "fixed" before I wrote
the first mail. Also, with every problem I also provided the fix that
worked, for anyone who might run into the same problem.
However, it's difficult not to get a little frustrated with your answer.

At https://sqlite.org/queryplanner.html I read:

"The best feature of SQL (in all its implementations, not just SQLite) is
that it is a declarative language, not a procedural language. When
programming in SQL you tell the system what you want to compute, not how to
compute it."

And I completely agree with this: "how to compute it" is called relational
algebra, and it's what a query planner should do best. And the two queries
are algebraically identical. "(X ∊ S or X:=null) AND (X is not null)" is
equivalent to "X ∊ S AND X is not null". The two queries only look different
from an imperative programming point of view.

As to why the query is written that way: with the above in mind, I will
contend that there can absolutely never exist a "mistaken" way to write a
query, as long as the description of the predicates is correct and
consistent with the schema. You should consider that quite frequently
queries are the result of one or more levels of logic abstraction (ORM,
DBAL, etc.). In my case, modifying the query was not difficult, but in other
cases one may have few options for rewriting the way the query structure is
generated. The only way to reduce a fabricated query is through relational
algebra, and that is up to the DB, not the programmer, not the abstractions
in between.

In this particular case, the WHERE is optional; depending on parameters, I
want the set of data that is correctly defined as the left join of tables a
and b, or I might want the subset of this join that has a particular
property over the left-joined set. The query was correctly written; to
rewrite it so that the query planner might know how to run it is wrong,
IMHO.

To sum it up: I think it's every DB's intention to optimize a query into the
best possible execution plan. None does it perfectly, but all try to, very
hard. With this intention, I reported a case where the query planner COULD
be improved. I think you will at least agree with me that making it better
can't be wrong. Whether that happens tomorrow, in a year or never is up to
the mercy, resources and priorities of the developers, so I really am not
interested in arguing over this.





Re: [sqlite] LEFT JOIN + WHERE / OR optimisation

2018-01-04 Thread Dinu
Algebraically, having a non-null imperative lookup condition in the WHERE
clause means you have a stronger predicate on the same subject (ALL rows
MUST fit vs. ANY that fit).





Re: [sqlite] LEFT JOIN + WHERE / OR optimisation

2018-01-04 Thread Dinu
I think they are equivalent, if you look closer.

SELECT <fields> FROM a LEFT JOIN b ON a.x=b.x WHERE b.y=5 -is- equivalent to
SELECT <fields> FROM a JOIN b ON a.x=b.x AND b.y=5
SELECT <fields> FROM a JOIN b WHERE a.x=b.x AND b.y=5
SELECT <fields> FROM a LEFT JOIN b ON a.x=b.x AND b.y=5 WHERE b.y IS NOT NULL

All of the above are semantically equivalent. When there is only one LEFT
JOIN, the presence of any non-null, non-alternative condition on the joined
table in the WHERE clause transforms it into an INNER join. There is no
other way to have a non-null value except if the row exists. The reciprocal
is not true, of course.

I don't know how difficult it is to compute the -OR- closure, as that is
harder. But for an imperative non-null condition, I did expect the WHERE
condition to be ported into the ON lookup as an optimisation.
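A quick sanity check of the claimed equivalences, on toy tables a(x) and
b(x, y):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a(x INTEGER);
    CREATE TABLE b(x INTEGER, y INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (1, 5), (2, 6), (3, 5);
""")

# The four forms: LEFT JOIN + WHERE, two inner-join spellings, and the
# LEFT JOIN with the predicate moved into ON plus WHERE ... IS NOT NULL.
queries = [
    "SELECT a.x, b.y FROM a LEFT JOIN b ON a.x=b.x WHERE b.y=5",
    "SELECT a.x, b.y FROM a JOIN b ON a.x=b.x AND b.y=5",
    "SELECT a.x, b.y FROM a JOIN b WHERE a.x=b.x AND b.y=5",
    "SELECT a.x, b.y FROM a LEFT JOIN b ON a.x=b.x AND b.y=5 "
    "WHERE b.y IS NOT NULL",
]
results = [sorted(conn.execute(q).fetchall()) for q in queries]
```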





Re: [sqlite] LEFT JOIN + WHERE / OR optimisation

2018-01-04 Thread Dinu
Probably related:

A compound join with a left outer join generates different execution plans:

LEFT JOIN (
  b
  JOIN c ON <cond>
)
WHERE
  b.something = 5

vs.

LEFT JOIN (
  b
  JOIN c ON <cond> AND b.something = 5
)
WHERE
  b.something IS NOT NULL






[sqlite] LEFT JOIN + WHERE / OR optimisation

2018-01-04 Thread Dinu
Hi all,
I've run into an optimisation problem with a double left join that works as
an "either" clause.

The query is as follows:

SELECT *
FROM
  a
LEFT JOIN
  b ON <cond>
LEFT JOIN
  c ON <cond>
WHERE
  b.someId IN (1,2,3) OR
  c.someId IN (4,5)

This results in a bloated execution plan:
SEARCH a
SEARCH SUBQUERY 1 USING AUTOMATIC COVERING INDEX
SEARCH SUBQUERY 2 USING AUTOMATIC COVERING INDEX

However, the semantically equivalent:

SELECT *
FROM
  a
LEFT JOIN
  b ON <cond> AND b.someId IN (1,2,3)
LEFT JOIN
  c ON <cond> AND c.someId IN (4,5)
WHERE
  b.someId IS NOT NULL OR
  c.someId IS NOT NULL

Gets the proper execution plan:
SEARCH b
SEARCH c
EXECUTE LIST SUBQUERY





Re: [sqlite] Efficient query to count number of leaves in a DAG.

2018-01-02 Thread Dinu
Yes, Lifepillar's way is the more orthodox approach; however, I have always
preferred the path-based one because:
1) One seldom runs queries based only on the descendants map; there is
usually an "AND <some other conditions>" involved; thus the ability to have
one covering index for the whole condition comes in very handy.
2) In ontologies you very often also require "structural" sorting, such as
the chapters in a book needing to be read like 1<1.1<1.1.A<1.1.B<1.2, etc.;
this is very likely to be needed at some point, and it is only achievable
using some modified form of the path field (with alpha-sortable numbers).
This cannot be achieved with a normalized descendant map.





Re: [sqlite] Efficient query to count number of leaves in a DAG.

2018-01-01 Thread Dinu
If a different perspective may be helpful to you:
If moving overhead to writes is an option (i.e. you don't have many, or
time-critical, writes), then the tree-descendants problem can be sped up to
stellar speeds by using a path column.

I.e.,
add a column "path" to the nodes table that would contain something like
"1.2.3" for node 3 that is a descendant of node 2, which is a descendant of
node 1.
Then querying node 2's descendants reduces to the following range:
"path" >= '1.2.' AND "path" < '1.2/' (or you can try LIKE semantics); both
can use an index if you are careful with collation. I found out the hard way
that the range queries are more resilient and portable; SQLite and others
have a pretty awkward way of plugging in the LIKE optimisations, which may
result in the index being skipped for not-so-obvious reasons.

Inserting nodes is trivial, but moving edges requires an algorithm to update
paths (whenever a node's parent changes, all descendants' paths must be
updated). However, in most real-world ontology use scenarios this operation
happens very rarely, and usually in the admin range of functions, so you can
afford this operation, which can be pretty slow.
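A minimal sketch of the path-column scheme (paths stored with a trailing
'.', and '/' chosen because it is the ASCII character immediately after '.',
so the half-open range captures exactly the strings with that prefix):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes(id INTEGER PRIMARY KEY, name TEXT, path TEXT)")
conn.execute("CREATE INDEX nodes_path ON nodes(path)")
conn.executemany("INSERT INTO nodes VALUES (?, ?, ?)", [
    (1, "root",       "1."),
    (2, "chapter",    "1.2."),
    (3, "section",    "1.2.3."),
    (4, "subsection", "1.2.3.4."),
    (5, "appendix",   "1.20."),   # NOT under node 2, despite the "1.2" prefix
])

def subtree(path):
    """All nodes at or below `path` (node itself included), via an
    index range scan: [path, path-with-'/'-for-final-'.')."""
    lo, hi = path, path[:-1] + "/"
    return [r[0] for r in conn.execute(
        "SELECT id FROM nodes WHERE path >= ? AND path < ? ORDER BY path",
        (lo, hi))]
```

The trailing separator is what keeps "1.20." out of "1.2."'s range; a bare
prefix match on "1.2" would wrongly include it.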





Re: [sqlite] Atomic DELETE index optimisation?

2017-12-22 Thread Dinu
Thank you all for the replies,
I will hack this problem one way or another after the holidays and let you
know how it went.
In the meantime, I wish you all happy, peaceful holidays and a great New
Year!

Dinu





Re: [sqlite] Atomic DELETE index optimisation?

2017-12-19 Thread Dinu
Hi sub sk79,
I have gathered the following suggestions from this thread so far:
1) Copy the table -> TRUNCATE -> copy back; this doesn't work: what was
supposed to be TRUNCATE semantics (DELETE FROM without WHERE) has the same
performance as with WHERE.
2) Structure alterations: either drop the table, or drop the indexes. I am
reluctant to do this; my evangelical instinct tells me hacking the semantics
of life might lead to the implosion of Earth :)
3) A "deleted" bit field, presumably the "soft delete" as you call it. I am
analyzing this, but the question here is whether we include the bit in the
indexes. If so, performing a heap of UPDATEs should be even more
inefficient; if we don't include it in the index, the cost of filtering the
rows needs some analysis, which I will probably do. The problem with this
solution is that it has residual effects: we run some pretty complex queries
against this table, with complicated joins, and we have already got some
surprising execution plans that needed query rewriting. So with this
"deleted" bit out of the index pool, we need to check various other queries
to make sure they are still optimized to what we need.

All said and done, 3 hours to delete 15 GB of data seems atrocious even if
you do it by standards resulting from generations of DOD and NSA
inbreeding... so I'm still hopeful for some DB-related solution.





Re: [sqlite] [EXTERNAL] Re: Atomic DELETE index optimisation?

2017-12-18 Thread Dinu
Hick Gunter wrote
> If you are running an SQLite version 3.11.0 or newer, only pages that are
> changed for the first time in the transaction are copied to the WAL file,
> thus extending it.
> 
> While progressing the transaction, the rate of "newly changed" to "changed
> again" pages will shift towards re-reading and rewriting the WAL file copy
> of a page; this will tend to slow down the rate at which the WAL file is
> growing, even at a constant delete rate, until every page has been updated
> at least once, and then stop growing until all the remaining deletes have
> been processed.

Running 3.19; thanks for the explanation. This never occurred to me, and it
makes perfect sense.






Re: [sqlite] Atomic DELETE index optimisation?

2017-12-18 Thread Dinu
Simon Slavin-3 wrote
> DELETE FROM MyTable

We have tried it, and DELETE FROM table (without any WHERE) behaves exactly
the same! I reiterate that there are no FKs or triggers defined. So this is
no fix...





Re: [sqlite] Atomic DELETE index optimisation?

2017-12-18 Thread Dinu
Simon Slavin-3 wrote
> Once you’ve bust the cache things slow down.

I do realize that. However, not nonlinearly. Once I bust the cache,
throughput should drop X times and stay there. Here, the speed decreases
with the progress. Let me put it this way: say the cache size was 0. I would
expect the delete to work very slowly, but near-linearly with the number of
rows being deleted, or the progress of deleting them. Here, the performance
dramatically and constantly decreases as the WAL file grows. You can
literally see how every MB of throughput is slower than the previous :) I am
trying to find out why. For me it's very important for the migration
process.


Simon Slavin-3 wrote
> Have you tried these things ?  Did the time taken improve or get worse ?

Not yet; we will probably implement what you proposed (it's been suggested
before). But if I can find out the reason behind the nonlinear DELETE
behavior, it would still help greatly.





Re: [sqlite] Atomic DELETE index optimisation?

2017-12-18 Thread Dinu
Richard Hipp-3 wrote
> DROP, CREATE, and ALTER are transactional in SQLite, just like DELETE,
> INSERT, and UPDATE.  This has always been the case, and always shall
> be.

Thanks! But still, isn't this incompatible with any lower-than-DB-level
transaction locking mechanism? I.e., should you ever have row-level locking,
this absolutely needs a persistent, cross-connection ROWID index, while any
FOR UPDATE locking semantics need persistent gap-locking indexes... Just a
thought for the distant future; I realize it's not a discussion to have now
:)


Keith Medcalf wrote
> pragma cache_size=262144; -- 1 GB page cache

Actually, I realized that the DB page size is 1 KB. Is this bad? I tried to
run the pragma with 1M pages, to amount to the same 1 GB; there seems to be
a dramatic improvement in throughput at the beginning of the query, but it
quickly succumbs to the nonlinear slow-down-to-a-drag nonetheless.
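For reference, with 1 KB pages a cache_size of 262144 pages is only 256 MB;
reaching 1 GB takes 1,048,576 pages, or the negative form of the pragma,
which counts KiB independently of page size:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA page_size = 1024")

# Positive cache_size counts pages: with 1 KiB pages, 1 GiB = 1,048,576 pages.
conn.execute("PRAGMA cache_size = 1048576")
pages = conn.execute("PRAGMA cache_size").fetchone()[0]

# Negative cache_size is in KiB regardless of page size: -1048576 is 1 GiB.
conn.execute("PRAGMA cache_size = -1048576")
kib = conn.execute("PRAGMA cache_size").fetchone()[0]
```

The negative form is the safer spelling when the page size may vary between
databases.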

Richard, leaving alternate strategies aside (DROP, TRUNCATE, etc.), what
could be the reason behind this nonlinear delete behavior? Why does it slow
down to a grinding halt? It would help me tremendously to know; we are in
the process of migrating more than just this table to SQLite, so knowledge
about the inner workings of SQLite helps us tremendously. Thanks!





Re: [sqlite] Atomic DELETE index optimisation?

2017-12-18 Thread Dinu
Rowan Worth-2 wrote
> The entire point of SQL transactions is to implement this magic!

I beg to differ; you proposed:
> (1) BEGIN TRANSACTION 
> > (2) DROP all indexes from the table 
This is by no means valid SQL semantics; in all RDBMSs I've worked with,
the relation of structure changes to a transaction is undefined at best. Even if
it works now, there's no guarantee the "BEGIN; DROP" behavior will be
consistent any time in the future. So in respect to this, the alternative of
copying to a different table, TRUNCATE, and copying back looks much more
semantically acceptable.

Rowan Worth-2 wrote
> WAL has different concurrency characteristics.

Yeap, it's supposed to do just this, keep readers from starving until a
write is complete; the WAL flush works quite well it seems, keeping the HDD
at a respectable 100% and so the WAL flush will take less than 30s.
Populating the table, on the other hand, takes much longer. 





Re: [sqlite] Atomic DELETE index optimisation?

2017-12-18 Thread Dinu
Keith Medcalf wrote
> If you "moved" that memory (or some of it) from Level 2 to Level 1 you
> would increase performance tremendously.
> 
> pragma cache_size=262144; -- 1 GB page cache

Thanks, I will try that!






Re: [sqlite] [EXTERNAL] Re: Atomic DELETE index optimisation?

2017-12-18 Thread Dinu
Hick Gunter wrote
> SQLite does this too

Thanks!






Re: [sqlite] Atomic DELETE index optimisation?

2017-12-18 Thread Dinu
Rowan Worth-2 wrote
> I'm not sure what you're worried about? Dropping and recreating identical
> indices within a transaction won't cause a visible structure change to
> concurrent readers -- that's the point of a transaction.

I honestly don't see how, in any DB system, the client process would not crash
if the index it's running a cursor on were to be removed. Even if SQLite
were to pull this magic out of the hat, starving client processes for the
lack of an index (a full-scan query would probably take in excess of 30s)
would quickly pile up the clients to the point where one would have to kill
them anyway.
So with this in mind, I'm really not looking for a barbaric fix to this; I'm
more trying to understand the problem and find a viable, semantically
stable solution (and maybe trigger some improvements in SQLite, if there's a
system bug).






Re: [sqlite] Atomic DELETE index optimisation?

2017-12-17 Thread Dinu
Richard Hipp-3 wrote
> Can you try this:

Thanks Richard, as mentioned earlier, any structure change is unacceptable
due to concurrent reader clients.





Re: [sqlite] Atomic DELETE index optimisation?

2017-12-17 Thread Dinu
Keith Medcalf wrote
> I had a similar problem with a multi-terabyte database once upon a time.  

Na, they are not single-use rows, otherwise I'd have used a FIFO :)
Every now and then, a large portion of the table becomes obsolete due to
external factors.





Re: [sqlite] Atomic DELETE index optimisation?

2017-12-17 Thread Dinu
SQLite 3.19.3
CentOS 7 (64 bit)
RAM: 6G total, 4.4G buff/cache, 0.6G unused
Sqlite memory: 133M RES, 0 SWAP, 0 DIRTY - don't know if it's relevant
anymore, the query seems to have entered into another execution phase, it
looks like now it's flushing the WAL.
No other PRAGMA
No FKs, no triggers.








Re: [sqlite] Atomic DELETE index optimisation?

2017-12-17 Thread Dinu
INSERTs are taking constant time as they should; the DELETE is quite
obviously proceeding with blatantly non-linear progress.





Re: [sqlite] Atomic DELETE index optimisation?

2017-12-17 Thread Dinu
Thanks, it's something worth trying.
However, no, that can't be the only reason. Populating the table via script
with the same data takes less than 10 minutes, including side processing... So
the disproportion between insert and delete performance is huge.





Re: [sqlite] Atomic DELETE index optimisation?

2017-12-17 Thread Dinu
For the table swap operation, that is out of the question; the DB is running
on a live system that has multiple clients. Foreign keys or not, any
structure change crashes all clients ("DB structure has changed"). Too high a
price to pay for a delete that may happen routinely.





Re: [sqlite] Atomic DELETE index optimisation?

2017-12-17 Thread Dinu
By "one time only", I mean that, in my understanding, the way most DBs handle
a DELETE is this: cache the ROWIDs while deleting data rows from the main
table and from the indexes; then, when all ROWIDs are explored, sort the
ROWID stream and prune the trees from the sorted stream. This is both highly
efficient (just like inserts, deletes of already-ordered records are very
efficient) and highly parallelizable.
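A rough userland approximation of that idea — collect the matching ROWIDs first, sort them, then prune in ascending order so the B-tree is walked sequentially — might look like this (the `t`/`v` schema is invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t(id INTEGER PRIMARY KEY, v INTEGER)")
con.executemany("INSERT INTO t(v) VALUES (?)", [(i % 7,) for i in range(10000)])
con.execute("CREATE INDEX t_v ON t(v)")

# 1. Explore: collect every ROWID matching the delete condition.
# 2. Sort the ROWID stream.
# 3. Prune: delete in ascending ROWID order, so the main B-tree is
#    visited sequentially instead of at random positions.
rowids = sorted(r[0] for r in con.execute("SELECT id FROM t WHERE v = 3"))
con.executemany("DELETE FROM t WHERE id = ?", ((r,) for r in rowids))
con.commit()
remaining = con.execute("SELECT COUNT(*) FROM t").fetchone()[0]
```

Whether this beats a single `DELETE` in practice depends on how SQLite orders its own B-tree updates, so treat it as an experiment, not a guaranteed win.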





Re: [sqlite] Atomic DELETE index optimisation?

2017-12-17 Thread Dinu
Ok, so to address the size of the problem properly:

We have ~32M records out of which we are deleting about ~24M.
The raw size of the DB is about 20G out of which 15G goes away. Under these
circumstances any page cache becomes irrelevant.
The delete operation, which is a single delete run from the sqlite3 tool, has
been running for 3 hours now.
The WAL size is in excess of 9G atm.
The WAL is growing ever more slowly (it was about 10M/s; now it's 1M every 2s,
and still slowing down). This points to a non-linear process that I can link
only to the B-trees; they are the only non-linear component I can think of
that could cause this slow-down to a drag. The CPU is pegged while the HDD is
nearly idle, which hints at the same issue.

In reply to your remarks:
- I understand the B-trees need to be modified. However, pruning (and maybe
rebalancing) and writing to disk one node at a time is hugely stressful and
inefficient when you are pruning half of a 32M-node tree. Since the operation
is atomic, the indexes could be updated once and only once.

So, what to do? I think this delete may never end... in the time it took to
write this reply, the WAL growth has slowed to 1M every 4s.
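One commonly suggested workaround — a sketch, under the assumption that splitting the delete into several committed transactions is acceptable for your readers — is to delete in bounded chunks, committing between batches so the WAL never has to absorb the whole delete at once. The `t`/`obsolete` names below are hypothetical:

```python
import sqlite3

def delete_in_chunks(con, chunk=10000):
    """Delete obsolete rows in bounded batches, committing between
    them so each transaction (and the WAL) stays small."""
    total = 0
    while True:
        cur = con.execute(
            "DELETE FROM t WHERE rowid IN "
            "(SELECT rowid FROM t WHERE obsolete = 1 LIMIT ?)", (chunk,))
        con.commit()
        if cur.rowcount == 0:
            break
        total += cur.rowcount
    return total

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t(id INTEGER PRIMARY KEY, obsolete INTEGER)")
con.executemany("INSERT INTO t(obsolete) VALUES (?)",
                [(1 if i % 4 == 0 else 0,) for i in range(20000)])
deleted = delete_in_chunks(con, chunk=3000)
```

The trade-off is that the delete is no longer atomic: a crash mid-way leaves some obsolete rows behind, which is fine only if the predicate can simply be re-run.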





Re: [sqlite] Atomic DELETE index optimisation?

2017-12-17 Thread Dinu Marina
Update: we are also using WAL; I have noticed during a huge delete, the 
WAL also grows huge. Could this also be a reason for slowness, that 
SQLite duplicates every block that hosts a delete? Is there any way to 
work around this?



On 17.12.2017 22:53, Dinu Marina wrote:

Hi all,
It became apparent to me from performance measurements that the DELETE 
operation is very slow when operating on large recordsets with 
indexes involved. My deduction is that SQLite updates the indexes for 
every deleted row, which is painstakingly slow...


Since the DELETE should be atomic, the index update could also be 
atomic, and a mass pruning of the index tree is hugely faster than 
individual removals.


My question is: is my deduction correct? Is there any way to 
fix/improve this in userland? Or are there prospects for this to be 
improved in SQLite in the foreseeable future?


Thanks,
Dinu





[sqlite] Atomic DELETE index optimisation?

2017-12-17 Thread Dinu Marina

Hi all,
It became apparent to me from performance measurements that the DELETE 
operation is very slow when operating on large recordsets with indexes 
involved. My deduction is that SQLite updates the indexes for every 
deleted row, which is painstakingly slow...


Since the DELETE should be atomic, the index update could also be 
atomic, and a mass pruning of the index tree is hugely faster than 
individual removals.


My question is: is my deduction correct? Is there any way to fix/improve 
this in userland? Or are there prospects for this to be improved in 
SQLite in the foreseeable future?


Thanks,
Dinu



Re: [sqlite] EXISTS optimisation?

2017-11-28 Thread Dinu
Quick note to self and others:
IN() and EXISTS() in all RDBs I know of are the uncle no one mentions; that
is to say, they have different compare semantics than JOIN, so the naive
strategy is to evaluate them as dependent subqueries, not correlated ones,
which would be consistent with the behavior I noticed. However, I do know
that MariaDB and Postgres do a decent job at optimizing EXISTS() (which I
think is by all means the correct semantics for this intent). But there's by
no means a gold standard across RDBs, so that's why it would be very
useful to have some documentation on it, as it's one of the migration
pitfalls.

I'm still in debt with the TC; I will work on it in the next days.
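Until such documentation exists, EXPLAIN QUERY PLAN is the quickest way to see which strategy SQLite actually picks for a given EXISTS/IN form. A small probe (schema invented for the example; plan text varies across SQLite versions, so treat the output as informational):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders(id INTEGER PRIMARY KEY, customer INTEGER);
    CREATE TABLE flagged(customer INTEGER);
    CREATE INDEX flagged_customer ON flagged(customer);
""")
exists_q = ("SELECT id FROM orders o WHERE EXISTS "
            "(SELECT 1 FROM flagged f WHERE f.customer = o.customer)")
in_q = "SELECT id FROM orders WHERE customer IN (SELECT customer FROM flagged)"
# Each EXPLAIN QUERY PLAN row is (id, parent, notused, detail);
# the detail string in column 3 names the chosen strategy.
plans = {q: [row[3] for row in con.execute("EXPLAIN QUERY PLAN " + q)]
         for q in (exists_q, in_q)}
for q, steps in plans.items():
    print(q, "->", steps)
```

Comparing the two plan listings side by side makes it immediately visible whether a form is decorrelated or re-evaluated per outer row.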





Re: [sqlite] EXISTS optimisation?

2017-11-28 Thread Dinu
Thanks Clemens,

I triple-checked it and it is indeed generating different execution plans,
with the queries being absolutely equivalent. I will try to produce a
minimal test case (right now the query where this occurs is a 100-line
monster).

However, I am a bit confused by the examples you provided:
The 3rd query is equivalent to the first 2 only if the lookup table has
a unique index on x (which it does). However, this would make a very
restricted case of why you would use EXISTS().

In my case, this is completely the opposite: x is definitely not unique in the
lookup table; that's precisely why I'm using EXISTS or IN, to avoid the row
multiplication generated by a JOIN.
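The row-multiplication point is easy to demonstrate: with a non-unique x in the lookup table, a plain JOIN repeats each outer row once per match, while EXISTS yields it once (schema invented for the example):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE main_t(x INTEGER);
    CREATE TABLE lookup(x INTEGER);          -- x is NOT unique here
    INSERT INTO main_t(x) VALUES (1), (2);
    INSERT INTO lookup(x) VALUES (1), (1), (1);
""")
# The JOIN returns row x=1 three times, once per matching lookup row...
joined = con.execute(
    "SELECT m.x FROM main_t m JOIN lookup l ON l.x = m.x").fetchall()
# ...while EXISTS returns it exactly once.
exists = con.execute(
    "SELECT x FROM main_t m WHERE EXISTS "
    "(SELECT 1 FROM lookup l WHERE l.x = m.x)").fetchall()
```

A `SELECT DISTINCT ... JOIN` would collapse the duplicates again, but only at the cost of an extra dedup step.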





[sqlite] Weird (slow) TEMP B-TREE ORDER BY

2015-03-15 Thread Dinu Marina
Yeap. And as I said, the result sizes predicted by ANALYZE are pretty 
accurate; the data is pretty linearly distributed. The unsorted plan is 
perfect, so a wrong prediction is not at fault, but the sorted plan is 
completely different. The sorted column is an aggregate anyway, so no 
index can help, but a sort in place could deal with the problem. After 
some more research it appears sqlite does not have a sort buffer 
(someone correct me if I'm wrong), but then what is the TEMP B-TREE?

On 14.03.2015 18:40, Bert Huijben wrote:
>
>> -Original Message-
>> From: sqlite-users-bounces at mailinglists.sqlite.org [mailto:sqlite-users-
>> bounces at mailinglists.sqlite.org] On Behalf Of Dinu Marina
>> Sent: donderdag 12 maart 2015 11:34
>> To: sqlite-users at mailinglists.sqlite.org
>> Subject: Re: [sqlite] Weird (slow) TEMP B-TREE ORDER BY
>>
>> Hi Richard,
>>
>> Like said, we've already tried:
>> CREATE TEMP TABLE x AS (...query without outer sort...); CREATE xind ON
>> x (price); SELECT * FROM x ORDER BY 2
>> but it also takes 30 seconds; for some really esoteric reason, CREATE
>> TABLE x AS ... takes all of those 30s, even though the execution plan
>> looks identical to the simple select (the same 72 steps). This is what I
>> would dare call "spooky", any explanation would be appreciated here too.
> Did you try running analyze after adding some data and before looking at the
> query plans?
>
> With indexes that contain multiple columns adding the sqlite_stat1 table
> (via analyze, or in another way) may introduce huge differences in query
> plans and performance.
>
> We found that out the hard way in Subversion, because we have a pretty much
> static first column wc_id in most of our indexes in preparation for some new
> features. Even when it used the right indexes in query plans, it sometimes
> didn't use all the columns that it should.
>
> We now add an initial sqlite_stat1 table and we can trust to see a stable
> (good) performance from Sqlite.
> https://www.sqlite.org/optoverview.html#manctrl
>
>
>   Bert
>
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



[sqlite] Releasing a read (SHARED) lock

2015-03-13 Thread Dinu Marina
You should also check, as R.Smith and Slavin pointed out, that you don't 
accidentally have a BEGIN [DEFERRED] somewhere, as that would trigger 
exactly this behavior: it will lock on the first select (the first 
step() actually).

On 13.03.2015 02:44, Simon Slavin wrote:
> On 13 Mar 2015, at 12:17am, Barry  wrote:
>
>> On 13 March 2015 at 01:21, Dinu Marina  wrote:
>>
>>> You should be also aware of a more common pitfall: unclosed result sets.
>>> Any lock is held until you read PAST the last row or you call stmt_finalize
>>> (or the equivalent abstraction in your DBAL). Always close select
>>> statements.
>> Hi Dinu,
>>
>> Am I correct in my understanding then that it is not enough to call
>> sqlite3_reset, I must call sqlite3_finalize after I am done with a
>> statement?
> In order to dispose of a statement you can do either sqlite3_reset() or 
> sqlite3_finalize().  The common mistake is to do something like this:
>
> SELECT * FROM myTable WHERE rowid = 36
>
> and then not do either _reset() or _finalize() because you know you have 
> asked for only one row so you expect SQLite to have done a _finalize() for 
> you.
>
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



[sqlite] Releasing a read (SHARED) lock

2015-03-13 Thread Dinu Marina
Na, it should be the same; everywhere I said finalize you can replace 
with reset. I had this problem come up over and over from people forgetting 
to finalize (there was no intention to reuse the statement), but reset 
is the same. My point is just that if you don't have explicit 
transactions, a read lock is acquired by the first step() and not 
released until finalize() or reset(), or until step() returns SQLITE_MISUSE 
due to one too many calls. You should check one-rowers (COUNT is the 
usual suspect for me) first to make sure they are finalized (reset) 
properly, because they are the easiest to miss, since they don't have an 
ugly loop following. Other than that, there is no reason a lock should 
be held in autocommit mode.

You could debug the reader client this way: create a second connection 
and try to create a write lock (BEGIN EXCLUSIVE) at various points. It 
will return SQLITE_BUSY when you have a leaked lock.
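That probe can be scripted; the sketch below (rollback-journal mode, `timeout=0` so a blocked lock fails immediately instead of waiting) shows a half-read SELECT holding its read lock until the cursor is read past the last row:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "probe.db")
reader = sqlite3.connect(path, isolation_level=None)   # autocommit mode
reader.execute("CREATE TABLE t(x INTEGER)")
reader.execute("INSERT INTO t VALUES (1), (2)")

def writable():
    """Probe from a second connection: can a write lock be taken right now?"""
    probe = sqlite3.connect(path, timeout=0)
    try:
        probe.execute("BEGIN EXCLUSIVE")
        probe.execute("ROLLBACK")
        return True
    except sqlite3.OperationalError:        # "database is locked"
        return False
    finally:
        probe.close()

cur = reader.execute("SELECT x FROM t")
cur.fetchone()                  # statement still active: read lock held
locked_during_read = not writable()
cur.fetchall()                  # read PAST the last row: statement resets
unlocked_after = writable()
```

Dropping a `writable()` call at suspect points in the reader narrows down exactly which statement leaks its lock.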


On 13.03.2015 02:17, Barry wrote:
> On 13 March 2015 at 01:21, Dinu Marina  wrote:
>
>> You should be also aware of a more common pitfall: unclosed result sets.
>> Any lock is held until you read PAST the last row or you call stmt_finalize
>> (or the equivalent abstraction in your DBAL). Always close select
>> statements.
>
> Hi Dinu,
>
> Am I correct in my understanding then that it is not enough to call
> sqlite3_reset, I must call sqlite3_finalize after I am done with a
> statement?
>
> Cheers,
>
>   - Barry
>
>
>> On Mar 12, 2015 11:40 AM, "R.Smith"  wrote:
>>
>>>
>>> On 2015-03-12 04:38 AM, Barry wrote:
>>>
>>>> Hello everybody,
>>>>
>>>> I have a situation where two processes are accessing the same SQLite
>>>> database. One process only reads from the database, one process reads
>> and
>>>> writes.
>>>>
>>>> These processes keep a single database connection open for the lifetime
>> of
>>>> the process.
>>>>
>>>> It seems to me that once the reader process accesses the database (after
>>>> it
>>>> performs its first SELECT statement), it maintains a lock on the
>> database
>>>> until the connection is closed (when the program is exited). This
>> prevents
>>>> the writer process from updating the database.
>>>>
>>> Hi Barry,
>>>
>>> This is the usual situation when one of your transactions in the "reading
>>> only" database does not finalize. i.e. you started a transaction there
>> and
>>> did not end it with either "END TRANSACTION" or "COMMIT" or "ROLLBACK".
>>>
>>> In the non-WAL DB it will simply hold the lock preventing changes. In the
>>> WAL DB it will hold the lock for its own view of the data but let the
>> other
>>> writer write.. however it won't see the changes for itself.
>>>
>>> This is very common actually. Just find every transaction you start
>>> (explicitly or implicitly) and make sure you end it and when you end it,
>>> see what the return value is from SQLite and whether it reports any
>> error.
>>> HTH!
>>> Ryan
>>>
>>>
>>>
>>>> I tried changing to WAL. This made the writer process able to commit its
>>>> changes, but now the reader does not see any modifications made to the
>>>> database until it is restarted (It seems to see a snapshot of the DB at
>>>> the
>>>> time of its first read).
>>>>
>>>> I am using prepared statements: On opening the DB, I create all my
>>>> prepared
>>>> statements. When I need to execute a statement, I bind to the statement,
>>>> call sqlite3_step (possibly multiple times), then call sqlite3_reset. I
>> do
>>>> not finalise the statements until the program closes.
>>>>
>>>> In order to simulate 'save' behaviour, the writer process always holds a
>>>> transaction open. When the user chooses 'save', the current transaction
>> is
>>>> committed and a new transaction is begun. (I understand that the reader
>>>> will not see any changes in the uncommitted transaction, but is not
>> seeing
>>>> any committed transactions either).
>>>>
>>>> I have checked quite thoroughly through my code and cannot find any
>>>> instances of statements executed without a sqlite3_reset quickly
>>>> following.
>>>>
>>>> Is this intended behaviour - that once a connection has performed a
>> read,
>>>> it maintains its lock on

[sqlite] Releasing a read (SHARED) lock

2015-03-12 Thread Dinu Marina
There is also a more obscure side-effect exhibited by sqlite only: if 
you issue consecutive unfinalized select statements, it never releases 
the read lock (although they should be atomic), but it does trip the 
deadlock mechanism; any waiting writer will receive SQLITE_BUSY at 
the very moment the second read statement is issued, regardless of the 
busy_timeout value.
This happens in non-WAL mode.

The usual culprits are one-row results, like SELECT COUNT(*)... 
usually the client expects one row and reads one row, but you need to 
either try to read 2 rows or explicitly free the cursor.

Do note (for evangelism sake) that DBs with random-access cursors do not 
free the read lock even when you read past the last record, because you 
could decide to rewind. So always free result sets.


[sqlite] Releasing a read (SHARED) lock

2015-03-12 Thread Dinu Marina
You should be also aware of a more common pitfall: unclosed result sets.
Any lock is held until you read PAST the last row or you call stmt_finalize
(or the equivalent abstraction in your DBAL). Always close select
statements.
On Mar 12, 2015 11:40 AM, "R.Smith"  wrote:

>
>
> On 2015-03-12 04:38 AM, Barry wrote:
>
>> Hello everybody,
>>
>> I have a situation where two processes are accessing the same SQLite
>> database. One process only reads from the database, one process reads and
>> writes.
>>
>> These processes keep a single database connection open for the lifetime of
>> the process.
>>
>> It seems to me that once the reader process accesses the database (after
>> it
>> performs its first SELECT statement), it maintains a lock on the database
>> until the connection is closed (when the program is exited). This prevents
>> the writer process from updating the database.
>>
>
> Hi Barry,
>
> This is the usual situation when one of your transactions in the "reading
> only" database does not finalize. i.e. you started a transaction there and
> did not end it with either "END TRANSACTION" or "COMMIT" or "ROLLBACK".
>
> In the non-WAL DB it will simply hold the lock preventing changes. In the
> WAL DB it will hold the lock for its own view of the data but let the other
> writer write.. however it won't see the changes for itself.
>
> This is very common actually. Just find every transaction you start
> (explicitly or implicitly) and make sure you end it and when you end it,
> see what the return value is from SQLite and whether it reports any error.
>
> HTH!
> Ryan
>
>
>
>> I tried changing to WAL. This made the writer process able to commit its
>> changes, but now the reader does not see any modifications made to the
>> database until it is restarted (It seems to see a snapshot of the DB at
>> the
>> time of its first read).
>>
>> I am using prepared statements: On opening the DB, I create all my
>> prepared
>> statements. When I need to execute a statement, I bind to the statement,
>> call sqlite3_step (possibly multiple times), then call sqlite3_reset. I do
>> not finalise the statements until the program closes.
>>
>> In order to simulate 'save' behaviour, the writer process always holds a
>> transaction open. When the user chooses 'save', the current transaction is
>> committed and a new transaction is begun. (I understand that the reader
>> will not see any changes in the uncommitted transaction, but is not seeing
>> any committed transactions either).
>>
>> I have checked quite thoroughly through my code and cannot find any
>> instances of statements executed without a sqlite3_reset quickly
>> following.
>>
>> Is this intended behaviour - that once a connection has performed a read,
>> it maintains its lock on the database for its lifetime? Is SQLite smart
>> enough to know that the pages it holds in cache of the reader are invalid
>> after the writer has made changes to the DB on disk?
>>
>> If this is not the intended behaviour - is there a way I can find out
>> which
>> statements are causing the lock to be held open? Or can I force SQLite to
>> discard its cache?
>>
>> Any help would be appreciated.
>>
>> Regards,
>>
>> Barry Smith
>> ___
>> sqlite-users mailing list
>> sqlite-users at mailinglists.sqlite.org
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>
>
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>


[sqlite] Weird (slow) TEMP B-TREE ORDER BY

2015-03-12 Thread Dinu Marina
Hi Richard,

Like I said, we've already tried:
CREATE TEMP TABLE x AS (...query without outer sort...); CREATE INDEX xind ON 
x (price); SELECT * FROM x ORDER BY 2
but it also takes 30 seconds; for some really esoteric reason, CREATE 
TABLE x AS ... takes all of those 30s, even though the execution plan 
looks identical to that of the simple select (the same 72 steps). This is 
what I would dare call "spooky"; any explanation would be appreciated here too.

Thanks


[sqlite] UPDATE Problem

2015-03-12 Thread Dinu Marina
Hi Peter,

From https://www.sqlite.org/datatype3.html:

"When text data is inserted into a NUMERIC column, the storage class of 
the text is converted to INTEGER or REAL (in order of preference) if 
such conversion is lossless and reversible"; basically any fraction you 
insert into a NUMERIC column with at most 15 digits will be converted to 
REAL (float).

What you are looking for is the DECIMAL data type, which is not supported by 
sqlite, and NUMERIC is not supposed to be a replacement. NUMERIC is only 
useful as an extension beyond the range of INTEGER or FLOAT, for huge 
numbers or extreme precision, in which case it will behave as TEXT and 
it's the responsibility of the client app to convert back to some 
numeric representation.

Do note however, that if you're dealing with only positive fractions, 
you can safely use TEXT to emulate DECIMAL, as long as all strings are 
the same length and right-aligned:
" 99.50" < "100.50"
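Both behaviours are easy to verify from Python (the column names mirror the thread's example; the padding width of 10 is an arbitrary choice):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# 1) NUMERIC affinity converts numeric-looking text back to REAL, which
#    is why UPDATE ... SET Value = Value||'0' appears to do nothing.
con.execute("CREATE TABLE tr(value NUMERIC)")
con.execute("INSERT INTO tr VALUES ('12.5')")        # stored as REAL 12.5
appended = con.execute("SELECT value||'0' FROM tr").fetchone()[0]  # text
con.execute("UPDATE tr SET value = value||'0'")      # affinity re-applied
stored = con.execute("SELECT value, typeof(value) FROM tr").fetchone()

# 2) TEXT can emulate DECIMAL for positive values if every string is
#    padded to the same width, so string order matches numeric order.
con.execute("CREATE TABLE prices(v TEXT)")
con.executemany("INSERT INTO prices VALUES (?)",
                [("%10.2f" % float(x),) for x in ("100.50", "999.00", "7.25")])
ordered = [r[0].strip() for r in con.execute("SELECT v FROM prices ORDER BY v")]
```

The SELECT shows the '12.50' text you would like to store, but writing it back through the NUMERIC column converts it straight back to REAL 12.5, while the padded-TEXT table sorts in true numeric order.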

On 12.03.2015 01:27, Peter Haworth wrote:
> I have a table, Transactions, with a column , Value, of type NUMERIC.  The
> Value column is supposed to have 2 decimal places in all rows but some have
> only one.
>
> To correct this I issued the following command:
>
> UPDATE Transactions SET Value=Value||0 WHERE substr(Value,-2,1)='.'
>
> No errors on execution but nothing was changed.
>
> To help me figure out why that didn't work, I issued the following command:
>
> SELECT Value||0 FROM Transactions WHERE substr(Value,-2,1)='.'
>
> This time, all the values with only 1 decimal place were listed with a
> trailing zero added.
>
> On the face of it, the WHERE statement works fine in a SELECT statement but
> does not find any rows in an UPDATE statement.  Either that or the SET is
> not calculating the correct value.
>
> Is this a bug or am I missing something?
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



[sqlite] Weird (slow) TEMP B-TREE ORDER BY

2015-03-11 Thread Dinu Marina
Hi guys,
Sorry to be a pain, but does anyone have an idea about this issue? I 
have to recommend switching DBs this week based on this, and I am 
reluctant to do so, but we haven't been able to figure out a fix either.

Thanks,
Dinu


[sqlite] Weird (slow) TEMP B-TREE ORDER BY

2015-03-10 Thread Dinu Marina
Hi Donald,
Thanks for the answer, I'll try to address these issues:

1,4)
I was lazy and got the DDL from the GUI without looking at it. Here is 
the proper form from the master table (it shouldn't affect replicability 
though):

CREATE TABLE [materializedInstances] ([provider] TEXT, [instanceId] 
TEXT, [packageId] INTEGER, [date] TEXT, [duration] INTEGER, 
[transportType] INTEGER, PRIMARY KEY ([provider], [instanceId]))
CREATE INDEX "1ab7a7b8a24dce53a34a650587c5bfdabcc785a2" ON 
[materializedInstances] ( "packageId" , "provider" , "instanceId" , 
"date" , "duration" , "transportType")

CREATE TABLE "materializedPrices" ("id" INTEGER ,"price" FLOAT 
,"perPerson" float ,"currency" TEXT ,"promotionType" INTEGER 
,"promotionName" TEXT ,"provider" text ,"instanceId" TEXT 
,"accomodationHash" TEXT ,"accomodationName" TEXT ,"start" DATE ,"end" 
DATE ,PRIMARY KEY ("id"))
CREATE INDEX "ffd0d22435eccb1020821f3ec70d038dfc7e5d91" ON 
"materializedPrices"("provider","instanceId","start","end","accomodationHash","price")

Seems my GUI does some postprocessing on the DDL, splitting it into 
constraint + index (not sure how wrong that is).

2) As this is basically a hierarchical query it would be best suited for 
CTEs, but I wouldn't go there as I'm sure optimizations aren't in place 
for those in sqlite. The problems with joins are:
    1) An optimization problem regarding WHERE and GROUP BY, which seem 
unable to both use the same index in sqlite (I'll get back with that 
later)
    2) Since this aggregates over chunks of the same table, a JOIN is 
always of C1*C2*logN complexity, while iterating with nested selects 
is (C1+C2)*logN

Example:

    SELECT
        MIN(a.d) + MIN(b.d)
    FROM
        t a JOIN t b ON a.y = b.y
    WHERE
        a.cond = 1 AND
        b.cond = 2
    GROUP BY
        a.y

is basically squared time (looks like cubic, but I think that's the planner 
bug I was talking about; I need to reproduce a minimal test case), while

    SELECT
        (SELECT
            MIN(d)
         FROM
            t
         WHERE
            t.y = j.y AND
            t.cond = 1
        ) +
        (SELECT
            MIN(d)
         FROM
            t
         WHERE
            t.y = j.y AND
            t.cond = 2
        )
    FROM
        (SELECT DISTINCT y FROM t) j

Although much uglier, it gets there in index-search time.

3) Yeap. And as I said, the predicted result sizes by ANALYZE are pretty 
accurate, the data is pretty linearly distributed. The unsorted plan is 
perfect, so a wrong prediction is not at fault, but the sorted plan is 
completely different.

Thanks,
Dinu

On 10.03.2015 17:17, Donald Griggs wrote:
> Hi, Dinu,
>
> I'll have to leave it to those more knowledgeable to respond to other
> aspects of your question, but I was struck by the following.
>
> 1) Regarding columns such as "provider TEXT(20)"
>
> Do you really mean that values in such columns might need to contain up to
> two giga-characters (2 * 10**9)?   Sqlite will simply interpret this as
> "TEXT" plain and simple -- but if your actual data contains
> monstrously-long values, AND these are part of primary keys, then I can
> imagine this would disastrously affect performance.
>
> 2) Complexity.
>
> I understand you said could not achieve good performance with
> straightforward joins.   I wondered if it might be helpful to post your
> cleanest, simplest JOIN-based SELECT (and indexes and query plan) that you
> thought should have good performance, yet does not.
>
>
> 3) Analyze.   You've likely already run the ANALYZE command, right?  (A
> one-time run of VACUUM wouldn't hurt either, I suppose.)
>
> 4) Regarding: "CREATE INDEX sqlite_autoindex_materializedInstances_1 ON
> materializedInstances (provider,instanceId);
>
> Am I right that since (provider,instanceId) is already PRIMARY KEY, then
> this index is redundant? (Even so, I would only expect it to take up disk
> space and slow down inserts/deletes, but not slow down SELECTs.)
>
>
> Donald
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



[sqlite] Weird (slow) TEMP B-TREE ORDER BY

2015-03-10 Thread Dinu Marina
Forgot to add DDL for the outer table:

CREATE TABLE materializedInstances (
 provider TEXT(20),
 instanceId TEXT(20),
 packageId INTEGER,
 date TEXT(20),
 duration INTEGER,
 transportType INTEGER,
 CONSTRAINT MATERIALIZEDINSTANCES_PK PRIMARY KEY (provider,instanceId)
);

CREATE INDEX 1ab7a7b8a24dce53a34a650587c5bfdabcc785a2 ON materializedInstances
 (packageId,provider,instanceId,date,duration,transportType);

CREATE INDEX sqlite_autoindex_materializedInstances_1 ON materializedInstances
 (provider,instanceId);
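[Editor's note] The thread's subject, a TEMP B-TREE appearing in the plan for ORDER BY, is easy to reproduce and inspect with EXPLAIN QUERY PLAN. A small sketch with Python's sqlite3; the two-column schema is invented purely for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x INTEGER, y INTEGER)")
con.execute("CREATE INDEX t_x ON t (x)")

def plan(sql):
    # Concatenate the 'detail' column of the EXPLAIN QUERY PLAN rows.
    return " ".join(r[-1] for r in con.execute("EXPLAIN QUERY PLAN " + sql))

# No index delivers rows in y order, so SQLite must sort through a
# temporary b-tree after scanning the table.
unsorted = plan("SELECT * FROM t ORDER BY y")
# The index on x already yields rows in the requested order, so no
# separate sort step appears in the plan.
indexed = plan("SELECT * FROM t ORDER BY x")
print(unsorted)
print(indexed)
```

When a query sorts on columns no index covers, the "USE TEMP B-TREE FOR ORDER BY" step is where the time goes; an index matching the ORDER BY removes it.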



Re: [sqlite] Problem with sqlite3_column_origin_name and AS Clause

2009-08-20 Thread Dinu Scheppelmann (DAISY)
"Igor Tandetnik" <itandet...@mvps.org> wrote in message
news:h6bg82$td...@ger.gmane.org...
> Dinu Scheppelmann (DAISY) wrote:
> > Unfortunately when I get the column names by function
> > sqlite3_column_origin_name(), the result columns have the names
> > "DokId", "Name" and "Name" again - instead of "Id", "PatientName" and
> > "Name"!!
>
> You want sqlite3_column_name
>
> Igor Tandetnik

Thanks for that fast help;-)

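[Editor's note] Igor's answer, use sqlite3_column_name() rather than sqlite3_column_origin_name(), carries over to language wrappers too: in Python's sqlite3 module, cursor.description exposes the sqlite3_column_name() values, so AS aliases come through. A sketch using a trimmed-down version of the poster's schema (column types are guesses):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Dokument  (DokId INTEGER, AkteId INTEGER, SenderId INTEGER);
    CREATE TABLE Patient   (Id INTEGER, Name TEXT);
    CREATE TABLE Einsender (Id INTEGER, Name TEXT);
""")
cur = con.execute("""
    SELECT Dokument.DokId AS Id, Patient.Name AS PatientName, Einsender.Name
    FROM Dokument, Patient, Einsender
    WHERE Dokument.AkteId = Patient.Id AND Dokument.SenderId = Einsender.Id
""")
# description is populated as soon as the SELECT executes, even with no rows;
# the first element of each entry is the sqlite3_column_name() value.
names = [d[0] for d in cur.description]
print(names)
```

The aliased columns report as "Id" and "PatientName", while the unaliased Einsender.Name stays "Name": exactly what the poster wanted from the C API.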


[sqlite] Problem with sqlite3_column_origin_name and AS Clause

2009-08-17 Thread Dinu Scheppelmann (DAISY)

Hello to all SQLite people
(that's my first post - please be patient:-))

I already searched the mailings from 2008/2009 but could not find a post
that describes this problem.

I have a SELECT over a few tables, and since many tables use identical names
for some columns (like "Id"), I rename the columns using the AS clause:

SELECT Dokument.DokId AS Id, Patient.Name AS PatientName, Einsender.Name
FROM Dokument, Patient, Einsender
WHERE Dokument.AkteId = Patient.Id
AND Dokument.SenderId = Einsender.Id;

Unfortunately when I get the column names by function
sqlite3_column_origin_name(), the result columns have the names "DokId",
"Name" and "Name" again - instead of "Id", "PatientName" and "Name"!!

For me this is severe because I pass back the whole result set of a query as
one nice "object" and the column names have to be unique! (hope you
understand what I'm saying)

Is it possible to fix this "bug" or is there a workaround? (a VIEW does not
help either)

Thanks for any help!
Dinu
==
DAISY Archivierungssysteme GmbH
Preysingstr. 2
68526 Ladenburg
www.adakta.de

Tel: +49-6203-40400
Fax: +49-6203-404020
Mail: dinu.scheppelm...@adakta.de

HRB: 432841 Amtsgericht Mannheim
Geschäftsführer: Dr. Dinu Scheppelmann
==

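[Editor's note] The uniqueness concern is real: if the result set is turned into a per-row mapping keyed by column name, duplicate names silently collapse. A sketch of the failure mode and the alias-based fix, again with Python's sqlite3 (tables, rows, and alias names invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Patient   (Id INTEGER, Name TEXT);
    CREATE TABLE Einsender (Id INTEGER, Name TEXT);
    INSERT INTO Patient   VALUES (1, 'Miller');
    INSERT INTO Einsender VALUES (1, 'Smith');
""")

def row_as_dict(sql):
    # Build a {column name: value} mapping for the first result row.
    cur = con.execute(sql)
    cols = [d[0] for d in cur.description]
    return dict(zip(cols, cur.fetchone()))

# Without aliases the two 'Name' columns share one key, so one value is lost.
clashed = row_as_dict(
    "SELECT Patient.Name, Einsender.Name FROM Patient, Einsender")
# Aliasing keeps both values addressable under distinct keys.
aliased = row_as_dict(
    "SELECT Patient.Name AS PatientName, Einsender.Name AS SenderName "
    "FROM Patient, Einsender")
print(clashed, aliased)
```

The unaliased query yields a dict with a single "Name" key (the later column wins), which is why aliasing every ambiguous column is the practical workaround here.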