Re: [sqlite] Discrepancy with indexing and WHERE clause with AND/OR

2014-11-18 Thread RSmith


On 2014/11/17 14:48, RP McMurphy wrote:
Upon further analysis it appears that the data "shape" is different in different periods within the table. That is, some sections 
have the inverse shape to other sections. So it looked like query times would change over time but actually they are changing 
because different portions of the table are being accessed. Is it possible to tell sqlite to analyze different sections and keep 
separate a record for each? 


Ok - this one is tricky - but I bet folks on here would love a go at cracking that. Can you post a moderate size section of the DB 
which includes the problem areas on a file-share somewhere (feel free to obscure the data if it is sensitive but try to keep the 
shape in tact - to be sure, whatever you produce should still cause the problem you described). Then post the queries that produce 
the bad timings (multiple to show the relevant discrepancies) and maybe some description of what it is you really want the intended 
result to look like (in case the queries produce somewhat different results to what you need).


Sounds like an interesting problem.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Discrepancy with indexing and WHERE clause with AND/OR

2014-11-18 Thread RSmith


On 2014/11/17 14:48, RP McMurphy wrote:
PS: Some administravia; Does anyone know of a way to reduce the posting delay for this list? Is it always like this? Or is it some 
problem with the gmane site in general?


Join the mailing list directly (you can unsubscribe once you got what you were 
looking for) at:

http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Not sure why gmane is slow or /if/ indeed gmane is the real culprit, but the list usually distributes posts within a minute (for 
non-digest users of course).



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Discrepancy with indexing and WHERE clause with AND/OR

2014-11-17 Thread Simon Slavin

On 17 Nov 2014, at 12:48pm, RP McMurphy  wrote:

> Upon further analysis it appears that the data "shape" is different in 
> different periods within the table. That is, some sections have the inverse 
> shape to other sections. So it looked like query times would change over time 
> but actually they are changing because different portions of the table are 
> being accessed.

It would be difficult to spot that.  Nice analysis.

> Is it possible to tell sqlite to analyze different sections and keep separate 
> a record for each?

No, but the amount by which such a wrong choice should increase your execution 
time should be extremely slow.  Are you actually getting annoyingly long 
execution times or are you just trying to wring every last millisecond out of 
your application ?  If the latter, then I feel you should stop worrying about 
this and move onto something else.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Discrepancy with indexing and WHERE clause with AND/OR

2014-11-17 Thread RP McMurphy
>When you say the data changes, does the shape actually change? Because all that
>matters to analyze is the shape of the data, not the data itself.

Upon further analysis it appears that the data "shape" is different in 
different periods within the table. That is, some sections have the inverse 
shape to other sections. So it looked like query times would change over time 
but actually they are changing because different portions of the table are 
being accessed. Is it possible to tell sqlite to analyze different sections and 
keep separate a record for each?

>Unlike me, some other people on here are extremely knowledgeable on SQL and 
>specifically the inner
>workings of SQLite - Tell us the full story, we might save you hundreds of 
>hours.

I will try to take advantage of that. Let me see what I am able to say about 
our system and hopefully come up with some concise and pertinent questions.

RP

PS: Some administravia; Does anyone know of a way to reduce the posting delay 
for this list? Is it always like this? Or is it some problem with the gmane 
site in general?
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Discrepancy with indexing and WHERE clause with AND/OR

2014-11-16 Thread Hick Gunter
I would agree with the suspicion that your data is "changing shape" i.e. the 
cardinality of index fields is becoming very different from what ANALYZE stored.

As for bypassing the query planner/code generator you might want to contact 
Prakash Premkumar  who is apparently quite determined 
to go this route.

I think judicious use of the available hints (i.e. CROSS JOIN and USING) will 
get you further quicker. We almost exclusively use virtual tables here (for 
which ANALYZE is useless) and find CROSS JOIN to be our best friend in fixing 
queries that xBestIndex return values cannot coax into performing well.

-Ursprüngliche Nachricht-
Von: RP McMurphy [mailto:rpm0...@yahoo.com]
Gesendet: Freitag, 14. November 2014 15:32
An: sqlite-users@sqlite.org
Betreff: Re: [sqlite] Discrepancy with indexing and WHERE clause with AND/OR

I am resending this message below from 3 days ago because it never made it to 
the list.

RP

PS: Messages seem to take a long time to go through the gmane system, at least 
half a day and sometimes more in my experience so far.


On Tue, 11/11/14, RP McMurphy  wrote:

 Subject: Re: Discrepancy with indexing and WHERE clause with AND/OR
 To: sqlite-users@sqlite.org
 Date: Tuesday, November 11, 2014, 6:31 PM

 > If you can provide
 any examples where ANALYZE makes a query slower, I  suspect the developer team 
would like  > to see them.
 >

 After we run analyze and then
 let the process run for a while the DB
 contents change - and it can change quite  considerably depending  upon what 
is  happening. I suspect that the analyze data gets stale, but  I  don't know 
how to track such things in  sqlite. Anyhow we can't keep running  analyze 
every few minutes because it takes a  long time to run with our DB  and it 
appears  to block all other actions until it is done.

 A this point we are
 considering writing VDBE code directly and bypassing the  parser. Has anyone 
else done this? Is it going  to be a huge ugly  can-of-worms if we do  that?

 RP




___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


___
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: h...@scigames.at

This communication (including any attachments) is intended for the use of the 
intended recipient(s) only and may contain information that is confidential, 
privileged or legally protected. Any unauthorized use or dissemination of this 
communication is strictly prohibited. If you have received this communication 
in error, please immediately notify the sender by return e-mail message and 
delete all copies of the original communication. Thank you for your cooperation.


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Discrepancy with indexing and WHERE clause with AND/OR

2014-11-15 Thread RSmith


On 2014/11/14 16:32, RP McMurphy wrote:


On Tue, 11/11/14, RP McMurphy  wrote:

  > If you can provide
  any examples where ANALYZE makes a query slower, I
  suspect the developer team would like
  > to see them.
  >
  
  After we run analyze and then

  let the process run for a while the DB
  contents change - and it can change quite
  considerably depending
  upon what is
  happening. I suspect that the analyze data gets stale, but
  I
  don't know how to track such things in
  sqlite. Anyhow we can't keep running
  analyze every few minutes because it takes a
  long time to run with our DB
  and it appears
  to block all other actions until it is done.


Firstly, that is a considerably different problem to your initial claim that ANALYZE makes queries slower. That said, it is an 
understandable concern.  When you say the data changes, does the shape actually change? Because all that matters to analyze is the 
shape of the data, not the data itself.


As an example, consider the following three small tables:

   A:
   ID, No, Name,   Value, XT,  AX
   1,  1,  John,   34,Panthera Leo,2014-11-12 09:44:26
   2,  2,  Chris,  88,Panthera Pardis, 2014-11-13 04:17:43
   3,  3,  Martin, 20, etc...

   B:
   ID, No, Name,  Value, XT,   AX
   1,  1,  John,  34,,
   2,  X1, John,  ,  Panthera Leo,
   3,  A1, John,  ,  , 2014-11-12 09:44:26
   4,  2,  Chris, 88,,
   5,  X2, Chris, ,  Panthera Pardis,
   6,  A2, Chris, ,  , 2014-11-13 04:17:43
   etc.

   C:
   ID,  No, Name, Value, XT,  AX
   106, 62, Andrew,   18,Loxodonta Africana,  -00-00 00:00:00
   107, 677,James,12,Podocarpus Falcatus, 2014-11-09 14:27:12
   108, 176612, Flemming, 40,Tectonia Grandis,2014-11-14 03:31:55
   etc.



From table A to B the data did not change significantly, but the shape of it changed significantly in respect of cardinality, value 
types, fill rate etc.  From A to C the data is completely different, but the shape is exactly the same.


Running Analyze for table A which shape-shifts into B over time (which is highly unlikely) will certainly see the values become 
"stale". Running Analyze for A which eventually changes into C (very likely) will have minimal effect on the outcome of the Analyze 
data and hence on the outcome of the Query Planner decisions.


To re-iterate the question posed earlier - If you can show that the QP makes a worse decision after analyzing a table - OR - that 
the decision degrades over time for a table of which the shape does not change effectively - I am sure the case would be considered 
as undesirable and the devs would probably be interested in doing something about it.


Other options include forcing the use of an Index or Indices (see "USING" in 
the SQLite SQL docs) or using cross-joins and the like.



  A this point we are
  considering writing VDBE code directly and bypassing the
  parser. Has anyone else done this? Is it going
  to be a huge ugly
  can-of-worms if we do
  that?


Not just a can of worms - an insane one, and what is worse, you will immediately lose any benefit of periodic updates and all the 
other advances in technology so frequently visited upon SQLite because your version is no longer interchangeable with the new ones. 
Further to that, this speaks of the willingness to take 3 steps in the wrong direction to avoid one step in the right direction - 
maybe some past experience influence or some other unknowns which are not immediately obvious now is influencing this thought? (I 
only ask because this is not uncommon).


If you do have a table of which the shape keeps changing to the point it needs re-analyzing every few minutes, then you are working 
with the worst DB in history and should really make your own native storage solution - but I am 90% certain this is not the case and 
you are simply overlooking something or not doing something efficiently - which we will be glad to help with if you post schemata 
and maybe link an example DB or such.


Unlike me, some other people on here are extremely knowledgeable on SQL and specifically the inner workings of SQLite - Tell us the 
full story, we might save you hundreds of hours.


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Discrepancy with indexing and WHERE clause with AND/OR

2014-11-15 Thread Simon Slavin

On 14 Nov 2014, at 2:32pm, RP McMurphy  wrote:

> After we run analyze and then
> let the process run for a while the DB
> contents change - and it can change quite
> considerably depending
> upon what is
> happening
> 
> I suspect that the analyze data gets stale, but
> I don't know how to track such things in
> sqlite. Anyhow we can't keep running
> analyze every few minutes because it takes a
> long time to run with our DB
> and it appears
> to block all other actions until it is done.

Okay.  ANALYZE isn't used for specific information about every piece of data.  
It's used to evaluate the 'chunkiness' of each column and index.  For instance 
one column may only ever have two values in it -- "yes" or "no" -- whereas 
another column may have a different value in it for each row.

So unless the character of your data changes, if you run ANALYZE once when you 
have realistic data, you shouldn't need to run it again even if your tables 
grow to 100 times their original size.  You don't need to run ANALYZE under 
normal operation.  Perhaps once a year in a yearly maintenance run or 
something.  Or if you create a new index so SQLite can see how useful that 
index will be for certain operations.

As others have shown you using EXPLAIN QUERY PLAN, SQLite is itself capable of 
doing the expansion you describe to get an efficient query.  I would definitely 
not recommend you try beating the SQLite internals which have been developed 
and improved for years.  It's going to be more effective to see if you can spot 
some point at which your query starts executing slowly.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Discrepancy with indexing and WHERE clause with AND/OR

2014-11-15 Thread RP McMurphy
I am resending this message below from 3 days ago because it never made it to 
the list.

RP

PS: Messages seem to take a long time to go through the gmane system, at least 
half a day and sometimes more in my experience so far.


On Tue, 11/11/14, RP McMurphy  wrote:

 Subject: Re: Discrepancy with indexing and WHERE clause with AND/OR
 To: sqlite-users@sqlite.org
 Date: Tuesday, November 11, 2014, 6:31 PM
 
 > If you can provide
 any examples where ANALYZE makes a query slower, I
 suspect the developer team would like
 > to see them.
 > 
 
 After we run analyze and then
 let the process run for a while the DB
 contents change - and it can change quite
 considerably depending
 upon what is
 happening. I suspect that the analyze data gets stale, but
 I
 don't know how to track such things in
 sqlite. Anyhow we can't keep running
 analyze every few minutes because it takes a
 long time to run with our DB
 and it appears
 to block all other actions until it is done.
 
 A this point we are
 considering writing VDBE code directly and bypassing the
 parser. Has anyone else done this? Is it going
 to be a huge ugly
 can-of-worms if we do
 that?
 
 RP
 
  
 
 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Discrepancy with indexing and WHERE clause with AND/OR

2014-11-11 Thread Simon Slavin

On 10 Nov 2014, at 10:55pm, RP McMurphy  wrote:

>> 
> 
> Okay, for my simplified example analyze does improve the times. But for our 
> application this does not help, and it also harmed a couple of other queries.

If you can provide any examples where ANALYZE makes a query slower, I suspect 
the developer team would like to see them.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Discrepancy with indexing and WHERE clause with AND/OR

2014-11-11 Thread RP McMurphy

> 

Okay, for my simplified example analyze does improve the times. But for our 
application this does not help, and it also harmed a couple of other queries. 
I'll have to see if I can make a better example schema showing the problem 
because I can't upload the entire 900MB database.

RP


On Sun, 11/9/14, RP McMurphy  wrote:

 Subject: Discrepancy with indexing and WHERE clause with AND/OR
 To: sqlite-users@sqlite.org
 Date: Sunday, November 9, 2014, 2:35 AM
 
 If there is a large table and we need
 to select a subset of values using a WHERE clause with an
 AND/OR construct sqlite has trouble finding the answer in a
 reasonable time. Breaking the queries down into separate
 SELECT statements speeds up the process exponentially.
 
 For example the following takes a few seconds to return the
 answer 1334. Note that the index (w) is a "low quality"
 index with the arguments in the wrong order. The reason for
 this is explained further down:
 
     with recursive cnt(x) as (select 1 union
 all select x+1 from cnt limit 1000)
     insert into v select x % 3,x from cnt;
 
     create index w on v(z,y);
     select count(*) from v
 where    z = 0 and
            
     (    y between 100 and
 1001000 
            
     or    y between 200
 and 2001000
            
     or    y between 300
 and 3001000
            
     or    y between 400
 and 4001000);
 
 The reason the the "low quality" index is because this data
 is also accessed in a different manner. Namely like this:
 
     select count(*) from v group by z;
 
 Both of the above queries each take about 3 seconds to run.
 I don't think the second query can be made faster, but the
 first query can certainly be much faster even with the "low
 quality" index. Thus:
 
     select 
     (select count(*) from v where z = 0 and y
 between 100 and 1001000) +
     (select count(*) from v where z = 0 and y
 between 200 and 2001000) +
     (select count(*) from v where z = 0 and y
 between 300 and 3001000) +
     (select count(*) from v where z = 0 and y
 between 400 and 4001000);
 
 Now the query returns the result 1334 almost immediately.
 The only difference is that the WHERE clause has been
 manually flattened and broken into separate SELECT
 portions.
 
 When we change the index to "high quality" (u) and put the
 arguments in the other order.
 
     drop index w;
     create index u on v(y,z);
 
 And rerun the query:
 
     select count(*) from v
 where    z = 0 and
            
     (    y between 100 and
 1001000 
            
     or    y between 200
 and 2001000
            
     or    y between 300
 and 3001000
            
     or    y between 400
 and 4001000);
 
 The answer 1334 is returned almost immediately. And
 flattening this query gives no advantage. But now the second
 type of query runs terribly slow because the index is very
 poor for this type of query:
 
     select count(*) from v group by z;
 
 And takes more then 30 seconds to finish.
 
 Trying to make two indices in the hope that sqlite will find
 the optimal one by itself gives back result times the same
 as if only the w index is present:
 
     create index w on v(z,y);
     create index u on v(y,z);
 
 Both queries now take a few seconds to run. It is as if the
 u index does not exist.
 
 In our application we have only created the w index (since
 the u index is awful for the grouping query) and manually
 generate the WHERE/AND/OR flattened queries in a loop. This
 solution is unsatisfying to us and I think we must be doing
 something wrong. Is there a way we can make the w index work
 with both queries and not have to run external loops to
 flatten all the WHERE clauses?
 
 RP
 
 PS: Below is the text in one unit that can be copied and
 pasted into a shell session running sqlite3.exe:
 
 create table times(idx,j);
 create table v(z,y);
 
 with recursive cnt(x) as (select 1 union all select x+1 from
 cnt limit 1000)
 insert into v select x % 3,x from cnt;
 
 create index w on v(z,y);
 select 'With index (z,y)';
 insert into times select 0,julianday('now');
 select 
 (select count(*) from v where z = 0 and y between 100
 and 1001000) +
 (select count(*) from v where z = 0 and y between 200
 and 2001000) +
 (select count(*) from v where z = 0 and y between 300
 and 3001000) +
 (select count(*) from v where z = 0 and y between 400
 and 4001000);
 insert into times select 1,julianday('now');
 select strftime('%f',(select j from times where
 idx=1)-(select j from times where idx=0));
 select count(*) from v where    z = 0 and
            
 (    y between 100 and 1001000 
            
 or    y between 200 and 2001000
            
 or    y between 300 and 3001000
            
 or    y between 400 and 4001000);
 insert into times select 2,julianday('now');
 select strftime('%f',(select j from times where
 idx=2)-(select j f

Re: [sqlite] Discrepancy with indexing and WHERE clause with AND/OR

2014-11-10 Thread Simon Davies
On 10 November 2014 16:03, Don V Nielsen  wrote:
> Isn't this the result of the results cache?  The two queries are identical.
>

The query plan changes...

.
.
.
sqlite>  explain query plan select count(*) from v wherez = 0 and
   ...>   (   y between 100 and 1001000
   ...>   or  y between 200 and 2001000
   ...>   or  y between 300 and 3001000
   ...>   or  y between 400 and 4001000);
0|0|0|SEARCH TABLE v USING COVERING INDEX w (z=?)
sqlite>
sqlite>
sqlite> analyze;
sqlite>
sqlite>
sqlite>  explain query plan select count(*) from v wherez = 0 and
   ...>   (   y between 100 and 1001000
   ...>   or  y between 200 and 2001000
   ...>   or  y between 300 and 3001000
   ...>   or  y between 400 and 4001000);
0|0|0|SEARCH TABLE v USING COVERING INDEX w (z=? AND y>? AND y? AND y? AND y? AND y

Regards,
Simon
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Discrepancy with indexing and WHERE clause with AND/OR

2014-11-10 Thread Don V Nielsen
Isn't this the result of the results cache?  The two queries are identical.

On Mon, Nov 10, 2014 at 9:26 AM, Clemens Ladisch  wrote:

> RP McMurphy wrote:
> > Is there a way we can make the w index work with both queries and not
> > have to run external loops to flatten all the WHERE clauses?
>
> 
>
> sqlite> .timer on
> sqlite> select count(*) from v wherez = 0 and
>...> (   y between 100 and 1001000
>...> or  y between 200 and 2001000
>...> or  y between 300 and 3001000
>...> or  y between 400 and 4001000);
> 1334
> Run Time: real 1.100 user 1.092007 sys 0.00
> sqlite> analyze;
> sqlite> select count(*) from v wherez = 0 and
>...> (   y between 100 and 1001000
>...> or  y between 200 and 2001000
>...> or  y between 300 and 3001000
>...> or  y between 400 and 4001000);
> 1334
> Run Time: real 0.002 user 0.00 sys 0.00
>
>
> Regards,
> Clemens
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Discrepancy with indexing and WHERE clause with AND/OR

2014-11-10 Thread Clemens Ladisch
RP McMurphy wrote:
> Is there a way we can make the w index work with both queries and not
> have to run external loops to flatten all the WHERE clauses?



sqlite> .timer on
sqlite> select count(*) from v wherez = 0 and
   ...> (   y between 100 and 1001000
   ...> or  y between 200 and 2001000
   ...> or  y between 300 and 3001000
   ...> or  y between 400 and 4001000);
1334
Run Time: real 1.100 user 1.092007 sys 0.00
sqlite> analyze;
sqlite> select count(*) from v wherez = 0 and
   ...> (   y between 100 and 1001000
   ...> or  y between 200 and 2001000
   ...> or  y between 300 and 3001000
   ...> or  y between 400 and 4001000);
1334
Run Time: real 0.002 user 0.00 sys 0.00


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users