[sqlite] Is there a limit for the number of items in an IN clause?

2015-03-21 Thread Simon Slavin

On 21 Mar 2015, at 9:19pm, Simon Slavin  wrote:

> Can someone point me to that discussion somewhere that doesn't need a 
> password ?

Sorry, I see that it is still going on, entitled "Query times vary between 0.2 
s and 30 s for very".  The logic looks very weird.

Simon.


[sqlite] Is there a limit for the number of items in an IN clause?

2015-03-21 Thread Simon Slavin

On 21 Mar 2015, at 6:46pm, Mario M. Westphal  wrote:

> In a recent question 
> (http://mailinglists.sqlite.org/cgi-bin/mailman/private/sqlite-users/2015-March/058668.html)
>  I found out that joining with a single-column temporary table with 500 rows 
> is sometimes several hundred times (!) slower than using an IN clause.

Can someone point me to that discussion somewhere that doesn't need a password ?

> So far my code switched to using a temporary table and a JOIN instead of an 
> IN clause when the IN clause would contain more than 500 elements (numbers). 
> I would like to use larger IN clauses if that?s possible to avoid using 
> temporary tables, but I could not find a limit for how many elements I can 
> use in IN().

If it's anywhere, it's here:



I don't see it, so I suspect you're limited only by maximum statement length.  
However, if your statement is very long, I wonder if your first paragraph will 
still hold.

Simon.


[sqlite] Is there a limit for the number of items in an IN clause?

2015-03-21 Thread Mario M. Westphal
In a recent question 
(http://mailinglists.sqlite.org/cgi-bin/mailman/private/sqlite-users/2015-March/058668.html)
 I found out that joining with a single-column temporary table with 500 rows is 
sometimes several hundred times (!) slower than using an IN clause.



So far my code switched to using a temporary table and a JOIN instead of an IN 
clause when the IN clause would contain more than 500 elements (numbers). I 
would like to use larger IN clauses if that?s possible to avoid using temporary 
tables, but I could not find a limit for how many elements I can use in IN().



[sqlite] Query times vary between 0.2 s and 30 s for very

2015-03-21 Thread Mario M. Westphal
I don?t think I can always run an analyze on the TEMP table for each query. May 
ruin performance worse than trying my luck with the temp table.



I think this boils down why a JOIN with a 500 row table (one column) is so much 
(several hundred times) slower than using an IN clause with 500 values. 

I have not changed the code or query for a long time, so my assumption is that 
a change in one of the recent SQLite updates caused the changed behavior of the 
optimizer/query engine. I recall a similar problem maybe a year ago. I think 
only one of the SQLite developers may shed some light on this.



For now I have increased the threshold for IN clauses (instead of TEMP tables) 
and use WHERE IN (SELECT ? from TEMP) instead of a JOIN.







[sqlite] Query times vary between 0.2 s and 30 s for very

2015-03-21 Thread Keith Medcalf
>I don?t think I can always run an analyze on the TEMP table for each
>query. May ruin performance worse than trying my luck with the temp
>table.

>I think this boils down why a JOIN with a 500 row table (one column) is
>so much (several hundred times) slower than using an IN clause with 500
>values.

>I have not changed the code or query for a long time, so my assumption is
>that a change in one of the recent SQLite updates caused the changed
>behavior of the optimizer/query engine. I recall a similar problem maybe
>a year ago. I think only one of the SQLite developers may shed some light
>on this.

>For now I have increased the threshold for IN clauses (instead of TEMP
>tables) and use WHERE IN (SELECT ? from TEMP) instead of a JOIN.

Of course, when you use IN (...) the optimizer KNOWS that the RHS is unique -- 
whether you are using IN (...list of values...) or IN (SELECT x FROM y).

However, if you are using a temp table and using a JOIN, the optimizer only 
knows that the RHS join predicate is unique if you tell it that is the case.  
So for example, if you use:

create table temp_table (x integer);
insert into temp_table values (1),(2),(3) ...

then this query:

select ... from b join temp_table on b.col = temp_table.x;

is significantly different from:

select ... from b where col in (select x from temp_table);

but the following are equivalent to the later, and to using the list of values 
directly in the IN clause:

create table temp_table (x integer not null primary key);
insert into temp_table values (1),(2),(3) ...

select ... from b join temp_table where b.col = temp_table.x;

which is equivalent to:

select ... from b where exists (select 1 from temp_table where x = b.col);

and

select ... from b where col in (select x from temp_table);

and

select ... from b where col in temp_table;

I don't recall ever seeing your definition of your temp_table ...

---
Theory is when you know everything but nothing works.  Practice is when 
everything works but no one knows why.  Sometimes theory and practice are 
combined:  nothing works and no one knows why.







[sqlite] Query times vary between 0.2 s and 30 s for very

2015-03-21 Thread Scott Robison
On Mar 21, 2015 1:43 PM, "James K. Lowden"  wrote:
>
> The query optimizer has to be sophisticated enough to recognize
> those conditions, which is unlikely in the case of a temporary table.

Are temporary tables really that different? Other than being dropped
automatically at the end of a session, I was under the impression they were
just another table. Proper indexing would be key to performance of course,
as you indicated.

Also, is there a limit on the size of in? By which I mean, there is a limit
on the size of the query text, but if you use "colx in (select coly from
taby)" the table size should not have an arbitrary limit. As long as it is
indexed properly, it should be performant.

SDR


[sqlite] Query times vary between 0.2 s and 30 s for very

2015-03-21 Thread James K. Lowden
On Sat, 21 Mar 2015 19:01:16 +0100
"Mario M. Westphal"  wrote:

> For now I have increased the threshold for IN clauses (instead of
> TEMP tables) and use WHERE IN (SELECT ? from TEMP) instead of a JOIN.

Because the two are conceptually different, it's not surprising they
run differently.  IN is an existence test: does the value appear in the
set?  JOIN is a product: produce every matching combination.  

For some queries, they reduce to the same thing.  For example, 

FROM A JOIN B ON A.b = B.b

is the same as IN if B.b 

1.  is unique, and 
2.  no other columns from B are used.  

The query optimizer has to be sophisticated enough to recognize
those conditions, which is unlikely in the case of a temporary table.  

There is also the question of linear vs binary searches.  When you
supply a list of constants to IN, most if not all DBMSs search the list
sequentially.  When IN (or EXISTS) is supplied from an indexed column,
the search is often binary.  For a small number of elements, there's no
distinction.  For 1000 elements, it's 2 orders of magnitude: 1000 hops
versus 10.  

--jkl


[sqlite] Query times vary between 0.2 s and 30 s for very

2015-03-21 Thread Scott Robison
On Mar 21, 2015 1:43 PM, "James K. Lowden"  wrote:
>
> The query optimizer has to be sophisticated enough to recognize
> those conditions, which is unlikely in the case of a temporary table.

Are temporary tables really that different? Other than being dropped
automatically at the end of a session, I was under the impression they were
just another table. Proper indexing would be key to performance of course,
as you indicated.

Also, is there a limit on the size of in? By which I mean, there is a limit
on the size of the query text, but if you use "colx in (select coly from
taby)" the table size should not have an arbitrary limit. As long as it is
indexed properly, it should be performant.

SDR


[sqlite] PRAGMA index_list() SQLException

2015-03-21 Thread da...@dandymadeproductions.com
> On 2015-03-21 06:27 AM, danap at dandymadeproductions.com wrote:
>> Hello,
>>
>> PRAGMA index_list(table-name)
>>
>> In Sqlite's past history has PRAGMA index_list(table-name) returned a
>> SQLException indicating no ResultSet or a non-empty ResultSet that may
>> throw an exception on accessing the fields in the normal way?
>
> Nope, not in the past and not currently. (There are very many test
> suites that check for this and has to be passed before any release date).
>
> That specific pragma very often returns an empty result-set, not the
> usual case with other queries, so that might be worth testing your code
> for. Other than that it has much the same internal process than any
> other query.
>
> If you post some code, some people here are bound to use the same
> platform / wrapper / interface and may have gotten the same kind of
> thing before.
>
> Good luck,
> Ryan

Thank you for the response Ryan,

The issues I was having has been addressed by the latest code that I
found over at github, which does do the empty result-set test on PRAGMA
index_list() before returning results, getIndexInfo().

The project is for sqlite-jdbc that appears to have moved from bitbucket
to github. The latest bitbucket JAR file has issues and recommend anyone
using sqlite with Java make note to use the github code.

https://github.com/xerial/sqlite-jdbc

Dana Proctor
MyJSQLView Project Manager.



[sqlite] Query times vary between 0.2 s and 30 s for very similar queries - how to trick the optimizer?

2015-03-21 Thread Eduardo Morras
On Wed, 18 Mar 2015 14:43:26 +0100
"Mario M. Westphal"  wrote:

> I?m using 3.8.8.1 on Windows via the ?C? interface.
> 
> I work with SQLite for several years with good success. And I know
> that no optimizer ever will be perfect. But I?ve just stumbled upon a
> case where a very similar query requires between 0.2 seconds and a
> whopping 30 seconds.
> 
> I?ve simplified things as much as possible and included the create
> instructions and queries below.

<...snip...>

> The question is: When JOINing large tables with a temporary table,
> how to ensure that the optimizer can work optimal? Running ANALYZE
> with a temporary table probably does not work, and ANALYZE takes
> about 1 minute on this database so this is not feasible for each
> query.
> 
> I'm glad to have found an apparently working solution (IN instead of
> JOIN) but I wonder if this could be somehow automated by the
> optimizer? Or maybe this is a worst-case for the optimizer?
> 

You can run ANALYZE on any table, try:

ANALYZE temp_table;
SELECT 

You can also test-stress IN and know where is the limit. I 
think/suppouse/suspect that in this case there is no winning for using a 
temporal table intstead an IN, it should be faster when the temp table has more 
columns used in the where clause or additional join restriction.

---   ---
Eduardo Morras 


[sqlite] PRAGMA index_list() SQLException

2015-03-21 Thread R.Smith


On 2015-03-21 06:27 AM, danap at dandymadeproductions.com wrote:
> Hello,
>
> PRAGMA index_list(table-name)
>
> In Sqlite's past history has PRAGMA index_list(table-name) returned a
> SQLException indicating no ResultSet or a non-empty ResultSet that may
> throw an exception on accessing the fields in the normal way?

Nope, not in the past and not currently. (There are very many test 
suites that check for this and has to be passed before any release date).

That specific pragma very often returns an empty result-set, not the 
usual case with other queries, so that might be worth testing your code 
for. Other than that it has much the same internal process than any 
other query.

If you post some code, some people here are bound to use the same 
platform / wrapper / interface and may have gotten the same kind of 
thing before.

Good luck,
Ryan



[sqlite] PRAGMA index_list() SQLException

2015-03-21 Thread da...@dandymadeproductions.com
Hello,

PRAGMA index_list(table-name)

In Sqlite's past history has PRAGMA index_list(table-name) returned a
SQLException indicating no ResultSet or a non-empty ResultSet that may
throw an exception on accessing the fields in the normal way?

{rs.getInt(1), rs.getString(2), rs.getInt(3)}

danap.