Re: [sqlite] Using COUNT(*) in unusual circumstances

2011-10-07 Thread Roger Binns
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 07/10/11 18:52, Simon Slavin wrote:
> I'll just use LIMIT 51 on my query and if I get 51 results back I'll
> know there are too many for the query to be useful.

Incidentally you don't need to add a limit on the query.  In the code that
calls sqlite3_step just track if the 51st consecutive call returns
SQLITE_ROW or SQLITE_DONE.

Roger
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.11 (GNU/Linux)

iEYEARECAAYFAk6Py9QACgkQmOOfHg372QROzgCglOXXEa6lkXWZx99VCsZpg52m
4XIAoIcfYhVKlus34jq3zpw+QS9SNkV3
=Nf1K
-END PGP SIGNATURE-
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Using COUNT(*) in unusual circumstances

2011-10-07 Thread Simon Slavin

On 8 Oct 2011, at 2:27am, Roger Binns wrote:

> Although various people keep mentioning it I haven't seen you acknowledge
> using EXPLAIN QUERY PLAN.  It would be nice to hear if it turns out to be
> helpful especially as it doesn't require mucking with the query itself.

Yes it was a good suggestion to my original problem, but now I know LIMIT works 
properly I don't think I need it.  I'll just use LIMIT 51 on my query and if I 
get 51 results back I'll know there are too many for the query to be useful.

My original query was posted because I thought SQLite always processed all rows 
even if you used LIMIT.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Using COUNT(*) in unusual circumstances

2011-10-07 Thread Roger Binns
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 07/10/11 15:52, Simon Slavin wrote:
> But I was sure I had read that in SQLite the full result set was
> retrieved even if LIMIT was used.

For the outer query all LIMIT does is cause sqlite3_step to return done
after limit number of calls.  ie you could omit LIMIT and just call
finalize after you've had as many results as you want and it wouldn't make
any difference.  Similarly the first offset number of times when a row
would be returned are skipped.

Although various people keep mentioning it I haven't seen you acknowledge
using EXPLAIN QUERY PLAN.  It would be nice to hear if it turns out to be
helpful especially as it doesn't require mucking with the query itself.

Roger
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.11 (GNU/Linux)

iEYEARECAAYFAk6PpwoACgkQmOOfHg372QS/7gCdEtt7iuh1fOkZDP5lw/WMKEm4
OPEAoNOH6+A88h05+YC9XAFYJeVO0X1d
=AJuc
-END PGP SIGNATURE-
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Using COUNT(*) in unusual circumstances

2011-10-07 Thread Nico Williams
On Fri, Oct 7, 2011 at 5:52 PM, Simon Slavin  wrote:
> Okay, I understand why defining an ORDER BY requires the entire result set to 
> be retrieved.  I had intended to remove ORDER BY when I used COUNT(*), though 
> I didn't mention that.

If the ORDER BY can be satisfied entirely by an index, then that's not true.

You can see whether this is the case from the query plan.

So you could reject queries for which you can't use OFFSET & LIMIT efficiently.

Nico
--
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Using COUNT(*) in unusual circumstances

2011-10-07 Thread Kees Nuyt
On Fri, 7 Oct 2011 19:57:09 +0100, Simon Slavin
 wrote:

>On 7 Oct 2011, at 7:17pm, Roger Binns wrote:
>
>> On 07/10/11 09:52, Simon Slavin wrote:
>>> "Do you really want to see all 50,000 entries that that search would
>>> return ?".  If this kind of search returns more than 100 records,
>>> there's no point in doing it at all.
>> 
>> You can solve this at the user interface layer and there is no need to
>> calculate how many matching records there are.
>> 
>> Lets say you can show 50 results on the screen.  Run the query but leave
>> it open until hitting the 51st result.  Show the 50 results on the screen
>> with a More button.
>
> It's not quite the problem I have.  My problem is that there is
> a huge amount of data involved in finding the 50,000 matches
> and merely processing the SELECT is very slow.  And SQLite
> has to find all 50,000 before it can show the first screen
> of matches.
>
> Imagine you're watching _CSI_ on TV and they have a test that
> identifies possible matches to something ... car colours and
> tyres perhaps.  If there are only 50 possible matches,
> it's worth getting printouts and eliminating them one by one.
> If there are 50,000 possible matches there's no point.
>
> My problem really seems to be with the way SQLite implements
> LIMIT n.  If I understand previous posts on this list correctly,
> instead of finding only the first n records, it does the entire
> search first, then returns the first n of the results it found.

What the first n rows are is determined by the ORDER BY clause.

> With a standard implementation I would simply do a search with
> a LIMIT 51 clause.  If 51 results are returned then I know the
> search is useless.  But even with a LIMIT clause SQLite will
> still do all the fetching involved in all 50,000 records.

Read about some optimization options on: 
http://www.sqlite.org/cvstrac/wiki?p=ScrollingCursor
-- 
  (  Kees Nuyt
  )
c[_]
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Using COUNT(*) in unusual circumstances

2011-10-07 Thread Simon Slavin

On 7 Oct 2011, at 10:27pm, Roger Binns wrote:

> On 07/10/11 11:57, Simon Slavin wrote:
>> My problem really seems to be with the way SQLite implements LIMIT n.
>> If I understand previous posts on this list correctly, instead of
>> finding only the first n records, it does the entire search first, then
>> returns the first n of the results it found.
> 
> That is completely incorrect.  Each time you call sqlite3_step it finds
> the next row that matches the query and no more.  This is why SQLite uses
> that virtual machine (VDBE) stuff.  It needs to execute some code to find
> a matching row, return it, and then allow another call to get the next
> row.  This requires some way of encapsulating the processing state of a
> query so it can be resumed from where it was left off in order to get the
> next row.
> 
> The one fly in the ointment is if the results are sorted.  For example if
> there are 10,000 result rows then there is no way to return the first 100
> without getting all result rows since doing a sort requires you have all
> the data first to sort.

Okay, I understand why defining an ORDER BY requires the entire result set to 
be retrieved.  I had intended to remove ORDER BY when I used COUNT(*), though I 
didn't mention that.

But I was sure I had read that in SQLite the full result set was retrieved even 
if LIMIT was used.  Obviously I misremembered this.  If so, very good.  In that 
case, I can see if I have too many rows returned for everything that doesn't 
use a sub-SELECT or LIMIT by ...

Replace everything up to the first "FROM" by "SELECT COUNT(*) ".
Looking for ORDER BY and removing it and anything after it.
Add my own LIMIT clause.

Fortunately, the user-interface I'm using doesn't allow the use of sub-selects 
or LIMIT.  Or UNION/INTERSECT/EXCEPT.  Or DISTINCT.  Probably some others I 
didn't think of.

Thank you to everyone who has contributed to answering my question.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Using COUNT(*) in unusual circumstances

2011-10-07 Thread Roger Binns
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 07/10/11 11:57, Simon Slavin wrote:
> My problem really seems to be with the way SQLite implements LIMIT n.
> If I understand previous posts on this list correctly, instead of
> finding only the first n records, it does the entire search first, then
> returns the first n of the results it found.

That is completely incorrect.  Each time you call sqlite3_step it finds
the next row that matches the query and no more.  This is why SQLite uses
that virtual machine (VDBE) stuff.  It needs to execute some code to find
a matching row, return it, and then allow another call to get the next
row.  This requires some way of encapsulating the processing state of a
query so it can be resumed from where it was left off in order to get the
next row.

The one fly in the ointment is if the results are sorted.  For example if
there are 10,000 result rows then there is no way to return the first 100
without getting all result rows since doing a sort requires you have all
the data first to sort.

But as I mentioned you can add together the numbers from EXPLAIN QUERY
PLAN which will at least give you an idea of probable order of magnitude
of the results (eg if there are tens, thousands or millions).

Roger
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.11 (GNU/Linux)

iEYEARECAAYFAk6PbtIACgkQmOOfHg372QSczQCfdEFFqE798O53wHsdPLYOkkua
9rIAnj/4E2BZLqdD1W+X1k0WfIFxq+zL
=NNCg
-END PGP SIGNATURE-
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Using COUNT(*) in unusual circumstances

2011-10-07 Thread Nico Williams
On Fri, Oct 7, 2011 at 1:17 PM, Roger Binns  wrote:
> On 07/10/11 09:52, Simon Slavin wrote:
>> "Do you really want to see all 50,000 entries that that search would
>> return ?".  If this kind of search returns more than 100 records,
>> there's no point in doing it at all.
>
> You can solve this at the user interface layer and there is no need to
> calculate how many matching records there are.
>
> Lets say you can show 50 results on the screen.  Run the query but leave
> it open until hitting the 51st result.  Show the 50 results on the screen
> with a More button.  Most of the time the user will realise their query is
> too broad and change it, or see the values they were looking for.  If they
> hit next you can get the next 50 results which will allow you to know if
> you should show another More button.  Heck you can even show page numbers
> to 10 like Google does and if they get clicked on keep advancing the query
> through the results.  If the user clicks on page 10 but there are only 7
> pages of results then update the user interface to show page 7 and get rid
> of 8, 9 and 10.

The problem is you need to reap state in the case of the user
abandoning this "session" without telling the application.  This
happens all the time in web apps.

If there was a way to serialize the state of a SQLite3 VM...  then the
server could encrypt it and send it to the client as a cookie, say.
But that's not possible, and rather ugly.  Better not mention that
again :/

You could add OFFSET/LIMIT to the query and add those to the "next"
URLs.  But for many queries this doesn't really help.  Another
possibility is to combine this with what you propose next so the user
doesn't feel the pain, but the server still would:

> A different approach is infinite scroll.  As the user hits the bottom of
> the currently shown results you append more results in the display.

Definitely doable with JavaScript.

For web UIs you could also limit the queries to canned ones whose
results can be cached or better estimated.

Nico
--
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Using COUNT(*) in unusual circumstances

2011-10-07 Thread Simon Slavin

On 7 Oct 2011, at 7:17pm, Roger Binns wrote:

> On 07/10/11 09:52, Simon Slavin wrote:
>> "Do you really want to see all 50,000 entries that that search would
>> return ?".  If this kind of search returns more than 100 records,
>> there's no point in doing it at all.
> 
> You can solve this at the user interface layer and there is no need to
> calculate how many matching records there are.
> 
> Lets say you can show 50 results on the screen.  Run the query but leave
> it open until hitting the 51st result.  Show the 50 results on the screen
> with a More button.

It's not quite the problem I have.  My problem is that there is a huge amount 
of data involved in finding the 50,000 matches and merely processing the SELECT 
is very slow.  And SQLite has to find all 50,000 before it can show the first 
screen of matches.

Imagine you're watching _CSI_ on TV and they have a test that identifies 
possible matches to something ... car colours and tyres perhaps.  If there are 
only 50 possible matches, it's worth getting printouts and eliminating them one 
by one.  If there are 50,000 possible matches there's no point.

My problem really seems to be with the way SQLite implements LIMIT n.  If I 
understand previous posts on this list correctly, instead of finding only the 
first n records, it does the entire search first, then returns the first n of 
the results it found.  With a standard implementation I would simply do a 
search with a LIMIT 51 clause.  If 51 results are returned then I know the 
search is useless.  But even with a LIMIT clause SQLite will still do all the 
fetching involved in all 50,000 records.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Using COUNT(*) in unusual circumstances

2011-10-07 Thread Roger Binns
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 07/10/11 09:52, Simon Slavin wrote:
> "Do you really want to see all 50,000 entries that that search would
> return ?".  If this kind of search returns more than 100 records,
> there's no point in doing it at all.

You can solve this at the user interface layer and there is no need to
calculate how many matching records there are.

Lets say you can show 50 results on the screen.  Run the query but leave
it open until hitting the 51st result.  Show the 50 results on the screen
with a More button.  Most of the time the user will realise their query is
too broad and change it, or see the values they were looking for.  If they
hit next you can get the next 50 results which will allow you to know if
you should show another More button.  Heck you can even show page numbers
to 10 like Google does and if they get clicked on keep advancing the query
through the results.  If the user clicks on page 10 but there are only 7
pages of results then update the user interface to show page 7 and get rid
of 8, 9 and 10.

A different approach is infinite scroll.  As the user hits the bottom of
the currently shown results you append more results in the display.

There is a "cheating" way of giving some idea of how many results there
may be.  Use EXPLAIN QUERY PLAN and just add together the estimated number
of rows from each line returned.  Sure it won't be accurate but it will be
more accurate than a wild guess and the number will be somewhat
proportional to the amount of work necessary to process the query anyway.

Another approach is to run the query in a different thread feeding updates
to the UI.  You can call sqlite3_interrupt from the UI if you no longer
want any more results.

Roger
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.11 (GNU/Linux)

iEYEARECAAYFAk6PQkgACgkQmOOfHg372QRniwCgoRmQM818lLm4m6k7BUCfjed+
tR4An1Nm8WevXNEBxJ7SulzIUYiRyEGX
=K2O/
-END PGP SIGNATURE-
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Using COUNT(*) in unusual circumstances

2011-10-07 Thread David Garfield
Puneet Kishor writes:
> Especially, note Pavel's recent, wonderful explanation of how SQLite steps 
> through the result set handing you the data row by row. It *has* to go 
> through the set to know how big the set is... there is no way around it.
> 

Actually, sometimes there are.  Consider:

SELECT COUNT(*) FROM t1, t2;

SELECT (SELECT COUNT(*) FROM t1) * (SELECT COUNT(*) FROM t2);

Of course, those are special cases.  Application programmers should
probably watch for this kind of thing, but it probably wasn't what the
programmer wanted to count in the first place.

--David Garfield

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Using COUNT(*) in unusual circumstances

2011-10-07 Thread Puneet Kishor

On Oct 7, 2011, at 11:52 AM, Simon Slavin wrote:

> 
> On 7 Oct 2011, at 5:31pm, Puneet Kishor wrote:
> 
>> assuming you have some other application level language you are using to get 
>> the data, you could stuff the result set into an array and then report the 
>> highest index of the array which might be faster than doing a second query 
>> for just the count.
> 
> I'm trying to save the resources necessary for having SQLite fetch the values 
> which the SELECT would return.  Any solution mentioned so far has not done 
> that.
> 
> To see what I'm after, imagine a use interface which asked the user "Do you 
> really want to see all 50,000 entries that that search would return ?".  If 
> this kind of search returns more than 100 records, there's no point in doing 
> it at all.  I want to know the "50,000" number while placing the minimum load 
> on the system.
> 

As a person way more knowledgable than I once stated (I believe it was Igor), 
the database can't magically know there are 50,000 entries in the table. It has 
to sometime, somewhere, somehow count it. It can't just divine it.

You could track the count of each table in a counts_table that you 
increment/decrement with TRIGGERs, and then query just that table. But that 
still wouldn't solve your specific problem of "how many queries are returned by 
a specific SELECT." Somewhere, somehow you will have to count it.

Especially, note Pavel's recent, wonderful explanation of how SQLite steps 
through the result set handing you the data row by row. It *has* to go through 
the set to know how big the set is... there is no way around it.

Unless you are making an app in a device with limited resources, who cares what 
load you put on SQLite... it won't complain.

--
Puneet Kishor.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Using COUNT(*) in unusual circumstances

2011-10-07 Thread Simon Slavin

On 7 Oct 2011, at 5:31pm, Puneet Kishor wrote:

> assuming you have some other application level language you are using to get 
> the data, you could stuff the result set into an array and then report the 
> highest index of the array which might be faster than doing a second query 
> for just the count.

I'm trying to save the resources necessary for having SQLite fetch the values 
which the SELECT would return.  Any solution mentioned so far has not done that.

To see what I'm after, imagine a use interface which asked the user "Do you 
really want to see all 50,000 entries that that search would return ?".  If 
this kind of search returns more than 100 records, there's no point in doing it 
at all.  I want to know the "50,000" number while placing the minimum load on 
the system.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Using COUNT(*) in unusual circumstances

2011-10-07 Thread Puneet Kishor

On Oct 7, 2011, at 9:46 AM, Simon Slavin wrote:

> 
> On 7 Oct 2011, at 2:19pm, Igor Tandetnik wrote:
> 
>> SQLite has a non-standard extension whereby aliases assigned to expressions 
>> in the SELECT clause may be used in the WHERE and other clauses:
>> 
>> select 1+2 as alias from mytable where alias > 0;
> 
> Arg !
> 
> Okay, so I guess the form
> 
>> select count(*) from (
>> select ...
>> );
> 
> is necessary.  But that defeats the point of me doing it in the first place: 
> I was hoping that COUNT(*) would be less resource-hungry than the real query,



assuming you have some other application level language you are using to get 
the data, you could stuff the result set into an array and then report the 
highest index of the array which might be faster than doing a second query for 
just the count.


> and I suppose this form doesn't realise it doesn't need the values.
> 
> Bah.  Okay, I'll abandon that idea, then.
> 
> Thanks, Igor.  You saved me some hugely annoying support calls long after the 
> program was in the field.
> 
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Using COUNT(*) in unusual circumstances

2011-10-07 Thread Simon Slavin

On 7 Oct 2011, at 2:19pm, Igor Tandetnik wrote:

> SQLite has a non-standard extension whereby aliases assigned to expressions 
> in the SELECT clause may be used in the WHERE and other clauses:
> 
> select 1+2 as alias from mytable where alias > 0;

Arg !

Okay, so I guess the form

> select count(*) from (
>  select ...
> );

is necessary.  But that defeats the point of me doing it in the first place: I 
was hoping that COUNT(*) would be less resource-hungry than the real query, and 
I suppose this form doesn't realise it doesn't need the values.

Bah.  Okay, I'll abandon that idea, then.

Thanks, Igor.  You saved me some hugely annoying support calls long after the 
program was in the field.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Using COUNT(*) in unusual circumstances

2011-10-07 Thread David Garfield
WHERE or HAVING clauses that refer back to named results could be a
problem with the simple replacement.

SELECT a,b,a+b AS ab FROM t WHERE ab>10

Igor's suggestion work there too.   

--David Garfield

Simon Slavin writes:
> I'm trying to write some code which has to be useful under many different 
> circumstances.  Something I want to be able to do is to take an arbitrary 
> SELECT statement and replace the columns which would normally be returned 
> with COUNT(*) to find out how many rows would be returned.  To do this I 
> replace the text between "SELECT" and "FROM".
> 
> I suspect this won't work well with unusual SELECTs which include 
> sub-selects.  That's okay.
> 
> I'm trying to get my head around whether JOINs could be a problem.
> 
> Also, does anyone know whether some combination of NULs might make COUNT(*) 
> give the wrong result ?
> 
> And anything else relevant anyone wants to mention is fine too.
> 
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Using COUNT(*) in unusual circumstances

2011-10-07 Thread Simon Slavin

On 7 Oct 2011, at 2:19pm, Igor Tandetnik wrote:

[useful stuff]

Thanks Igor.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Using COUNT(*) in unusual circumstances

2011-10-07 Thread Igor Tandetnik
Simon Slavin  wrote:
> I'm trying to write some code which has to be useful under many different 
> circumstances.  Something I want to be able to do is to
> take an arbitrary SELECT statement and replace the columns which would 
> normally be returned with COUNT(*) to find out how many
> rows would be returned.  To do this I replace the text between "SELECT" and 
> "FROM".

An alternative might be to wrap the counting SELECT around the original 
statement:

select count(*) from (
  select ...
);

> I suspect this won't work well with unusual SELECTs which include sub-selects.

I don't think subselects between SELECT and FROM should be any problem. Of 
course you'd need to count parentheses and quotes and square brackets carefully 
to find the right FROM.

SQLite has a non-standard extension whereby aliases assigned to expressions in 
the SELECT clause may be used in the WHERE and other clauses:

select 1+2 as alias from mytable where alias > 0;

> I'm trying to get my head around whether JOINs could be a problem.

I don't see how.

> Also, does anyone know whether some combination of NULs might make COUNT(*) 
> give the wrong result ?

No. COUNT(*) counts the number of rows, regardless of what those rows contain.
-- 
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users