Re: [sqlite] select count(*) does not use primary key??

2013-06-07 Thread Richard Hipp
On Fri, Jun 7, 2013 at 11:51 AM, Ryan Johnson
wrote:

> On 06/06/2013 10:52 AM, Gabriel Corneanu wrote:
>
>> In my opinion, count(*) is the same as count(rowid) (I see that even
>> count() is accepted); I could say it's even the same as count(x) (any
>> other
>> field).
>>
> Not quite... count(x) only counts rows having non-NULL x. Granted, that's
> not a problem for rowid/pk (which are not allowed to be NULL), but it
> matters a lot in the general case.
>

PRIMARY KEYs (except for INTEGER PRIMARY KEYs) are allowed to be NULL in
SQLite.  This goes back to a bug in the code from many years ago.  By the
time the bug was discovered, SQLite was already in wide-spread use and so
the decision was made to not fix the bug since doing so would cause
compatibility problems.

-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] select count(*) does not use primary key??

2013-06-07 Thread Ryan Johnson

On 06/06/2013 10:52 AM, Gabriel Corneanu wrote:

In my opinion, count(*) is the same as count(rowid) (I see that even
count() is accepted); I could say it's even the same as count(x) (any other
field).
Not quite... count(x) only counts rows having non-NULL x. Granted, 
that's not a problem for rowid/pk (which are not allowed to be NULL), 
but it matters a lot in the general case.


(but that doesn't explain the problem you're seeing)

Ryan

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] select count(*) does not use primary key??

2013-06-07 Thread Eduardo Morras
On Fri, 07 Jun 2013 13:12:14 +0200
Clemens Ladisch  wrote:

> Eduardo Morras wrote:
> > where t.a = NULL
> 
>   where t.a IS NULL
> 
> (NULL compares as not equal to any value, including itself.)

OPppss you're right. Thought too fast and wrote even faster :(

> 
> Regards,
> Clemens

---   ---
Eduardo Morras 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] select count(*) does not use primary key??

2013-06-07 Thread Clemens Ladisch
Eduardo Morras wrote:
> where t.a = NULL

  where t.a IS NULL

(NULL compares as not equal to any value, including itself.)


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] select count(*) does not use primary key??

2013-06-07 Thread Eduardo Morras
On Thu, 6 Jun 2013 10:53:55 -0400
Richard Hipp  wrote:

> On Thu, Jun 6, 2013 at 10:52 AM, Gabriel Corneanu  > wrote:
> 
> > Strange is, count(*) uses the cover index for a but "select count(a)" does
> > NOT use the same cover index...
> >
> 
> count(a) has to check for NULL values of a, which are not counted.
> count(*) does not.

If I understand well , select count(a) from t = (select count(*) from t) - 
(select count(*) from t where t.a = NULL) and both selects will use cover 
indexs, doesn't it? 


> -- 
> D. Richard Hipp
> d...@sqlite.org

---   ---
Eduardo Morras 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] select count(*) does not use primary key??

2013-06-07 Thread Gabriel Corneanu

This is not at all my case ...

I don't obviously write 1 by 1, but using blocks of data ( array of 
struct ), virtual tables wrappers, and "insert ... select".
This way I can achieve >200k rec/s, or at least 100k when having some 
more fields.
Right now I'm completely CPU bound, it's 100% load at high rate. IO is 
almost out of question, at <10MB /s; and I use 8k page size and of 
course synchronous off, wal mode...
Another type of data (less fields but with a blob inside 2-32kB) easily 
reaches ~40MB/s but only a few thousands rec/s.
The performance drops abruptly when having more fields (I don't remember 
the magic threshold); it seems most of the load is needed for field 
coding ? I use only integers for space optimization (varint); this is 
also good as I have high dynamic range.


Multi-core sure helps to have enough CPU power for the rest (hardware 
connection, pre-processing, etc).


I would definitely like to be able to get more performance, but I can 
live with the current numbers. One can use some high-end CPUs if really 
wants such high rates (the hardware around costs ~100x more :) ).
BTW I asked a few times already, is it possible to get/compile a windows 
dll for sqlite4 (just for evaluation)?

Last time I checked, it didn't compile on windows at all.

Gabriel

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] select count(*) does not use primary key??

2013-06-07 Thread Simon Slavin

On 7 Jun 2013, at 8:57am, Gabriel Corneanu  wrote:

> BTW I found this by opening some file over network, which of course made 
> everything worse.
> [...]
> Not that I really need, but I have to support specified data rates up to 100k 
> records / second.

Maximum speed of a SQLite database is usually limited by speed of rotating hard 
disk.  If you do the maths on rotational latency you'll find there's no way to 
get 100k disk accesses per second.  SSD improves on this.

Do you have to support 100k records/second over network ?  If so,

what networking (Ethernet ?  WiFi ?) are you using
what networking file system are you using, what
what kind of mass storage device is your database file stored on ?

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] select count(*) does not use primary key??

2013-06-07 Thread Gabriel Corneanu
I admit I didn't think (or didn't even read in detail) about technical 
implementation.


This is an extract from analyzer:
*** Table AE_DATA 

Percentage of total database..  99.89%
Number of entries. 1030371
Bytes of storage consumed. 67846144
Bytes of payload.. 6118671990.2%
Average payload per entry. 59.38
Average unused bytes per entry 0.34
Average fanout 752.00
Fragmentation.   0.35%
Maximum payload per entry. 65
Entries that use overflow. 00.0%
Index pages used.. 11
Primary pages used 8271
Overflow pages used... 0
Total pages used.. 8282
Unused bytes on index pages... 15678   17.4%
Unused bytes on primary pages. 337429   0.50%
Unused bytes on overflow pages 0
Unused bytes on all pages. 353107   0.52%

So I understand that the 11 index pages are pure btree pages, but the 
leaves are actually in the ~8000 data pages.
And it probably needs to visit (i.e. load) all data pages to count the 
leaves...
Even if there would be some counter in the header of each page, it still 
needs to load the pages which is bad for IO...


BTW I found this by opening some file over network, which of course made 
everything worse.
For my case (file format) the data is append (write) only, so max(rowid) 
works equally good.
As a note, I actually HAVE the record count stored somewhere else but I 
had this query in a generic copy routine which was also used for some 
other small tables.
I agree it's some kind of corner case, usually tables have some kind of 
indices. But in this case I need high speed, indices would bring 
performance down.
Not that I really need, but I have to support specified data rates up to 
100k records / second.

And I only access the data sequentially by rowid.

Just for the sake of discussion: I imagine some hacks to the btree to 
optimize this special case.
The btree nodes could store the number of leaves just for the data pages 
(e.g. 0: unknown, >0 valid number); it would need to propagate up the 
info just until it reaches a parent in an index page. And it needs to 
update this info only when a node changes from leaf to having a child.


Thanks for all your time,
Gabriel
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] select count(*) does not use primary key??

2013-06-06 Thread Igor Tandetnik

On 6/6/2013 11:26 AM, Gabriel Corneanu wrote:

Again sorry for count(a), I wrote too fast. I understand of course about
null values.

Otherwise by rowid I mean the autogenerated primary key. In my actual
case, I have a field as alias.

CREATE TABLE t(id integer primary key, a);
explain query plan select count(*) from t -> scan table

create index ia on t(id);
explain query plan select count(*) from t -> SCAN TABLE t USING COVERING
INDEX...

1. It means, the primary key is not as good as a cover index??


Again - there was *no* separate index until you explicitly created one. 
Another way to look at it is that the table as a whole *is* in fact a 
covering index for itself, ordered by rowid and covering all the fields 
in the table. In this view, "scan table" is just a shorthand for "scan 
table using covering index which is the table itself".


If you need this count real fast for some reason, then create a separate 
table, with one column and one row, that would store the count. Then 
create INSERT and DELETE triggers that would maintain the count.

--
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] select count(*) does not use primary key??

2013-06-06 Thread Richard Hipp
On Thu, Jun 6, 2013 at 11:26 AM, Gabriel Corneanu  wrote:

>
> 2. Is there NO WAY to quickly get the row count WITHOUT full scan if I
> only have the auto primary key??
>

The b-tree structures in the SQLite file format do not store the row count,
as that slows down writes (since the row count would have to be updated
with each insert or delete).

If you want fast access to a row count, store it in a separate table and
keep it up-to-date using triggers.

-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] select count(*) does not use primary key??

2013-06-06 Thread Clemens Ladisch
Gabriel Corneanu wrote:
> I was surprised to find that simple query "select count(*) from table" took
> too much time, and found that it does NOT use the primary key index??

In SQLite, indexes are stored as B-trees, ordered by the indexed columns.

Tables are _also_ stored as B-trees, ordered by the rowid.

This means that if the primary key is the rowid, the table _is_ the
index corresponding to the primary key.  (There is no separate index
structure in this case.)

> If I use CREATE TABLE t(a unique), then it uses the auto-generated cover
> index.

Because any index has exactly the same number of entries as its table,
but is likely to occupy fewer pages.


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] select count(*) does not use primary key??

2013-06-06 Thread Gabriel Corneanu
Again sorry for count(a), I wrote too fast. I understand of course about 
null values.


Otherwise by rowid I mean the autogenerated primary key. In my actual 
case, I have a field as alias.


CREATE TABLE t(id integer primary key, a);
explain query plan select count(*) from t -> scan table

create index ia on t(id);
explain query plan select count(*) from t -> SCAN TABLE t USING COVERING 
INDEX...


1. It means, the primary key is not as good as a cover index??

2. Is there NO WAY to quickly get the row count WITHOUT full scan if I 
only have the auto primary key??


Thanks,
Gabriel

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] select count(*) does not use primary key??

2013-06-06 Thread Gabriel Corneanu
OK I understand, then it remains the question why it does not use the 
primary key??


Thanks,
Gabriel
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] select count(*) does not use primary key??

2013-06-06 Thread Jay A. Kreibich
On Thu, Jun 06, 2013 at 04:52:12PM +0200, Gabriel Corneanu scratched on the 
wall:
> I was surprised to find that simple query "select count(*) from table" took
> too much time, and found that it does NOT use the primary key index??
> e.g.
> CREATE TABLE t(a);
> explain query plan select count(*) from t
> 
> I get : SCAN TABLE t (~100 rows)
> 
> If I use CREATE TABLE t(a unique), then it uses the auto-generated cover
> index.
> Even if I write
> select count(rowid) from t
> it still uses scan table...
> However I would expect that it should also use the primary key for
> counting, or not??

  What PK?  Rowid is not a PK unless you define it as such.  The table
  itself is stored in rowid order, so the "index" for rowid is the
  table itself-- there is no "other" index for rowid.

> In my opinion, count(*) is the same as count(rowid) (I see that even
> count() is accepted); I could say it's even the same as count(x) (any other
> field).

  That is not true.  The SQLite docs are quite clear:

http://www.sqlite.org/lang_aggfunc.html#count

count(X)
count(*) 

The count(X) function returns a count of the number of times that X
is not NULL in a group. The count(*) function (with no arguments)
returns the total number of rows in the group. 

  If you provide an actual column name, count() only counts non-NULL
  rows.  The two versions of the function are equivalent if "X"
  prohibits NULL entries (such as the rowid column), but not in the
  general case.

  This is not SQLite specific... this is standard SQL.

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Intelligence is like underwear: it is important that you have it,
 but showing it to the wrong people has the tendency to make them
 feel uncomfortable." -- Angela Johnson
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] select count(*) does not use primary key??

2013-06-06 Thread Richard Hipp
On Thu, Jun 6, 2013 at 10:52 AM, Gabriel Corneanu  wrote:

> Strange is, count(*) uses the cover index for a but "select count(a)" does
> NOT use the same cover index...
>

count(a) has to check for NULL values of a, which are not counted.
count(*) does not.


-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] select count(*) does not use primary key??

2013-06-06 Thread Gabriel Corneanu
I was surprised to find that simple query "select count(*) from table" took
too much time, and found that it does NOT use the primary key index??
e.g.
CREATE TABLE t(a);
explain query plan select count(*) from t

I get : SCAN TABLE t (~100 rows)

If I use CREATE TABLE t(a unique), then it uses the auto-generated cover
index.
Even if I write
select count(rowid) from t
it still uses scan table...
However I would expect that it should also use the primary key for
counting, or not??

In my opinion, count(*) is the same as count(rowid) (I see that even
count() is accepted); I could say it's even the same as count(x) (any other
field).
Strange is, count(*) uses the cover index for a but "select count(a)" does
NOT use the same cover index...

Am I making any mistake here??
Thanks,
Gabriel
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users