Re: [sqlite] select count(*) does not use primary key??
On Fri, Jun 7, 2013 at 11:51 AM, Ryan Johnsonwrote: > On 06/06/2013 10:52 AM, Gabriel Corneanu wrote: > >> In my opinion, count(*) is the same as count(rowid) (I see that even >> count() is accepted); I could say it's even the same as count(x) (any >> other >> field). >> > Not quite... count(x) only counts rows having non-NULL x. Granted, that's > not a problem for rowid/pk (which are not allowed to be NULL), but it > matters a lot in the general case. > PRIMARY KEYs (except for INTEGER PRIMARY KEYs) are allowed to be NULL in SQLite. This goes back to a bug in the code from many years ago. By the time the bug was discovered, SQLite was already in wide-spread use and so the decision was made to not fix the bug since doing so would cause compatibility problems. -- D. Richard Hipp d...@sqlite.org ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] select count(*) does not use primary key??
On 06/06/2013 10:52 AM, Gabriel Corneanu wrote: In my opinion, count(*) is the same as count(rowid) (I see that even count() is accepted); I could say it's even the same as count(x) (any other field). Not quite... count(x) only counts rows having non-NULL x. Granted, that's not a problem for rowid/pk (which are not allowed to be NULL), but it matters a lot in the general case. (but that doesn't explain the problem you're seeing) Ryan ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] select count(*) does not use primary key??
On Fri, 07 Jun 2013 13:12:14 +0200 Clemens Ladischwrote: > Eduardo Morras wrote: > > where t.a = NULL > > where t.a IS NULL > > (NULL compares as not equal to any value, including itself.) OPppss you're right. Thought too fast and wrote even faster :( > > Regards, > Clemens --- --- Eduardo Morras ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] select count(*) does not use primary key??
Eduardo Morras wrote: > where t.a = NULL where t.a IS NULL (NULL compares as not equal to any value, including itself.) Regards, Clemens ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] select count(*) does not use primary key??
On Thu, 6 Jun 2013 10:53:55 -0400 Richard Hippwrote: > On Thu, Jun 6, 2013 at 10:52 AM, Gabriel Corneanu > wrote: > > > Strange is, count(*) uses the cover index for a but "select count(a)" does > > NOT use the same cover index... > > > > count(a) has to check for NULL values of a, which are not counted. > count(*) does not. If I understand well , select count(a) from t = (select count(*) from t) - (select count(*) from t where t.a = NULL) and both selects will use cover indexs, doesn't it? > -- > D. Richard Hipp > d...@sqlite.org --- --- Eduardo Morras ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] select count(*) does not use primary key??
This is not at all my case ... I don't obviously write 1 by 1, but using blocks of data ( array of struct ), virtual tables wrappers, and "insert ... select". This way I can achieve >200k rec/s, or at least 100k when having some more fields. Right now I'm completely CPU bound, it's 100% load at high rate. IO is almost out of question, at <10MB /s; and I use 8k page size and of course synchronous off, wal mode... Another type of data (less fields but with a blob inside 2-32kB) easily reaches ~40MB/s but only a few thousands rec/s. The performance drops abruptly when having more fields (I don't remember the magic threshold); it seems most of the load is needed for field coding ? I use only integers for space optimization (varint); this is also good as I have high dynamic range. Multi-core sure helps to have enough CPU power for the rest (hardware connection, pre-processing, etc). I would definitely like to be able to get more performance, but I can live with the current numbers. One can use some high-end CPUs if really wants such high rates (the hardware around costs ~100x more :) ). BTW I asked a few times already, is it possible to get/compile a windows dll for sqlite4 (just for evaluation)? Last time I checked, it didn't compile on windows at all. Gabriel ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] select count(*) does not use primary key??
On 7 Jun 2013, at 8:57am, Gabriel Corneanuwrote: > BTW I found this by opening some file over network, which of course made > everything worse. > [...] > Not that I really need, but I have to support specified data rates up to 100k > records / second. Maximum speed of a SQLite database is usually limited by speed of rotating hard disk. If you do the maths on rotational latency you'll find there's no way to get 100k disk accesses per second. SSD improves on this. Do you have to support 100k records/second over network ? If so, what networking (Ethernet ? WiFi ?) are you using what networking file system are you using, what what kind of mass storage device is your database file stored on ? Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] select count(*) does not use primary key??
I admit I didn't think (or didn't even read in detail) about technical implementation. This is an extract from analyzer: *** Table AE_DATA Percentage of total database.. 99.89% Number of entries. 1030371 Bytes of storage consumed. 67846144 Bytes of payload.. 6118671990.2% Average payload per entry. 59.38 Average unused bytes per entry 0.34 Average fanout 752.00 Fragmentation. 0.35% Maximum payload per entry. 65 Entries that use overflow. 00.0% Index pages used.. 11 Primary pages used 8271 Overflow pages used... 0 Total pages used.. 8282 Unused bytes on index pages... 15678 17.4% Unused bytes on primary pages. 337429 0.50% Unused bytes on overflow pages 0 Unused bytes on all pages. 353107 0.52% So I understand that the 11 index pages are pure btree pages, but the leaves are actually in the ~8000 data pages. And it probably needs to visit (i.e. load) all data pages to count the leaves... Even if there would be some counter in the header of each page, it still needs to load the pages which is bad for IO... BTW I found this by opening some file over network, which of course made everything worse. For my case (file format) the data is append (write) only, so max(rowid) works equally good. As a note, I actually HAVE the record count stored somewhere else but I had this query in a generic copy routine which was also used for some other small tables. I agree it's some kind of corner case, usually tables have some kind of indices. But in this case I need high speed, indices would bring performance down. Not that I really need, but I have to support specified data rates up to 100k records / second. And I only access the data sequentially by rowid. Just for the sake of discussion: I imagine some hacks to the btree to optimize this special case. The btree nodes could store the number of leaves just for the data pages (e.g. 0: unknown, >0 valid number); it would need to propagate up the info just until it reaches a parent in an index page. And it needs to update this info only when a node changes from leaf to having a child. Thanks for all your time, Gabriel ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] select count(*) does not use primary key??
On 6/6/2013 11:26 AM, Gabriel Corneanu wrote: Again sorry for count(a), I wrote too fast. I understand of course about null values. Otherwise by rowid I mean the autogenerated primary key. In my actual case, I have a field as alias. CREATE TABLE t(id integer primary key, a); explain query plan select count(*) from t -> scan table create index ia on t(id); explain query plan select count(*) from t -> SCAN TABLE t USING COVERING INDEX... 1. It means, the primary key is not as good as a cover index?? Again - there was *no* separate index until you explicitly created one. Another way to look at it is that the table as a whole *is* in fact a covering index for itself, ordered by rowid and covering all the fields in the table. In this view, "scan table" is just a shorthand for "scan table using covering index which is the table itself". If you need this count real fast for some reason, then create a separate table, with one column and one row, that would store the count. Then create INSERT and DELETE triggers that would maintain the count. -- Igor Tandetnik ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] select count(*) does not use primary key??
On Thu, Jun 6, 2013 at 11:26 AM, Gabriel Corneanuwrote: > > 2. Is there NO WAY to quickly get the row count WITHOUT full scan if I > only have the auto primary key?? > The b-tree structures in the SQLite file format do not store the row count, as that slows down writes (since the row count would have to be updated with each insert or delete). If you want fast access to a row count, store it in a separate table and keep it up-to-date using triggers. -- D. Richard Hipp d...@sqlite.org ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] select count(*) does not use primary key??
Gabriel Corneanu wrote: > I was surprised to find that simple query "select count(*) from table" took > too much time, and found that it does NOT use the primary key index?? In SQLite, indexes are stored as B-trees, ordered by the indexed columns. Tables are _also_ stored as B-trees, ordered by the rowid. This means that if the primary key is the rowid, the table _is_ the index corresponding to the primary key. (There is no separate index structure in this case.) > If I use CREATE TABLE t(a unique), then it uses the auto-generated cover > index. Because any index has exactly the same number of entries as its table, but is likely to occupy fewer pages. Regards, Clemens ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] select count(*) does not use primary key??
Again sorry for count(a), I wrote too fast. I understand of course about null values. Otherwise by rowid I mean the autogenerated primary key. In my actual case, I have a field as alias. CREATE TABLE t(id integer primary key, a); explain query plan select count(*) from t -> scan table create index ia on t(id); explain query plan select count(*) from t -> SCAN TABLE t USING COVERING INDEX... 1. It means, the primary key is not as good as a cover index?? 2. Is there NO WAY to quickly get the row count WITHOUT full scan if I only have the auto primary key?? Thanks, Gabriel ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] select count(*) does not use primary key??
OK I understand, then it remains the question why it does not use the primary key?? Thanks, Gabriel ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] select count(*) does not use primary key??
On Thu, Jun 06, 2013 at 04:52:12PM +0200, Gabriel Corneanu scratched on the wall: > I was surprised to find that simple query "select count(*) from table" took > too much time, and found that it does NOT use the primary key index?? > e.g. > CREATE TABLE t(a); > explain query plan select count(*) from t > > I get : SCAN TABLE t (~100 rows) > > If I use CREATE TABLE t(a unique), then it uses the auto-generated cover > index. > Even if I write > select count(rowid) from t > it still uses scan table... > However I would expect that it should also use the primary key for > counting, or not?? What PK? Rowid is not a PK unless you define it as such. The table itself is stored in rowid order, so the "index" for rowid is the table itself-- there is no "other" index for rowid. > In my opinion, count(*) is the same as count(rowid) (I see that even > count() is accepted); I could say it's even the same as count(x) (any other > field). That is not true. The SQLite docs are quite clear: http://www.sqlite.org/lang_aggfunc.html#count count(X) count(*) The count(X) function returns a count of the number of times that X is not NULL in a group. The count(*) function (with no arguments) returns the total number of rows in the group. If you provide an actual column name, count() only counts non-NULL rows. The two versions of the function are equivalent if "X" prohibits NULL entries (such as the rowid column), but not in the general case. This is not SQLite specific... this is standard SQL. -j -- Jay A. Kreibich < J A Y @ K R E I B I.C H > "Intelligence is like underwear: it is important that you have it, but showing it to the wrong people has the tendency to make them feel uncomfortable." -- Angela Johnson ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] select count(*) does not use primary key??
On Thu, Jun 6, 2013 at 10:52 AM, Gabriel Corneanuwrote: > Strange is, count(*) uses the cover index for a but "select count(a)" does > NOT use the same cover index... > count(a) has to check for NULL values of a, which are not counted. count(*) does not. -- D. Richard Hipp d...@sqlite.org ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] select count(*) does not use primary key??
I was surprised to find that simple query "select count(*) from table" took too much time, and found that it does NOT use the primary key index?? e.g. CREATE TABLE t(a); explain query plan select count(*) from t I get : SCAN TABLE t (~100 rows) If I use CREATE TABLE t(a unique), then it uses the auto-generated cover index. Even if I write select count(rowid) from t it still uses scan table... However I would expect that it should also use the primary key for counting, or not?? In my opinion, count(*) is the same as count(rowid) (I see that even count() is accepted); I could say it's even the same as count(x) (any other field). Strange is, count(*) uses the cover index for a but "select count(a)" does NOT use the same cover index... Am I making any mistake here?? Thanks, Gabriel ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users