Re: [sqlite] without rowid and secondary indexes

2018-04-10 Thread Olivier Mascia
> Le 10 avr. 2018 à 18:25, Clemens Ladisch  a écrit :
> 
> Olivier Mascia wrote:
>> Would you mean:
>> 
>>  select NAME,count(NAME) from SHEETS where SHEET>? group by NAME having NAME 
>> is not NULL;
>> 
>> is more logically appropriate?
> 
> Probably.  The first query can give quite random results:
> 
>  delete from SHEETS;
>  insert into SHEETS(NODE, SHEET, NAME) values(1, 1, 'a');
>  insert into SHEETS(NODE, SHEET, NAME) values(1, 3, 'a');
>  select NAME,count(NAME) from SHEETS group by NAME having SHEET>2 and NAME is 
> not NULL;
>  a|2
>  select NAME,count(NAME) from SHEETS where SHEET>2 group by NAME having NAME 
> is not NULL;
>  a|1
>  delete from SHEETS;
>  insert into SHEETS(NODE, SHEET, NAME) values(1, 3, 'a');
>  insert into SHEETS(NODE, SHEET, NAME) values(2, 1, 'a');
>  select NAME,count(NAME) from SHEETS group by NAME having SHEET>2 and NAME is 
> not NULL;
>  select NAME,count(NAME) from SHEETS where SHEET>2 group by NAME having NAME 
> is not NULL;
>  a|1
> 
>> Or that I might even go for:
>> 
>>  select NAME,count(NAME) from SHEETS where SHEET>? and NAME is not NULL 
>> group by NAME;
> 
> This always has the same result as the second query.  (But "NAME is not NULL" 
> in the
> WHERE clause applies to the table column, while in the HAVING clause, it 
> applies to
> the result of the grouping.  So the last query makes it more likely that 
> SQLite can
> prove that it can use the index.)

Thanks a lot: this proved wonderfully educative, especially the oddities 
arising from:

> select NAME,count(NAME) from SHEETS group by NAME having SHEET>2 and NAME is 
> not NULL;

-- 
Best Regards, Meilleures salutations, Met vriendelijke groeten,
Olivier Mascia


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] without rowid and secondary indexes

2018-04-10 Thread Clemens Ladisch
Olivier Mascia wrote:
>> Le 10 avr. 2018 à 13:20, Clemens Ladisch  a écrit :
>>> select NAME,count(NAME) from SHEETS group by NAME having SHEET>? and NAME 
>>> is not NULL;
>>
>> The SHEET value is from some random row in the group, and the NAME value is
>> the same for all rows in the group.  Are you sure you do not want to use 
>> WHERE
>> instead of HAVING?
>
> Would you mean:
>
>   select NAME,count(NAME) from SHEETS where SHEET>? group by NAME having NAME 
> is not NULL;
>
> is more logically appropriate?

Probably.  The first query can give quite random results:

  delete from SHEETS;
  insert into SHEETS(NODE, SHEET, NAME) values(1, 1, 'a');
  insert into SHEETS(NODE, SHEET, NAME) values(1, 3, 'a');
  select NAME,count(NAME) from SHEETS group by NAME having SHEET>2 and NAME is 
not NULL;
  a|2
  select NAME,count(NAME) from SHEETS where SHEET>2 group by NAME having NAME 
is not NULL;
  a|1
  delete from SHEETS;
  insert into SHEETS(NODE, SHEET, NAME) values(1, 3, 'a');
  insert into SHEETS(NODE, SHEET, NAME) values(2, 1, 'a');
  select NAME,count(NAME) from SHEETS group by NAME having SHEET>2 and NAME is 
not NULL;
  select NAME,count(NAME) from SHEETS where SHEET>2 group by NAME having NAME 
is not NULL;
  a|1

> Or that I might even go for:
>
>   select NAME,count(NAME) from SHEETS where SHEET>? and NAME is not NULL 
> group by NAME;

This always has the same result as the second query.  (But "NAME is not NULL" 
in the
WHERE clause applies to the table column, while in the HAVING clause, it 
applies to
the result of the grouping.  So the last query makes it more likely that SQLite 
can
prove that it can use the index.)


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] without rowid and secondary indexes

2018-04-10 Thread Olivier Mascia
> Le 10 avr. 2018 à 13:20, Clemens Ladisch  a écrit :
> 
>> select NAME,count(NAME) from SHEETS group by NAME having SHEET>? and NAME is 
>> not NULL;
> 
> The SHEET value is from some random row in the group, and the NAME value is
> the same for all rows in the group.  Are you sure you do not want to use WHERE
> instead of HAVING?

Thanks for the hint, Clemens.
Would you mean:

  select NAME,count(NAME) from SHEETS where SHEET>? group by NAME having NAME 
is not NULL;

is more logically appropriate? I'm inclined to think yes and that was what you 
meant.

Or that I might even go for:

  select NAME,count(NAME) from SHEETS where SHEET>? and NAME is not NULL group 
by NAME;

On a test table with statistically representative values (only ~95'000 rows), I 
get the same plan in all three cases:

  --EQP-- 0,0,0,SEARCH TABLE SHEETS USING COVERING INDEX SHEETS_NAME (NAME>?)


-- 
Best Regards, Meilleures salutations, Met vriendelijke groeten,
Olivier Mascia


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] without rowid and secondary indexes

2018-04-10 Thread Clemens Ladisch
Olivier Mascia wrote:
> [...]
> So I should probably even better go with:
>
>   create index ... where NAME is not NULL;
>
> as my only queries involving NAME have a 'where NAME is not NULL' restriction.

To be sure, check with EXPLAIN QUERY PLAN whether the index is actually used.

> select NAME,count(NAME) from SHEETS group by NAME having SHEET>? and NAME is 
> not NULL;

The SHEET value is from some random row in the group, and the NAME value is
the same for all rows in the group.  Are you sure you do not want to use WHERE
instead of HAVING?


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] without rowid and secondary indexes

2018-04-10 Thread Olivier Mascia
Thanks!

> In your example below, your PRIMARY KEY consists of two integer
> instead of just a single rowid integer.  So it will take up slightly
> more space.  Not a lot, but some.
> 
> Will that make a difference in your application?  I do not know. 

Then, I will indeed run some more tests, but I guess I'm OK with adding the 
index while keeping the table without rowid because few rows will actually have 
a NAME value, most (like in more than 95% of them) will be NULL. What's more, 
most queries are PK centric.

So I should probably even better go with:

  create index if not exists SHEETS_NAME on SHEETS(NAME) where NAME is not NULL;

as my only queries involving NAME have a 'where NAME is not NULL' restriction. 
The purpose of the index being only to optimize such occasional dictionary 
building queries:

select NAME,count(NAME) from SHEETS group by NAME having SHEET>? and NAME is 
not NULL;

-- 
Best Regards, Meilleures salutations, Met vriendelijke groeten,
Olivier Mascia


> Le 10 avr. 2018 à 12:36, Richard Hipp  a écrit :
> 
> Every index is composed of the columns being indexed followed by the
> primary key of the table.  In a regular ROWID table, the "primary key"
> is the rowid - an integer.  In a WITHOUT ROWID table, the primary key
> is whatever you declare the PRIMARY KEY to be.
> 
> So whether or not there is more overhead in an index on a WITHOUT
> ROWID table versus a normal rowid table depends on your choice of the
> PRIMARY KEY.
> 
> In your example below, your PRIMARY KEY consists of two integer
> instead of just a single rowid integer.  So it will take up slightly
> more space.  Not a lot, but some.
> 
> Will that make a difference in your application?  I do not know.  I
> suggest you try it and see.  This is one of the great beauties of SQL
> (any SQL not just SQLite).  You can make a simple schema change like
> this and try it out, without having to modify your application -
> indeed without having to even recompile your application.  Simply run
> the same build of your application using two different versions of the
> database - one with an ordinary rowid table and the other with a
> WITHOUT ROWID table - and see which one gives you better performance -
> for whatever definition of "performance" is important to you.
> 
> On 4/10/18, Olivier Mascia  wrote:
>> Dear all,
>> 
>> With https://www.sqlite.org/withoutrowid.html in mind, assuming something
>> simple as:
>> 
>>  create table if not exists SHEETS(
>>  NODE integer not null,
>>  SHEET integer not null,
>>  NAME text,   -- often NULL, sometimes short text value
>> (keyword)
>>  ..., -- 2 or 3 other small properties
>>  primary key(NODE,SHEET)
>>  ) without rowid;
>>  -- makes sense because small row and natural primary key is not integer
>> 
>> Now what if an index on NAME becomes valuable?
>> Will such a secondary index have a higher storage cost or performance cost
>> than with an ordinary table?
>> 
>>  create index if not exists SHEETS_NAME on SHEETS(NAME);
>> 
>> --
>> Best Regards, Meilleures salutations, Met vriendelijke groeten,
>> Olivier Mascia


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] without rowid and secondary indexes

2018-04-10 Thread Richard Hipp
Every index is composed of the columns being indexed followed by the
primary key of the table.  In a regular ROWID table, the "primary key"
is the rowid - an integer.  In a WITHOUT ROWID table, the primary key
is whatever you declare the PRIMARY KEY to be.

So whether or not there is more overhead in an index on a WITHOUT
ROWID table versus a normal rowid table depends on your choice of the
PRIMARY KEY.

In your example below, your PRIMARY KEY consists of two integer
instead of just a single rowid integer.  So it will take up slightly
more space.  Not a lot, but some.

Will that make a difference in your application?  I do not know.  I
suggest you try it and see.  This is one of the great beauties of SQL
(any SQL not just SQLite).  You can make a simple schema change like
this and try it out, without having to modify your application -
indeed without having to even recompile your application.  Simply run
the same build of your application using two different versions of the
database - one with an ordinary rowid table and the other with a
WITHOUT ROWID table - and see which one gives you better performance -
for whatever definition of "performance" is important to you.

On 4/10/18, Olivier Mascia  wrote:
> Dear all,
>
> With https://www.sqlite.org/withoutrowid.html in mind, assuming something
> simple as:
>
>   create table if not exists SHEETS(
>   NODE integer not null,
>   SHEET integer not null,
>   NAME text,   -- often NULL, sometimes short text value
> (keyword)
>   ..., -- 2 or 3 other small properties
>   primary key(NODE,SHEET)
>   ) without rowid;
>   -- makes sense because small row and natural primary key is not integer
>
> Now what if an index on NAME becomes valuable?
> Will such a secondary index have a higher storage cost or performance cost
> than with an ordinary table?
>
>   create index if not exists SHEETS_NAME on SHEETS(NAME);
>
> --
> Best Regards, Meilleures salutations, Met vriendelijke groeten,
> Olivier Mascia
>
>
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>


-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users