Re: [sqlite] Indexing WHERE with GROUP BY and ORDER BY

2017-04-03 Thread Hick Gunter
The correlated subquery is fine if your SQL must be portable, but will return n 
rows if the a value is duplicated (n times) within a group. The bare field 
select only works in SQLite and is probably faster, with the caveat that the 
row the bare field values are taken from is "random". Since you seem to 
guarantee unique a within each b/c group, both will return exactly one row for 
every b/c group (with the c and d values "randomly selected" from a set of 1 
rows, in case of the bare field select).

-Ursprüngliche Nachricht-
Von: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im 
Auftrag von Hamish Allan
Gesendet: Montag, 03. April 2017 16:24
An: SQLite mailing list <sqlite-users@mailinglists.sqlite.org>
Betreff: Re: [sqlite] Indexing WHERE with GROUP BY and ORDER BY

Thanks, Hick. I now understand that it's undefined which value is selected from 
the bare column.

To (hopefully) clarify my purpose: I'm adding a row (with incrementing
"a") each time "d" is changed for a given combination of "b" and "c".
But the way I want to query is: for each "c" with a given "b", find me the most 
recent "d" (in my example, the "b" I gave was 1).

R Smith's approach does the trick nicely!

Thanks,
Hamish






On 3 April 2017 at 10:48, Hick Gunter <h...@scigames.at> wrote:
> I am not sure I correctly understand what you want.
>
> This is the value of d associated with a randomly chosen record from
> the group of records having the highest value of a tha also fulfills
> b == 1;
>
> SELECT MAX(a),d FROM x WHERE b=1;
>
> If you want the value of d within each group of records sharing a
> value of c, try
>
> SELECT MAX(a),c,d FROM x WHERE b=1 GROUP BY c ORDER BY 1 DESC, 2 ASC;
>
> See also http://sqlite.org/lang_select.html
>
> "Side note: Bare columns in an aggregate queries. The usual case is that all 
> column names in an aggregate query are either arguments to aggregate 
> functions or else appear in the GROUP BY clause. A result column which 
> contains a column name that is not within an aggregate function and that does 
> not appear in the GROUP BY clause (if one exists) is called a "bare" column. 
> Example:
>
> SELECT a, b, sum(c) FROM tab1 GROUP BY a;
>
>
> In the query above, the "a" column is part of the GROUP BY clause and so each 
> row of the output contains one of the distinct values for "a". The "c" column 
> is contained within the sum() aggregate function and so that output column is 
> the sum of all "c" values in rows that have the same value for "a". But what 
> is the result of the bare column "b"? The answer is that the "b" result will 
> be the value for "b" in one of the input rows that form the aggregate. The 
> problem is that you usually do not know which input row is used to compute 
> "b", and so in many cases the value for "b" is undefined.
>
> Special processing occurs occurs when the aggregate function is either min() 
> or max(). Example:
>
> SELECT a, b, max(c) FROM tab1 GROUP BY a;
>
> When the min() or max() aggregate functions are used in an aggregate query, 
> all bare columns in the result set take values from the input row which also 
> contains the minimum or maximum. So in the query above, the value of the "b" 
> column in the output will be the value of the "b" column in the input row 
> that has the largest "c" value. There is still an ambiguity if two or more of 
> the input rows have the same minimum or maximum value or if the query 
> contains more than one min() and/or max() aggregate function. Only the 
> built-in min() and max() functions work this way."
>
> -Ursprüngliche Nachricht-
> Von: sqlite-users
> [mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im Auftrag von
> Hamish Allan
> Gesendet: Montag, 03. April 2017 10:51
> An: SQLite mailing list <sqlite-users@mailinglists.sqlite.org>
> Betreff: Re: [sqlite] Indexing WHERE with GROUP BY and ORDER BY
>
> Ah. My purpose is to determine "d for the most recent c with b=1", with "most 
> recent" being "largest a".
>
> My query had been working for this, but I'd only been adding monotonically 
> increasing values for "a" in my tests. I just tried doing otherwise and found 
> that I had been relying on an implementation detail.
>
> Is there any way I can perform this collation at query time, or do I need to 
> do it programmatically later on?
>
> Thanks,
> Hamish
>
>
>
> ___
>  Gunter Hick
> Software Engineer
> Scientif

Re: [sqlite] Indexing WHERE with GROUP BY and ORDER BY

2017-04-03 Thread Hamish Allan
Thanks, Keith. I think I need the sub-select because I don't just want
the (d) for the max(a) for the grouping of (c), but for the
combination of the grouping (c) and the given value (b).


On 3 April 2017 at 14:32, Keith Medcalf <kmedc...@dessus.com> wrote:
>   select max(a), c, d
> from x
>where b == 1
> group by c
> order by 1 desc;
>
> index on (b, c, a)
>
> So, you are grouping by c where b=1, and within each grouping returning the 
> grouping (c), the maximum value of (a) for that grouping, and (d) from one of 
> perhaps multiple rows that contain the max(a) within that group.
>
> Or perhaps:
>
> select max(a), c, d
>   from x
>  where b == 1;
>
> Which finds the max(a) where b==1, and returns that value and the value of c 
> and d from one of the rows containing that maximum (1 row returned).
>
> index on (b, a)
>
> The former returns all groups of c with the top one being the one row 
> returned by the latter.
>
>> -Original Message-
>> From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org]
>> On Behalf Of Hamish Allan
>> Sent: Monday, 3 April, 2017 02:51
>> To: SQLite mailing list
>> Subject: Re: [sqlite] Indexing WHERE with GROUP BY and ORDER BY
>>
>> Ah. My purpose is to determine "d for the most recent c with b=1",
>> with "most recent" being "largest a".
>>
>> My query had been working for this, but I'd only been adding
>> monotonically increasing values for "a" in my tests. I just tried
>> doing otherwise and found that I had been relying on an implementation
>> detail.
>>
>> Is there any way I can perform this collation at query time, or do I
>> need to do it programmatically later on?
>>
>> Thanks,
>> Hamish
>>
>>
>>
>>
>>
>> On 3 April 2017 at 00:41, Keith Medcalf <kmedc...@dessus.com> wrote:
>> >
>> > What is the purpose of the ORDER BY?
>> >
>> > The value of a used for the order by is from some random row in the
>> grouping of c.  Are there relationships between a, b, c, d that you have
>> not documented nor told us about?
>> >
>> > In any case, your most efficient index is on (b, c).
>> >
>> > The order by is useless.
>> >
>> >> -Original Message-
>> >> From: sqlite-users [mailto:sqlite-users-
>> boun...@mailinglists.sqlite.org]
>> >> On Behalf Of Hamish Allan
>> >> Sent: Sunday, 2 April, 2017 17:28
>> >> To: sqlite-users@mailinglists.sqlite.org
>> >> Subject: [sqlite] Indexing WHERE with GROUP BY and ORDER BY
>> >>
>> >> Given a table:
>> >>
>> >> CREATE TABLE x (a INT, b INT, c TEXT, d TEXT);
>> >>
>> >> the query:
>> >>
>> >> SELECT d FROM x WHERE b = 1 GROUP BY c ORDER BY a;
>> >>
>> >> shows the following plan, without indexes:
>> >>
>> >> 0|0|0|SCAN TABLE x
>> >> 0|0|0|USE TEMP B-TREE FOR GROUP BY
>> >> 0|0|0|USE TEMP B-TREE FOR ORDER BY
>> >>
>> >> I can create an index to cover the WHERE clause:
>> >>
>> >> CREATE INDEX b_index ON x (b);
>> >>
>> >> which gives the plan:
>> >>
>> >> 0|0|0|SEARCH TABLE x USING INDEX b_index (b=?)
>> >> 0|0|0|USE TEMP B-TREE FOR GROUP BY
>> >> 0|0|0|USE TEMP B-TREE FOR ORDER BY
>> >>
>> >> or I can create an index to cover the GROUP BY clause:
>> >>
>> >> DROP INDEX b_index;
>> >> CREATE INDEX c_index ON x (c);
>> >>
>> >> which gives the plan:
>> >>
>> >> 0|0|0|SCAN TABLE x USING INDEX c_index
>> >> 0|0|0|USE TEMP B-TREE FOR ORDER BY
>> >>
>> >> but I don't seem to be able to create a combined index to cover both
>> >> the WHERE and GROUP BY clauses (let alone the ORDER BY clause).
>> >>
>> >> Am I missing something, or is this just not possible?
>> >>
>> >> If it's not possible, which is the more efficient of those indexes  --
>> >> or is there a third way, using an index for the ORDER BY, which would
>> >> be more efficient still?
>> >>
>> >> Thanks,
>> >> Hamish
>> >> ___
>> >> sqlite-users mailing list
>> >> sqlite-users@mailinglists.sqlite.org
>> >> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>> >
>> >
>> >
>> > ___
>> > sqlite-users mailing list
>> > sqlite-users@mailinglists.sqlite.org
>> > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>> ___
>> sqlite-users mailing list
>> sqlite-users@mailinglists.sqlite.org
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
>
>
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Indexing WHERE with GROUP BY and ORDER BY

2017-04-03 Thread Hamish Allan
Thanks, Hick. I now understand that it's undefined which value is
selected from the bare column.

To (hopefully) clarify my purpose: I'm adding a row (with incrementing
"a") each time "d" is changed for a given combination of "b" and "c".
But the way I want to query is: for each "c" with a given "b", find me
the most recent "d" (in my example, the "b" I gave was 1).

R Smith's approach does the trick nicely!

Thanks,
Hamish






On 3 April 2017 at 10:48, Hick Gunter <h...@scigames.at> wrote:
> I am not sure I correctly understand what you want.
>
> This is the value of d associated with a randomly chosen record from the 
> group of records having the highest value of a tha also fulfills  b == 1;
>
> SELECT MAX(a),d FROM x WHERE b=1;
>
> If you want the value of d within each group of records sharing a value of c, 
> try
>
> SELECT MAX(a),c,d FROM x WHERE b=1 GROUP BY c ORDER BY 1 DESC, 2 ASC;
>
> See also http://sqlite.org/lang_select.html
>
> "Side note: Bare columns in an aggregate queries. The usual case is that all 
> column names in an aggregate query are either arguments to aggregate 
> functions or else appear in the GROUP BY clause. A result column which 
> contains a column name that is not within an aggregate function and that does 
> not appear in the GROUP BY clause (if one exists) is called a "bare" column. 
> Example:
>
> SELECT a, b, sum(c) FROM tab1 GROUP BY a;
>
>
> In the query above, the "a" column is part of the GROUP BY clause and so each 
> row of the output contains one of the distinct values for "a". The "c" column 
> is contained within the sum() aggregate function and so that output column is 
> the sum of all "c" values in rows that have the same value for "a". But what 
> is the result of the bare column "b"? The answer is that the "b" result will 
> be the value for "b" in one of the input rows that form the aggregate. The 
> problem is that you usually do not know which input row is used to compute 
> "b", and so in many cases the value for "b" is undefined.
>
> Special processing occurs occurs when the aggregate function is either min() 
> or max(). Example:
>
> SELECT a, b, max(c) FROM tab1 GROUP BY a;
>
> When the min() or max() aggregate functions are used in an aggregate query, 
> all bare columns in the result set take values from the input row which also 
> contains the minimum or maximum. So in the query above, the value of the "b" 
> column in the output will be the value of the "b" column in the input row 
> that has the largest "c" value. There is still an ambiguity if two or more of 
> the input rows have the same minimum or maximum value or if the query 
> contains more than one min() and/or max() aggregate function. Only the 
> built-in min() and max() functions work this way."
>
> -Ursprüngliche Nachricht-
> Von: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im 
> Auftrag von Hamish Allan
> Gesendet: Montag, 03. April 2017 10:51
> An: SQLite mailing list <sqlite-users@mailinglists.sqlite.org>
> Betreff: Re: [sqlite] Indexing WHERE with GROUP BY and ORDER BY
>
> Ah. My purpose is to determine "d for the most recent c with b=1", with "most 
> recent" being "largest a".
>
> My query had been working for this, but I'd only been adding monotonically 
> increasing values for "a" in my tests. I just tried doing otherwise and found 
> that I had been relying on an implementation detail.
>
> Is there any way I can perform this collation at query time, or do I need to 
> do it programmatically later on?
>
> Thanks,
> Hamish
>
>
>
> ___
>  Gunter Hick
> Software Engineer
> Scientific Games International GmbH
> FN 157284 a, HG Wien
> Klitschgasse 2-4, A-1130 Vienna, Austria
> Tel: +43 1 80100 0
> E-Mail: h...@scigames.at
>
> This communication (including any attachments) is intended for the use of the 
> intended recipient(s) only and may contain information that is confidential, 
> privileged or legally protected. Any unauthorized use or dissemination of 
> this communication is strictly prohibited. If you have received this 
> communication in error, please immediately notify the sender by return e-mail 
> message and delete all copies of the original communication. Thank you for 
> your cooperation.
>
>
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Indexing WHERE with GROUP BY and ORDER BY

2017-04-03 Thread Hamish Allan
This second one also has the advantage that I understand what's being
expressed! 

This works perfectly, thank you! (and thank you to all others that replied).

I'm not sure I understand the indexing, though. The query plan without
indexes is:

0|0|0|SCAN TABLE x AS x1
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 1
1|0|0|SEARCH TABLE x AS x2

If I subsequently CREATE INDEX b_c_index ON x (b, c):

0|0|0|SEARCH TABLE x AS x1 USING INDEX b_c_index (b=?)
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 1
1|0|0|SEARCH TABLE x AS x2 USING INDEX b_c_index (b=? AND c=?)

So far so good. But if I additionally CREATE INDEX b_a_index ON x (b, a):

0|0|0|SEARCH TABLE x AS x1 USING INDEX b_a_index (b=?)
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 1
1|0|0|SEARCH TABLE x AS x2 USING INDEX b_a_index (b=?)

I've not dropped the b_c_index. Why does the subquery no longer use it?

Thanks,
Hamish



On 3 April 2017 at 10:20, R Smith <rsm...@rsweb.co.za> wrote:
> Another option, if a is simple and a suitable index exists, is to simply get
> the max in a correlated subquery, so this should work faster even:
>
> SELECT c,d FROM x AS x1 WHERE b=1 AND a=(SELECT MAX(x2.a) FROM x AS x2 WHERE
> x2.b=x1.b AND x2.c=x1.c)
>
> This needs no grouping because the sub-query ensures it, unless a can have
> duplicate values for any one c value.
>
>
>
>
> On 2017/04/03 11:09 AM, R Smith wrote:
>>
>>
>>
>> On 2017/04/03 10:51 AM, Hamish Allan wrote:
>>>
>>> Ah. My purpose is to determine "d for the most recent c with b=1",
>>> with "most recent" being "largest a".
>>>
>>> My query had been working for this, but I'd only been adding
>>> monotonically increasing values for "a" in my tests. I just tried
>>> doing otherwise and found that I had been relying on an implementation
>>> detail.
>>>
>>> Is there any way I can perform this collation at query time, or do I
>>> need to do it programmatically later on?
>>
>>
>> WITH CLT(d,c,a) AS (
>>   SELECT d,c,a FROM x WHERE b = 1 GROUP BY c,a
>> ), CMX(c,mxa) AS (
>>   SELECT c, MAX(a) FROM CLT GROUP BY c
>> )
>> SELECT d
>>   FROM CLT
>>   JOIN CMX ON CMX.c = CLT.c AND CMX.mxa = CLT.a
>> ;
>>
>> Zero Order-by's
>>
>>
>>
>>>
>>> Thanks,
>>> Hamish
>>>
>>>
>>>
>>>
>>>
>>> On 3 April 2017 at 00:41, Keith Medcalf <kmedc...@dessus.com> wrote:
>>>>
>>>> What is the purpose of the ORDER BY?
>>>>
>>>> The value of a used for the order by is from some random row in the
>>>> grouping of c.  Are there relationships between a, b, c, d that you have 
>>>> not
>>>> documented nor told us about?
>>>>
>>>> In any case, your most efficient index is on (b, c).
>>>>
>>>> The order by is useless.
>>>>
>>>>> -Original Message-
>>>>> From: sqlite-users
>>>>> [mailto:sqlite-users-boun...@mailinglists.sqlite.org]
>>>>> On Behalf Of Hamish Allan
>>>>> Sent: Sunday, 2 April, 2017 17:28
>>>>> To: sqlite-users@mailinglists.sqlite.org
>>>>> Subject: [sqlite] Indexing WHERE with GROUP BY and ORDER BY
>>>>>
>>>>> Given a table:
>>>>>
>>>>> CREATE TABLE x (a INT, b INT, c TEXT, d TEXT);
>>>>>
>>>>> the query:
>>>>>
>>>>> SELECT d FROM x WHERE b = 1 GROUP BY c ORDER BY a;
>>>>>
>>>>> shows the following plan, without indexes:
>>>>>
>>>>> 0|0|0|SCAN TABLE x
>>>>> 0|0|0|USE TEMP B-TREE FOR GROUP BY
>>>>> 0|0|0|USE TEMP B-TREE FOR ORDER BY
>>>>>
>>>>> I can create an index to cover the WHERE clause:
>>>>>
>>>>> CREATE INDEX b_index ON x (b);
>>>>>
>>>>> which gives the plan:
>>>>>
>>>>> 0|0|0|SEARCH TABLE x USING INDEX b_index (b=?)
>>>>> 0|0|0|USE TEMP B-TREE FOR GROUP BY
>>>>> 0|0|0|USE TEMP B-TREE FOR ORDER BY
>>>>>
>>>>> or I can create an index to cover the GROUP BY clause:
>>>>>
>>>>> DROP INDEX b_index;
>>>>> CREATE INDEX c_index ON x (c);
>>>>>
>>>>> which gives the plan:
>>>>>
>>>>> 0|0|0|SCAN TABLE x USING INDEX c_index
>>>>> 0|0|0|USE TEMP B-TREE FOR ORDER BY
>>>>>
>>>>> 

Re: [sqlite] Indexing WHERE with GROUP BY and ORDER BY

2017-04-03 Thread Keith Medcalf
  select max(a), c, d
from x
   where b == 1
group by c
order by 1 desc;

index on (b, c, a)

So, you are grouping by c where b=1, and within each grouping returning the 
grouping (c), the maximum value of (a) for that grouping, and (d) from one of 
perhaps multiple rows that contain the max(a) within that group.

Or perhaps:

select max(a), c, d
  from x
 where b == 1;

Which finds the max(a) where b==1, and returns that value and the value of c 
and d from one of the rows containing that maximum (1 row returned).

index on (b, a)

The former returns all groups of c with the top one being the one row returned 
by the latter.

> -Original Message-
> From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org]
> On Behalf Of Hamish Allan
> Sent: Monday, 3 April, 2017 02:51
> To: SQLite mailing list
> Subject: Re: [sqlite] Indexing WHERE with GROUP BY and ORDER BY
> 
> Ah. My purpose is to determine "d for the most recent c with b=1",
> with "most recent" being "largest a".
> 
> My query had been working for this, but I'd only been adding
> monotonically increasing values for "a" in my tests. I just tried
> doing otherwise and found that I had been relying on an implementation
> detail.
> 
> Is there any way I can perform this collation at query time, or do I
> need to do it programmatically later on?
> 
> Thanks,
> Hamish
> 
> 
> 
> 
> 
> On 3 April 2017 at 00:41, Keith Medcalf <kmedc...@dessus.com> wrote:
> >
> > What is the purpose of the ORDER BY?
> >
> > The value of a used for the order by is from some random row in the
> grouping of c.  Are there relationships between a, b, c, d that you have
> not documented nor told us about?
> >
> > In any case, your most efficient index is on (b, c).
> >
> > The order by is useless.
> >
> >> -Original Message-
> >> From: sqlite-users [mailto:sqlite-users-
> boun...@mailinglists.sqlite.org]
> >> On Behalf Of Hamish Allan
> >> Sent: Sunday, 2 April, 2017 17:28
> >> To: sqlite-users@mailinglists.sqlite.org
> >> Subject: [sqlite] Indexing WHERE with GROUP BY and ORDER BY
> >>
> >> Given a table:
> >>
> >> CREATE TABLE x (a INT, b INT, c TEXT, d TEXT);
> >>
> >> the query:
> >>
> >> SELECT d FROM x WHERE b = 1 GROUP BY c ORDER BY a;
> >>
> >> shows the following plan, without indexes:
> >>
> >> 0|0|0|SCAN TABLE x
> >> 0|0|0|USE TEMP B-TREE FOR GROUP BY
> >> 0|0|0|USE TEMP B-TREE FOR ORDER BY
> >>
> >> I can create an index to cover the WHERE clause:
> >>
> >> CREATE INDEX b_index ON x (b);
> >>
> >> which gives the plan:
> >>
> >> 0|0|0|SEARCH TABLE x USING INDEX b_index (b=?)
> >> 0|0|0|USE TEMP B-TREE FOR GROUP BY
> >> 0|0|0|USE TEMP B-TREE FOR ORDER BY
> >>
> >> or I can create an index to cover the GROUP BY clause:
> >>
> >> DROP INDEX b_index;
> >> CREATE INDEX c_index ON x (c);
> >>
> >> which gives the plan:
> >>
> >> 0|0|0|SCAN TABLE x USING INDEX c_index
> >> 0|0|0|USE TEMP B-TREE FOR ORDER BY
> >>
> >> but I don't seem to be able to create a combined index to cover both
> >> the WHERE and GROUP BY clauses (let alone the ORDER BY clause).
> >>
> >> Am I missing something, or is this just not possible?
> >>
> >> If it's not possible, which is the more efficient of those indexes  --
> >> or is there a third way, using an index for the ORDER BY, which would
> >> be more efficient still?
> >>
> >> Thanks,
> >> Hamish
> >> ___
> >> sqlite-users mailing list
> >> sqlite-users@mailinglists.sqlite.org
> >> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> >
> >
> >
> > ___
> > sqlite-users mailing list
> > sqlite-users@mailinglists.sqlite.org
> > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Indexing WHERE with GROUP BY and ORDER BY

2017-04-03 Thread Hick Gunter
I am not sure I correctly understand what you want.

This is the value of d associated with a randomly chosen record from the group 
of records having the highest value of a tha also fulfills  b == 1;

SELECT MAX(a),d FROM x WHERE b=1;

If you want the value of d within each group of records sharing a value of c, 
try

SELECT MAX(a),c,d FROM x WHERE b=1 GROUP BY c ORDER BY 1 DESC, 2 ASC;

See also http://sqlite.org/lang_select.html

"Side note: Bare columns in an aggregate queries. The usual case is that all 
column names in an aggregate query are either arguments to aggregate functions 
or else appear in the GROUP BY clause. A result column which contains a column 
name that is not within an aggregate function and that does not appear in the 
GROUP BY clause (if one exists) is called a "bare" column. Example:

SELECT a, b, sum(c) FROM tab1 GROUP BY a;


In the query above, the "a" column is part of the GROUP BY clause and so each 
row of the output contains one of the distinct values for "a". The "c" column 
is contained within the sum() aggregate function and so that output column is 
the sum of all "c" values in rows that have the same value for "a". But what is 
the result of the bare column "b"? The answer is that the "b" result will be 
the value for "b" in one of the input rows that form the aggregate. The problem 
is that you usually do not know which input row is used to compute "b", and so 
in many cases the value for "b" is undefined.

Special processing occurs occurs when the aggregate function is either min() or 
max(). Example:

SELECT a, b, max(c) FROM tab1 GROUP BY a;

When the min() or max() aggregate functions are used in an aggregate query, all 
bare columns in the result set take values from the input row which also 
contains the minimum or maximum. So in the query above, the value of the "b" 
column in the output will be the value of the "b" column in the input row that 
has the largest "c" value. There is still an ambiguity if two or more of the 
input rows have the same minimum or maximum value or if the query contains more 
than one min() and/or max() aggregate function. Only the built-in min() and 
max() functions work this way."

-Ursprüngliche Nachricht-
Von: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im 
Auftrag von Hamish Allan
Gesendet: Montag, 03. April 2017 10:51
An: SQLite mailing list <sqlite-users@mailinglists.sqlite.org>
Betreff: Re: [sqlite] Indexing WHERE with GROUP BY and ORDER BY

Ah. My purpose is to determine "d for the most recent c with b=1", with "most 
recent" being "largest a".

My query had been working for this, but I'd only been adding monotonically 
increasing values for "a" in my tests. I just tried doing otherwise and found 
that I had been relying on an implementation detail.

Is there any way I can perform this collation at query time, or do I need to do 
it programmatically later on?

Thanks,
Hamish



___
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: h...@scigames.at

This communication (including any attachments) is intended for the use of the 
intended recipient(s) only and may contain information that is confidential, 
privileged or legally protected. Any unauthorized use or dissemination of this 
communication is strictly prohibited. If you have received this communication 
in error, please immediately notify the sender by return e-mail message and 
delete all copies of the original communication. Thank you for your cooperation.


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Indexing WHERE with GROUP BY and ORDER BY

2017-04-03 Thread R Smith
Another option, if a is simple and a suitable index exists, is to simply 
get the max in a correlated subquery, so this should work faster even:


SELECT c,d FROM x AS x1 WHERE b=1 AND a=(SELECT MAX(x2.a) FROM x AS x2 
WHERE x2.b=x1.b AND x2.c=x1.c)


This needs no grouping because the sub-query ensures it, unless a can 
have duplicate values for any one c value.




On 2017/04/03 11:09 AM, R Smith wrote:



On 2017/04/03 10:51 AM, Hamish Allan wrote:

Ah. My purpose is to determine "d for the most recent c with b=1",
with "most recent" being "largest a".

My query had been working for this, but I'd only been adding
monotonically increasing values for "a" in my tests. I just tried
doing otherwise and found that I had been relying on an implementation
detail.

Is there any way I can perform this collation at query time, or do I
need to do it programmatically later on?


WITH CLT(d,c,a) AS (
  SELECT d,c,a FROM x WHERE b = 1 GROUP BY c,a
), CMX(c,mxa) AS (
  SELECT c, MAX(a) FROM CLT GROUP BY c
)
SELECT d
  FROM CLT
  JOIN CMX ON CMX.c = CLT.c AND CMX.mxa = CLT.a
;

Zero Order-by's





Thanks,
Hamish





On 3 April 2017 at 00:41, Keith Medcalf <kmedc...@dessus.com> wrote:

What is the purpose of the ORDER BY?

The value of a used for the order by is from some random row in the 
grouping of c.  Are there relationships between a, b, c, d that you 
have not documented nor told us about?


In any case, your most efficient index is on (b, c).

The order by is useless.


-Original Message-
From: sqlite-users 
[mailto:sqlite-users-boun...@mailinglists.sqlite.org]

On Behalf Of Hamish Allan
Sent: Sunday, 2 April, 2017 17:28
To: sqlite-users@mailinglists.sqlite.org
Subject: [sqlite] Indexing WHERE with GROUP BY and ORDER BY

Given a table:

CREATE TABLE x (a INT, b INT, c TEXT, d TEXT);

the query:

SELECT d FROM x WHERE b = 1 GROUP BY c ORDER BY a;

shows the following plan, without indexes:

0|0|0|SCAN TABLE x
0|0|0|USE TEMP B-TREE FOR GROUP BY
0|0|0|USE TEMP B-TREE FOR ORDER BY

I can create an index to cover the WHERE clause:

CREATE INDEX b_index ON x (b);

which gives the plan:

0|0|0|SEARCH TABLE x USING INDEX b_index (b=?)
0|0|0|USE TEMP B-TREE FOR GROUP BY
0|0|0|USE TEMP B-TREE FOR ORDER BY

or I can create an index to cover the GROUP BY clause:

DROP INDEX b_index;
CREATE INDEX c_index ON x (c);

which gives the plan:

0|0|0|SCAN TABLE x USING INDEX c_index
0|0|0|USE TEMP B-TREE FOR ORDER BY

but I don't seem to be able to create a combined index to cover both
the WHERE and GROUP BY clauses (let alone the ORDER BY clause).

Am I missing something, or is this just not possible?

If it's not possible, which is the more efficient of those indexes  --
or is there a third way, using an index for the ORDER BY, which would
be more efficient still?

Thanks,
Hamish
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Indexing WHERE with GROUP BY and ORDER BY

2017-04-03 Thread R Smith



On 2017/04/03 10:51 AM, Hamish Allan wrote:

Ah. My purpose is to determine "d for the most recent c with b=1",
with "most recent" being "largest a".

My query had been working for this, but I'd only been adding
monotonically increasing values for "a" in my tests. I just tried
doing otherwise and found that I had been relying on an implementation
detail.

Is there any way I can perform this collation at query time, or do I
need to do it programmatically later on?


WITH CLT(d,c,a) AS (
  SELECT d,c,a FROM x WHERE b = 1 GROUP BY c,a
), CMX(c,mxa) AS (
  SELECT c, MAX(a) FROM CLT GROUP BY c
)
SELECT d
  FROM CLT
  JOIN CMX ON CMX.c = CLT.c AND CMX.mxa = CLT.a
;

Zero Order-by's





Thanks,
Hamish





On 3 April 2017 at 00:41, Keith Medcalf <kmedc...@dessus.com> wrote:

What is the purpose of the ORDER BY?

The value of a used for the order by is from some random row in the grouping of 
c.  Are there relationships between a, b, c, d that you have not documented nor 
told us about?

In any case, your most efficient index is on (b, c).

The order by is useless.


-Original Message-
From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org]
On Behalf Of Hamish Allan
Sent: Sunday, 2 April, 2017 17:28
To: sqlite-users@mailinglists.sqlite.org
Subject: [sqlite] Indexing WHERE with GROUP BY and ORDER BY

Given a table:

CREATE TABLE x (a INT, b INT, c TEXT, d TEXT);

the query:

SELECT d FROM x WHERE b = 1 GROUP BY c ORDER BY a;

shows the following plan, without indexes:

0|0|0|SCAN TABLE x
0|0|0|USE TEMP B-TREE FOR GROUP BY
0|0|0|USE TEMP B-TREE FOR ORDER BY

I can create an index to cover the WHERE clause:

CREATE INDEX b_index ON x (b);

which gives the plan:

0|0|0|SEARCH TABLE x USING INDEX b_index (b=?)
0|0|0|USE TEMP B-TREE FOR GROUP BY
0|0|0|USE TEMP B-TREE FOR ORDER BY

or I can create an index to cover the GROUP BY clause:

DROP INDEX b_index;
CREATE INDEX c_index ON x (c);

which gives the plan:

0|0|0|SCAN TABLE x USING INDEX c_index
0|0|0|USE TEMP B-TREE FOR ORDER BY

but I don't seem to be able to create a combined index to cover both
the WHERE and GROUP BY clauses (let alone the ORDER BY clause).

Am I missing something, or is this just not possible?

If it's not possible, which is the more efficient of those indexes  --
or is there a third way, using an index for the ORDER BY, which would
be more efficient still?

Thanks,
Hamish
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Indexing WHERE with GROUP BY and ORDER BY

2017-04-03 Thread Hamish Allan
Ah. My purpose is to determine "d for the most recent c with b=1",
with "most recent" being "largest a".

My query had been working for this, but I'd only been adding
monotonically increasing values for "a" in my tests. I just tried
doing otherwise and found that I had been relying on an implementation
detail.

Is there any way I can perform this collation at query time, or do I
need to do it programmatically later on?

Thanks,
Hamish





On 3 April 2017 at 00:41, Keith Medcalf <kmedc...@dessus.com> wrote:
>
> What is the purpose of the ORDER BY?
>
> The value of a used for the order by is from some random row in the grouping 
> of c.  Are there relationships between a, b, c, d that you have not 
> documented nor told us about?
>
> In any case, your most efficient index is on (b, c).
>
> The order by is useless.
>
>> -Original Message-
>> From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org]
>> On Behalf Of Hamish Allan
>> Sent: Sunday, 2 April, 2017 17:28
>> To: sqlite-users@mailinglists.sqlite.org
>> Subject: [sqlite] Indexing WHERE with GROUP BY and ORDER BY
>>
>> Given a table:
>>
>> CREATE TABLE x (a INT, b INT, c TEXT, d TEXT);
>>
>> the query:
>>
>> SELECT d FROM x WHERE b = 1 GROUP BY c ORDER BY a;
>>
>> shows the following plan, without indexes:
>>
>> 0|0|0|SCAN TABLE x
>> 0|0|0|USE TEMP B-TREE FOR GROUP BY
>> 0|0|0|USE TEMP B-TREE FOR ORDER BY
>>
>> I can create an index to cover the WHERE clause:
>>
>> CREATE INDEX b_index ON x (b);
>>
>> which gives the plan:
>>
>> 0|0|0|SEARCH TABLE x USING INDEX b_index (b=?)
>> 0|0|0|USE TEMP B-TREE FOR GROUP BY
>> 0|0|0|USE TEMP B-TREE FOR ORDER BY
>>
>> or I can create an index to cover the GROUP BY clause:
>>
>> DROP INDEX b_index;
>> CREATE INDEX c_index ON x (c);
>>
>> which gives the plan:
>>
>> 0|0|0|SCAN TABLE x USING INDEX c_index
>> 0|0|0|USE TEMP B-TREE FOR ORDER BY
>>
>> but I don't seem to be able to create a combined index to cover both
>> the WHERE and GROUP BY clauses (let alone the ORDER BY clause).
>>
>> Am I missing something, or is this just not possible?
>>
>> If it's not possible, which is the more efficient of those indexes  --
>> or is there a third way, using an index for the ORDER BY, which would
>> be more efficient still?
>>
>> Thanks,
>> Hamish
>> ___
>> sqlite-users mailing list
>> sqlite-users@mailinglists.sqlite.org
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
>
>
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Indexing WHERE with GROUP BY and ORDER BY

2017-04-02 Thread Simon Slavin

On 3 Apr 2017, at 12:27am, Hamish Allan  wrote:

> SELECT d FROM x WHERE b = 1 GROUP BY c ORDER BY a;

Your problem comes down to this:

If you are GROUPing BY c, why do you want ORDER BY a ?

If you remove the "ORDER BY a" clause then the ideal index would be on (b, c).  
But to deal with the ORDER BY we need to know whether "a" is a subgroup of "c".

My guess is that (b, c, a) would be the best index.  But it depends on whether 
a is a subgroup of b or c.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Indexing WHERE with GROUP BY and ORDER BY

2017-04-02 Thread Keith Medcalf

What is the purpose of the ORDER BY?  

The value of a used for the order by is from some random row in the grouping of 
c.  Are there relationships between a, b, c, d that you have not documented nor 
told us about?

In any case, your most efficient index is on (b, c).  

The order by is useless.

> -Original Message-
> From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org]
> On Behalf Of Hamish Allan
> Sent: Sunday, 2 April, 2017 17:28
> To: sqlite-users@mailinglists.sqlite.org
> Subject: [sqlite] Indexing WHERE with GROUP BY and ORDER BY
> 
> Given a table:
> 
> CREATE TABLE x (a INT, b INT, c TEXT, d TEXT);
> 
> the query:
> 
> SELECT d FROM x WHERE b = 1 GROUP BY c ORDER BY a;
> 
> shows the following plan, without indexes:
> 
> 0|0|0|SCAN TABLE x
> 0|0|0|USE TEMP B-TREE FOR GROUP BY
> 0|0|0|USE TEMP B-TREE FOR ORDER BY
> 
> I can create an index to cover the WHERE clause:
> 
> CREATE INDEX b_index ON x (b);
> 
> which gives the plan:
> 
> 0|0|0|SEARCH TABLE x USING INDEX b_index (b=?)
> 0|0|0|USE TEMP B-TREE FOR GROUP BY
> 0|0|0|USE TEMP B-TREE FOR ORDER BY
> 
> or I can create an index to cover the GROUP BY clause:
> 
> DROP INDEX b_index;
> CREATE INDEX c_index ON x (c);
> 
> which gives the plan:
> 
> 0|0|0|SCAN TABLE x USING INDEX c_index
> 0|0|0|USE TEMP B-TREE FOR ORDER BY
> 
> but I don't seem to be able to create a combined index to cover both
> the WHERE and GROUP BY clauses (let alone the ORDER BY clause).
> 
> Am I missing something, or is this just not possible?
> 
> If it's not possible, which is the more efficient of those indexes  --
> or is there a third way, using an index for the ORDER BY, which would
> be more efficient still?
> 
> Thanks,
> Hamish
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Indexing WHERE with GROUP BY and ORDER BY

2017-04-02 Thread Jeffrey Mattox
I had nearly the same question a month ago (Subject: Index usefulness for GROUP 
BY).  In my case, the best index was on the WHERE clause because it eliminated 
the scan and returned only the few important rows for the other clauses.  
However, the best result will depend on how many rows are eliminated compared 
to the number of rows remaining for the GROUP BY to deal with.

Jeff


> On Apr 2, 2017, at 6:27 PM, Hamish Allan  wrote:
> 
> Given a table:
> 
> CREATE TABLE x (a INT, b INT, c TEXT, d TEXT);
> 
> the query:
> 
> SELECT d FROM x WHERE b = 1 GROUP BY c ORDER BY a;
> 
> shows the following plan, without indexes:
> 
> 0|0|0|SCAN TABLE x
> 0|0|0|USE TEMP B-TREE FOR GROUP BY
> 0|0|0|USE TEMP B-TREE FOR ORDER BY
> 
> I can create an index to cover the WHERE clause:
> 
> CREATE INDEX b_index ON x (b);
> 
> which gives the plan:
> 
> 0|0|0|SEARCH TABLE x USING INDEX b_index (b=?)
> 0|0|0|USE TEMP B-TREE FOR GROUP BY
> 0|0|0|USE TEMP B-TREE FOR ORDER BY
> 
> or I can create an index to cover the GROUP BY clause:
> 
> DROP INDEX b_index;
> CREATE INDEX c_index ON x (c);
> 
> which gives the plan:
> 
> 0|0|0|SCAN TABLE x USING INDEX c_index
> 0|0|0|USE TEMP B-TREE FOR ORDER BY
> 
> but I don't seem to be able to create a combined index to cover both
> the WHERE and GROUP BY clauses (let alone the ORDER BY clause).
> 
> Am I missing something, or is this just not possible?
> 
> If it's not possible, which is the more efficient of those indexes  --
> or is there a third way, using an index for the ORDER BY, which would
> be more efficient still?
> 
> Thanks,
> Hamish
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Indexing WHERE with GROUP BY and ORDER BY

2017-04-02 Thread Hamish Allan
Given a table:

CREATE TABLE x (a INT, b INT, c TEXT, d TEXT);

the query:

SELECT d FROM x WHERE b = 1 GROUP BY c ORDER BY a;

shows the following plan, without indexes:

0|0|0|SCAN TABLE x
0|0|0|USE TEMP B-TREE FOR GROUP BY
0|0|0|USE TEMP B-TREE FOR ORDER BY

I can create an index to cover the WHERE clause:

CREATE INDEX b_index ON x (b);

which gives the plan:

0|0|0|SEARCH TABLE x USING INDEX b_index (b=?)
0|0|0|USE TEMP B-TREE FOR GROUP BY
0|0|0|USE TEMP B-TREE FOR ORDER BY

or I can create an index to cover the GROUP BY clause:

DROP INDEX b_index;
CREATE INDEX c_index ON x (c);

which gives the plan:

0|0|0|SCAN TABLE x USING INDEX c_index
0|0|0|USE TEMP B-TREE FOR ORDER BY

but I don't seem to be able to create a combined index to cover both
the WHERE and GROUP BY clauses (let alone the ORDER BY clause).

Am I missing something, or is this just not possible?

If it's not possible, which is the more efficient of those indexes  --
or is there a third way, using an index for the ORDER BY, which would
be more efficient still?

Thanks,
Hamish
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users