Re: [sqlite] BUG: Aggregate functions in subqueries

2014-09-14 Thread Richard Hipp
On Sun, Sep 14, 2014 at 11:47 AM, Keith Medcalf  wrote:

>
> Yes, the max(y) operation is flattened out of the query probably because
> it is not referenced anywhere and the optimizer does not see that it is
> performing any useful function.  There is still only one row returned,
> however, because the same accumulator is used to collect the result, it
> just operates on all rows instead of only those which set the max() return
> value.
>

An accurate and succinct summary of the issue.  The question now: what to
do about it.  I'll save that question for a workday...

-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] presentation about ordering and atomicity of filesystems

2014-09-14 Thread James K. Lowden
On Sat, 13 Sep 2014 15:43:00 -0400
Richard Hipp  wrote:

> There were often restrictions on the permitted values for block
> sizes.  And you couldn't ask the operating system to tell you whether
> a file was text or binary or sequential or random-access or what its
> block-size was;  you just had to know.  

That brings to mind Faulkner's observation: 

"The past is never dead. It's not even past."  

VSAM has the properties you describe, and is the successor to ISAM,
which was the state of art when Unix was being invented.  VSAM is still
alive and kicking, see VSAM Demystified:

http://www.redbooks.ibm.com/abstracts/sg246105.html?Open

Sample paragraph: 

"VSAM is used to organize records into four types of data sets:
Key-sequenced (KSDS), entry-sequenced (ESDS), linear (LDS), and
relative record (RRDS and VRRDS). The primary difference between the
types of VSAM data sets is the way that their records are stored and
accessed."

Whatever complaints one might have with Posix, ten minutes with that
document will reveal the blessings of mmap and fsync!  

--jkl
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] The IN (... ) clause

2014-09-14 Thread jose isaias cabrera


"Richard Hipp" wrote...
On Sun, Sep 14, 2014 at 4:31 PM, Simon Slavin  
wrote:




On 14 Sep 2014, at 8:58pm, jose isaias cabrera 
wrote:

> Yeah, that is what I am using now.  I was trying to get the speed that
supposedly is in the IN clause. :-)  Thanks.

BETWEEN is fast[er] than INf



Confirmed.  Just to verify this, I ran the following script which times 
the

same query using both IN and BETWEEN:

.open /home/drh/www/repos/tcl.fossil
CREATE TEMP TABLE t1(id INTEGER PRIMARY KEY, date TEXT);
INSERT INTO t1 SELECT objid, date(mtime) FROM event;
SELECT count(*) FROM t1;
.timer on
SELECT count(*) FROM t1
WHERE date BETWEEN '2014-01-01' AND '2014-01-05';
WITH RECURSIVE
 range(date) AS (SELECT '2014-01-01'
 UNION ALL
 SELECT date(date,'+1 day') FROM range
  WHERE date<'2014-01-05')
SELECT count(*) FROM t1
WHERE date in range;

The t1 table above has 61856 dates in it, only 45 of which are in the
selected range.  Using BETWEEN is about twice the speed of using IN.

That makes sense.  The IN operator (usually) builds a transient table with
all of the values of the right-hand side, then does a membership test for
row of the input table.  The building of the transient table is a one-time
setup action so we can ignore the performance impact of that.  But then it
has to do a binary search of the transient table for each of the 61856 
rows

of the input table.  A binary search of the 5-entry "range" table involves
about 3 comparisons, on average, whereas the BETWEEN operator involves
about 1.5 comparisons per row, on average (assuming the first comparison 
is

false about half the time).

Note that the to queries are not exactly equivalent though.  With the IN 
in

the WHERE clause, a date of '2014-01-03-extra-text' would be excluded
whereas it would be included using the BETWEEN operator.  So one way to
look at it is that the IN operator is being more selective and hence
requires additional comparisons.


Thanks, Dr. Hipp.  This is perfect.  If I may, this should probably be part 
of the documentation, as I am always looking for speed and this confirms the 
result.


josé 


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] The IN (... ) clause

2014-09-14 Thread Richard Hipp
On Sun, Sep 14, 2014 at 4:31 PM, Simon Slavin  wrote:

>
> On 14 Sep 2014, at 8:58pm, jose isaias cabrera 
> wrote:
>
> > Yeah, that is what I am using now.  I was trying to get the speed that
> supposedly is in the IN clause. :-)  Thanks.
>
> BETWEEN is fast[er] than INf
>

Confirmed.  Just to verify this, I ran the following script which times the
same query using both IN and BETWEEN:

.open /home/drh/www/repos/tcl.fossil
CREATE TEMP TABLE t1(id INTEGER PRIMARY KEY, date TEXT);
INSERT INTO t1 SELECT objid, date(mtime) FROM event;
SELECT count(*) FROM t1;
.timer on
SELECT count(*) FROM t1
 WHERE date BETWEEN '2014-01-01' AND '2014-01-05';
WITH RECURSIVE
  range(date) AS (SELECT '2014-01-01'
  UNION ALL
  SELECT date(date,'+1 day') FROM range
   WHERE date<'2014-01-05')
SELECT count(*) FROM t1
 WHERE date in range;

The t1 table above has 61856 dates in it, only 45 of which are in the
selected range.  Using BETWEEN is about twice the speed of using IN.

That makes sense.  The IN operator (usually) builds a transient table with
all of the values of the right-hand side, then does a membership test for
row of the input table.  The building of the transient table is a one-time
setup action so we can ignore the performance impact of that.  But then it
has to do a binary search of the transient table for each of the 61856 rows
of the input table.  A binary search of the 5-entry "range" table involves
about 3 comparisons, on average, whereas the BETWEEN operator involves
about 1.5 comparisons per row, on average (assuming the first comparison is
false about half the time).

Note that the to queries are not exactly equivalent though.  With the IN in
the WHERE clause, a date of '2014-01-03-extra-text' would be excluded
whereas it would be included using the BETWEEN operator.  So one way to
look at it is that the IN operator is being more selective and hence
requires additional comparisons.
-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] The IN (... ) clause

2014-09-14 Thread Simon Slavin

On 14 Sep 2014, at 8:58pm, jose isaias cabrera  wrote:

> "Darren Duncan" wrote...
>> 
>> 
>>  BETWEEN '2014-01-01' AND '2014-01-05'
> 
> Yeah, that is what I am using now.  I was trying to get the speed that 
> supposedly is in the IN clause. :-)  Thanks.

BETWEEN is fast than IN for any sensible number of IN arguments.  BETWEEN gets 
directly translated to like so

A BETWEEN B AND C   ==> A >= B AND A <= C

IN has to check whether A matches any one of a number of arguments.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] The IN (... ) clause

2014-09-14 Thread jose isaias cabrera


"Martin Engelschalk" wrote...


Hi Jose,

you are probably looking for the between-Operator: Open 
http://www.sqlite.org/lang_expr.html and search for "The BETWEEN operator"


inn you case, date BETWEEN '2014-01-01' AND '2014-01-05'

Martin


Thanks, Martin.  Yes, that is what I am using now... :-)

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] The IN (... ) clause

2014-09-14 Thread jose isaias cabrera


"Simon Slavin" wrote...


On 14 Sep 2014, at 11:59am, Martin Engelschalk  
wrote:


you are probably looking for the between-Operator: Open 
http://www.sqlite.org/lang_expr.html and search for "The BETWEEN 
operator"


inn you case, date BETWEEN '2014-01-01' AND '2014-01-05'


By the way, kudos for storing your dates in a format which is subject to 
ASCII sorting like that.  Assuming that '2014-01-05' means the fifth of 
January and not the first of May.


Thanks.  Yes, I have always done that, even in filenames with dates in them. 
But, am pretty sure that I read this in the sqlite site somewhere about 
storing dates.


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] The IN (... ) clause

2014-09-14 Thread jose isaias cabrera


"Darren Duncan" wrote...

On 2014-09-13, 10:07 PM, jose isaias cabrera wrote:

I know that the IN clause contains a list of something. I.e.

IN ('2014-01-01', '2014-01-02', '2014-01-03', '2014-01-04', '2014-01-05')

So the question is, is there a shorter way for one to say something like,

IN ('2014-01-01', ..., '2014-01-05')

where the content of the IN would have the first item and the last item 
of the

list, but that's it?  Thanks.


You're talking about a range/interval.

In SQL it is spelled like this:

  BETWEEN '2014-01-01' AND '2014-01-05'


Yeah, that is what I am using now.  I was trying to get the speed that 
supposedly is in the IN clause. :-)  Thanks.


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] BUG: Aggregate functions in subqueries

2014-09-14 Thread Keith Medcalf

Yes, the max(y) operation is flattened out of the query probably because it is 
not referenced anywhere and the optimizer does not see that it is performing 
any useful function.  There is still only one row returned, however, because 
the same accumulator is used to collect the result, it just operates on all 
rows instead of only those which set the max() return value.

>-Original Message-
>From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-
>boun...@sqlite.org] On Behalf Of Lea Verou
>Sent: Saturday, 13 September, 2014 22:19
>To: sqlite-users@sqlite.org
>Subject: [sqlite] BUG: Aggregate functions in subqueries
>
>Per the 3.7.11 changelog [1], queries of the form SELECT max(x), y FROM
>table return the value of y from the same row that contains the maximum x
>value. However, this:
>
>select y from (SELECT max(x), y FROM table);
>
>would not return the same y rows. This would work as expected:
>
>select m, y from (SELECT max(x) as m, y FROM table);
>
>[1]: http://www.sqlite.org/changes.html#version_3_7_11
>___
>sqlite-users mailing list
>sqlite-users@sqlite.org
>http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] The IN (... ) clause

2014-09-14 Thread Clemens Ladisch
jose isaias cabrera wrote:
> I know that the IN clause contains a list of something. I.e.
>
> IN ('2014-01-01', '2014-01-02', '2014-01-03', '2014-01-04', '2014-01-05')
>
> So the question is, is there a shorter way for one to say something like,
>
> IN ('2014-01-01', ..., '2014-01-05')
>
> where the content of the IN would have the first item and the last item of 
> the list, but that's it?

In this case, BETWEEN works just fine.

In the general case, you can create a series of values with a common table 
expression:

  WITH RECURSIVE dates(d) AS (
SELECT '2014-01-01'
UNION ALL
SELECT date(d, '+1 day')
FROM dates
WHERE d < '2014-01-05'
  )
  SELECT * FROM MyTable WHERE MyColumn IN dates;


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] BUG: Aggregate functions in subqueries

2014-09-14 Thread Zsbán Ambrus
On 9/14/14, Lea Verou  wrote:
> Per the 3.7.11 changelog [1], queries of the form SELECT max(x), y FROM
> table return the value of y from the same row that contains the maximum x
> value. However, this:

Hello!  I don't think this is a bug.  The documentation for the SELECT
statement at "http://sqlite.org/lang_select.html; says:

> If the SELECT statement is an aggregate query without a GROUP BY clause, then 
> [...] Each non-aggregate expression in the result-set is evaluated once for 
> an arbitrarily selected row of the dataset. The same arbitrarily selected row 
> is used for each non-aggregate expression.

This applies to your query, so y will return the value y from any row.
If it happens to return the value where x is maximal, that is an
accident, and might depend on what indexes you have or how sqlite
chooses to optimize your query.  You have no reason to trust sqlite to
choose any particular column.

If you would definitely like to get the value of y from the row where
x is maximal, I recommend a query like this:

  SELECT x, y FROM sometable ORDER BY x DESC LIMIT 1;

-- Ambrus
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] The IN (... ) clause

2014-09-14 Thread Simon Slavin

On 14 Sep 2014, at 11:59am, Martin Engelschalk  
wrote:

> you are probably looking for the between-Operator: Open 
> http://www.sqlite.org/lang_expr.html and search for "The BETWEEN operator"
> 
> inn you case, date BETWEEN '2014-01-01' AND '2014-01-05'

By the way, kudos for storing your dates in a format which is subject to ASCII 
sorting like that.  Assuming that '2014-01-05' means the fifth of January and 
not the first of May.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] BUG: Aggregate functions in subqueries

2014-09-14 Thread Kees Nuyt
On Sun, 14 Sep 2014 00:18:34 -0400, Lea Verou 
wrote:

> Per the 3.7.11 changelog [1], queries of the form 
> SELECT max(x), y FROM table return the value of y
> from the same row that contains the maximum x value.
> However, this:
> select y from (SELECT max(x), y FROM table);
> would not return the same y rows. This would work as expected:
> select m, y from (SELECT max(x) as m, y FROM table);
>
> [1]: http://www.sqlite.org/changes.html#version_3_7_11

Confirmed.
However, SELECT max(x), y FROM t1 is not proper SQL, as a
non-aggregate expression (in this case column y) is used that
doesn't appear in a GROUP BY clause. SQLite allows it, and in
the simplest of statements it returns a value of y from one of
the rows that matches x=max(x), as promised.
Apparently, in SELECT y FROM (SELECT max(x), y FROM t1); it is
optimized out somehow.

Workaround:
SELECT y FROM t1 WHERE x=(SELECT max(x) FROM t1);
which may return multiple rows, so you'd have to use 
LIMIT 1 or max(y) or min(y), whichever is most appropriate for
your use case.

I agree that the result 
is quite unexpected.

Test script and results on 
3.8.7 2014-09-06 17:06:13
ad7063aa1a0db32cdbe71815545b2edca57d3bcc and 
3.8.7 2014-09-12 20:30:59
b332a84d5154f70f3197537df4af243eaebbb011:

CREATE TABLE t1 (x INTEGER, y INTEGER PRIMARY KEY);
INSERT INTO t1 (x,y) VALUES (1,1);
INSERT INTO t1 (x,y) VALUES (2,2);
INSERT INTO t1 (x,y) VALUES (3,3);
INSERT INTO t1 (x,y) VALUES (3,4);
INSERT INTO t1 (x,y) VALUES (2,5);
SELECT * FROM t1;
1   1
2   2
3   3
3   4
2   5

SELECT max(x), y FROM t1;
3   3

SELECT y FROM (SELECT max(x), y FROM t1);
5

SELECT m, y FROM (SELECT max(x) as m, y FROM t1);
3   3

SELECT y FROM t1 WHERE x=(SELECT max(x) FROM t1);
3
4

-- 
Groet,

Kees Nuyt

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] The IN (... ) clause

2014-09-14 Thread Martin Engelschalk

Hi Jose,

you are probably looking for the between-Operator: Open 
http://www.sqlite.org/lang_expr.html and search for "The BETWEEN operator"


inn you case, date BETWEEN '2014-01-01' AND '2014-01-05'

Martin


Am 14.09.2014 07:07, schrieb jose isaias cabrera:


Greetings!

I know that the IN clause contains a list of something. I.e.

IN ('2014-01-01', '2014-01-02', '2014-01-03', '2014-01-04', '2014-01-05')

So the question is, is there a shorter way for one to say something like,

IN ('2014-01-01', ..., '2014-01-05')

where the content of the IN would have the first item and the last 
item of the list, but that's it?  Thanks.


josé
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


--

*Codeswift GmbH *
Kräutlerweg 20a
A-5020 Salzburg
Tel: +49 (0) 8662 / 494330
Mob: +49 (0) 171 / 4487687
Fax: +49 (0) 3212 / 1001404
engelsch...@codeswift.com
www.codeswift.com / www.swiftcash.at

Codeswift Professional IT Services GmbH
Firmenbuch-Nr. FN 202820s
UID-Nr. ATU 50576309

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] BUG: Aggregate functions in subqueries

2014-09-14 Thread Lea Verou
Per the 3.7.11 changelog [1], queries of the form SELECT max(x), y FROM table 
return the value of y from the same row that contains the maximum x value. 
However, this:

select y from (SELECT max(x), y FROM table);

would not return the same y rows. This would work as expected:

select m, y from (SELECT max(x) as m, y FROM table);

[1]: http://www.sqlite.org/changes.html#version_3_7_11
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] The IN (... ) clause

2014-09-14 Thread Darren Duncan

On 2014-09-13, 10:07 PM, jose isaias cabrera wrote:

I know that the IN clause contains a list of something. I.e.

IN ('2014-01-01', '2014-01-02', '2014-01-03', '2014-01-04', '2014-01-05')

So the question is, is there a shorter way for one to say something like,

IN ('2014-01-01', ..., '2014-01-05')

where the content of the IN would have the first item and the last item of the
list, but that's it?  Thanks.


You're talking about a range/interval.

In SQL it is spelled like this:

  BETWEEN '2014-01-01' AND '2014-01-05'

-- Darren Duncan


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users