Re: [sqlite] BUG: Aggregate functions in subqueries
On Sun, Sep 14, 2014 at 11:47 AM, Keith Medcalfwrote: > > Yes, the max(y) operation is flattened out of the query probably because > it is not referenced anywhere and the optimizer does not see that it is > performing any useful function. There is still only one row returned, > however, because the same accumulator is used to collect the result, it > just operates on all rows instead of only those which set the max() return > value. > An accurate and succinct summary of the issue. The question now: what to do about it. I'll save that question for a workday... -- D. Richard Hipp d...@sqlite.org ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] presentation about ordering and atomicity of filesystems
On Sat, 13 Sep 2014 15:43:00 -0400 Richard Hippwrote: > There were often restrictions on the permitted values for block > sizes. And you couldn't ask the operating system to tell you whether > a file was text or binary or sequential or random-access or what its > block-size was; you just had to know. That brings to mind Faulkner's observation: "The past is never dead. It's not even past." VSAM has the properties you describe, and is the successor to ISAM, which was the state of art when Unix was being invented. VSAM is still alive and kicking, see VSAM Demystified: http://www.redbooks.ibm.com/abstracts/sg246105.html?Open Sample paragraph: "VSAM is used to organize records into four types of data sets: Key-sequenced (KSDS), entry-sequenced (ESDS), linear (LDS), and relative record (RRDS and VRRDS). The primary difference between the types of VSAM data sets is the way that their records are stored and accessed." Whatever complaints one might have with Posix, ten minutes with that document will reveal the blessings of mmap and fsync! --jkl ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] The IN (... ) clause
"Richard Hipp" wrote... On Sun, Sep 14, 2014 at 4:31 PM, Simon Slavinwrote: On 14 Sep 2014, at 8:58pm, jose isaias cabrera wrote: > Yeah, that is what I am using now. I was trying to get the speed that supposedly is in the IN clause. :-) Thanks. BETWEEN is fast[er] than INf Confirmed. Just to verify this, I ran the following script which times the same query using both IN and BETWEEN: .open /home/drh/www/repos/tcl.fossil CREATE TEMP TABLE t1(id INTEGER PRIMARY KEY, date TEXT); INSERT INTO t1 SELECT objid, date(mtime) FROM event; SELECT count(*) FROM t1; .timer on SELECT count(*) FROM t1 WHERE date BETWEEN '2014-01-01' AND '2014-01-05'; WITH RECURSIVE range(date) AS (SELECT '2014-01-01' UNION ALL SELECT date(date,'+1 day') FROM range WHERE date<'2014-01-05') SELECT count(*) FROM t1 WHERE date in range; The t1 table above has 61856 dates in it, only 45 of which are in the selected range. Using BETWEEN is about twice the speed of using IN. That makes sense. The IN operator (usually) builds a transient table with all of the values of the right-hand side, then does a membership test for row of the input table. The building of the transient table is a one-time setup action so we can ignore the performance impact of that. But then it has to do a binary search of the transient table for each of the 61856 rows of the input table. A binary search of the 5-entry "range" table involves about 3 comparisons, on average, whereas the BETWEEN operator involves about 1.5 comparisons per row, on average (assuming the first comparison is false about half the time). Note that the to queries are not exactly equivalent though. With the IN in the WHERE clause, a date of '2014-01-03-extra-text' would be excluded whereas it would be included using the BETWEEN operator. So one way to look at it is that the IN operator is being more selective and hence requires additional comparisons. Thanks, Dr. Hipp. This is perfect. If I may, this should probably be part of the documentation, as I am always looking for speed and this confirms the result. josé ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] The IN (... ) clause
On Sun, Sep 14, 2014 at 4:31 PM, Simon Slavinwrote: > > On 14 Sep 2014, at 8:58pm, jose isaias cabrera > wrote: > > > Yeah, that is what I am using now. I was trying to get the speed that > supposedly is in the IN clause. :-) Thanks. > > BETWEEN is fast[er] than INf > Confirmed. Just to verify this, I ran the following script which times the same query using both IN and BETWEEN: .open /home/drh/www/repos/tcl.fossil CREATE TEMP TABLE t1(id INTEGER PRIMARY KEY, date TEXT); INSERT INTO t1 SELECT objid, date(mtime) FROM event; SELECT count(*) FROM t1; .timer on SELECT count(*) FROM t1 WHERE date BETWEEN '2014-01-01' AND '2014-01-05'; WITH RECURSIVE range(date) AS (SELECT '2014-01-01' UNION ALL SELECT date(date,'+1 day') FROM range WHERE date<'2014-01-05') SELECT count(*) FROM t1 WHERE date in range; The t1 table above has 61856 dates in it, only 45 of which are in the selected range. Using BETWEEN is about twice the speed of using IN. That makes sense. The IN operator (usually) builds a transient table with all of the values of the right-hand side, then does a membership test for row of the input table. The building of the transient table is a one-time setup action so we can ignore the performance impact of that. But then it has to do a binary search of the transient table for each of the 61856 rows of the input table. A binary search of the 5-entry "range" table involves about 3 comparisons, on average, whereas the BETWEEN operator involves about 1.5 comparisons per row, on average (assuming the first comparison is false about half the time). Note that the to queries are not exactly equivalent though. With the IN in the WHERE clause, a date of '2014-01-03-extra-text' would be excluded whereas it would be included using the BETWEEN operator. So one way to look at it is that the IN operator is being more selective and hence requires additional comparisons. -- D. Richard Hipp d...@sqlite.org ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] The IN (... ) clause
On 14 Sep 2014, at 8:58pm, jose isaias cabrerawrote: > "Darren Duncan" wrote... >> >> >> BETWEEN '2014-01-01' AND '2014-01-05' > > Yeah, that is what I am using now. I was trying to get the speed that > supposedly is in the IN clause. :-) Thanks. BETWEEN is fast than IN for any sensible number of IN arguments. BETWEEN gets directly translated to like so A BETWEEN B AND C ==> A >= B AND A <= C IN has to check whether A matches any one of a number of arguments. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] The IN (... ) clause
"Martin Engelschalk" wrote... Hi Jose, you are probably looking for the between-Operator: Open http://www.sqlite.org/lang_expr.html and search for "The BETWEEN operator" inn you case, date BETWEEN '2014-01-01' AND '2014-01-05' Martin Thanks, Martin. Yes, that is what I am using now... :-) ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] The IN (... ) clause
"Simon Slavin" wrote... On 14 Sep 2014, at 11:59am, Martin Engelschalkwrote: you are probably looking for the between-Operator: Open http://www.sqlite.org/lang_expr.html and search for "The BETWEEN operator" inn you case, date BETWEEN '2014-01-01' AND '2014-01-05' By the way, kudos for storing your dates in a format which is subject to ASCII sorting like that. Assuming that '2014-01-05' means the fifth of January and not the first of May. Thanks. Yes, I have always done that, even in filenames with dates in them. But, am pretty sure that I read this in the sqlite site somewhere about storing dates. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] The IN (... ) clause
"Darren Duncan" wrote... On 2014-09-13, 10:07 PM, jose isaias cabrera wrote: I know that the IN clause contains a list of something. I.e. IN ('2014-01-01', '2014-01-02', '2014-01-03', '2014-01-04', '2014-01-05') So the question is, is there a shorter way for one to say something like, IN ('2014-01-01', ..., '2014-01-05') where the content of the IN would have the first item and the last item of the list, but that's it? Thanks. You're talking about a range/interval. In SQL it is spelled like this: BETWEEN '2014-01-01' AND '2014-01-05' Yeah, that is what I am using now. I was trying to get the speed that supposedly is in the IN clause. :-) Thanks. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] BUG: Aggregate functions in subqueries
Yes, the max(y) operation is flattened out of the query probably because it is not referenced anywhere and the optimizer does not see that it is performing any useful function. There is still only one row returned, however, because the same accumulator is used to collect the result, it just operates on all rows instead of only those which set the max() return value. >-Original Message- >From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users- >boun...@sqlite.org] On Behalf Of Lea Verou >Sent: Saturday, 13 September, 2014 22:19 >To: sqlite-users@sqlite.org >Subject: [sqlite] BUG: Aggregate functions in subqueries > >Per the 3.7.11 changelog [1], queries of the form SELECT max(x), y FROM >table return the value of y from the same row that contains the maximum x >value. However, this: > >select y from (SELECT max(x), y FROM table); > >would not return the same y rows. This would work as expected: > >select m, y from (SELECT max(x) as m, y FROM table); > >[1]: http://www.sqlite.org/changes.html#version_3_7_11 >___ >sqlite-users mailing list >sqlite-users@sqlite.org >http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] The IN (... ) clause
jose isaias cabrera wrote: > I know that the IN clause contains a list of something. I.e. > > IN ('2014-01-01', '2014-01-02', '2014-01-03', '2014-01-04', '2014-01-05') > > So the question is, is there a shorter way for one to say something like, > > IN ('2014-01-01', ..., '2014-01-05') > > where the content of the IN would have the first item and the last item of > the list, but that's it? In this case, BETWEEN works just fine. In the general case, you can create a series of values with a common table expression: WITH RECURSIVE dates(d) AS ( SELECT '2014-01-01' UNION ALL SELECT date(d, '+1 day') FROM dates WHERE d < '2014-01-05' ) SELECT * FROM MyTable WHERE MyColumn IN dates; Regards, Clemens ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] BUG: Aggregate functions in subqueries
On 9/14/14, Lea Verouwrote: > Per the 3.7.11 changelog [1], queries of the form SELECT max(x), y FROM > table return the value of y from the same row that contains the maximum x > value. However, this: Hello! I don't think this is a bug. The documentation for the SELECT statement at "http://sqlite.org/lang_select.html; says: > If the SELECT statement is an aggregate query without a GROUP BY clause, then > [...] Each non-aggregate expression in the result-set is evaluated once for > an arbitrarily selected row of the dataset. The same arbitrarily selected row > is used for each non-aggregate expression. This applies to your query, so y will return the value y from any row. If it happens to return the value where x is maximal, that is an accident, and might depend on what indexes you have or how sqlite chooses to optimize your query. You have no reason to trust sqlite to choose any particular column. If you would definitely like to get the value of y from the row where x is maximal, I recommend a query like this: SELECT x, y FROM sometable ORDER BY x DESC LIMIT 1; -- Ambrus ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] The IN (... ) clause
On 14 Sep 2014, at 11:59am, Martin Engelschalkwrote: > you are probably looking for the between-Operator: Open > http://www.sqlite.org/lang_expr.html and search for "The BETWEEN operator" > > inn you case, date BETWEEN '2014-01-01' AND '2014-01-05' By the way, kudos for storing your dates in a format which is subject to ASCII sorting like that. Assuming that '2014-01-05' means the fifth of January and not the first of May. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] BUG: Aggregate functions in subqueries
On Sun, 14 Sep 2014 00:18:34 -0400, Lea Verouwrote: > Per the 3.7.11 changelog [1], queries of the form > SELECT max(x), y FROM table return the value of y > from the same row that contains the maximum x value. > However, this: > select y from (SELECT max(x), y FROM table); > would not return the same y rows. This would work as expected: > select m, y from (SELECT max(x) as m, y FROM table); > > [1]: http://www.sqlite.org/changes.html#version_3_7_11 Confirmed. However, SELECT max(x), y FROM t1 is not proper SQL, as a non-aggregate expression (in this case column y) is used that doesn't appear in a GROUP BY clause. SQLite allows it, and in the simplest of statements it returns a value of y from one of the rows that matches x=max(x), as promised. Apparently, in SELECT y FROM (SELECT max(x), y FROM t1); it is optimized out somehow. Workaround: SELECT y FROM t1 WHERE x=(SELECT max(x) FROM t1); which may return multiple rows, so you'd have to use LIMIT 1 or max(y) or min(y), whichever is most appropriate for your use case. I agree that the result is quite unexpected. Test script and results on 3.8.7 2014-09-06 17:06:13 ad7063aa1a0db32cdbe71815545b2edca57d3bcc and 3.8.7 2014-09-12 20:30:59 b332a84d5154f70f3197537df4af243eaebbb011: CREATE TABLE t1 (x INTEGER, y INTEGER PRIMARY KEY); INSERT INTO t1 (x,y) VALUES (1,1); INSERT INTO t1 (x,y) VALUES (2,2); INSERT INTO t1 (x,y) VALUES (3,3); INSERT INTO t1 (x,y) VALUES (3,4); INSERT INTO t1 (x,y) VALUES (2,5); SELECT * FROM t1; 1 1 2 2 3 3 3 4 2 5 SELECT max(x), y FROM t1; 3 3 SELECT y FROM (SELECT max(x), y FROM t1); 5 SELECT m, y FROM (SELECT max(x) as m, y FROM t1); 3 3 SELECT y FROM t1 WHERE x=(SELECT max(x) FROM t1); 3 4 -- Groet, Kees Nuyt ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] The IN (... ) clause
Hi Jose, you are probably looking for the between-Operator: Open http://www.sqlite.org/lang_expr.html and search for "The BETWEEN operator" inn you case, date BETWEEN '2014-01-01' AND '2014-01-05' Martin Am 14.09.2014 07:07, schrieb jose isaias cabrera: Greetings! I know that the IN clause contains a list of something. I.e. IN ('2014-01-01', '2014-01-02', '2014-01-03', '2014-01-04', '2014-01-05') So the question is, is there a shorter way for one to say something like, IN ('2014-01-01', ..., '2014-01-05') where the content of the IN would have the first item and the last item of the list, but that's it? Thanks. josé ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users -- *Codeswift GmbH * Kräutlerweg 20a A-5020 Salzburg Tel: +49 (0) 8662 / 494330 Mob: +49 (0) 171 / 4487687 Fax: +49 (0) 3212 / 1001404 engelsch...@codeswift.com www.codeswift.com / www.swiftcash.at Codeswift Professional IT Services GmbH Firmenbuch-Nr. FN 202820s UID-Nr. ATU 50576309 ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] BUG: Aggregate functions in subqueries
Per the 3.7.11 changelog [1], queries of the form SELECT max(x), y FROM table return the value of y from the same row that contains the maximum x value. However, this: select y from (SELECT max(x), y FROM table); would not return the same y rows. This would work as expected: select m, y from (SELECT max(x) as m, y FROM table); [1]: http://www.sqlite.org/changes.html#version_3_7_11 ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] The IN (... ) clause
On 2014-09-13, 10:07 PM, jose isaias cabrera wrote: I know that the IN clause contains a list of something. I.e. IN ('2014-01-01', '2014-01-02', '2014-01-03', '2014-01-04', '2014-01-05') So the question is, is there a shorter way for one to say something like, IN ('2014-01-01', ..., '2014-01-05') where the content of the IN would have the first item and the last item of the list, but that's it? Thanks. You're talking about a range/interval. In SQL it is spelled like this: BETWEEN '2014-01-01' AND '2014-01-05' -- Darren Duncan ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users