On Wed, 26 Aug 2015 13:39:09 +0100 Simon Slavin <slavins at bigfraud.org> wrote:
> > On 26 Aug 2015, at 11:27am, tonyp at acm.org wrote: > > > Plus, it apparently recognizes if the random() expression in the > > ORDER BY is the same as the SELECT one and again sort correctly > > (without re-evaluating) and without needing an alias. > > Ah, but I would call /that/ a bug ! I think I can convince you Postgres is right and SQLite is not. :-) I'll give you 4 reasons: 1. Theory 2. Authority 3. Atomicity 4. Consistency Theory. Like the relation of relational algebra, a SQL table has no order. (Not only tables, but any table-like thing: view, CTE, subquery.) If you take a table and permute its order, it's still the same table. ORDER BY permutes the order in a particular way. It does not change the table. Relationally, the input and output of ORDER BY must be identical. Authority. As a matter of fact, CJ Date says the output of ORDER BY is *not* a table, but a cursor. That makes sense, and also explains why ORDER BY can, according to the SQL standard, appear only once in a query, and only at the end. You could think of ORDER BY not as a sort operator but as a row-returning operator that magically returns the "next" row per a specification. The SQL-92 standard distinqushes between the SELECT statement with all its many clauses, and ORDER BY: """ Format <direct select statement: multiple rows> ::= <query expression> [ <order by clause> ] """ SQLite would do well to adopt this notion in its query processor. It would simplify the system, and would have avoided this error. Atomicity. The rule in SQL is that all functions act *as if* they were invoked only once. We had this confusion some time back with "date(now)". There's only one "now" in a query, and there's only one random() in a row-column. The column-specification of SELECT is applied *once* to each row. The column-specification of ORDER BY (and GROUP BY) is *not* applied; it merely *designates* the column by using the same specification text. Consider this example: sqlite> select 2+2 order by '2+2'; 2+2 ---------- 4 '2+2' here is only the column name. It's not an instruction to create data. Consistency. Any two semantically equivalent queries that produce different outputs from the same input constitute prima facia evidence of a bug. However you restate a query, if it is logically equivalent, it should produce the same results. The fact that you can rejigger queries with RANDOM() and get different answers tells you immediately that something is wrong. It's sometimes useful to think about how a DBMS works, and imagine traversing trees, sorting, etc. It helps in understanding how to express a query, and in how it will be processed. But you have to throw all that away when thinking about query semantics. The user has *one* chance only to state the query; he has no control over how its executed. As he is constrained, so is the system: it cannot let implementation details affect query semantics. That's the way it is and should be, because it best serves the interests of both parties. --jkl