[sqlite] order by not working in combination with random()

James K. Lowden Wed, 26 Aug 2015 21:03:21 -0400

On Wed, 26 Aug 2015 13:39:09 +0100
Simon Slavin <slavins at bigfraud.org> wrote:

> 
> On 26 Aug 2015, at 11:27am, tonyp at acm.org wrote:
> 
> > Plus, it apparently recognizes if the random() expression in the
> > ORDER BY is the same as the SELECT one and again sort correctly
> > (without re-evaluating) and without needing an alias.
> 
> Ah, but I would call /that/ a bug !

I think I can convince you Postgres is right and SQLite is not.  :-)
I'll give you 4 reasons:

1.  Theory
2.  Authority
3.  Atomicity
4.  Consistency

Theory.  Like the relation of relational algebra, a SQL table has no
order. (Not only tables, but any table-like thing: view, CTE,
subquery.)  If you take a table and permute its order, it's still the
same table. ORDER BY permutes the order in a particular way.  It does
not change the table.  Relationally, the input and output of ORDER BY
must be identical.  

Authority.  As a matter of fact, CJ Date says the output of ORDER BY is
*not* a table, but a cursor.  That makes sense, and also explains why
ORDER BY can, according to the SQL standard, appear only once in a
query, and only at the end.  You could think of ORDER BY not as a sort
operator but as a row-returning operator that magically returns the
"next" row per a specification.  

The SQL-92 standard distinqushes between the SELECT statement with
all its many clauses, and ORDER BY:

        """
         Format

         <direct select statement: multiple rows> ::=
              <query expression> [ <order by clause> ]
        """

SQLite would do well to adopt this notion in its query processor.  It
would simplify the system, and would have avoided this error.  

Atomicity.  The rule in SQL is that all functions act *as if* they were
invoked only once.  We had this confusion some time back with 
"date(now)".  There's only one "now" in a query, and there's only one
random() in a row-column.  

The column-specification of SELECT is applied *once* to each row.  The
column-specification of ORDER BY (and GROUP BY) is *not* applied; it
merely *designates* the column by using the same specification text.
Consider this example:

        sqlite> select 2+2 order by '2+2';
        2+2       
        ----------
        4         

'2+2' here is only the column name.  It's not an instruction to create
data.  

Consistency.  Any two semantically equivalent queries that produce
different outputs from the same input constitute prima facia evidence
of a bug.  However you restate a query, if it is logically equivalent,
it should produce the same results.  The fact that you can rejigger
queries with RANDOM() and get different answers tells you immediately
that something is wrong.  

It's sometimes useful to think about how a DBMS works, and imagine
traversing trees, sorting, etc.  It helps in understanding how to
express a query, and in how it will be processed.  But you have to
throw all that away when thinking about query semantics.  The user has
*one* chance only to state the query; he has no control over how its
executed.  As he is constrained, so is the system: it cannot let
implementation details affect query semantics.  That's the way it is
and should be, because it best serves the interests of both parties.  

--jkl

[sqlite] order by not working in combination with random()

Reply via email to