On Thu, Aug 27, 2015 at 10:06 AM, Jean-Christophe Deschamps < jcd at antichoc.net> wrote:
> At 16:00 27/08/2015, you wrote: > >> <quote> >> An *ORDER BY* clause in SQL <https://en.wikipedia.org/wiki/SQL> specifies >> that a SQL SELECT <https://en.wikipedia.org/wiki/Select_(SQL)> statement >> returns a result set <https://en.wikipedia.org/wiki/Result_set> with the >> rows being sorted by the values of one or more columns. The sort criteria >> do not have to be included in the result set. The sort criteria can be >> expressions, including ? but not limited to ? column names, user-defined >> functions <https://en.wikipedia.org/wiki/User_defined_function>, >> arithmetic >> operations, or CASE expressions. The expressions are evaluated and the >> results are used for the sorting, i.e. the values stored in the column or >> the results of the function call. >> <quote> >> > > Pretty clear until now. > > So, applying this to our subject, "rr" is _NOT_ the name a column in the >> table. >> > > Yes: it's a temporary schema name. > I can see both sides of this debate, whether or not random() should be evaluated twice in this context: select random() from blah order by random() So let me pose a question. What should the following query generate? select random(), random() from blah order by random() Should all three random values be the same? Certainly I can see a case for that, but it doesn't seem useful to me. What would be even worse is if it behaved like other functions which maintain their value over an entire select statement as DRH indicated recently. Do we really want a single random number used three times per row for thousands of rows? I doubt it. Perhaps you think it should create two random values for the columns but reuse one of them for the order by. But which? Sure, if the columns were aliased with names one might expect it to pick the right one, but it seems that is not a requirement of at least one interpretation of the standard. Ultimately it is a question of "what should happen with a function that has side effects"? SQLite has a flag for user defined functions named SQLITE_DETERMINISTIC. This flag allows the author of the function to inform SQLite that the function will always return a single result for a given set of input. I can only assume that the "native" functions of SQLite use something very similar to this interface (where "using that exact interface" is as similar as possible). Clearly random() cannot be deterministic as it takes no user supplied input (even indirectly from a separate seed function) yet is expected to (probably) return different numbers from each invocation that cannot be predicted in advance. It seems that if someone needs a list of random values that can subsequently be sorted (or selected or whatever) they need to resort to temp tables or subqueries. One step to generate the random values, another to process the now fixed set of random numbers. Alternatively, we need a third class of functions. NON_DETERMINISTIC, DETERMINISTIC, and PSEUDO_DETERMINISTIC. I don't care for that solution. Realizing this is not a democracy, my vote if cast would be to keep the existing functionality. It is not difficult to work around the non-deterministic nature of the random function via temp tables or subqueries. I dare say it is an extremely tiny subset of queries that will ever have to worry about this. Update the documentation in some way to make it clear random behaves this way (and perhaps that column aliases in an order by clause re-evaluate their definitions from the select). But my vote doesn't mean much, so whatever. Maybe the best answer to this issue is "to change the functionality at this point may break existing queries that depend on the current behavior" and thus it should not be changed. -- Scott Robison