On Thu, Aug 27, 2015 at 10:06 AM, Jean-Christophe Deschamps <
jcd at antichoc.net> wrote:

> At 16:00 27/08/2015, you wrote:
>
>> <quote>
>> An *ORDER BY* clause in SQL <https://en.wikipedia.org/wiki/SQL> specifies
>> that a SQL SELECT <https://en.wikipedia.org/wiki/Select_(SQL)> statement
>> returns a result set <https://en.wikipedia.org/wiki/Result_set> with the
>> rows being sorted by the values of one or more columns. The sort criteria
>> do not have to be included in the result set. The sort criteria can be
>> expressions, including ? but not limited to ? column names, user-defined
>> functions <https://en.wikipedia.org/wiki/User_defined_function>,
>> arithmetic
>> operations, or CASE expressions. The expressions are evaluated and the
>> results are used for the sorting, i.e. the values stored in the column or
>> the results of the function call.
>> <quote>
>>
>
> Pretty clear until now.
>
> So, applying this to our subject, "rr" is _NOT_ the name a column in the
>> table.
>>
>
> Yes: it's a temporary schema name.
>

I can see both sides of this debate, whether or not random() should be
evaluated twice in this context:

select random() from blah order by random()

So let me pose a question. What should the following query generate?

select random(), random() from blah order by random()

Should all three random values be the same? Certainly I can see a case for
that, but it doesn't seem useful to me. What would be even worse is if it
behaved like other functions which maintain their value over an entire
select statement as DRH indicated recently. Do we really want a single
random number used three times per row for thousands of rows? I doubt it.

Perhaps you think it should create two random values for the columns but
reuse one of them for the order by. But which? Sure, if the columns were
aliased with names one might expect it to pick the right one, but it seems
that is not a requirement of at least one interpretation of the standard.

Ultimately it is a question of "what should happen with a function that has
side effects"? SQLite has a flag for user defined functions named
SQLITE_DETERMINISTIC. This flag allows the author of the function to inform
SQLite that the function will always return a single result for a given set
of input. I can only assume that the "native" functions of SQLite use
something very similar to this interface (where "using that exact
interface" is as similar as possible).

Clearly random() cannot be deterministic as it takes no user supplied input
(even indirectly from a separate seed function) yet is expected to
(probably) return different numbers from each invocation that cannot be
predicted in advance.

It seems that if someone needs a list of random values that can
subsequently be sorted (or selected or whatever) they need to resort to
temp tables or subqueries. One step to generate the random values, another
to process the now fixed set of random numbers.

Alternatively, we need a third class of functions. NON_DETERMINISTIC,
DETERMINISTIC, and PSEUDO_DETERMINISTIC. I don't care for that solution.

Realizing this is not a democracy, my vote if cast would be to keep the
existing functionality. It is not difficult to work around the
non-deterministic nature of the random function via temp tables or
subqueries. I dare say it is an extremely tiny subset of queries that will
ever have to worry about this. Update the documentation in some way to make
it clear random behaves this way (and perhaps that column aliases in an
order by clause re-evaluate their definitions from the select). But my vote
doesn't mean much, so whatever.

Maybe the best answer to this issue is "to change the functionality at this
point may break existing queries that depend on the current behavior" and
thus it should not be changed.

-- 
Scott Robison

Reply via email to