[sqlite] Reporting documentation issues/unclarity?

2020-01-21 Thread Merijn Verstraaten
Should issues with the documentation (i.e., missing/unclear things) be reported 
to this mailing list too?

Specifically, something that was unclear to me while implementing my own 
aggregate function is what happens if sqlite3_result_error() is called and 
another result functions gets called afterwards. So, suppose we have:

void stepfun(sqlite3_context *ctxt, int nArgs, sqlite3_value **args)
{
... random code here...
sqlite3_result_error(ctxt, "Something went bad!", -1);
...more random code...
sqlite3_result_int(ctxt, 42);
return;
}

Would the overall function still report an error or would the call to 
sqlite3_result_int overwrite the earlier error and have it return successfully 
again?

Cheers,
Merijn


signature.asc
Description: Message signed with OpenPGP
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Optimising query with aggregate in subselect.

2019-11-20 Thread Merijn Verstraaten


> On 20 Nov 2019, at 20:37, Andy Bennett  wrote:
> 
> Hi,
> 
>> Did you try retrieving the data "directly" or do you need the subselect in 
>> order to maintain compatibility with other SQL dialects that are no longer 
>> able to retrieve data from the row on which the max was found?
> 
> Thanks Keith!
> 
> I understood that selecting other columns during an aggregate lead to 
> ill-specific or undefined values in those columns. Does SQLite make more 
> guarantees than the SQL standard here? Do you have a pointer to the docs as I 
> tried and failed to find it in there.

There's a small sidenote (that I'm too lazy too find right now) in the select 
docs that mentions that, in case of using min or max as aggregate, the 
non-aggregate columns will come from the row that held the min/max value.

Kind regards,
Merijn


signature.asc
Description: Message signed with OpenPGP
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Custom window functions vs builtin

2019-11-12 Thread Merijn Verstraaten


> On 12 Nov 2019, at 14:58, Dan Kennedy  wrote:
> 
> No, they don't need to handle that. The rows will always be added/removed in 
> ORDER BY order for all window functions (and will be removed in the same 
> order in which they were added if the ORDER BY order is ambiguous).

Ah, excellent! This dramatically simplifies my life :)

Could I suggest adding a note to this extent to 
https://www.sqlite.org/windowfunctions.html#udfwinfunc for the next paranoid 
person?

Kind regards,
Merijn
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Custom window functions vs builtin

2019-11-12 Thread Merijn Verstraaten
I already asked this question without an answer, but as it hidden somewhere 
nested deeply in another thread I think it may have simply gone under the 
radar. Apologies for the duplication if you already saw it!

How is the behaviour of (custom) window functions defined? Specifically, in the 
presence of an ORDER BY on the window.

The functionality of row_number/rank/dense_rank seems to require that xStep and 
xInverse are called on rows in the order specified by ORDER BY. And, indeed, 
the implementation of row_number() in the sqlite source seems to rely on being 
called in the same order as ORDER BY, but at the same time the documentation 
states:

"the built-in window functions, however, require special-case handling in the 
query planner and hence new window functions that exhibit the exceptional 
properties found in the built-in window functions cannot be added by the 
application."

So does this indeed mean that these builtin ones are handled specially and 
other windows functions have to accept/deal with having their window arguments 
added/removed in an arbitrary order?

Kind regards,
Merijn
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Deterministic random sampling via SELECT

2019-11-08 Thread Merijn Verstraaten
On 7 Nov 2019, at 20:47, Chris Peachment  wrote:
>  1. generate a list of pseudo-random numbers, using a pre-defined
> seed value, over the range 1 .. count(*) of records in table,
> 
>  2. use that list as record id values to select the desired subset
> of the data in the table.
> 
> This would be done in two separate operations, possibly with a
> storage of the generated numbers in a separate table which could
> be used in the query of the main data.

Yeah, this and some of the other ideas were some things I considered as 
fallback ideas if things weren't possible within the query, although it does 
complicate things a bit.

I actually just had another idea: How is the behaviour of window functions 
defined? i.e. if there is an ORDER BY clause are rows added/removed from the 
window in the order of the order by clause, or is the behaviour of 
row_number/rank/dense_rank special cased and only those functions guarantee the 
order?

Because if window functions do follow there own order by, you could easily 
recover a deterministic evaluation order via the window specification.

If not, I guess I'll have to give up and do it the "hard way".

- Merijn
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Deterministic random sampling via SELECT

2019-11-07 Thread Merijn Verstraaten

> On 7 Nov 2019, at 19:16, David Raymond  wrote:
> 
> Along those lines SQLite includes the reverse_unordered_selects pragma
> https://www.sqlite.org/pragma.html#pragma_reverse_unordered_selects
> which will flip the order it sends rows in queries that don't explicitly 
> specify an ordering. It's there to assist you in finding spots in your code 
> where you might be relying on implicit ordering when you really shouldn't be.

Like the rest of this threads, this is just pointing out why the things in my 
initial email don't work, but I already knew that. Which is why I asked for 
help to see if there is a way to do what I want that *does* work. I don't care 
particularly about the details of "can I control the order the condition is 
evaluated", it's just that all reasonable ways to sample large streams that I 
know would require a deterministic order.

If someone has a different/better idea on how to return just a random sample 
from a query in a repeatable way, I'm all ears.

So far the only suggestion was "use some non-deterministic random sampling 
method and store the result", but since my samples are large and I have lots of 
them, this would balloon my storage by >100x and I don't have the available 
storage to make that work.

- Merijn


signature.asc
Description: Message signed with OpenPGP
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Deterministic random sampling via SELECT

2019-11-07 Thread Merijn Verstraaten
I'm trying sample a (deterministically) random subset of a SELECT query, the 
most common solution on the internet to get random samples seems to be "SELECT 
* FROM (...) ORDER BY RANDOM() LIMIT n;" (this already has some question marks, 
since it relies on seeding RANDOM and knowing the RANDOM function is always 
evaluated in the same order every query run), but looking at the query plans 
this materialises the entire result set in memory for the sort (no surprise, I 
can't think of anyway that could work otherwise) which is rather undesirable if 
the sample size becomes large (i.e. several million rows).

Now, I already know different ways to implement a predicate function that can 
deterministically keep elements from a stream, however that relies on having a 
deterministic order for the stream. Which brings us to SQLite. I can easily 
write something like:

SELECT *
FROM (...)
WHERE my_predicate_fun()
ORDER BY column1, column2,...

And this *seems* to evaluate the where clause for each row in the order 
determined by ORDER BY, but this doesn't seem at all guaranteed by the SQL 
spec. So, is this behaviour documented/guaranteed somewhere? If not, is there 
some way to guarantee my where clause is evaluated for each row in a 
deterministic order?

In the simple case like above I could always just evaluate the query without 
the ORDER BY, step through the entire query, and evaluate the predicate in the 
application, but if I want to use this random selection as a subquery, then 
that doesn't work.

And while I'm asking questions: What if I want to do the above, but selecting 
groups of rows? So, sort of like:

SELECT *
FROM (...)
GROUP BY groupColumn
HAVING my_predicate_fun();

But where I want to return all rows in the group, rather than an aggregate.

Thanks in advance,
Merijn
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users