Re: [sqlite] Common subexpression optimization of deterministic functions

2017-09-12 Thread raypoker79
Stop,,I'm not subscribed 


Sent via the Samsung Galaxy S7, an AT 4G LTE smartphone
 Original message From: Warren Young <war...@etr-usa.com> Date: 
9/12/17  6:19 PM  (GMT-05:00) To: SQLite mailing list 
<sqlite-users@mailinglists.sqlite.org> Subject: Re: [sqlite] Common 
subexpression optimization of deterministic
  functions 
On Sep 12, 2017, at 2:36 PM, Jens Alfke <j...@mooseyard.com> wrote:
> 
> On Sep 12, 2017, at 12:58 PM, Warren Young <war...@etr-usa.com> wrote:
>> 
>> Could it be *extended* to mean what you want?  Of course, but that means 
>> you’re asking for a feature, not reporting a bug.
> 
> I never claimed to be reporting a bug!

I read your reference to an older version as implying that you thought this was 
a regression in functionality.  That is, that you thought this change first 
appeared in 3.19.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Common subexpression optimization of deterministic functions

2017-09-12 Thread Warren Young
On Sep 12, 2017, at 2:36 PM, Jens Alfke  wrote:
> 
> On Sep 12, 2017, at 12:58 PM, Warren Young  wrote:
>> 
>> Could it be *extended* to mean what you want?  Of course, but that means 
>> you’re asking for a feature, not reporting a bug.
> 
> I never claimed to be reporting a bug!

I read your reference to an older version as implying that you thought this was 
a regression in functionality.  That is, that you thought this change first 
appeared in 3.19.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Common subexpression optimization of deterministic functions

2017-09-12 Thread Darko Volaric
Yeah that is a tricky bit, especially since the query optimizer might evaluate 
join expressions in an arbitrary order. A possible approach to this is to work 
out how to always get a particular expression evaluated first (that may well 
just be the left-most expression in the WHERE clause) then create a trivial 
function that takes any number of parameters and always returns true and pass 
all the Set functions as parameters in left to right dependency order. This 
should work becuase I believe function parameters are evaluated left to right 
before the function is called.


> On Sep 12, 2017, at 11:27 PM, Jens Alfke  wrote:
> 
> 
>> On Sep 12, 2017, at 1:41 PM, Darko Volaric > > wrote:
>> 
>> You can implement this by using user defined functions to implement row 
>> "local variables" or "registers". They're single assignment storage that 
>> keeps intermediate results, namely the common subexpressions.
> 
> Thanks! That's a very interesting technique.
> 
>> Note that you would need to order these so they are evaluated in dependency 
>> order, i.e. ensure each name is set before it is got.
> 
> What's a good way to do that? Since SQL is a non-imperative language there 
> isn't much notion of order of operations. I can imagine using a 
> short-circuiting AND operator, but can I be guaranteed that this will always 
> work?
> 
> —Jens

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Common subexpression optimization of deterministic functions

2017-09-12 Thread raypoker79
This is a scam please take me off this email the card has been reported stolen 


Sent via the Samsung Galaxy S7, an AT 4G LTE smartphone
 Original message From: Darko Volaric <li...@darko.org> Date: 
9/12/17  4:41 PM  (GMT-05:00) To: SQLite mailing list 
<sqlite-users@mailinglists.sqlite.org>, Jens Alfke <j...@mooseyard.com> 
Subject: Re: [sqlite] Common subexpression optimization of deterministic
functions 
You can implement this by using user defined functions to implement row "local 
variables" or "registers". They're single assignment storage that keeps 
intermediate results, namely the common subexpressions.

You'd define two functions, something like Get(rowid, name) and Set(rowid, 
name, value). You call Set with the subexpressions as the last parameter. It 
doesn't return any value and just stores the value. The Get function returns a 
previously set value with the given name and is used in the expressions where 
that subexpression would otherwise appear. Note that you would need to order 
these so they are evaluated in dependency order, i.e. ensure each name is set 
before it is got. The rowid parameter is used to detect when the row changes 
and the local variables are all cleared in readiness for the next row.



> On Sep 12, 2017, at 9:22 PM, Jens Alfke <j...@mooseyard.com> wrote:
> 
> 
> 
>> On Sep 12, 2017, at 12:02 PM, Darren Duncan <dar...@darrenduncan.net> wrote:
>> 
>> Practically speaking any optimization to reduce actual calls to the 
>> deterministic function would have to be at compile time to rewrite the query 
>> to explicitly keep the result of the function and use it several times,
> 
> Exactly.
> 
>> which is someone users can also do by writing the query differently.
> 
> Great — any advice on how to do it? I'm totally willing to do this :) but I'm 
> not sure how. As I said, a WITH clause looks promising, but I don't know if 
> that is purely syntactic sugar, like a macro. (And changing my query 
> generator to factor common calls into WITH clauses would be a nontrivial 
> amount of work, so I would like to get some assurance that it might help, 
> before I try it.)
> 
> The CSE optimization has long been standard in traditional compilers, even 
> though the programmer could get the same result by changing their code. (The 
> same is true of many other optimizations.) The benefit is that it lets the 
> developer write simpler, clearer code with less effort. 
> 
> I realize SQLite doesn't have the kind of industrial-strength query 
> optimizers that other SQL databases have, but (from an outside perspective) 
> this seems like a fairly straightforward optimization. SQLite is already 
> doing some similar tricks to recognize matching sub-expressions when it 
> applies an expression-based index to a query, for example.
> 
> —Jens
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Common subexpression optimization of deterministic functions

2017-09-12 Thread Jens Alfke

> On Sep 12, 2017, at 1:41 PM, Darko Volaric  wrote:
> 
> You can implement this by using user defined functions to implement row 
> "local variables" or "registers". They're single assignment storage that 
> keeps intermediate results, namely the common subexpressions.

Thanks! That's a very interesting technique.

> Note that you would need to order these so they are evaluated in dependency 
> order, i.e. ensure each name is set before it is got.

What's a good way to do that? Since SQL is a non-imperative language there 
isn't much notion of order of operations. I can imagine using a 
short-circuiting AND operator, but can I be guaranteed that this will always 
work?

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Common subexpression optimization of deterministic functions

2017-09-12 Thread Darko Volaric
You can implement this by using user defined functions to implement row "local 
variables" or "registers". They're single assignment storage that keeps 
intermediate results, namely the common subexpressions.

You'd define two functions, something like Get(rowid, name) and Set(rowid, 
name, value). You call Set with the subexpressions as the last parameter. It 
doesn't return any value and just stores the value. The Get function returns a 
previously set value with the given name and is used in the expressions where 
that subexpression would otherwise appear. Note that you would need to order 
these so they are evaluated in dependency order, i.e. ensure each name is set 
before it is got. The rowid parameter is used to detect when the row changes 
and the local variables are all cleared in readiness for the next row.



> On Sep 12, 2017, at 9:22 PM, Jens Alfke  wrote:
> 
> 
> 
>> On Sep 12, 2017, at 12:02 PM, Darren Duncan  wrote:
>> 
>> Practically speaking any optimization to reduce actual calls to the 
>> deterministic function would have to be at compile time to rewrite the query 
>> to explicitly keep the result of the function and use it several times,
> 
> Exactly.
> 
>> which is someone users can also do by writing the query differently.
> 
> Great — any advice on how to do it? I'm totally willing to do this :) but I'm 
> not sure how. As I said, a WITH clause looks promising, but I don't know if 
> that is purely syntactic sugar, like a macro. (And changing my query 
> generator to factor common calls into WITH clauses would be a nontrivial 
> amount of work, so I would like to get some assurance that it might help, 
> before I try it.)
> 
> The CSE optimization has long been standard in traditional compilers, even 
> though the programmer could get the same result by changing their code. (The 
> same is true of many other optimizations.) The benefit is that it lets the 
> developer write simpler, clearer code with less effort. 
> 
> I realize SQLite doesn't have the kind of industrial-strength query 
> optimizers that other SQL databases have, but (from an outside perspective) 
> this seems like a fairly straightforward optimization. SQLite is already 
> doing some similar tricks to recognize matching sub-expressions when it 
> applies an expression-based index to a query, for example.
> 
> —Jens
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Common subexpression optimization of deterministic functions

2017-09-12 Thread Jens Alfke


> On Sep 12, 2017, at 12:58 PM, Warren Young  wrote:
> 
> Could it be *extended* to mean what you want?  Of course, but that means 
> you’re asking for a feature, not reporting a bug.

I never claimed to be reporting a bug! The subject line refers to this as an 
"optimization", and in the initial message I said "it would be a noticeable 
speedup". 

I pointed out the SQLITE_DETERMINISTIC flag because it already gives the query 
optimizer the information it would need to detect that a function call is pure 
and can safely be factored out.

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Common subexpression optimization of deterministic functions

2017-09-12 Thread Jens Alfke


> On Sep 12, 2017, at 12:55 PM, Richard Hipp  wrote:
> 
> But we deliberately omit common subexpression elimination (CSE).  This
> is because our research shows that out of the millions of queries that
> SQLite compiles every second, only a very tiny fraction would actually
> benefit from CSE

OMG, I can see the BuzzFeed headline: "SQLite database engine is secretly 
sending your queries back to their server for analytics!" ;-)

Seriously, I see your point. But JSON support may have created a new class of 
queries that do benefit, because every reference to a JSON property is through 
a function call. So while a traditional query can use the same column name 
multiple times without incurring overhead, the JSON equivalent repeats a 
json_extract call multiple times, which does have overhead.

Traditional:
SELECT * FROM stuff WHERE length(name) = 5 OR name = 'five';
JSON:
SELECT * FROM stuff WHERE length(json_extract(jsn,'$.name')) = 5 OR 
json_extract(jsn,'$.name') = 'five';

If CSE optimization is too expensive to do by default, maybe it could be 
enabled by a flag when preparing the statement?

Unfortunately using a WITH clause doesn't seem to help; as I suspected, it's 
basically used as a macro that gets expanded. Here's one I tried:
WITH docs AS (SELECT rowid as id, json_extract(jsn,'$.name') AS name 
FROM stuff) 
SELECT id FROM docs WHERE length(name) <= 5 OR name = 'five';
The EXPLAIN command shows that this query still makes two identical calls to 
json_extract per row.

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Common subexpression optimization of deterministic functions

2017-09-12 Thread Warren Young
On Sep 12, 2017, at 12:41 PM, Jens Alfke  wrote:
> 
>> On Sep 12, 2017, at 11:09 AM, Warren Young  wrote:
>> 
>> From my reading of the docs, I don’t see that that is the purpose of 
>> SQLITE_DETERMINISTIC:
>> 
>>   https://www.sqlite.org/deterministic.html 
>> 
> 
> Actually it is. "A deterministic function always gives the same answer when 
> it has the same inputs." That is the definition of a mathematical (also 
> called "pure") function. Such a function call can of course be factored out 
> as a common subexpression.

All true, but irrelevant is the *purpose* of this constant is not what you want 
it to be.

As I read the docs, the only purpose of this constant is so that SQLite can 
give an error if you use a non-deterministic function anywhere that would cause 
DB corruption.  That is all.

Could it be *extended* to mean what you want?  Of course, but that means you’re 
asking for a feature, not reporting a bug.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Common subexpression optimization of deterministic functions

2017-09-12 Thread Richard Hipp
On 9/12/17, Jens Alfke  wrote:
>
> I realize SQLite doesn't have the kind of industrial-strength query
> optimizers that other SQL databases have, but (from an outside perspective)
> this seems like a fairly straightforward optimization. SQLite is already
> doing some similar tricks to recognize matching sub-expressions when it
> applies an expression-based index to a query, for example.
>

The query planner in SQLite will hold its own against most others.

But we deliberately omit common subexpression elimination (CSE).  This
is because our research shows that out of the millions of queries that
SQLite compiles every second, only a very tiny fraction would actually
benefit from CSE, but checking for CSE is expensive in both memory and
CPU cycles, and all queries would have to pay the extra checking
overhead whether they benefit or not.

A traditional compiler like GCC is free to use as much memory, time,
and code space to implement esoteric optimizations as it wants,
because compilation happens separately from the application and the
build product will be reused many times and so the cost of compilation
is amortized over many executions.  But a query planner in an RDBMS
(which is really just a compiler that translates the SQL programming
language into some low-level representation - byte code in the case of
SQLite) is more constrained because the compilation happens at
application run-time and the number of uses of the build product is
approximately 1.  And so when writing a query planner, one must be
careful in the use of memory, time, and code space devoted to
optimizations.  This is particularly so for SQLite which is an
embedded RDBMS.

Our belief is that CSE would be not worth the extra memory, CPU, and
code space required to implement it since CSE simply does not come up
that often in SQL statements.  If, in the future, we find that people
begin coding more complex SQL statements which will more often benefit
from CSE, then we might revisit this decision.

-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Common subexpression optimization of deterministic functions

2017-09-12 Thread Jens Alfke


> On Sep 12, 2017, at 12:02 PM, Darren Duncan  wrote:
> 
> Practically speaking any optimization to reduce actual calls to the 
> deterministic function would have to be at compile time to rewrite the query 
> to explicitly keep the result of the function and use it several times,

Exactly.

> which is someone users can also do by writing the query differently.

Great — any advice on how to do it? I'm totally willing to do this :) but I'm 
not sure how. As I said, a WITH clause looks promising, but I don't know if 
that is purely syntactic sugar, like a macro. (And changing my query generator 
to factor common calls into WITH clauses would be a nontrivial amount of work, 
so I would like to get some assurance that it might help, before I try it.)

The CSE optimization has long been standard in traditional compilers, even 
though the programmer could get the same result by changing their code. (The 
same is true of many other optimizations.) The benefit is that it lets the 
developer write simpler, clearer code with less effort. 

I realize SQLite doesn't have the kind of industrial-strength query optimizers 
that other SQL databases have, but (from an outside perspective) this seems 
like a fairly straightforward optimization. SQLite is already doing some 
similar tricks to recognize matching sub-expressions when it applies an 
expression-based index to a query, for example.

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Common subexpression optimization of deterministic functions

2017-09-12 Thread Darren Duncan

On 2017-09-12 11:41 AM, Jens Alfke wrote:

On Sep 12, 2017, at 11:09 AM, Warren Young  wrote:

From my reading of the docs, I don’t see that that is the purpose of 
SQLITE_DETERMINISTIC:

   https://www.sqlite.org/deterministic.html 



Actually it is. "A deterministic function always gives the same answer when it has the same 
inputs." That is the definition of a mathematical (also called "pure") function. 
Such a function call can of course be factored out as a common subexpression.


The purpose is simply so that the SQLite internals know whether it is safe to 
use the user-defined function in certain query types.


The reason SQLITE_DETERMINISTIC was added is to allow such functions to be used 
in indexes and then matched with the same function in a query. That allows 
indexing things like JSON properties (via the deterministic json_value 
function.)


Practically speaking any optimization to reduce actual calls to the 
deterministic function would have to be at compile time to rewrite the query to 
explicitly keep the result of the function and use it several times, which is 
someone users can also do by writing the query differently.


The fact is, any runtime-level smarts to prevent multiple calls to the function 
would have to involve creating and maintaining an index of inputs to outputs so 
that the DBMS knows whether the function was already called with particular 
inputs or not, so that would be an added complexity.


-- Darren Duncan

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Common subexpression optimization of deterministic functions

2017-09-12 Thread Jens Alfke


> On Sep 12, 2017, at 11:09 AM, Warren Young  wrote:
> 
> From my reading of the docs, I don’t see that that is the purpose of 
> SQLITE_DETERMINISTIC:
> 
>https://www.sqlite.org/deterministic.html 
> 

Actually it is. "A deterministic function always gives the same answer when it 
has the same inputs." That is the definition of a mathematical (also called 
"pure") function. Such a function call can of course be factored out as a 
common subexpression.

> The purpose is simply so that the SQLite internals know whether it is safe to 
> use the user-defined function in certain query types.

The reason SQLITE_DETERMINISTIC was added is to allow such functions to be used 
in indexes and then matched with the same function in a query. That allows 
indexing things like JSON properties (via the deterministic json_value 
function.)

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Common subexpression optimization of deterministic functions

2017-09-12 Thread Warren Young
On Sep 12, 2017, at 11:25 AM, Jens Alfke  wrote:
> 
> SQLite 3.19 doesn’t seem to coalesce identical calls to a deterministic 
> function. For example, in this query, where `fl_value` is a function I’ve 
> registered as SQLITE_DETERMINISTIC:

From my reading of the docs, I don’t see that that is the purpose of 
SQLITE_DETERMINISTIC:

https://www.sqlite.org/deterministic.html

The purpose is simply so that the SQLite internals know whether it is safe to 
use the user-defined function in certain query types.

Still, it *would* be a neat optimization to be able to mark a function as a 
“function” in the mathematical sense and have SQLite treat it as referentially 
transparent in contexts like this.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Common subexpression optimization of deterministic functions

2017-09-12 Thread Jens Alfke


> On Sep 12, 2017, at 10:52 AM, Richard Hipp  wrote:
> 
>   ... WHERE fl_value(body,'contact.address.state') IN ('CA','WA');

This was just a simple example query. In general it's possible to have 
arbitrary queries that use arbitrary properties. For example it might be 
comparing a property to a specific value as well as checking the string's 
length.

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Common subexpression optimization of deterministic functions

2017-09-12 Thread Richard Hipp
On 9/12/17, Jens Alfke  wrote:
> SQLite 3.19 doesn’t seem to coalesce identical calls to a deterministic
> function. For example, in this query, where `fl_value` is a function I’ve
> registered as SQLITE_DETERMINISTIC:
>
> SELECT key FROM kv_default
> WHERE fl_value(body, 'contact.address.state') = 'CA'
>OR fl_value(body, 'contact.address.state') = 'WA'
>
> fl_value gets called twice per row in the table, with the same inputs both
> times of course. As fl_value is not a cheap function — it’s similar to
> json_value — it would be a noticeable speedup if it were evaluated only once
> per row.

   ... WHERE fl_value(body,'contact.address.state') IN ('CA','WA');

-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Common subexpression optimization of deterministic functions

2017-09-12 Thread Jens Alfke
SQLite 3.19 doesn’t seem to coalesce identical calls to a deterministic 
function. For example, in this query, where `fl_value` is a function I’ve 
registered as SQLITE_DETERMINISTIC:

SELECT key FROM kv_default 
WHERE fl_value(body, 'contact.address.state') = 'CA'
   OR fl_value(body, 'contact.address.state') = 'WA'

fl_value gets called twice per row in the table, with the same inputs both 
times of course. As fl_value is not a cheap function — it’s similar to 
json_value — it would be a noticeable speedup if it were evaluated only once 
per row.

Is there a way I can restructure these (automatically generated) queries to do 
the refactoring explicitly? Sort of like assigning to a temporary variable in 
an imperative language? It looks like a WITH clause lets me do this 
syntactically, but I'm not sure if it'll make a difference at runtime.

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users