Re: [sqlite] [EXTERNAL] Common subexpression optimization of deterministic functions

2017-09-15 Thread Keith Medcalf
On Thursday, 14 September, 2017 19:05, Jens Alfke , wrote:

>> On Sep 14, 2017, at 1:23 PM, Keith Medcalf 
>wrote:

>> You merely need to ONCE it either for each input row or for each
>result row.  So for example:

>> select slow(a.x), slow(a.x)*slow(b.y), slow(b.y) from a, b where
>a.this == b.that

>> when computing the result set you merely compute ONCE slow(a.x) and
>ONCE slow(b,y)

>Interesting … I assume ONCE is something internal to the SQLite query
>engine? I don't see any reference to it in the SQL syntax or the
>other docs.

>Is there any way to achieve this effect without modifying SQLite?

Yes, it is an internal thing in the VDBE code.  It is currently used so that 
certain things can be done only ONCE per execution of the prepared statement 
(that is, to generate certain constants that once created do not change 
throughout the entire execution (from the first step until reset), but will be 
regenerated if the statement is reset then run again)).  It basically an if 
type construct for example:

once r5 {
  r5 = (do something)
}

So the first time it is run, r5 is unintialized something is done to compute a 
value stored in r5.  On the next step the value of r5 already exists so the 
computation steps are skipped.

Of course, in this case the calculation is "per step" and you are just ignoring 
duplicate calculations, so it is similar but not really the same thing.

This type of optimization would at least prevent multiple executions of the 
function with the same arguments *in the same step* but would not prevent 
execution with the same arguments in different steps.




___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] Common subexpression optimization of deterministic functions

2017-09-15 Thread Clemens Ladisch
Nico Williams wrote:
> I would much prefer to be able to specify which CTEs must be materialized,
> and which may be left as internal views.  That would give the user a great
> deal of control.  WITH x AS () MATERIALIZED ... .

"Materialized" is the wrong word; you want to prevent only subquery
flattening, but not implementing the subquery as a coroutine.


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] Common subexpression optimization of deterministic functions

2017-09-15 Thread nomad
On Fri Sep 15, 2017 at 09:55:40AM +0200, Dominique Devienne wrote:
> On Thu, Sep 14, 2017 at 11:43 PM, Nico Williams 
> wrote:
> 
> > [...] I would much prefer to be able to specify which CTEs must be
> > materialized,
> > and which may be left as internal views.  That would give the user a great
> > deal of control.  WITH x AS () MATERIALIZED ... .  Can we get that?
> >
> 
> +1 --DD

I would also like to see such a feature. I don't know if the syntax
above is standard SQL. If not, I would suggest an alternative position
for the extra keyword:

WITH MATERIALIZED
x
AS
( SELECT ... )
SELECT
a
FROM
x

That fits in better with the existing RECURSIVE keyword position, looks
better to my eyes, and would make my life as the author of an SQLite
wrapper easier :-)

-- 
Mark Lawrence
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] Common subexpression optimization of deterministic functions

2017-09-15 Thread Dominique Devienne
On Thu, Sep 14, 2017 at 11:43 PM, Nico Williams 
wrote:

> On Thu, Sep 14, 2017 at 1:10 PM Simon Slavin  wrote:
> > Can you not do it with WITH ?  I don’t really understand how WITH works
> > but it would seem to evaluate its terms just once for each iteration.
>


> [...] I would much prefer to be able to specify which CTEs must be
> materialized,
> and which may be left as internal views.  That would give the user a great
> deal of control.  WITH x AS () MATERIALIZED ... .  Can we get that?
>

+1 --DD
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] Common subexpression optimization of deterministic functions

2017-09-14 Thread Jens Alfke


> On Sep 14, 2017, at 1:23 PM, Keith Medcalf  wrote:
> 
> You merely need to ONCE it either for each input row or for each result row.  
> So for example:
> 
> select slow(a.x), slow(a.x)*slow(b.y), slow(b.y) from a, b where a.this == 
> b.that
> 
> when computing the result set you merely compute ONCE slow(a.x) and ONCE 
> slow(b,y)

Interesting … I assume ONCE is something internal to the SQLite query engine? I 
don't see any reference to it in the SQL syntax or the other docs.
Is there any way to achieve this effect without modifying SQLite?

—Jens

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] Common subexpression optimization of deterministic functions

2017-09-14 Thread Simon Slavin
On 14 Sep 2017, at 10:43pm, Nico Williams  wrote:

> On Thu, Sep 14, 2017 at 1:10 PM Simon Slavin  > wrote:
> 
>> Can you not do it with WITH ?  I don’t really understand how WITH works
>> but it would seem to evaluate its terms just once for each iteration.
> 
> In PostgreSQL CTEs fiction as an optimizer barrier: the engine always
> evaluates each CTE in order and stores their results in temporary tables,
> then it runs the main statement with those tables as sources.
> 
> But in SQLite3 CTEs are not materialized first -- they are like VIEWs, and
> so they do not function as optimizer barriers, therefore they cannot be
> used for CSE.

That, and the other answers to my question, clarifies how SQLite works very 
well.  I come from a background where SQL is used only for storage and fast 
searching.  Anything that involves computing is done in software.  So I’m not 
familiar with the functions of SQLite which do heavy processing, like 
reformatting numbers for printing and CTEs.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] Common subexpression optimization of deterministic functions

2017-09-14 Thread Nico Williams
On Thu, Sep 14, 2017 at 1:10 PM Simon Slavin  wrote:

>
>
> On 14 Sep 2017, at 5:55pm, R Smith  wrote:
>
> > Richard, wouldn't it be possible to supply a wrapping function (perhaps
> a hint function, like the likelihood() function), that takes another
> function as a parameter and then ensuring that THAT gets calculated only
> once?
>
> Can you not do it with WITH ?  I don’t really understand how WITH works
> but it would seem to evaluate its terms just once for each iteration.


In PostgreSQL CTEs fiction as an optimizer barrier: the engine always
evaluates each CTE in order and stores their results in temporary tables,
then it runs the main statement with those tables as sources.

But in SQLite3 CTEs are not materialized first -- they are like VIEWs, and
so they do not function as optimizer barriers, therefore they cannot be
used for CSE.

I would much prefer to be able to specify which CTEs must be materialized,
and which may be left as internal views.  That would give the user a great
deal of control.  WITH x AS () MATERIALIZED ... .  Can we get that?

>
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] Common subexpression optimization of deterministic functions

2017-09-14 Thread Keith Medcalf

No, you do not have to keep track of all the possibly seen results to only 
compute them once in order to achieve significant benefits.  

You merely need to ONCE it either for each input row or for each result row.  
So for example:

select slow(a.x), slow(a.x)*slow(b.y), slow(b.y) from a, b where a.this == 
b.that

when computing the result set you merely compute ONCE slow(a.x) and ONCE 
slow(b,y) and any subsequent call to the same phrase simply uses the result 
that was stored in the result register.  You could, of course, build a "table" 
of results of the function, but I do not think there is any implementation that 
does that for you.  If you want to do that, then you would do it yourself and 
join that table back to the original query.

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.


>-Original Message-
>From: sqlite-users [mailto:sqlite-users-
>boun...@mailinglists.sqlite.org] On Behalf Of Simon Slavin
>Sent: Thursday, 14 September, 2017 08:57
>To: SQLite mailing list
>Subject: Re: [sqlite] [EXTERNAL] Common subexpression optimization of
>deterministic functions
>
>
>
>On 14 Sep 2017, at 3:08pm, Darko Volaric <li...@darko.org> wrote:
>
>> I think people are missing the point, probably becuase it's not a
>great example. Consider the following statement:
>>
>> SELECT funca(slow(10)), funkb(slow(10))
>>
>> and lets say slow(10) takes an hour to compute, and funka and funkb
>take almost no time to execute. With common subexpression
>optimization the statement would take one hour, instead of two, to
>compute becuase the value of slow(10) would only be calculated once.
>
>Thing of how hard that is to implement, though.  You have to keep a
>cache of each deterministic function and the parameters it was given
>and the result.  You have to search through the cache every time you
>evaluate a deterministic function.  You have to allocate and free the
>memory this cache takes up.  You have to figure out a strategy for
>how long you keep results in the cache.  And the best strategy
>depends on something the SQLite developers can’t tell about your
>programming, so whatever they choose some people will complain.
>
>Simon.
>___
>sqlite-users mailing list
>sqlite-users@mailinglists.sqlite.org
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] Common subexpression optimization of deterministic functions

2017-09-14 Thread Simon Slavin


On 14 Sep 2017, at 8:02pm, Clemens Ladisch  wrote:

> Jens Alfke wrote:
>> can someone please tell me how to hoist/factor out the subexpression 
>> manually then?
> 
> Move the subexpression into a subquery, then prevent subquery flattening
> (http://www.sqlite.org/optoverview.html#flattening) by violating one of
> the listed constraints.  (These rules might change in the future ...)
> 
> In this case, let's use rule 7: the subquery does not have a FROM clause:
> 
>   SELECT x, x FROM (SELECT slow(10) AS x);

Nice.  Assuming it works, which it looks like it should.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] Common subexpression optimization of deterministic functions

2017-09-14 Thread Clemens Ladisch
Jens Alfke wrote:
> can someone please tell me how to hoist/factor out the subexpression manually 
> then?

Move the subexpression into a subquery, then prevent subquery flattening
(http://www.sqlite.org/optoverview.html#flattening) by violating one of
the listed constraints.  (These rules might change in the future ...)

In this case, let's use rule 7: the subquery does not have a FROM clause:

   SELECT x, x FROM (SELECT slow(10) AS x);

Or use rule 13, and put "LIMIT 9223372036854775807" on both queries.

> I've tried using a "WITH" clause, but it doesn't help

A recursive CTE is never flattened:

  WITH RECURSIVE cte(x) AS (
SELECT slow(10)

UNION ALL

-- dummy recursion that must be constructed so
-- that it never actually happens:
SELECT x FROM cte WHERE x IS NULL
  )
  SELECT x, x FROM cte;

(However, if "FROM cte" is used more than once in the query, it will be
executed more than once.)


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] Common subexpression optimization of deterministic functions

2017-09-14 Thread Jens Alfke


> On Sep 14, 2017, at 10:31 AM, R Smith  wrote:
> 
> I think that's actually one of the things one of the posters tried and found 
> not to work. The problem is the WITH syntax in theory is just-like 
> constructing a temporary table and using the values from there, but in 
> reality (AFAIK) it's just a mathematical construct that re-routes the query, 
> so anything that is a parameter to a function (or a function itself) in the 
> WITH clause gets re-evaluated every time the main query refers to the WITH 
> table.

Yes, I tried using WITH and it behaves as you describe — like a type of 
high-level macro that gets expanded every place it's used. So it's useless for 
CSE.

> Perhaps if you can force the WITH into building an in-memory table with the 
> results from, say a recursive function, it would indeed be a possibility

That would almost certainly have more overhead (in my case) than the redundant 
function calls. 

What I'd like would be something like a "let" statement in a functional 
language, which would bind a name to a value that's computed once no matter how 
many times it's referenced in the SELECT. For example:
LET foo = computeFoo(table.col) IN
SELECT derivedvalue(foo)
FROM table WHERE sometest(foo) AND anothertest(foo)

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] Common subexpression optimization of deterministic functions

2017-09-14 Thread R Smith


On 2017/09/14 7:10 PM, Simon Slavin wrote:


On 14 Sep 2017, at 5:55pm, R Smith  wrote:


Richard, wouldn't it be possible to supply a wrapping function (perhaps a hint 
function, like the likelihood() function), that takes another function as a 
parameter and then ensuring that THAT gets calculated only once?

Can you not do it with WITH ?  I don’t really understand how WITH works but it 
would seem to evaluate its terms just once for each iteration.


I think that's actually one of the things one of the posters tried and 
found not to work. The problem is the WITH syntax in theory is just-like 
constructing a temporary table and using the values from there, but in 
reality (AFAIK) it's just a mathematical construct that re-routes the 
query, so anything that is a parameter to a function (or a function 
itself) in the WITH clause gets re-evaluated every time the main query 
refers to the WITH table.


Perhaps if you can force the WITH into building an in-memory table with 
the results from, say a recursive function, it would indeed be a 
possibility, but I'm pretty sure that isn't a possibility. This is 
something easily done by the bigger engines since they rarely need to 
concern themselves with memory limitations, but then, for that same 
reason, they can (and do) just implement CSE without much fuss.


I was trying to posit a light-weight approach that puts a bit of onus on 
the programmer, but still answers the need - but no solution to this 
will be "quick and easy".



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] Common subexpression optimization of deterministic functions

2017-09-14 Thread Darko Volaric
If you're looking for a simple/easy/clean way of doing it, there isn't one. You 
have to modify the library to do it properly. But I still find it an 
interesting design challenge.

Maybe instead of going the eager route you can go lazy and just cache 
subexpressions which might be called again. This is messy though, a cacheable 
expression would have to be a user-defined function call that takes parameters 
that are easy to compute (since the params are always evaluated), plus an id 
param to identify the sub expression. The first time the function is called you 
calculate and store the value under the id and next time it's called you return 
the stored value that was previously computed. This has the advantage of being 
automatic and you could hand tune it, storing only the calls you know you will 
reuse. Having to use user defined function calls is pretty ugly although you 
could automatically transform the statement and generate the C source for the 
calls.

Another approach I've been thinking about implementing, which is a bit 
speculative to say the least, is to compile the SQLite byte codes to something 
like WebAssembly byte codes, then running it through the BinaryEn which 
optimizes it, then into native code. That would implement not only CSE but a 
host of other optimizations and might be useful where the statement and query 
plan will not change much, or it's worth spending compute time optimizing it 
thus. Typically SQLite will be IO bound, but for some applications (like mine) 
it might make sense.



> On Sep 14, 2017, at 6:37 PM, Jens Alfke  wrote:
> 
> 
> 
>> On Sep 14, 2017, at 8:38 AM, Warren Young  wrote:
>> 
>> All the examples I’ve seen attempting to support the value of this feature 
>> are simple enough that even a naive text compression algorithm could find 
>> the similarities and “hoist” the copies so the value is computed only once.  
>> That means the *human* can also see the CSE and hoist it manually.  
> 
> Fine; **can someone please tell me how to hoist/factor out the subexpression 
> manually then**? My SQL queries are generated procedurally and I can easily 
> change my code to do this refactoring, if I know the trick.
> 
> I've tried using a "WITH" clause, but it doesn't help; it results in the same 
> number of calls to the native function. (See previous post in this thread for 
> an actual example.)
> 
> I need something that doesn't modify the database, so generating a new table 
> with the function results is right out. Even a temporary table wouldn't help 
> because it would probably be more overhead than it's worth (the functions I 
> want to factor out aren't _that_ expensive.)
> 
> —Jens
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] Common subexpression optimization of deterministic functions

2017-09-14 Thread Simon Slavin


On 14 Sep 2017, at 5:55pm, R Smith  wrote:

> Richard, wouldn't it be possible to supply a wrapping function (perhaps a 
> hint function, like the likelihood() function), that takes another function 
> as a parameter and then ensuring that THAT gets calculated only once?

Can you not do it with WITH ?  I don’t really understand how WITH works but it 
would seem to evaluate its terms just once for each iteration.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] Common subexpression optimization of deterministic functions

2017-09-14 Thread R Smith
Richard, wouldn't it be possible to supply a wrapping function (perhaps 
a hint function, like the likelihood() function), that takes another 
function as a parameter and then ensuring that THAT gets calculated only 
once?


SELECT calc_once(slow(10))
  FROM xxx

Note that if the same function is used twice or more in a single query, 
they don't need to know about each other, but perhaps a further win can 
be achieved with adding an ID to the call, so that:


SELECT x * calc_once(slow(10), 1), y * calc_once(slow(10), 1), z * 
calc_once(slow(20), 2)

  FROM xxx

will only see two slow calculations performed, and it means no expensive 
cache-management code by SQLite itself, it's completely up to the query 
programmer managing those IDs.


I thought of doing this a while ago via a UDF, but then realized there 
is no way for me to prevent SQLite from re-calculating a parameter to my 
own UDF.


So essentially I tried to do the above and it would be a rather easy 
exercise for me in code, but I can't prevent SQLite from recalculating 
the "slow(10)" in calc_once(slow(10), 1) before evaluating the 
encapsulating function whose job it is to report back the cached 
version. But perhaps there is a way for you to do it.


I also thought of a way to send my function a string, so that it would be:

SELECT x * calc_once( ' slow(10) ' ), y * calc_once( ' slow(10) ' )   
where my function would then initiate it's own little SQL to do "SELECT 
slow(10);" and use the return value from that as a cache whenever it 
gets asked for the same string, but that prevents the evaluation from 
containing something from the same query and I wasn't sure how well it 
would play with ATOMICness of transactions and the like.


I feel there might be a middle-ground in there somewhere where 
"something" can be done that doesn't require all the heft.


Cheers,
Ryan

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] Common subexpression optimization of deterministic functions

2017-09-14 Thread Jens Alfke


> On Sep 14, 2017, at 8:38 AM, Warren Young  wrote:
> 
> All the examples I’ve seen attempting to support the value of this feature 
> are simple enough that even a naive text compression algorithm could find the 
> similarities and “hoist” the copies so the value is computed only once.  That 
> means the *human* can also see the CSE and hoist it manually.  

Fine; **can someone please tell me how to hoist/factor out the subexpression 
manually then**? My SQL queries are generated procedurally and I can easily 
change my code to do this refactoring, if I know the trick.

I've tried using a "WITH" clause, but it doesn't help; it results in the same 
number of calls to the native function. (See previous post in this thread for 
an actual example.)

I need something that doesn't modify the database, so generating a new table 
with the function results is right out. Even a temporary table wouldn't help 
because it would probably be more overhead than it's worth (the functions I 
want to factor out aren't _that_ expensive.)

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] Common subexpression optimization of deterministic functions

2017-09-14 Thread Dominique Devienne
On Thu, Sep 14, 2017 at 5:38 PM, Warren Young  wrote:

> On Sep 14, 2017, at 8:49 AM, Dominique Devienne 
> wrote:
> >
> > On Thu, Sep 14, 2017 at 4:13 PM, Richard Hipp  wrote:
> >
> >> the amount of extra time spent inside of sqlite3_prepare() in order to
> >> deal with them is not worth the effort.
> >
> > But why not let the client code decide though? And have it off by
> default as now?
>
> Opportunity costs.  Time spent writing, testing, maintaining, and
> documenting this feature is time *not*

spent on features that will either make HWACI’s clients happy or expand
> that set of clients.
>
> If you take money out of it, it’s still time *not* spent writing code that
> makes drh and his collaborators happy.


I've never said otherwise. But my point is that it's not the rational Dr
Hipp used to not supporting this optimization.


>
> > but basing the decision to completely eschew this optimisation on the
> basis that a
> > non-exhaustive sampling of SQL statements shows it's rarely needed seems
> a bit
> > "weird" for lack of a better term on my part.
>
> Are you seriously saying that drh cannot possibly make this decision,
> lacking sufficient information?
>

Where did you read that? DRH is Mr SQLite, of course he has all discretion
to take it wherever he wants.
But Richard cannot possibly see all queries thrown at SQLite, that's just
not possible.


> Obviously he doesn’t see 100% of queries made through SQLite, but he
> probably sees more of them than any other person on the planet.  No person
> is in a better position to make a decision like this.
>

Again, I never said otherwise.


> But hey, if you feel you have a better feel for the truth of the matter,
> working code persuades best.


Even if I had it, DRH wouldn't want it. SQLite is not open to outside
contributions,
and for good reasons Richard re-iterated recently on this list.


>
> > Also, when you write that the query time gains are often offset by
> > additional prepare time
> > overhead (and memory as well, as you mentioned) kinda assumes a bad
> > practice of
> > not using (and caching) prepared statements, i.e. the prepare time
> overhead
> > (1x) could
> > well be worth it for an often-used (prepared/cached) query (Nx, with N
> > large).
>
> Up-thread, drh suggested that in the vast majority of cases he’s examined,
> each prepared statement is used once, then thrown away.

I don’t doubt that he’s seen a representative sample of the way
> applications are commonly written.
>

That's just shoddy DB programming IMHO. Basing any decision on the argument
that *prepared* statements
are used only once is just plain wrong. Or at the very least catering to
the lowest denominator of SQLite users.
By that token, taking this argument to its extreme, we wouldn't need
prepared statements at all...


>
> Doubtless some of those prepared statements could be cached longer, but
> that misses the point: those applications will not benefit from this
> optimization unless their SQLite driver code is rewritten.
>
> Isn’t the whole point of this argument to get more speed for free, without
> rewriting our applications?
>

SQLite driver code? Then you're thinking "scripting code". I don't need any
driver rewritten to use prepared statement, thank you very much.


> This implicit analogy with CSE optimizations in statically-compiled
> programming languages is inapt for that very reason: highly-optimizing
> compilers go to an awful lot of effort to ensure that you don’t have to
> write your code in the most optimal fashion to get the benefit of its
> optimizations.  If you make SQLite’s query parser equivalently smart, you
> may well eat up all the savings in the optimizer.
>
> All the examples I’ve seen attempting to support the value of this feature
> are simple enough that even a naive text compression algorithm could find
> the similarities and “hoist” the copies so the value is computed only
> once.  That means the *human* can also see the CSE and hoist it manually.
>
> The real trick SQLite would need to pull off to make this valuable is
> where mathematical analysis is required to recognize common
> subexpressions.  That’s where your favorite statically-compiled programming
> language gets most of its power, not in handling the trivial cases the
> human sees at first glance.  And that’s why your compiler takes so long to
> run even on a multigigahertz multicore box.
>
> How many milliseconds do the bulk of your SQLite statements run in?  Not
> the really heavy ones, I mean the 2-3 sigma span surrounding the mean.
> Let’s say it’s 1-10 ms.  Let’s also say we can only afford 1% overhead
> before the extra optimizer time spent on fast queries overwhelms the time
> saved on the heavy queries.  That means you’ve only got a few microseconds
> to do all this optimization, else you’ll blow the savings running the
> optimizer.
>
> I’d guess my C++ compiler on -O3 takes something like 100x as long to
> 

Re: [sqlite] [EXTERNAL] Common subexpression optimization of deterministic functions

2017-09-14 Thread Richard Hipp
On 9/14/17, Darko Volaric  wrote:
> is
> there a way of ensuring that a particular expression (or just function call)
> will be guaranteed to be evaluated before any other in a particular
> statement?

No, not in general.

There are special cases were the order of evaluation is defined.  For
example, arguments to coalesce() and subexpressions in a CASE
statement are evaluated from left to right.  But in general, SQLite is
free to evaluate expressions in any order that it wants.  We reserve
the right to change the order of evaluation in future releases of
SQLite, as part of our on-going optimization efforts.

-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] Common subexpression optimization of deterministic functions

2017-09-14 Thread Warren Young
On Sep 14, 2017, at 9:36 AM, Darko Volaric  wrote:
> 
> I don't support automatic CSE, I think it should be done manually.

So…you want a SQL profiler, then?  Like EXPLAIN?

I suppose there might be a market for a tool that takes a query string and 
spends a ridiculous amount of time analyzing it and rewrites it to run 
optimally on SQLite.  That sounds like something you’d expect to pay top dollar 
for, though, since you’re basically replacing the brain of a SQLite expert.

> is there a way of ensuring that a particular expression (or just function 
> call) will be guaranteed to be evaluated before any other in a particular 
> statement?

Pseudocode:

INSERT INTO CacheTable.foo VALUES (SELECT FROM…) ;
SELECT FROM ... WHERE CacheTable.foo = ...

Other than that, I doubt it.  SQL is a declarative language, which means it 
purposely does not define sequence points, in C parlance.  Without that 
property, SQL’s query planner would not have much of the freedom it does, which 
would be a very bad thing from the point of this thread.

Much of the reason a language like C takes so long to compile is that all those 
sequence point rules in the ISO C standard tie the compiler’s hands, requiring 
very clever analysis to figure out what optimizations are even legal.

Optimization efforts have gotten to the point that there are now C compilers 
that generate code that is correct according to the standard but which behaves 
in ways the average C programmer does not fully understand through extremely 
abstruse language lawyering.

For example:

https://stackoverflow.com/questions/18195715/
http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html

In the second link, scroll down to how the ISO C rule explained in the first 
link impacts the LLVM optimizer.  Were you aware of all these ramifications?

SQLite, having even more freedom in some ways than ISO C, probably does plenty 
that would surprise you if you knew about it.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] Common subexpression optimization of deterministic functions

2017-09-14 Thread Warren Young
On Sep 14, 2017, at 8:49 AM, Dominique Devienne  wrote:
> 
> On Thu, Sep 14, 2017 at 4:13 PM, Richard Hipp  wrote:
> 
>> the amount of extra time spent inside of sqlite3_prepare() in order to
>> deal with them is not worth the effort.
> 
> But why not let the client code decide though? And have it off by default
> as now?

Opportunity costs.  Time spent writing, testing, maintaining, and documenting 
this feature is time *not* spent on features that will either make HWACI’s 
clients happy or expand that set of clients.

If you take money out of it, it’s still time *not* spent writing code that 
makes drh and his collaborators happy.

> but basing the decision to completely eschew this optimisation on the basis
> that a
> non-exhaustive sampling of SQL statements shows it's rarely needed seems a
> bit
> "weird" for lack of a better term on my part.

Are you seriously saying that drh cannot possibly make this decision, lacking 
sufficient information?

Obviously he doesn’t see 100% of queries made through SQLite, but he probably 
sees more of them than any other person on the planet.  No person is in a 
better position to make a decision like this.

But hey, if you feel you have a better feel for the truth of the matter, 
working code persuades best.

> Also, when you write that the query time gains are often offset by
> additional prepare time
> overhead (and memory as well, as you mentioned) kinda assumes a bad
> practice of
> not using (and caching) prepared statements, i.e. the prepare time overhead
> (1x) could
> well be worth it for an often-used (prepared/cached) query (Nx, with N
> large).

Up-thread, drh suggested that in the vast majority of cases he’s examined, each 
prepared statement is used once, then thrown away.  I don’t doubt that he’s 
seen a representative sample of the way applications are commonly written.

Doubtless some of those prepared statements could be cached longer, but that 
misses the point: those applications will not benefit from this optimization 
unless their SQLite driver code is rewritten.  

Isn’t the whole point of this argument to get more speed for free, without 
rewriting our applications?

This implicit analogy with CSE optimizations in statically-compiled programming 
languages is inapt for that very reason: highly-optimizing compilers go to an 
awful lot of effort to ensure that you don’t have to write your code in the 
most optimal fashion to get the benefit of its optimizations.  If you make 
SQLite’s query parser equivalently smart, you may well eat up all the savings 
in the optimizer.

All the examples I’ve seen attempting to support the value of this feature are 
simple enough that even a naive text compression algorithm could find the 
similarities and “hoist” the copies so the value is computed only once.  That 
means the *human* can also see the CSE and hoist it manually.  

The real trick SQLite would need to pull off to make this valuable is where 
mathematical analysis is required to recognize common subexpressions.  That’s 
where your favorite statically-compiled programming language gets most of its 
power, not in handling the trivial cases the human sees at first glance.  And 
that’s why your compiler takes so long to run even on a multigigahertz 
multicore box.

How many milliseconds do the bulk of your SQLite statements run in?  Not the 
really heavy ones, I mean the 2-3 sigma span surrounding the mean.  Let’s say 
it’s 1-10 ms.  Let’s also say we can only afford 1% overhead before the extra 
optimizer time spent on fast queries overwhelms the time saved on the heavy 
queries.  That means you’ve only got a few microseconds to do all this 
optimization, else you’ll blow the savings running the optimizer.

I’d guess my C++ compiler on -O3 takes something like 100x as long to process 
the same number of bytes of input text as SQLite.  It doesn’t help to make 
SQLite queries prepare() 100x slower on the off chance that it can cut the 
execution time of a few particularly heavy queries in half.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] Common subexpression optimization of deterministic functions

2017-09-14 Thread Darko Volaric
OK, in trying to clear up one misunderstanding I've created another, so let me 
be clear: I don't support automatic CSE, I think it should be done manually. 
But to support that Richard, can you answer this question: is there a way of 
ensuring that a particular expression (or just function call) will be 
guaranteed to be evaluated before any other in a particular statement?



> On Sep 14, 2017, at 4:13 PM, Richard Hipp  wrote:
> 
> On 9/14/17, Darko Volaric  wrote:
>> I think people are missing the point, probably becuase it's not a great
>> example. Consider the following statement:
>> 
>> SELECT funca(slow(10)), funkb(slow(10))
>> 
>> and lets say slow(10) takes an hour to compute, and funka and funkb take
>> almost no time to execute. With common subexpression optimization the
>> statement would take one hour, instead of two, to compute becuase the value
>> of slow(10) would only be calculated once.
> 
> I fully understand the benefits of CSE.  My point is that constructs
> such as the above are very rarely used in SQLite - so much so that the
> amount of extra time spent inside of sqlite3_prepare() in order to
> deal with them is not worth the effort.
> 
> CSE in the example you cite above is relatively easy.  A harder example is 
> this:
> 
>  SELECT coalesce(x, slow(10)), coalesce(y, slow(10)) FROM tab;
> 
> -- 
> D. Richard Hipp
> d...@sqlite.org
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] Common subexpression optimization of deterministic functions

2017-09-14 Thread Simon Slavin


On 14 Sep 2017, at 3:08pm, Darko Volaric  wrote:

> I think people are missing the point, probably becuase it's not a great 
> example. Consider the following statement:
> 
> SELECT funca(slow(10)), funkb(slow(10))
> 
> and lets say slow(10) takes an hour to compute, and funka and funkb take 
> almost no time to execute. With common subexpression optimization the 
> statement would take one hour, instead of two, to compute becuase the value 
> of slow(10) would only be calculated once.

Thing of how hard that is to implement, though.  You have to keep a cache of 
each deterministic function and the parameters it was given and the result.  
You have to search through the cache every time you evaluate a deterministic 
function.  You have to allocate and free the memory this cache takes up.  You 
have to figure out a strategy for how long you keep results in the cache.  And 
the best strategy depends on something the SQLite developers can’t tell about 
your programming, so whatever they choose some people will complain.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] Common subexpression optimization of deterministic functions

2017-09-14 Thread Dominique Devienne
On Thu, Sep 14, 2017 at 4:13 PM, Richard Hipp  wrote:

> On 9/14/17, Darko Volaric  wrote:
> > I think people are missing the point, probably becuase it's not a great
> > example. Consider the following statement:
> >
> > SELECT funca(slow(10)), funkb(slow(10))
> >
> > and lets say slow(10) takes an hour to compute, and funka and funkb take
> > almost no time to execute. With common subexpression optimization the
> > statement would take one hour, instead of two, to compute because the
> value
> > of slow(10) would only be calculated once.
>
> I fully understand the benefits of CSE.  My point is that constructs
> such as the above are very rarely used in SQLite - so much so that the
> amount of extra time spent inside of sqlite3_prepare() in order to
> deal with them is not worth the effort.
>

But why not let the client code decide though? And have it off by default
as now?

I fully appreciate writing/testing/maintaining that optimisation might be a
lot of work,
but basing the decision to completely eschew this optimisation on the basis
that a
non-exhaustive sampling of SQL statements shows it's rarely needed seems a
bit
"weird" for lack of a better term on my part.

Also, when you write that the query time gains are often offset by
additional prepare time
overhead (and memory as well, as you mentioned) kinda assumes a bad
practice of
not using (and caching) prepared statements, i.e. the prepare time overhead
(1x) could
well be worth it for an often-used (prepared/cached) query (Nx, with N
large).

Not trying to get into the usual "SQLite is lite" arguments here, which
given how the NGQP
is quite advanced already and not so lite (and holds its own as you wrote
against others),
I'm just surprised by the rational you're using Richard. Respectfully so.
--DD
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] Common subexpression optimization of deterministic functions

2017-09-14 Thread Richard Hipp
On 9/14/17, Darko Volaric  wrote:
> I think people are missing the point, probably becuase it's not a great
> example. Consider the following statement:
>
> SELECT funca(slow(10)), funkb(slow(10))
>
> and lets say slow(10) takes an hour to compute, and funka and funkb take
> almost no time to execute. With common subexpression optimization the
> statement would take one hour, instead of two, to compute becuase the value
> of slow(10) would only be calculated once.

I fully understand the benefits of CSE.  My point is that constructs
such as the above are very rarely used in SQLite - so much so that the
amount of extra time spent inside of sqlite3_prepare() in order to
deal with them is not worth the effort.

CSE in the example you cite above is relatively easy.  A harder example is this:

  SELECT coalesce(x, slow(10)), coalesce(y, slow(10)) FROM tab;

-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] Common subexpression optimization of deterministic functions

2017-09-14 Thread Darko Volaric
I think people are missing the point, probably becuase it's not a great 
example. Consider the following statement:

SELECT funca(slow(10)), funkb(slow(10))

and lets say slow(10) takes an hour to compute, and funka and funkb take almost 
no time to execute. With common subexpression optimization the statement would 
take one hour, instead of two, to compute becuase the value of slow(10) would 
only be calculated once.


> On Sep 14, 2017, at 8:31 AM, Wout Mertens  wrote:
> 
> Isn't that what cross join is for? Do a select on a virtual table to
> calculate the value and then use that value in the real where clause?
> 
> On Wed, Sep 13, 2017, 9:10 AM Hick Gunter  wrote:
> 
>> Try fl_value(...) IN ()
>> 
>> -Ursprüngliche Nachricht-
>> Von: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org]
>> Im Auftrag von Jens Alfke
>> Gesendet: Dienstag, 12. September 2017 19:26
>> An: SQLite mailing list 
>> Betreff: [EXTERNAL] [sqlite] Common subexpression optimization of
>> deterministic functions
>> 
>> SQLite 3.19 doesn’t seem to coalesce identical calls to a deterministic
>> function. For example, in this query, where `fl_value` is a function I’ve
>> registered as SQLITE_DETERMINISTIC:
>> 
>> SELECT key FROM kv_default
>> WHERE fl_value(body, 'contact.address.state') = 'CA'
>>   OR fl_value(body, 'contact.address.state') = 'WA'
>> 
>> fl_value gets called twice per row in the table, with the same inputs both
>> times of course. As fl_value is not a cheap function — it’s similar to
>> json_value — it would be a noticeable speedup if it were evaluated only
>> once per row.
>> 
>> Is there a way I can restructure these (automatically generated) queries
>> to do the refactoring explicitly? Sort of like assigning to a temporary
>> variable in an imperative language? It looks like a WITH clause lets me do
>> this syntactically, but I'm not sure if it'll make a difference at runtime.
>> 
>> —Jens
>> ___
>> sqlite-users mailing list
>> sqlite-users@mailinglists.sqlite.org
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>> 
>> 
>> ___
>> Gunter Hick
>> Software Engineer
>> Scientific Games International GmbH
>> FN 157284 a, HG Wien
>> Klitschgasse 2-4, A-1130 Vienna, Austria
>> Tel: +43 1 80100 0
>> E-Mail: h...@scigames.at
>> 
>> This communication (including any attachments) is intended for the use of
>> the intended recipient(s) only and may contain information that is
>> confidential, privileged or legally protected. Any unauthorized use or
>> dissemination of this communication is strictly prohibited. If you have
>> received this communication in error, please immediately notify the sender
>> by return e-mail message and delete all copies of the original
>> communication. Thank you for your cooperation.
>> 
>> 
>> ___
>> sqlite-users mailing list
>> sqlite-users@mailinglists.sqlite.org
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>> 
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] Common subexpression optimization of deterministic functions

2017-09-14 Thread Wout Mertens
Isn't that what cross join is for? Do a select on a virtual table to
calculate the value and then use that value in the real where clause?

On Wed, Sep 13, 2017, 9:10 AM Hick Gunter  wrote:

> Try fl_value(...) IN ()
>
> -Ursprüngliche Nachricht-
> Von: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org]
> Im Auftrag von Jens Alfke
> Gesendet: Dienstag, 12. September 2017 19:26
> An: SQLite mailing list 
> Betreff: [EXTERNAL] [sqlite] Common subexpression optimization of
> deterministic functions
>
> SQLite 3.19 doesn’t seem to coalesce identical calls to a deterministic
> function. For example, in this query, where `fl_value` is a function I’ve
> registered as SQLITE_DETERMINISTIC:
>
> SELECT key FROM kv_default
> WHERE fl_value(body, 'contact.address.state') = 'CA'
>OR fl_value(body, 'contact.address.state') = 'WA'
>
> fl_value gets called twice per row in the table, with the same inputs both
> times of course. As fl_value is not a cheap function — it’s similar to
> json_value — it would be a noticeable speedup if it were evaluated only
> once per row.
>
> Is there a way I can restructure these (automatically generated) queries
> to do the refactoring explicitly? Sort of like assigning to a temporary
> variable in an imperative language? It looks like a WITH clause lets me do
> this syntactically, but I'm not sure if it'll make a difference at runtime.
>
> —Jens
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
>
> ___
>  Gunter Hick
> Software Engineer
> Scientific Games International GmbH
> FN 157284 a, HG Wien
> Klitschgasse 2-4, A-1130 Vienna, Austria
> Tel: +43 1 80100 0
> E-Mail: h...@scigames.at
>
> This communication (including any attachments) is intended for the use of
> the intended recipient(s) only and may contain information that is
> confidential, privileged or legally protected. Any unauthorized use or
> dissemination of this communication is strictly prohibited. If you have
> received this communication in error, please immediately notify the sender
> by return e-mail message and delete all copies of the original
> communication. Thank you for your cooperation.
>
>
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] Common subexpression optimization of deterministic functions

2017-09-13 Thread Hick Gunter
Try fl_value(...) IN ()

-Ursprüngliche Nachricht-
Von: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im 
Auftrag von Jens Alfke
Gesendet: Dienstag, 12. September 2017 19:26
An: SQLite mailing list 
Betreff: [EXTERNAL] [sqlite] Common subexpression optimization of deterministic 
functions

SQLite 3.19 doesn’t seem to coalesce identical calls to a deterministic 
function. For example, in this query, where `fl_value` is a function I’ve 
registered as SQLITE_DETERMINISTIC:

SELECT key FROM kv_default
WHERE fl_value(body, 'contact.address.state') = 'CA'
   OR fl_value(body, 'contact.address.state') = 'WA'

fl_value gets called twice per row in the table, with the same inputs both 
times of course. As fl_value is not a cheap function — it’s similar to 
json_value — it would be a noticeable speedup if it were evaluated only once 
per row.

Is there a way I can restructure these (automatically generated) queries to do 
the refactoring explicitly? Sort of like assigning to a temporary variable in 
an imperative language? It looks like a WITH clause lets me do this 
syntactically, but I'm not sure if it'll make a difference at runtime.

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: h...@scigames.at

This communication (including any attachments) is intended for the use of the 
intended recipient(s) only and may contain information that is confidential, 
privileged or legally protected. Any unauthorized use or dissemination of this 
communication is strictly prohibited. If you have received this communication 
in error, please immediately notify the sender by return e-mail message and 
delete all copies of the original communication. Thank you for your cooperation.


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users