Re: [sqlite] optimizing out function calls

2005-11-16 Thread Robin Breathe
Nathan Kurz wrote:
> I'm using a computationally expensive user defined function called
> 'match()'.  In case it makes a difference, match() is written in C,
> and for testing, I'm loading it as a shared library into the sqlite3
> shell application.  I want to return the value of match(), and also
> order by it.  So my query looks something like this:

Nathan,

I'm intrigued to know how you loaded the additional shared library with
your sqlite3(1) shell... is it simply a case of linking another library
to it post installation, and if so how do you register the user
functions that library contains?

Regards,
Robin
-- 
Robin Breathe, Computer Services, Oxford Brookes University, Oxford, UK
[EMAIL PROTECTED]   Tel: +44 1865 483685  Fax: +44 1865 483073



signature.asc
Description: OpenPGP digital signature


Re: [sqlite] optimizing out function calls

2005-11-14 Thread Arjen Markus
Jay Sprenkle wrote:
> 
> On 11/14/05, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> > >  Original Message 
> > > Subject: Re: [sqlite] optimizing out function calls
> > > From: Jay Sprenkle <[EMAIL PROTECTED]>
> > > Date: Mon, November 14, 2005 4:34 pm
> > > To: sqlite-users@sqlite.org
> > >
> > > > - the random function in C has no arguments, it will produce
> > > >   a different result on every call (within the limits of the
> > > >   random number generator that is used). Of course from a
> > > >   mathematical point of view this is a monstrosity ;).
> 
> A monstrosity? Isn't that a bit emotionally over the top?
> 

I did add ";)"

> A random number generator should generate random numbers and
> they aren't supposed to be the same. It does generate the same
> sequence of numbers each time given the same seed so it's testable.
> 

Yes, but the implementation (not the definition) as a function a la C
causes it to deviate from the mathematical concept of a function.

> >
> > But 'random' is not a mathematical function. In software we often use
> > functions to get values from the outside world ( getch(),time() etc,
> > etc). Random is a function which fetches the next number from a random
> > number generator. The qualities of that random number depend on the
> > underlying generator. When we pass an argument to random it is just to
> > control the behaviour of the random number generator.
> >
> > We would not expect getch() and time() to return the same result each
> > time they are called (except on friday afternoons).
> 
> I think that's why DRH doesn't cache results for user defined functions.

Again, the implementation of getch and time as _C functions_ is a pure
implementation choice. Other languages do it in another way. The whole
point I was trying to make is that the apparent implementation form of
some 
functionality may be misleading as to its proper semantics.

Regards,

Arjen



Re: [sqlite] optimizing out function calls

2005-11-14 Thread Jay Sprenkle
On 11/14/05, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> >  Original Message 
> > Subject: Re: [sqlite] optimizing out function calls
> > From: Jay Sprenkle <[EMAIL PROTECTED]>
> > Date: Mon, November 14, 2005 4:34 pm
> > To: sqlite-users@sqlite.org
> >
> > > - the random function in C has no arguments, it will produce
> > >   a different result on every call (within the limits of the
> > >   random number generator that is used). Of course from a
> > >   mathematical point of view this is a monstrosity ;).

A monstrosity? Isn't that a bit emotionally over the top?

A random number generator should generate random numbers and
they aren't supposed to be the same. It does generate the same
sequence of numbers each time given the same seed so it's testable.


>
> But 'random' is not a mathematical function. In software we often use
> functions to get values from the outside world ( getch(),time() etc,
> etc). Random is a function which fetches the next number from a random
> number generator. The qualities of that random number depend on the
> underlying generator. When we pass an argument to random it is just to
> control the behaviour of the random number generator.
>
> We would not expect getch() and time() to return the same result each
> time they are called (except on friday afternoons).

I think that's why DRH doesn't cache results for user defined functions.


RE: [sqlite] optimizing out function calls

2005-11-14 Thread roger
>  Original Message 
> Subject: Re: [sqlite] optimizing out function calls
> From: Jay Sprenkle <[EMAIL PROTECTED]>
> Date: Mon, November 14, 2005 4:34 pm
> To: sqlite-users@sqlite.org
> 
> > - the random function in C has no arguments, it will produce
> >   a different result on every call (within the limits of the
> >   random number generator that is used). Of course from a
> >   mathematical point of view this is a monstrosity ;).

But 'random' is not a mathematical function. In software we often use
functions to get values from the outside world ( getch(),time() etc,
etc). Random is a function which fetches the next number from a random
number generator. The qualities of that random number depend on the
underlying generator. When we pass an argument to random it is just to
control the behaviour of the random number generator.

We would not expect getch() and time() to return the same result each
time they are called (except on friday afternoons).



Re: [sqlite] optimizing out function calls

2005-11-14 Thread Jay Sprenkle
> - the random function in C has no arguments, it will produce
>   a different result on every call (within the limits of the
>   random number generator that is used). Of course from a
>   mathematical point of view this is a monstrosity ;).
>   Functions should return the same values given the same
>   arguments.

  rand() should generate the same the same sequence of results
  given the same initial seed value.

  The srand() function sets the seed


Re: [sqlite] optimizing out function calls

2005-11-14 Thread Jay Sprenkle
On 11/13/05, Joe Wilson <[EMAIL PROTECTED]> wrote:

> MS Access appears to assume all functions called with the same
> arguments are constant and returns the same result for every row:

>
> SQLite is apparently the other extreme - it assumes that each call a
> function can potentially yield a different result each time.

If it's a user defined function I would think you'd have to assume
the user might not have a function that only depends on the arguments
for it's calculations.


RE: [sqlite] optimizing out function calls

2005-11-14 Thread Brandon, Nicholas


Have a look at archive here
http://thread.gmane.org/gmane.comp.db.sqlite.general/13781

At the time I was using the random number generator function and was
confused about its usage.

It may help some of you.

Regards
Nick



This email and any attachments are confidential to the intended
recipient and may also be privileged. If you are not the intended
recipient please delete it from your system and notify the sender.
You should not copy it or use it for any purpose nor disclose or
distribute its contents to any other person.



Re: [sqlite] optimizing out function calls

2005-11-14 Thread Arjen Markus
Joe Wilson wrote:
> 
> What do other databases return for the types of SQL queries below?
> 
>   SELECT random(1) AS func FROM test ORDER BY func;
>   SELECT random() AS func FROM test WHERE func > 10;
> 
> MS Access appears to assume all functions called with the same
> arguments are constant and returns the same result for every row:
> 
>   select rnd() from test;
> 
>   0.5795186162
>   0.5795186162
>   0.5795186162
>   0.5795186162
>   0.5795186162
> 
> SQLite is apparently the other extreme - it assumes that each call a
> function can potentially yield a different result each time.
> 

If I may be so blunt and refer to Fortran and C for a comparison:

- the random function in C has no arguments, it will produce 
  a different result on every call (within the limits of the 
  random number generator that is used). Of course from a   
  mathematical point of view this is a monstrosity ;).
  Functions should return the same values given the same
  arguments.

- functions without arguments in Fortran are presumed 
  to return the same value. The compiler may decide to
  cache that value and _always_ return the same value.
  This is the reason in Fortran (90, ...) random
  numbers are generated via a subroutine, instead of
  a function.

Apart from this distinction, you can also encounter
the situation where one platform (compiler, OS, hardware)
always initialises the random number generator with
the same seed and another will use a seed derived from,
say, the system clock.

I know too little of SQL to say that the standard
defines the random function in this way or that, 
but these are the variations I have seen and my
guess is the standard does not prescribe anything
about it.

Regards,

Arjen



Re: [sqlite] optimizing out function calls

2005-11-13 Thread Joe Wilson
What do other databases return for the types of SQL queries below?

  SELECT random(1) AS func FROM test ORDER BY func;
  SELECT random() AS func FROM test WHERE func > 10;

MS Access appears to assume all functions called with the same 
arguments are constant and returns the same result for every row:

  select rnd() from test;

  0.5795186162
  0.5795186162
  0.5795186162
  0.5795186162
  0.5795186162

SQLite is apparently the other extreme - it assumes that each call a
function can potentially yield a different result each time.




__ 
Yahoo! FareChase: Search multiple travel sites in one click.
http://farechase.yahoo.com


Re: [sqlite] optimizing out function calls

2005-11-13 Thread Eric Bohlman

[EMAIL PROTECTED] wrote:

Nathan Kurz <[EMAIL PROTECTED]> wrote:

SELECT uid, match("complex", "function", vector) AS match FROM vectors 
 ORDER BY match DESC LIMIT 20;



SELECT uid, mx FROM
  (SELECT uid, match(...) AS mx FROM vectors LIMIT -1)
ORDER BY mx DESC LIMIT 20;

The LIMIT -1 on the subquery is to fake out the optimizer and prevent
it from folding the subquery back into the main query, resulting in the
same statement you started with.  A "LIMIT -1" is effectively a no-op.
It does no limiting.  But subqueries that contain a limit will not be
folded into outer queries that also contain a limit.


That strikes me as a hack, and also a matter of allowing implementation 
to leak into interface.  The current implementation seems to assume that 
all functions are idempotent; once you allow user-defined functions, 
that's no longer a safe assumption and it can lead to incorrect, as well 
as inefficient, behavior if a user-defined function has side effects.


Re: [sqlite] optimizing out function calls

2005-11-13 Thread Jay Sprenkle
On 11/13/05, Nathan Kurz <[EMAIL PROTECTED]> wrote:
>
> On Sun, Nov 13, 2005 at 07:30:58AM -0500, [EMAIL PROTECTED] wrote:
> Or even better, is there any way to write a user defined function that
> could do the ordering and limiting internally to reduce the data set
> early? I suppose I could do it as a aggregate function that returns a
> text string and then reparse that into a second query using IN, but it
> would wonderful if there was a way to 'explode' the return value of a
> function into multiple rows. Something like:



If it calls the same user defined function twice sequentially perhaps
you can cache the function result in ram, if you get the same parameters
then return the same result the last call produced.


Re: [sqlite] optimizing out function calls

2005-11-13 Thread Nathan Kurz
On Sun, Nov 13, 2005 at 07:30:58AM -0500, [EMAIL PROTECTED] wrote:
> Nathan Kurz <[EMAIL PROTECTED]> wrote:
> > 
> > SELECT uid, match("complex", "function", vector) AS match FROM vectors 
> >   ORDER BY match DESC LIMIT 20;
> 
> SELECT uid, mx FROM
>   (SELECT uid, match(...) AS mx FROM vectors LIMIT -1)
> ORDER BY mx DESC LIMIT 20;
> 
> The LIMIT -1 on the subquery is to fake out the optimizer and prevent
> it from folding the subquery back into the main query, resulting in the
> same statement you started with.  A "LIMIT -1" is effectively a no-op.
> It does no limiting.  But subqueries that contain a limit will not be
> folded into outer queries that also contain a limit.

Thanks!  I would not have thought to try that on my own.  It does
indeed prevent the double function call, but unfortunately makes the
rest of the query more expensive by creating an extra temp table.
>From what I can tell about the VDBE execution in both cases:

SELECT function() AS func FROM table ORDER BY func DESC LIMIT 10;
   -> cycle through every row calling function() twice per row
   -> put results directly into an sorted temp table

SELECT func FROM (SELECT function() AS func FROM table LIMIT -1)
  ORDER BY func DESC LIMIT 10;
   -> only calls function() once per row
   -> puts results into temp table 1 (unsorted)
   -> inserts all rows from temp table 1 into sorted temp table 2

Is there any way to combine the best of both of these worlds: only
calling function() once but only creating one temp table?  Or is it
the case (as it seems) that SQLite only copies the expression for the
function() when parsing an AS and never goes back to determine that
the expression has already been solved for that row?

Or even better, is there any way to write a user defined function that
could do the ordering and limiting internally to reduce the data set
early?  I suppose I could do it as a aggregate function that returns a
text string and then reparse that into a second query using IN, but it
would wonderful if there was a way to 'explode' the return value of a
function into multiple rows.  Something like:

SELECT function_returning_multiple_rows(id, vector, limit) FROM vectors;

> Hint:  The output of EXPLAIN is much easier to read if you do ".explain"
> first to set up the output formatting.

Thanks for the hint!  It does indeed.

--nate

ps.  A couple Pathological cases that superficially look like bugs
 because of this double execution of the function call.  Are
 these bugs, features, or just the way things currently are?

sqlite> SELECT random(1) AS func FROM test ORDER BY func DESC LIMIT 5;
func

-141
7787
-823
1453
-654

sqlite> SELECT random() AS func FROM test WHERE func > 10 LIMIT 5;
func

-853
5973
-232
-217
9849

pps.  Despite any apparent complaints, I'm really enjoying SQLite.



Re: [sqlite] optimizing out function calls

2005-11-13 Thread drh
Nathan Kurz <[EMAIL PROTECTED]> wrote:
> 
> SELECT uid, match("complex", "function", vector) AS match FROM vectors 
>   ORDER BY match DESC LIMIT 20;

SELECT uid, mx FROM
  (SELECT uid, match(...) AS mx FROM vectors LIMIT -1)
ORDER BY mx DESC LIMIT 20;

The LIMIT -1 on the subquery is to fake out the optimizer and prevent
it from folding the subquery back into the main query, resulting in the
same statement you started with.  A "LIMIT -1" is effectively a no-op.
It does no limiting.  But subqueries that contain a limit will not be
folded into outer queries that also contain a limit.

> 
> And in case it bolsters my case, here's the EXPLAIN output I see:
> 
> sqlite> EXPLAIN SELECT uid, match("complex", "function", vector) AS match
>...> FROM vectors ORDER BY match DESC LIMIT 20;

Hint:  The output of EXPLAIN is much easier to read if you do ".explain"
first to set up the output formatting.

--
D. Richard Hipp <[EMAIL PROTECTED]>



Re: [sqlite] optimizing out function calls

2005-11-12 Thread Darren Duncan

According to my understanding of standard SQL, you should be able to say:

 SELECT arbitrary_expression() AS bar FROM foo ORDER BY bar;

... and the expression is only evaluated once per row, not twice.

Your actual example seems confusing, since you appear to alias your 
'vectors' table to 'match' in the from clause, which is also the name 
of your function, and the name of what you sort by.  Perhaps having 
different names for each thing that is actually different will make 
your question easier to answer.


For example:

 SELECT uid, match_func("complex", "function", vector) AS match_res
 FROM vectors AS match_tbl
 ORDER BY match_res DESC
 LIMIT 20;

-- Darren Duncan

At 10:01 PM -0700 11/12/05, Nathan Kurz wrote:

Hello --

I'm trying to figure out how to optimize a query a bit, and think I've
hit a case that could easily be optimized by sqlite but isn't.  I'm
wondering if it would be an easy optimization to add, or whether there
is some way I can 'hint' the optization into being.

I'm using a computationally expensive user defined function called
'match()'.  In case it makes a difference, match() is written in C,
and for testing, I'm loading it as a shared library into the sqlite3
shell application.  I want to return the value of match(), and also
order by it.  So my query looks something like this:

SELECT uid, match("complex", "function", vector) FROM vectors AS match
 ORDER BY match DESC LIMIT 20;

I had expected that match() would only be called once per row, but it
turns out to be called twice: once for the select, and once for the
ordering.  I've confirmed this both by putting in a counter, and by
using 'EXPLAIN'.  Is there any way to tell SQLite to reuse the value
of the first call rather than calling the function again?

I'm a comfortable C programmer, but only superficially familiar with
the SQLite code so far.  If I'm not missing something obvious, hints
on where to look at writing a patch for this would be appreciated.

Thanks!

Nathan Kurz
[EMAIL PROTECTED]




Re: [sqlite] optimizing out function calls

2005-11-12 Thread Nathan Kurz
On Sat, Nov 12, 2005 at 10:01:29PM -0700, Nathan Kurz wrote:
> SELECT uid, match("complex", "function", vector) FROM vectors AS match
>  ORDER BY match DESC LIMIT 20;

Please pardon the silly typo.  I do have the AS in the right spot.

SELECT uid, match("complex", "function", vector) AS match FROM vectors 
  ORDER BY match DESC LIMIT 20;

And in case it bolsters my case, here's the EXPLAIN output I see:

sqlite> EXPLAIN SELECT uid, match("complex", "function", vector) AS match  
   ...> FROM vectors ORDER BY match DESC LIMIT 20;
0|OpenVirtual|1|3|keyinfo(1,-BINARY)
1|Integer|20|0|
2|MustBeInt|0|0|
3|Negative|0|0|
4|MemStore|0|1|
5|Goto|0|37|
6|Integer|0|0|
7|OpenRead|0|14984|
8|SetNumColumns|0|2|
9|Rewind|0|25|
10|Column|0|0|
11|String8|0|0|complex
12|String8|0|0|function
13|Column|0|1|
14|Function|3|3|match(3)
15|MakeRecord|2|0|
16|String8|0|0|complex
17|String8|0|0|function
18|Column|0|1|
19|Function|3|3|match(3)
20|Sequence|1|0|
21|Pull|2|0|
22|MakeRecord|3|0|
23|IdxInsert|1|0|
24|Next|0|10|
25|Close|0|0|
26|Sort|1|36|
27|MemIncr|0|36|
28|Column|1|2|
29|Integer|2|0|
30|Pull|1|0|
31|Column|-1|0|
32|Column|-2|1|
33|Callback|2|0|
34|Pop|2|0|
35|Next|1|27|
36|Halt|0|0|
37|Transaction|0|0|
38|VerifyCookie|0|116|
39|Goto|0|6|
40|Noop|0|0|

--nate


[sqlite] optimizing out function calls

2005-11-12 Thread Nathan Kurz
Hello --

I'm trying to figure out how to optimize a query a bit, and think I've
hit a case that could easily be optimized by sqlite but isn't.  I'm
wondering if it would be an easy optimization to add, or whether there
is some way I can 'hint' the optization into being.

I'm using a computationally expensive user defined function called
'match()'.  In case it makes a difference, match() is written in C,
and for testing, I'm loading it as a shared library into the sqlite3
shell application.  I want to return the value of match(), and also
order by it.  So my query looks something like this:

SELECT uid, match("complex", "function", vector) FROM vectors AS match
 ORDER BY match DESC LIMIT 20;

I had expected that match() would only be called once per row, but it
turns out to be called twice: once for the select, and once for the
ordering.  I've confirmed this both by putting in a counter, and by
using 'EXPLAIN'.  Is there any way to tell SQLite to reuse the value
of the first call rather than calling the function again?

I'm a comfortable C programmer, but only superficially familiar with
the SQLite code so far.  If I'm not missing something obvious, hints
on where to look at writing a patch for this would be appreciated.

Thanks!

Nathan Kurz
[EMAIL PROTECTED]