Re: [sqlite] optimizing out function calls
Nathan Kurz wrote: > I'm using a computationally expensive user defined function called > 'match()'. In case it makes a difference, match() is written in C, > and for testing, I'm loading it as a shared library into the sqlite3 > shell application. I want to return the value of match(), and also > order by it. So my query looks something like this: Nathan, I'm intrigued to know how you loaded the additional shared library with your sqlite3(1) shell... is it simply a case of linking another library to it post installation, and if so how do you register the user functions that library contains? Regards, Robin -- Robin Breathe, Computer Services, Oxford Brookes University, Oxford, UK [EMAIL PROTECTED] Tel: +44 1865 483685 Fax: +44 1865 483073 signature.asc Description: OpenPGP digital signature
Re: [sqlite] optimizing out function calls
Jay Sprenkle wrote: > > On 11/14/05, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: > > > Original Message > > > Subject: Re: [sqlite] optimizing out function calls > > > From: Jay Sprenkle <[EMAIL PROTECTED]> > > > Date: Mon, November 14, 2005 4:34 pm > > > To: sqlite-users@sqlite.org > > > > > > > - the random function in C has no arguments, it will produce > > > > a different result on every call (within the limits of the > > > > random number generator that is used). Of course from a > > > > mathematical point of view this is a monstrosity ;). > > A monstrosity? Isn't that a bit emotionally over the top? > I did add ";)" > A random number generator should generate random numbers and > they aren't supposed to be the same. It does generate the same > sequence of numbers each time given the same seed so it's testable. > Yes, but the implementation (not the definition) as a function a la C causes it to deviate from the mathematical concept of a function. > > > > But 'random' is not a mathematical function. In software we often use > > functions to get values from the outside world ( getch(),time() etc, > > etc). Random is a function which fetches the next number from a random > > number generator. The qualities of that random number depend on the > > underlying generator. When we pass an argument to random it is just to > > control the behaviour of the random number generator. > > > > We would not expect getch() and time() to return the same result each > > time they are called (except on friday afternoons). > > I think that's why DRH doesn't cache results for user defined functions. Again, the implementation of getch and time as _C functions_ is a pure implementation choice. Other languages do it in another way. The whole point I was trying to make is that the apparent implementation form of some functionality may be misleading as to its proper semantics. Regards, Arjen
Re: [sqlite] optimizing out function calls
On 11/14/05, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: > > Original Message > > Subject: Re: [sqlite] optimizing out function calls > > From: Jay Sprenkle <[EMAIL PROTECTED]> > > Date: Mon, November 14, 2005 4:34 pm > > To: sqlite-users@sqlite.org > > > > > - the random function in C has no arguments, it will produce > > > a different result on every call (within the limits of the > > > random number generator that is used). Of course from a > > > mathematical point of view this is a monstrosity ;). A monstrosity? Isn't that a bit emotionally over the top? A random number generator should generate random numbers and they aren't supposed to be the same. It does generate the same sequence of numbers each time given the same seed so it's testable. > > But 'random' is not a mathematical function. In software we often use > functions to get values from the outside world ( getch(),time() etc, > etc). Random is a function which fetches the next number from a random > number generator. The qualities of that random number depend on the > underlying generator. When we pass an argument to random it is just to > control the behaviour of the random number generator. > > We would not expect getch() and time() to return the same result each > time they are called (except on friday afternoons). I think that's why DRH doesn't cache results for user defined functions.
RE: [sqlite] optimizing out function calls
> Original Message > Subject: Re: [sqlite] optimizing out function calls > From: Jay Sprenkle <[EMAIL PROTECTED]> > Date: Mon, November 14, 2005 4:34 pm > To: sqlite-users@sqlite.org > > > - the random function in C has no arguments, it will produce > > a different result on every call (within the limits of the > > random number generator that is used). Of course from a > > mathematical point of view this is a monstrosity ;). But 'random' is not a mathematical function. In software we often use functions to get values from the outside world ( getch(),time() etc, etc). Random is a function which fetches the next number from a random number generator. The qualities of that random number depend on the underlying generator. When we pass an argument to random it is just to control the behaviour of the random number generator. We would not expect getch() and time() to return the same result each time they are called (except on friday afternoons).
Re: [sqlite] optimizing out function calls
> - the random function in C has no arguments, it will produce > a different result on every call (within the limits of the > random number generator that is used). Of course from a > mathematical point of view this is a monstrosity ;). > Functions should return the same values given the same > arguments. rand() should generate the same the same sequence of results given the same initial seed value. The srand() function sets the seed
Re: [sqlite] optimizing out function calls
On 11/13/05, Joe Wilson <[EMAIL PROTECTED]> wrote: > MS Access appears to assume all functions called with the same > arguments are constant and returns the same result for every row: > > SQLite is apparently the other extreme - it assumes that each call a > function can potentially yield a different result each time. If it's a user defined function I would think you'd have to assume the user might not have a function that only depends on the arguments for it's calculations.
RE: [sqlite] optimizing out function calls
Have a look at archive here http://thread.gmane.org/gmane.comp.db.sqlite.general/13781 At the time I was using the random number generator function and was confused about its usage. It may help some of you. Regards Nick This email and any attachments are confidential to the intended recipient and may also be privileged. If you are not the intended recipient please delete it from your system and notify the sender. You should not copy it or use it for any purpose nor disclose or distribute its contents to any other person.
Re: [sqlite] optimizing out function calls
Joe Wilson wrote: > > What do other databases return for the types of SQL queries below? > > SELECT random(1) AS func FROM test ORDER BY func; > SELECT random() AS func FROM test WHERE func > 10; > > MS Access appears to assume all functions called with the same > arguments are constant and returns the same result for every row: > > select rnd() from test; > > 0.5795186162 > 0.5795186162 > 0.5795186162 > 0.5795186162 > 0.5795186162 > > SQLite is apparently the other extreme - it assumes that each call a > function can potentially yield a different result each time. > If I may be so blunt and refer to Fortran and C for a comparison: - the random function in C has no arguments, it will produce a different result on every call (within the limits of the random number generator that is used). Of course from a mathematical point of view this is a monstrosity ;). Functions should return the same values given the same arguments. - functions without arguments in Fortran are presumed to return the same value. The compiler may decide to cache that value and _always_ return the same value. This is the reason in Fortran (90, ...) random numbers are generated via a subroutine, instead of a function. Apart from this distinction, you can also encounter the situation where one platform (compiler, OS, hardware) always initialises the random number generator with the same seed and another will use a seed derived from, say, the system clock. I know too little of SQL to say that the standard defines the random function in this way or that, but these are the variations I have seen and my guess is the standard does not prescribe anything about it. Regards, Arjen
Re: [sqlite] optimizing out function calls
What do other databases return for the types of SQL queries below? SELECT random(1) AS func FROM test ORDER BY func; SELECT random() AS func FROM test WHERE func > 10; MS Access appears to assume all functions called with the same arguments are constant and returns the same result for every row: select rnd() from test; 0.5795186162 0.5795186162 0.5795186162 0.5795186162 0.5795186162 SQLite is apparently the other extreme - it assumes that each call a function can potentially yield a different result each time. __ Yahoo! FareChase: Search multiple travel sites in one click. http://farechase.yahoo.com
Re: [sqlite] optimizing out function calls
[EMAIL PROTECTED] wrote: Nathan Kurz <[EMAIL PROTECTED]> wrote: SELECT uid, match("complex", "function", vector) AS match FROM vectors ORDER BY match DESC LIMIT 20; SELECT uid, mx FROM (SELECT uid, match(...) AS mx FROM vectors LIMIT -1) ORDER BY mx DESC LIMIT 20; The LIMIT -1 on the subquery is to fake out the optimizer and prevent it from folding the subquery back into the main query, resulting in the same statement you started with. A "LIMIT -1" is effectively a no-op. It does no limiting. But subqueries that contain a limit will not be folded into outer queries that also contain a limit. That strikes me as a hack, and also a matter of allowing implementation to leak into interface. The current implementation seems to assume that all functions are idempotent; once you allow user-defined functions, that's no longer a safe assumption and it can lead to incorrect, as well as inefficient, behavior if a user-defined function has side effects.
Re: [sqlite] optimizing out function calls
On 11/13/05, Nathan Kurz <[EMAIL PROTECTED]> wrote: > > On Sun, Nov 13, 2005 at 07:30:58AM -0500, [EMAIL PROTECTED] wrote: > Or even better, is there any way to write a user defined function that > could do the ordering and limiting internally to reduce the data set > early? I suppose I could do it as a aggregate function that returns a > text string and then reparse that into a second query using IN, but it > would wonderful if there was a way to 'explode' the return value of a > function into multiple rows. Something like: If it calls the same user defined function twice sequentially perhaps you can cache the function result in ram, if you get the same parameters then return the same result the last call produced.
Re: [sqlite] optimizing out function calls
On Sun, Nov 13, 2005 at 07:30:58AM -0500, [EMAIL PROTECTED] wrote: > Nathan Kurz <[EMAIL PROTECTED]> wrote: > > > > SELECT uid, match("complex", "function", vector) AS match FROM vectors > > ORDER BY match DESC LIMIT 20; > > SELECT uid, mx FROM > (SELECT uid, match(...) AS mx FROM vectors LIMIT -1) > ORDER BY mx DESC LIMIT 20; > > The LIMIT -1 on the subquery is to fake out the optimizer and prevent > it from folding the subquery back into the main query, resulting in the > same statement you started with. A "LIMIT -1" is effectively a no-op. > It does no limiting. But subqueries that contain a limit will not be > folded into outer queries that also contain a limit. Thanks! I would not have thought to try that on my own. It does indeed prevent the double function call, but unfortunately makes the rest of the query more expensive by creating an extra temp table. >From what I can tell about the VDBE execution in both cases: SELECT function() AS func FROM table ORDER BY func DESC LIMIT 10; -> cycle through every row calling function() twice per row -> put results directly into an sorted temp table SELECT func FROM (SELECT function() AS func FROM table LIMIT -1) ORDER BY func DESC LIMIT 10; -> only calls function() once per row -> puts results into temp table 1 (unsorted) -> inserts all rows from temp table 1 into sorted temp table 2 Is there any way to combine the best of both of these worlds: only calling function() once but only creating one temp table? Or is it the case (as it seems) that SQLite only copies the expression for the function() when parsing an AS and never goes back to determine that the expression has already been solved for that row? Or even better, is there any way to write a user defined function that could do the ordering and limiting internally to reduce the data set early? I suppose I could do it as a aggregate function that returns a text string and then reparse that into a second query using IN, but it would wonderful if there was a way to 'explode' the return value of a function into multiple rows. Something like: SELECT function_returning_multiple_rows(id, vector, limit) FROM vectors; > Hint: The output of EXPLAIN is much easier to read if you do ".explain" > first to set up the output formatting. Thanks for the hint! It does indeed. --nate ps. A couple Pathological cases that superficially look like bugs because of this double execution of the function call. Are these bugs, features, or just the way things currently are? sqlite> SELECT random(1) AS func FROM test ORDER BY func DESC LIMIT 5; func -141 7787 -823 1453 -654 sqlite> SELECT random() AS func FROM test WHERE func > 10 LIMIT 5; func -853 5973 -232 -217 9849 pps. Despite any apparent complaints, I'm really enjoying SQLite.
Re: [sqlite] optimizing out function calls
Nathan Kurz <[EMAIL PROTECTED]> wrote: > > SELECT uid, match("complex", "function", vector) AS match FROM vectors > ORDER BY match DESC LIMIT 20; SELECT uid, mx FROM (SELECT uid, match(...) AS mx FROM vectors LIMIT -1) ORDER BY mx DESC LIMIT 20; The LIMIT -1 on the subquery is to fake out the optimizer and prevent it from folding the subquery back into the main query, resulting in the same statement you started with. A "LIMIT -1" is effectively a no-op. It does no limiting. But subqueries that contain a limit will not be folded into outer queries that also contain a limit. > > And in case it bolsters my case, here's the EXPLAIN output I see: > > sqlite> EXPLAIN SELECT uid, match("complex", "function", vector) AS match >...> FROM vectors ORDER BY match DESC LIMIT 20; Hint: The output of EXPLAIN is much easier to read if you do ".explain" first to set up the output formatting. -- D. Richard Hipp <[EMAIL PROTECTED]>
Re: [sqlite] optimizing out function calls
According to my understanding of standard SQL, you should be able to say: SELECT arbitrary_expression() AS bar FROM foo ORDER BY bar; ... and the expression is only evaluated once per row, not twice. Your actual example seems confusing, since you appear to alias your 'vectors' table to 'match' in the from clause, which is also the name of your function, and the name of what you sort by. Perhaps having different names for each thing that is actually different will make your question easier to answer. For example: SELECT uid, match_func("complex", "function", vector) AS match_res FROM vectors AS match_tbl ORDER BY match_res DESC LIMIT 20; -- Darren Duncan At 10:01 PM -0700 11/12/05, Nathan Kurz wrote: Hello -- I'm trying to figure out how to optimize a query a bit, and think I've hit a case that could easily be optimized by sqlite but isn't. I'm wondering if it would be an easy optimization to add, or whether there is some way I can 'hint' the optization into being. I'm using a computationally expensive user defined function called 'match()'. In case it makes a difference, match() is written in C, and for testing, I'm loading it as a shared library into the sqlite3 shell application. I want to return the value of match(), and also order by it. So my query looks something like this: SELECT uid, match("complex", "function", vector) FROM vectors AS match ORDER BY match DESC LIMIT 20; I had expected that match() would only be called once per row, but it turns out to be called twice: once for the select, and once for the ordering. I've confirmed this both by putting in a counter, and by using 'EXPLAIN'. Is there any way to tell SQLite to reuse the value of the first call rather than calling the function again? I'm a comfortable C programmer, but only superficially familiar with the SQLite code so far. If I'm not missing something obvious, hints on where to look at writing a patch for this would be appreciated. Thanks! Nathan Kurz [EMAIL PROTECTED]
Re: [sqlite] optimizing out function calls
On Sat, Nov 12, 2005 at 10:01:29PM -0700, Nathan Kurz wrote: > SELECT uid, match("complex", "function", vector) FROM vectors AS match > ORDER BY match DESC LIMIT 20; Please pardon the silly typo. I do have the AS in the right spot. SELECT uid, match("complex", "function", vector) AS match FROM vectors ORDER BY match DESC LIMIT 20; And in case it bolsters my case, here's the EXPLAIN output I see: sqlite> EXPLAIN SELECT uid, match("complex", "function", vector) AS match ...> FROM vectors ORDER BY match DESC LIMIT 20; 0|OpenVirtual|1|3|keyinfo(1,-BINARY) 1|Integer|20|0| 2|MustBeInt|0|0| 3|Negative|0|0| 4|MemStore|0|1| 5|Goto|0|37| 6|Integer|0|0| 7|OpenRead|0|14984| 8|SetNumColumns|0|2| 9|Rewind|0|25| 10|Column|0|0| 11|String8|0|0|complex 12|String8|0|0|function 13|Column|0|1| 14|Function|3|3|match(3) 15|MakeRecord|2|0| 16|String8|0|0|complex 17|String8|0|0|function 18|Column|0|1| 19|Function|3|3|match(3) 20|Sequence|1|0| 21|Pull|2|0| 22|MakeRecord|3|0| 23|IdxInsert|1|0| 24|Next|0|10| 25|Close|0|0| 26|Sort|1|36| 27|MemIncr|0|36| 28|Column|1|2| 29|Integer|2|0| 30|Pull|1|0| 31|Column|-1|0| 32|Column|-2|1| 33|Callback|2|0| 34|Pop|2|0| 35|Next|1|27| 36|Halt|0|0| 37|Transaction|0|0| 38|VerifyCookie|0|116| 39|Goto|0|6| 40|Noop|0|0| --nate
[sqlite] optimizing out function calls
Hello -- I'm trying to figure out how to optimize a query a bit, and think I've hit a case that could easily be optimized by sqlite but isn't. I'm wondering if it would be an easy optimization to add, or whether there is some way I can 'hint' the optization into being. I'm using a computationally expensive user defined function called 'match()'. In case it makes a difference, match() is written in C, and for testing, I'm loading it as a shared library into the sqlite3 shell application. I want to return the value of match(), and also order by it. So my query looks something like this: SELECT uid, match("complex", "function", vector) FROM vectors AS match ORDER BY match DESC LIMIT 20; I had expected that match() would only be called once per row, but it turns out to be called twice: once for the select, and once for the ordering. I've confirmed this both by putting in a counter, and by using 'EXPLAIN'. Is there any way to tell SQLite to reuse the value of the first call rather than calling the function again? I'm a comfortable C programmer, but only superficially familiar with the SQLite code so far. If I'm not missing something obvious, hints on where to look at writing a patch for this would be appreciated. Thanks! Nathan Kurz [EMAIL PROTECTED]