Re: [HACKERS] Function result cacheing - any comments?

2002-08-19 Thread Tom Lane

Philip Warner <[EMAIL PROTECTED]> writes:
> At 15:41 19/08/2002 +0800, Christopher Kings-Lynne wrote:
>> So it seems Philip already has what he wants?

> I really hope so, but my understanding is that this information is used 
> during optimization, not execution; I want it to be used in execution.

Philip is correct that there is no cacheing of the sort he wants.

The existing volatility classifications are somewhat relevant in the
sense that an IMMUTABLE or STABLE function would be legal to cache the
way he wants ... but the system doesn't do so, it only uses these
classifications to drive before-the-query constant folding and decisions
about whether indexscans are safe.

I would resist any attempt to install cacheing by default for immutable
or stable functions.  Philip was proposing that function authors would
have to explicitly ask for cacheing (ie, add another function property)
and that seems a necessary component of an acceptable solution IMHO.

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])



Re: [HACKERS] Function result cacheing - any comments?

2002-08-19 Thread Tom Lane

Philip Warner <[EMAIL PROTECTED]> writes:
> My theory is that if such a piece of code gets a performance gain, then the 
> code is probably worth including, assuming that the function manager does 
> not need to be butchered to achieve the desired goal. Does that sound 
> reasonable?

Some real results would certainly bolster your case.

> So the obvious question is - in the opinion of people who know the code, 
> can a function-result-cache be implemented with a lifetime of a single 
> statement, without butchering the function manager?

I'd suggest trying to make it a function call handler.  Look at the way
Peter did "SECURITY DEFINER" functions for inspiration.

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] Function result cacheing - any comments?

2002-08-19 Thread Philip Warner

At 15:41 19/08/2002 +0800, Christopher Kings-Lynne wrote:
>So it seems Philip already has what he wants?

I really hope so, but my understanding is that this information is used 
during optimization, not execution; I want it to be used in execution.





Philip Warner| __---_
Albatross Consulting Pty. Ltd.   |/   -  \
(A.B.N. 75 008 659 498)  |  /(@)   __---_
Tel: (+61) 0500 83 82 81 | _  \
Fax: (+61) 0500 83 82 82 | ___ |
Http://www.rhyme.com.au  |/   \|
  |----
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] Function result cacheing - any comments?

2002-08-19 Thread Christopher Kings-Lynne

> >What Philip seems to be asking for is a mechanism where by if a function
> >is marked as being mathematically deterministic (given a
> particular set of
> >parameters the same result is always returned -- eg: cos(), sin(),
> >etc) then the result is cached and next time the function is called with
> >the same argument(s) the result is retrieved from the cache
> instead of the
> >function being run again.
>
> Exactly. But obviously not limited to simple mathematical functions.

>From 7.3 docs:

http://candle.pha.pa.us/main/writings/pgsql/sgml/sql-createfunction.html

CREATE [ OR REPLACE ] FUNCTION name ( [ argtype [, ...] ] )
RETURNS rettype
  { LANGUAGE langname
| IMMUTABLE | STABLE | VOLATILE
| CALLED ON NULL INPUT | RETURNS NULL ON NULL INPUT | STRICT
| [EXTERNAL] SECURITY INVOKER | [EXTERNAL] SECURITY DEFINER
| AS 'definition'
| AS 'obj_file', 'link_symbol'
  } ...
[ WITH ( attribute [, ...] ) ]

And:

IMMUTABLE
STABLE
VOLATILE
These attributes inform the system whether it is safe to replace multiple
evaluations of the function with a single evaluation, for run-time
optimization. At most one choice should be specified. If none of these
appear, VOLATILE is the default assumption.

IMMUTABLE indicates that the function always returns the same result when
given the same argument values; that is, it does not do database lookups or
otherwise use information not directly present in its parameter list. If
this option is given, any call of the function with all-constant arguments
can be immediately replaced with the function value.

STABLE indicates that within a single table scan the function will
consistently return the same result for the same argument values, but that
its result could change across SQL statements. This is the appropriate
selection for functions whose results depend on database lookups, parameter
variables (such as the current time zone), etc. Also note that the
CURRENT_TIMESTAMP family of functions qualify as stable, since their values
do not change within a transaction.

VOLATILE indicates that the function value can change even within a single
table scan, so no optimizations can be made. Relatively few database
functions are volatile in this sense; some examples are random(), currval(),
timeofday(). Note that any function that has side-effects must be classified
volatile, even if its result is quite predictable, to prevent calls from
being optimized away; an example is setval().

So it seems Philip already has what he wants?

Chris


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] Function result cacheing - any comments?

2002-08-19 Thread Philip Warner

At 17:03 19/08/2002 +1000, Gavin Sherry wrote:

>What Philip seems to be asking for is a mechanism where by if a function
>is marked as being mathematically deterministic (given a particular set of
>parameters the same result is always returned -- eg: cos(), sin(),
>etc) then the result is cached and next time the function is called with
>the same argument(s) the result is retrieved from the cache instead of the
>function being run again.

Exactly. But obviously not limited to simple mathematical functions.





Philip Warner| __---_
Albatross Consulting Pty. Ltd.   |/   -  \
(A.B.N. 75 008 659 498)  |  /(@)   __---_
Tel: (+61) 0500 83 82 81 | _  \
Fax: (+61) 0500 83 82 82 | ___ |
Http://www.rhyme.com.au  |/   \|
  |----
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] Function result cacheing - any comments?

2002-08-18 Thread Christopher Kings-Lynne

> What Philip seems to be asking for is a mechanism where by if a function
> is marked as being mathematically deterministic (given a particular set of
> parameters the same result is always returned -- eg: cos(), sin(),
> etc) then the result is cached and next time the function is called with
> the same argument(s) the result is retrieved from the cache instead of the
> function being run again.

I was under the impression that the sin, cos, tan and like functions are
marked non-volatile in the system catalogs and so are evaluated once per
transaction only.

Chris


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] Function result cacheing - any comments?

2002-08-18 Thread Gavin Sherry

On Sun, 18 Aug 2002, Joe Conway wrote:

> Philip Warner wrote:
> > So the obvious question is - in the opinion of people who know the code, 
> > can a function-result-cache be implemented with a lifetime of a single 
> > statement, without butchering the function manager?
> > 
> 
> I don't know if I fully understand what you're proposing, but if I 

Hi Joe,

What Philip seems to be asking for is a mechanism where by if a function
is marked as being mathematically deterministic (given a particular set of
parameters the same result is always returned -- eg: cos(), sin(),
etc) then the result is cached and next time the function is called with
the same argument(s) the result is retrieved from the cache instead of the
function being run again.

If I have got this correct, there is merit in this request. It is a
feature of other databases (such as oracle) and SQL99 provides for a
differentiation between deterministic and 'possibly non-deterministic'
routines. It does not discuss, to my knowledge, how this information
should be used.

I do not know if it is worth while implementing a deterministic function
result cache in Postgres -- I haven't looked at the complexity. Needless
to say, it would not be trivial :-).

Gavin



---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] Function result cacheing - any comments?

2002-08-18 Thread Philip Warner

At 22:29 18/08/2002 -0700, Joe Conway wrote:
>create function get_manager_names() returns setof manager_names as
> 'select d.id, p.name from departments d, people p
>  where p.id = d.manager_id' language sql;
>
>select p.name, m.name as boss from people p, get_manager_names() m where 
>p.department_id = m.dept_id;
>...
>Is this anything close what you had in mind?

Nice thought, and it probably works for the example I gave, but in the case 
where the secondary table is large potentially large, I think it falls down.

To give an example, in my case I have a function 'has_access_to_object' 
which does access checking up a tree of ownership & inheritance. While the 
first level access check is always unique, subsequent ones will be executed 
more than once in a typical tree.

As a result, what I would like to implement is a new attribute for 
functions (eg. 'invariant') which tells the function manager that in the 
context of one command, if the function is called with the same args, the 
result will be the same. The idea is for the function manager(?) to 
maintain a cache of, say, up to 100K of cached function results for 
functions marked 'invariant'. A hash will be used to check if a function 
result is in the cache, and the least recently used results will be purged 
when necessary.

While my example is quite specific, the benefits would apply to any simple 
lookup function, as well as any external function that is expensive to execute.







Philip Warner| __---_
Albatross Consulting Pty. Ltd.   |/   -  \
(A.B.N. 75 008 659 498)  |  /(@)   __---_
Tel: (+61) 0500 83 82 81 | _  \
Fax: (+61) 0500 83 82 82 | ___ |
Http://www.rhyme.com.au  |/   \|
  |----
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] Function result cacheing - any comments?

2002-08-18 Thread Joe Conway

Philip Warner wrote:
> So the obvious question is - in the opinion of people who know the code, 
> can a function-result-cache be implemented with a lifetime of a single 
> statement, without butchering the function manager?
> 

I don't know if I fully understand what you're proposing, but if I 
understand it correctly, I think the table function feature in current 
sources does just what you want already. If you can write your function 
as a table function, the results are put in a tuplestore for the 
duration of the statement, and rescanned when needed.

Your example ends up looking like this:

create table departments(id integer, name text, manager_id integer);
insert into departments values(1, 'manufacturing', 1);
insert into departments values(2, 'accounting', 2);

create table people(id integer, department_id, name text);
insert into people values(1, 1, 'mfg boss');
insert into people values(2, 2, 'acct boss');
insert into people values(3, 1, 'mfg emp');
insert into people values(4, 2, 'acct emp');

create type manager_names as (dept_id int, name text);

create function get_manager_names() returns setof manager_names as
 'select d.id, p.name from departments d, people p
  where p.id = d.manager_id' language sql;

select p.name, m.name as boss from people p, get_manager_names() m where 
p.department_id = m.dept_id;
name|   boss
---+---
  mfg boss  | mfg boss
  mfg emp   | mfg boss
  acct boss | acct boss
  acct emp  | acct boss
(4 rows)

Is this anything close what you had in mind?

Joe


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] Function result cacheing - any comments?

2002-08-18 Thread Philip Warner


OK - I assume from everybody else's silence that they either (a) agree with 
the idea, or (b) think Tom hit the idea on the head, so they feel they 
don't need to respond.

So what I would like to do is implement a simple version of this to attempt 
to justify my claims of performance gains. The sort of trivial places where 
I think gains *may* be had are:

create table departments(id integer, name text, manager_id integer);
create table people(id integer, department_id, name text);

create function get_manager_name(integer) returns text as
 'select name from departments d, people p
  where d.id = $1 and p.id = d.manager_id';

select name,get_manager_name(department_id) from people;

This is obviously a case where a LOJ or column-select would do the trick, 
*but* it does represent a class of problems that people frequently write 
procedures to perform a single (sometimes complex) action. Using a function 
also encapsulates some knowledge of the data structures, resulting in more 
maintainable code.

eg. even the above simple example becomes a lot less readable and maintainable:

select name,
 (select m.name from departments d, people m
 where d.id = p.department_id and m.id = d.manager_id) as manager_name
  from people p;

if a function is not used.

My theory is that if such a piece of code gets a performance gain, then the 
code is probably worth including, assuming that the function manager does 
not need to be butchered to achieve the desired goal. Does that sound 
reasonable?

So the obvious question is - in the opinion of people who know the code, 
can a function-result-cache be implemented with a lifetime of a single 
statement, without butchering the function manager?




Philip Warner| __---_
Albatross Consulting Pty. Ltd.   |/   -  \
(A.B.N. 75 008 659 498)  |  /(@)   __---_
Tel: (+61) 0500 83 82 81 | _  \
Fax: (+61) 0500 83 82 82 | ___ |
Http://www.rhyme.com.au  |/   \|
  |----
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



Re: [HACKERS] Function result cacheing

2002-08-16 Thread Philip Warner

At 00:18 17/08/2002 -0400, Tom Lane wrote:
>Philip Warner <[EMAIL PROTECTED]> writes:
> > Obviously this is not a 7.3 item, but would people support such
> > functionality going into a future version?
>
>Actually, I wouldn't.

This forces application-based caches, which in turn need indexed local 
temporary tables, and ideally the ability to either check if they exist, or 
a CREATE...IF NOT EXISTS. And I'd guess the indexes would not be used, 
whereas the 'checksum on args' model comes close to hash-index performance.


>I can think of very few situations where
>such caching is useful,

Aside, of course, from any external functions that for whatever reason are 
expensive to execute, and which will be passwed the same args more than 
once in a single SELECT. As well as any functions that do complex lookups 
on reference data in the database; in short anything that only reads data 
and which does more than a simple lookup, and which gets the same args more 
than once.


>  and I don't believe that the mechanism required
>would pay for itself.

In what sense? The mechanism is close to cost-free if the flag is not set 
on the function, and would presumably only be set by the definer if there 
was likely to be a benefit. Coming from a database that supports such 
functions, I *know* they can help a great deal.


>In the cases where a cache does make sense,
>it's sufficiently application-specific that a generic "cache on a key
>consisting of the function arguments" isn't the right thing anyway;

Not for the the uses I have.


>you'll find you want some internal logic to decide what to cache and
>what key to use to retrieve it.

No, I don't. I am very happy with function parameters being used.


>   Furthermore, a generic cache will have
>no clue whatever about cache-invalidating events, thus further
>restricting its usefulness.

This is true, but mainly an argument for cacheing at the statement level; 
TX level cacheing seems like a bad idea. It's a matter for application 
design to ensure that when a developer marks a function as invariant, then 
they mean it. If it really becomes a problem, then *maybe* we need an 
application-level cache invalidation, but it seems very unlikely to be 
a  problem.

>   (Your suggestion of "flush at transaction
>end" is too short-term for most applications, too long-term for some,
>and just right for hardly any.)

I actually suggested two options, and would personally prefer 
flush-at-statement-end.


>Build the cache internally to your function if you need it.

Not too keen on building cacheing code into 3 different functions just on 
the one database;  and doing the same on another which also would benefit.






Philip Warner| __---_
Albatross Consulting Pty. Ltd.   |/   -  \
(A.B.N. 75 008 659 498)  |  /(@)   __---_
Tel: (+61) 0500 83 82 81 | _  \
Fax: (+61) 0500 83 82 82 | ___ |
Http://www.rhyme.com.au  |/   \|
  |----
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] Function result cacheing

2002-08-16 Thread Tom Lane

Philip Warner <[EMAIL PROTECTED]> writes:
> Obviously this is not a 7.3 item, but would people support such 
> functionality going into a future version?

Actually, I wouldn't.  I can think of very few situations where
such caching is useful, and I don't believe that the mechanism required
would pay for itself.  In the cases where a cache does make sense,
it's sufficiently application-specific that a generic "cache on a key
consisting of the function arguments" isn't the right thing anyway;
you'll find you want some internal logic to decide what to cache and
what key to use to retrieve it.  Furthermore, a generic cache will have
no clue whatever about cache-invalidating events, thus further
restricting its usefulness.  (Your suggestion of "flush at transaction
end" is too short-term for most applications, too long-term for some,
and just right for hardly any.)

Build the cache internally to your function if you need it.

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])



[HACKERS] Function result cacheing

2002-08-16 Thread Philip Warner


This has been discussed before in the context of misunderstanding the 
meaning of 'iscachable', but I now have a use for cached function results, 
and have seen at least one other posting with a similar need.

The reason I need it is that I have a few functions that do recursive 
inheritance lookups going up a converging inheritance tree. Typically this 
function will be called on several hundred objects in a single select 
statement. Because of inheritance, it ends up with several thousand 
function calls, each of which is non-trivial.

A solution that would be useful for me would be:

If a function is marked 'invariant' (or something similar), then

- cache the most recently used 20 calls (config item) iff the args were 
less than 1K in total storage (ie. don't cache large text blocks),

- calculate a very simple checksum on the args

- when a function is to be evaluated, calc the checksum and if a match is 
found, compare the args, and if they all match, return the result.

I would anticipate deleting the cache when the current command exits, 
and/or certainly when a TX ends.

Obviously this is not a 7.3 item, but would people support such 
functionality going into a future version?






Philip Warner| __---_
Albatross Consulting Pty. Ltd.   |/   -  \
(A.B.N. 75 008 659 498)  |  /(@)   __---_
Tel: (+61) 0500 83 82 81 | _  \
Fax: (+61) 0500 83 82 82 | ___ |
Http://www.rhyme.com.au  |/   \|
  |----
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster