Re: [HACKERS] modifying the tbale function

2007-03-20 Thread Andrew Dunstan

Florian G. Pflug wrote:

Are we really sure that this isn't a solution in search of a problem?

I think this really depends on how you define problem. Some people
might think that select * from myfunc(...) limit 1 should stop and
return a result after myfunc(...) has returned one row. Others will
say well, just use a different software design that doesn't depend
on this optimization.




I think that's a very thin use case to justify all the scaffolding you 
propose, so put me in with the others I guess.


If we can provide a per-call mechanism for C functions, that should be 
adequate, I think. If you need heavy optimization then you need C 
functions anyway.


cheers

andrew


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] modifying the tbale function

2007-03-20 Thread Hannu Krosing
Ühel kenal päeval, T, 2007-03-20 kell 09:35, kirjutas Andrew Dunstan:
 Florian G. Pflug wrote:
  Are we really sure that this isn't a solution in search of a problem?
  I think this really depends on how you define problem. Some people
  might think that select * from myfunc(...) limit 1 should stop and
  return a result after myfunc(...) has returned one row. Others will
  say well, just use a different software design that doesn't depend
  on this optimization.

The same argument could be used for justifying leaving out any
optimisations.

 I think that's a very thin use case to justify all the scaffolding you 
 propose, so put me in with the others I guess.
 
 If we can provide a per-call mechanism for C functions, that should be 
 adequate, I think. 

Sure, having it at least at C level would make the job of pl
implementors wanting to do per-call returns a little easier.

 If you need heavy optimization then you need C 
 functions anyway.

Check out pl/proxy ( http://pgfoundry.org/projects/plproxy ), which is a
dblink replacement and database partitioning tool in one. A short
summary is here
https://developer.skype.com/SkypeGarage/DbProjects/PlProxy (may be a bit
outdated).

I'm sure there would be use-cases (like moving huge tables) where
per-call (or per-kilocall :) ) mechanisms would come handy.

One example would be doing a huge sort split between N partitions so
that each partition sorts its part of the data and then merge the
results in-order from all partitions in-line, without doing re-sorting
on the master.

With a function returning the full recordset the sorting can't be
distributed very effectively, as the whole query result needs to be
saved on the master node before returning.

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] modifying the tbale function

2007-03-20 Thread Florian G. Pflug

Tom Lane wrote:

Florian G. Pflug [EMAIL PROTECTED] writes:

Here is a paper about portable userspace threads that I just googled.
http://www.gnu.org/software/pth/rse-pmt.ps


I'm not impressed --- the list of unsolved problems is a bit too long.
One that seems a showstopper for our purposes is lack of ability to
deal reliably with stack overflow on alternate stacks.  If we're going
to be doing anything as loosely defined as running a third-party
language interpreter in these stacks, that just won't do.

Another little problem is it's LGPL.


I don't think using that GnuPT library the paper mentions is a viable
approach for postgres. I just posted that link because it shows
that this is not impossible to do.

What would actually be interesting is to find out of the
ucontext/getcontext/makecontext/swapcontext/setcontext
functions mentioned in the paper are available on the
plattform where postgres is used in production for larger
dataset. I'd assume that people who need this PL optimization
will run a quite update-to-date version of their particular OS,
so maybe the portability problems of those functions wouldn't
be a problem for postgres - the PL optimization would just be
disabled at configure time if they are not available.

The main question is: IF the stability problems like stack overflow
can be addressed, would this be in principle considered to be feature
that people would like to have? Or is it considered not worth the effort?

greetings, Florian Pflug


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] modifying the tbale function

2007-03-19 Thread Heikki Linnakangas

Tom Lane wrote:

Neil Conway [EMAIL PROTECTED] writes:
Returning control to the backend for every row returned would likely be 
excessive, but you could return once every k rows and get most of the 
benefits of both approaches (k might be on the order of 1000).


However, this still leaves us with no idea how to persuade perl, tcl,
python, et al to cooperate.


It seem like a useful optimization for C-functions, though. I was caught 
by surprise a while ago when I realized that the way I've been using to 
create simple test data quickly:


INSERT INTO foo SELECT key FROM generate_series(1, large number) key

materializes the generate_series result set first.

I'd like to have that changed, even if we leave the behavior as it is 
for PLs.


Another affected use case is using dblink to copy large tables.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] modifying the tbale function

2007-03-19 Thread Andrew Dunstan

Heikki Linnakangas wrote:

Tom Lane wrote:

Neil Conway [EMAIL PROTECTED] writes:
Returning control to the backend for every row returned would likely 
be excessive, but you could return once every k rows and get most of 
the benefits of both approaches (k might be on the order of 1000).


However, this still leaves us with no idea how to persuade perl, tcl,
python, et al to cooperate.


It seem like a useful optimization for C-functions, though. I was 
caught by surprise a while ago when I realized that the way I've been 
using to create simple test data quickly:






Actually, I think we could teach the PLs to do it - just not 
transparently, so we'd need to mark which functions used the new 
protocol. Such functions would get a state object as an implied first 
argument, so in plperl it might work like this (for a 
generate_series-like function):


   my $state = shift;
   my $low = shift;
   my $high = shift;
   if ($state-{callstatus} eq 'firstcall')
   {
  $state-{counter} = $low;
   }
   elseif ($state-{callstatus} eq 'cleanup')
   {
  # do cleanup here
  $state-{return_status} = 'cleaned';
  return;
   }
   my $next  = $state-{counter}++;
   $state-{return_status}  = $next  $high ? 'result' : 'last_result';
   return $next;


To support this I think we'd need to do something like:

 create function mygs(int, int)
   returns setof int
   language plperl
   with srfstate
   as $$ ... $$;



cheers

andrew




---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] modifying the tbale function

2007-03-19 Thread Richard Huxton

Andrew Dunstan wrote:


Actually, I think we could teach the PLs to do it - just not 
transparently, so we'd need to mark which functions used the new 
protocol. Such functions would get a state object as an implied first 
argument, so in plperl it might work like this (for a 
generate_series-like function):



To support this I think we'd need to do something like:

 create function mygs(int, int)
   returns setof int
   language plperl
   with srfstate
   as $$ ... $$;


Is this not what we do with aggregate functions at present?

--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] modifying the tbale function

2007-03-19 Thread Andrew Dunstan

Richard Huxton wrote:

Andrew Dunstan wrote:


Actually, I think we could teach the PLs to do it - just not 
transparently, so we'd need to mark which functions used the new 
protocol. Such functions would get a state object as an implied first 
argument, so in plperl it might work like this (for a 
generate_series-like function):



To support this I think we'd need to do something like:

 create function mygs(int, int)
   returns setof int
   language plperl
   with srfstate
   as $$ ... $$;


Is this not what we do with aggregate functions at present?



Yes, more or less. That's what made me think of it.

OTOH, before we rush out and do it someone needs to show that it's a net 
win. I agree with Tom that making tuplestore faster would probably be a 
much better investment of time.


cheers

andrew

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] modifying the tbale function

2007-03-19 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
 Richard Huxton wrote:
 Is this not what we do with aggregate functions at present?

 Yes, more or less. That's what made me think of it.

 OTOH, before we rush out and do it someone needs to show that it's a net 
 win.

Yeah, because this isn't doing anything to address the problem of
entry/exit overhead from calling a PL function many times.  I kinda
dislike shoving the problem onto the heads of PL programmers anyway...

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] modifying the tbale function

2007-03-19 Thread Gregory Stark

Andrew Dunstan [EMAIL PROTECTED] writes:

 Yes, more or less. That's what made me think of it.

 OTOH, before we rush out and do it someone needs to show that it's a net win. 
 I
 agree with Tom that making tuplestore faster would probably be a much better
 investment of time.

I don't think the problem with the tuplestore is a matter of speed. It's a
matter of scalability and flexibility. It limits the types of applications
that can use SRFs and the amount of data they can manipulate before it becomes
impractical.

Consider applications like dblink that have SRFs that read data from a slow
network sources. Or that generate more data than the server can actually store
at any one time. Or that overflow work_mem but are used in queries that could
return quickly based on the first few records.

Unfortunately, I don't think there's a simple fix that'll work for all PLs
using the current interface. Even languages with iterators themselves (python,
I think) probably don't expect to be called externally while an iterator is in
progress.

It seems to me the way to fix it is to abandon the iterator style interface
and an interface that allows you to implement a SRF by providing a function
that returns just the next record. It would have to save enough state for
the next iteration explicitly in a data structure rather than being able to
depend on the entire program state being restored.

You could argue you could already do this using a non-SRF but there are two
problems: 1) there's no convenient way to stash the state anywhere and 2) it
wouldn't be convenient to use in SQL in FROM clauses the way SRFs are.

IIRC there is already a hack in at least one of the PLs to stash data in a
place where you can access it in repeated invocations. It doesn't work
correctly if you call your function from two different places from a query. It
would take executor support for such state data structures to fix that
problem.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] modifying the tbale function

2007-03-19 Thread Florian G. Pflug

Gregory Stark wrote:

Andrew Dunstan [EMAIL PROTECTED] writes:


Yes, more or less. That's what made me think of it.

OTOH, before we rush out and do it someone needs to show that it's a net win. I
agree with Tom that making tuplestore faster would probably be a much better
investment of time.


I don't think the problem with the tuplestore is a matter of speed. It's a
matter of scalability and flexibility. It limits the types of applications
that can use SRFs and the amount of data they can manipulate before it becomes
impractical.

Consider applications like dblink that have SRFs that read data from a slow
network sources. Or that generate more data than the server can actually store
at any one time. Or that overflow work_mem but are used in queries that could
return quickly based on the first few records.

Unfortunately, I don't think there's a simple fix that'll work for all PLs
using the current interface. Even languages with iterators themselves (python,
I think) probably don't expect to be called externally while an iterator is in
progress.


Just a thought - I believe that there are portable user-space thread 
implementations that contain little or no machine-specific code. What

if postgres used one of those to switch from the PL into the executor
and back after, say, 1000 rows were returned by the SFR?

What would be needed is basically some enhanced version of setjmp/longjmp
that actually saves the stack, and not just resets the stackpointer.

Since context switching would occur only at two well-defined places
(Some return_next_row function that PLs call when a SFR returns a row,
and in the executor if no more previously returned rows from that SFR
are available), this wouldn't introduce the usual multithreading
headache, but still allow to switch in and out of the PL interpreter.

greetings, Florian Pflug

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] modifying the tbale function

2007-03-19 Thread Andrew Dunstan

Florian G. Pflug wrote:
Just a thought - I believe that there are portable user-space thread 
implementations that contain little or no machine-specific code. What

if postgres used one of those to switch from the PL into the executor
and back after, say, 1000 rows were returned by the SFR?

What would be needed is basically some enhanced version of setjmp/longjmp
that actually saves the stack, and not just resets the stackpointer.

Since context switching would occur only at two well-defined places
(Some return_next_row function that PLs call when a SFR returns a row,
and in the executor if no more previously returned rows from that SFR
are available), this wouldn't introduce the usual multithreading
headache, but still allow to switch in and out of the PL interpreter.




This just sounds horribly fragile.

Are we really sure that this isn't a solution in search of a problem?

cheers

andrew




---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] modifying the tbale function

2007-03-19 Thread Islam Hegazy

So, I understood from all those opinions that much of the work is to be done
in the interface language interpreter not postgresql code itself. Am I right
or I missed something?

Regards
Islam Hegazy


- Original Message - 
From: Andrew Dunstan [EMAIL PROTECTED]

To: Florian G. Pflug [EMAIL PROTECTED]
Cc: Gregory Stark [EMAIL PROTECTED]; Richard Huxton
dev@archonet.com; Heikki Linnakangas [EMAIL PROTECTED]; Tom
Lane [EMAIL PROTECTED]; Neil Conway [EMAIL PROTECTED]; Martijn van
Oosterhout kleptog@svana.org; Islam Hegazy [EMAIL PROTECTED];
pgsql-hackers@postgresql.org
Sent: Monday, March 19, 2007 12:18 PM
Subject: Re: [HACKERS] modifying the tbale function



Florian G. Pflug wrote:

Just a thought - I believe that there are portable user-space thread
implementations that contain little or no machine-specific code. What
if postgres used one of those to switch from the PL into the executor
and back after, say, 1000 rows were returned by the SFR?

What would be needed is basically some enhanced version of setjmp/longjmp
that actually saves the stack, and not just resets the stackpointer.

Since context switching would occur only at two well-defined places
(Some return_next_row function that PLs call when a SFR returns a row,
and in the executor if no more previously returned rows from that SFR
are available), this wouldn't introduce the usual multithreading
headache, but still allow to switch in and out of the PL interpreter.




This just sounds horribly fragile.

Are we really sure that this isn't a solution in search of a problem?

cheers

andrew







---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] modifying the tbale function

2007-03-19 Thread Joe Conway

Andrew Dunstan wrote:


Are we really sure that this isn't a solution in search of a problem?


The need for value-per-call is real (examples mentioned down-thread) and 
was anticipated from day one of the SRF implementation (in fact the 
first patch I wrote was value-per-call, not materialize). But when we 
realized that value-per-call was not going to work very well for any PL 
*except* C-functions, we switched to SFRM_Materialize as the only 
supported mode, with SFRM_ValuePerCall left as a to-be-coded-later 
option (see SetFunctionReturnMode in execnodes.h).


Personally I think it is worth having SFRM_ValuePerCall even if only C 
functions can make use of it.


Joe


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] modifying the tbale function

2007-03-19 Thread Andrew Dunstan

Joe Conway wrote:

Andrew Dunstan wrote:


Are we really sure that this isn't a solution in search of a problem?


The need for value-per-call is real (examples mentioned down-thread) 
and was anticipated from day one of the SRF implementation (in fact 
the first patch I wrote was value-per-call, not materialize). But when 
we realized that value-per-call was not going to work very well for 
any PL *except* C-functions, we switched to SFRM_Materialize as the 
only supported mode, with SFRM_ValuePerCall left as a 
to-be-coded-later option (see SetFunctionReturnMode in execnodes.h).


Personally I think it is worth having SFRM_ValuePerCall even if only C 
functions can make use of it.





Yeah, makes plenty of sense for C funcs. I don't think there's an 
argument about that. But for that we don't need any threading 
infrastructure.


cheers

andrew

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] modifying the tbale function

2007-03-19 Thread Joe Conway

Andrew Dunstan wrote:

Joe Conway wrote:

Andrew Dunstan wrote:

Are we really sure that this isn't a solution in search of a problem?
The need for value-per-call is real (examples mentioned down-thread) 
and was anticipated from day one of the SRF implementation (in fact 
the first patch I wrote was value-per-call, not materialize). But when 
we realized that value-per-call was not going to work very well for 
any PL *except* C-functions, we switched to SFRM_Materialize as the 
only supported mode, with SFRM_ValuePerCall left as a 
to-be-coded-later option (see SetFunctionReturnMode in execnodes.h).


Personally I think it is worth having SFRM_ValuePerCall even if only C 
functions can make use of it.


Yeah, makes plenty of sense for C funcs. I don't think there's an 
argument about that. But for that we don't need any threading 
infrastructure.


Oh sure -- sorry I wasn't clear. I wasn't trying to support the idea of 
threading so much as the idea that value-per-call itself has merit for a 
number of use cases.


Joe


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] modifying the tbale function

2007-03-19 Thread Gregory Stark
Florian G. Pflug [EMAIL PROTECTED] writes:

 Since context switching would occur only at two well-defined places
 (Some return_next_row function that PLs call when a SFR returns a row,
 and in the executor if no more previously returned rows from that SFR
 are available), this wouldn't introduce the usual multithreading
 headache...

Yes it would. Consider what happens if the PL function calls into SPI to
execute a query

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] modifying the tbale function

2007-03-19 Thread Florian G. Pflug

Andrew Dunstan wrote:

Florian G. Pflug wrote:
Just a thought - I believe that there are portable user-space thread 
implementations that contain little or no machine-specific code. What

if postgres used one of those to switch from the PL into the executor
and back after, say, 1000 rows were returned by the SFR?

What would be needed is basically some enhanced version of setjmp/longjmp
that actually saves the stack, and not just resets the stackpointer.

Since context switching would occur only at two well-defined places
(Some return_next_row function that PLs call when a SFR returns a row,
and in the executor if no more previously returned rows from that SFR
are available), this wouldn't introduce the usual multithreading
headache, but still allow to switch in and out of the PL interpreter.



This just sounds horribly fragile.


Why would it be? It's about the same as running postgresql in one thread,
and some PL in another. This should only cause trouble if both use some
non-reentrant libc-functions. But even that wouldn't matter because of
the well-defined context switching points.

Here is a paper about portable userspace threads that I just googled.
http://www.gnu.org/software/pth/rse-pmt.ps


Are we really sure that this isn't a solution in search of a problem?

I think this really depends on how you define problem. Some people
might think that select * from myfunc(...) limit 1 should stop and
return a result after myfunc(...) has returned one row. Others will
say well, just use a different software design that doesn't depend
on this optimization.

greetings, Florian Pflug



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] modifying the tbale function

2007-03-19 Thread Florian G. Pflug

Gregory Stark wrote:

Florian G. Pflug [EMAIL PROTECTED] writes:


Since context switching would occur only at two well-defined places
(Some return_next_row function that PLs call when a SFR returns a row,
and in the executor if no more previously returned rows from that SFR
are available), this wouldn't introduce the usual multithreading
headache...


Yes it would. Consider what happens if the PL function calls into SPI to
execute a query


I don't that would cause trouble. Postgres and the PL would still not run 
concurrently - in fact, from a control-flow point of view the PL would run at 
_exactly_ the same place as it is now. Precisely if someone tries to pull rows 
out of some executor node. The only difference is that now it's execution would

would be stopped and restarted multiple times by some sort of stack-switching.

There might be trouble if a second function has to be executed with the same PL
as an already running (but currently stopped) function. This would only work
for PL that is thread-safe in some way.

greetings, Florian Pflug


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] modifying the tbale function

2007-03-19 Thread Tom Lane
Florian G. Pflug [EMAIL PROTECTED] writes:
 Here is a paper about portable userspace threads that I just googled.
 http://www.gnu.org/software/pth/rse-pmt.ps

I'm not impressed --- the list of unsolved problems is a bit too long.
One that seems a showstopper for our purposes is lack of ability to
deal reliably with stack overflow on alternate stacks.  If we're going
to be doing anything as loosely defined as running a third-party
language interpreter in these stacks, that just won't do.

Another little problem is it's LGPL.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] modifying the tbale function

2007-03-19 Thread Tom Lane
Florian G. Pflug [EMAIL PROTECTED] writes:
 There might be trouble if a second function has to be executed with
 the same PL as an already running (but currently stopped)
 function. This would only work for PL that is thread-safe in some way.

Seems a bit iffy.

It strikes me that at least for plpgsql, it might be possible to support
value-per-call mode without any thread support.  What you'd need to do
is get rid of the current arrangement whereby the control structure of
the plpgsql code is modeled on-the-fly by the call stack of the C code,
and instead have block nesting, loops, etc represented by explicit data
structures that're manipulated by C code with a flat call stack.  If the
function wants to do a RETURN NEXT, you just return, leaving its current
state all nicely tucked in a data structure.  This would be a little
tedious but is in principle a straightforward change.  I'm not sure if
there'd be any meaningful performance impact.

The tricky part is what about exception handling?  If the function does
RETURN NEXT inside a BEGIN/EXCEPTION block, what do you do ... what does
that even mean?  There be equally nasty dragons lurking behind that
question for a threaded implementation, of course.  It might be that we
could get away with decreeing that RETURN NEXT inside EXCEPTION isn't
legal.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] modifying the tbale function

2007-03-18 Thread Martijn van Oosterhout
On Sun, Mar 18, 2007 at 01:54:55PM -0600, Islam Hegazy wrote:
 I am trying to modify the execution of the table function to work in
 iterator fashion instead of materializing the output. I have been
 digging the Postgresql code source for about a month now and I can
 figure out where the execution of the table function works. I will be
 very grateful if anyone tell where to begin as my project due is
 after 10 days only.

I've been thinking recently about why it's so difficult. It occurs to
me that the problem is because the language interpreters don't lend
themselves to being an iterator.

What you want is that when you call a perl tablefunction that as soon
as the perl function returns a row to return that to the caller. That
means the perl interpreter has to be able to save all its state,
return to the caller and when next called resume where it left off.
I don't know if it can do that, but it would have to be implemented for
each language (or use threads).

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] modifying the tbale function

2007-03-18 Thread Andrew Dunstan



Martijn van Oosterhout wrote:

What you want is that when you call a perl tablefunction that as soon
as the perl function returns a row to return that to the caller. That
means the perl interpreter has to be able to save all its state,
return to the caller and when next called resume where it left off.
I don't know if it can do that, but it would have to be implemented for
each language (or use threads).


  


We haven't even worked out how to do that cleanly for plpgsql, which we 
control, let alone for any third party interpreter.


I'm not convinced it would be a huge gain anyway. Switching madly in and 
out of the perl interpreter at least is a known performance problem, 
IIRC - Perl XML objects have, or used to have, problems with that (and 
they still don't perform terribly well).


cheers

andrew

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] modifying the tbale function

2007-03-18 Thread Neil Conway

Andrew Dunstan wrote:
I'm not convinced it would be a huge gain anyway. Switching madly in 
and out of the perl interpreter at least is a known performance 
problem, IIRC


Returning control to the backend for every row returned would likely be 
excessive, but you could return once every k rows and get most of the 
benefits of both approaches (k might be on the order of 1000). The 
problem with the current approach is that it makes returning large 
result sets from PL functions very expensive, since they need to be 
spooled to disk.


As for using threads, that's pretty much a non-starter: we can't safely 
allow calls into the backend from multiple concurrent threads, and I 
doubt that will chance any time soon.


-Neil


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] modifying the tbale function

2007-03-18 Thread Islam Hegazy
Returning k rows would be a reasonable solution but which functions need to 
be modified to achieve this.



- Original Message - 
From: Neil Conway [EMAIL PROTECTED]

To: Andrew Dunstan [EMAIL PROTECTED]
Cc: Martijn van Oosterhout kleptog@svana.org; Islam Hegazy 
[EMAIL PROTECTED]; pgsql-hackers@postgresql.org

Sent: Sunday, March 18, 2007 4:57 PM
Subject: Re: [HACKERS] modifying the tbale function



Andrew Dunstan wrote:
I'm not convinced it would be a huge gain anyway. Switching madly in and 
out of the perl interpreter at least is a known performance problem, IIRC


Returning control to the backend for every row returned would likely be 
excessive, but you could return once every k rows and get most of the 
benefits of both approaches (k might be on the order of 1000). The problem 
with the current approach is that it makes returning large result sets 
from PL functions very expensive, since they need to be spooled to disk.


As for using threads, that's pretty much a non-starter: we can't safely 
allow calls into the backend from multiple concurrent threads, and I doubt 
that will chance any time soon.


-Neil


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org




---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] modifying the tbale function

2007-03-18 Thread Tom Lane
Neil Conway [EMAIL PROTECTED] writes:
 Returning control to the backend for every row returned would likely be 
 excessive, but you could return once every k rows and get most of the 
 benefits of both approaches (k might be on the order of 1000).

However, this still leaves us with no idea how to persuade perl, tcl,
python, et al to cooperate.

I think you are underestimating the cost of suspending/resuming any of
those interpreters, and overestimating the cost of a tuplestore, which
on a per-tuple basis is really pretty cheap.  It's quite likely that the
proposed project would produce piddling or negative gains, after
expending a huge amount of work.  (A tenth of the effort on optimizing
tuplestore some more would probably be a better investment.)

A cross-check on this theory could be made without a lot of effort: hack
SQL functions to use a tuplestore (fed via the tuplestore destreceiver,
so as not to exit the executor) instead of return-after-every-tuple.
Compare performance.  I kinda suspect you'll find it a loss even there.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate