Re: [HACKERS] 7.3 schedule

2002-04-17 Thread Bruce Momjian


I have added these emails to TODO.detail/prepare.

---

Karel Zak wrote:
 On Fri, Apr 12, 2002 at 12:41:34AM -0400, Neil Conway wrote:
  On Fri, 12 Apr 2002 12:58:01 +0900
  Hiroshi Inoue [EMAIL PROTECTED] wrote:
   
   Just a confirmation.
   Someone is working on PREPARE/EXECUTE ?
   What about Karel's work ?
 
  Right question :-)
  
  I am. My work is based on Karel's stuff -- at the moment I'm still
  basically working on getting Karel's patch to play nicely with
  current sources; once that's done I'll be addressing whatever
  issues are stopping the code from getting into CVS.
 
  My patch (qcache) for PostgreSQL 7.0 is available at 
  ftp://ftp2.zf.jcu.cz/users/zakkr/pg/.
  
  I very much look forward to Neil's work on this. 
 
  Notes:
 
 * It's an experimental patch, but usable. All the features mentioned 
   below work.
 
 * PREPARE/EXECUTE is not only about SQL statements; I think a good idea
   is to create something common and robust for query-plan caching,
   because there is, for example, SPI too. The RI triggers are based 
   on SPI_saveplan(). 
  
 * My patch knows the EXECUTE INTO feature:
 
  PREPARE foo AS SELECT * FROM pg_class WHERE relname ~~ $1 USING text;
 
  EXECUTE foo USING 'pg%';    -- standard select
 
  EXECUTE foo INTO TEMP newtab USING 'pg%';  -- select into
  
  
 * The patch allows query plans to be stored in shared memory, and it is
   possible to EXECUTE them from other backends (on the same DB); plans
   are persistent across connections. For this feature I created a special 
   memory context subsystem (like the current aset.c, but it works with 
   IPC shared memory).
   
   This is maybe too complex a solution, and (maybe) caching queries in 
   one backend only is sufficient. I know there is skepticism about this
   shared memory solution (Tom?). 
   
   
 
 Karel
   
   
  My experimental patch README (excuse my English):
 
  Implementation
  ~~~~~~~~~~~~~~
 
The qCache allows saving the queryTree and queryPlan. There are two 
available spaces for data caching. 
   
LOCAL  - data are cached in the backend's non-shared memory and aren't
 available to other backends.  
   
SHARE  - data are cached in shared memory and are 
 visible to all backends.
   
Because the size of the shared memory pool is limited and is set during
postmaster startup, the qCache must remove old plans when the pool is 
full. You can mark each entry as REMOVEABLE or NOTREMOVEABLE. 
   
A removeable entry is removed when the pool is full.
   
A not-removeable entry must be removed via qCache_Remove() or 
the other routines; the qCache does not remove such an entry itself.
   
All records in the qCache are stored (in the hash table) under some key.
The qCache knows two kinds of key --- KEY_STRING and KEY_BINARY. 
   
The qCache API does not allow direct access to shared memory; all cached 
plans that the API returns are copied to CurrentMemoryContext. All 
qCache_* routines lock the shmem themselves (the exception is 
qCache_RemoveOldest_ShareRemoveAble()).
 
 - a spin lock is used for locking.
 
Memory management
~~~~~~~~~~~~~~~~~
For its shared pool the qCache uses a memory context implementation
independent of the standard aset/mcxt, but with a compatible API --- this
allows use of the standard palloc() (which is very necessary for basic 
plan-tree operations, for example copyObject()). The qCache memory
management is very similar to the current aset.c code. It uses chunked
blocks too, but the block is smaller - 1024 bytes.
 
The number of blocks can be set in the postmaster's 'argv' via the
option '-Z'.
 
A separate MemoryContext is used to store each plan. This is a good 
idea (Hiroshi's?), because creating a new context is simple and 
inexpensive, and it allows a cached plan to be destroyed (freed) easily. 
This method is used in my SPI overhaul instead of feeding TopMemoryContext.
 
Postmaster
~~~~~~~~~~
The query cache memory is initialized during postmaster startup. The size
of the query cache pool is set via the '-Z number-of-blocks' switch --- 
the default is 100 blocks, where 1 block = 1024 bytes; this is sufficient
for 20-30 cached plans. One query needs somewhere around 3-10 blocks; for
example, a query like
 
 PREPARE sel AS SELECT * FROM pg_class;
 
needs 10Kb, because the table pg_class has very many columns. 
  
Note: for development I added an SQL function, SELECT qcache_state();,
  which shows the usage of the qCache.
 
  SPI
  ~~~
 I slightly rewrote the SPI save-plan method and removed the
 TopMemoryContext feeding.
 
 Standard SPI:
 
 SPI_saveplan() - save each plan to a separate standard memory context.
 
 SPI_freeplan() - free a plan.
 
 By-key SPI:
 
 This is the SPI interface for the query cache; it allows saving plans to
 the SHARED or LOCAL cache 'by' an arbitrary key (string or binary).
 Routines:
 
 SPI_saveplan_bykey()    - save a plan to the query cache
 
 SPI_freeplan_bykey()    - remove a plan from the query cache
 
 SPI_fetchplan_bykey()   - fetch a plan saved in the query cache
 
 SPI_execp_bykey()       - execute (via SPI) a plan saved in the query cache

Re: Scanner performance (was Re: [HACKERS] 7.3 schedule)

2002-04-16 Thread Peter Eisentraut

Tom Lane writes:

 The regression tests contain no very-long literals.  The results I was
 referring to concerned cases with string (BLOB) literals in the
 hundreds-of-K range; it seems that the per-character loop in the flex
 lexer starts to look like a bottleneck when you have tokens that much
 larger than the rest of the query.

 Solutions seem to be either (a) make that loop quicker, or (b) find a
 way to avoid passing BLOBs through the lexer.  I was merely suggesting
 that (a) should be investigated before we invest the work implied
 by (b).

I've done the following test:  Ten statements of the form

SELECT 1 FROM tab1 WHERE val = '...';

where ... are literals of length 5 - 10 MB (some random base-64 encoded
MP3 files).  tab1 was empty.  The test ran 3:40 min wall-clock time.

Top ten calls:

  %   cumulative   self              self     total
 time   seconds    seconds     calls  ms/call  ms/call  name
 36.95      9.87      9.87  74882482     0.00     0.00  pq_getbyte
 22.80     15.96      6.09        11   553.64  1450.93  pq_getstring
 13.55     19.58      3.62        11   329.09   329.10  scanstr
 12.09     22.81      3.23       110    29.36    86.00  base_yylex
  4.27     23.95      1.14        34    33.53    33.53  yy_get_previous_state
  3.86     24.98      1.03        22    46.82    46.83  textin
  3.67     25.96      0.98        34    28.82    28.82  myinput
  1.83     26.45      0.49        45    10.89    32.67  yy_get_next_buffer
  0.11     26.48      0.03      3027     0.01     0.01  AllocSetAlloc
  0.11     26.51      0.03       129     0.23     0.23  fmgr_isbuiltin

The string literals didn't contain any backslashes, so scanstr is
operating in the best-case scenario here.  But for arbitrary binary data
we need some escape mechanism, so I don't see much room for improvement
there.

It seems the real bottleneck is the excessive abstraction in the
communications layer.  I haven't looked closely at all, but it would seem
better if pq_getstring would not use pq_getbyte and instead read the
buffer directly.
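
For illustration, a buffer-at-a-time pq_getstring might look roughly like
this (a sketch only, following pqcomm.c's buffer-variable naming; not the
actual implementation):

    /* Scan the receive buffer for the NUL terminator with memchr() and
     * append whole chunks, instead of one pq_getbyte() per character. */
    int
    pq_getstring(StringInfo s)
    {
        for (;;)
        {
            char   *nul;

            if (PqRecvPointer >= PqRecvLength)
            {
                if (pq_recvbuf())       /* refill buffer from the socket */
                    return EOF;
            }
            nul = memchr(PqRecvBuffer + PqRecvPointer, '\0',
                         PqRecvLength - PqRecvPointer);
            if (nul != NULL)
            {
                /* terminator found: append the tail and stop */
                appendBinaryStringInfo(s, PqRecvBuffer + PqRecvPointer,
                                       nul - (PqRecvBuffer + PqRecvPointer));
                PqRecvPointer = (nul - PqRecvBuffer) + 1;
                return 0;
            }
            /* no terminator yet: append the whole chunk, then refill */
            appendBinaryStringInfo(s, PqRecvBuffer + PqRecvPointer,
                                   PqRecvLength - PqRecvPointer);
            PqRecvPointer = PqRecvLength;
        }
    }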

-- 
Peter Eisentraut   [EMAIL PROTECTED]





Re: Scanner performance (was Re: [HACKERS] 7.3 schedule)

2002-04-16 Thread Bruce Momjian

Peter Eisentraut wrote:
 The string literals didn't contain any backslashes, so scanstr is
 operating in the best-case scenario here.  But for arbitrary binary data
 we need some escape mechanism, so I don't see much room for improvement
 there.
 
 It seems the real bottleneck is the excessive abstraction in the
 communications layer.  I haven't looked closely at all, but it would seem
 better if pq_getstring would not use pq_getbyte and instead read the
 buffer directly.

I am inclined to agree with your analysis.  We added abstraction to
libpq because the old code was quite poorly structured.  Now that it is
well structured, removing some of the abstraction seems valuable.

Any chance pq_getbyte could be made into a macro?  I would be glad to
send you a macro version for testing.  I would have to push the while
loop into pq_recvbuf() and change the while in pq_getbyte to an if, or,
as a macro, ?:.
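
Something like this, say (a sketch against pqcomm.c's buffer state; the
slow-path helper is hypothetical):

    /* Fast path inline: grab a byte from the receive buffer when one is
     * available; otherwise fall back to a function that refills via
     * pq_recvbuf() and then returns a byte, or EOF on failure. */
    #define pq_getbyte() \
        (PqRecvPointer < PqRecvLength ? \
            (int) (unsigned char) PqRecvBuffer[PqRecvPointer++] : \
            pq_getbyte_slow())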

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026




Re: Scanner performance (was Re: [HACKERS] 7.3 schedule)

2002-04-16 Thread Tom Lane

Peter Eisentraut [EMAIL PROTECTED] writes:
 Top ten calls:

   %   cumulative   self              self     total
  time   seconds    seconds     calls  ms/call  ms/call  name
  36.95      9.87      9.87  74882482     0.00     0.00  pq_getbyte
  22.80     15.96      6.09        11   553.64  1450.93  pq_getstring
  13.55     19.58      3.62        11   329.09   329.10  scanstr
  12.09     22.81      3.23       110    29.36    86.00  base_yylex
   4.27     23.95      1.14        34    33.53    33.53  yy_get_previous_state
   3.86     24.98      1.03        22    46.82    46.83  textin
   3.67     25.96      0.98        34    28.82    28.82  myinput
   1.83     26.45      0.49        45    10.89    32.67  yy_get_next_buffer
   0.11     26.48      0.03      3027     0.01     0.01  AllocSetAlloc
   0.11     26.51      0.03       129     0.23     0.23  fmgr_isbuiltin

Interesting.  This should be taken with a grain of salt however: gprof's
call-counting overhead is large enough to skew the results on many
machines (ie, routines that are called many times tend to show more than
their fair share of runtime).  If your profiler does not show the
counter subroutine (mcount or some similar name) separately, you
should be very suspicious of where the overhead time is hidden.

For comparison you might want to check out some similar numbers I
obtained awhile back:
http://archives.postgresql.org/pgsql-hackers/2001-12/msg00076.php
(thanks to Barry Lind for reminding me about that ;-)).  That test
showed base_yylex/addlit/scanstr as costing about twice as much as
pg_getstring/pq_getbyte.  Probably the truth is somewhere in between
your measurements and mine.

In any case it does seem that some micro-optimization in the vicinity of
the scanner's per-character costs, ie, pq_getbyte, addlit, etc would be
worth the trouble.

regards, tom lane




Re: [HACKERS] 7.3 schedule

2002-04-14 Thread Karel Zak

On Fri, Apr 12, 2002 at 12:51:26PM -0400, Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  Certainly a shared cache would be good for apps that connect to issue a
  single query frequently.  In such cases, there would be no local cache
  to use.
 
 We have enough other problems with the single-query-per-connection
 scenario that I see no reason to believe that a shared plan cache will
 help materially.  The correct answer for those folks will *always* be
 to find a way to reuse the connection.

 My query cache was written for 7.0. If some future release uses
 pre-forked backends, where after a client disconnection the backend 
 stays alive and waits for a new client, the shared cache is (maybe :-)
 not needed. The current backend fork model is a killer of all possible 
 caching.

 We have more caches. I hope persistent backends will help them all, 
 and I'm sure that speed will go up with persistent backends and 
 persistent caches, without shared memory usage. There I can agree with
 Tom :-)

Karel

-- 
 Karel Zak  [EMAIL PROTECTED]
 http://home.zf.jcu.cz/~zakkr/
 
 C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz




Re: [HACKERS] 7.3 schedule

2002-04-14 Thread Brian Bruns

On 13 Apr 2002, Hannu Krosing wrote:

 On Fri, 2002-04-12 at 03:04, Brian Bruns wrote:
  On 11 Apr 2002, Hannu Krosing wrote:
  
   IIRC someone started work on modularising the network-related parts with
   a goal of supporting DRDA (DB2 protocol) and others in future.
  
  That was me, although I've been bogged down lately, and haven't been able 
  to get back to it.
 
 Has any of your modularisation work got into CVS yet ?

No, Bruce didn't like the way I did certain things, and had some qualms 
about the value of supporting multiple wire protocols IIRC.  Plus the 
patch was not really ready for primetime yet.  

I'm hoping to get back to it soon and sync it with the latest CVS, and 
clean up the odds and ends.

  DRDA, btw, is not just a DB2 protocol but an opengroup 
  spec that hopefully will someday be *the* standard on the wire database 
  protocol.  DRDA handles prepare/execute and is completely binary in 
  representation, among other advantages.
 
 What about extensibility - is there some predefined way of adding new
 types ?

Not really, there is some ongoing standards activity adding some new 
features.  The list of supported types is pretty impressive, anything in 
particular you are looking for?

 Also, does it handle NOTIFY ?

I don't know the answer to this.  The spec is pretty huge, so it may, but 
I haven't seen it.

Even if it is supported as a secondary protocol, I believe there is a lot 
of value in having a single database protocol standard. (Why else would I 
be doing it!)  I'm also looking into what it will take to do the same for 
MySQL and Firebird.  Hopefully they will be receptive to the idea as well.

 
 Hannu

Cheers,

Brian





Re: [HACKERS] 7.3 schedule

2002-04-14 Thread Curt Sampson

On Thu, 11 Apr 2002, Barry Lind wrote:

 I'm not sure that JDBC would use this feature directly.  When a
 PreparedStatement is created in JDBC there is nothing that indicates
 how many times this statement is going to be used.  Many (most IMHO)
 will be used only once.

Well, the particular PreparedStatement instance may be used only
once, yes. But it's quite likely that other, identical PreparedStatement
objects would be used time and time again, so it's still good if
you don't need to do much work on the second and subsequent
preparations of that statement.

 If it only is used once, it will actually perform worse than
 without the feature (since you need to issue two sql statements to the
 backend to accomplish what you were doing in one before).

I'm not sure that it would be much worse unless you need to wait
for an acknowledgement from the back-end for the first statement.
If you had a back-end command along the lines of "prepare this
statement and execute it with these parameters," it would have
pretty much the same performance as giving the statement directly
with the parameters already substituted in, right?

 Thus if someone wanted to use this functionality from jdbc they would
 need to do it manually, i.e. issue the prepare and execute statements
 manually instead of the jdbc driver doing it automatically for them.

I'd say that this is awfully frequent, anyway. I use PreparedStatements
for pretty much any non-constant input, because it's just not safe
or portable to try to escape parameters yourself.

cjs
-- 
Curt Sampson  [EMAIL PROTECTED]   +81 90 7737 2974   http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light.  --XTC





Re: [HACKERS] 7.3 schedule

2002-04-14 Thread Barry Lind



Curt Sampson wrote:
 On Thu, 11 Apr 2002, Barry Lind wrote:
 
 
 I'm not sure that JDBC would use this feature directly.  When a
 PreparedStatement is created in JDBC there is nothing that indicates
 how many times this statement is going to be used.  Many (most IMHO)
 will be used only once.
 
 
 Well, the particular PreparedStatement instance may be used only
 once, yes. But it's quite likely that other, identical PreparedStatement
 objects would be used time and time again, so it's still good if
 you don't need to do much work on the second and subsequent
 preparations of that statement.
 
But since the syntax for prepare is "PREPARE name AS statement", you 
can't easily reuse sql prepared by other PreparedStatement objects, since 
you don't know whether the sql you are about to execute has or has not yet 
been prepared, or what name was used in that prepare.  Thus you will 
always need to do a new prepare.  (This is only true if the driver is 
trying to automatically use PREPARE/EXECUTE, which was the scenario I was 
talking about.)

 
If it only is used once, it will actually perform worse than
without the feature (since you need to issue two sql statements to the
backend to accomplish what you were doing in one before).
 
 
 I'm not sure that it would be much worse unless you need to wait
 for an acknowledgement from the back-end for the first statement.
 If you had a back-end command along the lines of prepare this
 statement and execute it with these parameters, it would have
 pretty much the same performance as giving the statement directly
 with the parameters already substituted in, right?
 
I didn't say it would be much worse, but it won't be faster than not 
using PREPARE.


 
Thus if someone wanted to use this functionality from jdbc they would
need to do it manually, i.e. issue the prepare and execute statements
manually instead of the jdbc driver doing it automatically for them.
 
 
 I'd say that this is awfully frequent, anyway. I use PreparedStatements
 for pretty much any non-constant input, because it's just not safe
 or portable to try to escape parameters yourself.
 
I agree this is useful, and you can write user code to take advantage of 
the functionality.  I am just pointing out that I don't think the driver 
can, behind the scenes, use this capability automatically.

--Barry





Re: [HACKERS] 7.3 schedule

2002-04-14 Thread Curt Sampson

On Sun, 14 Apr 2002, Barry Lind wrote:

 But since the syntax for prepare is "PREPARE name AS statement", you
 can't easily reuse sql prepared by other PreparedStatement objects, since
 you don't know whether the sql you are about to execute has or has not yet
 been prepared, or what name was used in that prepare.  Thus you will
 always need to do a new prepare.  (This is only true if the driver is
 trying to automatically use PREPARE/EXECUTE, which was the scenario I was
 talking about.)

Well, there are some ugly tricks you could build into the driver
to allow it to effectively use a PREPAREd statement with multiple,
identical PreparedStatement objects (basically, via the driver
caching various things and identifying PreparedStatements created
with the same SQL), but it's messy enough and has some problems
hard enough to resolve that I can't actually see this being practical.
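
(Concretely, the driver would have to keep something like a map from SQL
text to the server-side statement name. A hypothetical sketch in C, with
invented structure and helper names:

    typedef struct PreparedEntry
    {
        char   *sql;        /* statement text as the application wrote it */
        char   *name;       /* server-side name used in PREPARE/EXECUTE */
    } PreparedEntry;

    static const char *
    get_or_prepare(DriverCache *cache, const char *sql)
    {
        PreparedEntry *e = cache_lookup(cache, sql);    /* hypothetical */

        if (e == NULL)
        {
            /* first time we see this SQL: invent a name and PREPARE it */
            e = cache_add(cache, sql, invent_name(cache));  /* e.g. "ps_42" */
            send_prepare(cache->conn, e->name, sql);    /* hypothetical */
        }
        return e->name;     /* caller then issues EXECUTE <name> USING ... */
    }

Even then the driver still has to worry about invalidation and the fact
that the prepared names are per-connection state.)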

I was actually just wanting to point out that this is where automatic
caching on the server shines.

 If it only is used once, it will actually perform worse

 I didn't say it would be much worse, but it won't be faster than not
 using PREPARE.

Well, if it's not faster, that's fine. If it's worse, that's not
so fine, because as you point out there's really no way for the
driver to know whether a PreparedStatement is being used just for
speed (multiple queries with one instance) or security (one query,
but with parameters).

 I am just pointing out that I don't think the driver
 can behind the scenes use this capability automatically.

Well, if there's little or no performance impact, I would say that
the driver should always use this capability with PreparedStatement
objects. If there is a performance impact, perhaps a property could
turn it on and off?

cjs
-- 
Curt Sampson  [EMAIL PROTECTED]   +81 90 7737 2974   http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light.  --XTC





Re: Scanner performance (was Re: [HACKERS] 7.3 schedule)

2002-04-13 Thread Tom Lane

Peter Eisentraut [EMAIL PROTECTED] writes:
 My profiles show that the work spent in the scanner is really minuscule
 compared to everything else.

Under ordinary circumstances I think that's true ...

 (The profile data is from a run of all the regression test files in order
 in one session.)

The regression tests contain no very-long literals.  The results I was
referring to concerned cases with string (BLOB) literals in the
hundreds-of-K range; it seems that the per-character loop in the flex
lexer starts to look like a bottleneck when you have tokens that much
larger than the rest of the query.

Solutions seem to be either (a) make that loop quicker, or (b) find a
way to avoid passing BLOBs through the lexer.  I was merely suggesting
that (a) should be investigated before we invest the work implied
by (b).

regards, tom lane




Re: [HACKERS] 7.3 schedule

2002-04-13 Thread Hannu Krosing

On Fri, 2002-04-12 at 03:04, Brian Bruns wrote:
 On 11 Apr 2002, Hannu Krosing wrote:
 
  IIRC someone started work on modularising the network-related parts with
  a goal of supporting DRDA (DB2 protocol) and others in future.
 
 That was me, although I've been bogged down lately, and haven't been able 
 to get back to it.

Has any of your modularisation work got into CVS yet ?

 DRDA, btw, is not just a DB2 protocol but an opengroup 
 spec that hopefully will someday be *the* standard on the wire database 
 protocol.  DRDA handles prepare/execute and is completely binary in 
 representation, among other advantages.

What about extensibility - is there some predefined way of adding new
types ?

Also, does it handle NOTIFY ?


Hannu






Re: [HACKERS] 7.3 schedule

2002-04-13 Thread Tom Lane

Christopher Kings-Lynne [EMAIL PROTECTED] writes:
 thought out way of predicting/limiting their size.  (2) How the heck do
 you get rid of obsoleted cached plans, if the things stick around in
 shared memory even after you start a new backend?  (3) A shared cache
 requires locking; contention among multiple backends to access that
 shared resource could negate whatever performance benefit you might hope
 to realize from it.

 I don't understand all these locking problems?

Searching the cache and inserting/deleting entries in the cache probably
have to be mutually exclusive; concurrent insertions probably won't work
either (at least not without a remarkably intelligent data structure).
Unless the cache hit rate is remarkably high, there are going to be lots
of insertions --- and, at steady state, an equal rate of deletions ---
leading to lots of contention.
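
To see why, consider the obvious implementation, with one lock guarding
the shared structure (a minimal sketch, assuming a qcache-style spinlock;
the cache and plan types are hypothetical):

    #include "storage/spin.h"

    typedef struct CachedPlan CachedPlan;   /* hypothetical plan type */

    typedef struct SharedPlanCache
    {
        slock_t     mutex;      /* guards the whole hash table below */
        /* ... hash table of cached plans ... */
    } SharedPlanCache;

    static CachedPlan *
    shared_cache_lookup(SharedPlanCache *cache, const char *key)
    {
        CachedPlan *plan;

        /* every backend serializes here, on hits and misses alike */
        SpinLockAcquire(&cache->mutex);
        plan = cache_hash_search(cache, key);   /* hypothetical helper */
        SpinLockRelease(&cache->mutex);
        return plan;
    }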

This could possibly be avoided if the cache is not used for all query
plans but only for explicitly PREPAREd plans, so that only explicit
EXECUTEs would need to search it.  But that approach also makes a
sizable dent in the usefulness of the cache to begin with.

regards, tom lane




Re: [HACKERS] 7.3 schedule

2002-04-12 Thread Tom Lane

Karel Zak [EMAIL PROTECTED] writes:
 * The patch allows query plans to be stored in shared memory, and it is
   possible to EXECUTE them from other backends (on the same DB); plans
   are persistent across connections. For this feature I created a special 
   memory context subsystem (like the current aset.c, but it works with 
   IPC shared memory).
   This is maybe too complex a solution, and (maybe) caching queries in 
   one backend only is sufficient. I know there is skepticism about this
   shared memory solution (Tom?). 

Yes, that is the part that was my sticking point last time around.
(1) Because shared memory cannot be extended on-the-fly, I think it is
a very bad idea to put data structures in there without some well
thought out way of predicting/limiting their size.  (2) How the heck do
you get rid of obsoleted cached plans, if the things stick around in
shared memory even after you start a new backend?  (3) A shared cache
requires locking; contention among multiple backends to access that
shared resource could negate whatever performance benefit you might hope
to realize from it.

A per-backend cache kept in local memory avoids all of these problems,
and I have seen no numbers to make me think that a shared plan cache
would achieve significantly more performance benefit than a local one.

regards, tom lane




Re: [HACKERS] 7.3 schedule

2002-04-12 Thread Neil Conway

On Fri, 12 Apr 2002 12:21:04 -0400 (EDT)
Bruce Momjian [EMAIL PROTECTED] wrote:
 Tom Lane wrote:
  A per-backend cache kept in local memory avoids all of these problems,
  and I have seen no numbers to make me think that a shared plan cache
  would achieve significantly more performance benefit than a local one.
 
 Certainly a shared cache would be good for apps that connect to issue a
 single query frequently.  In such cases, there would be no local cache
 to use.

One problem with this kind of scenario is: what to do if the plan no
longer exists for some reason? (e.g. the code that was supposed to be
PREPARE-ing your statements failed to execute properly, or the cached
plan has been evicted from shared memory, or the database was restarted,
etc.) -- EXECUTE in and of itself won't have enough information to do
anything useful. We could perhaps provide a means for an application
to test for the existence of a cached plan (in which case the
application developer will need to add logic to their application
to re-prepare the query if necessary, which could get complicated).
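
(For example, with plain libpq the application logic might have to look
something like the following -- a sketch only, using the qcache patch's
USING syntax and an invented statement name:

    PGresult *res = PQexec(conn, "EXECUTE sel_foo USING 'pg%'");

    if (PQresultStatus(res) != PGRES_TUPLES_OK)
    {
        /* plan missing -- evicted, restarted, or never prepared */
        PQclear(res);
        res = PQexec(conn,
                     "PREPARE sel_foo AS "
                     "SELECT * FROM pg_class WHERE relname ~~ $1 USING text");
        PQclear(res);
        res = PQexec(conn, "EXECUTE sel_foo USING 'pg%'");
    }

and note that it can't even distinguish "plan missing" from other errors
without more support from the backend.)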

Cheers,

Neil

-- 
Neil Conway [EMAIL PROTECTED]
PGP Key ID: DB3C29FC




Re: [HACKERS] 7.3 schedule

2002-04-12 Thread Karel Zak

On Fri, Apr 12, 2002 at 12:41:34AM -0400, Neil Conway wrote:
 On Fri, 12 Apr 2002 12:58:01 +0900
 Hiroshi Inoue [EMAIL PROTECTED] wrote:
  
  Just a confirmation.
  Someone is working on PREPARE/EXECUTE ?
  What about Karel's work ?

 Right question :-)
 
 I am. My work is based on Karel's stuff -- at the moment I'm still
 basically working on getting Karel's patch to play nicely with
 current sources; once that's done I'll be addressing whatever
 issues are stopping the code from getting into CVS.

 My patch (qcache) for PostgreSQL 7.0 is available at 
 ftp://ftp2.zf.jcu.cz/users/zakkr/pg/.
 
 I very much look forward to Neil's work on this. 

 Notes:

* It's an experimental patch, but usable. All the features mentioned 
  below work.

* PREPARE/EXECUTE is not only about SQL statements; I think a good idea
  is to create something common and robust for query-plan caching,
  because there is, for example, SPI too. The RI triggers are based 
  on SPI_saveplan(). 
 
* My patch knows the EXECUTE INTO feature:

 PREPARE foo AS SELECT * FROM pg_class WHERE relname ~~ $1 USING text;

 EXECUTE foo USING 'pg%';    -- standard select

 EXECUTE foo INTO TEMP newtab USING 'pg%';  -- select into
 
 
* The patch allows query plans to be stored in shared memory, and it is
  possible to EXECUTE them from other backends (on the same DB); plans
  are persistent across connections. For this feature I created a special 
  memory context subsystem (like the current aset.c, but it works with 
  IPC shared memory).
  
  This is maybe too complex a solution, and (maybe) caching queries in 
  one backend only is sufficient. I know there is skepticism about this
  shared memory solution (Tom?). 
  
  

Karel
  
  
 My experimental patch README (excuse my English):

 Implementation
 ~~~~~~~~~~~~~~

   The qCache allows saving the queryTree and queryPlan. There are two 
   available spaces for data caching. 
  
   LOCAL  - data are cached in the backend's non-shared memory and aren't
            available to other backends.  
  
   SHARE  - data are cached in shared memory and are 
            visible to all backends.
  
   Because the size of the shared memory pool is limited and is set during
   postmaster startup, the qCache must remove old plans when the pool is 
   full. You can mark each entry as REMOVEABLE or NOTREMOVEABLE. 
  
   A removeable entry is removed when the pool is full.
  
   A not-removeable entry must be removed via qCache_Remove() or 
   the other routines; the qCache does not remove such an entry itself.
  
   All records in the qCache are stored (in the hash table) under some key.
   The qCache knows two kinds of key --- KEY_STRING and KEY_BINARY. 
  
   The qCache API does not allow direct access to shared memory; all cached 
   plans that the API returns are copied to CurrentMemoryContext. All 
   qCache_* routines lock the shmem themselves (the exception is 
   qCache_RemoveOldest_ShareRemoveAble()).

- a spin lock is used for locking.

   Memory management
   ~~~~~~~~~~~~~~~~~
   For its shared pool the qCache uses a memory context implementation
   independent of the standard aset/mcxt, but with a compatible API ---
   this allows use of the standard palloc() (which is very necessary for 
   basic plan-tree operations, for example copyObject()). The qCache 
   memory management is very similar to the current aset.c code. It uses 
   chunked blocks too, but the block is smaller - 1024 bytes.

   The number of blocks can be set in the postmaster's 'argv' via the
   option '-Z'.

   A separate MemoryContext is used to store each plan. This is a good 
   idea (Hiroshi's?), because creating a new context is simple and 
   inexpensive, and it allows a cached plan to be destroyed (freed) 
   easily. This method is used in my SPI overhaul instead of feeding 
   TopMemoryContext.
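
   For illustration, the per-plan context pattern with the standard
   mcxt/aset API looks roughly like this (context name and sizing
   constants are illustrative, not the patch's actual code):

       static Plan *
       cache_plan_copy(Plan *plan)
       {
           /* give the cached plan its own context ... */
           MemoryContext plancxt = AllocSetContextCreate(TopMemoryContext,
                                                         "CachedPlanContext",
                                                         ALLOCSET_DEFAULT_MINSIZE,
                                                         ALLOCSET_DEFAULT_INITSIZE,
                                                         ALLOCSET_DEFAULT_MAXSIZE);
           MemoryContext oldcxt = MemoryContextSwitchTo(plancxt);

           /* ... so every palloc() done by copyObject() lands in it */
           Plan       *saved = (Plan *) copyObject(plan);

           MemoryContextSwitchTo(oldcxt);

           /* a real version would also remember plancxt: later, one
            * MemoryContextDelete(plancxt) frees the whole plan tree */
           return saved;
       }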

   Postmaster
   ~~~~~~~~~~
   The query cache memory is initialized during postmaster startup. The 
   size of the query cache pool is set via the '-Z number-of-blocks' 
   switch --- the default is 100 blocks, where 1 block = 1024 bytes; this 
   is sufficient for 20-30 cached plans. One query needs somewhere around 
   3-10 blocks; for example, a query like

PREPARE sel AS SELECT * FROM pg_class;

   needs 10Kb, because the table pg_class has very many columns. 
 
   Note: for development I added an SQL function, SELECT qcache_state();,
         which shows the usage of the qCache.

 SPI
 ~~~
I slightly rewrote the SPI save-plan method and removed the
TopMemoryContext feeding.

Standard SPI:

SPI_saveplan() - save each plan to a separate standard memory context.

SPI_freeplan() - free a plan.

By-key SPI:

This is the SPI interface for the query cache; it allows saving plans to
the SHARED or LOCAL cache 'by' an arbitrary key (string or binary).
Routines:

SPI_saveplan_bykey()    - save a plan to the query cache

SPI_freeplan_bykey()    - remove a plan from the query cache

SPI_fetchplan_bykey()   - fetch a plan saved in the query cache

SPI_execp_bykey()       - execute (via SPI) a plan saved in the query cache

Re: [HACKERS] 7.3 schedule

2002-04-12 Thread Hannu Krosing

On Thu, 2002-04-11 at 22:48, Tom Lane wrote:
 Barry Lind [EMAIL PROTECTED] writes:
  ...
  Since we 
  don't currently provide any information to the user on the relative cost 
  of the parse, plan and execute phases, the end user is going to be 
  guessing IMHO.
 
 You can in fact get that information fairly easily; set 
 show_parser_stats, show_planner_stats, and show_executor_stats to 1
 and then look in the postmaster log for the results.

One thing that seems to be missing is backend ids for query stats - if I
set 

log_timestamp = true
log_pid = true

then I get pid for query but _not_ for stats

If I have many long-running queries then it is impossible to know which
stats are for which query ;(


Hannu






Re: [HACKERS] 7.3 schedule

2002-04-12 Thread Barry Lind


Neil Conway wrote:
 I would suggest using it any time you're executing the same query
 plan a large number of times. In my experience, this is very common.
 There are already hooks for this in many client interfaces: e.g.
 PreparedStatement in JDBC and $dbh->prepare() in Perl DBI.

I'm not sure that JDBC would use this feature directly.  When a 
PreparedStatement is created in JDBC there is nothing that indicates 
how many times this statement is going to be used.  Many (most IMHO) 
will be used only once.  As I stated previously, this feature is only 
useful if you are going to end up using the PreparedStatement multiple 
times.  If it is only used once, it will actually perform worse than 
without the feature (since you need to issue two sql statements to the 
backend to accomplish what you were doing in one before).

Thus if someone wanted to use this functionality from jdbc they would 
need to do it manually, i.e. issue the prepare and execute statements 
manually instead of the jdbc driver doing it automatically for them.

thanks,
--Barry

PS.  I actually do believe that the proposed functionality is good and 
should be added (even though it may sound from the tone of my emails in 
this thread that that isn't the case :-)  I just want to make sure that 
everyone understands that this doesn't solve the whole problem.  And 
that more work needs to be done either in 7.3 or some future release. 
My fear is that everyone will view this work as being good enough such 
that the rest of the issues won't be addressed anytime soon.  I only 
wish I was able to work on some of this myself, but I don't have the 
skills to hack on the backend too much.  (However if someone really 
wanted a new feature in the jdbc driver in exchange, I'd be more than 
happy to help)





Re: [HACKERS] 7.3 schedule

2002-04-12 Thread Bruce Momjian

Tom Lane wrote:
 Karel Zak [EMAIL PROTECTED] writes:
  * The patch allows query plans to be stored in shared memory, and it is
    possible to EXECUTE them from other backends (on the same DB); plans
    are persistent across connections. For this feature I created a special 
    memory context subsystem (like the current aset.c, but it works with 
    IPC shared memory).
    This is maybe too complex a solution, and (maybe) caching queries in 
    one backend only is sufficient. I know there is skepticism about this
    shared memory solution (Tom?). 
 
 Yes, that is the part that was my sticking point last time around.
 (1) Because shared memory cannot be extended on-the-fly, I think it is
 a very bad idea to put data structures in there without some well
 thought out way of predicting/limiting their size.  (2) How the heck do
 you get rid of obsoleted cached plans, if the things stick around in
 shared memory even after you start a new backend?  (3) A shared cache
 requires locking; contention among multiple backends to access that
 shared resource could negate whatever performance benefit you might hope
 to realize from it.
 
 A per-backend cache kept in local memory avoids all of these problems,
 and I have seen no numbers to make me think that a shared plan cache
 would achieve significantly more performance benefit than a local one.

Certainly a shared cache would be good for apps that connect to issue a
single query frequently.  In such cases, there would be no local cache
to use.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026




Re: [HACKERS] 7.3 schedule

2002-04-12 Thread Barry Lind



Tom Lane wrote:
 Yes, that is the part that was my sticking point last time around.
 (1) Because shared memory cannot be extended on-the-fly, I think it is
 a very bad idea to put data structures in there without some well
 thought out way of predicting/limiting their size.  (2) How the heck do
 you get rid of obsoleted cached plans, if the things stick around in
 shared memory even after you start a new backend?  (3) A shared cache
 requires locking; contention among multiple backends to access that
 shared resource could negate whatever performance benefit you might hope
 to realize from it.
 
 A per-backend cache kept in local memory avoids all of these problems,
 and I have seen no numbers to make me think that a shared plan cache
 would achieve significantly more performance benefit than a local one.
 

Oracle's implementation is a shared cache for all plans.  This was 
introduced in Oracle 6 or 7 (I don't remember which anymore).  The net 
effect was that in general there was a significant performance 
improvement with the shared cache.  However, poorly written apps can now 
bring the Oracle database to its knees because of the locking issues 
associated with the shared cache.  For example, if the most frequently 
run sql statements are coded poorly (i.e. they don't use bind variables, 
e.g. 'select bar from foo where foobar = $1' vs. 'select bar from foo 
where foobar = ' || somevalue (where somevalue is likely to be 
different on every call)), the shared cache doesn't help and its overhead 
becomes significant.

thanks,
--Barry






Re: [HACKERS] 7.3 schedule

2002-04-12 Thread Bruce Momjian

Barry Lind wrote:
 Oracle's implementation is a shared cache for all plans.  This was 
 introduced in Oracle 6 or 7 (I don't remember which anymore).  The net 
 effect was that in general there was a significant performance 
 improvement with the shared cache.  However, poorly written apps can now 
 bring the Oracle database to its knees because of the locking issues 
 associated with the shared cache.  For example, if the most frequently 
 run sql statements are coded poorly (i.e. they don't use bind variables, 
 e.g. 'select bar from foo where foobar = $1' vs. 'select bar from foo 
 where foobar = ' || somevalue (where somevalue is likely to be 
 different on every call)), the shared cache doesn't help and its overhead 
 becomes significant.

This is very interesting.  We have always been concerned that shared
cache invalidation could cause more of a performance problem than the
shared cache gives benefit, and it sounds like you are saying exactly
that.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026




Re: [HACKERS] 7.3 schedule

2002-04-12 Thread Tom Lane

Bruce Momjian [EMAIL PROTECTED] writes:
 Certainly a shared cache would be good for apps that connect to issue a
 single query frequently.  In such cases, there would be no local cache
 to use.

We have enough other problems with the single-query-per-connection
scenario that I see no reason to believe that a shared plan cache will
help materially.  The correct answer for those folks will *always* be
to find a way to reuse the connection.

regards, tom lane




Re: [HACKERS] 7.3 schedule

2002-04-12 Thread Bruce Momjian

Neil Conway wrote:
 On Fri, 12 Apr 2002 12:21:04 -0400 (EDT)
 Bruce Momjian [EMAIL PROTECTED] wrote:
  Tom Lane wrote:
   A per-backend cache kept in local memory avoids all of these problems,
   and I have seen no numbers to make me think that a shared plan cache
   would achieve significantly more performance benefit than a local one.
  
  Certainly a shared cache would be good for apps that connect to issue a
  single query frequently.  In such cases, there would be no local cache
  to use.
 
 One problem with this kind of scenario is: what to do if the plan no
 longer exists for some reason? (e.g. the code that was supposed to be
 PREPARE-ing your statements failed to execute properly, or the cached
 plan has been evicted from shared memory, or the database was restarted,
 etc.) -- EXECUTE in and of itself won't have enough information to do
 anything useful. We could perhaps provide a means for an application
 to test for the existence of a cached plan (in which case the
 application developer will need to add logic to their application
 to re-prepare the query if necessary, which could get complicated).

Oh, are you thinking that one backend would do the PREPARE and another
one the EXECUTE?  I can't see that working at all.  I thought there
would be some way to quickly test whether the submitted query was in the
cache, but maybe that is too much of a performance penalty to be worth it.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026




Re: [HACKERS] 7.3 schedule

2002-04-12 Thread Tom Lane

Bruce Momjian [EMAIL PROTECTED] writes:
 Oh, are you thinking that one backend would do the PREPARE and another
 one the EXECUTE?  I can't see that working at all.

Uh, why exactly were you advocating a shared cache then?  Wouldn't that
be exactly the *point* of a shared cache?

regards, tom lane




Re: [HACKERS] 7.3 schedule

2002-04-12 Thread Bruce Momjian

Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  Oh, are you thinking that one backend would do the PREPARE and another
  one the EXECUTE?  I can't see that working at all.
 
 Uh, why exactly were you advocating a shared cache then?  Wouldn't that
 be exactly the *point* of a shared cache?

I thought it would somehow compare the SQL query string to the cached
plans and if it matched, it would use that plan rather than make a new
one.  Any DDL statement would flush the cache.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026




Scanner performance (was Re: [HACKERS] 7.3 schedule)

2002-04-12 Thread Peter Eisentraut

Tom Lane writes:

 We do have some numbers suggesting that the per-character loop in the
 lexer is slow enough to be a problem with very long literals.  That is
 the overhead that might be avoided with a special protocol.

Which loop is that?  Doesn't the scanner use buffered input anyway?

 However, it should be noted that (AFAIK) no one has spent any effort at
 all on trying to make the lexer go faster.  There is quite a bit of
 material in the flex documentation about performance considerations ---
 someone should take a look at it and see if we can get any wins by being
 smarter, without having to introduce protocol changes.

My profiles show that the work spent in the scanner is really minuscule
compared to everything else.

The data appears to support a suspicion I've had since many moons ago:
that the binary search for the key words takes quite a bit of time:

                 0.22    0.06    66748/66748       yylex [125]
[129]    0.4     0.22    0.06    66748             base_yylex [129]
                 0.01    0.02     9191/9191        yy_get_next_buffer [495]
                 0.02    0.00    32808/34053       ScanKeywordLookup [579]
                 0.00    0.01    16130/77100       MemoryContextStrdup [370]
                 0.00    0.00     4000/4000        scanstr [1057]
                 0.00    0.00     4637/4637        yy_get_previous_state [2158]
                 0.00    0.00     4554/4554        base_yyrestart [2162]
                 0.00    0.00     4554/4554        yywrap [2163]
                 0.00    0.00        1/1           base_yy_create_buffer [2852]
                 0.00    0.00        1/13695       base_yy_load_buffer_state [2107]

A while ago I experimented with hash functions for the keyword lookup
and got a speedup of a factor of 2.5, but again, this is really minor in
the overall scheme of things.
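
For reference, the direction of that experiment was roughly to replace
the binary search with a hash lookup (a sketch; the hash function and
table here are illustrative, not the code I actually used):

    /* keyword table entry, as in the parser's keywords.c */
    typedef struct ScanKeyword
    {
        const char *name;       /* keyword, in lower case */
        int         value;      /* grammar token number */
    } ScanKeyword;

    #define KEYWORD_HASH_SIZE 1024  /* power of two, well above keyword count */

    static const ScanKeyword *keyword_hash[KEYWORD_HASH_SIZE];  /* prebuilt */

    static unsigned int
    keyword_hash_func(const char *text)
    {
        unsigned int h = 0;

        while (*text)
            h = h * 31 + (unsigned char) *text++;
        return h & (KEYWORD_HASH_SIZE - 1);
    }

    /* one hash plus one strcmp(), versus ~8-9 strcmp()s for a binary
     * search over a few hundred keywords; a real version needs collision
     * handling (chaining, or a collision-free table built offline) */
    static const ScanKeyword *
    ScanKeywordLookupHash(const char *text)
    {
        const ScanKeyword *kw = keyword_hash[keyword_hash_func(text)];

        if (kw != NULL && strcmp(kw->name, text) == 0)
            return kw;
        return NULL;            /* not a keyword */
    }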

(The profile data is from a run of all the regression test files in order
in one session.)

-- 
Peter Eisentraut   [EMAIL PROTECTED]





Re: [HACKERS] 7.3 schedule

2002-04-11 Thread Ashley Cambrell

Christopher Kings-Lynne wrote:

Is anyone feeling we have the 7.3 release nearing?


No way!

I certainly am not. 
I can imagine us going for several more months like this, perhaps
through August.


Easily.  I think that the critical path is Tom's schema support.

We'll need a good beta period this time, because of:

* Schemas
* Prepare/Execute maybe

What are the chances that the BE/FE will be altered to take advantage of 
prepare / execute? Or is it something that will never happen?


* Domains

Chris

Ashley Cambrell





Re: [HACKERS] 7.3 schedule

2002-04-11 Thread Marc G. Fournier

On Thu, 11 Apr 2002, Bruce Momjian wrote:

 Is anyone feeling we have the 7.3 release nearing?  I certainly am not.
 I can imagine us going for several more months like this, perhaps
 through August.

seeing as how we just released v7.2, I don't see a v7.3 even going beta
until the end of summer ... I personally consider July/August to be
relatively dead months, since there's too much turnover of ppl going on
holidays with their kids ... right now, I'm kinda seeing the Sept
1st/Labour Day Weekend timeframe for going beta ...





Re: [HACKERS] 7.3 schedule

2002-04-11 Thread Tom Lane

Barry Lind [EMAIL PROTECTED] writes:
 ...
 Since we 
 don't currently provide any information to the user on the relative cost 
 of the parse, plan and execute phases, the end user is going to be 
 guessing IMHO.

You can in fact get that information fairly easily; set 
show_parser_stats, show_planner_stats, and show_executor_stats to 1
and then look in the postmaster log for the results.  (Although to be
fair, this does not provide any accounting for the CPU time expended
simply to *receive* the query string, which might be non-negligible
for huge queries.)

It would be interesting to see some stats for the large-BLOB scenarios
being debated here.  You could get more support for the position that
something should be done if you had numbers to back it up.

regards, tom lane




Re: [HACKERS] 7.3 schedule

2002-04-11 Thread Barry Lind


Tom Lane wrote:

  It would be interesting to see some stats for the large-BLOB scenarios
  being debated here.  You could get more support for the position that
  something should be done if you had numbers to back it up.

Below are some stats you did a few months ago when I was asking a 
related question.  Your summary was: "Bottom line: feeding huge strings 
through the lexer is slow."

--Barry

Tom Lane wrote:

  Barry Lind [EMAIL PROTECTED] writes:
 
 In looking at some performance issues (I was trying to look at the 
 overhead of toast) I found that large insert statements were very slow.
  ...
...
I got around to reproducing this today,
and what I find is that the majority of the backend time is going into
simple scanning of the input statement:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds    seconds     calls  ms/call  ms/call  name
 31.24     11.90    11.90                             _mcount
 19.51     19.33     7.43     10097     0.74     1.06  base_yylex
  7.48     22.18     2.85  21953666     0.00     0.00  appendStringInfoChar
  5.88     24.42     2.24       776     2.89     2.89  pglz_compress
  4.36     26.08     1.66  21954441     0.00     0.00  pq_getbyte
  3.57     27.44     1.36   7852141     0.00     0.00  addlit
  3.26     28.68     1.24      1552     0.80     0.81  scanstr
  2.84     29.76     1.08       779     1.39     7.18  pq_getstring
  2.31     30.64     0.88     10171     0.09     0.09  _doprnt
  2.26     31.50     0.86       776     1.11     1.11  byteain
  2.07     32.29     0.79                             msquadloop
  1.60     32.90     0.61   7931430     0.00     0.00  memcpy
  1.18     33.35     0.45                             chunks
  1.08     33.76     0.41     46160     0.01     0.01  strlen
  1.08     34.17     0.41                             encore
  1.05     34.57     0.40      8541     0.05     0.05  XLogInsert
  0.89     34.91     0.34                             appendStringInfo

60% of the call graph time is accounted for by these two areas:

index % time    self  children    called         name
                 7.43     3.32    10097/10097        yylex [14]
[13]     41.0    7.43     3.32    10097              base_yylex [13]
                 1.36     0.61  7852141/7852141      addlit [28]
                 1.24     0.01     1552/1552         scanstr [30]
                 0.02     0.03     3108/3108         ScanKeywordLookup [99]
                 0.00     0.02     2335/2335         yy_get_next_buffer [144]
                 0.02     0.00      776/781          strtol [155]
                 0.00     0.01      777/3920         MemoryContextStrdup [108]
                 0.00     0.00        1/1            base_yy_create_buffer [560]
                 0.00     0.00     4675/17091        isupper [617]
                 0.00     0.00     1556/1556         yy_get_previous_state [671]
                 0.00     0.00      779/779          yywrap [706]
                 0.00     0.00        1/2337         base_yy_load_buffer_state [654]
-----------------------------------------------
                 1.08     4.51      779/779          pq_getstr [17]
[18]     21.4    1.08     4.51      779              pq_getstring [18]
                 2.85     0.00 21953662/21953666     appendStringInfoChar [20]
                 1.66     0.00 21954441/21954441     pq_getbyte [29]
-----------------------------------------------

While we could probably do a little bit to speed up pg_getstring and its
children, it's not clear that we can do anything about yylex, which is
flex output code not handmade code, and is probably well-tuned already.

Bottom line: feeding huge strings through the lexer is slow.

 regards, tom lane




 It would be interesting to see some stats for the large-BLOB scenarios
 being debated here.  You could get more support for the position that
 something should be done if you had numbers to back it up.
 
   regards, tom lane
 






Re: [HACKERS] 7.3 schedule

2002-04-11 Thread Hiroshi Inoue
Tom Lane wrote:
 
 I'm not sure I believe Hannu's numbers, but in any case they're fairly
 irrelevant to the argument about whether a special protocol is useful.
 He wasn't testing textually-long queries, but rather the planning
 overhead, which is more or less independent of the length of any literal
 constants involved (especially if they're not part of the WHERE clause).
 Saving query plans via PREPARE seems quite sufficient, and appropriate,
 to tackle the planner-overhead issue.

Just a confirmation.
Someone is working on PREPARE/EXECUTE ?
What about Karel's work ?

regards,
Hiroshi Inoue



Re: [HACKERS] 7.3 schedule

2002-04-11 Thread Neil Conway

On Fri, 12 Apr 2002 12:58:01 +0900
Hiroshi Inoue [EMAIL PROTECTED] wrote:
 Tom Lane wrote:
  
  I'm not sure I believe Hannu's numbers, but in any case they're fairly
  irrelevant to the argument about whether a special protocol is useful.
  He wasn't testing textually-long queries, but rather the planning
  overhead, which is more or less independent of the length of any literal
  constants involved (especially if they're not part of the WHERE clause).
  Saving query plans via PREPARE seems quite sufficient, and appropriate,
  to tackle the planner-overhead issue.
 
 Just a confirmation.
 Someone is working on PREPARE/EXECUTE ?
 What about Karel's work ?

I am. My work is based on Karel's stuff -- at the moment I'm still
basically working on getting Karel's patch to play nicely with
current sources; once that's done I'll be addressing whatever
issues are stopping the code from getting into CVS.

Cheers,

Neil

-- 
Neil Conway [EMAIL PROTECTED]
PGP Key ID: DB3C29FC




Re: [HACKERS] 7.3 schedule

2002-04-11 Thread Tom Lane

Neil Conway [EMAIL PROTECTED] writes:
 On the other hand, there are already a few reasons to make some
 changes to the FE/BE protocol (NOTIFY messages, transaction state,
 and now possibly PREPARE/EXECUTE -- anything else?).

Passing EXECUTE parameters without having them go through the parser
could possibly be done without a protocol change: use the 'fast path'
function-call code to pass binary parameters to a function that is
otherwise equivalent to EXECUTE.

On the other hand, the 'fast path' protocol itself is pretty horribly
misdesigned, and I'm not sure I want to encourage more use of it until
we can get it cleaned up (see the comments in backend/tcop/fastpath.c).
Aside from lack of robustness, I'm not sure it can work at all for
functions that don't have prespecified types and numbers of parameters.

The FE/BE COPY protocol is also horrible.  So yeah, there are a bunch of
things we *could* fix if we were ready to take on a protocol change.

My own thought is this might be better held for 7.4, though.  We are
already going to be causing application programmers a lot of pain with
the schema changes and ensuing system-catalog revisions.  That might
be enough on their plates for this cycle.

In any case, for the moment I think it's fine to be working on
PREPARE/EXECUTE support at the SQL level.  We can worry about adding
a parser bypass for EXECUTE parameters later.

regards, tom lane




Re: [HACKERS] 7.3 schedule

2002-04-11 Thread bpalmer

 We'll need a good beta period this time, because of:

I know it's a sore subject,  but how about ALTER TABLE DROP COLUMN this
time around?  I've been hearing about it for years now.  :)

- brandon


 c: 646-456-5455h: 201-798-4983
 b. palmer,  [EMAIL PROTECTED]   pgp:crimelabs.net/bpalmer.pgp5





Re: [HACKERS] 7.3 schedule

2002-04-11 Thread Peter Eisentraut

Nicolas Bazin writes:

 For the next release and package it would be good to differentiate the
 release candidate to the proper release.

They do have different names.

-- 
Peter Eisentraut   [EMAIL PROTECTED]





Re: [HACKERS] 7.3 schedule

2002-04-11 Thread Neil Conway

On Thu, 11 Apr 2002 16:25:24 +1000
Ashley Cambrell [EMAIL PROTECTED] wrote:
 What are the chances that the BE/FE will be altered to take advantage of 
 prepare / execute? Or is it something that will never happen?

Is there a need for this? The current patch I'm working on just
does everything using SQL statements, which I don't think is
too bad (the typical client programmer won't actually need to
see them, their interface should wrap the PREPARE/EXECUTE stuff
for them).

On the other hand, there are already a few reasons to make some
changes to the FE/BE protocol (NOTIFY messages, transaction state,
and now possibly PREPARE/EXECUTE -- anything else?). IMHO, each of
these isn't worth changing the protocol by itself, but perhaps if
we can get all 3 in one swell foop it might be a good idea...

Cheers,

Neil

-- 
Neil Conway [EMAIL PROTECTED]
PGP Key ID: DB3C29FC




Re: [HACKERS] 7.3 schedule

2002-04-11 Thread Hannu Krosing

On Thu, 2002-04-11 at 18:14, Tom Lane wrote:
 Neil Conway [EMAIL PROTECTED] writes:
  On the other hand, there are already a few reasons to make some
  changes to the FE/BE protocol (NOTIFY messages, transaction state,
  and now possibly PREPARE/EXECUTE -- anything else?).
 
 Passing EXECUTE parameters without having them go through the parser
 could possibly be done without a protocol change: use the 'fast path'
 function-call code to pass binary parameters to a function that is
 otherwise equivalent to EXECUTE.
 
 On the other hand, the 'fast path' protocol itself is pretty horribly
 misdesigned, and I'm not sure I want to encourage more use of it until
 we can get it cleaned up (see the comments in backend/tcop/fastpath.c).
 Aside from lack of robustness, I'm not sure it can work at all for
 functions that don't have prespecified types and numbers of parameters.
 
 The FE/BE COPY protocol is also horrible.  So yeah, there are a bunch of
 things we *could* fix if we were ready to take on a protocol change.

Also _universal_ binary on-wire representation for types would be a good
thing. There already are slots in pg_type for functions to do that. By
doing so we could also avoid parsing text representations of field data.

 My own thought is this might be better held for 7.4, though.  We are
 already going to be causing application programmers a lot of pain with
 the schema changes and ensuing system-catalog revisions.  That might
 be enough on their plates for this cycle.
 
 In any case, for the moment I think it's fine to be working on
 PREPARE/EXECUTE support at the SQL level.  We can worry about adding
 a parser bypass for EXECUTE parameters later.

IIRC someone started work on modularising the network-related parts with
a goal of supporting DRDA (DB2 protocol) and others in future.

-
Hannu






Re: [HACKERS] 7.3 schedule

2002-04-11 Thread Barry Lind



Neil Conway wrote:
 On Thu, 11 Apr 2002 16:25:24 +1000
 Ashley Cambrell [EMAIL PROTECTED] wrote:
 
  What are the chances that the BE/FE will be altered to take advantage of
  prepare / execute? Or is it something that will never happen?
 
 
 Is there a need for this? The current patch I'm working on just
 does everything using SQL statements, which I don't think is
 too bad (the typical client programmer won't actually need to
 see them; their interface should wrap the PREPARE/EXECUTE stuff
 for them).
 

Yes, there is a need.

If you break up the query into roughly three stages of execution
(parse, plan, and execute), each of these can be the performance
bottleneck.  The parse stage can be the bottleneck when passing
large values as data to the parser (e.g. inserting one row containing a
100K value will result in a 100K+ sized statement that needs to be
parsed; parsing will take a long time, but the planning and execution
should be relatively short).  The planning stage can be a bottleneck for
complex queries.  And of course the execution stage can be a bottleneck
for all sorts of reasons (e.g. bad plans, missing indexes, bad
statistics, poorly written SQL, etc.).
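
As a sketch of that first case (table name and sizes invented for
illustration):

  -- the entire 100K literal must be scanned by the lexer and parser
  -- before planning or execution even begins:
  INSERT INTO documents (body) VALUES ('... 100,000 characters ...');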

So if you look at the three stages (parse, plan, execute), we have a lot
of tools, tips, and techniques for making the execute stage faster.  We
have some tools (at least on the server side, via SPI and plpgsql) to
help minimize planning costs by reusing plans.  But there isn't much to
help with the parsing cost of large values (actually the fastpath API
does help in this regard, but every time I mention it Tom responds that
the fastpath API should be avoided).

So when I look at the proposal for the prepare/execute stuff:
PREPARE plan AS query;
EXECUTE plan USING parameters;
DEALLOCATE plan;

Executing an SQL statement today is the following:
insert into table values (stuff);
which does one parse, one plan, one execute

under the new functionality:
prepare plan as insert into table values (stuff);
execute plan using stuff;
which does two parses, one plan, one execute

which obviously isn't a win unless you end up reusing the plan many 
times.  So let's look at the case of reusing the plan multiple times:
prepare plan as insert into table values (stuff);
execute plan using stuff;
execute plan using stuff;
...
which does n+1 parses, one plan, n executes

so this is a win if the cost of the planning stage is significant 
compared to the costs of the parse and execute stages.  If the cost of 
the plan is not significant, there is little if any benefit in doing this.
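
Concretely, the reuse case would look something like this (names
invented; each EXECUTE still costs a small parse of its own):

  PREPARE ins AS INSERT INTO mytable VALUES ($1);  -- one parse, one plan
  EXECUTE ins USING 'stuff';       -- cheap parse, no plan, one execute
  EXECUTE ins USING 'more stuff';
  EXECUTE ins USING 'still more';
  DEALLOCATE ins;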

I realize that there are situations where this functionality will be a 
big win.  But I question how the typical user of Postgres will know when 
they should use this functionality and when they shouldn't.  Since we 
don't currently provide any information to the user on the relative 
costs of the parse, plan, and execute phases, the end user is going to 
be guessing, IMHO.

What I think would be a clear win would be if we could get the above 
scenario of multiple inserts down to one parse, one plan, n executes, and 
n binds (where binding is simply the operation of plugging values into 
the statement without having to pipe the values through the parser). 
This would be a win in most if not all circumstances where the same 
statement is executed many times.

I think it would also be nice if the new EXPLAIN ANALYZE showed times 
for the parsing and planning stages in addition to the execution stage 
it currently shows, so there is more information for the end user on 
which approach to take.
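
Something like the following, where the parse/plan lines are purely
hypothetical output that EXPLAIN ANALYZE does not produce today:

  EXPLAIN ANALYZE SELECT * FROM mytable WHERE x = 42;
  -- hypothetical additional lines of output:
  --   Parse time:  0.31 ms
  --   Plan time:   1.24 ms
  -- (followed by the usual per-node execution timings)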

thanks,
--Barry

 On the other hand, there are already a few reasons to make some
 changes to the FE/BE protocol (NOTIFY messages, transaction state,
 and now possibly PREPARE/EXECUTE -- anything else?). IMHO, each of
 these isn't worth changing the protocol by itself, but perhaps if
 we can get all 3 in one swell foop it might be a good idea...
 
 Cheers,
 
 Neil
 






Re: [HACKERS] 7.3 schedule

2002-04-11 Thread Ashley Cambrell

Neil Conway wrote:

 On Thu, 11 Apr 2002 16:25:24 +1000
 Ashley Cambrell [EMAIL PROTECTED] wrote:

  What are the chances that the BE/FE will be altered to take advantage of
  prepare / execute? Or is it something that will never happen?

 Is there a need for this? The current patch I'm working on just
 does everything using SQL statements, which I don't think is
 too bad (the typical client programmer won't actually need to
 see them; their interface should wrap the PREPARE/EXECUTE stuff
 for them).

I remember an email Hannu sent (I originally thought Tom sent it but I
found the email*) that said PostgreSQL spends a lot of time parsing SQL
(compared to Oracle), so if the BE/FE and libpq were extended to support
pg_prepare / pg_bind, then it might make repetitive queries quicker.

 if we could save half of parse/optimise time by saving query plans, then
 the backend performance would go up from 1097 to 10^5/(91.1-16.2) = 1335
 updates/sec.
 
* Hannu's email doesn't seem to be in Google Groups, but it's titled
"Oracle vs PostgreSQL in real life" (2002-03-01). I can attach it if
people can't find it.



 On the other hand, there are already a few reasons to make some
 changes to the FE/BE protocol (NOTIFY messages, transaction state,
 and now possibly PREPARE/EXECUTE -- anything else?). IMHO, each of
 these isn't worth changing the protocol by itself, but perhaps if
 we can get all 3 in one swell foop it might be a good idea...

Passing on a possible 1/3 speed improvement doesn't sound like a bad
thing.. :-)  

Hannu: You mentioned that you already had an experimental patch that did
it?  Was that the same sort of thing as Neil's patch (SPI), or did it
include a libpq patch as well?


 Cheers,

 Neil

Ashley Cambrell





Re: [HACKERS] 7.3 schedule

2002-04-11 Thread Neil Conway

On Thu, 11 Apr 2002 11:38:33 -0700
Barry Lind [EMAIL PROTECTED] wrote:
 Neil Conway wrote:
  On Thu, 11 Apr 2002 16:25:24 +1000
  Ashley Cambrell [EMAIL PROTECTED] wrote:
  
 What are the chances that the BE/FE will be altered to take advantage of 
 prepare / execute? Or is it something that will never happen?
  
  Is there a need for this? The current patch I'm working on just
  does everything using SQL statements, which I don't think is
  too bad (the typical client programmer won't actually need to
  see them; their interface should wrap the PREPARE/EXECUTE stuff
  for them).
 
 Yes, there is a need.

Right -- I would agree that such functionality would be nice to have.
What I meant was is there a need for this in order to implement
PREPARE/EXECUTE? IMHO, no -- the two features are largely
orthogonal.

 If you break up the query into roughly three stages of execution
 (parse, plan, and execute), each of these can be the performance 
 bottleneck.  The parse stage can be the bottleneck when passing 
 large values as data to the parser (e.g. inserting one row containing a 
 100K value will result in a 100K+ sized statement that needs to be 
 parsed; parsing will take a long time, but the planning and execution 
 should be relatively short).

If you're inserting 100KB of data, I'd expect the time to insert
that into the table, update relevant indexes, etc. to be larger than
the time to parse the query (i.e. execution > parsing). But I
may well be wrong; I haven't done any benchmarks.
 
 Executing a sql statement today is the following:
 insert into table values (stuff);
 which does one parse, one plan, one execute

You're assuming that the cost of the parse step for the EXECUTE
statement is the same as the parse of the original query, which
will often not be the case (parsing the EXECUTE statement will
be cheaper).

 so this is a win if the cost of the planing stage is significant 
 compared to the costs of the parse and execute stages.  If the cost of 
 the plan is not significant there is little if any benefit in doing this.
 
 I realize that there are situations where this functionality will be a 
 big win.  But I question how the typical user of Postgres will know when 
 they should use this functionality and when they shouldn't.

I would suggest using it any time you're executing the same query
plan a large number of times. In my experience, this is very common.
There are already hooks for this in many client interfaces: e.g.
PreparedStatement in JDBC and $dbh->prepare() in Perl DBI.

 What I think would be a clear win would be if we could get the above 
 scenario of multiple inserts down to one parse, one plan, n executes, and 
 n binds

This behavior would be better, but I think the current solution is
still a clear win, and good enough for now. I'd prefer that we
worry about implementing PREPARE/EXECUTE for now, and deal with
query binding/BLOB parser shortcuts later, perhaps with an FE/BE
protocol change in 7.4 as Tom suggested.

Cheers,

Neil

-- 
Neil Conway [EMAIL PROTECTED]
PGP Key ID: DB3C29FC




Re: [HACKERS] 7.3 schedule

2002-04-11 Thread Brian Bruns

On 11 Apr 2002, Hannu Krosing wrote:

 IIRC someone started work on modularising the network-related parts with
 a goal of supporting DRDA (DB2 protocol) and others in future.

That was me, although I've been bogged down lately and haven't been able 
to get back to it.  DRDA, btw, is not just a DB2 protocol but an Open 
Group spec that will hopefully someday be *the* standard on-the-wire 
database protocol.  DRDA handles prepare/execute and is completely binary 
in representation, among other advantages.

Brian





Re: [HACKERS] 7.3 schedule

2002-04-11 Thread Tom Lane

Ashley Cambrell [EMAIL PROTECTED] writes:
 I remember an email Hannu sent (I originally thought Tom sent it but I
 found the email*) that said PostgreSQL spends a lot of time parsing SQL
 (compared to Oracle), so if the BE/FE and libpq were extended to support
 pg_prepare / pg_bind, then it might make repetitive queries quicker.

I'm not sure I believe Hannu's numbers, but in any case they're fairly
irrelevant to the argument about whether a special protocol is useful.
He wasn't testing textually-long queries, but rather the planning
overhead, which is more or less independent of the length of any literal
constants involved (especially if they're not part of the WHERE clause).
Saving query plans via PREPARE seems quite sufficient, and appropriate,
to tackle the planner-overhead issue.

We do have some numbers suggesting that the per-character loop in the
lexer is slow enough to be a problem with very long literals.  That is
the overhead that might be avoided with a special protocol.

However, it should be noted that (AFAIK) no one has spent any effort at
all on trying to make the lexer go faster.  There is quite a bit of
material in the flex documentation about performance considerations ---
someone should take a look at it and see if we can get any wins by being
smarter, without having to introduce protocol changes.

regards, tom lane




Re: [HACKERS] 7.3 schedule

2002-04-10 Thread Christopher Kings-Lynne

 Is anyone feeling we have the 7.3 release nearing?

No way!

 I certainly am not. 
 I can imagine us going for several more months like this, perhaps
 through August.

Easily.  I think that the critical path is Tom's schema support.

We'll need a good beta period this time, because of:

* Schemas
* Prepare/Execute maybe
* Domains

Chris





Re: [HACKERS] 7.3 schedule

2002-04-10 Thread Bruce Momjian

Christopher Kings-Lynne wrote:
  Is anyone feeling we have the 7.3 release nearing?
 
 No way!

Good.

  I certainly am not. 
  I can imagine us going for several more months like this, perhaps
  through August.
 
 Easily.  I think that the critical path is Tom's schema support.
 
 We'll need a good beta period this time, because of:
 
 * Schemas
 * Prepare/Execute maybe
 * Domains

I guess I am hoping for even more killer features for this release.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026
