Re: [HACKERS] 7.3 schedule
I have added these emails to TODO.detail/prepare. --- Karel Zak wrote:

On Fri, Apr 12, 2002 at 12:41:34AM -0400, Neil Conway wrote:

 On Fri, 12 Apr 2002 12:58:01 +0900 Hiroshi Inoue [EMAIL PROTECTED] wrote:

  Just a confirmation: is someone working on PREPARE/EXECUTE? What about Karel's work?

 Right question :-) I am. My work is based on Karel's stuff -- at the moment I'm still basically working on getting Karel's patch to play nicely with current sources; once that's done I'll be addressing whatever issues are stopping the code from getting into CVS.

My patch (qcache) for PostgreSQL 7.0 is available at ftp://ftp2.zf.jcu.cz/users/zakkr/pg/. I'm very much looking forward to Neil's work on this. Notes:

* It's an experimental patch, but usable; all the features mentioned below work.

* PREPARE/EXECUTE should not be only SQL statements; I think it's a good idea to create something common and robust for query-plan caching, because there are other users of it, for example SPI. The RI triggers are based on SPI_saveplan().

* My patch supports an EXECUTE INTO feature:

   PREPARE foo AS SELECT * FROM pg_class WHERE relname ~~ $1 USING text;
   EXECUTE foo USING 'pg%';                  -- standard select
   EXECUTE foo INTO TEMP newtab USING 'pg%'; -- select into

* The patch allows storing query plans in shared memory, making it possible to EXECUTE them from multiple backends (on the same DB), and plans are persistent across connections. For this feature I created a special memory context subsystem (like the current aset.c, but it works with IPC shared memory). This is maybe too complex a solution, and (maybe) caching queries in one backend only is sufficient. I know there is some disbelief about this shared memory solution (Tom?).

 Karel

My experimental patch README (excuse my English):

Implementation
~~~~~~~~~~~~~~

The qCache allows saving the queryTree and queryPlan. Two spaces are available for data caching:

 LOCAL - data are cached in backend non-shared memory and aren't available to other backends.
 SHARE - data are cached in shared memory and are visible to all backends.
Because the size of the shared memory pool is limited and is set at postmaster startup, the qCache must remove old plans when the pool is full. You can mark each entry as REMOVEABLE or NOTREMOVEABLE. A removeable entry is removed when the pool is full; a not-removeable entry must be removed via qCache_Remove() or the other routines --- the qCache does not remove such an entry itself.

All records in the qCache are cached (in a hash table) under some key. The qCache knows two kinds of key --- KEY_STRING and KEY_BINARY. The qCache API does not allow direct access to shared memory; all cached plans that the API returns are copied to CurrentMemoryContext. All qCache_* routines lock the shmem themselves (the exception is qCache_RemoveOldest_ShareRemoveAble()); a spin lock is used for the locking.

Memory management
~~~~~~~~~~~~~~~~~

For the qCache's shared pool, the qCache uses a memory context implementation independent of the standard aset/mcxt, but with a compatible API --- this allows use of the standard palloc() (very needful for basic plan-tree operations, for example copyObject()). The qCache memory management is very similar to the current aset.c code; it uses chunked blocks too, but the block is smaller - 1024b. The number of blocks can be set in the postmaster 'argv' via the option '-Z'.

For plan storage, a separate MemoryContext is used for each plan. This is a good idea (Hiroshi's?), because creating a new context is simple and inexpensive, and it allows a cached plan to be destroyed (freed) easily. This method is used in my SPI overhaul instead of feeding TopMemoryContext.

Postmaster
~~~~~~~~~~

The query cache memory is initialized during postmaster startup. The size of the query cache pool is set via the '-Z number-of-blocks' switch --- the default is 100 blocks, where 1 block = 1024b, which is sufficient for 20-30 cached plans. One query needs somewhere around 3-10 blocks; for example, a query like PREPARE sel AS SELECT * FROM pg_class; needs 10Kb, because the table pg_class has a great many columns.

Note: for development I added an SQL function, SELECT qcache_state();, which shows the usage of the qCache.
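The REMOVEABLE/NOTREMOVEABLE eviction rule described in the README can be sketched as a toy model. This is not the qCache's actual code; the pool layout, names, and sizes below are invented for illustration only.

```c
/* Toy sketch of the qCache eviction rules: a fixed-size pool whose
 * REMOVEABLE entries may be evicted when the pool is full, while
 * NOTREMOVEABLE entries survive until explicitly removed. */
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define POOL_SIZE 4

typedef enum { SLOT_FREE, REMOVEABLE, NOTREMOVEABLE } SlotState;

typedef struct
{
    SlotState state;
    char key[32];              /* KEY_STRING-style lookup key */
} CacheSlot;

static CacheSlot pool[POOL_SIZE];

/* Find the slot caching 'key', or -1 if it isn't cached. */
static int qcache_lookup(const char *key)
{
    for (int i = 0; i < POOL_SIZE; i++)
        if (pool[i].state != SLOT_FREE && strcmp(pool[i].key, key) == 0)
            return i;
    return -1;
}

/* Insert 'key'; when the pool is full, evict the first REMOVEABLE
 * entry.  Returns 0 on success, -1 if every slot is NOTREMOVEABLE
 * (those must be freed explicitly, as with qCache_Remove()). */
static int qcache_insert(const char *key, SlotState state)
{
    int victim = -1;

    for (int i = 0; i < POOL_SIZE; i++)
    {
        if (pool[i].state == SLOT_FREE) { victim = i; break; }
        if (victim < 0 && pool[i].state == REMOVEABLE)
            victim = i;        /* remember first evictable entry */
    }
    if (victim < 0)
        return -1;
    pool[victim].state = state;
    snprintf(pool[victim].key, sizeof pool[victim].key, "%s", key);
    return 0;
}
```

A caller that pins entries NOTREMOVEABLE (as the RI triggers might want) takes on the duty of removing them, or inserts eventually start failing.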
SPI
~~~

I slightly rewrote the SPI save-plan method and removed the TopMemoryContext feeding.

Standard SPI:

 SPI_saveplan() - save each plan to a separate standard memory context.
 SPI_freeplan() - free a plan.

By-key SPI: this is the SPI interface for the query cache; it allows saving plans to the SHARED or LOCAL cache under an arbitrary key (string or binary). Routines:

 SPI_saveplan_bykey()  - save a plan to the query cache
 SPI_freeplan_bykey()  - remove a plan from the query cache
 SPI_fetchplan_bykey() - fetch a plan saved in the query cache
 SPI_execp_bykey()     - execute (via SPI) a plan saved in the query cache
Re: Scanner performance (was Re: [HACKERS] 7.3 schedule)
Tom Lane writes:

 The regression tests contain no very-long literals. The results I was referring to concerned cases with string (BLOB) literals in the hundreds-of-K range; it seems that the per-character loop in the flex lexer starts to look like a bottleneck when you have tokens that are much larger than the rest of the query. Solutions seem to be either (a) make that loop quicker, or (b) find a way to avoid passing BLOBs through the lexer. I was merely suggesting that (a) should be investigated before we invest the work implied by (b).

I've done the following test: ten statements of the form

 SELECT 1 FROM tab1 WHERE val = '...';

where ... are literals of length 5 - 10 MB (some random base-64 encoded MP3 files). tab1 was empty. The test ran 3:40 min wall-clock time. Top ten calls:

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 36.95      9.87      9.87 74882482     0.00     0.00  pq_getbyte
 22.80     15.96      6.09       11   553.64  1450.93  pq_getstring
 13.55     19.58      3.62       11   329.09   329.10  scanstr
 12.09     22.81      3.23      110    29.36    86.00  base_yylex
  4.27     23.95      1.14       34    33.53    33.53  yy_get_previous_state
  3.86     24.98      1.03       22    46.82    46.83  textin
  3.67     25.96      0.98       34    28.82    28.82  myinput
  1.83     26.45      0.49       45    10.89    32.67  yy_get_next_buffer
  0.11     26.48      0.03     3027     0.01     0.01  AllocSetAlloc
  0.11     26.51      0.03      129     0.23     0.23  fmgr_isbuiltin

The string literals didn't contain any backslashes, so scanstr is operating in the best-case scenario here. But for arbitrary binary data we need some escape mechanism, so I don't see much room for improvement there. It seems the real bottleneck is the excessive abstraction in the communications layer. I haven't looked closely at all, but it would seem better if pq_getstring did not use pq_getbyte and instead read the buffer directly.

-- Peter Eisentraut [EMAIL PROTECTED]

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
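Peter's suggestion that pq_getstring read the buffer directly might look something like the following sketch. The buffer variables mimic libpq's naming, but this is illustrative code under invented assumptions, not the real implementation (which must also refill the buffer when a string spans a read).

```c
/* Read a NUL-terminated string out of the receive buffer with one
 * memchr() + one bulk copy, instead of a pq_getbyte() call per byte. */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static char PqRecvBuffer[8192];
static int  PqRecvPointer;     /* next byte to read */
static int  PqRecvLength;      /* end of valid data */

/* Copy the next NUL-terminated string into a malloc'd buffer.
 * Returns 0 on success, EOF if no terminator is buffered
 * (real code would refill the buffer and continue). */
static int pq_getstring_direct(char **dst)
{
    char   *start = PqRecvBuffer + PqRecvPointer;
    char   *nul = memchr(start, '\0', PqRecvLength - PqRecvPointer);
    size_t  len;

    if (nul == NULL)
        return EOF;
    len = (size_t) (nul - start);
    *dst = malloc(len + 1);
    memcpy(*dst, start, len + 1);   /* one bulk copy, no per-byte loop */
    PqRecvPointer += (int) (len + 1);
    return 0;
}
```

The per-character function-call overhead that dominates the profile above disappears; the cost becomes one library memchr/memcpy pair per string, which is what you want when the string is a multi-megabyte literal.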
Re: Scanner performance (was Re: [HACKERS] 7.3 schedule)
Peter Eisentraut wrote:

 The string literals didn't contain any backslashes, so scanstr is operating in the best-case scenario here. But for arbitrary binary data we need some escape mechanism, so I don't see much room for improvement there. It seems the real bottleneck is the excessive abstraction in the communications layer. I haven't looked closely at all, but it would seem better if pq_getstring did not use pq_getbyte and instead read the buffer directly.

I am inclined to agree with your analysis. We added abstraction to libpq because the old code was quite poorly structured. Now that it is well structured, removing some of the abstraction seems valuable. Any chance pq_getbyte could be made into a macro? I would be glad to send you a macro version for testing. I would have to push the while loop into pq_recvbuf() and change the while in pq_getbyte to an if, or, as a macro, a ?: expression.

-- 
 Bruce Momjian                   | http://candle.pha.pa.us
 [EMAIL PROTECTED]               | (610) 853-3000
 + If your life is a hard drive, | 830 Blythe Avenue
 + Christ can be your backup.    | Drexel Hill, Pennsylvania 19026
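A hypothetical macro version along the lines Bruce describes could look like this: the buffered-data case becomes an inline ?: test, and only the refill path calls a function (whose loop replaces the old while). The names mimic the backend's libpq, but this is a self-contained sketch with a stubbed-out "network", not the actual code.

```c
/* Inline the common case of pq_getbyte: one compare plus an array
 * fetch per byte; fall into pq_recvbuf() only when the buffer is dry. */
#include <assert.h>

static unsigned char PqRecvBuffer[8192];
static int PqRecvPointer;      /* next byte to read */
static int PqRecvLength;       /* end of valid data */

static int pq_recvbuf(void);

#define PQ_GETBYTE() \
    (PqRecvPointer < PqRecvLength ? \
     PqRecvBuffer[PqRecvPointer++] : pq_recvbuf())

/* Stub refill: deliver "xyz" once, then report EOF (-1).  A real
 * pq_recvbuf() would loop on recv() until data arrives. */
static int pq_recvbuf(void)
{
    static int delivered = 0;

    if (delivered++)
        return -1;
    PqRecvBuffer[0] = 'x';
    PqRecvBuffer[1] = 'y';
    PqRecvBuffer[2] = 'z';
    PqRecvPointer = 1;         /* first byte is consumed by this call */
    PqRecvLength = 3;
    return PqRecvBuffer[0];
}
```

With 74 million pq_getbyte calls in the profile above, shaving the function-call overhead off the common path is exactly the kind of micro-optimization being discussed.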
Re: Scanner performance (was Re: [HACKERS] 7.3 schedule)
Peter Eisentraut [EMAIL PROTECTED] writes:

 Top ten calls:

   %   cumulative   self              self     total
  time   seconds   seconds    calls  ms/call  ms/call  name
  36.95      9.87      9.87 74882482     0.00     0.00  pq_getbyte
  22.80     15.96      6.09       11   553.64  1450.93  pq_getstring
  13.55     19.58      3.62       11   329.09   329.10  scanstr
  12.09     22.81      3.23      110    29.36    86.00  base_yylex
   4.27     23.95      1.14       34    33.53    33.53  yy_get_previous_state
   3.86     24.98      1.03       22    46.82    46.83  textin
   3.67     25.96      0.98       34    28.82    28.82  myinput
   1.83     26.45      0.49       45    10.89    32.67  yy_get_next_buffer
   0.11     26.48      0.03     3027     0.01     0.01  AllocSetAlloc
   0.11     26.51      0.03      129     0.23     0.23  fmgr_isbuiltin

Interesting. This should be taken with a grain of salt, however: gprof's call-counting overhead is large enough to skew the results on many machines (ie, routines that are called many times tend to show more than their fair share of runtime). If your profiler does not show the counter subroutine (mcount or some similar name) separately, you should be very suspicious of where the overhead time is hidden.

For comparison you might want to check out some similar numbers I obtained a while back: http://archives.postgresql.org/pgsql-hackers/2001-12/msg00076.php (thanks to Barry Lind for reminding me about that ;-)). That test showed base_yylex/addlit/scanstr as costing about twice as much as pq_getstring/pq_getbyte. Probably the truth is somewhere in between your measurements and mine. In any case it does seem that some micro-optimization in the vicinity of the scanner's per-character costs, ie, pq_getbyte, addlit, etc. would be worth the trouble.

regards, tom lane
Re: [HACKERS] 7.3 schedule
On Fri, Apr 12, 2002 at 12:51:26PM -0400, Tom Lane wrote:

 Bruce Momjian [EMAIL PROTECTED] writes:
  Certainly a shared cache would be good for apps that connect to issue a single query frequently. In such cases, there would be no local cache to use.

 We have enough other problems with the single-query-per-connection scenario that I see no reason to believe that a shared plan cache will help materially. The correct answer for those folks will *always* be to find a way to reuse the connection.

My query cache was written for 7.0. If some future release uses pre-forked backends, where the backend stays alive after a client disconnects and waits for a new client, then the shared cache is (maybe :-) not needed. The current backend fork model is the killer of all possible caching, and we have more caches than this one. I hope persistent backends will help them all, and I'm sure that speed will grow with persistent backends and persistent caches, without any shared memory usage. There I can agree with Tom :-)

 Karel

-- 
 Karel Zak [EMAIL PROTECTED] http://home.zf.jcu.cz/~zakkr/
 C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz
Re: [HACKERS] 7.3 schedule
On 13 Apr 2002, Hannu Krosing wrote:

 On Fri, 2002-04-12 at 03:04, Brian Bruns wrote:
  On 11 Apr 2002, Hannu Krosing wrote:
   IIRC someone started work on modularising the network-related parts with a goal of supporting DRDA (DB2 protocol) and others in future.

  That was me, although I've been bogged down lately, and haven't been able to get back to it.

 Has any of your modularisation work got into CVS yet ?

No, Bruce didn't like the way I did certain things, and had some qualms about the value of supporting multiple wire protocols IIRC. Plus the patch was not really ready for primetime yet. I'm hoping to get back to it soon and sync it with the latest CVS, and clean up the odds and ends. DRDA, btw, is not just a DB2 protocol but an Open Group spec that hopefully will someday be *the* standard on-the-wire database protocol. DRDA handles prepare/execute and is completely binary in representation, among other advantages.

 What about extensibility - is there some predefined way of adding new types ?

Not really; there is some ongoing standards activity adding some new features. The list of supported types is pretty impressive, though -- anything in particular you are looking for?

 Also, does it handle NOTIFY ?

I don't know the answer to this. The spec is pretty huge, so it may, but I haven't seen it. Even if it is supported as a secondary protocol, I believe there is a lot of value in having a single database protocol standard (why else would I be doing it!). I'm also looking into what it will take to do the same for MySQL and Firebird. Hopefully they will be receptive to the idea as well.

Cheers, Brian
Re: [HACKERS] 7.3 schedule
On Thu, 11 Apr 2002, Barry Lind wrote:

 I'm not sure that JDBC would use this feature directly. When a PreparedStatement is created in JDBC there is nothing that indicates how many times this statement is going to be used. Many (most IMHO) will be used only once.

Well, the particular PreparedStatement instance may be used only once, yes. But it's quite likely that other, identical PreparedStatement objects would be used time and time again, so it's still good if you don't need to do much work on the second and subsequent preparations of that statement.

 If it is only used once, it will actually perform worse than without the feature (since you need to issue two SQL statements to the backend to accomplish what you were doing in one before).

I'm not sure that it would be much worse unless you need to wait for an acknowledgement from the back-end for the first statement. If you had a back-end command along the lines of "prepare this statement and execute it with these parameters", it would have pretty much the same performance as giving the statement directly with the parameters already substituted in, right?

 Thus if someone wanted to use this functionality from JDBC they would need to do it manually, i.e. issue the PREPARE and EXECUTE statements manually instead of the JDBC driver doing it automatically for them.

I'd say that this is awfully frequent, anyway. I use PreparedStatements for pretty much any non-constant input, because it's just not safe or portable to try to escape parameters yourself.

cjs
-- 
Curt Sampson [EMAIL PROTECTED] +81 90 7737 2974 http://www.netbsd.org
 Don't you know, in this new Dark Age, we're all light. --XTC
Re: [HACKERS] 7.3 schedule
Curt Sampson wrote:

 On Thu, 11 Apr 2002, Barry Lind wrote:
  I'm not sure that JDBC would use this feature directly. When a PreparedStatement is created in JDBC there is nothing that indicates how many times this statement is going to be used. Many (most IMHO) will be used only once.

 Well, the particular PreparedStatement instance may be used only once, yes. But it's quite likely that other, identical PreparedStatement objects would be used time and time again, so it's still good if you don't need to do much work on the second and subsequent preparations of that statement.

But since the syntax for prepare is:

 PREPARE name AS statement

you can't easily reuse SQL prepared by other PreparedStatement objects, since you don't know whether the SQL you are about to execute has already been prepared, or what name was used in that prepare. Thus you will always need to do a new PREPARE. (This is only true if the driver is trying to automatically use PREPARE/EXECUTE, which was the scenario I was talking about.)

  If it is only used once, it will actually perform worse than without the feature (since you need to issue two SQL statements to the backend to accomplish what you were doing in one before).

 I'm not sure that it would be much worse unless you need to wait for an acknowledgement from the back-end for the first statement. If you had a back-end command along the lines of "prepare this statement and execute it with these parameters", it would have pretty much the same performance as giving the statement directly with the parameters already substituted in, right?

I didn't say it would be much worse, but it won't be faster than not using PREPARE.

 Thus if someone wanted to use this functionality from JDBC they would need to do it manually, i.e. issue the PREPARE and EXECUTE statements manually instead of the JDBC driver doing it automatically for them.

 I'd say that this is awfully frequent, anyway.
 I use PreparedStatements for pretty much any non-constant input, because it's just not safe or portable to try to escape parameters yourself.

I agree this is useful, and you can write user code to take advantage of the functionality. I am just pointing out that I don't think the driver can use this capability automatically behind the scenes.

--Barry
Re: [HACKERS] 7.3 schedule
On Sun, 14 Apr 2002, Barry Lind wrote:

 But since the syntax for prepare is: PREPARE name AS statement, you can't easily reuse SQL prepared by other PreparedStatement objects, since you don't know whether the SQL you are about to execute has already been prepared, or what name was used in that prepare. Thus you will always need to do a new PREPARE. (This is only true if the driver is trying to automatically use PREPARE/EXECUTE, which was the scenario I was talking about.)

Well, there are some ugly tricks you could build into the driver to allow it to effectively use a PREPAREd statement with multiple, identical PreparedStatement objects (basically, via the driver caching various things and identifying PreparedStatements created with the same SQL), but it's messy enough, and has some problems hard enough to resolve, that I can't actually see this being practical. I was actually just wanting to point out that this is where automatic caching on the server shines.

  If it is only used once, it will actually perform worse

 I didn't say it would be much worse, but it won't be faster than not using PREPARE.

Well, if it's not faster, that's fine. If it's worse, that's not so fine, because as you point out there's really no way for the driver to know whether a PreparedStatement is being used just for speed (multiple queries with one instance) or for security (one query, but with parameters).

 I am just pointing out that I don't think the driver can use this capability automatically behind the scenes.

Well, if there's little or no performance impact, I would say that the driver should always use this capability with PreparedStatement objects. If there is a performance impact, perhaps a property could turn it on and off?

cjs
Re: Scanner performance (was Re: [HACKERS] 7.3 schedule)
Peter Eisentraut [EMAIL PROTECTED] writes:

 My profiles show that the work spent in the scanner is really minuscule compared to everything else.

Under ordinary circumstances I think that's true ...

 (The profile data is from a run of all the regression test files in order in one session.)

The regression tests contain no very-long literals. The results I was referring to concerned cases with string (BLOB) literals in the hundreds-of-K range; it seems that the per-character loop in the flex lexer starts to look like a bottleneck when you have tokens that are much larger than the rest of the query. Solutions seem to be either (a) make that loop quicker, or (b) find a way to avoid passing BLOBs through the lexer. I was merely suggesting that (a) should be investigated before we invest the work implied by (b).

regards, tom lane
Re: [HACKERS] 7.3 schedule
On Fri, 2002-04-12 at 03:04, Brian Bruns wrote:

 On 11 Apr 2002, Hannu Krosing wrote:
  IIRC someone started work on modularising the network-related parts with a goal of supporting DRDA (DB2 protocol) and others in future.

 That was me, although I've been bogged down lately, and haven't been able to get back to it.

Has any of your modularisation work got into CVS yet ?

 DRDA, btw, is not just a DB2 protocol but an Open Group spec that hopefully will someday be *the* standard on-the-wire database protocol. DRDA handles prepare/execute and is completely binary in representation, among other advantages.

What about extensibility - is there some predefined way of adding new types ? Also, does it handle NOTIFY ?

Hannu
Re: [HACKERS] 7.3 schedule
Christopher Kings-Lynne [EMAIL PROTECTED] writes:

  thought out way of predicting/limiting their size. (2) How the heck do you get rid of obsoleted cached plans, if the things stick around in shared memory even after you start a new backend? (3) A shared cache requires locking; contention among multiple backends to access that shared resource could negate whatever performance benefit you might hope to realize from it.

 I don't understand all these locking problems?

Searching the cache and inserting/deleting entries in the cache probably have to be mutually exclusive; concurrent insertions probably won't work either (at least not without a remarkably intelligent data structure). Unless the cache hit rate is remarkably high, there are going to be lots of insertions --- and, at steady state, an equal rate of deletions --- leading to lots of contention. This could possibly be avoided if the cache is not used for all query plans but only for explicitly PREPAREd plans, so that only explicit EXECUTEs would need to search it. But that approach also makes a sizable dent in the usefulness of the cache to begin with.

regards, tom lane
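The mutual-exclusion constraint Tom describes can be sketched minimally: every search and every insert into the shared cache takes the same lock, so a low hit rate (many insertions and evictions) serializes the backends on that one lock. All names below are invented for illustration; this is not PostgreSQL code.

```c
/* One mutex guards both lookup and insert/evict, modeling why a shared
 * plan cache with a low hit rate becomes a serialization point. */
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define NSLOTS 8

static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;
static char cache_keys[NSLOTS][64];
static int  cache_used;

/* Returns 1 on a cache hit.  On a miss, inserts the key (evicting the
 * oldest entry at steady state) and returns 0 -- search, insert, and
 * eviction all happen under the one mutex. */
static int cache_lookup_or_insert(const char *key)
{
    int hit = 0;

    pthread_mutex_lock(&cache_lock);
    for (int i = 0; i < cache_used; i++)
        if (strcmp(cache_keys[i], key) == 0) { hit = 1; break; }
    if (!hit)
    {
        if (cache_used == NSLOTS)          /* steady state: evict oldest */
        {
            memmove(cache_keys[0], cache_keys[1],
                    (NSLOTS - 1) * sizeof cache_keys[0]);
            cache_used--;
        }
        snprintf(cache_keys[cache_used], sizeof cache_keys[0], "%s", key);
        cache_used++;
    }
    pthread_mutex_unlock(&cache_lock);
    return hit;
}
```

Every miss holds the lock for the whole search-plus-insert, which is exactly the path taken most often when the hit rate is low; that is the contention Tom is worried about.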
Re: [HACKERS] 7.3 schedule
Karel Zak [EMAIL PROTECTED] writes:

 * The patch allows storing query plans in shared memory, making it possible to EXECUTE them from multiple backends (on the same DB), and plans are persistent across connections. For this feature I created a special memory context subsystem (like the current aset.c, but it works with IPC shared memory). This is maybe too complex a solution, and (maybe) caching queries in one backend only is sufficient. I know there is some disbelief about this shared memory solution (Tom?).

Yes, that is the part that was my sticking point last time around. (1) Because shared memory cannot be extended on-the-fly, I think it is a very bad idea to put data structures in there without some well thought out way of predicting/limiting their size. (2) How the heck do you get rid of obsoleted cached plans, if the things stick around in shared memory even after you start a new backend? (3) A shared cache requires locking; contention among multiple backends to access that shared resource could negate whatever performance benefit you might hope to realize from it.

A per-backend cache kept in local memory avoids all of these problems, and I have seen no numbers to make me think that a shared plan cache would achieve significantly more performance benefit than a local one.

regards, tom lane
Re: [HACKERS] 7.3 schedule
On Fri, 12 Apr 2002 12:21:04 -0400 (EDT) Bruce Momjian [EMAIL PROTECTED] wrote:

 Tom Lane wrote:
  A per-backend cache kept in local memory avoids all of these problems, and I have seen no numbers to make me think that a shared plan cache would achieve significantly more performance benefit than a local one.

 Certainly a shared cache would be good for apps that connect to issue a single query frequently. In such cases, there would be no local cache to use.

One problem with this kind of scenario is: what to do if the plan no longer exists for some reason? (e.g. the code that was supposed to be PREPARE-ing your statements failed to execute properly, or the cached plan has been evicted from shared memory, or the database was restarted, etc.) -- EXECUTE in and of itself won't have enough information to do anything useful. We could perhaps provide a means for an application to test for the existence of a cached plan (in which case the application developer will need to add logic to their application to re-prepare the query if necessary, which could get complicated).

Cheers, Neil

-- 
Neil Conway [EMAIL PROTECTED] PGP Key ID: DB3C29FC
Re: [HACKERS] 7.3 schedule
On Fri, Apr 12, 2002 at 12:41:34AM -0400, Neil Conway wrote:

 On Fri, 12 Apr 2002 12:58:01 +0900 Hiroshi Inoue [EMAIL PROTECTED] wrote:

  Just a confirmation: is someone working on PREPARE/EXECUTE? What about Karel's work?

 Right question :-) I am. My work is based on Karel's stuff -- at the moment I'm still basically working on getting Karel's patch to play nicely with current sources; once that's done I'll be addressing whatever issues are stopping the code from getting into CVS.

My patch (qcache) for PostgreSQL 7.0 is available at ftp://ftp2.zf.jcu.cz/users/zakkr/pg/. I'm very much looking forward to Neil's work on this. Notes:

* It's an experimental patch, but usable; all the features mentioned below work.

* PREPARE/EXECUTE should not be only SQL statements; I think it's a good idea to create something common and robust for query-plan caching, because there are other users of it, for example SPI. The RI triggers are based on SPI_saveplan().

* My patch supports an EXECUTE INTO feature:

   PREPARE foo AS SELECT * FROM pg_class WHERE relname ~~ $1 USING text;
   EXECUTE foo USING 'pg%';                  -- standard select
   EXECUTE foo INTO TEMP newtab USING 'pg%'; -- select into

* The patch allows storing query plans in shared memory, making it possible to EXECUTE them from multiple backends (on the same DB), and plans are persistent across connections. For this feature I created a special memory context subsystem (like the current aset.c, but it works with IPC shared memory). This is maybe too complex a solution, and (maybe) caching queries in one backend only is sufficient. I know there is some disbelief about this shared memory solution (Tom?).

 Karel

My experimental patch README (excuse my English):

Implementation
~~~~~~~~~~~~~~

The qCache allows saving the queryTree and queryPlan. Two spaces are available for data caching:

 LOCAL - data are cached in backend non-shared memory and aren't available to other backends.
 SHARE - data are cached in shared memory and are visible to all backends.

Because the size of the shared memory pool is limited and is set at postmaster startup, the qCache must remove old plans when the pool is full. You can mark each entry as REMOVEABLE or NOTREMOVEABLE. A removeable entry is removed when the pool is full; a not-removeable entry must be removed via qCache_Remove() or the other routines --- the qCache does not remove such an entry itself.

All records in the qCache are cached (in a hash table) under some key. The qCache knows two kinds of key --- KEY_STRING and KEY_BINARY. The qCache API does not allow direct access to shared memory; all cached plans that the API returns are copied to CurrentMemoryContext. All qCache_* routines lock the shmem themselves (the exception is qCache_RemoveOldest_ShareRemoveAble()); a spin lock is used for the locking.

Memory management
~~~~~~~~~~~~~~~~~

For the qCache's shared pool, the qCache uses a memory context implementation independent of the standard aset/mcxt, but with a compatible API --- this allows use of the standard palloc() (very needful for basic plan-tree operations, for example copyObject()). The qCache memory management is very similar to the current aset.c code; it uses chunked blocks too, but the block is smaller - 1024b. The number of blocks can be set in the postmaster 'argv' via the option '-Z'.

For plan storage, a separate MemoryContext is used for each plan. This is a good idea (Hiroshi's?), because creating a new context is simple and inexpensive, and it allows a cached plan to be destroyed (freed) easily. This method is used in my SPI overhaul instead of feeding TopMemoryContext.

Postmaster
~~~~~~~~~~

The query cache memory is initialized during postmaster startup. The size of the query cache pool is set via the '-Z number-of-blocks' switch --- the default is 100 blocks, where 1 block = 1024b, which is sufficient for 20-30 cached plans. One query needs somewhere around 3-10 blocks; for example, a query like PREPARE sel AS SELECT * FROM pg_class; needs 10Kb, because the table pg_class has a great many columns.

Note: for development I added an SQL function, SELECT qcache_state();, which shows the usage of the qCache.

SPI
~~~

I slightly rewrote the SPI save-plan method and removed the TopMemoryContext feeding.

Standard SPI:

 SPI_saveplan() - save each plan to a separate standard memory context.
 SPI_freeplan() - free a plan.

By-key SPI: this is the SPI interface for the query cache; it allows saving plans to the SHARED or LOCAL cache under an arbitrary key (string or binary). Routines:

 SPI_saveplan_bykey()  - save a plan to the query cache
 SPI_freeplan_bykey()  - remove a plan from the query cache
 SPI_fetchplan_bykey() - fetch a plan saved in the query cache
 SPI_execp_bykey()     - execute (via SPI) a plan saved in the query cache
Re: [HACKERS] 7.3 schedule
On Thu, 2002-04-11 at 22:48, Tom Lane wrote:

 Barry Lind [EMAIL PROTECTED] writes:
  ... Since we don't currently provide any information to the user on the relative cost of the parse, plan and execute phases, the end user is going to be guessing IMHO.

 You can in fact get that information fairly easily; set show_parser_stats, show_planner_stats, and show_executor_stats to 1 and then look in the postmaster log for the results.

One thing that seems to be missing is backend IDs for query stats - if I set

 log_timestamp = true
 log_pid = true

then I get the pid for the query but _not_ for the stats. If I have many long-running queries then it is impossible to know which stats are for which query ;(

Hannu
Re: [HACKERS] 7.3 schedule
Neil Conway wrote:

 I would suggest using it any time you're executing the same query plan a large number of times. In my experience, this is very common. There are already hooks for this in many client interfaces: e.g. PreparedStatement in JDBC and $dbh->prepare() in Perl DBI.

I'm not sure that JDBC would use this feature directly. When a PreparedStatement is created in JDBC there is nothing that indicates how many times this statement is going to be used. Many (most IMHO) will be used only once. As I stated previously, this feature is only useful if you are going to end up using the PreparedStatement multiple times. If it is only used once, it will actually perform worse than without the feature (since you need to issue two SQL statements to the backend to accomplish what you were doing in one before). Thus if someone wanted to use this functionality from JDBC they would need to do it manually, i.e. issue the PREPARE and EXECUTE statements manually instead of the JDBC driver doing it automatically for them.

thanks,
--Barry

PS. I actually do believe that the proposed functionality is good and should be added (even though it may sound from the tone of my emails in this thread that that isn't the case :-) I just want to make sure that everyone understands that this doesn't solve the whole problem, and that more work needs to be done either in 7.3 or some future release. My fear is that everyone will view this work as being good enough, such that the rest of the issues won't be addressed anytime soon. I only wish I were able to work on some of this myself, but I don't have the skills to hack on the backend too much. (However, if someone really wanted a new feature in the JDBC driver in exchange, I'd be more than happy to help.)
Re: [HACKERS] 7.3 schedule
Tom Lane wrote:

 Karel Zak [EMAIL PROTECTED] writes:
  * The patch allows storing query plans in shared memory, making it possible to EXECUTE them from multiple backends (on the same DB), and plans are persistent across connections. For this feature I created a special memory context subsystem (like the current aset.c, but it works with IPC shared memory). This is maybe too complex a solution, and (maybe) caching queries in one backend only is sufficient. I know there is some disbelief about this shared memory solution (Tom?).

 Yes, that is the part that was my sticking point last time around. (1) Because shared memory cannot be extended on-the-fly, I think it is a very bad idea to put data structures in there without some well thought out way of predicting/limiting their size. (2) How the heck do you get rid of obsoleted cached plans, if the things stick around in shared memory even after you start a new backend? (3) A shared cache requires locking; contention among multiple backends to access that shared resource could negate whatever performance benefit you might hope to realize from it.

 A per-backend cache kept in local memory avoids all of these problems, and I have seen no numbers to make me think that a shared plan cache would achieve significantly more performance benefit than a local one.

Certainly a shared cache would be good for apps that connect to issue a single query frequently. In such cases, there would be no local cache to use.
Re: [HACKERS] 7.3 schedule
Tom Lane wrote: Yes, that is the part that was my sticking point last time around. (1) Because shared memory cannot be extended on-the-fly, I think it is a very bad idea to put data structures in there without some well thought out way of predicting/limiting their size. (2) How the heck do you get rid of obsoleted cached plans, if the things stick around in shared memory even after you start a new backend? (3) A shared cache requires locking; contention among multiple backends to access that shared resource could negate whatever performance benefit you might hope to realize from it. A per-backend cache kept in local memory avoids all of these problems, and I have seen no numbers to make me think that a shared plan cache would achieve significantly more performance benefit than a local one.

Oracle's implementation is a shared cache for all plans. This was introduced in Oracle 6 or 7 (I don't remember which anymore). The net effect was that, in general, there was a significant performance improvement with the shared cache. However, poorly written apps can now bring the Oracle database to its knees because of the locking issues associated with the shared cache. For example, if the most frequently run SQL statements are coded poorly (i.e. they don't use bind variables, e.g. 'select bar from foo where foobar = $1' vs. 'select bar from foo where foobar = ' || somevalue (where somevalue is likely to be different on every call)), the shared cache doesn't help and its overhead becomes significant.

thanks,
--Barry
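Barry's bind-variable contrast can be written out concretely (a sketch only; the table and placeholder style are the illustrative ones from his message, and the literal values are made up):

```sql
-- Cache-friendly: the statement text is identical on every call,
-- so a shared plan cache can match it and reuse one plan.
SELECT bar FROM foo WHERE foobar = $1;

-- Cache-hostile: a fresh literal is concatenated in each time, so
-- every call presents a different statement text and is planned anew.
SELECT bar FROM foo WHERE foobar = 'value-1';
SELECT bar FROM foo WHERE foobar = 'value-2';
```

This is the usual argument for bind variables under a shared cache: the cache keys on the statement text, so ever-changing literals defeat it while paying its locking overhead.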
Re: [HACKERS] 7.3 schedule
Barry Lind wrote: Oracle's implementation is a shared cache for all plans. This was introduced in Oracle 6 or 7 (I don't remember which anymore). The net effect was that in general there was a significant performance improvement with the shared cache. However, poorly written apps can now bring the Oracle database to its knees because of the locking issues associated with the shared cache. For example, if the most frequently run SQL statements are coded poorly (i.e. they don't use bind variables, e.g. 'select bar from foo where foobar = $1' vs. 'select bar from foo where foobar = ' || somevalue (where somevalue is likely to be different on every call)), the shared cache doesn't help and its overhead becomes significant.

This is very interesting. We have always been concerned that shared cache invalidation could cause more of a performance problem than the shared cache gives benefit, and it sounds like you are saying exactly that.

-- Bruce Momjian
Re: [HACKERS] 7.3 schedule
Bruce Momjian [EMAIL PROTECTED] writes: Certainly a shared cache would be good for apps that connect to issue a single query frequently. In such cases, there would be no local cache to use.

We have enough other problems with the single-query-per-connection scenario that I see no reason to believe that a shared plan cache will help materially. The correct answer for those folks will *always* be to find a way to reuse the connection.

regards, tom lane
Re: [HACKERS] 7.3 schedule
Neil Conway wrote: On Fri, 12 Apr 2002 12:21:04 -0400 (EDT) Bruce Momjian [EMAIL PROTECTED] wrote: Tom Lane wrote: A per-backend cache kept in local memory avoids all of these problems, and I have seen no numbers to make me think that a shared plan cache would achieve significantly more performance benefit than a local one. Certainly a shared cache would be good for apps that connect to issue a single query frequently. In such cases, there would be no local cache to use.

One problem with this kind of scenario is: what to do if the plan no longer exists for some reason? (e.g. the code that was supposed to be PREPARE-ing your statements failed to execute properly, or the cached plan has been evicted from shared memory, or the database was restarted, etc.) -- EXECUTE in and of itself won't have enough information to do anything useful. We could perhaps provide a means for an application to test for the existence of a cached plan (in which case the application developer will need to add logic to their application to re-prepare the query if necessary, which could get complicated).

Oh, are you thinking that one backend would do the PREPARE and another one the EXECUTE? I can't see that working at all. I thought there would be some way to quickly test if the submitted query was in the cache, but maybe that is too much of a performance penalty to be worth it.

-- Bruce Momjian
Re: [HACKERS] 7.3 schedule
Bruce Momjian [EMAIL PROTECTED] writes: Oh, are you thinking that one backend would do the PREPARE and another one the EXECUTE? I can't see that working at all.

Uh, why exactly were you advocating a shared cache then? Wouldn't that be exactly the *point* of a shared cache?

regards, tom lane
Re: [HACKERS] 7.3 schedule
Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: Oh, are you thinking that one backend would do the PREPARE and another one the EXECUTE? I can't see that working at all. Uh, why exactly were you advocating a shared cache then? Wouldn't that be exactly the *point* of a shared cache?

I thought it would somehow compare the SQL query string to the cached plans and, if it matched, it would use that plan rather than make a new one. Any DDL statement would flush the cache.

-- Bruce Momjian
Scanner performance (was Re: [HACKERS] 7.3 schedule)
Tom Lane writes: We do have some numbers suggesting that the per-character loop in the lexer is slow enough to be a problem with very long literals. That is the overhead that might be avoided with a special protocol.

Which loop is that? Doesn't the scanner use buffered input anyway?

However, it should be noted that (AFAIK) no one has spent any effort at all on trying to make the lexer go faster. There is quite a bit of material in the flex documentation about performance considerations --- someone should take a look at it and see if we can get any wins by being smarter, without having to introduce protocol changes.

My profiles show that the work spent in the scanner is really minuscule compared to everything else. The data appears to support a suspicion that I've had many moons ago that the binary search for the key words takes quite a bit of time:

                0.22    0.06   66748/66748       yylex [125]
[129]    0.4    0.22    0.06   66748             base_yylex [129]
                0.01    0.02    9191/9191        yy_get_next_buffer [495]
                0.02    0.00   32808/34053       ScanKeywordLookup [579]
                0.00    0.01   16130/77100       MemoryContextStrdup [370]
                0.00    0.00    4000/4000        scanstr [1057]
                0.00    0.00    4637/4637        yy_get_previous_state [2158]
                0.00    0.00    4554/4554        base_yyrestart [2162]
                0.00    0.00    4554/4554        yywrap [2163]
                0.00    0.00       1/1           base_yy_create_buffer [2852]
                0.00    0.00       1/13695       base_yy_load_buffer_state [2107]

A while ago I experimented with hash functions for the key word lookup and got a speedup of a factor of 2.5, but again, this is really minor in the overall scheme of things. (The profile data is from a run of all the regression test files in order in one session.)

-- Peter Eisentraut [EMAIL PROTECTED]
Re: [HACKERS] 7.3 schedule
Christopher Kings-Lynne wrote: Is anyone feeling we have the 7.3 release nearing? No way! I certainly am not. I can imagine us going for several more months like this, perhaps through August. Easily. I think that the critical path is Tom's schema support. We'll need a good beta period this time, because of: * Schemas * Prepare/Execute maybe

What are the chances that the BE/FE protocol will be altered to take advantage of prepare / execute? Or is it something that will never happen?

* Domains Chris

Ashley Cambrell
Re: [HACKERS] 7.3 schedule
On Thu, 11 Apr 2002, Bruce Momjian wrote: Is anyone feeling we have the 7.3 release nearing? I certainly am not. I can imagine us going for several more months like this, perhaps through August.

Seeing as how we just released v7.2, I don't see a v7.3 even going beta until the end of summer ... I personally consider July/August to be relatively dead months, since there's too much turnover of ppl going on holidays with their kids ... right now, I'm kinda seeing the Sept 1st/Labour Day Weekend timeframe for going beta ...
Re: [HACKERS] 7.3 schedule
Barry Lind [EMAIL PROTECTED] writes: ... Since we don't currently provide any information to the user on the relative cost of the parse, plan and execute phases, the end user is going to be guessing IMHO.

You can in fact get that information fairly easily; set show_parser_stats, show_planner_stats, and show_executor_stats to 1 and then look in the postmaster log for the results. (Although, to be fair, this does not provide any accounting for the CPU time expended simply to *receive* the query string, which might be non-negligible for huge queries.)

It would be interesting to see some stats for the large-BLOB scenarios being debated here. You could get more support for the position that something should be done if you had numbers to back it up.

regards, tom lane
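Tom's suggestion can be spelled out as a session-level sketch (the three settings are the ones he names, in their 7.2-era spelling; the sample query is illustrative):

```sql
SET show_parser_stats = 1;
SET show_planner_stats = 1;
SET show_executor_stats = 1;

-- Run the statement of interest; the per-phase timing summaries are
-- then written to the postmaster log, not returned to the client.
SELECT * FROM pg_class WHERE relname LIKE 'pg%';
```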
Re: [HACKERS] 7.3 schedule
Tom Lane wrote: It would be interesting to see some stats for the large-BLOB scenarios being debated here. You could get more support for the position that something should be done if you had numbers to back it up.

Below are some stats you did a few months ago when I was asking a related question. Your summary was: Bottom line: feeding huge strings through the lexer is slow.

--Barry

Tom Lane wrote: Barry Lind [EMAIL PROTECTED] writes: In looking at some performance issues (I was trying to look at the overhead of toast) I found that large insert statements were very slow. ...

I got around to reproducing this today, and what I find is that the majority of the backend time is going into simple scanning of the input statement:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
31.24     11.90     11.90                             _mcount
19.51     19.33      7.43    10097     0.74     1.06  base_yylex
 7.48     22.18      2.85 21953666     0.00     0.00  appendStringInfoChar
 5.88     24.42      2.24      776     2.89     2.89  pglz_compress
 4.36     26.08      1.66 21954441     0.00     0.00  pq_getbyte
 3.57     27.44      1.36  7852141     0.00     0.00  addlit
 3.26     28.68      1.24     1552     0.80     0.81  scanstr
 2.84     29.76      1.08      779     1.39     7.18  pq_getstring
 2.31     30.64      0.88    10171     0.09     0.09  _doprnt
 2.26     31.50      0.86      776     1.11     1.11  byteain
 2.07     32.29      0.79                             msquadloop
 1.60     32.90      0.61  7931430     0.00     0.00  memcpy
 1.18     33.35      0.45                             chunks
 1.08     33.76      0.41    46160     0.01     0.01  strlen
 1.08     34.17      0.41                             encore
 1.05     34.57      0.40     8541     0.05     0.05  XLogInsert
 0.89     34.91      0.34                             appendStringInfo

60% of the call graph time is accounted for by these two areas:

index  % time    self  children    called     name
                 7.43    3.32   10097/10097       yylex [14]
[13]     41.0    7.43    3.32   10097             base_yylex [13]
                 1.36    0.61 7852141/7852141     addlit [28]
                 1.24    0.01    1552/1552        scanstr [30]
                 0.02    0.03    3108/3108        ScanKeywordLookup [99]
                 0.00    0.02    2335/2335        yy_get_next_buffer [144]
                 0.02    0.00     776/781         strtol [155]
                 0.00    0.01     777/3920        MemoryContextStrdup [108]
                 0.00    0.00       1/1           base_yy_create_buffer [560]
                 0.00    0.00    4675/17091       isupper [617]
                 0.00    0.00    1556/1556        yy_get_previous_state [671]
                 0.00    0.00     779/779         yywrap [706]
                 0.00    0.00       1/2337        base_yy_load_buffer_state [654]
-----------------------------------------------
                 1.08    4.51     779/779         pq_getstr [17]
[18]     21.4    1.08    4.51     779             pq_getstring [18]
                 2.85    0.00 21953662/21953666   appendStringInfoChar [20]
                 1.66    0.00 21954441/21954441   pq_getbyte [29]

While we could probably do a little bit to speed up pq_getstring and its children, it's not clear that we can do anything about yylex, which is flex output code, not handmade code, and is probably well-tuned already. Bottom line: feeding huge strings through the lexer is slow.

regards, tom lane
Re: [HACKERS] 7.3 schedule
Tom Lane wrote: I'm not sure I believe Hannu's numbers, but in any case they're fairly irrelevant to the argument about whether a special protocol is useful. He wasn't testing textually long queries, but rather the planning overhead, which is more or less independent of the length of any literal constants involved (especially if they're not part of the WHERE clause). Saving query plans via PREPARE seems quite sufficient, and appropriate, to tackle the planner-overhead issue.

Just a confirmation: is someone working on PREPARE/EXECUTE? What about Karel's work?

regards, Hiroshi Inoue
Re: [HACKERS] 7.3 schedule
On Fri, 12 Apr 2002 12:58:01 +0900 Hiroshi Inoue [EMAIL PROTECTED] wrote: Just a confirmation. Someone is working on PREPARE/EXECUTE ? What about Karel's work ?

I am. My work is based on Karel's stuff -- at the moment I'm still basically working on getting Karel's patch to play nicely with current sources; once that's done, I'll be addressing whatever issues are stopping the code from getting into CVS.

Cheers, Neil

--
Neil Conway [EMAIL PROTECTED]
PGP Key ID: DB3C29FC
Re: [HACKERS] 7.3 schedule
Neil Conway [EMAIL PROTECTED] writes: On the other hand, there are already a few reasons to make some changes to the FE/BE protocol (NOTIFY messages, transaction state, and now possibly PREPARE/EXECUTE -- anything else?).

Passing EXECUTE parameters without having them go through the parser could possibly be done without a protocol change: use the 'fast path' function-call code to pass binary parameters to a function that is otherwise equivalent to EXECUTE. On the other hand, the 'fast path' protocol itself is pretty horribly misdesigned, and I'm not sure I want to encourage more use of it until we can get it cleaned up (see the comments in backend/tcop/fastpath.c). Aside from the lack of robustness, I'm not sure it can work at all for functions that don't have prespecified types and numbers of parameters. The FE/BE COPY protocol is also horrible. So yeah, there are a bunch of things we *could* fix if we were ready to take on a protocol change.

My own thought is this might be better held for 7.4, though. We are already going to be causing application programmers a lot of pain with the schema changes and ensuing system-catalog revisions. That might be enough on their plates for this cycle.

In any case, for the moment I think it's fine to be working on PREPARE/EXECUTE support at the SQL level. We can worry about adding a parser bypass for EXECUTE parameters later.

regards, tom lane
Re: [HACKERS] 7.3 schedule
We'll need a good beta period this time, because of:

I know it's a sore subject, but how about ALTER TABLE DROP COLUMN this time around? I've been hearing about it for years now. :)

- brandon
c: 646-456-5455  h: 201-798-4983
b. palmer, [EMAIL PROTECTED]  pgp: crimelabs.net/bpalmer.pgp5
Re: [HACKERS] 7.3 schedule
Nicolas Bazin writes: For the next release and package it would be good to differentiate the release candidate from the proper release.

They do have different names.

-- Peter Eisentraut [EMAIL PROTECTED]
Re: [HACKERS] 7.3 schedule
On Thu, 11 Apr 2002 16:25:24 +1000 Ashley Cambrell [EMAIL PROTECTED] wrote: What are the chances that the BE/FE will be altered to take advantage of prepare / execute? Or is it something that will never happen?

Is there a need for this? The current patch I'm working on just does everything using SQL statements, which I don't think is too bad (the typical client programmer won't actually need to see them; their interface should wrap the PREPARE/EXECUTE stuff for them).

On the other hand, there are already a few reasons to make some changes to the FE/BE protocol (NOTIFY messages, transaction state, and now possibly PREPARE/EXECUTE -- anything else?). IMHO, each of these isn't worth changing the protocol by itself, but perhaps if we can get all 3 in one swell foop it might be a good idea...

Cheers, Neil
Re: [HACKERS] 7.3 schedule
On Thu, 2002-04-11 at 18:14, Tom Lane wrote: Neil Conway [EMAIL PROTECTED] writes: On the other hand, there are already a few reasons to make some changes to the FE/BE protocol (NOTIFY messages, transaction state, and now possibly PREPARE/EXECUTE -- anything else?). Passing EXECUTE parameters without having them go through the parser could possibly be done without a protocol change: use the 'fast path' function-call code to pass binary parameters to a function that is otherwise equivalent to EXECUTE. On the other hand, the 'fast path' protocol itself is pretty horribly misdesigned, and I'm not sure I want to encourage more use of it until we can get it cleaned up (see the comments in backend/tcop/fastpath.c). Aside from lack of robustness, I'm not sure it can work at all for functions that don't have prespecified types and numbers of parameters. The FE/BE COPY protocol is also horrible. So yeah, there are a bunch of things we *could* fix if we were ready to take on a protocol change.

Also, a _universal_ binary on-wire representation for types would be a good thing. There are already slots in pg_type for functions to do that. By doing so we could also avoid parsing text representations of field data.

My own thought is this might be better held for 7.4, though. We are already going to be causing application programmers a lot of pain with the schema changes and ensuing system-catalog revisions. That might be enough on their plates for this cycle. In any case, for the moment I think it's fine to be working on PREPARE/EXECUTE support at the SQL level. We can worry about adding a parser bypass for EXECUTE parameters later.

IIRC someone started work on modularising the network-related parts with a goal of supporting DRDA (the DB2 protocol) and others in future.

- Hannu
Re: [HACKERS] 7.3 schedule
Neil Conway wrote: On Thu, 11 Apr 2002 16:25:24 +1000 Ashley Cambrell [EMAIL PROTECTED] wrote: What are the chances that the BE/FE will be altered to take advantage of prepare / execute? Or is it something that will never happen? Is there a need for this? The current patch I'm working on just does everything using SQL statements, which I don't think is too bad (the typical client programmer won't actually need to see them, their interface should wrap the PREPARE/EXECUTE stuff for them).

Yes, there is a need. If you break the query up into roughly three stages of execution -- parse, plan, and execute -- each of these can be the performance bottleneck.

The parse can be the performance bottleneck when passing large values as data to the parser (e.g. inserting one row containing a 100K value will result in a 100K+ sized statement that needs to be parsed; parsing will take a long time, but the planning and execution should be relatively short). The planning stage can be a bottleneck for complex queries. And of course the execution stage can be a bottleneck for all sorts of reasons (e.g. bad plans, missing indexes, bad statistics, poorly written SQL, etc.).

So if you look at the three stages (parse, plan, execute), we have a lot of tools, tips, and techniques for making the execute faster. We have some tools (at least on the server side, via SPI and plpgsql) to help minimize the planning costs by reusing plans. But there doesn't exist much to help with the parsing cost of large values (actually the fastpath API does help in this regard, but every time I mention it Tom responds that the fastpath API should be avoided).
So when I look at the proposal for the prepare/execute stuff:

PREPARE plan AS query;
EXECUTE plan USING parameters;
DEALLOCATE plan;

Executing a SQL statement today is the following:

insert into table values (stuff);

which does one parse, one plan, one execute. Under the new functionality:

prepare plan as insert into table values (stuff);
execute plan using stuff;

which does two parses, one plan, one execute -- which obviously isn't a win unless you end up reusing the plan many times. So let's look at the case of reusing the plan multiple times:

prepare plan as insert into table values (stuff);
execute plan using stuff;
execute plan using stuff;
...

which does n+1 parses, one plan, n executes. So this is a win if the cost of the planning stage is significant compared to the costs of the parse and execute stages. If the cost of the plan is not significant, there is little if any benefit in doing this.

I realize that there are situations where this functionality will be a big win. But I question how the typical user of postgres will know when they should use this functionality and when they shouldn't. Since we don't currently provide any information to the user on the relative cost of the parse, plan and execute phases, the end user is going to be guessing IMHO.

What I think would be a clear win would be if we could get the above scenario of multiple inserts down to one parse, one plan, n executes, and n binds (where binding is simply the operation of plugging values into the statement without having to pipe the values through the parser). This would be a win in most if not all circumstances where the same statement is executed many times.

I think it would also be nice if the new explain analyze showed times for the parsing and planning stages in addition to the execution stage which it currently shows, so there is more information for the end user on what approach they should take.
thanks,
--Barry

On the other hand, there are already a few reasons to make some changes to the FE/BE protocol (NOTIFY messages, transaction state, and now possibly PREPARE/EXECUTE -- anything else?). IMHO, each of these isn't worth changing the protocol by itself, but perhaps if we can get all 3 in one swell foop it might be a good idea... Cheers, Neil
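Barry's reuse scenario can be sketched with the syntax proposed in this thread (the plan name, table, and values are illustrative; the trailing USING type list follows Karel's experimental patch and may differ from whatever finally lands in CVS):

```sql
-- Parsed and planned once:
PREPARE ins AS INSERT INTO foo VALUES ($1) USING text;

-- Each EXECUTE parses only this short statement and reuses the
-- saved plan: n+1 parses, 1 plan, n executes in total.
EXECUTE ins USING 'stuff 1';
EXECUTE ins USING 'stuff 2';
EXECUTE ins USING 'stuff 3';

DEALLOCATE ins;
```

The win depends on the planning cost dominating: the cheap per-call parse of `EXECUTE` replaces a full re-plan of the INSERT, which is exactly the trade-off Barry's parse/plan/execute accounting describes.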
Re: [HACKERS] 7.3 schedule
Neil Conway wrote: On Thu, 11 Apr 2002 16:25:24 +1000 Ashley Cambrell [EMAIL PROTECTED] wrote: What are the chances that the BE/FE will be altered to take advantage of prepare / execute? Or is it something that will never happen? Is there a need for this? The current patch I'm working on just does everything using SQL statements, which I don't think is too bad (the typical client programmer won't actually need to see them, their interface should wrap the PREPARE/EXECUTE stuff for them).

I remember an email Hannu sent (I originally thought Tom sent it, but I found the email*) that said postgresql spends a lot of time parsing sql (compared to oracle), so if the BE/FE and libpq were extended to support pg_prepare / pg_bind, then it might make repetitive queries quicker: if we could save half of parse/optimise time by saving query plans, then the backend performance would go up from 1097 to 100000/(91.1-16.2)=1335 updates/sec.

* Hannu's email doesn't seem to be in google groups, but it's titled Oracle vs PostgreSQL in real life (2002-03-01). I can attach it if people can't find it.

On the other hand, there are already a few reasons to make some changes to the FE/BE protocol (NOTIFY messages, transaction state, and now possibly PREPARE/EXECUTE -- anything else?). IMHO, each of these isn't worth changing the protocol by itself, but perhaps if we can get all 3 in one swell foop it might be a good idea...

Passing on a possible 1/3 speed improvement doesn't sound like a bad thing.. :-)

Hannu: You mentioned that you already had an experimental patch that did it? Was that the same sort of thing as Neil's patch (SPI), or did it include a libpq patch as well?

Ashley Cambrell
Re: [HACKERS] 7.3 schedule
On Thu, 11 Apr 2002 11:38:33 -0700 Barry Lind [EMAIL PROTECTED] wrote: Neil Conway wrote: On Thu, 11 Apr 2002 16:25:24 +1000 Ashley Cambrell [EMAIL PROTECTED] wrote: What are the chances that the BE/FE will be altered to take advantage of prepare / execute? Or is it something that will never happen? Is there a need for this? The current patch I'm working on just does everything using SQL statements, which I don't think is too bad (the typical client programmer won't actually need to see them, their interface should wrap the PREPARE/EXECUTE stuff for them). Yes there is a need.

Right -- I would agree that such functionality would be nice to have. What I meant was: is there a need for this in order to implement PREPARE/EXECUTE? IMHO, no -- the two features are largely orthogonal.

If you break up the query into roughly three stages of execution: parse, plan, and execute, each of these can be the performance bottleneck. The parse can be the performance bottleneck when passing large values as data to the parser (eg. inserting one row containing a 100K value will result in a 100K+ sized statement that needs to be parsed, parsing will take a long time, but the planning and execution should be relatively short).

If you're inserting 100KB of data, I'd expect the time to insert that into tables, update relevant indexes, etc. to be larger than the time to parse the query (i.e. execution > parsing). But I may well be wrong, I haven't done any benchmarks.

Executing a sql statement today is the following: insert into table values (stuff); which does one parse, one plan, one execute

You're assuming that the cost of the parse step for the EXECUTE statement is the same as the parse for the original query, which will often not be the case (parsing the EXECUTE statement will be cheaper).
If the cost of the plan is not significant there is little if any benefit in doing this. I realize that there are situations where this functionality will be a big win. But I question how the typical user of postgres will know when they should use this functionality and when they shouldn't.

I would suggest using it any time you're executing the same query plan a large number of times. In my experience, this is very common. There are already hooks for this in many client interfaces: e.g. PreparedStatement in JDBC and $dbh->prepare() in Perl DBI.

What I think would be a clear win would be if we could get the above scenario of multiple inserts down to one parse, one plan, n executes, and n binds

This behavior would be better, but I think the current solution is still a clear win, and good enough for now. I'd prefer that we worry about implementing PREPARE/EXECUTE for now, and deal with query binding/BLOB parser shortcuts later -- perhaps with an FE/BE protocol change in 7.4, as Tom suggested.

Cheers, Neil
Re: [HACKERS] 7.3 schedule
On 11 Apr 2002, Hannu Krosing wrote: IIRC someone started work on modularising the network-related parts with a goal of supporting DRDA (DB2 protocol) and others in future.

That was me, although I've been bogged down lately and haven't been able to get back to it. DRDA, btw, is not just a DB2 protocol but an Open Group spec that hopefully will someday be *the* standard on-the-wire database protocol. DRDA handles prepare/execute and is completely binary in representation, among other advantages.

Brian
Re: [HACKERS] 7.3 schedule
Ashley Cambrell [EMAIL PROTECTED] writes: I remember an email Hannu sent (I originally thought Tom sent it but I found the email*) that said postgresql spends a lot of time parsing sql (compared to oracle), so if the BE/FE and libpq were extended to support pg_prepare / pg_bind, then it might make repetitive queries quicker.

I'm not sure I believe Hannu's numbers, but in any case they're fairly irrelevant to the argument about whether a special protocol is useful. He wasn't testing textually long queries, but rather the planning overhead, which is more or less independent of the length of any literal constants involved (especially if they're not part of the WHERE clause). Saving query plans via PREPARE seems quite sufficient, and appropriate, to tackle the planner-overhead issue.

We do have some numbers suggesting that the per-character loop in the lexer is slow enough to be a problem with very long literals. That is the overhead that might be avoided with a special protocol. However, it should be noted that (AFAIK) no one has spent any effort at all on trying to make the lexer go faster. There is quite a bit of material in the flex documentation about performance considerations --- someone should take a look at it and see if we can get any wins by being smarter, without having to introduce protocol changes.

regards, tom lane
Re: [HACKERS] 7.3 schedule
Is anyone feeling we have the 7.3 release nearing?

No way! I certainly am not. I can imagine us going for several more months like this, perhaps through August.

Easily. I think that the critical path is Tom's schema support. We'll need a good beta period this time, because of:

* Schemas
* Prepare/Execute maybe
* Domains

Chris
Re: [HACKERS] 7.3 schedule
Christopher Kings-Lynne wrote: Is anyone feeling we have the 7.3 release nearing? No way!

Good.

I certainly am not. I can imagine us going for several more months like this, perhaps through August. Easily. I think that the critical path is Tom's schema support. We'll need a good beta period this time, because of: * Schemas * Prepare/Execute maybe * Domains

I guess I am hoping for even more killer features for this release.

-- Bruce Momjian