Re: [HACKERS] Odd out of memory problem.
On Thu, Mar 29, 2012 at 7:38 PM, Peter Eisentraut wrote: > On tis, 2012-03-27 at 00:53 +0100, Greg Stark wrote: >> Hm. So my original plan was dependent on adding the state-merge >> function we've talked about in the past. Not all aggregate functions >> necessarily can support such a function but I think all or nearly all >> the builtin aggregates can. Certainly min, max, count, sum, avg, >> stddev, array_agg can, which are most of what people do. That would be >> a function which can take two state variables and produce a new state >> variable. > > This information could also be useful to have in PL/Proxy (or similar > FDWs) to be able to integrate aggregate computation into the language. > Currently, you always have to do the state merging yourself.

I don't know exactly how PL/Proxy or pgpool accomplish multi-phase aggregation, but in theory the proposal above is a state-merge function, so it does not apply to general aggregate results that have already passed through the final function. Of course, for aggregates that have no final function it is fine to call the state-merge function on the results.

Thanks,

-- Hitoshi Harada

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
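To make the state-merge idea from the message above concrete, a merge step for avg() could look roughly like the sketch below. This is only an illustration of the concept; AvgState and avg_state_merge() are made-up names, not part of PostgreSQL's aggregate API as it exists today.

/* Hypothetical transition state for avg(float8): running sum and row count. */
typedef struct AvgState
{
    double      sum;
    long        count;
} AvgState;

/*
 * Hypothetical state-merge function: combines two partial states produced
 * on different nodes (or by different backends) into one, without going
 * through the final function.  The final function (sum / count) is applied
 * only once, on the fully merged state.
 */
static AvgState
avg_state_merge(AvgState a, AvgState b)
{
    AvgState    merged;

    merged.sum = a.sum + b.sum;
    merged.count = a.count + b.count;
    return merged;
}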
[HACKERS] measuring lwlock-related latency spikes
On Fri, Mar 30, 2012 at 1:55 PM, Robert Haas wrote: > Actually, what is really bugging me is that I cannot find any way of > getting a profile that reflects the *time* spent waiting rather than > merely the *number* of waits. That seems like an obvious thing to > want, and I cannot find a way to get it.

I ended up taking a stab at writing some userspace code for this - see attached. It measures the amount of time taken for each contended lwlock acquisition and prints a crude histogram of the results at postmaster-shutdown time. At least on the IBM POWER7 machine, running pgbench with 32 threads, where pg_test_timing shows that getting the time takes less than a microsecond 96%+ of the time, this seemed to have no real impact on the tps numbers - perhaps because the workload is I/O bound.

Risking the possible ire of people who object to large attachments, I'm attaching the results this generated on a 30-minute, 32-thread pgbench run at scale factor 300. To minimize said ire, I've run the file through bzip2.

What's interesting about this is that, while there is plenty of waiting for the usual suspects - ProcArrayLock (4), WALInsertLock (7), and CLogControlLock (11), the waits are all pretty short:

2012-03-31 02:33:25 UTC [50305] LOG: lock 0: 2:520, 3:
2012-03-31 02:33:25 UTC [50305] LOG: lock 3: 1:9, 2:36838
2012-03-31 02:33:25 UTC [50305] LOG: lock 4: 1:33, 2:216964
2012-03-31 02:33:25 UTC [50305] LOG: lock 7: 1:39, 2:406249
2012-03-31 02:33:25 UTC [50305] LOG: lock 8: 1:4, 2:34
2012-03-31 02:33:25 UTC [50305] LOG: lock 11: 1:99, 2:374559
2012-03-31 02:33:25 UTC [50305] LOG: lock 17: 2:24, 3:24

That's saying that there were over 400,000 contended acquisitions of WALInsertLock, but the longest one had fls(wait_time_in_us) = 2, or in other words it took less than 4us to get the lock. So what happens if we grep the log file for the biggest offenders?

2012-03-31 02:33:25 UTC [50305] LOG: lock 204610: 20:1
2012-03-31 02:33:25 UTC [50305] LOG: lock 272958: 23:1
2012-03-31 02:33:25 UTC [50305] LOG: lock 325412: 20:1
2012-03-31 02:33:25 UTC [50305] LOG: lock 325784: 21:1
2012-03-31 02:33:25 UTC [50305] LOG: lock 360016: 5:1, 21:1
2012-03-31 02:33:25 UTC [50305] LOG: lock 444886: 23:1
2012-03-31 02:33:25 UTC [50305] LOG: lock 499890: 20:1
2012-03-31 02:33:25 UTC [50305] LOG: lock 533418: 20:1
2012-03-31 02:33:25 UTC [50305] LOG: lock 610484: 6:1, 20:1
2012-03-31 02:33:25 UTC [50305] LOG: lock 897798: 22:1
2012-03-31 02:33:25 UTC [50305] LOG: lock 1027128: 7:1, 20:1
2012-03-31 02:33:25 UTC [50305] LOG: lock 1074256: 5:1, 21:1
2012-03-31 02:33:25 UTC [50305] LOG: lock 1132586: 5:1, 23:1
2012-03-31 02:33:25 UTC [50305] LOG: lock 1312178: 16:1, 22:1

If I'm reading this right, fls(wait_time_in_us) = 20 means a 1 second delay, which I think means that a couple of those waits were >= *8 seconds*. gdb reveals that in the test configuration, all odd numbered locks between 169 and 2097319 are some buffer's io_in_progress_lock, and all even numbered locks between 170 and 2097320 are some buffer's content_lock, which means, if I'm not confused here, that every single lwlock-related stall > 1s happened while waiting for a buffer content lock. Moreover, each event affected a different buffer. I find this result so surprising that I have a hard time believing that I haven't screwed something up, so if anybody can check over the patch and this analysis and suggest what that thing might be, I would appreciate it.
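For anyone who doesn't want to open the attachment, the gist of the instrumentation is roughly as sketched below. This is a simplified illustration based on the description above, not the patch itself; the helper names, the per-lock counter array, and the use of an fls()-style bucket index are assumptions.

#include <sys/time.h>

#define LWHIST_BUCKETS 32

/* portable stand-in for fls(): position of the highest set bit, 1-based */
static int
my_fls(unsigned long x)
{
    int         i = 0;

    while (x)
    {
        x >>= 1;
        i++;
    }
    return i;
}

/*
 * Called only when an lwlock could not be acquired immediately.
 * "counts" is this lock's array of LWHIST_BUCKETS counters; bucket i
 * counts waits whose duration in microseconds has highest set bit i.
 */
static void
lwhist_record_contended_wait(unsigned long *counts,
                             const struct timeval *start,
                             const struct timeval *end)
{
    long        us;

    us = (end->tv_sec - start->tv_sec) * 1000000L +
         (end->tv_usec - start->tv_usec);
    if (us < 0)
        us = 0;
    counts[my_fls((unsigned long) us)]++;
}

/*
 * At shutdown, each lock with nonzero counters would be printed as
 * "lock N: bucket:count, ..." -- the format seen in the log excerpts above.
 */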
I would be a lot less surprised if the waits revolved around the IO-in-progress locks, since it's not that hard to imagine an I/O taking a really long time on a busy system. But I didn't think we were ever supposed to hold content locks for that long. The other thing that baffles me about these numbers is that they don't provide any fodder for explaining the periodic drops in throughput that happen when the system checkpoints. I had assumed they would show up as long lwlock waits, like somebody hanging on to WALInsertLock while everybody else in the system piles up behind them. That's not reflected in these numbers - the few very long waits show just ONE guy waiting a really long time for the lock. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company lwhistogram.patch Description: Binary data log.wp.lwhistogram.32.300.1800.bz2 Description: BZip2 compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tracking context switches with perf record
On Fri, Mar 30, 2012 at 10:07 PM, Jeff Janes wrote: > Which I've now done, and it actually goes the other way. with -F50, > not only do almost all pgbench_accounts updates still lead to a clean, > but a good chunk of the updates to pgbench_tellers lead to a clean as > well. If cleans are going to happen so often, would it make sense to > write a combined record for clean+hot_update?

I think that'd be hard to arrange. You might think that the prune happens when we go to update and find that there isn't room on the page, but that's actually not how it works at all:

# Ideally we would do defragmenting only when we are about to attempt
# heap_update on a HOT-safe tuple. The difficulty with this approach
# is that the update query has certainly got a pin on the old tuple, and
# therefore our attempt to acquire a buffer cleanup lock will always fail.
# (This corresponds to the idea that we don't want to move the old tuple
# out from under where the query's HeapTuple pointer points. It might
# be possible to finesse that, but it seems fragile.)

So what we do instead is:

# The currently planned heuristic is to prune and defrag when first accessing
# a page that potentially has prunable tuples (as flagged by the pd_prune_xid
# page hint field) and that either has free space less than MAX(fillfactor
# target free space, BLCKSZ/10) *or* has recently had an UPDATE fail to
# find enough free space to store an updated tuple version. (These rules
# are subject to change.)

So, in other words, the prune happens before we have any idea that we're going to attempt an update.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
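In other words, the rule quoted above boils down to something like the following check at page-access time. This is a simplified sketch of the heuristic as described, not the actual heap pruning code; the parameter names are invented for illustration, and BLCKSZ is defined here only so the snippet stands alone.

#include <stdbool.h>
#include <stddef.h>

#ifndef BLCKSZ
#define BLCKSZ 8192             /* assumed default PostgreSQL block size */
#endif

static bool
page_should_be_pruned(bool prune_xid_is_set,        /* pd_prune_xid hint is valid */
                      size_t free_space,            /* current free space on the page */
                      size_t fillfactor_target,     /* free-space target from fillfactor */
                      bool update_failed_recently)  /* an UPDATE couldn't fit its new tuple */
{
    size_t      threshold;

    /* only pages flagged as potentially having prunable tuples are considered */
    if (!prune_xid_is_set)
        return false;

    /* prune when free space drops below MAX(fillfactor target, BLCKSZ/10) ... */
    threshold = fillfactor_target > BLCKSZ / 10 ? fillfactor_target : BLCKSZ / 10;
    if (free_space < threshold)
        return true;

    /* ... or when a recent UPDATE failed to find room for an updated tuple version */
    return update_failed_recently;
}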
Re: [HACKERS] tracking context switches with perf record
On Fri, Mar 30, 2012 at 9:27 AM, Robert Haas wrote: > > The more interesting waits, in my view anyway, are the ones that come > from LWLockAcquire, which account for nearly all of the semaphore > sleeps. As you can see, XLogInsert accounts for over half of those, > and ProcArrayEndTransaction and SimpleLruReadPage_ReadOnly account for > most of the rest. Just out of curiosity, I expanded a couple of those > another level, and you can see what's going on there, too. I was > slightly surprised by the number of calls to log_heap_clean, since I > wouldn't have expected vacuuming to generate that much work; it turns > out that 99.9% of those are HOT pruning events happening during query > execution.

I've generally seen a one-to-one correspondence between log_heap_clean and updates to pgbench_accounts (not by sampling waits, but by wal_debug=1). I figured this was because the pgbench_accounts blocks are packed to the gills by default, so the only way to make room for the next HOT update is to prune the dead tuple left over from the previous HOT update in that same block. It might be interesting to see if those go away if run with pgbench -i -F90 or some other value less than 100.

Which I've now done, and it actually goes the other way. With -F50, not only do almost all pgbench_accounts updates still lead to a clean, but a good chunk of the updates to pgbench_tellers lead to a clean as well. If cleans are going to happen so often, would it make sense to write a combined record for clean+hot_update?

Cheers, Jeff

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Lazy hashaggregate when no aggregation is needed
Tom Lane wrote:

Ants Aasma writes:

A user complained on pgsql-performance that SELECT col FROM table GROUP BY col LIMIT 2; performs a full table scan. ISTM that it's safe to return tuples from hash-aggregate as they are found when no aggregate functions are in use. Attached is a first shot at that.

As I commented in the other thread, the user would be a lot better off if he'd had an index on the column in question. I'm not sure it's worth complicating the hashagg logic when an indexscan + groupagg would address the case better.

Would this patch help in the case where "table" is actually a set-returning function, and thus can't have an index? (I don't yet know enough about the tree to know when hashaggs get used). I'm wondering if this is a useful exception to the "restrictions can't get pushed down through GROUP BYs" rule.

Jay

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Standbys, txid_current_snapshot, wraparound
2012/3/29 Marko Kreen > I can't find a place where WAL replay updates values under XLogCtl. > If that really does not happen, that would explain why standbys can > see wrong epoch. > > No clue yet how master can get broken.

Details about environment:

Debian Linux db 2.6.32-5-amd64 #1 SMP Fri Sep 9 20:23:16 UTC 2011 x86_64 GNU/Linux

# SELECT version();
PostgreSQL 9.1.1 on x86_64-unknown-linux-gnu, compiled by gcc-4.4.real (Ubuntu 4.4.3-4ubuntu5) 4.4.3, 64-bit
Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server
I wrote: >> Changes from previous version >> = >> >> 1) Don't use remote EXPLAIN for cost/rows estimation, so now planner >> estimates result rows and costs on the basis of local statistics such >> as pg_class and pg_statistic. To update local statistics, I added >> pgsql_fdw_analyze() SQL function which updates local statistics of a >> foreign table by retrieving remote statistics, such as pg_class and >> pg_statistic, via libpq. This would make the planning of pgsql_fdw >> simple and fast. This function can be easily modified to handle >> ANALYZE command invoked for a foreign table (Fujita-san is proposing >> this as common feature in another thread). > I see the advantage of being able to do all this locally, but > I think there are a lot of downsides too: I found another limitation of this approach: pgsql_fdw_analyze() has to run as a user who can update pg_statistic, and this user needs a user mapping to a remote user who can read pg_statistic. This is not necessary for normal operation and needs to be configured specifically for getting remote statistics. This is cumbersome, and people might be unhappy to have to create user mappings for highly privileged remote users. Yours, Laurenz Albe -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server
I wrote: >> How about getting # of rows estimate by executing EXPLAIN for >> fully-fledged remote query (IOW, contains pushed-down WHERE clause), and >> estimate selectivity of local filter on the basis of the statistics >> which are generated by FDW via do_analyze_rel() and FDW-specific >> sampling function? In this design, we would be able to use quite >> correct rows estimate because we can consider filtering stuffs done on >> each side separately, though it requires expensive remote EXPLAIN for >> each possible path. > > That sounds nice. ... but it still suffers from the problems of local statistics for remote tables I pointed out. I think that these shortcomings are not justified by the gain of one client-server round trip less during planning. I'd prefer if pgsql_fdw were not dependent on remote statistics stored in the local database. Yours, Laurenz Albe -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed dblink using alternate libpq tuple storage
Marko Kreen writes: > On Sat, Mar 31, 2012 at 1:13 AM, Tom Lane wrote: >> Not if the message is a constant string, which seems like the typical >> situation (think "out of memory"). If the row processor does need a >> buffer for a constructed string, it could make use of some space in its >> "void *param" area, for instance. > If it's specified as string that libpq does not own, then I'm fine with it. Check. Let's make it "const char **" in fact, just to be clear on that. >> (This is why getAnotherTuple is careful to clear the failed result >> before it tries to build a new one. But that trick isn't going to be >> available to an external row processor.) > Kyotaro's original idea was to assume out-of-memory if error > string was not set, thus the callback needed to set the string > only when it really had something to say. Hmm. We could still do that in conjunction with the idea of returning the string from the row processor, but I'm not sure if it's useful or just overly cute. [ thinks... ] A small advantage of assuming NULL means that is that we could postpone the libpq_gettext("out of memory") call until after clearing the overflowed PGresult, which would greatly improve the odds of getting a nicely translated result and not just ASCII. Might be worth it just for that. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
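For illustration, a callback written against that convention might look like the sketch below. It targets the row-processor API being designed in this thread rather than anything in released libpq, so the PGrowValue type, the parameter order, and the "columns == NULL means new resultset" convention are all assumptions drawn from the discussion, and my_store_row() is purely hypothetical application code.

#include <libpq-fe.h>

/* PGrowValue would be defined by the proposed patch; declared opaquely here
 * only so this sketch stands alone. */
typedef struct PGrowValue PGrowValue;

static int my_store_row(void *param, PGresult *res, const PGrowValue *columns);

static int
my_row_processor(PGresult *res, const PGrowValue *columns,
                 const char **errmsgp, void *param)
{
    /* Per the idea upthread, columns == NULL could mean "new resultset". */
    if (columns == NULL)
        return 1;

    if (!my_store_row(param, res, columns))
    {
        /* Constant string only: libpq does not take ownership or free it. */
        *errmsgp = "out of memory";
        return -1;
    }

    return 1;                   /* row consumed successfully */
}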
Re: [HACKERS] Speed dblink using alternate libpq tuple storage
On Sat, Mar 31, 2012 at 1:13 AM, Tom Lane wrote: > Marko Kreen writes: >> On Fri, Mar 30, 2012 at 05:18:42PM -0400, Tom Lane wrote: >>> I'm pretty dissatisfied with the error reporting situation for row >>> processors. You can't just decide not to solve it, which seems to be >>> the current state of affairs. What I'm inclined to do is to add a >>> "char **" parameter to the row processor, and say that when the >>> processor returns -1 it can store an error message string there. > >> But such API seems to require specifying allocator, which seems ugly. > > Not if the message is a constant string, which seems like the typical > situation (think "out of memory"). If the row processor does need a > buffer for a constructed string, it could make use of some space in its > "void *param" area, for instance. If it's specified as string that libpq does not own, then I'm fine with it. >> I think it would be better to just use Kyotaro's original idea >> of PQsetRowProcessorError() which nicer to use. > > I don't particularly care for that idea because it opens up all sorts of > potential issues when such a function is called at the wrong time. > Moreover, you have to remember that the typical situation here is that > we're going to be out of memory or otherwise in trouble, which means > you've got to be really circumspect about what you assume will work. > Row processors that think they can do a lot of fancy message > construction should be discouraged, and an API that requires > construction of a new PGresult in order to return an error is right out. > (This is why getAnotherTuple is careful to clear the failed result > before it tries to build a new one. But that trick isn't going to be > available to an external row processor.) Kyotaro's original idea was to assume out-of-memory if error string was not set, thus the callback needed to set the string only when it really had something to say. -- marko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tracking context switches with perf record
On Friday, March 30, 2012 06:27:36 PM Robert Haas wrote: > I've been playing around with "perf record" on the IBM POWER7 machine, and it's pretty cool. One of the things I don't like is that some of the tools don't produce very nice text reports that you can cut and paste into an email - it's kind of a text-based GUI where you can zoom around and look at things. Still, it's nice.

You can use --stdio to force a more plain text interface (you need to run perf annotate manually if you want the line-by-line view), if you want. Especially if you pipe it through cat or so.

Andres

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed dblink using alternate libpq tuple storage
Marko Kreen writes: > On Fri, Mar 30, 2012 at 05:18:42PM -0400, Tom Lane wrote: >> I'm pretty dissatisfied with the error reporting situation for row >> processors. You can't just decide not to solve it, which seems to be >> the current state of affairs. What I'm inclined to do is to add a >> "char **" parameter to the row processor, and say that when the >> processor returns -1 it can store an error message string there. > But such API seems to require specifying allocator, which seems ugly. Not if the message is a constant string, which seems like the typical situation (think "out of memory"). If the row processor does need a buffer for a constructed string, it could make use of some space in its "void *param" area, for instance. > I think it would be better to just use Kyotaro's original idea > of PQsetRowProcessorError() which nicer to use. I don't particularly care for that idea because it opens up all sorts of potential issues when such a function is called at the wrong time. Moreover, you have to remember that the typical situation here is that we're going to be out of memory or otherwise in trouble, which means you've got to be really circumspect about what you assume will work. Row processors that think they can do a lot of fancy message construction should be discouraged, and an API that requires construction of a new PGresult in order to return an error is right out. (This is why getAnotherTuple is careful to clear the failed result before it tries to build a new one. But that trick isn't going to be available to an external row processor.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed dblink using alternate libpq tuple storage
On Fri, Mar 30, 2012 at 05:18:42PM -0400, Tom Lane wrote: > Marko Kreen writes: > > On Wed, Mar 07, 2012 at 03:14:57PM +0900, Kyotaro HORIGUCHI wrote: > >>> My suggestion - check in getAnotherTuple whether resultStatus is > >>> already error and do nothing then. This allows internal pqAddRow > >>> to set regular "out of memory" error. Otherwise give generic > >>> "row processor error". > > >> Current implement seems already doing this in > >> parseInput3(). Could you give me further explanation? > > > The suggestion was about getAnotherTuple() - currently it sets > > always "error in row processor". With such check, the callback > > can set the error result itself. Currently only callbacks that > > live inside libpq can set errors, but if we happen to expose > > error-setting function in outside API, then the getAnotherTuple() > > would already be ready for it. > > I'm pretty dissatisfied with the error reporting situation for row > processors. You can't just decide not to solve it, which seems to be > the current state of affairs. What I'm inclined to do is to add a > "char **" parameter to the row processor, and say that when the > processor returns -1 it can store an error message string there. > If it does so, that's what we report. If it doesn't (which we'd detect > by presetting the value to NULL), then use a generic "error in row > processor" message. This is cheap and doesn't prevent the row processor > from using some application-specific error reporting method if it wants; > but it does allow the processor to make use of libpq's error mechanism > when that's preferable.

Yeah. But such an API seems to require specifying an allocator, which seems ugly. I think it would be better to just use Kyotaro's original idea of PQsetRowProcessorError(), which is nicer to use.

Few thoughts on the issue:

--

As libpq already provides quite good coverage of PGresult manipulation APIs, then how about:

    void PQsetResultError(PGresult *res, const char *msg);

that does:

    res->errMsg = pqResultStrdup(msg);
    res->resultStatus = PGRES_FATAL_ERROR;

That would also cause minimal fuss in getAnotherTuple(). I would actually like even more:

    void PQsetConnectionError(PGconn *conn, const char *msg);

that does the full-blown libpq error logic. Thus it would be useful everywhere in libpq. But it seems a bit too disruptive, so I would like an ACK from somebody who knows libpq better. (well, from you...)

---

Another thought - if we have an API to set the error from *outside* of the row-processor callback, that would immediately solve the "how to skip incoming resultset without buffering it" problem. And it would be usable for PQgetRow()/PQrecvRow() too.

-- marko

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed dblink using alternate libpq tuple storage
Marko Kreen writes: > On Wed, Mar 07, 2012 at 03:14:57PM +0900, Kyotaro HORIGUCHI wrote: >>> My suggestion - check in getAnotherTuple whether resultStatus is >>> already error and do nothing then. This allows internal pqAddRow >>> to set regular "out of memory" error. Otherwise give generic >>> "row processor error". >> Current implement seems already doing this in >> parseInput3(). Could you give me further explanation? > The suggestion was about getAnotherTuple() - currently it sets > always "error in row processor". With such check, the callback > can set the error result itself. Currently only callbacks that > live inside libpq can set errors, but if we happen to expose > error-setting function in outside API, then the getAnotherTuple() > would already be ready for it. I'm pretty dissatisfied with the error reporting situation for row processors. You can't just decide not to solve it, which seems to be the current state of affairs. What I'm inclined to do is to add a "char **" parameter to the row processor, and say that when the processor returns -1 it can store an error message string there. If it does so, that's what we report. If it doesn't (which we'd detect by presetting the value to NULL), then use a generic "error in row processor" message. This is cheap and doesn't prevent the row processor from using some application-specific error reporting method if it wants; but it does allow the processor to make use of libpq's error mechanism when that's preferable. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade incorrectly equates pg_default and database tablespace
On Thu, 2012-03-22 at 14:55 +0200, Ants Aasma wrote:
> Hi,
>
> while working on a support case I stumbled upon a bug in pg_upgrade.
> Upgrade fails with "No such file or directory" when a database is
> moved to a non-default tablespace and contains a table that is moved
> to pg_default. The cause seems to be that the following test
> incorrectly equates empty spclocation with database tablespace:
>
>     tblspace = PQgetvalue(res, relnum, i_spclocation);
>     /* if no table tablespace, use the database tablespace */
>     if (strlen(tblspace) == 0)
>         tblspace = dbinfo->db_tblspace;
>
> Patch to fix this is attached.

I confirmed this bug upgrading 9.1 to master, and that this patch fixes it. Thank you for the report!

Patch looks good to me as well, with one very minor nitpick: the added comment is missing an apostrophe.

Bruce, can you take a look at this?

Regards, Jeff Davis

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] psql: tab-completion fails SET var=
(in hopes that the current changes to tab-completion will help to get this fixed)

Tab-completion goes wrong on SET setting=...

Example: if you want to input "set search_path=myschema;" without spaces around '=', and you try tab-completion halfway through the schema name:

    set search_path=mysch ^tab

then the initial part of the schema name ('mysch') gets removed and replaced with 'TO', so that you now have:

    set search_path=TO

You lose the part of the schema name that was already input. With spaces that doesn't happen, but after all tab-completion is about avoiding keystrokes and errors.

Tab-completion is great and this search_path annoyance happens to me all the time; my fingers can't seem to learn this exception.

(pgsql 9.2devel, problem also in latest 9.1.3)

Thanks, Erik Rijkers

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tracking context switches with perf record
On Fri, Mar 30, 2012 at 1:30 PM, Greg Stark wrote: > On Fri, Mar 30, 2012 at 5:27 PM, Robert Haas wrote: >> If you expand that branch of the call tree, you find that all of them >> are coming eventually from secure_read; the server is waiting for a >> new query from the client. This is, obviously, overhead we can't >> eliminate from this test; waiting for the client is part of the job. > > Fwiw this isn't necessarily true. How does the absolute number of > these events compare with the number of pg_bench operations done?

Since perf is a stochastic profiler, I don't think there's any way to measure that using this tool.

> If > it's significantly more the server could be reading on sockets while > there are partial commands there and it might be more efficient to > wait until the whole command is ready before reading. It may be that > this indicates that pg_bench is written in an inefficient way and it > should pipeline more commands but of course optimizing pg_bench is > kind of missing the point.

Well, to some degree. I think most clients are going to send commands one at a time and wait for the responses. OTOH, if for example libpq is sending the data inefficiently in some way, that would be worth fixing; lots of people use libpq.

> Also incidentally context switches is one of the things getrusage > shows so I'm still hoping to have that per-plan-node though that's > orthogonal to what this tool gives you with the call graph.

Yeah. Actually, what is really bugging me is that I cannot find any way of getting a profile that reflects the *time* spent waiting rather than merely the *number* of waits. That seems like an obvious thing to want, and I cannot find a way to get it.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tracking context switches with perf record
Greg Stark writes: > On Fri, Mar 30, 2012 at 5:27 PM, Robert Haas wrote: >> If you expand that branch of the call tree, you find that all of them >> are coming eventually from secure_read; the server is waiting for a >> new query from the client. This is, obviously, overhead we can't >> eliminate from this test; waiting for the client is part of the job. > Fwiw this isn't necessarily true. How does the absolute number of > these events compare with the number of pg_bench operations done? If > it's significantly more the server could be reading on sockets while > there are partial commands there and it might be more efficient to > wait until the whole command is ready before reading. It may be that > this indicates that pg_bench is written in an inefficient way and it > should pipeline more commands but of course optimizing pg_bench is > kind of missing the point. Well, that would be on libpq's head if it were true, but I believe we're fairly careful to not flush the output buffer until we're sending a complete message. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] HTTP Frontend? (and a brief thought on materialized views)
On Fri, Mar 30, 2012 at 10:21 AM, Daniel Farina wrote: > Any enhancement here that can't be used with libpq via, say, drop-in > .so seems unworkable to me, and that's why any solution that is > basically proxying to the database is basically a non-starter outside > the very earliest prototyping stages. The tuple scanning and protocol > semantics can and even should remain the same, especially at first. I should add: proxying could work well if libpq had all the right hooks. The server could remain ignorant. Regardless, upstream changes result. -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tracking context switches with perf record
On Fri, Mar 30, 2012 at 5:27 PM, Robert Haas wrote: > If you expand that branch of the call tree, you find that all of them > are coming eventually from secure_read; the server is waiting for a > new query from the client. This is, obviously, overhead we can't > eliminate from this test; waiting for the client is part of the job. Fwiw this isn't necessarily true. How does the absolute number of these events compare with the number of pg_bench operations done? If it's significantly more the server could be reading on sockets while there are partial commands there and it might be more efficient to wait until the whole command is ready before reading. It may be that this indicates that pg_bench is written in an inefficient way and it should pipeline more commands but of course optimizing pg_bench is kind of missing the point. Also incidentally context switches is one of the things getrusage shows so I'm still hoping to have that per-plan-node though that's orthogonal to what this tool gives you with the call graph. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] HTTP Frontend? (and a brief thought on materialized views)
On Fri, Mar 30, 2012 at 9:11 AM, Andrew Dunstan wrote: > > > On 03/30/2012 11:41 AM, Robert Haas wrote: >> >> On Fri, Mar 30, 2012 at 10:55 AM, Dobes Vandermeer >> wrote: >>> >>> Well, in our case HTTP is a clear win (but not replacement) and SPDY a >>> potential one (even as a replacement). Even if SPDY is not widely >>> adopted >>> it could still replace FEBE if there's a clear advantage to using it, I >>> don't know enough to make the call right now. >> >> I can see that there are some advantages to having an HTTP interface >> to the database, but I think throwing our existing protocol out the >> window or relegating it to the status of a second-class citizen would >> be foolish. > > > > Right, I can't imagine it happening. And I wouldn't really be keen to add an > alternative protocol either. > > I could imagine a client which presented a SPDY interface to the world and > translated it into standard calls, possibly via libpq.

Any enhancement here that can't be used with libpq via, say, a drop-in .so seems unworkable to me, and that's why any solution that is basically proxying to the database is a non-starter outside the very earliest prototyping stages. The tuple scanning and protocol semantics can and even should remain the same, especially at first. What I really want is a mechanism to push FEBE messages down into a transport in an extendable way (not unlike SSL, except this time sensitive to FEBE message boundaries), taking unmodified libpq binaries (but with a hook, of course...) into consideration.

> It's well to remember that we are not a green fields project here.

Perhaps not, which is a good reason to not put the cart before the horse -- I've tried to be very careful to suggest that really useful caching/proxy/routing software needs to appear that can work, unmodified, with hypothetical FEBE-over-SPDY communications, and accomplish some interesting use cases, and also very careful to suggest this is a *transport*-level feature. I think it would be fair to demand: "Okay, when Apache or NGINX or a credible embedded web server in some language can accomplish something useful for us if we absorb this complexity, let's put this on the table". To have to continuously write poolers, cachers and redirectors (for any reason: HA, partitioning, or even just clean-looking domain names...) is a burden on community resources. The criterion for success is to expand the number of projects attacking these problems and decrease the cost of writing one, and I think demanding a straw-man implementation of, say, a query cacher or pooler, and seeing how that implementation feels, is not unreasonable, along with accepting the burden of maintenance.

Robert Haas wrote: >> HTTP is a non-trivial protocol that tends to impose lots >> of escaping and de-escaping overhead which is unnecessary for people >> who just want to connect to the database and run queries. I can >> completely understand that someone might want the ability to do GET >> /db/table/pk and have that return an answer very, very quickly, by >> bypassing the usual parser and planner and just firing off an >> index-scan and returning the results as JSON or somesuch. But I think >> it would be a serious mistake to assume that GET /q?q=myquery is going >> to come out better than what we have now in the general case.

I don't think this holds for SPDY, which is the only way I even entertain thoughts of doing this.
I want FEBE, at least at first and probably even then for quite a long while, just with framing and headers (during start-up) that other software will understand.

Control frames are like this:

+----------------------------------+
|C| Version(15bits) | Type(16bits) |
+----------------------------------+
| Flags (8)  |  Length (24 bits)   |
+----------------------------------+
|               Data               |
+----------------------------------+

Data frames are like this:

+----------------------------------+
|C|       Stream-ID (31bits)       |
+----------------------------------+
| Flags (8)  |  Length (24 bits)   |
+----------------------------------+
|               Data               |
+----------------------------------+

The (XML, not HTML, I read this with "view source", or try "curl") RFC draft is here: http://mbelshe.github.com/SPDY-Specification/draft-mbelshe-spdy-00.xml

-- fdr

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
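As a rough illustration of how compact the framing described above is, here is a sketch of the two 8-byte frame headers in C. The struct and function names are invented for this sketch and the fields are shown as plain integers rather than bitfields; this is not code from any existing SPDY implementation.

#include <stdint.h>

/* SPDY control frame header: C bit = 1, then 15-bit version, 16-bit type. */
typedef struct SpdyControlFrameHeader
{
    uint16_t    control_and_version;    /* top bit: control flag, low 15 bits: version */
    uint16_t    type;                   /* frame type */
    uint32_t    flags_and_length;       /* top 8 bits: flags, low 24 bits: payload length */
    /* payload ("Data") follows */
} SpdyControlFrameHeader;

/* SPDY data frame header: C bit = 0, then 31-bit stream ID. */
typedef struct SpdyDataFrameHeader
{
    uint32_t    control_and_stream_id;  /* top bit: control flag (0), low 31 bits: stream ID */
    uint32_t    flags_and_length;       /* top 8 bits: flags, low 24 bits: payload length */
    /* payload ("Data") follows */
} SpdyDataFrameHeader;

/* Example: extract the 24-bit payload length (fields are big-endian on the wire,
 * so the value is assumed to have been converted to host order already). */
static uint32_t
spdy_frame_length(uint32_t flags_and_length_host_order)
{
    return flags_and_length_host_order & 0x00FFFFFF;
}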
Re: [HACKERS] Uppercase tab completion keywords in psql?
On fre, 2012-03-23 at 07:52 -0700, David Fetter wrote: > On Thu, Mar 22, 2012 at 06:05:30PM -0400, Andrew Dunstan wrote: > > On 03/22/2012 05:49 PM, Bruce Momjian wrote: > > >Robert Haas and I are disappointed by this change. I liked the > > >fact that I could post nice-looking SQL queries without having to > > >use my capslock key (which I use as a second control key). Any > > >chance of reverting this change? > > > > > > > Should it be governed by a setting? > > Something like (upper|lower|preserve) ? How about this patch then? (There are actually four possible settings, see patch.) diff --git i/doc/src/sgml/ref/psql-ref.sgml w/doc/src/sgml/ref/psql-ref.sgml index b849101..be9d37d 100644 --- i/doc/src/sgml/ref/psql-ref.sgml +++ w/doc/src/sgml/ref/psql-ref.sgml @@ -2652,6 +2652,22 @@ bar +COMP_KEYWORD_CASE + + +Determines which letter case to use when completing an SQL key word. +If set to lower or upper, the +completed word will be in lower or upper case, respectively. If set +to preserve-lower +or preserve-upper (the default), the completed word +will be in the case of the word already entered, but words being +completed without anything entered will be in lower or upper case, +respectively. + + + + + DBNAME diff --git i/src/bin/psql/tab-complete.c w/src/bin/psql/tab-complete.c index 6f481bb..00d87d5 100644 --- i/src/bin/psql/tab-complete.c +++ w/src/bin/psql/tab-complete.c @@ -682,7 +682,7 @@ static char **complete_from_variables(char *text, const char *prefix, const char *suffix); static char *complete_from_files(const char *text, int state); -static char *pg_strdup_same_case(const char *s, const char *ref); +static char *pg_strdup_keyword_case(const char *s, const char *ref); static PGresult *exec_query(const char *query); static void get_previous_words(int point, char **previous_words, int nwords); @@ -3048,7 +3048,7 @@ create_or_drop_command_generator(const char *text, int state, bits32 excluded) { if ((pg_strncasecmp(name, text, string_length) == 0) && !(words_after_create[list_index - 1].flags & excluded)) - return pg_strdup_same_case(name, text); + return pg_strdup_keyword_case(name, text); } /* if nothing matches, return NULL */ return NULL; @@ -3335,9 +3335,9 @@ complete_from_list(const char *text, int state) if (completion_case_sensitive) return pg_strdup(item); else -/* If case insensitive matching was requested initially, return - * it in the case of what was already entered. */ -return pg_strdup_same_case(item, text); +/* If case insensitive matching was requested initially, adjust + * the case according to setting. */ +return pg_strdup_keyword_case(item, text); } } @@ -3374,9 +3374,9 @@ complete_from_const(const char *text, int state) if (completion_case_sensitive) return pg_strdup(completion_charp); else - /* If case insensitive matching was requested initially, return it - * in the case of what was already entered. */ - return pg_strdup_same_case(completion_charp, text); + /* If case insensitive matching was requested initially, adjust the + * case according to setting. */ + return pg_strdup_keyword_case(completion_charp, text); } else return NULL; @@ -3484,27 +3484,48 @@ complete_from_files(const char *text, int state) /* - * Make a pg_strdup copy of s and convert it to the same case as ref. + * Make a pg_strdup copy of s and convert the case according to + * COMP_KEYWORD_CASE variable, using ref as the text that was already entered. 
*/ static char * -pg_strdup_same_case(const char *s, const char *ref) +pg_strdup_keyword_case(const char *s, const char *ref) { char *ret, *p; unsigned char first = ref[0]; + int tocase; + const char *varval; + + varval = GetVariable(pset.vars, "COMP_KEYWORD_CASE"); + if (!varval) + tocase = 0; + else if (strcmp(varval, "lower") == 0) + tocase = -2; + else if (strcmp(varval, "preserve-lower") == 0) + tocase = -1; + else if (strcmp(varval, "preserve-upper") == 0) + tocase = +1; + else if (strcmp(varval, "upper") == 0) + tocase = +2; + else + tocase = 0; - if (isalpha(first)) - { - ret = pg_strdup(s); - if (islower(first)) - for (p = ret; *p; p++) -*p = pg_tolower((unsigned char) *p); - else - for (p = ret; *p; p++) -*p = pg_toupper((unsigned char) *p); - return ret; - } + /* default */ + if (tocase == 0) + tocase = +1; + + ret = pg_strdup(s); + + if (tocase == -2 + || ((tocase == -1 || tocase == +1) && islower(first)) + || (tocase == -1 && !isalpha(first)) + ) + for (p = ret; *p; p++) + *p = pg_tolower((unsigned char) *p); else - return pg_strdup(s); + for (p = ret; *p; p++) + *p = pg_toupper((unsigned char) *p); + + return ret; } -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your sub
[HACKERS] checkpointer code behaving strangely on postmaster -T
I noticed while doing some tests that the checkpointer process does not recover very nicely after a backend crashes under postmaster -T (after all processes have been kill -CONTd, of course, and the postmaster has been told to shut down via Ctrl-C on its console). For some reason it seems to get stuck in a loop doing sleep(0.5s). In another case I caught it trying to do a checkpoint, but it was progressing a single page at a time and then sleeping. In that condition, the checkpoint took a very long time to finish. Pressing Ctrl-C in the postmaster console at that point does not have any effect. -- Álvaro Herrera -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed dblink using alternate libpq tuple storage
On Fri, Mar 30, 2012 at 7:04 PM, Marko Kreen wrote: > Have you looked at the examples? PQgetResult() is pretty good hint > that one resultset finished... Ok, the demos are around this long thread and hard to find, so here is a summary of links: Original design mail: http://archives.postgresql.org/message-id/20120224154616.ga16...@gmail.com First patch with quick demos: http://archives.postgresql.org/message-id/20120226221922.ga6...@gmail.com Demos as diff: http://archives.postgresql.org/message-id/20120324002224.ga19...@gmail.com Demos/experiments/tests (bit messier than the demos-as-diffs): https://github.com/markokr/libpq-rowproc-demos Note - the point is that user *must* call PQgetResult() when resultset ends. Thus also the "PQgetRow() does not return errors" decision. I'll put this mail into commitfest page too, seems I've forgotten to put some mails there. -- marko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] tracking context switches with perf record
I've been playing around with "perf record" on the IBM POWER7 machine, and it's pretty cool. One of the things I don't like is that some of the tools don't produce very nice text reports that you can cut and paste into an email - it's kind of a text-based GUI where you can zoom around and look at things. Still, it's nice.

One of the neat things I've just discovered is that you can use a call-graph report on context-switches to find out what user code is resulting in context-switching. For example, I ran a 32-client pgbench test at scale factor 300 for half an hour and captured data for 10 minutes relatively near the beginning. The partially expanded call tree for postgres looks like this:

- 87.79% .schedule_timeout
     + 99.99% .unix_stream_recvmsg
- 11.63% .SyS_semtimedop
     .SyS_ipc
     syscall_exit
     semop
     - PGSemaphoreLock
        - 98.48% LWLockAcquire
           - 53.11% XLogInsert
              + 48.39% heap_update
              + 18.15% heap_insert
              + 16.36% RecordTransactionCommit
              + 15.67% log_heap_clean
              + 0.98% _bt_insertonpg
           + 18.71% ProcArrayEndTransaction
           - 14.77% SimpleLruReadPage_ReadOnly
                TransactionIdGetStatus
                TransactionLogFetch
                - TransactionIdDidCommit
                   + 53.52% HeapTupleSatisfiesVacuum
                   + 45.00% HeapTupleSatisfiesMVCC
                   + 0.82% HeapTupleHeaderAdvanceLatestRemovedXid
           + 6.08% TransactionIdSetPageStatus
           + 2.63% LockBuffer
           + 1.92% GetSnapshotData
           + 1.51% SimpleLruReadPage
        + 1.52% ProcSleep
+ 0.50% do_work

So what you can see here is that all the context switches that happened in postgres were the result of either a .schedule_timeout or a .SyS_semtimedop. 87.79% of the context switches came from .schedule_timeout, and 99.99% of those came from unix_stream_recvmsg. If you expand that branch of the call tree, you find that all of them are coming eventually from secure_read; the server is waiting for a new query from the client. This is, obviously, overhead we can't eliminate from this test; waiting for the client is part of the job.

The other 11.63% of the context switches result from PGSemaphoreLock calling semop. Of those, 98.48% come from LWLockAcquire and 1.52% come from ProcSleep. The ProcSleep calls are all from blocking while attempting to acquire a heavyweight lock; and nearly all of them are from conflicting tuple locks. However, they account for such a small portion of the overall context switches that they're probably not worth worrying about.

The more interesting waits, in my view anyway, are the ones that come from LWLockAcquire, which account for nearly all of the semaphore sleeps. As you can see, XLogInsert accounts for over half of those, and ProcArrayEndTransaction and SimpleLruReadPage_ReadOnly account for most of the rest. Just out of curiosity, I expanded a couple of those another level, and you can see what's going on there, too. I was slightly surprised by the number of calls to log_heap_clean, since I wouldn't have expected vacuuming to generate that much work; it turns out that 99.9% of those are HOT pruning events happening during query execution.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] HTTP Frontend? (and a brief thought on materialized views)
On 03/30/2012 11:41 AM, Robert Haas wrote:

On Fri, Mar 30, 2012 at 10:55 AM, Dobes Vandermeer wrote:

Well, in our case HTTP is a clear win (but not replacement) and SPDY a potential one (even as a replacement). Even if SPDY is not widely adopted it could still replace FEBE if there's a clear advantage to using it, I don't know enough to make the call right now.

I can see that there are some advantages to having an HTTP interface to the database, but I think throwing our existing protocol out the window or relegating it to the status of a second-class citizen would be foolish.

Right, I can't imagine it happening. And I wouldn't really be keen to add an alternative protocol either.

I could imagine a client which presented a SPDY interface to the world and translated it into standard calls, possibly via libpq.

It's well to remember that we are not a green fields project here.

HTTP is a non-trivial protocol that tends to impose lots of escaping and de-escaping overhead which is unnecessary for people who just want to connect to the database and run queries. I can completely understand that someone might want the ability to do GET /db/table/pk and have that return an answer very, very quickly, by bypassing the usual parser and planner and just firing off an index-scan and returning the results as JSON or somesuch. But I think it would be a serious mistake to assume that GET /q?q=myquery is going to come out better than what we have now in the general case.

Indeed.

cheers

andrew

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed dblink using alternate libpq tuple storage
On Fri, Mar 30, 2012 at 11:59:12AM -0400, Tom Lane wrote: > Marko Kreen writes: > > On Thu, Mar 29, 2012 at 06:56:30PM -0400, Tom Lane wrote: > >> Marko Kreen writes: > >>> Second conclusion is that current dblink row-processor usage is broken > >>> when user uses multiple SELECTs in SQL as dblink uses plain PQexec(). > > >> Yeah. Perhaps we should tweak the row-processor callback API so that > >> it gets an explicit notification that "this is a new resultset". > >> Duplicating PQexec's behavior would then involve having the dblink row > >> processor throw away any existing tuplestore and start over when it > >> gets such a call. > >> > >> There's multiple ways to express that but the most convenient thing > >> from libpq's viewpoint, I think, is to have a callback that occurs > >> immediately after collecting a RowDescription message, before any > >> rows have arrived. So maybe we could express that as a callback > >> with valid "res" but "columns" set to NULL? > >> > >> A different approach would be to add a row counter to the arguments > >> provided to the row processor; then you'd know a new resultset had > >> started if you saw rowcounter == 0. This might have another advantage > >> of not requiring the row processor to count the rows for itself, which > >> I think many row processors would otherwise have to do. > > > Try to imagine how final documentation will look like. > > > Then imagine documentation for PGrecvRow() / PQgetRow(). > > What's your point, exactly? PGrecvRow() / PQgetRow() aren't going to > make that any better as currently defined, because there's noplace to > indicate "this is a new resultset" in those APIs either. Have you looked at the examples? PQgetResult() is pretty good hint that one resultset finished... -- marko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed dblink using alternate libpq tuple storage
Marko Kreen writes: > On Thu, Mar 29, 2012 at 06:56:30PM -0400, Tom Lane wrote: >> Marko Kreen writes: >>> Second conclusion is that current dblink row-processor usage is broken >>> when user uses multiple SELECTs in SQL as dblink uses plain PQexec(). >> Yeah. Perhaps we should tweak the row-processor callback API so that >> it gets an explicit notification that "this is a new resultset". >> Duplicating PQexec's behavior would then involve having the dblink row >> processor throw away any existing tuplestore and start over when it >> gets such a call. >> >> There's multiple ways to express that but the most convenient thing >> from libpq's viewpoint, I think, is to have a callback that occurs >> immediately after collecting a RowDescription message, before any >> rows have arrived. So maybe we could express that as a callback >> with valid "res" but "columns" set to NULL? >> >> A different approach would be to add a row counter to the arguments >> provided to the row processor; then you'd know a new resultset had >> started if you saw rowcounter == 0. This might have another advantage >> of not requiring the row processor to count the rows for itself, which >> I think many row processors would otherwise have to do. > Try to imagine how final documentation will look like. > Then imagine documentation for PGrecvRow() / PQgetRow(). What's your point, exactly? PGrecvRow() / PQgetRow() aren't going to make that any better as currently defined, because there's noplace to indicate "this is a new resultset" in those APIs either. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed dblink using alternate libpq tuple storage
On Thu, Mar 29, 2012 at 06:56:30PM -0400, Tom Lane wrote: > Marko Kreen writes: > > My conclusion is that row-processor API is low-level expert API and > > quite easy to misuse. It would be preferable to have something more > > robust as end-user API, the PQgetRow() is my suggestion for that. > > Thus I see 3 choices: > > > 1) Push row-processor as main API anyway and describe all dangerous > >scenarios in documentation. > > 2) Have both PQgetRow() and row-processor available in , > >PQgetRow() as preferred API and row-processor for expert usage, > >with proper documentation what works and what does not. > > 3) Have PQgetRow() in , move row-processor to . > > I still am failing to see the use-case for PQgetRow. ISTM the entire > point of a special row processor is to reduce the per-row processing > overhead, but PQgetRow greatly increases that overhead.

No, decreasing CPU overhead is a minor win. I guess in a realistic application, like dblink, you can't even measure the difference.

The *major* win comes from avoiding buffering of all rows in PGresult. Of course, this is noticeable only with bigger resultsets. I guess such buffering pessimizes memory usage: code always works on a cold cache. And simply keeping RSS low is good for the long-term health of a process.

Second major win is avoiding the need to use a cursor with small chunks to access a resultset of unknown size, thus stalling the application until the next block arrives from the network.

The "PGresult *PQgetRow()" is for applications that do not convert rows immediately to some internal format, but keep using PGresult. So they can be converted to row-by-row processing with minimal changes to the actual code. Note that the PGrowValue is a temporary struct that the application *must* move data away from. If an app internally uses PGresult, then it's pretty annoying to invent a new internal format for long-term storage. But maybe I'm overestimating the number of such applications.

> And it doesn't > reduce complexity much either IMO: you still have all the primary risk > factors arising from processing rows in advance of being sure that the > whole query completed successfully.

It avoids the complexity of:

* How to pass an error from the callback to upper code
* Needing to know how exceptions behave
* How to use early exit to pass rows to upper code one-by-one (by storing the PGresult and PGrowValue in a temp place and later checking their values)
* How to detect that a new resultset has started (keeping track of the previous PGresult or noting some quirky API behaviour we may invent for such case)
* Needing to make sure the callback does not leak to call-sites that expect regular libpq behaviour. ("Always call PQregisterRowProcessor(db, NULL, NULL) after query finishes") ["But now I'm in the exception handler, how do I find the connection?"]

I've now reviewed the callback code and even done some coding with it, and IMHO it's too low-level to be the end-user API.

Yes, the "query-may-still-fail" complexity remains, but that's not unlike the usual "multi-statement-transaction-is-not-guaranteed-to-succeed" complexity.

Another complexity that remains is "how-to-skip-current-resultset", but that is a problem only on sync connections and the answer is simple - "call PQgetResult()". Or "call PQgetRow/PQrecvRow" if the user wants to avoid buffering.

> Plus it conflates "no more data" > with "there was an error receiving the data" or "there was an error on > the server side".

Well, the current PQgetRow() is written with the style: "return only single-row PGresult; to see errors the user must call PQgetResult()".
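In code, that style amounts to something like the following sketch. PQgetRow() is the API proposed in this thread, not a function in released libpq, so its signature here is an assumption, and process_row()/handle_error() stand in for application code.

#include <libpq-fe.h>

extern PGresult *PQgetRow(PGconn *conn);    /* proposed API, not in released libpq */

static void process_row(PGresult *row);     /* application-defined; must copy data out */
static void handle_error(const char *msg);  /* application-defined */

static void
fetch_with_pqgetrow(PGconn *conn)
{
    PGresult   *row;
    PGresult   *final;

    if (!PQsendQuery(conn, "SELECT ..."))
        handle_error(PQerrorMessage(conn));

    /* Each call returns one single-row PGresult, or NULL when the set ends. */
    while ((row = PQgetRow(conn)) != NULL)
    {
        process_row(row);
        PQclear(row);
    }

    /* Errors are not reported by PQgetRow() itself; check the final result. */
    final = PQgetResult(conn);
    if (PQresultStatus(final) != PGRES_TUPLES_OK)
        handle_error(PQerrorMessage(conn));
    PQclear(final);
}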
Basically so that the user is forced to fall back to the familiar libpq usage pattern. It can be changed so that PQgetRow() also returns errors. Or we can drop it and just keep PQrecvRow().

> PQrecvRow alleviates the per-row-overhead aspect of > that but doesn't really do a thing from the complexity standpoint; > it doesn't look to me to be noticeably easier to use than a row > processor callback. > I think PQgetRow and PQrecvRow just add more API calls without making > any fundamental improvements, and so we could do without them. "There's > more than one way to do it" is not necessarily a virtue.

Please re-read the above list of problematic situations that this API fixes. Then, if you still think that PQrecvRow() is pointless, sure, let's drop it. We can also postpone it to 9.3, to poll users on whether they want an easier API or whether maximum performance is important. (PQrecvRow() *does* have a few cycles of overhead compared to callbacks.)

The only option that we have on the table for 9.2 but not later is moving the callback API to .

> > Second conclusion is that current dblink row-processor usage is broken > > when user uses multiple SELECTs in SQL as dblink uses plain PQexec(). > > Yeah. Perhaps we should tweak the row-processor callback API so that > it gets an explicit notification that "this is a new resultset". > Duplicating PQexec's behavior would then involve having the d
Re: [HACKERS] HTTP Frontend? (and a brief thought on materialized views)
On Fri, Mar 30, 2012 at 10:55 AM, Dobes Vandermeer wrote: > Well, in our case HTTP is a clear win (but not replacement) and SPDY a > potential one (even as a replacement). Even if SPDY is not widely adopted > it could still replace FEBE if there's a clear advantage to using it, I > don't know enough to make the call right now. I can see that there are some advantages to having an HTTP interface to the database, but I think throwing our existing protocol out the window or relegating it to the status of a second-class citizen would be foolish. HTTP is a non-trivial protocol that tends to impose lots of escaping and de-escaping overhead which is unnecessary for people who just want to connect to the database and run queries. I can completely understand that someone might want the ability to do GET /db/table/pk and have that return an answer very, very quickly, by bypassing the usual parser and planner and just firing off an index-scan and returning the results as JSON or somesuch. But I think it would be a serious mistake to assume that GET /q?q=myquery is going to come out better than what we have now in the general case. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Command Triggers patch v18
Robert Haas writes:
> Oh, right: I remember that now. I still think it's a bad way to do
> it, because the trigger potentially has a lot of work to do to
> reconstruct a working command string, and it still ends up getting
> executed by the wrong user. For CREATE EXTENSION it's not that bad,

That's true; I'm only kind of saying that the INSTEAD OF keyword still makes sense (instead of security_checks). I agree that the feature is simpler to use the way you propose it.

>> context = 'DROP TYPE'
>> tag = 'DROP FUNCTION'
>
> I'm not sure I understand how you're using the words "context" and

Yeah, context is not explicit; we could call that "toplevel": the command tag of the command that the user typed. When toplevel is null, the event trigger is fired on a command the user sent; when it's not null, the trigger is fired on some inner command operation.

> "tag". I think for a drop trigger I would want the function to
> receive this information: type of object dropped, OID of object
> dropped, column number in the case of a column drop, flag indicating
> whether it's a toplevel drop or a cascaded drop. I wouldn't object to
> also making the currently-in-context toplevel command tag available,
> but I think most drop triggers wouldn't really care, so I wouldn't
> personally spend much implementation effort on it if it turns out to
> be hard.

I'm not sure it would be hard, as I only see a single depth possible here, so a single per-backend static string variable would do.

> But in general, I don't really know what a "proper" subcommand is or
> why some subcommands should be more proper than others, or why we
> should even be concerned about whether something is a subcommand at
> all. I think it's fine and useful to have triggers that fire in those
> kinds of places, but I don't see why we should limit ourselves to
> that. For applications like replication, auditing, and enhanced
> security, the parse tree and subcommand/non-subcommand status of a
> particular operation are irrelevant. What you need is an exact

Not really. When replicating, you could perfectly well say that you only replicate the toplevel DROP, because the replica will also do the cascade dance and you might have decided not to replicate all related objects on the other side. The information you really do not want to miss is when only the cascaded object is part of the replication, not the main one. That was not covered by my previous patch, but now we have a way to cover it.

> description of the operation that got performed (e.g. the default on
> table X column Y got dropped); you might be able to reverse-engineer
> that from the parse tree, but it's much better to have the system pass
> you the information you need more directly. Certainly, there are
> cases where you might want to have the parse tree, or even the raw
> command text, available, but I'm not even convinced that those
> cases will be the most commonly used ones unless, of course, they're
> the only ones we offer, in which case everyone will go down that path
> by necessity.

There are far too many variants and cases of our commands to be able to extract their parameters in a flat way (a bunch of variables compared to a nested description a la JSON or XML), and I don't think such a flat representation is going to be much better than the parse tree.
Now, we will later be able to offer a normalized rewritten command string from the parse tree to the user, but I don't see us adding support for that for cascaded drops, which is one more reason why I like to expose them as subcommands.

> Again, I'm not understanding the distinction between toplevel events
> and sub-events. I don't see any need for such a distinction. I think
> there are just events, and some of them happen at command start/end
> and others happen somewhere in the middle. As long as it's a safe and
> useful place to fire a trigger, who cares?

I guess your head is too deep in the code side of things as opposed to the SQL user's viewpoint. Maybe my attempt to reconcile both views is not appropriate, but I really do think it is.

>> Given the scope of this mini expression language, we can easily bypass
>> calling the executor in v1 here, and reconsider later if we want to
>> allow calling a UDF in the WHEN clause… I don't think it's an easy
>> feature to add in, though.
>
> Or a necessary one. AFAICS, the main benefit of WHEN clauses on

Exactly.

> regular triggers is that you can prevent the AFTER trigger queue from
> getting huge, and maybe save a little on the cost of invoking a
> trigger function just to exit again. But neither of those should be
> relevant here - nobody does that much DDL, and anybody writing command
> triggers should understand that this is advanced magic not intended
> for beginners. Wizards below level 10 need not apply.

Here it's only a facility to manage your event trigger code organization, and I insisted on having it in the syntax because in the event trigger grammar
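To sketch how the context/toplevel idea discussed above could serve the replication case: all names here (the event_trigger return type, tg_context, tg_tag, and the queue_for_replication helper) are hypothetical illustrations, not the patch's actual API.

    -- Sketch only: assumes the trigger function can see the proposed
    -- "context" and "tag" values under some spelling; queue_for_replication()
    -- is an imagined user-defined helper.
    create function replicate_toplevel_drops_only() returns event_trigger
    language plpgsql as $$
    begin
        -- context is null for the command the user typed, and is set to the
        -- toplevel tag (e.g. 'DROP TYPE') for cascaded subcommands
        if tg_context is null then
            perform queue_for_replication(tg_tag);
        end if;
    end;
    $$;

A trigger like this would fire for every drop, but only forward the toplevel one, letting the replica run the cascade itself, as described above.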
Re: [HACKERS] HTTP Frontend? (and a brief thought on materialized views)
On Fri, Mar 30, 2012 at 9:57 PM, Merlin Moncure wrote:
> On Thu, Mar 29, 2012 at 10:04 AM, Andrew Dunstan wrote:
> > 1. I've been in discussion with some people about adding simple JSON extract
> > functions. We already have some (i.e. xpath()) for XML.
>
> Your json work is a great start in marrying document level database
> features with a relational backend. My take is that storing rich data
> inside the database in json format, while tempting, is generally a
> mistake. Unless the document is a black box it should be decomposed and
> stored relationally and marked back up into a document as it goes out
> the door.

I found storing documents (with no schema) in CouchDB very sexy. For analytics purposes it's very handy to throw whatever data you've got into a document, save it to the database, and sort out the schema later, when you have a need for it. It could be that you want to allow hundreds of different fields in the document, and the document has a hierarchical structure, and so on ... and yet there's no interest in spending time figuring out how to map all that effectively into a SQL schema. Some things might start out as a black box but become interesting for reporting purposes later.

While I was doing this I always thought this would have been a better approach for my previous project, an accounting application. If I could just have stored entities like invoice & customer as a single document that is inserted, updated, etc. atomically, it would be a lot simpler and faster than having to break things out into columns and rows spread over various tables. Some fields are just "information" that isn't searched for or used relationally but ends up getting a field (or a whole bunch of fields, like your address) in the DB schema anyway. Adding, removing, and changing columns is always scary to me in a SQL database - when there's no schema you can fix these things as part of the definition of your "view" instead.

So I think the document database kind of bridges the gap between an OODBMS and the RDBMS, because the document is like a little cluster of objects; you then populate your relations by mapping a view of those documents and use that for analysis and search.

> What I'm saying is that jsonpath probably isn't the whole story:
> another way of bulk moving json into native backend structures without
> parsing would also be very helpful. For example, being able to cast a
> json document into a record or a record array would be just amazing.

Hmm, whatever that is exactly, it does sound cool!
Re: [HACKERS] HTTP Frontend? (and a brief thought on materialized views)
On Fri, Mar 30, 2012 at 4:30 PM, Daniel Farina wrote:
> On Thu, Mar 29, 2012 at 10:55 PM, Dobes Vandermeer wrote:
>>> Virtual hosts. Same port.
>>> I think SPDY or like-protocols [...] give a crisp treatment to
>>> interactive, stateful workloads involving back-and-forth between client
>>> and server with multiplexing, fixing some problems with the hacks in
>>> HTTP-land from before.
>>
>> It sounds like at some level you're really talking about replacing the
>> built-in protocol with SPDY because SPDY is possibly a better baseline
>> than updating the existing protocol. That's an interesting idea, I think
>> this project could evolve in that direction if there's demand for it.
>
> If only so there is a smaller set of arbitrary decisions to make about
> how to delimit messages...but if SPDY doesn't get widely deployed, or
> exacts an unacceptable performance penalty, it is game over.

Well, in our case HTTP is a clear win (but not replacement) and SPDY a potential one (even as a replacement). Even if SPDY is not widely adopted it could still replace FEBE if there's a clear advantage to using it, I don't know enough to make the call right now.
Re: [HACKERS] HTTP Frontend? (and a brief thought on materialized views)
On 03/30/2012 09:57 AM, Merlin Moncure wrote:
> What I'm saying is that jsonpath probably isn't the whole story:
> another way of bulk moving json into native backend structures without
> parsing would also be very helpful. For example, being able to cast a
> json document into a record or a record array would be just amazing.

It should be possible. After all, hstore's populate_record() does that. If we married that logic up with the json parser we should be half way there.

cheers

andrew

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
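For comparison, the existing hstore path Andrew refers to looks like this; the json variant at the end is purely hypothetical, since no such function exists at this point (type and field names are made up):

    -- existing behaviour: hstore's populate_record() fills a composite type
    -- from key/value pairs (requires the hstore extension)
    create extension if not exists hstore;
    create type person as (name text, age int);
    select * from populate_record(row(null, null)::person,
                                  'name => Fred, age => 40'::hstore);

    -- the json analogue under discussion might eventually look like this
    -- (hypothetical function, shown only to illustrate the idea):
    -- select * from json_populate_record(null::person,
    --                                    '{"name": "Fred", "age": 40}');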
Re: [HACKERS] Optimizing Nested Correlated Queries by decorrelation: GSOC 2012 Project
Arun Chaitanya writes: > The link to the paper is > http://www.iith.ac.in/~ravig/courses/cs5050/papers/decorrelation-cesar.pdf Given the authorship of that paper, I'd have to wonder whether Microsoft has filed for any patents regarding these ideas. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Optimizing Nested Correlated Queries by decorrelation: GSOC 2012 Project
Thanks Robert,

Yes, I think I am being overambitious, as I have never had any open source development experience. Anyway, please go through the idea. I have posted the link to the paper in one of the replies.

Please suggest other options that I could take up as a GSoC 2012 project.

On Fri, Mar 30, 2012 at 5:34 PM, Robert Haas wrote:
> On Fri, Mar 30, 2012 at 7:33 AM, Arun Chaitanya wrote:
>> I wanted to take up this as a GSOC 2012 project.
>
> This would be a great query planner optimization but the chances of
> getting it done in one summer as a GSoC project seem to me to be nil.
> You've never had a patch accepted before; picking a massive
> reorganization of the query planner as your first project is not going
> to work out well.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Optimizing Nested Correlated Queries by decorrelation: GSOC 2012 Project
Thanks a lot Heikki. I have already posted an example in the mail. The link to the paper is http://www.iith.ac.in/~ravig/courses/cs5050/papers/decorrelation-cesar.pdf

Hope this helps,
Arun

On Fri, Mar 30, 2012 at 5:32 PM, Heikki Linnakangas wrote:
> (off-list)
>
> You'll want to post a link to the paper, otherwise people who are not GSoC
> mentors will have no idea what you're talking about ;-). Posting an example
> query and access plans would illustrate the point, too.
>
> On 30.03.2012 14:33, Arun Chaitanya wrote:
>> Hi,
>>
>> I wanted to take up this as a GSOC 2012 project.
>>
>> SQL supports nested queries. When the inner query contains a
>> correlation variable the present optimizer takes an iterative
>> execution plan. If the inner query scans over a relation, the
>> iterative plan chosen can be sub-optimal.
>>
>> The goal of this project is to enable De-correlation for all possible
>> cases. The iterative execution plan can be converted to a set oriented
>> plan by taking a join over the base relations involved. Sufficient
>> work has already been done in this area and has been implemented in
>> SQL Server.
>>
>> The changes required to incorporate the above mentioned strategy is in
>> rewriting phase to the best of my knowledge. The key idea is to
>> introduce the APPLY operator in the raw parse tree. In the above
>> mentioned Papers, the author has mentioned the process of removing the
>> apply. The author has proposed a set of rules which will allow us to
>> achieve the goal. The present postgresql optimizer community has done
>> some work in these lines for simple subqueries involving =, >, < in
>> the predicates [ I observed it by seeing the result of EXPLAIN for
>> relevant queries ]. The optimization is not done for subqueries
>> containing aggregate queries and existential and containment queries.
>>
>> An example query from TPCH benchmark discussed by the author:
>>
>> select c_custkey
>> from customer
>> where 100 < (select sum(o_totalprice)
>>              from orders
>>              where o_custkey = c_custkey)
>>
>> In the above case, c_custkey is a correlation variable (parameterized)
>> coming from the outer query. Hence in the present system, the inner
>> query is executed as many times as the tuples in the customer
>> relation. As the subQuery involves a scan over orders relation, the
>> total I/O cost involved is pretty high.
>>
>> Using the transformation proposed by the author, the query can be
>> re-written as
>>
>> select c_custkey
>> from customer left outer join
>>      orders on o_custkey = c_custkey
>> group by c_custkey
>> having 100 < sum(o_totalprice)
>>
>> This allows the optimizer to choose a better plan for left outer join
>> and avoid iterative execution. The idea here is not to get a rewritten
>> query as output but to generate a plan tree that does the same as the
>> above query.
>>
>> Regards,
>> Arun
>
> --
> Heikki Linnakangas
> EnterpriseDB http://www.enterprisedb.com

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Command Triggers patch v18
On Fri, Mar 30, 2012 at 4:32 AM, Dimitri Fontaine wrote:
> I did that another way in previous incarnations of the patch, which was
> to allow for an INSTEAD OF event trigger backed by a SECURITY DEFINER
> function. When the extension is whitelisted, protect against recursion,
> then CREATE EXTENSION in the security definer function, then signal that
> the execution should now be aborted.
>
> That was too dangerous given the lack of policy about where exactly the
> user code is fired, but I think we could now implement that for some of
> the event timing specs we're listing. Only some of them, I guess only
> those that are happening before we lock the objects.

Oh, right: I remember that now. I still think it's a bad way to do it, because the trigger potentially has a lot of work to do to reconstruct a working command string, and it still ends up getting executed by the wrong user. For CREATE EXTENSION it's not that bad, because the arguments to the command are so simple, but of course any time we extend the CREATE EXTENSION syntax, the trigger needs to know about it too, whether it's security-relevant or not, and doing something similar with, say, ALTER TABLE would be ridiculously complicated.

I think there is a use case for what you called an INSTEAD OF trigger, but I don't believe in this one. It seems to me that there's a lot of power in being able to *just* intercept the security decision and then let the rest of the command go about its business. Of course, you have to avoid confusing security checks (like, you must own the table in order to drop a column) with integrity checks (like, you can't drop a column from pg_class), but I think that's not very hard to get right.

>> More generally, my thought on the structure of this is that you're
>> going to have certain toplevel events, many of which will happen at
>> only a single place in the code, like "an object got dropped" or "a
>> DDL command started" or "a DDL command ended". So we give those
>> names, like sql_drop, ddl_command_start, and ddl_command_end. Inside
>
> I really dislike mixing sql_drop and ddl_command_start as being the same
> kind of objects here, even if I can bend my head at the right angle and
> see that it's a fair view when looking at how it's implemented. I can't
> see a way to explain that to users without having to explain to them how
> drop cascade is implemented.
>
> So my proposal here is to “fake” a “proper” subcommand thanks to the new
> context variable. If you DROP TYPE foo CASCADE and that in turn drops a
> function foo_in(), then an event trigger is fired with
>
> context = 'DROP TYPE'
> tag = 'DROP FUNCTION'
>
> Same idea when you DROP TABLE … CASCADE and a SEQUENCE and a bunch of
> indexes need to disappear too; you get the usual event trigger fired with
> the context set to 'DROP TABLE' this time.
>
> I don't think we need to arrange for explicitly publishing the context
> specific information here. If we need to, we have to find the right
> timing spec where we can guarantee still being in the top level command
> and where we already have the details filled in, then users can attach a
> trigger here and register the information for themselves.

I'm not sure I understand how you're using the words "context" and "tag". I think for a drop trigger I would want the function to receive this information: type of object dropped, OID of object dropped, column number in the case of a column drop, flag indicating whether it's a toplevel drop or a cascaded drop.
I wouldn't object to also making the currently-in-context toplevel command tag available, but I think most drop triggers wouldn't really care, so I wouldn't personally spend much implementation effort on it if it turns out to be hard.

But in general, I don't really know what a "proper" subcommand is or why some subcommands should be more proper than others, or why we should even be concerned about whether something is a subcommand at all. I think it's fine and useful to have triggers that fire in those kinds of places, but I don't see why we should limit ourselves to that. For applications like replication, auditing, and enhanced security, the parse tree and subcommand/non-subcommand status of a particular operation are irrelevant. What you need is an exact description of the operation that got performed (e.g. the default on table X column Y got dropped); you might be able to reverse-engineer that from the parse tree, but it's much better to have the system pass you the information you need more directly. Certainly, there are cases where you might want to have the parse tree, or even the raw command text, available, but I'm not even convinced that those cases will be the most commonly used ones unless, of course, they're the only ones we offer, in which case everyone will go down that path by necessity.

>> your trigger procedure, the set of magic variables that is available
>> will depend on which toplevel event you set the trigger on, but
>> hopefully al
Re: [HACKERS] HTTP Frontend? (and a brief thought on materialized views)
On Thu, Mar 29, 2012 at 10:04 AM, Andrew Dunstan wrote:
> 1. I've been in discussion with some people about adding simple JSON extract
> functions. We already have some (i.e. xpath()) for XML.

I've built a couple of applications that push data in and out of xml via manual composition going out and xpath coming in. TBH, I found this to be a pretty tedious way of developing a general application structure and a couple of notches down from the more SQL-driven approach. Not that jsonpath/xpath aren't wonderful functions -- but I think for general information passing there's a better way.

Your json work is a great start in marrying document level database features with a relational backend. My take is that storing rich data inside the database in json format, while tempting, is generally a mistake. Unless the document is a black box it should be decomposed and stored relationally and marked back up into a document as it goes out the door. This is why brevity and flexibility of syntax is so important when marshaling data in and out of transport formats. It encourages people to take the right path and get the best of both worlds -- a rich backend with strong constraints that can natively speak such that writing data driven web services is easy.

What I'm saying is that jsonpath probably isn't the whole story: another way of bulk moving json into native backend structures without parsing would also be very helpful. For example, being able to cast a json document into a record or a record array would be just amazing.

merlin

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [COMMITTERS] pgsql: pg_test_timing utility, to measure clock monotonicity and timing
On Wed, Mar 28, 2012 at 10:43 AM, Fujii Masao wrote:
> On Wed, Mar 28, 2012 at 9:19 PM, Robert Haas wrote:
>> On Tue, Mar 27, 2012 at 10:10 PM, Fujii Masao wrote:
>>> On Wed, Mar 28, 2012 at 5:17 AM, Robert Haas wrote:
>>>> pg_test_timing utility, to measure clock monotonicity and timing cost.
>>>
>>> When I compiled this, I got a compiler warning. Attached patch
>>> silences the warning.
>>
>> Unfortunately, that *produces* a warning on my machine. Normally, I
>> think we handle this using INT64_FORMAT, but the fact that it's %10ld
>> here and not just %lld makes that awkward. I guess we maybe need to
>> insert some kludgy workaround here - write it into a separate buffer,
>> and then blank-pad it, or something like that.
>
> This seems the simplest workaround. How about attached patch?

Thanks, committed.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Optimizing Nested Correlated Queries by decorrelation: GSOC 2012 Project
On Fri, Mar 30, 2012 at 7:33 AM, Arun Chaitanya wrote: > I wanted to take up this as a GSOC 2012 project. This would be a great query planner optimization but the chances of getting it done in one summer as a GSoC project seem to me to be nil. You've never had a patch accepted before; picking a massive reorganization of the query planner as your first project is not going to work out well. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Optimizing Nested Correlated Queries by decorrelation: GSOC 2012 Project
Hi,

I wanted to take up this as a GSOC 2012 project.

SQL supports nested queries. When the inner query contains a correlation variable, the present optimizer takes an iterative execution plan. If the inner query scans over a relation, the iterative plan chosen can be sub-optimal.

The goal of this project is to enable decorrelation for all possible cases. The iterative execution plan can be converted to a set-oriented plan by taking a join over the base relations involved. Sufficient work has already been done in this area and has been implemented in SQL Server.

The changes required to incorporate the above-mentioned strategy are in the rewriting phase, to the best of my knowledge. The key idea is to introduce the APPLY operator in the raw parse tree. In the above-mentioned paper, the author describes the process of removing the Apply operator and proposes a set of rules which will allow us to achieve the goal. The present PostgreSQL optimizer community has done some work along these lines for simple subqueries involving =, >, < in the predicates [ I observed it by seeing the result of EXPLAIN for relevant queries ]. The optimization is not done for subqueries containing aggregate queries and existential and containment queries.

An example query from the TPCH benchmark discussed by the author:

select c_custkey
from customer
where 100 < (select sum(o_totalprice)
             from orders
             where o_custkey = c_custkey)

In the above case, c_custkey is a correlation variable (parameterized) coming from the outer query. Hence in the present system, the inner query is executed as many times as there are tuples in the customer relation. As the subquery involves a scan over the orders relation, the total I/O cost involved is pretty high.

Using the transformation proposed by the author, the query can be re-written as

select c_custkey
from customer left outer join
     orders on o_custkey = c_custkey
group by c_custkey
having 100 < sum(o_totalprice)

This allows the optimizer to choose a better plan for the left outer join and avoid iterative execution. The idea here is not to get a rewritten query as output but to generate a plan tree that does the same as the above query.

Regards,
Arun

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
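One way to see the difference in current PostgreSQL is to run EXPLAIN on both forms against a TPCH-style schema (a sketch; exact plan shapes depend on the schema, statistics and settings). The correlated form typically shows a SubPlan on orders that is re-executed for every customer row, while the rewritten form can use a single outer join plus aggregation:

    -- correlated form: expect a SubPlan over orders, re-run per customer row
    explain (costs off)
    select c_custkey
    from customer
    where 100 < (select sum(o_totalprice)
                 from orders
                 where o_custkey = c_custkey);

    -- decorrelated form: expect one join between customer and orders
    explain (costs off)
    select c_custkey
    from customer left outer join orders on o_custkey = c_custkey
    group by c_custkey
    having 100 < sum(o_totalprice);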
Re: [HACKERS] Finer Extension dependencies
Tom Lane writes:
>> That's how I did it first, but Alvaro opposed to that because it allows
>> for more than one extension to provide for the same feature name.
>> http://archives.postgresql.org/pgsql-hackers/2012-03/msg01425.php
>
> Right, but the question that has to be considered is how often would
> that be intentional as opposed to an undesirable name collision.
> I think Hitoshi was right upthread that it will seldom if ever be
> the case that somebody is independently reimplementing somebody
> else's API, so the use-case for intentional substitution seems thin.

I reverted that change and we're now back to:

      Table "pg_catalog.pg_extension_feature"
   Column   | Type | Modifiers
 -----------+------+-----------
 extoid     | oid  | not null
 extfeature | name | not null
Indexes:
    "pg_extension_feature_index" UNIQUE, btree (extoid, extfeature)
    "pg_extension_feature_oid_index" UNIQUE, btree (oid)

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

extension-provides.v7.patch.gz
Description: Binary data

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
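Given that catalog shape, listing which extension provides which feature would presumably be a simple join against pg_extension (a sketch against the patch's proposed catalog; pg_extension_feature does not exist without the patch applied):

    -- sketch only: requires the extension-provides patch
    select e.extname, f.extfeature
    from pg_extension e
         join pg_extension_feature f on f.extoid = e.oid
    order by 1, 2;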
Re: [HACKERS] poll: CHECK TRIGGER?
Hello

2012/3/28 Heikki Linnakangas :
> Ok, seems that the API issue is settled, so I'm now looking at the code
> actually doing the checking. My first impression is that this is a lot of
> code. Can we simplify it?

I played with this, and it cannot be reduced without obscuring the current code of pl_exec.c. So I moved the code related to checking out into a new file, pl_check.c. This code is relatively large - about 50 kB - but it is relatively simple and I hope it is readable. I am afraid it cannot be reduced significantly by reuse with pl_func.c. Recursive iteration over nodes is a relatively small part of this patch (~25%), and a general stmt walker doesn't help too much.

> Since this is deeply integrated into the PL/pgSQL interpreter, I was
> expecting that this would run through the normal interpreter, in a special
> mode that skips all the actual execution of queries, and shortcuts all loops
> and other control statements so that all code is executed only once. That
> would mean sprinkling some "if (check_only)" code into the normal exec_*
> functions. I'm not sure how invasive that would be, but it's worth
> considering. I think you would be able to more easily catch more errors that
> way, and the check code would stay better in sync with the execution code.

-1. There are a few places that are very difficult: basic blocks with exception handling (exception handlers), CASE statements. Another issue is the increased complexity of some routines like exec_eval*.

> Another thought is that check_stmt() and all its subroutines are very
> similar to the plpgsql_dumptree() code. Would it make sense to merge those?
> You could have an output mode, in addition to the xml and plain-text
> formats, that would just dump the whole tree like plpgsql_dumptree() does.

It is difficult now - without changes in plpgsql_stmt_if, plpgsql_stmt_case and plpgsql_stmt_block it is not possible to write a general walker that is usable for both checking and dumptree. It needs a redesign of these nodes first.

> In prepare_expr(), you use a subtransaction to catch any ERRORs that happen
> during parsing the expression. That's a good idea, and I think many of the
> check_* functions could be greatly simplified by adopting a similar
> approach. Just ereport() any errors you find, and catch them at the
> appropriate level, appending the error to the output string. Your current
> approach of returning true/false depending on whether there was any errors
> seems tedious.

It cannot be implemented in an AST interpreter without removing some requested functionality - fatal_errors.

> If you create a function with an invalid body (ie. set
> check_function_bodies=off; create function ... $$ bogus $$;) ,
> plpgsql_check_function() still throws an error. It's understandable that it
> cannot do in-depth analysis if the function cannot be parsed, but I would
> expect the syntax error to be returned as a return value like other errors
> that it complains about, not thrown as a hard ERROR. That would make it more
> useful to bulk-check all functions in a database with something like "select
> plpgsql_check_function(oid) from pg_class". As it is, the checking stops at
> the first invalid function with an error.

done

postgres=> select plpgsql_check_function('sss'::regproc, 0);
                    plpgsql_check_function
------------------------------------------------------------
 error:42601:syntax error at or near "adasdfsadf"
 Query: adasdfsadf
 --     ^
 Context: compilation of PL/pgSQL function "sss" near line 1
(4 rows)

> PS.
> I think plpgsql_check_function() belongs in pl_handler.c

done

Regards

Pavel Stehule

> --
> Heikki Linnakangas
> EnterpriseDB http://www.enterprisedb.com

plpgsql_check_function2-2012-03-30.diff.gz
Description: GNU Zip compressed data

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
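Assuming the signature shown above, plpgsql_check_function(regproc, int), the bulk check Heikki asked about could then be run over every PL/pgSQL function in the database roughly like this (a sketch only; the function exists only with this patch applied):

    select p.oid::regprocedure as checked_function,
           plpgsql_check_function(p.oid, 0) as report   -- second argument as in the example above
    from pg_proc p
         join pg_language l on l.oid = p.prolang
    where l.lanname = 'plpgsql';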
Re: [HACKERS] Command Triggers patch v18
Robert Haas writes:
> I'm thinking of things like extension whitelisting. When some
> unprivileged user says "CREATE EXTENSION harmless", and harmless is
> marked as superuser-only, we might like to have a hook that gets
> called *at permissions-checking time* and gets to say, oh, well, that
> extension is on the white-list, so we're going to allow it. I think
> you can come up with similar cases for other commands, where in
> general the operation is restricted to superusers or database owners
> or table owners but in specific cases you want to allow others to do
> it.

I did that another way in previous incarnations of the patch, which was to allow for an INSTEAD OF event trigger backed by a SECURITY DEFINER function. When the extension is whitelisted, protect against recursion, then CREATE EXTENSION in the security definer function, then signal that the execution should now be aborted.

That was too dangerous given the lack of policy about where exactly the user code is fired, but I think we could now implement that for some of the event timing specs we're listing. Only some of them, I guess only those that are happening before we lock the objects.

I would then prefer using the INSTEAD OF words, which are much easier to grasp than AT.

> CREATE EVENT TRIGGER name ON event_name (event_subtype_name [, ...])
> EXECUTE PROCEDURE function_name(args);

create event trigger prohibit_some_ddl preceding
    when tag in ('CREATE TABLE', 'ALTER TABLE')
    execute procedure throw_an_error();

> I guess that would make sense if you think there would ever be more
> than one choice for . I'm not immediately seeing a
> use case for that, though - I was explicitly viewing the syntax foo

So, the variables in question are tag, objectid, objectname, schemaname and, from a very recent email, context. On reflection, I think the variable here would only be either tag or context, and that's it.

> More generally, my thought on the structure of this is that you're
> going to have certain toplevel events, many of which will happen at
> only a single place in the code, like "an object got dropped" or "a
> DDL command started" or "a DDL command ended". So we give those
> names, like sql_drop, ddl_command_start, and ddl_command_end. Inside

I really dislike mixing sql_drop and ddl_command_start as being the same kind of objects here, even if I can bend my head at the right angle and see that it's a fair view when looking at how it's implemented. I can't see a way to explain that to users without having to explain to them how drop cascade is implemented.

So my proposal here is to “fake” a “proper” subcommand thanks to the new context variable. If you DROP TYPE foo CASCADE and that in turn drops a function foo_in(), then an event trigger is fired with

    context = 'DROP TYPE'
    tag = 'DROP FUNCTION'

Same idea when you DROP TABLE … CASCADE and a SEQUENCE and a bunch of indexes need to disappear too; you get the usual event trigger fired with the context set to 'DROP TABLE' this time.

I don't think we need to arrange for explicitly publishing the context specific information here. If we need to, we have to find the right timing spec where we can guarantee still being in the top level command and where we already have the details filled in, then users can attach a trigger here and register the information for themselves.

> your trigger procedure, the set of magic variables that is available
> will depend on which toplevel event you set the trigger on, but
> hopefully all firings of that toplevel event can provide the same
> magic variables.
> For example, at ddl_command_start time, you're just
> gonna get the command tag, but at ddl_command_end time you will get
> the command tag plus maybe some other stuff.

With my proposal above, you could get the same set of information when being called as a toplevel event or a subevent (one where the context is not null). That would mean adding object name and schema name lookups in the drop cascade code, though. We can also decide not to do that extra lookup and just publish the object id, which we certainly do have.

This way, the timing spec of a sub-event can still be of the same kind as the top-level event ones: we still have before and after lock entry points, the same with lookups if we add that feature, etc.

> Now, we COULD stop there. I mean, we could document that you can
> create a trigger on ddl_command_start and every DDL command will fire
> that trigger, and if the trigger doesn't care about some DDL
> operations, then it can look at the command tag and return without
> doing anything for the operations it doesn't care about. The only
> real disadvantage of doing it that way is speed, and maybe a bit of
> code complexity within the trigger. So my further thought was that

Within my “context proposal”, you also lose the ability to refer to sub-events as plain events with a context, which I find so much cleaner.

> we'd allow you to specify
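To make the whitelisting use case discussed above concrete, here is a rough sketch of what such a trigger could look like. Everything here is hypothetical: the return type, the tg_objectname variable, and how the original command gets suppressed are all still open questions in this thread.

    -- sketch only: names, variables and the abort mechanism are illustrative,
    -- not the patch's actual API
    create function whitelist_create_extension() returns event_trigger
    language plpgsql security definer as $$
    begin
        -- tg_objectname stands in for "the extension being created";
        -- a real implementation also needs the recursion guard noted above
        if tg_objectname = 'harmless' then
            execute 'create extension harmless';
        end if;
        -- the mechanism under discussion would then signal that the original,
        -- superuser-only CREATE EXTENSION should be aborted
    end;
    $$;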
Re: [HACKERS] HTTP Frontend? (and a brief thought on materialized views)
On Thu, Mar 29, 2012 at 10:55 PM, Dobes Vandermeer wrote:
>> Virtual hosts. Same port.
>
> In that case, the frontend would not be tied to a specific PostgreSQL
> server, then? I think initially this might complicate things a bit, and you
> could solve it by putting an HTTP proxy in front to do the virtual hosts for
> you.

I think these problems are treatable, as you mention, and if there is to be any value at all, it's by using mostly off-the-shelf components (which, at this time, are rarer for the new-generation stuff). That's the draw, for me.

> Ah, I get it - you want a way to add some extra information to the protocol
> in a backwards compatible way. HTTP (and SPDY) provides a "standard" way to
> do that. Makes sense.
>
> I'll make a note of these and hash out the details a bit more once there's
> something working to add them to.

A lot of them are old ideas, but it would be nice to encourage experimentation by getting over some of the small-matter-of-programming and backwards-compatibility issues.

>> I think SPDY or like-protocols [...] give a crisp treatment to
>> interactive, stateful workloads involving back-and-forth between client
>> and server with multiplexing, fixing some problems with the hacks in
>> HTTP-land from before.
>
> It sounds like at some level you're really talking about replacing the
> built-in protocol with SPDY because SPDY is possibly a better baseline than
> updating the existing protocol. That's an interesting idea, I think this
> project could evolve in that direction if there's demand for it.

If only so there is a smaller set of arbitrary decisions to make about how to delimit messages... but if SPDY doesn't get widely deployed, or exacts an unacceptable performance penalty, it is game over. The worst possible outcome is that the thing that becomes common is also impractical for session-oriented sequential SQL execution, but I am hopeful, given the use cases driving this standardization process, that this is less likely to happen.

FEBE's framing/message format doesn't seem like an area where people are itching to try wild and crazy changes unique to Postgres (maybe someone has...), so using a more commonly seen delimitation format seems like a concession that opens a lot of useful, interesting doors.

This discussion is somewhat premature because HTTP 2.0 isn't standardized, nor has SPDY become a widespread de facto format (although the percentage of well-known web-infrastructure projects implementing it has grown impressively both recently and overall), and there are not even straw-man bandwidth/latency measurements (in terms of vs. FEBE), but as long as someone is bringing up HTTP, I thought it worth discussing in a little more depth, because it's something I poll regularly mentally, looking for a sign that It's Time. It wasn't ready enough for me to start a thread, but clearly I couldn't quite resist replying to one...

--
fdr

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers