Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive
On Tue, Jul 7, 2015 at 12:57 PM, Fujii Masao <masao.fu...@gmail.com> wrote:
> On Tue, Jun 30, 2015 at 10:30 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
>> I have still not added documentation and have not changed anything for the waiting column in pg_stat_activity, as I think we first need to finalize the user interface. Apart from that, as mentioned above, waits for some event types (like I/O, network I/O, etc.) are not yet added, and I think separate functions (like the existing pg_stat_get_backend_waiting) will be required, which again depends upon the user interface.
>
> Yes, we need to discuss what events to track. As I suggested upthread, Itagaki-san's patch would be helpful for thinking about that. He proposed to track not only wait events like locking and I/O operations but also CPU events like query parsing and planning. I think that tracking even CPU events would be useful to analyze performance problems.

I think as part of this patch we are trying to capture events where we need to wait; extending the scope to include CPU events might duplicate the tracing events (DTrace) we already have in PostgreSQL, and it might degrade performance in certain cases. If we capture events only during waits, then there is almost no chance of any visible performance impact. I suggest that for an initial implementation we track heavyweight locks, lightweight locks, and I/O events, and then we can add more events if we think they are important.

> Here are some review comments on the patch:
>
> When I played around with the patch, make check failed.

Yes, I still need to update some of the tests according to the updates in pg_stat_activity, but the main reason I have not done so is that the user interface needs some more discussion.

> Each backend reports its event when trying to take a lock. But the reported event is never reset until the next event is reported. Is this OK?
> This means that the wait_event column keeps showing the *last* event while a backend is in the idle state, for example. So, shouldn't we reset the reported event or report another one when releasing the lock?

As pointed out by Kyotaro-san, the 'waiting' column will indicate whether the backend is still waiting on the event specified in the wait_event column. Currently 'waiting' is updated only for heavyweight locks; if there is no objection to using it for other wait events (aka breaking backward compatibility for that parameter), then I can update it for other events as well. Do you have any better suggestions for the same?

> +read_string_from_waitevent(uint8 wait_event)
>
> The performance of this function looks poor because its worst case is O(n), where n is the number of all the events that we are trying to track. Also, in pg_stat_activity, this function is called per backend. Can't we retrieve the event name by using the wait event ID as an index into the WaitEventTypeMap array?

Yes, we can do it that way.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
[HACKERS] expose confirmed_flush for replication slots
Hi,

I had some trouble today with a misbehaving logical replication client which had confirmed a flush of an LSN far into the future. Debugging it was a bit of a pain for a number of reasons, but I think the most important one was that confirmed_flush isn't exposed in pg_replication_slots. The attached patch exposes it as confirmed_flush_lsn (to make it look like restart_lsn; not sure whether we should just keep the internal name or not).

Adding this one to the next commit fest, but any feedback welcome in the meanwhile.

.m

diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index e82a53a..ffaf451 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -674,7 +674,8 @@ CREATE VIEW pg_replication_slots AS
             L.active_pid,
             L.xmin,
             L.catalog_xmin,
-            L.restart_lsn
+            L.restart_lsn,
+            L.confirmed_flush_lsn
     FROM pg_get_replication_slots() AS L
             LEFT JOIN pg_database D ON (L.datoid = D.oid);

diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 9a2793f..90d75ce 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -206,6 +206,7 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
 		TransactionId xmin;
 		TransactionId catalog_xmin;
 		XLogRecPtr	restart_lsn;
+		XLogRecPtr	confirmed_flush_lsn;
 		pid_t		active_pid;
 		Oid			database;
 		NameData	slot_name;
@@ -224,6 +225,7 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
 		catalog_xmin = slot->data.catalog_xmin;
 		database = slot->data.database;
 		restart_lsn = slot->data.restart_lsn;
+		confirmed_flush_lsn = slot->data.confirmed_flush;
 		namecpy(&slot_name, &slot->data.name);
 		namecpy(&plugin, &slot->data.plugin);
@@ -273,6 +275,11 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
 		else
 			nulls[i++] = true;
+		if (confirmed_flush_lsn != InvalidTransactionId)
+			values[i++] = LSNGetDatum(confirmed_flush_lsn);
+		else
+			nulls[i++] = true;
+
 		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
 	}

diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 6fd1278..aee82d2 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -5183,7 +5183,7 @@ DATA(insert OID = 3779 (  pg_create_physical_replication_slot PGNSP PGUID 12 1 0
 DESCR("create a physical replication slot");
 DATA(insert OID = 3780 (  pg_drop_replication_slot PGNSP PGUID 12 1 0 0 0 f f f f f f v 1 0 2278 "19" _null_ _null_ _null_ _null_ _null_ pg_drop_replication_slot _null_ _null_ _null_ ));
 DESCR("drop a replication slot");
-DATA(insert OID = 3781 (  pg_get_replication_slots PGNSP PGUID 12 1 10 0 0 f f f f f t s 0 0 2249 "" "{19,19,25,26,16,23,28,28,3220}" "{o,o,o,o,o,o,o,o,o}" "{slot_name,plugin,slot_type,datoid,active,active_pid,xmin,catalog_xmin,restart_lsn}" _null_ _null_ pg_get_replication_slots _null_ _null_ _null_ ));
+DATA(insert OID = 3781 (  pg_get_replication_slots PGNSP PGUID 12 1 10 0 0 f f f f f t s 0 0 2249 "" "{19,19,25,26,16,23,28,28,3220,3220}" "{o,o,o,o,o,o,o,o,o,o}" "{slot_name,plugin,slot_type,datoid,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn}" _null_ _null_ pg_get_replication_slots _null_ _null_ _null_ ));
 DESCR("information about replication slots currently in use");
 DATA(insert OID = 3786 (  pg_create_logical_replication_slot PGNSP PGUID 12 1 0 0 0 f f f f f f v 2 0 2249 "19 19" "{19,19,25,3220}" "{i,i,o,o}" "{slot_name,plugin,slot_name,xlog_position}" _null_ _null_ pg_create_logical_replication_slot _null_ _null_ _null_ ));
 DESCR("set up a logical replication slot");

diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index cd53375..63cd323 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1416,8 +1416,9 @@ pg_replication_slots| SELECT l.slot_name,
     l.active_pid,
     l.xmin,
     l.catalog_xmin,
-    l.restart_lsn
-   FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, active, active_pid, xmin, catalog_xmin, restart_lsn)
+    l.restart_lsn,
+    l.confirmed_flush_lsn
+   FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn)
      LEFT JOIN pg_database d ON ((l.datoid = d.oid)));
 pg_roles| SELECT pg_authid.rolname,
     pg_authid.rolsuper,

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
Re: [HACKERS] PL/pgSQL, RAISE and error context
On Wed, Jul 8, 2015 at 1:35 AM, Pavel Stehule <pavel.steh...@gmail.com> wrote:
> 2015-07-07 18:15 GMT+02:00 Merlin Moncure <mmonc...@gmail.com>:
>> On Tue, Jul 7, 2015 at 9:04 AM, Pavel Stehule <pavel.steh...@gmail.com> wrote:
>>> It doesn't have to if the behavior is guarded with a GUC. I just don't understand what all the fuss is about. The default behavior of logging, as well established by other languages (for example Java) that manage the error stack for you, should be to:
>>>
>>> *) give a stack trace when an uncaught exception is thrown
>>> *) not give a stack trace in all other logging cases unless asked for
>
> what is RAISE EXCEPTION - first or second case?

First: RAISE (unless caught) is no different than any other kind of error.

On Tue, Jul 7, 2015 at 9:42 AM, Heikki Linnakangas <hlinn...@iki.fi> wrote:
> On 07/07/2015 04:56 PM, Merlin Moncure wrote:
>> It doesn't have to if the behavior is guarded with a GUC. I just don't understand what all the fuss is about. The default behavior of logging, as well established by other languages (for example Java) that manage the error stack for you, should be to:
>>
>> *) give a stack trace when an uncaught exception is thrown
>> *) not give a stack trace in all other logging cases unless asked for
>
> Java's exception handling is so different from PostgreSQL's errors that I don't think there's much to be learned from that. But I'll bite: First of all, Java's exceptions always contain a stack trace. It's up to you when you catch an exception to decide whether to print it or not. try { ... } catch (Exception e) { e.printStackTrace(); } is fairly common, actually. There is nothing like a NOTICE in Java, i.e. an exception that's thrown but doesn't affect the control flow. The best I can think of is System.out.println(), which of course has no stack trace attached to it.

exactly.

> Perhaps you're arguing that NOTICE is more like printing to stderr, and should never contain any context information. I don't think that would be an improvement. It's very handy to have the context information available if you don't know where a NOTICE is coming from, even if in most cases you're not interested in it.

That's exactly what I'm arguing. NOTICE (and WARNING) are for printing out information to client-side logging; it's really the only tool we have for that purpose and it fits that role perfectly. Of course, you may want to have NOTICE print context, especially when debugging, but some control over that would be nice, and in most cases it's really not necessary. I really don't understand the objection to offering control over that behavior, although I certainly understand wanting to keep the default behavior as it currently is.

> This is really quite different from a programming language's exception handling. First, there's a server, which produces the errors, and a separate client, which displays them. You cannot catch an exception in the client.
>
> BTW, let me throw in one use case to consider. We've been talking about psql, and what to print, but imagine a more sophisticated client like pgAdmin. It's not limited to either printing the context or not. It could e.g. hide the context information of all messages when they occur, but if you double-click on one, it's expanded to show all the context, location and all. You can't do that if the server doesn't send the context information in the first place.

>> I would be happy to show you the psql redirected output logs from my nightly server processes that spew into the megabytes because of logging various high-level steps (did this, did that).
>
> Oh, I believe you. I understand what the problem is, we're only talking about how best to address it.

Yeah. For posterity, a psql-based solution would work fine for me, but a server-side solution has a lot of advantages (less protocol chatter, more configurability, keeping libpq/psql light).
After some work on a reduced version of the plpgsql.min_context patch, I am inclined to think the ideal solution needs more steps, because this issue has more than one dimension.

There are two independent issues:

1. The old plpgsql workaround that reduced the unwanted call stack info for RAISE NOTICE. A negative side effect of this workaround is the missing context info about the RAISE command that raises the exception: we know the function, but we don't know the line of the related RAISE statement. The important fact is that a NOTICE doesn't bubble up, so this workaround was relatively successful without implementing any filtering on the client or log side.

2. The second issue is generally suppressing context info for an interactive client or for the log.

These issues should be solved separately, because a solution for #2 doesn't fix #1, and #1 is too local for #2. So what can we do?

1. remove the current plpgsql workaround and implement client_min_context and log_min_context
2. implement plpgsql.min_context, and
Re: [HACKERS] expose confirmed_flush for replication slots
On 2015-07-08 14:57, I wrote:
> Adding this one to the next commit fest, but any feedback welcome in the meanwhile.

Forgot to change PG_GET_REPLICATION_SLOTS_COLS; fixed in the attached patch.

.m

diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index e82a53a..ffaf451 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -674,7 +674,8 @@ CREATE VIEW pg_replication_slots AS
             L.active_pid,
             L.xmin,
             L.catalog_xmin,
-            L.restart_lsn
+            L.restart_lsn,
+            L.confirmed_flush_lsn
     FROM pg_get_replication_slots() AS L
             LEFT JOIN pg_database D ON (L.datoid = D.oid);

diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 9a2793f..ddbf8c4 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -158,7 +158,7 @@ pg_drop_replication_slot(PG_FUNCTION_ARGS)
 Datum
 pg_get_replication_slots(PG_FUNCTION_ARGS)
 {
-#define PG_GET_REPLICATION_SLOTS_COLS 9
+#define PG_GET_REPLICATION_SLOTS_COLS 10
 	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
 	TupleDesc	tupdesc;
 	Tuplestorestate *tupstore;
@@ -206,6 +206,7 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
 		TransactionId xmin;
 		TransactionId catalog_xmin;
 		XLogRecPtr	restart_lsn;
+		XLogRecPtr	confirmed_flush_lsn;
 		pid_t		active_pid;
 		Oid			database;
 		NameData	slot_name;
@@ -224,6 +225,7 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
 		catalog_xmin = slot->data.catalog_xmin;
 		database = slot->data.database;
 		restart_lsn = slot->data.restart_lsn;
+		confirmed_flush_lsn = slot->data.confirmed_flush;
 		namecpy(&slot_name, &slot->data.name);
 		namecpy(&plugin, &slot->data.plugin);
@@ -273,6 +275,11 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
 		else
 			nulls[i++] = true;
+		if (confirmed_flush_lsn != InvalidTransactionId)
+			values[i++] = LSNGetDatum(confirmed_flush_lsn);
+		else
+			nulls[i++] = true;
+
 		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
 	}

diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 6fd1278..aee82d2 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -5183,7 +5183,7 @@ DATA(insert OID = 3779 (  pg_create_physical_replication_slot PGNSP PGUID 12 1 0
 DESCR("create a physical replication slot");
 DATA(insert OID = 3780 (  pg_drop_replication_slot PGNSP PGUID 12 1 0 0 0 f f f f f f v 1 0 2278 "19" _null_ _null_ _null_ _null_ _null_ pg_drop_replication_slot _null_ _null_ _null_ ));
 DESCR("drop a replication slot");
-DATA(insert OID = 3781 (  pg_get_replication_slots PGNSP PGUID 12 1 10 0 0 f f f f f t s 0 0 2249 "" "{19,19,25,26,16,23,28,28,3220}" "{o,o,o,o,o,o,o,o,o}" "{slot_name,plugin,slot_type,datoid,active,active_pid,xmin,catalog_xmin,restart_lsn}" _null_ _null_ pg_get_replication_slots _null_ _null_ _null_ ));
+DATA(insert OID = 3781 (  pg_get_replication_slots PGNSP PGUID 12 1 10 0 0 f f f f f t s 0 0 2249 "" "{19,19,25,26,16,23,28,28,3220,3220}" "{o,o,o,o,o,o,o,o,o,o}" "{slot_name,plugin,slot_type,datoid,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn}" _null_ _null_ pg_get_replication_slots _null_ _null_ _null_ ));
 DESCR("information about replication slots currently in use");
 DATA(insert OID = 3786 (  pg_create_logical_replication_slot PGNSP PGUID 12 1 0 0 0 f f f f f f v 2 0 2249 "19 19" "{19,19,25,3220}" "{i,i,o,o}" "{slot_name,plugin,slot_name,xlog_position}" _null_ _null_ pg_create_logical_replication_slot _null_ _null_ _null_ ));
 DESCR("set up a logical replication slot");

diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index cd53375..63cd323 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1416,8 +1416,9 @@ pg_replication_slots| SELECT l.slot_name,
     l.active_pid,
     l.xmin,
     l.catalog_xmin,
-    l.restart_lsn
-   FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, active, active_pid, xmin, catalog_xmin, restart_lsn)
+    l.restart_lsn,
+    l.confirmed_flush_lsn
+   FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn)
      LEFT JOIN pg_database d ON ((l.datoid = d.oid)));
 pg_roles| SELECT pg_authid.rolname,
     pg_authid.rolsuper,
Re: [HACKERS] New functions
07.07.2015, 18:34, Michael Paquier <michael.paqu...@gmail.com>:
> Speaking of which, I have rewritten the patch as attached. This looks way cleaner than the previous version submitted. Dmitry, does that look fine for you? I am switching this patch to Waiting on Author.
> Regards,
> --
> Michael

Michael, hello. I'm looking at your variant of the patch. You created the function ssl_extensions_info(), which presents information about SSL extensions in a more informative way. So, it's cool.

--
Best regards, Dmitry Voronin
Re: [HACKERS] Implementation of global temporary tables?
Hi

2015-07-08 9:08 GMT+02:00 Zhaomo Yang <zhy...@cs.ucsd.edu>:

> global temp tables are a little bit more comfortable for developers,

I'd like to emphasize this point. This feature does much more than saving a developer from issuing a CREATE TEMP TABLE statement in every session. Here are two common use cases, and I'm sure there are more.

(1) Imagine a web application scenario where a developer wants to cache some session information in a temp table. What's more, he also wants to specify some rules which reference the session information. Without this feature, the rules will be removed at the end of every session since they depend on a temporary object. Global temp tables will allow the developer to define the temp table and the rules once.

(2) The second case was mentioned by Tom Lane back in 2010 in a thread about global temp tables (http://www.postgresql.org/message-id/9319.1272130...@sss.pgh.pa.us): "The context that I've seen it come up in is that people don't want to clutter their functions with create-it-if-it-doesn't-exist logic, which you have to have given the current behavior of temp tables."

> 2.a - using on-demand created temp tables - the most simple solution, but doesn't help with catalogue bloating

I've read the thread, and people disapprove of this approach because of the potential catalog bloat. However, I'd like to champion it. Indeed, this approach may have a bloat issue. But users who need global temp tables now have to create a new temp table in every session, which means they already have the bloat problem, and presumably they have some policy to deal with it. In other words, implementing global temp tables this way gives users the same performance, plus the convenience the feature brings. The root question here is whether having the unoptimized feature is better than having no feature at all. Actually, there was a very similar discussion back in 2009 on global temp tables. Let me borrow Kevin Grittner's and Tom Lane's arguments here.

Kevin Grittner's argument (http://www.postgresql.org/message-id/49f82aea.ee98.002...@wicourts.gov): "... If you're saying we can implement the standard's global temporary tables in a way that performs better than current temporary tables, that's cool. That would be a nice bonus in addition to the application programmer convenience and having another tick-mark on the standards compliance charts. Do you think that's feasible? If not, the feature would be useful to some with the same performance that temporary tables currently provide."

Tom Lane's argument (http://www.postgresql.org/message-id/24110.1241035...@sss.pgh.pa.us): "I'm all for eliminating catalog overheads, if we can find a way to do that. I don't think that you get to veto implementation of the feature until we can find a way to optimize it better. The question is not about whether having the optimization would be better than not having it --- it's about whether having the unoptimized feature is better than having no feature at all (which means people have to implement the same behavior by hand, and they'll *still* not get the optimization)."

There have been several threads here discussing global temp tables since 2007. Quite a few ideas aimed to avoid the bloat issue by not storing the metadata of the session copy in the catalog. However, it seems that none of them has been implemented, or even has a feasible design. So why don't we implement it in an unoptimized way first?

I am not sure it is not useless work. Now I am thinking the best implementation of global temp tables is to enhance unlogged tables to have local content. All local data can be saved in session memory; usually it is less than 2KB with statistics, and you don't need to store it in the catalogue. When anybody is working with any table, the related data are copied to the system cache - and an implementation of global temp tables can be injected there.

regards

Pavel Stehule

> Is there still interest in this feature?

I'm very interested in this feature. I'm thinking about one implementation which is similar to Pavel's 2009 proposal (http://www.postgresql.org/message-id/162867790904271344s1ec96d90j6cde295fdcc78...@mail.gmail.com). Here are the major ideas of my design:

(1) Create the cross-session persistent schema as a regular table and create session-private temp tables when a session first accesses it.

(2) For DML queries, the global temp table is overloaded by its session copy after the relation is opened by an oid or a rangevar. For DDL queries, which copy is used depends on whether the query needs to access the data or the metadata of the global temp table.

There are more differences between this design and Pavel's 2009 proposal, and I'd like to send a detailed proposal to the mailing list, but first I want to know if our community would accept a global temp table implementation which provides the same performance as
Re: [HACKERS] Bypassing SQL lexer and parser
06.07.2015, 20:28, "Guillaume Lelarge" <guilla...@lelarge.info>:
> That sounds a lot like a background worker.

Thanks a lot! That's exactly what I was looking for.

07.07.2015, 03:29, "Jim Nasby" <jim.na...@bluetreble.com>:
> There is support for plugging into the parser and executor, so that might be a possibility, but...

For now, that's the key question, as I didn't find anything like that in the PostgreSQL docs. There are some good samples of implementing custom background workers, but all of them use prepared statements via SPI.

> AFAIK HTSQL is very happy producing SQL; why not just let it hand SQL to Postgres (which it already does...)

Yes, they both work like a charm, but HTSQL is implemented in Python, which adds some extra overhead.

> Have you by chance talked to Clark or Kirill about this?

Not yet. I'm not currently going to follow all the HTSQL specs (at least at the starting point), just use them as reference.
Re: [HACKERS] Improving log capture of TAP tests with IPC::Run
On Wed, Jul 8, 2015 at 6:10 AM, Heikki Linnakangas <hlinn...@iki.fi> wrote:
> On 06/25/2015 07:14 AM, Michael Paquier wrote:
>> After looking at the issues with the TAP test suite that hamster faced a couple of days ago, which is what has been discussed on this thread: http://www.postgresql.org/message-id/13002.1434307...@sss.pgh.pa.us
>> I have developed a patch to improve log capture of the TAP tests by being able to collect the stderr and stdout output of each command run in the tests, using IPC::Run::run more extensively (instead of system(), which is not able to help much). The patch has already been sent on the thread above.
>
> This is a massive improvement to the usability of TAP tests. They are practically impossible to debug currently. Thanks for working on this!
>
> The patch redirects the output of all system_or_bail commands to a log file. That's a good start, but I wish we could get *everything* in the same log file. That includes also:
>
> * stdout and stderr of *all* commands, including all the commands run with command_ok/command_fails.

That makes sense. I have switched command_ok, command_exit_is and command_fails to show their output instead of sending it to the void.

> * the command line of commands being executed. It's difficult to follow the log when you don't know which output corresponds to which command.

OK, I have added some stuff for that. Let me know your thoughts. Each command started is printed in the log file before starting, with a message mentioning the type of test used.

> * whenever a test case is reported as success/fail.

Just to be sure, does this concern the ok/not ok messages printed out by each test run? Or is it a custom message that you have in mind?

> Looking at the manual page of Test::More, it looks like you could change where the perl script's STDOUT and STDERR point to, because Test::More takes a copy of them (when? at program startup I guess..). That would be much more convenient than decorating every run call with a logfile.

Hm.
There are two types of logs we want to capture:
1) stdout and stderr from the subprocesses kicked off by IPC::Run::run
2) status messages written to the log file by the process running the tests.

Perhaps we could redirect stdout and stderr, but I think that this is going to need an fd opened from the beginning of the test until the end, with something like this:

open my $test_logfile_fd, '>', $test_logfile;
*STDOUT = $test_logfile_fd;
*STDERR = $test_logfile_fd;

While that would work on OSX and Linux for sure, I suspect that this will not on Windows, where two concurrent processes cannot write to the same file. Also, the output can be correctly captured by just appending this in a couple of places:

[ '>>', $test_logfile, '2>&1' ]

And this solution proves to work as well on Windows...
--
Michael

From dccb8d2226ea2ffa40fd9702cbc26196f214471d Mon Sep 17 00:00:00 2001
From: Michael Paquier <mich...@otacoo.com>
Date: Wed, 8 Jul 2015 17:24:24 +0900
Subject: [PATCH] Improve log capture of TAP tests and fix race conditions

All tests have their logs stored as regress_log/$TEST_NAME, with content captured from the many commands run during the tests. Commands that were previously called in silent mode are made verbose and have their output written in the newly-created log files.
---
 src/Makefile.global.in                         |  1 +
 src/bin/pg_basebackup/t/010_pg_basebackup.pl   |  4 +-
 src/bin/pg_controldata/t/001_pg_controldata.pl |  2 +-
 src/bin/pg_ctl/t/001_start_stop.pl             |  2 +-
 src/bin/pg_ctl/t/002_status.pl                 |  6 +-
 src/bin/pg_rewind/.gitignore                   |  1 -
 src/bin/pg_rewind/Makefile                     |  2 +-
 src/bin/pg_rewind/RewindTest.pm                | 74 ++-
 src/bin/pg_rewind/t/001_basic.pl               |  1 -
 src/bin/pg_rewind/t/002_databases.pl           |  1 -
 src/bin/pg_rewind/t/003_extrafiles.pl          |  1 -
 src/test/perl/TestLib.pm                       | 83 --
 src/test/ssl/ServerSetup.pm                    |  6 +-
 13 files changed, 109 insertions(+), 75 deletions(-)

diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index 8eab178..8d1250d 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -337,6 +337,7 @@
 cd $(srcdir) && TESTDIR='$(CURDIR)' PATH=$(bindir):$$PATH PGPORT='6$(DEF_PGPOR
 endef

 define prove_check
+rm -rf $(srcdir)/tmp_check/log
 cd $(srcdir) && TESTDIR='$(CURDIR)' $(with_temp_install) PGPORT='6$(DEF_PGPORT)' top_builddir='$(CURDIR)/$(top_builddir)' $(PROVE) $(PG_PROVE_FLAGS) $(PROVE_FLAGS) t/*.pl
 endef

diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index c8c9250..10a523b 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -30,7 +30,7 @@
 print HBA "local replication all trust\n";
 print HBA "host replication all 127.0.0.1/32 trust\n";
 print HBA "host
[HACKERS] more-helpful-izing a debug message
Hi,

One of the debug messages related to logical replication could be more helpful than it currently is. The attached patch reorders the two operations to make it so. Please consider patching and back-patching.

.m

diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 824bc91..7643add 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -406,11 +406,12 @@ CreateDecodingContext(XLogRecPtr start_lsn,
 		 * decoding. Clients have to be able to do that to support synchronous
 		 * replication.
 		 */
-		start_lsn = slot->data.confirmed_flush;
 		elog(DEBUG1, "cannot stream from %X/%X, minimum is %X/%X, forwarding",
 			 (uint32) (start_lsn >> 32), (uint32) start_lsn,
 			 (uint32) (slot->data.confirmed_flush >> 32),
 			 (uint32) slot->data.confirmed_flush);
+
+		start_lsn = slot->data.confirmed_flush;
 	}

 	ctx = StartupDecodingContext(output_plugin_options,
Re: [HACKERS] configure can't detect proper pthread flags
On 03/21/2015 01:06 AM, Max Filippov wrote:
> On Fri, Mar 20, 2015 at 3:43 PM, Max Filippov <jcmvb...@gmail.com> wrote:
> Ok, one more attempt: maybe instead of checking that stderr is empty we could check that stderr has changed in the presence of the option that we test? The patch: http://www.postgresql.org/message-id/1426860321-13586-1-git-send-email-jcmvb...@gmail.com

Let's step back a bit. If I understand correctly, we have that "does the -foo compiler option produce a warning?" check because we're trying all the different pthread-related flags, and use everything that works. We don't want to use flags that add warnings, however, so we need that check.

But why do we try to use all the flags, and not just the first one that works? The original script stopped at the first one that works, but we changed that in this commit:

commit e48322a6d6cfce1ec52ab303441df329ddbc04d1
Author: Bruce Momjian <br...@momjian.us>
Date: Thu Aug 12 16:39:50 2004 +0000

    Be more aggressive about adding flags to thread compiles. The configure test only tests for building a binary, not building a shared library. On Linux, you can build a binary with -pthread, but you can't build a binary that uses a threaded shared library unless you also use -pthread when building the binary, or adding -lpthread to the shared library build. This patch has the effect of doing the latter by adding both -pthread and -lpthread when building libpq.

The discussion that led to that commit is here: http://www.postgresql.org/message-id/flat/200408101453.36209.xzi...@users.sourceforge.net#200408101453.36209.xzi...@users.sourceforge.net

I tried reverting that commit, and lo and behold, everything still works. It turns out that using -lpthread is not in fact necessary when building libpq on Linux. It seems that it was in fact a GCC bug all along, that it didn't link the shared library with libpthread when you passed just the -pthread flag; see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=.

I suggest that we revert that work-around for that GCC bug, and stop testing the pthread flags as soon as we find one that works. Then we can also remove the test for whether the compiler produces any warnings. AFAICS, after that our ax_pthread.m4 file is not materially different from the upstream autoconf-archive version. Let's just replace our ax_pthread.m4 file with the latest upstream version.

Of course, that breaks this again for anyone compiling on Linux with GCC version 3.2 or older. I don't have much sympathy for such old systems, and there is a work-around available for them anyway: specify PTHREAD_CFLAGS="-pthread -lpthread" on the configure command line. Then the configure script will just verify that that works, and not run the auto-detection code. You can also use that work-around to build older branches with uClibc, which produces the annoying gethostbyname() warnings that started this thread.

To err on the side of caution, I'm thinking this should be committed to master and REL9_5_STABLE only. Thoughts?

- Heikki

From 711e5c7a1762f0c12e7b7e457c84484827246086 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakan...@iki.fi>
Date: Wed, 8 Jul 2015 11:11:18 +0300
Subject: [PATCH 1/1] Replace our hacked version of ax_pthread.m4 with latest upstream version.

Our version was different from the upstream version in that we tried to use all possible pthread-related flags that the compiler accepts, rather than just the first one that works. That was changed in commit e48322a6d6cfce1ec52ab303441df329ddbc04d1, which was needed to work around a GCC bug that was fixed in GCC version 3.3, and we hardly care about such an old version anymore (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=).

This fixes the detection for compilers that print warnings with the chosen flags. That's pretty annoying in its own right, but it's also not nice that it inconspicuously disables thread-safety.
If you really want to compile with GCC version 3.2 or older, you can still work around it by setting PTHREAD_CFLAGS="-pthread -lpthread" manually on the configure command line.
---
 aclocal.m4            |   2 +-
 config/acx_pthread.m4 | 170 --
 config/ax_pthread.m4  | 332 ++
 configure             | 310 --
 configure.in          |   2 +-
 5 files changed, 583 insertions(+), 233 deletions(-)
 delete mode 100644 config/acx_pthread.m4
 create mode 100644 config/ax_pthread.m4

diff --git a/aclocal.m4 b/aclocal.m4
index eaf9800..6f930b6 100644
--- a/aclocal.m4
+++ b/aclocal.m4
@@ -1,6 +1,6 @@ dnl aclocal.m4
 m4_include([config/ac_func_accept_argtypes.m4])
-m4_include([config/acx_pthread.m4])
+m4_include([config/ax_pthread.m4])
 m4_include([config/c-compiler.m4])
 m4_include([config/c-library.m4])
 m4_include([config/docbook.m4])
diff --git a/config/acx_pthread.m4 b/config/acx_pthread.m4
deleted file mode
Re: [HACKERS] pg_receivexlog --create-slot-if-not-exists
On 07/07/2015 14:55, Andres Freund wrote: On 2015-06-19 17:21:25 +0200, Cédric Villemain wrote: To make slot usage in pg_receivexlog easier, should we add --create-slot-if-not-exists? That'd mean you could run the same command on the first and later invocations. +1 (with a shorter name please, if you can find one...) How about a separate --if-not-exists modifier? Right now --create works as an abbreviation for --create-slot, but --create-slot-if-not-exists would break that.

Having done a quick overview of the other binaries provided by PostgreSQL, I realized that pg_receivexlog is distinct from the other tools that create something at the SQL level: for db, role and lang we have create and drop as distinct commands. Both dropdb and dropuser have an '--if-exists' (like in SQL), but there is no equivalent for the create operation. Maybe we should have used createslot/dropslot in the first place and used --if-exists / --if-not-exists accordingly. With everything in one tool, adding one or both options looks good to me. The only alternative I see is more like --force: return success even if the operation was not required (create or drop).

-- Cédric Villemain +33 (0)6 20 30 22 52 http://2ndQuadrant.fr/ PostgreSQL: Support 24x7 - Développement, Expertise et Formation

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Set of patch to address several Coverity issues
On 2015-07-08 14:11:59 +0900, Michael Paquier wrote: Arg... I thought I had triggered, a couple of weeks ago, a problem in this code path when desc->arg_arraytype[i] is InvalidOid with argtypes == NULL. Apparently I did something wrong... Speaking of which, shouldn't this thing at least use OidIsValid?

- if (fcinfo->flinfo->fn_oid)
+ if (OidIsValid(fcinfo->flinfo->fn_oid))
      get_func_signature(fcinfo->flinfo->fn_oid, &argtypes, &nargs);

We do the current type of test in a bunch of places; I'd not modify code just to change it. But since apparently the whole test is pointless, I could see replacing it with an assert.

4) The return result of timestamp2tm is not checked in 2 places, in GetCurrentDateTime and GetCurrentTimeUsec, while all the other 40 calls of this function do sanity checks. Returning ERRCODE_DATETIME_VALUE_OUT_OF_RANGE in case of an error would be good for consistency. See 0004. (issue reported previously here CAB7nPqRSk=J8eUdd55fL-w+k=8sDTHLVBt-cgG9jWN=vo2o...@mail.gmail.com)

But the other cases accept either arbitrary input or perform timezone conversions. Not the case here, no? The callers of GetCurrentDateTime() and GetCurrentTimeUsec() do some work on the pg_tm; see time_timetz(), which does accept some input, DecodeDateTime() for example. So what? GetCurrentDateTime()'s returned data is still pretty much guaranteed to be correct unless a lot of things have gone wrong previously? In any case, we are going to need at least (void) in front of those calls. We're needing nothing of the sort.
Re: [HACKERS] configure can't detect proper pthread flags
Heikki Linnakangas hlinn...@iki.fi writes: I suggest that we revert that work-around for that GCC bug, and stop testing the pthread flags as soon as we find one that works. OK ... Then we can also remove the test for whether the compiler produces any warnings. Don't see how that follows? AFAICS, after that our acx_pthread.m4 file is not materially different from the upstream autoconf-archive version. Let's just replace our ax_pthread.m4 file with the latest upstream version. It definitely *would* be nice to get back in sync with the upstream version of that file. But I'm unconvinced about the warnings angle. regards, tom lane
Re: [HACKERS] Freeze avoidance of very large table.
On Tue, Jul 7, 2015 at 5:37 PM, Simon Riggs si...@2ndquadrant.com wrote: On 6 July 2015 at 17:28, Simon Riggs si...@2ndquadrant.com wrote: I think we need something for pg_upgrade to rewrite existing VMs. Otherwise a large read-only database would suddenly require a massive re-vacuum after upgrade, which seems bad. That can wait for now until we all agree this patch is sound. Since we need to rewrite the vm, I think we should call the new map "vfm". +1 for changing the name, as the map now contains more than visibility information. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] psql tabcomplete - minor bugfix - tabcomplete for SET ROLE TO xxx
2015-07-02 11:03 GMT+02:00 Heikki Linnakangas hlinn...@iki.fi: On 05/29/2015 10:41 AM, Pavel Stehule wrote: 2015-05-29 9:28 GMT+02:00 Jeevan Chalke jeevan.cha...@gmail.com: I agree with Peter that "we don't tab-complete everything we possibly could", but pressing tab after SET ROLE TO offers DEFAULT as an option, which seems wrong. This patch adds a list of roles there, which I guess is better to have than suggesting something unusual. ... But back to this topic. I think it is a little different because we support two syntaxes for one feature, and it looks a bit strange that one form is supported by autocompletion and the other is not.

Yeah, it's a bit strange. We have a specific autocomplete rule for SET ROLE, but SET ROLE TO is treated as a generic GUC. With your patch, we'd also lose the auto-completion of SET ROLE TO DEFAULT. I think we want to encourage people to use the SQL-standard syntax SET ROLE ... rather than the PostgreSQL-specific SET ROLE TO. On the whole, this just doesn't seem like much of an improvement. I'll mark this as 'rejected' in the commitfest app.

PS. I note that the auto-completion for SET XXX TO ... is pretty lousy in general. We have rules for DateStyle, IntervalStyle, GEQO and search_path, but that's it. That could be expanded a lot. All enum-type GUCs could be handled with a single rule that queries pg_settings.enumvals, for example, and booleans would be easy too. But that's a different story.
I wrote a patch for fallback tab completion for bool and enum GUC variables.

Regards

Pavel

- Heikki

commit 7749d7d3fabf468dbe2c5f397add9f8e31f59614
Author: Pavel Stehule pavel.steh...@gooddata.com
Date:   Wed Jul 8 14:24:55 2015 +0200

    initial

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 0683548..c4e56c8 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -742,6 +742,14 @@ static const SchemaQuery Query_for_list_of_matviews = {
 "   FROM pg_catalog.pg_tablesample_method "\
 "  WHERE substring(pg_catalog.quote_ident(tsmname),1,%d)='%s'"
 
+#define Query_for_enum \
+" SELECT name FROM ( "\
+"   SELECT unnest(enumvals) AS name "\
+"     FROM pg_catalog.pg_settings "\
+"    WHERE pg_catalog.lower(name)=pg_catalog.lower('%s') "\
+"    UNION SELECT 'DEFAULT' ) ss "\
+"  WHERE pg_catalog.substring(name,1,%%d)='%%s'"
+
 /*
  * This is a list of all things in Pgsql, which can show up after CREATE or
  * DROP; and there is also a query to get a list of them.
@@ -832,6 +840,8 @@ static PGresult *exec_query(const char *query);
 
 static void get_previous_words(int point, char **previous_words, int nwords);
 
+static char *get_vartype(const char *varname);
+
 #ifdef NOT_USED
 static char *quote_file_name(char *text, int match_type, char *quote_pointer);
 static char *dequote_file_name(char *text, char quote_char);
@@ -3548,20 +3558,6 @@ psql_completion(const char *text, int start, int end)
 		COMPLETE_WITH_LIST(my_list);
 	}
-	else if (pg_strcasecmp(prev2_wd, "IntervalStyle") == 0)
-	{
-		static const char *const my_list[] =
-		{"postgres", "postgres_verbose", "sql_standard", "iso_8601", NULL};
-
-		COMPLETE_WITH_LIST(my_list);
-	}
-	else if (pg_strcasecmp(prev2_wd, "GEQO") == 0)
-	{
-		static const char *const my_list[] =
-		{"ON", "OFF", "DEFAULT", NULL};
-
-		COMPLETE_WITH_LIST(my_list);
-	}
 	else if (pg_strcasecmp(prev2_wd, "search_path") == 0)
 	{
 		COMPLETE_WITH_QUERY(Query_for_list_of_schemas
@@ -3571,10 +3567,31 @@ psql_completion(const char *text, int start, int end)
 	}
 	else
 	{
-		static const char *const my_list[] =
-		{"DEFAULT", NULL};
+		/* fallback for GUC settings */
+		char	   *vartype = get_vartype(prev2_wd);
 
-		COMPLETE_WITH_LIST(my_list);
+		if (strcmp(vartype, "enum") == 0)
+		{
+			char		querybuf[1024];
+
+			snprintf(querybuf, 1024, Query_for_enum, prev2_wd);
+			COMPLETE_WITH_QUERY(querybuf);
+		}
+		else if (strcmp(vartype, "bool") == 0)
+		{
+			static const char *const my_list[] =
+			{"ON", "OFF", "DEFAULT", NULL};
+
+			COMPLETE_WITH_LIST(my_list);
+		}
+		else
+		{
+			static const char *const my_list[] =
+			{"DEFAULT", NULL};
+
+			COMPLETE_WITH_LIST(my_list);
+		}
 	}
 }
@@ -4636,6 +4653,42 @@ get_previous_words(int point, char **previous_words, int nwords)
 	}
 }
 
+static char *
+get_vartype(const char *varname)
+{
+	PQExpBufferData query_buffer;
+	char	   *e_varname;
+	PGresult   *result;
+	int			string_length;
+	static char resbuf[10];
+
+	initPQExpBuffer(&query_buffer);
+
+	string_length = strlen(varname);
+	e_varname = pg_malloc(string_length * 2 + 1);
+	PQescapeString(e_varname, varname, string_length);
+
+	appendPQExpBuffer(&query_buffer,
+		"SELECT vartype FROM pg_settings WHERE pg_catalog.lower(name) = pg_catalog.lower('%s')",
+		e_varname);
+
+	result = exec_query(query_buffer.data);
+	termPQExpBuffer(&query_buffer);
+	free(e_varname);
+
+	resbuf[0] = '\0';
+
+	if (PQresultStatus(result) == PGRES_TUPLES_OK)
+	{
+		if (PQntuples(result) > 0)
Re: [HACKERS] New functions
On Wed, Jul 8, 2015 at 9:15 PM, Дмитрий Воронин carriingfat...@yandex.ru wrote: 07.07.2015, 18:34, Michael Paquier michael.paqu...@gmail.com: Speaking of which, I have rewritten the patch as attached. This looks way cleaner than the previous version submitted. Dmitry, does that look fine for you? I am switching this patch to Waiting on Author. Regards, -- Michael

Michael, hello. I have looked at your variant of the patch. You created the function ssl_extensions_info(), which presents information about SSL extensions in a more informative view. So, it's cool.

OK cool. I think a committer could look at it, hence switching it to Ready for Committer. -- Michael
[HACKERS] Re: [HACKERS] GSoC 2015 proposal: Improve the performance of “ALTER TABLE .. SET LOGGED / UNLOGGED” statement
On Wed, Jul 8, 2015 at 3:20 AM, Ashutosh Bapat ashutosh.ba...@enterprisedb.com wrote: My intention for the 6th step is that everything is recorded to WAL, so if a crash occurs the recovery process cleans up the mess. AFAIU, PostgreSQL recovery is based on redoing WAL. What you described earlier, undoing based on the WAL, does not fit in the current framework. Understood! During recovery, to check the status of a transaction can I use src/backend/access/transam/clog.c:TransactionIdGetStatus? I'm asking because I don't know this part of the code too well. AFAIU, not unless all WALs (or at least the WALs up to the commit/abort record of the transaction in question) are replayed. So we'll need to execute this procedure after replay; then ResetUnloggedTables should be executed in two phases: 1) Before the replay, checking just the datafiles with the init fork (_init) 2) After the replay, checking the datafiles with the stamped init fork (_init_XID, or something else) Is this reasonable? Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog: http://fabriziomello.github.io Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello Github: http://github.com/fabriziomello
Re: [HACKERS] Implementation of global temporary tables?
more global temp tables are little bit comfortable for developers,

I'd like to emphasize this point. This feature does much more than save a developer from issuing a CREATE TEMP TABLE statement in every session. Here are two common use cases, and I'm sure there are more.

(1) Imagine a web application scenario where a developer wants to cache some session information in a temp table. What's more, he also wants to specify some rules which reference the session information. Without this feature, the rules will be removed at the end of every session since they depend on a temporary object. Global temp tables would allow the developer to define the temp table and the rules once.

(2) The second case was mentioned by Tom Lane back in 2010 in a thread about global temp tables (http://www.postgresql.org/message-id/9319.1272130...@sss.pgh.pa.us): The context that I've seen it come up in is that people don't want to clutter their functions with create-it-if-it-doesn't-exist logic, which you have to have given the current behavior of temp tables.

2.a - using on demand created temp tables - most simple solution, but doesn't help with catalogue bloating

I've read the thread, and people disapprove of this approach because of the potential catalog bloat. However, I'd like to champion it. Indeed, this approach may have a bloat issue. But users who need global temp tables currently have to create a new temp table in every session, which means they already have the bloat problem and presumably have some policy to deal with it. In other words, implementing global temp tables this way gives users the same performance, plus the convenience the feature brings. The root question here is whether having the unoptimized feature is better than having no feature at all. Actually, there was a very similar discussion back in 2009 on global temp tables. Let me borrow Kevin Grittner's and Tom Lane's arguments here.
Kevin Grittner's argument: http://www.postgresql.org/message-id/49f82aea.ee98.002...@wicourts.gov

... If you're saying we can implement the standard's global temporary tables in a way that performs better than current temporary tables, that's cool. That would be a nice bonus in addition to the application programmer convenience and having another tick-mark on the standards compliance charts. Do you think that's feasible? If not, the feature would be useful to some with the same performance that temporary tables currently provide.

Tom Lane's argument: http://www.postgresql.org/message-id/24110.1241035...@sss.pgh.pa.us

I'm all for eliminating catalog overheads, if we can find a way to do that. I don't think that you get to veto implementation of the feature until we can find a way to optimize it better. The question is not about whether having the optimization would be better than not having it --- it's about whether having the unoptimized feature is better than having no feature at all (which means people have to implement the same behavior by hand, and they'll *still* not get the optimization).

There have been several threads here discussing global temp tables since 2007. Quite a few ideas aimed at avoiding the bloat issue by not storing the metadata of the session copy in the catalog. However, it seems that none of them has been implemented, or even has a feasible design. So why don't we implement it in an unoptimized way first? Is there still interest in this feature? I'm very interested in it. I'm thinking about an implementation which is similar to Pavel's 2009 proposal (http://www.postgresql.org/message-id/162867790904271344s1ec96d90j6cde295fdcc78...@mail.gmail.com). Here are the major ideas of my design:

(1) Create the cross-session persistent schema as a regular table, and create session-private temp tables when a session first accesses it.
(2) For DML queries, the global temp table is overloaded by its session copy after the relation is opened by an OID or a RangeVar. For DDL queries, which copy is used depends on whether the query needs to access the data or the metadata of the global temp table.

There are more differences between this design and Pavel's 2009 proposal, and I'd like to send a detailed proposal to the mailing list, but first I want to know whether the community would accept a global temp table implementation that provides the same performance as current temp tables do.

Thanks, Zhaomo
Re: [HACKERS] Freeze avoidance of very large table.
On 7 July 2015 at 18:45, Sawada Masahiko sawada.m...@gmail.com wrote: On Wed, Jul 8, 2015 at 12:37 AM, Andres Freund and...@anarazel.de wrote: On 2015-07-07 16:25:13 +0100, Simon Riggs wrote: I don't think pg_freespacemap is the right place. I agree that pg_freespacemap sounds like an odd location. I'd prefer to add that as a single function into core, so we can write formal tests. With the advent of src/test/modules it's not really a prerequisite for things to be built in to be testable. I think there are fair arguments for moving stuff like pg_stattuple, pg_freespacemap, pg_buffercache into core at some point, but that's probably a separate discussion. Understood. So I will place a bunch of tests like src/test/module/visibilitymap_test, which contains some tests regarding this feature, and gather them into one patch. Please place it in core. I see value in having a diagnostic function for general use on production systems. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] Re: [HACKERS] GSoC 2015 proposal: Improve the performance of “ALTER TABLE .. SET LOGGED / UNLOGGED” statement
On 2015-07-03 08:18:09 -0300, Fabrízio de Royes Mello wrote: *** ALTER TABLE changes ... Without going into the specifics of this change: Are we actually sure this feature warrants the complexity it'll introduce? I'm really rather doubtful.
Re: [HACKERS] [PATCH] libpq: Allow specifying multiple host names to try to connect to
On 04/19/2015 11:18 AM, Mikko Tiihonen wrote: Hi, I would like to allow specifying multiple host names for libpq to try to connect to. This is currently only supported if the host name resolves to multiple addresses. Having support for it without a complex DNS setup would be much easier. Example:

psql -h dbslave,dbmaster -p 5432 dbname
psql 'postgresql://dbslave,dbmaster:5432/dbname'

Here the idea is that, without the added complexity of pgbouncer or a similar tool, I can get any libpq client to try connecting to multiple nodes until one answers. I added similar functionality to the JDBC driver a few years ago. Because libpq almost supported the feature already, the patch is very simple: I just split the given host name, do a DNS lookup on each part separately, and link the results. If you configure a port that does not exist you can see libpq trying to connect to multiple hosts.

psql -h 127.0.0.2,127.0.0.3, -p
psql: could not connect to server: Connection refused Is the server running on host 127.0.0.2,127.0.0.3 (127.0.0.2) and accepting TCP/IP connections on port ? could not connect to server: Connection refused Is the server running on host 127.0.0.2,127.0.0.3 (127.0.0.3) and accepting TCP/IP connections on port ?

A further improvement would be to add a connection parameter to limit connections only to a master (writable) or to a slave (read-only).

I like the idea of allowing multiple hosts to be specified where, if it can't connect to one server, libpq will try the next host.

psql -h dns-fail-name,localhost
psql: could not translate host name dns-fail-name,localhost to address: System error

If a name in the list doesn't resolve, it fails instead of trying the next name. I think it should treat this the same as a connection refused. Also, in the error messages when it can't connect to a host, you print the entire host string, not the actual host being connected to.
I.e. "Is the server running on host 127.0.0.2,127.0.0.3 (127.0.0.3) and accepting..." It should print just the host that had the failed connection.

We also need to decide how we want this feature to behave if libpq can contact the postmaster but can't establish a connection (user/password failure, the database is in recovery mode, etc.): do we want to try the next host or stop? My thinking is that the times you would actually use this feature are: 1) To connect to a group of replica systems (either read-only streaming replicas or FDW proxies or BDR machines) 2) To connect to a cluster of pgbouncer or plproxy systems so the proxy isn't a single point of failure 3) To connect to a set of servers master1, standby-server1, standby-server2 where you would want it to try the next server in the list. In all of these cases I think you would want to try the next machine in the list if you can't actually establish a usable connection. I also don't think the patch is enough to be helpful with case 3, since you don't actually want a connection to a standby server unless that server has been promoted to the master.

Another concern I have is that the server needs to be listening on the same port against all hosts, which means that in a development environment we can't fully test this feature using just a single server. I can't think of anything else we have in core that couldn't be tested on a single server (all the replication stuff works fine if you set up two separate clusters on different ports on one server).

You updated the documentation just for psql, but your change affects any libpq application; if we go forward with this patch we should update the documentation for libpq as well.
This approach seems to work with the URL style of conninfo. For example, postgres://some-down-host.info,some-other-host.org:5435/test1 seems to work as expected, but I don't like that syntax; I would rather see postgres://some-down-host.info:5435/test1,postgres://some-other-host.org:5435/test1 This would be a more invasive change, but I think the syntax is more usable.

-Mikko
[HACKERS] Re: [HACKERS] Re: [HACKERS] GSoC 2015 proposal: Improve the performance of “ALTER TABLE .. SET LOGGED / UNLOGGED” statement
On Wed, Jul 8, 2015 at 10:53 AM, Andres Freund and...@anarazel.de wrote: On 2015-07-03 08:18:09 -0300, Fabrízio de Royes Mello wrote: *** ALTER TABLE changes ... Without going into the specifics of this change: Are we actually sure this feature warrants the complexity it'll introduce? I'm really rather doubtful. Think of an ETL job that uses an unlogged table to improve load performance: the job creates a large table, and to guarantee data consistency you need to transform it into a regular table, which with the current implementation rewrites the entire heap, toast and indexes. Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog: http://fabriziomello.github.io Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello Github: http://github.com/fabriziomello
Re: [HACKERS] dblink: add polymorphic functions.
On 07/07/2015 10:22 PM, Michael Paquier wrote: On Mon, Jul 6, 2015 at 11:52 PM, Joe Conway m...@joeconway.com wrote: That explains why the first example works while the second does not. I'm not sure how hard it would be to fix that, but it appears that that is where we should focus. Wouldn't it be fine if we drop some of the functions proposed without impacting the feature? Most of the functions overlap with each other, making us see the limitations we see. Hence, wouldn't it be enough to just have this set of functions in the patch? dblink_get_result(text, bool, anyelement) dblink(text, text, boolean, anyelement) dblink_fetch(text, text, int, boolean, anyelement) I think using new function names is better, especially if we are only going to support a subset. I have no idea what to call them, however. Did someone else suggest dblink_any(), etc.? I also think that the ultimately best solution is (what I believe to be spec-compliant) SRF casting, but I guess that could be a task for a later day.
-- Joe Conway
Re: [HACKERS] dblink: add polymorphic functions.
On Wed, Jul 8, 2015 at 10:12 AM, Joe Conway m...@joeconway.com wrote: On 07/07/2015 10:22 PM, Michael Paquier wrote: On Mon, Jul 6, 2015 at 11:52 PM, Joe Conway m...@joeconway.com wrote: That explains why the first example works while the second does not. I'm not sure how hard it would be to fix that, but it appears that that is where we should focus. Wouldn't it be fine if we drop some of the functions proposed without impacting the feature? Most of the functions overlap with each other, making us see the limitations we see. Hence, wouldn't it be enough to just have this set of functions in the patch? dblink_get_result(text, bool, anyelement) dblink(text, text, boolean, anyelement) dblink_fetch(text, text, int, boolean, anyelement) I think using new function names is better, especially if we are only going to support a subset. I have no idea what to call them, however. Did someone else suggest dblink_any(), etc.? I also think that the ultimately best solution is (what I believe to be spec-compliant) SRF casting, but I guess that could be a task for a later day. Totally agree. Even if we had SRF casting, the OP's patch has value because of the abstraction benefits. Also, given that we are in an extension, we can relax a bit about adding extra functions IMO. merlin
Re: [HACKERS] Re: [HACKERS] GSoC 2015 proposal: Improve the performance of “ALTER TABLE .. SET LOGGED / UNLOGGED” statement
On 2015-07-08 10:58:51 -0300, Fabrízio de Royes Mello wrote: Think of an ETL job that uses an unlogged table to improve load performance, but this job creates a large table, and to guarantee data consistency you need to transform it into a regular table, and with the current implementation rewrite the entire heap, toast and indexes. Don't buy that. The final target relation will usually already have content. Also, everything but wal_level=minimal will force you to WAL-log the contents anyway.
Re: [HACKERS] copy.c handling for RLS is insecure
Noah, * Noah Misch (n...@leadboat.com) wrote: On Tue, Dec 02, 2014 at 11:32:27AM -0500, Stephen Frost wrote: * Robert Haas (robertmh...@gmail.com) wrote: On Thu, Nov 27, 2014 at 2:03 AM, Stephen Frost sfr...@snowman.net wrote: Alright, I've done the change to use the RangeVar from CopyStmt, but also added a check wherein we verify that the relation's OID returned from the planned query is the same as the relation's OID that we did the RLS check on - if they're different, we throw an error. Please let me know if there are any remaining concerns. Here is the check in question (added in commit 143b39c):

plan = planner(query, 0, NULL);

/*
 * If we were passed in a relid, make sure we got the same one back
 * after planning out the query.  It's possible that it changed
 * between when we checked the policies on the table and decided to
 * use a query and now.
 */
if (queryRelId != InvalidOid)
{
	Oid		relid = linitial_oid(plan->relationOids);

	/*
	 * There should only be one relationOid in this case, since we
	 * will only get here when we have changed the command for the
	 * user from a "COPY relation TO" to "COPY (SELECT * FROM
	 * relation) TO", to allow row level security policies to be
	 * applied.
	 */
	Assert(list_length(plan->relationOids) == 1);

	if (relid != queryRelId)
		ereport(ERROR,
				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
				 errmsg("relation referenced by COPY statement has changed")));
}

That's clearly an improvement, but I'm not sure it's water-tight. What if the name that originally referenced a table ended up referencing a view? Then you could get list_length(plan->relationOids) != 1. I'll test it out and see what happens. Certainly a good question, and if there's an issue there then I'll get it addressed. Yes, it can be made to reference a view and trip the assertion. There's a different issue with that assertion, actually: if you've got an RLS policy which references another table directly in a subselect then you'll also trip it, but more below..
(And, in that case, I also wonder if you could get eval_const_expressions() to do evil things on your behalf while planning.) If it can be made to reference a view then there's an issue, as the view might include a function call itself which is provided by the attacker. Indeed. As the parenthetical remark supposed, the check happens too late to prevent a security breach. planner() has run eval_const_expressions(), executing code of the view owner's choosing. It happens too late to prevent the user from running code specified by the table owner, but there's no solution to that risk except to set 'row_security = off' and use a mechanism which doesn't respect on-select rules, which is only COPY, right? A normal SELECT will certainly fire the on-select rule. Clearly, if we found a relation originally then we need that same relation with the same OID after the conversion to a query. That is necessary but not sufficient. CREATE RULE can convert a table to a view without changing the OID, thereby fooling the check. Test procedure: It's interesting to consider that COPY purportedly operates under the SELECT privilege, yet fails to respect on-select rules. Having spent a bit of time thinking about this now, it occurs to me that we've drifted from the original concern and are now trying to solve an unsolvable issue here. We're not trying to protect against an attacker who owns the table we're going to COPY and wants to get us to run code they've written; that can happen simply by having RLS, and that's why it's not enabled by default for superusers and why we have 'row_security = off', which pg_dump sets by default. The original issue that Robert brought up was the concern about multiple lookups of RangeVar->Oid. That was a problem in the highlighted CVE and the original/current coding because we weren't doing fully qualified lookups based on the originally found and locked OID. I'm trying to figure out why we're not simply doing that here.
After a bit of discussion with Andres, my thinking on this is to do the following:
- Fully qualify the name based on the opened relation
- Keep the initial lock on the relation throughout
- Remove the Assert() (other relations can be pulled in by RLS)
- Keep the OID check, shouldn't hurt to have it
Thoughts? Thanks! Stephen
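The check under discussion can be sketched outside PostgreSQL's internals: resolve the name once (with the lock held), remember the OID, and verify after planning that the planned query still targets that OID, without asserting that RLS didn't pull in extra relations. A minimal illustration in Python, with hypothetical callback names standing in for the real catalog and planner calls:

```python
def copy_with_rls_check(rel_name, lookup_oid, plan_query):
    """Sketch of the OID re-check discussed above.

    lookup_oid: resolves a relation name to its OID (lock assumed held).
    plan_query: plans the rewritten COPY (SELECT ...) query and returns
                the list of relation OIDs it references; RLS policies may
                legitimately pull in additional relations.
    """
    query_rel_oid = lookup_oid(rel_name)   # initial, locked lookup
    planned_oids = plan_query(rel_name)    # may contain more than one OID

    # Keep the OID check, but no Assert on list length: other relations
    # can appear via RLS policy subselects.
    if query_rel_oid not in planned_oids:
        raise RuntimeError("relation referenced by COPY statement has changed")
    return planned_oids
```

The point of the sketch is the shape of the fix: the original OID is the anchor, and extra relations in the plan are tolerated rather than asserted away.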
Re: [HACKERS] dblink: add polymorphic functions.
2015-07-08 17:12 GMT+02:00 Joe Conway m...@joeconway.com: On 07/07/2015 10:22 PM, Michael Paquier wrote: On Mon, Jul 6, 2015 at 11:52 PM, Joe Conway m...@joeconway.com wrote: That explains why the first example works while the second does not. I'm not sure how hard it would be to fix that, but it appears that that is where we should focus. Wouldn't it be fine if we drop some of the functions proposed without impacting the feature? Most of the functions overlap with each other, which is what leads to the limitations we see. Hence, wouldn't it be enough to just have this set of functions in the patch? dblink_get_result(text, bool, anyelement) dblink(text, text, boolean, anyelement) dblink_fetch(text, text, int, boolean, anyelement) I think using new function names is better, especially if we are only going to support a subset. I have no idea what to call them, however. Did someone else suggest dblink_any(), etc? +1 Pavel I also think that the ultimately best solution is (what I believe to be spec-compliant) SRF casting, but I guess that could be a task for a later day. -- Joe Conway
Re: [HACKERS] [PATCH] libpq: Allow specifying multiple host names to try to connect to
Steve Singer st...@ssinger.info writes: On 04/19/2015 11:18 AM, Mikko Tiihonen wrote: Hi, I would like to allow specifying multiple host names for libpq to try to connect to. This is currently only supported if the host name resolves to multiple addresses. Having support for it without a complex DNS setup would be much easier. Example: psql -h dbslave,dbmaster -p 5432 dbname psql 'postgresql://dbslave,dbmaster:5432/dbname' Here the idea is that without any added complexity of pgbouncer or a similar tool I can get any libpq client to try connecting to multiple nodes until one answers. I added similar functionality to the JDBC driver a few years ago. Because libpq almost supported the feature already, the patch is very simple: I just split the given host name, do a DNS lookup on each part separately, and link the results. If you configure a port that does not exist you can see libpq trying to connect to multiple hosts. psql -h 127.0.0.2,127.0.0.3, -p psql: could not connect to server: Connection refused Is the server running on host 127.0.0.2,127.0.0.3 (127.0.0.2) and accepting TCP/IP connections on port ? could not connect to server: Connection refused Is the server running on host 127.0.0.2,127.0.0.3 (127.0.0.3) and accepting TCP/IP connections on port ? A further improvement would be to add a connection parameter to limit connections only to a master (writable) or to a slave (read only). Another concern I have is that the server needs to be listening on the same port against all hosts; this means that in a development environment we can't fully test this feature using just a single server. I can't think of anything else we have in core that couldn't be tested on a single server (all the replication stuff works fine if you set up two separate clusters on different ports on one server). You updated the documentation just for psql, but your change affects any libpq application; if we go forward with this patch we should update the documentation for libpq as well.
This approach seems to work with the URL style of conninfo. For example postgres://some-down-host.info,some-other-host.org:5435/test1 seems to work as expected, but I don't like that syntax; I would rather see postgres://some-down-host.info:5435/test1,postgres://some-other-host.org:5435/test1 This would be a more invasive change, but I think the syntax is more usable. I agree with this; it seems to me that it's more powerful to be able to specify complete URLs for when they may differ. For the non-URL case though, I don't see a clean way of doing this. We could always, e.g., locally bind a port specification to the closest host specification, but that seems nasty, and is still less powerful than passing URLs (or we could just do the same for all parameters, but that's just a mess). Might it be reasonable to only allow the multi-host syntax in the URL style and not otherwise?
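The try-each-host-in-order behavior being proposed is easy to sketch outside libpq. The function and parameter names below are illustrative, not libpq's; the `connect` callable stands in for an actual socket/PQconnect attempt:

```python
def connect_any(host_string, port, connect):
    """Split a comma-separated host list and try each host in order.

    Returns the first successful connection; if every host fails,
    raises with all of the accumulated per-host errors, mirroring the
    "could not connect ... could not connect ..." output in the thread.
    """
    errors = []
    for host in (h.strip() for h in host_string.split(",")):
        if not host:
            continue  # tolerate trailing commas, as in "127.0.0.2,127.0.0.3,"
        try:
            return connect(host, port)
        except OSError as e:
            errors.append("could not connect to %s:%s: %s" % (host, port, e))
    raise ConnectionError("; ".join(errors))
```

For example, `connect_any("dbslave,dbmaster", 5432, my_connect)` would fall through to dbmaster if dbslave refuses the connection.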
Re: [HACKERS] Improving log capture of TAP tests with IPC::Run
On 07/08/2015 11:26 AM, Michael Paquier wrote: On Wed, Jul 8, 2015 at 6:10 AM, Heikki Linnakangas hlinn...@iki.fi wrote: * whenever a test case is reported as success/fail. Just to be sure, does this concern the ok/not ok messages printed out by each test run? Or is it a custom message that you have in mind? Right. It would be nice to have the same output that's printed to the console also in the log. Looking at the manual page of Test::More, it looks like you could change where the perl script's STDOUT and STDERR point to, because Test::More takes a copy of them (when? at program startup I guess..). That would be much more convenient than decorating every run call with the logfile. Hm. There are two types of logs we want to capture: 1) stdout and stderr from the subprocesses kicked off by IPC::Run::run 2) Status messages written to the log file by the process running the tests. Perhaps we could redirect the output of stdout and stderr, but I think that this is going to need an fd open from the beginning of the test until the end, with something like this:

open my $test_logfile_fd, '>', $test_logfile;
*STDOUT = $test_logfile_fd;
*STDERR = $test_logfile_fd;

While that would work on OSX and Linux for sure, I suspect that this will not on Windows, where two concurrent processes cannot write to the same file. Hmm, as long as you make sure all the processes use the same filehandle, rather than open the log file separately, I think it should work. But it's Windows, so who knows.. I came up with the attached, which does that. It also plays some tricks with perl tie, to copy the ok - ... lines that go to the console, to the log. I tried to test that on my Windows system, but I don't have IPC::Run installed. How did you get that on Windows? Can you test this? Also, the output can be correctly captured by just appending this to a couple of places: [ '>>', $test_logfile, '2>&1' ] And this solution proves to work as well on Windows... Yeah, but that's tedious.
- Heikki

commit 70b922632412c2719c0d21b00a4abf0cb6062f5b
Author: Heikki Linnakangas heikki.linnakan...@iki.fi
Date:   Wed Jul 8 18:14:16 2015 +0300

    Improve logging in TAP tests

    WIP

diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index 8eab178..8d1250d 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -337,6 +337,7 @@ cd $(srcdir) TESTDIR='$(CURDIR)' PATH=$(bindir):$$PATH PGPORT='6$(DEF_PGPOR
 endef

 define prove_check
+rm -rf $(srcdir)/tmp_check/log
 cd $(srcdir) TESTDIR='$(CURDIR)' $(with_temp_install) PGPORT='6$(DEF_PGPORT)' top_builddir='$(CURDIR)/$(top_builddir)' $(PROVE) $(PG_PROVE_FLAGS) $(PROVE_FLAGS) t/*.pl
 endef

diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index c8c9250..e47c3a0 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -30,7 +30,7 @@
 print HBA "local replication all trust\n";
 print HBA "host replication all 127.0.0.1/32 trust\n";
 print HBA "host replication all ::1/128 trust\n";
 close HBA;
-system_or_bail 'pg_ctl', '-s', '-D', "$tempdir/pgdata", 'reload';
+system_or_bail 'pg_ctl', '-D', "$tempdir/pgdata", 'reload';

 command_fails(
 	[ 'pg_basebackup', '-D', "$tempdir/backup" ],

diff --git a/src/bin/pg_controldata/t/001_pg_controldata.pl b/src/bin/pg_controldata/t/001_pg_controldata.pl
index a4180e7..e36fa2d 100644
--- a/src/bin/pg_controldata/t/001_pg_controldata.pl
+++ b/src/bin/pg_controldata/t/001_pg_controldata.pl
@@ -11,6 +11,6 @@
 program_options_handling_ok('pg_controldata');
 command_fails(['pg_controldata'], 'pg_controldata without arguments fails');
 command_fails([ 'pg_controldata', 'nonexistent' ], 'pg_controldata with nonexistent directory fails');
-system_or_bail "initdb -D '$tempdir'/data -A trust >/dev/null";
+system_or_bail 'initdb', '-D', "$tempdir/data", '-A', 'trust';
 command_like([ 'pg_controldata', "$tempdir/data" ], qr/checkpoint/, 'pg_controldata produces output');
diff --git
a/src/bin/pg_ctl/t/001_start_stop.pl b/src/bin/pg_ctl/t/001_start_stop.pl
index 6c9ec5c..bcceb57d 100644
--- a/src/bin/pg_ctl/t/001_start_stop.pl
+++ b/src/bin/pg_ctl/t/001_start_stop.pl
@@ -36,4 +36,4 @@
 command_ok([ 'pg_ctl', 'restart', '-D', "$tempdir/data", '-w', '-m', 'fast' ],
 command_ok([ 'pg_ctl', 'restart', '-D', "$tempdir/data", '-w', '-m', 'fast' ],
 	'pg_ctl restart with server running');
-system_or_bail 'pg_ctl', '-s', 'stop', '-D', "$tempdir/data", '-m', 'fast';
+system_or_bail 'pg_ctl', 'stop', '-D', "$tempdir/data", '-m', 'fast';

diff --git a/src/bin/pg_ctl/t/002_status.pl b/src/bin/pg_ctl/t/002_status.pl
index 0558854..ec0a2a7 100644
--- a/src/bin/pg_ctl/t/002_status.pl
+++ b/src/bin/pg_ctl/t/002_status.pl
@@ -18,9 +18,9 @@
 close CONF;
 command_exit_is([ 'pg_ctl', 'status', '-D', "$tempdir/data" ], 3,
 	'pg_ctl status with server not running');
-system_or_bail 'pg_ctl', '-s', '-l', "$tempdir/logfile", '-D',
+system_or_bail 'pg_ctl', '-l', "$tempdir/logfile", '-D',
[HACKERS] Waits monitoring
Hello. Currently, PostgreSQL offers many metrics for monitoring. However, detailed monitoring of waits is still not supported. Such monitoring would let the DBA know how long a backend waited for a particular event and therefore identify bottlenecks. This functionality is very useful, especially for high-load databases. Metrics for waits monitoring are provided by many popular commercial DBMSes. We currently have requests for this feature from companies migrating to PostgreSQL from commercial DBMSes. Thus, I think it would be nice for PostgreSQL to have it too. The main problem of monitoring waits is that waits can be very short, and it's hard to implement monitoring so that it introduces very low overhead. For instance, there were a couple of attempts to implement LWLock monitoring for PostgreSQL: http://www.postgresql.org/message-id/flat/cag95seug-qxqzymwtk6wgg8hfezup3d6c+az4m_qzd+y+bf...@mail.gmail.com http://www.postgresql.org/message-id/flat/4fe8ca2c.3030...@uptime.jp#4fe8ca2c.3030...@uptime.jp The attached patch implements waits monitoring for PostgreSQL. The following monitoring was implemented:
1) History of waits (by sampling)
2) Waits profiling
3) Trace of backend waits to file
Monitored waits:
1) HW Locks (lock.c)
2) LW Locks (lwlock.c)
3) IO (md.c)
4) Network (be-secure.c)
5) Latch (pg_latch.c)
Two new GUC variables added:
* waits_monitoring = on/off - turn on/off all monitoring
* waits_flush_period - defines the period in milliseconds after which backend-local waits will be flushed to shared memory

## Profiling

Implementation: before each wait the process calls the function `StartWait`. This function saves the wait's data and remembers the current time. After the wait, the process calls `StopWait`, which increases the count, calculates the time of the wait and saves this info to the backend's local memory. Every `waits_flush_period` milliseconds the collected data from local memory will be flushed to shared memory.
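The StartWait/StopWait accounting described above can be sketched compactly. This is an illustrative Python model of the scheme, not the patch's C code; `shared` stands in for the shared-memory area that would be flushed to every `waits_flush_period` milliseconds:

```python
import time
from collections import defaultdict


class WaitProfiler:
    """Accumulate per-event wait counts and total wait time in
    backend-local memory, then flush to a shared aggregate."""

    def __init__(self):
        # event -> [count, total_seconds], kept local to the backend
        self.local = defaultdict(lambda: [0, 0.0])
        self._event = None
        self._start = None

    def start_wait(self, event):
        # Called just before blocking: remember what and when.
        self._event = event
        self._start = time.monotonic()

    def stop_wait(self):
        # Called after the wait: bump count and accumulated duration.
        entry = self.local[self._event]
        entry[0] += 1
        entry[1] += time.monotonic() - self._start
        self._event = None

    def flush(self, shared):
        # Periodic flush of local counters into the shared aggregate.
        for event, (count, total) in self.local.items():
            agg = shared.setdefault(event, [0, 0.0])
            agg[0] += count
            agg[1] += total
        self.local.clear()
```

The local-accumulate-then-flush split is what keeps the per-wait cost low: the hot path only touches backend-local memory.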
On Unix systems the `gettimeofday` function is used, which gives enough accuracy for measuring wait times in microseconds. The `pg_stat_wait_get_profile` view shows the collected information from shared memory. At the moment all functions are implemented in the `pg_stat_wait` extension. In the future it may make sense to move them into the core codebase. I used atomic functions to avoid locks in the backends. By using a TAS operation a backend can skip writing to shared memory while it's being read by some query, so the user always gets consistent data. I also did a little refactoring in lwlock.c for lwlock grouping support. Now LWLock contains a `group` field that is used for lwlock identification. The array `main_lwlock_groups` contains the offsets of each group in the main tranche.

## Waits history

Implementation: a collector background worker cycles through the list of processes (every `history_period` ms) and stores the current timestamp and wait info to the history. The view `pg_stat_wait_history` can be used for reading the history. Sampling of the history uses double buffering. Each backend has two blocks for current waits, and while the collector reads from one, the backend writes to the other. Like any sampling it can skip some waits, but with this method the backends and the collector do not require locks and don't block each other. History GUC parameters:
* shared_preload_libraries = 'pg_stat_wait.so' - for the background worker that will sample waits.
* pg_stat_wait.history = on/off - turn on/off history recording
* pg_stat_wait.history_size = how many records to keep in history
* pg_stat_wait.history_period - period in milliseconds between samples

## Views

* pg_stat_wait_history - waits history
* pg_stat_wait_profile - profile
* pg_stat_wait_current - current waits
* pg_wait_events - full list of wait classes and events

## Functions

* pg_wait_class_list() - returns wait classes
* pg_wait_event_list() - returns wait events
* pg_stat_wait_get_history() - returns history
* pg_stat_wait_get_profile(backend_pid int4, reset bool) - returns the current profile; the `reset` parameter can be used for resetting the data
* pg_stat_wait_get_current(backend_pid int4) - current waits
* pg_stat_wait_reset_profile() - resets the profile
If `backend_pid` is NULL, these functions return information about all processes.

### Trace functions

* pg_start_trace(backend_pid int4, filename cstring) - starts writing wait timings to a file
* pg_is_in_trace(backend_pid int4) - returns true if tracing is on in the process
* pg_stop_trace(backend_pid int4) - stops tracing waits to the file

## Testing

I tested the patch mostly on Linux and FreeBSD (Intel processors). PostgreSQL supports a very wide variety of platforms and OSes; I hope the community will help me with testing on other platforms. Configuration: Intel(R) Xeon(R) CPU X5675 @ 3.07GHz, 24 cores, pgbench initialized with -S 200, fsync off, database on tmpfs. I got stable results only with select queries. With update queries the variance between runs is too wide to measure the overhead of monitoring. Here are sample results: Monitoring off: [ildus@1-i-kurbangaliev postgrespro]$
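The lock-free double buffering used for history sampling (each backend writes current waits to one block while the collector reads the other) can be sketched as follows. This is a deliberately simplified single-threaded model; the real patch uses atomics/TAS rather than an index flip, so treat the names and mechanism here as illustrative:

```python
class DoubleBuffer:
    """Two blocks for current wait info: the backend writes to one
    while the collector reads the other, so neither blocks the other."""

    def __init__(self):
        self.blocks = [None, None]
        self.write_idx = 0  # which block the backend currently writes to

    def backend_write(self, wait_info):
        # Hot path in the backend: plain store, no lock taken.
        self.blocks[self.write_idx] = wait_info

    def collector_read(self):
        # The collector flips the write index, then reads the block the
        # backend was writing to; new writes land in the other block.
        read_idx = self.write_idx
        self.write_idx ^= 1
        return self.blocks[read_idx]
```

As the thread notes, a scheme like this can miss short waits between samples; the trade-off is that neither side ever waits on the other.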
Re: [HACKERS] Determine operator from it's function
On 7/4/15 12:33 AM, Tom Lane wrote: Jim Nasby jim.na...@bluetreble.com writes: On 7/3/15 2:33 AM, Heikki Linnakangas wrote: On 07/03/2015 01:20 AM, Jim Nasby wrote: Is there a way to determine the operator that resulted in calling the operator function? I thought fcinfo->flinfo->fn_expr might get set to the OpExpr, but it seems it can be a FuncExpr even when called via an operator... Don't think there is. Why do you need to know? I'd like to support arbitrary operators in variant. Why would you expect there to be multiple operators pointing at the same function? If there were multiple operators pointing at the same function, why would you need to distinguish them? ISTM that such a situation would necessarily mean that there was no distinction worthy of notice. Because frequently there *almost* is no distinction. Witness the large number of *_cmp functions and the 6 wrappers that normally accompany them. (The particular situation you are bitching about comes from the fact that eval_const_expressions's simplify_functions code deliberately ignores any distinction between operators and functions. But for its purposes, that is *correct*, and I will strongly resist any claim that it isn't. If you are unhappy then you defined your operators wrongly.) I'm not sure how you got 'bitching' out of what I said: I did initial testing and it looked like I was getting an OpExpr in fn_expr, but I think that's because I was using a real table to test with. When I do something like 'a' 'b' it looks like the operator gets written out of the plan. If that's indeed what's happening, is there a hook I could use to change that behavior? All I need is a way to know what the original operator was. In this case an OpExpr would be easiest, but presumably it wouldn't be difficult to turn a Name into an OpExpr.
FWIW, if we had this then by my count we could get rid of ~130 wrapper functions: decibel@decina:[10:38]~/pgsql/HEAD/src/backend (master $%=)$git grep _cmp|grep PG_RETURN_BOOL|wc -l 130 -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] dblink: add polymorphic functions.
On Wed, Jul 8, 2015 at 11:29 AM, Merlin Moncure mmonc...@gmail.com wrote: On Wed, Jul 8, 2015 at 10:12 AM, Joe Conway m...@joeconway.com wrote: On 07/07/2015 10:22 PM, Michael Paquier wrote: On Mon, Jul 6, 2015 at 11:52 PM, Joe Conway m...@joeconway.com wrote: That explains why the first example works while the second does not. I'm not sure how hard it would be to fix that, but it appears that that is where we should focus. Wouldn't it be fine if we drop some of the functions proposed without impacting the feature? Most of the functions overlap with each other, which is what leads to the limitations we see. Hence, wouldn't it be enough to just have this set of functions in the patch? dblink_get_result(text, bool, anyelement) dblink(text, text, boolean, anyelement) dblink_fetch(text, text, int, boolean, anyelement) I think using new function names is better, especially if we are only going to support a subset. I have no idea what to call them, however. Did someone else suggest dblink_any(), etc? I also think that the ultimately best solution is (what I believe to be spec-compliant) SRF casting, but I guess that could be a task for a later day. totally agree. Even if we had SRF casting, the OP's patch has value because of abstraction benefits. Also, given that we are in an extension we can relax a bit about adding extra functions IMO. merlin I'm fine with separate functions; that's what I originally proposed to Joe way back when I was trying to figure out if such a thing was possible. That would also allow me to move the rowtype parameter to the first parameter, much more in line with other polymorphic functions. And that *might* resolve the parameter ambiguity issue. Questions:
- Would moving rowtype to the first parameter resolve the parameter ambiguity issue?
- Would we want new function names anyway?
- How much of a goal is reducing function count?
The following suffixes all make a degree of sense to me:
_any()
_to_row()
_to_rowtype()
_to_recordset() -- borrowing from json[b]_to_recordset() and json[b]_populate_recordset(), the first polymorphic functions to come to mind.
IMO, _to_recordset() would win on POLA, and _any() would win on terseness.
Re: [HACKERS] dblink: add polymorphic functions.
On 07/08/2015 08:51 AM, Corey Huinker wrote: Questions: Would moving rowtype to the first parameter resolve the parameter ambiguity issue? Not for the existing functions, but with new functions I don't think it matters. You would know to always ignore either the first or last argument when determining which variant was wanted. Would we want new function names anyway? Yes, see above. How much of a goal is reducing function count? Do you mean reducing the number of C functions or SQL functions? The following suffixes all make a degree of sense to me: _any() _to_row() _to_rowtype() _to_recordset() -- borrowing from json[b]_to_recordset() and json[b]_populate_recordset(), the first polymorphic functions to come to mind. IMO, _to_recordset() would win on POLA, and _any() would win on terseness. The problem is that jsonb_to_recordset() does not behave like these new dblink functions, in that it requires a runtime column definition list. It might be better to use something completely different, so I think _any() or maybe _to_any() is better. Any other ideas for names out there?
Joe -- Joe Conway
Re: [HACKERS] configure can't detect proper pthread flags
On 07/08/2015 04:38 PM, Tom Lane wrote: Heikki Linnakangas hlinn...@iki.fi writes: I suggest that we revert that work-around for that GCC bug, and stop testing the pthread flags as soon as we find one that works. OK ... Then we can also remove the test for whether the compiler produces any warnings. Don't see how that follows? That test was added after the GCC-bug workaround, because that workaround caused warnings. The upstream philosophy is to have a list of flags, and try them in order until you find one that works. With the workaround that we added, after it finds one flag that makes the pthread test program compile, it adds every flag in the list to the command line as long as they don't make the test program fail again. For example, after it found out that -pthread makes the compilation work, it appended -pthreads -mthreads -mt -lpthread --thread-safe to PTHREAD_CFLAGS, as long as none of those caused a compiler error. They could cause warnings, though. That's why we had to add the test to check for warnings. The only scenario where you might now get warnings if we switch to the upstream version, and didn't before, is if one of the flags makes pthreads work but also creates compiler warnings, while another flag later in the list would make it work without warnings. That's not totally inconceivable, but I doubt it happens. In any case, there's nothing PostgreSQL-specific about that, so if that happens, I'd like to find out so that we can complain to the upstream. I'll commit the upstream version, and we'll see what the buildfarm thinks. - Heikki
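The two strategies contrasted above are easy to model. A small sketch, where `compiles` stands in for actually invoking the compiler on the pthread test program with the given flag string:

```python
def find_pthread_flag(candidate_flags, compiles):
    """Upstream ax_pthread behavior: try candidate flags in order and
    stop at the first one that makes the test program build."""
    for flag in candidate_flags:
        if compiles(flag):
            return flag
    return None


def old_workaround(candidate_flags, compiles):
    """Sketch of the reverted PostgreSQL workaround: keep appending any
    flag that doesn't break the build, accumulating extra flags (which
    is what produced the warnings discussed in the thread)."""
    chosen = []
    for flag in candidate_flags:
        trial = chosen + [flag]
        if compiles(" ".join(trial)):
            chosen = trial
    return chosen
```

With the upstream strategy the result is a single flag; with the workaround, every harmless-looking flag after the first working one piles onto PTHREAD_CFLAGS.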
Re: [HACKERS] configure can't detect proper pthread flags
Heikki Linnakangas hlinn...@iki.fi writes: The only scenario where you might now get warnings if we switch to the upstream version, and didn't before, is if one of the flags makes pthreads work but also creates compiler warnings, while another flag later in the list would make it work without warnings. That's not totally inconceivable, but I doubt it happens. In any case, there's nothing PostgreSQL-specific about that, so if that happens, I'd like to find out so that we can complain to the upstream. Actually, it looks like the modern version of ax_pthread.m4 adds -Werror while testing, so this should not be an issue anyway (at least on compilers that accept -Werror). regards, tom lane
Re: [HACKERS] Fillfactor for GIN indexes
In the dataPlaceToPageLeaf function:

if (append)
{
	/*
	 * Even when appending, trying to append more items than will fit is
	 * not completely free, because we will merge the new items and old
	 * items into an array below. In the best case, every new item fits in
	 * a single byte, and we can use all the free space on the old page as
	 * well as the new page. For simplicity, ignore segment overhead etc.
	 */
	maxitems = Min(maxitems, freespace + GinDataPageMaxDataSize);
}

Hmm. So after splitting the page, there is freespace + GinDataPageMaxDataSize bytes available on both pages together. freespace has been adjusted with the fillfactor, but GinDataPageMaxDataSize is not. So this overshoots, because when leafRepackItems actually distributes the segments on the pages, it will fill both pages only up to the fillfactor. This is an upper bound, so that's harmless; it only leads to some unnecessary work in dealing with the item lists. But I think that should be:

maxitems = Min(maxitems, freespace + leaf->maxdatasize);

else
{
	/*
	 * Calculate a conservative estimate of how many new items we can fit
	 * on the two pages after splitting.
	 *
	 * We can use any remaining free space on the old page to store full
	 * segments, as well as the new page. Each full-sized segment can hold
	 * at least MinTuplesPerSegment items
	 */
	int			nnewsegments;

	nnewsegments = freespace / GinPostingListSegmentMaxSize;
	nnewsegments += GinDataPageMaxDataSize / GinPostingListSegmentMaxSize;
	maxitems = Min(maxitems, nnewsegments * MinTuplesPerSegment);
}

This branch makes the same mistake, but this one is calculating a lower bound. It's important that maxitems is not set to a higher value than what actually fits on the page, otherwise you can get an ERROR later in the function, when it turns out that not all the items actually fit on the page. The saving grace here is that this branch is never taken when building a new index, because index build inserts all the TIDs in order, but that seems pretty fragile.
It should use leaf->maxdatasize instead of GinDataPageMaxDataSize here too. But that can lead to funny things if fillfactor is small and BLCKSZ is small too. With the minimums, BLCKSZ=1024 and fillfactor=0.2, the above formula will set nnewsegments to zero. That's not going to end up well. The problem is that maxdatasize becomes smaller than GinPostingListSegmentMaxSize. I think GinGetMaxDataSize() needs to make sure that the returned value is always >= GinPostingListSegmentMaxSize. Now that we have a fillfactor, shouldn't we replace the 75% heuristic later in that function, when inserting to the rightmost page rather than building it from scratch? In B-tree, the fillfactor is applied to that case too. - Heikki
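The degenerate case called out above is quick to verify with arithmetic. The constants below are illustrative stand-ins for the real GIN macros (in particular, the segment size is an assumption, not the actual value of GinPostingListSegmentMaxSize), but the shape of the failure and the proposed clamp hold regardless:

```python
BLCKSZ = 1024            # minimum supported block size, per the thread
FILLFACTOR = 0.2         # minimum fillfactor discussed in the thread
SEGMENT_MAX_SIZE = 384   # stand-in for GinPostingListSegmentMaxSize


def gin_get_max_data_size(blcksz, fillfactor, clamp):
    """Toy model of GinGetMaxDataSize: usable leaf-page data size after
    applying fillfactor (page header overhead ignored for brevity).
    With clamp=True, enforce the proposed minimum of one full segment."""
    size = int(blcksz * fillfactor)
    if clamp:
        size = max(size, SEGMENT_MAX_SIZE)
    return size


unclamped = gin_get_max_data_size(BLCKSZ, FILLFACTOR, clamp=False)
# The lower-bound branch: zero segments fit, so maxitems collapses to 0.
nnewsegments = unclamped // SEGMENT_MAX_SIZE

clamped = gin_get_max_data_size(BLCKSZ, FILLFACTOR, clamp=True)
```

With these numbers the unclamped data size (204 bytes) is smaller than one posting-list segment, so nnewsegments is 0; the clamp guarantees at least one full segment always fits.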
Re: [HACKERS] PL/pgSQL, RAISE and error context
On Wed, Jul 8, 2015 at 4:39 PM, Pavel Stehule pavel.steh...@gmail.com wrote: 2015-07-08 8:35 GMT+02:00 Pavel Stehule pavel.steh...@gmail.com: 2015-07-07 18:15 GMT+02:00 Merlin Moncure mmonc...@gmail.com: On Tue, Jul 7, 2015 at 9:04 AM, Pavel Stehule pavel.steh...@gmail.com wrote: It doesn't have to if the behavior is guarded with a GUC. I just don't understand what all the fuss is about. The default behavior of logging that is well established by other languages (for example java) that manage error stack for you should be to: *) Give stack trace when an uncaught exception is thrown *) Do not give stack trace in all other logging cases unless asked for what is RAISE EXCEPTION - first or second case? First: RAISE (unless caught) is no different than any other kind of error. On Tue, Jul 7, 2015 at 9:42 AM, Heikki Linnakangas hlinn...@iki.fi wrote: On 07/07/2015 04:56 PM, Merlin Moncure wrote: It doesn't have to if the behavior is guarded with a GUC. I just don't understand what all the fuss is about. The default behavior of logging that is well established by other languages (for example java) that manage error stack for you should be to: *) Give stack trace when an uncaught exception is thrown *) Do not give stack trace in all other logging cases unless asked for Java's exception handling is so different from PostgreSQL's errors that I don't think there's much to be learned from that. But I'll bite: First of all, Java's exceptions always contain a stack trace. It's up to you when you catch an exception to decide whether to print it or not. try { ... } catch (Exception e) { e.printStackTrace() } is fairly common, actually. There is nothing like a NOTICE in Java, i.e. an exception that's thrown but doesn't affect the control flow. The best I can think of is System.out.println(), which of course has no stack trace attached to it. exactly. Perhaps you're arguing that NOTICE is more like printing to stderr, and should never contain any context information. 
I don't think that would be an improvement. It's very handy to have the context information available if don't know where a NOTICE is coming from, even if in most cases you're not interested in it. That's exactly what I'm arguing. NOTICE (and WARNING) are for printing out information to client side logging; it's really the only tool we have for that purpose and it fits that role perfectly. Of course, you may want to have NOTICE print context, especially when debugging, but some control over that would be nice and in most cases it's really not necessary. I really don't understand the objection to offering control over that behavior although I certainly understand wanting to keep the default behavior as it currently is. This is really quite different from a programming language's exception handling. First, there's a server, which produces the errors, and a separate client, which displays them. You cannot catch an exception in the client. BTW, let me throw in one use case to consider. We've been talking about psql, and what to print, but imagine a more sophisticated client like pgAdmin. It's not limited to either printing the context or not. It could e.g. hide the context information of all messages when they occur, but if you double-click on it, it's expanded to show all the context, location and all. You can't do that if the server doesn't send the context information in the first place. I would be happy to show you the psql redirected output logs from my nightly server processes that spew into the megabytes because of logging various high level steps (did this, did that). Oh, I believe you. I understand what the problem is, we're only talking about how best to address it. Yeah. For posterity, a psql based solution would work fine for me, but a server side solution has a lot of advantages (less protocol chatter, more configurability, keeping libpq/psql light). 
After some work on a reduced version of the plpgsql.min_context patch I am inclined to think the ideal solution needs more steps, because this issue has more than one dimension. There are two independent issues: 1. The old plpgsql workaround that reduced the unwanted call stack info for RAISE NOTICE. A negative side effect of this workaround is the missing context info about the RAISE command that raises the exception. We know the function, but we don't know the line of the related RAISE statement. The important fact is that the NOTICE doesn't bubble up, so this workaround was relatively successful without implementing any filtering on the client or log side. I found another issue with this workaround: it doesn't work well for nested SQL statement calls, when the inner statement invokes RAISE NOTICE. In this case the context is shown too. postgres=# insert into xx values(60); NOTICE: trigger_func(before_ins_stmt) called: action = INSERT, when
Re: [HACKERS] Hashable custom types
Paul Ramsey pram...@cleverelephant.ca writes: On Fri, Apr 25, 2014 at 4:54 PM, Peter Geoghegan p...@heroku.com wrote: On Fri, Apr 25, 2014 at 4:47 PM, Paul Ramsey pram...@cleverelephant.ca wrote: Is it possible to make custom types hashable? There's no hook in the CREATE TYPE call for a hash function, but can one be hooked up somewhere else? In an operator? See 35.14.6., System Dependencies on Operator Classes Coming back to this, I created an appropriate opclass... CREATE OR REPLACE FUNCTION geometry_hash_eq(geom1 geometry, geom2 geometry) RETURNS boolean AS '$libdir/postgis-2.2', 'lwgeom_hash_eq' LANGUAGE 'c' IMMUTABLE STRICT; Why are you creating a separate equality operator, rather than just using the type's existing equality operator (I assume it's got one) in the hash opclass? It still says I lack the secret sauce... ERROR: could not implement recursive UNION DETAIL: All column datatypes must be hashable. UNION will preferentially glom onto the btree equality operator, if memory serves. If that isn't also the hash equality operator, things won't work pleasantly. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hashable custom types
On July 8, 2015 at 1:36:49 PM, Tom Lane (t...@sss.pgh.pa.us) wrote: Paul Ramsey pram...@cleverelephant.ca writes: UNION will preferentially glom onto the btree equality operator, if memory serves. If that isn't also the hash equality operator, things won't work pleasantly. So… what does that mean for types that have both btree and hash equality operators? Don’t all the built-ins also have this problem? What I'm asking is why it would possibly be sensible to have different notions of equality for hash and btree. In every existing type that has both btree and hash opclasses, the equality members of those opclasses are *the same operator*. You don't really want UNION giving different answers depending on which implementation the planner happens to pick, do you? Well, that’s where things get pretty darn ugly. For reasons of practicality at the time, our equality btree operator ‘=‘ got tied to ‘bounding box equals’ way back in the mists of pre-history. (Basically the reasoning was “all our index work is about bounding boxes!”. I must admit, given my understanding of the zen of PostgreSQL now (and the features that PostgreSQL now has) that would not happen now.) So… yeah. Probably ‘=‘ should be something else, probably something quite expensive but logically defensible like a geometric equality test, but it’s not, and we have lots of third party stuff layered on top of us that expects existing semantics. If getting the objects to UNION means that the btree and hash ops have to be the same, then that just means I’m SOL and will revert to my hack of just casting the objects to bytea to force them to UNION in the recursive CTE. P.
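Tom's point can be stated as a concrete invariant: whatever operator the planner treats as equality must satisfy eq(a, b) ⇒ hash(a) == hash(b), or hashed UNION/joins misbehave. A toy C sketch (invented types, not PostGIS code) of why a bounding-box equality paired with a full-value hash violates that invariant:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy "geometry": a bounding box plus a payload the box ignores. */
typedef struct { int xmin, ymin, xmax, ymax; int payload; } Geom;

/* btree-style equality as described in the thread: bbox equality only. */
static bool bbox_eq(const Geom *a, const Geom *b) {
    return a->xmin == b->xmin && a->ymin == b->ymin &&
           a->xmax == b->xmax && a->ymax == b->ymax;
}

/* FNV-1a hash over the full value, payload included. */
static uint32_t full_hash(const Geom *g) {
    uint32_t h = 2166136261u;
    const unsigned char *p = (const unsigned char *) g;
    for (size_t i = 0; i < sizeof(Geom); i++) { h ^= p[i]; h *= 16777619u; }
    return h;
}

/* The invariant a hash opclass must uphold for its equality operator:
 * eq(a, b) implies hash(a) == hash(b).  With bbox_eq as equality and
 * full_hash as the hash, two values can be "equal" yet hash apart,
 * which is why the hash opclass's equality member must be the very
 * operator UNION resolves to. */
static bool invariant_holds(const Geom *a, const Geom *b,
                            bool (*eq)(const Geom *, const Geom *),
                            uint32_t (*hash)(const Geom *)) {
    return !eq(a, b) || hash(a) == hash(b);
}
```

Hashing the exact serialized bytes (Paul's bytea-cast hack) restores the invariant precisely because byte equality is then the equality in play.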
Re: [HACKERS] PL/pgSQL, RAISE and error context
2015-07-08 8:35 GMT+02:00 Pavel Stehule pavel.steh...@gmail.com: 2015-07-07 18:15 GMT+02:00 Merlin Moncure mmonc...@gmail.com: On Tue, Jul 7, 2015 at 9:04 AM, Pavel Stehule pavel.steh...@gmail.com wrote: It doesn't have to if the behavior is guarded with a GUC. I just don't understand what all the fuss is about. The default behavior of logging that is well established by other languages (for example java) that manage error stack for you should be to: *) Give stack trace when an uncaught exception is thrown *) Do not give stack trace in all other logging cases unless asked for what is RAISE EXCEPTION - first or second case? First: RAISE (unless caught) is no different than any other kind of error. On Tue, Jul 7, 2015 at 9:42 AM, Heikki Linnakangas hlinn...@iki.fi wrote: On 07/07/2015 04:56 PM, Merlin Moncure wrote: It doesn't have to if the behavior is guarded with a GUC. I just don't understand what all the fuss is about. The default behavior of logging that is well established by other languages (for example java) that manage error stack for you should be to: *) Give stack trace when an uncaught exception is thrown *) Do not give stack trace in all other logging cases unless asked for Java's exception handling is so different from PostgreSQL's errors that I don't think there's much to be learned from that. But I'll bite: First of all, Java's exceptions always contain a stack trace. It's up to you when you catch an exception to decide whether to print it or not. try { ... } catch (Exception e) { e.printStackTrace() } is fairly common, actually. There is nothing like a NOTICE in Java, i.e. an exception that's thrown but doesn't affect the control flow. The best I can think of is System.out.println(), which of course has no stack trace attached to it. exactly. Perhaps you're arguing that NOTICE is more like printing to stderr, and should never contain any context information. I don't think that would be an improvement. 
It's very handy to have the context information available if you don't know where a NOTICE is coming from, even if in most cases you're not interested in it. That's exactly what I'm arguing. NOTICE (and WARNING) are for printing out information to client side logging; it's really the only tool we have for that purpose and it fits that role perfectly. Of course, you may want to have NOTICE print context, especially when debugging, but some control over that would be nice and in most cases it's really not necessary. I really don't understand the objection to offering control over that behavior although I certainly understand wanting to keep the default behavior as it currently is. This is really quite different from a programming language's exception handling. First, there's a server, which produces the errors, and a separate client, which displays them. You cannot catch an exception in the client. BTW, let me throw in one use case to consider. We've been talking about psql, and what to print, but imagine a more sophisticated client like pgAdmin. It's not limited to either printing the context or not. It could e.g. hide the context information of all messages when they occur, but if you double-click on it, it's expanded to show all the context, location and all. You can't do that if the server doesn't send the context information in the first place. I would be happy to show you the psql redirected output logs from my nightly server processes that spew into the megabytes because of logging various high level steps (did this, did that). Oh, I believe you. I understand what the problem is, we're only talking about how best to address it. Yeah. For posterity, a psql based solution would work fine for me, but a server side solution has a lot of advantages (less protocol chatter, more configurability, keeping libpq/psql light).
After some work on a reduced version of the plpgsql.min_context patch, I am inclined to think the ideal solution needs more steps, because this issue has more than one dimension. There are two independent issues: 1. The old plpgsql workaround that suppressed the unwanted call stack info for RAISE NOTICE. A negative side effect of this workaround is the missing context info about the RAISE command that raises the exception: we know the function, but we don't know the line of the related RAISE statement. The important fact is that the NOTICE doesn't bubble up, so this workaround was relatively successful without having to implement any filtering on the client or log side. I found another issue with this workaround: it doesn't work well for nested SQL statement calls, when an inner statement invokes RAISE NOTICE. In that case the context is shown anyway.

postgres=# insert into xx values(60);
NOTICE: trigger_func(before_ins_stmt) called: action = INSERT, when = BEFORE, level = STATEMENT
CONTEXT: SQL statement "INSERT INTO boo VALUES(30)"
PL/pgSQL function hh()
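The `*_min_context` proposal floated in this thread boils down to a threshold check: attach the CONTEXT lines only when a message's severity reaches a configurable level. A minimal C sketch of that idea (the GUC name echoes the proposal; the ERROR value and the exact semantics are my assumptions, not the patch's):

```c
#include <stdbool.h>

/* Severity codes as in elog.h; the ERROR value is assumed here
 * for illustration. */
#define NOTICE  18
#define WARNING 19
#define ERROR   20

/* Sketch of a client_min_context-style setting: with the threshold at
 * ERROR, RAISE NOTICE chatter arrives without a call stack, while real
 * errors keep their full context. */
static int client_min_context = ERROR;

static bool show_context(int elevel)
{
    return elevel >= client_min_context;
}
```

This captures why the two issues Pavel separates are independent: the threshold governs context emission, not whether the message itself is delivered.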
Re: [HACKERS] Freeze avoidance of very large table.
On Fri, Jul 3, 2015 at 1:25 AM, Sawada Masahiko sawada.m...@gmail.com wrote: On Fri, Jul 3, 2015 at 1:23 AM, Simon Riggs si...@2ndquadrant.com wrote: On 2 July 2015 at 16:30, Sawada Masahiko sawada.m...@gmail.com wrote: Also, the flags of each heap page header might be set PD_ALL_FROZEN, as well as all-visible Is it possible to have VM bits set to frozen but not visible? The description makes those two states sound independent of each other. Are they? Or not? Do we test for an impossible state? It's impossible to have VM bits set to frozen but not visible. These bits are controlled independently. But eventually, when the all-frozen bit is set, all-visible is also set. If that combination is currently impossible, could it be used to indicate that the page is all empty? Having a crash-proof bitmap of all-empty pages would make vacuum truncation scans much more efficient. Cheers, Jeff
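The invariant being discussed — all-frozen implies all-visible, so "frozen but not visible" is unreachable — can be sketched as a two-bit-per-page map. The flag names and encoding below are illustrative assumptions, not the patch's actual definitions:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical two-bit visibility map entry, one per heap page:
 * bit 0 = all-visible, bit 1 = all-frozen. */
#define VM_ALL_VISIBLE 0x01
#define VM_ALL_FROZEN  0x02

/* Setting all-frozen also sets all-visible, so the combination
 * "frozen but not visible" (0x02 alone) can never be produced. */
static uint8_t vm_set_all_frozen(uint8_t entry) {
    return entry | VM_ALL_FROZEN | VM_ALL_VISIBLE;
}

/* Any modification of the page clears both bits. */
static uint8_t vm_clear(uint8_t entry) {
    return entry & ~(VM_ALL_VISIBLE | VM_ALL_FROZEN);
}

/* The invariant the thread states: frozen implies visible.  The spare
 * state 0x02 is what Jeff suggests repurposing for "all empty". */
static bool vm_entry_valid(uint8_t entry) {
    return !(entry & VM_ALL_FROZEN) || (entry & VM_ALL_VISIBLE);
}
```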
[HACKERS] Postmaster's handing of startup-process crash is busted
My Salesforce colleagues observed a failure mode in which a bug in the crash recovery logic caused the startup process to get a SEGV while trying to recover after a backend crash. The postmaster should have given up at that point, but instead it kept on respawning the startup process, which of course just kept on crashing. The cause seems to be a bug in my old commit 442231d7f71764b8: if FatalError has been set, we suppose that the startup process died because we SIGQUIT'd it, which is simply wrong in this case. AFAICS the only way to fix this properly is to explicitly track whether we sent the startup process a kill signal. I started out with a separate boolean, but after awhile decided that it'd be better to invent an enum representing the startup process state, which could also subsume the existing but rather ad-hoc flag RecoveryError. So that led me to the attached patch. Any thoughts/objections?

regards, tom lane

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index df8037b..77636a2 100644
*** a/src/backend/postmaster/postmaster.c
--- b/src/backend/postmaster/postmaster.c
*** static pid_t StartupPID = 0,
*** 249,254
			PgStatPID = 0,
			SysLoggerPID = 0;

+ /* Startup process's status */
+ typedef enum
+ {
+ 	STARTUP_NOT_RUNNING,
+ 	STARTUP_RUNNING,
+ 	STARTUP_SIGNALED,		/* we sent it a SIGQUIT or SIGKILL */
+ 	STARTUP_CRASHED
+ } StartupStatusEnum;
+
+ static StartupStatusEnum StartupStatus = STARTUP_NOT_RUNNING;
+
  /* Startup/shutdown state */
  #define NoShutdown		0
  #define SmartShutdown	1
--- 249,265
*** static pid_t StartupPID = 0,
*** 258,264
  static int	Shutdown = NoShutdown;

  static bool FatalError = false; /* T if recovering from backend crash */
- static bool RecoveryError = false;	/* T if WAL recovery failed */

  /*
   * We use a simple state machine to control startup, shutdown, and
--- 269,274
*** static bool RecoveryError = false;	/* T
*** 301,308
   * states, nor in PM_SHUTDOWN states (because we don't enter those states
   * when trying to recover from a crash).  It can be true in PM_STARTUP state,
   * because we don't clear it until we've successfully started WAL redo.
- * Similarly, RecoveryError means that we have crashed during recovery, and
- * should not try to restart.
   */
  typedef enum
  {
--- 311,316
*** PostmasterMain(int argc, char *argv[])
*** 1246,1251
--- 1254,1260
  	 */
  	StartupPID = StartupDataBase();
  	Assert(StartupPID != 0);
+ 	StartupStatus = STARTUP_RUNNING;
  	pmState = PM_STARTUP;

  	/* Some workers may be scheduled to start now */
*** reaper(SIGNAL_ARGS)
*** 2591,2596
--- 2600,2606
  			if (Shutdown > NoShutdown &&
  				(EXIT_STATUS_0(exitstatus) || EXIT_STATUS_1(exitstatus)))
  			{
+ 				StartupStatus = STARTUP_NOT_RUNNING;
  				pmState = PM_WAIT_BACKENDS;
  				/* PostmasterStateMachine logic does the rest */
  				continue;
*** reaper(SIGNAL_ARGS)
*** 2600,2605
--- 2610,2616
  			{
  				ereport(LOG,
  						(errmsg("shutdown at recovery target")));
+ 				StartupStatus = STARTUP_NOT_RUNNING;
  				Shutdown = SmartShutdown;
  				TerminateChildren(SIGTERM);
  				pmState = PM_WAIT_BACKENDS;
*** reaper(SIGNAL_ARGS)
*** 2624,2639
  			/*
  			 * After PM_STARTUP, any unexpected exit (including FATAL exit) of
  			 * the startup process is catastrophic, so kill other children,
! 			 * and set RecoveryError so we don't try to reinitialize after
! 			 * they're gone.  Exception: if FatalError is already set, that
! 			 * implies we previously sent the startup process a SIGQUIT, so
  			 * that's probably the reason it died, and we do want to try to
  			 * restart in that case.
  			 */
  			if (!EXIT_STATUS_0(exitstatus))
  			{
! 				if (!FatalError)
! 					RecoveryError = true;
  				HandleChildCrash(pid, exitstatus,
  								 _("startup process"));
  				continue;
--- 2635,2652
  			/*
  			 * After PM_STARTUP, any unexpected exit (including FATAL exit) of
  			 * the startup process is catastrophic, so kill other children,
! 			 * and set StartupStatus so we don't try to reinitialize after
! 			 * they're gone.  Exception: if StartupStatus is STARTUP_SIGNALED,
! 			 * then we previously sent the startup process a SIGQUIT; so
  			 * that's probably the reason it died, and we do want to try to
  			 * restart in that case.
  			 */
  			if (!EXIT_STATUS_0(exitstatus))
  			{
! 				if (StartupStatus == STARTUP_SIGNALED)
! 					StartupStatus = STARTUP_NOT_RUNNING;
! 				else
! 					StartupStatus = STARTUP_CRASHED;
  				HandleChildCrash(pid, exitstatus,
  								 _("startup process"));
  				continue;
*** reaper(SIGNAL_ARGS)
*** 2642,2647
--- 2655,2661
  			/*
  			 * Startup succeeded, commence normal operations
  			 */
+ 			StartupStatus = STARTUP_NOT_RUNNING;
  			FatalError =
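The patch's core decision can be restated as a standalone sketch: on an unexpected startup-process exit, the classification depends on whether the postmaster itself signaled the process. This is condensed illustration, not the actual postmaster code:

```c
#include <stdbool.h>

/* Startup process state, mirroring the enum introduced by the patch. */
typedef enum {
    STARTUP_NOT_RUNNING,
    STARTUP_RUNNING,
    STARTUP_SIGNALED,   /* postmaster sent it SIGQUIT or SIGKILL */
    STARTUP_CRASHED
} StartupStatusEnum;

/* On exit of the startup process: if we signaled it ourselves, the
 * death is expected and restarting is safe; otherwise it crashed on
 * its own and reinitializing would just crash again — the respawn
 * loop the old FatalError test got wrong. */
static StartupStatusEnum
classify_startup_exit(StartupStatusEnum status, bool clean_exit)
{
    if (clean_exit)
        return STARTUP_NOT_RUNNING;
    if (status == STARTUP_SIGNALED)
        return STARTUP_NOT_RUNNING;   /* we killed it; restart is OK */
    return STARTUP_CRASHED;           /* genuine crash; do not respawn */
}

static bool may_restart(StartupStatusEnum status)
{
    return status != STARTUP_CRASHED;
}
```

The enum subsumes the old RecoveryError boolean: STARTUP_CRASHED plays its "do not try to restart" role.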
[HACKERS] Re: [HACKERS] GSoC 2015 proposal: Improve the performance of “ALTER TABLE .. SET LOGGED / UNLOGGED” statement
On Wed, Jul 8, 2015 at 1:27 AM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: On Fri, Jul 3, 2015 at 9:29 AM, Ashutosh Bapat ashutosh.ba...@enterprisedb.com wrote: On Fri, Jul 3, 2015 at 4:48 PM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: On Thu, Apr 2, 2015 at 3:24 PM, Robert Haas robertmh...@gmail.com wrote: On Wed, Mar 25, 2015 at 9:46 PM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: http://www.postgresql.org/message-id/ca+tgmozm+-0r7h0edpzzjbokvvq+gavkchmno4fypveccw-...@mail.gmail.com

I like the idea of the feature a lot, but the proposal to which you refer here mentions some problems, which I'm curious how you think you might solve. (I don't have any good ideas myself, beyond what I mentioned there.)

You're right, and we have another design (steps 1 and 2 were implemented last year):

*** ALTER TABLE changes

1) ATController
1.1) Acquire an AccessExclusiveLock (src/backend/commands/tablecmds.c - AlterTableGetLockLevel:3023)

2) Prepare to change relpersistence (src/backend/commands/tablecmds.c - ATPrepCmd:3249-3270)
• check temp table (src/backend/commands/tablecmds.c - ATPrepChangePersistence:11074)
• check foreign key constraints (src/backend/commands/tablecmds.c - ATPrepChangePersistence:11102)

3) FlushRelationBuffers, DropRelFileNodeBuffers and smgrimmedsync (MAIN_FORKNUM, FSM_FORKNUM, VISIBILITYMAP_FORKNUM and INIT_FORKNUM if exists)

4) Create a new fork called TRANSIENT INIT FORK:
• from Unlogged to Logged (create _initl fork) (INIT_TO_LOGGED_FORKNUM)
  ∘ new forkName (src/common/relpath.c) called _initl
  ∘ insert XLog record to drop it if the transaction aborts
• from Logged to Unlogged (create _initu fork) (INIT_TO_UNLOGGED_FORKNUM)
  ∘ new forkName (src/common/relpath.c) called _initu
  ∘ insert XLog record to drop it if the transaction aborts

AFAIU, while reading WAL, the server doesn't know whether the transaction that produced that WAL record aborted or committed.
It's only when it sees a COMMIT/ABORT record down the line that it can confirm whether the transaction committed or aborted. So, one can only redo the things that the WAL says have been done. We can not undo things based on the WAL. We might record this fact somewhere while reading the WAL and act on it once we know the status of the transaction, but I do not see that as part of this idea. This comment applies to all the steps inserting WALs for undoing things.

Even if I xlog the create/drop of the transient fork?

Yes.

5) Change the relpersistence in the catalog (pg_class->relpersistence) (heap, toast, indexes)

6) Remove/Create INIT_FORK
• from Unlogged to Logged
  ∘ remove the INIT_FORK and INIT_TO_LOGGED_FORK, adding to the pendingDeletes queue
• from Logged to Unlogged
  ∘ remove the INIT_TO_UNLOGGED_FORK, adding to the pendingDeletes queue
  ∘ create the INIT_FORK using heap_create_init_fork
  ∘ insert XLog record to drop the init fork if the transaction aborts

*** CRASH RECOVERY changes

1) During crash recovery (src/backend/access/transam/xlog.c:6507:ResetUnloggedRelations)

This operation is carried out in two phases: one before replaying WAL records and a second after that. Are you sure that the WALs generated for the unlogged or logged forks, as described above, have been taken care of?

Hummm... actually no...

• if the transient fork _initl exists then
  ∘ drop the transient fork _initl
  ∘ if the init fork doesn't exist then create it
  ∘ reset the relation
• if the transient fork _initu exists then
  ∘ drop the transient fork _initu
  ∘ if the init fork exists then drop it
  ∘ don't reset the relation

Consider the case of converting unlogged to logged. The server crashes after the 6th step, when both forks are removed. During crash recovery, it will not see either fork and won't reset the unlogged table. Remember the table is still unlogged, since the transaction was aborted by the crash. So, it has to have an init fork to be reset on crash recovery.
Similarly while converting from logged to unlogged. The server crashes in the 6th step after creating the INIT_FORK; during crash recovery the init fork will be seen and a supposedly logged table will be trashed. My intention for the 6th step is that everything is recorded to WAL, so if a crash occurs the recovery process cleans up the mess. AFAIU, PostgreSQL recovery is based on redoing WAL. What you described earlier, undoing based on the WAL, does not fit in the current framework. The ideas in 1 and 2 might be better than having yet another init fork. The link for idea 2 is missing... Sorry for that. The 2nd idea is what Robert described using the control file. 1. http://www.postgresql.org/message-id/533d457a.4030...@vmware.com Unfortunately I completely missed this idea, but it is also interesting. But I'll need to stamp in both ways (from
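Ashutosh's objection — recovery redoes WAL, it never undoes it — can be shown with a toy replay loop. The record layout and types below are invented for illustration; the point is only that each record is applied unconditionally, because the transaction's fate is learned, if at all, later in the stream:

```c
#include <stddef.h>

/* Toy WAL: a write record plus commit/abort markers. */
typedef enum { REC_SET, REC_COMMIT, REC_ABORT } RecType;
typedef struct { RecType type; int xid; int value; } WalRecord;

/* Redo-only replay over a single toy "page" value.  There is no undo
 * pass: a write emitted by a transaction that later aborted is still
 * applied physically (in the real system, MVCC visibility — not
 * recovery — hides aborted rows). */
static int replay(const WalRecord *wal, size_t n, int page)
{
    for (size_t i = 0; i < n; i++)
        if (wal[i].type == REC_SET)
            page = wal[i].value;   /* redo, whatever the xact outcome */
    return page;
}
```

This is why "insert XLog record to drop it if the transaction aborts" cannot work as stated: replay has no step at which an earlier record's effect is reverted.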
Re: [HACKERS] PL/pgSQL, RAISE and error context
After some work on a reduced version of the plpgsql.min_context patch, I am inclined to think the ideal solution needs more steps, because this issue has more than one dimension. There are two independent issues: 1. The old plpgsql workaround that suppressed the unwanted call stack info for RAISE NOTICE. A negative side effect of this workaround is the missing context info about the RAISE command that raises the exception: we know the function, but we don't know the line of the related RAISE statement. The important fact is that the NOTICE doesn't bubble up, so this workaround was relatively successful without having to implement any filtering on the client or log side. 2. The second issue is the general suppression of context info for an interactive client or for the log. These issues should be solved separately, because a solution for @2 doesn't fix @1, and @1 is too local for @2. So what can we do? 1. Remove the current plpgsql workaround and implement client_min_context and log_min_context. 2. Implement plpgsql.min_context, plus client_min_context and log_min_context. @1 is consistent, but it isn't possible to configure the same behavior as before. @2 is
Re: [HACKERS] Hashable custom types
On Fri, Apr 25, 2014 at 4:54 PM, Peter Geoghegan p...@heroku.com wrote: On Fri, Apr 25, 2014 at 4:47 PM, Paul Ramsey pram...@cleverelephant.ca wrote: Is it possible to make custom types hashable? There's no hook in the CREATE TYPE call for a hash function, but can one be hooked up somewhere else? In an operator? See 35.14.6., System Dependencies on Operator Classes

Coming back to this, I created an appropriate opclass...

CREATE OR REPLACE FUNCTION geometry_hash_eq(geom1 geometry, geom2 geometry)
	RETURNS boolean
	AS '$libdir/postgis-2.2', 'lwgeom_hash_eq'
	LANGUAGE 'c' IMMUTABLE STRICT;

CREATE OR REPLACE FUNCTION geometry_hash(geom1 geometry)
	RETURNS integer
	AS '$libdir/postgis-2.2', 'lwgeom_hash'
	LANGUAGE 'c' IMMUTABLE STRICT;

-- Availability: 0.9.0
CREATE OPERATOR == (
	LEFTARG = geometry, RIGHTARG = geometry,
	PROCEDURE = geometry_hash_eq,
	COMMUTATOR = '==',
	RESTRICT = contsel, JOIN = contjoinsel
);

CREATE OPERATOR CLASS hash_geometry_ops
	DEFAULT FOR TYPE geometry USING hash AS
	OPERATOR 1 == (geometry, geometry),
	FUNCTION 1 geometry_hash(geometry);

I even tested it as an index!

create index hashidx on points using hash ( the_geom_webmercator );
CREATE INDEX

But when I run my recursive query...

WITH RECURSIVE find_cluster(cartodb_id, cluster_id, geom) AS (
	(SELECT points.cartodb_id,
			points.cartodb_id as cluster_id,
			points.the_geom_webmercator as geom
	 FROM points
	 WHERE points.cartodb_id in (select cartodb_id from points))
	UNION
	(SELECT pts.cartodb_id, n.cluster_id,
			pts.the_geom_webmercator as geom
	 FROM points pts
	 JOIN find_cluster n ON ST_DWithin(n.geom, pts.the_geom_webmercator, 2)
	 WHERE n.cartodb_id <> pts.cartodb_id)
)
SELECT * FROM find_cluster;

It still says I lack the secret sauce...

ERROR: could not implement recursive UNION
DETAIL: All column datatypes must be hashable.

What's the sauce? Thanks! 
P -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] obsolete comment in elog.h
Hi

the comment about the NOTICE level in elog.h is obsolete

Regards

Pavel

diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h
new file mode 100644
index 8e90661..7684717
*** a/src/include/utils/elog.h
--- b/src/include/utils/elog.h
***
*** 33,40
  								 * client regardless of client_min_messages,
  								 * but by default not sent to server log. */
  #define NOTICE		18			/* Helpful messages to users about query
! 								 * operation; sent to client and server log by
! 								 * default. */
  #define WARNING	19			/* Warnings.  NOTICE is for expected messages
  								 * like implicit sequence creation by SERIAL.
  								 * WARNING is for unexpected messages. */
--- 33,40
  								 * client regardless of client_min_messages,
  								 * but by default not sent to server log. */
  #define NOTICE		18			/* Helpful messages to users about query
! 								 * operation; sent to client and not to server
! 								 * log by default. */
  #define WARNING	19			/* Warnings.  NOTICE is for expected messages
  								 * like implicit sequence creation by SERIAL.
  								 * WARNING is for unexpected messages. */

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
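The behavior the corrected comment describes follows from the two independent severity thresholds. A sketch of that routing, using the severity codes from the quoted header (the defaults shown match stock postgresql.conf; the real check in elog.c also special-cases LOG, which is omitted here):

```c
#include <stdbool.h>

/* Severity codes as in the quoted elog.h. */
#define NOTICE  18
#define WARNING 19

/* Two independent thresholds: with client_min_messages = NOTICE and
 * log_min_messages = WARNING (the defaults), a NOTICE reaches the
 * client but not the server log — exactly what the comment fix says. */
static int client_min_messages = NOTICE;
static int log_min_messages = WARNING;

static bool sent_to_client(int elevel) { return elevel >= client_min_messages; }
static bool sent_to_log(int elevel)    { return elevel >= log_min_messages; }
```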
Re: [HACKERS] Waits monitoring
On Thu, Jul 9, 2015 at 1:52 AM, Ildus Kurbangaliev i.kurbangal...@postgrespro.ru wrote: Hello. Currently, PostgreSQL offers many metrics for monitoring. However, detailed monitoring of waits is still not supported. Such monitoring would let a DBA know how long a backend waited for a particular event and therefore identify bottlenecks. This functionality is very useful, especially for high-load databases. Metrics for waits monitoring are provided by many popular commercial DBMSs. We currently have requests for this feature from companies migrating to PostgreSQL from commercial DBMSs. Thus, I think it would be nice for PostgreSQL to have it too. Yes, it is good to have such wait monitoring views in PostgreSQL. You can add this patch to the open commitfest. Regards, Hari Babu Fujitsu Australia
Re: [HACKERS] Memory Accounting v11
On 7 July 2015 at 20:23, David Rowley david.row...@2ndquadrant.com wrote: On 7 July 2015 at 18:59, Jeff Davis pg...@j-davis.com wrote: One other thing that I don't remember seeing discussed would be to just store a List in each context which would store all of the mem_allocated's that need to be updated during each allocation and deallocation of memory. The list would be set up once when the context is created. When a child context is set up we'd just loop up the parent context chain and list_concat() all of the parent's lists onto the child's lists. Would this method add too much overhead when a context is deleted? Has this been explored already? That's a good idea, as it would be faster than recursing. Also, the number of parents can't change, so we can just make an array, and that would be quite fast to update. Unless I'm missing something, this sounds like a nice solution. It would require more space in MemoryContextData, but that might not be a problem. Oh, an array is even better, but why more space? Won't it just be a pointer to an array? It can be NULL if there's nothing to track. Sounds like it would be the same amount of space, at least on a 64 bit machine. I'd imagine that the first element of the array could just store the length of it. The type might be slightly wider than what you need for an array length, but this'll save having an extra field in the struct for it. Are you going to implement this? Or are you happy that single-level context tracking is the right thing? I'm going to mark the patch as waiting on author for now. Regards David Rowley -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
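The array idea being discussed can be sketched as follows. This is a toy model with hypothetical names, not the actual patch: each context keeps an array of pointers to its own and every ancestor's allocation counter, built once at creation, so each allocation updates all interested totals with a flat loop instead of recursing up the parent chain at read time.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, much-simplified stand-in for MemoryContextData. */
typedef struct Ctx
{
    size_t   mem_allocated;   /* bytes allocated in this subtree */
    size_t **totals;          /* my counter plus every ancestor's */
    int      ntotals;
} Ctx;

/*
 * Create a context.  Since a context's ancestry never changes, we can
 * copy the parent's counter array once and append our own counter.
 */
static Ctx *
ctx_create(Ctx *parent)
{
    Ctx *ctx = calloc(1, sizeof(Ctx));
    int  n = parent ? parent->ntotals : 0;

    ctx->ntotals = n + 1;
    ctx->totals = malloc(ctx->ntotals * sizeof(size_t *));
    for (int i = 0; i < n; i++)
        ctx->totals[i] = parent->totals[i];
    ctx->totals[n] = &ctx->mem_allocated;
    return ctx;
}

/* O(depth) pointer updates per allocation; no recursion when reading. */
static void
ctx_account_alloc(Ctx *ctx, size_t size)
{
    for (int i = 0; i < ctx->ntotals; i++)
        *ctx->totals[i] += size;
}
```

The trade-off matches the thread: allocation pays for the tree depth up front, while reading any context's subtree total is a single load.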
[HACKERS] Fix to expose a GUC variable, Log_disconnections, to outside of postgres.c
Hi, I just found that the Log_disconnections value is not exposed outside of postgres.c, even though Log_connections already is. So, I'd like to apply the attached fix to make the Log_disconnections value accessible from other modules. Any comments? Regards, -- NAGAYASU Satoshi sn...@uptime.jp commit afffca193b69ea5eb42f5e7273d48a6a82eb7e37 Author: Satoshi Nagayasu sn...@uptime.jp Date: Thu Jul 9 03:42:31 2015 + Fix to expose Log_disconnections GUC variable to the outside of postgres.c. diff --git a/src/include/postmaster/postmaster.h b/src/include/postmaster/postmaster.h index d160304..4c76f30 100644 --- a/src/include/postmaster/postmaster.h +++ b/src/include/postmaster/postmaster.h @@ -25,6 +25,7 @@ extern bool ClientAuthInProgress; extern int PreAuthDelay; extern int AuthenticationTimeout; extern bool Log_connections; +extern bool Log_disconnections; extern bool log_hostname; extern bool enable_bonjour; extern char *bonjour_name;
Re: [HACKERS] Fix to expose a GUC variable, Log_disconnections, to outside of postgres.c
On 2015/07/09 13:06, Tom Lane wrote: Satoshi Nagayasu sn...@uptime.jp writes: I just found that Log_disconnections value has not been exposed to outside of postgres.c. Despite that, Log_connections has already been exposed. Why would an external module need to touch either one? To check settings of GUC variables from a shared preload library. For example, I'd like to print WARNING in _PG_init() when some GUC variable is disabled on postmaster startup. Regards, -- NAGAYASU Satoshi sn...@uptime.jp
Re: [HACKERS] Solaris testers wanted for strxfrm() behavior
On Sat, Jun 27, 2015 at 11:57:30AM -0700, Peter Geoghegan wrote: On Sat, Jun 27, 2015 at 9:51 AM, Noah Misch n...@leadboat.com wrote: PostgreSQL 9.5 adds a strxfrm() call in bttext_abbrev_convert(), which does not account for the Solaris bug. I wish to determine whether that bug is still relevant today. If you have access to Solaris with the is_IS.ISO8859-1 locale installed (or root access to install it), please compile and run the attached test program on each Solaris version you have available. Reply here with the program's output. I especially need a report from Solaris 10, but reports from older and newer versions are valuable. Thanks. I did consider this. Sorry, but I must question the point of worrying about an ancient bug in Solaris. One function had a comment explaining its workaround for an OS bug, while another function ignored the same bug. That is always a defect in the comments at least; our code shall tell a uniform story about its API assumptions. I started this thread estimating that it would end with me merely deleting the comment. Thomas Munro and Tom Lane located evidence I hadn't found, evidence that changed the conclusion. When you have to worry about a standard library function blithely writing past the end of a buffer, when its C89 era interface must be passed the size of said buffer, where does it end? Don't worry about the possibility of such basic bugs until someone reports one. Once you have such a report, though, assume the interface behaves as last reported until you receive new evidence. We decide whether to work around such bugs based on factors like prevalence of affected systems, simplicity of the workaround, and ease of field diagnosis in the absence of the workaround. Non-factors include whether the bug is egregious, is contrary to documentation, or is contrary to a pertinent third-party specification. 
Those sources speak to which of the library provider or PostgreSQL was wrong, but they play little or no role in dictating the workarounds to deploy. I hope that clarifies things. nm
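The defensive pattern at issue can be sketched in portable C: size the buffer from a first strxfrm() call, then verify the second call's return value rather than trusting the libc blindly. This is a generic illustration of the workaround style being discussed, not the actual bttext_abbrev_convert() code:

```c
#include <stdlib.h>
#include <string.h>

/*
 * Transform src so that strcmp() on the results matches strcoll()
 * order.  strxfrm(NULL, src, 0) reports the needed length (excluding
 * the terminating NUL); a second call fills the buffer.  Re-checking
 * the second return value means a libc that disagrees with its own
 * estimate fails loudly instead of silently overrunning the buffer.
 */
static char *
safe_strxfrm(const char *src)
{
    size_t needed = strxfrm(NULL, src, 0);
    char  *buf = malloc(needed + 1);

    if (buf == NULL)
        return NULL;
    if (strxfrm(buf, src, needed + 1) != needed)
    {
        /* libc changed its mind between calls: refuse to proceed */
        free(buf);
        return NULL;
    }
    return buf;
}
```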
Re: [HACKERS] Fix to expose a GUC variable, Log_disconnections, to outside of postgres.c
Satoshi Nagayasu sn...@uptime.jp writes: I just found that Log_disconnections value has not been exposed to outside of postgres.c. Despite that, Log_connections has already been exposed. Why would an external module need to touch either one? regards, tom lane
Re: [HACKERS] optimizing vacuum truncation scans
On Mon, Jun 29, 2015 at 3:54 PM, Jeff Janes jeff.ja...@gmail.com wrote: Attached is a patch that implements the vm scan for truncation. It introduces a variable to hold the last blkno which was skipped during the forward portion. Any blocks after both this blkno and after the last inspected nonempty page (which the code is already tracking) must have been observed to be empty by the current vacuum. Any other process rendering the page nonempty is required to clear the vm bit, and no other process can set the bit again during the vacuum's lifetime. So if the bit is still set, the page is still empty without needing to inspect it. The patch looks good and it improves the vacuum truncation speed significantly. There is still the case of pages which had their visibility bit set by a prior vacuum and then were not inspected by the current one. Once the truncation scan runs into these pages, it falls back to the previous behavior of reading block by block backwards. So there could still be reason to optimize that fallback using forward-reading prefetch. The case I didn't understand is how the current vacuum misses a page which was set by the prior vacuum? The page should be counted either in skipped_pages or in nonempty_pages. Is it possible that a page doesn't come under these two categories and is not empty either? If the above doesn't exist, then we can directly truncate the relation from the highest block number of either nonempty_pages or skipped_pages to the end of the relation. Regards, Hari Babu Fujitsu Australia
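The bookkeeping in Hari Babu's suggestion can be sketched with a toy model (hypothetical names, not the patch itself): remember the last block skipped via the visibility map and the last nonempty block seen during the forward scan; everything strictly after both markers was inspected and found empty, so truncation can start there without re-reading those pages.

```c
/*
 * Toy model: given per-block flags recorded during the forward scan,
 * compute the first block that may be truncated away without a second
 * look.  Blocks after both last_skipped and last_nonempty were observed
 * empty by this vacuum, and (per the thread) other processes can only
 * clear vm bits during the vacuum's lifetime, never set them.
 */
static int
truncation_point(const int *skipped, const int *nonempty, int nblocks)
{
    int last_skipped = -1;
    int last_nonempty = -1;

    for (int blk = 0; blk < nblocks; blk++)
    {
        if (skipped[blk])
            last_skipped = blk;
        if (nonempty[blk])
            last_nonempty = blk;
    }
    /* first block provably empty from here to the end of the relation */
    return (last_skipped > last_nonempty ? last_skipped : last_nonempty) + 1;
}
```

Whether a page can escape both categories is exactly the open question in the email above; this sketch assumes it cannot.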
Re: [HACKERS] Freeze avoidance of very large table.
On Thu, Jul 9, 2015 at 4:31 AM, Jeff Janes jeff.ja...@gmail.com wrote: On Fri, Jul 3, 2015 at 1:25 AM, Sawada Masahiko sawada.m...@gmail.com wrote: On Fri, Jul 3, 2015 at 1:23 AM, Simon Riggs si...@2ndquadrant.com wrote: On 2 July 2015 at 16:30, Sawada Masahiko sawada.m...@gmail.com wrote: Also, the flags of each heap page header might be set PD_ALL_FROZEN, as well as all-visible Is it possible to have VM bits set to frozen but not visible? The description makes those two states sound independent of each other. Are they? Or not? Do we test for an impossible state? It's impossible to have VM bits set to frozen but not visible. These bits are controlled independently. But eventually, when the all-frozen bit is set, all-visible is also set. If that combination is currently impossible, could it be used to indicate that the page is all empty? Yeah, the state where the VM bit is set to frozen but not visible is impossible, so we could use that combination to represent some other state of the page. Having a crash-proof bitmap of all-empty pages would make vacuum truncation scans much more efficient. An empty page is always marked all-visible by vacuum today; isn't that enough? Regards, -- Sawada Masahiko
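The invariant under discussion (setting all-frozen always sets all-visible, so frozen-without-visible is a free fourth state) can be sketched with two flag bits. The names are illustrative, not the actual visibilitymap.c macros:

```c
#include <stdbool.h>

#define VM_ALL_VISIBLE 0x01
#define VM_ALL_FROZEN  0x02

/* Setting all-frozen must also set all-visible, preserving the invariant. */
static unsigned char
vm_set_frozen(unsigned char bits)
{
    return bits | VM_ALL_FROZEN | VM_ALL_VISIBLE;
}

/*
 * frozen-but-not-visible never arises under the invariant, so that bit
 * pattern would be available to encode something else, e.g. "all empty".
 */
static bool
vm_state_is_reachable(unsigned char bits)
{
    return !((bits & VM_ALL_FROZEN) && !(bits & VM_ALL_VISIBLE));
}
```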
[HACKERS] Comment nitpicking in predicate_refuted_by_recurse()
Attached find a patch that makes certain comments in predicate_refuted_by_recurse() clearer, for example: -* AND-clause R=> AND-clause if A refutes any of B's items +* AND-clause A R=> AND-clause B if A refutes any of B's items The comment above the function is written using the latter style, so adopt the same style inside the function. Thanks, Amit diff --git a/src/backend/optimizer/util/predtest.c b/src/backend/optimizer/util/predtest.c index d9e49d1..c76421c 100644 --- a/src/backend/optimizer/util/predtest.c +++ b/src/backend/optimizer/util/predtest.c @@ -501,7 +501,7 @@ predicate_refuted_by_recurse(Node *clause, Node *predicate) case CLASS_AND: /* - * AND-clause R=> AND-clause if A refutes any of B's items + * AND-clause A R=> AND-clause B if A refutes any of B's items * * Needed to handle (x AND y) R=> ((!x OR !y) AND z) */ @@ -537,7 +537,7 @@ predicate_refuted_by_recurse(Node *clause, Node *predicate) case CLASS_OR: /* - * AND-clause R=> OR-clause if A refutes each of B's items + * AND-clause A R=> OR-clause B if A refutes each of B's items */ result = true; iterate_begin(pitem, predicate, pred_info) @@ -562,7 +562,7 @@ predicate_refuted_by_recurse(Node *clause, Node *predicate) return true; /* - * AND-clause R=> atom if any of A's items refutes B + * AND-clause A R=> atom B if any of A's items refutes B */ result = false; iterate_begin(citem, clause, clause_info) @@ -584,7 +584,7 @@ predicate_refuted_by_recurse(Node *clause, Node *predicate) case CLASS_OR: /* - * OR-clause R=> OR-clause if A refutes each of B's items + * OR-clause A R=> OR-clause B if A refutes each of B's items */ result = true; iterate_begin(pitem, predicate, pred_info) @@ -601,7 +601,7 @@ predicate_refuted_by_recurse(Node *clause, Node *predicate) case CLASS_AND: /* - * OR-clause R=> AND-clause if each of A's items refutes + * OR-clause A R=> AND-clause B if each of A's items refutes * any of B's items.
*/ result = true; @@ -638,7 +638,7 @@ predicate_refuted_by_recurse(Node *clause, Node *predicate) return true; /* - * OR-clause R=> atom if each of A's items refutes B + * OR-clause A R=> atom B if each of A's items refutes B */ result = true; iterate_begin(citem, clause, clause_info) @@ -674,7 +674,7 @@ predicate_refuted_by_recurse(Node *clause, Node *predicate) case CLASS_AND: /* - * atom R=> AND-clause if A refutes any of B's items + * atom A R=> AND-clause B if A refutes any of B's items */ result = false; iterate_begin(pitem, predicate, pred_info) @@ -691,7 +691,7 @@ predicate_refuted_by_recurse(Node *clause, Node *predicate) case CLASS_OR: /* - * atom R=> OR-clause if A refutes each of B's items + * atom A R=> OR-clause B if A refutes each of B's items */ result = true; iterate_begin(pitem, predicate, pred_info)
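The comment cases read more concretely as code. Below is a toy refutation checker over literal atoms (a positive int stands for x, its negation for NOT x) implementing three of the same composition rules; it illustrates the comment semantics only and is not predtest.c, which handles arbitrary nested clauses:

```c
#include <stdbool.h>

/* An atom A refutes atom B iff A is exactly NOT B. */
static bool atom_refutes(int a, int b) { return a == -b; }

/* AND-clause A R=> atom B if any of A's items refutes B. */
static bool and_refutes_atom(const int *a, int na, int b)
{
    for (int i = 0; i < na; i++)
        if (atom_refutes(a[i], b))
            return true;
    return false;
}

/* AND-clause A R=> AND-clause B if A refutes any of B's items. */
static bool and_refutes_and(const int *a, int na, const int *b, int nb)
{
    for (int j = 0; j < nb; j++)
        if (and_refutes_atom(a, na, b[j]))
            return true;
    return false;
}

/* AND-clause A R=> OR-clause B if A refutes each of B's items. */
static bool and_refutes_or(const int *a, int na, const int *b, int nb)
{
    for (int j = 0; j < nb; j++)
        if (!and_refutes_atom(a, na, b[j]))
            return false;
    return true;
}
```

Note the any/each asymmetry: refuting an OR requires knocking out every disjunct, while refuting an AND only requires contradicting one conjunct.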
Re: [HACKERS] PL/pgSQL, RAISE and error context
Hi, here is an initial version of the reduced patch. The code is small, but it brings a relatively big (although I expected bigger) change in the tests. If these changes are too big, then we have to introduce plpgsql GUCs plpgsql.client_min_context and plpgsql.log_min_client. These GUCs would override the global settings for plpgsql functions. I'll be happier without these variables. They would decrease the impact of the changes, but it is not clear what behavior is expected when PLs are used together - e.g. when a PL/pgSQL function called from PL/Perl fails. The context filtering should really be solved at the top level. Regards Pavel 2015-07-08 14:09 GMT+02:00 Merlin Moncure mmonc...@gmail.com: On Wed, Jul 8, 2015 at 1:35 AM, Pavel Stehule pavel.steh...@gmail.com wrote: 2015-07-07 18:15 GMT+02:00 Merlin Moncure mmonc...@gmail.com: On Tue, Jul 7, 2015 at 9:04 AM, Pavel Stehule pavel.steh...@gmail.com wrote: It doesn't have to if the behavior is guarded with a GUC. I just don't understand what all the fuss is about. The default behavior of logging that is well established by other languages (for example java) that manage error stack for you should be to: *) Give stack trace when an uncaught exception is thrown *) Do not give stack trace in all other logging cases unless asked for what is RAISE EXCEPTION - first or second case? First: RAISE (unless caught) is no different than any other kind of error. On Tue, Jul 7, 2015 at 9:42 AM, Heikki Linnakangas hlinn...@iki.fi wrote: On 07/07/2015 04:56 PM, Merlin Moncure wrote: It doesn't have to if the behavior is guarded with a GUC. I just don't understand what all the fuss is about. The default behavior of logging that is well established by other languages (for example java) that manage error stack for you should be to: *) Give stack trace when an uncaught exception is thrown *) Do not give stack trace in all other logging cases unless asked for Java's exception handling is so different from PostgreSQL's errors that I don't think there's much to be learned from that.
But I'll bite: First of all, Java's exceptions always contain a stack trace. It's up to you when you catch an exception to decide whether to print it or not. try { ... } catch (Exception e) { e.printStackTrace() } is fairly common, actually. There is nothing like a NOTICE in Java, i.e. an exception that's thrown but doesn't affect the control flow. The best I can think of is System.out.println(), which of course has no stack trace attached to it. exactly. Perhaps you're arguing that NOTICE is more like printing to stderr, and should never contain any context information. I don't think that would be an improvement. It's very handy to have the context information available if don't know where a NOTICE is coming from, even if in most cases you're not interested in it. That's exactly what I'm arguing. NOTICE (and WARNING) are for printing out information to client side logging; it's really the only tool we have for that purpose and it fits that role perfectly. Of course, you may want to have NOTICE print context, especially when debugging, but some control over that would be nice and in most cases it's really not necessary. I really don't understand the objection to offering control over that behavior although I certainly understand wanting to keep the default behavior as it currently is. This is really quite different from a programming language's exception handling. First, there's a server, which produces the errors, and a separate client, which displays them. You cannot catch an exception in the client. BTW, let me throw in one use case to consider. We've been talking about psql, and what to print, but imagine a more sophisticated client like pgAdmin. It's not limited to either printing the context or not. It could e.g. hide the context information of all messages when they occur, but if you double-click on it, it's expanded to show all the context, location and all. You can't do that if the server doesn't send the context information in the first place. 
I would be happy to show you the psql redirected output logs from my nightly server processes that spew into the megabytes because of logging various high level steps (did this, did that). Oh, I believe you. I understand what the problem is, we're only talking about how best to address it. Yeah. For posterity, a psql based solution would work fine for me, but a server side solution has a lot of advantages (less protocol chatter, more configurability, keeping libpq/psql light). After some work on reduced version of plpgsql.min_context patch I am inclining to think so ideal solution needs more steps - because this issue has more than one dimension. There are two independent issues: 1. old plpgsql workaround that
Re: [HACKERS] Implementation of global temporary tables?
2015-07-09 7:32 GMT+02:00 Zhaomo Yang zhy...@cs.ucsd.edu: I am not sure, if it is not useless work. I don't understand why an implementation taking approach 2.a would be useless. As I said, its performance will be no worse than current temp tables and it will provide a lot of convenience to users who need to create temp tables in every session. Surely it would be a step forward. But you will have to solve a lot of problems with duplicated tables in the system catalog, and it still doesn't solve the main problem with temporary tables - catalog bloat - and the related performance degradation. Although global temp tables are a nice-to-have feature (for PL/pgSQL developers), we can live without them - and with some patterns and extensions, we are living well. But the performance issue cannot be fixed by any pattern. So the major motivation for introducing global temp tables is performance - 90% of it. That should be a primary target of merging this feature upstream. I believe that when the bloat is solved, the chance of this patch being accepted will be pretty high. Regards Pavel Thanks, Zhaomo On Tue, Jul 7, 2015 at 11:53 PM, Pavel Stehule pavel.steh...@gmail.com wrote: Hi 2015-07-08 9:08 GMT+02:00 Zhaomo Yang zhy...@cs.ucsd.edu: more global temp tables are little bit comfortable for developers, I'd like to emphasize this point. This feature does much more than saving a developer from issuing a CREATE TEMP TABLE statement in every session. Here are two common use cases and I'm sure there are more. (1) Imagine in a web application scenario, a developer wants to cache some session information in a temp table. What's more, he also wants to specify some rules which reference the session information. Without this feature, the rules will be removed at the end of every session since they depend on a temporary object. Global temp tables will allow the developer to define the temp table and the rules once.
(2) The second case is mentioned by Tom Lane back in 2010 in a thread about global temp tables. (http://www.postgresql.org/message-id/9319.1272130...@sss.pgh.pa.us) The context that I've seen it come up in is that people don't want to clutter their functions with create-it-if-it-doesn't-exist logic, which you have to have given the current behavior of temp tables. 2.a - using on demand created temp tables - most simple solution, but doesn't help with catalogue bloating I've read the thread and people disapprove this approach because of the potential catalog bloat. However, I'd like to champion it. Indeed, this approach may have a bloat issue. But for users who needs global temp tables, they now have to create a new temp table in every session, which means they already have the bloat problem and presumably they have some policy to deal with it. In other words, implementing global temp tables by this approach gives users the same performance, plus the convenience the feature brings. The root problem here is that whether whether having the unoptimized feature is better than having no feature at all. Actually, there was a very similar discussion back in 2009 on global temp tables. Let me borrow Kevin Grittner's and Tom Lane's arguments here. Kevin Grittner's argument: http://www.postgresql.org/message-id/49f82aea.ee98.002...@wicourts.gov ... If you're saying we can implement the standard's global temporary tables in a way that performs better than current temporary tables, that's cool. That would be a nice bonus in addition to the application programmer convenience and having another tick-mark on the standards compliance charts. Do you think that's feasible? If not, the feature would be useful to some with the same performance that temporary tables currently provide. Tom Lane's arguments: http://www.postgresql.org/message-id/24110.1241035...@sss.pgh.pa.us I'm all for eliminating catalog overheads, if we can find a way to do that. 
I don't think that you get to veto implementation of the feature until we can find a way to optimize it better. The question is not about whether having the optimization would be better than not having it --- it's about whether having the unoptimized feature is better than having no feature at all (which means people have to implement the same behavior by hand, and they'll *still* not get the optimization). There have been several threads here discussing global temp table since 2007. Quite a few ideas aimed to avoid the bloat issue by not storing the metadata of the session copy in the catalog. However, it seems that none of them has been implemented, or even has a feasible design. So why don't we implement it in an unoptimized way first? I am not sure it is not useless work. Now I am thinking the best implementation of global temp tables is enhancing unlogged tables to have local content. All local data can be saved in session memory. Usually it is less than 2KB with statistics,
Re: [HACKERS] copy.c handling for RLS is insecure
On Wed, Jul 08, 2015 at 10:55:47AM -0400, Stephen Frost wrote: It's interesting to consider that COPY purportedly operates under the SELECT privilege, yet fails to respect on-select rules. In released branches, COPY consistently refuses to operate directly on a view. (There's no (longer?) such thing as a table with an ON SELECT rule. CREATE RULE _RETURN AS ON SELECT ... converts a table to a view.) Having spent a bit of time thinking about this now, it occurs to me that we've drifted from the original concern and are now trying to solve an insolvable issue here. We're not trying to protect against an attacker who owns the table we're going to COPY and wants to get us to run code they've written - that can happen by simply having RLS, and that's why it's not enabled by default for superusers and why we have 'row_security = off', which pg_dump sets by default. The problem I wanted to see solved was the fact that, by running a DDL command concurrently with a COPY rel TO command, you can make the COPY read from a view. That is not possible in any serial execution of COPY with DDL. Now, you make a good point that before this undesirable outcome can happen, the user issuing the COPY command will have already trusted the roles that can pass owner checks for rel. That probably makes this useless as an attack tool. Nonetheless, I don't want "COPY rejects views" to soften into "COPY rejects views, except in XYZ race condition". After a bit of discussion with Andres, my thinking on this is to do the following: - Fully qualify the name based on the opened relation - Keep the initial lock on the relation throughout - Remove the Assert() (other relations can be pulled in by RLS) That's much better. We have considerable experience with designs like that. - Keep the OID check, shouldn't hurt to have it What benefit is left? The planner does not promise any particular order within relationOids, and an order change could raise false alarms here.