Re: [HACKERS] RangeType internal use
Horiguchi-san,

On 06-02-2015 PM 04:34, Kyotaro HORIGUCHI wrote:
> Hi, from nearby:)

Thank you!

>> I wonder why I cannot find a way to get a range type for a given (sub-)
>> type. I would like to build a RangeType from Datum's of lower and upper
>> bounds, much like how construct_array() builds an ArrayType from a Datum
>> array of elements given the elements' type info.
>>
>> Is there some way I do not seem to know? If not, would it be worthwhile
>> to make something like construct_range() that returns a RangeType given
>> Datum's of lower and upper bounds and subtype info?
>
> make_range needs the range type itself.
>
> On the SQL interface, you can get the range type corresponding to a base
> type by looking up the pg_range catalog.
>
> SELECT rngtypid::regtype, rngsubtype::regtype
>   FROM pg_range WHERE rngsubtype = 'int'::regtype;
>
>  rngtypid  | rngsubtype
> -----------+------------
>  int4range | integer
>
> But there's only one syscache for this catalog, which takes the range
> type id. So the reverse resolution rngsubtype -> rngtypid seems not
> available. TypeCache has only comparison function info as a surely
> available element related to range types, but this wouldn't help. I
> think scanning the entire cache is not allowable even if possible.
>
> Perhaps what is needed is adding a RANGESUBTYPE syscache, but I don't
> know whether that is allowable or not.
>
> Thoughts?

Actually, I'm wondering: if there is a one-to-one mapping from range type
to subtype (and vice versa?), then this should be OK. But if not (that is,
the designers of range types thought there is not necessarily such a
mapping), then perhaps we could add, say, an rngtypeisdefault flag to
pg_range.

Perhaps the following is not too pretty:

+/*
+ * get_rangetype_for_type
+ *
+ * returns a TypeCacheEntry for a range type of a given (sub-) type.
+ */
+TypeCacheEntry *
+get_rangetype_for_type(Oid subtypid)
+{
+	Relation	relation;
+	SysScanDesc scan;
+	HeapTuple	rangeTuple;
+	Oid			rngsubtype;
+	Oid			rngtypid = InvalidOid;
+
+	relation = heap_open(RangeRelationId, AccessShareLock);
+
+	scan = systable_beginscan(relation, InvalidOid, false,
+							  NULL, 0, NULL);
+
+	while ((rangeTuple = systable_getnext(scan)) != NULL)
+	{
+		rngsubtype = ((Form_pg_range) GETSTRUCT(rangeTuple))->rngsubtype;
+
+		if (rngsubtype == subtypid)
+		{
+			rngtypid = ((Form_pg_range) GETSTRUCT(rangeTuple))->rngtypid;
+			break;
+		}
+	}
+
+	systable_endscan(scan);
+	heap_close(relation, AccessShareLock);
+
+	return (rngtypid != InvalidOid
+			? lookup_type_cache(rngtypid, TYPECACHE_RANGE_INFO) : NULL);
+}

Thanks,
Amit

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RangeType internal use
Hi, from nearby:)

> I wonder why I cannot find a way to get a range type for a given (sub-)
> type. I would like to build a RangeType from Datum's of lower and upper
> bounds. Much like how construct_array() builds an ArrayType from a Datum
> array of elements given elements' type info.
>
> Is there some way I do not seem to know? If not, would it be worthwhile
> to make something like construct_range() that returns a RangeType given
> Datum's of lower and upper bounds and subtype info?

make_range needs the range type itself.

On the SQL interface, you can get the range type corresponding to a base
type by looking up the pg_range catalog:

SELECT rngtypid::regtype, rngsubtype::regtype
  FROM pg_range WHERE rngsubtype = 'int'::regtype;

 rngtypid  | rngsubtype
-----------+------------
 int4range | integer

But there's only one syscache for this catalog, which takes the range
type id. So the reverse resolution rngsubtype -> rngtypid seems not
available. TypeCache has only comparison function info as a surely
available element related to range types, but this wouldn't help. I
think scanning the entire cache is not allowable even if possible.

Perhaps what is needed is adding a RANGESUBTYPE syscache, but I don't
know whether that is allowable or not.

Thoughts?

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: [HACKERS] PATCH: Reducing lock strength of trigger and foreign key DDL
On Fri, Jan 30, 2015 at 10:26 PM, Andreas Karlsson wrote:
> On 01/30/2015 07:48 AM, Michael Paquier wrote:
>> Looking at the latest patch, it seems that in
>> AlterTableGetLockLevel@tablecmds.c we ought to put AT_ReAddConstraint,
>> AT_AddConstraintRecurse and AT_ProcessedConstraint under the same
>> banner as AT_AddConstraint. Thoughts?
>
> Good point. I think moving those would be a good thing even though it is
> technically not necessary for AT_AddConstraintRecurse, since that one
> should only be related to check constraints.

Andreas, are you planning to send an updated patch?

--
Michael
Re: [HACKERS] Early Setup of instrumentation information in pg_stat_statements
On Thu, Feb 5, 2015 at 8:58 PM, Amit Kapila wrote:
>
> Currently in pg_stat_statements, the setup to track
> instrumentation/totaltime information is done after ExecutorStart().
> Can we do this before ExecutorStart()?

On further analysis, I found that it is currently done after
ExecutorStart() because we want to allocate the Instrumentation info in
the per-query context, which is allocated in ExecutorStart(). But I
wonder why it can't be done inside ExecutorStart() itself, after the
per-query context becomes available.

Currently the backend (standard_ExecutorRun()) takes care of updating
the stats/instrumentation info for a plugin (pg_stat_statements), while
the info is initialized by the plugin itself. Wouldn't it be better for
both operations to be done by the backend, since the backend knows best
when it is appropriate to initialize or update, and the plugin needs
only to indicate that it wants stats?

Does anyone see a problem with doing it the above way, or can think of
a better way to make this information (that the plugin needs stats)
available in the ExecutorStart phase?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
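The protocol proposed above can be sketched in standalone C. All names here (QueryDesc fields, function names) are invented for illustration and are not PostgreSQL's actual structures: the plugin only raises a "needs stats" flag before executor startup, and the core then allocates the instrumentation in its own startup phase and updates it during the run.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Sketch of the proposed division of labour (hypothetical names, not
 * PostgreSQL's): the plugin merely sets a flag, and the core executor
 * both allocates the counter once its per-query context exists and
 * accounts elapsed time on the plugin's behalf.
 */
typedef struct
{
    bool    needs_stats;    /* set by the plugin before startup */
    double *total_time;     /* allocated and updated by the core */
} QueryDesc;

/* plugin hook: runs at the start of executor startup */
static void
plugin_pre_executor_start(QueryDesc *qd)
{
    qd->needs_stats = true;
}

/* core: allocate instrumentation once the per-query context exists */
static void
executor_start(QueryDesc *qd)
{
    plugin_pre_executor_start(qd);
    if (qd->needs_stats)
        qd->total_time = calloc(1, sizeof(double));
}

/* core: account elapsed time so the plugin never touches timing */
static void
executor_run(QueryDesc *qd, double elapsed)
{
    if (qd->total_time)
        *qd->total_time += elapsed;
}
```

The point of the sketch is that the plugin never has to guess when allocation is safe; the core performs it at the one moment it knows the context is ready.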
Re: [HACKERS] pg_basebackup may fail to send feedbacks.
Sorry, I misunderstood that.

> > At Wed, 4 Feb 2015 19:22:39 +0900, Fujii Masao wrote in
> >
> >> On Wed, Feb 4, 2015 at 4:58 PM, Kyotaro HORIGUCHI wrote:
> >> > I'm very sorry for the confused report. The problem was found in
> >> > 9.4.0 and the diagnosis was mistakenly done on master.
> >> >
> >> > 9.4.0 has no problem of feedback delay caused by slow xlog
> >> > receiving on pg_basebackup mentioned in the previous mail. But
> >> > the current master still has this problem.
> >>
> >> Seems walreceiver has the same problem. No?
> >
> > pg_receivexlog.c would have the same problem since it uses the
> > same function as pg_basebackup.c.
> >
> > The correspondent of HandleCopyStream in walreceiver is
> > WalReceiverMain, and it doesn't seem to have the same kind of
> > loop shown below. It seems to surely send feedback per one
> > record.
> >
> > | r = stream_reader();
> > | while (r > 0)
> > | {
> > |     ... wal record processing stuff without sending feedback ...
> > |     r = stream_reader();
> > | }
>
> WalReceiverMain() has similar code, as follows.
>
> len = walrcv_receive(NAPTIME_PER_CYCLE, &buf);
> if (len != 0)
> {
>     for (;;)
>     {
>         if (len > 0)
>         {
>
>             len = walrcv_receive(0, &buf);
>         }
>     }
> }

The loop is a bit different but eventually the same as far as this
discussion is concerned.

408> len = walrcv_receive(NAPTIME_PER_CYCLE, &buf);
409> if (len != 0)
410> {
415>     for (;;)
416>     {
417>         if (len > 0)
418>         {
425>             XLogWalRcvProcessMsg(buf[0], &buf[1], len - 1);
426>         }
427-438>     else { break; }
439>         len = walrcv_receive(0, &buf);
440>     }
441> }

XLogWalRcvProcessMsg doesn't send feedback when processing a 'w' (WAL
record) packet, so the same thing as in pg_basebackup and
pg_receivexlog will occur in walreceiver. Exiting the for (;;) loop via
TimestampDifferenceExceeds just before line 439 is the equivalent of
the measure I proposed for receivelog.c, but calling
XLogWalRcvSendHSFeedback(false) there is seemingly simpler (though I
feel a bit uncomfortable with the latter). Or other measures?
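The fix being discussed, stripped of PostgreSQL specifics, amounts to tracking the time of the last feedback inside the tight receive loop and sending one as soon as the interval has elapsed, rather than waiting for the stream to go idle. The names below (FeedbackState, maybe_send_feedback) are invented for this sketch; the real code would use GetCurrentTimestamp()/feGetCurrentTimestamp() and the configured status interval:

```c
#include <stdbool.h>

/*
 * Hypothetical sketch of the proposed measure: call this once per
 * processed WAL message inside the receive loop.  Time is passed in
 * explicitly so the logic is testable; returns true when a feedback
 * message was (would have been) sent.
 */
typedef struct
{
    long last_feedback;   /* time of last feedback sent, in ms */
    long interval;        /* configured feedback interval, in ms */
    int  feedback_count;  /* number of feedback messages sent */
} FeedbackState;

static bool
maybe_send_feedback(FeedbackState *state, long now)
{
    if (now - state->last_feedback >= state->interval)
    {
        /* ... send the status/hot-standby feedback packet here ... */
        state->last_feedback = now;
        state->feedback_count++;
        return true;
    }
    return false;
}
```

Whether this check lives inside the loop (sending directly, as with XLogWalRcvSendHSFeedback(false)) or breaks out of the loop first is exactly the design choice raised above; the timing logic is the same either way.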
regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code
On Thu, Feb 5, 2015 at 11:58 PM, Michael Paquier wrote:
> An updated patch is attached.

I just noticed that the patch I sent was incorrect:
- The parameter name was still wal_availability_check_interval and not
  wal_retrieve_retry_interval
- The documentation was incorrect.
Please use the attached patch instead for further review.

--
Michael

From 06a4d3d1f5fe4362d7be1404cbf0b45b74fea69f Mon Sep 17 00:00:00 2001
From: Michael Paquier
Date: Mon, 19 Jan 2015 16:08:48 +0900
Subject: [PATCH] Add wal_retrieve_retry_interval

This parameter aids to control at which timing WAL availability is
checked when a node is in recovery, particularly when successive
failures happen when fetching WAL archives, or when fetching WAL
records from a streaming source. Default value is 5s.
---
 doc/src/sgml/recovery-config.sgml               | 17 +++++++
 src/backend/access/transam/recovery.conf.sample |  9 ++++
 src/backend/access/transam/xlog.c               | 47 ++++++++++++-----
 src/backend/utils/adt/timestamp.c               | 38 ++++++++++++++
 src/include/utils/timestamp.h                   |  2 ++
 5 files changed, 99 insertions(+), 14 deletions(-)

diff --git a/doc/src/sgml/recovery-config.sgml b/doc/src/sgml/recovery-config.sgml
index 0c64ff2..d4babbd 100644
--- a/doc/src/sgml/recovery-config.sgml
+++ b/doc/src/sgml/recovery-config.sgml
@@ -145,6 +145,23 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
+
+     wal_retrieve_retry_interval (integer)
+
+       wal_retrieve_retry_interval recovery parameter
+
+       This parameter specifies the amount of time to wait when a failure
+       occurred when reading WAL from a source (be it via streaming
+       replication, local pg_xlog or WAL archive) for a node
+       in standby mode, or when WAL is expected from a source. Default
+       value is 5s.
+
diff --git a/src/backend/access/transam/recovery.conf.sample b/src/backend/access/transam/recovery.conf.sample
index b777400..458308c 100644
--- a/src/backend/access/transam/recovery.conf.sample
+++ b/src/backend/access/transam/recovery.conf.sample
@@ -58,6 +58,15 @@
 #
 #recovery_end_command = ''
 #
+#
+# wal_retrieve_retry_interval
+#
+# specifies an optional interval to wait for WAL to become available when
+# a failure occurred when reading WAL from a source for a node in standby
+# mode, or when WAL is expected from a source.
+#
+#wal_retrieve_retry_interval = '5s'
+#
 #---
 # RECOVERY TARGET PARAMETERS
 #---
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 629a457..111e53d 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -235,6 +235,7 @@ static TimestampTz recoveryTargetTime;
 static char *recoveryTargetName;
 static int recovery_min_apply_delay = 0;
 static TimestampTz recoveryDelayUntilTime;
+static int wal_retrieve_retry_interval = 5000;

 /* options taken from recovery.conf for XLOG streaming */
 static bool StandbyModeRequested = false;
@@ -4881,6 +4882,26 @@ readRecoveryCommandFile(void)
 					(errmsg_internal("trigger_file = '%s'",
 									 TriggerFile)));
 		}
+		else if (strcmp(item->name, "wal_retrieve_retry_interval") == 0)
+		{
+			const char *hintmsg;
+
+			if (!parse_int(item->value, &wal_retrieve_retry_interval, GUC_UNIT_MS,
+						   &hintmsg))
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+						 errmsg("parameter \"%s\" requires a temporal value",
+								"wal_retrieve_retry_interval"),
+						 hintmsg ? errhint("%s", _(hintmsg)) : 0));
+			ereport(DEBUG2,
+					(errmsg_internal("wal_retrieve_retry_interval = '%s'",
+									 item->value)));
+
+			if (wal_retrieve_retry_interval <= 0)
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+						 errmsg("\"%s\" must have a value strictly positive",
+								"wal_retrieve_retry_interval")));
+		}
 		else if (strcmp(item->name, "recovery_min_apply_delay") == 0)
 		{
 			const char *hintmsg;
@@ -10340,8 +10361,8 @@ static bool
 WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 							bool fetching_ckpt, XLogRecPtr tliRecPtr)
 {
-	static pg_time_t last_fail_time = 0;
-	pg_time_t now;
+	TimestampTz now = GetCurrentTimestamp();
+	TimestampTz last_fail_time = now;

 	/*---
 	 * Standby mode is implemented by a state machine:
@@ -10490,15 +10511,13 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 					 * machine, so we've exhausted all the options for
 					 * obtaining the requested WAL. We're going to loop back
 					 * and retry from the archive, but if it hasn't been long
-					 * since last attempt, sleep 5 seconds to avoid
-					 * busy-waiting.
+					 * since last attempt, sleep the amount of time specified
+					 * by wal_r
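Stripped of PostgreSQL's TimestampTz and latch machinery, the retry logic the patch introduces reduces to "sleep only the remainder of the interval, measured from the last failure." A minimal sketch (function name and millisecond arithmetic are simplifications for illustration, not the patch's code):

```c
/*
 * Sketch of the retry-wait computation: given the current time, the
 * time of the last failed WAL fetch attempt, and the configured
 * wal_retrieve_retry_interval (all in ms), return how long to sleep
 * before retrying.  Zero means enough time has already passed.
 */
static long
retry_sleep_ms(long now, long last_fail_time, long retry_interval)
{
    long elapsed = now - last_fail_time;

    if (elapsed >= retry_interval)
        return 0;               /* retry immediately */
    return retry_interval - elapsed;
}
```

This is what replaces the previously hard-coded 5-second sleep: with a configurable interval, a failure followed by other work still retries as soon as the interval has elapsed since the last failure, not a fixed 5 s later.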
[HACKERS] pg_basebackup, tablespace mapping and path canonicalization
Hi

I stumbled on what appears to be inconsistent handling of double slashes
in tablespace paths when using pg_basebackup with the
-T/--tablespace-mapping option:

ibarwick:postgresql (master)$ mkdir /tmp//foo-old
ibarwick:postgresql (master)$ $PG_HEAD/psql 'dbname=postgres port=9595'
psql (9.5devel)
Type "help" for help.

postgres=# CREATE TABLESPACE foo LOCATION '/tmp//foo-old';
CREATE TABLESPACE
postgres=# \db
            List of tablespaces
    Name    |  Owner   |   Location
------------+----------+--------------
 foo        | ibarwick | /tmp/foo-old
 pg_default | ibarwick |
 pg_global  | ibarwick |
(3 rows)

So far so good. However, attempting to take a base backup (on the same
machine) and remap the tablespace directory:

ibarwick:postgresql (master)$ $PG_HEAD/pg_basebackup -p9595 --pgdata=/tmp//backup --tablespace-mapping=/tmp//foo-old=/tmp//foo-new

produces the following message:

pg_basebackup: directory "/tmp/foo-old" exists but is not empty

which, while undeniably true, is unexpected and could potentially
encourage someone to hastily delete "/tmp/foo-old" after confusing it
with the new directory. The double slash in the old tablespace path is
the culprit:

ibarwick:postgresql (master)$ $PG_HEAD/pg_basebackup -p9595 --pgdata=/tmp//backup --tablespace-mapping=/tmp/foo-old=/tmp//foo-new
NOTICE: pg_stop_backup complete, all required WAL segments have been archived

The documentation does state:

    To be effective, olddir must exactly match the path specification of
    the tablespace as it is currently defined.

which I understood to mean that e.g. tildes would not be expanded, but
it's somewhat surprising that the path is not canonicalized in the same
way it is pretty much everywhere else (including in "CREATE
TABLESPACE"). The attached patch adds the missing canonicalization; I
can't see any reason not to do this.

Thoughts?
Regards

Ian Barwick

--
 Ian Barwick                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, RemoteDBA, Training & Services

diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
new file mode 100644
index fbf7106..349bd90
*** a/src/bin/pg_basebackup/pg_basebackup.c
--- b/src/bin/pg_basebackup/pg_basebackup.c
*** tablespace_list_append(const char *arg)
*** 199,204 ****
--- 199,207 ----
  		exit(1);
  	}

+ 	canonicalize_path(cell->old_dir);
+ 	canonicalize_path(cell->new_dir);
+
  	if (tablespace_dirs.tail)
  		tablespace_dirs.tail->next = cell;
  	else
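To see why the patch makes "/tmp//foo-old" and "/tmp/foo-old" compare equal: canonicalize_path() (PostgreSQL's real helper, in src/port/path.c) collapses runs of duplicate separators, among other transformations. A minimal sketch of just that one aspect (this is not the real function, which also handles Windows separators, trailing slashes, and "." / ".." components):

```c
#include <string.h>

/*
 * Collapse consecutive '/' characters in-place, e.g.
 * "/tmp//foo-old" -> "/tmp/foo-old".  Illustrative only; the real
 * canonicalize_path() does considerably more.
 */
static void
collapse_double_slashes(char *path)
{
    char *src = path;
    char *dst = path;

    while (*src)
    {
        *dst++ = *src;
        if (*src == '/')
        {
            /* skip any additional consecutive slashes */
            while (src[1] == '/')
                src++;
        }
        src++;
    }
    *dst = '\0';
}
```

After both sides of the mapping pass through this kind of normalization, the stored tablespace location and the user-supplied olddir match regardless of how many slashes the user typed.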
Re: [HACKERS] Table-level log_autovacuum_min_duration
On Fri, Feb 6, 2015 at 4:50 AM, Naoya Anzai wrote:
>> The following review has been posted through the commitfest application:
>> make installcheck-world: tested, failed
>> Implements feature:      tested, failed
>> Spec compliant:          tested, failed
>> Documentation:           tested, failed
> I'm sorry, I just sent it by mistake.
> All of them have passed.

That's fine. I think you used the UI on the commitfest app, and it is
not intuitive that you need to check those boxes at first sight when
using it for the first time.

--
Michael
[HACKERS] No-rewrite timestamp<->timestamptz conversions
On Tue, Nov 05, 2013 at 05:02:58PM -0800, Josh Berkus wrote:
> I'd also love some way of doing a no-rewrite conversion between
> timestamp and timestamptz, based on the assumption that the original
> values are UTC time. That's one I encounter a lot.

It was such a conversion that motivated me to add the no-rewrite ALTER
TABLE ALTER TYPE support in the first place. Interesting.

Support for it didn't end up in any submitted patch due to a formal
problem: a protransform function shall only consult IMMUTABLE facts,
but we posit that timezone==UTC is a STABLE observation. However, a
protransform function can easily simplify the immutable expression
"tscol AT TIME ZONE 'UTC'", avoiding a rewrite. See attached patch.
Examples:

begin;
create table t (c timestamptz);
set client_min_messages = debug1;
-- rewrite: depends on timezone GUC
alter table t alter c type timestamp;
-- rewrite: depends on timezone GUC
alter table t alter c type timestamptz;
-- no rewrite: always UTC+0
alter table t alter c type timestamp using c at time zone 'UTC';
-- no rewrite: always UTC+0
alter table t alter c type timestamptz using c at time zone 'Etc/Universal';
-- rewrite: always UTC+0 in the present day, but not historically
alter table t alter c type timestamp using c at time zone 'Atlantic/Reykjavik';
-- rewrite: always UTC+0 in the present day, but not historically
alter table t alter c type timestamptz using c at time zone 'Africa/Lome';
-- no rewrite: always UTC+0
alter table t alter c type timestamp using c at time zone 'GMT';
-- rewrite: always UTC+1
alter table t alter c type timestamptz using c at time zone '1 hour'::interval;
-- no rewrite: always UTC+0
alter table t alter c type timestamp using c at time zone '0 hour'::interval;
rollback;

diff --git a/src/backend/utils/adt/timestamp.c b/src/backend/utils/adt/timestamp.c
index 67e0cf9..723c670 100644
--- a/src/backend/utils/adt/timestamp.c
+++ b/src/backend/utils/adt/timestamp.c
@@ -27,6 +27,7 @@
 #include "funcapi.h"
 #include "libpq/pqformat.h"
 #include "miscadmin.h"
+#include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "parser/scansup.h"
 #include "utils/array.h"
@@ -4874,6 +4875,87 @@ interval_part(PG_FUNCTION_ARGS)
 }

+/* timestamp_zone_transform()
+ * If the zone argument of a timestamp_zone() or timestamptz_zone() call is a
+ * plan-time constant denoting a zone equivalent to UTC, the call will always
+ * return its second argument unchanged. Simplify the expression tree
+ * accordingly. Civil time zones almost never qualify, because jurisdictions
+ * that follow UTC today have not done so continuously.
+ */
+Datum
+timestamp_zone_transform(PG_FUNCTION_ARGS)
+{
+	Node	   *func_node = (Node *) PG_GETARG_POINTER(0);
+	FuncExpr   *expr = (FuncExpr *) func_node;
+	Node	   *ret = NULL;
+	Node	   *zone_node;
+
+	Assert(IsA(expr, FuncExpr));
+	Assert(list_length(expr->args) == 2);
+
+	zone_node = (Node *) linitial(expr->args);
+
+	if (IsA(zone_node, Const) && !((Const *) zone_node)->constisnull)
+	{
+		text	   *zone = DatumGetTextPP(((Const *) zone_node)->constvalue);
+		char		tzname[TZ_STRLEN_MAX + 1];
+		char	   *lowzone;
+		int			type,
+					abbrev_offset;
+		pg_tz	   *tzp;
+		bool		noop = false;
+
+		/*
+		 * If the timezone is forever UTC+0, the FuncExpr function call is a
+		 * no-op for all possible timestamps. This passage mirrors code in
+		 * timestamp_zone().
+		 */
+		text_to_cstring_buffer(zone, tzname, sizeof(tzname));
+		lowzone = downcase_truncate_identifier(tzname,
+											   strlen(tzname),
+											   false);
+		type = DecodeTimezoneAbbrev(0, lowzone, &abbrev_offset, &tzp);
+		if (type == TZ || type == DTZ)
+			noop = (abbrev_offset == 0);
+		else if (type == DYNTZ)
+		{
+			/*
+			 * An abbreviation of a single-offset timezone ought not to be
+			 * configured as a DYNTZ, so don't bother checking.
+			 */
+		}
+		else
+		{
+			long		tzname_offset;
+
+			tzp = pg_tzset(tzname);
+			if (tzp && pg_get_timezone_offset(tzp, &tzname_offset))
+				noop = (tzname_offset == 0);
+		}
+
+		if (noop)
+		{
+			Node	   *timestamp = (Node *) lsecond(expr->args);
+
+			/* Strip any existing RelabelType node(
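The decision the transform function makes can be illustrated without PostgreSQL's timezone machinery: the rewrite may be skipped only if the zone is *permanently* UTC+0. The real patch resolves the zone through DecodeTimezoneAbbrev()/pg_tzset() against the IANA database; the tiny table below is a stand-in for that lookup, invented purely for this sketch:

```c
#include <stdbool.h>
#include <string.h>

/*
 * Illustrative stand-in for the IANA-database lookup: each entry says
 * whether the zone has had the same offset for all time, and what
 * that offset is.  Not part of PostgreSQL.
 */
typedef struct
{
    const char *name;
    bool        fixed_offset;   /* same offset for all time? */
    int         offset_secs;    /* offset when fixed */
} ZoneInfo;

static const ZoneInfo zones[] = {
    {"UTC", true, 0},
    {"GMT", true, 0},
    {"Etc/Universal", true, 0},
    {"Atlantic/Reykjavik", false, 0},   /* UTC+0 today, not historically */
    {"Africa/Lome", false, 0},          /* likewise */
};

/* Would "col AT TIME ZONE zone" be a no-op for every timestamp? */
static bool
zone_is_always_utc(const char *name)
{
    for (size_t i = 0; i < sizeof(zones) / sizeof(zones[0]); i++)
    {
        if (strcmp(zones[i].name, name) == 0)
            return zones[i].fixed_offset && zones[i].offset_secs == 0;
    }
    return false;               /* unknown zone: must keep the rewrite */
}
```

This mirrors the "rewrite" vs. "no rewrite" split in the examples above: only zones that are fixed at offset zero for all of history qualify, which is why 'Atlantic/Reykjavik' still forces a rewrite even though it observes UTC+0 today.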
Re: [HACKERS] Table-level log_autovacuum_min_duration
> The following review has been posted through the commitfest application:
> make installcheck-world: tested, failed
> Implements feature:      tested, failed
> Spec compliant:          tested, failed
> Documentation:           tested, failed

I'm sorry, I just sent it by mistake.
All of them have passed.

---
Naoya Anzai
[HACKERS] RangeType internal use
Hi,

I wonder why I cannot find a way to get a range type for a given (sub-)
type. I would like to build a RangeType from Datum's of lower and upper
bounds, much like how construct_array() builds an ArrayType from a Datum
array of elements given the elements' type info.

Is there some way I do not seem to know? If not, would it be worthwhile
to make something like construct_range() that returns a RangeType given
Datum's of lower and upper bounds and subtype info?

Thanks,
Amit
Re: [HACKERS] Table-level log_autovacuum_min_duration
The following review has been posted through the commitfest application:
make installcheck-world: tested, failed
Implements feature:      tested, failed
Spec compliant:          tested, failed
Documentation:           tested, failed

Hi, I'm Naoya Anzai. I've been working as a PostgreSQL support engineer
for 6 years. I am a newcomer to reviewing, and my programming skill is
not so high, but the following members also participate in this review.
(We say "Two heads are better than one." :))

 Akira Kurosawa
 Taiki Kondo
 Huong Dangminh

So I believe reviewing patches is not difficult for us.

This is a review of the Table-level log_autovacuum_min_duration patch:
http://www.postgresql.org/message-id/cab7npqtbqsbegvb8coh01k7argys9kbuv8dr+aqgonfvb8k...@mail.gmail.com

Submission review

The patch applies cleanly to HEAD, and it works fine on Fedora release
20. There is no regression test, but I think that is no problem because
the other parameters are not tested either.

Usability review

pg_dump/pg_restore support is OK. I think this feature is a nice idea
and I also want this feature.

Feature test

I executed the following commands after setting
"log_autovacuum_min_duration" to 1000 in the GUC. (The "bar" table had
already been created with no options.)

CREATE TABLE foo ( ... ) WITH ( log_autovacuum_min_duration = 0 );
ALTER TABLE bar SET ( log_autovacuum_min_duration = 0 );

Then, for the "foo" and "bar" tables only, the autovacuum log was
printed out even when the elapsed time of autovacuum was less than
1000ms. This behavior was expected and there was no crash or failed
assertion, so it looks good to me.

But when I executed the following command, for the "buzz" table the
autovacuum log was still printed out when the elapsed time exceeded
1000ms.

CREATE TABLE buzz ( ... ) WITH ( log_autovacuum_min_duration = -1 );

I expect the autovacuum log NOT to be printed out even if the elapsed
time is more than 1000ms in this case. Is my expectation wrong?

In my opinion, there is a use case where the autovacuum log should
always be printed out for all tables except certain specified tables. I
think there are people who want to use it like this, but in the current
situation they cannot. What is your opinion?

Performance review

Not reviewed from this point of view.

Coding review

I think the description of log_autovacuum_min_duration in reloptions.c
(line 215) should match the other parameters (as in guc.c). So it
should be "Sets the minimum execution time above which autovacuum
actions will be logged." rather than "Log autovacuum execution for
given threshold".

There is no new function defined in this patch, and the modified code
is not OS-dependent, so I think it will work fine on Windows/BSD etc.

Architecture review

About the modification of gram.y: I think it is not good that
log_min_duration is initialized to zero in gram.y when the "FREEZE"
option is set, because no VACUUM query ever uses this parameter. I
think log_min_duration should always be initialized to -1.

Regards,

Naoya Anzai (NEC Solution Innovators, Ltd.)

The new status of this patch is: Waiting on Author
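The -1 question above comes down to how the table-level setting is resolved against the GUC. A standalone sketch of the "inherit" reading, the behavior the reviewer observed (function names are invented for illustration, not the patch's code):

```c
/*
 * Sketch of one possible override semantics (hypothetical, not the
 * patch's actual code): a non-negative table-level
 * log_autovacuum_min_duration overrides the GUC, while -1 falls back
 * to the GUC.  Under this reading there is no per-table way to say
 * "never log" when the GUC is enabled -- exactly the gap the review
 * points out.
 */
static int
effective_log_min_duration(int table_setting, int guc_setting)
{
    if (table_setting >= 0)
        return table_setting;   /* per-table override */
    return guc_setting;         /* -1: inherit the GUC */
}

/* Should this autovacuum run be logged? */
static int
should_log(int elapsed_ms, int table_setting, int guc_setting)
{
    int threshold = effective_log_min_duration(table_setting, guc_setting);

    return threshold >= 0 && elapsed_ms >= threshold;
}
```

Under these semantics, a table with -1 is logged whenever the GUC threshold is met, which matches the "buzz" behavior described above; giving -1 the meaning "never log for this table" would require distinguishing "unset" from "-1" in the reloption.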
[HACKERS] HEADS UP: PGCon 2015 major schedule changes
Hello,

By request, the format of PGCon 2015 will differ significantly from
previous years. Our goal is to give you more of what you want while
still keeping the stuff you've always liked.

In June 2015, PGCon will be structured as follows:

 Unconference:       16-17 June 2015 (Tue afternoon & all day Wed)
 Beginner Tutorials: 17 June 2015 (Wed)
 Talks:              18-19 June 2015 (Thu-Fri)
 Advanced Tutorial:  20 June 2015 (Sat)

The big changes are:

- The unconference moved to weekdays and is now 1.5 days (was one day,
  on Saturday)
- Tutorials are split between beginner and advanced, and are now on
  Wednesday & Saturday (was Tuesday & Wednesday)

Why? The unconference has become a bigger and more significant part of
PGCon for PostgreSQL contributors. It has moved to earlier in the week
to coordinate with other developer meetings, in order to expand
participation in development discussions and meetings around PGCon.
Additionally, the shift of some tutorials to Saturday allows tutorials
to involve key PostgreSQL contributors without schedule conflicts.
Unfortunately, this meant moving something else to Saturday, at least
for this year.

We considered moving the talks to earlier in the week, but we felt that
our changes were already disruptive and wanted to minimize the effects
this late change may have on people who have already booked travel and
accommodation. To those affected, we apologize and hope that this new
structure will benefit everyone.

--
Dan Langille
http://langille.org/
Re: [HACKERS] Redesigning checkpoint_segments
On 2/5/15 4:53 PM, Josh Berkus wrote:
>> Actually, perhaps we should have a boolean setting that just implies
>> min=max, instead of having a configurable minimum? That would cover all
>> of those reasons pretty well. So we would have a "max_wal_size" setting,
>> and a boolean "preallocate_all_wal = on | off". Would anyone care for
>> the flexibility of setting a minimum that's different from the maximum?
>
> I do, actually. Here's the case I want it for:
>
> I have a web application which gets all of its new data as uncoordinated
> batch updates from customers. Since it's possible for me to receive
> several batch updates at once, I set max_wal_size to 16GB, roughly the
> size of 8 batch updates. But I don't want the WAL that big all the time
> because it slows down backup snapshots. So I set min_wal_size to 2GB,
> roughly the size of one batch update.
>
> That's an idiosyncratic case, but I can imagine more of them out there.
>
> I wouldn't be opposed to min_wal_size = -1 meaning "same as
> max_wal_size" though.

+1 for min_wal_size. Like Josh, I can think of instances where this
would be good.

--
- David Steele
da...@pgmasters.net
Re: [HACKERS] Redesigning checkpoint_segments
On 02/05/2015 01:42 PM, Heikki Linnakangas wrote:
> There are a few reasons for making the minimum configurable:

Any thoughts on what the default minimum should be, if the default max
is 1.1GB/64?

> 1. Creating new segments when you need them is not free, so if you have
> a workload with occasional very large spikes, you might want to prepare
> for them. The auto-tuning will accommodate for the peak usage, but it's
> a moving average so if the peaks are infrequent enough, it will shrink
> the size down between the spikes.
>
> 2. To avoid running out of disk space on a write to WAL (which leads to
> a PANIC). In particular, if you have the WAL on the same filesystem as
> the data, pre-reserving all the space required for WAL makes it much
> more likely that when you run out of disk space, you run out when
> writing regular data, not WAL.
>
> 3. Unforeseen issues with the auto-tuning. It might not suit everyone,
> so it's nice that you can still get the old behaviour by setting
> min=max.
>
> Actually, perhaps we should have a boolean setting that just implies
> min=max, instead of having a configurable minimum? That would cover all
> of those reasons pretty well. So we would have a "max_wal_size" setting,
> and a boolean "preallocate_all_wal = on | off". Would anyone care for
> the flexibility of setting a minimum that's different from the maximum?

I do, actually. Here's the case I want it for:

I have a web application which gets all of its new data as uncoordinated
batch updates from customers. Since it's possible for me to receive
several batch updates at once, I set max_wal_size to 16GB, roughly the
size of 8 batch updates. But I don't want the WAL that big all the time
because it slows down backup snapshots. So I set min_wal_size to 2GB,
roughly the size of one batch update.

That's an idiosyncratic case, but I can imagine more of them out there.

I wouldn't be opposed to min_wal_size = -1 meaning "same as
max_wal_size" though.
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [HACKERS] Redesigning checkpoint_segments
>>> Default of 4 for min_wal_size?
>>
>> I assume you mean 4 segments; why not 3 as currently? As long as the
>> system has the latitude to ratchet it up when needed, there seems to
>> be little advantage to raising the minimum. Of course I guess there
>> must be some advantage or Heikki wouldn't have made it configurable,
>> but I'd err on the side of keeping this one small. Hopefully the
>> system that automatically adjusts this is really smart, and a large
>> min_wal_size is superfluous for most people.
>
> Keep in mind that the current is actually 7, not three (3*2+1). So 3
> would be a significant decrease. However, I don't feel strongly about
> it either way. I think that there is probably a minimum reasonable
> value > 1, but I'm not sure what it is.

Good point. OK, 4 works for me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Redesigning checkpoint_segments
On Thu, Feb 5, 2015 at 4:42 PM, Heikki Linnakangas wrote:
> Actually, perhaps we should have a boolean setting that just implies
> min=max, instead of having a configurable minimum? That would cover all
> of those reasons pretty well. So we would have a "max_wal_size" setting,
> and a boolean "preallocate_all_wal = on | off". Would anyone care for
> the flexibility of setting a minimum that's different from the maximum?

I like the way you have it now better. If we knew for certain that
there were no advantage in configuring a value between 0 and the
maximum, that would be one thing, but we don't and can't know that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Redesigning checkpoint_segments
On 02/05/2015 11:28 PM, Robert Haas wrote:
> On Thu, Feb 5, 2015 at 2:11 PM, Josh Berkus wrote:
>> Default of 4 for min_wal_size?
>
> I assume you mean 4 segments; why not 3 as currently?  As long as the
> system has the latitude to ratchet it up when needed, there seems to
> be little advantage to raising the minimum.  Of course I guess there
> must be some advantage or Heikki wouldn't have made it configurable,
> but I'd err on the side of keeping this one small.  Hopefully the
> system that automatically adjusts this is really smart, and a large
> min_wal_size is superfluous for most people.

There are a few reasons for making the minimum configurable:

1. Creating new segments when you need them is not free, so if you have a
workload with occasional very large spikes, you might want to prepare for
them.  The auto-tuning will accommodate the peak usage, but it's a moving
average, so if the peaks are infrequent enough, it will shrink the size
down between the spikes.

2. To avoid running out of disk space on write to WAL (which leads to a
PANIC).  In particular, if you have the WAL on the same filesystem as the
data, pre-reserving all the space required for WAL makes it much more
likely that when you do run out of disk space, you run out while writing
regular data, not WAL.

3. Unforeseen issues with the auto-tuning.  It might not suit everyone,
so it's nice that you can still get the old behaviour by setting min=max.

Actually, perhaps we should have a boolean setting that just implies
min=max, instead of having a configurable minimum?  That would cover all
of those reasons pretty well.  So we would have a "max_wal_size" setting,
and a boolean "preallocate_all_wal = on | off".  Would anyone care for
the flexibility of setting a minimum that's different from the maximum?

- Heikki

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
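As a concrete illustration of Heikki's point 3 — note the GUC names and values below come from the proposal under discussion, not from any released PostgreSQL, so this is illustrative only — pinning the minimum to the maximum in postgresql.conf would pre-reserve all WAL up front, restoring the old fixed-size behaviour:

```ini
# Illustrative sketch: proposed (not yet released) GUCs from this thread.
# Pinning min to max pre-allocates all WAL, i.e. the old behaviour.
max_wal_size = 1GB
min_wal_size = 1GB
```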
Re: [HACKERS] Redesigning checkpoint_segments
On 02/05/2015 01:28 PM, Robert Haas wrote:
> On Thu, Feb 5, 2015 at 2:11 PM, Josh Berkus wrote:
>> Except that, when setting up servers for customers, one thing I pretty
>> much always do for them is temporarily increase checkpoint_segments for
>> the initial data load.  So having Postgres do this automatically would
>> be a feature, not a bug.
>
> Right!
>
>> I say we go with ~~ 1GB.  That's an 8X increase over the current default
>> size for the maximum.
>
> Sounds great.
>
>> Default of 4 for min_wal_size?
>
> I assume you mean 4 segments; why not 3 as currently?  As long as the
> system has the latitude to ratchet it up when needed, there seems to
> be little advantage to raising the minimum.  Of course I guess there
> must be some advantage or Heikki wouldn't have made it configurable,
> but I'd err on the side of keeping this one small.  Hopefully the
> system that automatically adjusts this is really smart, and a large
> min_wal_size is superfluous for most people.

Keep in mind that the current is actually 7, not three (3*2+1).  So 3
would be a significant decrease.  However, I don't feel strongly about
it either way.  I think that there is probably a minimum reasonable
value > 1, but I'm not sure what it is.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Redesigning checkpoint_segments
On Thu, Feb 5, 2015 at 2:11 PM, Josh Berkus wrote: > Except that, when setting up servers for customers, one thing I pretty > much always do for them is temporarily increase checkpoint_segments for > the initial data load. So having Postgres do this automatically would > be a feature, not a bug. Right! > I say we go with ~~ 1GB. That's an 8X increase over current default > size for the maximum Sounds great. > Default of 4 for min_wal_size? I assume you mean 4 segments; why not 3 as currently? As long as the system has the latitude to ratchet it up when needed, there seems to be little advantage to raising the minimum. Of course I guess there must be some advantage or Heikki wouldn't have made it configurable, but I'd err on the side of keeping this one small. Hopefully the system that automatically adjusts this is really smart, and a large min_wal_size is superfluous for most people. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSL renegotiation and other related woes
On 01/26/2015 12:14 PM, Andres Freund wrote:
> Hi,
>
> When working on getting rid of ImmediateInterruptOK I wanted to verify
> that ssl still works correctly.  Turned out it didn't.  But neither did
> it in master.  Turns out there's two major things we do wrong:
>
> 1) We ignore the rule that once called and returning
> SSL_ERROR_WANTS_(READ|WRITE), SSL_read()/write() have to be called
> again with the same parameters.  Unfortunately that rule doesn't mean
> just that the same parameters have to be passed in, but also that we
> can't just constantly switch between _read()/write().  Especially
> nonblocking backend code (i.e. walsender) and the whole frontend code
> violate this rule.

I don't buy that.  Googling around, and looking at the man pages, I just
don't see anything to support that claim.  I believe it is supported to
switch between reads and writes.

> 2) We start renegotiations in be_tls_write() while in nonblocking mode,
> but don't properly retry to handle socket readyness.  There's a loop
> that retries handshakes twenty times (???), but what actually is needed
> is to call SSL_get_error() and ensure that there's actually data
> available.

Yeah, that's just crazy.  Actually, I don't think we need to call
SSL_do_handshake() at all.  At least on modern OpenSSL versions.  OpenSSL
internally uses a state machine, and SSL_read() and SSL_write() calls
will complete the handshake just as well.

> 2) is easy enough to fix, but 1) is pretty hard.  Before anybody says
> that 1) isn't an important rule: It reliably causes connection aborts
> within a couple renegotiations.  The higher the latency the higher the
> likelihood of aborts.  Even locally it doesn't take very long to abort.
>
> Errors usually are something like "SSL connection has been closed
> unexpectedly" or "SSL Error: sslv3 alert unexpected message" and a host
> of other similar messages.  There's a couple reports of those in the
> archives and I've seen many more in client logs.

Yeah, I can reproduce that.  There's clearly something wrong.
I believe this is an OpenSSL bug.  I traced the "unexpected message"
error to this code in OpenSSL's ssl3_read_bytes() function:

    case SSL3_RT_APPLICATION_DATA:
        /*
         * At this point, we were expecting handshake data, but have
         * application data.  If the library was running inside ssl3_read()
         * (i.e. in_read_app_data is set) and it makes sense to read
         * application data at this point (session renegotiation not yet
         * started), we will indulge it.
         */
        if (s->s3->in_read_app_data &&
            (s->s3->total_renegotiations != 0) &&
            (((s->state & SSL_ST_CONNECT) &&
              (s->state >= SSL3_ST_CW_CLNT_HELLO_A) &&
              (s->state <= SSL3_ST_CR_SRVR_HELLO_A)
             ) || ((s->state & SSL_ST_ACCEPT) &&
                   (s->state <= SSL3_ST_SW_HELLO_REQ_A) &&
                   (s->state >= SSL3_ST_SR_CLNT_HELLO_A)
                  )
            )) {
            s->s3->in_read_app_data = 2;
            return (-1);
        } else {
            al = SSL_AD_UNEXPECTED_MESSAGE;
            SSLerr(SSL_F_SSL3_READ_BYTES, SSL_R_UNEXPECTED_RECORD);
            goto f_err;
        }

So that quite clearly throws an error, *unless* the original application
call was SSL_read().  As you also concluded, OpenSSL doesn't like it when
SSL_write() is called when it's in renegotiation.  I think that's
OpenSSL's fault and it should "indulge" the application data message even
in SSL_write().

Can we work around that easily?  I believe we can.  The crucial part is
that we mustn't let SSL_write() process any incoming application data.
We can achieve that if we always call SSL_read() to drain such data
before calling SSL_write().  But that's not quite enough; if we're in
renegotiation, SSL_write() might still try to read more data from the
socket that has arrived after the SSL_read() call.  Fortunately, we can
easily prevent that by hacking pqsecure_raw_read() to always return
EWOULDBLOCK when it's called through SSL_write().

The attached patch does that.  I've been running your pg_receivexlog
test case with this for about 15 minutes without any errors now, with
ssl_renegotiation_limit=50kB, while before it errored out within seconds.
In theory, I guess we should do similar hacks in the server, and always
call SSL_read() before SSL_write(), but it seems to work without them.
Or maybe not; OpenSSL server and client code is not symmetric, so perhaps
it works in server mode without that.

PS. The server code starts renegotiation when it's 1kB from reaching
ssl_renegotiation_limit.  I increased that to 32kB, because in testing I
got some "SSL failed to renegotiate connection before limit expired"
errors before I did that.

PPS. I did all testing of this patch with my sleeping logic
simplification patch applied
(http://www.postgresql.org/message-id/54d3821e.9060...@vmware.com).  It
applies without it too, and I
Re: [HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code
Fujii Masao wrote:
> +        TimestampDifference(start_time, stop_time, &secs, &microsecs);
> +        pg_usleep(interval_msec * 1000L - (1000000L * secs + 1L * microsecs));
>
> What if a large interval is set and a signal arrives during the sleep?
> I'm afraid that a signal cannot terminate the sleep on some platforms.
> This problem exists even now because pg_usleep is used, but the sleep
> time is just 5 seconds, so it's not so bad.  But the patch allows a user
> to set a large sleep time.

Yes, and I thought that we could live with that for this patch...  Now
that you mention it, something similar to what recoveryPausesHere() does
would be quite good to ease the shutdown of an interrupted process, even
more so than now.  So let's do that.

> Shouldn't we use WaitLatch or split the pg_usleep like
> recoveryPausesHere() does?

I'd vote for the latter, as we would still need to calculate a timestamp
difference in any case, so it feels easier to do that in the new single
API as this patch does (a routine useful for plugins as well).  Now I
will not fight if people think that using recoveryWakeupLatch is better.

An updated patch is attached.  This patch contains as well a fix for
something that was mentioned upthread but not added in the latest
version: wal_availability_check_interval should be used when waiting for
a WAL record from a stream, and for a segment when fetching from
archives.  The last version did only the former, not the latter.

-- 
Michael

From 9c3a7fa35538993c1345a40fe0973332a76bdb81 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Mon, 19 Jan 2015 16:08:48 +0900
Subject: [PATCH] Add wal_availability_check_interval

This parameter helps control the timing at which WAL availability is
checked when a node is in recovery, particularly when successive
failures happen when fetching WAL archives, or when fetching WAL records
from a streaming source.  Default value is 5s.
--- doc/src/sgml/recovery-config.sgml | 21 +++ src/backend/access/transam/recovery.conf.sample | 10 ++ src/backend/access/transam/xlog.c | 47 + src/backend/utils/adt/timestamp.c | 38 src/include/utils/timestamp.h | 2 ++ 5 files changed, 104 insertions(+), 14 deletions(-) diff --git a/doc/src/sgml/recovery-config.sgml b/doc/src/sgml/recovery-config.sgml index 0c64ff2..7fd51ce 100644 --- a/doc/src/sgml/recovery-config.sgml +++ b/doc/src/sgml/recovery-config.sgml @@ -145,6 +145,27 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"' # Windows + + wal_availability_check_interval (integer) + +wal_availability_check_interval recovery parameter + + + + +This parameter specifies the amount of time to wait when +WAL is not available for a node in recovery. Default value is +5s. + + +A node in recovery will wait for this amount of time if +restore_command returns nonzero exit status code when +fetching new WAL segment files from archive or when a WAL receiver +is not able to fetch a WAL record when using streaming replication. + + + + diff --git a/src/backend/access/transam/recovery.conf.sample b/src/backend/access/transam/recovery.conf.sample index b777400..70d3946 100644 --- a/src/backend/access/transam/recovery.conf.sample +++ b/src/backend/access/transam/recovery.conf.sample @@ -58,6 +58,16 @@ # #recovery_end_command = '' # +# +# wal_availability_check_interval +# +# specifies an optional interval to check for availability of WAL when +# recovering a node. This interval of time represents the frequency of +# retries if a previous command of restore_command returned nonzero exit +# status code or if a walreceiver did not stream completely a WAL record. 
+# +#wal_availability_check_interval = '5s' +# #--- # RECOVERY TARGET PARAMETERS #--- diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 629a457..4f4efca 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -235,6 +235,7 @@ static TimestampTz recoveryTargetTime; static char *recoveryTargetName; static int recovery_min_apply_delay = 0; static TimestampTz recoveryDelayUntilTime; +static int wal_availability_check_interval = 5000; /* options taken from recovery.conf for XLOG streaming */ static bool StandbyModeRequested = false; @@ -4881,6 +4882,26 @@ readRecoveryCommandFile(void) (errmsg_internal("trigger_file = '%s'", TriggerFile))); } + else if (strcmp(item->name, "wal_availability_check_interval") == 0) + { + const char *hintmsg; + + if (!parse_int(item->value, &wal_availability_check_interval, GUC_UNIT_MS, + &hintmsg)) +ereport(ERROR, + (errcode(
Re: [HACKERS] s_lock.h default definitions are rather confused
On 2015-01-15 17:59:40 +0100, Andres Freund wrote: > On 2015-01-15 11:56:24 -0500, Tom Lane wrote: > > Andres Freund writes: > > > On 2015-01-15 10:57:10 -0500, Tom Lane wrote: > > >> While I'll not cry too hard when we decide to break C89 compatibility, > > >> I don't want it to happen accidentally; so having a pretty old-school > > >> compiler in the farm seems important to me. > > > > > I'd worked on setting up a modern gcc (or was it clang?) with the > > > appropriate flags to warn about !C89 stuff some time back, but failed > > > because of configure bugs. > > > > My recollection is that there isn't any reasonable way to get gcc to > > warn about C89 violations as such. -ansi -pedantic is not very fit > > for the purpose. > > It was clang, which has -Wc99-extensions/-Wc11-extensions. gcc-5 now has: * A new command-line option -Wc90-c99-compat has been added to warn about features not present in ISO C90, but present in ISO C99. * A new command-line option -Wc99-c11-compat has been added to warn about features not present in ISO C99, but present in ISO C11. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Thu, Feb 5, 2015 at 11:06 PM, Syed, Rahila wrote:
>> /*
>> + * We recheck the actual size even if pglz_compress() report success,
>> + * because it might be satisfied with having saved as little as one byte
>> + * in the compressed data.
>> + */
>> + *len = (uint16) compressed_len;
>> + if (*len >= orig_len - 1)
>> +     return false;
>> + return true;
>> +}
>
> As per the latest code, when compression is 'on' we introduce 2
> additional bytes in the header of each block image for storing the
> raw_length of the compressed block.
> In order to achieve compression while accounting for these two
> additional bytes, we must ensure that the compressed length is less
> than the original length - 2.
> So, IIUC the above condition should rather be
>
> if (*len >= orig_len - 2)
>     return false;
> return true;
>
> The attached patch contains this.  It also has a cosmetic change -
> renaming compressBuf to uncompressBuf as it is used to store the
> uncompressed page.

Agreed on both things.  Just looking at your latest patch after some
time to let it cool down, I noticed a couple of things.

 #define MaxSizeOfXLogRecordBlockHeader \
 (SizeOfXLogRecordBlockHeader + \
- SizeOfXLogRecordBlockImageHeader + \
+ SizeOfXLogRecordBlockImageHeader, \
+ SizeOfXLogRecordBlockImageCompressionInfo + \

There is a comma here instead of a sum sign.  We should really sum up
all those sizes to evaluate the maximum size of a block header.

+ * Permanently allocate readBuf uncompressBuf.  We do it this way,
+ * rather than just making a static array, for two reasons:

This comment is just weird; "readBuf and uncompressBuf" is more
appropriate.

+ * We recheck the actual size even if pglz_compress() report success,
+ * because it might be satisfied with having saved as little as one byte
+ * in the compressed data. We add two bytes to store raw_length with the
+ * compressed image. So for compression to be effective compressed_len should
+ * be atleast < orig_len - 2

This comment block should be reworked, and misses a dot at its end.
I rewrote it like that, hopefully that's clearer: + /* +* We recheck the actual size even if pglz_compress() reports success and see +* if at least 2 bytes of length have been saved, as this corresponds to the +* additional amount of data stored in WAL record for a compressed block +* via raw_length. +*/ In any case, those things have been introduced by what I did in previous versions... And attached is a new patch. -- Michael From 2b0c54bc7c4bfb2494cf8b0394d56635b01d7c5a Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Tue, 25 Nov 2014 14:24:26 +0900 Subject: [PATCH] Support compression for full-page writes in WAL Compression is controlled with a new parameter called wal_compression. This parameter can be changed at session level to control WAL compression. --- contrib/pg_xlogdump/pg_xlogdump.c | 17 ++- doc/src/sgml/config.sgml | 29 + src/backend/access/transam/xlog.c | 1 + src/backend/access/transam/xloginsert.c | 161 ++ src/backend/access/transam/xlogreader.c | 71 +--- src/backend/utils/misc/guc.c | 9 ++ src/backend/utils/misc/postgresql.conf.sample | 1 + src/include/access/xlog.h | 1 + src/include/access/xlogreader.h | 7 +- src/include/access/xlogrecord.h | 49 ++-- src/include/pg_config.h.in| 4 +- 11 files changed, 297 insertions(+), 53 deletions(-) diff --git a/contrib/pg_xlogdump/pg_xlogdump.c b/contrib/pg_xlogdump/pg_xlogdump.c index c1bfbc2..3ebaac6 100644 --- a/contrib/pg_xlogdump/pg_xlogdump.c +++ b/contrib/pg_xlogdump/pg_xlogdump.c @@ -363,14 +363,14 @@ XLogDumpCountRecord(XLogDumpConfig *config, XLogDumpStats *stats, * takes up BLCKSZ bytes, minus the "hole" length. * * XXX: We peek into xlogreader's private decoded backup blocks for the - * hole_length. It doesn't seem worth it to add an accessor macro for + * length of block. It doesn't seem worth it to add an accessor macro for * this. 
*/ fpi_len = 0; for (block_id = 0; block_id <= record->max_block_id; block_id++) { if (XLogRecHasBlockImage(record, block_id)) - fpi_len += BLCKSZ - record->blocks[block_id].hole_length; + fpi_len += record->blocks[block_id].bkp_len; } /* Update per-rmgr statistics */ @@ -465,9 +465,16 @@ XLogDumpDisplayRecord(XLogDumpConfig *config, XLogReaderState *record) blk); if (XLogRecHasBlockImage(record, block_id)) { -printf(" (FPW); hole: offset: %u, length: %u\n", - record->blocks[block_id].hole_offset, - record->blocks[block_id].hole_length); +if (record->blocks[block_id].is_compressed) + printf(" (FPW compressed); hole offset: %u, " + "compressed length: %u, original length: %u\n", + record->blocks[block_id].h
Re: [HACKERS] Possible problem with pgcrypto
On 02/05/2015 01:18 PM, Marko Tiikkaja wrote:
> On 2/5/15 4:48 PM, Jan Wieck wrote:
>> What the script does is to encode a small string with pgp_sym_encrypt()
>> and then repeatedly try to decrypt it with different "wrong" passwords.
>> The expected error message for that is of course "Wrong key or corrupt
>> data".  Every now and then, I get a different error message.  Things
>> I've seen are:
>> "Not text data"
>
> That's not unexpected; the check for whether the data is text or not
> appears to happen quite early in the process of decoding.  So it's
> enough to get to that point without anything being obviously broken.

I suspected something like that.

> In addition to the two errors above, it doesn't appear to be too
> difficult to see PXE_MBUF_SHORT_READ, which would give you ERROR:
> Corrupt data.  I wonder why that error message is different, though.

From reading the code as far as I did, I expected to see that, but
haven't seen it yet.

>> "pgcrypto bug"
>
> That doesn't look too good, but I can't reproduce it against 9.3.6
> either.

Let me improve the script to a point where it can run for a long time
in the background and collect all different error cases as examples of
encrypted data and wrong password.

Thanks so far.

Jan

-- 
Jan Wieck
Senior Software Engineer
http://slony.info

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Redesigning checkpoint_segments
On 02/04/2015 04:16 PM, David Steele wrote:
> On 2/4/15 3:06 PM, Robert Haas wrote:
>>> Hmmm, I see your point.  I spend a lot of time on AWS and in
>>> container-world, where disk space is a lot more constrained.  However,
>>> it probably makes more sense to recommend non-default settings for
>>> that environment, since it requires non-default settings anyway.
>>>
>>> So, 384MB?
>> That's certainly better, but I think we should go further.  Again,
>> you're not committed to using this space all the time, and if you are
>> using it you must have a lot of write activity, which means you are
>> not running on a tin can and a string.  If you have a little tiny
>> database, say 100MB, running on a little-tiny Amazon instance,
>> handling a small number of transactions, you're going to stay close to
>> wal_min_size anyway.  Right?
>
> The main exception I can think of is when using dump/restore to upgrade
> instead of pg_upgrade.  This would generate a lot of WAL for what could
> otherwise be a low-traffic database.

Except that, when setting up servers for customers, one thing I pretty
much always do for them is temporarily increase checkpoint_segments for
the initial data load.  So having Postgres do this automatically would
be a feature, not a bug.

I say we go with ~~ 1GB.  That's an 8X increase over the current default
size for the maximum.

Default of 4 for min_wal_size?

On 02/04/2015 07:37 PM, Amit Kapila wrote:
> On Thu, Feb 5, 2015 at 3:11 AM, Josh Berkus wrote:
>> If we did add one, I'd suggest calling it "wal_size_limit" or
>> something similar.
>
> I think both the names (max_wal_size and wal_size_limit) seem to
> indicate the same thing.  A few more suggestions:
> typical_wal_size, wal_size_soft_limit?

Again, you're suggesting more complicated (and difficult to translate,
and for that matter misleading) names in order to work around a future
feature which nobody is currently working on, and we may never have.
Let's keep clear and simple parameter names which most people can understand, instead of making things complicated for the sake of complexity. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Possible problem with pgcrypto
On 02/05/2015 10:58 AM, Tom Lane wrote:
> Jan Wieck writes:
>> I have encountered a small instability in the behavior of pgcrypto's
>> pgp_sym_decrypt() function.  Attached is a script that can reproduce
>> the problem.  It may have to be run repeatedly because the symptom
>> occurs rather seldom.
>> What the script does is to encode a small string with pgp_sym_encrypt()
>> and then repeatedly try to decrypt it with different "wrong" passwords.
>> The expected error message for that is of course "Wrong key or corrupt
>> data".  Every now and then, I get a different error message.  Things
>> I've seen are:
>
> Have you tested this with this week's releases?  We fixed some
> memory-mishandling bugs in pgcrypto ...

The posted script reproduces the symptom in today's checkout of master
as well as REL9_4_STABLE.

Jan

-- 
Jan Wieck
Senior Software Engineer
http://slony.info

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
Fujii Masao wrote:

I wrote:
>> This is an inspiration from lz4 APIs.  Wouldn't it be buggy for a
>> compression algorithm to return a size of 0 bytes as compressed or
>> decompressed length btw?  We could as well make it return a negative
>> value when a failure occurs if you feel more comfortable with it.

> I feel that's better.  Attached is the updated version of the patch.
> I changed pg_lzcompress and pg_lzdecompress so that they return -1
> when failure happens.  Also I applied some cosmetic changes to the
> patch (e.g., shortened the long names of the newly-added macros).
> Barring any objection, I will commit this.

I just had a look at your updated version, ran some sanity tests, and
things look good to me.  The new names of the macros at the top of
tuptoaster.c are clearer as well.

-- 
Michael

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GRANT USAGE on FOREIGN SERVER exposes passwords
* Tom Lane (t...@sss.pgh.pa.us) wrote: > Whether this is a realistic expectation given the wording of the SQL-MED > standard is unclear. I've only got a draft as of 2011-12 handy, but I'm not seeing anything in the standard that cares one bit about the value of the options specified for the FDW. All that's said is "Both the option name and the permissible ranges of option values of a generic option are defined by the foreign-data wrappers." In my view, that means that if we give FDWs the ability to control what's displayed for their options, we're well within the requirements of the spec. > I'm also concerned that if we take this on board as being a security > concern, it will mean that any time we make an effort to push some > construct we didn't before over to the remote end, we have to worry about > whether it would be a security breach to allow the local user to cause > that code to execute on the remote end. It's tough enough worrying about > semantic-equivalence issues without mixing hostile-user scenarios in. I agree that we don't want to go there but I don't see this as leading us in that direction. > So I would rather say that the baseline security expectation is that > granting a user mapping should be presumed to be tantamount to granting > direct access to the remote server with that login info. In that context, > being able to see the password should not be considered to be any big deal. I don't agree with this, however. Being able to execute arbitrary SQL on the remote side through a specific interface does not equate to having the password and direct access to the remote server. Even if it did, there are good reasons to not expose passwords, even to the user whose password it is. We take steps to avoid exposing user's passwords and password hashes in core and I do not see this case as any different. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Possible problem with pgcrypto
On 2/5/15 4:48 PM, Jan Wieck wrote:
> What the script does is to encode a small string with pgp_sym_encrypt()
> and then repeatedly try to decrypt it with different "wrong" passwords.
> The expected error message for that is of course "Wrong key or corrupt
> data".  Every now and then, I get a different error message.  Things
> I've seen are:
> "Not text data"

That's not unexpected; the check for whether the data is text or not
appears to happen quite early in the process of decoding.  So it's
enough to get to that point without anything being obviously broken.

In addition to the two errors above, it doesn't appear to be too
difficult to see PXE_MBUF_SHORT_READ, which would give you ERROR:
Corrupt data.  I wonder why that error message is different, though.

> "pgcrypto bug"

That doesn't look too good, but I can't reproduce it against 9.3.6
either.

.m

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal : REINDEX xxx VERBOSE
Robert Haas writes: > We've got a mix of styles for extensible options right now: That we do. > So COPY puts the options at the very end, but EXPLAIN and VACUUM put > them right after the command name. I prefer the latter style and > would vote to adopt it here. Meh. Options-at-the-end seems by far the most sensible style to me. The options-right-after-the-keyword style is a mess, both logically and from a parsing standpoint, and the only reason we have it at all is historical artifact (ask Bruce about the origins of VACUUM ANALYZE over a beer sometime). Still, I can't help noticing that I'm being outvoted. I'll shut up now. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal : REINDEX xxx VERBOSE
On Wed, Feb 4, 2015 at 8:24 PM, Tom Lane wrote:
> Kyotaro HORIGUCHI writes:
>> The phrase "{INDEX | TABLE |..} name" seems to me indivisible as
>> target specification.  IMHO, the options for VACUUM and so on are
>> placed *just after* the command name, not *before* the target.
>>
>> If this is right, the syntax would be like this.
>>
>> REINDEX [ (option [, option ...] ) ] {INDEX | TABLE | etc } name
>>
>> What do you think about this?
>
> I think this is wrong and ugly.  INDEX etc are part of the command name.

I don't think so.  I think they're part of the target.  We have one
manual page for REINDEX, not five separate reference pages for REINDEX
INDEX, REINDEX TABLE, REINDEX SCHEMA, REINDEX DATABASE, and REINDEX
SYSTEM.  If we really wanted to, we could probably even support this:

REINDEX INDEX foo, TABLE bar, TABLE baz;

We've got a mix of styles for extensible options right now:

EXPLAIN [ ( option [, ...] ) ] statement
COPY table_name [ ( column_name [, ...] ) ]
    FROM { 'filename' | PROGRAM 'command' | STDIN }
    [ [ WITH ] ( option [, ...] ) ]
VACUUM [ ( { FULL | FREEZE | VERBOSE | ANALYZE } [, ...] ) ]
    [ table_name [ (column_name [, ...] ) ] ]

So COPY puts the options at the very end, but EXPLAIN and VACUUM put
them right after the command name.  I prefer the latter style and would
vote to adopt it here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GRANT USAGE on FOREIGN SERVER exposes passwords
Tom Lane-2 wrote:
> Stephen Frost <sfrost@...> writes:
>> * Robert Haas (robertmhaas@...) wrote:
>>> On Thu, Feb 5, 2015 at 10:48 AM, Stephen Frost <sfrost@...> wrote:
>>>> And I thought this was about FDW options and not about dblink,
>>>> really..
>>>
>>> The OP is pretty clearly asking about dblink.
>>
>> I was just pointing out that it was an issue that all FDWs suffer
>> from, since we don't have any way for an FDW to say "don't show this
>> option", as discussed.
>
> The dblink example is entirely uncompelling, given that as you said
> somebody with access to a dblink connection could execute ALTER USER
> on the far end.

So let's fix that loophole as well...

> So I would rather say that the baseline security expectation is that
> granting a user mapping should be presumed to be tantamount to granting
> direct access to the remote server with that login info.  In that
> context, being able to see the password should not be considered to be
> any big deal.

Is there any provision whereby "USAGE" would restrict the person so
granted from viewing any particulars even though they can call/name the
item being granted, and then require "SELECT" privileges to actually
view any of the associated settings?

Regardless, the OP's described behavior of suppressing user options
normally but then showing them upon being granted USAGE on the server
seems strange.

David J.

--
View this message in context: http://postgresql.nabble.com/GRANT-USAGE-on-FOREIGN-SERVER-exposes-passwords-tp5836652p5836826.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GRANT USAGE on FOREIGN SERVER exposes passwords
Stephen Frost writes:
> * Robert Haas (robertmh...@gmail.com) wrote:
>> On Thu, Feb 5, 2015 at 10:48 AM, Stephen Frost wrote:
>>> And I thought this was about FDW options and not about dblink, really..
>
>> The OP is pretty clearly asking about dblink.
>
> I was just pointing out that it was an issue that all FDWs suffer from,
> since we don't have any way for an FDW to say "don't show this option",
> as discussed.

The dblink example is entirely uncompelling, given that as you said somebody with access to a dblink connection could execute ALTER USER on the far end. It's much more credible to imagine that someone might be given a mapping for a postgres_fdw, oracle_fdw, etc foreign server with just a few foreign tables, with the expectation that that couldn't be parlayed into unfettered access to the remote server. Whether this is a realistic expectation given the wording of the SQL-MED standard is unclear.

I'm also concerned that if we take this on board as being a security concern, it will mean that any time we make an effort to push some construct we didn't before over to the remote end, we have to worry about whether it would be a security breach to allow the local user to cause that code to execute on the remote end. It's tough enough worrying about semantic-equivalence issues without mixing hostile-user scenarios in.

So I would rather say that the baseline security expectation is that granting a user mapping should be presumed to be tantamount to granting direct access to the remote server with that login info. In that context, being able to see the password should not be considered to be any big deal.

regards, tom lane
Re: [HACKERS] GRANT USAGE on FOREIGN SERVER exposes passwords
* Robert Haas (robertmh...@gmail.com) wrote:
> On Thu, Feb 5, 2015 at 10:48 AM, Stephen Frost wrote:
> >> If she's got dblink access, she can run arbitrary
> >> SQL queries on the remote server anyway, which is all the password
> >> would let her do. Also, she could use dblink to run ALTER ROLE foo
> >> PASSWORD '...' on the remote server, and then she'll *definitely* know
> >> the password.
> >
> > And I thought this was about FDW options and not about dblink, really..
>
> The OP is pretty clearly asking about dblink.

I was just pointing out that it was an issue that all FDWs suffer from, since we don't have any way for an FDW to say "don't show this option", as discussed.

Thanks,

Stephen
Re: [HACKERS] GRANT USAGE on FOREIGN SERVER exposes passwords
On Thu, Feb 5, 2015 at 10:48 AM, Stephen Frost wrote:
>> If she's got dblink access, she can run arbitrary
>> SQL queries on the remote server anyway, which is all the password
>> would let her do. Also, she could use dblink to run ALTER ROLE foo
>> PASSWORD '...' on the remote server, and then she'll *definitely* know
>> the password.
>
> And I thought this was about FDW options and not about dblink, really..

The OP is pretty clearly asking about dblink.

-- Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Possible problem with pgcrypto
Jan Wieck writes:
> I have encountered a small instability in the behavior of pgcrypto's
> pgp_sym_decrypt() function. Attached is a script that can reproduce the
> problem. It may have to be run repeatedly because the symptom occurs
> rather seldom.

> What the script does is to encode a small string with pgp_sym_encrypt()
> and then repeatedly try to decrypt it with different "wrong" passwords.
> The expected error message for that is of course
> "Wrong key or corrupt data".

> Every now and then, I get a different error message. Things I've seen are:

Have you tested this with this week's releases? We fixed some memory-mishandling bugs in pgcrypto ...

regards, tom lane
[HACKERS] Possible problem with pgcrypto
Hi,

I have encountered a small instability in the behavior of pgcrypto's pgp_sym_decrypt() function. Attached is a script that can reproduce the problem. It may have to be run repeatedly because the symptom occurs rather seldom.

What the script does is to encode a small string with pgp_sym_encrypt() and then repeatedly try to decrypt it with different "wrong" passwords. The expected error message for that is of course "Wrong key or corrupt data". Every now and then, I get a different error message. Things I've seen are:

"Not text data"
"pgcrypto bug"

This seems to be triggered by a combination of the random data included in the encrypted data as well as the wrong password, because for a given instance of encrypted data only certain passwords cause this symptom.

I wonder if this may actually be a bug in pgcrypto, or if this is an error inherent in the way the encrypted data is encoded, i.e. that the decryption algorithm cannot really tell what is wrong and just sometimes gets a little further in the attempt to decrypt.

Jan

-- Jan Wieck
Senior Software Engineer
http://slony.info

pgcrypto_test.sh Description: application/shellscript
Re: [HACKERS] GRANT USAGE on FOREIGN SERVER exposes passwords
* Robert Haas (robertmh...@gmail.com) wrote:
> On Tue, Feb 3, 2015 at 6:26 PM, Noah Yetter wrote:
> > The obvious objection is, "well you should just use foreign tables instead
> > of dblink()". I'll cut a long story short by saying that doesn't work for
> > us. We are using postgres_fdw to allow our analysts to run queries against
> > AWS Redshift and blend those results with tables in our OLTP schema. If you
> > know anything about Redshift, or about analysts, you'll realize immediately
> > why foreign tables are not a viable solution. Surely there are many others
> > in a similar position, where the flexibility offered by dblink() makes it
> > preferable to fixed foreign tables.
> >
> > S... what gives? This seems like a really obvious security hole. I've
> > searched the mailing list archives repeatedly and found zero discussion of
> > this issue.
>
> Maybe this is an impertinent question, but why do you care if the user
> has the password?

Eh. Password-reuse risk, policies, regulations and auditing all come to mind.

> If she's got dblink access, she can run arbitrary
> SQL queries on the remote server anyway, which is all the password
> would let her do. Also, she could use dblink to run ALTER ROLE foo
> PASSWORD '...' on the remote server, and then she'll *definitely* know
> the password.

And I thought this was about FDW options and not about dblink, really..

> I would suggest not relying on password authentication in this
> situation. Instead, use pg_hba.conf to restrict connections by IP and
> SSL mode, and maybe consider SSL certificate authentication.

That's not actually an option here though, is it? dblink_connect requires password-based authentication, unless you're a superuser (which I'm pretty sure Noah Y would prefer these folks not be..). Further, I don't think you get to control whatever the pg_hba.conf is on the Redshift side..

I agree with the general sentiment that it'd be better to use other authentication methods (SSL certificates or Kerberos credentials), but we'd need to provide a way for those to work for non-superusers. Kerberos credential-forwarding comes to mind, but I don't know of anyone who is currently working on that, and I doubt it'd work with Redshift anyway.

> All that having been said, it wouldn't be crazy to try to invent a
> system to lock this down, but it *would* be complicated. An
> individual FDW can call its authentication-related options anything it
> likes; they do not need to be called 'password'. So we'd need a way
> to identify which options should be hidden from untrusted users, and
> then a bunch of mechanism to do that.

Agreed, we'd need to provide a way for FDWs to specify which options should be hidden and which shouldn't be. For my 2c, I do think that'd be worthwhile to do. We let users change their own passwords with ALTER USER too, but they don't get to view it (or even the hash of it) in pg_authid.

Thanks,

Stephen
[HACKERS] Early Setup of instrumentation information in pg_stat_statements
Currently in pg_stat_statements, the setup to track instrumentation/totaltime information is done after ExecutorStart(). Can we do this before ExecutorStart()? In particular, I am referring to the code below:

static void
pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
	..
	standard_ExecutorStart(queryDesc, eflags);
	..
	if (pgss_enabled() && queryDesc->plannedstmt->queryId != 0)
	{
		..
		if (queryDesc->totaltime == NULL)
		{
			..
			queryDesc->totaltime = InstrAlloc(1, INSTRUMENT_ALL);
			..
		}
	}
}

The reason I am asking is that to track instrumentation information (like buffer usage parameters) in the case of parallel sequential scan, we need to pass this information at the start of the backend workers, which are started in the ExecutorStart() phase, and at that time there is no information available which can guarantee that such information is needed by the plugin (we have queryId stored in the planned stmt, but I think that is not sufficient to decide). This works well for the Explain statement, as that has the information for instrumentation available before the ExecutorStart() phase.

Please suggest if there is a better way to make this information available in the ExecutorStart() phase?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] GRANT USAGE on FOREIGN SERVER exposes passwords
Robert Haas writes:
> All that having been said, it wouldn't be crazy to try to invent a
> system to lock this down, but it *would* be complicated. An
> individual FDW can call its authentication-related options anything it
> likes; they do not need to be called 'password'. So we'd need a way
> to identify which options should be hidden from untrusted users, and
> then a bunch of mechanism to do that.

It's also debatable whether this wouldn't be a violation of the SQL standard. I see nothing in the SQL-MED spec authorizing filtering of the information_schema.user_mapping_options view.

We actually are doing some filtering of values in user_mapping_options, but it's all-or-nothing so far as the options for any one mapping go. That's still not exactly supportable per spec, but it's probably less of a violation than option-by-option filtering would be. It also looks like that filtering differs in corner cases from what the regular pg_user_mappings view does, which is kinda silly. In particular I think we should try to get rid of the explicit provision for superuser access.

I was hoping Peter would weigh in on what his design considerations were for these views ...

regards, tom lane
Re: [HACKERS] [GENERAL] 4B row limit for CLOB tables
> That's assuming that toasting is evenly spread between tables. In my
> experience, that's not a great bet...

Time to create a test:

SELECT chunk_id::bigint/100000 as id_range, count(*), count(*)/(100000::float) density
FROM (SELECT chunk_id
      FROM pg_toast.pg_toast_39000165
      WHERE chunk_id < 1000000000 AND chunk_seq = 0) f
GROUP BY id_range
ORDER BY id_range;

The machine in question was restored in parallel in Sept 2013 as part of an upgrade from 8.4. It has about 2000 tables, so it's definitely not dominated by a couple of tables. Progress towards oid wraparound is about 25.6%.

With minimal effort, I found 2 bad examples, and I'm sure I can easily find more. I attached the results for those two. There were runs of 1,100,000+ and 600,000+ chunk_ids where more than 99% of the chunk_ids are taken. After restore completion, oid densities averaged less than 20 per 100,000 and 400 per 100,000 respectively. The only reason those runs seem to be so short is because the tables were much smaller back then. I expect that next time I dump and restore (necessary for upgrading OS versions due to the collation issue), I'm going to have runs closer to 20,,000.

> ... this "fix" would actually make things enormously worse. With the
> single counter feeding all tables, you at least have a reasonable
> probability that there are not enormously long runs of consecutive OIDs in
> any one toast table. With a sequence per table, you are nearly guaranteed
> that there are such runs, because inserts into other tables don't create a
> break.

It makes each toast table independent (and far less likely to wrap). It would wrap when sum(mods on THIS toast table) > 2^32. Right now the function looks like: sum(mods on ALL toast tables in cluster) + sum(created normal tables in cluster * k) + sum(created temp tables in cluster * k) + [...] > 2^32, where k is the average number of ids consumed for pg_class, pg_type, etc...
In the case of an insert-only table (which is a common use case for partitions), the id would only wrap when the TOAST table was “full”. On the other hand, currently it would wrap into its pg_restored section when the combined oid-consuming operations on the cluster surpassed 4 billion.

That being said, I'm certainly not attached to that solution. My real argument is that although it's not a problem today, we are only about 5 years from it being a problem for large installs, and the first time you'll hear about it is after someone has a 5 minute production outage on a database that's been taking traffic for 2 years.

- Matt K.

 id_range | count | density
----------+-------+---------
      390 | 92188 | 0.92188
      391 | 99186 | 0.99186
      392 | 99826 | 0.99826
      393 | 99101 | 0.99101
      394 | 99536 | 0.99536
      395 | 99796 | 0.99796
      396 | 99321 | 0.99321
      397 | 99768 | 0.99768
      398 | 99744 | 0.99744
      399 | 99676 | 0.99676
      400 | 98663 | 0.98663
      401 | 40690 | 0.4069
      403 |    92 | 0.00092
      404 |   491 | 0.00491
      407 |    74 | 0.00074
      408 |    54 | 0.00054
      415 |   152 | 0.00152
      416 |    47 | 0.00047
      419 |    59 | 0.00059
      422 |     2 | 2e-05
      423 |    14 | 0.00014
      424 |     5 | 5e-05
      425 |    11 | 0.00011
      426 |     7 | 7e-05
      427 |     5 | 5e-05
      428 |     6 | 6e-05
      517 |     5 | 5e-05
      518 |     9 | 9e-05
      519 |     6 | 6e-05
      520 |    12 | 0.00012
      521 |    17 | 0.00017
      522 |     5 | 5e-05
      588 |    15 | 0.00015
      589 |    10 | 0.0001
      590 |    19 | 0.00019
      591 |    12 | 0.00012
      592 |    12 | 0.00012
      593 |     2 | 2e-05
      617 |     4 | 4e-05
      618 |     9 | 9e-05
      619 |     7 | 7e-05
      620 |    14 | 0.00014
      621 |     5 | 5e-05
      622 |    11 | 0.00011
      682 |     8 | 8e-05
      683 |    13 | 0.00013
      684 |    17 | 0.00017
      685 |     6 | 6e-05
      686 |    17 | 0.00017
      687 |     4 | 4e-05
      767 |     5 | 5e-05
      768 |    10 | 0.0001
      769 |     9 | 9e-05
      770 |     2 | 2e-05
      771 |    14 | 0.00014
      772 |     2 | 2e-05
      773 |    11 | 0.00011
      774 |    13 | 0.00013
      775 |    10 | 0.0001
      776 |     3 | 3e-05
      914 |     7 | 7e-05
      915 |     7 | 7e-05
      916 |     1 | 1e-05
      917 |     3 | 3e-05
      918 |     3 | 3e-05
      919 |     5 | 5e-05
      920 |     4 | 4e-05
      921 |     9 | 9e-05
      922 |     9 | 9e-05
      923 |     1 | 1e-05
(70 rows)

 id_range | count | density
----------+-------+---------
      402 | 96439 | 0.96439
      403 | 99102 | 0.99102
      404 | 98787 | 0.98787
Re: [HACKERS] GRANT USAGE on FOREIGN SERVER exposes passwords
On Tue, Feb 3, 2015 at 6:26 PM, Noah Yetter wrote:
> The obvious objection is, "well you should just use foreign tables instead
> of dblink()". I'll cut a long story short by saying that doesn't work for
> us. We are using postgres_fdw to allow our analysts to run queries against
> AWS Redshift and blend those results with tables in our OLTP schema. If you
> know anything about Redshift, or about analysts, you'll realize immediately
> why foreign tables are not a viable solution. Surely there are many others
> in a similar position, where the flexibility offered by dblink() makes it
> preferable to fixed foreign tables.
>
> S... what gives? This seems like a really obvious security hole. I've
> searched the mailing list archives repeatedly and found zero discussion of
> this issue.

Maybe this is an impertinent question, but why do you care if the user has the password? If she's got dblink access, she can run arbitrary SQL queries on the remote server anyway, which is all the password would let her do. Also, she could use dblink to run ALTER ROLE foo PASSWORD '...' on the remote server, and then she'll *definitely* know the password.

I would suggest not relying on password authentication in this situation. Instead, use pg_hba.conf to restrict connections by IP and SSL mode, and maybe consider SSL certificate authentication.

All that having been said, it wouldn't be crazy to try to invent a system to lock this down, but it *would* be complicated. An individual FDW can call its authentication-related options anything it likes; they do not need to be called 'password'. So we'd need a way to identify which options should be hidden from untrusted users, and then a bunch of mechanism to do that.

-- Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Redesigning checkpoint_segments
On 02/05/2015 04:47 PM, Andres Freund wrote:
> On 2015-02-05 09:42:37 -0500, Robert Haas wrote:
>> I previously proposed 100 segments, or 1.6GB. If that seems too
>> large, how about 64 segments, or 1.024GB? I think there will be few
>> people who can't tolerate a gigabyte of xlog under peak load, and an
>> awful lot who will benefit from it.
>
> It'd be quite easier to go there if we'd shrink back to the min_size
> after a while, after having peaked above it. IIUC the patch doesn't do
> that?

It doesn't actively go and remove files once they've already been recycled, but if the system stays relatively idle for several checkpoints, the WAL usage will shrink down again. That's the core idea of the patch.

If the system stays completely or almost completely idle, that won't happen though, because then it will never switch to a new segment, so none of the segments become old enough to be removed.

- Heikki
Re: [HACKERS] Redesigning checkpoint_segments
On 2015-02-05 09:42:37 -0500, Robert Haas wrote:
> I previously proposed 100 segments, or 1.6GB. If that seems too
> large, how about 64 segments, or 1.024GB? I think there will be few
> people who can't tolerate a gigabyte of xlog under peak load, and an
> awful lot who will benefit from it.

It'd be quite easier to go there if we'd shrink back to the min_size after a while, after having peaked above it. IIUC the patch doesn't do that?

Admittedly it's not easy to come up with an algorithm that doesn't cause superfluous file removals. Initiating wal files isn't cheap.

Greetings,

Andres Freund

-- Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] Simplify sleeping while reading/writing from client
Looking again at the code after Andres' interrupt-handling patch series, I got confused by the fact that there are several wait-retry loops in different layers, and reading and writing works slightly differently. I propose the attached, which pulls all the wait-retry logic up to secure_read() and secure_write(). This makes the code a lot more understandable. - Heikki From 4e73dbc72d279c2cb66af0a16357eab17f1c740b Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas Date: Thu, 5 Feb 2015 16:44:13 +0200 Subject: [PATCH 1/1] Simplify waiting logic in reading from / writing to client. The client socket is always in non-blocking mode, and if we actually want blocking behaviour, we emulate it by sleeping and retrying. But we retry loops at different layers for reads and writes, which was confusing. To simplify, remove all the sleeping and retrying code from the lower levels, from be_tls_read and secure_raw_read and secure_raw_write, and put all the logic in secure_read() and secure_write(). --- src/backend/libpq/be-secure-openssl.c | 81 +-- src/backend/libpq/be-secure.c | 143 ++ src/backend/libpq/pqcomm.c| 3 +- src/include/libpq/libpq-be.h | 4 +- 4 files changed, 79 insertions(+), 152 deletions(-) diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c index d5f9712..39587e8 100644 --- a/src/backend/libpq/be-secure-openssl.c +++ b/src/backend/libpq/be-secure-openssl.c @@ -511,14 +511,11 @@ be_tls_close(Port *port) * Read data from a secure connection. 
*/ ssize_t -be_tls_read(Port *port, void *ptr, size_t len) +be_tls_read(Port *port, void *ptr, size_t len, int *waitfor) { ssize_t n; int err; - int waitfor; - int latchret; -rloop: errno = 0; n = SSL_read(port->ssl, ptr, len); err = SSL_get_error(port->ssl, n); @@ -528,39 +525,15 @@ rloop: port->count += n; break; case SSL_ERROR_WANT_READ: + *waitfor = WL_SOCKET_READABLE; + errno = EWOULDBLOCK; + n = -1; + break; case SSL_ERROR_WANT_WRITE: - /* Don't retry if the socket is in nonblocking mode. */ - if (port->noblock) - { -errno = EWOULDBLOCK; -n = -1; -break; - } - - waitfor = WL_LATCH_SET; - - if (err == SSL_ERROR_WANT_READ) -waitfor |= WL_SOCKET_READABLE; - else -waitfor |= WL_SOCKET_WRITEABLE; - - latchret = WaitLatchOrSocket(MyLatch, waitfor, port->sock, 0); - - /* - * We'll, among other situations, get here if the low level - * routine doing the actual recv() via the socket got interrupted - * by a signal. That's so we can handle interrupts once outside - * openssl, so we don't jump out from underneath its covers. We - * can check this both, when reading and writing, because even - * when writing that's just openssl's doing, not a 'proper' write - * initiated by postgres. - */ - if (latchret & WL_LATCH_SET) - { -ResetLatch(MyLatch); -ProcessClientReadInterrupt(true); /* preserves errno */ - } - goto rloop; + *waitfor = WL_SOCKET_WRITEABLE; + errno = EWOULDBLOCK; + n = -1; + break; case SSL_ERROR_SYSCALL: /* leave it to caller to ereport the value of errno */ if (n != -1) @@ -595,12 +568,10 @@ rloop: * Write data to a secure connection. 
*/ ssize_t -be_tls_write(Port *port, void *ptr, size_t len) +be_tls_write(Port *port, void *ptr, size_t len, int *waitfor) { ssize_t n; int err; - int waitfor; - int latchret; /* * If SSL renegotiations are enabled and we're getting close to the @@ -653,7 +624,6 @@ be_tls_write(Port *port, void *ptr, size_t len) } } -wloop: errno = 0; n = SSL_write(port->ssl, ptr, len); err = SSL_get_error(port->ssl, n); @@ -663,30 +633,15 @@ wloop: port->count += n; break; case SSL_ERROR_WANT_READ: + *waitfor = WL_SOCKET_READABLE; + errno = EWOULDBLOCK; + n = -1; + break; case SSL_ERROR_WANT_WRITE: - - waitfor = WL_LATCH_SET; - - if (err == SSL_ERROR_WANT_READ) -waitfor |= WL_SOCKET_READABLE; - else -waitfor |= WL_SOCKET_WRITEABLE; - - latchret = WaitLatchOrSocket(MyLatch, waitfor, port->sock, 0); - - /* - * Check for interrupts here, in addition to secure_write(), - * because an interrupted write in secure_raw_write() will return - * here, and we cannot return to secure_write() until we've - * written something. - */ - if (latchret & WL_LATCH_SET) - { -ResetLatch(MyLatch); -ProcessClientWriteInterrupt(true); /* preserves errno */ - } - - goto wloop; + *waitfor = WL_SOCKET_WRITEABLE; + errno = EWOULDBLOCK; + n = -1; + break; case SSL_ERROR_SYSCALL: /* leave it to caller to ereport the value of errno */ if (n != -1) diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c index c2c1842..4e7acbe 100644 --- a/src/backend/libpq/be-secure.c +++ b/src/backend/libpq/be-secure.c @@ -127,30 +127,45 @@ ssize_t secure_read(Port *port, void *
Re: [HACKERS] Redesigning checkpoint_segments
On Wed, Feb 4, 2015 at 4:41 PM, Josh Berkus wrote:
>> That's certainly better, but I think we should go further. Again,
>> you're not committed to using this space all the time, and if you are
>> using it you must have a lot of write activity, which means you are
>> not running on a tin can and a string. If you have a little tiny
>> database, say 100MB, running on a little-tiny Amazon instance,
>> handling a small number of transactions, you're going to stay close to
>> wal_min_size anyway. Right?
>
> Well, we can test that.
>
> So what's your proposed size?

I previously proposed 100 segments, or 1.6GB. If that seems too large, how about 64 segments, or 1.024GB? I think there will be few people who can't tolerate a gigabyte of xlog under peak load, and an awful lot who will benefit from it.

-- Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Unlikely-to-happen crash in ecpg driver caused by NULL-pointer check not done
On Tue, Feb 03, 2015 at 10:44:17PM +0900, Michael Paquier wrote:
> All those things are addressed in the patch attached.

Fixed a typo and committed. Thanks Michael for fixing and Heikki for reviewing.

Michael

-- Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
Jabber: michael.meskes at gmail dot com
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
Hello, >/* >+* We recheck the actual size even if pglz_compress() report success, >+* because it might be satisfied with having saved as little as one byte >+* in the compressed data. >+*/ >+ *len = (uint16) compressed_len; >+ if (*len >= orig_len - 1) >+ return false; >+ return true; >+} As per latest code ,when compression is 'on' we introduce additional 2 bytes in the header of each block image for storing raw_length of the compressed block. In order to achieve compression while accounting for these two additional bytes, we must ensure that compressed length is less than original length - 2. So , IIUC the above condition should rather be If (*len >= orig_len -2 ) return false; return true; The attached patch contains this. It also has a cosmetic change- renaming compressBuf to uncompressBuf as it is used to store uncompressed page. Thank you, Rahila Syed -Original Message- From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Michael Paquier Sent: Wednesday, January 07, 2015 9:32 AM To: Rahila Syed Cc: PostgreSQL mailing lists Subject: Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes On Wed, Jan 7, 2015 at 12:51 AM, Rahila Syed wrote: > Following are some comments, Thanks for the feedback. >>uint16 hole_offset:15, /* number of bytes in "hole" */ > Typo in description of hole_offset Fixed. That's "before hole". >>for (block_id = 0; block_id <= record->max_block_id; block_id++) >>- { >>- if (XLogRecHasBlockImage(record, block_id)) >>- fpi_len += BLCKSZ - > record->blocks[block_id].hole_length; >>- } >>+ fpi_len += record->blocks[block_id].bkp_len; > > IIUC, if condition, /if(XLogRecHasBlockImage(record, block_id))/ is > incorrectly removed from the above for loop. Fixed. >>typedef struct XLogRecordCompressedBlockImageHeader > I am trying to understand the purpose behind declaration of the above > struct. 
IIUC, it is defined in order to introduce new field uint16 > raw_length and it has been declared as a separate struct from > XLogRecordBlockImageHeader to not affect the size of WAL record when > compression is off. > I wonder if it is ok to simply memcpy the uint16 raw_length in the > hdr_scratch when compression is on and not have a separate header > struct for it neither declare it in existing header. raw_length can be > a locally defined variable is XLogRecordAssemble or it can be a field > in registered_buffer struct like compressed_page. > I think this can simplify the code. > Am I missing something obvious? You are missing nothing. I just introduced this structure for a matter of readability to show the two-byte difference between non-compressed and compressed header information. It is true that doing it my way makes the structures duplicated, so let's simply add the compression-related information as an extra structure added after XLogRecordBlockImageHeader if the block is compressed. I hope this addresses your concerns. >> /* >> * Fill in the remaining fields in the XLogRecordBlockImageHeader >> * struct and add new entries in the record chain. >> */ > >> bkpb.fork_flags |= BKPBLOCK_HAS_IMAGE; > > This code line seems to be misplaced with respect to the above comment. > Comment indicates filling of XLogRecordBlockImageHeader fields while > fork_flags is a field of XLogRecordBlockHeader. > Is it better to place the code close to following condition? > if (needs_backup) > { Yes, this comment should not be here. I replaced it with the comment in HEAD. >>+ *the original length of the >>+ * block without its page hole being deducible from the compressed >>+ data >>+ * itself. > IIUC, this comment before XLogRecordBlockImageHeader seems to be no > longer valid as original length is not deducible from compressed data > and rather stored in header. Aah, true. 
This was originally present in the header of PGLZ that has been removed to make it available for frontends. Updated patches are attached.

Regards,
-- Michael

Support-compression-for-full-page-writes-in-WAL_v15.patch Description: Support-compression-for-full-page-writes-in-WAL_v15.patch
Re: [HACKERS] File based Incremental backup v9
Hi Marco,

On Sunday 01 February 2015 00:47:24 Marco Nenciarini wrote:
> You can find the updated patch attached to this message.

I've been testing the v9 patch with checksums enabled and I end up with a lot of warnings like these ones:

WARNING: page verification failed, calculated checksum 47340 but expected 47342
WARNING: page verification failed, calculated checksum 16649 but expected 16647
WARNING: page verification failed, calculated checksum 13567 but expected 13565
WARNING: page verification failed, calculated checksum 14110 but expected 14108
WARNING: page verification failed, calculated checksum 40990 but expected 40988
WARNING: page verification failed, calculated checksum 46242 but expected 46244

I can reproduce the problem with the following script:

WORKDIR=/home/fcanovai/tmp
psql -c "CREATE DATABASE pgbench"
pgbench -i -s 100 --foreign-keys pgbench
mkdir $WORKDIR/tbsp
psql -c "CREATE TABLESPACE tbsp LOCATION '$WORKDIR/tbsp'"
psql -c "ALTER DATABASE pgbench SET TABLESPACE tbsp"

Regards,
Francesco

-- Francesco Canovai - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
francesco.cano...@2ndquadrant.it | www.2ndQuadrant.it
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Tue, Jan 6, 2015 at 11:09 AM, Michael Paquier wrote:
> On Mon, Jan 5, 2015 at 10:29 PM, Fujii Masao wrote:
>> On Sun, Dec 28, 2014 at 10:57 PM, Michael Paquier wrote:
>> The patch 1 cannot be applied to the master successfully because of
>> recent change.
> Yes, that's caused by ccb161b. Attached are rebased versions.
>
>>> - The real stuff comes with patch 2, that implements the removal of
>>> PGLZ_Header, changing the APIs of compression and decompression to pglz to
>>> not have anymore toast metadata, this metadata being now localized in
>>> tuptoaster.c. Note that this patch protects the on-disk format (tested with
>>> pg_upgrade from 9.4 to a patched HEAD server). Here is how the APIs of
>>> compression and decompression look like with this patch, simply performing
>>> operations from a source to a destination:
>>> extern int32 pglz_compress(const char *source, int32 slen, char *dest,
>>>                            const PGLZ_Strategy *strategy);
>>> extern int32 pglz_decompress(const char *source, char *dest,
>>>                              int32 compressed_size, int32 raw_size);
>>> The return value of those functions is the number of bytes written in the
>>> destination buffer, and 0 if the operation failed.
>>
>> So it's guaranteed that 0 is never returned in the success case? I'm not sure
>> if that case can really happen, though.
> This is an inspiration from the lz4 APIs. Wouldn't it be buggy for a
> compression algorithm to return a size of 0 bytes as compressed or
> decompressed length btw? We could as well make it return a negative
> value when a failure occurs if you feel more comfortable with it.

I feel that's better. Attached is the updated version of the patch. I changed pg_lzcompress and pg_lzdecompress so that they return -1 when a failure happens. I also applied some cosmetic changes to the patch (e.g., shortened the long names of the newly-added macros). Barring any objection, I will commit this.
Regards,

--
Fujii Masao

*** a/src/backend/access/heap/tuptoaster.c
--- b/src/backend/access/heap/tuptoaster.c
***************
*** 35,43 ****
  #include "access/tuptoaster.h"
  #include "access/xact.h"
  #include "catalog/catalog.h"
  #include "miscadmin.h"
  #include "utils/fmgroids.h"
- #include "utils/pg_lzcompress.h"
  #include "utils/rel.h"
  #include "utils/typcache.h"
  #include "utils/tqual.h"
--- 35,43 ----
  #include "access/tuptoaster.h"
  #include "access/xact.h"
  #include "catalog/catalog.h"
+ #include "common/pg_lzcompress.h"
  #include "miscadmin.h"
  #include "utils/fmgroids.h"
  #include "utils/rel.h"
  #include "utils/typcache.h"
  #include "utils/tqual.h"
***************
*** 45,50 ****
--- 45,70 ----
  #undef TOAST_DEBUG

+ /*
+  * The information at the start of the compressed toast data.
+  */
+ typedef struct toast_compress_header
+ {
+ 	int32		vl_len_;		/* varlena header (do not touch directly!) */
+ 	int32		rawsize;
+ } toast_compress_header;
+
+ /*
+  * Utilities for manipulation of header information for compressed
+  * toast entries.
+  */
+ #define TOAST_COMPRESS_HDRSZ		((int32) sizeof(toast_compress_header))
+ #define TOAST_COMPRESS_RAWSIZE(ptr)	(((toast_compress_header *) ptr)->rawsize)
+ #define TOAST_COMPRESS_RAWDATA(ptr) \
+ 	(((char *) ptr) + TOAST_COMPRESS_HDRSZ)
+ #define TOAST_COMPRESS_SET_RAWSIZE(ptr, len) \
+ 	(((toast_compress_header *) ptr)->rawsize = len)
+
  static void toast_delete_datum(Relation rel, Datum value);
  static Datum toast_save_datum(Relation rel, Datum value,
  			   struct varlena * oldexternal, int options);
***************
*** 53,58 ****
--- 73,79 ----
  static bool toastid_valueid_exists(Oid toastrelid, Oid valueid);
  static struct varlena *toast_fetch_datum(struct varlena * attr);
  static struct varlena *toast_fetch_datum_slice(struct varlena * attr,
  						 int32 sliceoffset, int32 length);
+ static struct varlena *toast_decompress_datum(struct varlena * attr);
  static int toast_open_indexes(Relation toastrel,
  			   LOCKMODE lock,
  			   Relation **toastidxs,
*************** heap_tuple_untoast_attr(struct varlena * attr)
*** 138,148 ****
  	/* If it's compressed, decompress it */
  	if (VARATT_IS_COMPRESSED(attr))
  	{
! 		PGLZ_Header *tmp = (PGLZ_Header *) attr;
!
! 		attr = (struct varlena *) palloc(PGLZ_RAW_SIZE(tmp) + VARHDRSZ);
! 		SET_VARSIZE(attr, PGLZ_RAW_SIZE(tmp) + VARHDRSZ);
! 		pglz_decompress(tmp, VARDATA(attr));
  		pfree(tmp);
  	}
  }
--- 159,166 ----
  	/* If it's compressed, decompress it */
  	if (VARATT_IS_COMPRESSED(attr))
  	{
! 		struct varlena *tmp = attr;
! 		attr = toast_decompress_datum(tmp);
  		pfree(tmp);
  	}
  }
*************** heap_tuple_untoast_attr(struct varlena * attr)
*** 163,173 ****
  		/*
  		 * This is a compressed value inside of the main tuple
  		 */
! 		PGLZ_Header *tmp = (PGLZ_Header *) attr;
!
! 		attr = (struct varlena *) palloc(PGLZ_RAW_SIZE(tmp) + VARHDRSZ);
! 		SET_VARSIZE(attr, PGLZ_RAW_SIZE(tmp) + VARHDRSZ);
! 		pglz_decompress(tmp, VARDATA(attr));