date:20150205

On Thu, Feb 5, 2015 at 11:58 PM, Michael Paquier wrote:
 An updated patch is attached.
I just noticed that the patch I sent was incorrect:
- Parameter name was still wal_availability_check_interval and not
wal_retrieve_retry_interval
- Documentation was incorrect.
Please use the patch attached instead for further review.
-- 
Michael
From 06a4d3d1f5fe4362d7be1404cbf0b45b74fea69f Mon Sep 17 00:00:00 2001
From: Michael Paquier mich...@otacoo.com
Date: Mon, 19 Jan 2015 16:08:48 +0900
Subject: [PATCH] Add wal_retrieve_retry_interval

This parameter aids to control at which timing WAL availability is checked
when a node is in recovery, particularly  when successive failures happen when
fetching WAL archives, or when fetching WAL records from a streaming source.

Default value is 5s.
---
 doc/src/sgml/recovery-config.sgml   | 17 +
 src/backend/access/transam/recovery.conf.sample |  9 +
 src/backend/access/transam/xlog.c   | 47 +
 src/backend/utils/adt/timestamp.c   | 38 
 src/include/utils/timestamp.h   |  2 ++
 5 files changed, 99 insertions(+), 14 deletions(-)

diff --git a/doc/src/sgml/recovery-config.sgml b/doc/src/sgml/recovery-config.sgml
index 0c64ff2..d4babbd 100644
--- a/doc/src/sgml/recovery-config.sgml
+++ b/doc/src/sgml/recovery-config.sgml
@@ -145,6 +145,23 @@ restore_command = 'copy C:\\server\\archivedir\\%f %p'  # Windows
   /listitem
  /varlistentry
 
+ varlistentry id=wal-retrieve-retry-interval xreflabel=wal_retrieve_retry_interval
+  termvarnamewal_retrieve_retry_interval/varname (typeinteger/type)
+  indexterm
+primaryvarnamewal_retrieve_retry_interval/ recovery parameter/primary
+  /indexterm
+  /term
+  listitem
+   para
+This parameter specifies the amount of time to wait when a failure
+occurred when reading WAL from a source (be it via streaming
+replication, local filenamepg_xlog/ or WAL archive) for a node
+in standby mode, or when WAL is expected from a source. Default
+value is literal5s/.
+   /para
+  /listitem
+ /varlistentry
+
 /variablelist
 
   /sect1
diff --git a/src/backend/access/transam/recovery.conf.sample b/src/backend/access/transam/recovery.conf.sample
index b777400..458308c 100644
--- a/src/backend/access/transam/recovery.conf.sample
+++ b/src/backend/access/transam/recovery.conf.sample
@@ -58,6 +58,15 @@
 #
 #recovery_end_command = ''
 #
+#
+# wal_retrieve_retry_interval
+#
+# specifies an optional internal to wait for WAL to become available when
+# a failure occurred when reading WAL from a source for a node in standby
+# mode, or when WAL is expected from a source.
+#
+#wal_retrieve_retry_interval = '5s'
+#
 #---
 # RECOVERY TARGET PARAMETERS
 #---
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 629a457..111e53d 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -235,6 +235,7 @@ static TimestampTz recoveryTargetTime;
 static char *recoveryTargetName;
 static int	recovery_min_apply_delay = 0;
 static TimestampTz recoveryDelayUntilTime;
+static int	wal_retrieve_retry_interval = 5000;
 
 /* options taken from recovery.conf for XLOG streaming */
 static bool StandbyModeRequested = false;
@@ -4881,6 +4882,26 @@ readRecoveryCommandFile(void)
 	(errmsg_internal(trigger_file = '%s',
 	 TriggerFile)));
 		}
+		else if (strcmp(item-name, wal_retrieve_retry_interval) == 0)
+		{
+			const char *hintmsg;
+
+			if (!parse_int(item-value, wal_retrieve_retry_interval, GUC_UNIT_MS,
+		   hintmsg))
+ereport(ERROR,
+		(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+		 errmsg(parameter \%s\ requires a temporal value,
+wal_retrieve_retry_interval),
+		 hintmsg ? errhint(%s, _(hintmsg)) : 0));
+			ereport(DEBUG2,
+	(errmsg_internal(wal_retrieve_retry_interval = '%s', item-value)));
+
+			if (wal_retrieve_retry_interval = 0)
+ereport(ERROR,
+		(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+		 errmsg(\%s\ must have a value strictly positive,
+wal_retrieve_retry_interval)));
+		}
 		else if (strcmp(item-name, recovery_min_apply_delay) == 0)
 		{
 			const char *hintmsg;
@@ -10340,8 +10361,8 @@ static bool
 WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 			bool fetching_ckpt, XLogRecPtr tliRecPtr)
 {
-	static pg_time_t last_fail_time = 0;
-	pg_time_t	now;
+	TimestampTz now = GetCurrentTimestamp();
+	TimestampTz	last_fail_time = now;
 
 	/*---
 	 * Standby mode is implemented by a state machine:
@@ -10490,15 +10511,13 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 	 * machine, so we've exhausted all the options for
 	 * obtaining the requested WAL. We're

Re: [HACKERS] pg_basebackup may fail to send feedbacks.

2015-02-05 Thread Kyotaro HORIGUCHI

Sorry, I misunderstood that.

  At Wed, 4 Feb 2015 19:22:39 +0900, Fujii Masao masao.fu...@gmail.com 
  wrote in 
  cahgqgwgudgcmnhzinkd37i+jijdkruecrea1ncrs1mmte3r...@mail.gmail.com
  On Wed, Feb 4, 2015 at 4:58 PM, Kyotaro HORIGUCHI
  horiguchi.kyot...@lab.ntt.co.jp wrote:
   I'm very sorry for confused report. The problem found in 9.4.0
   and the diagnosis was mistakenly done on master.
  
   9.4.0 has no problem of feedback delay caused by slow xlog
   receiving on pg_basebackup mentioned in the previous mail. But
   the current master still has this problem.
 
  Seems walreceiver has the same problem. No?
 
  pg_receivexlog.c would have the same problem since it uses the
  same function with pg_basebackup.c.
 
  The correspondent of HandleCopyStream in wansender is
  WalReceiverMain, and it doesn't seem to have the same kind of
  loop shown below. It seems to surely send feedback per one
  record.
 
  |   r = stream_reader();
  |   while (r  0)
  |   {
  |  ... wal record processing stuff without sending feedback..
  |  r = stream_reader();
  |   }
 
 WalReceiverMain() has the similar code as follows.
 
 len = walrcv_receive(NAPTIME_PER_CYCLE, buf);
 if (len != 0)
 {
 for (;;)
 {
 if (len  0)
 {
 
 len = walrcv_receive(0, buf);
 }
 }

The loop seems a bit different but eventually the same about this
discussion.

408 len = walrcv_receive(NAPTIME_PER_CYCLE, buf);
409 if (len != 0)
410 {
415 for (;;)
416 {
417 if (len  0)
418 {
425XLogWalRcvProcessMsg(buf[0], buf[1], len - 1);
426 }
427-438 else {break;}
439 len = walrcv_receive(0, buf);
440 }
441 }

XLogWalRcvProcessMsg doesn't send feedback when processing
'w'=WAL record packet. So the same thing as pg_basebackup and
pg_receivexlog will occur on walsender.

Exiting the for(;;) loop by TimestampDifferenceExceeds just
before line 439 is an equivalent measure to I poposed for
receivelog.c, but calling XLogWalRcvSendHSFeedback(false) there
is seemingly simpler (but I feel a bit uncomfortable for the
latter)

Or other measures?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Reducing lock strength of trigger and foreign key DDL

On Fri, Jan 30, 2015 at 10:26 PM, Andreas Karlsson wrote:
 On 01/30/2015 07:48 AM, Michael Paquier wrote:
 Looking at the latest patch, it seems that in
 AlterTableGetLockLevel@tablecmds.c we ought to put AT_ReAddConstraint,
 AT_AddConstraintRecurse and AT_ProcessedConstraint under the same
 banner as AT_AddConstraint. Thoughts?

 Good point. I think moving those would be a good thing even though it is
 technically not necessary for AT_AddConstraintRecurse, since that one should
 only be related to check constraints.

Andreas, are you planning to send an updated patch?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] RangeType internal use

2015-02-05 Thread Amit Langote


Horiguchi-san,

On 06-02-2015 PM 04:34, Kyotaro HORIGUCHI wrote:
 Hi, from nearby:)
 

Thank you!

 I wonder why I cannot find a way to get a range type for a given (sub-)
 type. I would like to build a RangeType from Datum's of lower and upper
 bounds. Much like how construct_array() builds an ArrayType from a Datum
 array of elements given elements' type info.

 Is there some way I do not seem to know? If not, would it be worthwhile
 to make something like construct_range() that returns a RangeType given
 Datum's of lower and upper bounds and subtype info?
 
 make_range needs the range type itself.
 
 On SQL interfalce, you can get range type coresponds to a base
 type by looking up the pg_range catalog.
 
 SELECT rngtypid::regtype, rngsubtype::regtype
  FROM pg_range WHERE rngsubtype = 'int'::regtype;
 
  rngtypid  | rngsubtype 
 ---+
  int4range | integer
 
 But there's only one syscache for this catalog which takes range
 type id. So the reverse resolution rngsubtype-rngtype seems not
 available. TypeCahce has only comparison function info as surely
 available element related to range types but this wouldn't
 help. I think scanning the entire cache is not allowable even if
 possible.
 
 Perhaps what is needed is adding RANGESUBTYPE syscache but I
 don't know whether it is allowable or not.
 
 Thoughts?

Actually, I'm wondering if there is one-to-one mapping from rangetype to
subtype (and vice versa?), then this should be OK. But if not (that is
designers of range types thought there is not necessarily such a
mapping), then perhaps we could add, say, rngtypeisdefault flag to pg_range.

Perhaps following is not too pretty:

+
+/*
+ * get_rangetype_for_type
+ *
+ * returns a TypeCacheEntry for a range type of a given (sub-) type.
+ */
+TypeCacheEntry *
+get_rangetype_for_type(Oid subtypid)
+{
+   Relationrelation;
+   SysScanDesc scan;
+   HeapTuple   rangeTuple;
+   Oid rngsubtype;
+   Oid rngtypid = InvalidOid;
+
+   relation = heap_open(RangeRelationId, AccessShareLock);
+
+   scan = systable_beginscan(relation, InvalidOid, false,
+ NULL, 0, NULL);
+
+   while ((rangeTuple = systable_getnext(scan)) != NULL)
+   {
+   rngsubtype = ((Form_pg_range) 
GETSTRUCT(rangeTuple))-rngsubtype;
+
+   if (rngsubtype == subtypid)
+   {
+   rngtypid = ((Form_pg_range) 
GETSTRUCT(rangeTuple))-rngtypid;
+   break;
+   }
+   }
+
+   systable_endscan(scan);
+   heap_close(relation, AccessShareLock);
+
+   return(rngtypid != InvalidOid
+   ? lookup_type_cache(rngtypid, 
TYPECACHE_RANGE_INFO): NULL);
+}

Thanks,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-02-05 Thread Syed, Rahila

Hello,
 
/*
+* We recheck the actual size even if pglz_compress() report success,
+* because it might be satisfied with having saved as little as one byte
+* in the compressed data.
+*/
+   *len = (uint16) compressed_len;
+   if (*len = orig_len - 1)
+   return false;
+   return true;
+}

As per latest code ,when compression is 'on' we introduce additional 2 bytes in 
the header of each block image for storing raw_length of the compressed block. 
In order to achieve compression while accounting for these two additional 
bytes, we must ensure that compressed length is less than original length - 2.
So , IIUC the above condition should rather be 

If (*len = orig_len -2 )
return false;
return true;
 
The attached patch contains this. It also has a cosmetic change-  renaming 
compressBuf to uncompressBuf as it is used to store uncompressed page.
 
Thank you,
Rahila Syed

-Original Message-
From: pgsql-hackers-ow...@postgresql.org 
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Michael Paquier
Sent: Wednesday, January 07, 2015 9:32 AM
To: Rahila Syed
Cc: PostgreSQL mailing lists
Subject: Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

On Wed, Jan 7, 2015 at 12:51 AM, Rahila Syed rahilasyed...@gmail.com wrote:
 Following are some comments,
Thanks for the feedback.

uint16  hole_offset:15, /* number of bytes in hole */
 Typo in description of hole_offset
Fixed. That's before hole.

for (block_id = 0; block_id = record-max_block_id; block_id++)
-   {
-   if (XLogRecHasBlockImage(record, block_id))
-   fpi_len += BLCKSZ -
 record-blocks[block_id].hole_length;
-   }
+   fpi_len += record-blocks[block_id].bkp_len;

 IIUC, if condition, /if(XLogRecHasBlockImage(record, block_id))/ is 
 incorrectly removed from the above for loop.
Fixed.

typedef struct XLogRecordCompressedBlockImageHeader
 I am trying to understand the purpose behind declaration of the above 
 struct. IIUC, it is defined in order to introduce new field uint16 
 raw_length and it has been declared as a separate struct from 
 XLogRecordBlockImageHeader to not affect the size of WAL record when 
 compression is off.
 I wonder if it is ok to simply memcpy the uint16 raw_length in the 
 hdr_scratch when compression is on and not have a separate header 
 struct for it neither declare it in existing header. raw_length can be 
 a locally defined variable is XLogRecordAssemble or it can be a field 
 in registered_buffer struct like compressed_page.
 I think this can simplify the code.
 Am I missing something obvious?
You are missing nothing. I just introduced this structure for a matter of 
readability to show the two-byte difference between non-compressed and 
compressed header information. It is true that doing it my way makes the 
structures duplicated, so let's simply add the compression-related information 
as an extra structure added after XLogRecordBlockImageHeader if the block is 
compressed. I hope this addresses your concerns.

 /*
  * Fill in the remaining fields in the XLogRecordBlockImageHeader
  * struct and add new entries in the record chain.
 */

   bkpb.fork_flags |= BKPBLOCK_HAS_IMAGE;

 This code line seems to be misplaced with respect to the above comment.
 Comment indicates filling of XLogRecordBlockImageHeader fields while 
 fork_flags is a field of XLogRecordBlockHeader.
 Is it better to place the code close to following condition?
   if (needs_backup)
   {
Yes, this comment should not be here. I replaced it with the comment in HEAD.


+  *the original length of the
+ * block without its page hole being deducible from the compressed 
+ data
+ * itself.
 IIUC, this comment before XLogRecordBlockImageHeader seems to be no 
 longer valid as original length is not deducible from compressed data 
 and rather stored in header.
Aah, true. This was originally present in the header of PGLZ that has been 
removed to make it available for frontends.

Updated patches are attached.
Regards,
--
Michael

__
Disclaimer: This email and any attachments are sent in strictest confidence
for the sole use of the addressee and may contain legally privileged,
confidential, and proprietary data. If you are not the intended recipient,
please advise the sender by replying promptly to this email and then delete
and destroy this email and any attachments without any further use, copying
or forwarding.

Support-compression-for-full-page-writes-in-WAL_v15.patch
Description: Support-compression-for-full-page-writes-in-WAL_v15.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Unlikely-to-happen crash in ecpg driver caused by NULL-pointer check not done

2015-02-05 Thread Michael Meskes

On Tue, Feb 03, 2015 at 10:44:17PM +0900, Michael Paquier wrote:
 All those things are addressed in the patch attached.

Fixed a typo and commited. Thanks Michael for fixing and Heikki for
reviewing.

Michael

-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
Jabber: michael.meskes at gmail dot com
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GRANT USAGE on FOREIGN SERVER exposes passwords

2015-02-05 Thread Stephen Frost

* Robert Haas (robertmh...@gmail.com) wrote:
 On Tue, Feb 3, 2015 at 6:26 PM, Noah Yetter nyet...@gmail.com wrote:
  The obvious objection is, well you should just use foreign tables instead
  of dblink().  I'll cut a long story short by saying that doesn't work for
  us.  We are using postgres_fdw to allow our analysts to run queries against
  AWS Redshift and blend those results with tables in our OLTP schema.  If you
  know anything about Redshift, or about analysts, you'll realize immediately
  why foreign tables are not a viable solution.  Surely there are many others
  in a similar position, where the flexibility offered by dblink() makes it
  preferable to fixed foreign tables.
 
  S... what gives?  This seems like a really obvious security hole.  I've
  searched the mailing list archives repeatedly and found zero discussion of
  this issue.
 
 Maybe this is an impertinent question, but why do you care if the user
 has the password?  

Eh.  Password-reuse risk, policies, regulations and auditing all come to
mind.

 If she's got dblink access, she can run arbitrary
 SQL queries on the remote server anyway, which is all the password
 would let her do.  Also, she could use dblink to run ALTER ROLE foo
 PASSWORD '...' on the remote server, and then she'll *definitely* know
 the password.

And I thought this was about FDW options and not about dblink, really..

 I would suggest not relying on password authentication in this
 situation.  Instead, use pg_hba.conf to restrict connections by IP and
 SSL mode, and maybe consider SSL certificate authentication.

That's not actually an option here though, is it?  dblink_connect
requires a password-based authentication, unless you're a superuser
(which I'm pretty sure Noah Y would prefer these folks not be..).

Further, I don't think you get to control whatever the pg_hba.conf is on
the RedShift side..  I agree with the general sentiment that it'd be
better to use other authentication methods (SSL certificates or Kerberos
credentials), but we'd need to provide a way for those to work for
non-superusers.  Kerberos credential-forwarding comes to mind but I
don't know of anyone who is currently working on that and I doubt it'd
work with Redshift anyway.

 All that having been said, it wouldn't be crazy to try to invent a
 system to lock this down, but it *would* be complicated.  An
 individual FDW can call its authentication-related options anything it
 likes; they do not need to be called 'password'.  So we'd need a way
 to identify which options should be hidden from untrusted users, and
 then a bunch of mechanism to do that.

Agreed, we'd need to provide a way for FDWs to specify which options
should be hidden and which shouldn't be.  For my 2c, I do think that'd
be worthwhile to do.  We let users change their own passwords with ALTER
USER too, but they don't get to view it (or even the hash of it) in
pg_authid.

Thanks,

Stephen


signature.asc
Description: Digital signature

[HACKERS] Possible problem with pgcrypto

2015-02-05 Thread Jan Wieck


Hi,

I have encountered a small instability in the behavior of pgcrypto's 
pgp_sym_decrypt() function. Attached is a script that can reproduce the 
problem. It may have to be run repeatedly because the symptom occurs 
rather seldom.


What the script does is to encode a small string with pgp_sym_encrypt() 
and then repeatedly try to decrypt it with different wrong passwords. 
The expected error message for that is of course


Wrong key or corrupt data.

Every now and then, I get a different error message. Things I've seen are:

Not text data
pgcrypto bug

This seems to be triggered by a combination of the random data included 
in the encrypted data as well as the wrong password, because for an 
instance of encrypted data only certain passwords cause this symptom.


I wonder if this may actually be a bug in pgcrypto or if this is an 
error inherent in the way, the encrypted data is encoded. I.e. that the 
decryption algorithm cannot really figure out what is wrong and just 
sometimes gets a little further in the attempt to decrypt.



Jan

--
Jan Wieck
Senior Software Engineer
http://slony.info


pgcrypto_test.sh
Description: application/shellscript

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Redesigning checkpoint_segments

2015-02-05 Thread Josh Berkus

On 02/05/2015 01:28 PM, Robert Haas wrote:
 On Thu, Feb 5, 2015 at 2:11 PM, Josh Berkus j...@agliodbs.com wrote:
 Except that, when setting up servers for customers, one thing I pretty
 much always do for them is temporarily increase checkpoint_segments for
 the initial data load.  So having Postgres do this automatically would
 be a feature, not a bug.
 
 Right!
 
 I say we go with ~~ 1GB.  That's an 8X increase over current default
 size for the maximum
 
 Sounds great.
 
 Default of 4 for min_wal_size?
 
 I assume you mean 4 segments; why not 3 as currently?  As long as the
 system has the latitude to ratchet it up when needed, there seems to
 be little advantage to raising the minimum.  Of course I guess there
 must be some advantage or Heikki wouldn't have made it configurable,
 but I'd err on the side of keeping this one small.  Hopefully the
 system that automatically adjusts this is really smart, and a large
 min_wal_size is superfluous for most people.

Keep in mind that the current is actually 7, not three (3*2+1).  So 3
would be a siginficant decrease.  However, I don't feel strongly about
it either way.  I think that there is probably a minimum reasonable
value  1, but I'm not sure what it is.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] SSL renegotiation and other related woes


On 01/26/2015 12:14 PM, Andres Freund wrote:

Hi,

When working on getting rid of ImmediateInterruptOK I wanted to verify
that ssl still works correctly. Turned out it didn't. But neither did it
in master.

Turns out there's two major things we do wrong:

1) We ignore the rule that once called and returning
SSL_ERROR_WANTS_(READ|WRITE) SSL_read()/write() have to be called
again with the same parameters. Unfortunately that rule doesn't mean
just that the same parameters have to be passed in, but also that we
can't just constantly switch between _read()/write(). Especially
nonblocking backend code (i.e. walsender) and the whole frontend code
violate this rule.


I don't buy that. Googling around, and looking at the man pages, I just 
don't see anything to support that claim. I believe it is supported to 
switch between reads and writes.



2) We start renegotiations in be_tls_write() while in nonblocking mode,
but don't properly retry to handle socket readyness. There's a loop
that retries handshakes twenty times (???), but what actually is
needed is to call SSL_get_error() and ensure that there's actually
data available.


Yeah, that's just crazy.

Actually, I don't think we need to call SSL_do_handshake() at all. At 
least on modern OpenSSL versions. OpenSSL internally uses a state 
machine, and SSL_read() and SSL_write() calls will complete the 
handshake just as well.



2) is easy enough to fix, but 1) is pretty hard. Before anybody says
that 1) isn't an important rule: It reliably causes connection aborts
within a couple renegotiations. The higher the latency the higher the
likelihood of aborts. Even locally it doesn't take very long to
abort. Errors usually are something like SSL connection has been closed
unexpectedly or SSL Error: sslv3 alert unexpected message and a host
of other similar messages. There's a couple reports of those in the
archives and I've seen many more in client logs.


Yeah, I can reproduce that. There's clearly something wrong.

I believe this is an OpenSSL bug. I traced the unexpected message 
error to this code in OpenSSL's s3_read_bytes() function:



case SSL3_RT_APPLICATION_DATA:
/*
 * At this point, we were expecting handshake data, but have
 * application data.  If the library was running inside ssl3_read()
 * (i.e. in_read_app_data is set) and it makes sense to read
 * application data at this point (session renegotiation not yet
 * started), we will indulge it.
 */
if (s-s3-in_read_app_data 
(s-s3-total_renegotiations != 0) 
(((s-state  SSL_ST_CONNECT) 
  (s-state = SSL3_ST_CW_CLNT_HELLO_A) 
  (s-state = SSL3_ST_CR_SRVR_HELLO_A)
 ) || ((s-state  SSL_ST_ACCEPT) 
   (s-state = SSL3_ST_SW_HELLO_REQ_A) 
   (s-state = SSL3_ST_SR_CLNT_HELLO_A)
 )
)) {
s-s3-in_read_app_data = 2;
return (-1);
} else {
al = SSL_AD_UNEXPECTED_MESSAGE;
SSLerr(SSL_F_SSL3_READ_BYTES, SSL_R_UNEXPECTED_RECORD);
goto f_err;
}


So that quite clearly throws an error, *unless* the original application 
call was SSL_read(). As you also concluded, OpenSSL doesn't like it when 
SSL_write() is called when it's in renegotiation. I think that's 
OpenSSL's fault and it should indulge the application data message 
even in SSL_write().


Can we work-around that easily? I believe we can. The crucial part is 
that we mustn't let SSL_write() to process any incoming application 
data. We can achieve that if we always call SSL_read() to drain such 
data, before calling SSL_write(). But that's not quite enough; if we're 
in renegotiation, SSL_write() might still try to read more data from the 
socket that has arrived after the SSL_read() call. Fortunately, we can 
easily prevent that by hacking pqsecure_raw_read() to always return 
EWOULDBLOCK, when it's called through SSL_write().


The attached patch does that. I've been running your pg_receivexlog test 
case with this for about 15 minutes without any errors now, with 
ssl_renegotiation_limit=50kB, while before it errored out within seconds.


In theory, I guess we should do similar hacks in the server, and always 
call SSL_read() before SSL_write(), but it seems to work without it. Or 
maybe not; OpenSSL server and client code is not symmetric, so perhaps 
it works in server mode without that.


PS. The server code started renegotiation when it's 1kB from reaching 
ssl_renegotiation_limit. I increased that to 32kB, because in testing, I 
got some SSL failed to renegotiate connection before limit expired 
errors in testing before I did that.


PPS. I did all testing of this patch with my sleeping logic 
simplification patch applied 
(http://www.postgresql.org/message-id/54d3821e.9060...@vmware.com). It 
applies without it too, and I don't think it matters, but I thought

Re: [HACKERS] Redesigning checkpoint_segments

On Thu, Feb 5, 2015 at 2:11 PM, Josh Berkus j...@agliodbs.com wrote:
 Except that, when setting up servers for customers, one thing I pretty
 much always do for them is temporarily increase checkpoint_segments for
 the initial data load.  So having Postgres do this automatically would
 be a feature, not a bug.

Right!

 I say we go with ~~ 1GB.  That's an 8X increase over current default
 size for the maximum

Sounds great.

 Default of 4 for min_wal_size?

I assume you mean 4 segments; why not 3 as currently?  As long as the
system has the latitude to ratchet it up when needed, there seems to
be little advantage to raising the minimum.  Of course I guess there
must be some advantage or Heikki wouldn't have made it configurable,
but I'd err on the side of keeping this one small.  Hopefully the
system that automatically adjusts this is really smart, and a large
min_wal_size is superfluous for most people.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Redesigning checkpoint_segments


On 02/05/2015 11:28 PM, Robert Haas wrote:

On Thu, Feb 5, 2015 at 2:11 PM, Josh Berkus j...@agliodbs.com wrote:

Default of 4 for min_wal_size?


I assume you mean 4 segments; why not 3 as currently?  As long as the
system has the latitude to ratchet it up when needed, there seems to
be little advantage to raising the minimum.  Of course I guess there
must be some advantage or Heikki wouldn't have made it configurable,
but I'd err on the side of keeping this one small.  Hopefully the
system that automatically adjusts this is really smart, and a large
min_wal_size is superfluous for most people.


There are a few reasons for making the minimum configurable:

1. Creating new segments when you need them is not free, so if you have 
a workload with occasional very large spikes, you might want to prepare 
for them. The auto-tuning will accommodate for the peak usage, but it's 
a moving average so if the peaks are infrequent enough, it will shrink 
the size down between the spikes.


2. To avoid running out of disk space on write to WAL (which leads to a 
PANIC). In particular, if you have the WAL on the same filesystem as the 
data, pre-reserving all the space required for WAL makes it much more 
likely that you when you run out of disk space, you run out when writing 
regular data, not WAL.


3. Unforeseen issues with the auto-tuning. It might not suite everyone, 
so it's nice that you can still get the old behaviour by setting min=max.


Actually, perhaps we should have a boolean setting that just implies 
min=max, instead of having a configurable minimum?. That would cover all 
of those reasons pretty well. So we would have a max_wal_size setting, 
and a boolean preallocate_all_wal = on | off. Would anyone care for 
the flexibility of setting a minimum that's different from the maximum?


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Possible problem with pgcrypto

2015-02-05 Thread Jan Wieck


On 02/05/2015 10:58 AM, Tom Lane wrote:

Jan Wieck j...@wi3ck.info writes:

I have encountered a small instability in the behavior of pgcrypto's
pgp_sym_decrypt() function. Attached is a script that can reproduce the
problem. It may have to be run repeatedly because the symptom occurs
rather seldom.



What the script does is to encode a small string with pgp_sym_encrypt()
and then repeatedly try to decrypt it with different wrong passwords.
The expected error message for that is of course
 Wrong key or corrupt data.



Every now and then, I get a different error message. Things I've seen are:


Have you tested this with this week's releases?  We fixed some
memory-mishandling bugs in pgcrypto ...


The posted script reproduces the symptom in today's checkout of master 
as well as REL9_4_STABLE.



Jan

--
Jan Wieck
Senior Software Engineer
http://slony.info


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Redesigning checkpoint_segments

2015-02-05 Thread Josh Berkus

On 02/04/2015 04:16 PM, David Steele wrote:
 On 2/4/15 3:06 PM, Robert Haas wrote:
 Hmmm, I see your point.  I spend a lot of time on AWS and in
 container-world, where disk space is a lot more constrained.  However,
 it probably makes more sense to recommend non-default settings for that
 environment, since it requires non-default settings anyway.

 So, 384MB?
 That's certainly better, but I think we should go further.  Again,
 you're not committed to using this space all the time, and if you are
 using it you must have a lot of write activity, which means you are
 not running on a tin can and a string.  If you have a little tiny
 database, say 100MB, running on a little-tiny Amazon instance,
 handling a small number of transactions, you're going to stay close to
 wal_min_size anyway.  Right?
 
 The main exception I can think of is when using dump/restore to upgrade
 instead of pg_upgrade.  This would generate a lot of WAL for what could
 otherwise be a low-traffic database.

Except that, when setting up servers for customers, one thing I pretty
much always do for them is temporarily increase checkpoint_segments for
the initial data load.  So having Postgres do this automatically would
be a feature, not a bug.

I say we go with ~~ 1GB.  That's an 8X increase over current default
size for the maximum

Default of 4 for min_wal_size?

On 02/04/2015 07:37 PM, Amit Kapila wrote: On Thu, Feb 5, 2015 at 3:11
AM, Josh Berkus j...@agliodbs.com
 If we did add one, I'd suggest calling it wal_size_limit or something
 similar.

 I think both the names (max_wal_size and wal_size_limit) seems to
 indicate the same same thing.  Few more suggestions:
 typical_wal_size, wal_size_soft_limit?

Again, you're suggesting more complicated (and difficult to translate,
and for that matter misleading) names in order to work around a future
feature which nobody is currently working on, and we may never have.
Let's keep clear and simple parameter names which most people can
understand, instead of making things complicated for the sake of complexity.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Possible problem with pgcrypto

2015-02-05 Thread Jan Wieck


On 02/05/2015 01:18 PM, Marko Tiikkaja wrote:

On 2/5/15 4:48 PM, Jan Wieck wrote:

What the script does is to encode a small string with pgp_sym_encrypt()
and then repeatedly try to decrypt it with different wrong passwords.
The expected error message for that is of course

  Wrong key or corrupt data.

Every now and then, I get a different error message. Things I've seen are:

  Not text data


That's not unexpected; the check for whether the data is text or not
appears to happen quite early in the process of decoding.  So it's
enough to get to that point without anything being obviously broken.


I suspected something like that.



In addition to the two errors above, it doesn't appear to be too
difficult to see PXE_MBUF_SHORT_READ, which would give you  ERROR:
Corrupt data.  I wonder why that error message is different, though.


From reading the code as far I did, I expected to see that, but haven't 
seen it yet.





  pgcrypto bug


That doesn't look too good, but I can't reproduce it against 9.3.6 either.


Let me improve the script to a point where it can run for a long time in 
the background and collect all different error cases as examples of 
encrypted data and wrong password.



Thanks so far.
Jan

--
Jan Wieck
Senior Software Engineer
http://slony.info


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

On Thu, Feb 5, 2015 at 11:06 PM, Syed, Rahila rahila.s...@nttdata.com wrote:
/*
+* We recheck the actual size even if pglz_compress() report success,
+* because it might be satisfied with having saved as little as one byte
+* in the compressed data.
+*/
+   *len = (uint16) compressed_len;
+   if (*len = orig_len - 1)
+   return false;
+   return true;
+}

 As per latest code ,when compression is 'on' we introduce additional 2 bytes 
 in the header of each block image for storing raw_length of the compressed 
 block.
 In order to achieve compression while accounting for these two additional 
 bytes, we must ensure that compressed length is less than original length - 2.
 So , IIUC the above condition should rather be

 If (*len = orig_len -2 )
 return false;
 return true;
 The attached patch contains this. It also has a cosmetic change-  renaming 
 compressBuf to uncompressBuf as it is used to store uncompressed page.

Agreed on both things.

Just looking at your latest patch after some time to let it cool down,
I noticed a couple of things.

 #define MaxSizeOfXLogRecordBlockHeader \
 (SizeOfXLogRecordBlockHeader + \
- SizeOfXLogRecordBlockImageHeader + \
+ SizeOfXLogRecordBlockImageHeader, \
+ SizeOfXLogRecordBlockImageCompressionInfo + \
There is a comma here instead of a sum sign. We should really sum up
all those sizes to evaluate the maximum size of a block header.

+ * Permanently allocate readBuf uncompressBuf.  We do it this way,
+ * rather than just making a static array, for two reasons:
This comment is just but weird, readBuf AND uncompressBuf is more appropriate.

+ * We recheck the actual size even if pglz_compress() report success,
+ * because it might be satisfied with having saved as little as one byte
+ * in the compressed data. We add two bytes to store raw_length  with the
+ * compressed image. So for compression to be effective
compressed_len should
+ * be atleast  orig_len - 2
This comment block should be reworked, and misses a dot at its end. I
rewrote it like that, hopefully that's clearer:
+   /*
+* We recheck the actual size even if pglz_compress() reports
success and see
+* if at least 2 bytes of length have been saved, as this
corresponds to the
+* additional amount of data stored in WAL record for a compressed block
+* via raw_length.
+*/

In any case, those things have been introduced by what I did in
previous versions... And attached is a new patch.
-- 
Michael
From 2b0c54bc7c4bfb2494cf8b0394d56635b01d7c5a Mon Sep 17 00:00:00 2001
From: Michael Paquier mich...@otacoo.com
Date: Tue, 25 Nov 2014 14:24:26 +0900
Subject: [PATCH] Support compression for full-page writes in WAL

Compression is controlled with a new parameter called wal_compression.
This parameter can be changed at session level to control WAL compression.
---
 contrib/pg_xlogdump/pg_xlogdump.c |  17 ++-
 doc/src/sgml/config.sgml  |  29 +
 src/backend/access/transam/xlog.c |   1 +
 src/backend/access/transam/xloginsert.c   | 161 ++
 src/backend/access/transam/xlogreader.c   |  71 +---
 src/backend/utils/misc/guc.c  |   9 ++
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/access/xlog.h |   1 +
 src/include/access/xlogreader.h   |   7 +-
 src/include/access/xlogrecord.h   |  49 ++--
 src/include/pg_config.h.in|   4 +-
 11 files changed, 297 insertions(+), 53 deletions(-)

diff --git a/contrib/pg_xlogdump/pg_xlogdump.c b/contrib/pg_xlogdump/pg_xlogdump.c
index c1bfbc2..3ebaac6 100644
--- a/contrib/pg_xlogdump/pg_xlogdump.c
+++ b/contrib/pg_xlogdump/pg_xlogdump.c
@@ -363,14 +363,14 @@ XLogDumpCountRecord(XLogDumpConfig *config, XLogDumpStats *stats,
 	 * takes up BLCKSZ bytes, minus the hole length.
 	 *
 	 * XXX: We peek into xlogreader's private decoded backup blocks for the
-	 * hole_length. It doesn't seem worth it to add an accessor macro for
+	 * length of block. It doesn't seem worth it to add an accessor macro for
 	 * this.
 	 */
 	fpi_len = 0;
 	for (block_id = 0; block_id = record-max_block_id; block_id++)
 	{
 		if (XLogRecHasBlockImage(record, block_id))
-			fpi_len += BLCKSZ - record-blocks[block_id].hole_length;
+			fpi_len += record-blocks[block_id].bkp_len;
 	}
 
 	/* Update per-rmgr statistics */
@@ -465,9 +465,16 @@ XLogDumpDisplayRecord(XLogDumpConfig *config, XLogReaderState *record)
    blk);
 			if (XLogRecHasBlockImage(record, block_id))
 			{
-printf( (FPW); hole: offset: %u, length: %u\n,
-	   record-blocks[block_id].hole_offset,
-	   record-blocks[block_id].hole_length);
+if (record-blocks[block_id].is_compressed)
+	printf( (FPW compressed); hole offset: %u, 
+		   compressed length: %u, original length: %u\n,
+		   record-blocks[block_id].hole_offset,

Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-02-05 Thread Fujii Masao

On Tue, Jan 6, 2015 at 11:09 AM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Mon, Jan 5, 2015 at 10:29 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Sun, Dec 28, 2014 at 10:57 PM, Michael Paquier wrote:
 The patch 1 cannot be applied to the master successfully because of
 recent change.
 Yes, that's caused by ccb161b. Attached are rebased versions.

 - The real stuff comes with patch 2, that implements the removal of
 PGLZ_Header, changing the APIs of compression and decompression to pglz to
 not have anymore toast metadata, this metadata being now localized in
 tuptoaster.c. Note that this patch protects the on-disk format (tested with
 pg_upgrade from 9.4 to a patched HEAD server). Here is how the APIs of
 compression and decompression look like with this patch, simply performing
 operations from a source to a destination:
 extern int32 pglz_compress(const char *source, int32 slen, char *dest,
   const PGLZ_Strategy *strategy);
 extern int32 pglz_decompress(const char *source, char *dest,
   int32 compressed_size, int32 raw_size);
 The return value of those functions is the number of bytes written in the
 destination buffer, and 0 if operation failed.

 So it's guaranteed that 0 is never returned in success case? I'm not sure
 if that case can really happen, though.
 This is an inspiration from lz4 APIs. Wouldn't it be buggy for a
 compression algorithm to return a size of 0 bytes as compressed or
 decompressed length btw? We could as well make it return a negative
 value when a failure occurs if you feel more comfortable with it.

I feel that's better. Attached is the updated version of the patch.
I changed the pg_lzcompress and pg_lzdecompress so that they return -1
when failure happens. Also I applied some cosmetic changes to the patch
(e.g., shorten the long name of the newly-added macros).
Barring any objection, I will commit this.

Regards,

-- 
Fujii Masao
*** a/src/backend/access/heap/tuptoaster.c
--- b/src/backend/access/heap/tuptoaster.c
***
*** 35,43 
  #include access/tuptoaster.h
  #include access/xact.h
  #include catalog/catalog.h
  #include miscadmin.h
  #include utils/fmgroids.h
- #include utils/pg_lzcompress.h
  #include utils/rel.h
  #include utils/typcache.h
  #include utils/tqual.h
--- 35,43 
  #include access/tuptoaster.h
  #include access/xact.h
  #include catalog/catalog.h
+ #include common/pg_lzcompress.h
  #include miscadmin.h
  #include utils/fmgroids.h
  #include utils/rel.h
  #include utils/typcache.h
  #include utils/tqual.h
***
*** 45,50 
--- 45,70 
  
  #undef TOAST_DEBUG
  
+ /*
+  *	The information at the start of the compressed toast data.
+  */
+ typedef struct toast_compress_header
+ {
+ 	int32		vl_len_;	/* varlena header (do not touch directly!) */
+ 	int32		rawsize;
+ } toast_compress_header;
+ 
+ /*
+  * Utilities for manipulation of header information for compressed
+  * toast entries.
+  */
+ #define	TOAST_COMPRESS_HDRSZ		((int32) sizeof(toast_compress_header))
+ #define TOAST_COMPRESS_RAWSIZE(ptr)	(((toast_compress_header *) ptr)-rawsize)
+ #define TOAST_COMPRESS_RAWDATA(ptr) \
+ 	(((char *) ptr) + TOAST_COMPRESS_HDRSZ)
+ #define TOAST_COMPRESS_SET_RAWSIZE(ptr, len) \
+ 	(((toast_compress_header *) ptr)-rawsize = len)
+ 
  static void toast_delete_datum(Relation rel, Datum value);
  static Datum toast_save_datum(Relation rel, Datum value,
   struct varlena * oldexternal, int options);
***
*** 53,58  static bool toastid_valueid_exists(Oid toastrelid, Oid valueid);
--- 73,79 
  static struct varlena *toast_fetch_datum(struct varlena * attr);
  static struct varlena *toast_fetch_datum_slice(struct varlena * attr,
  		int32 sliceoffset, int32 length);
+ static struct varlena *toast_decompress_datum(struct varlena * attr);
  static int toast_open_indexes(Relation toastrel,
     LOCKMODE lock,
     Relation **toastidxs,
***
*** 138,148  heap_tuple_untoast_attr(struct varlena * attr)
  		/* If it's compressed, decompress it */
  		if (VARATT_IS_COMPRESSED(attr))
  		{
! 			PGLZ_Header *tmp = (PGLZ_Header *) attr;
! 
! 			attr = (struct varlena *) palloc(PGLZ_RAW_SIZE(tmp) + VARHDRSZ);
! 			SET_VARSIZE(attr, PGLZ_RAW_SIZE(tmp) + VARHDRSZ);
! 			pglz_decompress(tmp, VARDATA(attr));
  			pfree(tmp);
  		}
  	}
--- 159,166 
  		/* If it's compressed, decompress it */
  		if (VARATT_IS_COMPRESSED(attr))
  		{
! 			struct varlena *tmp = attr;
! 			attr = toast_decompress_datum(tmp);
  			pfree(tmp);
  		}
  	}
***
*** 163,173  heap_tuple_untoast_attr(struct varlena * attr)
  		/*
  		 * This is a compressed value inside of the main tuple
  		 */
! 		PGLZ_Header *tmp = (PGLZ_Header *) attr;
! 
! 		attr = (struct varlena *) palloc(PGLZ_RAW_SIZE(tmp) + VARHDRSZ);
! 		SET_VARSIZE(attr, PGLZ_RAW_SIZE(tmp) + VARHDRSZ);
! 		pglz_decompress(tmp, VARDATA(attr));
  	}
  	else if (VARATT_IS_SHORT(attr))
  	{
---

Re: [HACKERS] Redesigning checkpoint_segments

2015-02-05 Thread Andres Freund

On 2015-02-05 09:42:37 -0500, Robert Haas wrote:
 I previously proposed 100 segments, or 1.6GB.  If that seems too
 large, how about 64 segments, or 1.024GB?  I think there will be few
 people who can't tolerate a gigabyte of xlog under peak load, and an
 awful lot who will benefit from it.

It'd be quite easier to go there if we'd shrink back to the min_size
after a while, after having peaked above it. IIUC the patch doesn't do
that?

Admittedly it's not easy to come up with an algorithm that doesn't cause
superflous file removals. Initiating wal files isn't cheap.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GRANT USAGE on FOREIGN SERVER exposes passwords

2015-02-05 Thread Tom Lane

Robert Haas robertmh...@gmail.com writes:
 All that having been said, it wouldn't be crazy to try to invent a
 system to lock this down, but it *would* be complicated.  An
 individual FDW can call its authentication-related options anything it
 likes; they do not need to be called 'password'.  So we'd need a way
 to identify which options should be hidden from untrusted users, and
 then a bunch of mechanism to do that.

It's also debatable whether this wouldn't be a violation of the SQL
standard.  I see nothing in the SQL-MED spec authorizing filtering
of the information_schema.user_mapping_options view.

We actually are doing some filtering of values in user_mapping_options,
but it's all-or-nothing so far as the options for any one mapping go.
That's still not exactly supportable per spec but it's probably less of a
violation than option-by-option filtering would be.

It also looks like that filtering differs in corner cases from what the
regular pg_user_mappings view does, which is kinda silly.  In particular
I think we should try to get rid of the explicit provision for superuser
access.

I was hoping Peter would weigh in on what his design considerations
were for these views ...

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Simplify sleeping while reading/writing from client

Looking again at the code after Andres' interrupt-handling patch series, 
I got confused by the fact that there are several wait-retry loops in 
different layers, and reading and writing works slightly differently.


I propose the attached, which pulls all the wait-retry logic up to 
secure_read() and secure_write(). This makes the code a lot more 
understandable.


- Heikki
From 4e73dbc72d279c2cb66af0a16357eab17f1c740b Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas heikki.linnakangas@iki.fi
Date: Thu, 5 Feb 2015 16:44:13 +0200
Subject: [PATCH 1/1] Simplify waiting logic in reading from / writing to
 client.

The client socket is always in non-blocking mode, and if we actually want
blocking behaviour, we emulate it by sleeping and retrying. But we retry
loops at different layers for reads and writes, which was confusing.
To simplify, remove all the sleeping and retrying code from the lower
levels, from be_tls_read and secure_raw_read and secure_raw_write, and put
all the logic in secure_read() and secure_write().
---
 src/backend/libpq/be-secure-openssl.c |  81 +--
 src/backend/libpq/be-secure.c | 143 ++
 src/backend/libpq/pqcomm.c|   3 +-
 src/include/libpq/libpq-be.h  |   4 +-
 4 files changed, 79 insertions(+), 152 deletions(-)

diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index d5f9712..39587e8 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -511,14 +511,11 @@ be_tls_close(Port *port)
  *	Read data from a secure connection.
  */
 ssize_t
-be_tls_read(Port *port, void *ptr, size_t len)
+be_tls_read(Port *port, void *ptr, size_t len, int *waitfor)
 {
 	ssize_t		n;
 	int			err;
-	int			waitfor;
-	int			latchret;
 
-rloop:
 	errno = 0;
 	n = SSL_read(port-ssl, ptr, len);
 	err = SSL_get_error(port-ssl, n);
@@ -528,39 +525,15 @@ rloop:
 			port-count += n;
 			break;
 		case SSL_ERROR_WANT_READ:
+			*waitfor = WL_SOCKET_READABLE;
+			errno = EWOULDBLOCK;
+			n = -1;
+			break;
 		case SSL_ERROR_WANT_WRITE:
-			/* Don't retry if the socket is in nonblocking mode. */
-			if (port-noblock)
-			{
-errno = EWOULDBLOCK;
-n = -1;
-break;
-			}
-
-			waitfor = WL_LATCH_SET;
-
-			if (err == SSL_ERROR_WANT_READ)
-waitfor |= WL_SOCKET_READABLE;
-			else
-waitfor |= WL_SOCKET_WRITEABLE;
-
-			latchret = WaitLatchOrSocket(MyLatch, waitfor, port-sock, 0);
-
-			/*
-			 * We'll, among other situations, get here if the low level
-			 * routine doing the actual recv() via the socket got interrupted
-			 * by a signal. That's so we can handle interrupts once outside
-			 * openssl, so we don't jump out from underneath its covers. We
-			 * can check this both, when reading and writing, because even
-			 * when writing that's just openssl's doing, not a 'proper' write
-			 * initiated by postgres.
-			 */
-			if (latchret  WL_LATCH_SET)
-			{
-ResetLatch(MyLatch);
-ProcessClientReadInterrupt(true);  /* preserves errno */
-			}
-			goto rloop;
+			*waitfor = WL_SOCKET_WRITEABLE;
+			errno = EWOULDBLOCK;
+			n = -1;
+			break;
 		case SSL_ERROR_SYSCALL:
 			/* leave it to caller to ereport the value of errno */
 			if (n != -1)
@@ -595,12 +568,10 @@ rloop:
  *	Write data to a secure connection.
  */
 ssize_t
-be_tls_write(Port *port, void *ptr, size_t len)
+be_tls_write(Port *port, void *ptr, size_t len, int *waitfor)
 {
 	ssize_t		n;
 	int			err;
-	int			waitfor;
-	int			latchret;
 
 	/*
 	 * If SSL renegotiations are enabled and we're getting close to the
@@ -653,7 +624,6 @@ be_tls_write(Port *port, void *ptr, size_t len)
 		}
 	}
 
-wloop:
 	errno = 0;
 	n = SSL_write(port-ssl, ptr, len);
 	err = SSL_get_error(port-ssl, n);
@@ -663,30 +633,15 @@ wloop:
 			port-count += n;
 			break;
 		case SSL_ERROR_WANT_READ:
+			*waitfor = WL_SOCKET_READABLE;
+			errno = EWOULDBLOCK;
+			n = -1;
+			break;
 		case SSL_ERROR_WANT_WRITE:
-
-			waitfor = WL_LATCH_SET;
-
-			if (err == SSL_ERROR_WANT_READ)
-waitfor |= WL_SOCKET_READABLE;
-			else
-waitfor |= WL_SOCKET_WRITEABLE;
-
-			latchret = WaitLatchOrSocket(MyLatch, waitfor, port-sock, 0);
-
-			/*
-			 * Check for interrupts here, in addition to secure_write(),
-			 * because an interrupted write in secure_raw_write() will return
-			 * here, and we cannot return to secure_write() until we've
-			 * written something.
-			 */
-			if (latchret  WL_LATCH_SET)
-			{
-ResetLatch(MyLatch);
-ProcessClientWriteInterrupt(true); /* preserves errno */
-			}
-
-			goto wloop;
+			*waitfor = WL_SOCKET_WRITEABLE;
+			errno = EWOULDBLOCK;
+			n = -1;
+			break;
 		case SSL_ERROR_SYSCALL:
 			/* leave it to caller to ereport the value of errno */
 			if (n != -1)
diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index c2c1842..4e7acbe 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -127,30 +127,45 @@ ssize_t

Re: [HACKERS] GRANT USAGE on FOREIGN SERVER exposes passwords

On Tue, Feb 3, 2015 at 6:26 PM, Noah Yetter nyet...@gmail.com wrote:
 The obvious objection is, well you should just use foreign tables instead
 of dblink().  I'll cut a long story short by saying that doesn't work for
 us.  We are using postgres_fdw to allow our analysts to run queries against
 AWS Redshift and blend those results with tables in our OLTP schema.  If you
 know anything about Redshift, or about analysts, you'll realize immediately
 why foreign tables are not a viable solution.  Surely there are many others
 in a similar position, where the flexibility offered by dblink() makes it
 preferable to fixed foreign tables.

 S... what gives?  This seems like a really obvious security hole.  I've
 searched the mailing list archives repeatedly and found zero discussion of
 this issue.

Maybe this is an impertinent question, but why do you care if the user
has the password?  If she's got dblink access, she can run arbitrary
SQL queries on the remote server anyway, which is all the password
would let her do.  Also, she could use dblink to run ALTER ROLE foo
PASSWORD '...' on the remote server, and then she'll *definitely* know
the password.

I would suggest not relying on password authentication in this
situation.  Instead, use pg_hba.conf to restrict connections by IP and
SSL mode, and maybe consider SSL certificate authentication.

All that having been said, it wouldn't be crazy to try to invent a
system to lock this down, but it *would* be complicated.  An
individual FDW can call its authentication-related options anything it
likes; they do not need to be called 'password'.  So we'd need a way
to identify which options should be hidden from untrusted users, and
then a bunch of mechanism to do that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Redesigning checkpoint_segments


On 02/05/2015 04:47 PM, Andres Freund wrote:

On 2015-02-05 09:42:37 -0500, Robert Haas wrote:

I previously proposed 100 segments, or 1.6GB.  If that seems too
large, how about 64 segments, or 1.024GB?  I think there will be few
people who can't tolerate a gigabyte of xlog under peak load, and an
awful lot who will benefit from it.


It'd be quite easier to go there if we'd shrink back to the min_size
after a while, after having peaked above it. IIUC the patch doesn't do
that?


It doesn't actively go and remove files once they've already been 
recycled, but if the system stays relatively idle for several 
checkpoints, the WAL usage will shrink down again. That's the core idea 
of the patch.


If the system stays completely or almost completely idle, that won't 
happen though, because then it will never switch to a new segment so 
none of the segments become old so that they could be removed.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Early Setup of instrumentation information in pg_stat_statements

2015-02-05 Thread Amit Kapila

Currently in pg_stat_statements, the setup to track
instrumentation/totaltime information is done after
ExecutorStart().  Can we do this before ExecutorStart()?
In particular, I am referring to below code:

static void
pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
..
standard_ExecutorStart(queryDesc, eflags);
..
if (pgss_enabled()  queryDesc-plannedstmt-queryId != 0)
{
..
if (queryDesc-totaltime == NULL)
{
..
queryDesc-totaltime = InstrAlloc(1, INSTRUMENT_ALL);
..
}
}
}

The reason why I am asking is that to track instrumentation
information (like buffer usage parameters) in case of parallel
sequence scan, we need to pass this information at start of
backend workers which are started in ExecutorStart() phase
and at that time, there is no information available which can
guarantee (we have queryId stored in planned stmt, but I think
that is not sufficient to decide) that such an information is
needed by plugin.  This works well for Explain statement as
that has the information for instrumentation available before
ExecutorStart() phase.

Please suggest me if there is a better way to make this
information available in ExecutorStart() phase?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Redesigning checkpoint_segments

On Wed, Feb 4, 2015 at 4:41 PM, Josh Berkus j...@agliodbs.com wrote:
 That's certainly better, but I think we should go further.  Again,
 you're not committed to using this space all the time, and if you are
 using it you must have a lot of write activity, which means you are
 not running on a tin can and a string.  If you have a little tiny
 database, say 100MB, running on a little-tiny Amazon instance,
 handling a small number of transactions, you're going to stay close to
 wal_min_size anyway.  Right?

 Well, we can test that.

 So what's your proposed size?

I previously proposed 100 segments, or 1.6GB.  If that seems too
large, how about 64 segments, or 1.024GB?  I think there will be few
people who can't tolerate a gigabyte of xlog under peak load, and an
awful lot who will benefit from it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [GENERAL] 4B row limit for CLOB tables

2015-02-05 Thread Matthew Kelly

 That's assuming that toasting is evenly spread between tables. In my 
 experience, that's not a great bet...

Time to create a test:
SELECT chunk_id::bigint/10 as id_range, count(*), count(*)/(10::float) 
density FROM (SELECT chunk_id FROM pg_toast.pg_toast_39000165 WHERE chunk_id 
 1 AND chunk_seq = 0) f GROUP BY id_range ORDER BY id_range;

The machine in question was restored in parallel in Sept 2013 as part of an 
upgrade from 8.4.  It has about 2000 tables, so its definitely not dominated by 
a couple tables. Progress towards oid wrap around is about 25.6%.

With minimal effort, I found 2 bad examples, and I’m sure I can easily find 
more. I attached the results for those two.

There were runs of 1,100,000+ and 600,000+ chunk_ids where more than 99% of the 
chunk_id are taken.  After restore completion, oid densities averaged less than 
20 per 100,000 and 400 per 100,000 respectively.  The only reasons those runs 
seem to be so short is because the tables were much smaller back then.  I 
expect that next time I dump restore (necessary for upgrading OS versions due 
to the collation issue), I’m going to have runs closer to 20,,000.

 ... this fix would actually make things enormously worse.  With the
 single counter feeding all tables, you at least have a reasonable
 probability that there are not enormously long runs of consecutive OIDs in
 any one toast table.  With a sequence per table, you are nearly guaranteed
 that there are such runs, because inserts into other tables don't create a
 break.

It makes each toast table independent (and far less likely to wrap) .  It would 
wrap when the sum(mods on THIS toast table)  2^32.  Right now the function 
looks like:

sum(mods on ALL toast tables in cluster) + sum(created normal tables in cluster 
* k) + sum(created temp tables in cluster * k) + [...]  2^32,
WHERE k average number of ids consumed for pg_class, pg_type, etc...

In the case of an insert only table (which is a common use case for 
partitions), the id would only wrap when the TOAST table was “full”.  On the 
other hand currently, it would wrap into its pg_restored section when the 
combined oid consuming operations on the cluster surpassed 4 billion.

That being said, I’m certainly not attached to that solution.  My real argument 
is that although its not a problem today, we are only about 5 years from it 
being a problem for large installs and the first time you’ll hear about it is 
after someone has a 5 minute production outage on a database thats been taking 
traffic for 2 years.

- Matt K.


id_range | count | density 
-+---+-
 390 | 92188 | 0.92188
 391 | 99186 | 0.99186
 392 | 99826 | 0.99826
 393 | 99101 | 0.99101
 394 | 99536 | 0.99536
 395 | 99796 | 0.99796
 396 | 99321 | 0.99321
 397 | 99768 | 0.99768
 398 | 99744 | 0.99744
 399 | 99676 | 0.99676
 400 | 98663 | 0.98663
 401 | 40690 |  0.4069
 403 |92 | 0.00092
 404 |   491 | 0.00491
 407 |74 | 0.00074
 408 |54 | 0.00054
 415 |   152 | 0.00152
 416 |47 | 0.00047
 419 |59 | 0.00059
 422 | 2 |   2e-05
 423 |14 | 0.00014
 424 | 5 |   5e-05
 425 |11 | 0.00011
 426 | 7 |   7e-05
 427 | 5 |   5e-05
 428 | 6 |   6e-05
 517 | 5 |   5e-05
 518 | 9 |   9e-05
 519 | 6 |   6e-05
 520 |12 | 0.00012
 521 |17 | 0.00017
 522 | 5 |   5e-05
 588 |15 | 0.00015
 589 |10 |  0.0001
 590 |19 | 0.00019
 591 |12 | 0.00012
 592 |12 | 0.00012
 593 | 2 |   2e-05
 617 | 4 |   4e-05
 618 | 9 |   9e-05
 619 | 7 |   7e-05
 620 |14 | 0.00014
 621 | 5 |   5e-05
 622 |11 | 0.00011
 682 | 8 |   8e-05
 683 |13 | 0.00013
 684 |17 | 0.00017
 685 | 6 |   6e-05
 686 |17 | 0.00017
 687 | 4 |   4e-05
 767 | 5 |   5e-05
 768 |10 |  0.0001
 769 | 9 |   9e-05
 770 | 2 |   2e-05
 771 |14 | 0.00014
 772 | 2 |   2e-05
 773 |11 | 0.00011
 774 |13 | 0.00013
 775 |10 |  0.0001
 776 | 3 |   3e-05
 914 | 7 |   7e-05
 915 | 7 |   7e-05
 916 | 1 |   1e-05
 917 | 3 |   3e-05
 918 | 3 |   3e-05
 919 | 5 |   5e-05
 920 | 4 |   4e-05
 921 | 9 |   9e-05
 922 | 9 |   9e-05
 923 | 1 |   1e-05
(70 rows)

id_range | count | density 
-+---+-
 402 | 96439 | 0.96439
 403 | 99102 | 0.99102
 404 | 98787 | 0.98787
 405 | 99351

Re: [HACKERS] Redesigning checkpoint_segments

 Default of 4 for min_wal_size?

 I assume you mean 4 segments; why not 3 as currently?  As long as the
 system has the latitude to ratchet it up when needed, there seems to
 be little advantage to raising the minimum.  Of course I guess there
 must be some advantage or Heikki wouldn't have made it configurable,
 but I'd err on the side of keeping this one small.  Hopefully the
 system that automatically adjusts this is really smart, and a large
 min_wal_size is superfluous for most people.

 Keep in mind that the current is actually 7, not three (3*2+1).  So 3
 would be a siginficant decrease.  However, I don't feel strongly about
 it either way.  I think that there is probably a minimum reasonable
 value  1, but I'm not sure what it is.

Good point.  OK, 4 works for me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Redesigning checkpoint_segments

On Thu, Feb 5, 2015 at 4:42 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Actually, perhaps we should have a boolean setting that just implies
 min=max, instead of having a configurable minimum?. That would cover all of
 those reasons pretty well. So we would have a max_wal_size setting, and a
 boolean preallocate_all_wal = on | off. Would anyone care for the
 flexibility of setting a minimum that's different from the maximum?

I like the way you have it now better.  If we knew for certain that
there were no advantage in configuring a value between 0 and the
maximum, that would be one thing, but we don't and can't know that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Redesigning checkpoint_segments

2015-02-05 Thread David Steele

On 2/5/15 4:53 PM, Josh Berkus wrote:
 Actually, perhaps we should have a boolean setting that just implies
 min=max, instead of having a configurable minimum?. That would cover all
 of those reasons pretty well. So we would have a max_wal_size setting,
 and a boolean preallocate_all_wal = on | off. Would anyone care for
 the flexibility of setting a minimum that's different from the maximum?
 I do, actually.  Here's the case I want it for:

 I have a web application which gets all of its new data as uncoordinated
 batch updates from customers.  Since it's possible for me to receive
 several batch updates at once, I set max_wal_size to 16GB, roughtly the
 side of 8 batch updates.  But I don't want the WAL that big all the time
 because it slows down backup snapshots.  So I set min_wal_size to 2GB,
 roughly the size of one batch update.

 That's an idiosyncratic case, but I can imagine more of them out there.

 I wouldn't be opposed to min_wal_size = -1 meaning same as
 max_wal_size though.

+1 for min_wal_size.  Like Josh, I can think of instances where this
would be good.

-- 
- David Steele
da...@pgmasters.net




signature.asc
Description: OpenPGP digital signature

[HACKERS] HEADS UP: PGCon 2015 major schedule changes

2015-02-05 Thread Dan Langille

Hello,

By request, the format of PGCon 2015 will differ significantly from previous 
year.
Our goal is to give you more of what you want while still keeping the stuff 
you've always liked.

In June 2015, PGCon will be structured as follows:

Unconference: 16-17 June 2015 (Tue afternoon  all day Wed)

Beginner Tutorials: 17 June 2015 (Wed)

Talks: 18-19 June 2015 (Thu-Fri)

Advanced Tutorial: 20 June 2015 (Sat)

The big changes are:
- Unconference moved to weekdays and now 1.5 days (was one day; Saturday)
- Tutorials split between beginner and advanced, and now on Wednesday  Saturday
  (was Tuesday  Wednesday)

Why?

The unconference has become a bigger and more significant part of PGCon
for PostgreSQL contributors.  It has moved to earlier in the week to
coordinate with other developer meetings, in order to expand the
participation in development discussions and meetings around PGCon.
Additionally, the shift of some tutorials to Saturday allows tutorials to 
involve key PostgreSQL contributors without schedule conflicts.

Unfortunately, this meant moving something else to Saturday, at least for this 
year.
We considered moving the talks to earlier in the week, but we felt that our 
changes
were already disruptive and wanted to minimize the effects this late change may
have on people who have already booked travel / accommodation.  To those 
affected, we apologize and hope that this new structure will benefit everyone.

— 
Dan Langille
http://langille.org/







-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Redesigning checkpoint_segments

2015-02-05 Thread Josh Berkus

On 02/05/2015 01:42 PM, Heikki Linnakangas wrote:
 There are a few reasons for making the minimum configurable:

Any thoughts on what the default minimum should be, if the default max
is 1.1GB/64?

 1. Creating new segments when you need them is not free, so if you have
 a workload with occasional very large spikes, you might want to prepare
 for them. The auto-tuning will accommodate for the peak usage, but it's
 a moving average so if the peaks are infrequent enough, it will shrink
 the size down between the spikes.
 
 2. To avoid running out of disk space on write to WAL (which leads to a
 PANIC). In particular, if you have the WAL on the same filesystem as the
 data, pre-reserving all the space required for WAL makes it much more
 likely that you when you run out of disk space, you run out when writing
 regular data, not WAL.
 
 3. Unforeseen issues with the auto-tuning. It might not suite everyone,
 so it's nice that you can still get the old behaviour by setting min=max.
 
 Actually, perhaps we should have a boolean setting that just implies
 min=max, instead of having a configurable minimum?. That would cover all
 of those reasons pretty well. So we would have a max_wal_size setting,
 and a boolean preallocate_all_wal = on | off. Would anyone care for
 the flexibility of setting a minimum that's different from the maximum?

I do, actually.  Here's the case I want it for:

I have a web application which gets all of its new data as uncoordinated
batch updates from customers.  Since it's possible for me to receive
several batch updates at once, I set max_wal_size to 16GB, roughtly the
side of 8 batch updates.  But I don't want the WAL that big all the time
because it slows down backup snapshots.  So I set min_wal_size to 2GB,
roughly the size of one batch update.

That's an idiosyncratic case, but I can imagine more of them out there.

I wouldn't be opposed to min_wal_size = -1 meaning same as
max_wal_size though.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] RangeType internal use

2015-02-05 Thread Amit Langote

Hi,

I wonder why I cannot find a way to get a range type for a given (sub-)
type. I would like to build a RangeType from Datum's of lower and upper
bounds. Much like how construct_array() builds an ArrayType from a Datum
array of elements given elements' type info.

Is there some way I do not seem to know? If not, would it be worthwhile
to make something like construct_range() that returns a RangeType given
Datum's of lower and upper bounds and subtype info?

Thanks,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Table-level log_autovacuum_min_duration

2015-02-05 Thread Naoya Anzai

The following review has been posted through the commitfest application:
make installcheck-world: tested, failed
Implements feature: tested, failed
Spec compliant: tested, failed
Documentation:tested, failed

Hi,

I'm Naoya Anzai.
I've been working as a PostgreSQL Support Engineer for 6 years.
I am a newcomer of reviewing, and My programming skill is not so high.
But the following members also participate in this review. (We say
Two heads are better than one. :))

Akira Kurosawa kurosawa-ak...@mxc.nes.nec.co.jp
Taiki Kondo kondo-ta...@mxt.nes.nec.co.jp
Huong Dangminh dangminh-hu...@mxm.nes.nec.co.jp

So I believe reviewing patches is not difficult for us.

This is a review of Table-level log_autovacuum_min_duration patch:
http://www.postgresql.org/message-id/cab7npqtbqsbegvb8coh01k7argys9kbuv8dr+aqgonfvb8k...@mail.gmail.com

Submission review

The patch applies cleanly to HEAD, and it works fine on Fedora
release 20.
There is no regression test,but I think it is no problem
because other parameter also is not tested.

Usability review

pg_dump/pg_restore support is OK.
I think this feature is a nice idea and I also want this feature.

Feature test

I executed following commands after setting
log_autovacuum_min_duration to 1000 in the GUC. (bar table is
already created with no options.)

CREATE TABLE foo ( ... ) WITH ( log_autovacuum_min_duration = 0 );
ALTER TABLE bar SET (log_autovacuum_min_duration = 0 );

Then, only in foo and bar table, autovacuum log was printed out
even if elapsed time of autovacuum is lesser than 1000ms. This
behavior was expected and there was no crash or failed assertion.
So it looked good for me. But, I executed following command, in
buzz table, autovacuum log was printed out if elapsed time is
more than 1000ms.

CREATE TABLE buzz ( ... ) WITH ( log_autovacuum_min_duration = -1 );
^^

I expect autovacuum log is NOT printed out even if elapsed time is
more than 1000ms in this case. My thought is wrong, isn't it? In my
opinion, there is an use case that autovacuum log is always printed
out in all tables excepting specified tables. I think there is a
person who wants to use it like this case, but I (or he) can NOT use
in this situation.

How about your opinion?

Performance review

Not reviewed from this point of view.

Coding review

I think description of log_autovacuum_min_duration in reloptions.c
(line:215) should be like other parameters (match with guc.c). So
it should be Sets the minimum execution time above which autovacuum
actions will be logged. but not Log autovacuum execution for
given threshold.

There is no new function which is defined in this patch, and
modified contents are not related to OS-dependent contents, so I
think it will work fine on Windows/BSD etc.

Architecture review

About the modification of gram.y.

I think it is not good that log_min_duration is initialized to
zeros in gram.y when FREEZE option is set. Because any VACUUM
queries never use this parameter. I think log_min_duration always
should be initialized to -1.

Regards,
Naoya Anzai (NEC Solution Innovators,Ltd.)

The new status of this patch is: Waiting on Author

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GRANT USAGE on FOREIGN SERVER exposes passwords

2015-02-05 Thread David G Johnston

Tom Lane-2 wrote
 Stephen Frost lt;

 sfrost@

 gt; writes:
 * Robert Haas (

 robertmhaas@

 ) wrote:
 On Thu, Feb 5, 2015 at 10:48 AM, Stephen Frost lt;

 sfrost@

 gt; wrote:
 And I thought this was about FDW options and not about dblink, really..
 
 The OP is pretty clearly asking about dblink.
 
 I was just pointing out that it was an issue that all FDWs suffer from,
 since we don't have any way for an FDW to say don't show this option,
 as discussed.
 
 The dblink example is entirely uncompelling, given that as you said
 somebody with access to a dblink connection could execute ALTER USER on
 the far end.  

So lets fix that loop-hole as well...


 So I would rather say that the baseline security expectation is that
 granting a user mapping should be presumed to be tantamount to granting
 direct access to the remote server with that login info.  In that context,
 being able to see the password should not be considered to be any big
 deal.

Is there any provision whereby USAGE would restrict the person so granted
from viewing any particulars even though they can call/name the item being
granted; and then require SELECT privileges to actual view any of the
associated settings?

Regardless, the OP described behavior of suppressing user options normally
but then showing them upon being granted USAGE on the server seems strange.

David J.




--
View this message in context: 
http://postgresql.nabble.com/GRANT-USAGE-on-FOREIGN-SERVER-exposes-passwords-tp5836652p5836826.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GRANT USAGE on FOREIGN SERVER exposes passwords

2015-02-05 Thread Tom Lane

Stephen Frost sfr...@snowman.net writes:
 * Robert Haas (robertmh...@gmail.com) wrote:
 On Thu, Feb 5, 2015 at 10:48 AM, Stephen Frost sfr...@snowman.net wrote:
 And I thought this was about FDW options and not about dblink, really..

 The OP is pretty clearly asking about dblink.

 I was just pointing out that it was an issue that all FDWs suffer from,
 since we don't have any way for an FDW to say don't show this option,
 as discussed.

The dblink example is entirely uncompelling, given that as you said
somebody with access to a dblink connection could execute ALTER USER on
the far end.  It's much more credible to imagine that someone might be
given a mapping for a postgres_fdw, oracle_fdw, etc foreign server with
just a few foreign tables, with the expectation that that couldn't be
parlayed into unfettered access to the remote server.

Whether this is a realistic expectation given the wording of the SQL-MED
standard is unclear.

I'm also concerned that if we take this on board as being a security
concern, it will mean that any time we make an effort to push some
construct we didn't before over to the remote end, we have to worry about
whether it would be a security breach to allow the local user to cause
that code to execute on the remote end.  It's tough enough worrying about
semantic-equivalence issues without mixing hostile-user scenarios in.

So I would rather say that the baseline security expectation is that
granting a user mapping should be presumed to be tantamount to granting
direct access to the remote server with that login info.  In that context,
being able to see the password should not be considered to be any big deal.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Table-level log_autovacuum_min_duration

2015-02-05 Thread Naoya Anzai

 The following review has been posted through the commitfest application:
 make installcheck-world:  tested, failed
 Implements feature:   tested, failed
 Spec compliant:   tested, failed
 Documentation:tested, failed
I'm sorry, I just sent it by mistake.
All of them have passed.

---
Naoya Anzai

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] No-rewrite timestamp-timestamptz conversions

2015-02-05 Thread Noah Misch

On Tue, Nov 05, 2013 at 05:02:58PM -0800, Josh Berkus wrote:
 I'd also love some way of doing a no-rewrite conversion between
 timestamp and timestamptz, based on the assumption that the original
 values are UTC time.  That's one I encounter a lot.

It was such a conversion that motivated me to add the no-rewrite ALTER TABLE
ALTER TYPE support in the first place.  Interesting.  Support for it didn't
end up in any submitted patch due to a formal problem: a protransform function
shall only consult IMMUTABLE facts, but we posit that timezone==UTC is a
STABLE observation.  However, a protransform function can easily simplify the
immutable expression tscol AT TIME ZONE 'UTC', avoiding a rewrite.  See
attached patch.  Examples:

begin;
create table t (c timestamptz);
set client_min_messages = debug1;
-- rewrite: depends on timezone GUC
alter table t alter c type timestamp;
-- rewrite: depends on timezone GUC
alter table t alter c type timestamptz;
-- no rewrite: always UTC+0
alter table t alter c type timestamp using c at time zone 'UTC';
-- no rewrite: always UTC+0
alter table t alter c type timestamptz using c at time zone 'Etc/Universal';
-- rewrite: always UTC+0 in the present day, but not historically
alter table t alter c type timestamp using c at time zone 'Atlantic/Reykjavik';
-- rewrite: always UTC+0 in the present day, but not historically
alter table t alter c type timestamptz using c at time zone 'Africa/Lome';
-- no rewrite: always UTC+0
alter table t alter c type timestamp using c at time zone 'GMT';
-- rewrite: always UTC+1
alter table t alter c type timestamptz using c at time zone '1 hour'::interval;
-- no rewrite: always UTC+0
alter table t alter c type timestamp using c at time zone '0 hour'::interval;
rollback;
diff --git a/src/backend/utils/adt/timestamp.c 
b/src/backend/utils/adt/timestamp.c
index 67e0cf9..723c670 100644
--- a/src/backend/utils/adt/timestamp.c
+++ b/src/backend/utils/adt/timestamp.c
@@ -27,6 +27,7 @@
 #include funcapi.h
 #include libpq/pqformat.h
 #include miscadmin.h
+#include nodes/makefuncs.h
 #include nodes/nodeFuncs.h
 #include parser/scansup.h
 #include utils/array.h
@@ -4874,6 +4875,87 @@ interval_part(PG_FUNCTION_ARGS)
 }
 
 
+/* timestamp_zone_transform()
+ * If the zone argument of a timestamp_zone() or timestamptz_zone() call is a
+ * plan-time constant denoting a zone equivalent to UTC, the call will always
+ * return its second argument unchanged.  Simplify the expression tree
+ * accordingly.  Civil time zones almost never qualify, because jurisdictions
+ * that follow UTC today have not done so continuously.
+ */
+Datum
+timestamp_zone_transform(PG_FUNCTION_ARGS)
+{
+   Node   *func_node = (Node *) PG_GETARG_POINTER(0);
+   FuncExpr   *expr = (FuncExpr *) func_node;
+   Node   *ret = NULL;
+   Node   *zone_node;
+
+   Assert(IsA(expr, FuncExpr));
+   Assert(list_length(expr-args) == 2);
+
+   zone_node = (Node *) linitial(expr-args);
+
+   if (IsA(zone_node, Const) !((Const *) zone_node)-constisnull)
+   {
+   text   *zone = DatumGetTextPP(((Const *) 
zone_node)-constvalue);
+   chartzname[TZ_STRLEN_MAX + 1];
+   char   *lowzone;
+   int type,
+   abbrev_offset;
+   pg_tz  *tzp;
+   boolnoop = false;
+
+   /*
+* If the timezone is forever UTC+0, the FuncExpr function call 
is a
+* no-op for all possible timestamps.  This passage mirrors 
code in
+* timestamp_zone().
+*/
+   text_to_cstring_buffer(zone, tzname, sizeof(tzname));
+   lowzone = downcase_truncate_identifier(tzname,
+   
   strlen(tzname),
+   
   false);
+   type = DecodeTimezoneAbbrev(0, lowzone, abbrev_offset, tzp);
+   if (type == TZ || type == DTZ)
+   noop = (abbrev_offset == 0);
+   else if (type == DYNTZ)
+   {
+   /*
+* An abbreviation of a single-offset timezone ought 
not to be
+* configured as a DYNTZ, so don't bother checking.
+*/
+   }
+   else
+   {
+   longtzname_offset;
+
+   tzp = pg_tzset(tzname);
+   if (tzp  pg_get_timezone_offset(tzp, tzname_offset))
+   noop = (tzname_offset == 0);
+   }
+
+   if (noop)
+   {
+   Node   *timestamp = (Node *) lsecond(expr-args);
+
+   /* Strip any existing RelabelType node(s) */
+

Re: [HACKERS] File based Incremental backup v9

2015-02-05 Thread Francesco Canovai

Hi Marco,

On Sunday 01 February 2015 00:47:24 Marco Nenciarini wrote:
 You can find the updated patch attached to this message.

I've been testing the v9 patch with checksums enabled and I end up with a lot 
of warnings like these ones:

WARNING:  page verification failed, calculated checksum 47340 but expected 
47342
WARNING:  page verification failed, calculated checksum 16649 but expected 
16647
WARNING:  page verification failed, calculated checksum 13567 but expected 
13565
WARNING:  page verification failed, calculated checksum 14110 but expected 
14108
WARNING:  page verification failed, calculated checksum 40990 but expected 
40988
WARNING:  page verification failed, calculated checksum 46242 but expected 
46244

I can reproduce the problem with the following script:

WORKDIR=/home/fcanovai/tmp
psql -c CREATE DATABASE pgbench
pgbench -i -s 100 --foreign-keys pgbench
mkdir $WORKDIR/tbsp
psql -c CREATE TABLESPACE tbsp LOCATION '$WORKDIR/tbsp'
psql -c ALTER DATABASE pgbench SET TABLESPACE tbsp

Regards,
Francesco

-- 
Francesco Canovai - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
francesco.cano...@2ndquadrant.it | www.2ndQuadrant.it


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Table-level log_autovacuum_min_duration

On Fri, Feb 6, 2015 at 4:50 AM, Naoya Anzai
anzai-na...@mxu.nes.nec.co.jp wrote:
 The following review has been posted through the commitfest application:
 make installcheck-world:  tested, failed
 Implements feature:   tested, failed
 Spec compliant:   tested, failed
 Documentation:tested, failed
 I'm sorry, I just sent it by mistake.
 All of them have passed.
That's fine. I think you used the UI on the commit fest app, and it is
not intuitive that you need to check those boxes at first sight when
using it for the first time.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] pg_basebackup, tablespace mapping and path canonicalization

2015-02-05 Thread Ian Barwick

Hi

I stumbled on what appears to be inconsistent handling of double slashes
in tablespace paths when using pg_basebackup with the -T/--tablespace-mapping
option:

ibarwick:postgresql (master)$ mkdir /tmp//foo-old
ibarwick:postgresql (master)$ $PG_HEAD/psql 'dbname=postgres port=9595'
psql (9.5devel)
Type help for help.

postgres=# CREATE TABLESPACE foo LOCATION '/tmp//foo-old';
CREATE TABLESPACE
postgres=# \db
 List of tablespaces
Name|  Owner   |   Location
+--+--
 foo| ibarwick | /tmp/foo-old
 pg_default | ibarwick |
 pg_global  | ibarwick |
(3 rows)

So far so good. However attempting to take a base backup (on the same
machine) and remap the tablespace directory:

ibarwick:postgresql (master)$ $PG_HEAD/pg_basebackup -p9595 
--pgdata=/tmp//backup --tablespace-mapping=/tmp//foo-old=/tmp//foo-new

produces the following message:

pg_basebackup: directory /tmp/foo-old exists but is not empty

which, while undeniably true, is unexpected and could potentially encourage 
someone
to hastily delete /tmp/foo-old after confusing it with the new directory.

The double-slash in the old tablespace path is the culprit:

ibarwick:postgresql (master)$ $PG_HEAD/pg_basebackup -p9595 
--pgdata=/tmp//backup --tablespace-mapping=/tmp/foo-old=/tmp//foo-new
NOTICE:  pg_stop_backup complete, all required WAL segments have been 
archived

The documentation does state:

To be effective, olddir must exactly match the path specification of the
tablespace as it is currently defined.

which I understood to mean that e.g. tildes would not be expanded, but it's
somewhat surprising that the path is not canonicalized in the same way
it is pretty much everywhere else (including in CREATE TABLESPACE).

The attached patch adds the missing canonicalization; I can't see any
reason not to do this. Thoughts?

Regards


Ian Barwick

-- 
 Ian Barwick   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, RemoteDBA, Training  Services
diff --git a/src/bin/pg_basebackup/pg_basebackup.c 
b/src/bin/pg_basebackup/pg_basebackup.c
new file mode 100644
index fbf7106..349bd90
*** a/src/bin/pg_basebackup/pg_basebackup.c
--- b/src/bin/pg_basebackup/pg_basebackup.c
*** tablespace_list_append(const char *arg)
*** 199,204 
--- 199,207 
exit(1);
}
  
+   canonicalize_path(cell-old_dir);
+   canonicalize_path(cell-new_dir);
+ 
if (tablespace_dirs.tail)
tablespace_dirs.tail-next = cell;
else

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

Fujii Masao wrote:
 I wrote
 This is an inspiration from lz4 APIs. Wouldn't it be buggy for a
 compression algorithm to return a size of 0 bytes as compressed or
 decompressed length btw? We could as well make it return a negative
 value when a failure occurs if you feel more comfortable with it.

 I feel that's better. Attached is the updated version of the patch.
 I changed the pg_lzcompress and pg_lzdecompress so that they return -1
 when failure happens. Also I applied some cosmetic changes to the patch
 (e.g., shorten the long name of the newly-added macros).
 Barring any objection, I will commit this.

I just had a look at your updated version, ran some sanity tests, and
things look good from me. The new names of the macros at the top of
tuptoaster.c are clearer as well.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-02-05 Thread Tom Lane

Robert Haas robertmh...@gmail.com writes:
 We've got a mix of styles for extensible options right now:

That we do.

 So COPY puts the options at the very end, but EXPLAIN and VACUUM put
 them right after the command name.  I prefer the latter style and
 would vote to adopt it here.

Meh.  Options-at-the-end seems by far the most sensible style to me.
The options-right-after-the-keyword style is a mess, both logically
and from a parsing standpoint, and the only reason we have it at all
is historical artifact (ask Bruce about the origins of VACUUM ANALYZE
over a beer sometime).

Still, I can't help noticing that I'm being outvoted.  I'll shut up now.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code