Re: [HACKERS] Tracking wait event for latches

2016-08-22 Thread Michael Paquier
On Tue, Aug 23, 2016 at 12:44 AM, Robert Haas  wrote:
> On Mon, Aug 22, 2016 at 9:49 AM, Michael Paquier
>  wrote:
>> The reason why I chose this way is that there are a lot of them. It is
>> painful to maintain the order of the array elements in perfect mapping
>> with the list of IDs...
>
> You can use stupid macro tricks to help with that problem...

Yeah, still, after thinking about it I think I would just go with an
array like the lock types and be done with it. With a comment
mentioning that the order should be respected, that would be enough...
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-08-22 Thread Amit Kapila
On Tue, Aug 23, 2016 at 10:57 AM, Michael Paquier
 wrote:
>
> Also, what's the use case of allowing only a certain set of rmgrs to
> be checked? Wouldn't a simple on/off switch be simpler?
>

I think there should be a way to test WAL for one particular resource
manager.  For example, if someone develops a new index or some other
heap storage, only that particular module can be verified.  Generating
WAL for all the resource managers together can also serve the purpose,
but it will be slightly difficult to verify it.

> As presented,
> wal_consistency_mask is also going to be quite confusing for
> users. You should not need to apply some maths to set up this
> parameter; a list of rmgr names may be better adapted if this level of
> tuning is needed,
>

Yeah, that can be better.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Bug in to_timestamp().

2016-08-22 Thread amul sul
Hi Artur Zakirov,

Please see the following review comments for
"0001-to-timestamp-format-checking-v2.patch" and share your thoughts:

#1.

15 +   to_timestamp('2000JUN', ' MON')

This is documented as a working case, but unfortunately it does not work:

postgres=# SELECT to_timestamp('2000JUN', ' MON');
ERROR:  invalid value "---" for "MON"
DETAIL:  The given value did not match any of the allowed values for this field.


#2.

102 +   /* Previous character was a quote */
103 +   else if (in_text)
104 +   {
105 +   if (*str == '"')
106 +   {
107 +   str++;
108 +   in_text = false;
109 +   }
110 +   else if (*str == '\\')
111 +   {
112 +   str++;
113 +   in_backslash = true;
114 +   }
115 +   else
116 +   {
117 +   n->type = NODE_TYPE_CHAR;
118 +   n->character = *str;
119 +   n->key = NULL;
120 +   n->suffix = 0;
121 +   n++;
122 +   str++;
123 +   }
124 +   continue;
125 +   }
126 +

The NODE_TYPE_CHAR assumption in the else block seems incorrect. What if we
have a space after the double quote?  It will again produce incorrect output
without any error; see below:

postgres=# SELECT to_timestamp('Year: 1976, Month: May, Day: 16', 
postgres(# '"Year:" , "Month:" FMMonth, "Day:"   DD');
         to_timestamp
------------------------------
 0006-05-16 00:00:00-07:52:58
(1 row)

I guess we might need NODE_TYPE_SEPARATOR and NODE_TYPE_SPACE checks as well?



#3.

296 -   /* Ignore spaces before fields when not in FX (fixed width) mode */
297 +   /* Ignore spaces before fields when not in FX (fixed * width) mode 
*/
298 if (!fx_mode && n->key->id != DCH_FX)
299 {
300 -   while (*s != '\0' && isspace((unsigned char) *s))
301 +   while (*s != '\0' && (isspace((unsigned char) *s)))
302 s++;

Unnecessary hunk?
We should not touch any code unless it is necessary to implement the proposed
feature; otherwise it adds unnecessary diff to the patch and eventually extra
burden on reviewers. Similarly, the hunk in the patch at lines 313-410 has
nothing to do with the to_timestamp behaviour improvement, IIUC.

If you think these changes need to be in, please submit a separate cleanup patch.


>I attached a second patch, "0002-to-timestamp-validation-v2.patch". With it
>PostgreSQL performs additional checks for date and time. But as I wrote,
>there is another patch in the thread, "to_date_valid()", which differs from
>this patch.

@community: I am not sure what to do with this patch; should we keep it as a
separate enhancement?

Regards,
Amul Sul




Re: [HACKERS] WAL consistency check facility

2016-08-22 Thread Michael Paquier
On Tue, Aug 23, 2016 at 1:32 PM, Amit Kapila  wrote:
> On Mon, Aug 22, 2016 at 9:16 PM, Robert Haas  wrote:
>> On Mon, Aug 22, 2016 at 9:25 AM, Michael Paquier
>>  wrote:
>>> Another sticking point is: given a certain page, how do we identify
>>> which type it is? One possibility would be again to extend the AM
>>> handler with some kind of is_self function with a prototype like this:
>>> bool handler->is_self(Page);
>>> If the page is of the type of the handler, this returns true, and
>>> false otherwise. Still, performance here would suck.
>>>
>>> In the end, what we want is a clean interface, and more thought put into it.
>>
>> I think that it makes sense to filter based on the resource manager
>> ID
>>
>
> +1.

Yes, actually that's better. It's simple enough and removes any need
to look at pd_special.

> I think the patch currently addresses only a subset of resource
> manager IDs (mainly the heap and index resource manager IDs).  Do we
> want to handle the complete resource manager list as defined in
> rmgrlist.h?

Not all of them generate FPWs. I don't think it matters much.

> Another thing that needs some thought is the UI of this patch:
> currently it uses an integer mask, which might not be the best way, but
> again, as it is intended mainly for tests, it might be okay.

What we'd want is a way to easily visualize differences between
pages. Any ideas other than MASK_MARKER would of course be welcome.

> Do we want to enable some tests in the regression suite by using this option?

We could add a recovery test that sets up a master/standby pair and
runs the src/test/regress tests via pg_regress with this parameter
enabled.

+ * bufmask.c
+ *  Routines for buffer masking, used to ensure that buffers used for
+ *  comparison across nodes are in a consistent state.
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
Copyright notices need to be updated. (It's already been 2 years!!)

Also, what's the use case of allowing only a certain set of rmgrs to
be checked? Wouldn't a simple on/off switch be simpler? As presented,
wal_consistency_mask is also going to be quite confusing for
users. You should not need to apply some maths to set up this
parameter; a list of rmgr names may be better adapted if this level of
tuning is needed, though it seems to me that we don't need this much.
-- 
Michael




Re: [HACKERS] WAL consistency check facility

2016-08-22 Thread Kuntal Ghosh
Yes, I've verified the outputs and log contents after running gmake
installcheck and gmake installcheck-world. The status of the test was
marked as pass for all the test cases.


On Mon, Aug 22, 2016 at 9:26 PM, Simon Riggs  wrote:
> On 22 August 2016 at 13:44, Kuntal Ghosh  wrote:
>
>> Please let me know your thoughts on this.
>
> Do the regression tests pass with this option enabled?
>
> --
> Simon Riggshttp://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Thanks & Regards,
Kuntal Ghosh




Re: [HACKERS] WAL consistency check facility

2016-08-22 Thread Amit Kapila
On Mon, Aug 22, 2016 at 9:16 PM, Robert Haas  wrote:
> On Mon, Aug 22, 2016 at 9:25 AM, Michael Paquier
>  wrote:
>> Another sticking point is: given a certain page, how do we identify
>> which type it is? One possibility would be again to extend the AM
>> handler with some kind of is_self function with a prototype like this:
>> bool handler->is_self(Page);
>> If the page is of the type of the handler, this returns true, and
>> false otherwise. Still, performance here would suck.
>>
>> In the end, what we want is a clean interface, and more thought put into it.
>
> I think that it makes sense to filter based on the resource manager
> ID
>

+1.

I think the patch currently addresses only a subset of resource
manager IDs (mainly the heap and index resource manager IDs).  Do we
want to handle the complete resource manager list as defined in
rmgrlist.h?

Another thing that needs some thought is the UI of this patch:
currently it uses an integer mask, which might not be the best way, but
again, as it is intended mainly for tests, it might be okay.

Do we want to enable some tests in the regression suite by using this option?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Write Ahead Logging for Hash Indexes

2016-08-22 Thread Amit Kapila
On Tue, Aug 23, 2016 at 8:54 AM, Amit Kapila  wrote:
> $SUBJECT will make hash indexes reliable and usable on standby.
> AFAIU, currently hash indexes are not recommended to be used in
> production mainly because they are not crash-safe and with this patch,
> I hope we can address that limitation and recommend them for use in
> production.
>
> This patch is built on my earlier patch [1] of making hash indexes
> concurrent.  The main reason for doing so is that the earlier patch
> allows the split operation to be completed and uses light-weight locking,
> due to which operations can be logged at a granular level.
>
> WAL for different operations:
>
> This has been explained in README as well, but I am again writing it
> here for the ease of people.
>
>
..
> One of the challenges in writing this patch was that the current code
> was not written with a mindset that we need to write WAL for different
> operations.  A typical example is _hash_addovflpage(), where pages are
> modified across different function calls and all modifications need
> to be done atomically, so I had to refactor some code so that the
> operations can be logged sensibly.
>

This patch does not yet handle old snapshots.  Previously, we didn't
perform TestForOldSnapshot() checks in hash indexes as they were not
WAL-logged, but now with this patch it makes sense to perform such
checks.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Forbid use of LF and CR characters in database and role names

2016-08-22 Thread Tom Lane
Peter Geoghegan  writes:
> On Mon, Aug 22, 2016 at 6:28 PM, Michael Paquier
>  wrote:
>> There is no need to put restrictions on those I think, and they are
>> actually supported.

> Bi-directional text support (i.e., the use of right-to-left control
> characters) is known to have security implications, FWIW. There is an
> interesting discussion of the matter here:

> http://www.unicode.org/reports/tr36/#Bidirectional_Text_Spoofing

The problem with implementing anything like that is that it requires
assumptions about what encoding we're dealing with, which would be
entirely not based in fact.  (The DB encoding is not a good guide
to what global names are encoded as, much less what encoding some
shell might think it's using.)

regards, tom lane




Re: [HACKERS] Forbid use of LF and CR characters in database and role names

2016-08-22 Thread Peter Geoghegan
On Mon, Aug 22, 2016 at 6:28 PM, Michael Paquier
 wrote:
> There is no need to put restrictions on those I think, and they are
> actually supported.

Bi-directional text support (i.e., the use of right-to-left control
characters) is known to have security implications, FWIW. There is an
interesting discussion of the matter here:

http://www.unicode.org/reports/tr36/#Bidirectional_Text_Spoofing

-- 
Peter Geoghegan




Re: [HACKERS] Exporting more function in libpq

2016-08-22 Thread Tom Lane
Alvaro Herrera  writes:
> Craig Ringer wrote:
>> Shouldn't that generally be done by extending libpq to add the required
>> functionality?

> The thought that came to me was that maybe we need a separate library
> that handles the lower level operations (a "fe/be" library, if you will)
> which can be exported for others to use and is used by libpq to
> implement the slightly-higher-level functionality.

If you wanted a library that exposed something close to the wire-level
protocol, I do not think that tearing out some of the oldest and cruftiest
parts of libpq and exposing them verbatim is really the best way to go
about it.

regards, tom lane




Re: [HACKERS] Forbid use of LF and CR characters in database and role names

2016-08-22 Thread Michael Paquier
On Tue, Aug 23, 2016 at 10:19 AM, Peter Geoghegan  wrote:
> I haven't looked at the patch, but offhand I wonder if it's worth
> considering control characters added by unicode, if you haven't already.

There is no need to put restrictions on those I think, and they are
actually supported. Look for example at pg_upgrade/test.sh, we are
already testing them with database names :) Not BEL of course, but
that works.
-- 
Michael




Re: [HACKERS] Forbid use of LF and CR characters in database and role names

2016-08-22 Thread Peter Geoghegan
I haven't looked at the patch, but offhand I wonder if it's worth
considering control characters added by unicode, if you haven't already.

--
Peter Geoghegan


Re: [HACKERS] Forbid use of LF and CR characters in database and role names

2016-08-22 Thread Michael Paquier
On Fri, Aug 12, 2016 at 10:12 AM, Michael Paquier
 wrote:
> Note that pg_dump[all] and pg_upgrade already have safeguards against
> those things per the same routines putting quotes for execution as
> commands into psql and shell. So attached is a patch to implement this
> restriction in the backend, and I am adding that to the next CF for
> 10.0. Attached is as well a script able to trigger those errors.
> Thoughts?

I am re-sending the patch. For a reason that escapes me, it was showing
up as 'invalid/octet-stream'... (Thanks, Bruce, for noting that.)
-- 
Michael
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index c1c0223..5746958 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -77,6 +77,7 @@ typedef struct
 } movedb_failure_params;
 
 /* non-export function prototypes */
+static void check_db_name(const char *dbname);
 static void createdb_failure_callback(int code, Datum arg);
 static void movedb(const char *dbname, const char *tblspcname);
 static void movedb_failure_callback(int code, Datum arg);
@@ -456,6 +457,9 @@ createdb(const CreatedbStmt *stmt)
/* Note there is no additional permission check in this path */
}
 
+   /* do sanity checks on database name */
+   check_db_name(dbname);
+
/*
 * Check for db name conflict.  This is just to give a more friendly 
error
 * message than "unique index violation".  There's a race condition but
@@ -745,6 +749,22 @@ check_encoding_locale_matches(int encoding, const char *collate, const char *cty
   pg_encoding_to_char(collate_encoding;
 }
 
+/*
+ * Perform sanity checks on the database name.
+ */
+static void
+check_db_name(const char *dbname)
+{
+   if (strchr(dbname, '\n') != NULL)
+   ereport(ERROR,
+   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+errmsg("database name cannot use LF character")));
+   if (strchr(dbname, '\r') != NULL)
+   ereport(ERROR,
+   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+errmsg("database name cannot use CR character")));
+}
+
 /* Error cleanup callback for createdb */
 static void
 createdb_failure_callback(int code, Datum arg)
@@ -949,6 +969,9 @@ RenameDatabase(const char *oldname, const char *newname)
int npreparedxacts;
ObjectAddress address;
 
+   /* check format of new name */
+   check_db_name(newname);
+
/*
 * Look up the target database's OID, and get exclusive lock on it. We
 * need this for the same reasons as DROP DATABASE.
diff --git a/src/backend/commands/user.c b/src/backend/commands/user.c
index b6ea950..8954e16 100644
--- a/src/backend/commands/user.c
+++ b/src/backend/commands/user.c
@@ -57,6 +57,21 @@ static void DelRoleMems(const char *rolename, Oid roleid,
bool admin_opt);
 
 
+/* Do sanity checks on role name */
+static void
+check_role_name(const char *rolname)
+{
+   if (strchr(rolname, '\n') != NULL)
+   ereport(ERROR,
+   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+errmsg("role name cannot use LF character")));
+   if (strchr(rolname, '\r') != NULL)
+   ereport(ERROR,
+   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+errmsg("role name cannot use CR character")));
+}
+
+
 /* Check if current user has createrole privileges */
 static bool
 have_createrole_privilege(void)
@@ -111,6 +126,9 @@ CreateRole(CreateRoleStmt *stmt)
DefElem*dvalidUntil = NULL;
DefElem*dbypassRLS = NULL;
 
+   /* sanity check for role name */
+   check_role_name(stmt->role);
+
/* The defaults can vary depending on the original statement type */
switch (stmt->stmt_type)
{
@@ -1137,6 +1155,9 @@ RenameRole(const char *oldname, const char *newname)
ObjectAddress address;
Form_pg_authid authform;
 
+   /* sanity check for role name */
+   check_role_name(newname);
+
rel = heap_open(AuthIdRelationId, RowExclusiveLock);
dsc = RelationGetDescr(rel);
 



[HACKERS] Better locale-specific-character-class handling for regexps

2016-08-22 Thread Tom Lane
I got tired of hearing complaints about the issue described in
this thread:
https://www.postgresql.org/message-id/flat/24241.1329347196%40sss.pgh.pa.us

Here's a proposed fix.  I've not done extensive performance testing,
but it seems to be as fast or faster than the old code in cases where
there are not too many "large" characters in the input.  And, more
to the point, it gets the right answer for such large characters.

I'll add this to the upcoming commitfest.

regards, tom lane

diff --git a/src/backend/regex/README b/src/backend/regex/README
index 6c9f483..b4a7ad7 100644
*** a/src/backend/regex/README
--- b/src/backend/regex/README
*** and similarly additional source files re
*** 27,39 
  regexec.  This was done to avoid exposing internal symbols globally;
  all functions not meant to be part of the library API are static.
  
! (Actually the above is a lie in one respect: there is one more global
! symbol, pg_set_regex_collation in regcomp.  It is not meant to be part of
! the API, but it has to be global because both regcomp and regexec call it.
! It'd be better to get rid of that, as well as the static variables it
! sets, in favor of keeping the needed locale state in the regex structs.
! We have not done this yet for lack of a design for how to add
! application-specific state to the structs.)
  
  What's where in src/backend/regex/:
  
--- 27,40 
  regexec.  This was done to avoid exposing internal symbols globally;
  all functions not meant to be part of the library API are static.
  
! (Actually the above is a lie in one respect: there are two more global
! symbols, pg_set_regex_collation and pg_reg_getcolor in regcomp.  These are
! not meant to be part of the API, but they have to be global because both
! regcomp and regexec call them.  It'd be better to get rid of
! pg_set_regex_collation, as well as the static variables it sets, in favor of
! keeping the needed locale state in the regex structs.  We have not done this
! yet for lack of a design for how to add application-specific state to the
! structs.)
  
  What's where in src/backend/regex/:
  
*** colors:
*** 274,301 
an existing color has to be subdivided.
  
  The last two of these are handled with the "struct colordesc" array and
! the "colorchain" links in NFA arc structs.  The color map proper (that
! is, the per-character lookup array) is handled as a multi-level tree,
! with each tree level indexed by one byte of a character's value.  The
! code arranges to not have more than one copy of bottom-level tree pages
! that are all-the-same-color.
  
! Unfortunately, this design does not seem terribly efficient for common
! cases such as a tree in which all Unicode letters are colored the same,
! because there aren't that many places where we get a whole page all the
! same color, except at the end of the map.  (It also strikes me that given
! PG's current restrictions on the range of Unicode values, we could use a
! 3-level rather than 4-level tree; but there's not provision for that in
! regguts.h at the moment.)
  
! A bigger problem is that it just doesn't seem very reasonable to have to
! consider each Unicode letter separately at regex parse time for a regex
! such as "\w"; more than likely, a huge percentage of those codes will
! never be seen at runtime.  We need to fix things so that locale-based
! character classes are somehow processed "symbolically" without making a
! full expansion of their contents at parse time.  This would mean that we'd
! have to be ready to call iswalpha() at runtime, but if that only happens
! for high-code-value characters, it shouldn't be a big performance hit.
  
  
  Detailed semantics of an NFA
--- 275,330 
an existing color has to be subdivided.
  
  The last two of these are handled with the "struct colordesc" array and
! the "colorchain" links in NFA arc structs.
  
! Ideally, we'd do the first two operations using a simple linear array
! storing the current color assignment for each character code.
! Unfortunately, that's not terribly workable for large charsets such as
! Unicode.  Our solution is to divide the color map into two parts.  A simple
! linear array is used for character codes up to MAX_SIMPLE_CHR, which can be
! chosen large enough to include all popular characters (so that the
! significantly-slower code paths about to be described are seldom invoked).
! Characters above that need be considered at compile time only if they
! appear explicitly in the regex pattern.  We store each such mentioned
! character or character range as an entry in the "colormaprange" array in
! the colormap.  (Overlapping ranges are split into unique subranges, so that
! each range in the finished list needs only a single color that describes
! all its characters.)  When mapping a character above MAX_SIMPLE_CHR to a
! color at runtime, we search this list of ranges explicitly.
  
! That's still not quite enough, though, because of 

Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

2016-08-22 Thread Craig Ringer
On 23 August 2016 at 01:03, Robert Haas  wrote:

> I think you should use underscores to separate all of the words
> instead of only some of them.

ifassigned => if_assigned
ifrecent   => if_recent

Updated patch series attached. As before, patches 0-4 are intended for
commit; patch 5 is included just because it'll be handy to have around
for people doing wraparound-related testing.

Again, thanks for taking a look.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 81cbe525261a15a21415af361b3421038eccc895 Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Fri, 19 Aug 2016 14:44:15 +0800
Subject: [PATCH 1/4] Introduce txid_status(bigint) to get status of an xact

If an application is disconnected while a COMMIT request is in flight,
the backend crashes mid-commit, etc., then an application may not be
sure whether a commit completed successfully or was rolled
back. While two-phase commit solves this, it does so at considerable
overhead, so introduce a lighter alternative.

txid_status(bigint) lets an application determine the status of a
a commit based on an xid-with-epoch as returned by txid_current()
or similar. Status may be committed, aborted, in-progress (including
prepared xacts) or null if the xact is too old for its commit status
to still be retained because it has passed the wrap-around epoch
boundary.

Applications must call txid_current() in their transactions to make
much use of this since PostgreSQL does not automatically report an xid
to the client when one is assigned.
---
 doc/src/sgml/func.sgml   | 28 +++-
 src/backend/access/transam/clog.c| 23 --
 src/backend/catalog/system_views.sql | 20 +
 src/backend/utils/adt/txid.c | 82 
 src/include/access/clog.h| 23 ++
 src/include/catalog/pg_proc.h|  2 +
 src/test/regress/expected/txid.out   | 50 ++
 src/test/regress/sql/txid.sql| 35 +++
 8 files changed, 239 insertions(+), 24 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 169a385..8edf490 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -17139,6 +17139,10 @@ SELECT collation for ('foo' COLLATE "de_DE");
 txid_visible_in_snapshot

 
+   
+txid_status
+   
+

 The functions shown in 
 provide server transaction information in an exportable form.  The main
@@ -17157,7 +17161,7 @@ SELECT collation for ('foo' COLLATE "de_DE");
   
txid_current()
bigint
-   get current transaction ID, assigning a new one if the current transaction does not have one
+   get current 64-bit transaction ID with epoch, assigning a new one if the current transaction does not have one
   
   
txid_current_snapshot()
@@ -17184,6 +17188,11 @@ SELECT collation for ('foo' COLLATE "de_DE");
boolean
is transaction ID visible in snapshot? (do not use with subtransaction ids)
   
+  
+   txid_status(bigint)
+   txid_status
+   report the status of the given xact - committed, aborted, in-progress, or null if the xid is too old
+  
  
 

@@ -17254,6 +17263,23 @@ SELECT collation for ('foo' COLLATE "de_DE");

 

+txid_status(bigint) reports the commit status of a recent
+transaction. Any recent transaction can be identified as one of
+
+ in-progress
+ committed
+ aborted
+
+Prepared transactions are identified as in-progress.
+The commit status of transactions older than the transaction ID wrap-around
+threshold is no longer known by the system, so txid_status
+returns NULL for such transactions. Applications may use
+txid_status to determine whether a transaction committed
+or aborted when the application and/or database server crashed or lost
+connection while a COMMIT command was in-progress.
+   
+
+   
 The functions shown in 
 provide information about transactions that have been already committed.
 These functions mainly provide information about when the transactions
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 2634476..1a6e26d 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -41,29 +41,6 @@
 #include "miscadmin.h"
 #include "pg_trace.h"
 
-/*
- * Defines for CLOG page sizes.  A page is the same BLCKSZ as is used
- * everywhere else in Postgres.
- *
- * Note: because TransactionIds are 32 bits and wrap around at 0x,
- * CLOG page numbering also wraps around at 0x/CLOG_XACTS_PER_PAGE,
- * and CLOG segment numbering at
- * 0x/CLOG_XACTS_PER_PAGE/SLRU_PAGES_PER_SEGMENT.  We need take no
- * explicit notice of that fact in this module, except when comparing segment
- * and page numbers in TruncateCLOG (see CLOGPagePrecedes).
- */

[HACKERS] Re: [BUGS] Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)

2016-08-22 Thread Peter Geoghegan
On Wed, Mar 23, 2016 at 10:46 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> Are you still in information-gathering more, or are you going to issue
>> a recommendation on how we should proceed here, or what?
>
> If I had to make a recommendation right now, I would go for your
> option #4, ie shut 'em all down Scotty.  We do not know the full extent
> of the problem but it looks pretty bad, and I think our first priority
> has to be to guarantee data integrity.  I do not have a lot of faith in
> the proposition that glibc's is the only buggy implementation, either.

For the record, I have been able to determine by using amcheck on the
Heroku platform that en_US.UTF-8 cases are sometimes affected by an
inconsistency between strcoll() and strxfrm() behavior, which was
previously an open question. I saw only two instances of this across
many thousands of servers. For some reason, both cases involved
strings with code points from the Arabic alphabet, even though each
case was from a totally unrelated customer database.

I'll go update the Wiki page for this [1] now.

[1] https://wiki.postgresql.org/wiki/Abbreviated_keys_glibc_issue
-- 
Peter Geoghegan




Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

2016-08-22 Thread Craig Ringer
On 23 August 2016 at 01:03, Robert Haas  wrote:


> I think you should use underscores to separate all of the words
> instead of only some of them.
>

Right. Will fix.

Thanks for taking a look.


> Also, note that the corresponding internal function is
> GetTopTransactionIdIfAny(), which might suggest txid_current_if_any()
> rather than txid_current_if_assigned(), but you could argue that your
> naming is better.


Yeah, I do argue that in this case. Not a hugely strong opinion, but we
refer to txid_current() in the docs as:

"get current transaction ID, assigning a new one if the current transaction
does not have one"

so I thought it'd be worth being consistent with that. It's not like
txid_current() mirrors the name of GetTopTransactionId() after all ;) and I
think it's more important to be consistent with what the user sees than
with the code.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2016-08-22 Thread Craig Ringer
On 10 August 2016 at 14:44, Michael Paquier 
wrote:

> On Fri, Jun 3, 2016 at 8:51 PM, Dmitry Igrishin  wrote:
> >> BTW, I've publushed the HTML-ified SGML docs to
> >> http://2ndquadrant.github.io/postgres/libpq-batch-mode.html as a
> preview.
> > Typo detected: "Returns 1 if the batch curently being received" --
> "curently".
>
> I am looking a bit more seriously at this patch and assigned myself as
> a reviewer.
>

Much appreciated.


> testlibpqbatch.c:1239:73: warning: format specifies type 'long' but
> the argument has type '__darwin_suseconds_t' (aka 'int') [-Wformat]
> printf("batch insert elapsed:  %ld.%06lds\n",
> elapsed_time.tv_sec, elapsed_time.tv_usec);
> macos complains here. You may want to replace %06lds by just %06d.
>

Yeah, or cast to a type known to be big enough. Will amend.


> This patch generates a core dump, use for example pg_ctl start -w and
> you'll bump into the trace above. There is something wrong with the
> queue handling.
>

Huh. I didn't see that here (Fedora 23). I'll look more closely.


> Do you have plans for a more generic structure for the command queue list?
>

No plans, no. This was a weekend experiment that turned into a useful patch
and I'm having to scrape up time for it amongst much more important things
like logical failover / sequence decoding and various other replication
work.

Thanks for the docs review too, will amend.


> +   fprintf(stderr, "internal error, COPY in batch mode");
> +   abort();
> I don't think that's a good idea. defaultNoticeProcessor can be
> overridden to allow applications to have error messages sent
> elsewhere. Error messages should also use libpq_gettext, and perhaps
> be stored in conn->errorMessage as we do so for OOMs happening on
> client-side and reporting them back even if they are not expected
> (those are blocked PQsendQueryStart in your patch).
>
> src/test/examples is a good idea to show people what this new API can
> do, but this is never getting compiled. It could as well be possible
> to include tests in src/test/modules/, in the same shape as what
> postgres_fdw is doing by connecting to itself and link it to libpq. As
> this patch complicates quote a lot fe-exec.c, I think that this would
> be worth it. Thoughts?


I didn't think it added much complexity to fe-exec.c personally. A lot of
the appeal is that it has very minor impact on anything that isn't using it.

I think it makes sense to (ab)use the recovery module tests for this,
invoking the test program from there.

Ideally I'd like to teach pgsql and pg_restore how to use async mode, but
that's a whole separate patch.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Why --backup-and-modify-in-place in perltidy config?

2016-08-22 Thread Bruce Momjian
On Mon, Aug 15, 2016 at 10:19:12AM -0400, Tom Lane wrote:
> Andrew Dunstan  writes:
> > On 08/14/2016 04:38 PM, Tom Lane wrote:
> >> I did a trial run following the current pgindent README procedure, and
> >> noticed that the perltidy step left me with a pile of '.bak' files
> >> littering the entire tree.  This seems like a pretty bad idea because
> >> a naive "git add ." would have committed them.  It's evidently because
> >> src/tools/pgindent/perltidyrc includes --backup-and-modify-in-place.
> 
> BTW, after experimenting with this, I did not find any way to get perltidy
> to overwrite the original files without making a backup file.

Yep, that's why --backup-and-modify-in-place had to be used.  I have a
local script to remove files with specified extensions, but didn't
document that cleanup step.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Server crash due to SIGBUS(Bus Error) when trying to access the memory created using dsm_create().

2016-08-22 Thread Thomas Munro
On Tue, Aug 23, 2016 at 8:41 AM, Robert Haas  wrote:
> We could test to see how much it slows things down.  But it
> may be worth paying the cost even if it ends up being kinda expensive.

Here are some numbers from a Xeon E7-8830 @ 2.13GHz running Linux 3.10
running the attached program.  It's fairly noisy and I didn't run
super long tests with many repeats, but the general theme is visible.
If you're actually going to USE the memory, it's only a small extra
cost to have reserved seats.  But if there's a strong chance you'll
never access most of the memory, you might call it expensive.

Segment size 1MB:

base = shm_open + ftruncate + mmap + munmap + close = 5us
base + fallocate = 38us
base + memset = 332us
base + fallocate + memset = 346us

Segment size 1GB:

base = shm_open + ftruncate + mmap + munmap + close = 10032us
base + fallocate = 30774us
base + memset = 602925us
base + fallocate + memset = 655433us

-- 
Thomas Munro
http://www.enterprisedb.com
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define SEGMENT_NAME "/my_test_segment"

int
main(int argc, char *argv[])
{
	int loops, i;
	off_t size;
	bool fallocatep;
	bool memsetp;
	bool hugep;
	void *mem;

	if (argc != 6)
	{
		fprintf(stderr, "Usage: %s <loops> <size-in-MB> <fallocate?> <memset?> <huge?>\n", argv[0]);
		return EXIT_FAILURE;
	}

	loops = atoi(argv[1]);
	size = atoi(argv[2]) * 1024 * 1024;
	fallocatep = atoi(argv[3]) != 0;
	memsetp = atoi(argv[4]) != 0;
	hugep = atoi(argv[5]) != 0;

	for (i = 0; i < loops; ++i)
	{
		int fd;

		fd = shm_open(SEGMENT_NAME, O_CREAT | O_RDWR, S_IWUSR | S_IRUSR);
		if (fd < 0)
		{
			perror("shm_open");
			goto cleanup;
		}
		if (ftruncate(fd, size))
		{
			perror("ftruncate");
			goto cleanup;
		}
		if (fallocatep && fallocate(fd, 0, 0, size))
		{
			perror("fallocate");
			goto cleanup;
		}
		mem = mmap(0, size, PROT_READ | PROT_WRITE, MAP_SHARED | (hugep ? MAP_HUGETLB : 0), fd, 0);
		if (mem == MAP_FAILED)
		{
			fprintf(stderr, "mmap failed");
			goto cleanup;
		}
		if (memsetp)
			memset(mem, 0, size);
		munmap(mem, size);
		close(fd);
	}

	shm_unlink(SEGMENT_NAME);
	return EXIT_SUCCESS;

cleanup:	
	shm_unlink(SEGMENT_NAME);
	return EXIT_FAILURE;
}

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical decoding of sequence advances, part II

2016-08-22 Thread Craig Ringer
On 23 Aug 2016 05:43, "Kevin Grittner"  wrote:
>
> On Mon, Aug 22, 2016 at 3:29 PM, Robert Haas 
wrote:
>
> > it seems to me that
> > this is just one facet of a much more general problem: given two
> > transactions T1 and T2, the order of replay must match the order of
> > commit unless you can prove that there are no dependencies between
> > them.  I don't see why it matters whether the operations are sequence
> > operations or data operations; it's just a question of whether they're
> > modifying the same "stuff".

It matters because sequence operations aren't transactional in pg. Except
when they are - operations on a newly CREATEd sequence or one where we did
a TRUNCATE ...RESTART IDENTITY.

But we don't store the xid of the xact associated with a transactional
sequence update along with the sequence update anywhere. We just rely on the
other xact knowing to look at the sequence relfilenode we're changing.
Doesn't work so well in logical rep.

We also don't store knowledge of whether or not the sequence advance is
transactional. Again important because for two xacts t1 and t2:

* Sequence last value is 50

* T1 calls nextval. Needs a new chunk because all cached values have been
used. Writes sequence wal advancing seq last_value to 100, returns 51.

* T2 calls nextval, gets cached value 52.

* T2 commits

* Master crashes and we fail over to replica.

This is fine for physical rep. We replay the sequence advance and all is
well.

But for logical rep the sequence can't be treated as part of t1. If t1
rolls back or we fail over before replaying it we might return value 52 from
nextval even though we replayed and committed t2 that used value 52. Oops.

However if some xact t3 creates a sequence we can't replay updates to it
until the sequence relation is committed. And it's even more fun with
TRUNCATE ... RESTART IDENTITY where we need rollback behaviour too.

Make sense? It's hard because sequences are sometimes but not always exempt
from transactional behaviour and pg doesn't record when, since it can rely
on physical wal redo order and can apply sequence advances before the
sequence relation is committed yet.

>
> The commit order is the simplest and safest *unless* there is a
> read-write anti-dependency a/k/a read-write dependency a/k/a
> rw-conflict: where a read from one transaction sees the "before"
> version of data modified by the other transaction.  In such a case
> it is necessary for correct serializable transaction behavior for
> the transaction that read the "before" image to be replayed before
> the write it didn't see, regardless of commit order.  If you're not
> trying to avoid serialization anomalies, it is less clear to me
> what is best.

Could you provide an example of a case where xacts replayed in commit order
will produce incorrect results?

Remember that we aren't doing statement based replication in pg logical
decoding/replication. We don't care how a row got changed, only that we
make consistent transitions from before state to after state to for each
transaction, such that the data committed and visible on the master is
visible on the standby and no uncommitted or not yet visible data on the
master is committed/visible on the replica. The replica should have visible
committed data matching the master as it was when it originally executed
the xact we most recently replayed.

No locking is decoded or replayed. It is not expected that a normal non
replication client executing some other concurrent xact will have the same
effect if run on standby as on master.

It's replication not tightly coupled clustering. If/when we have things
like parallel decoding and replay of  concurrent xacts then issues like the
dependencies you mention will start to become a concern. We are a long way
from there.

For sequences the requirement IMO is that the sequence advances on the
replica to or past the position it was at on the master when the first xact
that saw those sequence values committed. We should never see the sequence
'behind' such that calling nextval on the replica can produce a value
already seen and stored by some committed xact on the replica. Being a bit
ahead is ok, much like pg discards sequence values on crash.

That's not that hard. The problems arise when the sequence it's self isn't
committed yet, per above.


Re: [HACKERS] UTF-8 docs?

2016-08-22 Thread Tatsuo Ishii
> On 8/22/16 9:32 AM, Tatsuo Ishii wrote:
>> I don't know what kind of problem you are seeing with encoding
>> handling, but at least UTF-8 is working for Japanese, French and
>> Russian.
> 
> Those translations are using DocBook XML.

But in the mean time I can create UTF-8 HTML files like this:

make html
[snip]
/bin/mkdir -p html
SP_CHARSET_FIXED=1 SP_ENCODING=UTF-8 openjade  -wall -wno-unused-param 
-wno-empty -wfully-tagged -D . -D . -c 
/usr/share/sgml/docbook/stylesheet/dsssl/modular/catalog -d stylesheet.dsl -t 
sgml -i output-html -i include-index postgres.sgml

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] distinct estimate of a hard-coded VALUES list

2016-08-22 Thread Tomas Vondra



On 08/22/2016 07:42 PM, Alvaro Herrera wrote:

Robert Haas wrote:

On Sat, Aug 20, 2016 at 4:58 PM, Tom Lane  wrote:

Jeff Janes  writes:

On Thu, Aug 18, 2016 at 2:25 PM, Tom Lane  wrote:

It does know it, what it doesn't know is how many duplicates there are.



Does it know whether the count comes from a parsed query-string list/array,
rather than being an estimate from something else?  If it came from a join,
I can see why it would be dangerous to assume they are mostly distinct.
But if someone throws 6000 things into a query string and only 200 distinct
values among them, they have no one to blame but themselves when it makes
bad choices off of that.


I am not exactly sold on this assumption that applications have
de-duplicated the contents of a VALUES or IN list.  They haven't been
asked to do that in the past, so why do you think they are doing it?


It's hard to know, but my intuition is that most people would
deduplicate.  I mean, nobody is going to want to their query generator
to send X IN (1, 1, ) to the server if it
could have just sent X IN (1).


Also, if we patch it this way and somebody has a slow query because of a
lot of duplicate values, it's easy to solve the problem by
de-duplicating.  But with the current code, people that have the
opposite problem has no way to work around it.



I certainly agree it's better when a smart user can fix his query plan 
by deduplicating the values than when we end up generating a poor plan 
due to assuming some users are somewhat dumb.


I wonder how expensive would it be to actually count the number of 
distinct values - there certainly are complex data types where the 
comparisons are fairly expensive, but I would not expect those to be 
used in explicit VALUES lists. Also, maybe there's some sufficiently 
accurate estimation approach - e.g. for small number of values we can 
compute the number of distinct values directly (and it's still going to 
be fairly cheap), while for larger number we could probably sample the 
values similarly to what ANALYZE does.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pg_receivexlog does not report flush position with --synchronous

2016-08-22 Thread Gabriele Bartolini
Hi guys,

  while adding synchronous WAL streaming to Barman, I noticed that
pg_receivexlog - unless a replication slot is specified and --synchronous
is passed - does not become a synchronous receiver (if the application_name
is in the synchronous_standby_names value). I was a bit surprised by this
behaviour.

  By reading the pg_receivexlog documentation, I assumed that:

1) if I set application_name properly for pg_receivexlog (let's say
'barman_receive_wal')
2) then I set synchronous_standby_names so that barman_receive_wal is first
in the list
3) then I run pg_receivexlog with --synchronous

  I would find the pg_receivexlog in 'sync' state in the
pg_stat_replication view on the master.

  However, I kept receiving the 'async' state. After looking at the
documentation once more, I noticed that '--synchronous' was mentioned also
in the '--slot-name' option but the explanation - at least to me - was not
very clear.

  I tried again by creating a replication slot and passing it to
pg_receivexlog and this time I could see 'sync' as streaming state.

  Looking up the code in more details I see that, unless replication slot
are enabled, pg_receivexlog does not report the flush position (this is a
precaution that's been taken when '--synchronous' was probably not around).
Please find this very short patch which - in case replication slots are not
present but synchronous is - reports the flush position.

   I am not sure if it is a bug or not. In any case I guess we should
probably improve the documentation - it's a bit late here so maybe I can
try better tomorrow with a fresher mind. :)

Thanks,
Gabriele
--
 Gabriele Bartolini - 2ndQuadrant Italia - Director
 PostgreSQL Training, Services and Support
 gabriele.bartol...@2ndquadrant.it | www.2ndQuadrant.it


0001-pg_receivexlog-does-not-report-flush-position-with-s.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Server crash due to SIGBUS(Bus Error) when trying to access the memory created using dsm_create().

2016-08-22 Thread Thomas Munro
On Tue, Aug 23, 2016 at 8:41 AM, Robert Haas  wrote:
> On Tue, Aug 16, 2016 at 7:41 PM, Thomas Munro
>  wrote:
>> I still think it's worth thinking about something along these lines on
>> Linux only, where holey Swiss tmpfs files can bite you.  Otherwise
>> disabling overcommit on your OS isn't enough to prevent something
>> which is really a kind of deferred overcommit with a surprising
>> failure mode (SIGBUS rather than OOM SIGKILL).
>
> Yeah, I am inclined to agree.  I mean, creating a DSM is fairly
> heavyweight already, so one extra system call isn't (I hope) a crazy
> overhead.  We could test to see how much it slows things down.  But it
> may be worth paying the cost even if it ends up being kinda expensive.
> We don't really have any way of knowing whether the caller's request
> is reasonable relative to the amount of virtual memory available, and
> converting a possible SIGBUS into an ereport(ERROR, ...) is a big win.

Here's a version of the patch that only does something special if the
following planets are aligned:

* Linux only: for now, there doesn't seem to be any reason to assume
that other operating systems share this file-with-holes implementation
quirk, or that posix_fallocate would work on such a fd, or which errno
values to tolerate if it doesn't.  From what I can tell, Solaris,
FreeBSD etc either don't overcommit or do normal non-stealth
overcommit with the usual out-of-swap failure mode for shm_open
memory, with a way to turn overcommit off.  So I put a preprocessor
test in to do this just for __linux__, and I used "fallocate" (a
non-standard Linux syscall) instead of "posix_fallocate".

* Glibc version >= 2.10: ancient versions and other libc
implementations don't have fallocate, so I put a test into the
configure script.

* Kernel version >= 2.6.23+: the man page says that ancient kernels
don't provide the syscall, and that glibc sets errno to ENOSYS in that
case, so I put a check in to keep calm and carry on.

I don't know if any distros ever shipped with an old enough kernel and
new enough glibc for ENOSYS to happen in the wild; for example RHEL5
had neither kernel nor glibc support, and RHEL6 had both.  I haven't
personally tested that path.

Maybe it would be worth thinking about whether this is a condition
that should cause dsm_create to return NULL rather than ereporting,
depending on a flag along the lines of the existing
DSM_CREATE_NULL_IF_MAXSEGMENTS.  But that could be a separate patch if
it turns out to be useful.

-- 
Thomas Munro
http://www.enterprisedb.com


fallocate.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Implement targetlist SRFs using ROWS FROM() (was Changed SRF in targetlist handling)

2016-08-22 Thread Gavin Flower

On 23/08/16 09:40, Andres Freund wrote:

Hi,

as noted in [1] I started hacking on removing the current implementation
of SRFs in the targetlist (tSRFs henceforth). IM discussion brought the
need for a description of the problem, need and approach to light.

There are several reasons for wanting to get rid of tSRFs. The primary
ones in my opinion are that the current behaviour of several SRFs in one
targetlist is confusing, and that the implementation burden currently is
all over the executor.  Especially the latter is what is motivating me
working on this, because it blocks my work on making the executor faster
for queries involving significant amounts of tuples.  Batching is hard
if random places in the querytree can increase the number of tuples.

The basic idea, hinted at in several threads, is, at plan time, to convert a 
query like
SELECT generate_series(1, 10);
into
SELECT generate_series FROM ROWS FROM(generate_series(1, 10));

thereby avoiding the complications in the executor (c.f. execQual.c
handling of isDone/ExprMultipleResult and supporting code in many
executor nodes / node->*.ps.ps_TupFromTlist).

There are several design questions along the way:

1) How to deal with the least-common-multiple behaviour of tSRFs. E.g.
=# SELECT generate_series(1, 3), generate_series(1,2);
returning
┌─────────────────┬─────────────────┐
│ generate_series │ generate_series │
├─────────────────┼─────────────────┤
│               1 │               1 │
│               2 │               2 │
│               3 │               1 │
│               1 │               2 │
│               2 │               1 │
│               3 │               2 │
└─────────────────┴─────────────────┘
(6 rows)
but
=# SELECT generate_series(1, 3), generate_series(5,7);
returning
┌─────────────────┬─────────────────┐
│ generate_series │ generate_series │
├─────────────────┼─────────────────┤
│               1 │               5 │
│               2 │               6 │
│               3 │               7 │
└─────────────────┴─────────────────┘

discussion in this thread came, according to my reading, to the
conclusion that that behaviour is just confusing and that the ROWS FROM
behaviour of
=# SELECT * FROM ROWS FROM(generate_series(1, 3), generate_series(1,2));
┌─────────────────┬─────────────────┐
│ generate_series │ generate_series │
├─────────────────┼─────────────────┤
│               1 │               1 │
│               2 │               2 │
│               3 │          (null) │
└─────────────────┴─────────────────┘
(3 rows)

makes more sense.
I had always implicitly assumed that having 2 generated sequences would 
act as equivalent to:


SELECT
sa,
sb
FROM
ROWS FROM(generate_series(1, 3)) AS sa,
ROWS FROM(generate_series(5, 7)) AS sb
ORDER BY
sa,
sb;

 sa | sb
----+----
  1 |  5
  1 |  6
  1 |  7
  2 |  5
  2 |  6
  2 |  7
  3 |  5
  3 |  6
  3 |  7


Obviously I was wrong - but to me, my implicit assumption makes more sense!
[...]


Cheers,
Gavin


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Changed SRF in targetlist handling

2016-08-22 Thread Andres Freund
Hi,

On 2016-08-22 16:20:58 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2016-08-17 17:41:28 -0700, Andres Freund wrote:
> >> Tom, do you think this is roughly going in the right direction?
> 
> I've not had time to look at this patch, I'm afraid.  If you still
> want me to, I can make time in a day or so.

That'd greatly be appreciated. I think polishing the POC up to
committable patch will be a considerable amount of work, and I'd like
design feedback before that.


> > I'm working on these. Atm ExecMakeTableFunctionResult() resides in
> > execQual.c - I'm inlining it into nodeFunctionscan.c now, because
> > there's no other callers, and having it separate seems to bring no
> > benefit.
> 
> > Please speak soon up if you disagree.
> 
> I think ExecMakeTableFunctionResult was placed in execQual.c because
> it seemed to belong there alongside the support for SRFs in tlists.
> If that's going away then there's no good reason not to move the logic
> to where it's used.

Cool, then we agree.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-08-22 Thread Peter Geoghegan
On Mon, Aug 1, 2016 at 3:18 PM, Peter Geoghegan  wrote:
> Attached WIP patch series:

This has bitrot, since commit da1c9163 changed the interface for
checking parallel safety. I'll have to fix that, and will probably
take the opportunity to change how workers have maintenance_work_mem
apportioned while I'm at it. To recap, it would probably be better if
maintenance_work_mem remained a high watermark for the entire CREATE
INDEX, rather than applying as a per-worker allowance.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical decoding of sequence advances, part II

2016-08-22 Thread Kevin Grittner
On Mon, Aug 22, 2016 at 3:29 PM, Robert Haas  wrote:

> it seems to me that
> this is just one facet of a much more general problem: given two
> transactions T1 and T2, the order of replay must match the order of
> commit unless you can prove that there are no dependencies between
> them.  I don't see why it matters whether the operations are sequence
> operations or data operations; it's just a question of whether they're
> modifying the same "stuff".

The commit order is the simplest and safest *unless* there is a
read-write anti-dependency a/k/a read-write dependency a/k/a
rw-conflict: where a read from one transaction sees the "before"
version of data modified by the other transaction.  In such a case
it is necessary for correct serializable transaction behavior for
the transaction that read the "before" image to be replayed before
the write it didn't see, regardless of commit order.  If you're not
trying to avoid serialization anomalies, it is less clear to me
what is best.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Implement targetlist SRFs using ROWS FROM() (was Changed SRF in targetlist handling)

2016-08-22 Thread Andres Freund
Hi,

as noted in [1] I started hacking on removing the current implementation
of SRFs in the targetlist (tSRFs henceforth). IM discussion brought the
need for a description of the problem, need and approach to light.

There are several reasons for wanting to get rid of tSRFs. The primary
ones in my opinion are that the current behaviour of several SRFs in one
targetlist is confusing, and that the implementation burden currently is
all over the executor.  Especially the latter is what is motivating me
working on this, because it blocks my work on making the executor faster
for queries involving significant amounts of tuples.  Batching is hard
if random places in the querytree can increase the number of tuples.

The basic idea, hinted at in several threads, is, at plan time, to convert a 
query like
SELECT generate_series(1, 10);
into
SELECT generate_series FROM ROWS FROM(generate_series(1, 10));

thereby avoiding the complications in the executor (c.f. execQual.c
handling of isDone/ExprMultipleResult and supporting code in many
executor nodes / node->*.ps.ps_TupFromTlist).

There are several design questions along the way:

1) How to deal with the least-common-multiple behaviour of tSRFs. E.g.
=# SELECT generate_series(1, 3), generate_series(1,2);
returning
┌─────────────────┬─────────────────┐
│ generate_series │ generate_series │
├─────────────────┼─────────────────┤
│               1 │               1 │
│               2 │               2 │
│               3 │               1 │
│               1 │               2 │
│               2 │               1 │
│               3 │               2 │
└─────────────────┴─────────────────┘
(6 rows)
but
=# SELECT generate_series(1, 3), generate_series(5,7);
returning
┌─────────────────┬─────────────────┐
│ generate_series │ generate_series │
├─────────────────┼─────────────────┤
│               1 │               5 │
│               2 │               6 │
│               3 │               7 │
└─────────────────┴─────────────────┘

discussion in this thread came, according to my reading, to the
conclusion that that behaviour is just confusing and that the ROWS FROM
behaviour of
=# SELECT * FROM ROWS FROM(generate_series(1, 3), generate_series(1,2));
┌─────────────────┬─────────────────┐
│ generate_series │ generate_series │
├─────────────────┼─────────────────┤
│               1 │               1 │
│               2 │               2 │
│               3 │          (null) │
└─────────────────┴─────────────────┘
(3 rows)

makes more sense.  We also discussed erroring out if two SRFs return
differing amounts of rows, but that seems not to be preferred so far. And
we can easily add it if we want.


2) A naive conversion to ROWS FROM, like in the example in the
   introductory paragraph, can change the output, when implemented as a
   join from ROWS FROM to the rest of the query, rather than the other
   way round. E.g.
=# EXPLAIN SELECT * FROM few, ROWS FROM(generate_series(1,10));
┌───────────────────────────────────────────────────────────────────────────────┐
│                                  QUERY PLAN                                   │
├───────────────────────────────────────────────────────────────────────────────┤
│ Nested Loop  (cost=0.00..36.03 rows=2000 width=8)                             │
│   ->  Function Scan on generate_series  (cost=0.00..10.00 rows=1000 width=4)  │
│   ->  Materialize  (cost=0.00..1.03 rows=2 width=4)                           │
│         ->  Seq Scan on few  (cost=0.00..1.02 rows=2 width=4)                 │
└───────────────────────────────────────────────────────────────────────────────┘
(4 rows)
=# SELECT * FROM few, ROWS FROM(generate_series(1,3));
┌────┬─────────────────┐
│ id │ generate_series │
├────┼─────────────────┤
│  1 │               1 │
│  2 │               1 │
│  1 │               2 │
│  2 │               2 │
│  1 │               3 │
│  2 │               3 │
└────┴─────────────────┘
(6 rows)
surely isn't what was intended.  So the join order needs to be enforced.

3) tSRFs are evaluated after GROUP BY, and window functions:
=# SELECT generate_series(1, count(*)) FROM (VALUES(1),(2),(10)) f;
┌─────────────────┐
│ generate_series │
├─────────────────┤
│               1 │
│               2 │
│               3 │
└─────────────────┘
which means we have to push the "original" query into a subquery, with
the ROWS FROM laterally referencing the subquery:
SELECT generate_series FROM (SELECT count(*) FROM (VALUES(1),(2),(10)) f) s, 
ROWS FROM (generate_series(1,s.count));

4) The evaluation order of tSRFs in combination with ORDER BY is a bit
   confusing. Namely tSRFs are implemented after ORDER BY has been
   evaluated, unless the ORDER BY references the SRF.
E.g.
=# SELECT few.id, generate_series FROM ROWS FROM(generate_series(1,3)),few 
ORDER BY few.id DESC;
might return
┌────┬─────────────────┐
│ id │ generate_series │
├────┼─────────────────┤
│ 24 │               3 │
│ 24 │               2 │
│ 24 │               1 │
..
instead of

Re: [HACKERS] pg_bsd_indent - improvements around offsetof and sizeof

2016-08-22 Thread Alvaro Herrera
Bruce Momjian wrote:
> On Tue, Aug 16, 2016 at 11:47:09AM -0700, Andres Freund wrote:
> > On 2016-08-15 18:09:02 +, Piotr Stefaniak wrote:
> > > There are more fixes I intend to do, of which the most relevant for 
> > > Postgres are:
> > > 1) fixing "function pointer typedef formatting"
> > 
> > This alone would warrant a bottle of something rather expensive.
> 
> Agreed.  I was kind of hoping we could use this for the pgindent run we
> just did, but that is being done just before 9.6 final, which seems too
> close.  I suggest we run it once everything is ready, and run it on all
> back-branches so we can backpatch things.  The ideal time would probably
> be right after we have done minor releases.  The problem is that this is
> going to break queued-up patches, so maybe it has to be done right
> before 10.0 beta, and again, to all back branches too.

I think it doesn't really matter -- surely we don't want to do it just
before some important release, but other than that I don't think there
are any constraints.  The amount of pain for large patch maintainers is
unrelated to the timing.

(I sketched a way to mechanically rebase patches across a pgindent run;
I haven't had the chance to try it.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New SQL counter statistics view (pg_stat_sql)

2016-08-22 Thread Gavin Flower

On 23/08/16 08:27, Tom Lane wrote:

Andres Freund  writes:

On 2016-08-22 13:54:43 -0400, Robert Haas wrote:

On Sat, Aug 20, 2016 at 11:17 AM, Tom Lane  wrote:

I'm inclined to suggest you forget this approach and propose a single
counter for "SQL commands executed", which avoids all of the above
definitional problems.  People who need more detail than that are
probably best advised to look to contrib/pg_stat_statements, anyway.

I disagree.  I think SQL commands executed, lumping absolutely
everything together, really isn't much use.

I'm inclined to agree. I think that's a quite useful stat when looking
at an installation one previously didn't have a lot of interaction with.

Well, let's at least have an "other" category so you can add up the
counters and get a meaningful total.

regards, tom lane


Initially I thought of something I thought was facetious, but then 
realized it might actually be both practicable & useful...


How about 2 extra categories (if appropriate!!!):

1. Things that might reasonably fit in 2 or more of the existing
   categories, or where there is ambiguity as to which category is
   appropriate.

2. Things that don't fit into any existing category

This was inspired by a real use case, in a totally unrelated area - but 
where I attempted to ensure counts were in the most useful categories I 
was able to provide.  The user had listed categories, but I found that 
the data didn't always fit neatly into them as specified.



Cheers,

Gavin




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] sslmode=require fallback

2016-08-22 Thread Bruce Momjian
On Fri, Aug 19, 2016 at 09:22:32AM -0700, Jeff Janes wrote:
> On Sat, Jul 30, 2016 at 11:18 AM, Bruce Momjian  wrote:
> 
> On Fri, Jul 29, 2016 at 11:27:06AM -0400, Peter Eisentraut wrote:
> > On 7/29/16 11:13 AM, Bruce Momjian wrote:
> > > Yes, I am thinking of a case where Postgres is down but a malevolent
> > > user starts a Postgres server on 5432 to gather passwords.  Verifying
> > > against an SSL certificate would avoid this problem, so there is some
> > > value in using SSL on localhost.  (There is no such security available
> > > for Unix-domain socket connections.)
> >
> > Sure, there is the requirepeer connection option for that.
> 
> Oh, nice, I had not seen that.
> 
> 
> 
> Hi Bruce,
> 
> There is typo in the doc patch you just committed
> 
> "On way to prevent spoofing of"
> 
> s/On/One/

Oops, thanks, fixed.
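(For reference, the option mentioned above is a libpq connection parameter; a sketch of its use, with the socket path and user name purely illustrative:)

```
# requirepeer: libpq refuses the Unix-socket connection unless the server
# process runs as the named OS user -- the socket-level analogue of
# verifying an SSL certificate against a spoofed server.
psql "host=/var/run/postgresql dbname=postgres requirepeer=postgres"
```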

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +




Re: [HACKERS] pg_bsd_indent - improvements around offsetof and sizeof

2016-08-22 Thread Bruce Momjian
On Tue, Aug 16, 2016 at 11:47:09AM -0700, Andres Freund wrote:
> On 2016-08-15 18:09:02 +, Piotr Stefaniak wrote:
> > There are more fixes I intend to do, of which the most relevant for 
> > Postgres are:
> > 1) fixing "function pointer typedef formatting"
> 
> This alone would warrant a bottle of something rather expensive.

Agreed.  I was kind of hoping we could use this for the pgindent run we
just did, but that is being done just before 9.6 final, which seems too
close.  I suggest we run it once everything is ready, and run it on all
back-branches so we can backpatch things.  The ideal time would probably
be right after we have done minor releases.  The problem is that this is
going to break queued-up patches, so maybe it has to be done right
before 10.0 beta, and again, to all back branches too.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +




Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-22 Thread Andres Freund
On 2016-08-22 17:50:11 -0300, Alvaro Herrera wrote:
> > 2. You can't write to unlogged tables on standby servers, so this
> > doesn't help solve the problem of wanting to use temporary tables on
> > standbys.
> 
> Check.  We could think about relaxing this restriction, which would
> enable the feature to satisfy that use case.  (I think the main
> complication there is the init fork of btrees on those catalogs; other
> relations could just be truncated to empty on restart.)

Isn't the main complication that visibility currently requires xids to
be assigned?




Re: [HACKERS] Add -c to rsync commands on SR tutorial wiki page

2016-08-22 Thread Robert Haas
On Thu, Aug 18, 2016 at 2:27 PM, Jim Nasby  wrote:
> I don't think it's any great leap for someone to think they can use those
> commands incrementally. It's certainly one of the first things you think of
> when using rsync. AFAIK there's no downside at all to using -c when it is a
> brand new copy, so I'm thinking we should just put it in there, especially
> considering what the potential downside is.

I think that's right.

The suggestion that people might use some backup tool is a good one,
but that's not a reason not to add -c here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-22 Thread Alvaro Herrera
Robert Haas wrote:

> However:
> 
> 1. The number of tables for which we would need to add a duplicate,
> unlogged table is formidable.  You need pg_attribute, pg_attrdef,
> pg_constraint, pg_description, pg_type, pg_trigger, pg_rewrite, etc.
> And the backend changes needed so that we used the unlogged copy for
> temp tables and the permanent copy for regular tables is probably
> really large.

Check.  This is the most serious issue, IMV.

> 2. You can't write to unlogged tables on standby servers, so this
> doesn't help solve the problem of wanting to use temporary tables on
> standbys.

Check.  We could think about relaxing this restriction, which would
enable the feature to satisfy that use case.  (I think the main
complication there is the init fork of btrees on those catalogs; other
relations could just be truncated to empty on restart.)

> 3. While it makes creating temporary tables a lighter-weight
> operation, because you no longer need to write WAL for the catalog
> entries, there's probably still substantially more overhead than just
> stuffing them in backend-local RAM.  So the performance benefits are
> probably fairly modest.

You also save catalog bloat ... These benefits may not be tremendous,
but I think they may be good enough for many users.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Server crash due to SIGBUS(Bus Error) when trying to access the memory created using dsm_create().

2016-08-22 Thread Robert Haas
On Tue, Aug 16, 2016 at 7:41 PM, Thomas Munro
 wrote:
> I still think it's worth thinking about something along these lines on
> Linux only, where holey Swiss tmpfs files can bite you.  Otherwise
> disabling overcommit on your OS isn't enough to prevent something
> which is really a kind of deferred overcommit with a surprising
> failure mode (SIGBUS rather than OOM SIGKILL).

Yeah, I am inclined to agree.  I mean, creating a DSM is fairly
heavyweight already, so one extra system call isn't (I hope) a crazy
overhead.  We could test to see how much it slows things down.  But it
may be worth paying the cost even if it ends up being kinda expensive.
We don't really have any way of knowing whether the caller's request
is reasonable relative to the amount of virtual memory available, and
converting a possible SIGBUS into an ereport(ERROR, ...) is a big win.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-22 Thread Robert Haas
On Tue, Aug 16, 2016 at 8:03 PM, Jim Nasby  wrote:
> On 8/16/16 11:59 AM, Robert Haas wrote:
> ...
>>
>> That doesn't really solve the problem, because OTHER backends won't be
>> able to see them.  So, if I create a fast temporary table in one
>> session that depends on a permanent object, some other session can
>> drop the permanent object.  If there were REAL catalog entries, that
>> wouldn't work, because the other session would see the dependency.
>
> Some discussion about TEMP functions is happening on -general right now, and
> there's other things where temp objects are good to have, so it'd be nice to
> have a more generic fix for this stuff. Is the idea of "partitioning" the
> catalogs to store temp objects separate from permanent fatally flawed?

I wouldn't say it's fatally flawed.  But you might need a
world-renowned team of physicians working round the clock for days in
a class 1 trauma center to save it.  If you imagine that you have a
permanent pg_class which holds permanent tables and a temporary
pg_class per-backend which stores temporary tables, then you very
quickly end up with the same deadly flaw as in Aleksander's design:
other backends cannot see all of the dependency entries and can drop
things that they shouldn't be permitted to drop.  However, you could
have a permanent pg_class which holds the records for permanent tables
and an *unlogged* table, say pg_class_unlogged, which holds records
for temporary tables.  Now everybody can see everybody else's data,
yet we don't have to create permanent catalog entries.  So we are not
dead.  All of the temporary catalog tables vanish on a crash, too, and
in a very clean way, which is great.

However:

1. The number of tables for which we would need to add a duplicate,
unlogged table is formidable.  You need pg_attribute, pg_attrdef,
pg_constraint, pg_description, pg_type, pg_trigger, pg_rewrite, etc.
And the backend changes needed so that we used the unlogged copy for
temp tables and the permanent copy for regular tables is probably
really large.

2. You can't write to unlogged tables on standby servers, so this
doesn't help solve the problem of wanting to use temporary tables on
standbys.

3. While it makes creating temporary tables a lighter-weight
operation, because you no longer need to write WAL for the catalog
entries, there's probably still substantially more overhead than just
stuffing them in backend-local RAM.  So the performance benefits are
probably fairly modest.

Overall I feel like the development effort that it would take to make
this work would almost certainly be better-expended elsewhere.  But of
course I'm not in charge of how people who work for other companies
spend their time...
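In rough SQL terms, the idea sketched above would look something like this (the table name is invented; nothing like it exists today):

```sql
-- Hypothetical sketch only: there is no pg_class_unlogged in PostgreSQL.
CREATE UNLOGGED TABLE pg_class_unlogged (LIKE pg_class);

-- Temp-table DDL would insert here instead of into pg_class: the rows are
-- visible to every backend (so cross-backend dependency checks still work),
-- nothing is WAL-logged, and the table resets to empty after a crash via
-- its init fork.
```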

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Logical decoding of sequence advances, part II

2016-08-22 Thread Andres Freund
On 2016-08-22 16:29:12 -0400, Robert Haas wrote:
> So, I wish I could give you some better advice on this topic, but
> sadly I am not an expert in this area.  However, it seems to me that
> this is just one facet of a much more general problem: given two
> transactions T1 and T2, the order of replay must match the order of
> commit unless you can prove that there are no dependencies between
> them.  I don't see why it matters whether the operations are sequence
> operations or data operations; it's just a question of whether they're
> modifying the same "stuff".
> 
> Of course, it's possible I'm missing something important here...

Maybe that normally logical decoding outputs stuff in commit order?

Andres




Re: [HACKERS] Logical decoding of sequence advances, part II

2016-08-22 Thread Robert Haas
On Sun, Aug 21, 2016 at 11:13 PM, Craig Ringer  wrote:
> If the sequence is created in the current xact (i.e. uncommitted) we have to
> add the sequence updates to that xact to be replayed only if it commits. The
> sequence is visible only to the toplevel xact that created the sequence so
> advances of it can only come from that xact and its children. The actual
> CREATE SEQUENCE is presumed to be handled separately by an event trigger or
> similar.
>
> If the new sequence is committed we must replay sequence advances
> immediately and non-transactionally to ensure they're not lost due to xact
> rollback or replayed in the wrong order due to xact commit order.

So, I wish I could give you some better advice on this topic, but
sadly I am not an expert in this area.  However, it seems to me that
this is just one facet of a much more general problem: given two
transactions T1 and T2, the order of replay must match the order of
commit unless you can prove that there are no dependencies between
them.  I don't see why it matters whether the operations are sequence
operations or data operations; it's just a question of whether they're
modifying the same "stuff".

Of course, it's possible I'm missing something important here...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] New SQL counter statistics view (pg_stat_sql)

2016-08-22 Thread Tom Lane
Andres Freund  writes:
> On 2016-08-22 13:54:43 -0400, Robert Haas wrote:
>> On Sat, Aug 20, 2016 at 11:17 AM, Tom Lane  wrote:
>>> I'm inclined to suggest you forget this approach and propose a single
>>> counter for "SQL commands executed", which avoids all of the above
>>> definitional problems.  People who need more detail than that are
>>> probably best advised to look to contrib/pg_stat_statements, anyway.

>> I disagree.  I think SQL commands executed, lumping absolutely
>> everything together, really isn't much use.

> I'm inclined to agree. I think that's a quite useful stat when looking
> at an installation one previously didn't have a lot of interaction with.

Well, let's at least have an "other" category so you can add up the
counters and get a meaningful total.

regards, tom lane




Re: [HACKERS] distinct estimate of a hard-coded VALUES list

2016-08-22 Thread Tom Lane
Robert Haas  writes:
> On Sat, Aug 20, 2016 at 4:58 PM, Tom Lane  wrote:
>> I am not exactly sold on this assumption that applications have
>> de-duplicated the contents of a VALUES or IN list.  They haven't been
>> asked to do that in the past, so why do you think they are doing it?

> It's hard to know, but my intuition is that most people would
> deduplicate.  I mean, nobody is going to want their query generator
> to send X IN (1, 1, ...) to the server if it
> could have just sent X IN (1).

I dunno, these are the very same people who send "WHERE 1=1" so that
they can save one line of code to decide whether to append AND or not
before the first real condition.

Still, maybe we should optimize for smart queries rather than stupid
ones.

regards, tom lane




Re: [HACKERS] Bug in abbreviated keys abort handling (found with amcheck)

2016-08-22 Thread Peter Geoghegan
On Mon, Aug 22, 2016 at 12:34 PM, Robert Haas  wrote:
> Ugh, that sucks. Thanks for the report and patch.  Committed and
> back-patched to 9.5.

Thanks.

Within Heroku, there is a lot of enthusiasm for the idea of sharing
hard data about the prevalence of problems like this. I hope to be
able to share figures in the next few weeks, when I finish working
through the backlog.

Separately, I would like amcheck to play a role in how we direct users
to REINDEX, as issues like this come to light. It would be much more
helpful if we didn't have to be so conservative. I hesitate to say
that amcheck will detect cases where this bug led to corruption with
100% reliability, but I think that any case that one can imagine in
which amcheck fails here is unlikely in the extreme. The same applies
to the glibc abbreviated keys issue.

I actually didn't find any glibc strxfrm() issues yet, even though any
instances of corruption of text indexes I've seen originated before
the point release in which strxfrm() became distrusted. I guess that
not that many Heroku users use the "C" locale, which would still be
affected with the latest point release.

-- 
Peter Geoghegan




Re: [HACKERS] Changed SRF in targetlist handling

2016-08-22 Thread Tom Lane
Andres Freund  writes:
> On 2016-08-17 17:41:28 -0700, Andres Freund wrote:
>> Tom, do you think this is roughly going in the right direction?

I've not had time to look at this patch, I'm afraid.  If you still
want me to, I can make time in a day or so.

> I'm working on these. Atm ExecMakeTableFunctionResult() resides in
> execQual.c - I'm inlining it into nodeFunctionscan.c now, because
> there's no other callers, and having it separate seems to bring no
> benefit.

> Please speak up soon if you disagree.

I think ExecMakeTableFunctionResult was placed in execQual.c because
it seemed to belong there alongside the support for SRFs in tlists.
If that's going away then there's no good reason not to move the logic
to where it's used.

regards, tom lane




Re: [HACKERS] Bug in abbreviated keys abort handling (found with amcheck)

2016-08-22 Thread Robert Haas
On Fri, Aug 19, 2016 at 6:07 PM, Peter Geoghegan  wrote:
> I found another bug as a result of using amcheck on Heroku customer
> databases. This time, the bug is in core Postgres. It's one of mine.
>
> There was a thinko in tuplesort's abbreviation abort logic, causing
> certain SortTuples to be spuriously marked NULL (and so, subsequently
> sorted as a NULL tuple, despite not actually changing anything about
> the representation of caller tuples). The attached patch fixes this
> bug.

Ugh, that sucks. Thanks for the report and patch.  Committed and
back-patched to 9.5.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Changed SRF in targetlist handling

2016-08-22 Thread Andres Freund
Hi,

On 2016-05-23 09:26:03 +0800, Craig Ringer wrote:
> SRFs-in-tlist are a lot faster for lockstep iteration etc. They're also
> much simpler to write, though if the result result rowcount differs
> unexpectedly between the functions you get exciting and unexpected
> behaviour.
> 
> WITH ORDINALITY provides what I think is the last of the functionality
> needed to replace SRFs-in-from, but at a syntatactic complexity and
> performance cost. The following example demonstrates that, though it
> doesn't do anything that needs LATERAL etc. I'm aware the following aren't
> semantically identical if the rowcounts differ.

I think here you're just missing ROWS FROM (generate_series(..), 
generate_series(...))
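Spelled out, the suggestion looks like this; both queries return three rows in lockstep:

```sql
-- SRF-in-targetlist style:
SELECT generate_series(1, 3) AS a, generate_series(11, 13) AS b;

-- Equivalent FROM-clause form using ROWS FROM:
SELECT a, b
FROM ROWS FROM (generate_series(1, 3), generate_series(11, 13)) AS t(a, b);
```

ROWS FROM iterates the functions in parallel, padding the shorter one with NULLs if the row counts differ, which covers the lockstep-iteration case without WITH ORDINALITY gymnastics.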

Andres




Re: [HACKERS] Exporting more function in libpq

2016-08-22 Thread Alvaro Herrera
Craig Ringer wrote:
> On 19 August 2016 at 14:17, Tatsuo Ishii  wrote:
> 
> > I would like to proppse to export these functions in libpq.
> >
> > pqPutMsgStart
> > pqPutMsgEnd
> > pqPutc
> > pqPuts
> > pqPutInt
> > pqPutnchar
> > pqFlush
> > pqHandleSendFailure
> >
> > I think this would be useful to create a tool/library which needs to
> > handle frontend/backend protocol messages in detail.
> 
> Shouldn't that generally be done by extending libpq to add the required
> functionality?

The thought that came to me was that maybe we need a separate library
that handles the lower level operations (a "fe/be" library, if you will)
which can be exported for others to use and is used by libpq to
implement the slightly-higher-level functionality.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Changed SRF in targetlist handling

2016-08-22 Thread Andres Freund
On 2016-08-17 17:41:28 -0700, Andres Freund wrote:
> Tom, do you think this is roughly going in the right direction? My plan
> here is to develop two patches, to come before this:
> 
> a) Allow to avoid using a tuplestore for SRF_PERCALL SRFs in ROWS FROM -
>otherwise our performance would regress noticeably in some cases.
> b) Allow ROWS FROM() to return SETOF RECORD type SRFs as one column,
>instead of expanded. That's important to be able move SETOF RECORD
>returning functions in the targetlist into ROWS FROM, which otherwise
>requires an explicit column list.

I'm working on these. Atm ExecMakeTableFunctionResult() resides in
execQual.c - I'm inlining it into nodeFunctionscan.c now, because
there's no other callers, and having it separate seems to bring no
benefit.

Please speak up soon if you disagree.

Andres




Re: [HACKERS] Most efficient way for libPQ .. PGresult serialization

2016-08-22 Thread Joshua Bay
Oh I see.
I just read more on PgBouncer's use cases, but it seems it can't be used
for my project.
The reason is that I need my middleware to have full control over
each transaction.
That is, it must be able to decide whether it's going to commit or abort a single
query (the reason why libpq is used in the middleware), and it must be able to
decide when to send back the result. It also does things like load
balancing with its algorithm.

So, what the middleware does is (simplified, ignoring other details):
1. listens to the query and does load balancing
2. executes the query on behalf of the client against the server with libpq
(does not have to be libpq)
3. serializes the result and sends it back

And #3 is why I asked for ways to serialize the PGresult (of libpq).

Client app will deserialize the result and thus be able to interpret
PGresult as if it used libpq itself.

Thanks!

On Thu, Aug 18, 2016 at 9:05 PM, Craig Ringer  wrote:

>
> On 19 August 2016 at 03:08, Joshua Bay  wrote:
>
>> Thanks,
>> But I don't think my question was clear enough.
>>
>> I already managed the connection pooling, and what I need is to serialize
>> the result.
>>
>> If PGresult was a contiguous block, I could have just create buffer and
>> call memcpy for serialization, but structure of result seems much more
>> complicated.
>>
>> So, I was asking if there is an easy way to achieve serialization
>>
>
> It's wire format is a serialization. That's kind of the point.
>
> I don't understand what you're trying to do here, so it's hard to give a
> better answer.
>
> --
>  Craig Ringer   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>


Re: [HACKERS] Most efficient way for libPQ .. PGresult serialization

2016-08-22 Thread Joshua Bay
No, it can be anything else.

Please correct me if I'm wrong, but to me, PgPool-II looks like a proxy
server that forwards all the data without interpretation. Can I intercept
in the middle and control the flow between client and server? E.g., I
need to control when the result of a transaction is sent back to the client.




On Sat, Aug 20, 2016 at 2:39 AM, Craig Ringer  wrote:

> On 19 August 2016 at 22:16, Joshua Bay  wrote:
>
>> Oh I see.
>> I just read more on PgBouncer's use cases, but it seems it can't be used
>> for my project.
>> The reason is that I need my middleware to have full control over
>> each transaction.
>> That is, it must be able to decide whether it's going to commit or abort a
>> single query (the reason why libpq is used in the middleware), and it must be
>> able to decide when to send back the result. It also does things like load
>> balancing with its algorithm.
>>
>> So, what the middleware does is (simplified, ignoring other details):
>> 1. listens to the query and does load balancing
>> 2. executes the query on behalf of the client against the server with libpq
>> (does not have to be libpq)
>> 3. serializes the result and sends it back
>>
>> And #3 is why I asked for ways to serialize the PGresult (of libpq)
>>
>> Client app will deserialize the result and thus be able to interpret
>> PGresult as if it used libpq itself.
>>
>>
> Surely the app should just use libpq, and your middleware should be a
> proxy?
>
> Like, say, PgPool-II?
>
> Otherwise you'll have to extract all the results handling parts of libpq
> into some smaller cut-down library and graft on
> serialization/deserialization code. There's nothing built-in for that,
> since the natural and logical serialization for query results is the
> PostgreSQL wire protocol.
>
> --
>  Craig Ringer   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>


Re: [HACKERS] distinct estimate of a hard-coded VALUES list

2016-08-22 Thread Alvaro Herrera
Robert Haas wrote:
> On Sat, Aug 20, 2016 at 4:58 PM, Tom Lane  wrote:
> > Jeff Janes  writes:
> >> On Thu, Aug 18, 2016 at 2:25 PM, Tom Lane  wrote:
> >>> It does know it, what it doesn't know is how many duplicates there are.
> >
> >> Does it know whether the count comes from a parsed query-string list/array,
> >> rather than being an estimate from something else?  If it came from a join,
> >> I can see why it would be dangerous to assume they are mostly distinct.
> >> But if someone throws 6000 things into a query string and only 200 distinct
> >> values among them, they have no one to blame but themselves when it makes
> >> bad choices off of that.
> >
> > I am not exactly sold on this assumption that applications have
> > de-duplicated the contents of a VALUES or IN list.  They haven't been
> > asked to do that in the past, so why do you think they are doing it?
> 
> It's hard to know, but my intuition is that most people would
> deduplicate.  I mean, nobody is going to want their query generator
> to send X IN (1, 1, ...) to the server if it
> could have just sent X IN (1).

Also, if we patch it this way and somebody has a slow query because of a
lot of duplicate values, it's easy to solve the problem by
de-duplicating.  But with the current code, people that have the
opposite problem have no way to work around it.
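The asymmetry in a nutshell (table and values invented):

```sql
-- Easy workaround if duplicates lead to a bad estimate: de-duplicate
-- on the client side before sending.
SELECT * FROM t WHERE x IN (1, 1, 2, 2, 3);   -- planner must guess about duplicates
SELECT * FROM t WHERE x IN (1, 2, 3);         -- equivalent, de-duplicated

-- But if the list really is distinct and the planner assumes duplicates
-- anyway, there is nothing to change in the query text.
```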

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] New SQL counter statistics view (pg_stat_sql)

2016-08-22 Thread Andres Freund
On 2016-08-22 13:54:43 -0400, Robert Haas wrote:
> On Sat, Aug 20, 2016 at 11:17 AM, Tom Lane  wrote:
> > I'm inclined to suggest you forget this approach and propose a single
> > counter for "SQL commands executed", which avoids all of the above
> > definitional problems.  People who need more detail than that are
> > probably best advised to look to contrib/pg_stat_statements, anyway.
> 
> I disagree.  I think SQL commands executed, lumping absolutely
> everything together, really isn't much use.

I'm inclined to agree. I think that's a quite useful stat when looking
at an installation one previously didn't have a lot of interaction with.


> Haribabu's categorization
> scheme seems to need some work, but the idea of categorizing
> statements by type and counting executions per type seems very
> reasonable.

I'd consider instead using something like COALESCE(commandType,
nodeTag(Query->utilityStmt)) as the categories. Not sure if I'd even
pivot that.

Andres




Re: [HACKERS] New SQL counter statistics view (pg_stat_sql)

2016-08-22 Thread Robert Haas
On Sat, Aug 20, 2016 at 11:17 AM, Tom Lane  wrote:
> I'm inclined to suggest you forget this approach and propose a single
> counter for "SQL commands executed", which avoids all of the above
> definitional problems.  People who need more detail than that are
> probably best advised to look to contrib/pg_stat_statements, anyway.

I disagree.  I think SQL commands executed, lumping absolutely
everything together, really isn't much use.  Haribabu's categorization
scheme seems to need some work, but the idea of categorizing
statements by type and counting executions per type seems very
reasonable.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Showing parallel status in \df+

2016-08-22 Thread Pavel Stehule
2016-08-22 18:19 GMT+02:00 Robert Haas :

> On Mon, Aug 22, 2016 at 4:49 AM, Pavel Stehule 
> wrote:
> > This feature shows the source code for a PL function when \df is used.
> > I am not too sure if this functionality is necessary - but I don't see
> > any argument against it. Sometimes it can be useful, mainly when we work
> > with overloaded functions.
>
> Wait, really?  I thought Peter was complaining about the fact that it
> *removed* that from the display.
>
> He also complained about the fact that the subject line of this thread
> and what the patch actually does have diverged considerably, which I
> think is a fair complaint.
>

If I understand the purpose of this patch, it is a compromise: the PL source is
removed from the table, but it is still printed in the result.

I am sure there is little benefit in displaying the body of a PL function
inside the table. But I see some benefit in Tom's design. We cannot simply
show the source code of several functions; \sf doesn't support it. The source is
displayed at the end, so there is little impact on the result.
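For reference, the two psql commands being contrasted (function name invented):

```
\df+ public.my_func   -- tabular overview of one or more matching functions
\sf public.my_func    -- prints one function's full CREATE OR REPLACE FUNCTION
```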

Regards

Pavel




>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: [HACKERS] Re: PROPOSAL: make PostgreSQL sanitizers-friendly (and prevent information disclosure)

2016-08-22 Thread Andres Freund
On 2016-08-22 13:49:47 -0400, Robert Haas wrote:
> On Mon, Aug 22, 2016 at 1:46 PM, Andres Freund  wrote:
> > I don't think the runtime overhead is likely to be all that high - if
> > you look at valgrind.supp the performance-critical parts basically are:
> > - pgstat_send - the context switching is going to drown out some zeroing
> > - xlog insertions - making the crc computation more predictable would
> >   actually be nice
> > - reorderbuffer serialization - zeroing won't be a material part of the
> >   cost
> >
> > The rest is mostly bootstrap or python related.
> >
> > There might be cases where we *don't* unconditionally do the zeroing -
> > e.g. I'm doubtful about the sinval stuff where we currently only
> > conditionally clear - but the stuff in valgrind.supp seems fine.
> 
> Naturally you'll be wanting to conclusively demonstrate this with
> benchmarks on multiple workloads, platforms, and concurrency levels.
> Right?  :-)

Pah ;)

I do think some micro-benchmarks aiming at the individual costs make
sense; we're only talking about ~three places in the code - don't think
concurrency plays a large role though ;)




Re: [HACKERS] Re: PROPOSAL: make PostgreSQL sanitizers-friendly (and prevent information disclosure)

2016-08-22 Thread Robert Haas
On Mon, Aug 22, 2016 at 1:46 PM, Andres Freund  wrote:
> I don't think the runtime overhead is likely to be all that high - if
> you look at valgrind.supp the performance-critical parts basically are:
> - pgstat_send - the context switching is going to drown out some zeroing
> - xlog insertions - making the crc computation more predictable would
>   actually be nice
> - reorderbuffer serialization - zeroing won't be a material part of the
>   cost
>
> The rest is mostly bootstrap or python related.
>
> There might be cases where we *don't* unconditionally do the zeroing -
> e.g. I'm doubtful about the sinval stuff where we currently only
> conditionally clear - but the stuff in valgrind.supp seems fine.

Naturally you'll be wanting to conclusively demonstrate this with
benchmarks on multiple workloads, platforms, and concurrency levels.
Right?  :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Proposal for CSN based snapshots

2016-08-22 Thread Andres Freund
On 2016-08-22 13:41:57 -0400, Robert Haas wrote:
> On Mon, Aug 22, 2016 at 1:32 PM, Heikki Linnakangas  wrote:
> >> But what about the best case?  If we create a scenario where there are
> >> no open read-write transactions at all and (somehow) lots and lots of
> >> ProcArrayLock contention, how much does this help?
> >
> > I ran some quick pgbench tests on my laptop, but didn't see any meaningful
> > benefit. I think the best I could see is about 5% speedup, when running
> > "pgbench -S", with 900 idle connections sitting in the background. On the
> > positive side, I didn't see much slowdown either. (Sorry, I didn't record
> > the details of those tests, as I was testing many different options and I
> > didn't see a clear difference either way.)
> 
> That's not very exciting.

I think it's neither exciting nor worrying - the benefit of the pgproc
batch clearing itself wasn't apparent on small hardware either...




Re: [HACKERS] Re: PROPOSAL: make PostgreSQL sanitizers-friendly (and prevent information disclosure)

2016-08-22 Thread Andres Freund
On 2016-08-22 13:16:34 -0400, Robert Haas wrote:
> On Sat, Aug 20, 2016 at 3:46 PM, Tom Lane  wrote:
> > So to me, it seems like the core of this complaint boils down to "my
> > sanitizer doesn't understand the valgrind exclusion patterns that have
> > been created for Postgres".  We can address that to some extent by trying
> > to reduce the number of valgrind exclusions we need, but it's unlikely to
> > be practical to get that to zero, and it's not very clear that adding
> > runtime cycles is a good tradeoff for it either.  So maybe we need to push
> > back on the assumption that people should expect their sanitizers to
> > produce zero warnings without having made some effort to adapt the
> > valgrind rules.

I don't think the runtime overhead is likely to be all that high - if
you look at valgrind.supp the performance-critical parts basically are:
- pgstat_send - the context switching is going to drown out some zeroing
- xlog insertions - making the crc computation more predictable would
  actually be nice
- reorderbuffer serialization - zeroing won't be a material part of the
  cost

The rest is mostly bootstrap or python related.

There might be cases where we *don't* unconditionally do the zeroing -
e.g. I'm doubtful about the sinval stuff where we currently only
conditionally clear - but the stuff in valgrind.supp seems fine.

Andres




Re: [HACKERS] Proposal for CSN based snapshots

2016-08-22 Thread Robert Haas
On Mon, Aug 22, 2016 at 1:32 PM, Heikki Linnakangas  wrote:
>> But what about the best case?  If we create a scenario where there are
>> no open read-write transactions at all and (somehow) lots and lots of
>> ProcArrayLock contention, how much does this help?
>
> I ran some quick pgbench tests on my laptop, but didn't see any meaningful
> benefit. I think the best I could see is about 5% speedup, when running
> "pgbench -S", with 900 idle connections sitting in the background. On the
> positive side, I didn't see much slowdown either. (Sorry, I didn't record
> the details of those tests, as I was testing many different options and I
> didn't see a clear difference either way.)

That's not very exciting.

> It seems that Amit's PGPROC batch clearing patch was very effective. I
> remember seeing ProcArrayLock contention very visible earlier, but I can't
> hit that now. I suspect you'd still see contention on bigger hardware,
> though, my laptop has only 4 cores. I'll have to find a real server for the
> next round of testing.

It's good that those patches were effective, and I bet that approach
could be further refined, too.  However, I think Amit may have
mentioned in an internal meeting that he was able to generate some
pretty serious ProcArrayLock contention with some of his hash index
patches applied.  I don't remember the details, though.

>> Because there's only a purpose to trying to minimize the losses if
>> there are some gains to which we can look forward.
>
> Aside from the potential performance gains, this slashes a lot of
> complicated code:
>
>  70 files changed, 2429 insertions(+), 6066 deletions(-)
>
> That removed code is quite mature at this point, and I'm sure we'll add some
> code back to this patch as it evolves, but still.

That's interesting, but it might just mean we're replacing well-tested
code with new, buggy code.  By the time you fix all the performance
regressions, those numbers could be a lot closer together.

> Also, I'm looking forward for a follow-up patch, to track snapshots in
> backends at a finer level, so that vacuum could remove tuples more
> aggressively, if you have pg_dump running for days. CSN snapshots isn't a
> strict requirement for that, but it makes it simpler, when you can represent
> a snapshot with a small fixed-size integer.

That would certainly be nice, but I think we need to be careful not to
sacrifice too much trying to get there.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Proposal for CSN based snapshots

2016-08-22 Thread Andres Freund
Hi,

On 2016-08-22 20:32:42 +0300, Heikki Linnakangas wrote:
> I ran some quick pgbench tests on my laptop, but didn't see any meaningful
> benefit. I think the best I could see is about 5% speedup, when running
> "pgbench -S", with 900 idle connections sitting in the background. On the
> positive side, I didn't see much slowdown either. (Sorry, I didn't record
> the details of those tests, as I was testing many different options and I
> didn't see a clear difference either way.)

Hm. Does the picture change if those are idle in transaction, after
assigning an xid?


> It seems that Amit's PGPROC batch clearing patch was very effective.

It usually breaks down if you have a mixed read/write workload - might
be worthwhile prototyping that.


> I
> remember seeing ProcArrayLock contention very visible earlier, but I can't
> hit that now. I suspect you'd still see contention on bigger hardware,
> though, my laptop has only 4 cores. I'll have to find a real server for the
> next round of testing.

Yea, I think that's true. I can just about see ProcArrayLock contention
on my more powerful laptop, to see it really bad you need bigger
hardware / higher concurrency.


Greetings,

Andres Freund




Re: [HACKERS] Re: PROPOSAL: make PostgreSQL sanitizers-friendly (and prevent information disclosure)

2016-08-22 Thread Robert Haas
On Sat, Aug 20, 2016 at 3:46 PM, Tom Lane  wrote:
> Noah Misch  writes:
>> On Fri, Aug 19, 2016 at 07:22:02PM -0400, Tom Lane wrote:
>>> So maybe we ought to work towards taking those out?
>
>> Maybe.  It's a policy question boiling down to our willingness to spend 
>> cycles
>> zeroing memory in order to limit trust in the confidentiality of storage
>> backing the data directory.  Think of "INSERT INTO t VALUES(my_encrypt('key',
>> 'cleartext'))", subsequent to which bits of the key or cleartext leak to disk
>> by way of WAL padding bytes.  A reasonable person might expect that not to
>> happen; GNU Privacy Guard and a few like-minded programs prevent it.  I'm on
>> the fence regarding whether PostgreSQL should target this level of vigilance.
>> An RDBMS is mainly a tool for managing persistent data, and PostgreSQL will
>> never be a superior tool for data that _must not_ persist.  Having said that,
>> the runtime cost of zeroing memory and the development cost of reviewing the
>> patches to do so is fairly low.
>
> [ after thinking some more about this... ]
>
> FWIW, I put pretty much zero credence in the proposition that junk left in
> padding bytes in WAL or data files represents a meaningful security issue.
> An attacker who has access to those files will probably find much more
> that is of interest in the non-pad data.  My only interest here is in
> making the code sanitizer-clean, which seems like it is useful for
> debugging purposes independently of any security arguments.
>
> So to me, it seems like the core of this complaint boils down to "my
> sanitizer doesn't understand the valgrind exclusion patterns that have
> been created for Postgres".  We can address that to some extent by trying
> to reduce the number of valgrind exclusions we need, but it's unlikely to
> be practical to get that to zero, and it's not very clear that adding
> runtime cycles is a good tradeoff for it either.  So maybe we need to push
> back on the assumption that people should expect their sanitizers to
> produce zero warnings without having made some effort to adapt the
> valgrind rules.

One idea is to protect additional memory-clearing operations with
#ifdef SANITIZER_CLEAN or something of that sort.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Proposal for CSN based snapshots

2016-08-22 Thread Heikki Linnakangas

On 08/22/2016 07:49 PM, Robert Haas wrote:

Nice to see you working on this again.

On Mon, Aug 22, 2016 at 12:35 PM, Heikki Linnakangas  wrote:

A sequential scan of a table like that with 10 million rows took about 700
ms on my laptop, when the hint bits are set, without this patch. With this
patch, if there's a snapshot holding back the xmin horizon, so that we need
to check the CSN log for every XID, it took about 3 ms. So we have some
optimization work to do :-). I'm not overly worried about that right now, as
I think there's a lot of room for improvement in the SLRU code. But that's
the next thing I'm going to work on.


So the worst case for this patch is obviously bad right now and, as
you say, that means that some optimization work is needed.

But what about the best case?  If we create a scenario where there are
no open read-write transactions at all and (somehow) lots and lots of
ProcArrayLock contention, how much does this help?


I ran some quick pgbench tests on my laptop, but didn't see any 
meaningful benefit. I think the best I could see is about 5% speedup, 
when running "pgbench -S", with 900 idle connections sitting in the 
background. On the positive side, I didn't see much slowdown either. 
(Sorry, I didn't record the details of those tests, as I was testing 
many different options and I didn't see a clear difference either way.)


It seems that Amit's PGPROC batch clearing patch was very effective. I 
remember seeing ProcArrayLock contention very visible earlier, but I 
can't hit that now. I suspect you'd still see contention on bigger 
hardware, though, my laptop has only 4 cores. I'll have to find a real 
server for the next round of testing.



Because there's only a purpose to trying to minimize the losses if
there are some gains to which we can look forward.


Aside from the potential performance gains, this slashes a lot of 
complicated code:


 70 files changed, 2429 insertions(+), 6066 deletions(-)

That removed code is quite mature at this point, and I'm sure we'll add 
some code back to this patch as it evolves, but still.


Also, I'm looking forward for a follow-up patch, to track snapshots in 
backends at a finer level, so that vacuum could remove tuples more 
aggressively, if you have pg_dump running for days. CSN snapshots isn't 
a strict requirement for that, but it makes it simpler, when you can 
represent a snapshot with a small fixed-size integer.


Yes, seeing some direct performance gains would be nice too.

- Heikki




Re: [HACKERS] distinct estimate of a hard-coded VALUES list

2016-08-22 Thread Robert Haas
On Sat, Aug 20, 2016 at 4:58 PM, Tom Lane  wrote:
> Jeff Janes  writes:
>> On Thu, Aug 18, 2016 at 2:25 PM, Tom Lane  wrote:
>>> It does know it, what it doesn't know is how many duplicates there are.
>
>> Does it know whether the count comes from a parsed query-string list/array,
>> rather than being an estimate from something else?  If it came from a join,
>> I can see why it would be dangerous to assume they are mostly distinct.
>> But if someone throws 6000 things into a query string and only 200 distinct
>> values among them, they have no one to blame but themselves when it makes
>> bad choices off of that.
>
> I am not exactly sold on this assumption that applications have
> de-duplicated the contents of a VALUES or IN list.  They haven't been
> asked to do that in the past, so why do you think they are doing it?

It's hard to know, but my intuition is that most people would
deduplicate.  I mean, nobody is going to want their query generator
to send X IN (1, 1, ...) to the server if it
could have just sent X IN (1).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

2016-08-22 Thread Robert Haas
On Sat, Aug 20, 2016 at 9:46 AM, Craig Ringer  wrote:
> Ahem. Forgot to squash in a fixup commit. Updated patch of
> txid_status(bigint) attached.
>
> A related patch follows, adding a new txid_current_ifassigned(bigint)
> function as suggested by Jim Nasby. It's usefully connected to txid_status()
> and might as well be added at the same time.
>
> Since it builds on the same history I've also attached an updated version of
> txid_recent(bigint) now called txid_convert_ifrecent(bigint), per the
> above-linked thread.
>
> Finally, and not intended for commit, is a useful test function I wrote to
> cause extremely rapid xid wraparound, bundled up into a src/test/regress
> test case. txid_incinerate() can jump the server about UINT32/2 xids in ~2
> seconds if fsync is off, making it handy for testing.  Posting so others can
> use it for their own test needs later and because it's useful for testing
> these patches that touch on the xid epoch.

I think you should use underscores to separate all of the words
instead of only some of them.

Also, note that the corresponding internal function is
GetTopTransactionIdIfAny(), which might suggest txid_current_if_any()
rather than txid_current_if_assigned(), but you could argue that your
naming is better...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Proposal for CSN based snapshots

2016-08-22 Thread Robert Haas
Nice to see you working on this again.

On Mon, Aug 22, 2016 at 12:35 PM, Heikki Linnakangas  wrote:
> A sequential scan of a table like that with 10 million rows took about 700
> ms on my laptop, when the hint bits are set, without this patch. With this
> patch, if there's a snapshot holding back the xmin horizon, so that we need
> to check the CSN log for every XID, it took about 3 ms. So we have some
> optimization work to do :-). I'm not overly worried about that right now, as
> I think there's a lot of room for improvement in the SLRU code. But that's
> the next thing I'm going to work on.

So the worst case for this patch is obviously bad right now and, as
you say, that means that some optimization work is needed.

But what about the best case?  If we create a scenario where there are
no open read-write transactions at all and (somehow) lots and lots of
ProcArrayLock contention, how much does this help?

Because there's only a purpose to trying to minimize the losses if
there are some gains to which we can look forward.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Proposal for CSN based snapshots

2016-08-22 Thread Heikki Linnakangas

On 08/22/2016 07:35 PM, Heikki Linnakangas wrote:

And here's a new patch version...


And here's the attachment I forgot, *sigh*..

- Heikki



csn-4.patch.gz
Description: application/gzip



Re: [HACKERS] Proposal for CSN based snapshots

2016-08-22 Thread Heikki Linnakangas
And here's a new patch version. Still lots of work to do, especially in 
performance testing, and minimizing the worst-case performance hit.


On 08/09/2016 03:16 PM, Heikki Linnakangas wrote:

Next steps:

* Hot standby feedback is broken, now that CSN != LSN again. Will have
to switch this back to using an "oldest XID", rather than a CSN.

* I plan to replace pg_subtrans with a special range of CSNs in the
csnlog. Something like, start the CSN counter at 2^32 + 1, and use CSNs
< 2^32 to mean "this is a subtransaction, parent is XXX". One less SLRU
to maintain.

* Put per-proc xmin back into procarray. I removed it, because it's not
necessary for snapshots or GetOldestSnapshot() (which replaces
GetOldestXmin()) anymore. But on second thoughts, we still need it for
deciding when it's safe to truncate the csnlog.

* In this patch, HeapTupleSatisfiesVacuum() is rewritten to use an
"oldest CSN", instead of "oldest xmin", but that's not strictly
necessary. To limit the size of the patch, I might revert those changes
for now.


I did all of the above. This patch is now much smaller, as I didn't 
change all the places that used to deal with global-xmin's, like I did 
earlier. The oldest-xmin is now computed pretty much like it always has been.



* Rewrite the way RecentGlobalXmin is updated. As Alvaro pointed out in
his review comments two years ago, that was quite complicated. And I'm
worried that the lazy scheme I had might not allow pruning fast enough.
I plan to make it more aggressive, so that whenever the currently oldest
transaction finishes, it's responsible for advancing the "global xmin"
in shared memory. And the way it does that, is by scanning the csnlog,
starting from the current "global xmin", until the next still
in-progress XID. That could be a lot, if you have a very long-running
transaction that ends, but we'll see how it performs.


I ripped out all that, and created a GetRecentGlobalXmin() function that 
computes a global-xmin value when needed, like GetOldestXmin() does. 
Seems most straightforward. Since we no longer get a RecentGlobalXmin 
value essentially for free in GetSnapshotData(), as we no longer scan 
the proc array, it's better to compute the value only when needed.



* Performance testing. Clearly this should have a performance benefit,
at least under some workloads, to be worthwhile. And not regress.


I wrote a little C module to create a "worst-case" table. Every row in 
the table has a different xmin, and the xmin values are shuffled across 
the table, to defeat any caching.


A sequential scan of a table like that with 10 million rows took about 
700 ms on my laptop, when the hint bits are set, without this patch. 
With this patch, if there's a snapshot holding back the xmin horizon, so 
that we need to check the CSN log for every XID, it took about 3 ms. 
So we have some optimization work to do :-). I'm not overly worried 
about that right now, as I think there's a lot of room for improvement 
in the SLRU code. But that's the next thing I'm going to work on.


- Heikki





Re: [HACKERS] Showing parallel status in \df+

2016-08-22 Thread Robert Haas
On Mon, Aug 22, 2016 at 4:49 AM, Pavel Stehule  wrote:
> This feature shows source code for PL function when \df statement was used.
> I am not too sure, if this functionality is necessary - but I don't see any
> argument against. Sometimes it can be useful, mainly when we work with
> overloaded functions.

Wait, really?  I thought Peter was complaining about the fact that it
*removed* that from the display.

He also complained about the fact that the subject line of this thread
and what the patch actually does have diverged considerably, which I
think is a fair complaint.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Typo in comment to function LockHasWaitersRelation() [master branch]

2016-08-22 Thread Robert Haas
On Mon, Aug 22, 2016 at 9:01 AM, Dmitry Ivanov  wrote:
>> Hi hackers,
>>
>> I've found a typo in a comment to function LockHasWaitersRelation() [lmgr.c
>> :
>> 271, master branch]:
>> >> This is a functiion to check
>
> Attached a patch.

Thanks.  Committed with a bit of additional wordsmithing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] WAL consistency check facility

2016-08-22 Thread Simon Riggs
On 22 August 2016 at 13:44, Kuntal Ghosh  wrote:

> Please let me know your thoughts on this.

Do the regression tests pass with this option enabled?

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] LSN as a recovery target

2016-08-22 Thread Robert Haas
On Mon, Aug 22, 2016 at 8:28 AM, Michael Paquier
 wrote:
> On Mon, Aug 22, 2016 at 7:12 PM, Adrien Nayrat  
> wrote:
>> As Julien said, there is nothing to indicate that the error comes from
>> recovery.conf.
>> My fear would be that a user encounters an error like this. It will be
>> difficult to link it to the recovery.conf.
>
> Thinking a bit wider than that, we may want to know such context for
> normal GUC parameters as well, and that's not the case now. Perhaps
> there is actually a reason why that's not done for GUCs, but it seems
> that it would be useful there as well. That would give another reason
> to move all that under the GUC umbrella.

Maybe so, but that's been tried multiple times without success.  If
you think an error context is useful here, and I bet it is, I'd say
just add it and be done with it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] WAL consistency check facility

2016-08-22 Thread Robert Haas
On Mon, Aug 22, 2016 at 9:25 AM, Michael Paquier
 wrote:
> Another pin-point is: given a certain page, how do we identify of
> which type it is? One possibility would be again to extend the AM
> handler with some kind of is_self function with a prototype like that:
> bool handler->is_self(Page);
> If the page is of the type of the handler, this returns true, and
> false otherwise. Still here performance would suck.
>
> At the end, what we want is a clean interface, and more thoughts into it.

I think that it makes sense to filter based on the resource manager
ID, but I can't see how we could reasonably filter based on the AM
name.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Tracking wait event for latches

2016-08-22 Thread Robert Haas
On Mon, Aug 22, 2016 at 9:49 AM, Michael Paquier
 wrote:
> The reason why I chose this way is that there are a lot of them. It is
> painful to maintain the order of the array elements in perfect mapping
> with the list of IDs...

You can use stupid macro tricks to help with that problem...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] UTF-8 docs?

2016-08-22 Thread Peter Eisentraut
On 8/22/16 9:32 AM, Tatsuo Ishii wrote:
> I don't know what kind of problem you are seeing with encoding
> handling, but at least UTF-8 is working for Japanese, French and
> Russian.

Those translations are using DocBook XML.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] improved DefElem list processing

2016-08-22 Thread Alvaro Herrera
Peter Eisentraut wrote:

> I'm not happy that utils/acl.h has prototypes for aclchk.c, because
> acl.h is included all over the place.  Perhaps I should make a
> src/include/catalog/aclchk.c to clean that up.

I've been bothered by that too in the past.  +1 for the cleanup.

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] improved DefElem list processing

2016-08-22 Thread Michael Paquier
On Mon, Aug 22, 2016 at 10:41 PM, Pavel Stehule  wrote:
> 1. This patch introduces a location field in the DefElem node, and injects
> ParserState into SQL commands where ParserState was not used before. It
> allows showing the position of an error. This patch is not small, but almost
> all changes are trivial.
>
> 2. There are no problems with patching, compiling, tests - all tests passed.
>
> 3. There is not any new functionality, so new tests and new documentation is
> not necessary.
>
> I'll mark this patch as ready for committer.

Now that I look at those patches, +1 for both. Particularly the
redundant-option checks will remove a lot of boring code.
-- 
Michael




Re: [HACKERS] dsm_unpin_segment

2016-08-22 Thread Amit Kapila
On Mon, Aug 22, 2016 at 6:06 PM, Thomas Munro
 wrote:
> On Tue, Aug 23, 2016 at 12:07 AM, Amit Kapila  wrote:
>> + int control_slot = -1;
>> ...
>> + if (control_slot == -1)
>> + elog(ERROR, "cannot unpin unknown segment handle");
>>
>> Isn't it better to use INVALID_CONTROL_SLOT for control_slot and use
>> datatype as uint32 (same is used for dsm_segment->control_slot and
>> nitems)?
>
> Yes, it is better.  New version attached.
>

This version of the patch looks good to me.  I have marked it as Ready For
Committer.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Tracking wait event for latches

2016-08-22 Thread Michael Paquier
On Mon, Aug 22, 2016 at 6:09 PM, Alexander Korotkov
 wrote:
> Hi, Michael!
>
> On Thu, Aug 4, 2016 at 8:26 AM, Michael Paquier 
> wrote:
> I took a look at your patch. Couple of notes from me.

Thanks!

>> const char *
>> GetEventIdentifier(uint16 eventId)
>> {
>> const char *res;
>> switch (eventId)
>> {
>> case EVENT_ARCHIVER_MAIN:
>> res = "ArchiverMain";
>> break;
>> ... long long list of events ...
>> case EVENT_WAL_SENDER_WRITE_DATA:
>> res = "WalSenderWriteData";
>> break;
>> default:
>> res = "???";
>> }
>> return res;
>> }
>
>
> Would it be better to use an array here?

The reason why I chose this way is that there are a lot of them. It is
painful to maintain the order of the array elements in perfect mapping
with the list of IDs...

>> typedef enum EventIdentifier
>> {
>
>
> EventIdentifier seems too general name for me, isn't it?  Could we name it
> WaitEventIdentifier? Or WaitEventId for shortcut?

OK. So WaitEventIdentifier? The reason to include Identifier is for
consistency with lwlock structure notation.
-- 
Michael




Re: [HACKERS] improved DefElem list processing

2016-08-22 Thread Pavel Stehule
Hi


2016-08-11 17:32 GMT+02:00 Peter Eisentraut <
peter.eisentr...@2ndquadrant.com>:

> On 8/5/16 11:25 AM, Peter Eisentraut wrote:
> > On 8/4/16 2:21 PM, Tom Lane wrote:
> >> Forgot to mention: seems like you should have added a location
> >> argument to makeDefElem.
> >
> > I was hesitating to do that lest it break extensions or something, but I
> > guess we break bigger things than that all the time.  I'll change it.
>
> In order not to work on two patches that directly conflict with each
> other, I have proceeded with the location patch and postponed the
> duplicate checking patch.
>
> Attached is a biggish patch to review.  It adds location information to
> all places DefElems are created in the parser and then adds errposition
> information in a lot of places, but surely not all of them.  That can be
> improved over time.
>
> I'm not happy that utils/acl.h has prototypes for aclchk.c, because
> acl.h is included all over the place.  Perhaps I should make a
> src/include/catalog/aclchk.c to clean that up.
>
> Here are some example commands to try for getting suitable error messages:
>
> create collation foo (foo = bar, bar = foo);
> copy test from stdin (null 'x', null 'x');
> create function foo (a int, b int) returns int as $$ select a+b $$
> language sql language sql;
> create function foo (a int, b int) returns int as $$ select a+b $$
> language sql volatile stable;
> create function foo (a int, b int) returns int as $$ select a+b $$
> language sql with (foo = bar);
> create sequence foo minvalue 1 minvalue 2;
> create type foo (foo = bar);
> create user foo createdb nocreatedb;
> explain (foo, bar) select 1;
>
> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>
I am sending a review of this patch:

1. This patch introduces a location field in the DefElem node, and injects
ParserState into SQL commands where ParserState was not used before. It
allows showing the position of an error. This patch is not small, but almost
all changes are trivial.

2. There are no problems with patching, compiling, tests - all tests passed.

3. There is not any new functionality, so new tests and new documentation
is not necessary.

I'll mark this patch as ready for commiter.

Regards

Pavel





>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>


Re: [HACKERS] WAL consistency check facility

2016-08-22 Thread Michael Paquier
On Mon, Aug 22, 2016 at 9:44 PM, Kuntal Ghosh
 wrote:
> Please let me know your thoughts on this.

Since custom AMs have been introduced, I have kept that in a corner of
my mind and thought about it a bit. And while the goal of this patch
is clearly worth it, I don't think that the page masking interface is
clear at all. For example, your patch completely ignores
contrib/bloom, and we surely want to do something about it. The idea
would be to add a page masking routine in IndexAmRoutine and heap to
be able to perform page masking operations directly with that. This
would also allow one to perform page masking for bloom or any custom
access method, and would make this sanity check more generic as well.

Another pain point: given a certain page, how do we identify which
type it is? One possibility would again be to extend the AM handler
with some kind of is_self function with a prototype like this:
bool handler->is_self(Page);
If the page is of the handler's type, this returns true, and false
otherwise. Still, performance would suck here.

At the end, what we want is a clean interface, and more thought put into it.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UTF-8 docs?

2016-08-22 Thread Tatsuo Ishii
> On 8/22/16 1:16 AM, Tatsuo Ishii wrote:
>> Just out of curiosity, I wonder why we can't make the encoding of
>> SGML docs to be UTF-8, rather than current ISO-8859-1.
> 
> Encoding handling in DocBook SGML is weird, and making it work robustly
> will either fail or might be more work than just completing the
> conversion to XML.

I don't know what kind of problem you are seeing with encoding
handling, but at least UTF-8 is working for Japanese, French and
Russian.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

2016-08-22 Thread Michael Paquier
On Mon, Aug 22, 2016 at 10:09 PM, Amit Kapila  wrote:
> Won't a similar problem exist for nonExclusiveBackups?  Basically,
> in a similar situation, the count for nonExclusiveBackups will be
> decremented, and if it hits pg_start_backup_callback(), the following
> assertion can fail.
> pg_start_backup_callback()
> {
> ..
> Assert(XLogCtl->Insert.nonExclusiveBackups > 0);
> ..
> }

This cannot be hit for non-exclusive backups as pg_start_backup() and
pg_stop_backup() need to be launched from the same session: their calls
will never overlap across sessions, and this overlap is the reason why
exclusive backups are exposed to the problem.
-- 
Michael




Re: [HACKERS] Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

2016-08-22 Thread Amit Kapila
On Mon, Aug 22, 2016 at 7:13 AM, Michael Paquier
 wrote:
> On Sat, Aug 6, 2016 at 6:35 PM, Andreas Seltenreich  
> wrote:
>> Michael Paquier writes:
>>
>>> Andreas, with the patch attached is the assertion still triggered?
>>> [2. text/x-diff; base-backup-crash-v2.patch]
>>
>> I didn't observe the crashes since applying this patch.  There should
>> have been about five by the amount of fuzzing done.
>
> I have reworked the patch, replacing those two booleans with a single
> enum. That's definitely clearer. Also, I have added this patch to the
> CF to not lose track of it:
> https://commitfest.postgresql.org/10/731/
> Horiguchi-san, I have added you as a reviewer as you provided some
> input. I hope you don't mind.
>

Won't a similar problem exist for nonExclusiveBackups?  Basically,
in a similar situation, the count for nonExclusiveBackups will be
decremented, and if it hits pg_start_backup_callback(), the following
assertion can fail.
pg_start_backup_callback()
{
..
Assert(XLogCtl->Insert.nonExclusiveBackups > 0);
..
}


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] LSN as a recovery target

2016-08-22 Thread Peter Eisentraut
On 8/22/16 8:28 AM, Michael Paquier wrote:
> Thinking a bit wider than that, we may want to know such context for
> normal GUC parameters as well, and that's not the case now. Perhaps
> there is actually a reason why that's not done for GUCs, but it seems
> that it would be useful there as well.

GUC parsing generally needs, or used to need, to work under more
constrained circumstances, e.g., no full memory management.  That's not
a reason not to try this, but there might be non-obvious problems.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Typo in comment to function LockHasWaitersRelation() [master branch]

2016-08-22 Thread Dmitry Ivanov
> Hi hackers,
> 
> I've found a typo in a comment to function LockHasWaitersRelation()
> [lmgr.c : 271, master branch]:
> >> This is a functiion to check

Attached a patch.

-- 
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c
index 7b08555..d0fdb94 100644
--- a/src/backend/storage/lmgr/lmgr.c
+++ b/src/backend/storage/lmgr/lmgr.c
@@ -268,7 +268,7 @@ UnlockRelation(Relation relation, LOCKMODE lockmode)
 /*
  *		LockHasWaitersRelation
  *
- * This is a functiion to check if someone else is waiting on a
+ * This is a function to check if someone else is waiting on a
  * lock, we are currently holding.
  */
 bool



[HACKERS] Typo in comment to function LockHasWaitersRelation() [master branch]

2016-08-22 Thread Dmitry Ivanov
Hi hackers,

I've found a typo in a comment to function LockHasWaitersRelation() [lmgr.c : 
271, master branch]:

>> This is a functiion to check


-- 
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company




Re: [HACKERS] UTF-8 docs?

2016-08-22 Thread Peter Eisentraut
On 8/22/16 1:16 AM, Tatsuo Ishii wrote:
> Just out of curiosity, I wonder why we can't make the encoding of
> SGML docs to be UTF-8, rather than current ISO-8859-1.

Encoding handling in DocBook SGML is weird, and making it work robustly
will either fail or might be more work than just completing the
conversion to XML.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[HACKERS] WAL consistency check facility

2016-08-22 Thread Kuntal Ghosh
Hi,

I've attached a patch to check if the current page is equal to the
FPW after applying WAL on it. This is how the patch works:

1. When a WAL record is inserted, an FPW is done for that operation.
But a flag is kept to indicate whether that page needs to be
restored.
2. During recovery, when a redo operation is done, we compare the FPW
contained in the WAL record with the current page in the buffer. For
this purpose, I've used Michael's patch with minor changes to check
whether two pages are actually equal or not.
3. I've also added a GUC variable (wal_consistency_mask) to indicate
the operations (HEAP, BTREE, HASH, GIN, etc.) for which this feature
(always FPW and consistency check) is to be enabled.

How to use the patch:
1. Apply the patch.
2. In postgresql.conf file, set wal_consistency_mask variable
accordingly. For debug messages, set log_min_messages = debug1.

Michael's patch:
https://www.postgresql.org/message-id/CAB7nPqR4vxdKijP%2BDu82vOcOnGMvutq-gfqiU2dsH4bsM77hYg%40mail.gmail.com

Reference thread:
https://www.postgresql.org/message-id/flat/CAB7nPqR4vxdKijP%2BDu82vOcOnGMvutq-gfqiU2dsH4bsM77hYg%40mail.gmail.com#cab7npqr4vxdkijp+du82vocongmvutq-gfqiu2dsh4bsm77...@mail.gmail.com

Please let me know your thoughts on this.
-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f13f9c1..9380079 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -25,6 +25,7 @@
 #include "access/commit_ts.h"
 #include "access/multixact.h"
 #include "access/rewriteheap.h"
+#include "access/rmgr.h"
 #include "access/subtrans.h"
 #include "access/timeline.h"
 #include "access/transam.h"
@@ -52,7 +53,9 @@
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/barrier.h"
+#include "storage/bufmask.h"
 #include "storage/bufmgr.h"
+#include "storage/bufpage.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
 #include "storage/large_object.h"
@@ -94,6 +97,7 @@ bool		EnableHotStandby = false;
 bool		fullPageWrites = true;
 bool		wal_log_hints = false;
 bool		wal_compression = false;
+int		wal_consistency_mask = 0;
 bool		log_checkpoints = false;
 int			sync_method = DEFAULT_SYNC_METHOD;
 int			wal_level = WAL_LEVEL_MINIMAL;
@@ -867,6 +871,9 @@ static void WALInsertLockAcquireExclusive(void);
 static void WALInsertLockRelease(void);
 static void WALInsertLockUpdateInsertingAt(XLogRecPtr insertingAt);
 
+void checkWALConsistency(XLogReaderState *xlogreader);
+void checkWALConsistencyForBlock(XLogReaderState *record, uint8 block_id);
+
 /*
  * Insert an XLOG record represented by an already-constructed chain of data
  * chunks.  This is a low-level routine; to construct the WAL record header
@@ -6868,6 +6875,12 @@ StartupXLOG(void)
 /* Now apply the WAL record itself */
 RmgrTable[record->xl_rmid].rm_redo(xlogreader);
 
+/*
+ * Check whether the page associated with WAL record is consistent
+ * with the existing page
+ */
+checkWALConsistency(xlogreader);
+
 /* Pop the error context stack */
 error_context_stack = errcallback.previous;
 
@@ -11626,3 +11639,160 @@ XLogRequestWalReceiverReply(void)
 {
 	doRequestWalReceiverReply = true;
 }
+
+/*
+ * Check whether the page associated with WAL record is consistent with the
+ * existing page or not.
+ */
+void checkWALConsistency(XLogReaderState *xlogreader)
+{
+	RmgrIds rmid = (RmgrIds) XLogRecGetRmid(xlogreader);
+	int block_id;
+	int enableWALConsistencyMask = 1;
+	RmgrIds rmids[] = {RM_HEAP2_ID,RM_HEAP_ID,RM_BTREE_ID,RM_HASH_ID,RM_GIN_ID,RM_GIST_ID,RM_SEQ_ID,RM_SPGIST_ID,RM_BRIN_ID};
+	int size = sizeof(rmids)/sizeof(rmid);
+	int i;
+	for (i = 0; i < size; i++)
+	{
+		if ((rmid == rmids[i]) && (wal_consistency_mask & enableWALConsistencyMask))
+		{
+			for (block_id = 0; block_id <= xlogreader->max_block_id; block_id++)
+				checkWALConsistencyForBlock(xlogreader, block_id);
+			break;
+		}
+		/*
+		 * Enable checking for the next bit
+		 */
+		enableWALConsistencyMask <<= 1;
+	}
+}
+void checkWALConsistencyForBlock(XLogReaderState *record, uint8 block_id)
+{
+	Buffer buf;
+	char *ptr;
+	DecodedBkpBlock *bkpb;
+	char		tmp[BLCKSZ];
+	RelFileNode rnode;
+	ForkNumber	forknum;
+	BlockNumber blkno;
+	Page		page;
+
+	if (!XLogRecGetBlockTag(record, block_id, &rnode, &forknum, &blkno))
+	{
+		/* Caller specified a bogus block_id. Don't do anything. */
+		return;
+	}
+	buf = XLogReadBufferExtended(rnode, forknum, blkno,
+	   RBM_WAL_CHECK);
+	page = BufferGetPage(buf);
+
+	bkpb = &record->blocks[block_id];
+	if(bkpb->bkp_image!=NULL)
+		ptr = bkpb->bkp_image;
+	else
+	{
+		elog(WARNING,
+ "No page found in WAL for record %X/%X, rel %u/%u/%u, "
+ "forknum %u, blkno %u",
+ (uint32) (record->ReadRecPtr>> 32), (uint32) record->ReadRecPtr ,
+ rnode.spcNode, rnode.dbNode, rnode.relNode,
+ forknum, blkno);
+		return;
+	}
+
+	if (bkpb->bimg_info & BKPIMAGE_IS_COMPRESSED)

Re: [HACKERS] dsm_unpin_segment

2016-08-22 Thread Thomas Munro
On Tue, Aug 23, 2016 at 12:07 AM, Amit Kapila  wrote:
> + int control_slot = -1;
> ...
> + if (control_slot == -1)
> + elog(ERROR, "cannot unpin unknown segment handle");
>
> Isn't it better to use INVALID_CONTROL_SLOT for control_slot and use
> datatype as uint32 (same is used for dsm_segment->control_slot and
> nitems)?

Yes, it is better.  New version attached.

> Apart from this, I have verified your patch on Windows using the attached
> dsm_demo module.  Basically, by using dsm_demo_create(), I pinned
> the segment and noticed that the Handle count of the postmaster is
> incremented by 1, and then by using dsm_demo_unpin_segment() unpinned the
> segment, which decrements the Handle count in the postmaster.

Thanks!

-- 
Thomas Munro
http://www.enterprisedb.com


dsm-unpin-segment-v4.patch
Description: Binary data



Re: [HACKERS] LSN as a recovery target

2016-08-22 Thread Michael Paquier
On Mon, Aug 22, 2016 at 7:12 PM, Adrien Nayrat  wrote:
> As Julien said, there is nothing to indicate that the error comes from
> recovery.conf.
> My fear would be that a user encounters an error like this. It will be
> difficult to link it to recovery.conf.

Thinking a bit wider than that, we may want to know such context for
normal GUC parameters as well, and that's not the case now. Perhaps
there is actually a reason why that's not done for GUCs, but it seems
that it would be useful there as well. That would give another reason
to move all that under the GUC umbrella.

> For the other settings (xid, timeline, name) there is an explicit
> message noting that the error is in recovery.conf.
>
> I see it is not the case for recovery_target_time.

Yes, in this case the parameter value is parsed using a datatype's _in
function call, and the error message depends on that.

> Should we modify only the documentation or should we try to find a
> solution to point to the origin of the error?

The patch as proposed is complicated enough I think, and it would be
good to keep things simple if we can. So having something in the docs
looks fine to me, and that's actually the reference to pg_lsn used to
parse the parameter value.
-- 
Michael




Re: [HACKERS] Binary I/O for isn extension

2016-08-22 Thread Fabien COELHO


Hello Shay,


Attached is a new version of the patch, adding an upgrade script and the
rest of it. Note that because, as Fabien noted, there doesn't seem to be
a way to add send/receive functions with ALTER TYPE, I did that by updating
pg_type directly - hope that's OK.


This patch does not apply anymore, because there has been an update in 
between to mark relevant contrib functions as "parallel".


Could you update the patch?

--
Fabien.




Re: [HACKERS] dsm_unpin_segment

2016-08-22 Thread Amit Kapila
On Mon, Aug 22, 2016 at 5:24 AM, Thomas Munro
 wrote:
> On Sat, Aug 20, 2016 at 11:37 PM, Amit Kapila  wrote:
>> On Tue, Aug 9, 2016 at 10:07 AM, Thomas Munro
>>  wrote:
>>> On Tue, Aug 9, 2016 at 12:53 PM, Tom Lane  wrote:
 The larger picture here is that Robert is exhibiting a touching but
 unfounded faith that extensions using this feature will contain zero bugs.
 IMO there needs to be some positive defense against mistakes in using the
 pin/unpin API.  As things stand, multiple pin requests don't have any
 fatal consequences (especially not on non-Windows), so I have little
 confidence that it's not happening in the field.  I have even less
 confidence that there wouldn't be too many unpin requests.
>>>
>>> Ok, here is a version that defends against invalid sequences of
>>> pin/unpin calls.  I had to move dsm_impl_pin_segment into the block
>>> protected by DynamicSharedMemoryControlLock, so that it could come
>>> after the already-pinned check, but before updating any state, since
>>> it makes a Windows syscall that can fail.  That said, I've only tested
>>> on Unix and will need to ask someone to test on Windows.
>>>
>>
>> Few review comments:
>
> Thanks for the review!
>
>> 1.
>> + /* Note that 1 means no references (0 means unused slot). */
>> + if (--dsm_control->item[i].refcnt == 1)
>> + destroy = true;
>> +
>> + /*
>> + * Allow implementation-specific code to run.  We have to do this before
>> + * releasing the lock, because impl_private_pm_handle may get modified by
>> + * dsm_impl_unpin_segment.
>> + */
>> + if (control_slot >= 0)
>> + dsm_impl_unpin_segment(handle,
>> + _control->item[control_slot].impl_private_pm_handle);
>>
>> If there is an error in dsm_impl_unpin_segment(), then we don't need
>> to decrement the reference count.  Isn't it better to do it after the
>> dsm_impl_unpin_segment() is successful.  Similarly, I think pinned
>> should be set to false after dsm_impl_unpin_segment().
>
> Hmm.  Yeah, OK.  Things are in pretty bad shape if you fail to unpin
> despite having run the earlier checks, but you're right, it's better
> that way.  New version attached.
>

+ int control_slot = -1;
...
+ if (control_slot == -1)
+ elog(ERROR, "cannot unpin unknown segment handle");

Isn't it better to use INVALID_CONTROL_SLOT for control_slot and use
datatype as uint32 (same is used for dsm_segment->control_slot and
nitems)?


Apart from this, I have verified your patch on Windows using the attached
dsm_demo module.  Basically, by using dsm_demo_create(), I pinned
the segment and noticed that the Handle count of the postmaster is
incremented by 1, and then by using dsm_demo_unpin_segment() unpinned the
segment, which decrements the Handle count in the postmaster.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


dsm_demo_v1.patch
Description: Binary data



Re: [HACKERS] Patch: initdb: "'" for QUOTE_PATH (non-windows)

2016-08-22 Thread Michael Paquier
On Mon, Aug 22, 2016 at 8:45 PM, Tom Lane  wrote:
> Likely it would be better to refactor all of this so we get just one
> reference to libpq and one reference to libpgport, but that'd require
> a more thoroughgoing cleanup than you have here.  (Now that I think
> about it, adding -L/-l to LDFLAGS is pretty duff coding style to
> begin with --- we should be adding those things to LIBS, or at least
> putting them just before LIBS in the command lines.)

Agreed, this needs more thought. As that's messy, I won't hijack this
thread further and will begin a new one with a more fully-formed patch
once I get time to look at it.
-- 
Michael




Re: [HACKERS] Patch: initdb: "'" for QUOTE_PATH (non-windows)

2016-08-22 Thread Tom Lane
Michael Paquier  writes:
> On Sun, Aug 21, 2016 at 3:02 AM, Tom Lane  wrote:
>> I looked into this and soon found that fe_utils/string_utils.o has
>> dependencies on libpq that are much wider than just pqexpbuffer :-(.

> pqexpbuffer.c is an independent facility, so we could move it
> to src/common and leverage the dependency a bit, and have libpq link
> to the source file itself at build time. The real problem is the call
> to PQmblen in psqlscan.l... and I am not sure how that could be
> refactored cleanly.

I see all of these as libpq dependencies in string_utils.o:

 U PQclientEncoding
 U PQescapeStringConn
 U PQmblen
 U PQserverVersion

Maybe we could split that file into two parts (libpq-dependent and not)
but it would be pretty arbitrary.

> And actually, I had a look at the build failure that you reported in
> 3855.1471713...@sss.pgh.pa.us. While that was because of a copy of
> libpq.so that I had in my own LD_LIBRARY_PATH, shouldn't all the other
> frontend utilities depending on fe_utils also use $(libpq_pgport)
> instead of -lpq?

All the rest of them already have that, because their link commands
look like, eg for psql,

LDFLAGS += -L$(top_builddir)/src/fe_utils -lpgfeutils -lpq

psql: $(OBJS) | submake-libpq submake-libpgport submake-libpgfeutils
$(CC) $(CFLAGS) $(OBJS) $(libpq_pgport) $(LDFLAGS) $(LDFLAGS_EX) 
$(LIBS) -o $@$(X)
^^

The extra reference to -lpq just makes sure that references to libpq from
libpgfeutils get resolved properly.  (And yeah, there are some platforms
that need that :-(.)  We don't need an extra copy of the -L flag.

This is all pretty messy, not least because of the way libpq_pgport
is set up; as Makefile.global notes,

# ...  This does cause duplicate -lpgport's to appear
# on client link lines.

Likely it would be better to refactor all of this so we get just one
reference to libpq and one reference to libpgport, but that'd require
a more thoroughgoing cleanup than you have here.  (Now that I think
about it, adding -L/-l to LDFLAGS is pretty duff coding style to
begin with --- we should be adding those things to LIBS, or at least
putting them just before LIBS in the command lines.)

You're right that I missed the desirability of invoking
submake-libpq and submake-libpgfeutils in initdb's build.
Will fix that.

regards, tom lane




[HACKERS] Race condition in GetOldestActiveTransactionId()

2016-08-22 Thread Heikki Linnakangas
While hacking on the CSN patch, I spotted a race condition between 
GetOldestActiveTransactionId() and GetNewTransactionId(). 
GetOldestActiveTransactionId() calculates the oldest XID that's still 
running, by doing:


1. Read nextXid, without a lock. This is the upper bound, if there are 
no active XIDs in the proc array.

2. Loop through the proc array, making note of the oldest XID.

While GetNewTransactionId() does:

1. Read and Increment nextXid
2. Set MyPgXact->xid.

It seems possible that if you call GetNewTransactionId() concurrently 
with GetOldestActiveTransactionId(), GetOldestActiveTransactionId() sees 
the new nextXid value that the concurrent GetNewTransactionId() set, but 
doesn't yet see the just-assigned XID in the proc array. It will return 
a value that doesn't cover that XID, i.e. it won't consider the 
just-assigned XID as in-progress.


Am I missing something? Commit c6d76d7c added a comment to 
GetOldestActiveTransactionId() explaining why it's not necessary to 
acquire XidGenLock there, but I think it missed the above race condition.


GetOldestActiveTransactionId() is not performance-critical, it's only 
called when performing a checkpoint, so I think we should just bite the 
bullet and grab the lock. Per attached patch.


- Heikki
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index e5d487d..de45c16 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2097,13 +2097,13 @@ GetOldestActiveTransactionId(void)
 	LWLockAcquire(ProcArrayLock, LW_SHARED);
 
 	/*
-	 * It's okay to read nextXid without acquiring XidGenLock because (1) we
-	 * assume TransactionIds can be read atomically and (2) we don't care if
-	 * we get a slightly stale value.  It can't be very stale anyway, because
-	 * the LWLockAcquire above will have done any necessary memory
-	 * interlocking.
+	 * Acquire XidGenLock while we read nextXid, to make sure that all
+	 * XIDs < nextXid are already present in the proc array (or have
+	 * already completed), when we spin over it.
 	 */
+	LWLockAcquire(XidGenLock, LW_SHARED);
 	oldestRunningXid = ShmemVariableCache->nextXid;
+	LWLockRelease(XidGenLock);
 
 	/*
 	 * Spin over procArray collecting all xids and subxids.



Re: [HACKERS] LSN as a recovery target

2016-08-22 Thread Adrien Nayrat
On 08/20/2016 04:16 PM, Michael Paquier wrote:
> On Sat, Aug 20, 2016 at 10:44 AM, Petr Jelinek  wrote:
>> On 20/08/16 02:13, Michael Paquier wrote:
>>> On Fri, Aug 19, 2016 at 10:47 PM, Adrien Nayrat
>>>  wrote:
>>> Using a PG_TRY/CATCH block the way you do to show to user a different
>>> error message while the original one is actually correct does not
>>> sound like a good idea to me. It complicates the code and the original
>>> pattern matches what is already done for timestamps, where in case of
>>> error you'd get that:
>>> FATAL:  invalid input syntax for type timestamp with time zone: "aa"
>>>
>>> I think that a better solution would be to make clearer in the docs
>>> that pg_lsn is used here. First, in recovery.conf.sample, let's use
>>> pg_lsn and not LSN. Then, in the sgml docs, let's add a reference to
>>> the page of pg_lsn as well:
>>> https://www.postgresql.org/docs/devel/static/datatype-pg-lsn.html
>>>
>>> Now we have that:
>>> +   
>>> +This parameter specifies the LSN of the write-ahead log stream up
>>> to
>>> +which recovery will proceed. The precise stopping point is also
>>> +influenced by .
>>> +   
>>> Perhaps we could just add an additional sentence: "This parameter
>>> value is parsed using the system datatype 

Re: [HACKERS] [Patch] RBTree iteration interface improvement

2016-08-22 Thread Aleksander Alekseev
> > It seems clear to me that the existing arrangement is hazardous for
> > any RBTree that hasn't got exactly one consumer.  I think
> > Aleksander's plan to decouple the iteration state is probably a good
> > one (NB: I've not read the patch, so this is not an endorsement of
> > details).  I'd go so far as to say that we should remove the old API
> > as being dangerous, rather than preserve it on
> > backwards-compatibility grounds.  We make bigger changes than that in
> > internal APIs all the time.
>
> Glad to hear it! I would like to propose a patch that removes the old
> API. I have a few questions though.
>
> Would you like it to be a separate patch, or all-in-one patch is fine
> in this case? I also would like to include unit/property-based tests in
> this (these) patch (patches). However as I understand there are
> currently no unit tests in PostgreSQL at all, only integration/system
> tests. Any advice how to do it better?

OK, here is a patch. I think we could call it a _replacement_ of the old API, so
there is probably no need for two separate patches. I still don't know how to
include unit tests, or whether they should be included at all. Thus for now the
unit/property-based tests can be found only on GitHub [1].

As usual, if you have any questions, suggestions or notes regarding this patch,
please let me know.

[1] https://github.com/afiskon/c-algorithms/tree/master/test

-- 
Best regards,
Aleksander Alekseev
diff --git a/src/backend/access/gin/ginbulk.c b/src/backend/access/gin/ginbulk.c
index d6422ea..eac76c4 100644
--- a/src/backend/access/gin/ginbulk.c
+++ b/src/backend/access/gin/ginbulk.c
@@ -255,7 +255,7 @@ qsortCompareItemPointers(const void *a, const void *b)
 void
 ginBeginBAScan(BuildAccumulator *accum)
 {
-	rb_begin_iterate(accum->tree, LeftRightWalk);
+	rb_begin_left_right_walk(accum->tree, &accum->tree_walk);
 }
 
 /*
@@ -271,7 +271,7 @@ ginGetBAEntry(BuildAccumulator *accum,
 	GinEntryAccumulator *entry;
 	ItemPointerData *list;
 
-	entry = (GinEntryAccumulator *) rb_iterate(accum->tree);
+	entry = (GinEntryAccumulator *) rb_left_right_walk(&accum->tree_walk);
 
 	if (entry == NULL)
 		return NULL;			/* no more entries */
diff --git a/src/backend/lib/rbtree.c b/src/backend/lib/rbtree.c
index 4fa8a1d..4368492 100644
--- a/src/backend/lib/rbtree.c
+++ b/src/backend/lib/rbtree.c
@@ -28,18 +28,6 @@
 
 #include "lib/rbtree.h"
 
-
-/*
- * Values of RBNode.iteratorState
- *
- * Note that iteratorState has an undefined value except in nodes that are
- * currently being visited by an active iteration.
- */
-#define InitialState	(0)
-#define FirstStepDone	(1)
-#define SecondStepDone	(2)
-#define ThirdStepDone	(3)
-
 /*
  * Colors of nodes (values of RBNode.color)
  */
@@ -53,10 +41,6 @@ struct RBTree
 {
 	RBNode	   *root;			/* root node, or RBNIL if tree is empty */
 
-	/* Iteration state */
-	RBNode	   *cur;			/* current iteration node */
-	RBNode	   *(*iterate) (RBTree *rb);
-
 	/* Remaining fields are constant after rb_create */
 
 	Size		node_size;		/* actual size of tree nodes */
@@ -75,7 +59,7 @@ struct RBTree
  */
 #define RBNIL (&sentinel)
 
-static RBNode sentinel = {InitialState, RBBLACK, RBNIL, RBNIL, NULL};
+static RBNode sentinel = {RBBLACK, RBNIL, RBNIL, NULL};
 
 
 /*
@@ -123,8 +107,6 @@ rb_create(Size node_size,
 	Assert(node_size > sizeof(RBNode));
 
 	tree->root = RBNIL;
-	tree->cur = RBNIL;
-	tree->iterate = NULL;
 	tree->node_size = node_size;
 	tree->comparator = comparator;
 	tree->combiner = combiner;
@@ -201,6 +183,28 @@ rb_leftmost(RBTree *rb)
 	return NULL;
 }
 
+/*
+ * rb_rightmost: fetch the rightmost (highest-valued) tree node.
+ * Returns NULL if tree is empty.
+ */
+RBNode *
+rb_rightmost(RBTree *rb)
+{
+	RBNode	   *node = rb->root;
+	RBNode	   *rightmost = rb->root;
+
+	while (node != RBNIL)
+	{
+		rightmost = node;
+		node = node->right;
+	}
+
+	if (rightmost != RBNIL)
+		return rightmost;
+
+	return NULL;
+}
+
 /**
  *			  Insertion  *
  **/
@@ -437,7 +441,6 @@ rb_insert(RBTree *rb, const RBNode *data, bool *isNew)
 
 	x = rb->allocfunc (rb->arg);
 
-	x->iteratorState = InitialState;
 	x->color = RBRED;
 
 	x->left = RBNIL;
@@ -654,220 +657,276 @@ rb_delete(RBTree *rb, RBNode *node)
  **/
 
 /*
- * The iterator routines were originally coded in tail-recursion style,
- * which is nice to look at, but is trouble if your compiler isn't smart
- * enough to optimize it.  Now we just use looping.
+ * Begin left right walk.
+ */
+void
+rb_begin_left_right_walk(RBTree *rb, RBTreeLeftRightWalk * lrw)
+{
+	lrw->rb = rb;
+	lrw->last_visited = NULL;
+	lrw->is_over = (lrw->rb->root == RBNIL);
+}
+
+/*
+ * Left right walk: get next node. Returns NULL if there is none.
  */
-#define descend(next_node) \
-	do { \
-		(next_node)->iteratorState = InitialState; \
-		node = rb->cur = 

Re: [HACKERS] Tracking wait event for latches

2016-08-22 Thread Alexander Korotkov
On Mon, Aug 22, 2016 at 12:09 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> I took a look at your patch. Couple of notes from me.
>
> const char *
>> GetEventIdentifier(uint16 eventId)
>> {
>> const char *res;
>> switch (eventId)
>> {
>> case EVENT_ARCHIVER_MAIN:
>> res = "ArchiverMain";
>> break;
>> ... long long list of events ...
>> case EVENT_WAL_SENDER_WRITE_DATA:
>> res = "WalSenderWriteData";
>> break;
>> default:
>> res = "???";
>> }
>> return res;
>> }
>
>
> Would it be better to use an array here?
>
> typedef enum EventIdentifier
>> {
>
>
> EventIdentifier seems too general a name to me, doesn't it?  Could we name
> it WaitEventIdentifier? Or WaitEventId for short?
>

I'm also not sure about the handling of the secure_read() and secure_write()
functions.
In the current patch we're tracking latch event waits inside them.  But we
set up the latch only for blocking sockets, and can do it multiple times in
a loop.  Would it be better to make separate wait events NETWORK_READ and
NETWORK_WRITE and set them up for the whole time spent in secure_read()
and secure_write()?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Tracking wait event for latches

2016-08-22 Thread Alexander Korotkov
Hi, Michael!

On Thu, Aug 4, 2016 at 8:26 AM, Michael Paquier 
wrote:

> On Tue, Aug 2, 2016 at 10:31 PM, Michael Paquier
>  wrote:
> > Attached is an updated patch.
>
> Updated version for 2 minor issues:
> 1) s/stram/stream/
> 2) Docs used incorrect number
>

I took a look at your patch. Couple of notes from me.

const char *
> GetEventIdentifier(uint16 eventId)
> {
> const char *res;
> switch (eventId)
> {
> case EVENT_ARCHIVER_MAIN:
> res = "ArchiverMain";
> break;
> ... long long list of events ...
> case EVENT_WAL_SENDER_WRITE_DATA:
> res = "WalSenderWriteData";
> break;
> default:
> res = "???";
> }
> return res;
> }


Would it be better to use an array here?

typedef enum EventIdentifier
> {


EventIdentifier seems too general a name to me, doesn't it?  Could we name
it WaitEventIdentifier? Or WaitEventId for short?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


  1   2   >