Re: [HACKERS] WAL consistency check facility

2016-08-26 Thread Amit Kapila
On Fri, Aug 26, 2016 at 9:26 PM, Simon Riggs  wrote:
>
> I think you should add this as part of the default testing for both
> check and installcheck. I can't imagine why we'd have it and not use
> it during testing.
>

The actual consistency checks are done during redo (replay), so not
sure whats in you mind for enabling it with check or installcheck.  I
think we can run few recovery/replay tests with this framework.  Also
running the tests under this framework could be time-consuming as it
logs the entire page for each WAL record we write and then during
replay reads the same.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] OpenSSL 1.1 breaks configure and more

2016-08-26 Thread Andreas Karlsson

On 08/26/2016 07:04 PM, Heikki Linnakangas wrote:

On 08/26/2016 07:44 PM, Tom Lane wrote:

Peter Eisentraut  writes:

On 8/26/16 5:31 AM, Heikki Linnakangas wrote:

I think now would be a good time to drop support for OpenSSL versions
older than 0.9.8. OpenSSL don't even support 0.9.8 anymore, although
there are probably distributions out there that still provide patches
for it. But OpenSSL 0.9.7 and older are really not interesting for
PostgreSQL 10 anymore, I think.



CentOS 5 currently ships 0.9.8e.  That's usually the oldest OS we want
to support eagerly.


Also, I get this on fully-up-to-date OS X (El Capitan):

$ openssl version
OpenSSL 0.9.8zh 14 Jan 2016


Ok, sold, let's remove support for OpenSSL < 0.9.8.


I have attached a patch which removes the < 0.9.8 compatibility code. 
Should we also add a version check to configure? We do not have any such 
check currently.


Andreas
diff --git a/contrib/pgcrypto/openssl.c b/contrib/pgcrypto/openssl.c
index 976af70..ffab5d2 100644
--- a/contrib/pgcrypto/openssl.c
+++ b/contrib/pgcrypto/openssl.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -47,155 +48,6 @@
 #define MAX_IV		(128/8)
 
 /*
- * Compatibility with OpenSSL 0.9.6
- *
- * It needs AES and newer DES and digest API.
- */
-#if OPENSSL_VERSION_NUMBER >= 0x00907000L
-
-/*
- * Nothing needed for OpenSSL 0.9.7+
- */
-
-#include 
-#else			/* old OPENSSL */
-
-/*
- * Emulate OpenSSL AES.
- */
-
-#include "rijndael.c"
-
-#define AES_ENCRYPT 1
-#define AES_DECRYPT 0
-#define AES_KEY		rijndael_ctx
-
-static int
-AES_set_encrypt_key(const uint8 *key, int kbits, AES_KEY *ctx)
-{
-	aes_set_key(ctx, key, kbits, 1);
-	return 0;
-}
-
-static int
-AES_set_decrypt_key(const uint8 *key, int kbits, AES_KEY *ctx)
-{
-	aes_set_key(ctx, key, kbits, 0);
-	return 0;
-}
-
-static void
-AES_ecb_encrypt(const uint8 *src, uint8 *dst, AES_KEY *ctx, int enc)
-{
-	memcpy(dst, src, 16);
-	if (enc)
-		aes_ecb_encrypt(ctx, dst, 16);
-	else
-		aes_ecb_decrypt(ctx, dst, 16);
-}
-
-static void
-AES_cbc_encrypt(const uint8 *src, uint8 *dst, int len, AES_KEY *ctx, uint8 *iv, int enc)
-{
-	memcpy(dst, src, len);
-	if (enc)
-	{
-		aes_cbc_encrypt(ctx, iv, dst, len);
-		memcpy(iv, dst + len - 16, 16);
-	}
-	else
-	{
-		aes_cbc_decrypt(ctx, iv, dst, len);
-		memcpy(iv, src + len - 16, 16);
-	}
-}
-
-/*
- * Emulate DES_* API
- */
-
-#define DES_key_schedule des_key_schedule
-#define DES_cblock des_cblock
-#define DES_set_key(k, ks) \
-		des_set_key((k), *(ks))
-#define DES_ecb_encrypt(i, o, k, e) \
-		des_ecb_encrypt((i), (o), *(k), (e))
-#define DES_ncbc_encrypt(i, o, l, k, iv, e) \
-		des_ncbc_encrypt((i), (o), (l), *(k), (iv), (e))
-#define DES_ecb3_encrypt(i, o, k1, k2, k3, e) \
-		des_ecb3_encrypt((des_cblock *)(i), (des_cblock *)(o), \
-*(k1), *(k2), *(k3), (e))
-#define DES_ede3_cbc_encrypt(i, o, l, k1, k2, k3, iv, e) \
-		des_ede3_cbc_encrypt((i), (o), \
-(l), *(k1), *(k2), *(k3), (iv), (e))
-
-/*
- * Emulate newer digest API.
- */
-
-static void
-EVP_MD_CTX_init(EVP_MD_CTX *ctx)
-{
-	memset(ctx, 0, sizeof(*ctx));
-}
-
-static int
-EVP_MD_CTX_cleanup(EVP_MD_CTX *ctx)
-{
-	px_memset(ctx, 0, sizeof(*ctx));
-	return 1;
-}
-
-static int
-EVP_DigestInit_ex(EVP_MD_CTX *ctx, const EVP_MD *md, void *engine)
-{
-	EVP_DigestInit(ctx, md);
-	return 1;
-}
-
-static int
-EVP_DigestFinal_ex(EVP_MD_CTX *ctx, unsigned char *res, unsigned int *len)
-{
-	EVP_DigestFinal(ctx, res, len);
-	return 1;
-}
-#endif   /* old OpenSSL */
-
-/*
- * Provide SHA2 for older OpenSSL < 0.9.8
- */
-#if OPENSSL_VERSION_NUMBER < 0x00908000L
-
-#include "sha2.c"
-#include "internal-sha2.c"
-
-typedef void (*init_f) (PX_MD *md);
-
-static int
-compat_find_digest(const char *name, PX_MD **res)
-{
-	init_f		init = NULL;
-
-	if (pg_strcasecmp(name, "sha224") == 0)
-		init = init_sha224;
-	else if (pg_strcasecmp(name, "sha256") == 0)
-		init = init_sha256;
-	else if (pg_strcasecmp(name, "sha384") == 0)
-		init = init_sha384;
-	else if (pg_strcasecmp(name, "sha512") == 0)
-		init = init_sha512;
-	else
-		return PXE_NO_HASH;
-
-	*res = px_alloc(sizeof(PX_MD));
-	init(*res);
-	return 0;
-}
-#else
-#define compat_find_digest(name, res)  (PXE_NO_HASH)
-#endif
-
-/*
  * Hashes
  */
 
@@ -275,7 +127,7 @@ px_find_digest(const char *name, PX_MD **res)
 
 	md = EVP_get_digestbyname(name);
 	if (md == NULL)
-		return compat_find_digest(name, res);
+		return PXE_NO_HASH;
 
 	digest = px_alloc(sizeof(*digest));
 	digest->algo = md;
diff --git a/doc/src/sgml/installation.sgml b/doc/src/sgml/installation.sgml
index a996875..2420387 100644
--- a/doc/src/sgml/installation.sgml
+++ b/doc/src/sgml/installation.sgml
@@ -2827,30 +2827,6 @@ MANPATH=/usr/lib/scohelp/%L/man:/usr/dt/man:/usr/man:/usr/share/man:scohelp:/usr

 

-Problems with OpenSSL
-
-
- When you build PostgreSQL with OpenSSL support you might get
- compilation errors in the following files:
- 
-  

Re: [HACKERS] pg_dump with tables created in schemas created by extensions

2016-08-26 Thread Martín Marqués
2016-08-26 19:37 GMT-03:00 Tom Lane :
> =?UTF-8?B?TWFydMOtbiBNYXJxdcOpcw==?=  writes:
>> Looking at this issue today, I found that we are not setting a
>> dependency for an index created inside an extension.
>
> Surely the index has a dependency on a table, which depends on the
> extension?
>
> If you mean that you want an extension to create an index on a table that
> doesn't belong to it, but it's assuming pre-exists, I think that's just
> stupid and we need not support it.

Well, there's still the second pattern I mentioned before (which
actually came up while trying to fix this patch).

Extension creates a table and an index over one of the columns:

CREATE TABLE regress_pg_dump_schema.test_table (
col1 int,
col2 int
);

CREATE INDEX test_extension_index ON regress_pg_dump_schema.test_table (col2);


Later, some application (or a user, doesn't matter really) creates a
second index over col1:

CREATE INDEX test_index ON regress_pg_dump_schema.test_table (col1);

What we are doing (or at least it's what I understand from the code)
is checking if the table depends on an extension, and so we don't dump
it.

We should be able to use the same procedure (and reuse the code we
already have) to decide if an index should be dumped or not. But we
are missing the dependency, and so it's not possible to know that
regress_pg_dump_schema.test_extension_index depends on the extension
and regress_pg_dump_schema.test_index doesn't.

Or is this something we shouldn't support (in that case we should document it).

-- 
Martín Marquéshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump with tables created in schemas created by extensions

2016-08-26 Thread Joshua D. Drake

On 08/26/2016 04:15 PM, Tomas Vondra wrote:

On 08/27/2016 12:37 AM, Tom Lane wrote:

=?UTF-8?B?TWFydMOtbiBNYXJxdcOpcw==?=  writes:

Looking at this issue today, I found that we are not setting a
dependency for an index created inside an extension.


Surely the index has a dependency on a table, which depends on the
extension?

If you mean that you want an extension to create an index on a table
that doesn't belong to it, but it's assuming pre-exists, I think
that's just stupid and we need not support it.



I don't see why that would be stupid (and I'm not sure it's up to us to
just decide it's stupid).


+1, there are a lot of workflow patterns that make sense and don't make 
sense depending on your perspective, consider safe mode with MySQL. I 
personally think it is stupid too but it also obviously has a huge 
useful user base.




Imagine you use extensions to break the application into modules. You
have a "basic" extension, with the table, and a "search" extension
implementing fulltext search on top of that table. Isn't it natural to
keep the gin indexes in the search extension?

So the basic--1.0.sql will do something like

  CREATE TABLE x ( ...)

and search--1.0.sql would do

  CREATE INDEX y ON x USING gin (z);
  CREATE FUNCTION search_objects(..) ...

because the index and function doing the search are somewhat tightly
coupled. I admit this is just an example and I don't know how many
people use extensions this way, but I don't see why this would be
inherently stupid pattern?


It isn't and in fact shows that as we extend Pg, the more we allow 
people flexibility in developing WITHIN the database, the more we will 
be able to influence them to do so. That is a good thing.


Or maybe we should just tell everyone, use an ORM it will solve all your 
problems. (/sarcasm)



Sincerely,

JD

--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.
Unless otherwise stated, opinions are my own.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump with tables created in schemas created by extensions

2016-08-26 Thread Tom Lane
Tomas Vondra  writes:
> On 08/27/2016 12:37 AM, Tom Lane wrote:
>> If you mean that you want an extension to create an index on a table
>> that doesn't belong to it, but it's assuming pre-exists, I think
>> that's just stupid and we need not support it.

> I don't see why that would be stupid (and I'm not sure it's up to us to 
> just decide it's stupid).

Well, think about it.

1. Let's say user A owns the pre-existing table and user B owns the
extension.  Who owns the index?

2. Generally speaking, if an object is part of an extension then you
can't drop the object without dropping the whole extension.  This means
that either user A can't drop his own table, or he can but that causes
dropping the whole extension (which he does not own), not to mention
whatever else depends on it (which he owns even less).

3. Can user A do something that would mutate the index?  (For instance,
ALTER COLUMN TYPE on one of the columns it's on.)  Now is it still a
member of the extension?  If user A can't, can user B?  What if either of
them mutated the column to a datatype that belongs to some extension that
has a dependency on the original one?  Now you've got a circular
dependency loop, what fun --- especially at dump/reload time, where
pg_dump has no hope of breaking the circularity.

4. If we're going to support indexes as independent members of extensions,
why not foreign-key constraints, or check constraints?  Maybe check
constraints on domain types that don't belong to the extension?  Heck,
why not allow an extension to do ALTER TABLE ADD COLUMN?  (I think
BTW that several of these cases would allow creation of circular
extension dependencies even without any ALTER COLUMN TYPE shenanigans.)

None of this makes any sense, because these things are not stand-alone
objects in any sense of the word.  They are attributes of tables (or
domain types, for that case) and there are good reasons why we don't
consider such attributes to have ownership independent of the object
they are part of.  I do not think it is sensible for an extension to "own"
objects that don't also have, or could potentially have [1], independent
ownership in the sense of an owning user.

> Imagine you use extensions to break the application into modules.

I do not think that extensions as we currently understand them are a
suitable basis for slicing an application in the way you suggest.
I'm fine with an extension owning a whole table, but the rest of this
is just crazy.  And I sure as hell do not want to put it in as part
of a bug fix for existing releases.

regards, tom lane

[1] weasel wording for cases like casts, which we don't consider
to have owners, though I'm not sure why not.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump with tables created in schemas created by extensions

2016-08-26 Thread Tomas Vondra

On 08/27/2016 12:37 AM, Tom Lane wrote:

=?UTF-8?B?TWFydMOtbiBNYXJxdcOpcw==?=  writes:

Looking at this issue today, I found that we are not setting a
dependency for an index created inside an extension.


Surely the index has a dependency on a table, which depends on the
extension?

If you mean that you want an extension to create an index on a table
that doesn't belong to it, but it's assuming pre-exists, I think
that's just stupid and we need not support it.



I don't see why that would be stupid (and I'm not sure it's up to us to 
just decide it's stupid).


Imagine you use extensions to break the application into modules. You 
have a "basic" extension, with the table, and a "search" extension 
implementing fulltext search on top of that table. Isn't it natural to 
keep the gin indexes in the search extension?


So the basic--1.0.sql will do something like

  CREATE TABLE x ( ...)

and search--1.0.sql would do

  CREATE INDEX y ON x USING gin (z);
  CREATE FUNCTION search_objects(..) ...

because the index and function doing the search are somewhat tightly 
coupled. I admit this is just an example and I don't know how many 
people use extensions this way, but I don't see why this would be 
inherently stupid pattern?


I see the current behavior is documented, and I do understand why global 
objects can't be part of the extension, but for indexes it seems to 
violate POLA a bit.


Is there a reason why we don't want the extension/index dependencies?

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump with tables created in schemas created by extensions

2016-08-26 Thread Tom Lane
=?UTF-8?B?TWFydMOtbiBNYXJxdcOpcw==?=  writes:
> Looking at this issue today, I found that we are not setting a
> dependency for an index created inside an extension.

Surely the index has a dependency on a table, which depends on the
extension?

If you mean that you want an extension to create an index on a table that
doesn't belong to it, but it's assuming pre-exists, I think that's just
stupid and we need not support it.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump with tables created in schemas created by extensions

2016-08-26 Thread Martín Marqués
Hi,

2016-08-26 10:53 GMT-03:00 Martín Marqués :
>
> There's still one issue, which I'll add a test for as well, which is
> that if the index was created by the extension, it will be dumped
> anyway. I'll have a look at that as well.

Looking at this issue today, I found that we are not setting a
dependency for an index created inside an extension. I don't know if
it's deliberate or an overlook.

The thing is that we can't check pg_depend for dependency of an index
and the extension that creates it.

I was talking with other developers, and we kind of agree this is a
bug, for 2 reasons we thought of:

*) If the extension only creates an index over an existing table, a
drop extension will not drop that index

*) We need to have the dependency for this patch as well, else we'll
end up with an inconsistent dump, or at least one that could restore
with a != 0 return error code.

I'll open a separate bug report for this.

Regards,

-- 
Martín Marquéshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Renaming some binaries

2016-08-26 Thread Peter Eisentraut
On 8/26/16 12:26 PM, Euler Taveira wrote:
> initdb: we already have 'pg_ctl init' (since 9.0) and could remove initdb.

I have a concern specifically about pg_ctl.  Depending on how your
PostgreSQL is packaged, not all of the pg_ctl actions are safe or
sensible to run.  For example, if you are using systemd, then using
pg_ctl restart will probably not do the right thing.  And depending on
SELinux (maybe), running initdb directly might also not do the right
thing.  In most cases, you can just not use pg_ctl and do all these
things differently, but for example pg_ctl promote is the only
documented way to do that thing.

Until we have a better way to figure that out, I wouldn't want to put
more things into pg_ctl, especially as the only way.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-08-26 Thread Andres Freund
On 2016-08-26 17:31:14 -0400, Peter Eisentraut wrote:
> I agree with all that.  But the subject line is specifically about
> moving pg_xlog.  So if your opinion is that we shouldn't move pg_xlog,
> then that is noted.  But if we were to move it, we can think about a
> good place to move it to.

I think it's probably worth moving pg_xlog, because the benefit also
includes preventing a few users from shooting themselves somewhere
vital. That's imo much less the case for some of the other moves.  But I
still don't think think a largescale reorganization is a good idea,
it'll just stall and nothing will happen.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-08-26 Thread Peter Eisentraut
On 8/26/16 5:20 PM, Andres Freund wrote:
> I do think there's an order of magnitude between the impact between
> moving some and moving everything. And that's going to impact
> cost/benefit calculations.
> 
> Moving e.g. all ephemeral files into a (possibly configurable) directory
> is going to hardly impact anyone.  Renaming pg_logical into something
> different (FWIW, it was originally named differently...) will hopefully
> impact nobody, excepting some out of date file exclusion lists possibly.
> 
> But moving config files, and even pg_xlog (which we document to be
> symlinkable somewhere else) imo is different.

I agree with all that.  But the subject line is specifically about
moving pg_xlog.  So if your opinion is that we shouldn't move pg_xlog,
then that is noted.  But if we were to move it, we can think about a
good place to move it to.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-08-26 Thread Andres Freund
On 2016-08-26 17:11:00 -0400, Peter Eisentraut wrote:
> On 8/26/16 12:28 PM, Tom Lane wrote:
> > Also, I'd just as soon not move/rename things
> > that don't really need it.
>
> I'm just as happy with not changing anything.  But if we're going to
> rename stuff, let's at least think about something slightly more
> comprehensive.  Any rename is going to break a bunch of stuff.  But if
> we break it in a way that reduces the need for future discussion or
> changes, it would at least be a small win in the long run.

I do think there's an order of magnitude between the impact between
moving some and moving everything. And that's going to impact
cost/benefit calculations.

Moving e.g. all ephemeral files into a (possibly configurable) directory
is going to hardly impact anyone.  Renaming pg_logical into something
different (FWIW, it was originally named differently...) will hopefully
impact nobody, excepting some out of date file exclusion lists possibly.

But moving config files, and even pg_xlog (which we document to be
symlinkable somewhere else) imo is different.

The other thing is that the likelihood of getting anywhere by doing
radical one-off redesigns is approximately 0.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Renaming some binaries

2016-08-26 Thread Andres Freund
On 2016-08-26 13:26:39 -0300, Euler Taveira wrote:
> I'm bringing this $subject into discussion again. Historically, we are
> carrying binary names that have been confused newbies. createuser is the
> worst name so for. Also, names like createdb, initdb, reindexdb, and
> droplang does not suggest what product it is referring to. Adding a
> prefix (pg_, pg, ...) would 'make things clear'. If we have a consensus
> about this change, I suggest renaming the following binaries:
>
> clusterdb
> createdb
> createlang
> createuser
> dropdb
> droplang
> dropuser
> initdb
> oid2name
> reindexdb
> vacuumdb
> vacuumlo

Uhm, that'd need a careful backward compatibility plan, including a period of
supporting both names.


> Another major change related to this topic is assemble functionalities
> from binaries. We currently have 34 binaries

> (is that a lot for a single software?).

Does it matter? The few bytes of disk space are essentially irrelevant.

That said, on the code level, there'd be considerable benefit of coalescing:


> pg_command: clusterdb, createdb, dropdb, createuser, dropuser,
> createlang, droplang, reindexdb, vacuumdb, vacuumlo.

these commands. We could do the old trick of leaving the old names as in
place as symlinks.


> initdb: we already have 'pg_ctl init' (since 9.0) and could remove
> initdb.

I fairly strongly against removing initdb as a separate binary. Issuies
during cluster creation are already annoying to debug.



This is a *significant* amount of work, it'll make backpatching a
nightmare (although not that much happens in these binaries). I
personally see better uses of my time.


Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why is a newly created index contains the invalid LSN?

2016-08-26 Thread Andres Freund
On 2016-08-26 18:46:42 +0300, Yury Zhuravlev wrote:
> Thanks all.
> Now understand LSN strongly connected with WAL.
> However how difficult put last system LSN instead 0?
> It's not so important but will allow make use LSN more consistent.

Maybe explain why you're interested in page lsns, that'd perhaps allow
us to give more meaningful feedback.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-08-26 Thread Peter Eisentraut
On 8/26/16 4:02 PM, Joshua D. Drake wrote:
> Although... wouldn't run be under var?

Traditionally yes, but things are changing in this area, if you consider
the top-level file system of a modern Linux distribution.  One reason is
that "run" is/can be blown away at reboot.  This wouldn't be an entirely
useless feature for postgres: Can you tell otherwise which of the
various pid/lock/opts files you can delete if you have killed the
processes and want to eliminate any left-over state?  Is postmaster.opts
a configuration file or a state file?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why is a newly created index contains the invalid LSN?

2016-08-26 Thread Kevin Grittner
On Fri, Aug 26, 2016 at 10:46 AM, Yury Zhuravlev
 wrote:

> Now understand LSN strongly connected with WAL.
> However how difficult put last system LSN instead 0?
> It's not so important but will allow make use LSN more consistent.

There might be performance implications when flushing the index
buffers, due to the need to check each one against the WAL flush
point, where we now skip that check.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-08-26 Thread Peter Eisentraut
On 8/26/16 12:28 PM, Tom Lane wrote:
> Also, I'd just as soon not move/rename things
> that don't really need it.

I'm just as happy with not changing anything.  But if we're going to
rename stuff, let's at least think about something slightly more
comprehensive.  Any rename is going to break a bunch of stuff.  But if
we break it in a way that reduces the need for future discussion or
changes, it would at least be a small win in the long run.

> If, for example, we decide to move
> postgresql.conf to etc/postgresql.conf, that is going to break a metric
> ton of stuff that doesn't need to get broken AFAICS.

Sure, I'd be content with leaving those in the top-level instead of say
"etc".

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-08-26 Thread Andres Freund
On 2016-08-26 11:53:21 -0400, Peter Eisentraut wrote:
> On 8/25/16 10:39 PM, Michael Paquier wrote:
> > I am relaunching $subject as 10 development will begin soon. As far as
> > I know, there is agreement that we can do something here. Among the
> > different proposals I have found:
> > - pg_clog renamed to pg_commit_status, pg_xact or pg_commit
> > - pg_xlog renamed to pg_xjournal, pg_wal or pg_journal
> 
> If we're going to do some renaming, then I suggest we do a
> mini-file-system structure under $PGDATA, like
> 
> $PGDATA/etc
> $PGDATA/log
> $PGDATA/run (lock files etc.)
> $PGDATA/tmp
> $PGDATA/var
> 
> The names of all the things under "var" could still be refined, but it's
> much less likely that users will confuse data with configuration or
> plain logs under that scheme.

Splitting of ephemeral data seems to have a benefit, the rest seems more
like rather noisy busywork to me.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-08-26 Thread Peter Eisentraut
On 8/26/16 11:58 AM, Magnus Hagander wrote:
> $PGDATA/etc
>> $PGDATA/log
>> $PGDATA/run (lock files etc.)
>> $PGDATA/tmp
>> $PGDATA/var
>>
>> The names of all the things under "var" could still be refined, but it's
>> much less likely that users will confuse data with configuration or
>> plain logs under that scheme
> 
> Interesting idea. I worry a bit that this might encourage distributions
> to split it up into different places though, and I'm not sure we want to
> encourage that..

They already do this.  This would just formalize where we think
appropriate boundaries between the pieces are.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Renaming some binaries

2016-08-26 Thread David Fetter
On Fri, Aug 26, 2016 at 04:33:47PM -0300, Euler Taveira wrote:
> On 26-08-2016 14:03, David Fetter wrote:
> > Would these make sense as pg_ctl options, or are you separating them
> > out because they're not instance-wide?  If separating them is
> > important on those grounds, how about something like pg_db or
> > pg_db_command?
> > 
> It doesn't make sense because pg_ctl is server-side and pg_command would
> be client-side.

Perfect!

> >> pg_oid2name: I don't have a strong opinion that it fits in pg_command;
> > 
> > I vaguely knew that this existed, but I can't recall having heard of
> > anybody actually using it.  I suppose it's under pg_ctl if the split
> > above between instance-wide and db-specific holds.
> > 
> I don't use it for a long time. It also a client-side binary then better
> place for it is pg_command. BTW, is anybody using it? If so, we could
> add this functionality to psql and remove it.

Sure.  For server versions 10 or better, it could be a call to a new
server-side function.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Renaming some binaries

2016-08-26 Thread Tomas Vondra

On 08/26/2016 07:03 PM, David Fetter wrote:

On Fri, Aug 26, 2016 at 01:26:39PM -0300, Euler Taveira wrote:

Hi,


...
>>

initdb: we already have 'pg_ctl init' (since 9.0) and could remove initdb.

Opinions?


+1 for removing initdb.



We can't just remove it because pg_ctl actually calls initdb, so we 
would have to move the code from src/bin/initdb to src/bin/pg_ctl. Which 
is doable, but I assume we'd also change how we pass parameters to it - 
right now we have to do this:


pg_ctl init -o "... initdb params ..."

But I assume we'd change this to regular parameters, if 'init' gets 
merged into pg_ctl - otherwise we'd force users to construct the param 
string only to immediately parse it (currently the "system()" call does 
that for us, but we'd have to implement that in pg_ctl).


That however means that not only we break the scripts for all those 
calling initdb directly (and there's quite a few of them, I'm sure, 
exactly because it's simpler to pass the parameters directly), but also 
for all those already using pg_ctl.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Renaming some binaries

2016-08-26 Thread Alvaro Herrera
Simon Riggs wrote:
> On 26 August 2016 at 18:26, Euler Taveira  wrote:
> 
> > I'm bringing this $subject into discussion again. Historically, we are
> > carrying binary names that have been confused newbies. createuser is the
> > worst name so for. Also, names like createdb, initdb, reindexdb, and
> > droplang does not suggest what product it is referring to. Adding a
> > prefix (pg_, pg, ...) would 'make things clear'. If we have a consensus
> > about this change, I suggest renaming the following binaries:
 
> Why not just remove them all and change the docs to suggest using
> 
> psql -c "CREATE DATABASE foo"

I agree that some of these commands are redundant, and that we seem to
have them just because we can, not because they provide useful
functionality:
createdb createlang createuser dropdb droplang dropuser

I would rather have a single command that can do all those actions.
We could have things like
  pg_ctl createuser

I agree with Simon that many "serious users" would not be using
the shell to implement this functionality, but something higher-level (a
GUI, some web app) that's going to go through SQL instead.  Still, for
ad-hoc admin tasks it seems convenient to have the shell interface.

Nowadays git and many other programs have a model where a single command
(git) can call a number of different binaries which are not in PATH
(/usr/lib/git-core in my installation).  We could do something similar,
where little else apart from pg_ctl, pg_dump, pgbench are in path, and
most binaries are in some hidden binary directory known to pg_ctl.

pg_ctl vacuum -j4 


(Personally, I very often have postmaster running on its own terminal
with output to stderr.  It'd be a little sad to lose that functionality,
but what value does it offer to regular users?  I can script my way
around its lack, if needed.)

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-08-26 Thread Simon Riggs
On 26 August 2016 at 18:28, Tom Lane  wrote:

> Also, I'd just as soon not move/rename things that don't really need it.

+1

Let's leave everything exactly as it is now... but put a small README
in each directory to explain why files in it shouldn't be deleted to
make space.

That helps the few people who made such mistakes, but doesn't cause
massive change as a result.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Renaming some binaries

2016-08-26 Thread Larry Rosenman

On 2016-08-26 15:03, Andres Freund wrote:

On 2016-08-26 22:01:58 +0200, Simon Riggs wrote:
On 26 August 2016 at 18:26, Euler Taveira  
wrote:


> I'm bringing this $subject into discussion again. Historically, we are
> carrying binary names that have been confused newbies. createuser is the
> worst name so for. Also, names like createdb, initdb, reindexdb, and
> droplang does not suggest what product it is referring to. Adding a
> prefix (pg_, pg, ...) would 'make things clear'. If we have a consensus
> about this change, I suggest renaming the following binaries:
>
> clusterdb
> createdb
> createlang
> createuser
> dropdb
> droplang
> dropuser
> initdb
> oid2name
> reindexdb
> vacuumdb
> vacuumlo

Why not just remove them all and change the docs to suggest using

psql -c "CREATE DATABASE foo"
etc

Less code, no confusion because we have just one client tool - psql.


Several of them have the ability to connect to several databases, some
even do that in parallel.


vacuumdb being one that I've needed recently to do a number of DB's in a 
row.



--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 17716 Limpia Crk, Round Rock, TX 78664-7281


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Renaming some binaries

2016-08-26 Thread Andres Freund
On 2016-08-26 22:01:58 +0200, Simon Riggs wrote:
> On 26 August 2016 at 18:26, Euler Taveira  wrote:
> 
> > I'm bringing this $subject into discussion again. Historically, we are
> > carrying binary names that have been confused newbies. createuser is the
> > worst name so for. Also, names like createdb, initdb, reindexdb, and
> > droplang does not suggest what product it is referring to. Adding a
> > prefix (pg_, pg, ...) would 'make things clear'. If we have a consensus
> > about this change, I suggest renaming the following binaries:
> >
> > clusterdb
> > createdb
> > createlang
> > createuser
> > dropdb
> > droplang
> > dropuser
> > initdb
> > oid2name
> > reindexdb
> > vacuumdb
> > vacuumlo
> 
> Why not just remove them all and change the docs to suggest using
> 
> psql -c "CREATE DATABASE foo"
> etc
> 
> Less code, no confusion because we have just one client tool - psql.

Several of them have the ability to connect to several databases, some
even do that in parallel.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-08-26 Thread Joshua D. Drake

On 08/26/2016 09:28 AM, Tom Lane wrote:

Magnus Hagander  writes:

On Aug 26, 2016 5:54 PM, "Peter Eisentraut" <
peter.eisentr...@2ndquadrant.com> wrote:

If we're going to do some renaming, then I suggest we do a
mini-file-system structure under $PGDATA, like

$PGDATA/etc
$PGDATA/log
$PGDATA/run (lock files etc.)
$PGDATA/tmp
$PGDATA/var

The names of all the things under "var" could still be refined, but it's
much less likely that users will confuse data with configuration or
plain logs under that scheme



Interesting idea. I worry a bit that this might encourage distributions to
split it up into different places though, and I'm not sure we want to
encourage that..


Yeah, I'm afraid that these names are not as well standardized as Peter
probably wishes they were.  Also, I'd just as soon not move/rename things
that don't really need it.  If, for example, we decide to move
postgresql.conf to etc/postgresql.conf, that is going to break a metric
ton of stuff that doesn't need to get broken AFAICS.


I am not so sure that is accurate. Yes, Windows is an outlying but any 
Unix person is going to easily understand etc log run. Further as Linux 
is by far our most run platform (outside of possibly Windows) it will 
feel right at home for the largest growing user base. Personally I 
really like the idea of:


$PGDATA/etc/postgresql.conf
$PGDATA/log/Thu.log
$PGDATA/run/postmaster.pid
$PGDATA/var/base/16758

Although... wouldn't run be under var?

JD




regards, tom lane





--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Renaming some binaries

2016-08-26 Thread Simon Riggs
On 26 August 2016 at 18:26, Euler Taveira  wrote:

> I'm bringing this $subject into discussion again. Historically, we are
> carrying binary names that have been confused newbies. createuser is the
> worst name so for. Also, names like createdb, initdb, reindexdb, and
> droplang does not suggest what product it is referring to. Adding a
> prefix (pg_, pg, ...) would 'make things clear'. If we have a consensus
> about this change, I suggest renaming the following binaries:
>
> clusterdb
> createdb
> createlang
> createuser
> dropdb
> droplang
> dropuser
> initdb
> oid2name
> reindexdb
> vacuumdb
> vacuumlo

Why not just remove them all and change the docs to suggest using

psql -c "CREATE DATABASE foo"
etc

Less code, no confusion because we have just one client tool - psql.

Almost all serious users will be issuing more than one DDL command at
a time, so they'll be writing a script anyway.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Renaming some binaries

2016-08-26 Thread Euler Taveira
On 26-08-2016 14:03, David Fetter wrote:
> Would these make sense as pg_ctl options, or are you separating them
> out because they're not instance-wide?  If separating them is
> important on those grounds, how about something like pg_db or
> pg_db_command?
> 
It doesn't make sense because pg_ctl is server-side and pg_command would
be client-side.

>> pg_oid2name: I don't have a strong opinion that it fits in pg_command;
> 
> I vaguely knew that this existed, but I can't recall having heard of
> anybody actually using it.  I suppose it's under pg_ctl if the split
> above between instance-wide and db-specific holds.
> 
I don't use it for a long time. It also a client-side binary then better
place for it is pg_command. BTW, is anybody using it? If so, we could
add this functionality to psql and remove it.


-- 
   Euler Taveira   Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] old_snapshot_threshold documentation

2016-08-26 Thread Peter Eisentraut
I doubt the documentation for old_snapshot_threshold is going to be
understood by many ordinary users.  What is a "snapshot", first of all?
Why would a snapshot be old?  Why is that a problem?  What can I do to
avoid it?  What are the circumstances in practice where this issue would
arise, and what are the trade-offs?  Where can I see existing snapshots
and how old they are?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PG_DIAG_SEVERITY and a possible bug in pq_parse_errornotice()

2016-08-26 Thread Tom Lane
Robert Haas  writes:
> On Fri, Aug 26, 2016 at 11:26 AM, Tom Lane  wrote:
>> I'm not quite sure what to do about this.  It feels a tad wrong to use
>> ErrorContext as the active context during HandleParallelMessages, but
>> what else should we use?  Maybe this needs its very own private context
>> that we can reset after each use?

> If we use ErrorContext, will anything go wrong?

I think it might, if we were unlucky enough to be doing this while the
leader was engaged in reporting some other error for its own reasons.
To avoid accumulated memory leakage over multiple worker error reports,
we ought to do a MemoryContextReset at the start or end (probably start)
of HandleParallelMessages, and that would be problematic if ErrorContext
had some data in it already.  It's possible that the scenario can't occur
because CHECK_FOR_INTERRUPTS is never done inside error processing, but
I would not feel very comfortable with that assumption.

I'm thinking a dedicated context is the way to go.  It won't take very
much code.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PG_DIAG_SEVERITY and a possible bug in pq_parse_errornotice()

2016-08-26 Thread Robert Haas
On Fri, Aug 26, 2016 at 11:26 AM, Tom Lane  wrote:
> Or in short, this has confused edata and newedata.  Valid coding would
> be
> oldcontext = MemoryContextSwitchTo(newedata->assoc_context);
> rather than what is there.

Oh, right.

>>> (Note that in the sole
>>> existing use-case, edata->assoc_context is going to have been set to
>>> CurrentMemoryContext by pq_parse_errornotice, and I see no good reason to
>>> assume that's very long-lived ... in fact, it looks like it's whatever
>>> happens to be active when ProcessInterrupts is called, which means there's
>>> probably a totally separate set of problems here having to do with
>>> potential leaks into long-lived contexts.)
>
>> Oops.  Yes.
>
> I'm not quite sure what to do about this.  It feels a tad wrong to use
> ErrorContext as the active context during HandleParallelMessages, but
> what else should we use?  Maybe this needs its very own private context
> that we can reset after each use?

If we use ErrorContext, will anything go wrong?

>>> I have little use for the name of that function either, as it's not
>>> necessarily going to "throw" anything.  Maybe ReportErrorUsingData,
>>> or something like that?
>
>> I deliberately avoided that sort of terminology because it need not be
>> an ERROR.  It can be, say, a NOTICE.  It is definitely something that
>> is coming from an ErrorData but it need not be an ERROR.
>
> Right, but to me, "throw" generally means a transfer of control, which
> is exactly what this isn't going to do if the message is only a notice.

Fair point.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] SortSupport for macaddr type

2016-08-26 Thread Brandur
And here's a commitfest link:

https://commitfest.postgresql.org/10/743/

Also, a correction to my original message: the unreferenced [1] footnote
points back to the thread that included the patch for UUID SortSupport.

[1] 
https://www.postgresql.org/message-id/CAM3SWZR4avsTwwNVUzRNbHk8v36W-QBqpoKg=ogkwwy0dkt...@mail.gmail.com

On Fri, Aug 26, 2016 at 10:44:22AM -0700, Brandur wrote:
> Hello,
> 
> I've attached a patch to add SortSupport for Postgres' macaddr which has the
> effect of improving the performance of sorting operations for the type. The
> strategy that I employ is very similar to that for UUID, which is to create
> abbreviated keys by packing as many bytes from the MAC address as possible 
> into
> Datums, and then performing fast unsigned integer comparisons while sorting.
> 
> I ran some informal local benchmarks, and for cardinality greater than 100k
> rows, I see a speed up on `CREATE INDEX` of roughly 25% to 65%. (For those
> interested, I put a few more numbers into a small report here [2].)
> 
> Admittedly, this is not quite as useful as speeding up sorting on a more 
> common
> data type like TEXT or UUID, but the change still seems like a useful
> performance improvement. I largely wrote it as an exercise to familiarize
> myself with the Postgres codebase.
> 
> I'll add an entry into the current commitfest as suggested by the Postgres 
> Wiki
> and follow up here with a link.
> 
> Thanks, and if anyone has feedback or other thoughts, let me know!
> 
> Brandur
> 
> [1] 
> https://www.postgresql.org/message-id/CAM3SWZR4avsTwwNVUzRNbHk8v36W-QBqpoKg=ogkwwy0dkt...@mail.gmail.com
> [2] 
> https://docs.google.com/spreadsheets/d/1cUqbg9RTgSo16WrrfJJwDP1eDaWL3TNOxnIOFNpfwgA/edit?usp=sharing



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [PATCH] SortSupport for macaddr type

2016-08-26 Thread Brandur
Hello,

I've attached a patch to add SortSupport for Postgres' macaddr which has the
effect of improving the performance of sorting operations for the type. The
strategy that I employ is very similar to that for UUID, which is to create
abbreviated keys by packing as many bytes from the MAC address as possible into
Datums, and then performing fast unsigned integer comparisons while sorting.

I ran some informal local benchmarks, and for cardinality greater than 100k
rows, I see a speed up on `CREATE INDEX` of roughly 25% to 65%. (For those
interested, I put a few more numbers into a small report here [2].)

Admittedly, this is not quite as useful as speeding up sorting on a more common
data type like TEXT or UUID, but the change still seems like a useful
performance improvement. I largely wrote it as an exercise to familiarize
myself with the Postgres codebase.

I'll add an entry into the current commitfest as suggested by the Postgres Wiki
and follow up here with a link.

Thanks, and if anyone has feedback or other thoughts, let me know!

Brandur

[1] 
https://www.postgresql.org/message-id/CAM3SWZR4avsTwwNVUzRNbHk8v36W-QBqpoKg=ogkwwy0dkt...@mail.gmail.com
[2] 
https://docs.google.com/spreadsheets/d/1cUqbg9RTgSo16WrrfJJwDP1eDaWL3TNOxnIOFNpfwgA/edit?usp=sharing
>From 73f7dbf1921eea7106c93654dd7bf1fdd0888a07 Mon Sep 17 00:00:00 2001
From: Brandur 
Date: Tue, 16 Aug 2016 17:07:07 -0700
Subject: [PATCH] Implement SortSupport for macaddr data type

Introduces a scheme to produce abbreviated keys for the macaddr type
which involves packing as many of a MAC address' six bytes as possible
into a Datum (and adding two zeroed bytes of padding on a 64-bit
system). Sorting routines can then take advantage of the abbreviated
keys by performing fast unsigned integer comparisons, resulting in a
significant performance increase when operating on > 100k rows or more.

This implementation is modeled closely on the addition of SortSupport
for the UUID data type, which employs a very similar strategy.
---
 src/backend/utils/adt/mac.c | 210 
 src/include/catalog/pg_amproc.h |   1 +
 src/include/catalog/pg_proc.h   |   2 +
 src/include/utils/builtins.h|   1 +
 4 files changed, 214 insertions(+)

diff --git a/src/backend/utils/adt/mac.c b/src/backend/utils/adt/mac.c
index 509315a..471c3f0 100644
--- a/src/backend/utils/adt/mac.c
+++ b/src/backend/utils/adt/mac.c
@@ -7,9 +7,13 @@
 #include "postgres.h"
 
 #include "access/hash.h"
+#include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
+#include "port/pg_bswap.h"
 #include "utils/builtins.h"
+#include "utils/guc.h"
 #include "utils/inet.h"
+#include "utils/sortsupport.h"
 
 
 /*
@@ -22,6 +26,21 @@
 #define lobits(addr) \
   ((unsigned long)(((addr)->d<<16)|((addr)->e<<8)|((addr)->f)))
 
+/* sortsupport for macaddr */
+typedef struct
+{
+   int64   input_count;/* number of non-null values seen */
+   boolestimating; /* true if estimating 
cardinality */
+
+   hyperLogLogState abbr_card; /* cardinality estimator */
+} macaddr_sortsupport_state;
+
+static int macaddr_cmp_internal(macaddr *a1, macaddr *a2);
+static int macaddr_fast_cmp(Datum x, Datum y, SortSupport ssup);
+static int macaddr_cmp_abbrev(Datum x, Datum y, SortSupport ssup);
+static bool macaddr_abbrev_abort(int memtupcount, SortSupport ssup);
+static Datum macaddr_abbrev_convert(Datum original, SortSupport ssup);
+
 /*
  * MAC address reader.  Accepts several common notations.
  */
@@ -318,3 +337,194 @@ macaddr_trunc(PG_FUNCTION_ARGS)
 
PG_RETURN_MACADDR_P(result);
 }
+
+/*
+ * SortSupport strategy function. Populates a SortSupport struct with the
+ * information necessary to use comparison by abbreviated keys.
+ */
+Datum
+macaddr_sortsupport(PG_FUNCTION_ARGS)
+{
+   SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0);
+
+   ssup->comparator = macaddr_fast_cmp;
+   ssup->ssup_extra = NULL;
+
+   if (ssup->abbreviate)
+   {
+   macaddr_sortsupport_state *uss;
+   MemoryContext oldcontext;
+
+   oldcontext = MemoryContextSwitchTo(ssup->ssup_cxt);
+
+   uss = palloc(sizeof(macaddr_sortsupport_state));
+   uss->input_count = 0;
+   uss->estimating = true;
+   initHyperLogLog(>abbr_card, 10);
+
+   ssup->ssup_extra = uss;
+
+   ssup->comparator = macaddr_cmp_abbrev;
+   ssup->abbrev_converter = macaddr_abbrev_convert;
+   ssup->abbrev_abort = macaddr_abbrev_abort;
+   ssup->abbrev_full_comparator = macaddr_fast_cmp;
+
+   MemoryContextSwitchTo(oldcontext);
+   }
+
+   PG_RETURN_VOID();
+}
+
+/*
+ * SortSupport "traditional" comparison function. Pulls two MAC addresses from
+ * the heap and runs a standard comparison on them.
+ */
+static int
+macaddr_fast_cmp(Datum x, Datum y, SortSupport ssup)
+{
+   

Re: [HACKERS] PG_DIAG_SEVERITY and a possible bug in pq_parse_errornotice()

2016-08-26 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> Any objections?  Anyone want to bikeshed the field name?  I considered
>> PG_DIAG_SEVERITY_NONLOCALIZED and PG_DIAG_SEVERITY_ENGLISH before settling
>> on PG_DIAG_SEVERITY_ASCII, but I can't say I'm in love with that.

> I didn't review the patch, but +1 on the idea.  As for the name, I think
> ASCII is the wrong thing (as many labels in other languages can be in
> ascii too).  I vote for NONLOCALIZED.

> I see character "s" is already taken in the protocol; that would be my
> first preference rather than A.  How about Z?

One of the reasons I didn't pick NONLOCALIZED was that I couldn't come
up with an appropriate letter for the protocol.  I guess Z is okay,
but it's a bit of a stretch (especially if you're of the persuasion
that wants to spell it "nonlocalised").  Maybe V for "seVerity"?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] OpenSSL 1.1 breaks configure and more

2016-08-26 Thread Tom Lane
Heikki Linnakangas  writes:
> Yeah, they want people to move to their own SSL library [1].

> [1] I couldn't find any official statement, but lots of blog posts 
> saying the same thing.

As I recall, the deprecation warning messages said that in so many words.
That probably counts as an official statement ...

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PG_DIAG_SEVERITY and a possible bug in pq_parse_errornotice()

2016-08-26 Thread Alvaro Herrera
Tom Lane wrote:

> So far as I can find, the attached is all we need to do to introduce a
> new message field.  (This patch doesn't address the memory-context
> questions, but it does fix the localization-driven failure demonstrated
> upthread.)
> 
> Any objections?  Anyone want to bikeshed the field name?  I considered
> PG_DIAG_SEVERITY_NONLOCALIZED and PG_DIAG_SEVERITY_ENGLISH before settling
> on PG_DIAG_SEVERITY_ASCII, but I can't say I'm in love with that.

I didn't review the patch, but +1 on the idea.  As for the name, I think
ASCII is the wrong thing (as many labels in other languages can be in
ascii too).  I vote for NONLOCALIZED.

I see character "s" is already taken in the protocol; that would be my
first preference rather than A.  How about Z?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] OpenSSL 1.1 breaks configure and more

2016-08-26 Thread Heikki Linnakangas

On 08/26/2016 07:44 PM, Tom Lane wrote:

Peter Eisentraut  writes:

On 8/26/16 5:31 AM, Heikki Linnakangas wrote:

I think now would be a good time to drop support for OpenSSL versions
older than 0.9.8. OpenSSL don't even support 0.9.8 anymore, although
there are probably distributions out there that still provide patches
for it. But OpenSSL 0.9.7 and older are really not interesting for
PostgreSQL 10 anymore, I think.



CentOS 5 currently ships 0.9.8e.  That's usually the oldest OS we want
to support eagerly.


Also, I get this on fully-up-to-date OS X (El Capitan):

$ openssl version
OpenSSL 0.9.8zh 14 Jan 2016


Ok, sold, let's remove support for OpenSSL < 0.9.8.


Worth noting though is that without -Wno-deprecated-declarations, you
find that Apple has sprinkled the entire OpenSSL API with deprecation
warnings.  That suggests that their plan for the future is to drop it
rather than update it.  Should we be thinking ahead to that?


Yeah, they want people to move to their own SSL library [1]. I doubt 
they will actually remove it any time soon, but who knows. It would be a 
good project for someone with an OS X system and some spare time, to 
write a patch to build with OS X's native SSL library instead of 
OpenSSL. The code is structured nicely to enable that now.


[1] I couldn't find any official statement, but lots of blog posts 
saying the same thing.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Renaming some binaries

2016-08-26 Thread David Fetter
On Fri, Aug 26, 2016 at 01:26:39PM -0300, Euler Taveira wrote:
> Hi,
> 
> I'm bringing this $subject into discussion again. Historically, we are
> carrying binary names that have been confused newbies. createuser is the
> worst name so for. Also, names like createdb, initdb, reindexdb, and
> droplang does not suggest what product it is referring to. Adding a
> prefix (pg_, pg, ...) would 'make things clear'.

+1 for pg_ .  We should have done this long ago, but this is better
fixed than left broken.

> If we have a consensus
> about this change, I suggest renaming the following binaries:
> 
> clusterdb
> createdb
> createlang
> createuser
> dropdb
> droplang
> dropuser
> initdb
> oid2name
> reindexdb
> vacuumdb
> vacuumlo
> 
> Another major change related to this topic is assemble functionalities
> from binaries. We currently have 34 binaries (is that a lot for a single
> software?). Also, some of them have the same principle: execute a
> administrative or maintenance command. IMHO, from the list above, we
> could reduce it to:
> 
> pg_command: clusterdb, createdb, dropdb, createuser, dropuser,
> createlang, droplang, reindexdb, vacuumdb, vacuumlo. It also has the
> advantage to allow adding new administrative/maintenance commands to it
> in the future;

Would these make sense as pg_ctl options, or are you separating them
out because they're not instance-wide?  If separating them is
important on those grounds, how about something like pg_db or
pg_db_command?

> pg_oid2name: I don't have a strong opinion that it fits in pg_command;

I vaguely knew that this existed, but I can't recall having heard of
anybody actually using it.  I suppose it's under pg_ctl if the split
above between instance-wide and db-specific holds.

> initdb: we already have 'pg_ctl init' (since 9.0) and could remove initdb.
> 
> Opinions?

+1 for removing initdb.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] OpenSSL 1.1 breaks configure and more

2016-08-26 Thread Tom Lane
Peter Eisentraut  writes:
> On 8/26/16 5:31 AM, Heikki Linnakangas wrote:
>> I think now would be a good time to drop support for OpenSSL versions 
>> older than 0.9.8. OpenSSL don't even support 0.9.8 anymore, although 
>> there are probably distributions out there that still provide patches 
>> for it. But OpenSSL 0.9.7 and older are really not interesting for 
>> PostgreSQL 10 anymore, I think.

> CentOS 5 currently ships 0.9.8e.  That's usually the oldest OS we want
> to support eagerly.

Also, I get this on fully-up-to-date OS X (El Capitan):

$ openssl version
OpenSSL 0.9.8zh 14 Jan 2016

Worth noting though is that without -Wno-deprecated-declarations, you
find that Apple has sprinkled the entire OpenSSL API with deprecation
warnings.  That suggests that their plan for the future is to drop it
rather than update it.  Should we be thinking ahead to that?

regards, tom lane

PS: I still have 0.9.7 on some of my buildfarm critters.  But I could
either update them, or stop using --with-openssl there.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] OpenSSL 1.1 breaks configure and more

2016-08-26 Thread Peter Eisentraut
On 8/26/16 5:31 AM, Heikki Linnakangas wrote:
> I think now would be a good time to drop support for OpenSSL versions 
> older than 0.9.8. OpenSSL don't even support 0.9.8 anymore, although 
> there are probably distributions out there that still provide patches 
> for it. But OpenSSL 0.9.7 and older are really not interesting for 
> PostgreSQL 10 anymore, I think.

CentOS 5 currently ships 0.9.8e.  That's usually the oldest OS we want
to support eagerly.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transactions involving multiple postgres foreign servers

2016-08-26 Thread Masahiko Sawada
On Fri, Aug 26, 2016 at 3:13 PM, Ashutosh Bapat
 wrote:
>
>
> On Fri, Aug 26, 2016 at 11:37 AM, Masahiko Sawada 
> wrote:
>>
>> On Fri, Aug 26, 2016 at 3:03 PM, Ashutosh Bapat
>>  wrote:
>> >
>> >
>> > On Fri, Aug 26, 2016 at 11:22 AM, Masahiko Sawada
>> > 
>> > wrote:
>> >>
>> >> On Fri, Aug 26, 2016 at 1:32 PM, Vinayak Pokale 
>> >> wrote:
>> >> > Hi All,
>> >> >
>> >> > Ashutosh proposed the feature 2PC for FDW for achieving atomic
>> >> > commits
>> >> > across multiple foreign servers.
>> >> > If a transaction make changes to more than two foreign servers the
>> >> > current
>> >> > implementation in postgres_fdw doesn't make sure that either all of
>> >> > them
>> >> > commit or all of them rollback their changes.
>> >> >
>> >> > We (Masahiko Sawada and me) reopen this thread and trying to
>> >> > contribute
>> >> > in
>> >> > it.
>> >> >
>> >> > 2PC for FDW
>> >> > 
>> >> > The patch provides support for atomic commit for transactions
>> >> > involving
>> >> > foreign servers. when the transaction makes changes to foreign
>> >> > servers,
>> >> > either all the changes to all the foreign servers commit or rollback.
>> >> >
>> >> > The new patch 2PC for FDW include the following things:
>> >> > 1. The patch 0001 introduces a generic feature. All kinds of FDW that
>> >> > support 2PC such as oracle_fdw, mysql_fdw, postgres_fdw etc. can
>> >> > involve
>> >> > in
>> >> > the transaction.
>> >> >
>> >> > Currently we can push some conditions down to shard nodes, especially
>> >> > in
>> >> > 9.6
>> >> > the directly modify feature has
>> >> > been introduced. But such a transaction modifying data on shard node
>> >> > is
>> >> > not
>> >> > executed surely.
>> >> > Using 0002 patch, that modify is executed with 2PC. It means that we
>> >> > almost
>> >> > can provide sharding solution using
>> >> > multiple PostgreSQL server (one parent node and several shared node).
>> >> >
>> >> > For multi master, we definitely need transaction manager but
>> >> > transaction
>> >> > manager probably can use this 2PC for FDW feature to manage
>> >> > distributed
>> >> > transaction.
>> >> >
>> >> > 2. 0002 patch makes postgres_fdw possible to use 2PC.
>> >> >
>> >> > 0002 patch makes postgres_fdw to use below APIs. These APIs are
>> >> > generic
>> >> > features which can be used by all kinds of FDWs.
>> >> >
>> >> > a. Execute PREAPRE TRANSACTION and COMMIT/ABORT PREAPRED instead
>> >> > of
>> >> > COMMIT/ABORT on foreign server which supports 2PC.
>> >> > b. Manage information of foreign prepared transactions resolver
>> >> >
>> >> > Masahiko Sawada will post the patch.
>> >> >
>> >> >
>> >>
>> >
>> > Thanks Vinayak and Sawada-san for taking this forward and basing your
>> > work
>> > on my patch.
>> >
>> >>
>> >> Still lot of work to do but attached latest patches.
>> >> These are based on the patch Ashutosh posted before, I revised it and
>> >> divided into two patches.
>> >> Compare with original patch, patch of pg_fdw_xact_resolver and
>> >> documentation are lacked.
>> >
>> >
>> > I am not able to understand the last statement.
>>
>> Sorry to confuse you.
>>
>> > Do you mean to say that your patches do not have pg_fdw_xact_resolver()
>> > and
>> > documentation that my patches had?
>>
>> Yes.
>> I'm confirming them that your patches had.
>
>
> Thanks for the clarification. I had added pg_fdw_xact_resolver() to resolve
> any transactions which can not be resolved immediately after they were
> prepared. There was a comment from Kevin (IIRC) that leaving transactions
> unresolved on the foreign server keeps the resources locked on those
> servers. That's not a very good situation. And nobody but the initiating
> server can resolve those. That functionality is important to make it a
> complete 2PC solution. So, please consider it to be included in your first
> set of patches.
>

Yeah, I know the reason why pg_fdw_xact_resolver is required.
I will add it as a separated patch.

Regards,

--
Masahiko Sawada


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-08-26 Thread Tom Lane
Magnus Hagander  writes:
> On Aug 26, 2016 5:54 PM, "Peter Eisentraut" <
> peter.eisentr...@2ndquadrant.com> wrote:
>> If we're going to do some renaming, then I suggest we do a
>> mini-file-system structure under $PGDATA, like
>> 
>> $PGDATA/etc
>> $PGDATA/log
>> $PGDATA/run (lock files etc.)
>> $PGDATA/tmp
>> $PGDATA/var
>> 
>> The names of all the things under "var" could still be refined, but it's
>> much less likely that users will confuse data with configuration or
>> plain logs under that scheme

> Interesting idea. I worry a bit that this might encourage distributions to
> split it up into different places though, and I'm not sure we want to
> encourage that..

Yeah, I'm afraid that these names are not as well standardized as Peter
probably wishes they were.  Also, I'd just as soon not move/rename things
that don't really need it.  If, for example, we decide to move
postgresql.conf to etc/postgresql.conf, that is going to break a metric
ton of stuff that doesn't need to get broken AFAICS.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Renaming some binaries

2016-08-26 Thread Euler Taveira
Hi,

I'm bringing this $subject into discussion again. Historically, we are
carrying binary names that have been confused newbies. createuser is the
worst name so for. Also, names like createdb, initdb, reindexdb, and
droplang does not suggest what product it is referring to. Adding a
prefix (pg_, pg, ...) would 'make things clear'. If we have a consensus
about this change, I suggest renaming the following binaries:

clusterdb
createdb
createlang
createuser
dropdb
droplang
dropuser
initdb
oid2name
reindexdb
vacuumdb
vacuumlo

Another major change related to this topic is assemble functionalities
from binaries. We currently have 34 binaries (is that a lot for a single
software?). Also, some of them have the same principle: execute a
administrative or maintenance command. IMHO, from the list above, we
could reduce it to:

pg_command: clusterdb, createdb, dropdb, createuser, dropuser,
createlang, droplang, reindexdb, vacuumdb, vacuumlo. It also has the
advantage to allow adding new administrative/maintenance commands to it
in the future;
pg_oid2name: I don't have a strong opinion that it fits in pg_command;
initdb: we already have 'pg_ctl init' (since 9.0) and could remove initdb.

Opinions?


-- 
   Euler Taveira   Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unsupported feature F867: WITH TIES

2016-08-26 Thread Peter Eisentraut
On 8/26/16 9:06 AM, Jürgen Purtz wrote:
> Actually we don't support the SQL feature F867 "FETCH FIRST clause: WITH 
> TIES option". On the other side we support the window function "rank()" 
> which acts like the "WITH TIES option". My questions are: Is it hard to 
> implement the "WITH TIES option"? Are there plans for a realization / 
> how do we decide in general what to do next? Differs the semantic of the 
> "WITH TIES option" significantly from the "rank()" window function or 
> can we treat it as some kind of 'syntactical sugar'?

LIMIT/FETCH FIRST works at a level that is quite far removed from data
type semantics, which is what you'd need to have handy to compute ties.
LIMIT basically just tells the executor to stop after getting a certain
number of rows.

So implementing WITH TIES would be very difficult in the current setup.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-08-26 Thread Magnus Hagander
On Aug 26, 2016 5:54 PM, "Peter Eisentraut" <
peter.eisentr...@2ndquadrant.com> wrote:
>
> On 8/25/16 10:39 PM, Michael Paquier wrote:
> > I am relaunching $subject as 10 development will begin soon. As far as
> > I know, there is agreement that we can do something here. Among the
> > different proposals I have found:
> > - pg_clog renamed to pg_commit_status, pg_xact or pg_commit
> > - pg_xlog renamed to pg_xjournal, pg_wal or pg_journal
>
> If we're going to do some renaming, then I suggest we do a
> mini-file-system structure under $PGDATA, like
>
> $PGDATA/etc
> $PGDATA/log
> $PGDATA/run (lock files etc.)
> $PGDATA/tmp
> $PGDATA/var
>
> The names of all the things under "var" could still be refined, but it's
> much less likely that users will confuse data with configuration or
> plain logs under that scheme

Interesting idea. I worry a bit that this might encourage distributions to
split it up into different places though, and I'm not sure we want to
encourage that..

/Magnus


Re: [HACKERS] PG_DIAG_SEVERITY and a possible bug in pq_parse_errornotice()

2016-08-26 Thread Tom Lane
I wrote:
> After sleeping on it, I think the right answer is to introduce the new
> error-message field (and not worry about 9.5).  Will work on a patch
> for that, unless I hear objections pretty soon.

So far as I can find, the attached is all we need to do to introduce a
new message field.  (This patch doesn't address the memory-context
questions, but it does fix the localization-driven failure demonstrated
upthread.)

Any objections?  Anyone want to bikeshed the field name?  I considered
PG_DIAG_SEVERITY_NONLOCALIZED and PG_DIAG_SEVERITY_ENGLISH before settling
on PG_DIAG_SEVERITY_ASCII, but I can't say I'm in love with that.

regards, tom lane

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index f22e3da..40ae0ff 100644
*** a/doc/src/sgml/libpq.sgml
--- b/doc/src/sgml/libpq.sgml
*** char *PQresultErrorField(const PGresult 
*** 2767,2772 
--- 2767,2788 

   
  
+  
+   PG_DIAG_SEVERITY_ASCII
+   
+
+ The severity; the field contents are ERROR,
+ FATAL, or PANIC (in an error message),
+ or WARNING, NOTICE, DEBUG,
+ INFO, or LOG (in a notice message).
+ This is identical to the PG_DIAG_SEVERITY field except
+ that the contents are never localized.  This is present only in
+ reports generated by PostgreSQL versions 9.6
+ and later.
+
+   
+  
+ 
   

 PG_DIAG_SQLSTATE
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 9c96d8f..9bddb19 100644
*** a/doc/src/sgml/protocol.sgml
--- b/doc/src/sgml/protocol.sgml
*** message.
*** 4884,4889 
--- 4884,4908 
  
  
  
+ A
+ 
+ 
+ 
+ Severity: the field contents are
+ ERROR, FATAL, or
+ PANIC (in an error message), or
+ WARNING, NOTICE, DEBUG,
+ INFO, or LOG (in a notice message).
+ This is identical to the S field except
+ that the contents are never localized.  This is present only in
+ messages generated by PostgreSQL versions 9.6
+ and later.
+ 
+ 
+ 
+ 
+ 
+ 
  C
  
  
diff --git a/src/backend/libpq/pqmq.c b/src/backend/libpq/pqmq.c
index 921242f..b04a7ef 100644
*** a/src/backend/libpq/pqmq.c
--- b/src/backend/libpq/pqmq.c
*** pq_parse_errornotice(StringInfo msg, Err
*** 237,246 
  		switch (code)
  		{
  			case PG_DIAG_SEVERITY:
  if (strcmp(value, "DEBUG") == 0)
! 	edata->elevel = DEBUG1;		/* or some other DEBUG level */
  else if (strcmp(value, "LOG") == 0)
! 	edata->elevel = LOG;		/* can't be COMMERROR */
  else if (strcmp(value, "INFO") == 0)
  	edata->elevel = INFO;
  else if (strcmp(value, "NOTICE") == 0)
--- 237,262 
  		switch (code)
  		{
  			case PG_DIAG_SEVERITY:
+ /* ignore, trusting we'll get a nonlocalized version */
+ break;
+ 			case PG_DIAG_SEVERITY_ASCII:
  if (strcmp(value, "DEBUG") == 0)
! {
! 	/*
! 	 * We can't reconstruct the exact DEBUG level, but
! 	 * presumably it was >= client_min_messages, so select
! 	 * DEBUG1 to ensure we'll pass it on to the client.
! 	 */
! 	edata->elevel = DEBUG1;
! }
  else if (strcmp(value, "LOG") == 0)
! {
! 	/*
! 	 * It can't be LOG_SERVER_ONLY, or the worker wouldn't
! 	 * have sent it to us; so LOG is the correct value.
! 	 */
! 	edata->elevel = LOG;
! }
  else if (strcmp(value, "INFO") == 0)
  	edata->elevel = INFO;
  else if (strcmp(value, "NOTICE") == 0)
*** pq_parse_errornotice(StringInfo msg, Err
*** 254,264 
  else if (strcmp(value, "PANIC") == 0)
  	edata->elevel = PANIC;
  else
! 	elog(ERROR, "unknown error severity");
  break;
  			case PG_DIAG_SQLSTATE:
  if (strlen(value) != 5)
! 	elog(ERROR, "malformed sql state");
  edata->sqlerrcode = MAKE_SQLSTATE(value[0], value[1], value[2],
    value[3], value[4]);
  break;
--- 270,280 
  else if (strcmp(value, "PANIC") == 0)
  	edata->elevel = PANIC;
  else
! 	elog(ERROR, "unrecognized error severity: \"%s\"", value);
  break;
  			case PG_DIAG_SQLSTATE:
  if (strlen(value) != 5)
! 	elog(ERROR, "invalid SQLSTATE: \"%s\"", value);
  edata->sqlerrcode = MAKE_SQLSTATE(value[0], value[1], value[2],
    value[3], value[4]);
  break;
*** pq_parse_errornotice(StringInfo msg, Err
*** 308,314 
  edata->funcname = pstrdup(value);
  break;
  			default:
! elog(ERROR, "unknown error field: %d", (int) code);
  break;
  		}
  	}
--- 324,330 
  edata->funcname = pstrdup(value);
  break;
  			default:
! elog(ERROR, "unrecognized error field code: %d", (int) code);
  break;
  		}
  	}
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 78d441d..ab71b50 

Re: [HACKERS] WAL consistency check facility

2016-08-26 Thread Simon Riggs
Hi Kuntal,

Thanks for the patch.

Current patch has no docs, no tests and no explanatory comments, so
makes review quite hard.

The good news is you might discover a few bugs with it, so its worth
pursuing actively in this CF, though its not near to being
committable.

I think you should add this as part of the default testing for both
check and installcheck. I can't imagine why we'd have it and not use
it during testing.


On 25 August 2016 at 18:41, Kuntal Ghosh  wrote:

> * wal_consistency_mask = 511  /* Enable consistency check mask bit*/

What does this mean? (No docs)


> What needs to be done:
> 1. Add support for other Resource Managers.

We probably need to have a discussion as to why you think this should
be Rmgr dependent?
Code comments would help there.

If it does, then you should probably do this by extending RmgrTable
with an rm_check, so you can call it like this...

RmgrTable[record->xl_rmid].rm_check

I'm interested in how we handle the new generic WAL format for blocks.
Surely if we can handle that then we won't need an Rmgr dependency?
I'm sure you have reasons, they just need to be explained long hand -
don't assume anything.


> 5. Generalize the page type identification technique.

Why not do this first?

There are some coding guideline stuff to check as well.

Thanks

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-08-26 Thread Peter Eisentraut
On 8/25/16 10:39 PM, Michael Paquier wrote:
> I am relaunching $subject as 10 development will begin soon. As far as
> I know, there is agreement that we can do something here. Among the
> different proposals I have found:
> - pg_clog renamed to pg_commit_status, pg_xact or pg_commit
> - pg_xlog renamed to pg_xjournal, pg_wal or pg_journal

If we're going to do some renaming, then I suggest we do a
mini-file-system structure under $PGDATA, like

$PGDATA/etc
$PGDATA/log
$PGDATA/run (lock files etc.)
$PGDATA/tmp
$PGDATA/var

The names of all the things under "var" could still be refined, but it's
much less likely that users will confuse data with configuration or
plain logs under that scheme.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why is a newly created index contains the invalid LSN?

2016-08-26 Thread Yury Zhuravlev

Amit Kapila wrote:

On Thu, Aug 25, 2016 at 7:55 PM, Yury Zhuravlev
 wrote:

Hello hackers.

I have a small question. While working on an incremental 
backup I noticed a

strange thing.
Newly created index is contains the invalid LSN (0/0).
Exmaple: ...


For some of the indexes like btree which are built outside shared
buffers, we don't write WAL unless wal_level >= REPLICA.  I think
Robert has explained it very well how we handle the crash recovery
situation for such indexes.  However, for some other indexes which
don't bypass shared buffers like BRIN, GIN we do write WAL for such
cases as well, so you must see LSN for those type of indexes.  I am
less sure, if there will be any problem, if don't write WAL for those
indexes as well when wal_level < REPLICA.



Thanks all.
Now understand LSN strongly connected with WAL.
However how difficult put last system LSN instead 0?
It's not so important but will allow make use LSN more consistent.

--
Yury Zhuravlev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Missing checks when malloc returns NULL...

2016-08-26 Thread Aleksander Alekseev
Hello, Michael.

Your patch [1] was marked as "Needs review" so I decided to take a look.

It looks good to me. However I found dozens of places in PostgreSQL code
that seem to have similar problem you are trying to fix [2]. As far as I
understand these are only places left that don't check malloc/realloc/
strdup return values properly. I thought maybe you will be willing to
fix they too so we could forget about this problem forever.

If not I will be happy to do it. However in this case we need someone
familiar with affected parts of the code who will be willing to re-check
a new patch since I'm not filling particularly confident about how
exactly errors should be handled in all these cases.

By the way maybe someone knows other procedures besides malloc, realloc
and strdup that require special attention?

[1] https://commitfest.postgresql.org/10/653/
[2] http://afiskon.ru/s/15/83287ef7d2_malloc.txt

-- 
Best regards,
Aleksander Alekseev


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PG_DIAG_SEVERITY and a possible bug in pq_parse_errornotice()

2016-08-26 Thread Tom Lane
Robert Haas  writes:
> On Fri, Aug 26, 2016 at 10:35 AM, Tom Lane  wrote:
>> BTW, while I'm looking at this: what on god's green earth is
>> ThrowErrorData doing copying the supplied data into edata->assoc_context?
>> Surely it should be putting the data into the ErrorContext, where it's not
>> going to get flushed before it can be reported?

> Uh, well, perhaps I misinterpreted the comment in elog.h.  It says this:

> /* context containing associated non-constant strings */
> struct MemoryContextData *assoc_context;

> That sure looks like it's saying that all of the pointers stored in
> the ErrorData structure should be pointing into assoc_context, unless
> they are constant.

Indeed.  The point is that every ErrorData in the errordata stack needs
to, and does, have assoc_context = ErrorContext.  This coding is blithely
ignoring what errstart has set up and copying the data someplace else.
In fact, it's pointlessly duplicating data that is *already* in the
context of the source ErrorData.

You should be imagining this action as being the reverse of CopyErrorData,
ie copying some data back into the purview of elog.c, which is to say it
should be in ErrorContext.

Or in short, this has confused edata and newedata.  Valid coding would
be
oldcontext = MemoryContextSwitchTo(newedata->assoc_context);
rather than what is there.

>> (Note that in the sole
>> existing use-case, edata->assoc_context is going to have been set to
>> CurrentMemoryContext by pq_parse_errornotice, and I see no good reason to
>> assume that's very long-lived ... in fact, it looks like it's whatever
>> happens to be active when ProcessInterrupts is called, which means there's
>> probably a totally separate set of problems here having to do with
>> potential leaks into long-lived contexts.)

> Oops.  Yes.

I'm not quite sure what to do about this.  It feels a tad wrong to use
ErrorContext as the active context during HandleParallelMessages, but
what else should we use?  Maybe this needs its very own private context
that we can reset after each use?

>> I have little use for the name of that function either, as it's not
>> necessarily going to "throw" anything.  Maybe ReportErrorUsingData,
>> or something like that?

> I deliberately avoided that sort of terminology because it need not be
> an ERROR.  It can be, say, a NOTICE.  It is definitely something that
> is coming from an ErrorData but it need not be an ERROR.

Right, but to me, "throw" generally means a transfer of control, which
is exactly what this isn't going to do if the message is only a notice.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-08-26 Thread Magnus Hagander
On Aug 26, 2016 5:13 PM, "Joshua D. Drake"  wrote:
>
> On 08/25/2016 07:39 PM, Michael Paquier wrote:
>>
>> Hi all,
>>
>> I am relaunching $subject as 10 development will begin soon. As far as
>> I know, there is agreement that we can do something here. Among the
>> different proposals I have found:
>> - pg_clog renamed to pg_commit_status, pg_xact or pg_commit
>> - pg_xlog renamed to pg_xjournal, pg_wal or pg_journal
>
>
> I think the use of the pg_ prefix is redundant. Just a directory called:
wal will do (for example).
>
> In reference to pg_xlog specifically, it should be wal. It is the Write
Ahead Log, not the Journal (although I recognize they can be synonymous).
All the documentation talks about Write Ahead Log.
>

Yes, let's avoid inventing a *third* name for it, please. It's already bad
enough that we have both wal and xlog.

/Magnus


Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-08-26 Thread Joshua D. Drake

On 08/25/2016 07:39 PM, Michael Paquier wrote:

Hi all,

I am relaunching $subject as 10 development will begin soon. As far as
I know, there is agreement that we can do something here. Among the
different proposals I have found:
- pg_clog renamed to pg_commit_status, pg_xact or pg_commit
- pg_xlog renamed to pg_xjournal, pg_wal or pg_journal


I think the use of the pg_ prefix is redundant. Just a directory called: 
wal will do (for example).


In reference to pg_xlog specifically, it should be wal. It is the Write 
Ahead Log, not the Journal (although I recognize they can be 
synonymous). All the documentation talks about Write Ahead Log.


Sincerely,

JD


--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.
Unless otherwise stated, opinions are my own.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-08-26 Thread Joshua D. Drake

On 08/26/2016 03:48 AM, Magnus Hagander wrote:



Same reason I'm also +1 for Stephens suggestion to put all things that
should not be in a base backup into the same directory. That may break
things now, but it will simplify things down the road. And doing it at
the same time as renaming these things makes a lot of sense, because it
causes breakage that tool-builders *have* to look at, and then they will
hopefully also notice the other change.


If this is done this fall, developers will have at least a year to fix 
their utilities.


JD




--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.
Unless otherwise stated, opinions are my own.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PG_DIAG_SEVERITY and a possible bug in pq_parse_errornotice()

2016-08-26 Thread Robert Haas
On Fri, Aug 26, 2016 at 10:35 AM, Tom Lane  wrote:
> I wrote:
>> After sleeping on it, I think the right answer is to introduce the new
>> error-message field (and not worry about 9.5).  Will work on a patch
>> for that, unless I hear objections pretty soon.
>
> BTW, while I'm looking at this: what on god's green earth is
> ThrowErrorData doing copying the supplied data into edata->assoc_context?
> Surely it should be putting the data into the ErrorContext, where it's not
> going to get flushed before it can be reported?

Uh, well, perhaps I misinterpreted the comment in elog.h.  It says this:

/* context containing associated non-constant strings */
struct MemoryContextData *assoc_context;

That sure looks like it's saying that all of the pointers stored in
the ErrorData structure should be pointing into assoc_context, unless
they are constant.  If that's not right, I suggest rewording that
comment, because I cannot think of a second interpretation of what's
written there.

>  (Note that in the sole
> existing use-case, edata->assoc_context is going to have been set to
> CurrentMemoryContext by pq_parse_errornotice, and I see no good reason to
> assume that's very long-lived ... in fact, it looks like it's whatever
> happens to be active when ProcessInterrupts is called, which means there's
> probably a totally separate set of problems here having to do with
> potential leaks into long-lived contexts.)

Oops.  Yes.

> I have little use for the name of that function either, as it's not
> necessarily going to "throw" anything.  Maybe ReportErrorUsingData,
> or something like that?

I deliberately avoided that sort of terminology because it need not be
an ERROR.  It can be, say, a NOTICE.  It is definitely something that
is coming from an ErrorData but it need not be an ERROR.

Also, I think "throwing an error" is pretty standard terminology that
is understandable to pretty much all programmers these days.  It's not
a perfect name; maybe ReportErrorData would have been better, but
changing that seems like pointless tinkering at this stage, from my
point of view.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-08-26 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Euler Taveira  writes:
> > On 26-08-2016 09:25, Devrim Gündüz wrote:
> >> ...and we also have "pg_logical", that includes a "log" keyword already...
> 
> > "clog" and "xlog" is almost "log"; "logical" is not. I don't imagine
> > people confusing "logical" meaning "log".
> 
> Well, I dunno; people with a weak grasp of English might have an issue
> there.  But I never liked that directory name anyway; my problem with
> it is I read "pg_logical" and think "logical what?".  Naming things using
> a disconnected adjective is not good, especially one with as many
> potential applications as that one has.  If we're up for renaming things
> under PGDATA, that one is high on my hit list.

iirc, pg_logical also has both temporary and non-temporary data in it
too.  That complicates things for backup utilities that are trying to
exclude all temporary files.  If we actually move all the temp files
into their own directory (or tree), then we're changing what's in
pg_logical anyway, so renaming it seems like a good idea.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-08-26 Thread Tom Lane
Euler Taveira  writes:
> On 26-08-2016 09:25, Devrim Gündüz wrote:
>> ...and we also have "pg_logical", that includes a "log" keyword already...

> "clog" and "xlog" is almost "log"; "logical" is not. I don't imagine
> people confusing "logical" meaning "log".

Well, I dunno; people with a weak grasp of English might have an issue
there.  But I never liked that directory name anyway; my problem with
it is I read "pg_logical" and think "logical what?".  Naming things using
a disconnected adjective is not good, especially one with as many
potential applications as that one has.  If we're up for renaming things
under PGDATA, that one is high on my hit list.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-08-26 Thread Euler Taveira
On 26-08-2016 09:25, Devrim Gündüz wrote:
> On Fri, 2016-08-26 at 21:12 +0900, Michael Paquier wrote:
>> In short, with the current names, sometimes users think that pg_xlog
>> and pg_clog are just logs. And so it is fine to delete them to free up
>> space, corrupting their cluster, because they are just *logs*.
> 
> ...and we also have "pg_logical", that includes a "log" keyword already...
> 
"clog" and "xlog" is almost "log"; "logical" is not. I don't imagine
people confusing "logical" meaning "log".


-- 
   Euler Taveira   Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PG_DIAG_SEVERITY and a possible bug in pq_parse_errornotice()

2016-08-26 Thread Tom Lane
I wrote:
> After sleeping on it, I think the right answer is to introduce the new
> error-message field (and not worry about 9.5).  Will work on a patch
> for that, unless I hear objections pretty soon.

BTW, while I'm looking at this: what on god's green earth is
ThrowErrorData doing copying the supplied data into edata->assoc_context?
Surely it should be putting the data into the ErrorContext, where it's not
going to get flushed before it can be reported?  (Note that in the sole
existing use-case, edata->assoc_context is going to have been set to
CurrentMemoryContext by pq_parse_errornotice, and I see no good reason to
assume that's very long-lived ... in fact, it looks like it's whatever
happens to be active when ProcessInterrupts is called, which means there's
probably a totally separate set of problems here having to do with
potential leaks into long-lived contexts.)

I have little use for the name of that function either, as it's not
necessarily going to "throw" anything.  Maybe ReportErrorUsingData,
or something like that?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN logging GIN_SEGMENT_UNMODIFIED actions?

2016-08-26 Thread Fujii Masao
On Tue, May 10, 2016 at 9:57 PM, Alexander Korotkov
 wrote:
> Hi!
>
> On Mon, May 9, 2016 at 10:46 PM, Andres Freund  wrote:
>>
>> trying to debug something I saw the following in pg_xlogdump output:
>>
>> rmgr: Gin len (rec/tot):  0/   274, tx:  0, lsn:
>> 1C/DF28AEB0, prev 1C/DF289858, desc: VACUUM_DATA_LEAF_PAGE  3 segments: 5
>> unknown action 0 ???, blkref #0: rel 1663/16384/16435 blk 310982
>>
>> note the "segments: 5 unknown action 0 ???" bit.  That doesn't seem
>> right, given:
>> #define GIN_SEGMENT_UNMODIFIED  0   /* no action (not used in
>> WAL records) */
>
>
> I've checked GIN code.  Have no idea of how such wal record could be
> generated...

I encountered the same issue when executing the following queries and
running pg_xlogdump.

CREATE EXTENSION pg_trgm;
CREATE TABLE test (col1 TEXT);
CREATE INDEX testidx ON test USING gin (col1 gin_trgm_ops) WITH
(fastupdate = off);
INSERT INTO test SELECT 'ABCDE' FROM generate_series(1,1);
DELETE FROM test;
VACUUM test;

$ pg_xlogdump data/pg_xlog/00010004 | grep Gin | grep action
rmgr: Gin len (rec/tot):  0/88, tx:  0, lsn:
0/04A4B468, prev 0/04A4B438, desc: VACUUM_DATA_LEAF_PAGE  1663
segments: 0 unknown action 0 ???, blkref #0: rel 1663/13286/16455 blk
11
rmgr: Gin len (rec/tot):  0/78, tx:  0, lsn:
0/04A4B4C0, prev 0/04A4B468, desc: VACUUM_DATA_LEAF_PAGE  1663
segments: 0 unknown action 0 ???, blkref #0: rel 1663/13286/16455 blk
10
rmgr: Gin len (rec/tot):  0/88, tx:  0, lsn:
0/04A4B510, prev 0/04A4B4C0, desc: VACUUM_DATA_LEAF_PAGE  1663
segments: 0 unknown action 0 ???, blkref #0: rel 1663/13286/16455 blk
13
rmgr: Gin len (rec/tot):  0/78, tx:  0, lsn:
0/04A4B568, prev 0/04A4B510, desc: VACUUM_DATA_LEAF_PAGE  1663
segments: 0 unknown action 0 ???, blkref #0: rel 1663/13286/16455 blk
12
rmgr: Gin len (rec/tot):  0/88, tx:  0, lsn:
0/04A4B5B8, prev 0/04A4B568, desc: VACUUM_DATA_LEAF_PAGE  1663
segments: 0 unknown action 0 ???, blkref #0: rel 1663/13286/16455 blk
15
rmgr: Gin len (rec/tot):  0/78, tx:  0, lsn:
0/04A4B610, prev 0/04A4B5B8, desc: VACUUM_DATA_LEAF_PAGE  1663
segments: 0 unknown action 0 ???, blkref #0: rel 1663/13286/16455 blk
14
rmgr: Gin len (rec/tot):  0/88, tx:  0, lsn:
0/04A4B660, prev 0/04A4B610, desc: VACUUM_DATA_LEAF_PAGE  1663
segments: 0 unknown action 0 ???, blkref #0: rel 1663/13286/16455 blk
17
rmgr: Gin len (rec/tot):  0/78, tx:  0, lsn:
0/04A4B6B8, prev 0/04A4B660, desc: VACUUM_DATA_LEAF_PAGE  1663
segments: 0 unknown action 0 ???, blkref #0: rel 1663/13286/16455 blk
16
rmgr: Gin len (rec/tot):  0/88, tx:  0, lsn:
0/04A4B708, prev 0/04A4B6B8, desc: VACUUM_DATA_LEAF_PAGE  1663
segments: 0 unknown action 0 ???, blkref #0: rel 1663/13286/16455 blk
19
rmgr: Gin len (rec/tot):  0/78, tx:  0, lsn:
0/04A4B760, prev 0/04A4B708, desc: VACUUM_DATA_LEAF_PAGE  1663
segments: 0 unknown action 0 ???, blkref #0: rel 1663/13286/16455 blk
18
rmgr: Gin len (rec/tot):  0/88, tx:  0, lsn:
0/04A4B7B0, prev 0/04A4B760, desc: VACUUM_DATA_LEAF_PAGE  1663
segments: 0 unknown action 0 ???, blkref #0: rel 1663/13286/16455 blk
21
rmgr: Gin len (rec/tot):  0/78, tx:  0, lsn:
0/04A4B808, prev 0/04A4B7B0, desc: VACUUM_DATA_LEAF_PAGE  1663
segments: 0 unknown action 0 ???, blkref #0: rel 1663/13286/16455 blk
20

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Set log_line_prefix and application name in test drivers

2016-08-26 Thread Fabien COELHO


Hello Peter,


log_line_prefix = '%t [%p]: [%l] %qapp=%a '

which is modeled after the pgfouine recommendation, which is I believe a
wide-spread convention, and it also vaguely follows syslog customs.

The build farm client has

log_line_prefix = '%m [%c:%l] '

which is very similar, but the lack of the PID makes it unsuitable for
the purposes that I have set out, and there is no obvious place to put
additional information such as %a.

%m vs %t is obviously a minor issue that I will gladly adjust, but
besides that I prefer to stick with my version.


v2 patch looks ok, applies without trouble and works as intended:

  2016-08-26 09:19:31.191 CEST [7571]: [58] app=pg_regress/event_trigger 
STATEMENT:  alter type rewritetype alter attribute a type varchar cascade;

About the format: '[\d+]' pattern is used twice, which makes the output 
less easily grep-able.


Also, the ':' is used as a separator in the remainder of the message, so 
maybe once is enough at this level.


I'm not sure about the "app=" is really necessary, given its very explicit 
definition as can be seen above above.


So I would suggest something like the following, which is also a little 
bit more compact:


  log_line_prefix = '%m [%p:%l] %q%a '

If you want to keep something with %a, maybe parentheses?

Finally I'm wondering also whether a timestamp since the server has 
started (which does not exists) would be more useful for a "make check", 
or at default maybe %n?


--
Fabien


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-08-26 Thread Alvaro Herrera
Kuntal Ghosh wrote:
> Thanks a lot.
> 
> I just want to mention the situation where I was getting the
> speculative token related inconsistency.
> 
> ItemPointer in backup page from master:
> LOG:  ItemPointer BlockNumber: 1 OffsetNumber:65534 Speculative: true
> CONTEXT:  xlog redo at 0/127F4A48 for Heap/INSERT+INIT: off 1
> 
> ItemPointer in current page from slave after redo:
> LOG:  ItemPointer BlockNumber: 0 OffsetNumber:1 Speculative: false
> CONTEXT:  xlog redo at 0/127F4A48 for Heap/INSERT+INIT: off 1
> 
> As the block numbers are different, I was getting the following warning:
> WARNING:  Inconsistent page (at byte 8166) found for record
> 0/127F4A48, rel 1663/16384/16946, forknum 0, blkno 0, Backup Page
> Header : (pd_lower: 28 pd_upper: 8152 pd_special: 8192) Current Page
> Header: (pd_lower: 28 pd_upper: 8152 pd_special: 8192)
> CONTEXT:  xlog redo at 0/127F4A48 for Heap/INSERT+INIT: off 1
> 
> In heap_xlog_insert, t_ctid is always set to blkno and xlrec->offnum.
> I think this is why I was getting the above warning.

Umm, really?  Then perhaps this *is* a bug.  Peter?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump with tables created in schemas created by extensions

2016-08-26 Thread Martín Marqués
Hi,

2016-08-25 8:10 GMT-03:00 Michael Paquier :
> On Thu, Aug 25, 2016 at 10:25 AM, Martín Marqués  
> wrote:
>> 2016-08-24 21:34 GMT-03:00 Michael Paquier :
>>>
>>> Yes, you are right. If I look at the diffs this morning I am seeing
>>> the ACLs being dumped for this aggregate. So we could just fix the
>>> test and be done with it. I did not imagine the index issue though,
>>> and this is broken for some time, so that's not exclusive to 9.6 :)
>>
>> Do you see any easier way than what I mentioned earlier (adding a
>> selectDumpableIndex() function) to fix the index dumping issue?
>
> Yes, we are going to need something across those lines. And my guess
> is that this is going to be rather close to getOwnedSeqs() in terms of
> logic.

I was able to get this fixed without any further new functions (just
using the dump/dump_contains and applying the same fix on
selectDumpableTable).

Main problem relied here in getIndexes()

@@ -6158,7 +6167,7 @@ getIndexes(Archive *fout, TableInfo tblinfo[],
int numTables)
continue;

/* Ignore indexes of tables whose definitions are not
to be dumped */
-   if (!(tbinfo->dobj.dump & DUMP_COMPONENT_DEFINITION))
+   if (!(tbinfo->dobj.dump_contains & DUMP_COMPONENT_DEFINITION))
continue;

if (g_verbose)

But we have to set dump_contains with correct values.

There's still one issue, which I'll add a test for as well, which is
that if the index was created by the extension, it will be dumped
anyway. I'll ave a look at that as well.

One other thing I found was that one of the CREATE INDEX tests had
incorrectly set like and unlike for pre_data and post_data. (indexes
are dumped in section post_data)

That's been fixes as well.

I've cleaned up the patch a bit, so this is v3 with all checks passing.

I'll add that new test regarding dumping an index created by the
extension (which will fail) and look for ways to fix it.

Regards,

-- 
Martín Marquéshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


pgdump-extension-v3.patch
Description: invalid/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Odd oid-system-column handling in postgres_fdw

2016-08-26 Thread Heikki Linnakangas

On 04/05/2016 11:15 AM, Etsuro Fujita wrote:

On 2016/03/16 16:25, Etsuro Fujita wrote:

PG9.5 allows us to add an oid system column to foreign tables, using
ALTER FOREIGN TABLE SET WITH OIDS, but currently, that column reads as
zeroes in postgres_fdw.  That seems to me like a bug.  So, I'd like to
propose to fix that, by retrieving that column from the remote server
when requested.  I'm attaching a proposed patch for that.


I rebased the patch against HEAD.  Updated patch attached.


Committed, thanks!

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Showing parallel status in \df+

2016-08-26 Thread Tom Lane
Pavel Stehule  writes:
> Using footer for this purpose is little bit strange. What about following
> design?
> 1. move out source code of PL functions from \df+
> 2. allow not unique filter in \sf and allow to display multiple functions

Wasn't that proposed and rejected upthread?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [Patch] RBTree iteration interface improvement

2016-08-26 Thread Aleksander Alekseev
Hello, Heikki.

Thank you for your attention to this patch!

> This also seems to change the API so that instead of a single 
> rb_begin_iterate()+rb_iterate() pair, there is a separate begin and 
> iterate function for each traversal order. That seems like an unrelated 
> change. Was there a particular reason for that? I think the old 
> rb_begin_iterate()+rb_iterate() functions were fine, we just need to 
> have a RBTreeIterator struct that's different from the tree itself.

I'm trying to avoid calling procedures by a pointer, an old habit. When
I started to work on this patch I just needed a RB-tree implementation
for a pet project. I took one from PostgreSQL code. Then I found this
flaw of iteration interface and decided to fix it. The idea to merge
this fix back to PostgreSQL code appeared later so I just wrote code the
way I like.

These days code performance depends on many factors like whether code
fits into cache, i.e not only on whether you call a procedure directly
or using a pointer. Until someone finds a real bottleneck here I think
current rb_begin_iterate()+rb_iterate() interface should do just fine.
 
> Another unrelated change in this patch is the addition of 
> rb_rightmost(). It's not used for anything, so I'm not sure what the 
> point is. Then again, there don't seem to be any callers of 
> rb_leftmost() either.

It's just something I needed in tests to reach a good percent of code
coverage. Implementation of rb_rightmost is trivial so we probably can do
without it.

> I think we should something like in the attached patch. It seems to pass 
> your test suite, but I haven't done any other testing on this. Does it
> look OK to you?

Looks good to me.

-- 
Best regards,
Aleksander Alekseev


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Unsupported feature F867: WITH TIES

2016-08-26 Thread Jürgen Purtz
Actually we don't support the SQL feature F867 "FETCH FIRST clause: WITH 
TIES option". On the other side we support the window function "rank()" 
which acts like the "WITH TIES option". My questions are: Is it hard to 
implement the "WITH TIES option"? Are there plans for a realization / 
how do we decide in general what to do next? Differs the semantic of the 
"WITH TIES option" significantly from the "rank()" window function or 
can we treat it as some kind of 'syntactical sugar'?


Regards, Jürgen Purtz



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_xlogdump fails to handle WAL file with multi-page XLP_FIRST_IS_CONTRECORD data

2016-08-26 Thread Fujii Masao
On Wed, Mar 23, 2016 at 7:04 PM, Pavan Deolasee
 wrote:
>
>
> On Wed, Mar 23, 2016 at 1:13 PM, Michael Paquier 
> wrote:
>>
>>
>> +   /*
>> +* Compute targetRecOff. It should typically be greater than short
>> +* page-header since a valid record can't , but can also be zero
>> when
>> +* caller has supplied a page-aligned address or when we are
>> skipping
>> +* multi-page continuation record. It doesn't matter though
>> because
>> +* ReadPageInternal() will read at least short page-header worth
>> of
>> +* data
>> +*/
>> This needs some polishing, there is an unfinished sentence here.
>>
>> +   targetRecOff = tmpRecPtr % XLOG_BLCKSZ;
>> targetRecOff, pageHeaderSize and targetPagePtr could be declared
>> inside directly the new while loop.
>
>
> Thanks Michael for reviewing the patch. I've fixed these issues and new
> version is attached.

The patch looks good to me. Barring any objections,
I'll push this and back-patch to 9.3 where pg_xlogdump was added.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch proposal

2016-08-26 Thread Stephen Frost
* Venkata B Nagothi (nag1...@gmail.com) wrote:
> On Fri, Aug 26, 2016 at 12:30 PM, Stephen Frost  wrote:
> > * Venkata B Nagothi (nag1...@gmail.com) wrote:
> > > This behaviour will be similar to that of recovery_target="immediate" and
> > > can be aliased.
> >
> > I don't believe we're really going at this the right way.  Clearly,
> > there will be cases where we'd like promotion at the end of the WAL
> > stream (as we currently have) even if the recovery point is not found,
> > but if the new option's "promote" is the same as "immediate" then we
> > don't have that.
> 
> My apologies for the confusion. Correction - I meant, "promote" option
> would promote the database once the consistent point is reached at the
> end-of-the-WAL.

"consistent point" and "end-of-the-WAL" are not the same.

> > recovery_target = immediate, other
> >
> > recovery_action_target_found = promote, pause, shutdown
> 
> This is currently supported by the existing parameter called
> "recovery_target_action"

Right, sure, we could possibly use that, or create an alias, etc.

> > recovery_action_target_not_found = promote, pause, shutdown
> 
> This is exactly what newly proposed parameter will do.

Then it isn't the same as 'recovery_target = immediate'.

> > One question to ask is if we need to support an option for xid and time
> > related to when we realize that we won't find the recovery target.  If
> > consistency is reached at a time which is later than the recovery target
> > for time, what then?  Do we go through the rest of the WAL and perform
> > the "not found" action at the end of the WAL stream?  If we use that
> > approach, then at least all of the recovery target types are handled the
> > same, but I can definitely see cases where an administrator might prefer
> > an "error" option.
> 
> Currently, this situation is handled by recovery_target_inclusive
> parameter. 

No, that's not the same.

> Administrator can choose to stop the recovery at any consistent
> point before or after the specified recovery target. Is this what you were
> referring to ?

Not quite.

> I might need a bit of clarification here, the parameter i am proposing
> would be effective only if the specified recovery target is not reached and
> may not be effective if the recovery goes beyond recovery target point.

Ok, let's consider this scenario:

Backup started at: 4/22D3B8E0, time: 2016-08-26 08:29:08
Backup completed (consistency) at: 4/22D3B8EF, 2016-08-26 08:30:00
recovery_target is not set
recovery_target_time = 2016-08-26 08:29:30
recovery_target_inclusive = false

In such a case, we will necessairly commit transactions which happened
between 8:29:30 and 8:30:00 and only stop (I believe..) once we've
reached consistency.  We could possibly detect that case and throw an
error instead.  The same could happen with xid.

Working through more scenarios would be useful, I believe.  For example,
what if the xid or time specified happened before the backup started?

Basically, what I was trying to get at is that we might be able to
detect that we'll never find a given recovery target point without
actually replaying all of WAL and wondering if we should provide options
to control what to do in such a case.

Thanks!

Stephen


signature.asc
Description: Digital signature


[HACKERS] shm_mq_set_sender() crash

2016-08-26 Thread Tom Lane
Latest from lorikeet:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lorikeet=2016-08-26%2008%3A37%3A27

TRAP: FailedAssertion("!(vmq->mq_sender == ((void *)0))", File: 
"/home/andrew/bf64/root/REL9_6_STABLE/pgsql.build/../pgsql/src/backend/storage/ipc/shm_mq.c",
 Line: 220)

Thoughts?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Add the "snapshot too old" feature

2016-08-26 Thread Kevin Grittner
On Thu, Aug 25, 2016 at 4:51 PM, Alvaro Herrera
 wrote:
> Kevin Grittner wrote:
>> On Thu, Aug 25, 2016 at 2:56 PM, Alvaro Herrera
>>  wrote:
>>
>> > I'm wondering about the TestForOldSnapshot call in scanPendingInsert().
>> > Why do we apply it to the metapage buffer (line 1717 in master)?
>>
>> If there is any chance that GinPageGetMeta(page)->head could have
>> changed from a valid block number to InvalidBlockNumber or a
>> different pending-list page due to a vacuum freeing pages from the
>> pending-list, the metapage must be checked -- there is no other way
>> to detect that data might have disappeared.
>
> Hmm ... but the disappearing data is moved to regular GIN pages, right?
> It doesn't just go away.  I suppose that the error would be raised as
> soon as we scan a GIN data page that, because of receiving data from the
> pending list, has a newer LSN.

That would cover things as long as data is always moved from the
pending list to a data page before it is vacuumed away.  If that is
true, there is no need to check the metapage *or* the pending list
-- but I'm pretty skeptical that this is the case.  The real
question is whether pages are ever removed from the pending list.

> I don't know GIN in detail but perhaps
> it's possible that the data is inserted into data pages in lsn A, then
> removed from the pending list in lsn B (where A < B).  If the snapshot's
> LSN lies in between, a spurious error would be raised.

The implementation does allow false positives and slightly less
aggressive early cleanup than could be achieved -- in order to
avoid the extra locking that would be needed to achieve higher
precision.  Since I expect that the threshold will usually be set
to at least a couple hours, these effects should have minimal
impact.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-08-26 Thread Devrim Gündüz

Hi,

On Fri, 2016-08-26 at 21:12 +0900, Michael Paquier wrote:
> In short, with the current names, sometimes users think that pg_xlog
> and pg_clog are just logs. And so it is fine to delete them to free up
> space, corrupting their cluster, because they are just *logs*.

...and we also have "pg_logical", that includes a "log" keyword already...

Regards,

-- 
Devrim GÜNDÜZ
EnterpriseDB: http://www.enterprisedb.com
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
Twitter: @DevrimGunduz , @DevrimGunduzTR


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-08-26 Thread Michael Paquier
On Fri, Aug 26, 2016 at 8:31 PM, Simon Riggs  wrote:
> On 26 August 2016 at 04:39, Michael Paquier  wrote:
>> Hi all,
>>
>> I am relaunching $subject as 10 development will begin soon. As far as
>> I know, there is agreement that we can do something here. Among the
>> different proposals I have found:
>> - pg_clog renamed to pg_commit_status, pg_xact or pg_commit
>> - pg_xlog renamed to pg_xjournal, pg_wal or pg_journal
>
> Don't mean to be a party pooper, but what discussion and agreement are
> we referring to here?
>
> If we are going to suggest doing something we really should summarize
> the reason for doing it rather than assume it is self evident, cos it
> certainly isn't.

This thread was the previous one on the matter:
http://www.postgresql.org/message-id/caaswcxcvgma9kgeu-esc6u928mw67nozvnawbpusw7r7an9...@mail.gmail.com

In short, with the current names, sometimes users think that pg_xlog
and pg_clog are just logs. And so it is fine to delete them to free up
space, corrupting their cluster, because they are just *logs*.
Personally I have seen that, and based on the thread I am not the only
one.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PG_DIAG_SEVERITY and a possible bug in pq_parse_errornotice()

2016-08-26 Thread Tom Lane
Robert Haas  writes:
> I don't have strong feelings about this.  Technically, this issue
> affects 9.5 also, because pqmq.c was introduced in that release.  I
> don't think we want to add another error field in a released branch.
> However, since there's no parallel query in 9.5, only people who are
> accessing that functionality via extension code would be affected,
> which might be nobody and certainly isn't a lot of people, so we could
> just leave this unfixed in 9.5.

After sleeping on it, I think the right answer is to introduce the new
error-message field (and not worry about 9.5).  Will work on a patch
for that, unless I hear objections pretty soon.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [Patch] RBTree iteration interface improvement

2016-08-26 Thread Heikki Linnakangas

On 08/22/2016 01:00 PM, Aleksander Alekseev wrote:

It seems clear to me that the existing arrangement is hazardous for
any RBTree that hasn't got exactly one consumer.  I think
Aleksander's plan to decouple the iteration state is probably a good
one (NB: I've not read the patch, so this is not an endorsement of
details).  I'd go so far as to say that we should remove the old API
as being dangerous, rather than preserve it on
backwards-compatibility grounds.  We make bigger changes than that in
internal APIs all the time.


Glad to hear it! I would like to propose a patch that removes the old
API. I have a few questions though.

Would you like it to be a separate patch, or all-in-one patch is fine
in this case? I also would like to include unit/property-based tests in
this (these) patch (patches). However as I understand there are
currently no unit tests in PostgreSQL at all, only integration/system
tests. Any advice how to do it better?


OK, here is a patch. I think we could call it a _replacement_ of an old API, so
there is probably no need in two separate patches. I still don't know how to
include unit test and whether they should be included at all. Thus for now
unit/property-based tests could be found only on GitHub [1].

As usual, if you have any questions, suggestions or notes regarding this patch,
please let me know.


This also seems to change the API so that instead of a single 
rb_begin_iterate()+rb_iterate() pair, there is a separate begin and 
iterate function for each traversal order. That seems like an unrelated 
change. Was there a particular reason for that? I think the old 
rb_begin_iterate()+rb_iterate() functions were fine, we just need to 
have a RBTreeIterator struct that's different from the tree itself.


Note that the API actually used to be like that. rb_begin_iterate() used 
to return a palloc'd RBTreeIterator struct. It was changed in commit 
0454f131, because the implementation didn't support more than one 
iterator at a time, which made the separate struct pointless. But we're 
fixing that problem now.


Another unrelated change in this patch is the addition of 
rb_rightmost(). It's not used for anything, so I'm not sure what the 
point is. Then again, there don't seem to be any callers of 
rb_leftmost() either.


I think we should something like in the attached patch. It seems to pass 
your test suite, but I haven't done any other testing on this. Does it 
look OK to you?


- Heikki

diff --git a/src/backend/access/gin/ginbulk.c b/src/backend/access/gin/ginbulk.c
index d6422ea..71c64e4 100644
--- a/src/backend/access/gin/ginbulk.c
+++ b/src/backend/access/gin/ginbulk.c
@@ -255,7 +255,7 @@ qsortCompareItemPointers(const void *a, const void *b)
 void
 ginBeginBAScan(BuildAccumulator *accum)
 {
-	rb_begin_iterate(accum->tree, LeftRightWalk);
+	rb_begin_iterate(accum->tree, LeftRightWalk, >tree_walk);
 }
 
 /*
@@ -271,7 +271,7 @@ ginGetBAEntry(BuildAccumulator *accum,
 	GinEntryAccumulator *entry;
 	ItemPointerData *list;
 
-	entry = (GinEntryAccumulator *) rb_iterate(accum->tree);
+	entry = (GinEntryAccumulator *) rb_iterate(>tree_walk);
 
 	if (entry == NULL)
 		return NULL;			/* no more entries */
diff --git a/src/backend/lib/rbtree.c b/src/backend/lib/rbtree.c
index 4fa8a1d..048366d 100644
--- a/src/backend/lib/rbtree.c
+++ b/src/backend/lib/rbtree.c
@@ -28,18 +28,6 @@
 
 #include "lib/rbtree.h"
 
-
-/*
- * Values of RBNode.iteratorState
- *
- * Note that iteratorState has an undefined value except in nodes that are
- * currently being visited by an active iteration.
- */
-#define InitialState	(0)
-#define FirstStepDone	(1)
-#define SecondStepDone	(2)
-#define ThirdStepDone	(3)
-
 /*
  * Colors of nodes (values of RBNode.color)
  */
@@ -53,10 +41,6 @@ struct RBTree
 {
 	RBNode	   *root;			/* root node, or RBNIL if tree is empty */
 
-	/* Iteration state */
-	RBNode	   *cur;			/* current iteration node */
-	RBNode	   *(*iterate) (RBTree *rb);
-
 	/* Remaining fields are constant after rb_create */
 
 	Size		node_size;		/* actual size of tree nodes */
@@ -75,8 +59,19 @@ struct RBTree
  */
 #define RBNIL ()
 
-static RBNode sentinel = {InitialState, RBBLACK, RBNIL, RBNIL, NULL};
+static RBNode sentinel = {RBBLACK, RBNIL, RBNIL, NULL};
 
+/*
+ * Values used in the RBTreeIterator.next_state field, with an
+ * InvertedWalk iterator.
+ */
+typedef enum InvertedWalkNextStep
+{
+	NextStepBegin,
+	NextStepUp,
+	NextStepLeft,
+	NextStepRight
+} InvertedWalkNextStep;
 
 /*
  * rb_create: create an empty RBTree
@@ -123,8 +118,6 @@ rb_create(Size node_size,
 	Assert(node_size > sizeof(RBNode));
 
 	tree->root = RBNIL;
-	tree->cur = RBNIL;
-	tree->iterate = NULL;
 	tree->node_size = node_size;
 	tree->comparator = comparator;
 	tree->combiner = combiner;
@@ -437,7 +430,6 @@ rb_insert(RBTree *rb, const RBNode *data, bool *isNew)
 
 	x = rb->allocfunc (rb->arg);
 
-	x->iteratorState = InitialState;
 	x->color = RBRED;
 
 	x->left = RBNIL;
@@ -653,171 +645,194 @@ rb_delete(RBTree *rb, 

Re: [HACKERS] Write Ahead Logging for Hash Indexes

2016-08-26 Thread Amit Kapila
On Thu, Aug 25, 2016 at 6:54 PM, Alvaro Herrera
 wrote:
> Amit Kapila wrote:
>> On Wed, Aug 24, 2016 at 11:46 PM, Alvaro Herrera
>>  wrote:
>
>> > Can you split the new xlog-related stuff to a new file, say hash_xlog.h,
>> > instead of cramming it in hash.h?  Removing the existing #include
>> > "xlogreader.h" from hash.h would be nice.  I volunteer for pushing any
>> > preliminary header cleanup commits.
>>
>> So, what you are expecting here is that we create hash_xlog.h and move
>> necessary functions like hash_redo(), hash_desc() and hash_identify()
>> to it. Then do whatever else is required to build it successfully.
>> Once that is done, I can build my patches on top of it.  Is that
>> right?  If yes, I can work on it.
>
> Yes, thanks.
>

How about attached?  If you want, I think we can one step further and
move hash_redo to a new file hash_xlog.c which is required for main
patch, but we can leave it for later as well.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


add_hash_xlog-v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-08-26 Thread Simon Riggs
On 26 August 2016 at 04:39, Michael Paquier  wrote:
> Hi all,
>
> I am relaunching $subject as 10 development will begin soon. As far as
> I know, there is agreement that we can do something here. Among the
> different proposals I have found:
> - pg_clog renamed to pg_commit_status, pg_xact or pg_commit
> - pg_xlog renamed to pg_xjournal, pg_wal or pg_journal

Don't mean to be a party pooper, but what discussion and agreement are
we referring to here?

If we are going to suggest doing something we really should summarize
the reason for doing it rather than assume it is self evident, cos it
certainly isn't.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-08-26 Thread Magnus Hagander
On Fri, Aug 26, 2016 at 12:44 PM, Christoph Berg  wrote:

> Re: Fujii Masao 2016-08-26  cmzgv5u6oemxr-cbjro+w...@mail.gmail.com>
> > > I agree on a hard break, unless we get pushback from users, and even
> > > then, they can create the symlinks themselves.
> >
> > I strongly prefer symlink approach not to break many existing tools
> > and scripts.
>
> Symlinks might actually be worse than removing the directories
> altogether. If your backup tool fails because the pg_xlog directory is
> gone, you'll hopefully notice, but if you end up with a backup that
> consists merely of a copy of a symlink named pg_xlog, you might not
> notice.
>

+1. It's *much* better to cause a clean break. That way people will notice
it, and can fix it (and it will be an easy fix).

An unclean break might leave people with things that look like they work,
but don't. That's a lot more dangerous.


Same reason I'm also +1 for Stephens suggestion to put all things that
should not be in a base backup into the same directory. That may break
things now, but it will simplify things down the road. And doing it at the
same time as renaming these things makes a lot of sense, because it causes
breakage that tool-builders *have* to look at, and then they will hopefully
also notice the other change.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-08-26 Thread Christoph Berg
Re: Fujii Masao 2016-08-26 

> > I agree on a hard break, unless we get pushback from users, and even
> > then, they can create the symlinks themselves.
> 
> I strongly prefer symlink approach not to break many existing tools
> and scripts.

Symlinks might actually be worse than removing the directories
altogether. If your backup tool fails because the pg_xlog directory is
gone, you'll hopefully notice, but if you end up with a backup that
consists merely of a copy of a symlink named pg_xlog, you might not
notice.

Christoph


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simplifying the interface of UpdateMinRecoveryPoint

2016-08-26 Thread Heikki Linnakangas

On 07/13/2016 04:25 PM, Michael Paquier wrote:

On Wed, Jul 13, 2016 at 8:27 PM, Amit Kapila  wrote:

On Wed, Jul 13, 2016 at 9:19 AM, Michael Paquier
 wrote:

Hence why not simplifying its interface and remove the force flag?


One point to note is that the signature and usage of
UpdateMinRecoveryPoint() is same as it was when it got introduced in
commit-cdd46c76.  Now the only reasons that come to my mind for
introducing the force parameter was (a) it looks cleaner that way to
committer (b) they have some usecase for the same in mind (c) it got
have overlooked.  Now, if it got introduced due to (c), then your
patch does the right thing by removing it.  Personally, I feel
overloading the parameter for multiple purposes makes code less
maintainable, so retaining as it is in HEAD has some merits.


There is no way to tell what that is, but perhaps Heikki recalls
something on the matter. I am just adding him in CC.


No, I don't remember. Maybe the function originally used the 
caller-supplied 'lsn' value as the value to force-update 
minRecoveryPoint to. Or I anticipated that some callers might want to do 
that in the future.


If we were to do this, it might be better to still have a 'force' 
variable inside the function, to keep the if()s slighltly more readable, 
like:


  bool force = XLogRecPtrIsInvalid(lsn);

But even then, I don't think this makes it really any more readable 
overall. Not worse either, but it's a wash. I'll just mark this as 
rejected in the commitfest, let's move on.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] OpenSSL 1.1 breaks configure and more

2016-08-26 Thread Heikki Linnakangas

On 07/05/2016 04:46 PM, Andreas Karlsson wrote:

@@ -280,8 +287,9 @@ px_find_digest(const char *name, PX_MD **res)
digest = px_alloc(sizeof(*digest));
digest->algo = md;

-   EVP_MD_CTX_init(>ctx);
-   if (EVP_DigestInit_ex(>ctx, digest->algo, NULL) == 0)
+   digest->ctx = EVP_MD_CTX_create();
+   EVP_MD_CTX_init(digest->ctx);
+   if (EVP_DigestInit_ex(digest->ctx, digest->algo, NULL) == 0)
return -1;

h = px_alloc(sizeof(*h));


Now that we're calling EVP_MD_CTX_create((), which allocates memory, are 
we risking memory leaks? It has always been part of the contract that 
you have to call px_md_free(), for any context returned by 
px_find_digest(), but I wonder just how careful we have been about that. 
Before this, you would probably get away with it without leaking, if the 
digest implementation didn't allocate any extra memory or other resources.


At least pg_digest and try_unix_std functions call px_find_digest(), and 
then do more palloc()s which could elog() if you run out of memory, 
leaking th digest struct. Highly unlikely, but I think it would be 
fairly straightforward to reorder those calls to eliminate the risk, so 
we probably should.



@@ -854,6 +858,25 @@ load_dh_buffer(const char *buffer, size_t len)
return dh;
 }

+static DH  *
+generate_dh_params(int prime_len, int generator)
+{
+#if SSLEAY_VERSION_NUMBER >= 0x00908000L
+   DH *dh;
+
+   if ((dh = DH_new()) == NULL)
+   return NULL;
+
+   if (DH_generate_parameters_ex(dh, prime_len, generator, NULL))
+   return dh;
+
+   DH_free(dh);
+   return NULL;
+#else
+   return DH_generate_parameters(prime_len, generator, NULL, NULL);
+#endif
+}
+


I think now would be a good time to drop support for OpenSSL versions 
older than 0.9.8. OpenSSL don't even support 0.9.8 anymore, although 
there are probably distributions out there that still provide patches 
for it. But OpenSSL 0.9.7 and older are really not interesting for 
PostgreSQL 10 anymore, I think.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c

2016-08-26 Thread Dilip Kumar
On Wed, Aug 17, 2016 at 6:10 PM, Robert Haas  wrote:
>> If you are making changes in plpgsql_validator(), then shouldn't we
>> make changes in plperl_validator() or plpython_validator()?  I see
>> that you have made changes to function CheckFunctionValidatorAccess()
>> which seems to be getting called from *_validator() (* indicates
>> plpgsql/plperl/plpython) functions.  Is there a reason for changing
>> the *_validator() function?

Yes you are right, making changes in CheckFunctionValidatorAccess is
sufficient, no need to make changes in *_validator (* indicates
plpgsql/plperl/plpython) functions.

I have changed this in attached patch..

>
> Yeah, when I glanced briefly at this patch, I found myself wondering
> whether all of these call sites were actually reachable (and why the
> patch contained no test cases to prove it).

I have also added test cases to cover all my changes.

Basically this patch changes error at three places.

1. getBaseTypeAndTypmod: This is being called from domain_in exposed
function (domain_in->
domain_state_setup-> getBaseTypeAndTypmod). Though this function is
being called from many other places which are not exposed function,
but I don't this will have any imapct, will this ?

2. lookup_type_cache: This is being called from record_in
(record_in->lookup_rowtype_tupdesc->
lookup_rowtype_tupdesc_internal->lookup_type_cache).

3. CheckFunctionValidatorAccess: This is being called from all
language validator functions.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


cache_lookup_failure_v4.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] increasing the default WAL segment size

2016-08-26 Thread Eduardo Morras
On Wed, 24 Aug 2016 21:31:35 -0400
Robert Haas  wrote:

> Hi,
> 
> I'd like to propose that we increase the default WAL segment size,
> which is currently 16MB.  It was first set to that value in commit
> 47937403676d913c0e740eec6b85113865c6c8ab in October of 1999; prior to
> that, it was 64MB.  Between 1999 and now, there have been three
> significant changes that make me think it might be time to rethink
> this value:

> 
> Thoughts?

>From my ignorance, could block size affect this WAL size increase?

In past (didn't tried with >9 versions) you can change disk block page size 
from 8KB to 16 KB or 32KB (or 4) modifing src/include/pg_config.h BLCKSZ 8192 
and recompiling. (There are some mails in 1999-2002 about this topic)

> -- 
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company


---   ---
Eduardo Morras 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning - another take

2016-08-26 Thread Amit Langote
On 2016/08/18 5:23, Robert Haas wrote:
> On Wed, Aug 17, 2016 at 2:21 AM, Amit Langote
>  wrote:
>> I am slightly tempted to eliminate the pg_partition catalog and associated
>> syscache altogether and add a column to pg_class as Robert suggested.
>> That way, all relid_is_partition() calls will be replaced by
>> rel->rd_partbound != NULL check.  But one potential problem with that
>> approach is that now whenever a parent relation is opened, all the
>> partition relations must be opened to get the partbound value (to form the
>> PartitionDesc to be stored in parent relation's rd_partdesc).  Whereas
>> currently, we just look up the pg_partition catalog (or the associated
>> cache) for every partition and that gets us the partbound.
> 
> Well, you could just look up the pg_class row without opening the
> relation, too.  There is a system cache on pg_class.oid, after all.  I

Yes, I somehow didn't think of that.

> think the issue is whether it's safe to read either one of those
> things without a lock on the child relation.  If altering the
> partitioning information for a relation requires holding only
> AccessExclusiveLock on that relation, and no lock on the parent, then
> you really can't read the information for any child relation without
> taking at least AccessShareLock.  Otherwise, it might change under
> you, and that would be bad.

I'd imagine this won't be a problem because we take an AccessExclusiveLock
on the parent when adding/removing a partition.

> I'm inclined to think that changing the partitioning information for a
> child is going to require AccessExclusiveLock on both the child and
> the parent.  That seems unfortunate from a concurrency point of view,
> but we may be stuck with it: suppose you require only
> ShareUpdateExclusiveLock on the parent.  Well, then a concurrent read
> transaction might see the partition boundaries change when it does a
> relcache rebuild, which would cause it to suddenly start expecting the
> data to be in a different plan in mid-transaction, perhaps even in
> mid-scan.  Maybe that's survivable with really careful coding, but it
> seems like it's probably a bad thing.  For example, it would mean that
> the executor would be unable to rely on the partitioning information
> in the relcache remaining stable underneath it.  Moreover, the
> relcache is always going to be scanned with the most recent possible
> MVCC snapshot, but the transaction snapshot may be older, so such a
> system creates all sorts of nasty possibilities for there to be skew
> between the snapshot being used to via the data and the snapshot being
> used to read the metadata that says where the data is.

We do take a lock on the parent because we would be changing its partition
descriptor (relcache).  I changed MergeAttributes() such that an
AccessExclusiveLock instead of ShareUpdateExclusiveLock is taken if the
parent is a partitioned table.

> This may need some more thought, but if we go with that approach of
> requiring an AccessExclusiveLock on both parent and child, then it
> seems to me that maybe we should consider the partitioning information
> to be a property of the parent rather than the child.  Just take all
> the partitioning information for all children and put it in one big
> node tree and store it in the pg_class or pg_partition_root entry for
> the parent as one big ol' varlena.  Now you can open the parent and
> get all of the partitioning information for all of the children
> without needing any lock on any child, and that's *really* good,
> because it means that some day we might be able to do partition
> elimination before locking any of the children!  That would be
> excellent.

If we need an AccessExclusiveLock on parent to add/remove a partition
(IOW, changing that child table's partitioning information), then do we
need to lock the individual partitions when reading partition's
information?  I mean to ask why the simple syscache look-ups to get each
partition's bound wouldn't do.

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch proposal

2016-08-26 Thread Venkata B Nagothi
On Fri, Aug 26, 2016 at 12:30 PM, Stephen Frost  wrote:

> * Venkata B Nagothi (nag1...@gmail.com) wrote:
> > On Thu, Aug 25, 2016 at 10:59 PM, Stephen Frost 
> wrote:
> > > I'm not a fan of the "recovery_target" option, particularly as it's
> only
> > > got one value even though it can mean two things (either "immediate" or
> > > "not set"), but we need a complete solution before we can consider
> > > deprecating it.  Further, we could consider making it an alias for
> > > whatever better name we come up with.
> >
> > The new parameter will accept options : "pause", "shutdown" and "promote"
> >
> > *"promote"*
> >
> > This option will ensure database starts up once the "immediate"
> consistent
> > recovery point is reached even if it is well before the mentioned
> recovery
> > target point (XID, Name or time).
> > This behaviour will be similar to that of recovery_target="immediate" and
> > can be aliased.
>
> I don't believe we're really going at this the right way.  Clearly,
> there will be cases where we'd like promotion at the end of the WAL
> stream (as we currently have) even if the recovery point is not found,
> but if the new option's "promote" is the same as "immediate" then we
> don't have that.
>

My apologies for the confusion. Correction - I meant, "promote" option
would promote the database once the consistent point is reached at the
end-of-the-WAL.


> We need to break this down into all the different possible combinations
> and then come up with names for the options to define them.  I don't
> believe a single option is going to be able to cover all of the cases.
>
> The cases which I'm considering are:
>
> recovery target is immediate (as soon as we have consistency)
> recovery target is a set point (name, xid, time, whatever)
>
> action to take if recovery target is found
> action to take if recovery target is not found
>
> Generally, "action" is one of "promote", "pause", or "shutdown".
> Clearly, not all actions are valid for all recovery target cases- in
> particular, "immediate" with "recovery target not found" can not support
> the "promote" or "pause" options.  Otherwise, we can support:
>

I Agree. This is a valid point to consider. I might have few questions over
this at a later stage.

Recovery Target  |  Found  |  Action
> -|-|--
> immediate|  Yes| promote
> immediate|  Yes| pause
> immediate|  Yes| shutdown
>
> immediate|  No | shutdown
>
> name/xid/time|  Yes| promote
> name/xid/time|  Yes| pause
> name/xid/time|  Yes| shutdown
>
> name/xid/time|  No | promote
> name/xid/time|  No | pause
> name/xid/time|  No | shutdown
>
> We could clearly support this with these options:
>


> recovery_target = immediate, other
>
recovery_action_target_found = promote, pause, shutdown
>

This is currently supported by the existing parameter called
"recovery_target_action"


> recovery_action_target_not_found = promote, pause, shutdown
>

This is exactly what newly proposed parameter will do.


> One question to ask is if we need to support an option for xid and time
> related to when we realize that we won't find the recovery target.  If
> consistency is reached at a time which is later than the recovery target
> for time, what then?  Do we go through the rest of the WAL and perform
> the "not found" action at the end of the WAL stream?  If we use that
> approach, then at least all of the recovery target types are handled the
> same, but I can definitely see cases where an administrator might prefer
> an "error" option.
>

Currently, this situation is handled by recovery_target_inclusive
parameter. Administrator can choose to stop the recovery at any consistent
point before or after the specified recovery target. Is this what you were
referring to ?
I might need a bit of clarification here, the parameter i am proposing
would be effective only if the specified recovery target is not reached and
may not be effective if the recovery goes beyond recovery target point.

Regards,
Venkata B N

Fujitsu Australia


Re: [HACKERS] WAL consistency check facility

2016-08-26 Thread Kuntal Ghosh
Thanks a lot.

I just want to mention the situation where I was getting the
speculative token related inconsistency.

ItemPointer in backup page from master:
LOG:  ItemPointer BlockNumber: 1 OffsetNumber:65534 Speculative: true
CONTEXT:  xlog redo at 0/127F4A48 for Heap/INSERT+INIT: off 1

ItemPointer in current page from slave after redo:
LOG:  ItemPointer BlockNumber: 0 OffsetNumber:1 Speculative: false
CONTEXT:  xlog redo at 0/127F4A48 for Heap/INSERT+INIT: off 1

As the block numbers are different, I was getting the following warning:
WARNING:  Inconsistent page (at byte 8166) found for record
0/127F4A48, rel 1663/16384/16946, forknum 0, blkno 0, Backup Page
Header : (pd_lower: 28 pd_upper: 8152 pd_special: 8192) Current Page
Header: (pd_lower: 28 pd_upper: 8152 pd_special: 8192)
CONTEXT:  xlog redo at 0/127F4A48 for Heap/INSERT+INIT: off 1

In heap_xlog_insert, t_ctid is always set to blkno and xlrec->offnum.
I think this is why I was getting the above warning.


On Thu, Aug 25, 2016 at 10:33 PM, Alvaro Herrera
 wrote:
> Kuntal Ghosh wrote:
>
>> 4. For Speculative Heap tuple insert operation, there was
>> inconsistency in t_ctid value. So, I've modified the t_ctid value (in
>> backup page) to current block number and offset number. Need
>> suggestions!!
>
> In speculative insertions, t_ctid is used to store the speculative
> token.  I think you should just mask that field out in that case (which
> you can recognize because ip_posid is set to magic value 0xfffe).
>
> --
> Álvaro Herrerahttp://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Thanks & Regards,
Kuntal Ghosh


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transactions involving multiple postgres foreign servers

2016-08-26 Thread Ashutosh Bapat
On Fri, Aug 26, 2016 at 11:37 AM, Masahiko Sawada 
wrote:

> On Fri, Aug 26, 2016 at 3:03 PM, Ashutosh Bapat
>  wrote:
> >
> >
> > On Fri, Aug 26, 2016 at 11:22 AM, Masahiko Sawada  >
> > wrote:
> >>
> >> On Fri, Aug 26, 2016 at 1:32 PM, Vinayak Pokale 
> >> wrote:
> >> > Hi All,
> >> >
> >> > Ashutosh proposed the feature 2PC for FDW for achieving atomic commits
> >> > across multiple foreign servers.
> >> > If a transaction make changes to more than two foreign servers the
> >> > current
> >> > implementation in postgres_fdw doesn't make sure that either all of
> them
> >> > commit or all of them rollback their changes.
> >> >
> >> > We (Masahiko Sawada and me) reopen this thread and trying to
> contribute
> >> > in
> >> > it.
> >> >
> >> > 2PC for FDW
> >> > 
> >> > The patch provides support for atomic commit for transactions
> involving
> >> > foreign servers. when the transaction makes changes to foreign
> servers,
> >> > either all the changes to all the foreign servers commit or rollback.
> >> >
> >> > The new patch 2PC for FDW include the following things:
> >> > 1. The patch 0001 introduces a generic feature. All kinds of FDW that
> >> > support 2PC such as oracle_fdw, mysql_fdw, postgres_fdw etc. can
> involve
> >> > in
> >> > the transaction.
> >> >
> >> > Currently we can push some conditions down to shard nodes, especially
> in
> >> > 9.6
> >> > the directly modify feature has
> >> > been introduced. But such a transaction modifying data on shard node
> is
> >> > not
> >> > executed surely.
> >> > Using 0002 patch, that modify is executed with 2PC. It means that we
> >> > almost
> >> > can provide sharding solution using
> >> > multiple PostgreSQL server (one parent node and several shared node).
> >> >
> >> > For multi master, we definitely need transaction manager but
> transaction
> >> > manager probably can use this 2PC for FDW feature to manage
> distributed
> >> > transaction.
> >> >
> >> > 2. 0002 patch makes postgres_fdw possible to use 2PC.
> >> >
> >> > 0002 patch makes postgres_fdw to use below APIs. These APIs are
> generic
> >> > features which can be used by all kinds of FDWs.
> >> >
> >> > a. Execute PREAPRE TRANSACTION and COMMIT/ABORT PREAPRED instead
> of
> >> > COMMIT/ABORT on foreign server which supports 2PC.
> >> > b. Manage information of foreign prepared transactions resolver
> >> >
> >> > Masahiko Sawada will post the patch.
> >> >
> >> >
> >>
> >
> > Thanks Vinayak and Sawada-san for taking this forward and basing your
> work
> > on my patch.
> >
> >>
> >> Still lot of work to do but attached latest patches.
> >> These are based on the patch Ashutosh posted before, I revised it and
> >> divided into two patches.
> >> Compare with original patch, patch of pg_fdw_xact_resolver and
> >> documentation are lacked.
> >
> >
> > I am not able to understand the last statement.
>
> Sorry to confuse you.
>
> > Do you mean to say that your patches do not have pg_fdw_xact_resolver()
> and
> > documentation that my patches had?
>
> Yes.
> I'm confirming them that your patches had.
>

Thanks for the clarification. I had added pg_fdw_xact_resolver() to resolve
any transactions which can not be resolved immediately after they were
prepared. There was a comment from Kevin (IIRC) that leaving transactions
unresolved on the foreign server keeps the resources locked on those
servers. That's not a very good situation. And nobody but the initiating
server can resolve those. That functionality is important to make it a
complete 2PC solution. So, please consider it to be included in your first
set of patches.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] Transactions involving multiple postgres foreign servers

2016-08-26 Thread Masahiko Sawada
On Fri, Aug 26, 2016 at 3:03 PM, Ashutosh Bapat
 wrote:
>
>
> On Fri, Aug 26, 2016 at 11:22 AM, Masahiko Sawada 
> wrote:
>>
>> On Fri, Aug 26, 2016 at 1:32 PM, Vinayak Pokale 
>> wrote:
>> > Hi All,
>> >
>> > Ashutosh proposed the feature 2PC for FDW for achieving atomic commits
>> > across multiple foreign servers.
>> > If a transaction make changes to more than two foreign servers the
>> > current
>> > implementation in postgres_fdw doesn't make sure that either all of them
>> > commit or all of them rollback their changes.
>> >
>> > We (Masahiko Sawada and me) reopen this thread and trying to contribute
>> > in
>> > it.
>> >
>> > 2PC for FDW
>> > 
>> > The patch provides support for atomic commit for transactions involving
>> > foreign servers. when the transaction makes changes to foreign servers,
>> > either all the changes to all the foreign servers commit or rollback.
>> >
>> > The new patch 2PC for FDW include the following things:
>> > 1. The patch 0001 introduces a generic feature. All kinds of FDW that
>> > support 2PC such as oracle_fdw, mysql_fdw, postgres_fdw etc. can involve
>> > in
>> > the transaction.
>> >
>> > Currently we can push some conditions down to shard nodes, especially in
>> > 9.6
>> > the directly modify feature has
>> > been introduced. But such a transaction modifying data on shard node is
>> > not
>> > executed surely.
>> > Using 0002 patch, that modify is executed with 2PC. It means that we
>> > almost
>> > can provide sharding solution using
>> > multiple PostgreSQL server (one parent node and several shared node).
>> >
>> > For multi master, we definitely need transaction manager but transaction
>> > manager probably can use this 2PC for FDW feature to manage distributed
>> > transaction.
>> >
>> > 2. 0002 patch makes postgres_fdw possible to use 2PC.
>> >
>> > 0002 patch makes postgres_fdw to use below APIs. These APIs are generic
>> > features which can be used by all kinds of FDWs.
>> >
>> > a. Execute PREAPRE TRANSACTION and COMMIT/ABORT PREAPRED instead of
>> > COMMIT/ABORT on foreign server which supports 2PC.
>> > b. Manage information of foreign prepared transactions resolver
>> >
>> > Masahiko Sawada will post the patch.
>> >
>> >
>>
>
> Thanks Vinayak and Sawada-san for taking this forward and basing your work
> on my patch.
>
>>
>> Still lot of work to do but attached latest patches.
>> These are based on the patch Ashutosh posted before, I revised it and
>> divided into two patches.
>> Compare with original patch, patch of pg_fdw_xact_resolver and
>> documentation are lacked.
>
>
> I am not able to understand the last statement.

Sorry to confuse you.

> Do you mean to say that your patches do not have pg_fdw_xact_resolver() and
> documentation that my patches had?

Yes.
I'm confirming them that your patches had.

Regards,

--
Masahiko Sawada


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transactions involving multiple postgres foreign servers

2016-08-26 Thread Ashutosh Bapat
On Fri, Aug 26, 2016 at 11:22 AM, Masahiko Sawada 
wrote:

> On Fri, Aug 26, 2016 at 1:32 PM, Vinayak Pokale 
> wrote:
> > Hi All,
> >
> > Ashutosh proposed the feature 2PC for FDW for achieving atomic commits
> > across multiple foreign servers.
> > If a transaction make changes to more than two foreign servers the
> current
> > implementation in postgres_fdw doesn't make sure that either all of them
> > commit or all of them rollback their changes.
> >
> > We (Masahiko Sawada and me) reopen this thread and trying to contribute
> in
> > it.
> >
> > 2PC for FDW
> > 
> > The patch provides support for atomic commit for transactions involving
> > foreign servers. when the transaction makes changes to foreign servers,
> > either all the changes to all the foreign servers commit or rollback.
> >
> > The new patch 2PC for FDW include the following things:
> > 1. The patch 0001 introduces a generic feature. All kinds of FDW that
> > support 2PC such as oracle_fdw, mysql_fdw, postgres_fdw etc. can involve
> in
> > the transaction.
> >
> > Currently we can push some conditions down to shard nodes, especially in
> 9.6
> > the directly modify feature has
> > been introduced. But such a transaction modifying data on shard node is
> not
> > executed surely.
> > Using 0002 patch, that modify is executed with 2PC. It means that we
> almost
> > can provide sharding solution using
> > multiple PostgreSQL server (one parent node and several shared node).
> >
> > For multi master, we definitely need transaction manager but transaction
> > manager probably can use this 2PC for FDW feature to manage distributed
> > transaction.
> >
> > 2. 0002 patch makes postgres_fdw possible to use 2PC.
> >
> > 0002 patch makes postgres_fdw to use below APIs. These APIs are generic
> > features which can be used by all kinds of FDWs.
> >
> > a. Execute PREAPRE TRANSACTION and COMMIT/ABORT PREAPRED instead of
> > COMMIT/ABORT on foreign server which supports 2PC.
> > b. Manage information of foreign prepared transactions resolver
> >
> > Masahiko Sawada will post the patch.
> >
> >
>
>
Thanks Vinayak and Sawada-san for taking this forward and basing your work
on my patch.


> Still lot of work to do but attached latest patches.
> These are based on the patch Ashutosh posted before, I revised it and
> divided into two patches.
> Compare with original patch, patch of pg_fdw_xact_resolver and
> documentation are lacked.
>

I am not able to understand the last statement.

Do you mean to say that your patches do not have pg_fdw_xact_resolver() and
documentation that my patches had?

OR

you mean to say that my patches did not have (lacked)
pg_fdw_xact_resolver() and documenation

OR some combination of those?
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company