Re: [HACKERS] Mention column name in error messages

2016-10-05 Thread Franck Verrot
Thanks Andres for the review.

Michael, please find attached a revised patch addressing, amongst some
other changes, the testing issue (`make check` passes now) and the
visibility of the `TransformExprState` struct.
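
(For anyone skimming the thread: the shape Andres asked for -- keeping both the
struct and the callback file-local in parse_target.c instead of exposing them in
the header -- would look roughly like the sketch below. The names follow the
earlier patch; the attached revision is authoritative.)

    /* private to parse_target.c -- sketch only */
    typedef struct TransformExprState
    {
        const char *column_name;
        Oid         expected_type;
    } TransformExprState;

    static void
    TransformExprCallback(void *arg)
    {
        TransformExprState *state = (TransformExprState *) arg;

        errcontext("coercion failed for column \"%s\" of type %s",
                   state->column_name,
                   format_type_be(state->expected_type));
    }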

Thanks,
Franck

On Tue, Dec 22, 2015 at 1:49 PM, Michael Paquier 
wrote:

> On Sun, Oct 4, 2015 at 12:23 AM, Andres Freund  wrote:
> > On 2015-09-15 12:00:25 +0200, Franck Verrot wrote:
> >> diff --git a/src/backend/parser/parse_target.c
> b/src/backend/parser/parse_target.c
> >> index 1b3fcd6..78f82cd 100644
> >> --- a/src/backend/parser/parse_target.c
> >> +++ b/src/backend/parser/parse_target.c
> >> @@ -389,6 +389,17 @@ markTargetListOrigin(ParseState *pstate,
> TargetEntry *tle,
> >>   * omits the column name list.  So we should usually prefer to use
> >>   * exprLocation(expr) for errors that can happen in a default INSERT.
> >>   */
> >> +
> >> +void
> >> +TransformExprCallback(void *arg)
> >> +{
> >> + TransformExprState *state = (TransformExprState *) arg;
> >> +
> >> + errcontext("coercion failed for column \"%s\" of type %s",
> >> + state->column_name,
> >> + format_type_be(state->expected_type));
> >> +}
> >
> > Why is this not a static function?
> >
> >>  Expr *
> >>  transformAssignedExpr(ParseState *pstate,
> >> Expr *expr,
> >> @@ -405,6 +416,14 @@ transformAssignedExpr(ParseState *pstate,
> >>   Oid attrcollation;  /* collation of target
> column */
> >>   ParseExprKind sv_expr_kind;
> >>
> >> + ErrorContextCallback errcallback;
> >> + TransformExprState testate;
> >
> > Why the newline between the declarations? Why none afterwards?
> >
> >> diff --git a/src/include/parser/parse_target.h
> b/src/include/parser/parse_target.h
> >> index a86b79d..5e12c12 100644
> >> --- a/src/include/parser/parse_target.h
> >> +++ b/src/include/parser/parse_target.h
> >> @@ -42,4 +42,11 @@ extern TupleDesc expandRecordVariable(ParseState
> *pstate, Var *var,
> >>  extern char *FigureColname(Node *node);
> >>  extern char *FigureIndexColname(Node *node);
> >>
> >> +/* Support for TransformExprCallback */
> >> +typedef struct TransformExprState
> >> +{
> >> + const char *column_name;
> >> + Oid expected_type;
> >> +} TransformExprState;
> >
> > I see no need for this to be a struct defined in the header. Given that
> > TransformExprCallback isn't public, and the whole thing is specific to
> > TransformExprState...
> >
> >
> > My major complaint though, is that this doesn't contain any tests...
> >
> >
> > Could you update the patch to address these issues?
>
> Patch is marked as returned with feedback, it has been two months
> since the last review, and comments have not been addressed.
> --
> Michael


-- 
Franck Verrot


0001-Report-column-for-which-type-coercion-fails.patch
Description: Binary data



Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2016-10-05 Thread Pavan Deolasee
On Wed, Oct 5, 2016 at 1:43 PM, Tomas Vondra 
wrote:

>
>
> I've been looking at the patch over the past few days, running a bunch of
> benchmarks etc.


Thanks for doing that.


> I can confirm the significant speedup, often by more than 75% (depending
> on number of indexes, whether the data set fits into RAM, etc.). Similarly
> for the amount of WAL generated, although that's a bit more difficult to
> evaluate due to full_page_writes.
>
> I'm not going to send detailed results, as that probably does not make
> much sense at this stage of the development - I can repeat the tests once
> the open questions get resolved.
>

Sure. Anything that stands out? Any regression that you see? I'm not sure
if your benchmarks exercise the paths which might show overheads without
any tangible benefits. For example, I wonder if a test with many indexes
where most of them get updated and then querying the table via those
updated indexes could be one such test case.


>
> There's a lot of useful and important feedback in the thread(s) so far,
> particularly the descriptions of various failure cases. I think it'd be
> very useful to collect those examples and turn them into regression tests -
> that's something the patch should include anyway.
>

Sure. I added only a handful of test cases which I knew the regression suite
isn't covering. But I'll write more of them. One good thing is that the code
gets heavily exercised even during the regression run; I caught and fixed
multiple bugs while running it. I'm not saying that's enough, but it certainly
gives some confidence.
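
For instance, a regression-style sketch could look like this (plain SQL against
the existing stats views; the table and index names are made up, and since the
stats views are filled asynchronously a real test would have to wait for the
stats collector):

  CREATE TABLE warm_test (id int PRIMARY KEY, a int, b int);
  CREATE INDEX warm_test_a_idx ON warm_test (a);
  INSERT INTO warm_test SELECT g, g, g FROM generate_series(1, 1000) g;
  UPDATE warm_test SET b = b + 1;   -- no indexed column changes: HOT-eligible
  UPDATE warm_test SET a = a + 1;   -- indexed column changes: WARM candidate
  SELECT count(*) FROM warm_test WHERE a = 500;   -- read back via the touched index
  SELECT n_tup_upd, n_tup_hot_upd
    FROM pg_stat_user_tables WHERE relname = 'warm_test';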


>
> and update:
>
> update t set a = a+1, b=b+1;
>
> which has to update all indexes on the table, but:
>
> select n_tup_upd, n_tup_hot_upd from pg_stat_user_tables
>
>  n_tup_upd | n_tup_hot_upd
> -----------+---------------
>       1000 |          1000
>
> So it's still counted as "WARM" - does it make sense?


No, it does not. The code currently just marks any update as a WARM update
if the table supports it and there is enough free space in the page. And
yes, you're right. It's worth fixing that because of the one-WARM-update-per-chain
limitation. Will fix.


>
> The way this is piggy-backed on the current HOT statistics seems a bit
> strange for another reason,


Agree. We could add a similar n_tup_warm_upd counter.


> But WARM changes that - it allows adding index entries only to a subset of
> indexes, which means the "per row" n_tup_hot_upd counter is not sufficient.
> When you have a table with 10 indexes, and the counter increases by 1, does
> that mean the update added index tuple to 1 index or 9 of them?
>

How about having counters similar to n_tup_ins/n_tup_del for indexes as
well? Today it does not make sense because every index gets the same number
of inserts, but WARM will change that.

For example, we could have idx_tup_insert and idx_tup_delete counters that show
up in pg_stat_user_indexes. I don't know if idx_tup_delete adds any value, but
one could then look at idx_tup_insert for various indexes to get a sense of
which indexes receive more inserts than others. The indexes which receive
more inserts are the ones being updated more frequently than the others.
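
(For reference, today's pg_stat_user_indexes only exposes read-side counters,
e.g.

  SELECT indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
    FROM pg_stat_user_indexes
   WHERE relname = 'some_table';

so a per-index insert count really would be new information; idx_tup_insert and
idx_tup_delete above are proposed names, not existing columns.)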

This also relates to vacuuming strategies. Today HOT updates do not count
for triggering vacuum (or to be more precise, HOT pruned tuples are
discounted while counting dead tuples). WARM tuples get the same treatment
as far as pruning is concerned, but since they cause fresh index inserts, I
wonder if we need some mechanism to cleanup the dead line pointers and dead
index entries. This will become more important if we do something to
convert WARM chains into HOT chains, something that only VACUUM can do in
the design I've proposed so far.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


[HACKERS] memory leak in e94568ecc10 (pre-reading in external sort)

2016-10-05 Thread Tomas Vondra
Hi,

it seems e94568ecc10 has a pretty bad memory leak. A simple

   pgbench -i -s 300

allocates ~32GB of memory before it fails

   vacuum...
   set primary keys...
   ERROR:  out of memory
   DETAIL:  Failed on request of size 134184960.

The relevant bit from the memory context stats:

TupleSort main: 33278738504 total in 263 blocks; 78848 free (23 chunks);
33278659656 used
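
(In case it helps with the investigation: besides the dump the OOM error writes
to the server log, the same per-context breakdown can be pulled from a live
backend by attaching a debugger, e.g.

   gdb -p <backend pid>
   (gdb) call MemoryContextStats(TopMemoryContext)

which prints the stats on the backend's stderr.)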


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Transactions involving multiple postgres foreign servers

2016-10-05 Thread Ashutosh Bapat
>>
>> No, the COMMIT returns after the first phase. It can not wait for all
>> the foreign servers to complete their second phase
>
> Hm, it sounds like it's same as normal commit (not 2PC).
> What's the difference?
>
> My understanding is that basically the local server can not return
> COMMIT to the client until 2nd phase is completed.


If we do that, the local server may not return to the client at all,
if the foreign server crashes and never comes up. Practically, it may
take much longer to finish a COMMIT, depending upon how long it takes
for the foreign server to reply to a COMMIT message. I don't think
that's desirable.

> Otherwise the next transaction can see data that is not committed yet
> on remote server.

2PC doesn't guarantee transactional consistency all by itself. It only
guarantees that all legs of a distributed transaction are either all
rolled back or all committed. IOW, it guarantees that a distributed
transaction is not rolled back on some nodes and committed on the
others.

Providing a transactionally consistent view is a very hard problem.
Trying to solve all those problems in a single patch would be very
difficult and the amount of changes required may be really huge. Then
there are many possible consistency definitions when it comes to
consistency of a distributed system. I have not seen a consensus on what
kind of consistency model/s we want to support in PostgreSQL. That's
another large debate. We have had previous attempts where people have
tried to complete everything in one go and nothing has been completed
yet.

A 2PC implementation, i.e. guaranteeing that all the legs of a transaction
either commit or roll back, is an essential building block of any kind of
distributed transaction manager. So we should at least support that one before
attacking further problems.
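
To make that building block concrete: per foreign-server leg, the coordinator
essentially has to drive the commands PostgreSQL already provides (the GID
below is just an example, and max_prepared_transactions must be non-zero on
the foreign servers):

  BEGIN;
  -- ... the remote part of the distributed transaction ...
  PREPARE TRANSACTION 'fdw_xact_example';   -- phase 1: make the work durable
  -- once every involved server has prepared successfully:
  COMMIT PREPARED 'fdw_xact_example';       -- phase 2
  -- or, if any server failed to prepare:
  -- ROLLBACK PREPARED 'fdw_xact_example';

A crash between the two phases leaves prepared transactions behind that must be
resolved later; but even then this is only the "all commit or all roll back"
guarantee, not a consistent view across servers.
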
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] Our "fallback" atomics implementation doesn't actually work

2016-10-05 Thread Tom Lane
Andres Freund  writes:
> Hm. After a long battle of head vs. wall I think I see what the problem
> is.  For the fallback atomics implementation I somehow had assumed that
> pg_atomic_write_u32() doesn't need to lock, as it's just an unlocked
> write.  But that's not true, because it has to cause
> pg_atomic_compare_exchange_u32 to fail.

Hah ... obvious once you see it.

> For me the problem often takes a lot longer to reproduce (once only
> after 40min), could you run with the attached patch, and see whether
> that fixes things for you?

For me, with the described test case, HEAD fails within a minute,
two times out of three or so.  I've not reproduced it after half an
hour of beating on this patch.  Looks good.

regards, tom lane




Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-05 Thread Serge Rielau

> On Oct 5, 2016, at 5:52 PM, Vitaly Burovoy  wrote:
> 
> On 10/5/16, Serge Rielau  wrote:
>> I want to point out as a minor "extension" that there is no need for the
>> default to be immutable. It is merely required that the default is evaluate
>> at time of ADD COLUMN and then we remember the actual value for the exist
>> default, rather than the parsed expression as we do for the "current"
>> default.
> 
> I don't think it will be accepted.
And I wouldn’t expect it to. I had misunderstood what PG did.
Clearly the enhancement must be semantically neutral and be limited to the 
cases where that can be asserted.
So my patch will detect that situation and fall back to the original behavior 
as needed.
> Your patch must be based on a just "master" branch.
> If you haven't seen wiki pages [1], [2] and [3], it is the time to do
> it (and related ones).
> 
>> Or should I compose some sort of a design document?
> 
> Since Tom Lane, Andres Freund and other people agreed "it does work",
> you may post a patch to a new thread and write a short (but clean
> enough) description with a link to the current thread. Examples can be
> seen by links from the CF[4].
Thanks for the guidance.
It will take a bit of time to port to community code and complete QA.
I shall return….
 
Serge Rielau
Salesforce.com





Re: [HACKERS] Our "fallback" atomics implementation doesn't actually work

2016-10-05 Thread Andres Freund
On 2016-10-05 15:02:09 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > Without yet having analyzed this deeply, could it actually be that the
> > reason is that sem_post/wait aren't proper memory barriers?  On a glance
> > the symptoms look like values have been modified without proper locks...
>
> Hmm, possible ...

Hm. After a long battle of head vs. wall I think I see what the problem
is.  For the fallback atomics implementation I somehow had assumed that
pg_atomic_write_u32() doesn't need to lock, as it's just an unlocked
write.  But that's not true, because it has to cause
pg_atomic_compare_exchange_u32 to fail.  The lack of this can cause a
"leftover" lockbit when UnlockBufHdr() runs concurrently with an operation
using compare_exchange.

For me the problem often takes a lot longer to reproduce (once, only
after 40min).  Could you run with the attached patch and see whether
that fixes things for you?

Andres
diff --git a/src/backend/port/atomics.c b/src/backend/port/atomics.c
index 42169a3..aa649b7 100644
--- a/src/backend/port/atomics.c
+++ b/src/backend/port/atomics.c
@@ -104,6 +104,19 @@ pg_atomic_init_u32_impl(volatile pg_atomic_uint32 *ptr, uint32 val_)
 	ptr->value = val_;
 }
 
+void
+pg_atomic_write_u32_impl(volatile pg_atomic_uint32 *ptr, uint32 val)
+{
+	/*
+	 * One might think that an unlocked write doesn't need a lock, but one
+	 * would be wrong. Even an unlocked write has to cause
+	 * pg_atomic_compare_exchange_u32() (et al) to fail.
+	 */
+	SpinLockAcquire((slock_t *) &ptr->sema);
+	ptr->value = val;
+	SpinLockRelease((slock_t *) &ptr->sema);
+}
+
 bool
 pg_atomic_compare_exchange_u32_impl(volatile pg_atomic_uint32 *ptr,
 	uint32 *expected, uint32 newval)
diff --git a/src/include/port/atomics/fallback.h b/src/include/port/atomics/fallback.h
index bdaa795..2290fff 100644
--- a/src/include/port/atomics/fallback.h
+++ b/src/include/port/atomics/fallback.h
@@ -133,6 +133,9 @@ pg_atomic_unlocked_test_flag_impl(volatile pg_atomic_flag *ptr)
 #define PG_HAVE_ATOMIC_INIT_U32
 extern void pg_atomic_init_u32_impl(volatile pg_atomic_uint32 *ptr, uint32 val_);
 
+#define PG_HAVE_ATOMIC_WRITE_U32
+extern void pg_atomic_write_u32_impl(volatile pg_atomic_uint32 *ptr, uint32 val);
+
 #define PG_HAVE_ATOMIC_COMPARE_EXCHANGE_U32
 extern bool pg_atomic_compare_exchange_u32_impl(volatile pg_atomic_uint32 *ptr,
 uint32 *expected, uint32 newval);



Re: [HACKERS] Hash tables in dynamic shared memory

2016-10-05 Thread Andres Freund
On 2016-10-05 08:02:42 +0200, Magnus Hagander wrote:
> On Oct 5, 2016 1:23 AM, "Thomas Munro" 
> wrote:
> > Another thought: it could be used to make things like
> > pg_stat_statements not have to be in shared_preload_libraries.

> That would indeed be a great improvement. And possibly also allow the
> changing of the max number of statements it can track without a restart?

There's the issue of having to serialize / load the file around a
restart...

Greetings,

Andres Freund




Re: [HACKERS] Hash tables in dynamic shared memory

2016-10-05 Thread Thomas Munro
On Wed, Oct 5, 2016 at 7:02 PM, Magnus Hagander  wrote:
> On Oct 5, 2016 1:23 AM, "Thomas Munro" 
> wrote:
>>
>> On Wed, Oct 5, 2016 at 12:11 PM, Thomas Munro
>>  wrote:
>> > On Wed, Oct 5, 2016 at 11:22 AM, Andres Freund 
>> > wrote:
>> >>> Potential use cases for DHT include caches, in-memory database objects
>> >>> and working state for parallel execution.
>> >>
>> >> Is there a more concrete example, i.e. a user we'd convert to this at
>> >> the same time as introducing this hashtable?
>> >
>> > A colleague of mine will shortly post a concrete patch to teach an
>> > existing executor node how to be parallel aware, using DHT.  I'll let
>> > him explain.
>> >
>> > I haven't looked into whether it would make sense to convert any
>> > existing shmem dynahash hash table to use DHT.  The reason for doing
>> > so would be to move it out to DSM segments and enable dynamically
>> > growing.  I suspect that the bounded size of things like the hash
>> > tables involved in (for example) predicate locking is considered a
>> > feature, not a bug, so any such cluster-lifetime core-infrastructure
>> > hash table would not be a candidate.  More likely candidates would be
>> > ephemeral data used by the executor, as in the above-mentioned patch,
>> > and long lived caches of dynamic size owned by core code or
>> > extensions.  Like a shared query plan cache, if anyone can figure out
>> > the invalidation magic required.
>>
>> Another thought: it could be used to make things like
>> pg_stat_statements not have to be in shared_preload_libraries.
>>
>
> That would indeed be a great improvement. And possibly also allow the
> changing of the max number of statements it can track without a restart?

Yeah.  You don't explicitly size a DHT hash table, it just grows as
required to keep the load factor low enough, possibly leading the DSA
area it lives in to create DSM segments.  Currently it never gets
smaller though, so if you throw out a bunch of entries you will be
freeing up the memory occupied by the entries themselves (meaning:
giving it back to the DSA area, which might eventually give back to
the OS if the planets are correctly aligned rendering a DSM segment
entirely unused), but the hash table's bucket array won't ever shrink.

> I was also wondering if it might be useful for a replacement for some of the
> pgstats stuff to get rid of the cost of spooling to file and then rebuilding
> the hash tables in the receiving end. I've been waiting for this patch to
> figure out if that's useful. I mean keep the stats collector doing what it
> does now over udp, but present the results in shared hash tables instead of
> files.

Interesting thought.  I haven't studied how that stuff works... I've
put it on my reading list.

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-05 Thread Vitaly Burovoy
On 10/5/16, Serge Rielau  wrote:
>On Wed, Oct 5, 2016 at 4:19 PM, Vitaly Burovoy  
>wrote:
>> But what I discover for myself is that we have pg_attrdef separately
>> from the pg_attribute. Why?
>> Is it time to join them? For not presented defaults it would be only
>> one bit per row(if we avoid "adsrc" as it is recommended), but for a
>> separate table it is 11 columns with two indexes now...
>
> In terms of footprint we may be able to remove pg_attrdef.
> I would consider that orthogonal to the proposed feature though.

It was a question mostly to the community rather than to you.

> The internal representation of defaults in the tuple descriptor still needs 
> to be a map of sorts.
>
> To comment on Pantelis SQL Server Reference:
> Other vendors such as Oracle and DB2 also support this feature.
>
> The listed restriction made me loop back to Vitaly’s original serial example:
> ALTER TABLE t ADD COLUMN c2 serial;
> and rethink Tom’s struct restriction to constants.

I'm sorry, the correct example with "now" should be:
ALTER TABLE t ADD COLUMN c1 timestamptz NOT NULL DEFAULT
'now'::text::timestamptz;

I wanted to show that with a non-immutable (even stable) expression as a
default, each time the old tuples are selected they give a different result.
But your implementation evaluates it before saving it as a
"pre-add-column" default, and that breaks the current behavior in a
different way.

> In PG the proposed feature would also have to be limited to immutable(?)
> default expressions to comply with existing behavior, which matches SQL 
> Servers.
>
> My current patch does not restrict that and thusly falsely "fills in" the 
> same value for all rows.

Even if you are only making the current behavior "faster", you should
preserve that behavior.
The best approach is to determine whether it is possible to do a "fast"
change and fill the "pre-add-column" default, and otherwise fall back to
the current "long" behavior.

I assure you it is common enough to add columns with a default value that
fills the existing rows with non-static values.
One example is adding a serial column (which is a macro for "CREATE
SEQUENCE + SET NOT NULL + SET DEFAULT
nextval(just_created_sequence_relation)"); others are
"uuid_generate_vX(...)" and "random()".


On 10/5/16, Serge Rielau  wrote:
> I want to point out as a minor "extension" that there is no need for the
> default to be immutable. It is merely required that the default is evaluate
> at time of ADD COLUMN and then we remember the actual value for the exist
> default, rather than the parsed expression as we do for the "current"
> default.

I don't think it will be accepted.


On 10/5/16, Serge Rielau  wrote:
> Thanks.
> This would be my first contribution.
> I take it I would post a patch based on a recent PG 9.6 master for review?

Your patch must be based on the current "master" branch.
If you haven't seen wiki pages [1], [2] and [3], now is the time to read
them (and the related ones).

> Or should I compose some sort of a design document?

Since Tom Lane, Andres Freund and other people agreed "it does work",
you may post a patch to a new thread and write a short (but clean
enough) description with a link to the current thread. Examples can be
found via the links from the CF[4].
Nevertheless, it is important to honor all the thoughts mentioned here,
because if your patch breaks the current behavior (with no significant
reason) it is pointless (even if it matches the behavior of other
RDBMSes) and will not be committed.

Of course, feel free to ask.


[1] https://wiki.postgresql.org/wiki/So,_you_want_to_be_a_developer%3F
[2] https://wiki.postgresql.org/wiki/Developer_FAQ
[3] https://wiki.postgresql.org/wiki/Submitting_a_Patch
[4] https://commitfest.postgresql.org/11/

-- 
Best regards,
Vitaly Burovoy




Re: [HACKERS] Is the last 9.1 release planned?

2016-10-05 Thread Tsunakawa, Takayuki
From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Magnus Hagander
> On Oct 5, 2016 5:42 AM, "Tsunakawa, Takayuki"
>  wrote:
> > Thanks for clarification.  Then, I understood that the expression "stop
> releases in September" in the release note and a pgsql-announce mail was
> not correct.
> 
> 
> It basically means stop guaranteeing that we do. As of a couple of days
> ago, bug fixes won't necessarily be back ported to 9.1 if they are difficult.
> But there will be one wrap-up release in November with any patches that
> have already been applied but have not yet been in a release. And after
> November, we will stop doing that as well.

I see.  I simply took the phrases in pgsql-announce "September is EOL" and
"only expects one more release" as meaning "only expects one more release in
September", because I didn't imagine a minor version would be released after EOL.

It would have been nice to see "only expects one more release in November"
or "one more release on the regular schedule" instead.

Regards
Takayuki Tsunakawa





Re: [HACKERS] [RFC] Should we fix postmaster to avoid slow shutdown?

2016-10-05 Thread Tsunakawa, Takayuki
From: Robert Haas [mailto:robertmh...@gmail.com]
> I have no opinion on this patch, because I haven't reviewed it, but note
> recent commit 3b90e38c5d592ea8ec8236287dd5c749fc041728, which appears to
> be semi-related.

Thank you for the interesting information.  Maybe Tom-san ran into some trouble
while creating that patch.  Fortunately, it doesn't appear to be related to my
patch, because that commit changed the timing of closing the listen ports in
postmaster children, whereas my patch explicitly closes the listen ports in the
postmaster.

Regards
Takayuki Tsunakawa





Re: [HACKERS] Switch to unnamed POSIX semaphores as our preferred sema code?

2016-10-05 Thread Tsunakawa, Takayuki
From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Tom Lane
> I've gotten a bit tired of seeing "could not create semaphores: No space
> left on device" failures in the buildfarm, so I looked into whether we should
> consider preferring unnamed POSIX semaphores over SysV semaphores.

+100
Wonderful decision and careful analysis.  This will make PostgreSQL more
friendly to users, especially newcomers, by eliminating the need to tune kernel
resources.  I wish the other kernel resources (files, procs) would need no
tuning, as on Windows, but that's just a daydream.

Regards
Takayuki Tsunakawa






Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-05 Thread Serge Rielau
On Wed, Oct 5, 2016 at 4:19 PM, Vitaly Burovoy  wrote:
> On 10/5/16, Tom Lane  wrote:
> > I wrote:
> >> Need a better name for the concept, since evidently this name isn't
> >> conveying the idea.
> >
> > Maybe "creation default" would work better?  Point being it's the
> > default value at the time of column creation.
>
> Hmm... Personally, for me the original topic name is good enough.

I think the issue is with the term “exist default” rather than the
feature/topic name (?)

> But what I discovered for myself is that we have pg_attrdef separately
> from pg_attribute. Why?
> Is it time to join them? For columns with no default it would be only
> one bit per row (if we avoid "adsrc", as is recommended), but for a
> separate table it is 11 columns with two indexes now...

In terms of footprint we may be able to remove pg_attrdef. I would consider
that orthogonal to the proposed feature though. The internal representation
of defaults in the tuple descriptor still needs to be a map of sorts.

To comment on Pantelis' SQL Server reference: other vendors such as Oracle
and DB2 also support this feature.

The listed restriction made me loop back to Vitaly’s original serial example:
ALTER TABLE t ADD COLUMN c2 serial;
and rethink Tom’s strict restriction to constants.

In PG the proposed feature would also have to be limited to immutable(?)
default expressions to comply with existing behavior, which matches SQL
Server's.

My current patch does not restrict that and thusly falsely "fills in" the
same value for all rows.

Cheers,
Serge Rielau
Salesforce.com

Re: [HACKERS] pg_rewind hangs if --source-server is used and syncrep is enabled

2016-10-05 Thread Michael Paquier
On Wed, Oct 5, 2016 at 11:53 PM, Michael Banck
 wrote:
> On Wed, Oct 05, 2016 at 04:39:39PM +0200, Michael Banck wrote:
>> To the user, the last thing printed is "need to copy  MB [...]".  If
>> the user cancels the pg_rewind command with ^C, the backend keeps
>> hanging around even in --dry-run mode.  That won't hurt too much as it
>> does not seem to block future pg_rewind runs after synchronous_commit
>> has been set to a different value, but looks surprising to me.

Oops.

>> Not sure whether pg_rewind could error out gracefully without hanging in
>> this case,
>
> My colleague Christoph Berg pointed out that pg_rewind could just set
> synchronous_commit = local before creating the temporary table, which
> indeed works, proof-of-concept patch attached

Even synchronous_commit = off would not matter much here, and we could just
use that for performance reasons. The temporary table used in this
context only tracks the delta chunks of blocks, so this solution sounds
better to me. I'll patch 9.4's pg_rewind similarly to whatever is decided
here. We could as well use an extra PQexec call to bring more clarity to
the code; an extra round-trip could be a big deal where network lag
matters, but compared to the COPY chunks sent out it would not matter much,
I guess. I am posting another version, and have added a CF entry so we
don't lose track of it:
https://commitfest.postgresql.org/11/811/
-- 
Michael
diff --git a/src/bin/pg_rewind/libpq_fetch.c b/src/bin/pg_rewind/libpq_fetch.c
index 9239009..4bc2bc3 100644
--- a/src/bin/pg_rewind/libpq_fetch.c
+++ b/src/bin/pg_rewind/libpq_fetch.c
@@ -403,6 +403,17 @@ libpq_executeFileMap(filemap_t *map)
 	int			i;
 
 	/*
+	 * Set up the connection context.  With synchronous replication enabled,
+	 * the following command could leave the connection stuck.
+	 */
+	sql = "SET synchronous_commit = off;";
+	res = PQexec(conn, sql);
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		pg_fatal("could not set up connection context: %s",
+ PQresultErrorMessage(res));
+	PQclear(res);
+
+	/*
 	 * First create a temporary table, and load it with the blocks that we
 	 * need to fetch.
 	 */



Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-05 Thread Vitaly Burovoy
On 10/5/16, Tom Lane  wrote:
> I wrote:
>> Need a better name for the concept, since evidently this name isn't
>> conveying the idea.
>
> Maybe "creation default" would work better?  Point being it's the
> default value at the time of column creation.

Hmm... Personally, for me the original topic name is good enough.


But what I discovered for myself is that we have pg_attrdef separately
from pg_attribute. Why?
Is it time to join them? For columns with no default it would be only
one bit per row (if we avoid "adsrc", as is recommended), but for a
separate table it is 11 columns with two indexes now...
-- 
Best regards,
Vitaly Burovoy




Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-05 Thread Pantelis Theodosiou
On Thu, Oct 6, 2016 at 12:05 AM, Serge Rielau  wrote:

>
> On Wed, Oct 5, 2016 at 3:58 PM, Tom Lane  wrote:
>
> Andres Freund  writes:
> > On 2016-10-05 15:44:56 -0700, Jeff Janes wrote:
> >>> No, "a second “exist default"" was mentioned, i.e. it is an additional
> >>> column in a system table (pg_attribute) as default column values of
> >>> the "pre-alter" era. It solves changing of the default expression of
> >>> the same column later.
>
> > Don't think that actually solves the issue. The default might be unset
> > for a while, for example. Essentially you'd need to be able to associate
> > arbitrary number of default values with an arbitrary set of rows.
>
> I think it does work, as long as the "exists default" is immutable.
> (For safety, personally, I'd restrict it to be a verbatim constant.)
> The point is that you apply that when you are reading a row that has
> so few columns that it must predate the original ALTER TABLE ADD COLUMN.
> Subsequent ADD/DROP/SET DEFAULT affect the "active default" and hence
> insertions that happen after them, but they don't affect the
> interpretation of old rows. And of course all rows inserted after the
> ADD COLUMN contain explicit values of the column, so their meaning is
> unaffected in any case.
>
>
> You do need two defaults associated with a column to make this work.
> The "exists default" never changes after the column is added. But
> in principle, the "exists default" just replaces the NULL value that
> we implicitly insert now in such cases.
>
> Explained so much better than I could do it :-)
>
> I want to point out as a minor “extension” that there is no need for the
> default to be immutable. It is merely required that the default is evaluate
> at time of ADD COLUMN
> and then we remember the actual value for the exist default, rather than
> the parsed expression as we do for the “current” default.
>
>
> Need a better name for the concept, since evidently this name isn't
> conveying the idea.
>
> By all means. Got anything in mind?
>
>
>
For comparison, SQL Server has a similar feature (in their Enterprise edition
only).
From https://msdn.microsoft.com/en-us/library/ms190273.aspx :

Adding NOT NULL Columns as an Online Operation

Starting with SQL Server 2012 Enterprise Edition, adding a NOT NULL column
with a default value is an online operation when the default value is
a *runtime
constant*. This means that the operation is completed almost
instantaneously regardless of the number of rows in the table. This is
because the existing rows in the table are not updated during the
operation; instead, the default value is stored only in the metadata of the
table and the value is looked up as needed in queries that access these
rows. This behavior is automatic; no additional syntax is required to
implement the online operation beyond the ADD COLUMN syntax. A runtime
constant is an expression that produces the same value at runtime for each
row in the table regardless of its determinism. For example, the constant
expression "My temporary data", or the system function GETUTCDATETIME() are
runtime constants. In contrast, the functions NEWID() or NEWSEQUENTIALID()
are not runtime constants because a unique value is produced for each row
in the table. Adding a NOT NULL column with a default value that is not a
runtime constant is always performed offline and an exclusive (SCH-M) lock
is acquired for the duration of the operation.

While the existing rows reference the value stored in metadata, the default
value is stored on the row for any new rows that are inserted and do not
specify another value for the column. The default value stored in metadata
is moved to an existing row when the row is updated (even if the actual
column is not specified in the UPDATE statement), or if the table or
clustered index is rebuilt.


Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-05 Thread Serge Rielau
On Wed, Oct 5, 2016 at 3:58 PM, Tom Lane  wrote:
> Andres Freund  writes:
> > On 2016-10-05 15:44:56 -0700, Jeff Janes wrote:
> >>> No, "a second “exist default"" was mentioned, i.e. it is an additional
> >>> column in a system table (pg_attribute) as default column values of
> >>> the "pre-alter" era. It solves changing of the default expression of
> >>> the same column later.
>
> > Don't think that actually solves the issue. The default might be unset
> > for a while, for example. Essentially you'd need to be able to associate
> > arbitrary number of default values with an arbitrary set of rows.
>
> I think it does work, as long as the "exists default" is immutable.
> (For safety, personally, I'd restrict it to be a verbatim constant.)
> The point is that you apply that when you are reading a row that has
> so few columns that it must predate the original ALTER TABLE ADD COLUMN.
> Subsequent ADD/DROP/SET DEFAULT affect the "active default" and hence
> insertions that happen after them, but they don't affect the
> interpretation of old rows. And of course all rows inserted after the
> ADD COLUMN contain explicit values of the column, so their meaning is
> unaffected in any case.
>
> You do need two defaults associated with a column to make this work.
> The "exists default" never changes after the column is added. But
> in principle, the "exists default" just replaces the NULL value that
> we implicitly insert now in such cases.

Explained so much better than I could do it :-)

I want to point out as a minor “extension” that there is no need for the
default to be immutable. It is merely required that the default is evaluated
at the time of ADD COLUMN, and then we remember the actual value for the
exist default, rather than the parsed expression as we do for the “current”
default.

> Need a better name for the concept, since evidently this name isn't
> conveying the idea.

By all means. Got anything in mind?

Cheers,
Serge Rielau

Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-05 Thread Tom Lane
I wrote:
> Need a better name for the concept, since evidently this name isn't
> conveying the idea.

Maybe "creation default" would work better?  Point being it's the
default value at the time of column creation.

regards, tom lane




Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-05 Thread Andres Freund
On 2016-10-05 18:58:47 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2016-10-05 15:44:56 -0700, Jeff Janes wrote:
> >>> No, "a second “exist default"" was mentioned, i.e. it is an additional
> >>> column in a system table (pg_attribute) as default column values of
> >>> the "pre-alter" era. It solves changing of the default expression of
> >>> the same column later.
> 
> > Don't think that actually solves the issue. The default might be unset
> > for a while, for example. Essentially you'd need to be able to associate
> > arbitrary number of default values with an arbitrary set of rows.
> 
> I think it does work, as long as the "exists default" is immutable.
> (For safety, personally, I'd restrict it to be a verbatim constant.)
> The point is that you apply that when you are reading a row that has
> so few columns that it must predate the original ALTER TABLE ADD COLUMN.
> Subsequent ADD/DROP/SET DEFAULT affect the "active default" and hence
> insertions that happen after them, but they don't affect the
> interpretation of old rows.  And of course all rows inserted after the
> ADD COLUMN contain explicit values of the column, so their meaning is
> unaffected in any case.

Err, yes. I forgot that altering the default of an existing column
doesn't set the default for existing values. Sorry for the noise.

Greetings,

Andres Freund




Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-05 Thread Vitaly Burovoy
On 10/5/16, Andres Freund  wrote:
> On 2016-10-05 15:44:56 -0700, Jeff Janes wrote:
>> On Wed, Oct 5, 2016 at 3:29 PM, Andres Freund  wrote:
>> > ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
>> > INSERT id = 1;
>> > ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 1;
>> > ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
>> > INSERT id = 2;
>> > ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 2;
>> > ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
>> > INSERT id = 3;
>> > ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 3;
>> >
>> > The result here would be that there's three rows with a default value
>> > for foo that's the same as their id. None of them has that column
>> > present in the row.
>> >
>>
>> My understanding is that all of those would be materialized.
>
> But that'd require a table rewrite, as none of the above INSERTs were
> done when a default was in place.

Since they did not have a default value, those tuples are written
with the actual TupleDesc.natts, where att_isnull for the "withdefault" column
is set (actually the column does not have a default for the inserted tuples
in your case).

> But each has a different "applicable" default value.

No, their values are constructed "from scratch", not fetched from the
heap, so the "pre-alter-add-column" default is not applicable to them.

>> The only
>> default that isn't materialized is the one in effect in the same
>> statement
>> in which that column was added.  Since a column can only be added once,
>> the
>> default in effect at the time the column was added can never change, no
>> matter what you do to the default later on.
>
> DROP DEFAULT pretty much does that, because it allows multiple (set of)
> rows with no value (or a NULL) for a specific column, but with differing
> applicable default values.

DROP DEFAULT is for "post-alter-add-column" tuples; it does not
affect "pre-alter-add-column" ones.

-- 
Best regards,
Vitaly Burovoy




Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-05 Thread Pantelis Theodosiou
On Wed, Oct 5, 2016 at 11:44 PM, Jeff Janes  wrote:

> On Wed, Oct 5, 2016 at 3:29 PM, Andres Freund  wrote:
>
>> On 2016-10-05 15:23:05 -0700, Vitaly Burovoy wrote:
>> > On 10/5/16, Andres Freund  wrote:
>> > > On 2016-10-05 11:58:33 -0700, Serge Rielau wrote:
>> > >> Dear Hackers,
>> > >> I’m working on a patch that expands PG’s ability to add columns to a
>> table
>> > >> without a table rewrite (i.e. at O(1) cost) from the
>> > >> nullable-without-default to a more general case.
>> > >
>> > > If I understand this proposal correctly, altering a column default
>> will
>> > > still have trigger a rewrite unless there's previous default?
>> >
>> > No, "a second “exist default"" was mentioned, i.e. it is an additional
>> > column in a system table (pg_attribute) as default column values of
>> > the "pre-alter" era. It solves changing of the default expression of
>> > the same column later.
>>
>> Don't think that actually solves the issue. The default might be unset
>> for a while, for example. Essentially you'd need to be able to associate
>> arbitrary number of default values with an arbitrary set of rows.
>>
>>
>> ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
>> INSERT id = 1;
>> ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 1;
>> ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
>> INSERT id = 2;
>> ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 2;
>> ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
>> INSERT id = 3;
>> ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 3;
>>
>> The result here would be that there's three rows with a default value
>> for foo that's the same as their id. None of them has that column
>> present in the row.
>>
>
> My understanding is that all of those would be materialized.  The only
> default that isn't materialized is the one in effect in the same statement
> in which that column was added.  Since a column can only be added once, the
> default in effect at the time the column was added can never change, no
> matter what you do to the default later on.
>
> Cheers,
>
> Jeff
>

I understood the same thing: that when the column is added, the "DEFAULT
constant" means two things:

-a- existing rows get the value of that constant (which is not actually
written into the rows, but kept, hidden from the user, in the system tables
and only written into rows that are updated, vacuumed, etc.), and
-b- new rows inserted after the ADD COLUMN will get the DEFAULT constant,
the same way as with a normal column definition.

The -b- part can easily be changed later with an ALTER COLUMN that sets a new
DEFAULT.
The -a- part is never changed, but it can be dropped from the system table
once all rows existing before the ADD COLUMN have been updated.


Pantelis


Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-05 Thread Tom Lane
Andres Freund  writes:
> On 2016-10-05 15:44:56 -0700, Jeff Janes wrote:
>>> No, "a second “exist default"" was mentioned, i.e. it is an additional
>>> column in a system table (pg_attribute) as default column values of
>>> the "pre-alter" era. It solves changing of the default expression of
>>> the same column later.

> Don't think that actually solves the issue. The default might be unset
> for a while, for example. Essentially you'd need to be able to associate
> arbitrary number of default values with an arbitrary set of rows.

I think it does work, as long as the "exists default" is immutable.
(For safety, personally, I'd restrict it to be a verbatim constant.)
The point is that you apply that when you are reading a row that has
so few columns that it must predate the original ALTER TABLE ADD COLUMN.
Subsequent ADD/DROP/SET DEFAULT affect the "active default" and hence
insertions that happen after them, but they don't affect the
interpretation of old rows.  And of course all rows inserted after the
ADD COLUMN contain explicit values of the column, so their meaning is
unaffected in any case.

You do need two defaults associated with a column to make this work.
The "exists default" never changes after the column is added.  But
in principle, the "exists default" just replaces the NULL value that
we implicitly insert now in such cases.
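
Concretely (restating Serge's example from the start of the thread in these
terms):

  CREATE TABLE t (pk int NOT NULL PRIMARY KEY);
  INSERT INTO t VALUES (1), (2), (3);
  ALTER TABLE t ADD COLUMN c1 integer NOT NULL DEFAULT 5;  -- "exists default" = 5, no rewrite
  INSERT INTO t VALUES (4, DEFAULT);                       -- stores 5 in the row
  ALTER TABLE t ALTER COLUMN c1 SET DEFAULT 6;             -- only the active default changes
  INSERT INTO t VALUES (5, DEFAULT);                       -- stores 6 in the row
  SELECT * FROM t ORDER BY pk;                             -- (1,5) (2,5) (3,5) (4,5) (5,6)

Rows 1-3 are still the short pre-ALTER tuples; the 5 they show is filled in
from the stored "exists default", not from anything present in those rows.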

Need a better name for the concept, since evidently this name isn't
conveying the idea.

regards, tom lane




Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-05 Thread Andres Freund
On 2016-10-05 15:44:56 -0700, Jeff Janes wrote:
> On Wed, Oct 5, 2016 at 3:29 PM, Andres Freund  wrote:
> 
> > On 2016-10-05 15:23:05 -0700, Vitaly Burovoy wrote:
> > > On 10/5/16, Andres Freund  wrote:
> > > > On 2016-10-05 11:58:33 -0700, Serge Rielau wrote:
> > > >> Dear Hackers,
> > > >> I’m working on a patch that expands PG’s ability to add columns to a
> > table
> > > >> without a table rewrite (i.e. at O(1) cost) from the
> > > >> nullable-without-default to a more general case.
> > > >
> > > > If I understand this proposal correctly, altering a column default will
> > > > still have trigger a rewrite unless there's previous default?
> > >
> > > No, "a second “exist default"" was mentioned, i.e. it is an additional
> > > column in a system table (pg_attribute) as default column values of
> > > the "pre-alter" era. It solves changing of the default expression of
> > > the same column later.
> >
> > Don't think that actually solves the issue. The default might be unset
> > for a while, for example. Essentially you'd need to be able to associate
> > arbitrary number of default values with an arbitrary set of rows.
> >
> >
> > ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
> > INSERT id = 1;
> > ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 1;
> > ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
> > INSERT id = 2;
> > ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 2;
> > ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
> > INSERT id = 3;
> > ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 3;
> >
> > The result here would be that there's three rows with a default value
> > for foo that's the same as their id. None of them has that column
> > present in the row.
> >
> 
> My understanding is that all of those would be materialized.

But that'd require a table rewrite, as none of the above INSERTs were
done when a default was in place. But each has a different "applicable"
default value.


> The only
> default that isn't materialized is the one in effect in the same statement
> in which that column was added.  Since a column can only be added once, the
> default in effect at the time the column was added can never change, no
> matter what you do to the default later on.

DROP DEFAULT pretty much does that, because it allows multiple (set of)
rows with no value (or a NULL) for a specific column, but with differing
applicable default values.

Andres




Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-05 Thread Jeff Janes
On Wed, Oct 5, 2016 at 3:29 PM, Andres Freund  wrote:

> On 2016-10-05 15:23:05 -0700, Vitaly Burovoy wrote:
> > On 10/5/16, Andres Freund  wrote:
> > > On 2016-10-05 11:58:33 -0700, Serge Rielau wrote:
> > >> Dear Hackers,
> > >> I’m working on a patch that expands PG’s ability to add columns to a
> table
> > >> without a table rewrite (i.e. at O(1) cost) from the
> > >> nullable-without-default to a more general case.
> > >
> > > If I understand this proposal correctly, altering a column default will
> > > still have trigger a rewrite unless there's previous default?
> >
> > No, "a second “exist default"" was mentioned, i.e. it is an additional
> > column in a system table (pg_attribute) as default column values of
> > the "pre-alter" era. It solves changing of the default expression of
> > the same column later.
>
> Don't think that actually solves the issue. The default might be unset
> for a while, for example. Essentially you'd need to be able to associate
> arbitrary number of default values with an arbitrary set of rows.
>
>
> ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
> INSERT id = 1;
> ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 1;
> ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
> INSERT id = 2;
> ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 2;
> ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
> INSERT id = 3;
> ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 3;
>
> The result here would be that there's three rows with a default value
> for foo that's the same as their id. None of them has that column
> present in the row.
>

My understanding is that all of those would be materialized.  The only
default that isn't materialized is the one in effect in the same statement
in which that column was added.  Since a column can only be added once, the
default in effect at the time the column was added can never change, no
matter what you do to the default later on.

Cheers,

Jeff


Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-05 Thread Vitaly Burovoy
On 10/5/16, Vitaly Burovoy  wrote:
> On 10/5/16, Andres Freund  wrote:
>> On 2016-10-05 15:23:05 -0700, Vitaly Burovoy wrote:
>>> On 10/5/16, Andres Freund  wrote:
>>> > On 2016-10-05 11:58:33 -0700, Serge Rielau wrote:
>>> >> Dear Hackers,
>>> >> I’m working on a patch that expands PG’s ability to add columns to a
>>> >> table
>>> >> without a table rewrite (i.e. at O(1) cost) from the
>>> >> nullable-without-default to a more general case.
>>> >
>>> > If I understand this proposal correctly, altering a column default
>>> > will
>>> > still have trigger a rewrite unless there's previous default?
>>>
>>> No, "a second “exist default"" was mentioned, i.e. it is an additional
>>> column in a system table (pg_attribute) as default column values of
>>> the "pre-alter" era. It solves changing of the default expression of
>>> the same column later.
>>
>> Don't think that actually solves the issue. The default might be unset
>> for a while, for example. Essentially you'd need to be able to associate
>> arbitrary number of default values with an arbitrary set of rows.
>>
>> ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
>> INSERT id = 1;
>> ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 1;
>> ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
>> INSERT id = 2;
>> ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 2;
>> ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
>> INSERT id = 3;
>> ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 3;
>>
>> The result here would be that there's three rows with a default value
>> for foo that's the same as their id. None of them has that column
>> present in the row.
>
> I'm sorry, while I was writting "pre-alter" I meant
> "pre-alter-add-column" era (not "pre-alter-set-default"), all later
> default changes "current" default, whereas "pre-alter-add-column" adds
> value if current column number < TupleDesc.natts.
>
> All your DDL are in the "post-alter-add-column" era.

I'm so sorry, I was in a hurry. Of course,
- if current column number < TupleDesc.natts.
+ if current column number > TupleDesc.natts.

-- 
Best regards,
Vitaly Burovoy




Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-05 Thread Vitaly Burovoy
On 10/5/16, Andres Freund  wrote:
> On 2016-10-05 15:23:05 -0700, Vitaly Burovoy wrote:
>> On 10/5/16, Andres Freund  wrote:
>> > On 2016-10-05 11:58:33 -0700, Serge Rielau wrote:
>> >> Dear Hackers,
>> >> I’m working on a patch that expands PG’s ability to add columns to a
>> >> table
>> >> without a table rewrite (i.e. at O(1) cost) from the
>> >> nullable-without-default to a more general case.
>> >
>> > If I understand this proposal correctly, altering a column default will
>> > still have trigger a rewrite unless there's previous default?
>>
>> No, "a second “exist default"" was mentioned, i.e. it is an additional
>> column in a system table (pg_attribute) as default column values of
>> the "pre-alter" era. It solves changing of the default expression of
>> the same column later.
>
> Don't think that actually solves the issue. The default might be unset
> for a while, for example. Essentially you'd need to be able to associate
> arbitrary number of default values with an arbitrary set of rows.
>
> ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
> INSERT id = 1;
> ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 1;
> ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
> INSERT id = 2;
> ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 2;
> ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
> INSERT id = 3;
> ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 3;
>
> The result here would be that there's three rows with a default value
> for foo that's the same as their id. None of them has that column
> present in the row.

I'm sorry, while I was writing "pre-alter" I meant the
"pre-alter-add-column" era (not "pre-alter-set-default"); all later
default changes affect the "current" default, whereas "pre-alter-add-column"
adds the value if the current column number < TupleDesc.natts.

All your DDL is in the "post-alter-add-column" era.

-- 
Best regards,
Vitaly Burovoy




Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-05 Thread Serge Rielau
On Wed, Oct 5, 2016 at 3:23 PM, Vitaly Burovoy  wrote:
> On 10/5/16, Andres Freund  wrote:
> > On 2016-10-05 11:58:33 -0700, Serge Rielau wrote:
> >> Dear Hackers,
> >> I’m working on a patch that expands PG’s ability to add columns to a table
> >> without a table rewrite (i.e. at O(1) cost) from the
> >> nullable-without-default to a more general case.
> >
> > If I understand this proposal correctly, altering a column default will
> > still have trigger a rewrite unless there's previous default?
>
> No, "a second “exist default"" was mentioned, i.e. it is an additional
> column in a system table (pg_attribute) as default column values of
> the "pre-alter" era. It solves changing of the default expression of
> the same column later.

Correct and good guess on pg_attribute. That’s where it’s living in my
proposal.

Cheers,
Serge

Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-05 Thread Andres Freund
On 2016-10-05 15:23:05 -0700, Vitaly Burovoy wrote:
> On 10/5/16, Andres Freund  wrote:
> > On 2016-10-05 11:58:33 -0700, Serge Rielau wrote:
> >> Dear Hackers,
> >> I’m working on a patch that expands PG’s ability to add columns to a table
> >> without a table rewrite (i.e. at O(1) cost) from the
> >> nullable-without-default to a more general case.
> >
> > If I understand this proposal correctly, altering a column default will
> > still have trigger a rewrite unless there's previous default?
>
> No, "a second “exist default"" was mentioned, i.e. it is an additional
> column in a system table (pg_attribute) as default column values of
> the "pre-alter" era. It solves changing of the default expression of
> the same column later.

Don't think that actually solves the issue. The default might be unset
for a while, for example. Essentially you'd need to be able to associate
arbitrary number of default values with an arbitrary set of rows.


ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
INSERT id = 1;
ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 1;
ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
INSERT id = 2;
ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 2;
ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
INSERT id = 3;
ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 3;

The result here would be that there are three rows with a default value
for foo that's the same as their id. None of them has that column
present in the row.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-05 Thread Vitaly Burovoy
On 10/5/16, Andres Freund  wrote:
> On 2016-10-05 11:58:33 -0700, Serge Rielau wrote:
>> Dear Hackers,
>> I’m working on a patch that expands PG’s ability to add columns to a table
>> without a table rewrite (i.e. at O(1) cost) from the
>> nullable-without-default to a more general case.
>
> If I understand this proposal correctly, altering a column default will
> still trigger a rewrite unless there's a previous default?

No, "a second “exist default"" was mentioned, i.e. it is an additional
column in a system table (pg_attribute) as default column values of
the "pre-alter" era. It solves changing of the default expression of
the same column later.

-- 
Best regards,
Vitaly Burovoy


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-05 Thread Andres Freund
On 2016-10-05 11:58:33 -0700, Serge Rielau wrote:
> Dear Hackers,
> I'm working on a patch that expands PG's ability to add columns to a table
> without a table rewrite (i.e. at O(1) cost) from the nullable-without-default
> to a more general case. E.g.
>
> CREATE TABLE T(pk INT NOT NULL PRIMARY KEY);
> INSERT INTO T VALUES (1), (2), (3);
> ALTER TABLE T ADD COLUMN c1 INTEGER NOT NULL DEFAULT 5;
> INSERT INTO T VALUES (4, DEFAULT);
> ALTER TABLE T ALTER COLUMN c1 SET DEFAULT 6;
> INSERT INTO T VALUES (5, DEFAULT);
> SELECT * FROM T ORDER BY pk;
> =>
> (1, 5), (2, 5), (3, 5), (4, 5), (5, 6);
>
> Rows 1-3 have never been updated, yet they know that their value of c1 is 5.
>
> The requirement is driven by large tables for which add column takes too much
> time and/or produces too large a transaction for comfort.
>
> In simplified terms:
> * a second "exist default" is computed and stored in the catalogs at time
>   of AT ADD COLUMN
> * The exist default is cached in the tuple descriptor (e.g. in attrdef)
> * When one of the getAttr or copytuple related routines is invoked, the
>   exist default is filled in instead of simply NULL padding if the tuple is
>   shorter than the requested attribute number.

If I understand this proposal correctly, altering a column default will
still trigger a rewrite unless there's a previous default?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-05 Thread Serge Rielau
On Wed, Oct 5, 2016 at 2:45 PM, Vitaly Burovoy  wrote:
> On 10/5/16, Serge Rielau  wrote:
>> Dear Hackers,
>>
>> I'm working on a patch that expands PG's ability to add columns to a table
>> without a table rewrite (i.e. at O(1) cost) from the
>> nullable-without-default to a more general case. E.g.
> ...
>> Is there an interest in principle in the community for this functionality?
>
> Wow! I think it would be great! It also avoids the huge vacuuming after
> rewriting the table(s).
> Just pay attention to corner cases like indexes, statistics and speed.

Yes, yes, and still analyzing speed.

> But I'd like to see a solution for more important cases like:
> CREATE TABLE t (pk INT NOT NULL PRIMARY KEY);
> INSERT INTO t VALUES (1), (2), (3);
> ALTER TABLE t ADD COLUMN c1 timestamptz NOT NULL DEFAULT 'now';
> SELECT * FROM t ORDER BY pk;
> ALTER TABLE t ADD COLUMN c2 serial;
> SELECT * FROM t ORDER BY pk;
> INSERT INTO t(pk) VALUES (4);
> SELECT * FROM t ORDER BY pk;

By "solution" I think you mean a semantic change from what it is doing today,
which is:
* 'now' is fixed to ALTER TABLE time for all pre-existing rows
* serial will fill in the same value for all pre-existing rows
Having different semantics for those would require a rewrite and probably
different syntax in some form.

This is what my patch does on our PG derivative today:

CREATE TABLE t (pk INT NOT NULL PRIMARY KEY);
CREATE TABLE
postgres=# INSERT INTO t VALUES (1), (2), (3);
INSERT 0 3
postgres=# ALTER TABLE t ADD COLUMN c1 timestamptz NOT NULL DEFAULT 'now';
ALTER TABLE
postgres=# SELECT * FROM t ORDER BY pk;
 pk |              c1
----+-------------------------------
  1 | 2016-10-05 21:47:58.919194+00
  2 | 2016-10-05 21:47:58.919194+00
  3 | 2016-10-05 21:47:58.919194+00
(3 rows)

postgres=# ALTER TABLE t ADD COLUMN c2 serial;
ALTER TABLE
postgres=# SELECT * FROM t ORDER BY pk;
 pk |              c1               | c2
----+-------------------------------+----
  1 | 2016-10-05 21:47:58.919194+00 |  1
  2 | 2016-10-05 21:47:58.919194+00 |  1
  3 | 2016-10-05 21:47:58.919194+00 |  1
(3 rows)

postgres=# INSERT INTO t(pk) VALUES (4);
INSERT 0 1
postgres=# SELECT * FROM t ORDER BY pk;
 pk |              c1               | c2
----+-------------------------------+----
  1 | 2016-10-05 21:47:58.919194+00 |  1
  2 | 2016-10-05 21:47:58.919194+00 |  1
  3 | 2016-10-05 21:47:58.919194+00 |  1
  4 | 2016-10-05 21:47:58.919194+00 |  2
(4 rows)

> P.S.: I really think it is a good idea, just some research is
> necessary to cover the corner cases...

Thanks. This would be my first contribution. I take it I would post a patch
based on a recent PG 9.6 master for review? Or should I compose some sort of
a design document?

Cheers,
Serge Rielau
Salesforce.com

[HACKERS] Switch to unnamed POSIX semaphores as our preferred sema code?

2016-10-05 Thread Tom Lane
I've gotten a bit tired of seeing "could not create semaphores: No space
left on device" failures in the buildfarm, so I looked into whether we
should consider preferring unnamed POSIX semaphores over SysV semaphores.

We've had code for named and unnamed POSIX semaphores in our tree for
a long time, but it's not actually used on any current platform AFAIK.
There are good reasons to avoid the named-semaphore variant: typically
that eats a file descriptor per sema per backend.  However that
complaint doesn't necessarily apply to unnamed semaphores.  Indeed,
it seems that on Linux an unnamed POSIX semaphore is basically a futex,
which eats zero kernel resources; all the state is in userspace.
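
(For anyone who hasn't used the API: a minimal standalone sketch of the
unnamed variant, not taken from our code.  It puts a process-shared sem_t
into an anonymous shared mapping, standing in for our shared memory segment;
compile with "cc -pthread" on Linux.)

#include <semaphore.h>
#include <stdio.h>
#include <sys/mman.h>

int
main(void)
{
    sem_t      *sem = mmap(NULL, sizeof(sem_t), PROT_READ | PROT_WRITE,
                           MAP_SHARED | MAP_ANONYMOUS, -1, 0);

    if (sem == MAP_FAILED)
    {
        perror("mmap");
        return 1;
    }

    /* pshared = 1: usable by all processes sharing the mapping */
    if (sem_init(sem, 1, 1) != 0)
    {
        perror("sem_init");
        return 1;
    }

    sem_wait(sem);              /* roughly what PGSemaphoreLock does */
    sem_post(sem);              /* roughly what PGSemaphoreUnlock does */

    sem_destroy(sem);           /* on Linux, no kernel object left behind */
    return 0;
}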

Although in normal cases the semaphore code paths aren't very heavily
exercised in our code, I was able to get a measurable performance
difference by building with --disable-spinlocks, so that spinlocks are
emulated with semaphores.  On an 8-core RHEL6 machine, "pgbench -S -c 20
-j 20" seems to be about 4% faster with unnamed semaphores than SysV
semaphores.  It'd be good to replicate that test on some higher-end
hardware, but provisionally I'd say unnamed semaphores are faster.

The data structure is bigger: Linux's type sem_t is 32 bytes on 64-bit
machines (16 bytes on 32-bit) whereas we use 8 bytes for SysV semaphores.
But there aren't normally a huge number of semaphores in a cluster, and
anyway this comparison is cheating because it ignores the space taken for
the kernel data structures backing the SysV semaphores.

There was some previous discussion about this in
https://www.postgresql.org/message-id/flat/20160621193412.5792.65085%40wrigleys.postgresql.org
but that thread tailed off without a resolution, partly because it wasn't
the kind of change we'd consider making in late beta.  One thing
I expressed concern about there was whether there are any hidden kernel
resources underlying an unnamed semaphore.  So far as I can tell by
strace'ing sem_init and sem_destroy, there are not, at least on Linux.

Another issue is raised in today's discussion
https://www.postgresql.org/message-id/flat/14947.1475690465%40sss.pgh.pa.us
where it appears that we might need to be more careful about putting
memory barriers into the unnamed-semaphore code (probably because it
might not enter the kernel).  But if that's a bug, we'd want to fix it
anyway, IMO.

So for Linux, I think probably we should switch.

macOS seems not to have unnamed POSIX semaphores, only named ones (the
functions exist, but they always fail with ENOSYS).  However, some
googling suggests that other BSD derivatives do have these primitives, so
somebody ought to do a similar comparison on them to see if switching is a
win.  (The first thread above asserts that it is for FreeBSD, but someone
should recheck using a test case that stresses semaphores more.)

Dunno about other platforms.  sem_init is nominally required by SUS v2,
but it doesn't seem to actually exist everywhere, so I doubt we can drop
SysV altogether.  I'd be inclined to change the default on a platform-
by-platform basis not whole hog.

If anyone wants to test, the main thing you have to do to try this in
the existing code is to add "USE_UNNAMED_POSIX_SEMAPHORES=1" and
"--disable-spinlocks" to your configure arguments.  On Linux you may need
to add -lrt to the backend LIBS list, though on my machine configure is
putting that in already.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-05 Thread Vitaly Burovoy
On 10/5/16, Serge Rielau  wrote:
> Dear Hackers,
>
> I’m working on a patch that expands PG’s ability to add columns to a table
> without a table rewrite (i.e. at O(1) cost) from the
> nullable-without-default to a more general case. E.g.
>
> CREATE TABLE T(pk INT NOT NULL PRIMARY KEY);
> INSERT INTO T VALUES (1), (2), (3);
> ALTER TABLE T ADD COLUMN c1 INTEGER NOT NULL DEFAULT 5;
> INSERT INTO T VALUES (4, DEFAULT);
> ALTER TABLE T ALTER COLUMN c1 SET DEFAULT 6;
> INSERT INTO T VALUES (5, DEFAULT);
> SELECT * FROM T ORDER BY pk;
> =>
> (1, 5),
> (2, 5),
> (3, 5),
> (4, 5),
> (5, 6);
>
> Rows 1-3 have never been updated, yet they know that their values of c1 is
> 5.
>
> The requirement is driven by large tables for which add column takes too
> much time and/or produces too large a transaction for comfort.
>
> In simplified terms:
>
> * a second “exist default” is computed and stored in
> the catalogs at time of AT ADD COLUMN
>
> * The exist default is cached in the tuple descriptor (e.g in attrdef)
>
> * When one of the getAttr or copytuple related routines is invoked
> the exist default is filled in instead of simply NULL padding if the
> tuple is shorter than the requested attribute number.
>
> Is there an interest in principle in the community for this functionality?

Wow! I think it would be great! It also avoids the huge vacuuming after
rewriting the table(s).
Just pay attention to corner cases like indexes, statistics and speed.

But I'd like to see a solution for more important cases like:
CREATE TABLE t (pk INT NOT NULL PRIMARY KEY);
INSERT INTO t VALUES (1), (2), (3);
ALTER TABLE t ADD COLUMN c1 timestamptz NOT NULL DEFAULT 'now';
SELECT * FROM t ORDER BY pk;
ALTER TABLE t ADD COLUMN c2 serial;
SELECT * FROM t ORDER BY pk;
INSERT INTO t(pk) VALUES (4);
SELECT * FROM t ORDER BY pk;

P.S.: I really think it is a good idea, just some research is
necessary to cover the corner cases...

-- 
Best regards,
Vitaly Burovoy


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Stopping logical replication protocol

2016-10-05 Thread Vladimir Gordiychuk
> Vladimir? I'm running out of time to spend on this at least until the next
CF. Think you can make these changes?

Yes, I can. But I think it should be discussed first.

> Terminating COPY BOTH with ERRCODE_QUERY_CANCELED seems fine. If it
> was expecting the error the client can Sync and do whatever it next
> wants to do on the walsender protocol, or bail out nicely.

Do you want to return CopyFail with ERRCODE_QUERY_CANCELED instead of CopyDone
when the client initiates stopping replication?

> Hm, I'm *VERY* doubtful about doing this via yet another callback. Why
> don't we just error out in the prepare write callback?

But what about the scenario where the output plugin collects changes in memory,
applies some transformation, and sends them only inside commit_cb?
It means that OutputPluginPrepareWrite will not execute until the end of the
transaction, and we face too long a delay stopping replication when decoding a
huge transaction.

2016-10-05 13:06 GMT+03:00 Craig Ringer :

> On 5 October 2016 at 05:14, Andres Freund  wrote:
> > Hi,
> >
> > On 2016-09-23 13:01:27 +0800, Craig Ringer wrote:
> >> From f98f2388c57d938ebbe07ccd2dbe02138312858f Mon Sep 17 00:00:00 2001
> >> From: Vladimir Gordiychuk 
> >> Date: Wed, 7 Sep 2016 00:39:18 +0300
> >> Subject: [PATCH 2/4] Client-initiated CopyDone during transaction
> decoding in
> >>  walsender
> >>
> >> The prior patch caused the walsender to react to a client-initiated
> >> CopyDone while it's in the WalSndLoop. That's not enough to let clients
> >> end logical decoding mid-transaction because we loop in
> ReorderBufferCommit
> >> during decoding of a transaction without ever returning to WalSndLoop.
> >>
> >> Allow breaking out of ReorderBufferCommit by detecting that client
> >> input has requested an end to COPY BOTH mode, so clients can abort
> >> decoding mid-transaction.
> >
> > Hm, I'm *VERY* doubtful about doing this via yet another callback. Why
> > don't we just error out in the prepare write callback?
>
> That's sensible.
>
> > I don't think it's a good idea to return a non-error state if a command
> > is interrupted in the middle. We might already have sent a partial
> > transaction or such.   Also compare this to interrupting a query - we
> > don't just returning rows, we error out.
>
> Good point. It's not usually visible to the user because if it's a
> client-initiated cancel the client will generally consume the error,
> but there's still an error generated.
>
> Terminating COPY BOTH with ERRCODE_QUERY_CANCELED seems fine. If it
> was expecting the error the client can Sync and do whatever it next
> wants to do on the walsender protocol, or bail out nicely.
>
> Vladimir? I'm running out of time to spend on this at least until the
> next CF. Think you can make these changes?
>
>
> --
>  Craig Ringer   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>


Re: [HACKERS] Kernel Tainted

2016-10-05 Thread reiner peterke

> On Oct 5, 2016, at 9:43 PM, Tomas Vondra  wrote:
> 
> On 10/05/2016 08:41 PM, reiner peterke wrote:
>> Hi,
>> 
>> We are helping a client test an application On Power8 using Postgres
>> 9.5.4 which has been compiled specifically for the Power.
>> 
>> This is running on sles12sp1  the current kernel is 3.12.49-11
>> 
>> We are getting these kernel warnings associated with the postmaster
>> process.  The application is handling around 15000 TPS.  It appears that
>> one of these messages is generated for each transaction, which fills
>> up the warn.log quite quickly.
>> 
>> I’m trying to understand what is causing the Tainted kernel messages.
>> the warning is at 'WARNING: at ../net/core/dst.c:287’.
>> I’ve found one link that indicates that this is ip6 related.
>> https://brunomgalmeida.wordpress.com/2015/07/23/disable-ipv6-postgres-and-pgbouncer/
>> Is this accurate?  And if these actions resolve the error, is it more of
>> a band-aid than an actual fix?
>> 
> 
> As Andres already pointed out, this is most likely a kernel issue, not a 
> PostgreSQL one. The "tainted" has nothing to do with the cause, it's just a 
> way to inform users whether it's a clean kernel build, or if it includes code 
> not available in vanilla kernels etc. The "X" means there are some 
> SuSe-specific modules loaded, IIRC.
> 
> And yes, it seems IPv6 related, at least judging by the stack trace:
> 
> 0xc16f7d80 (unreliable)
> sk_dst_check+0x174/0x180
> ip6_sk_dst_lookup_flow+0x4c/0x2a0
> udpv6_sendmsg+0x688/0xb20
> inet_sendmsg+0x9c/0x120
> sock_sendmsg+0xec/0x140
> SyS_sendto+0x108/0x150
> SyS_send+0x50/0x70
> SyS_socketcall+0x2a0/0x440
> syscall_exit+0x0/0x7c
> 
> You should probably talk to SuSe or whoever supports that system.
> 
> regards
> 
> -- 
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> 
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
Thanks for the clear information.
I think there are a few kernel upgrades we can apply first, then see if that
fixes the problem.

Reiner

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hash tables in dynamic shared memory

2016-10-05 Thread Thomas Munro
On Thu, Oct 6, 2016 at 12:02 AM, Dilip Kumar  wrote:
> While reviewing , I found that in dht_iterate_begin function, we are
> not initializing
> iterator->last_item_pointer to InvalidDsaPointer;

Fixed, thanks.

-- 
Thomas Munro
http://www.enterprisedb.com


dht-v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Tracking wait event for latches

2016-10-05 Thread Michael Paquier
On Wed, Oct 5, 2016 at 9:25 PM, Robert Haas  wrote:
> On Tue, Oct 4, 2016 at 5:07 PM, Michael Paquier
>  wrote:
>> More seriously, the Windows animals have been complaining about
>> pg_sleep() crashing the system:
>>   SELECT pg_sleep(0.1);
>> ! server closed the connection unexpectedly
>> ! This probably means the server terminated abnormally
>> ! before or while processing the request.
>> ! connection to server was lost
>>
>> And I think that the answer to this crash is in WaitForSingleObject(),
>> where the macro WAIT_TIMEOUT is already defined, so there is an
>> overlap with the new declarations in pgstat.h:
>> https://msdn.microsoft.com/en-us/library/aa450988.aspx
>> This is also generating a bunch of build warnings now that I compile
>> HEAD on Windows. Regression tests are not crashing here, but I am
>> getting a failure in stats.sql and pg_sleep is broken. I swear I
>> tested that at some point and did not see a crash or those warnings...
>> But well what's done is done.
>>
>> It seems to me that a correct answer would be to rename this class ID.
>> But instead I'd suggest to append the prefix PG_* to all the class
>> events like in the attached, that passes make-check, contrib-check,
>> modules-check and builds without warnings on Windows. A more simple
>> fix would be just to rename WAIT_TIMEOUT to something else but
>> appending PG_ looks better in the long term.
>
> Committed.

Thanks.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Small doc fix

2016-10-05 Thread Vitaly Burovoy
Hello, hackers,

I've just noticed an extra word in a sentence in the docs in the
"parallel.sgml".
It seems the sentence was constructed one way and changed later with
the extra word left.

Please, find the fix attached.

-- 
Best regards,
Vitaly Burovoy


pg-docs-fix.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Kernel Tainted

2016-10-05 Thread Tomas Vondra

On 10/05/2016 08:41 PM, reiner peterke wrote:

Hi,

We are helping a client test an application On Power8 using Postgres
9.5.4 which has been compiled specifically for the Power.

This is running on sles12sp1  the current kernel is 3.12.49-11

We are getting these kernel warnings associated with the postmaster
process.  The application is handling around 15000 TPS.  It appears that
one of these messages is generated for each transaction, which fills
up the warn.log quite quickly.

I’m trying to understand what is causing the Tainted kernel messages.
the warning is at 'WARNING: at ../net/core/dst.c:287’.
I’ve found one link that indicates that this is ip6 related.
 
https://brunomgalmeida.wordpress.com/2015/07/23/disable-ipv6-postgres-and-pgbouncer/
Is this accurate?  And if these actions resolve the error, is it more of
a band-aid than an actual fix?



As Andres already pointed out, this is most likely a kernel issue, not a 
PostgreSQL one. The "tainted" has nothing to do with the cause, it's 
just a way to inform users whether it's a clean kernel build, or if it 
includes code not available in vanilla kernels etc. The "X" means there 
are some SuSe-specific modules loaded, IIRC.


And yes, it seems IPv6 related, at least judging by the stack trace:

0xc16f7d80 (unreliable)
sk_dst_check+0x174/0x180
ip6_sk_dst_lookup_flow+0x4c/0x2a0
udpv6_sendmsg+0x688/0xb20
inet_sendmsg+0x9c/0x120
sock_sendmsg+0xec/0x140
SyS_sendto+0x108/0x150
SyS_send+0x50/0x70
SyS_socketcall+0x2a0/0x440
syscall_exit+0x0/0x7c

You should probably talk to SuSe or whoever supports that system.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Our "fallback" atomics implementation doesn't actually work

2016-10-05 Thread Tom Lane
Andres Freund  writes:
> Without yet having analyzed this deeply, could it actually be that the
> reason is that sem_post/wait aren't proper memory barriers?  On a glance
> the symptoms look like values have been modified without proper locks...

Hmm, possible ...

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-05 Thread Serge Rielau
Dear Hackers,
I'm working on a patch that expands PG's ability to add columns to a table
without a table rewrite (i.e. at O(1) cost) from the nullable-without-default
to a more general case. E.g.

CREATE TABLE T(pk INT NOT NULL PRIMARY KEY);
INSERT INTO T VALUES (1), (2), (3);
ALTER TABLE T ADD COLUMN c1 INTEGER NOT NULL DEFAULT 5;
INSERT INTO T VALUES (4, DEFAULT);
ALTER TABLE T ALTER COLUMN c1 SET DEFAULT 6;
INSERT INTO T VALUES (5, DEFAULT);
SELECT * FROM T ORDER BY pk;
=>
(1, 5), (2, 5), (3, 5), (4, 5), (5, 6);

Rows 1-3 have never been updated, yet they know that their value of c1 is 5.
The requirement is driven by large tables for which add column takes too much 
time and/or produces too large a transaction for comfort.
In simplified terms:
* a second "exist default" is computed and stored in the catalogs at time of
  AT ADD COLUMN
* The exist default is cached in the tuple descriptor (e.g. in attrdef)
* When one of the getAttr or copytuple related routines is invoked, the exist
  default is filled in instead of simply NULL padding if the tuple is shorter
  than the requested attribute number (see the sketch below).
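
To make the last point concrete, here is a rough sketch (hypothetical code,
not the actual patch; get_exist_default() is a made-up accessor standing in
for however the cached value ends up being exposed) of the padding step:

#include "postgres.h"
#include "access/tupdesc.h"

/*
 * Sketch only: pad out attributes beyond what the stored tuple contains,
 * using the "exist default" where one was captured at ADD COLUMN time and
 * plain NULLs otherwise (today's behaviour).
 */
static void
fill_missing_attrs(TupleDesc tupleDesc, int stored_natts,
                   Datum *values, bool *isnull)
{
    int         attnum;

    for (attnum = stored_natts; attnum < tupleDesc->natts; attnum++)
    {
        Datum       def;

        /* hypothetical accessor, e.g. backed by pg_attribute/attrdef */
        if (get_exist_default(tupleDesc, attnum, &def))
        {
            values[attnum] = def;
            isnull[attnum] = false;
        }
        else
        {
            values[attnum] = (Datum) 0;
            isnull[attnum] = true;
        }
    }
}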

Is there an interest in principle in the community for this functionality?
Cheers,
Serge Rielau
Salesforce.com

Re: [HACKERS] PostgreSQL - Weak DH group

2016-10-05 Thread Heikki Linnakangas

On 10/05/2016 05:15 PM, Nicolas Guini wrote:

We are working with Postgres 9.3.14 and executing nmap we
found that it is using “weak DH group” (nmap –script ssl-dh-params). Weak =
1024 bits.


Yeah, it seems that we're a bit behind the times on this...


This issue is similar to what this post explains about using weak DH
parameters: http://www.usefuljs.net/2016/09/29/imperfect-forward-secrecy/


The blog post points out that, as counterintuitive as it sounds, the 
SSL_CTX_set_tmp_dh_callback() callback should ignore the keylength 
argument, and always return a DH group with 2048 bits, or stronger. As 
you pointed out, that's not what our callback does.
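
To illustrate the shape of the fix (just a sketch of the recommended
approach, not a patch; it assumes DH parameters were generated beforehand
with "openssl dhparam -out dh2048.pem 2048"):

#include <stdbool.h>
#include <stdio.h>
#include <openssl/dh.h>
#include <openssl/pem.h>
#include <openssl/ssl.h>

static DH *dh2048 = NULL;       /* loaded once at startup */

static bool
load_dh2048(const char *path)
{
    FILE       *fp = fopen(path, "r");

    if (fp == NULL)
        return false;
    dh2048 = PEM_read_DHparams(fp, NULL, NULL, NULL);
    fclose(fp);
    return dh2048 != NULL;
}

/* Deliberately ignores the keylength hint, per the advice above */
static DH *
tmp_dh_cb(SSL *ssl, int is_export, int keylength)
{
    return dh2048;
}

/* ... and somewhere in the SSL setup:
 *      SSL_CTX_set_tmp_dh_callback(context, tmp_dh_cb);
 */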


We should fix this in master, at least. I'm not sure about backporting;
there might be compatibility issues. It seems that at least OpenJDK
(Java) didn't support DH groups larger than 1024 bits until version 8.
That's fairly recent; OpenJDK 8 was released in March 2014.


The ECDHE family of ciphers is not affected, and is preferred over plain
DHE, I believe, so disabling DHE and removing the DH parameter loading
code altogether is one option. Clearly not backportable, though.


Meanwhile, users can work-around this by creating DH parameters with 
something like "openssl dhparam -out $PGDATA/dh1024.pem 2048". Yes, the 
file needs to be called "dh1024.pem", even though the actual key length 
is 2048 bits.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Secure Transport support as OpenSSL alternative on macOS

2016-10-05 Thread Magnus Hagander
On Wed, Oct 5, 2016 at 8:42 PM, Tom Lane  wrote:

> Robert Haas  writes:
> > On Wed, Oct 5, 2016 at 5:36 AM, Daniel Gustafsson 
> wrote:
> >> The main questions raised here are: is it of interest to support
> multiple SSL
> >> libraries given the additional support burden and; is supporting Secure
> >> Transport of any interest or is it a wasted effort to continue towards
> a full
> >> frontend/backend/doc submission?
>
> > I think this is highly worthwhile.  I wish we could interest someone
> > in doing the work for Windows ... but I'm a macOS user myself, so I'll
> > be happy to have you fix my future compile problems for me.
>
> "Future"?  Apple isn't even shipping the OpenSSL headers anymore, and
> I imagine soon no libraries either.  We really have got little choice
> on that platform but to do something with Secure Transport.  I'm glad
> somebody is taking up the task.
>

Sure we do. Windows doesn't ship them either, and yet somehow Postgres
manages to work just fine there, including with openssl support. There's
nothing more magic about MacOS than there is for Windows.

That said, I agree it's good that somebody is picking up the task. I actually
think it would be a lot more useful to get Windows SChannel support (because
it's *much* more of a PITA to get OpenSSL onto Windows than it is to get it
onto macOS) or even more so NSS (because then every platform could use that,
and they have other integrations too). But one important point is that once we
have *two* implementations (openssl + macos) then we will know a lot more
about the correct places for abstractions etc, and adding the third one
is probably going to be easier (but not easy). But let's make sure we keep
in mind that there should be more than just these two implementations when
defining any external interfaces.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Kernel Tainted

2016-10-05 Thread Andres Freund
Hi,

On 2016-10-05 20:41:54 +0200, reiner peterke wrote:
> We are getting these kernel warnings associated with the postmaster process.
> The application is handling around 15000 TPS.  It appears that one of these
> messages is generated for each transaction, which fills up the warn.log
> quite quickly.
> 
> I’m trying to understand what is causing the Tainted kernel
> messages. the warning is at 'WARNING: at ../net/core/dst.c:287’.

Which makes this a kernel, not a postgres issue.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Covering + unique indexes.

2016-10-05 Thread Tom Lane
Robert Haas  writes:
> On Wed, Oct 5, 2016 at 9:04 AM, Amit Kapila  wrote:
>> Okay, but in that case I think we don't need to store including
>> columns in non-leaf pages to get the exact ordering.  As mentioned
>> upthread, we can use truncated scan key to reach to leaf level and
>> then use the complete key to find the exact location to store the key.
>> This is only possible if there exists an opclass for columns that are
>> covered as part of including clause.  So, we can allow "order by" to
>> use index scan only if the columns covered in included clause have
>> opclass for btree.

> But what if there are many pages full of keys that have the same
> values for the non-INCLUDING columns?

I concur with Robert that INCLUDING columns should be just dead weight
as far as the index is concerned.  Even if opclass information is
available for them, it's overcomplication for too little return.  We do
not need three classes of columns in an index.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Kernel Tainted

2016-10-05 Thread reiner peterke
Hi,

We are helping a client test an application On Power8 using Postgres 9.5.4 
which has been compiled specifically for the Power.

This is running on sles12sp1  the current kernel is 3.12.49-11

We are getting these kernel warnings associated with the postmaster process.
The application is handling around 15000 TPS.  It appears that one of these
messages is generated for each transaction, which fills up the warn.log
quite quickly.

I'm trying to understand what is causing the Tainted kernel messages. The
warning is at 'WARNING: at ../net/core/dst.c:287'.
I've found one link that indicates that this is ip6 related.
https://brunomgalmeida.wordpress.com/2015/07/23/disable-ipv6-postgres-and-pgbouncer/

Is this accurate?  And if these actions resolve the error, is it more of a
band-aid than an actual fix?

Any comments are appreciated.

A sample of the error is below.

Reiner

2016-10-05T15:08:50.219292+02:00 PPDLMREB04 kernel: [ cut here 
]
2016-10-05T15:08:50.219335+02:00 PPDLMREB04 kernel: WARNING: at 
../net/core/dst.c:287
2016-10-05T15:08:50.219341+02:00 PPDLMREB04 kernel: Modules linked in: 
af_packet xfs libcrc32c ibmveth(X) rtc_generic btrfs xor raid6_pq 
dm_service_time sr_mod sd_mod cdrom crc_t10dif ibmvfc(X) scsi_transport_fc 
ibmvscsi(X) scsi_transport_srp scsi_tgt dm_multipath scsi_dh_rdac scsi_dh_emc 
scsi_dh_alua scsi_dh dm_mod sg scsi_mod autofs4
2016-10-05T15:08:50.219346+02:00 PPDLMREB04 kernel: Supported: Yes, External
2016-10-05T15:08:50.219350+02:00 PPDLMREB04 kernel: CPU: 28 PID: 113041 Comm: 
postmaster Tainted: G   X 3.12.49-11-default #1
2016-10-05T15:08:50.219355+02:00 PPDLMREB04 kernel: task: c003c31100d0 ti: 
c003c1f08000 task.ti: c003c1f08000
2016-10-05T15:08:50.219362+02:00 PPDLMREB04 kernel: NIP: c05c0bb0 LR: 
c0594bc4 CTR: c06a0ae0
2016-10-05T15:08:50.219415+02:00 PPDLMREB04 kernel: REGS: c003c1f0b630 
TRAP: 0700   Tainted: G   X  (3.12.49-11-default)
2016-10-05T15:08:50.219424+02:00 PPDLMREB04 kernel: MSR: 80029033 
  CR: 24022288  XER: 0016
2016-10-05T15:08:50.219430+02:00 PPDLMREB04 kernel: CFAR: c0594bc0 
SOFTE: 1 
2016-10-05T15:08:50.219438+02:00 PPDLMREB04 kernel: GPR00: c0594bc4 
c003c1f0b8b0 c0e8ff00 c003c35a1980 
2016-10-05T15:08:50.219444+02:00 PPDLMREB04 kernel: GPR04: 0002 
 0001  
2016-10-05T15:08:50.219448+02:00 PPDLMREB04 kernel: GPR08:  
0001  c0710810 
2016-10-05T15:08:50.219452+02:00 PPDLMREB04 kernel: GPR12: c06a0ae0 
c7b2fc00 7fff 003c 
2016-10-05T15:08:50.219457+02:00 PPDLMREB04 kernel: GPR16:  
10735620   
2016-10-05T15:08:50.219462+02:00 PPDLMREB04 kernel: GPR20: c003c1746d80 
0001  03a8 
2016-10-05T15:08:50.219466+02:00 PPDLMREB04 kernel: GPR24: c003c1f0b9f0 
  03a8 
2016-10-05T15:08:50.219470+02:00 PPDLMREB04 kernel: GPR28: 0002 
c003c1746a00  c003c35a1980 
2016-10-05T15:08:50.219475+02:00 PPDLMREB04 kernel: NIP [c05c0bb0] 
dst_release+0x50/0xa0
2016-10-05T15:08:50.219479+02:00 PPDLMREB04 kernel: LR [c0594bc4] 
sk_dst_check+0x174/0x180
2016-10-05T15:08:50.219484+02:00 PPDLMREB04 kernel: Call Trace:
2016-10-05T15:08:50.219489+02:00 PPDLMREB04 kernel: [c003c1f0b8b0] 
[c16f7d80] 0xc16f7d80 (unreliable)
2016-10-05T15:08:50.219495+02:00 PPDLMREB04 kernel: [c003c1f0b8e0] 
[c0594bc4] sk_dst_check+0x174/0x180
2016-10-05T15:08:50.219501+02:00 PPDLMREB04 kernel: [c003c1f0b920] 
[c068d4cc] ip6_sk_dst_lookup_flow+0x4c/0x2a0
2016-10-05T15:08:50.219506+02:00 PPDLMREB04 kernel: [c003c1f0b970] 
[c06b16b8] udpv6_sendmsg+0x688/0xb20
2016-10-05T15:08:50.219511+02:00 PPDLMREB04 kernel: [c003c1f0baf0] 
[c064b30c] inet_sendmsg+0x9c/0x120
2016-10-05T15:08:50.219515+02:00 PPDLMREB04 kernel: [c003c1f0bb40] 
[c058eabc] sock_sendmsg+0xec/0x140
2016-10-05T15:08:50.219519+02:00 PPDLMREB04 kernel: [c003c1f0bc60] 
[c05921a8] SyS_sendto+0x108/0x150
2016-10-05T15:08:50.219524+02:00 PPDLMREB04 kernel: [c003c1f0bd80] 
[c0592240] SyS_send+0x50/0x70
2016-10-05T15:08:50.219530+02:00 PPDLMREB04 kernel: [c003c1f0bdc0] 
[c05933f0] SyS_socketcall+0x2a0/0x440
2016-10-05T15:08:50.219534+02:00 PPDLMREB04 kernel: [c003c1f0be30] 
[c000a17c] syscall_exit+0x0/0x7c
2016-10-05T15:08:50.219541+02:00 PPDLMREB04 kernel: Instruction dump:
2016-10-05T15:08:50.219547+02:00 PPDLMREB04 kernel: 6000 2fbf 419e0038 
395f0080 7c2004ac 7d205028 3129 7d20512d 

Re: [HACKERS] WIP: Secure Transport support as OpenSSL alternative on macOS

2016-10-05 Thread Tom Lane
Robert Haas  writes:
> On Wed, Oct 5, 2016 at 5:36 AM, Daniel Gustafsson  wrote:
>> The main questions raised here are: is it of interest to support multiple SSL
>> libraries given the additional support burden and; is supporting Secure
>> Transport of any interest or is it a wasted effort to continue towards a full
>> frontend/backend/doc submission?

> I think this is highly worthwhile.  I wish we could interest someone
> in doing the work for Windows ... but I'm a macOS user myself, so I'll
> be happy to have you fix my future compile problems for me.

"Future"?  Apple isn't even shipping the OpenSSL headers anymore, and
I imagine soon no libraries either.  We really have got little choice
on that platform but to do something with Secure Transport.  I'm glad
somebody is taking up the task.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Our "fallback" atomics implementation doesn't actually work

2016-10-05 Thread Andres Freund
Hi,

I was able to reproduce it in a read-write workload, instead of the
read-only workload you'd proposed.

On 2016-10-05 14:01:05 -0400, Tom Lane wrote:
> Curiously, I did not see such a hang with regular SysV semaphores.
> That may be just a timing thing, or it may have something to do with
> POSIX semaphores being actually futexes on this platform (so that there
> is state inside the process not outside it).

Without yet having analyzed this deeply, could it actually be that the
reason is that sem_post/wait aren't proper memory barriers?  On a glance
the symptoms look like values have been modified without proper locks...

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Our "fallback" atomics implementation doesn't actually work

2016-10-05 Thread Tom Lane
Andres Freund  writes:
> On 2016-10-05 14:01:05 -0400, Tom Lane wrote:
>> configure USE_UNNAMED_POSIX_SEMAPHORES=1 --disable-cassert 
>> --disable-spinlocks --disable-atomics

> Pretty independent from the complaint at hand, but if I just do that I get
> undefined reference to symbol 'sem_post@@GLIBC_2.2.5'

> I needed to add -pthread -lrt to LDFLAGS to make it work.

Yeah, on my machine man sem_init specifies "Link with -lrt or -pthread".
But I see -lrt getting added into the backend link anyway, presumably
as a result of one of these configure calls:

AC_SEARCH_LIBS(shm_open, rt)
AC_SEARCH_LIBS(shm_unlink, rt)
AC_SEARCH_LIBS(sched_yield, rt)

If we were to try to support USE_UNNAMED_POSIX_SEMAPHORES as default
(which is where I'm thinking about going) we'd have to tweak configure
to check sem_init similarly.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Our "fallback" atomics implementation doesn't actually work

2016-10-05 Thread Tom Lane
Andres Freund  writes:
> On 2016-10-05 14:01:05 -0400, Tom Lane wrote:
>> I think what is happening is that there are circular assumptions that end
>> up trying to implement a spinlock in terms of a spinlock, or otherwise
>> somehow recursively use the process's semaphore.  It's a bit hard to tell
>> though because the atomics code is such an underdocumented rat's nest of
>> #ifdefs.

> I don't think that should be the case, but I'll look into it.  How long
> did it take for you to reproduce the issue?

It hangs up within 10 or 20 seconds for me.  I didn't try hard to get a
census of where, but at least some of the callers are trying to acquire
buffer partition locks.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Our "fallback" atomics implementation doesn't actually work

2016-10-05 Thread Andres Freund
On 2016-10-05 14:01:05 -0400, Tom Lane wrote:
> I was trying to measure whether unnamed POSIX semaphores are any faster
> or slower than the SysV kind.  Plain pgbench is not very helpful for
> determining this, because we've optimized the system to the point that
> you don't hit semop waits all that often.  So I tried this:
> 
> configure USE_UNNAMED_POSIX_SEMAPHORES=1 --disable-cassert 
> --disable-spinlocks --disable-atomics

Pretty independent from the complaint at hand, but if I just do that I get
undefined reference to symbol 'sem_post@@GLIBC_2.2.5'

I needed to add -pthread -lrt to LDFLAGS to make it work.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Move allocation size overflow handling to MemoryContextAllocExtended()?

2016-10-05 Thread Andres Freund
On 2016-10-04 21:40:29 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2016-10-05 09:38:15 +0900, Michael Paquier wrote:
> >> The existing interface of MemoryContextAlloc do not care much about
> >> anything except Size, so I'd just give the responsability to the
> >> caller to do checks like queue != (Size) queue when queue is a uint64
> >> for example.
> 
> > Well, that duplicates the check and error message everywhere.
> 
> It seems like you're on the edge of reinventing calloc(), which is not an
> API that anybody especially likes.

I'm not sure how doing an s/Size/uint64/ in a bunch of APIs does
that. Because that'd allow us to throw an error in a number of useful
cases where we currently can't.

I'm mostly concerned that there's a bunch of cases on 32bit platforms
where size_t is trivially overflowed. And being a bit more defensive
against that seems like a good idea. It took about a minute (10s of that
due to a typo) to find something that looks borked to me:
bool
spi_printtup(TupleTableSlot *slot, DestReceiver *self)
{
    if (tuptable->free == 0)
    {
        /* Double the size of the pointer array */
        tuptable->free = tuptable->alloced;
        tuptable->alloced += tuptable->free;
        tuptable->vals = (HeapTuple *) repalloc_huge(tuptable->vals,
                                      tuptable->alloced * sizeof(HeapTuple));
    }
seems like it could overflow quite easily on a 32bit system.


People don't think about 32bit size_t a whole lot anymore, so I think by
defaulting to 64bit calculations for this kind of thing, we'll probably
prevent a number of future bugs.
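
For instance, a defensive version of the snippet above could do the
arithmetic in 64 bits and check it explicitly before calling into the
allocator (sketch only, slotting into spi_printtup; the error wording is
made up):

    uint64      newalloced = (uint64) tuptable->alloced * 2;

    if (newalloced > MaxAllocHugeSize / sizeof(HeapTuple))
        ereport(ERROR,
                (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
                 errmsg("SPI result set grew too large")));

    tuptable->free = tuptable->alloced;
    tuptable->alloced = newalloced;
    tuptable->vals = (HeapTuple *)
        repalloc_huge(tuptable->vals,
                      (Size) newalloced * sizeof(HeapTuple));

That obviously doesn't fix the general problem, but it shows the kind of
check a uint64-based allocation API could centralize instead of leaving it
to each caller.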

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Our "fallback" atomics implementation doesn't actually work

2016-10-05 Thread Andres Freund
Hi,

On 2016-10-05 14:01:05 -0400, Tom Lane wrote:
> I think what is happening is that there are circular assumptions that end
> up trying to implement a spinlock in terms of a spinlock, or otherwise
> somehow recursively use the process's semaphore.  It's a bit hard to tell
> though because the atomics code is such an underdocumented rat's nest of
> #ifdefs.

I don't think that should be the case, but I'll look into it.  How long
did it take for you to reproduce the issue?

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Our "fallback" atomics implementation doesn't actually work

2016-10-05 Thread Tom Lane
I was trying to measure whether unnamed POSIX semaphores are any faster
or slower than the SysV kind.  Plain pgbench is not very helpful for
determining this, because we've optimized the system to the point that
you don't hit semop waits all that often.  So I tried this:

configure USE_UNNAMED_POSIX_SEMAPHORES=1 --disable-cassert --disable-spinlocks 
--disable-atomics

figuring that that would stress the semop paths nicely.  But what I find
is that the database locks up after a few seconds of running "pgbench -S
-c 20 -j 20 bench" on an 8-core RHEL6 machine.

I think what is happening is that there are circular assumptions that end
up trying to implement a spinlock in terms of a spinlock, or otherwise
somehow recursively use the process's semaphore.  It's a bit hard to tell
though because the atomics code is such an underdocumented rat's nest of
#ifdefs.

Curiously, I did not see such a hang with regular SysV semaphores.
That may be just a timing thing, or it may have something to do with
POSIX semaphores being actually futexes on this platform (so that there
is state inside the process not outside it).

No hang observed without --disable-atomics, either.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cardinality estimation for group by

2016-10-05 Thread Robert Haas
On Tue, Oct 4, 2016 at 8:05 AM, Chenxi Li  wrote:
> How is cardinality estimation for "group by" is done and where is the code
> doing that?

I would suggest that you start by looking at estimate_num_groups() in
src/backend/utils/adt/selfuncs.c.  You might also want to look at
cost_agg() in src/backend/optimizer/path/costsize.c, and also the
places where it gets called.  They don't all estimate the number of
groups in the same way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] asynchronous execution

2016-10-05 Thread Robert Haas
On Tue, Oct 4, 2016 at 7:53 AM, Amit Khandekar  wrote:
> Also, parent pointers are not required in the new design. Thinking of
> parent pointers, now it seems the event won't get bubbled up the tree
> with the new design. But still, I think it's possible to switch over
> to the other asynchronous tree when some node in the current subtree
> is waiting. But I am not sure, will think more on that.

The bubbling-up still happens, because each node that made an async
request gets a callback with the final response - and if it is itself
the recipient of an async request, it can use that callback to respond
to that request in turn.  This version doesn't bubble up through
non-async-aware nodes, but that might be a good thing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Incorrect comment/doc for poll_query_until

2016-10-05 Thread Robert Haas
On Tue, Oct 4, 2016 at 7:25 AM, Daniel Gustafsson  wrote:
> Commit 2a0f89cd717ce6d raised the timeout in poll_query_until() to 180 seconds
> but left the documentation and comments at the previous value of 90 sec.  
> Patch
> attached to update documentation and comment to match reality.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] pgpassfile connection option

2016-10-05 Thread Robert Haas
On Tue, Oct 4, 2016 at 7:42 AM, Julian Markwort
 wrote:
> On 09/26/2016 07:51 PM, Robert Haas wrote:
>> However, they don't have
>> to accept the possibility that arbitrary local files readable by the
>> user ID will be used for authentication and/or disclosed; this patch
>> would force them to accept that risk.
>
> I do agree with you, however we might have to take a look at the parameter
> sslkey's implementation here as well - There are no checks in place to stop
> you from using rogue sslkey parameters.
> I'd like to suggest having both of these parameters behave in a similar
> fashion. In order to achieve safe behaviour, we could implement the use of
> environment variables prohibiting the use of user-located pgpassfiles and
> sslkeys.
> How about PGSECRETSLOCATIONLOCK ?

You could do something like that, I guess, but I think it might be a
good idea to wait and see if anyone else has opinions on (1) the
desirability of the basic feature, (2) the severity of the security
hazard it creates, and (3) your proposed remediation method.

So far I don't see anybody actually endorsing your proposal here,
which might mean that any patch you produce will be rejected on the
grounds that nobody has a clear use case for this feature.

Hey, everybody: chime in here...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Covering + unique indexes.

2016-10-05 Thread Robert Haas
On Wed, Oct 5, 2016 at 9:04 AM, Amit Kapila  wrote:
> Okay, but in that case I think we don't need to store including
> columns in non-leaf pages to get the exact ordering.  As mentioned
> upthread, we can use truncated scan key to reach to leaf level and
> then use the complete key to find the exact location to store the key.
> This is only possible if there exists an opclass for columns that are
> covered as part of including clause.  So, we can allow "order by" to
> use index scan only if the columns covered in included clause have
> opclass for btree.

But what if there are many pages full of keys that have the same
values for the non-INCLUDING columns?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Secure Transport support as OpenSSL alternative on macOS

2016-10-05 Thread Robert Haas
On Wed, Oct 5, 2016 at 5:36 AM, Daniel Gustafsson  wrote:
> The main questions raised here are: is it of interest to support multiple SSL
> libraries given the additional support burden and; is supporting Secure
> Transport of any interest or is it a wasted effort to continue towards a full
> frontend/backend/doc submission?

I think this is highly worthwhile.  I wish we could interest someone
in doing the work for Windows ... but I'm a macOS user myself, so I'll
be happy to have you fix my future compile problems for me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hash Indexes

2016-10-05 Thread Robert Haas
On Tue, Oct 4, 2016 at 12:36 AM, Amit Kapila  wrote:
> I think one way to avoid the risk of deadlock in above scenario is to
> take the cleanup lock conditionally, if we get the cleanup lock then
> we will delete the items as we are doing in patch now, else it will
> just mark the tuples as dead and ensure that it won't try to remove
> tuples that are moved-by-split.  Now, I think the question is how will
> these dead tuples be removed.  We anyway need a separate mechanism to
> clear dead tuples for hash indexes as during scans we are marking the
> tuples as dead if corresponding tuple in heap is dead which are not
> removed later.  This is already taken care in btree code via
> kill_prior_tuple optimization.  So I think clearing of dead tuples can
> be handled by a separate patch.

That seems like it could work.  The hash scan code will need to be
made smart enough to ignore any tuples marked dead, if it isn't
already.  More aggressive cleanup can be left for another patch.

> I have also thought about using page-scan-at-a-time idea which has
> been discussed upthread[1], but I think we can't completely eliminate
> the need to out-wait scans (cleanup lock requirement) for scans that
> are started when split-in-progress or for non-MVCC scans as described
> in that e-mail [1].  We might be able to find some way to solve the
> problem with this approach, but I think it will be slightly
> complicated and much more work is required as compare to previous
> approach.

There are several levels of aggressiveness here with different locking
requirements:

1. Mark line items dead without reorganizing the page.  Needs an
exclusive content lock, no more.  Even a shared content lock may be
OK, as for other opportunistic bit-flipping.
2. Mark line items dead and compact the tuple data.  If a pin is
sufficient to look at tuple data, as it is for the heap, then a
cleanup lock is required here.  But if we always hold a shared content
lock when looking at the tuple data, it might be possible to do this
with just an exclusive content lock.
3. Remove dead line items completely, compacting the tuple data and
the item-pointer array.  Doing this with only an exclusive content
lock certainly needs page-at-a-time mode because otherwise a searcher
that resumes a scan later might resume from the wrong place.  It also
needs the guarantee mentioned for point #2, namely that nobody will be
examining the tuple data without a shared content lock.
4. Squeezing the bucket.  This is probably always going to require a
cleanup lock, because otherwise it's pretty unclear how a concurrent
scan could be made safe.  I suppose the scan could remember every TID
it has seen, somehow detect that a squeeze had happened, and rescan
the whole bucket ignoring TIDs already returned, but that seems to
require the client to use potentially unbounded amounts of memory to
remember already-returned TIDs, plus an as-yet-uninvented mechanism
for detecting that a squeeze has happened.  So this seems like a
dead-end to me.

I think that it is very much worthwhile to reduce the required lock
strength from cleanup-lock to exclusive-lock in as many cases as
possible, but I don't think it will be possible to completely
eliminate the need to take the cleanup lock in some cases.  However,
if we can always take the cleanup lock conditionally and never be in a
situation where it's absolutely required, we should be OK - and even
level (1) gives you that.
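
To sketch what the conditional variant of that could look like (hypothetical
code, not from the patch; hash_cleanup_bucket_page() and item_known_dead()
are made-up placeholders, and the buffer is assumed to be pinned already):

#include "postgres.h"
#include "storage/bufmgr.h"
#include "storage/bufpage.h"

static void
hash_try_cleanup(Relation rel, Buffer buf)
{
    if (ConditionalLockBufferForCleanup(buf))
    {
        /* got the cleanup lock: safe to actually remove dead items */
        hash_cleanup_bucket_page(rel, buf);         /* placeholder */
        LockBuffer(buf, BUFFER_LOCK_UNLOCK);
    }
    else
    {
        /* fall back to level (1): just flag items, reclaim space later */
        Page        page;
        OffsetNumber offnum;

        LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
        page = BufferGetPage(buf);

        for (offnum = FirstOffsetNumber;
             offnum <= PageGetMaxOffsetNumber(page);
             offnum = OffsetNumberNext(offnum))
        {
            if (item_known_dead(rel, page, offnum)) /* placeholder */
                ItemIdMarkDead(PageGetItemId(page, offnum));
        }

        MarkBufferDirtyHint(buf, true);
        LockBuffer(buf, BUFFER_LOCK_UNLOCK);
    }
}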

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench more operators & functions

2016-10-05 Thread Fabien COELHO


Hello Tom,


(2) The benchmark specification requires the client application to get
hold of query results, which are currently discarded by pgbench, so
pgbench does not really comply. I have submitted a patch to do that, see:


I find this completely bogus.  The data is delivered to the client,
ie it's inside the pgbench process.


Hmmm... It is delivered to libpq, and the client discards it... In a 
benchmarking context I think that this is not exactly the same:


For instance, one could implement a library protocol that would tell that
the result is ready without actually transferring the result, getting a
slight benefit by not doing so.


In order to avoid that kind of doubtful optimization, the benchmark 
requires that the final balance is returned to the "test driver", which is 
the client application, and not some subsystem linked to the database. I 
think that this is a pretty sensible requirement.


Moreover, the teller's branch must be used in some cases; I'm not sure how to
do that without getting this value out anyway...



What's the grounds for arguing that something else has to happen to it?


In my view the "ground" is the benchmarking specification which wants to 
ensure that the tester/implementers/vendors cannot cheat to get better 
than deserved performance results...


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Secure Transport support as OpenSSL alternative on macOS

2016-10-05 Thread Daniel Gustafsson
> On 05 Oct 2016, at 16:45, Stephen Frost  wrote:
> 
> Daniel,
> 
> * Daniel Gustafsson (dan...@yesql.se) wrote:
>> The main questions raised here are: is it of interest to support multiple SSL
>> libraries given the additional support burden and; is supporting Secure
>> Transport of any interest or is it a wasted effort to continue towards a full
>> frontend/backend/doc submission?
> 
> We've already started working towards being able to support multiple
> libraries (in particular, Heikki's work to try and contain the OpenSSL
> bits) and I've been looking at what it'll take to add NSS support.

Building on his work, it's quite simple to keep this code contained in its own
files, which is great.

> Generally speaking, I think we'd be better off looking at adding support
> for multi-platform libraries like NSS rather than OS-specific APIs, but
> I do understand how there can be advantages to using the OS API
> (keychain integration is certainly a big part of that, as you mention;
> the same is true for our Kerberos/GSS support).

In general I agree with you, but supporting OS-specific APIs on common
platforms (for some value of "common") which don't ship with a usable OpenSSL
installation can still be of value, I think.

>> On top of the OpenSSL situation in macOS, supporting Secure Transport makes
>> sense if the Keychain can be fully leveraged since that would allow IT
>> organisations to centrally manage and provision Mac clients without passing
>> certificate/key files around.  Should there be any interest in this I intend 
>> to
>> continue on one more library which is more geared towards server environments
>> like nss or  once Secure Transport is done (scratching
>> another itch).
> 
> I'd certainly be more than happy to help with the NSS side of things,
> though I'd want to have both client and server support for it.

I think any library supported should be required to have both client and server
support.  I stopped short after the frontend in this patch to get feedback but
will absolutely implement the backend as well.

NSS is on my list and I intend to get started on that somewhat shortly.

>> The attached show-and-tell patches are:
> 
> Haven't actually looked at the patches yet, but very happy that you're
> also interested in this. :)
> 
>>  0003.  A first little stab at tweaking the docs to mention support for
>>  multiple SSL libraries.  There is a lot left here but the idea is to contain
>>  the library specifics in subsections with the main sections being generic
>>  SSL support information.  On top of tweaking the existing sections I intend
>>  (once done) to write up the steps needed to implement support for an SSL
>>  library, this should perhaps be a README in the tree though?
> 
> Agreed, this would be better as a README rather than in the formal docs,
> since adding support for another SSL library isn't really user-facing.

cheers ./daniel



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] improving on pgrminclude / pgcompinclude ?

2016-10-05 Thread Robert Haas
On Wed, Oct 5, 2016 at 12:28 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> I wonder if we can do better.  The attached patch is the result of a
>> couple of hours of hacking on a "see whether all of our includes
>> compile separately" project.  Directions:
>
> Don't we have more or less that already in src/tools/pginclude/cpluspluscheck?

Hmm, yeah, similar.  I didn't know about that.  But mine tries to get
the Makefile system to run the build, so that you compile it the same
way you are actually compiling the source tree, which is pretty darn
important.  cpluspluscheck fails miserably on my system, because it
doesn't know about the additional include directories I've configured.

>> It strikes me that if we could make this robust enough to include in
>> the buildfarm, that would be good.  And maybe we could then have a
>> pgrminclude tool which is a bit smarter also, and something that
>> anyone can easily run.  I don't think pgrminclude should be part of
>> the buildfarm, but being able to run it every release cycle with
>> relatively minimal pain instead of every 5 years with extensive
>> fallout seems like it would be a win.
>
> I do not think that pgrminclude will *ever* be something that can be
> quasi-automatic or work without close adult supervision.  In the
> first place, whether a given reference is required or not can easily
> vary depending on platform and compile options.  In the second place,
> when there is something removable it's often because there are multiple
> inclusion pathways reaching the same header file, and figuring out
> which one is most appropriate to keep is a judgment call that requires
> some understanding of what we think the system structure ought to be.
>
> In fact, undoing the mess that the last pgrminclude run created was
> sufficiently painful that I'm not sure I want to see it run again ever.
> A few unnecessary includes isn't that big a problem.

No, but a lot of unnecessary includes eventually sucks.

> Having said that, part of the reason for the carnage was that someone
> had previously created a circular #include loop.  It would be good to
> have a tool that would whine about that; or at least fix pgrminclude
> so it would refuse to do anything if it found one.

I think there's basically no hope of making pgrminclude very smart as
long as it is a shell script.  If we can use perl and integrate with
the Makefiles, I think it could be made much smarter.  I agree that
there's not much chance of running it without close human supervision
and the application of human judgement, but I also think that the
current tool is pretty much hopeless.  If we improve the tool so that
it isn't so difficult to run and doesn't do so many silly things, we
can probably reduce the pain considerably.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cache Hash Index meta page.

2016-10-05 Thread Mithun Cy
On Tue, Oct 4, 2016 at 11:55 PM, Jeff Janes  wrote:
> Can you describe your benchmarking machine?  Your benchmarking data went
> up to 128 clients.  But how many cores does the machine have?  Are
> you testing how well it can use the resources it has, or how well it can
> deal with oversubscription of the resources?

It is a power2 machine with 192 hyperthreads.

Architecture:  ppc64le
Byte Order:Little Endian
CPU(s):192
On-line CPU(s) list:   0-191
Thread(s) per core:8
Core(s) per socket:1
Socket(s): 24
NUMA node(s):  4
Model: IBM,8286-42A
L1d cache: 64K
L1i cache: 32K
L2 cache:  512K
L3 cache:  8192K
NUMA node0 CPU(s): 0-47
NUMA node1 CPU(s): 48-95
NUMA node2 CPU(s): 96-143
NUMA node3 CPU(s): 144-191

> Also, was the file supposed to be named .ods?  I didn't find it to be
> openable as an .odc file.

Yes, .ods is right; it is a spreadsheet in ODF.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] improving on pgrminclude / pgcompinclude ?

2016-10-05 Thread Tom Lane
Robert Haas  writes:
> I wonder if we can do better.  The attached patch is the result of a
> couple of hours of hacking on a "see whether all of our includes
> compile separately" project.  Directions:

Don't we have more or less that already in src/tools/pginclude/cpluspluscheck?

> It strikes me that if we could make this robust enough to include in
> the buildfarm, that would be good.  And maybe we could then have a
> pgrminclude tool which is a bit smarter also, and something that
> anyone can easily run.  I don't think pgrminclude should be part of
> the buildfarm, but being able to run it every release cycle with
> relatively minimal pain instead of every 5 years with extensive
> fallout seems like it would be a win.

I do not think that pgrminclude will *ever* be something that can be
quasi-automatic or work without close adult supervision.  In the
first place, whether a given reference is required or not can easily
vary depending on platform and compile options.  In the second place,
when there is something removable it's often because there are multiple
inclusion pathways reaching the same header file, and figuring out
which one is most appropriate to keep is a judgment call that requires
some understanding of what we think the system structure ought to be.

In fact, undoing the mess that the last pgrminclude run created was
sufficiently painful that I'm not sure I want to see it run again ever.
A few unnecessary includes isn't that big a problem.

Having said that, part of the reason for the carnage was that someone
had previously created a circular #include loop.  It would be good to
have a tool that would whine about that; or at least fix pgrminclude
so it would refuse to do anything if it found one.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] improving on pgrminclude / pgcompinclude ?

2016-10-05 Thread Robert Haas
Amit's complaint about unnecessary headers got me thinking about
src/tools/pginclude.  I think the stuff in this directory is
trying to do something useful, because I think extra #include
directives that we don't need are a useful thing to eliminate.
However, in practice the contents are very hard for anybody to use
except maybe Bruce, who wrote it with his own environment in mind.  On
my system, any attempt to rerun pgcompinclude -- which is described as
a prerequisite to running pgrminclude -- produces enormous numbers of
bogus errors and warnings, both because of the attempt to call every
defined macro with made-up arguments and because I configure with
stuff like --with-includes=/opt/local/include which pgcompinclude
knows nothing about.

I wonder if we can do better.  The attached patch is the result of a
couple of hours of hacking on a "see whether all of our includes
compile separately" project.  Directions:

1. configure
2. go to src/tools/pginclude
3. make test-compile-include

It strikes me that if we could make this robust enough to include in
the buildfarm, that would be good.  And maybe we could then have a
pgrminclude tool which is a bit smarter also, and something that
anyone can easily run.  I don't think pgrminclude should be part of
the buildfarm, but being able to run it every release cycle with
relatively minimal pain instead of every 5 years with extensive
fallout seems like it would be a win.

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/tools/pginclude/Makefile b/src/tools/pginclude/Makefile
new file mode 100644
index 000..b7e5e7e
--- /dev/null
+++ b/src/tools/pginclude/Makefile
@@ -0,0 +1,15 @@
+subdir = src/tools/pginclude
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+include $(top_srcdir)/src/backend/common.mk
+
+test-compile-include:
+	rm -rf compile-includes
+	cd $(top_builddir) && $(subdir)/gen_compile_include.pl
+	$(MAKE) -C $(top_builddir)/src/backend generated-headers
+	$(MAKE) -C compile-includes
+	rm -rf compile-includes
+
+clean:
+	rm -rf compile-includes
diff --git a/src/tools/pginclude/gen_compile_include.pl b/src/tools/pginclude/gen_compile_include.pl
new file mode 100755
index 000..24f858d
--- /dev/null
+++ b/src/tools/pginclude/gen_compile_include.pl
@@ -0,0 +1,163 @@
+#!/usr/bin/perl
+
+use strict;
+use warnings;
+
+my $include_prefix = 'src/include';
+my $output_dir = 'src/tools/pginclude/compile-includes';
+
+#
+# Some include files need to be excluded from the "does it compile separately?"
+# test, either because they are platform-dependent stubs or because they aren't
+# actually intended to compile separately.
+#
+my %exclude_include = map { $_ => 1 } qw(
+	access/rmgrlist.h
+	parser/gram.h
+	parser/kwlist.h
+	port/atomics
+	port/cygwin.h
+	port/win32.h
+	port/win32
+	port/win32_msvc
+	regex/regerrs.h
+storage/checksum_impl.h
+	rusagestub.h
+);
+
+my $top_builddir = $output_dir;
+$top_builddir =~ s@[^/]+@..@g;
+
+my @include_dir = ('');
+my @include_file;
+my @frontend_obj;
+my @backend_obj;
+
+#
+# Recurse through src/include and gather a list of all include files, excluding
+# those mentioned in the exclusion list, above.
+#
+while (@include_dir)
+{
+	my $include_dir = shift @include_dir;
+	my $include_path = $include_prefix . ($include_dir eq '' ? ''
+		: '/' . $include_dir);
+
+	opendir(my $directory, $include_path)
+		or die "could not opendir($include_path): $!";
+
+	while (my $entry = readdir($directory))
+	{
+		next if ($entry eq '.' || $entry eq '..');
+
+		$entry = $include_dir eq '' ? $entry : $include_dir . '/' . $entry;
+		next if exists $exclude_include{$entry};
+		my $path = $include_prefix . '/' . $entry;
+		if (-f $path)
+		{
+			push @include_file, $entry;
+		}
+		elsif (-d $path)
+		{
+			push @include_dir, $entry;
+		}
+	}
+}
+
+#
+# Create output directories.
+#
+mkdir($output_dir) or die "mkdir($output_dir): $!";
+mkdir($output_dir . '/backend') or die "mkdir($output_dir . '/backend'): $!";
+mkdir($output_dir . '/frontend') or die "mkdir($output_dir . '/frontend'): $!";
+
+#
+# For each include file, create a C file which includes either postgres.h
+# (for frontend includes) or postgres_fe.h (for backend includes), the
+# relevant header, and nothing else.
+#
+for my $include_file (@include_file)
+{
+	my $dotc_file = $include_file;
+	$dotc_file =~ s/\.h$/.c/;
+	$dotc_file =~ s@/@-@g;
+	my $is_frontend = ($include_file =~ m@^fe_utils/@);
+
+	my $dotc_path = $output_dir . ($is_frontend ? '/frontend/' : '/backend/')
+		. $dotc_file;
+	my $base_include = $is_frontend ? 'postgres_fe.h' : 'postgres.h';
+	$base_include = "postgres-fe.h" if $include_file =~ m@^fe_utils/@;
+
+	write_file($dotc_path, <

Re: [HACKERS] [PATCH] Generic type subscription

2016-10-05 Thread Artur Zakirov

Hello,

On 04.10.2016 11:28, Victor Wagner wrote:


Git complains about whitespace in this version of patch:

$ git apply ../generic_type_subscription_v2.patch
../generic_type_subscription_v2.patch:218: tab in indent.
int first;
../generic_type_subscription_v2.patch:219: tab in indent.
int second;
../generic_type_subscription_v2.patch:225: tab in indent.
SubscriptionExecData *sbsdata = (SubscriptionExecData *) PG_GETARG_POINTER(1);
../generic_type_subscription_v2.patch:226: tab in indent.
Custom  *result = (Custom *) sbsdata->containerSource;
../generic_type_subscription_v2.patch:234: tab in indent.
SubscriptionRef *sbsref = (SubscriptionRef *) PG_GETARG_POINTER(0);
warning: squelched 29 whitespace errors
warning: 34 lines add whitespace errors.

It doesn't prevent me from further testing of patch, but worth noticing.



I agree with Victor. In sgml files, whitespace is usually used instead of 
tabs.


I've looked at your patch to make some review.

"subscription"
--

The term "subscription" is confusing to me. I'm not a native English speaker, 
but I suppose that this term is not related to "subscript". So maybe you 
should use "subscripting" or "container" (because you already use the 
"container" term in the code). For example:


T_SubscriptingRef
SubscriptingRef
deparseSubscriptingRef()
xsubscripting.sgml
CREATE TYPE custom (
   internallength = 4,
   input = custom_in,
   output = custom_out,
   subscripting = custom_subscripting
);
etc.

Subscripting interface
--

In tests I see the query:


+select ('[1, "2", null]'::jsonb)['1'];
+ jsonb
+---
+ "2"
+(1 row)


Here '1' is used as the index of the second item. But from the last discussion 
https://www.postgresql.org/message-id/55D24517.8080609%40agliodbs.com it 
should be a key:



There is one ambiguous case you need to address:

testjson = '{ "a" : { } }'

SET testjson['a']['a1']['1'] = 42

... so in this case, is '1' a key, or the first item of an array?  how
do we determine that? How does the user assign something to an array?


And it should return null. Is this right? Or was this behaviour not accepted 
in the discussion? I didn't find it.



+Datum
+custom_subscription(PG_FUNCTION_ARGS)
+{
+   int op_type = PG_GETARG_INT32(0);
+   FunctionCallInfoData target_fcinfo = get_slice_arguments(fcinfo, 1,
+  fcinfo->nargs);
+
+   if (op_type & SBS_VALIDATION)
+   return custom_subscription_prepare(&target_fcinfo);
+
+   if (op_type & SBS_EXEC)
+   return custom_subscription_evaluate(&target_fcinfo);
+
+   elog(ERROR, "incorrect op_type for subscription function: %d", op_type);
+}


I'm not sure, but maybe we should use two different functions here? For 
example:


Datum
custom_subscripting_validate(PG_FUNCTION_ARGS)
{
}

Datum
custom_subscripting_evaluate(PG_FUNCTION_ARGS)
{
}

CREATE TYPE custom (
   internallength = 8,
   input = custom_in,
   output = custom_out,
   subscripting_validate = custom_subscripting_validate,
   subscripting_evaluate = custom_subscripting_evaluate
);

What do you think?

Documentation
-


+
+
+ 
+  User-defined subscription procedure


There is the extra whitespace before .


+  
+  When you define a new base type, you can also specify a custom procedure
+  to handle subscription expressions. It should contains logic for verification
+  and for extraction or update your data. For instance:


I suppose a  tag is missed after the .


+
+
+ 
+
+  


So  is redundant here.

Code



+static Oid
+findTypeSubscriptionFunction(List *procname, Oid typeOid)
+{
+   Oid argList[1];


Here the typeOid argument is not used. Should it be here?


+ExecEvalSubscriptionRef(SubscriptionRefExprState *sbstate,


I think the local variables in this function have a lot of tabs. It is 
normal if not all variables are aligned on the same line. A lot of tabs 
occur in other functions too. Because of this some lines exceed 80 
characters.



+   if (sbstate->refupperindexpr != NULL)
+   upper = (Datum *) palloc(sbstate->refupperindexpr->length * sizeof(Datum *));
+
+   if (sbstate->reflowerindexpr != NULL)
+   lower = (Datum *) palloc(sbstate->reflowerindexpr->length * sizeof(Datum *));


Could we use palloc() before the "foreach(l, sbstate->refupperindexpr)" 
? I think it could be a little optimization.



-transformArrayType(Oid *arrayType, int32 *arrayTypmod)
+transformArrayType(Oid *containerType, int32 *containerTypmod)


I think the name of the function is confusing. Maybe use 
transformContainerType()?



+JsonbValue *
+JsonbToJsonbValue(Jsonb *jsonb, JsonbValue *val)
+{


In this function we could palloc JsonbValue every time and don't use val 

Re: [HACKERS] pgbench more operators & functions

2016-10-05 Thread Tom Lane
Fabien COELHO  writes:
> [ re TPC-B ]

> (1) The required schema is slightly different: currently the type used 
> for holding balances is not wide enough per the TPC-B standard; this means 
> maybe having an option to do "pgbench -i --standard-tpcb" which would 
> generate the right schema, probably it should just change a few INTEGER to 
> INT8, or maybe use NUMERIC(10). I have not done such a patch yet.

The whole question of the database setup is an interesting one.  If we
were to do anything at all here, I'd want to see not only the table
schemas and initial population, but also the hard-wired "vacuum" logic,
somehow made not so hard-wired.  I have no good ideas about that.  The
SQL commands could possibly be taken from scripts, but what of all the
work that's gone into friendly progress reporting for table loading?

> (2) The benchmark specification requires the client application to get 
> hold of query results, which are currently discarded by pgbench, so 
> pgbench does not really comply. I have submitted a patch to do that, see:

I find this completely bogus.  The data is delivered to the client,
ie it's inside the pgbench process.  What's the grounds for arguing
that something else has to happen to it?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench more operators & functions

2016-10-05 Thread Fabien COELHO


Hello Stephen,


* Fabien COELHO (coe...@cri.ensmp.fr) wrote:

I've got no objection to a more-nearly-TPC-B script as an option.


Good, because adding a "per-spec" tpc-b as an additional builtin
option is one of my intentions, once pgbench is capable of it.


I believe it would be really helpful to have the more-nearly-TPC-B
script written using these new capabilities of pgbench to see that
a) the new capabilities actually allow for this, b) there aren't other
things which are needed, c) to provide an actual use-case for these new
capabilities.


Here are the details:

(1) The required schema is slightly different: currently the type used 
for holding balances is not wide enough per the TPC-B standard; this means 
maybe having an option to do "pgbench -i --standard-tpcb" which would 
generate the right schema, probably it should just change a few INTEGER to 
INT8, or maybe use NUMERIC(10). I have not done such a patch yet.



(2) The benchmark specification requires the client application to get 
hold of query results, which are currently discarded by pgbench, so 
pgbench does not really comply. I have submitted a patch to do that, see:


  https://commitfest.postgresql.org/11/669/


(3) The expression lines, especially with a CASE syntax, are quite long; 
allowing continuations would be nice, so I have submitted a patch to do so:


  https://commitfest.postgresql.org/11/807/


(4) As stated above, conditions are needed. Given the above extensions, 
the following script would work and comply in 2 round trips; it uses two 
tests and two integer comparisons, added by the patch under discussion. 
It also needs to get hold of two results (the teller's branch and the final 
balance).


 -- choose teller id
 \set tid random(1, 10 * :scale)
 -- get an account branch, used if not the same as teller
 \set abid random(1, :scale - 1)
 -- get an account in-branch number
 \set laid random(1, 10)
 -- select amount
 \set delta random(-99, +99)
 -- let us now start the stuff
 BEGIN \;
 -- get the teller's branch
 SELECT bid \into tbid
   FROM pgbench_tellers WHERE tid = :tid ;
 -- if random < 0.85 account is in teller's branch, else in a *different* branch
 \set bid CASE \
WHEN random(0, 99) < 85 THEN :tbid \
ELSE :abid + (:abid < :tbid) \
  END
 \set aid :laid + 10 * :bid
 -- now move the money and return the final balance
 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid \;
 -- Maybe it is better to use "RETURNING aid" in the previous UPDATE?
 SELECT abalance \into abalance
   FROM pgbench_accounts WHERE aid = :aid \;
 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid \;
 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid \;
 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime)
   VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP) \;
 END;


(5) The above "composite" queries (\;) do not work with -M prepared, which 
means (a) either doing independent queries and getting a lot of added 
latency, or, better, (b) making -M prepared work with composite queries, which 
I have not done yet.



Basically the 3 patches under submission allow writing the above working 
TPC-B script, but more patches are needed for the schema to comply and for 
-M prepared to work as well. I would prefer to wait for all pieces to be 
there before adding an example script. I do not think that one large patch 
mixing everything makes much sense from an engineering point of view, even 
if it makes sense from a feature point of view.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autovacuum launcher process launches worker process at high frequency

2016-10-05 Thread Jeff Janes
On Wed, Oct 5, 2016 at 7:28 AM, Masahiko Sawada 
wrote:

> Hi all,
>
> I found a kind of strange behaviour of the autovacuum launcher
> process during XID anti-wraparound vacuum.
>
> Suppose that a database (say test_db) whose age of frozenxid is about
> to reach autovacuum_freeze_max_age has two tables, T1 and T2.
> T1 is very large and is frequently updated, so vacuuming it takes a
> long time.
> T2 is a static and already-frozen table, so vacuum can skip scanning
> the whole table.
> And anti-wraparound vacuum was already executed on the other databases.
>
> Once the age of datfrozenxid of test_db exceeds
> autovacuum_freeze_max_age, the autovacuum launcher launches a worker
> process in order to do anti-wraparound vacuum on test_db.
> A worker process assigned to test_db begins to vacuum T1, which takes a
> long time.
> Meanwhile another worker process is assigned to test_db, completes the
> vacuum of T2 and exits.
>
> After a while, the autovacuum launcher launches a new worker again and
> the worker is assigned to test_db again.
> But that worker exits quickly because there is no table that needs to
> be vacuumed (T1 is being vacuumed by another worker process).
> When a new worker process starts, it sends a SIGUSR2 signal to the
> launcher process to wake it up.
> Although the launcher process executes WaitLatch() after launching a
> new worker, it is woken up and soon launches another new worker
> process again.
>

See also this thread, which was never resolved:

https://www.postgresql.org/message-id/flat/CAMkU%3D1yE4YyCC00W_GcNoOZ4X2qxF7x5DUAR_kMt-Ta%3DYPyFPQ%40mail.gmail.com#CAMkU=1yE4YyCC00W_GcNoOZ4X2qxF7x5DUAR_kMt-Ta=ypy...@mail.gmail.com




> As a result, the launcher process launches new worker processes at an
> extremely high frequency regardless of autovacuum_naptime, which
> increases CPU usage.
>
> Why does the autovacuum worker need to wake up the launcher process
> after it has started?
>
> autovacuum.c:L1604
>  /* wake up the launcher */
> if (AutoVacuumShmem->av_launcherpid != 0)
> kill(AutoVacuumShmem->av_launcherpid, SIGUSR2);
>


I think that that is so that the launcher can launch multiple workers in
quick succession if it has fallen behind schedule. It can't launch them in
a tight loop, because its signals to the postmaster would get merged into
one signal, so it has to wait for one to get mostly set up before launching
the next.

But it doesn't make any real difference to your scenario, as the
short-lived worker will wake the launcher up a few microseconds later
anyway, when it realizes it has no work to do and so exits.

Cheers,

Jeff


Re: [HACKERS] Question / requests.

2016-10-05 Thread Francisco Olarte
On Tue, Oct 4, 2016 at 7:50 PM, Robert Haas  wrote:
> On Mon, Oct 3, 2016 at 5:44 PM, Alvaro Herrera  
> wrote:
...
>> I wonder if the real answer isn't just to disallow -f with parallel
>> vacuuming.
> Seems like we should figure out which catalog tables are needed in
> order to perform a VACUUM, and force those to be done last and one at
> a time.

Is the system catalog a bottleneck for people who have a real use for
parallel vacuum? I mean, to me someone who does this must have a very
big db on big iron. If that does not consist of thousands and
thousands of smallish relations, it will normally be some very big
tables and a much smaller catalog. Then you can vacuum everything but
the system catalogs in parallel and then vacuum those serially. I do not
have such big dbs, but in my modest case system catalogs are very fast
to vacuum. If 99% of the time is spent vacuuming non-system tables, it
does not make much sense to spend effort on speeding up maybe one half
of the system catalog vacuum ( I mean, 99% of the time is non-system,
half of the system part can be done in parallel; with a ten-fold speedup
we go from 99+0.5+0.5 to 9.9+0.5+0.5 = 10.9 with fully serial system
catalogs and to 9.9+0.5+0.05 = 10.45 with a hybrid system vacuum, and
with a 100-fold speedup, in the realm of SF for me, to 0.99+0.5+0.5 = 1.99
and 0.99+0.5+0.05 = 1.54, not that much to be gained )

( Just asking. )

OTOH while I dig into the code I will take a look to see how complex
it would be to build two lists, parallel + serial, and loop on them.
This could be used in a first approach to split on !pg_catalog +
pg_catalog and used as a base for having an explicit list or some
flag in the catalog later.

Francisco Olarte.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_rewind hangs if --source-server is used and syncrep is enabled

2016-10-05 Thread Michael Banck
On Wed, Oct 05, 2016 at 04:39:39PM +0200, Michael Banck wrote:
> if pg_rewind is told to fetch data via a libpq connection
> (--source-server), synchronous replication is enabled and there is only
> one sync standby (pilot error, but still); pg_rewinding the old master
> hangs at the CREATE TEMPORARY TABLE step (CREATE TABLE waiting for
> X/).  At least this happened to one of our clients while
> evaluating pg_rewind.
> 
> To the user, the last thing printed is "need to copy  MB [...]".  If
> the user cancels the pg_rewind command with ^C, the backend keeps
> hanging around even in --dry-run mode.  That won't hurt too much as it
> does not seem to block future pg_rewind runs after synchronous_commit
> has been set to a different value, but looks surprising to me.
> 
> Not sure whether pg_rewind could error out gracefully without hanging in
> this case, 

My colleague Christoph Berg pointed out that pg_rewind could just set
synchronous_commit = local before creating the temporary table, which
indeed works; a proof-of-concept patch is attached.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
diff --git a/src/bin/pg_rewind/libpq_fetch.c b/src/bin/pg_rewind/libpq_fetch.c
index 9239009..a5cb426 100644
--- a/src/bin/pg_rewind/libpq_fetch.c
+++ b/src/bin/pg_rewind/libpq_fetch.c
@@ -406,7 +406,7 @@ libpq_executeFileMap(filemap_t *map)
 	 * First create a temporary table, and load it with the blocks that we
 	 * need to fetch.
 	 */
-	sql = "CREATE TEMPORARY TABLE fetchchunks(path text, begin int4, len int4);";
+	sql = "SET synchronous_commit = local; CREATE TEMPORARY TABLE fetchchunks(path text, begin int4, len int4);";
 	res = PQexec(conn, sql);
 
 	if (PQresultStatus(res) != PGRES_COMMAND_OK)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] PostgreSQL - Weak DH group

2016-10-05 Thread Nicolas Guini
Hello everyone,

A few days ago I sent a mail to the security DL reporting a vulnerability in
how Postgres requests the DH params to be used later for encryption
algorithms. Since there is no problem sharing it with this group, here is
what I sent:

--
 Hi folks,



We are working with Postgres 9.3.14 and, executing nmap, we
found that it is using a "weak DH group" (nmap --script ssl-dh-params). Weak =
1024 bits.

See nmap output (1)



We don’t know if other versions are affected or not. The
environment used is a RHEL 6 x86_6, OpenSSL version 1.0.2i with FIPS module.

This issue is similar to what this post explains about using weak DH
parameters: http://www.usefuljs.net/2016/09/29/imperfect-forward-secrecy/



Following the code, it seems that PostgreSQL has
missed the keyLength OpenSSL parameter, which results in a weak crypto
configuration. Affected code:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/libpq/be-secure-openssl.c;h=8d8f12952a4a4f14a15f8647b96935e13d68fb39;hb=48d50840d53eb62842c0d9b54eab9cd7c9a3a46d



(Thanks to Damian for finding the affected code)
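
For illustration only, here is a rough sketch of the kind of change that
avoids a hard-wired weak group: loading externally generated, larger DH
parameters into the SSL context. The file name and function name are made
up for the example; this is not the actual be-secure-openssl.c code.

#include <stdio.h>
#include <openssl/dh.h>
#include <openssl/pem.h>
#include <openssl/ssl.h>

/*
 * Hypothetical sketch: load DH parameters generated with
 * "openssl dhparam -out dh2048.pem 2048" (or larger) instead of relying
 * on built-in 1024-bit parameters.  Returns 1 on success, 0 on failure.
 */
static int
load_dh_params(SSL_CTX *ctx, const char *filename)
{
    FILE   *fp;
    DH     *dh;

    fp = fopen(filename, "r");
    if (fp == NULL)
        return 0;
    dh = PEM_read_DHparams(fp, NULL, NULL, NULL);
    fclose(fp);
    if (dh == NULL)
        return 0;
    /* SSL_CTX_set_tmp_dh() copies the parameters, so free our copy */
    if (SSL_CTX_set_tmp_dh(ctx, dh) != 1)
    {
        DH_free(dh);
        return 0;
    }
    DH_free(dh);
    return 1;
}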




(1) nmap output:


# nmap --script ssl-dh-params -p 5432 


Starting Nmap 7.25BETA2 ( https://nmap.org )

Nmap scan report for 

Host is up (0.00035s latency).

PORT STATE SERVICE

5432/tcp open  postgresql

| ssl-dh-params:

|   VULNERABLE:

|   Diffie-Hellman Key Exchange Insufficient Group Strength

| State: VULNERABLE

|   Transport Layer Security (TLS) services that use Diffie-Hellman
groups

|   of insufficient strength, especially those using one of a few
commonly

|   shared groups, may be susceptible to passive eavesdropping attacks.

| Check results:

|   WEAK DH GROUP 1

| Cipher Suite: TLS_DHE_RSA_WITH_AES_128_GCM_SHA256

| Modulus Type: Safe prime

| Modulus Source: Unknown/Custom-generated

| Modulus Length: 1024

| Generator Length: 8

| Public Key Length: 1024

| References:

|_  https://weakdh.org



--




Thanks in advance

Nicolas Guini


[HACKERS] pg_rewind hangs if --source-server is used and syncrep is enabled

2016-10-05 Thread Michael Banck
Hi,

if pg_rewind is told to fetch data via a libpq connection
(--source-server), synchronous replication is enabled and there is only
one sync standby (pilot error, but still); pg_rewinding the old master
hangs at the CREATE TEMPORARY TABLE step (CREATE TABLE waiting for
X/).  At least this happened to one of our clients while
evaluating pg_rewind.

To the user, the last thing printed is "need to copy  MB [...]".  If
the user cancels the pg_rewind command with ^C, the backend keeps
hanging around even in --dry-run mode.  That won't hurt too much as it
does not seem to block future pg_rewind runs after synchronous_commit
has been set to a different value, but looks surprising to me.

Not sure whether pg_rewind could error out gracefully without hanging in
this case, but maybe it could/should clean up the query on SIGTERM? And
at the least, maybe this caveat should be documented?
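
For what it's worth, a minimal sketch of what such a cleanup could look like
using libpq's cancel API (the function and variable names are invented for
the example; pg_rewind does not currently do this):

#include <signal.h>
#include <libpq-fe.h>

static PGcancel *volatile cancel_conn = NULL;

/* PQcancel() is documented as safe to call from a signal handler. */
static void
sigterm_handler(int signum)
{
    char    errbuf[256];

    if (cancel_conn != NULL)
        (void) PQcancel(cancel_conn, errbuf, sizeof(errbuf));
}

/* Remember the connection's cancel handle and install the handler. */
static void
setup_cancel_handler(PGconn *conn)
{
    cancel_conn = PQgetCancel(conn);
    signal(SIGTERM, sigterm_handler);
    signal(SIGINT, sigterm_handler);
}

psql does something similar to cancel the current query on Ctrl-C, so the
pattern is well established on the frontend side.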


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Secure Transport support as OpenSSL alternative on macOS

2016-10-05 Thread Stephen Frost
Daniel,

* Daniel Gustafsson (dan...@yesql.se) wrote:
> The main questions raised here are: is it of interest to support multiple SSL
> libraries given the additional support burden and; is supporting Secure
> Transport of any interest or is it a wasted effort to continue towards a full
> frontend/backend/doc submission?

We've already started working towards being able to support multiple
libraries (in particular, Heikki's work to try and contain the OpenSSL
bits) and I've been looking at what it'll take to add NSS support.

Generally speaking, I think we'd be better off looking at adding support
for multi-platform libraries like NSS rather than OS-specific APIs, but
I do understand how there can be advantages to using the OS API
(keychain integration is certainly a big part of that, as you mention;
the same is true for our Kerberos/GSS support).

> On top of the OpenSSL situation in macOS, supporting Secure Transport makes
> sense if the Keychain can be fully leveraged since that would allow IT
> organisations to centrally manage and provision Mac clients without passing
> certificate/key files around.  Should there be any interest in this I intend 
> to
> continue on one more library which is more geared towards server environments
> like nss or  once Secure Transport is done (scratching
> another itch).

I'd certainly be more than happy to help with the NSS side of things,
though I'd want to have both client and server support for it.

> The attached show-and-tell patches are:

Haven't actually looked at the patches yet, but very happy that you're
also interested in this. :)

>   0003.  A first little stab at tweaking the docs to mention support for
>   multiple SSL libraries.  There is a lot left here but the idea is to contain
>   the library specifics in subsections with the main sections being generic
>   SSL support information.  On top of tweaking the existing sections I intend
>   (once done) to write up the steps needed to implement support for an SSL
>   library, this should perhaps be a README in the tree though?

Agreed, this would be better as a README rather than in the formal docs,
since adding support for another SSL library isn't really user-facing.

Thanks!

Stephen


signature.asc
Description: Digital signature


[HACKERS] Autovacuum launcher process launches worker process at high frequency

2016-10-05 Thread Masahiko Sawada
Hi all,

I found a kind of strange behaviour of the autovacuum launcher
process during XID anti-wraparound vacuum.

Suppose that a database (say test_db) whose age of frozenxid is about
to reach autovacuum_freeze_max_age has two tables, T1 and T2.
T1 is very large and is frequently updated, so vacuuming it takes a
long time.
T2 is a static and already-frozen table, so vacuum can skip scanning
the whole table.
And anti-wraparound vacuum was already executed on the other databases.

Once the age of datfrozenxid of test_db exceeds
autovacuum_freeze_max_age, the autovacuum launcher launches a worker
process in order to do anti-wraparound vacuum on test_db.
A worker process assigned to test_db begins to vacuum T1, which takes a
long time.
Meanwhile another worker process is assigned to test_db, completes the
vacuum of T2 and exits.

After a while, the autovacuum launcher launches a new worker again and
the worker is assigned to test_db again.
But that worker exits quickly because there is no table that needs to
be vacuumed (T1 is being vacuumed by another worker process).
When a new worker process starts, it sends a SIGUSR2 signal to the
launcher process to wake it up.
Although the launcher process executes WaitLatch() after launching a
new worker, it is woken up and soon launches another new worker
process again.
As a result, the launcher process launches new worker processes at an
extremely high frequency regardless of autovacuum_naptime, which
increases CPU usage.

Why does the autovacuum worker need to wake up the launcher process after it has started?

autovacuum.c:L1604
 /* wake up the launcher */
if (AutoVacuumShmem->av_launcherpid != 0)
kill(AutoVacuumShmem->av_launcherpid, SIGUSR2);


Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relids in upper relations

2016-10-05 Thread Tom Lane
Ashutosh Bapat  writes:
> While reviewing aggregate pushdown patch [1] we noticed that
> RelOptInfos for upper relations do not have relids set.

Indeed, because they don't correspond to any particular scan relation or
set of scan relations.  I have in mind that in future releases, any
particular upperrel might have its own definition of what the relids
mean --- for example, in UPPERREL_SETOP it would likely be useful for
the relids to represent the set of leaf SELECTs that have been merged
in a particular path.  You could imagine UPPERREL_WINDOW using the
relids to track which window functions have been implemented in a path,
whenever we get around to considering multiple window function orderings.
None of that's there yet.

> create_foreignscan_plan() copies the relids from RelOptInfo into
> ForeignScan::fs_relids. That field is used to identify the RTIs
> covered by the ForeignScan.

That's fine for scan/join paths.  If you want to pay attention to
relids for an upper rel, it's up to you to know what they mean.
I would not counsel assuming that they have anything at all to do
with baserel RTIs.

> We could prevent the crash by passing input_rel->relids to
> fetch_upper_rel() in create_grouping_path() as seen in the attached
> patch.

I think this is fundamentally wrongheaded.  If we go that route,
the only valid relids for any upper path would be the union of all
baserel RTIs, making it rather pointless to carry the value around
at all, and definitely making it impossible to use the field to
distinguish different partial implementations of the same upperrel.

You should look to root->all_baserels, instead, if that's the value
you want when considering an upperrel Path.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Covering + unique indexes.

2016-10-05 Thread Amit Kapila
On Tue, Oct 4, 2016 at 7:50 PM, Robert Haas  wrote:
> On Tue, Oct 4, 2016 at 9:20 AM, Amit Kapila  wrote:
 I'd say that the reason for not using included columns in any
 operations which require comparisons, is that they don't have opclass.
 Let's go back to the example of points.
 This data type don't have any opclass for B-tree, because of fundamental
 reasons.
 And we can not apply _bt_compare() and others to this attribute, so
 we don't include it to scan key.

 create table t (i int, i2 int, p point);
 create index idx1 on (i) including (i2);
 create index idx2 on (i) including (p);
 create index idx3 on (i) including (i2, p);
 create index idx4 on (i) including (p, i2);

 You can keep tuples ordered in idx1, but not for idx2, partially ordered 
 for
 idx3, but not for idx4.
>>>
>>> Yeah, I think we shouldn't go there.  I mean, once you start ordering
>>> by INCLUDING columns, then you're going to need to include them in
>>> leaf pages because otherwise you can't actually guarantee that they
>>> are in the right order.
>>
>> I am not sure what you mean by above, because patch already stores
>> INCLUDING columns in leaf pages.
>
> Sorry, I meant non-leaf pages.
>

Okay, but in that case I think we don't need to store including
columns in non-leaf pages to get the exact ordering.  As mentioned
upthread, we can use truncated scan key to reach to leaf level and
then use the complete key to find the exact location to store the key.
This is only possible if there exists an opclass for columns that are
covered as part of including clause.  So, we can allow "order by" to
use index scan only if the columns covered in included clause have
opclass for btree.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench more operators & functions

2016-10-05 Thread Stephen Frost
Fabien,

* Fabien COELHO (coe...@cri.ensmp.fr) wrote:
> >I've got no objection to a more-nearly-TPC-B script as an option.
> 
> Good, because adding a "per-spec" tpc-b as an additional builtin
> option is one of my intentions, once pgbench is capable of it.

I believe it would be really helpful to have the more-nearly-TPC-B
script written using these new capabilities of pgbench to see that
a) the new capabilities actually allow for this, b) there aren't other
things which are needed, c) to provide an actual use-case for these new
capabilities.

Thanks!

Stephen


signature.asc
Description: Digital signature


[HACKERS] Relids in upper relations

2016-10-05 Thread Ashutosh Bapat
Hi,
While reviewing aggregate pushdown patch [1] we noticed that
RelOptInfos for upper relations do not have relids set.
create_foreignscan_plan() copies the relids from RelOptInfo into
ForeignScan::fs_relids. That field is used to identify the RTIs
covered by the ForeignScan. For example, postgresBeginForeignScan()
uses it to get the user mapping
1281 /*
1282  * Identify which user to do the remote access as.  This should match what
1283  * ExecCheckRTEPerms() does.  In case of a join, use the lowest-numbered
1284  * member RTE as a representative; we would get the same result from any.
1285  */
1286 if (fsplan->scan.scanrelid > 0)
1287 rtindex = fsplan->scan.scanrelid;
1288 else
1289 rtindex = bms_next_member(fsplan->fs_relids, -1);
1290 rte = rt_fetch(rtindex, estate->es_range_table);
1291 userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();

Core functions also use this field to get RTIs covered by ForeignScan
e.g. ExplainPreScanNode.

Since this field is not set in an upper relation, when we push down an
upper operation like grouping/aggregation, fs_relids remains NULL,
causing undesirable results. In case of postgres_fdw aggregate
pushdown, it crashed in rt_fetch().

We could prevent the crash by passing input_rel->relids to
fetch_upper_rel() in create_grouping_path() as seen in the attached
patch. preprocess_minmax_aggregates() needed to be fixed so that both
these functions add paths to the same relation. I am sure, if we go
this route, we will have to fix each call of fetch_upper_rel() in this
manner. But I am not sure if it's intended to be so.

The comment in fetch_upper_rel() says
 847  * An "upper" relation is identified by an UpperRelationKind and a Relids set.
 848  * The meaning of the Relids set is not specified here, and very likely will
 849  * vary for different relation kinds.
which seems to indicate that relids in an upper relation will be
different from those in the lower relations, but it doesn't say how.

We need to fix the usages of fs_relids or the calls to
fetch_upper_rel() for pushdown of upper operations. I am not sure
which. Please suggest, how to move forward with this.


[1] 
https://www.postgresql.org/message-id/flat/CANEvxPokcFi7OfEpi3%3DJ%2Bmvxu%2BPPAG2zXASLEkv5DzDPhSTk6A%40mail.gmail.com#CANEvxPokcFi7OfEpi3=j+mvxu+ppag2zxaslekv5dzdphst...@mail.gmail.com
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


pg_upper_relids.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Tracking wait event for latches

2016-10-05 Thread Robert Haas
On Tue, Oct 4, 2016 at 5:07 PM, Michael Paquier
 wrote:
> More seriously, the Windows animals have been complaining about
> pg_sleep() crashing the system:
>   SELECT pg_sleep(0.1);
> ! server closed the connection unexpectedly
> ! This probably means the server terminated abnormally
> ! before or while processing the request.
> ! connection to server was lost
>
> And I think that the answer to this crash is in WaitForSingleObject(),
> where the macro WAIT_TIMEOUT is already defined, so there is an
> overlap with the new declarations in pgstat.h:
> https://msdn.microsoft.com/en-us/library/aa450988.aspx
> This is also generating a bunch of build warnings now that I compile
> HEAD on Windows. Regression tests are not crashing here, but I am
> getting a failure in stats.sql and pg_sleep is broken. I swear I
> tested that at some point and did not see a crash or those warnings...
> But well what's done is done.
>
> It seems to me that a correct answer would be to rename this class ID.
> But instead I'd suggest to append the prefix PG_* to all the class
> events like in the attached, that passes make-check, contrib-check,
> modules-check and builds without warnings on Windows. A more simple
> fix would be just to rename WAIT_TIMEOUT to something else but
> appending PG_ looks better in the long term.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Tracking wait event for latches

2016-10-05 Thread Robert Haas
On Tue, Oct 4, 2016 at 3:28 PM, Thomas Munro
 wrote:
> Nitpicking: the includes in bgworker.c weren't sorted properly, and
> then this patch added "pgstat.h" in the wrong position.  See attached
> suggestion.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw : altering foreign table not invalidating prepare statement execution plan.

2016-10-05 Thread Etsuro Fujita

On 2016/10/05 14:09, Ashutosh Bapat wrote:

I wrote:

I think
record_foreignscan_plan_dependencies in your patch would be a bit
inefficient because that tracks such dependencies redundantly in the case
where the given ForeignScan has an outer subplan, so I optimized that in the
attached.



+   /*
+* Record dependencies on FDW-related objects.  If an outer subplan
+* exists, that would be done in the processing of its baserels, so skip
+* that.
+*/
I think, we need a bit more explanation here. How about rewording it
as "Record dependencies on FDW-related objects. If an outer subplan
exists, the dependencies would be recorded while processing the
foreign plans for the base foreign relations in the subplan. Hence
skip that here."


Agreed.  I updated the comments as proposed.


+ * Currently, we track exactly the dependencies of plans on relations,
+ * user-defined functions and FDW-related objects.  On relcache invalidation
+ * events or pg_proc syscache invalidation events, we invalidate just those
+ * plans that depend on the particular object being modified.  (Note: this
+ * scheme assumes that any table modification that requires replanning will
+ * generate a relcache inval event.)  We also watch for inval events on
+ * certain other system catalogs, such as pg_namespace; but for them, our
+ * response is just to invalidate all plans.  We expect updates on those
+ * catalogs to be infrequent enough that more-detailed tracking is not worth
+ * the effort.

Just like you have added FDW-related objects in the first sentence,
reference to those needs to be added in the second sentence as well.
Or you want to start the second sentence as "When the relevant caches
are invalidated, we invalidate ..." so that those two sentences will
remain in sync even after additions/deletions to the first sentence.


Good catch!  I like the second one.  How about starting the second 
sentence as "On relcache invalidation events or the relevant syscache 
invalidation events, we invalidate ..."?



I think it would be a bit inefficient to use PlanCacheFuncCallback as the
inval callback function for those caches, because that checks the inval-item
list for the rewritten query tree, but any updates on such catalogs wouldn't
affect the query tree.



I am not sure about that. Right now it seems that only the plans are
affected, but can we say that for all FDWs?


If some writable FDW consulted foreign table/server/FDW options in 
AddForeignUpdateTarget, which adds the extra junk columns for 
UPDATE/DELETE to the targetList of the given query tree, the rewritten 
query tree would also need to be invalidated.  But I don't think such an 
FDW really exists because that routine in a practical FDW wouldn't 
change such columns depending on those options.



So, I added a new callback function for those caches
that is much like PlanCacheFuncCallback but skips checking the list for the
query tree.



I am not sure that the inefficiency because of an extra loop on
plansource->invalItems is a good reason to duplicate most of the code
in PlanCacheFuncCallback(). IMO, maintaining that extra function and
the risk of bugs because of not keeping those two functions in sync
outweigh the small not-so-frequent gain. The name of function may be
changed to be more generic, instead of current one referring Func.


The inefficiency wouldn't be negligible in the case where there are 
large numbers of cached plans.  So, I'd like to introduce a helper 
function that checks the dependency list for the generic plan, to 
eliminate most of the duplication.
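
Just to illustrate the shape of it, something along these lines (the struct
and field names follow plancache.c; this is a sketch, not the code in the
attached patch):

/*
 * Hypothetical helper: invalidate the generic plan if it depends on the
 * object identified by (cacheid, hashvalue); hashvalue == 0 means "any".
 */
static void
CheckGenericPlanDependencies(CachedPlanSource *plansource,
                             int cacheid, uint32 hashvalue)
{
    ListCell   *lc;

    if (plansource->gplan == NULL || !plansource->gplan->is_valid)
        return;

    foreach(lc, plansource->gplan->stmt_list)
    {
        PlannedStmt *plannedstmt = (PlannedStmt *) lfirst(lc);
        ListCell   *lc2;

        if (!IsA(plannedstmt, PlannedStmt))
            continue;           /* ignore utility statements */

        foreach(lc2, plannedstmt->invalItems)
        {
            PlanInvalItem *item = (PlanInvalItem *) lfirst(lc2);

            if (item->cacheId == cacheid &&
                (hashvalue == 0 || item->hashValue == hashvalue))
            {
                /* Invalidate the generic plan only */
                plansource->gplan->is_valid = false;
                return;
            }
        }
    }
}

Both PlanCacheFuncCallback and the new FDW-related callback could then call
this after doing (or skipping) the querytree-level check.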


Attached is the next version of the patch.

Thanks for the comments!

Best regards,
Etsuro Fujita


foreign_plan_cache_inval_v3.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hash tables in dynamic shared memory

2016-10-05 Thread Dilip Kumar
On Wed, Oct 5, 2016 at 3:10 AM, Thomas Munro
 wrote:
> Here is an experimental hash table implementation that uses DSA memory
> so that hash tables can be shared by multiple backends and yet grow
> dynamically.  Development name:  "DHT".

+dht_iterate_begin(dht_hash_table *hash_table,
+  dht_iterator *iterator,
+  bool exclusive)
+{
+ Assert(hash_table->control->magic == DHT_MAGIC);
+
+ iterator->hash_table = hash_table;
+ iterator->exclusive = exclusive;
+ iterator->partition = 0;
+ iterator->bucket = 0;
+ iterator->item = NULL;
+ iterator->locked = false;

While reviewing, I found that in the dht_iterate_begin function we are not
initializing iterator->last_item_pointer to InvalidDsaPointer, and during a
scan this variable will be used in the advance_iterator function. (I think
this will create a problem, even the loss of some tuples?)

+advance_iterator(dht_iterator *iterator, dsa_pointer bucket_head,
+ dsa_pointer *next)
+{
+ dht_hash_table_item *item;
+
+ while (DsaPointerIsValid(bucket_head))
+ {
+ item = dsa_get_address(iterator->hash_table->area,
+   bucket_head);
+ if ((!DsaPointerIsValid(iterator->last_item_pointer) ||
+ bucket_head < iterator->last_item_pointer) &&
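
A minimal sketch of the initialization being suggested for dht_iterate_begin
(assuming InvalidDsaPointer is the right sentinel here, as used elsewhere in
the DSA code):

	iterator->item = NULL;
	iterator->last_item_pointer = InvalidDsaPointer;	/* start before any item */
	iterator->locked = false;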

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2016-10-05 Thread John Gorman
On Tue, Oct 4, 2016 at 10:11 PM, Tomas Vondra wrote:

> For GenSlab the situation is less clear, as there probably are ways to make
> it work, but I'd vote to keep it simple for now, and simply do elog(ERROR)
> in the realloc() methods - both for Slab and GenSlab. The current use case
> (reorderbuffer) does not need that, and it seems like a can of worms to me.


Good plan. Realloc can be added later if there is an actual use case.
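
For instance, a sketch of how the rejection could look (assuming the usual
MemoryContext method signature; the names are illustrative, not from the
patch):

/* Sketch: refuse repalloc() on slab-allocated chunks until a use case exists */
static void *
SlabRealloc(MemoryContext context, void *pointer, Size size)
{
    elog(ERROR, "slab allocator does not support realloc()");
    return NULL;                /* keep compiler quiet */
}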


Re: [HACKERS] Stopping logical replication protocol

2016-10-05 Thread Craig Ringer
On 5 October 2016 at 05:14, Andres Freund  wrote:
> Hi,
>
> On 2016-09-23 13:01:27 +0800, Craig Ringer wrote:
>> From f98f2388c57d938ebbe07ccd2dbe02138312858f Mon Sep 17 00:00:00 2001
>> From: Vladimir Gordiychuk 
>> Date: Wed, 7 Sep 2016 00:39:18 +0300
>> Subject: [PATCH 2/4] Client-initiated CopyDone during transaction decoding in
>>  walsender
>>
>> The prior patch caused the walsender to react to a client-initiated
>> CopyDone while it's in the WalSndLoop. That's not enough to let clients
>> end logical decoding mid-transaction because we loop in ReorderBufferCommit
>> during decoding of a transaction without ever returning to WalSndLoop.
>>
>> Allow breaking out of ReorderBufferCommit by detecting that client
>> input has requested an end to COPY BOTH mode, so clients can abort
>> decoding mid-transaction.
>
> Hm, I'm *VERY* doubtful about doing this via yet another callback. Why
> don't we just error out in the prepare write callback?

That's sensible.

> I don't think it's a good idea to return a non-error state if a command
> is interrupted in the middle. We might already have sent a partial
> transaction or such.   Also compare this to interrupting a query - we
> don't just returning rows, we error out.

Good point. It's not usually visible to the user because if it's a
client-initiated cancel the client will generally consume the error,
but there's still an error generated.

Terminating COPY BOTH with ERRCODE_QUERY_CANCELED seems fine. If it
was expecting the error the client can Sync and do whatever it next
wants to do on the walsender protocol, or bail out nicely.

Vladimir? I'm running out of time to spend on this at least until the
next CF. Think you can make these changes?


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Feature suggestion: ON CONFLICT DO NOTHING RETURNING which returns existing row data

2016-10-05 Thread Pantelis Theodosiou
This can be solved by chaining modifying CTEs.

Something like this (not tested) that can work with multiple inserted rows:


WITH
  vals (bk1, bk2, other_t1_columns, other_t2_columns) AS
( VALUES (bk1val, bk2val, other_t1_values, other_t2_values),
 (bk1val, bk2val, other_t1_values, other_t2_values)
),
  ins_t1 AS
( INSERT INTO t1 (bk1, bk2, other columns)
  SELECT bk1, bk2, other_t1_columns
  FROM vals
  ON CONFLICT (bk1, bk2) DO NOTHING
  RETURNING id, bk1, bk2
)
INSERT INTO t2 (t1_id, other_t2_columns)
SELECT
COALESCE(t1.id, ins_t1.id),
vals.other_t2_columns
FROM vals
  LEFT JOIN ins_t1 ON (vals.bk1, vals.bk2) = (ins_t1.bk1, ins_t1.bk2)
  LEFT JOIN t1 ON (vals.bk1, vals.bk2) = (t1.bk1, t1.bk2)
 ;

On Wed, Oct 5, 2016 at 1:53 AM, Tom Dunstan  wrote:

> Hi all
>
> We recently moved to using 9.5 and were hoping to use the new upsert
> functionality, but unfortunately it doesn’t quite do what we need.
>
> Our setup is something like this:
>
> CREATE TABLE t1 (
>   id BIGSERIAL NOT NULL PRIMARY KEY,
>   bk1 INT,
>   bk2 UUID
>   — other columns
> );
> CREATE UNIQUE INDEX t1_bk ON t1 (bk1, bk2);
>
> CREATE TABLE t2 (
>   t1_id BIGINT NOT NULL REFERENCES t1
>  — other stuff
> );
>
> Data comes in as inserts of one tuple each of t1 and t2. We expect inserts
> to t1 to be heavily duplicated. That is, for stuff coming in we expect a
> large number of rows to have duplicate (bk1, bk2), and we wish to discard
> those, but not discard the t2 tuple - those should always be inserted and
> reference the correct t1 record.
>
> So we currently have an insertion function that does this:
>
> BEGIN
>   INSERT INTO t1 (bk1, bk2, other columns)
>   VALUES (bk1val, bk2val, other values)
>   RETURNING id
>   INTO t1_id;
> EXCEPTION WHEN unique_violation THEN
>   SELECT id
>   FROM t1
>   WHERE bk1 = bk1val AND bk2 = bk2val
>   INTO t1_id;
> END;
>
> INSERT INTO t2(t1_id, other columns) VALUES(t1_id, other values);
>
> We were hoping that we’d be able to do something like this:
>
> INSERT INTO t1 (bk1, bk2, other columns)
>   VALUES (bk1val, bk2val, other values)
>   ON CONFLICT (bk1val, bk2val) DO NOTHING
>   RETURNING id
>   INTO t1_id;
> INSERT INTO t2(t1_id, other columns) VALUES(t1_id, other values);
>
> But unfortunately it seems that the RETURNING clause returns null when
> there’s a conflict, rather than the existing row’s value.
>
> I understand that there is ambiguity if there were multiple rows that were
> in conflict. I think this sort of functionality really only makes sense
> where the conflict target is a unique constraint, so IMO it would make
> sense to only support returning columns in that case.
>
> I imagine that this would be possible to do more efficiently than the
> subsequent query that we are currently doing given that postgres has
> already found the rows in question, in the index at least. I have no idea
> how hard it would actually be to implement though. FWIW my use-case would
> be supported even if this only worked for indexes where the to-be-returned
> columns were stored in the index using Anastasia’s covering + unique index
> patch, when that lands.
>
> Thoughts?
>
> Tom
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


[HACKERS] WIP: Secure Transport support as OpenSSL alternative on macOS

2016-10-05 Thread Daniel Gustafsson
As noted in 77fb2321-9210-4ad4-b7fc-67623a583...@justatheory.com, OpenSSL is
being somewhat dismantled on macOS starting with 10.11 El Capitan.  To scratch
my own itch I decided to take a stab at adding support for the Secure Transport
library.  The idea is to make it as much of a drop-in for the OpenSSL support
as possible, supporting the same file formats etc. (Secure Transport has PKCS12
API calls, while supporting PEM requires using lower-level APIs).

Only the frontend has been implemented so far.  I used a vanilla OpenSSL-
compiled backend for testing (see patch 0002 below).  Currently the CRL tests
fail since Secure Transport doesn’t handle manual CRL lists like OpenSSL, CRL
handling is done automatically by the Keychain upon validation.  One
alternative for supporting this could perhaps be to create a temporary Keychain
for the duration of the connection over which more control can be had (need to
research this further).  For now, no Secure Transport specifics are supported
but the idea is to add support for loading certificates and keys from the
Keychain as well as from the usual locations.  The engine and compression
settings will however be no-ops since there is no support in Secure Transport
for these.

Being an early WIP show-and-tell submission and not something up for code
review (anyone wanting to take a look is of course more than welcome), it is
deliberately rough around the edges.  Except for manual CRL loading, it’s
mostly feature-complete with respect to the OpenSSL frontend.  TODO items include:
removing use of private and deprecated API calls; ensuring release of all
resources; supporting certificates/keys in the Keychain; possibly using
temporary keychains for connections; supporting passphrase entry (there have
been lots of deprecations around that, need to figure out the supported way
forward); tidying up and writing better comments; documentation.

The main questions raised here are: is it of interest to support multiple SSL
libraries, given the additional support burden; and is supporting Secure
Transport of any interest, or would it be wasted effort to continue towards a
full frontend/backend/doc submission?

On top of the OpenSSL situation in macOS, supporting Secure Transport makes
sense if the Keychain can be fully leveraged since that would allow IT
organisations to centrally manage and provision Mac clients without passing
certificate/key files around.  Should there be any interest in this I intend to
continue on one more library which is more geared towards server environments
like nss or  once Secure Transport is done (scratching
another itch).

The attached show-and-tell patches are:

  0001.  The WIP frontend support for Secure Transport together with minimal
  scaffolding around it such as Makefile and autoconf.

  0002.  Adding support to PostgresNode.pm to run the server binaries from a
  defined path making it easier to test a frontend without having to a)
  implement the backend at the same time or b) fiddle with paths manually.
  The SSL tests are the only ones modified.  There is nothing Secure Transport
  specific in this patch so iff the approach is deemed interesting it could be
  considered out of band.  Since the only consumer is testing SSL during
  development it might be too invasive however.

  0003.  A first little stab at tweaking the docs to mention support for
  multiple SSL libraries.  There is a lot left here but the idea is to contain
  the library specifics in subsections with the main sections being generic
  SSL support information.  On top of tweaking the existing sections I intend
  (once done) to write up the steps needed to implement support for an SSL
  library, this should perhaps be a README in the tree though?

Thoughts, comments?

cheers ./daniel



0001-WIP-Add-support-for-macOS-Secure-Transport-SSL-libra.patch
Description: Binary data


0002-Enable-ssl-tests-to-be-using-a-different-set-of-Post.patch
Description: Binary data


0003-A-first-stab-at-updating-the-docs-to-handle-multiple.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dynamic shared memory areas

2016-10-05 Thread Dilip Kumar
On Wed, Oct 5, 2016 at 3:00 AM, Thomas Munro
 wrote:
> Here's a new version that does that.

While testing this patch I found an issue:

+ total_size = DSA_INITIAL_SEGMENT_SIZE;
+ total_pages = total_size / FPM_PAGE_SIZE;
+ metadata_bytes =
+ MAXALIGN(sizeof(dsa_area_control)) +
+ MAXALIGN(sizeof(FreePageManager)) +
+ total_pages * sizeof(dsa_pointer);
+ /* Add padding up to next page boundary. */
+ if (metadata_bytes % FPM_PAGE_SIZE != 0)
+ metadata_bytes += FPM_PAGE_SIZE - (metadata_bytes % FPM_PAGE_SIZE);
+ usable_pages =
+ (total_size - metadata_bytes) / FPM_PAGE_SIZE;

+ segment = dsm_create(total_size, 0);
+ dsm_pin_segment(segment);

The actual problem is that the size of dsa_area_control is bigger than
DSA_INITIAL_SEGMENT_SIZE, but we are allocating a segment of only
DSA_INITIAL_SEGMENT_SIZE.

(gdb) p sizeof(dsa_area_control)
$8 = 67111000
(gdb) p DSA_INITIAL_SEGMENT_SIZE
$9 = 1048576

In dsa-v1 the problem did not exist because DSA_MAX_SEGMENTS was 1024, but in
dsa-v2 I think it is calculated wrongly: with 16777216 segments, the per-segment
bookkeeping inside dsa_area_control alone comes to roughly 64MB (matching the
gdb output above), which can never fit in the 1MB initial segment.

(gdb) p DSA_MAX_SEGMENTS
$10 = 16777216

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench more operators & functions

2016-10-05 Thread Fabien COELHO



I've got no objection to a more-nearly-TPC-B script as an option.


Good, because adding a "per-spec" tpc-b as an additional builtin option is 
one of my intentions, once pgbench is capable of it.



But why do you feel the need to pull the default script out into
a separate file?  Seems to me that just adds maintenance complexity,
and the need for pgbench to have a notion of a library directory,
for little gain.


I tend to agree on this point. Now it could be possible to make pgbench 
look for "builtin" scripts in a predefined location so that they are found 
easily, but I'm not sure there would be significant added value wrt the
current status.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning - another take

2016-10-05 Thread Amit Langote

Hi,

On 2016/10/05 16:57, Rajkumar Raghuwanshi wrote:
> I observed, when creating foreign table with range partition, data is not
> inserting into specified partition range. below are steps to reproduce.
> 
> [ ... ]
> 
> postgres=# INSERT INTO test_range (a) values (5),(25),(15);
> INSERT 0 3
> 
> postgres=# select tableoid::regclass, * from test_range;
>      tableoid      | a
> ------------------+----
>  ft_test_range_p1 |  5
>  ft_test_range_p2 | 15
>  ft_test_range_p3 | 25
> (3 rows)
> 
> --Here ft_test_range_p2 is created for range 20-30 having value 15.

Thanks a lot for testing.

That's a bug.  I found that it is caused by the FDW plans getting paired
with the wrong result relations during ExecInitModifyTable() initialization.

I will include a fix for the same in the patch set that I will be sending
soon in reply to Robert's review comments on patch 0002 [1].

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/CA%2BTgmoY1aQ5iPz0S2GBJw4YUR1Z2Qg5iKUf8YJSo2Ctya4ZmNg%40mail.gmail.com




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Generic type subscription

2016-10-05 Thread Oleg Bartunov
On Wed, Oct 5, 2016 at 6:48 AM, Dmitry Dolgov <9erthali...@gmail.com> wrote:

> On 5 October 2016 at 03:00, Oleg Bartunov  wrote:
>
>>
>> have you ever run 'make check' ?
>>
>> =
>>  53 of 168 tests failed.
>> =
>>
>>
> Sure, how else could I write tests for this feature? But right now on my
> machine
> everything is ok (the same for `make installcheck`):
>
> $ make check
> 
> ===
>  All 168 tests passed.
> ===
>

Oops, something was wrong in my setup :)


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2016-10-05 Thread Tomas Vondra

On 09/05/2016 06:53 AM, Pavan Deolasee wrote:



On Thu, Sep 1, 2016 at 9:44 PM, Bruce Momjian  wrote:

On Thu, Sep  1, 2016 at 02:37:40PM +0530, Pavan Deolasee wrote:
> I like the simplified approach, as long as it doesn't block further
> improvements.
>
>
>
> Yes, the proposed approach is simple yet does not stop us from improving 
things
> further. Moreover it has shown good performance characteristics and I 
believe
> it's a good first step.

Agreed.  This is BIG.  Do you think it can be done for PG 10?


I definitely think so. The patches as submitted are fully functional and
sufficient. Of course, there are XXX and TODOs that I hope to sort out
during the review process. There are also further tests needed to ensure
that the feature does not cause significant regression in the worst
cases. Again something I'm willing to do once I get some feedback on the
broader design and test cases. What I am looking at this stage is to
know if I've missed something important in terms of design or if there
is some show stopper that I overlooked.

Latest patches rebased with current master are attached. I also added a
few more comments to the code. I forgot to give a brief about the
patches, so including that as well.

0001_track_root_lp_v4.patch: This patch uses a free t_infomask2 bit to
track latest tuple in an update chain. The t_ctid.ip_posid is used to
track the root line pointer of the update chain. We do this only in the
latest tuple in the chain because most often that tuple will be updated
and we need to quickly find the root only during update.

0002_warm_updates_v4.patch: This patch implements the core of WARM
logic. During WARM update, we only insert new entries in the indexes
whose key has changed. But instead of indexing the real TID of the new
tuple, we index the root line pointer and then use additional recheck
logic to ensure only correct tuples are returned from such potentially
broken HOT chains. Each index AM must implement a amrecheck method to
support WARM. The patch currently implements this for hash and btree
indexes.



Hi,

I've been looking at the patch over the past few days, running a bunch 
of benchmarks etc. I can confirm the significant speedup, often by more 
than 75% (depending on number of indexes, whether the data set fits into 
RAM, etc.). Similarly for the amount of WAL generated, although that's a 
bit more difficult to evaluate due to full_page_writes.


I'm not going to send detailed results, as that probably does not make 
much sense at this stage of the development - I can repeat the tests 
once the open questions get resolved.


There's a lot of useful and important feedback in the thread(s) so far, 
particularly the descriptions of various failure cases. I think it'd be 
very useful to collect those examples and turn them into regression 
tests - that's something the patch should include anyway.


I don't really have much comments regarding the code, but during the 
testing I noticed a bit strange behavior when updating statistics. 
Consider a table like this:


create table t (a int, b int, c int) with (fillfactor = 10);
insert into t select i, i from generate_series(1,1000) s(i);
create index on t(a);
create index on t(b);

and update:

update t set a = a+1, b=b+1;

which has to update all indexes on the table, but:

select n_tup_upd, n_tup_hot_upd from pg_stat_user_tables

 n_tup_upd | n_tup_hot_upd
-----------+---------------
      1000 |          1000

So it's still counted as "WARM" - does it make sense? I mean, we're 
creating a WARM chain on the page, yet we have to add pointers into all 
indexes (so not really saving anything). Doesn't this waste the one WARM 
update per HOT chain without actually getting anything in return?
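
As a sanity check of that interpretation (not from the patch, just a way to
verify), one can compare the index sizes before and after the UPDATE; if index
entries were really added to both indexes, both of them grow:

  -- run before and after the update and compare
  select indexrelname, pg_relation_size(indexrelid) as index_bytes
  from pg_stat_user_indexes
  where relname = 't';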


The way this is piggy-backed on the current HOT statistics seems a bit 
strange for another reason, although WARM is a relaxed version of HOT. 
Until now, HOT was "all or nothing" - we've either added index entries 
to all indexes or none of them. So the n_tup_hot_upd was fine.


But WARM changes that - it allows adding index entries only to a subset 
of indexes, which means the "per row" n_tup_hot_upd counter is not 
sufficient. When you have a table with 10 indexes, and the counter 
increases by 1, does that mean the update added index tuple to 1 index 
or 9 of them?


So I think we'll need two counters to track WARM - number of index 
tuples we've added, and number of index tuples we've skipped. So 
something like blks_hit and blks_read. I'm not sure whether we should 
replace the n_tup_hot_upd entirely, or keep it for backwards 
compatibility (and to track perfectly HOT updates).
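
Something like this, just to illustrate what I mean (the extra column names are
entirely made up, not a proposal for the actual view definition):

  select n_tup_upd,
         n_tup_hot_upd,       -- updates that added no index entries at all
         n_idx_tup_added,     -- hypothetical: index entries added by WARM updates
         n_idx_tup_skipped    -- hypothetical: index entries WARM avoided adding
  from pg_stat_user_tables
  where relname = 't';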


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-10-05 Thread Amit Kapila
On Wed, Oct 5, 2016 at 12:05 PM, Tomas Vondra
 wrote:
> Hi,
>
> After collecting a lot more results from multiple kernel versions, I can
> confirm that I see a significant improvement with 128 and 192 clients,
> roughly by 30%:
>
>                        64     128    192
>  -----------------------------------------
>  master             62482  43181  50985
>  granular-locking   61701  59611  47483
>  no-content-lock     62650  59819  47895
>  group-update        63702  64758  62596
>
> But I only see this with Dilip's workload, and only with pre-4.3.0 kernels
> (the results above are from kernel 3.19).
>

That appears positive.

> With 4.5.5, results for the same benchmark look like this:
>
>                        64     128    192
>  -----------------------------------------
>  master             35693  39822  42151
>  granular-locking   35370  39409  41353
>  no-content-lock     36201  39848  42407
>  group-update        35697  39893  42667
>
> That seems like a fairly bad regression in kernel, although I have not
> identified the feature/commit causing it (and it's also possible the issue
> lies somewhere else, of course).
>
> With regular pgbench, I see no improvement on any kernel version. For
> example on 3.19 the results look like this:
>
>                        64     128    192
>  -----------------------------------------
>  master             54661  61014  59484
>  granular-locking   55904  62481  60711
>  no-content-lock     56182  62442  61234
>  group-update        55019  61587  60485
>

Are the above results with synchronous_commit=off?

> I haven't done much more testing (e.g. with -N to eliminate collisions on
> branches) yet, let's see if it changes anything.
>

Yeah, let us see how it behaves with -N.  Also, I think we could try
at a higher scale factor?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning - another take

2016-10-05 Thread Rajkumar Raghuwanshi
On Tue, Oct 4, 2016 at 1:32 PM, Amit Langote 
wrote:

Attached updated patches.
>
> Thanks,
> Amit
>

Hi,

I observed that when creating a foreign table as a range partition, data is not
inserted into the specified partition range. Below are steps to reproduce.

CREATE EXTENSION postgres_fdw;
CREATE SERVER pwj_server FOREIGN DATA WRAPPER postgres_fdw OPTIONS (dbname
'postgres', port '5432',use_remote_estimate 'true');
CREATE USER MAPPING FOR PUBLIC SERVER pwj_server;

CREATE TABLE test_range (a int) PARTITION BY RANGE(a);

CREATE TABLE test_range_p1 (a int);
CREATE FOREIGN TABLE ft_test_range_p1 PARTITION OF test_range FOR VALUES
START (1) END (10) SERVER pwj_server OPTIONS (TABLE_NAME 'test_range_p1');

CREATE TABLE test_range_p2 (a int);
CREATE FOREIGN TABLE ft_test_range_p2 PARTITION OF test_range FOR VALUES
START (20) END (30) SERVER pwj_server OPTIONS (TABLE_NAME 'test_range_p2');

CREATE TABLE test_range_p3 (a int);
CREATE FOREIGN TABLE ft_test_range_p3 PARTITION OF test_range FOR VALUES
START (10) END (20) SERVER pwj_server OPTIONS (TABLE_NAME 'test_range_p3');

postgres=# INSERT INTO test_range (a) values (5),(25),(15);
INSERT 0 3

postgres=# select tableoid::regclass, * from test_range;
     tableoid      | a
------------------+----
 ft_test_range_p1 |  5
 ft_test_range_p2 | 15
 ft_test_range_p3 | 25
(3 rows)

--Here ft_test_range_p2 is created for range 20-30 having value 15.

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-10-05 Thread Tomas Vondra

Hi,

After collecting a lot more results from multiple kernel versions, I can 
confirm that I see a significant improvement with 128 and 192 clients, 
roughly by 30%:


                       64     128    192
  -----------------------------------------
  master             62482  43181  50985
  granular-locking   61701  59611  47483
  no-content-lock    62650  59819  47895
  group-update       63702  64758  62596

But I only see this with Dilip's workload, and only with pre-4.3.0 
kernels (the results above are from kernel 3.19).


With 4.5.5, results for the same benchmark look like this:

                       64     128    192
  -----------------------------------------
  master             35693  39822  42151
  granular-locking   35370  39409  41353
  no-content-lock    36201  39848  42407
  group-update       35697  39893  42667

That seems like a fairly bad regression in the kernel, although I have not
identified the feature/commit causing it (and it's also possible the 
issue lies somewhere else, of course).


With regular pgbench, I see no improvement on any kernel version. For 
example on 3.19 the results look like this:


                       64     128    192
  -----------------------------------------
  master             54661  61014  59484
  granular-locking   55904  62481  60711
  no-content-lock    56182  62442  61234
  group-update       55019  61587  60485

I haven't done much more testing (e.g. with -N to eliminate collisions 
on branches) yet, let's see if it changes anything.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-10-05 Thread Michael Paquier
(Squashing replies)

On Fri, Sep 30, 2016 at 6:13 PM, Michael Paquier
 wrote:
> On Fri, Sep 30, 2016 at 2:51 PM, Michael Paquier
>  wrote:
>> On Fri, Sep 30, 2016 at 2:05 PM, Kyotaro HORIGUCHI
>>  wrote:
>>> At Fri, 30 Sep 2016 14:00:15 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
>>>  wrote in 
>>> <20160930.140015.150178454.horiguchi.kyot...@lab.ntt.co.jp>
 I don't see no problem in setting progressAt in XLogInsertRecord.
 But I doubt GetProgressRecPtr is harmful, especially when
>>>
>>> But I suspect that GetProgressRecPtr could be harmful.
>>
>> Well, you can maximize its effects by doing NUM_XLOGINSERT_LOCKS ==
>> nproc and reducing checkpoint_timeout. That's what I did but..
>
> Note: I am moving this patch to next CF.

And I am back on it more seriously... And I am taking back what I said upthread.

I looked at the v12 that Horiguchi-san has written, and that seems
correct to me. So I have squashed everything into a single patch,
including the log entry that gets logged with log_checkpoints. Testing
with archive_timeout set to 10s, checkpoint_timeout set to 30s, and sometimes
triggering manual activity with CREATE TABLE/whatever and manual
pg_switch_xlog(), I am able to see that checkpoints are correctly
skipped or generated.
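
For reference, the test setup boils down to roughly this (just a sketch of what
is described above; the table is only an example):

  -- postgresql.conf:
  --   archive_timeout    = 10s
  --   checkpoint_timeout = 30s
  --   log_checkpoints    = on

  -- generate some activity, then force a segment switch and watch the log
  CREATE TABLE checkpoint_test (x int);
  INSERT INTO checkpoint_test VALUES (1);
  SELECT pg_switch_xlog();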

There was as well a compilation error with ereport(). Not sure how I
missed that... Likely too many patches handled these days.

I have also updated the description of archive_timeout saying that increasing
checkpoint_timeout would reduce unnecessary checkpoints on an idle
system. With this patch, checkpoints are just skipped on an idle system,
as they should be.

How does that look?
-- 
Michael


hs-checkpoints-v14.patch
Description: application/download

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] strange case of kernel performance regression (4.3.x and newer)

2016-10-05 Thread Tomas Vondra

Hi,

Over the past couple of days I've been doing a bit of benchmarking for 
the "group clog" patch [1], and I've ran into what I suspect might be a 
fairly serious performance regression on newer kernels (essentially 
4.3.0 and newer). I got to a point where I need help with further 
investigation, testing on other systems etc.


The workload tested in the patch [1] is quite simple - a transaction 
with 3 x SELECT FOR UPDATE queries and 2 x SAVEPOINT on unlogged tables. 
The results (average tps from 5 x 5 minute runs, for 32 and 64 clients) 
on multiple kernels look like this:


 kernel      32     64
 ----------------------
 3.19.8   48524  59291
 4.1.33   47193  59574
 4.2.8    48901  59877
 4.3.0    32187  38970
 4.3.6    31889  38815
 4.4.0    31946  37702
 4.4.23   31498  37724
 4.5.5    31531  37351
 4.7.6    32859  38490

Notice the sudden drop from ~50k to ~30k tps between 4.2.8 and 4.3.0 
(for 32 clients) and from 60k to 40k (for 64 clients). See the attached 
kernel-regression-e5-4620.png.
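
For reference, the per-transaction script is roughly the following (a sketch
reconstructed from the description above, not the actual benchmark file; the
table and variable names are assumed):

  -- pgbench custom script; :aid is a randomly chosen account id
  BEGIN;
  SELECT * FROM pgbench_accounts WHERE aid = :aid FOR UPDATE;
  SAVEPOINT s1;
  SELECT * FROM pgbench_accounts WHERE aid = :aid FOR UPDATE;
  SAVEPOINT s2;
  SELECT * FROM pgbench_accounts WHERE aid = :aid FOR UPDATE;
  END;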


Those results are from a 4-socket machine, with e5-4620 CPUs, so 32 
physical cores in total, 64 with HT. The CPU is the v1 model (Sandy Bridge
EP, released in 2012 and discontinued in Q2 2015), so neither particularly
new nor obsolete.


This is on scale 300, which easily fits into RAM on the machine. The 
results are very stable and IMHO quite consistent.


I've also done some tests with regular pgbench (both read-only and 
read-write), with WAL-logged tables, and the results are quite similar.


 type  kernel     32     64
 ---------------------------
 ro    3.19.8   55796  81563
       4.4.23   38188  50983

 rw    3.19.8   32282  46234
       4.4.23   23367  31311

I've tried to reproduce the issue on another machine, but without much 
success. This machine, however, only has 2 sockets and a much newer CPU
(e5-2620 v4, so Broadwell, released Q1 2016).


So it might be somewhat related to the older CPU, maybe a slightly 
different kernel config, or something else.


If you have access to similar machines (2 or 4 sockets), it'd be very 
helpful if you could repeat the benchmarks and report the results, to 
confirm (or invalidate) my results.


The test scripts (and results), and kernel configs are available here:

https://bitbucket.org/tvondra/kernel-perf-regression

It's nothing fancy, mostly trivial shell scripts (you'll need to modify 
some paths in those, I guess). Testing a single kernel version takes 
roughly 1h.


Those are vanilla kernels, BTW - no customized distribution kernels with 
extra patches, etc.



[1] 
https://www.postgresql.org/message-id/flat/CAA4eK1+8=X9mSNeVeHg_NqMsOR-XKsjuqrYzQf=icsdh3u4...@mail.gmail.com


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

