Re: [HACKERS] Commitfest problems

2014-12-19 Thread Mark Kirkwood

On 19/12/14 20:48, Andres Freund wrote:

On 2014-12-18 10:02:25 -0800, Joshua D. Drake wrote:

I think a lot of hackers forget exactly how tender their egos are. Now I say
this knowing that a lot of them will go, Oh give me a break but as someone
who employs hackers, deals with open source AND normal people :P every
single day, I can tell you without a single inch of sarcasm that petting
egos is one of the ways you get things done in the open source (and really
any male dominated) community.


To me that's a bit over the top stereotyping.



+1

Having been mentioned one or two times myself - it was an unexpected 
wow - cool rather than a grumpy/fragile I must be noticed thing. I 
think some folk have forgotten the underlying principle of the open 
source community - it is about freely giving - time or code etc. The 
there must be something in it for me me me meme is - well - the 
*other* world view.



However, doing crappy work and let's not be shy about it, there is NOTHING
fun about reviewing someone else's code needs to have incentive.


FWIW, I don't agree with this at all. Reviewing code can be quite
interesting - with the one constraint that the problem the patch solves
needs to be somewhat interesting. The latter is what I think gets many
of the more experienced reviewers - lots of the patches just solve stuff
they don't care about.



Yeah, and also it helps if the patch addresses an area that you at least 
know *something* about - otherwise it is really hard to review in any 
useful way.


Cheers

Mark


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Postgres TR for missing chunk

2014-12-19 Thread M Tarkeshwar Rao
Hello friends,

Thanks for your useful inputs.

We are facing this issue and want to analyse this through logging. 
can you please share a sample Postgres config file to enable max logging with 
syslog support?

What should be the debug level so that I can capture the failure information?

Regards
Tarkeshwar

-Original Message-
From: Tom Lane [mailto:t...@sss.pgh.pa.us] 
Sent: 16 December 2014 22:25
To: Jaime Casanova
Cc: M Tarkeshwar Rao; PostgreSQL-development
Subject: Re: [HACKERS] Postgres TR for missing chunk

Jaime Casanova ja...@2ndquadrant.com writes:
 You know, that toast table name ringed a bell.
 Look at this thread maybe this is your problem, and if it is then is 
 already fixed and you should update.
 http://www.postgresql.org/message-id/12138.1336019...@sss.pgh.pa.us

That was about transient failures though, not persistent ones, which is what 
the OP seems to be claiming he's getting.

 Btw, when giving a bug report you should start but saying your 
 PostgreSQL's version and explain what you did based on Google's wisdom

Yeah.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Keep track of transaction commit timestamps

2014-12-19 Thread Petr Jelinek

On 10/12/14 16:03, Petr Jelinek wrote:

On 10/12/14 04:26, Michael Paquier wrote:

On Thu, Dec 4, 2014 at 9:26 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

Yeah, it was raised. I don't think it was really addressed. There was
lengthy discussion on whether to include LSN, node id, and/or some other
information, but there was no discussion on:

* What is a node ID?
* How is it used?
* Who assigns it?

etc.

None of those things are explained in the documentation nor code
comments.





Could it be possible to get a refresh on that before it bitrots too
much or we'll simply forget about it? In the current state, there is
no documentation able to explain what is the point of the nodeid
stuff, how to use it and what use cases it is aimed at. The API in
committs.c should as well be part of it. If that's not done, I think
that it would be fair to remove this portion and simply WAL-log the
commit timestamp its GUC is activated.


I have this on my list so it's not being forgotten and I don't see it
bitrotting unless we are changing XLog api again. I have bit busy
schedule right now though, can it wait till next week? - I will post
some code documentation patches then.



Hi,

as promised I am sending code-comment patch that explains the use of 
commit timestamps + nodeid C api for the conflict resolution, comments 
welcome.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index ca074da..fff1fdd 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -6,6 +6,62 @@
  * This module is a pg_clog-like system that stores the commit timestamp
  * for each transaction.
  *
+ * When track_commit_timestamp is enabled, this module will keep track of
+ * commit timestamp for each transaction. It also provides API to for
+ * optionally storing nodeid (origin) of each transaction. The main purpose of
+ * this functionality is to help with conflict detection and resolution for
+ * replication systems.
+ *
+ * The following example shows how to use the API provided by this module, to
+ * handle UPDATE conflicts coming from replication stream:
+ * void
+ * update_tuple(Relation relation, HeapTuple remote_tuple,
+ *  TimestampTz remote_commit_ts, CommitTsNodeId remote_node_id)
+ * {
+ * bool exists;
+ * HeapTupleDatalocal_tuple;
+ *
+ * // Find existing tuple with same PK/unique index combination.
+ * exists = find_local_tuple(relation, remote_tuple, local_tuple);
+ *
+ * // The tuple was found.
+ * if (exists)
+ * {
+ * TransactionIdxmin;
+ * TimestampTz  local_commit_ts;
+ * CommitTsNodeId   local_node_id;
+ *
+ * xmin = HeapTupleHeaderGetXmin(local_tuple.t_data);
+ * TransactionIdGetCommitTsData(xmin, local_commit_ts, nodeid);
+ *
+ * // New tuple is coming from different node than the locally saved
+ * // tuple and the remote commit timestamp is older than local commit
+ * // timestamp, this is UPDATE/UPDATE conflict (node being UPDATEd on
+ * // different nodes at the same time.
+ * if (remote_id != local_node_id  remote_commit_ts = local_commit_ts)
+ * {
+ * if (remote_commit_ts  local_commit_ts)
+ * return; // Keep the local tuple.
+ *
+ * // Handle the conflict in a consistent manner.
+ * }
+ * else
+ * {
+ * // The new tuple either comes from same node as old tuple and/or
+ * // is has newer commit timestamp than the local tuple, apply the
+ * // UPDATE.
+ * }
+ *  }
+ *  else
+ *  {
+ *  // Tuple not found (possibly UPDATE/DELETE conflict), handle it
+ *  // in a consistent manner.
+ *  }
+ * }
+ *
+ * See default_node_id and CommitTsSetDefaultNodeId for explanation of how to
+ * set nodeid when applying transactions.
+ *
  * XLOG interactions: this module generates an XLOG record whenever a new
  * CommitTs page is initialized to zeroes.  Also, one XLOG record is
  * generated for setting of values when the caller requests it; this allows
@@ -49,6 +105,15 @@
  */
 
 /*
+ * CommitTimestampEntry
+ *
+ * Record containing information about the transaction commit timestamp and
+ * the nodeid.
+ *
+ * The nodeid provides IO efficient way for replication systems to store
+ * information about origin of the transaction. Currently the nodeid is opaque
+ * value meaning of which is defined by the replication system itself.
+ *
  * We need 8+4 bytes per xact.  Note that enlarging this struct might mean
  * the largest possible file name is more than 5 chars long; see
  * SlruScanDirectory.
@@ -93,6 +158,21 @@ CommitTimestampShared	*commitTsShared;
 /* GUC variable */
 bool	track_commit_timestamp;
 
+/*
+ * The default_node_id 

[HACKERS] Re: [pgsql-pkg-debian] Updated libpq5 packages cause connection errors on postgresql 9.2

2014-12-19 Thread Christoph Berg
Re: Chris Butler 2014-12-19 
1155204201.65430.1418975376728.javamail.zim...@zedcore.com
 One of our servers is currently running on postgres 9.2 using the 
 9.2.9-1.pgdg70+1 packages from pgdg.
 
 After an apt update this morning which brought in the libpq5 package version 
 9.4.0-1.pgdg70+1, connections to the database started failing with SSL errors 
 logged on the server:
 
[unknown] [unknown] LOG:  could not accept SSL connection: digest too big 
 for rsa key
 
 Rolling back the server and client to libpq5 version 9.3.5-2.pgdg70+1 fixed 
 it.
 
 This is running on an otherwise up-to-date Debian Wheezy. The SSL certificate 
 is locally issued using an internal CA which has been added to the local 
 trust store. SSL-related config options are left set to the defaults.

Hi Chris,

thanks for the report.

Googling for digest too big for rsa key seems to indicate that this
problem occurs when you are using (client?) certificates with short
RSA keys. 512 bits is most often cited in the problem reports,
something like 768 is around the minimum size that works, and of
course, anything smaller than 1024 or really 1536 (or 2048) bits is
too small for today's crypto standards.

So the question here is if this is also the problem you saw - are you
using client or server certificates with short keys?

What this explanation doesn't explain is why the problem occurs with
9.4's libpq5 while it works with 9.3's. The libssl version used for
building these packages should really be the same, 9.3.5-2.pgdg70+1
was built just two days ago as well.

I'm CCing -hackers, maybe someone there has an idea.

Christoph
-- 
c...@df7cb.de | http://www.df7cb.de/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

2014-12-19 Thread Amit Kapila
On Mon, Dec 15, 2014 at 4:18 PM, Dilip kumar dilip.ku...@huawei.com wrote:

 On December 2014 17:31 Amit Kapila Wrote,


 Hmm, theoretically I think new behaviour could lead to more I/O in

 certain cases as compare to existing behaviour.  The reason for more I/O

 is that in the new behaviour, while doing Analyze for a particular table
at

 different targets, in-between it has Analyze of different table as well,

 so the pages in shared buffers or OS cache for a particular table needs
to

 be reloded again for a new target whereas currently it will do all stages

 of Analyze for a particular table in one-go which means that each stage

 of Analyze could get benefit from the pages of a table loaded by previous

 stage.  If you agree, then we should try to avoid this change in new

 behaviour.



 I will work on this comment and post the updated patch..


One idea is to send all the stages and corresponding Analyze commands
to server in one go which means something like
BEGIN; SET default_statistics_target=1; SET vacuum_cost_delay=0;
  Analyze t1; COMMIT;
BEGIN; SET default_statistics_target=10; RESET vacuum_cost_delay;
  Analyze t1; COMMIT;
BEGIN; RESET default_statistics_target;
 Analyze t1; COMMIT;

Now, still parallel operations in other backends could lead to
page misses, but I think the impact will be minimized.



 I will move this patch to the latest commitfest.


By the way, I think this patch should be in Waiting On Author stage.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Combining Aggregates

2014-12-19 Thread David Rowley
On 18 December 2014 at 02:48, Simon Riggs si...@2ndquadrant.com wrote:


 David, if you can update your patch with some docs to explain the
 behaviour, it looks complete enough to think about committing it in
 early January, to allow other patches that depend upon it to stand a
 chance of getting into 9.5. (It is not yet ready, but I see it could
 be).


Sure, I've more or less added the docs from your patch. I still need to
trawl through and see if there's anywhere else that needs some additions.


 The above example is probably the best description of the need, since
 user defined aggregates must also understand this.

 Could we please call these combine functions or other? MERGE is an
 SQL Standard statement type that we will add later, so it will be
 confusing if we use the merge word in this context.


Ok, changed.


 David, your patch avoids creating any mergefuncs for existing
 aggregates. We would need to supply working examples for at least a
 few of the builtin aggregates, so we can test the infrastructure. We
 can add examples for all cases later.


In addition to MIN(), MAX(), BIT_AND(), BIT_OR, SUM() for floating point
types, cash and interval. I've now added combine functions for count(*) and
count(col). It seems that int8pl() is suitable for this.


Do you think it's worth adding any new functions at this stage, or is it
best to wait until there's a patch which can use these?

Regards

David Rowley


combine_aggregate_state_v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Role Attribute Bitmask Catalog Representation

2014-12-19 Thread Alvaro Herrera
Stephen Frost wrote:
 Alvaro,
 
 * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
  FWIW I've been giving this patch a look and and adjusting some coding
  details here and there.  Do you intend to commit it yourself?  You're
  not listed as reviewer or committer for it in the commitfest app, FWIW.
 
 Oh, great, thanks!  And, yeah, I'm not as good as I should be about
 updating the commitfest app.  As for committing it, I was thinking I
 would but you're certainly welcome to if you're interested.  I would
 like this to be committed soonish as it will then allow Adam to rebase
 the patch which adds the various role attributes discussed previously on
 top of the representation changes.  I suspect he's done some work in
 that direction already, but I keep asking for changes to this patch
 which would then ripple down into the other.

Sure, I will go over it a bit more and merge changes from Adam to the
docs as they come through, and commit soon.

  One thing I don't very much like is that check_role_attribute() receives
  a RoleAttr but nowhere it checks that only one bit is set in
  'attribute'.  From the coding, this routine would return true if just
  one of those bits is set, which is probably not what is intended.  Now I
  realize that current callers all pass a bitmask with a single bit set,
  but I think it'd be better to have some protection against that, for
  possible future coding mistakes.
 
 That's certainly a good point.  I'm inclined to suggest that there be an
 Assert() check in check_role_attribute(), or were you thinking of
 something else..?

Yeah, an Assert() is what I had in mind.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [pgsql-pkg-debian] Updated libpq5 packages cause connection errors on postgresql 9.2

2014-12-19 Thread Magnus Hagander
On Fri, Dec 19, 2014 at 11:52 AM, Christoph Berg c...@df7cb.de wrote:

 Re: Chris Butler 2014-12-19 
 1155204201.65430.1418975376728.javamail.zim...@zedcore.com
  One of our servers is currently running on postgres 9.2 using the
 9.2.9-1.pgdg70+1 packages from pgdg.
 
  After an apt update this morning which brought in the libpq5 package
 version 9.4.0-1.pgdg70+1, connections to the database started failing with
 SSL errors logged on the server:
 
 [unknown] [unknown] LOG:  could not accept SSL connection: digest too
 big for rsa key
 
  Rolling back the server and client to libpq5 version 9.3.5-2.pgdg70+1
 fixed it.
 
  This is running on an otherwise up-to-date Debian Wheezy. The SSL
 certificate is locally issued using an internal CA which has been added to
 the local trust store. SSL-related config options are left set to the
 defaults.

 Hi Chris,

 thanks for the report.

 Googling for digest too big for rsa key seems to indicate that this
 problem occurs when you are using (client?) certificates with short
 RSA keys. 512 bits is most often cited in the problem reports,
 something like 768 is around the minimum size that works, and of
 course, anything smaller than 1024 or really 1536 (or 2048) bits is
 too small for today's crypto standards.

 So the question here is if this is also the problem you saw - are you
 using client or server certificates with short keys?

 What this explanation doesn't explain is why the problem occurs with
 9.4's libpq5 while it works with 9.3's. The libssl version used for
 building these packages should really be the same, 9.3.5-2.pgdg70+1
 was built just two days ago as well.

 I'm CCing -hackers, maybe someone there has an idea.


Some googling shows that this could be because it's negotiating TLS 1.2
which the key is just too small for. And we did change that in 9.4 - commit
326e1d73c476a0b5061ef00134bdf57aed70d5e7 disabled SSL in favor of always
using TLS for security reasons.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] [COMMITTERS] pgsql: Keep track of transaction commit timestamps

2014-12-19 Thread Michael Paquier
On Fri, Dec 19, 2014 at 6:30 PM, Petr Jelinek p...@2ndquadrant.com wrote:
 On 10/12/14 16:03, Petr Jelinek wrote:

 On 10/12/14 04:26, Michael Paquier wrote:

 On Thu, Dec 4, 2014 at 9:26 PM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:

 Yeah, it was raised. I don't think it was really addressed. There was
 lengthy discussion on whether to include LSN, node id, and/or some other
 information, but there was no discussion on:

 * What is a node ID?
 * How is it used?
 * Who assigns it?

 etc.

 None of those things are explained in the documentation nor code
 comments.



 Could it be possible to get a refresh on that before it bitrots too
 much or we'll simply forget about it? In the current state, there is
 no documentation able to explain what is the point of the nodeid
 stuff, how to use it and what use cases it is aimed at. The API in
 committs.c should as well be part of it. If that's not done, I think
 that it would be fair to remove this portion and simply WAL-log the
 commit timestamp its GUC is activated.


 I have this on my list so it's not being forgotten and I don't see it
 bitrotting unless we are changing XLog api again. I have bit busy
 schedule right now though, can it wait till next week? - I will post
 some code documentation patches then.
 as promised I am sending code-comment patch that explains the use of commit
 timestamps + nodeid C api for the conflict resolution, comments welcome.
Having some documentation with this example in doc/ would be more fruitful IMO.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Keep track of transaction commit timestamps

2014-12-19 Thread Petr Jelinek

On 19/12/14 13:17, Michael Paquier wrote:

On Fri, Dec 19, 2014 at 6:30 PM, Petr Jelinek p...@2ndquadrant.com wrote:

On 10/12/14 16:03, Petr Jelinek wrote:


On 10/12/14 04:26, Michael Paquier wrote:


On Thu, Dec 4, 2014 at 9:26 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:


Yeah, it was raised. I don't think it was really addressed. There was
lengthy discussion on whether to include LSN, node id, and/or some other
information, but there was no discussion on:

* What is a node ID?
* How is it used?
* Who assigns it?

etc.

None of those things are explained in the documentation nor code
comments.





Could it be possible to get a refresh on that before it bitrots too
much or we'll simply forget about it? In the current state, there is
no documentation able to explain what is the point of the nodeid
stuff, how to use it and what use cases it is aimed at. The API in
committs.c should as well be part of it. If that's not done, I think
that it would be fair to remove this portion and simply WAL-log the
commit timestamp its GUC is activated.



I have this on my list so it's not being forgotten and I don't see it
bitrotting unless we are changing XLog api again. I have bit busy
schedule right now though, can it wait till next week? - I will post
some code documentation patches then.

as promised I am sending code-comment patch that explains the use of commit
timestamps + nodeid C api for the conflict resolution, comments welcome.

Having some documentation with this example in doc/ would be more fruitful IMO.



I am not sure I see point in that, the GUC and SQL interfaces are 
documented in doc/ and we usually don't put documentation for C 
interfaces there.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Seq Scan

2014-12-19 Thread Stephen Frost
Amit,

* Amit Kapila (amit.kapil...@gmail.com) wrote:
 1. Parallel workers help a lot when there is an expensive qualification
 to evaluated, the more expensive the qualification the more better are
 results.

I'd certainly hope so. ;)

 2. It works well for low selectivity quals and as the selectivity increases,
 the benefit tends to go down due to additional tuple communication cost
 between workers and master backend.

I'm a bit sad to hear that the communication between workers and the
master backend is already being a bottleneck.  Now, that said, the box
you're playing with looks to be pretty beefy and therefore the i/o
subsystem might be particularly good, but generally speaking, it's a lot
faster to move data in memory than it is to pull it off disk, and so I
wouldn't expect the tuple communication between processes to really be
the bottleneck...

 3. After certain point, increasing having more number of workers won't
 help and rather have negative impact, refer Test-4.

Yes, I see that too and it's also interesting- have you been able to
identify why?  What is the overhead (specifically) which is causing
that?

 I think as discussed previously we need to introduce 2 additional cost
 variables (parallel_startup_cost, cpu_tuple_communication_cost) to
 estimate the parallel seq scan cost so that when the tables are small
 or selectivity is high, it should increase the cost of parallel plan.

I agree that we need to figure out a way to cost out parallel plans, but
I have doubts about these being the right way to do that.  There has
been quite a bit of literature regarding parallel execution and
planning- have you had a chance to review anything along those lines?
We certainly like to draw on previous experiences and analysis rather
than trying to pave our own way.

With these additional costs comes the consideration that we're looking
for a wall-clock runtime proxy and therefore, while we need to add costs
for parallel startup and tuple communication, we have to reduce the
overall cost because of the parallelism or we'd never end up choosing a
parallel plan.  Is the thought to simply add up all the costs and then
divide?  Or perhaps to divide the cost of the actual plan but then add
in the parallel startup cost and the tuple communication cost?

Perhaps there has been prior discussion on these points but I'm thinking
we need a README or similar which discusses all of this and includes any
references out to academic papers or similar as appropriate.

Thanks!

Stephen


signature.asc
Description: Digital signature


[HACKERS] Bogus WAL segments archived after promotion

2014-12-19 Thread Heikki Linnakangas
When streaming replication was introduced in 9.0, we started to recycle 
old WAL segments in archive recovery, like we do during normal 
operation. The WAL segments are recycled on the current timeline. There 
is no guarantee that they are useful, if the current timeline changes, 
because we step to recover another timeline after that, or the standby 
is promoted, but that was thought to be harmless.


However, consider what happens after a server is promoted, and WAL 
archiving is enabled. The server's pg_xlog directory will look something 
like this:



-rw--- 1 heikki heikki 16777216 Dec 19 14:22 00010005
-rw--- 1 heikki heikki 16777216 Dec 19 14:23 00010006
-rw--- 1 heikki heikki 16777216 Dec 19 14:23 00010007
-rw--- 1 heikki heikki 16777216 Dec 19 14:23 00010008
-rw--- 1 heikki heikki 16777216 Dec 19 14:23 00010009
-rw--- 1 heikki heikki 16777216 Dec 19 14:23 0001000A
-rw--- 1 heikki heikki 16777216 Dec 19 14:23 0001000B
-rw--- 1 heikki heikki 16777216 Dec 19 14:23 0001000C
-rw--- 1 heikki heikki 16777216 Dec 19 14:23 0001000D
-rw--- 1 heikki heikki 16777216 Dec 19 14:23 0001000E
-rw--- 1 heikki heikki 16777216 Dec 19 14:23 0001000F
-rw--- 1 heikki heikki 16777216 Dec 19 14:23 00010010
-rw--- 1 heikki heikki 16777216 Dec 19 14:23 00010011
-rw--- 1 heikki heikki 16777216 Dec 19 14:23 00010012
-rw--- 1 heikki heikki 16777216 Dec 19 14:23 00010013
-rw--- 1 heikki heikki 16777216 Dec 19 14:23 00010014
-rw--- 1 heikki heikki 16777216 Dec 19 14:23 00010015
-rw--- 1 heikki heikki 16777216 Dec 19 14:23 00010016
-rw--- 1 heikki heikki 16777216 Dec 19 14:23 00010017
-rw--- 1 heikki heikki 16777216 Dec 19 14:23 00010018
-rw--- 1 heikki heikki 16777216 Dec 19 14:24 00010019
-rw--- 1 heikki heikki 16777216 Dec 19 14:22 0001001A
-rw--- 1 heikki heikki 16777216 Dec 19 14:22 0001001B
-rw--- 1 heikki heikki 16777216 Dec 19 14:22 0001001C
-rw--- 1 heikki heikki 16777216 Dec 19 14:24 00020019
-rw--- 1 heikki heikki 16777216 Dec 19 14:24 0002001A
-rw--- 1 heikki heikki   42 Dec 19 14:24 0002.history


The files on timeline 1, up to 00010019, are valid 
segments, streamed from the primary or restored from the WAL archive. 
The segments 0001001A and 0001001B are 
recycled segments that haven't been reused yet. Their contents are not 
valid (they contain records from some earlier point in WAL, but it might 
as well be garbage).


The server was promoted within the segment 19, and a new timeline was 
started. Segments 00020019 and 0002001A 
contain valid WAL on the new timeline.


Now, after enough time passes that the bogus 0001001A 
and 0001001B segments become old enough to be recycled, 
the system will see that there is no .ready or .done file for them, and 
will create .ready files so that they are archived. And they are 
archived. That's bogus, because the files are bogus. Worse, if the 
primary server where this server was forked off from continues running, 
and creates the genuine 0001001A and 
0001001B segments, it can fail to archive them if the 
standby had already archived the bogus segments with the same names.


We must somehow prevent the recycled, but not yet used, segments from 
being archived. One idea is to not create them in the first place, i.e. 
don't recycle old segments during recovery, just delete them and have 
new ones be created on demand. That's simple, but would hurt performance.


I'm thinking that we should add a step to promotion, where we scan 
pg_xlog for any segments higher than the timeline switch point, and 
remove them, or mark them with .done so that they are not archived. 
There might be some real WAL that was streamed from the primary, but not 
yet applied, but such WAL is of no interest to that server anyway, after 
it's been promoted. It's a bit disconcerting to zap WAL that's valid, 
even if doesn't belong to the current server's timeline history, because 
as a general rule it's good to avoid destroying evidence that might be 
useful in debugging. There isn't much difference between removing them 
immediately and marking them as .done, though, because they will 
eventually be removed/recycled anyway if they're marked as .done.


The archival behaviour at promotion is a bit inconsistent and weird 
anyway; even valid, streamed WAL is marked as .done and not archived 
anyway, except for the last partial segment. We're 

Re: [HACKERS] Parallel Seq Scan

2014-12-19 Thread Robert Haas
On Fri, Dec 19, 2014 at 7:51 AM, Stephen Frost sfr...@snowman.net wrote:
 3. After certain point, increasing having more number of workers won't
 help and rather have negative impact, refer Test-4.

 Yes, I see that too and it's also interesting- have you been able to
 identify why?  What is the overhead (specifically) which is causing
 that?

Let's rewind.  Amit's results show that, with a naive algorithm
(pre-distributing equal-sized chunks of the relation to every worker)
and a fairly-naive first cut at how to pass tuples around (I believe
largely from what I did in pg_background) he can sequential-scan a
table with 8 workers at 6.4 times the speed of a single process, and
you're complaining because it's not efficient enough?  It's a first
draft!  Be happy we got 6.4x, for crying out loud!

The barrier to getting parallel sequential scan (or any parallel
feature at all) committed is not going to be whether an 8-way scan is
6.4 times faster or 7.1 times faster or 7.8 times faster.  It's going
to be whether it's robust and won't break things.  We should be
focusing most of our effort here on identifying and fixing robustness
problems.  I'd vote to commit a feature like this with a 3x
performance speedup if I thought it was robust enough.

I'm not saying we shouldn't try to improve the performance here - we
definitely should.  But I don't think we should say, oh, an 8-way scan
isn't good enough, we need a 16-way or 32-way scan in order for this
to be efficient.  That is getting your priorities quite mixed up.

 I think as discussed previously we need to introduce 2 additional cost
 variables (parallel_startup_cost, cpu_tuple_communication_cost) to
 estimate the parallel seq scan cost so that when the tables are small
 or selectivity is high, it should increase the cost of parallel plan.

 I agree that we need to figure out a way to cost out parallel plans, but
 I have doubts about these being the right way to do that.  There has
 been quite a bit of literature regarding parallel execution and
 planning- have you had a chance to review anything along those lines?
 We certainly like to draw on previous experiences and analysis rather
 than trying to pave our own way.

I agree that it would be good to review the literature, but am not
aware of anything relevant.  Could you (or can anyone) provide some
links?

 With these additional costs comes the consideration that we're looking
 for a wall-clock runtime proxy and therefore, while we need to add costs
 for parallel startup and tuple communication, we have to reduce the
 overall cost because of the parallelism or we'd never end up choosing a
 parallel plan.  Is the thought to simply add up all the costs and then
 divide?  Or perhaps to divide the cost of the actual plan but then add
 in the parallel startup cost and the tuple communication cost?

This has been discussed, on this thread.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical Replication Helpers WIP for discussion

2014-12-19 Thread Petr Jelinek

On 15/12/14 19:42, Robert Haas wrote:

On Mon, Dec 15, 2014 at 12:57 AM, Petr Jelinek p...@2ndquadrant.com wrote:

we've made few helper functions for making logical replication easier, I
bundled it into contrib module as this is mainly for discussion at this time
(I don't expect this to get committed any time soon, but it is good way to
iron out protocol, etc).

I created sample logical decoding plugin that uses those functions and which
can be used for passing DML changes in platform/version independent
(hopefully) format.

I will post sample apply BG worker also once I get some initial feedback
about this.

It's hard to write tests for this as the binary changes contain transaction
ids and timestamps so the data changes constantly.

This is of course based on the BDR work Andres, Craig and myself have been
doing.


I can't understand, either from what you've written here or the rather
sparse comments in the patch, what this might be good for.



What I tried to achieve here is to provide solution to many of the 
common problems faced by logical replication solutions. I believe the 
first step in designing the logical replication (now that we have the 
logical decoding) is making the output plugin and the efficient protocol 
so I started with that.


The code itself provides two main parts:
First is the lrep_utils common utility functions that solve things like 
transporting DML statements, and more importantly the changed data in 
efficient manner, trying to not do any conversion if not needed (when 
architecture/version matches) but falling back to binary/textual IO 
representation of individual types so that the cross platform/version 
replication works too. I think those should eventually end up in core 
(ie not in contrib) as they are helper functions likely to be shared by 
multiple extensions, but for now I keep them with the rest of the 
contrib module as I feel better experimenting inside that module.
There are also read functions that show how the other side could look 
like, but they are currently unused as the example apply worker is not 
part of the submission yet.


The second part is extensible output plugin which serves both as an 
example of the intended use of those common utility functions and also 
as actual working solution that can be used as base for several 
replication solutions.
It provides hooks for the replication solutions built on top of it that 
can be used for deciding if to replicate specific action on specific 
object and also injecting additional information to both BEGIN and 
COMMIT message - this can be useful for example when you are forwarding 
changes from another node and you wish to pass the information about the 
original node to the target one.


What I hope to get from this is agreement on the general approach and 
protocol so that we can have common base which will both make it easier 
to create external logical replication solutions and also eventually 
lead to full logical replication inside core PostgreSQL.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] NUMERIC private methods?

2014-12-19 Thread Robert Haas
On Thu, Dec 18, 2014 at 11:51 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 What it boils down to is that numeric is great for storing given decimal
 inputs exactly, and it can do exact addition/subtraction/multiplication
 on those too, but as soon as you get into territory where the result is
 fundamentally inexact it is *not* promised to be better than float8.

I think what it boils down to is that several people here (and I'll
add my voice to the chorus) are saying, hey, numeric is really useful,
and we'd like to be able to manipulate numerics without all the palloc
and fmgr overhead, and your response appears to be to say, use float8.
That may be the right answer in some cases, but certainly not in all.

 We could probably improve on this if we were to redesign the algorithms
 around a concept of decimal floating-point, rather than decimal
 fixed-point as it is now.  But I'm not sure how well that would comport
 with the SQL standard.  And I'm very not sure that we could still do it
 once we'd tied one hand behind our backs by virtue of exporting a bunch
 of the internals as public API.

You make this argument every time somebody wants to drop static from a
function or stick PGDLLIMPORT on a variable, and frankly I think it's
pretty developer-hostile.  I would much rather see us go ahead and do
those things and if people later complain that we broke stuff, we'll
go tell them to pound sand.  That's what we said when people
complained about relistemp - relpersistence, and the number of people
who are affected by a change in the system catalogs has got to be a
good two orders of magnitude more than the number of people who are
going to notice a change in the C API.  Giving developers the ability
to write extensions that *might* get broken by future changes is a lot
better than not giving them that ability in the first place.  There's
only a problem for the core project if we think we're bound not to
break the APIs in a future release, and I don't feel so bound.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Role Attribute Bitmask Catalog Representation

2014-12-19 Thread Alvaro Herrera
The fact that the ROLE_ATTR_* definitions are in pg_authid.h means that
there are now a lot of files that need to include that one.  I think the
includes relative to ACLs and roles is rather messy now, and this patch
makes it a bit worse.

I think we should create a new header file (maybe acltypes.h or
acldefs.h), which only contains the AclMode and RoleAttr typedefs and
their associated defines; that one would be included from parsenodes.h,
acl.h and pg_authid.h.  Everything else would be in acl.h.  So code that
currently checks permissions using only acl.h do not need any extra
includes.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallel mode and parallel contexts

2014-12-19 Thread Robert Haas
On Wed, Dec 17, 2014 at 2:53 PM, Robert Haas robertmh...@gmail.com wrote:
 In the meantime, I had a good chat with Heikki on IM yesterday which
 gave me some new ideas on how to fix up the transaction handling in
 here, which I am working on implementing.  So hopefully I will have
 that by then.

And here is a new version.  This version has some real integration
with the transaction system, along the lines of what Heikki suggested
to me, so hopefully tuple visibility calculations in a parallel worker
will now produce the right answers, though I need to integrate this
with code to actually do something-or-other in parallel in order to
really test that.  There are still some problems with parallel worker
shutdown.  As hard as I tried to fight it, I'm gradually resigning
myself to the fact that we're probably going to have to set things up
so that the worker waits for all of its children to die during abort
processing (and during commit processing, but that's not bothersome).
Otherwise, to take just one example, chaos potentially ensues if you
run a parallel query in a subtransaction and then roll back to a
savepoint.  But this version just kills the workers and doesn't
actually wait for them to die; I'll see about fixing that, but wanted
to send this around for comments first.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/contrib/parallel_dummy/Makefile b/contrib/parallel_dummy/Makefile
new file mode 100644
index 000..de00f50
--- /dev/null
+++ b/contrib/parallel_dummy/Makefile
@@ -0,0 +1,19 @@
+MODULE_big = parallel_dummy
+OBJS = parallel_dummy.o $(WIN32RES)
+PGFILEDESC = parallel_dummy - dummy use of parallel infrastructure
+
+EXTENSION = parallel_dummy
+DATA = parallel_dummy--1.0.sql
+
+REGRESS = parallel_dummy
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/parallel_dummy
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/parallel_dummy/parallel_dummy--1.0.sql b/contrib/parallel_dummy/parallel_dummy--1.0.sql
new file mode 100644
index 000..2a7251c
--- /dev/null
+++ b/contrib/parallel_dummy/parallel_dummy--1.0.sql
@@ -0,0 +1,7 @@
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use CREATE EXTENSION parallel_dummy to load this file. \quit
+
+CREATE FUNCTION parallel_dummy(sleep_time pg_catalog.int4,
+			  nworkers pg_catalog.int4)
+RETURNS pg_catalog.void STRICT
+	AS 'MODULE_PATHNAME' LANGUAGE C;
diff --git a/contrib/parallel_dummy/parallel_dummy.c b/contrib/parallel_dummy/parallel_dummy.c
new file mode 100644
index 000..99b830f
--- /dev/null
+++ b/contrib/parallel_dummy/parallel_dummy.c
@@ -0,0 +1,82 @@
+/*--
+ *
+ * parallel_dummy.c
+ *		Test harness code for parallel mode code.
+ *
+ * Copyright (C) 2013-2014, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		contrib/parallel_dummy/parallel_dummy.c
+ *
+ * -
+ */
+
+#include postgres.h
+
+#include access/parallel.h
+#include access/xact.h
+#include fmgr.h
+#include miscadmin.h
+#include utils/builtins.h
+
+PG_MODULE_MAGIC;
+
+PG_FUNCTION_INFO_V1(parallel_dummy);
+
+#define		PARALLEL_DUMMY_KEY			1
+
+typedef struct {
+	int32		sleep_time;
+} ParallelDummyInfo;
+
+void		_PG_init(void);
+void		worker_main(shm_toc *toc);
+
+Datum
+parallel_dummy(PG_FUNCTION_ARGS)
+{
+	int32		sleep_time = PG_GETARG_INT32(0);
+	int32		nworkers = PG_GETARG_INT32(1);
+	bool		already_in_parallel_mode = IsInParallelMode();
+	ParallelContext *pcxt;
+	ParallelDummyInfo *info;
+
+	if (nworkers  1)
+		ereport(ERROR,
+(errmsg(number of parallel workers must be positive)));
+
+	if (!already_in_parallel_mode)
+		EnterParallelMode();
+
+	pcxt = CreateParallelContextForExtension(parallel_dummy, worker_main,
+			 nworkers);
+	shm_toc_estimate_chunk(pcxt-estimator, sizeof(ParallelDummyInfo));
+	shm_toc_estimate_keys(pcxt-estimator, 1);
+	InitializeParallelDSM(pcxt);
+	info = shm_toc_allocate(pcxt-toc, sizeof(ParallelDummyInfo));
+	info-sleep_time = sleep_time;
+	shm_toc_insert(pcxt-toc, PARALLEL_DUMMY_KEY, info);
+	LaunchParallelWorkers(pcxt);
+
+	/* here's where we do the real work ... */
+	DirectFunctionCall1(pg_sleep, Float8GetDatum((float8) sleep_time));
+
+	DestroyParallelContext(pcxt);
+
+	if (!already_in_parallel_mode)
+		ExitParallelMode();
+
+	PG_RETURN_VOID();
+}
+
+void
+worker_main(shm_toc *toc)
+{
+	ParallelDummyInfo *info;
+
+	info = shm_toc_lookup(toc, PARALLEL_DUMMY_KEY);
+	Assert(info != NULL);
+
+	/* here's where we do the real work ... */
+	DirectFunctionCall1(pg_sleep, Float8GetDatum((float8) info-sleep_time));
+}
diff --git a/contrib/parallel_dummy/parallel_dummy.control b/contrib/parallel_dummy/parallel_dummy.control
new file 

Re: [HACKERS] Parallel Seq Scan

2014-12-19 Thread Stephen Frost
Robert,

* Robert Haas (robertmh...@gmail.com) wrote:
 On Fri, Dec 19, 2014 at 7:51 AM, Stephen Frost sfr...@snowman.net wrote:
  3. After certain point, increasing having more number of workers won't
  help and rather have negative impact, refer Test-4.
 
  Yes, I see that too and it's also interesting- have you been able to
  identify why?  What is the overhead (specifically) which is causing
  that?
 
 Let's rewind.  Amit's results show that, with a naive algorithm
 (pre-distributing equal-sized chunks of the relation to every worker)
 and a fairly-naive first cut at how to pass tuples around (I believe
 largely from what I did in pg_background) he can sequential-scan a
 table with 8 workers at 6.4 times the speed of a single process, and
 you're complaining because it's not efficient enough?  It's a first
 draft!  Be happy we got 6.4x, for crying out loud!

He also showed cases where parallelizing a query even with just two
workers caused a serious increase in the total runtime (Test 6).  Even
having four workers was slower in that case, but a modest performance
improvment was reached at eight but then no improvement from that was
seen when running with 16.

Being able to understand what's happening will inform how we cost this
to, hopefully, achieve the 6.4x gains where we can and avoid the
pitfalls of performing worse than a single thread in cases where
parallelism doesn't help.  What would likely be very helpful in the
analysis would be CPU time information- when running with eight workers,
were we using 800% CPU (8x 100%), or something less (perhaps due to
locking, i/o, or other processes).

Perhaps it's my fault for not being surprised that a naive first cut
gives us such gains as my experience with parallel operations and PG has
generally been very good (through the use of multiple connections to the
DB and therefore independent transactions, of course).  I'm very excited
that we're making such great progress towards having parallel execution
in the DB as I've often used PG in data warehouse use-cases.

 The barrier to getting parallel sequential scan (or any parallel
 feature at all) committed is not going to be whether an 8-way scan is
 6.4 times faster or 7.1 times faster or 7.8 times faster.  It's going
 to be whether it's robust and won't break things.  We should be
 focusing most of our effort here on identifying and fixing robustness
 problems.  I'd vote to commit a feature like this with a 3x
 performance speedup if I thought it was robust enough.

I don't have any problem if an 8-way scan is 6.4x faster or if it's 7.1
times faster, but what if that 3x performance speedup is only achieved
when running with 8 CPUs at 100%?  We'd have to coach our users to
constantly be tweaking the enable_parallel_query (or whatever) option
for the queries where it helps and turning it off for others.  I'm not
so excited about that.

 I'm not saying we shouldn't try to improve the performance here - we
 definitely should.  But I don't think we should say, oh, an 8-way scan
 isn't good enough, we need a 16-way or 32-way scan in order for this
 to be efficient.  That is getting your priorities quite mixed up.

I don't think I said that.  What I was getting at is that we need a cost
system which accounts for the costs accurately enough that we don't end
up with worse performance than single-threaded operation.  In general, I
don't expect that to be very difficult and we can be conservative in the
initial releases to hopefully avoid regressions, but it absolutely needs
consideration.

  I think as discussed previously we need to introduce 2 additional cost
  variables (parallel_startup_cost, cpu_tuple_communication_cost) to
  estimate the parallel seq scan cost so that when the tables are small
  or selectivity is high, it should increase the cost of parallel plan.
 
  I agree that we need to figure out a way to cost out parallel plans, but
  I have doubts about these being the right way to do that.  There has
  been quite a bit of literature regarding parallel execution and
  planning- have you had a chance to review anything along those lines?
  We certainly like to draw on previous experiences and analysis rather
  than trying to pave our own way.
 
 I agree that it would be good to review the literature, but am not
 aware of anything relevant.  Could you (or can anyone) provide some
 links?

There's certainly documentation available from the other RDBMS' which
already support parallel query, as one source.  Other academic papers
exist (and once you've linked into one, the references and prior work
helps bring in others).  Sadly, I don't currently have ACM access (might
have to change that..), but there are publicly available papers also,
such as:

http://i.stanford.edu/pub/cstr/reports/cs/tr/96/1570/CS-TR-96-1570.pdf
http://www.vldb.org/conf/1998/p251.pdf
http://www.cs.uiuc.edu/class/fa05/cs591han/sigmodpods04/sigmod/pdf/I-001c.pdf

  With these additional costs comes the consideration that we're looking
 

Re: [HACKERS] Parallel Seq Scan

2014-12-19 Thread Marko Tiikkaja

On 12/19/14 3:27 PM, Stephen Frost wrote:

We'd have to coach our users to
constantly be tweaking the enable_parallel_query (or whatever) option
for the queries where it helps and turning it off for others.  I'm not
so excited about that.


I'd be perfectly (that means 100%) happy if it just defaulted to off, 
but I could turn it up to 11 whenever I needed it.  I don't believe to 
be the only one with this opinion, either.



.marko


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Seq Scan

2014-12-19 Thread Stephen Frost
* Marko Tiikkaja (ma...@joh.to) wrote:
 On 12/19/14 3:27 PM, Stephen Frost wrote:
 We'd have to coach our users to
 constantly be tweaking the enable_parallel_query (or whatever) option
 for the queries where it helps and turning it off for others.  I'm not
 so excited about that.
 
 I'd be perfectly (that means 100%) happy if it just defaulted to
 off, but I could turn it up to 11 whenever I needed it.  I don't
 believe to be the only one with this opinion, either.

Perhaps we should reconsider our general position on hints then and
add them so users can define the plan to be used..  For my part, I don't
see this as all that much different.

Consider if we were just adding HashJoin support today as an example.
Would we be happy if we had to default to enable_hashjoin = off?  Or if
users had to do that regularly because our costing was horrid?  It's bad
enough that we have to resort to those tweaks today in rare cases.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Parallel Seq Scan

2014-12-19 Thread Robert Haas
On Fri, Dec 19, 2014 at 9:39 AM, Stephen Frost sfr...@snowman.net wrote:
 Perhaps we should reconsider our general position on hints then and
 add them so users can define the plan to be used..  For my part, I don't
 see this as all that much different.

 Consider if we were just adding HashJoin support today as an example.
 Would we be happy if we had to default to enable_hashjoin = off?  Or if
 users had to do that regularly because our costing was horrid?  It's bad
 enough that we have to resort to those tweaks today in rare cases.

If you're proposing that it is not reasonable to have a GUC that
limits the degree of parallelism, then I think that's outright crazy:
that is probably the very first GUC we need to add.  New query
processing capabilities can entail new controlling GUCs, and
parallelism, being as complex at it is, will probably add several of
them.

But the big picture here is that if you want to ever have parallelism
in PostgreSQL at all, you're going to have to live with the first
version being pretty crude.  I think it's quite likely that the first
version of parallel sequential scan will be just as buggy as Hot
Standby was when we first added it, or as buggy as the multi-xact code
was when it went in, and probably subject to an even greater variety
of taxing limitations than any feature we've committed in the 6 years
I've been involved in the project.  We get to pick between that and
not having it at all.

I'll take a look at the papers you sent about parallel query
optimization, but personally I think that's putting the cart not only
before the horse but also before the road.  For V1, we need a query
optimization model that does not completely suck - no more.  The key
criterion here is that this has to WORK.  There will be time enough to
improve everything else once we reach that goal.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Seq Scan

2014-12-19 Thread Heikki Linnakangas

On 12/19/2014 04:39 PM, Stephen Frost wrote:

* Marko Tiikkaja (ma...@joh.to) wrote:

On 12/19/14 3:27 PM, Stephen Frost wrote:

We'd have to coach our users to
constantly be tweaking the enable_parallel_query (or whatever) option
for the queries where it helps and turning it off for others.  I'm not
so excited about that.


I'd be perfectly (that means 100%) happy if it just defaulted to
off, but I could turn it up to 11 whenever I needed it.  I don't
believe to be the only one with this opinion, either.


Perhaps we should reconsider our general position on hints then and
add them so users can define the plan to be used..  For my part, I don't
see this as all that much different.

Consider if we were just adding HashJoin support today as an example.
Would we be happy if we had to default to enable_hashjoin = off?  Or if
users had to do that regularly because our costing was horrid?  It's bad
enough that we have to resort to those tweaks today in rare cases.


This is somewhat different. Imagine that we achieve perfect 
parallelization, so that when you set enable_parallel_query=8, every 
query runs exactly 8x faster on an 8-core system, by using all eight cores.


Now, you might still want to turn parallelization off, or at least set 
it to a lower setting, on an OLTP system. You might not want a single 
query to hog all CPUs to run one query faster; you'd want to leave some 
for other queries. In particular, if you run a mix of short 
transactions, and some background-like tasks that run for minutes or 
hours, you do not want to starve the short transactions by giving all 
eight CPUs to the background task.


Admittedly, this is a rather crude knob to tune for such things,
but it's quite intuitive to a DBA: how many CPU cores is one query 
allowed to utilize? And we don't really have anything better.


In real life, there's always some overhead to parallelization, so that 
even if you can make one query run faster by doing it, you might hurt 
overall throughput. To some extent, it's a latency vs. throughput 
tradeoff, and it's quite reasonable to have a GUC for that because 
people have different priorities.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [pgsql-pkg-debian] Updated libpq5 packages cause connection errors on postgresql 9.2

2014-12-19 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 On Fri, Dec 19, 2014 at 11:52 AM, Christoph Berg c...@df7cb.de wrote:
 Googling for digest too big for rsa key seems to indicate that this
 problem occurs when you are using (client?) certificates with short
 RSA keys. 512 bits is most often cited in the problem reports,
 something like 768 is around the minimum size that works, and of
 course, anything smaller than 1024 or really 1536 (or 2048) bits is
 too small for today's crypto standards.
 
 So the question here is if this is also the problem you saw - are you
 using client or server certificates with short keys?
 
 What this explanation doesn't explain is why the problem occurs with
 9.4's libpq5 while it works with 9.3's. The libssl version used for
 building these packages should really be the same, 9.3.5-2.pgdg70+1
 was built just two days ago as well.

 Some googling shows that this could be because it's negotiating TLS 1.2
 which the key is just too small for. And we did change that in 9.4 - commit
 326e1d73c476a0b5061ef00134bdf57aed70d5e7 disabled SSL in favor of always
 using TLS for security reasons.

Hm ... the 9.4 release notes fail to describe that change adequately, and
certainly don't mention that it would have any compatibility implications.
Guess that needs to be fixed.  Does anyone know offhand what the change in
the minimum key length is across SSL/TLS versions, exactly?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] HINT: pg_hba.conf changed since last config reload

2014-12-19 Thread Steve Singer

On 12/15/2014 11:38 AM, Alex Shulgin wrote:

These are all valid concerns IMHO. Attached is the modified version of 
the original patch by Craig, addressing the handling of the new 
hint_log error data field and removing the client-side HINT. I'm also 
moving this to the current CF. -- Alex






The updated patch removes the client message. I feel this addresses 
Peter's concern.  The updated patch also addresses the missing free I 
mentioned in my original review.


The patch applies cleanly to head,

One thing I'm noticed while testing is that if you do the following

1. Edit pg_hba.conf  to allow a connection from somewhere
2. Attempt to connect, you get an error + the hind in the server log
3. Make another change to pg_hba.conf and introduce an error in the file
4. Attempt to reload the config files, pg_hba.conf the reload fails 
because of the above error
5. Attempt to connect you still can't connect since we have the original 
pg_hba.conf loaded


You don't get the warning in step 5 since we update PgReloadTime even if 
the reload didn't work.


Is this enough of a concern to justify the extra complexity in tracking 
the reload time of the pg_hba and pg_ident times directly?









--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [pgsql-pkg-debian] Updated libpq5 packages cause connection errors on postgresql 9.2

2014-12-19 Thread Chris Butler
Hi Christoph,

- Original Message -
 From: Christoph Berg c...@df7cb.de
 To: Chris Butler cbut...@zedcore.com

 Googling for digest too big for rsa key seems to indicate that this
 problem occurs when you are using (client?) certificates with short
 RSA keys. 512 bits is most often cited in the problem reports,
 something like 768 is around the minimum size that works, and of
 course, anything smaller than 1024 or really 1536 (or 2048) bits is
 too small for today's crypto standards.
 
 So the question here is if this is also the problem you saw - are you
 using client or server certificates with short keys?

Yes, that would appear to be the case - the key we're using is only 512 bits. 
I'll make sure we replace the certificate before re-applying the update (which 
will probably be after the holidays now).
 
 What this explanation doesn't explain is why the problem occurs with
 9.4's libpq5 while it works with 9.3's. The libssl version used for
 building these packages should really be the same, 9.3.5-2.pgdg70+1
 was built just two days ago as well.

For info, I can confirm that both libraries are loading the same libssl:

zedcore@web2:/tmp/usr/lib/x86_64-linux-gnu$ ldd 
/usr/lib/x86_64-linux-gnu/libpq.so.5 | grep libssl
libssl.so.1.0.0 = /usr/lib/x86_64-linux-gnu/libssl.so.1.0.0 
(0x7f3e8d898000)
zedcore@web2:/tmp/usr/lib/x86_64-linux-gnu$ ldd ./libpq.so.5 | grep libssl
libssl.so.1.0.0 = /usr/lib/x86_64-linux-gnu/libssl.so.1.0.0 
(0x7f5d76176000)


I can see a few changes are listed in the 9.4 changelog relating to SSL, so my 
guess would be one of those changes has altered the behaviour of libssl when 
presented with a small key.

-- 
Chris Butler
Zedcore Systems Ltd

Telephone: 0114 303 0666
Direct dial: 0114 303 0572


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] HINT: pg_hba.conf changed since last config reload

2014-12-19 Thread Alex Shulgin

Steve Singer st...@ssinger.info writes:

 On 12/15/2014 11:38 AM, Alex Shulgin wrote:

 These are all valid concerns IMHO. Attached is the modified version
 of the original patch by Craig, addressing the handling of the new
 hint_log error data field and removing the client-side HINT. I'm
 also moving this to the current CF. -- Alex




 The updated patch removes the client message. I feel this addresses
 Peter's concern.  The updated patch also addresses the missing free I
 mentioned in my original review.

 The patch applies cleanly to head,

 One thing I'm noticed while testing is that if you do the following

 1. Edit pg_hba.conf  to allow a connection from somewhere
 2. Attempt to connect, you get an error + the hind in the server log
 3. Make another change to pg_hba.conf and introduce an error in the file
 4. Attempt to reload the config files, pg_hba.conf the reload fails
 because of the above error
 5. Attempt to connect you still can't connect since we have the
 original pg_hba.conf loaded

 You don't get the warning in step 5 since we update PgReloadTime even
 if the reload didn't work.

 Is this enough of a concern to justify the extra complexity in
 tracking the reload time of the pg_hba and pg_ident times directly?

I don't think so.  The scenario this patch relies on assumes that the
DBA will remember to look in the log if something goes wrong, and in
your case there would be a message like the following:

WARNING:  pg_hba.conf not reloaded

So an extra hint about file timestamp is unneeded.

--
Alex


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] HINT: pg_hba.conf changed since last config reload

2014-12-19 Thread Craig Ringer
On 12/19/2014 11:41 PM, Alex Shulgin wrote:
 I don't think so.  The scenario this patch relies on assumes that the
 DBA will remember to look in the log if something goes wrong

Well, actually, the whole point was that the user who's connecting
(likely also the DBA) will see a HINT telling them that there's more
in the logs.

But that got removed.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] NUMERIC private methods?

2014-12-19 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 I think what it boils down to is that several people here (and I'll
 add my voice to the chorus) are saying, hey, numeric is really useful,
 and we'd like to be able to manipulate numerics without all the palloc
 and fmgr overhead, and your response appears to be to say, use float8.
 That may be the right answer in some cases, but certainly not in all.

Well, there are two components to what I'm saying.  One is that the
example David started with looks like it could use some better-informed
consideration about which datatype to use.  The other is that numeric
leaves quite a lot to be desired still, and someday we might want to fix
that, and that might require breaking the APIs you want to expose.

 You make this argument every time somebody wants to drop static from a
 function or stick PGDLLIMPORT on a variable, and frankly I think it's
 pretty developer-hostile.  I would much rather see us go ahead and do
 those things and if people later complain that we broke stuff, we'll
 go tell them to pound sand.

Really.  I will remember that next time you bitch about us changing some
extension-visible behavior, which AFAIR you are usually one of the first
to complain about.

Anyway, if we do this, I concur with Alvaro's suggestion that the
additional exposure be in a header file named something like
numeric_private.h, so that there's less room for complaint when
we change it.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Suppressing elog.c context messages (was Re: [HACKERS] Wait free LW_SHARED acquisition)

2014-12-19 Thread Andres Freund
Hi,

When debugging lwlock issues I found PRINT_LWDEBUG/LOG_LWDEBUG rather
painful to use because of the amount of elog contexts/statements
emitted. Given the number of lwlock acquirations that's just not doable.

To solve that during development I've solved that by basically
replacing:
if (Trace_lwlocks)
   elog(LOG, %s(%s %d): %s, where, name, index, msg);

with something like

if (Trace_lwlocks)
{
ErrorContextCallback *old_error_context_stack;
...
old_error_context_stack = error_context_stack;
error_context_stack = NULL;
ereport(LOG,
   (errhidestmt(true),
errmsg(%s(%s %d): %s, where, T_NAME(lock),
T_ID(lock), msg)));

I think it'd generally be useful to have something like errhidecontext()
akin to errhidestatement() to avoid things like the above.

The usecases wher eI see this as being useful is high volume debug
logging, not normal messages...

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] NUMERIC private methods?

2014-12-19 Thread Robert Haas
On Fri, Dec 19, 2014 at 10:56 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Well, there are two components to what I'm saying.  One is that the
 example David started with looks like it could use some better-informed
 consideration about which datatype to use.  The other is that numeric
 leaves quite a lot to be desired still, and someday we might want to fix
 that, and that might require breaking the APIs you want to expose.

As I say, I'm OK with that.  That file has been nearly-static for a
really long time, so I don't expect a massive rewrite imminently.  But
if it happens, and it breaks things for people relying on those
functions, tough luck for them.

 You make this argument every time somebody wants to drop static from a
 function or stick PGDLLIMPORT on a variable, and frankly I think it's
 pretty developer-hostile.  I would much rather see us go ahead and do
 those things and if people later complain that we broke stuff, we'll
 go tell them to pound sand.

 Really.  I will remember that next time you bitch about us changing some
 extension-visible behavior, which AFAIR you are usually one of the first
 to complain about.

Really?  I have bitched about what I see as nearly-pointless
reorganizations of header files, because I personally place very low
priority on speeding up incremental compiles, and most of those
changes have no other benefit.  But I can't remember ever bitching
about function signature changes or changes to the types of global
variables, or even the disappearance of previously-exposed types or
global variables.  And I daresay, given the nature of my
responsibilities at EDB, such things have at least one order of
magnitude more impact on me than they do on most extension
maintainers.  If it were up to me, I'd go stick PGDLLIMPORT on every
global variable I could find; I estimate that would make things easier
for me, say, five times as often as it would make them harder.  The
amount of time we've spent tracking down regressions on Windows
because some loadable module of ours depended on a variable that
wasn't PGDLLIMPORT'd is large - everything works fine on Linux, and
sometimes survives cursory testing on Windows too, but then breaks in
some obscure scenario.  And there have also been cases where we've had
to work around the absence of PGDLLIMPORT markings with ugly hacks and
then that workaround code has turned out to be buggy.  I would not
argue for indiscriminately making global every function we have
anywhere in the backend, but I would favor a policy of being
significantly more liberal about it than we have been heretofore.

 Anyway, if we do this, I concur with Alvaro's suggestion that the
 additional exposure be in a header file named something like
 numeric_private.h, so that there's less room for complaint when
 we change it.

Yes, I liked that proposal, too.  Also, I'm not sure whether there's
room to worry that making those function extern rather than static
would defeat compiler optimizations that materially affect
performance.  That might just be paranoia on my part, but I've seen
cases where it matters.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] HINT: pg_hba.conf changed since last config reload

2014-12-19 Thread Alex Shulgin

Craig Ringer cr...@2ndquadrant.com writes:

 On 12/19/2014 11:41 PM, Alex Shulgin wrote:
 I don't think so.  The scenario this patch relies on assumes that the
 DBA will remember to look in the log if something goes wrong

 Well, actually, the whole point was that the user who's connecting
 (likely also the DBA) will see a HINT telling them that there's more
 in the logs.

 But that got removed.

While it sounds useful at first glance, I believe Peter's arguments
upthread provide enough justification to avoid sending the hint to the
client.

--
Alex


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Commitfest problems

2014-12-19 Thread Arthur Silva
On Fri, Dec 19, 2014 at 6:28 AM, Mark Kirkwood 
mark.kirkw...@catalyst.net.nz wrote:

 On 19/12/14 20:48, Andres Freund wrote:

 On 2014-12-18 10:02:25 -0800, Joshua D. Drake wrote:

 I think a lot of hackers forget exactly how tender their egos are. Now I
 say
 this knowing that a lot of them will go, Oh give me a break but as
 someone
 who employs hackers, deals with open source AND normal people :P every
 single day, I can tell you without a single inch of sarcasm that petting
 egos is one of the ways you get things done in the open source (and
 really
 any male dominated) community.


 To me that's a bit over the top stereotyping.


 +1

 Having been mentioned one or two times myself - it was an unexpected wow
 - cool rather than a grumpy/fragile I must be noticed thing. I think
 some folk have forgotten the underlying principle of the open source
 community - it is about freely giving - time or code etc. The there must
 be something in it for me me me meme is - well - the *other* world view.

  However, doing crappy work and let's not be shy about it, there is NOTHING
 fun about reviewing someone else's code needs to have incentive.


 FWIW, I don't agree with this at all. Reviewing code can be quite
 interesting - with the one constraint that the problem the patch solves
 needs to be somewhat interesting. The latter is what I think gets many
 of the more experienced reviewers - lots of the patches just solve stuff
 they don't care about.


 Yeah, and also it helps if the patch addresses an area that you at least
 know *something* about - otherwise it is really hard to review in any
 useful way.

 Cheers

 Mark



 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


I'm trying to follow this thread as much as I could but I could've missed a
part of it.

Merit/credits aside what would really help Postgres right now is a more
streamlined/modern (the only words I could come up with) development
process.

Using the mailing list for everything is alright, it works. But there is
lot of tools that could be used along it:
gerrit/gitlab/github/bitbucket/jira and other tools that do: pull requests,
code review and bugs (or any combination of them).

That'd reduce friction for new contributors and further development. These
tools are being used everywhere and I find hard to believe that PG can't
benefit from any of them.

--
Arthur Silva


Re: [HACKERS] Role Attribute Bitmask Catalog Representation

2014-12-19 Thread Alvaro Herrera
Alvaro Herrera wrote:
 The fact that the ROLE_ATTR_* definitions are in pg_authid.h means that
 there are now a lot of files that need to include that one.  I think the
 includes relative to ACLs and roles is rather messy now, and this patch
 makes it a bit worse.
 
 I think we should create a new header file (maybe acltypes.h or
 acldefs.h), which only contains the AclMode and RoleAttr typedefs and
 their associated defines; that one would be included from parsenodes.h,
 acl.h and pg_authid.h.  Everything else would be in acl.h.  So code that
 currently checks permissions using only acl.h do not need any extra
 includes.

I propose this patch on top of Adam's v5.  Also included is a full patch
against master.


Unrelated: when changing from unified to context format, I saw
filterdiff fail to produce a complete patch, skipping some hunks in its
output.  My first impression was that I had dropped some hunks in git,
so I wasted some time comparing v5 and v6 by hand before remembering
that Michael Paquier had mentioned this problem previously.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 8475985..3181a79 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -22,7 +22,6 @@
 #include access/xlog_internal.h
 #include access/xlogutils.h
 #include catalog/catalog.h
-#include catalog/pg_authid.h
 #include catalog/pg_type.h
 #include funcapi.h
 #include miscadmin.h
diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile
index a403c64..a6de2ff 100644
--- a/src/backend/catalog/Makefile
+++ b/src/backend/catalog/Makefile
@@ -28,7 +28,7 @@ all: $(BKIFILES) schemapg.h
 # indexing.h had better be last, and toasting.h just before it.
 
 POSTGRES_BKI_SRCS = $(addprefix $(top_srcdir)/src/include/catalog/,\
-	pg_proc.h pg_type.h pg_attribute.h pg_class.h \
+	acldefs.h pg_proc.h pg_type.h pg_attribute.h pg_class.h \
 	pg_attrdef.h pg_constraint.h pg_inherits.h pg_index.h pg_operator.h \
 	pg_opfamily.h pg_opclass.h pg_am.h pg_amop.h pg_amproc.h \
 	pg_language.h pg_largeobject_metadata.h pg_largeobject.h pg_aggregate.h \
diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
index 4663429..21d282c 100644
--- a/src/backend/catalog/aclchk.c
+++ b/src/backend/catalog/aclchk.c
@@ -5038,18 +5038,18 @@ pg_extension_ownercheck(Oid ext_oid, Oid roleid)
  * roleid - the oid of the role to check.
  * attribute - the attribute to check.
  *
- * Note: Use this function for role attribute permission checking as it accounts
- * for superuser status.  It will always return true for roles with superuser
- * privileges unless the attribute being checked is CATUPDATE (superusers are not
- * allowed to bypass CATUPDATE permissions).
+ * Note: Use this function for role attribute permission checking as it
+ * accounts for superuser status.  It will always return true for roles with
+ * superuser privileges unless the attribute being checked is CATUPDATE
+ * (superusers are not allowed to bypass CATUPDATE permissions).
  *
- * Note: roles do not have owners per se; instead we use this test in
- * places where an ownership-like permissions test is needed for a role.
- * Be sure to apply it to the role trying to do the operation, not the
- * role being operated on!  Also note that this generally should not be
- * considered enough privilege if the target role is a superuser.
- * (We don't handle that consideration here because we want to give a
- * separate error message for such cases, so the caller has to deal with it.)
+ * Note: roles do not have owners per se; instead we use this test in places
+ * where an ownership-like permissions test is needed for a role.  Be sure to
+ * apply it to the role trying to do the operation, not the role being operated
+ * on!  Also note that this generally should not be considered enough privilege
+ * if the target role is a superuser.  (We don't handle that consideration here
+ * because we want to give a separate error message for such cases, so the
+ * caller has to deal with it.)
  */
 bool
 has_role_attribute(Oid roleid, RoleAttr attribute)
diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index 2929b66..415ac17 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -91,7 +91,7 @@ my $BOOTSTRAP_SUPERUSERID =
 my $PG_CATALOG_NAMESPACE =
   find_defined_symbol('pg_namespace.h', 'PG_CATALOG_NAMESPACE');
 my $ROLE_ATTR_ALL =
-  find_defined_symbol('pg_authid.h', 'ROLE_ATTR_ALL');
+  find_defined_symbol('acldefs.h', 'ROLE_ATTR_ALL');
 
 # Read all the input header files into internal data structures
 my $catalogs = Catalog::Catalogs(@input_files);
diff --git a/src/backend/commands/user.c b/src/backend/commands/user.c
index 5bde610..564f77a 100644
--- a/src/backend/commands/user.c
+++ 

Re: [HACKERS] NUMERIC private methods?

2014-12-19 Thread David Fetter
On Thu, Dec 18, 2014 at 11:51:37PM -0300, Alvaro Herrera wrote:
 Robert Haas wrote:
 
  I think that's ridiculous.  You're basically arguing that numeric
  doesn't offer meaningful advantages over float8, which flies in
  the face of the fact that essentially every database application
  I've ever seen uses numeric and I'm not sure I've ever seen one
  using float8.  Nearly all database users prefer to store
  quantities like currency units in a type that is guaranteed not to
  lose precision.
 
 I think it's reasonable to expose NumericVar and the supporting
 function prototypes in, say, numeric_internal.h; normal applications
 that just want to operate on numerics as today can just include
 numeric.h, and continue to be at arms-length of the implementation
 details, while code that wants to optimize operations further can
 use numeric_internal.h and be very aware that they are subject to
 heavy breakage if we ever feel a need to change the internal API.

While nothing can prevent negligence and pilot error, making it clear
by the name of the included header that breakable stuff is being used
seems like an excellent way to proceed.

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] replicating DROP commands across servers

2014-12-19 Thread Alvaro Herrera
Alvaro Herrera wrote:
 Alvaro Herrera wrote:
  Andres Freund wrote:
  
   Having reread the patch just now I basically see two things to
   criticize:
   a) why isn't this accessible at SQL level? That seems easy to address.
   b) Arguably some of this could well be done in separate commits.
  
  Fair comments.  I will split it up.
 
 Here's a split version.  The last part is still missing some polish --
 in particular handling for OBJECT_POLICY, and the SQL interface which
 would also allow us to get something in the regression tests.

Pushed patch 1.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Commitfest problems

2014-12-19 Thread Bruce Momjian
On Wed, Dec 17, 2014 at 02:00:18PM -0500, Stephen Frost wrote:
 Another thought I had was to suggest we consider *everyone* to be a
 contributor and implement a way to tie together the mailing list
 archives with the commit history and perhaps the commitfest app and make
 it searchable and indexed on some website.  eg:
 
 contributors.postgresql.org/sfrost
   - Recent commits
   - Recent commit mentions
   - Recent emails to any list
   - Recent commitfest app activity
   - Recent wiki page updates
 ...
 
 Ideally with a way for individuals to upload a photo, provide a company
 link, etc, similar to what the existing Major Contributors have today.
 Obviously, this is not a small task to develop and there is some risk of
 abuse (which I expect the other folks on the infra team will point out
 and likely tar and feather me for suggesting this at all..) but it might
 be along the same lines as Bruce's PgLife..

The top of my Postgres blog page has some statistics for myself that
might be useful for the community to maintain, and promote:

http://momjian.us/main/blogs/pgblog/2014.html

incoming, outgoing, unread, commits (details), events

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Seq Scan

2014-12-19 Thread Gavin Flower

On 20/12/14 03:54, Heikki Linnakangas wrote:

On 12/19/2014 04:39 PM, Stephen Frost wrote:

* Marko Tiikkaja (ma...@joh.to) wrote:

On 12/19/14 3:27 PM, Stephen Frost wrote:

We'd have to coach our users to
constantly be tweaking the enable_parallel_query (or whatever) option
for the queries where it helps and turning it off for others.  I'm not
so excited about that.


I'd be perfectly (that means 100%) happy if it just defaulted to
off, but I could turn it up to 11 whenever I needed it.  I don't
believe to be the only one with this opinion, either.


Perhaps we should reconsider our general position on hints then and
add them so users can define the plan to be used..  For my part, I don't
see this as all that much different.

Consider if we were just adding HashJoin support today as an example.
Would we be happy if we had to default to enable_hashjoin = off?  Or if
users had to do that regularly because our costing was horrid? It's bad
enough that we have to resort to those tweaks today in rare cases.


This is somewhat different. Imagine that we achieve perfect 
parallelization, so that when you set enable_parallel_query=8, every 
query runs exactly 8x faster on an 8-core system, by using all eight 
cores.


Now, you might still want to turn parallelization off, or at least set 
it to a lower setting, on an OLTP system. You might not want a single 
query to hog all CPUs to run one query faster; you'd want to leave 
some for other queries. In particular, if you run a mix of short 
transactions, and some background-like tasks that run for minutes or 
hours, you do not want to starve the short transactions by giving all 
eight CPUs to the background task.


Admittedly, this is a rather crude knob to tune for such things,
but it's quite intuitive to a DBA: how many CPU cores is one query 
allowed to utilize? And we don't really have anything better.


In real life, there's always some overhead to parallelization, so that 
even if you can make one query run faster by doing it, you might hurt 
overall throughput. To some extent, it's a latency vs. throughput 
tradeoff, and it's quite reasonable to have a GUC for that because 
people have different priorities.


- Heikki




How about 3 numbers:

   minCPUs #  0
   maxCPUs   # = minCPUs
   fractionOfCPUs# rounded up


If you just have the /*number*/ of CPUs then a setting that is 
appropriate for quad core, may be too /*small*/ for an octo core processor.


If you just have the /*fraction*/ of CPUs then a setting that is 
appropriate for quad core, may be too /*large*/ for an octo core processor.




Cheers,
Gavin


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Seq Scan

2014-12-19 Thread Stephen Frost
Robert,

* Robert Haas (robertmh...@gmail.com) wrote:
 On Fri, Dec 19, 2014 at 9:39 AM, Stephen Frost sfr...@snowman.net wrote:
  Perhaps we should reconsider our general position on hints then and
  add them so users can define the plan to be used..  For my part, I don't
  see this as all that much different.
 
  Consider if we were just adding HashJoin support today as an example.
  Would we be happy if we had to default to enable_hashjoin = off?  Or if
  users had to do that regularly because our costing was horrid?  It's bad
  enough that we have to resort to those tweaks today in rare cases.
 
 If you're proposing that it is not reasonable to have a GUC that
 limits the degree of parallelism, then I think that's outright crazy:

I'm pretty sure that I didn't say anything along those lines.  I'll try
to be clearer.

What I'd like is such a GUC that we can set at a reasonable default of,
say, 4, and trust that our planner will generally do the right thing.
Clearly, this may be something which admins have to tweak but what I
would really like to avoid is users having to set this GUC explicitly
for each of their queries.

 that is probably the very first GUC we need to add.  New query
 processing capabilities can entail new controlling GUCs, and
 parallelism, being as complex at it is, will probably add several of
 them.

That's fine if they're intended for debugging issues or dealing with
unexpected bugs or issues, but let's not go into this thinking we should
add GUCs which are geared with the expectation of users tweaking them
regularly.

 But the big picture here is that if you want to ever have parallelism
 in PostgreSQL at all, you're going to have to live with the first
 version being pretty crude.  I think it's quite likely that the first
 version of parallel sequential scan will be just as buggy as Hot
 Standby was when we first added it, or as buggy as the multi-xact code
 was when it went in, and probably subject to an even greater variety
 of taxing limitations than any feature we've committed in the 6 years
 I've been involved in the project.  We get to pick between that and
 not having it at all.

If it's disabled by default then I'm worried it won't really improve
until it is.  Perhaps that's setting a higher bar than you feel is
necessary but, for my part at least, it doesn't feel like a very high
level.

 I'll take a look at the papers you sent about parallel query
 optimization, but personally I think that's putting the cart not only
 before the horse but also before the road.  For V1, we need a query
 optimization model that does not completely suck - no more.  The key
 criterion here is that this has to WORK.  There will be time enough to
 improve everything else once we reach that goal.

I agree that it's got to work, but it also needs to be generally well
designed, and have the expectation of being on by default.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Parallel Seq Scan

2014-12-19 Thread Stephen Frost
* Heikki Linnakangas (hlinnakan...@vmware.com) wrote:
 On 12/19/2014 04:39 PM, Stephen Frost wrote:
 * Marko Tiikkaja (ma...@joh.to) wrote:
 I'd be perfectly (that means 100%) happy if it just defaulted to
 off, but I could turn it up to 11 whenever I needed it.  I don't
 believe to be the only one with this opinion, either.
 
 Perhaps we should reconsider our general position on hints then and
 add them so users can define the plan to be used..  For my part, I don't
 see this as all that much different.
 
 Consider if we were just adding HashJoin support today as an example.
 Would we be happy if we had to default to enable_hashjoin = off?  Or if
 users had to do that regularly because our costing was horrid?  It's bad
 enough that we have to resort to those tweaks today in rare cases.
 
 This is somewhat different. Imagine that we achieve perfect
 parallelization, so that when you set enable_parallel_query=8, every
 query runs exactly 8x faster on an 8-core system, by using all eight
 cores.

To be clear, as I mentioned to Robert just now, I'm not objecting to a
GUC being added to turn off or control parallelization.  I don't want
such a GUC to be a crutch for us to lean on when it comes to questions
about the optimizer though.  We need to work through the optimizer
questions of should this be parallelized? and, perhaps later, how
many ways is it sensible to parallelize this?  I'm worried we'll take
such a GUC as a directive along the lines of we are being told to
parallelize to exactly this level every time and for every query which
can be.  The GUC should be an input into the planner/optimizer much the
way enable_hashjoin is, unless it's being done as a *limiting* factor
for the administrator to be able to control, but we've generally avoided
doing that (see: work_mem) and, if we're going to start, we should
probably come up with an approach that addresses the considerations for
other resources too.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Role Attribute Bitmask Catalog Representation

2014-12-19 Thread Stephen Frost
* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
 Alvaro Herrera wrote:
  I think we should create a new header file (maybe acltypes.h or
  acldefs.h), which only contains the AclMode and RoleAttr typedefs and
  their associated defines; that one would be included from parsenodes.h,
  acl.h and pg_authid.h.  Everything else would be in acl.h.  So code that
  currently checks permissions using only acl.h do not need any extra
  includes.
 
 I propose this patch on top of Adam's v5.  Also included is a full patch
 against master.

Thanks!  I've just read through your changes to Adam's v5 and they all 
look reasonable to me.  I agree that having acldefs.h with the #define's
is nicer and cleaner and reduces the amount of including needed for 
pg_authid.  I also like that it removes those ACL_X definitions from 
parsenodes.h. 

Thanks also for the whiteline/line-wrap improvements and user.c cleanup,
nice that we don't need all of those individual variables now that we're
using a bitmask.

 Unrelated: when changing from unified to context format, I saw
 filterdiff fail to produce a complete patch, skipping some hunks in its
 output.  My first impression was that I had dropped some hunks in git,
 so I wasted some time comparing v5 and v6 by hand before remembering
 that Michael Paquier had mentioned this problem previously.

Ugh, that's definitely frustrating..  Will keep it in mind.

Thanks again!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Bogus WAL segments archived after promotion

2014-12-19 Thread Heikki Linnakangas

On 12/19/2014 02:55 PM, Heikki Linnakangas wrote:

I'm thinking that we should add a step to promotion, where we scan
pg_xlog for any segments higher than the timeline switch point, and
remove them, or mark them with .done so that they are not archived.
There might be some real WAL that was streamed from the primary, but not
yet applied, but such WAL is of no interest to that server anyway, after
it's been promoted. It's a bit disconcerting to zap WAL that's valid,
even if doesn't belong to the current server's timeline history, because
as a general rule it's good to avoid destroying evidence that might be
useful in debugging. There isn't much difference between removing them
immediately and marking them as .done, though, because they will
eventually be removed/recycled anyway if they're marked as .done.


This is what I came up with. This patch removes the suspect segments at 
timeline switch. The alternative of creating .done files for them would 
preserve more evidence for debugging, but OTOH it would also be very 
confusing to have valid-looking WAL segments in pg_xlog, with .done 
files, that in fact contain garbage.


The patch is a bit longer than it otherwise would be, because I moved 
the code to remove a single file from RemoveOldXlogFiles() to a new 
function. I think that makes it more readable in any case, simply 
because it was so deeply indented in RemoveOldXlogFiles.


Thoughts?

- Heikki
commit 9e5c1033fbec2ebe553d68a1a7e343c8779109d0
Author: Heikki Linnakangas heikki.linnakan...@iki.fi
Date:   Fri Dec 19 21:12:40 2014 +0200

Don't archive bogus recycled or preallocated files after timeline switch.

After a timeline switch, either because we're following a timeline switch
during replay, or because end of recovery, we would leave behind recycled
WAL segments that are in the future, but on the old timeline. After
promotion, and after they become old enough to be recycled again, we would
notice that they don't have a .ready or .done file, create a .ready file
for them, and archive them. That's bogus, because the files contain
garbage, recycled from an older timeline (or prealloced as zeros). We
shouldn't archive such files.

To fix, whenever we switch to a new timeline, scan the data directory for
WAL segments on the old timeline, but with a higher segment number, and
remove them. Those don't belong to our timeline history, and are most
likely bogus recycled or preallocated files. They could also be valid files
that we streamed from the primary ahead of time, but in any case, they're
not needed to recover to the new timeline.

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 9e45976..e267ca1 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -792,6 +792,8 @@ static int	emode_for_corrupt_record(int emode, XLogRecPtr RecPtr);
 static void XLogFileClose(void);
 static void PreallocXlogFiles(XLogRecPtr endptr);
 static void RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr endptr);
+static void RemoveXlogFile(const char *segname, XLogRecPtr endptr);
+static void RemoveNonParentXlogFiles(XLogRecPtr switchpoint, TimeLineID newTLI);
 static void UpdateLastRemovedPtr(char *filename);
 static void ValidateXLOGDirectoryStructure(void);
 static void CleanupBackupHistory(void);
@@ -3431,24 +3433,9 @@ UpdateLastRemovedPtr(char *filename)
 static void
 RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr endptr)
 {
-	XLogSegNo	endlogSegNo;
-	int			max_advance;
 	DIR		   *xldir;
 	struct dirent *xlde;
 	char		lastoff[MAXFNAMELEN];
-	char		path[MAXPGPATH];
-
-#ifdef WIN32
-	char		newpath[MAXPGPATH];
-#endif
-	struct stat statbuf;
-
-	/*
-	 * Initialize info about where to try to recycle to.  We allow recycling
-	 * segments up to XLOGfileslop segments beyond the current XLOG location.
-	 */
-	XLByteToPrevSeg(endptr, endlogSegNo);
-	max_advance = XLOGfileslop;
 
 	xldir = AllocateDir(XLOGDIR);
 	if (xldir == NULL)
@@ -3469,6 +3456,11 @@ RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr endptr)
 
 	while ((xlde = ReadDir(xldir, XLOGDIR)) != NULL)
 	{
+		/* Ignore files that are not XLOG segments */
+		if (strlen(xlde-d_name) != 24 ||
+			strspn(xlde-d_name, 0123456789ABCDEF) != 24)
+			continue;
+
 		/*
 		 * We ignore the timeline part of the XLOG segment identifiers in
 		 * deciding whether a segment is still needed.  This ensures that we
@@ -3480,92 +3472,110 @@ RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr endptr)
 		 * We use the alphanumeric sorting property of the filenames to decide
 		 * which ones are earlier than the lastoff segment.
 		 */
-		if (strlen(xlde-d_name) == 24 
-			strspn(xlde-d_name, 0123456789ABCDEF) == 24 
-			strcmp(xlde-d_name + 8, lastoff + 8) = 0)
+		if (strcmp(xlde-d_name + 8, lastoff + 8) = 0)
 		{
 			if (XLogArchiveCheckDone(xlde-d_name))
 			{
-snprintf(path, MAXPGPATH, XLOGDIR /%s, xlde-d_name);
-
 /* Update the last removed 

Re: [HACKERS] Streaming replication and WAL archive interactions

2014-12-19 Thread Heikki Linnakangas

On 12/18/2014 12:32 PM, Fujii Masao wrote:

On Wed, Dec 17, 2014 at 4:11 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

On 12/16/2014 10:24 AM, Borodin Vladimir wrote:


12 дек. 2014 г., в 16:46, Heikki Linnakangas
hlinnakan...@vmware.com написал(а):


There have been a few threads on the behavior of WAL archiving,
after a standby server is promoted [1] [2]. In short, it doesn't
work as you might expect. The standby will start archiving after
it's promoted, but it will not archive files that were replicated
from the old master via streaming replication. If those files were
not already archived in the master before the promotion, they are
not archived at all. That's not good if you wanted to restore from
a base backup + the WAL archive later.

The basic setup is a master server, a standby, a WAL archive that's
shared by both, and streaming replication between the master and
standby. This should be a very common setup in the field, so how
are people doing it in practice? Just live with the wisk that you
might miss some files in the archive if you promote? Don't even
realize there's a problem? Something else?



Yes, I do live like that (with streaming replication and shared
archive between master and replicas) and don’t even realize there’s a
problem :( And I think I’m not the only one. Maybe at least a note
should be added to the documentation?



Let's try to figure out a way to fix this in master, but yeah, a note in the
documentation is in order.


+1


And how would we like it to work?



Here's a plan:

Have a mechanism in the standby, to track how far the master has archived
its WAL, and don't throw away WAL in the standby that hasn't been archived
in the master yet. This is similar to the physical replication slots, which
prevent the master from recycling WAL that a standby hasn't received yet,
but in reverse. I think we can use the .done and .ready files for this.
Whenever a file is streamed (completely) from the master, create a .ready
file for it. When we get an acknowledgement from the master that it has
archived it, create a .done file for it. To get the information from the
master, add the last archived WAL segment e.g. in the streaming
replication keep-alive message, or invent a new message type for it.


Sounds OK to me.

How does this work in cascade replication case? The cascading walsender
just relays the archive location to the downstream standby?


Hmm. Yeah, I guess so.


What happens when WAL streaming is terminated and the startup process starts to
read the WAL file from the archive? After reading the WAL file from the archive,
probably we would need to change .ready files of every older WAL files to .done.


I suppose. Although there's no big harm in leaving them in .ready state. 
As soon as you reconnect, the primary will tell if they were archived. 
If the server is promoted before reconnecting, it will try to archive 
the files and archive_command will see that they are already in the 
archive. It has to be prepared for that situation anyway, so that's OK too.


Here's a first cut at this. It includes the changes from your 
standby_wal_archiving_v1.patch, so you get that behaviour if you set 
archive_mode='always', and the new behaviour I wanted with 
archive_mode='shared'. I wrote it on top of the other patch I posted 
recently to not archive bogus recycled WAL segments after promotion 
(http://www.postgresql.org/message-id/549489fa.4010...@vmware.com), but 
it seems to apply without it too.


I suggest reading the documentation changes first, it hopefully explains 
pretty well how to use this. The code should work too, and comments on 
that are welcome too, but I haven't tested it much. I'll do more testing 
next week.


- Heikki

From 03dced40178c0a0b7c28ff630a15cf664995525d Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas heikki.linnakan...@iki.fi
Date: Tue, 16 Dec 2014 23:09:03 +0200
Subject: [PATCH 1/1] Make WAL archival behave more sensibly in standby mode.

This add two new archive_modes, 'shared' and 'always', to indicate whether
the WAL archive is shared between the primary and standby, or not. In
shared mode, the standby tracks which files have been archived by the
primary. The standby refrains from recycling files that the primary has
not yet archived, and at failover, the standby archives all those files too
from the old timeline. In 'always' mode, the standby's WAL archive is
taken to be separate from the primary's, and the standby independently
archives all files it receives from the primary.

Fujii Masao and me.
---
 doc/src/sgml/config.sgml  |  12 +-
 doc/src/sgml/high-availability.sgml   |  48 +++
 doc/src/sgml/protocol.sgml|  31 +
 src/backend/access/transam/xlog.c |  29 -
 src/backend/postmaster/postmaster.c   |  37 --
 src/backend/replication/walreceiver.c | 172 --
 src/backend/replication/walsender.c   |  47 +++
 

Re: [HACKERS] Commitfest problems

2014-12-19 Thread Joshua D. Drake


On 12/19/2014 12:28 AM, Mark Kirkwood wrote:


To me that's a bit over the top stereotyping.



+1

Having been mentioned one or two times myself - it was an unexpected
wow - cool rather than a grumpy/fragile I must be noticed thing. I
think some folk have forgotten the underlying principle of the open
source community - it is about freely giving - time or code etc. The
there must be something in it for me me me meme is - well - the
*other* world view.


It was supposed to be over the top. That doesn't make it any less true. 
Sure there are plenty of us that don't have any of the ego petting 
issues. However,t there are more of us in those of us that think we 
don't, that really, really do.


Sincerely,

Joshua D. Drake


--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, @cmdpromptinc
If we send our children to Caesar for their education, we should
 not be surprised when they come back as Romans.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: hash_create API changes (was Re: [HACKERS] speedup tidbitmap patch: hash BlockNumber)

2014-12-19 Thread Jim Nasby

On 12/18/14, 5:00 PM, Jim Nasby wrote:

2201582 20 -- Mostly LOCALLOCK and Shared Buffer


Started looking into this; perhaps https://code.google.com/p/fast-hash/ would 
be worth looking at, though it requires uint64.

It also occurs to me that we're needlessly shoving a lot of 0's into the hash 
input by using RelFileNode and ForkNumber. RelFileNode includes the tablespace 
Oid, which is pointless here because relid is unique per-database. We also have 
very few forks and typically care about very few databases. If we crammed dbid 
and ForkNum together that gets us down to 12 bytes, which at minimum saves us 
the trip through the case logic. I suspect it also means we could eliminate one 
of the mix() calls.

But I wonder if we could still do better, because we typically also won't have 
that many relations. Is there some fast way we could combine dbid, forkNum and 
relid into a uint32? That gets us down to 8 bytes, which means we could use 
fash-hash, or a stripped down mix().

Unfortunately I don't know much about hash algorithms, so I don't know how 
practical any of this actually is, or what a good method for combining those 
fields would be. My current idea is something like (rot(forkNum, 2) | dbid) ^ 
relid, but if you got unlucky with your oid values you could end up with a lot 
of collissions from that.

I can put some effort into this, but I'd like some guidance.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: hash_create API changes (was Re: [HACKERS] speedup tidbitmap patch: hash BlockNumber)

2014-12-19 Thread Tom Lane
Jim Nasby jim.na...@bluetreble.com writes:
 On 12/18/14, 5:00 PM, Jim Nasby wrote:
 2201582 20 -- Mostly LOCALLOCK and Shared Buffer

 Started looking into this; perhaps https://code.google.com/p/fast-hash/ would 
 be worth looking at, though it requires uint64.

 It also occurs to me that we're needlessly shoving a lot of 0's into the hash 
 input by using RelFileNode and ForkNumber. RelFileNode includes the 
 tablespace Oid, which is pointless here because relid is unique per-database. 
 We also have very few forks and typically care about very few databases. If 
 we crammed dbid and ForkNum together that gets us down to 12 bytes, which at 
 minimum saves us the trip through the case logic. I suspect it also means we 
 could eliminate one of the mix() calls.

I don't see this working.  The lock table in shared memory can surely take
no such shortcuts.  We could make a backend's locallock table omit fields
that are predictable within the set of objects that backend could ever
lock, but (1) this doesn't help unless we can reduce the tag size for all
LockTagTypes, which we probably can't, and (2) having the locallock's tag
be different from the corresponding shared tag would be a mess too.
I think dealing with (2) might easily eat all the cycles we could hope to
save from a smaller hash tag ... and that's not even considering the added
logical complexity and potential for bugs.

Switching to a different hash algorithm could be feasible, perhaps.
I think we're likely stuck with Jenkins hashing for hashes that go to
disk, but hashes for dynahash tables don't do that.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: hash_create API changes (was Re: [HACKERS] speedup tidbitmap patch: hash BlockNumber)

2014-12-19 Thread k...@rice.edu
On Fri, Dec 19, 2014 at 04:41:51PM -0600, Jim Nasby wrote:
 On 12/18/14, 5:00 PM, Jim Nasby wrote:
 2201582 20 -- Mostly LOCALLOCK and Shared Buffer
 
 Started looking into this; perhaps https://code.google.com/p/fast-hash/ would 
 be worth looking at, though it requires uint64.
 
 It also occurs to me that we're needlessly shoving a lot of 0's into the hash 
 input by using RelFileNode and ForkNumber. RelFileNode includes the 
 tablespace Oid, which is pointless here because relid is unique per-database. 
 We also have very few forks and typically care about very few databases. If 
 we crammed dbid and ForkNum together that gets us down to 12 bytes, which at 
 minimum saves us the trip through the case logic. I suspect it also means we 
 could eliminate one of the mix() calls.
 
 But I wonder if we could still do better, because we typically also won't 
 have that many relations. Is there some fast way we could combine dbid, 
 forkNum and relid into a uint32? That gets us down to 8 bytes, which means we 
 could use fash-hash, or a stripped down mix().
 
 Unfortunately I don't know much about hash algorithms, so I don't know how 
 practical any of this actually is, or what a good method for combining those 
 fields would be. My current idea is something like (rot(forkNum, 2) | dbid) ^ 
 relid, but if you got unlucky with your oid values you could end up with a 
 lot of collissions from that.
 
 I can put some effort into this, but I'd like some guidance.
 -- 
 Jim Nasby, Data Architect, Blue Treble Consulting
 Data in Trouble? Get it in Treble! http://BlueTreble.com
 

Hi,

If we are going to consider changing the hash function, we should
consider something like xxhash which runs at 13.8GB/s on a 2.7GHz
x86_64 for the XXH64 variant and 6.8GB/s for the XXH32 variant which
is double the speed of fast-hash according to the page running on a
3GHz x86_64. In addition, something like that could be used a checksum
instead of the current CRC32, although some work has already gone into
speeding it up, as is. Otherwise, it probably makes sense to just stick
with creating the fastpath 8-byte analogously to the 4-byte fastpath
that was just added. Is calculating the hash the bottle-neck?

Regards,
Ken


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Commitfest problems

2014-12-19 Thread Josh Berkus
On 12/18/2014 05:36 PM, Stephen Frost wrote:
 I tend to agree that we want to avoid complicated rules.  The corollary
 to that is the concern Andrew raised about my earlier off-the-cuff
 proposal- how do we avoid debasing the value of being recognized as a PG
 contributor?

I find that less of a concern than recognizing all contributors.   Let's
not go crazy, but if we're going to err, it should be on the side of
acknowledging *more* people, not less.

 
 is how to keep track of people that helps to which level.  I make a
 point of crediting reviewers and code contributors in my commit
 messages, but can you tell which ones of the following guys should make
 it to these lists?  I yanked this text from my commit
 73c986adde5d73a5e2555da9b5c8facedb146dcd:

 Reviewed to varying degrees by Michael Paquier, Andres Freund, Robert
 Haas, Amit Kapila, Fujii Masao, Jaime Casanova, Simon Riggs, Steven
 Singer, Peter Eisentraut

 I do agree that we need to give credit in some form, though.  I'm just
 saying can we please not put the responsibility on committers.
 
 Ugh, yeah, I certainly wouldn't want to have to work out some complex
 set of rules to be applied before each commit to define who can be
 considered a reviewer.  That said, most of the above list are
 committers and those who aren't should be recognized in some fashion, so
 I'm not sure that this is really a good example.

This really doesn't sound that difficult. If the committers include all
actual reviewers in the commit messages, then it will be fairly easy for
someone else (me) to go through and pick out the relative handful of
people who aren't already on the contributors list and check the level
of their contributions.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: hash_create API changes (was Re: [HACKERS] speedup tidbitmap patch: hash BlockNumber)

2014-12-19 Thread Tom Lane
k...@rice.edu k...@rice.edu writes:
 If we are going to consider changing the hash function, we should
 consider something like xxhash which runs at 13.8GB/s on a 2.7GHz
 x86_64 for the XXH64 variant and 6.8GB/s for the XXH32 variant which
 is double the speed of fast-hash according to the page running on a
 3GHz x86_64.

Well, as the google page points out, raw speed is not the only figure of
merit; otherwise we'd just xor all the bytes and call it good.  We need
the hash function to spread out the hash values well, or else we lose more
cycles chasing inordinately-long hash chains than we saved with a cheap
hash function.  Google claims their fast-hash is actually better on this
point than Jenkins, which if true would be very nice indeed.

Keep in mind also that very very few of our hash keys are longer than
about 20 bytes, so that speed-per-byte is not that exciting anyway.
Longer setup/finish times could easily swamp any per-byte advantages,
for example.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Commitfest problems

2014-12-19 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 On 12/18/2014 05:36 PM, Stephen Frost wrote:
 I do agree that we need to give credit in some form, though.  I'm just
 saying can we please not put the responsibility on committers.

 Ugh, yeah, I certainly wouldn't want to have to work out some complex
 set of rules to be applied before each commit to define who can be
 considered a reviewer.  That said, most of the above list are
 committers and those who aren't should be recognized in some fashion, so
 I'm not sure that this is really a good example.

 This really doesn't sound that difficult. If the committers include all
 actual reviewers in the commit messages, then it will be fairly easy for
 someone else (me) to go through and pick out the relative handful of
 people who aren't already on the contributors list and check the level
 of their contributions.

I'm with Alvaro: I don't mind copying the commitfest app's credits list
into commit messages, but please don't ask me to review who should get
credit or not.  If I have to do it, it's going to be done hastily at the
tail end of the commit process, and probably not very well; and once the
commit is made it's not very practical to fix any mistakes.

Could we establish an expectation that whoever sets a CF entry to ready
for committer is responsible for reviewing the authors/reviewers lists
and making sure that those fairly represent who should get credit?  That
would divide the labor a bit, and there would also be time enough for
corrections if anyone feels slighted.  The idea's not perfect since major
contributions could still happen after that point; but I think the major
risk is with the committer not remembering people who contributed early
in the patch's life cycle, and we could certainly hope that such people
get listed in the CF app entry.

Alternatively we could abandon the practice of using the commit log for
this purpose, which could simplify making after-the-fact corrections.
But then we'd have to set up some other recording infrastructure and work
flow for getting the info into the release notes.  That sounds like a lot
of work compared to the probable value.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}

2014-12-19 Thread Peter Geoghegan
On Thu, Dec 18, 2014 at 6:59 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Good point.

 For the IGNORE case: I guess the syntax just isn't that flexible. I
 agree that that isn't ideal.


 It should be simple to allow multiple key specifications:

 INSERT INTO persons (username, real_name, data)
 VALUES('foobar', 'foo bar')
 ON CONFLICT (username), (real_name) IGNORE;

 It's a rather niche use case, but might as well support it for the sake of
 completeness.

I guess that wouldn't be very hard to implement, and perhaps we should
do so soon. I am reluctant to let scope creep too far, though. As you
mentioned, this is a niche use case.

 I think that the long and the short of it is that you really ought to
 have one unique index as an arbiter in mind when writing a DML
 statement for the UPDATE variant. Relying on this type of convention
 is possible, I suppose, but ill-advised.

 Another thought is that you might want to specify a different action
 depending on which constraint is violated:

 INSERT INTO persons (username, real_name, data)
 VALUES('foobar', 'foo bar')
 ON CONFLICT (username) IGNORE
 ON CONFLICT (real_name) UPDATE ...;

 Although that leaves the question of what to do if both are violated.
 Perhaps:

 INSERT INTO persons (username, real_name, data)
 VALUES('foobar', 'foo bar')
 ON CONFLICT (username, real_name) IGNORE
 ON CONFLICT (real_name) UPDATE username = excluded.username;
 ON CONFLICT (username) UPDATE real_name = excluded.real_name;

I think that there might be a place for that, but I'd particularly
like to avoid figuring this out now - this suggestion is a complicated
new direction for the patch, and it's not as if adding this kind of
flexibility is precluded by not allowing it in the first version - we
won't paint ourselves into a corner by not doing this up front. The
patch is already complicated enough! Users can always have multiple
UPSERT commands, and that might be very close to good enough for a
relatively rare use case like this.

 I could easily have the unique index inference specification accept a
 named opclass, if you thought that was important, and you thought
 naming a non-default opclass by name was a good SQL interface. It
 would take only a little effort to support non-default opclasses.

 It's a little weird to mention an opclass by name. It's similar to naming an
 index by name, really. How about naming the operator? For an exclusion
 constraint, that would be natural, as the syntax to create an exclusion
 constraint in the first place is EXCLUDE USING gist (c WITH )

 Naming the index by columns makes sense in most cases, and I don't like
 specifying the index's name, but how about allowing naming a constraint?
 Indexes are just an implementation detail, but constraints are not. Unique
 and exclusion constraints are always backed by an index, so there is little
 difference in practice, but I would feel much more comfortable mentioning
 constraints by name than indexes.

The main reason for naming a constraint by name in practice will
probably be because there is no better way to deal with partial unique
indexes (which can be quite useful). But partial unique indexes aren't
formally constraints, in that they don't have pg_constraint entries.
So I don't think that that's going to be acceptable, entirely for that
reason. :-(

 Most people would list the columns, but if there is a really bizarre
 constraint, with non-default opclasses, or an exclusion constraint, it's
 probably been given a name that you could use.

What I find curious about the opclass thing is: when do you ever have
an opclass that has a different idea of equality than the default
opclass for the type? In other words, when is B-Tree strategy number 3
not actually '=' in practice, for *any* B-Tree opclass? Certainly, it
doesn't appear to be the case that it isn't so with any shipped
opclasses - the shipped non-default B-Tree opclasses only serve to
provide alternative notions of sort order, and never equals.

I think that with B-Tree (which is particularly relevant for the
UPDATE variant), it ought to be defined to work with the type's
default opclass equals operator, just like GROUP BY and DISTINCT.
Non-default opclass unique indexes work just as well in practice,
unless someone somewhere happens to create an oddball one that doesn't
use '=' as its equals operator (while also having '=' as the default
opclass equals operator). I am not aware that that leaves any
actually shipped opclass out (and I include our external extension
ecosystem here, although I might be wrong about that part).

 In theory, with the promise tuple approach to locking, you don't necessarily
 even need an index to back up the constraint.

 So probably better to not allow it.

I agree that we definitely want to require that there is an
appropriate index available.

I think we can live without support for partial unique indexes for the
time being. With non-default opclasses effectively handled (by caring
about the 

Re: [HACKERS] POLA violation with \c service=

2014-12-19 Thread David G Johnston
On Wed, Dec 17, 2014 at 8:25 AM, David Fetter [via PostgreSQL] 
ml-node+s1045698n5831124...@n5.nabble.com wrote:

 On Wed, Dec 17, 2014 at 08:14:04AM -0500, Andrew Dunstan wrote:

 
  On 12/17/2014 04:11 AM, Heikki Linnakangas wrote:
  On 12/17/2014 10:03 AM, Albe Laurenz wrote:
  David Fetter wrote:
  I've noticed that psql's  \c function handles service= requests in a
  way that I can only characterize as broken.
  
  This came up in the context of connecting to a cloud hosting service
  named after warriors or a river or something, whose default hostnames
  are long, confusing, and easy to typo, so I suspect that service= may
  come up more often going forward than it has until now.
  
  For example, when I try to use
  
  \c service=foo
  
  It will correctly figure out which database I'm trying to connect to,
  but fail to notice that it's on a different host, port, etc., and
  hence fail to connect with a somewhat unhelpful error message.
  
  I can think of a few approaches for fixing this:
  
  0.  Leave it broken.
  1.  Disable service= requests entirely in \c context, and error out
  if attempted.
  2.  Ensure that \c actually uses all of the available information.
  
  Is there another one I missed?
  
  If not, which of the approaches seems reasonable?
  
  #2 is the correct solution, #1 a band aid.
  
  It would be handy, if \c service=foo actually worked. We should do
 #3.
  If the database name is actually a connection string, or a service
  specification, it should not re-use the hostname and port from previous
  connection, but use the values from the connection string or service
 file.
 
 
  Yeah, that's the correct solution. It should not be terribly difficult
 to
  create a test for a conninfo string in the dbname parameter. That's what
  libpq does after all. We certainly don't want psql to have to try to
  interpret the service file. psql just needs to let libpq do its work in
 this
  situation.

 letting libpq handle this is the only sane plan for fixing it.  I'm
 looking into that today.


​
​On a tangentially related note; it is not outside the realm of possibility
that a user would want one pg_service entry​

​to reference another one​:

[realentry]
user=
dbname=

[aliasentry]
service=realentry

furthermore, having a shareable entry like:

[main-host]
host=ip-address
port=5433

[main-user1]
user=user1
service=main-host

[main-user2]
​user=user2
service=main-host

also seems potentially useful.

I just sent a -doc report that nothing in the documentation says this
behavior is not implemented but a cursory attempt at it confirms the lack.

While you are digging in there anything fundamental prohibiting the
behavior and is it something you ​think would be useful in these complex
environments you are working with?

David J.

Sorry about the oddball CC: but I don't have an e-mail with a full set of
recipients...
​




--
View this message in context: 
http://postgresql.nabble.com/POLA-violation-with-c-service-tp5831001p5831538.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

[HACKERS] table vs domain constraints in ObjType

2014-12-19 Thread Alvaro Herrera
Right now, both table and domain constraints share the same
classification in the ObjType enum, OBJECT_CONSTRAINT.  For my patch to
implement improved object drop reporting in event triggers, it is
necessary to separate them, as per in the attach patch.

One emerging side-effect is that it is now possible to execute COMMENT
ON domain constraints, which wasn't previously possible.  (I added \dd
support to psql as well.)

This is part of the patch for replicating DROP commands across
servers,
http://archives.postgresql.org/message-id/20140826135459.gd6...@eldon.alvh.no-ip.org
which I just split up.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services
commit 762ef44c592c14a961867f17701e45d52722
Author: Alvaro Herrera alvhe...@alvh.no-ip.org
Date:   Fri Dec 19 17:03:29 2014 -0300

Distinguish domain constraint from table constraints

diff --git a/doc/src/sgml/ref/comment.sgml b/doc/src/sgml/ref/comment.sgml
index 36a7312..62e1968 100644
--- a/doc/src/sgml/ref/comment.sgml
+++ b/doc/src/sgml/ref/comment.sgml
@@ -28,6 +28,7 @@ COMMENT ON
   COLLATION replaceable class=PARAMETERobject_name/replaceable |
   COLUMN replaceable class=PARAMETERrelation_name/replaceable.replaceable class=PARAMETERcolumn_name/replaceable |
   CONSTRAINT replaceable class=PARAMETERconstraint_name/replaceable ON replaceable class=PARAMETERtable_name/replaceable |
+  CONSTRAINT replaceable class=PARAMETERconstraint_name/replaceable ON DOMAIN replaceable class=PARAMETERdomain_name/replaceable |
   CONVERSION replaceable class=PARAMETERobject_name/replaceable |
   DATABASE replaceable class=PARAMETERobject_name/replaceable |
   DOMAIN replaceable class=PARAMETERobject_name/replaceable |
@@ -127,6 +128,18 @@ COMMENT ON
/varlistentry
 
varlistentry
+termreplaceable class=parametertable_name/replaceable/term
+termreplaceable class=parameterdomain_name/replaceable/term
+listitem
+ para
+  When creating a comment on a constraint on a table or a domain, these
+  parameteres specify the name of the table or domain on which the
+  constraint is defined.
+ /para
+/listitem
+   /varlistentry
+
+   varlistentry
  termreplaceablesource_type/replaceable/term
  listitem
   para
@@ -266,6 +279,7 @@ COMMENT ON COLLATION fr_CA IS 'Canadian French';
 COMMENT ON COLUMN my_table.my_column IS 'Employee ID number';
 COMMENT ON CONVERSION my_conv IS 'Conversion to UTF8';
 COMMENT ON CONSTRAINT bar_col_cons ON bar IS 'Constrains column col';
+COMMENT ON CONSTRAINT dom_col_constr ON DOMAIN dom IS 'Constrains col of domain';
 COMMENT ON DATABASE my_database IS 'Development Database';
 COMMENT ON DOMAIN my_domain IS 'Email Address Domain';
 COMMENT ON EXTENSION hstore IS 'implements the hstore data type';
diff --git a/src/backend/catalog/objectaddress.c b/src/backend/catalog/objectaddress.c
index e261307..297deb5 100644
--- a/src/backend/catalog/objectaddress.c
+++ b/src/backend/catalog/objectaddress.c
@@ -530,11 +530,28 @@ get_object_address(ObjectType objtype, List *objname, List *objargs,
 break;
 			case OBJECT_RULE:
 			case OBJECT_TRIGGER:
-			case OBJECT_CONSTRAINT:
+			case OBJECT_TABCONSTRAINT:
 			case OBJECT_POLICY:
 address = get_object_address_relobject(objtype, objname,
 	   relation, missing_ok);
 break;
+			case OBJECT_DOMCONSTRAINT:
+{
+	List		   *domname;
+	ObjectAddress	domaddr;
+	char		   *constrname;
+
+	domname = list_truncate(list_copy(objname), list_length(objname) - 1);
+	constrname = strVal(llast(objname));
+	domaddr = get_object_address_type(OBJECT_DOMAIN, domname, missing_ok);
+
+	address.classId = ConstraintRelationId;
+	address.objectId = get_domain_constraint_oid(domaddr.objectId,
+			  constrname, missing_ok);
+	address.objectSubId = 0;
+
+}
+break;
 			case OBJECT_DATABASE:
 			case OBJECT_EXTENSION:
 			case OBJECT_TABLESPACE:
@@ -934,7 +951,7 @@ get_object_address_relobject(ObjectType objtype, List *objname,
 	const char *depname;
 
 	/* Extract name of dependent object. */
-	depname = strVal(lfirst(list_tail(objname)));
+	depname = strVal(llast(objname));
 
 	/* Separate relation name from dependent object name. */
 	nnames = list_length(objname);
@@ -990,7 +1007,7 @@ get_object_address_relobject(ObjectType objtype, List *objname,
 	get_trigger_oid(reloid, depname, missing_ok) : InvalidOid;
 address.objectSubId = 0;
 break;
-			case OBJECT_CONSTRAINT:
+			case OBJECT_TABCONSTRAINT:
 address.classId = ConstraintRelationId;
 address.objectId = relation ?
 	get_relation_constraint_oid(reloid, depname, missing_ok) :
@@ -1178,7 +1195,7 @@ check_object_ownership(Oid roleid, ObjectType objtype, ObjectAddress address,
 		case OBJECT_RULE:
 		case OBJECT_TRIGGER:
 		case OBJECT_POLICY:
-		case OBJECT_CONSTRAINT:
+		case OBJECT_TABCONSTRAINT:
 			if 

Re: [HACKERS] Reducing lock strength of adding foreign keys

2014-12-19 Thread Noah Misch
On Wed, Dec 17, 2014 at 02:41:39PM +, Simon Riggs wrote:
 Is there anything different here from work in my original patch series?

Not to my knowledge.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: hash_create API changes (was Re: [HACKERS] speedup tidbitmap patch: hash BlockNumber)

2014-12-19 Thread Jim Nasby

On 12/19/14, 5:13 PM, Tom Lane wrote:

Jim Nasby jim.na...@bluetreble.com writes:

On 12/18/14, 5:00 PM, Jim Nasby wrote:

2201582 20 -- Mostly LOCALLOCK and Shared Buffer



Started looking into this; perhaps https://code.google.com/p/fast-hash/ would 
be worth looking at, though it requires uint64.



It also occurs to me that we're needlessly shoving a lot of 0's into the hash 
input by using RelFileNode and ForkNumber. RelFileNode includes the tablespace 
Oid, which is pointless here because relid is unique per-database. We also have 
very few forks and typically care about very few databases. If we crammed dbid 
and ForkNum together that gets us down to 12 bytes, which at minimum saves us 
the trip through the case logic. I suspect it also means we could eliminate one 
of the mix() calls.


I don't see this working.  The lock table in shared memory can surely take
no such shortcuts.  We could make a backend's locallock table omit fields
that are predictable within the set of objects that backend could ever
lock, but (1) this doesn't help unless we can reduce the tag size for all
LockTagTypes, which we probably can't, and (2) having the locallock's tag
be different from the corresponding shared tag would be a mess too.
I think dealing with (2) might easily eat all the cycles we could hope to
save from a smaller hash tag ... and that's not even considering the added
logical complexity and potential for bugs.


I think we may be thinking different things here...

I'm not suggesting we change BufferTag or BufferLookupEnt; clearly we can't 
simply throw away any of the fields I was talking about (well, except possibly 
tablespace ID. AFAICT that's completely redundant for searching because relid 
is UNIQUE).

What I am thinking is not using all of those fields in their raw form to 
calculate the hash value. IE: something analogous to:

hash_any(SharedBufHash, (rot(forkNum, 2) | dbNode) ^ relNode)  32 | blockNum)

perhaps that actual code wouldn't work, but I don't see why we couldn't do 
something similar... am I missing something?


Switching to a different hash algorithm could be feasible, perhaps.
I think we're likely stuck with Jenkins hashing for hashes that go to
disk, but hashes for dynahash tables don't do that.


Yeah, I plan on testing the performance of fash-hash for HASH_BLOBS just to see 
how it compares.

Why would we be stuck with Jenkins hashing for on-disk data? pg_upgrade, or 
something else?
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Commitfest problems

2014-12-19 Thread Jim Nasby

On 12/19/14, 6:16 PM, Tom Lane wrote:

Could we establish an expectation that whoever sets a CF entry to ready
for committer is responsible for reviewing the authors/reviewers lists
and making sure that those fairly represent who should get credit?  That
would divide the labor a bit, and there would also be time enough for
corrections if anyone feels slighted.  The idea's not perfect since major
contributions could still happen after that point; but I think the major
risk is with the committer not remembering people who contributed early
in the patch's life cycle, and we could certainly hope that such people
get listed in the CF app entry.


Perhaps go even one step further and let a reviewer draft the actual commit 
message? That would further reduce committer workload, assuming the committer 
agrees with the draft commit message.


Alternatively we could abandon the practice of using the commit log for
this purpose, which could simplify making after-the-fact corrections.
But then we'd have to set up some other recording infrastructure and work
flow for getting the info into the release notes.  That sounds like a lot
of work compared to the probable value.


git does allow you to revise a commit message; it just makes downstream pulls 
uglier if the commit was already pushed (see 
https://help.github.com/articles/changing-a-commit-message/). It might be 
possible to minimize or even eliminate that pain via git hooks.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Commitfest problems

2014-12-19 Thread Andres Freund
On 2014-12-19 22:17:54 -0600, Jim Nasby wrote:
 git does allow you to revise a commit message; it just makes
 downstream pulls uglier if the commit was already pushed (see
 https://help.github.com/articles/changing-a-commit-message/). It might
 be possible to minimize or even eliminate that pain via git hooks.

That's completely not acceptable for anything used by others.

What can sanely changed after the fact is 'git notes'. With that notes
to commits can be added, edited and changed after the original
commit.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: hash_create API changes (was Re: [HACKERS] speedup tidbitmap patch: hash BlockNumber)

2014-12-19 Thread Andres Freund
On 2014-12-19 22:03:55 -0600, Jim Nasby wrote:
 I'm not suggesting we change BufferTag or BufferLookupEnt; clearly we
 can't simply throw away any of the fields I was talking about (well,
 except possibly tablespace ID. AFAICT that's completely redundant for
 searching because relid is UNIQUE).

It's actually not. BufferTag's contain relnodes via RelFileNode - that's
not the relation's oid, but the filenode. And that's *not* guranteed
unique across database unfortunately.

 What I am thinking is not using all of those fields in their raw form to 
 calculate the hash value. IE: something analogous to:
 
 hash_any(SharedBufHash, (rot(forkNum, 2) | dbNode) ^ relNode)  32 | 
 blockNum)
 
 perhaps that actual code wouldn't work, but I don't see why we couldn't do 
 something similar... am I missing something?

I don't think that'd improve anything. Jenkin's hash does have a quite
mixing properties, I don't believe that the above would improve the
quality of the hash.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: pgsql: Allow pushdown of WHERE quals into subqueries with window functi

2014-12-19 Thread Noah Misch
On Sat, Jun 28, 2014 at 06:08:05AM +, Tom Lane wrote:
 Allow pushdown of WHERE quals into subqueries with window functions.
 
 We can allow this even without any specific knowledge of the semantics
 of the window function, so long as pushed-down quals will either accept
 every row in a given window partition, or reject every such row.  Because
 window functions act only within a partition, such a case can't result
 in changing the window functions' outputs for any surviving row.
 Eliminating entire partitions in this way obviously can reduce the cost
 of the window-function computations substantially.
 
 The fly in the ointment is that it's hard to be entirely sure whether
 this is true for an arbitrary qual condition.  This patch allows pushdown
 if (a) the qual references only partitioning columns, and (b) the qual
 contains no volatile functions.  We are at risk of incorrect results if
 the qual can produce different answers for values that the partitioning
 equality operator sees as equal.  While it's not hard to invent cases
 for which that can happen, it seems to seldom be a problem in practice,
 since no one has complained about a similar assumption that we've had
 for many years with respect to DISTINCT.  The potential performance
 gains seem to be worth the risk.

SQL leaves room for our DISTINCT optimization but not for this.  Where
pushdown of a qual beneath DISTINCT changes query results, it does so by
changing which one of various equal, distinguishable values survives to
represent a group.  SQL says:

If the set quantifier DISTINCT is specified, and the most specific type
of a result column is character string, datetime with time zone or a
user-defined type, then the precise values retained in that column after
eliminating redundant duplicates is implementation-dependent.

PostgreSQL conforms, making qual pushdown one of the factors determining which
value to retain.  This influences queries having a pushed-down qual that
distinguishes equal values.  An optimization fence changes the result:

SELECT * FROM (SELECT DISTINCT x FROM (VALUES (1.00),(1.1),(1.0)) t1(x) OFFSET 
0) t0 WHERE length(x::text)  4;
  x  
-
 1.1
(1 row)

SELECT * FROM (SELECT DISTINCT x FROM (VALUES (1.00),(1.1),(1.0)) t1(x)) t0 
WHERE length(x::text)  4;
  x  
-
 1.0
 1.1
(2 rows)

Type timetamptz is unaffected in PostgreSQL.  Character strings are affected
less than SQL contemplates, because PostgreSQL strings use binary equality.  I
don't know why SQL omits numeric from the list of affected types.

Here is the corresponding scenario for window functions:

SELECT x, y FROM (
  SELECT x, first_value(x) OVER (PARTITION BY x ORDER BY length(x::text)) AS y 
FROM
  (VALUES (1.00),(1.0)) t1(x)
  OFFSET 0
) t0 WHERE length(x::text) = 4;
  x   |  y  
--+-
 1.00 | 1.0
(1 row)

SELECT x, y FROM (
  SELECT x, first_value(x) OVER (PARTITION BY x ORDER BY length(x::text)) AS y 
FROM
  (VALUES (1.00),(1.0)) t1(x)
) t0 WHERE length(x::text) = 4;
  x   |  y   
--+--
 1.00 | 1.00
(1 row)

Only the first result is correct.  SQL does not give us the wiggle room it
offered for DISTINCT.  The conformance sacrifice is perhaps worthwhile
nonetheless.  I didn't see this highlighted in the original discussion, so I
wished to ensure this compromise is known explicitly.

If the system catalogs recorded whether each equality operator is also an
identity operator, we could restrict the optimization to types where the
equality operator of the partitioning opclass is so marked.  That eliminates
the conformance problem.  Such an annotation would facilitate other planner
advances; for example, predtest.c could attempt to refute f(x) given x = k.

Thanks,
nm


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Commitfest problems

2014-12-19 Thread Mark Kirkwood

On 20/12/14 11:22, Joshua D. Drake wrote:


On 12/19/2014 12:28 AM, Mark Kirkwood wrote:


To me that's a bit over the top stereotyping.



+1

Having been mentioned one or two times myself - it was an unexpected
wow - cool rather than a grumpy/fragile I must be noticed thing. I
think some folk have forgotten the underlying principle of the open
source community - it is about freely giving - time or code etc. The
there must be something in it for me me me meme is - well - the
*other* world view.


It was supposed to be over the top. That doesn't make it any less true.
Sure there are plenty of us that don't have any of the ego petting
issues. However,t there are more of us in those of us that think we
don't, that really, really do.



Heh - that fact that even you are stating it is over the top clearly 
makes it less *generally* true. Sure, there are some/few(?) folk who are 
seeing open source contributions as purely a CV enhancer, and perhaps a 
few in denial about their egos, but I don't see that as the common trend 
in this community (which is great).


regards

Mark





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Initdb-cs_CZ.WIN-1250 buildfarm failures

2014-12-19 Thread Noah Misch
Buildfarm members magpie, treepie and fulmar went absent on 2014-10-29.  Since
returning on 2014-11-16, they have consistently failed with 'initdb: invalid
locale name cs_CZ.WIN-1250'.  No commits in that period readily explain a
regression in this area.  Did the underlying system configurations change?

Thanks,
nm


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] New Python vs. old PG on raccoon and jaguarundi

2014-12-19 Thread Noah Misch
PostgreSQL 9.2 and later support Python 3.3 and Python 3.4; PostgreSQL 9.1 and
9.0 do not.  Buildfarm member raccoon upgraded from Python 2.7 to Python 3.3
for its 2013-08-18 run, and it has failed for REL9_1_STABLE and REL9_0_STABLE
ever since.  Please omit --with-python for those branches.  (Pointing to an
older Python is also an option.)  Incidentally, raccoon REL9_4_STABLE has
jammed for some time on a stale git lock file.  As best I recall, you can just
remove /usr/src/pgfarm/build/REL9_4_STABLE/pgsql and let the buildfarm client
re-clone it.

Buildfarm member jaguarundi, which has Python 3.4, activated --with-python for
REL9_1_STABLE as of its 2014-12-15 run.  Please remove --with-python or test
against an older Python.  It already omits --with-python for REL9_0_STABLE.

Thanks,
nm


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers