Re: [HACKERS] [GSoC2014] Patch ALTER TABLE ... SET LOGGED
On Mon, Nov 17, 2014 at 5:56 AM, Christian Ullrich ch...@chrullrich.net wrote:
> * Alvaro Herrera wrote:
>> Michael Paquier wrote:
>>> Btw, perhaps this diff should be pushed as a different patch as this is
>>> a rather different thing:
>>> -   if (heapRelation->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED
>>> +   if (indexRelation->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED &&
>>>         !smgrexists(indexRelation->rd_smgr, INIT_FORKNUM)
>>> As this is a correctness fix, it does not seem necessary to back-patch
>>> it: the parent relation always has the same persistence as its indexes.
>> There was an argument for doing it this way that only applies if this
>> patch went in, but I can't remember now what it was. Anyway I pushed the
>> patch after some slight additional changes.
> Thanks!
> The buildfarm says -DCLOBBER_CACHE_ALWAYS does not like this patch. The
> complaint comes from jaguarundi, like here:
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jaguarundi&dt=2014-11-16%2015%3A33%3A23

As Tom mentioned, adding a new parameter to set the persistence through
RelationSetNewRelfilenode works:
http://www.postgresql.org/message-id/27238.1416073...@sss.pgh.pa.us
Patch, actually tested with CLOBBER_CACHE_ALWAYS, attached.

Regards,
--
Michael

From a1a7e4182bd1b5d66260220f040aee4a35e91c75 Mon Sep 17 00:00:00 2001
From: Michael Paquier mpaqu...@otacoo.com
Date: Mon, 17 Nov 2014 17:04:05 +
Subject: [PATCH] Fix CLOBBER_CACHE_ALWAYS broken by 85b506b

The previous commit, which removed ATChangeIndexesPersistence and replaced
it with a lower-level API able to define a new relpersistence for a
relation when reindexing it, failed with CLOBBER_CACHE_ALWAYS enabled.
This patch fixes that by setting the relpersistence of the newly-created
relation after its relfilenode is created, through a new parameter of
RelationSetNewRelfilenode that sets the persistence of the relation.
---
 src/backend/catalog/index.c        | 5 +----
 src/backend/commands/sequence.c    | 3 ++-
 src/backend/commands/tablecmds.c   | 6 ++++--
 src/backend/utils/cache/relcache.c | 6 +++---
 src/include/utils/relcache.h       | 4 +++-
 5 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index b57aa95..825235a 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -3191,12 +3191,9 @@ reindex_index(Oid indexId, bool skip_constraint_checks, char persistence)
 		indexInfo->ii_ExclusionStrats = NULL;
 	}
 
-	/* Set the relpersistence of the new index */
-	iRel->rd_rel->relpersistence = persistence;
-
 	/* We'll build a new physical relation for the index */
 	RelationSetNewRelfilenode(iRel, InvalidTransactionId,
-							  InvalidMultiXactId);
+							  InvalidMultiXactId, persistence);
 
 	/* Initialize the index and rebuild */
 	/* Note: we do not need to re-establish pkey setting */
diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c
index e5f7765..4f7f586 100644
--- a/src/backend/commands/sequence.c
+++ b/src/backend/commands/sequence.c
@@ -304,7 +304,8 @@ ResetSequence(Oid seq_relid)
 	 * Same with relminmxid, since a sequence will never contain multixacts.
 	 */
 	RelationSetNewRelfilenode(seq_rel, InvalidTransactionId,
-							  InvalidMultiXactId);
+							  InvalidMultiXactId,
+							  seq_rel->rd_rel->relpersistence);
 
 	/*
 	 * Insert the modified tuple into the new storage file.
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 093224f..c57751e 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1196,7 +1196,8 @@ ExecuteTruncate(TruncateStmt *stmt)
 			 * as the relfilenode value. The old storage file is scheduled for
 			 * deletion at commit.
 			 */
-			RelationSetNewRelfilenode(rel, RecentXmin, minmulti);
+			RelationSetNewRelfilenode(rel, RecentXmin, minmulti,
+									  rel->rd_rel->relpersistence);
 
 			if (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED)
 				heap_create_init_fork(rel);
@@ -1209,7 +1210,8 @@ ExecuteTruncate(TruncateStmt *stmt)
 			if (OidIsValid(toast_relid))
 			{
 				rel = relation_open(toast_relid, AccessExclusiveLock);
-				RelationSetNewRelfilenode(rel, RecentXmin, minmulti);
+				RelationSetNewRelfilenode(rel, RecentXmin, minmulti,
+										  rel->rd_rel->relpersistence);
 				if (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED)
 					heap_create_init_fork(rel);
 				heap_close(rel, NoLock);
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 2250c56..fa75771 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -3008,7 +3008,7 @@ RelationBuildLocalRelation(const char *relname,
  */
 void
 RelationSetNewRelfilenode(Relation relation, TransactionId freezeXid,
-						  MultiXactId minmulti)
+						  MultiXactId minmulti, char relpersistence)
 {
 	Oid			newrelfilenode;
Re: [HACKERS] postgres_fdw behaves oddly
Hi Fujita-san,

Here are comments for the postgres_fdw-syscol patch.

Sanity
The patch applies and compiles cleanly. The server regression tests and the
regressions in contrib/postgres_fdw and contrib/file_fdw run cleanly.

Code
---
1. Instead of the single-line comment "System columns can't be sent to
remote.", it might be better to explain why system columns can't be sent
to the remote side.
2. The comment in deparseVar is a single-line comment, so it should start
and end on the same line, i.e. /* and */ should be on the same line.
3. Since there is already a testcase which triggered this particular
change, it would be good to add it to the postgres_fdw regression tests.

On Mon, Nov 17, 2014 at 4:06 PM, Ashutosh Bapat
ashutosh.ba...@enterprisedb.com wrote:
> Hi Fujita-san,
> Here are my comments about the patch fscan_reltargetlist.patch
>
> Sanity
> The patch applies and compiles cleanly. Regressions in the test/regress
> folder and in postgres_fdw and file_fdw are clean.
>
> 1. This isn't your change, but we might be able to get rid of the
> assignment
>  2071     /* Are any system columns requested from rel? */
>  2072     scan_plan->fsSystemCol = false;
> since make_foreignscan() already does that. But I will leave it up to you
> whether to remove this assignment or not.
> 2. Instead of using rel->reltargetlist, we should use the tlist passed by
> the caller. This is the tlist expected from the Plan node. For foreign
> scans it will be the same as rel->reltargetlist. Using tlist would make
> the function consistent with the create_*scan_plan functions.
>
> Otherwise, the patch looks good.
>
> On Wed, Nov 12, 2014 at 12:55 PM, Etsuro Fujita
> fujita.ets...@lab.ntt.co.jp wrote:
>> (2014/11/11 18:45), Etsuro Fujita wrote:
>>> (2014/11/10 20:05), Ashutosh Bapat wrote:
>>>> Since two separate issues, 1. using reltargetlist instead of
>>>> attr_needed and 2. system columns usage in FDW, are being tackled
>>>> here, we should separate the patch into two, one for each of the
>>>> issues.
>>> Agreed. Will split the patch into two.
>>
>> Here are the split patches:
>> fscan-reltargetlist.patch - patch for #1
>> postgres_fdw-syscol.patch - patch for #2
>>
>> Thanks,
>>
>> Best regards,
>> Etsuro Fujita
>
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] WAL format and API changes (9.5)
On 11/14/2014 10:31 AM, Michael Paquier wrote:
> 5) Here why not using the 2nd block instead of the 3rd (@_bt_getroot)?
> +		XLogBeginInsert();
> +		XLogRegisterBuffer(0, rootbuf, REGBUF_WILL_INIT);
> +		XLogRegisterBuffer(2, metabuf, REGBUF_WILL_INIT);

See the comment of the xl_btree_newroot struct. It explains the record
format of the BTREE_NEWROOT record type:

 * Backup Blk 0: new root page (2 tuples as payload, if splitting old root)
 * Backup Blk 1: left child (if splitting an old root)
 * Backup Blk 2: metapage

When _bt_getroot creates a new root, there is no old root, but the same
record type is used in _bt_newroot, which uses block ID 1 to refer to the
old root page.

(Thanks for the comments, again! I'll post a new version soon)

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of Refactoring code for sync node detection
On 16 November 2014 12:07, Michael Paquier michael.paqu...@gmail.com wrote:
> Let's work the multiple node thing once we have a better spec of how to
> do it, visibly using a dedicated micro-language within s_s_names.

Hmm, please make sure that is a new post. That is easily something I could
disagree with, even though I support the need for more functionality.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Review of Refactoring code for sync node detection
On Mon, Nov 17, 2014 at 10:00 PM, Simon Riggs si...@2ndquadrant.com wrote:
> On 16 November 2014 12:07, Michael Paquier michael.paqu...@gmail.com wrote:
>> Let's work the multiple node thing once we have a better spec of how to
>> do it, visibly using a dedicated micro-language within s_s_names.
> Hmm, please make sure that is a new post. That is easily something I
> could disagree with, even though I support the need for more
> functionality.

Sure. I am not really on that yet though :)
--
Michael
Re: [HACKERS] Idle transaction cancel/timeout and SSL revisited
Andres Freund and...@2ndquadrant.com writes:
> On 2014-11-15 00:11:36 +0300, Alex Shulgin wrote:
>> After reading up through archives on the two $subj related TODO items
>> I'm under the impression that the patches[1,2] didn't make it mainly
>> because of the risk of breaking SSL internals if we try to longjmp out
>> of the signal handler in the middle of a blocking SSL read and/or if we
>> try to report that final "transaction/connection aborted" notice to the
>> client.
> I've written a patch that allows that - check
> http://archives.postgresql.org/message-id/20140927225421.GE5423%40alap3.anarazel.de
> I feel pretty confident that that's the way to go. I just need some time
> to polish it.
>> What if instead of trying to escape from the signal handler we would set
>> a flag and use it to simulate EOF after the recv() is restarted due to
>> EINTR? From the backend perspective it should look like the client has
>> just hung up.
> That's pretty much what the above does. Except that it actually deals
> with blocking syscalls by not using them and relying on latches/select
> instead.

Yay, that's nice, because my EOF approach is sort of fishy.

I've checked the patches: they apply and compile on top of current HEAD
(one failed hunk in pqcomm.c), so I can help with testing if needed.

There is a (short) list of items that caught my attention. I will post it
in reply to your patches thread.

--
Alex
Re: [HACKERS] [GSoC2014] Patch ALTER TABLE ... SET LOGGED
Michael Paquier wrote:
> The complaint comes from jaguarundi, like here:
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jaguarundi&dt=2014-11-16%2015%3A33%3A23
> As Tom mentioned, adding a new parameter to set the persistence through
> RelationSetNewRelfilenode works. Patch, actually tested with
> CLOBBER_CACHE_ALWAYS, attached.

I wrote an identical patch on Saturday and watched it pass the
CLOBBER_CACHE_ALWAYS test on Sunday. But the reason I didn't push is that
I couldn't understand *why* there is a problem here.

I imagine that the source of the issue is supposed to be a relcache
invalidation that takes place (in the original code) after reindex_index
changes relpersistence, and before RelationSetNewRelfilenode() creates the
filenode. But at what point does that code absorb invalidation messages?
Or is there a completely different mechanism that causes the problem? If
so, what?

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [GSoC2014] Patch ALTER TABLE ... SET LOGGED
Alvaro Herrera wrote:
> I wrote an identical patch on Saturday and watched it pass the
> CLOBBER_CACHE_ALWAYS test on Sunday. But the reason I didn't push is
> that I couldn't understand *why* there is a problem here.
> I imagine that the source of the issue is supposed to be a relcache
> invalidation that takes place (in the original code) after reindex_index
> changes relpersistence, and before RelationSetNewRelfilenode() creates
> the filenode. But at what point does that code absorb invalidation
> messages? Or is there a completely different mechanism that causes the
> problem? If so, what?

Ah, it's the anti-collision stuff in GetNewOid(), isn't it.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] WAL format and API changes (9.5)
On 11/17/2014 03:22 PM, Heikki Linnakangas wrote:
> Here is an updated version of the patch. It's rebased over current
> master, and fixes a bunch of issues you and Michael pointed out, and a
> ton of other cleanup.

That patch had two extra, unrelated patches applied to it: one to use the
Intel CRC instruction for the CRC calculation, and another to speed up
PageRepairFragmentation by replacing the pg_qsort() call with a custom
sort algorithm. Sorry about that. (The latter is something that showed up
very high in a perf profile of replaying the WAL generated by pgbench -
I'll start a new thread about that.)

Here is the same patch without those extra things mixed in.

- Heikki

wal-format-and-api-changes-11.patch.gz
Description: application/gzip
Re: [HACKERS] Proposal : REINDEX SCHEMA
On Thu, Nov 6, 2014 at 8:00 PM, Fujii Masao masao.fu...@gmail.com wrote:
> On Fri, Oct 24, 2014 at 3:41 AM, Fabrízio de Royes Mello
> fabriziome...@gmail.com wrote:
>> On Wed, Oct 22, 2014 at 9:21 PM, Alvaro Herrera
>> alvhe...@2ndquadrant.com wrote:
>>> Sawada Masahiko wrote:
>>>> Thank you for reviewing. I agree 2) - 5). Attached patch is the latest
>>>> version, modified as above. Also, I noticed I had forgotten to add the
>>>> patch regarding the documentation of reindexdb.
>>> Please don't use pg_catalog in the regression test. That way we will
>>> need to update the expected file whenever a new catalog is added, which
>>> seems pointless. Maybe create a schema with a couple of tables
>>> specifically for this, instead.
>> Attached new regression test.
> Hunk #1 FAILED at 1.
> 1 out of 2 hunks FAILED -- saving rejects to file
> src/bin/scripts/t/090_reindexdb.pl.rej
> I tried to apply the 001 patch after applying the 000 one, but it did not
> apply cleanly.
> At least to me, it's more intuitive to use SCHEMA instead of ALL IN
> SCHEMA here, because we already use DATABASE instead of ALL IN DATABASE.

Thank you for reporting.

Um, that's one way of looking at it, I think. It's not quite new syntax,
and we have not used an ALL IN clause in the REINDEX command until now. If
we added an SQL command to reindex all tables in a tablespace as new
syntax, we might name it REINDEX TABLE ALL IN TABLESPACE.

Anyway, I created a patch for the REINDEX SCHEMA version, and attached it.
The problem with the patch failing to apply has also been solved.

Regards,

---
Sawada Masahiko

001_Add_schema_option_to_reindexdb_v4.patch
Description: Binary data
000_reindex_schema_v5.patch
Description: Binary data
Re: [HACKERS] snapshot too large error when initializing logical replication (9.4)
Andres Freund wrote:
> Hi,
> On 2014-10-25 18:09:36 -0400, Steve Singer wrote:
>> I sometimes get the error "snapshot too large" from my logical
>> replication walsender process in response to a CREATE_REPLICATION_SLOT.
> Yes. That's possible if 'too much' was going on until a consistent point
> was reached. I think we can just use a much larger size for the array if
> necessary. I've attached a patch for this. Could you try whether that
> helps? I don't have a testcase handy that reproduces the problem.

You haven't pushed this, have you?

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] snapshot too large error when initializing logical replication (9.4)
On 2014-11-17 11:51:54 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
>> Hi,
>> On 2014-10-25 18:09:36 -0400, Steve Singer wrote:
>>> I sometimes get the error "snapshot too large" from my logical
>>> replication walsender process in response to a CREATE_REPLICATION_SLOT.
>> Yes. That's possible if 'too much' was going on until a consistent
>> point was reached. I think we can just use a much larger size for the
>> array if necessary. I've attached a patch for this. Could you try
>> whether that helps? I don't have a testcase handy that reproduces the
>> problem.
> You haven't pushed this, have you?

No, but it's on my todo list.

Greetings,

Andres Freund

--
Andres Freund                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On Sat, Oct 25, 2014 at 1:50 AM, Amit Kapila amit.kapil...@gmail.com wrote:
> On Fri, Oct 24, 2014 at 4:05 PM, Andres Freund and...@2ndquadrant.com
> wrote:
>> On 2014-10-24 15:59:30 +0530, Amit Kapila wrote:
>>> and w.r.t. performance it can lead to an extra function call, a few
>>> checks, and I think in some cases it can even acquire/release a
>>> spinlock.
>> I fail to see how that could be the case.
> Won't it happen in case the first backend sets releaseOK to true and
> another backend which tries to wake up waiters on the lock acquires the
> spinlock and tries to release the waiters.
>> Sure, that can happen. And again, this is code that's only executed
>> around a couple syscalls. And the cacheline will be touched around
>> there *anyway*.
> Sure, but I think syscalls are required only in case we need to wake any
> waiter.
>> It won't wake up a waiter if there's none on the list.
> Yeah, but still it will acquire/release the spinlock.
>> And it'd be a pretty pointless behaviour, leading to uselessly
>> increased contention. The only time it'd make sense for X to be woken
>> up is when it gets run faster than the S processes.
> Do we get any major benefit by changing the logic of waking up waiters?
>> Yes.
> I think one downside I could see of the new strategy is that the chance
> of an exclusive waiter taking more time before getting woken up is
> increased, as now it will bypass exclusive waiters in the queue.
>> Note that that *already* happens for any *new* shared locker that comes
>> in. It doesn't really make sense to have shared lockers queued behind
>> the exclusive locker if others just go in front of it anyway.
> Yeah, but I think it is difficult to avoid that behaviour, as even when
> it wakes an exclusive locker, some new shared locker can come in and
> acquire the lock before the exclusive locker. I think it is difficult to
> say what the best waking strategy is, as the priority of exclusive
> lockers is not clearly defined in the case of LWLocks.

Andres, where are we with this patch?

1. You're going to commit it, but haven't gotten around to it yet.
2. You're going to modify it some more and repost, but haven't gotten
   around to it yet.
3. You're willing to see it modified if somebody else does the work, but
   are out of time to spend on it yourself.
4. Something else?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Escaping from blocked send() reprised.
Andres Freund and...@2ndquadrant.com writes:
> I've invested some more time in this:
> 0002 now makes sense on its own and doesn't change anything around the
> interrupt handling. Oh, and it compiles without 0003.

In this patch, the #endif appears to be misplaced in PostgresMain:

+	if (MyProcPort != NULL)
+	{
+#ifdef WIN32
+		pgwin32_noblock = true;
+#else
+		if (!pg_set_noblock(MyProcPort->sock))
+			ereport(COMMERROR,
+					(errmsg("could not set socket to nonblocking mode: %m")));
+	}
+#endif
+
+	pqinitmask();

> 0003 Sinval/notify processing got simplified further. There really isn't
> any need for DisableNotifyInterrupt/DisableCatchupInterrupt anymore.
> Also begin_client_read/client_read_ended don't make much sense anymore.
> Instead introduce ProcessClientReadInterrupt (which wants a better
> name).
> There's also a very WIP 0004 that allows secure_read/write to be
> interrupted when ProcDiePending is set. All of that happens via the
> latch mechanism, nothing happens inside signal handlers. So I do think
> it's quite an improvement over what's been discussed in this thread.
> But it (and the other approaches) do noticeably increase the likelihood
> of clients not getting the error message if the client isn't actually
> dead. The likelihood of write() being blocked *temporarily* due to
> normal bandwidth constraints is quite high when you consider COPY FROM
> and similar. Right now we'll wait till we can put the error message into
> the socket afaics.
> 1-3 need some serious comment work, but I think the approach is
> basically sound. I'm much, much less sure about allowing send() to be
> interrupted.

After re-reading these I don't see the rest of the items I wanted to
inquire about anymore, so it just makes more sense now.

One thing I did try is sending a NOTICE to the client when in
ProcessInterrupts() and DoingCommandRead is true. I think[1] it was
expected to be delivered instantly, but actually the client (psql) only
displays it after sending the next statement.

While I'm reading up on the FE/BE protocol, someone might want to share
his wisdom on this subject. My guess: psql blocks on the readline/libedit
call and can't effectively poll the server socket before getting complete
input from the user.

--
Alex

[1] http://www.postgresql.org/message-id/1262173040.19367.5015.camel@ebony
    "AFAIK, NOTICE was suggested because it can be sent at any time,
    whereas ERRORs are only associated with statements."
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On 2014-11-17 10:21:04 -0500, Robert Haas wrote:
> Andres, where are we with this patch?
> 1. You're going to commit it, but haven't gotten around to it yet.
> 2. You're going to modify it some more and repost, but haven't gotten
> around to it yet.
> 3. You're willing to see it modified if somebody else does the work, but
> are out of time to spend on it yourself.
> 4. Something else?

I'm working on it. Amit had found a hang on PPC that I couldn't reproduce
on x86. Since then I've reproduced it and I think yesterday I found the
problem. Unfortunately it always took a couple hours to trigger...

I've also made some, in my opinion, cleanups to the patch since then.
Those have the nice side effect of making the size of struct LWLock
smaller, but that wasn't actually the intended effect.

I'll repost once I've verified the problem is fixed and I've updated all
the commentary.

The current problem is that I seem to have found a problem that's also
reproducible with master :(. After a couple of hours a
  pgbench -h /tmp -p 5440 scale3000 -M prepared -P 5 -c 180 -j 60 -T 2 -S
against a -c max_connections=200 -c shared_buffers=4GB cluster seems to
hang on PPC. With all the backends waiting in buffer mapping locks. I'm
now making sure it's really master and not my patch causing the problem -
it's just not trivial with 180 processes involved.

Greetings,

Andres Freund

--
Andres Freund                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] On partitioning
On Thu, Nov 13, 2014 at 9:12 PM, Stephen Frost sfr...@snowman.net wrote:
>>> I'm not a fan of using pg_class - there are a number of columns in
>>> there which I would *not* wish to be allowed to be different per
>>> partition (starting with relowner and relacl...). Making those NULL
>>> would be just as bad (probably worse, really, since we'd also need to
>>> add new columns to pg_class to indicate the partitioning...) as having
>>> a sparsely populated new catalog table.
>> I think you are, again, confused as to what we're discussing. Nobody,
>> including Alvaro, has proposed a design where the individual partitions
>> don't have pg_class entries of some kind. What we're talking about is
>> where to store the metadata for partition exclusion and tuple routing.
> This discussion has gone a few rounds before and, yes, I was just
> jumping into the middle of this particular round, but I'm pretty sure
> I'm not the first to point out that storing the individual partition
> information in pg_class isn't ideal, since there are pieces that we
> don't actually want to be different per partition, as I outlined
> previously. Perhaps what that means is we should actually go the other
> way and move *those* columns into a new catalog instead. Consider this
> (totally off-the-cuff):
>
> pg_relation (pg_tables? pg_heaps?)
>   oid
>   relname
>   relnamespace
>   reltype
>   reloftype
>   relowner
>   relam (?)
>   relhas*
>   relisshared
>   relpersistence
>   relkind (?)
>   relnatts
>   relchecks
>   relacl
>   reloptions
>   relhowpartitioned (?)
>
> pg_class
>   pg_relation.oid
>   relfilenode
>   reltablespace
>   relpages
>   reltuples
>   reltoastrelid
>   reltoastidxid
>   relfrozenxid
>   relhasindexes (?)
>   relpartitioninfo (whatever this ends up being)
>
> The general idea being to separate the user-facing notion of a table
> from the underlying implementation, with the implementation allowing
> multiple sets of files to be used for each table. It's certainly not for
> the faint of heart, but we saw what happened with our inheritance
> structure allowing different permissions on the child tables - we ended
> up creating a pretty grotty hack to deal with it (going through the
> parent bypasses the permissions). That's the best solution for that
> situation, but it's far from ideal, and it'd be nice to avoid the same
> risk with partitioning. Additionally, if each partition is in pg_class,
> how are we handling name conflicts? Why do individual partitions even
> need to have a name? Do we allow queries against them directly? etc.

There's certainly something to this, but "not for the faint of heart"
sounds like an understatement. One of the good things about inheritance is
that, if the system doesn't automatically do the right thing, there's
usually an escape hatch. If the INSERT trigger you use for tuple routing
is too slow, you can insert directly into the target partition. If your
query doesn't realize that it can prune away all the partitions but one,
or takes too long to do it, you can query directly against that partition.
These aren't beautiful things, and I'm sure we're all united in wanting a
mechanism that will reduce the need to do them, but we need to make sure
that we are removing the need for the escape hatch, and not just cementing
it shut. In other words, I don't think there is a problem with people
querying child tables directly; the problem is that people are forced to
do so in order to get good performance. We shouldn't remove the ability
for people to do that unless we're extremely certain we've fixed the
problem that leads them to wish to do so.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On Mon, Nov 17, 2014 at 10:31 AM, Andres Freund and...@2ndquadrant.com wrote:
> On 2014-11-17 10:21:04 -0500, Robert Haas wrote:
>> Andres, where are we with this patch?
>> 1. You're going to commit it, but haven't gotten around to it yet.
>> 2. You're going to modify it some more and repost, but haven't gotten
>> around to it yet.
>> 3. You're willing to see it modified if somebody else does the work,
>> but are out of time to spend on it yourself.
>> 4. Something else?
> I'm working on it. Amit had found a hang on PPC that I couldn't
> reproduce on x86. Since then I've reproduced it and I think yesterday I
> found the problem. Unfortunately it always took a couple hours to
> trigger...
> I've also made some, in my opinion, cleanups to the patch since then.
> Those have the nice side effect of making the size of struct LWLock
> smaller, but that wasn't actually the intended effect.
> I'll repost once I've verified the problem is fixed and I've updated all
> the commentary.
> The current problem is that I seem to have found a problem that's also
> reproducible with master :(. After a couple of hours a
>   pgbench -h /tmp -p 5440 scale3000 -M prepared -P 5 -c 180 -j 60 -T 2 -S
> against a -c max_connections=200 -c shared_buffers=4GB cluster seems to
> hang on PPC. With all the backends waiting in buffer mapping locks. I'm
> now making sure it's really master and not my patch causing the problem
> - it's just not trivial with 180 processes involved.

Ah, OK. Thanks for the update.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] logical decoding - reading a user catalog table
On 11/16/2014 04:49 PM, Steve Singer wrote:
> I installed things following the above steps on a different system than
> my usual development laptop and I have been unable to reproduce the
> error so far (on that system).

But I am still able to reproduce it on occasion on my normal development
laptop. After continuously running the test I was eventually able to
reproduce the error on the other system as well.

Greetings,

Andres Freund
Re: [HACKERS] logical decoding - reading a user catalog table
On 2014-11-17 10:33:52 -0500, Steve Singer wrote:
> On 11/16/2014 04:49 PM, Steve Singer wrote:
>> I installed things following the above steps on a different system than
>> my usual development laptop and I have been unable to reproduce the
>> error so far (on that system).
> But I am still able to reproduce it on occasion on my normal development
> laptop.

Thanks for the testcase! I'll try to reproduce/fix.

Greetings,

Andres Freund

--
Andres Freund                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] logical decoding - reading a user catalog table
On 2014-11-13 22:23:02 -0500, Steve Singer wrote:
> On 11/13/2014 02:44 PM, Andres Freund wrote:
>> I've pushed a fix for a bug that could possibly also cause this.
>> Although it'd be odd that it always hits the user catalog table. Except
>> if your tests mostly modify the slony tables, but do not do much DDL
>> otherwise?
> The test I was running doesn't do DDL, so only the user catalog tables
> would have changes being tracked. I still sometimes get the error. When
> I get some time on the weekend I'll work on some detailed instructions
> on how to grab and set up the various components to replicate this test.
>
> Also, since updating (to 2c267e47afa4f9a7c) I've seen an assertion
> failure in a normal client connection, not the walsender:
>
> #3  0x006b4978 in GetSerializableTransactionSnapshotInt (
>     snapshot=snapshot@entry=0xbfa8a0 <CurrentSnapshotData>,
>     sourcexid=sourcexid@entry=0) at predicate.c:1738
> #4  0x006b66c3 in GetSafeSnapshot (origSnapshot=<optimized out>)
>     at predicate.c:1517
> #5  GetSerializableTransactionSnapshot (
>     snapshot=0xbfa8a0 <CurrentSnapshotData>) at predicate.c:1598
> #6  0x007d16dd in GetTransactionSnapshot () at snapmgr.c:200
> #7  0x006c0e35 in exec_simple_query (
>     query_string=0x1fd01b8 "select ev_origin, ev_seqno, ev_timestamp,
>     ev_snapshot, \"pg_catalog\".txid_snapshot_xmin(ev_snapshot),
>     \"pg_catalog\".txid_snapshot_xmax(ev_snapshot),
>     coalesce(ev_provider_xid,\"...") at postgres.c:959
> #8  PostgresMain (argc=<optimized out>, argv=argv@entry=0x1f5ab50,
>
> I have no idea if this has anything to do with your recent changes or
> some other change. I haven't so far been able to replicate that since
> the first time I saw it.

That crash is decidedly odd. Any chance you still have the full backtrace
around? This is in the SSI code... I'm not immediately seeing how this
could be related to logical decoding. It can't even be an imported
snapshot, because the exported snapshot is marked repeatable read. Are you
actually using serializable transactions? If so, how and why?

Greetings,

Andres Freund

--
Andres Freund                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] BRIN page type identifier
The special area in a BRIN page looks like this: /* special space on all BRIN pages stores a type identifier */ #define BRIN_PAGETYPE_META 0xF091 #define BRIN_PAGETYPE_REVMAP 0xF092 #define BRIN_PAGETYPE_REGULAR 0xF093 ... typedef struct BrinSpecialSpace { uint16 flags; uint16 type; } BrinSpecialSpace; I believe this is supposed to follow the usual convention that the last two bytes of a page can be used to identify the page type. SP-GiST uses 0xFF82, while GIN uses values 0x00XX. However, because the special size is MAXALIGNed, the 'type' field is not the last 2 bytes on the page, as intended. I'd suggest just adding char padding[6] in BrinSpecialSpace, before 'flags'. That'll waste 4 bytes on 32-bit systems, but that seems acceptable. Also, the comment in GinPageOpaqueData needs updating: /* * Page opaque data in an inverted index page. * * Note: GIN does not include a page ID word as do the other index types. * This is OK because the opaque data is only 8 bytes and so can be reliably * distinguished by size. Revisit this if the size ever increases. * Further note: as of 9.2, SP-GiST also uses 8-byte special space. This is * still OK, as long as GIN isn't using all of the high-order bits in its * flags word, because that way the flags word cannot match the page ID used * by SP-GiST. */ BRIN now also uses 8-byte special space. While you're at it, might want to move that comment somewhere else, as it's really about a convention among all page types, not just GIN. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unintended restart after recovery error
On Thu, Nov 13, 2014 at 10:59 PM, Fujii Masao masao.fu...@gmail.com wrote: 442231d7f71764b8c628044e7ce2225f9aa43b6 introduced the latter rule for hot-standby case. Maybe *during crash recovery* (i.e., hot standby should not be enabled) it's better to treat the crash of startup process as a catastrophic crash. Maybe, but why, specifically? If the startup process failed internally, it's probably because it hit an error during the replay of some WAL record. So if we restart it, it will back up to the previous checkpoint or restartpoint, replay the same WAL records as before, and die again in the same spot. We don't want it to sit there and do that forever in an infinite loop, so it makes sense to kill the whole server. But if the startup process was killed off because the checkpointer croaked, that logic doesn't necessarily apply. There's no reason to assume that the replay of a particular WAL record was what killed the checkpointer; in fact, it seems like the odds are against it. So it seems right to fall back to our general principle of restarting the server and hoping that's enough to get things back on line. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] using custom scan nodes to prototype parallel sequential scan
On Thu, Nov 13, 2014 at 7:27 PM, Simon Riggs si...@2ndquadrant.com wrote: On 12 November 2014 00:54, Robert Haas robertmh...@gmail.com wrote: On Tue, Nov 11, 2014 at 3:29 AM, Simon Riggs si...@2ndquadrant.com wrote: * only functions marked as CONTAINS NO SQL We don't really know what proisparallel is, but we do know what CONTAINS NO SQL means and can easily check for it. Plus I already have a patch for this, slightly bitrotted. Interestingly, I have a fairly solid idea of what proisparallel is, but I have no clear idea what CONTAINS NO SQL is or why it's relevant. I would imagine that srandom() contains no SQL under any reasonable definition of what that means, but it ain't parallel-safe. What is wrong in generating random numbers in parallel? If they're random, nothing, provided that you use a different seed in each backend. But if the seed has been set using srandom(), then we must generate the particular sequence of random numbers associated with that seed. If we don't, the behavior is incompatible with what happens apart from parallel mode. But I'm sure many volatile functions would be annoying to support, so CONTAINS NO SQL and STABLE/IMMUTABLE seems OK for the first thing. It might be OK to assume that immutable functions are fine, but not all stable functions look like they'll be safe, or not without additional effort: e.g. inet_client_addr(), pg_backend_pid(), pg_cursor(), pg_event_trigger_dropped_objects(). Or even, more mundanely, generate_series(), unless we ensure all calls are in the same backend. On the other hand, there are quite a few stable functions that it seems like we would want to allow in parallel queries, like to_char(), concat(), and textanycat(). I still don't know what CONTAINS NO SQL means. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REINDEX CONCURRENTLY 2.0
On Fri, Nov 14, 2014 at 11:47 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-11-13 11:41:18 -0500, Robert Haas wrote: On Wed, Nov 12, 2014 at 7:31 PM, Andres Freund and...@2ndquadrant.com wrote: But I think it won't work realistically. We have a *lot* of infrastructure that refers to indexes using it's primary key. I don't think we want to touch all those places to also disambiguate on some other factor. All the relevant APIs are either just passing around oids or relcache entries. I'm not quite following this. The whole point is to AVOID having two indexes. You have one index which may at times have two sets of physical storage. Right. But how are we going to refer to these different relfilenodes? All the indexing infrastructure just uses oids and/or Relation pointers to refer to the index. How would you hand down the knowledge which of the relfilenodes is supposed to be used in some callchain? If you've got a Relation, you don't need someone to tell you which physical storage to use; you can figure that out for yourself by looking at the Relation. If you've got an OID, you're probably going to go conjure up a Relation, and then you can do the same thing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.5: Better memory accounting, towards memory-bounded HashAgg
On 17 November 2014 07:31, Jeff Davis pg...@j-davis.com wrote: On Sat, 2014-11-15 at 21:36 +, Simon Riggs wrote: Do I understand correctly that we are trying to account for exact memory usage at palloc/pfree time? Why?? Not palloc chunks, only tracking at the level of allocated blocks (that we allocate with malloc). It was a surprise to me that accounting at that level would have any measurable impact, but Robert found a reasonable case on a POWER machine that degraded a couple percent. I wasn't able to reproduce it consistently on x86. Surprise to me also. Robert's tests showed a deviation of 0.4 sec after a restart. ISTM that we wouldn't see that every time. AFAIK the whole purpose of the memory allocator is to reduce the number of system calls, so if we are doing so many malloc() calls as to be noticeable just for accounting then something is wrong. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] double counting of lines in psql
This tiny change fixes what I think is a longstanding bug in psql. It causes the first line of every cell to be counted twice, whereas it should in fact be excluded from extra_lines / extra_row_output_lines. The bug appears to date back to commit 43ee2282 in 2008. Changing it appears to make my proposed pager_min_lines feature work as expected. So, should it be backpatched? It's a behaviour change, albeit that the existing behaviour is a bug, and will cause the pager to be invoked on output that is way too short (by about half a screen's height, I think). cheers andrew diff --git a/src/bin/psql/print.c b/src/bin/psql/print.c index 2e158b8..c93f744 100644 --- a/src/bin/psql/print.c +++ b/src/bin/psql/print.c @@ -837,7 +837,7 @@ print_aligned_text(const printTableContent *cont, FILE *fout) { unsigned int extra_lines; - extra_lines = (width - 1) / width_wrap[i] + nl_lines; + extra_lines = ((width - 1) / width_wrap[i]) + (nl_lines - 1); if (extra_lines > extra_row_output_lines) extra_row_output_lines = extra_lines; } -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.5: Better memory accounting, towards memory-bounded HashAgg
On 17 November 2014 07:31, Jeff Davis pg...@j-davis.com wrote: To calculate the total memory used, I included a function MemoryContextMemAllocated() that walks the memory context and its children recursively. If we assume that we want the results accurate to within 5%, then we can work out a good sampling rate to achieve that accuracy. We would probably need a counting mechanism that didn't slow down O(N) as our allocations get bigger. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] using custom scan nodes to prototype parallel sequential scan
On 14 November 2014 11:02, Kouhei Kaigai kai...@ak.jp.nec.com wrote: I'd like to throw community folks a question. Did someone have a discussion to the challenge of aggregate push-down across relations join in the past? It potentially reduces number of rows to be joined. If we already had, I'd like to check up the discussion at that time. Yes, I was looking at aggregate pushdown. I think it needs the same changes to aggregates discussed upthread. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUGS] ltree::text not immutable?
Re: Tom Lane 2014-11-05 29132.1415144...@sss.pgh.pa.us I wrote: An alternative that just occurred to me is to put the no-volatile- I/O-functions check into CREATE TYPE, but make it just WARNING not ERROR. That would be nearly as good as an ERROR in terms of nudging people who'd accidentally omitted a volatility marking from their custom types. But we could leave chkpass as-is and wait to see if anyone complains hey, why am I getting this warning?. If we don't hear anyone complaining, maybe that means we can get away with changing the type's behavior in 9.6 or later. Attached is a complete patch along these lines. As I suggested earlier, this just makes the relevant changes in ltree--1.0.sql and pg_trgm--1.1.sql without bumping their extension version numbers, since it doesn't seem important enough to justify a version bump. I propose that we could back-patch the immutability-additions in ltree and pg_trgm, since they won't hurt anything and might make life a little easier for future adopters of those modules. The WARNING additions should only go into HEAD though. In HEAD, there's this WARNING now: WARNING: type input function chkpass_in should not be volatile From 66c029c842629958b3ae0d389f24ea3407225723: A fly in the ointment is that chkpass_in actually is volatile, because of its use of random() to generate a fresh salt when presented with a not-yet-encrypted password. This is bad because of the general assumption that I/O functions aren't volatile: the consequence is that records or arrays containing chkpass elements may have input behavior a bit different from a bare chkpass column. But there seems no way to fix this without breaking existing usage patterns for chkpass, and the consequences of the inconsistency don't seem bad enough to justify that. So for the moment, just document it in a comment. 
IMHO built-in functions (even if only in contrib) shouldn't be raising warnings - the user can't do anything about the warnings anyway, so they shouldn't get notified in the first place. (Caught by Debian's postgresql-common testsuite) Christoph -- c...@df7cb.de | http://www.df7cb.de/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] using custom scan nodes to prototype parallel sequential scan
On 17 November 2014 16:01, Robert Haas robertmh...@gmail.com wrote: I still don't know what CONTAINS NO SQL means. It's a function marking that would indicate we aren't allowed to take snapshots or run SQL. I think you should publish the scheme for marking functions as safe for parallelism, so we can judge that. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUGS] ltree::text not immutable?
Christoph Berg c...@df7cb.de writes: In HEAD, there's this WARNING now: WARNING: type input function chkpass_in should not be volatile Yeah, that's intentional. IMHO built-in functions (even if only in contrib) shouldn't be raising warnings - the user can't do anything about the warnings anyway, so they shouldn't get notified in the first place. The point is to find out how many people care ... (Catched by Debian's postgresql-common testsuite) ... and by people, I do not mean test suites. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical decoding - reading a user catalog table
On 11/17/2014 10:37 AM, Andres Freund wrote: On 2014-11-13 22:23:02 -0500, Steve Singer wrote: Also since updating (to 2c267e47afa4f9a7c) I've seen an assertion failure in a normal client connection, not the walsender #3 0x006b4978 in GetSerializableTransactionSnapshotInt ( snapshot=snapshot@entry=0xbfa8a0 <CurrentSnapshotData>, sourcexid=sourcexid@entry=0) at predicate.c:1738 #4 0x006b66c3 in GetSafeSnapshot (origSnapshot=<optimized out>) at predicate.c:1517 #5 GetSerializableTransactionSnapshot ( snapshot=0xbfa8a0 <CurrentSnapshotData>) at predicate.c:1598 #6 0x007d16dd in GetTransactionSnapshot () at snapmgr.c:200 #7 0x006c0e35 in exec_simple_query ( query_string=0x1fd01b8 "select ev_origin, ev_seqno, ev_timestamp, ev_snapshot, \"pg_catalog\".txid_snapshot_xmin(ev_snapshot), \"pg_catalog\".txid_snapshot_xmax(ev_snapshot), coalesce(ev_provider_xid,\"... at postgres.c:959 #8 PostgresMain (argc=<optimized out>, argv=argv@entry=0x1f5ab50, I have no idea if this has anything to do with your recent changes or some other change. I haven't so far been able to replicate that since the first time I saw it. That crash is decidedly odd. Any chance you still have the full backtrace around? Yes I still have the core file This is in the SSI code... I'm not immediately seeing how this could be related to logical decoding. It can't even be an imported snapshot, because the exported snapshot is marked repeatable read. Are you actually using serializable transactions? If so, how and why? Yes, the test client that performs the 'simulated workload' does some serializable transactions. It connects as a normal client via JDBC and does a prepareStatement("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE") and then does some DML. Why? Because it seemed like a good thing to include in the test suite. You're right, this might have nothing to do with logical decoding.
I haven't been able to reproduce again either, I can't even say if this problem was introduced to 9.4 in the past month or if it has been around much longer and I just haven't hit it before. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical decoding - reading a user catalog table
Hi, Kevin: CCed you, because it doesn't really look like a logical decoding related issue. On 2014-11-17 11:25:40 -0500, Steve Singer wrote: On 11/17/2014 10:37 AM, Andres Freund wrote: On 2014-11-13 22:23:02 -0500, Steve Singer wrote: Also since updating (to 2c267e47afa4f9a7c) I've seen an assertion failure in a normal client connection, not the walsender #3 0x006b4978 in GetSerializableTransactionSnapshotInt ( snapshot=snapshot@entry=0xbfa8a0 <CurrentSnapshotData>, sourcexid=sourcexid@entry=0) at predicate.c:1738 #4 0x006b66c3 in GetSafeSnapshot (origSnapshot=<optimized out>) at predicate.c:1517 #5 GetSerializableTransactionSnapshot ( snapshot=0xbfa8a0 <CurrentSnapshotData>) at predicate.c:1598 #6 0x007d16dd in GetTransactionSnapshot () at snapmgr.c:200 #7 0x006c0e35 in exec_simple_query ( query_string=0x1fd01b8 "select ev_origin, ev_seqno, ev_timestamp, ev_snapshot, \"pg_catalog\".txid_snapshot_xmin(ev_snapshot), \"pg_catalog\".txid_snapshot_xmax(ev_snapshot), coalesce(ev_provider_xid,\"... at postgres.c:959 #8 PostgresMain (argc=<optimized out>, argv=argv@entry=0x1f5ab50, I have no idea if this has anything to do with your recent changes or some other change. I haven't so far been able to replicate that since the first time I saw it. That crash is decidedly odd. Any chance you still have the full backtrace around? Yes I still have the core file Cool, could you show the full thing? Or did you just snip it because it's just the Assert/ExceptionalCondition thing? Could you print *snapshot in frame #3? This is in the SSI code... I'm not immediately seeing how this could be related to logical decoding. It can't even be an imported snapshot, because the exported snapshot is marked repeatable read. Are you actually using serializable transactions? If so, how and why? Yes, the test client that performs the 'simulated workload' does some serializable transactions.
It connects as a normal client via JDBC and does a prepareStatement("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE") and then does some DML. Why? Because it seemed like a good thing to include in the test suite. Yes, it's a good reason as the above backtrace proves ;). I'm just trying to understand under which circumstances it happens. The above backtrace looks like it can only be triggered if you do a BEGIN TRANSACTION SERIALIZABLE DEFERRABLE READ ONLY; Is that something you do? You're right, this might have nothing to do with logical decoding. I haven't been able to reproduce again either, I can't even say if this problem was introduced to 9.4 in the past month or if it has been around much longer and I just haven't hit it before. It's not hard to imagine that the safe/deferred snapshot code isn't all that well exercised. I don't think many people use it yet. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUGS] ltree::text not immutable?
Re: Tom Lane 2014-11-17 6903.1416241...@sss.pgh.pa.us Christoph Berg c...@df7cb.de writes: In HEAD, there's this WARNING now: WARNING: type input function chkpass_in should not be volatile Yeah, that's intentional. IMHO built-in functions (even if only in contrib) shouldn't be raising warnings - the user can't do anything about the warnings anyway, so they shouldn't get notified in the first place. The point is to find out how many people care ... Is the point of this to figure out whether to fix this properly, or to revert to the old code? The current status isn't something that should be released with 9.5. (Caught by Debian's postgresql-common testsuite) ... and by people, I do not mean test suites. Well atm this breaks the building of 9.5 packages. This means we are not going to find out if anyone cares by this way :) Christoph -- c...@df7cb.de | http://www.df7cb.de/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.5: Better memory accounting, towards memory-bounded HashAgg
On 15.11.2014 22:36, Simon Riggs wrote: On 16 October 2014 02:26, Jeff Davis pg...@j-davis.com wrote: The inheritance is awkward anyway, though. If you create a tracked context as a child of an already-tracked context, allocations in the newer one won't count against the original. I don't see a way around that without introducing even more performance problems. This seems to have reached impasse, which is a shame. Let me throw out a few questions to see if that might help. Do I understand correctly that we are trying to account for exact memory usage at palloc/pfree time? Why?? Surely we just want to keep track of the big chunks? Does this need to be exact? How exact does it need to be? What do you mean by big chunks? Blocks, or really over-sized chunks (over 8kB)? I assume blocks, and that's what the proposed patches are doing. Or alternatively, can't we just sample the allocations to reduce the overhead? That might work, but I'm afraid it's trickier than it might seem. Are there some assumptions we can make about micro-contexts never needing to be tracked at all? What is a micro-context? Jeff seems not too bothered by inheritance, whereas Tomas thinks its essential. Perhaps there is a middle ground, depending upon the role of the context? Having a hierarchy of memory contexts and an accounting completely ignoring that hierarchy seems a bit strange to me. Maybe we can impose some restrictions, for example: (a) a single context with tracking explicitly requested (in the whole hierarchy) (b) when tracking is requested for a context, the parent context must not have tracking enabled Either of these restrictions would prevent a situation where a context has to update accounting for two parent contexts. That'd allow updating a single place (because each context has a single parent context with tracking requested). But I'm not sure that those restrictions are acceptable, that we can enforce them reliably, or how frequently that'd break external code (e.g. 
extensions with custom aggregates etc.). Because this is part of the public API, including the option to enable accounting for a memory context. Imagine you write an extension with a custom aggregate (or whatever) and it works just fine. Then we tweak something in the executor, requesting accounting for another context, and the extension suddenly breaks. Or maybe we don't change anything, and the aggregate works fine with Group Aggregate, but breaks whenever the plan switches to Hash Aggregate (because that's using accounting). Or maybe the user just redirects the tracking somewhere else (e.g. by supplying a pointer to the v6 of the patch), excluding the context from the accounting entirely. So yeah, I think the inheritance is an important aspect, at least for a generic accounting infrastructure. I'd bet that if we choose to do that, there'll be a good deal of various WTF moments and foot-shooting in the near future ... On 23/8 I proposed to use a separate hierarchy of accounting info, parallel to the memory context hierarchy (but only for contexts with tracking explicitly requested) http://www.postgresql.org/message-id/53f7e83c.3020...@fuzzy.cz but no response so far. I do see ~1% regression on Robert's benchmark (on x86), but it's quite noisy. Not sure about PPC, which is what Robert used. I don't see how to make it faster without sacrificing full inheritance support ... Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.5: Better memory accounting, towards memory-bounded HashAgg
Hi, On 2014-11-16 23:31:51 -0800, Jeff Davis wrote: *** a/src/include/nodes/memnodes.h --- b/src/include/nodes/memnodes.h *** *** 60,65 typedef struct MemoryContextData --- 60,66 MemoryContext nextchild;/* next child of same parent */ char *name; /* context name (just for debugging) */ boolisReset;/* T = no space alloced since last reset */ + uint64 mem_allocated; /* track memory allocated for this context */ #ifdef USE_ASSERT_CHECKING boolallowInCritSection; /* allow palloc in critical section */ #endif That's quite possibly one culprit for the slowdown. Right now one AllocSetContext struct fits precisely into three cachelines. After your change it doesn't anymore. Consider not counting memory in bytes but blocks and adding it directly after the NodeTag. That'd leave the size the same as right now. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On Mon, Nov 10, 2014 at 3:33 PM, Peter Geoghegan p...@heroku.com wrote: * Documentation clean-up - as I mentioned, I tried to address Simon's concerns here. Also, as you'd expect, the documentation has been fixed up to reflect the new syntax. I'll need to take a pass at updating the UPSERT Wiki page soon, too. I should mention that I've updated the Wiki to reflect the current, post-v1.4 state of affairs. That remains the best place to get a relatively quick overview of where things stand with open items, discussion of the patch, etc: https://wiki.postgresql.org/wiki/UPSERT -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.5: Better memory accounting, towards memory-bounded HashAgg
On 17.11.2014 08:31, Jeff Davis wrote: On Sat, 2014-11-15 at 21:36 +, Simon Riggs wrote: Do I understand correctly that we are trying to account for exact memory usage at palloc/pfree time? Why?? Not palloc chunks, only tracking at the level of allocated blocks (that we allocate with malloc). It was a surprise to me that accounting at that level would have any measurable impact, but Robert found a reasonable case on a POWER machine that degraded a couple percent. I wasn't able to reproduce it consistently on x86. Or alternatively, can't we just sample the allocations to reduce the overhead? Not sure quite what you mean by sample, but it sounds like something along those lines would work. Maybe. It might also cause unexpected volatility / nondeterminism in the query execution. Imagine a Hash Aggregate, where a small number of the requests are considerably larger than the rest. If you happen to sample only a few of those large requests, the whole hash aggregate happens in-memory, otherwise it starts batching. The users will observe this as random variations in the runtimes - same query / parameters, idle machine, sudden changes to the plan ... Maybe I'm too wary, but I guess there are use cases where the latency uniformity is a concern. Attached is a patch that does something very simple: only tracks blocks held in the current context, with no inheritance or anything like it. This reduces it to a couple arithmetic instructions added to the alloc/dealloc path, so hopefully that removes the general performance concern raised by Robert[1]. To calculate the total memory used, I included a function MemoryContextMemAllocated() that walks the memory context and its children recursively. Of course, I was originally trying to avoid that, because it moves the problem to HashAgg. For each group, it will need to execute MemoryContextMemAllocated() to see if work_mem has been exceeded. 
It will have to visit a couple contexts, or perhaps many (in the case of array_agg, which creates one per group), which could be a measurable regression for HashAgg. :-( But if that does turn out to be a problem, I think it's solvable. First, I could micro-optimize the function by making it iterative rather than recursive, to save on function call overhead. Second, I could offer a way to prevent the HTAB from creating its own context, which would be one less context to visit. And if those don't work, perhaps I could resort to a sampling method of some kind, as you allude to above. Do you plan to try this approach with your hashagg patch, and do some measurements? I'd expect the performance hit to be significant, but I'd like to see some numbers. kind regards Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUGS] ltree::text not immutable?
Christoph Berg c...@df7cb.de writes: Re: Tom Lane 2014-11-17 6903.1416241...@sss.pgh.pa.us The point is to find out how many people care ... Is the point of this to figure out whether to fix this properly, or to revert to the old code? The current status isn't something that should be released with 9.5. The warning will not be reverted, if that's what you mean. The problem is that chkpass_in is broken by design. We need to find some actual users of the type and see what they think is the least painful solution. (Or, if we find out there aren't any, maybe we'll just flush the whole module ...) I will not promise that it will change by 9.5. It may take some time to collect information, and 9.4 will not have been in wide use for all that long before 9.5 freezes. Well atm this breaks the building of 9.5 packages. This means we are not going to find out if anyone cares by this way :) You will need to adjust your test. This is by no means the first time that we've intentionally shipped contrib modules that generate warnings; there was that messy business with the => operator ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PgBench's \setrandom could be better
On Sat, Nov 15, 2014 at 5:51 PM, David Rowley dgrowle...@gmail.com wrote: With pgbench I can do: \setrandom qty 24 25 then throw a :qty into the query and have pgbench replace that with either 24 or 25. The problem is that this does not work for anything apart from integer types. I've got a pending patch here - which I don't think anyone has reviewed - that adds an expression syntax to pgbench: https://commitfest.postgresql.org/action/patch_view?id=1581 Once that patch goes in, I think we should generalize it so that the \set syntax can use not only operators but also functions, like abs(). We could also generalize it to work on data types other than integers, such as strings or (like you're suggesting here) dates, so that you could have a random_date() function. In general I'm pretty skeptical about continuing to expand pgbench by adding more backslash commands and complicating the syntax of the ones we've already got. We can certainly do it that way, but it's kind of a mess. I'd rather have a more general framework to build on. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_receivexlog --status-interval add fsync feedback
On Thu, Nov 13, 2014 at 12:38 PM, Fujii Masao masao.fu...@gmail.com wrote: Thanks for reviewing the patch! On Thu, Nov 13, 2014 at 4:05 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Fujii Masao wrote: --- 127,152 When this option is used, <application>pg_receivexlog</application> will report a flush position to the server, indicating when each segment has been synchronized to disk so that the server can remove that segment if it ! is not otherwise needed. The <literal>--synchronous</literal> option must ! be specified when making <application>pg_receivexlog</application> run as ! a synchronous standby using a replication slot. Otherwise WAL data ! cannot be flushed frequently enough for this to work correctly. </para> </listitem> </varlistentry> Whitespace damage here. Fixed. + printf(_(" --synchronous flush transaction log in real time\n")); "in real time" sounds odd. How about "flush transaction log immediately after writing", or maybe "have transaction log writes be synchronous". The former sounds better to me. So I chose it. --- 781,791 now = feGetCurrentTimestamp(); /* ! * Issue sync command as soon as there are WAL data which ! * has not been flushed yet if synchronous option is true. */ if (lastFlushPosition < blockpos && walfile != -1 && synchronous) I'd put the synchronous condition first in the if(), and start the comment with it rather than putting it at the end. Both seem weird. Fixed, i.e., moved the synchronous condition first in the if()'s test and also moved "if synchronous option is true" to the front of the comment. progname, current_walfile_name, strerror(errno)); goto error; } lastFlushPosition = blockpos; ! ! /* ! * Send feedback so that the server sees the latest WAL locations ! * immediately if synchronous option is true. ! */ ! if (!sendFeedback(conn, blockpos, now, false)) ! goto error; ! last_status = now; I'm not clear about this comment .. why does it say "if synchronous option is true" when it's not checking the condition?
I added that comment because the code lives within the if() block that checks the synchronous condition. But it seems confusing. Just removed that part from the comment. Attached is the updated version of the patch. I've just pushed this. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of Refactoring code for sync node detection
On Mon, Nov 17, 2014 at 2:20 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Sun, Nov 16, 2014 at 9:07 PM, Michael Paquier michael.paqu...@gmail.com wrote: I'll send an updated patch tomorrow. Here are updated versions. I divided the patch into two for clarity: the first one is the actual refactoring patch, renaming SyncRepGetSynchronousNode to SyncRepGetSynchronousStandby (+alpha, like updating synchronous to sync in the comments as you mentioned) so that the namings have no conflicts. The second one updates the syncrep code, including WAL sender and WAL receiver, and its surroundings to use the term node instead of standby. This brings into the code the idea that a node using replication APIs is not necessarily a standby, making it more generic. This can be applied on top of the refactoring patch. If any other folks (Heikki or Fujii-san?) have comments about this idea, feel free. I've not read the patches yet. But while I was reading the code around sync node detection, I thought that it might be a good idea to treat the node with priority 1 as special. That is, if we find the active node with the priority 1, we can immediately go out of the sync-node-detection loop. In many cases we can expect that the sync standby has the priority 1. If yes, treating priority 1 as special is a good way to go, I think. Thought? Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()
On Sat, Nov 15, 2014 at 7:36 PM, Peter Geoghegan p...@heroku.com wrote: Attached patch incorporates this feedback. The only user-visible difference between this revision and the previous revision is that it's quite possible for two suggestion to originate from the same RTE (there is exactly one change in the regression test's expected output as compared to the last revision for this reason. The regression tests are otherwise unchanged). It's still not possible to see more than 2 suggestions under any circumstances, no matter where they might have originated from, which I think is appropriate -- we continue to not present any HINT in the event of 3 or more equidistant matches. I think that the restructuring required to pass around a state variable has resulted in somewhat clearer code. Cool! I'm grumpy about the distanceName() function. That seems too generic. If we're going to keep this as it is, I suggest something like computeRTEColumnDistance(). But see below. On a related note, I'm also grumpy about this comment: +/* Charge half as much per deletion as per insertion or per substitution */ +return varstr_levenshtein_less_equal(actual, len, match, match_len, + 2, 1, 2, max); The purpose of a code comment is to articulate WHY we did something, rather than simply to restate what the code quite obviously does. I haven't heard a compelling argument for why this should be 2, 1, 2 rather than the default 1, 1, 1; and I'm inclined to do the latter unless you can make some very good argument for this combination of weights. And if you can make such an argument, then there should be comments so that the next person to come along and look at this code doesn't go, huh, that's whacky, and change it. + int location, FuzzyAttrMatchState *rtestate) I suggest calling this fuzzystate rather than rtestate; it's not the state of the RTE, but the state of the fuzzy matching. 
Within the scanRTEForColumn block, we have a rather large chunk of code protected by if (rtestate), which contains the only call to distanceName(). I suggest that we move all of this logic to a separate, static function, and merge distanceName into it. I also suggest testing against NULL explicitly instead of implicitly. So this block of code would end up as something like: if (fuzzystate != NULL) updateFuzzyAttrMatchState(rte, attcolname, colname, fuzzystate); In searchRangeTableForCol, I'm fairly certain that you've changed the behavior by adding a check for if (rte-rtekind == RTE_JOIN) before the call to scanRTEForColumn(). Why not instead put this check into updateFuzzyAttrMatchState? Then you can be sure you're not changing the behavior in any other case. On a similar note, I think the dropped-column test should happen early as well, probably again in updateFuzzyAttrMatchState(). There's little point in adding a suggestion only to throw it away again. +/* + * Charge extra (for inexact matches only) when an alias was + * specified that differs from what might have been used to + * correctly qualify this RTE's closest column + */ +if (wrongalias) +rtestate.distance += 3; I don't understand what situation this is catering to. Can you explain? It seems to account for a good deal of complexity. ERROR: column oid does not exist LINE 1: select oid 0, * from altwithoid; ^ +HINT: Perhaps you meant to reference the column altwithoid.col. That seems like a stretch. I think I suggested before using a distance threshold of at most 3 or half the word length, whichever is less. For a three-letter column name that means not suggesting anything if more than one character is different. What you implemented here is close to that, yet somehow we've got a suggestion slipping through that has 2 out of 3 characters different. I'm not quite sure I see how that's getting through, but I think it shouldn't. 
ERROR: column fullname.text does not exist LINE 1: select fullname.text from fullname; ^ +HINT: Perhaps you meant to reference the column fullname.last. Same problem, only worse! They've only got one letter of four in common. ERROR: column xmin does not exist LINE 1: select xmin, * from fooview; ^ +HINT: Perhaps you meant to reference the column fooview.x. Basically the same problem again. I think the distance threshold in this case should be half the shorter column name, i.e. 0. Your new test cases include no negative test cases; that is, cases where the machinery declines to suggest a hint because of, say, 3 equally good possibilities. They probably should have something like that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription:
Re: [HACKERS] New Event Trigger: table_rewrite
On Sun, Nov 16, 2014 at 5:51 AM, Simon Riggs si...@2ndquadrant.com wrote: On 16 November 2014 06:59, Michael Paquier michael.paqu...@gmail.com wrote: 1) This patch is authorizing VACUUM and CLUSTER to use the event triggers ddl_command_start and ddl_command_end, but aren't those commands actually not DDLs but control commands? I could go either way on that. I'm happy to remove those from this commit. Yeah, this patch definitely shouldn't change the set of commands to which existing event triggers apply as a side-effect. There's no reason new DDL commands need to apply to the same set of operations as existing DDL commands, but the existing ones shouldn't be changed without specific discussion and agreement. It seems pretty weird, also, that the event trigger will fire after we've taken AccessExclusiveLock when you cluster a particular relation, and before we've taken AccessExclusiveLock when you cluster database-wide. That's more or less an implementation artifact of the current code that we're exposing to the user for, really, no good reason. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.5: Better memory accounting, towards memory-bounded HashAgg
On 17.11.2014 18:04, Andres Freund wrote: Hi, On 2014-11-16 23:31:51 -0800, Jeff Davis wrote: *** a/src/include/nodes/memnodes.h --- b/src/include/nodes/memnodes.h *** *** 60,65 typedef struct MemoryContextData --- 60,66 MemoryContext nextchild; /* next child of same parent */ char *name; /* context name (just for debugging) */ bool isReset; /* T = no space alloced since last reset */ + uint64 mem_allocated; /* track memory allocated for this context */ #ifdef USE_ASSERT_CHECKING bool allowInCritSection; /* allow palloc in critical section */ #endif That's quite possibly one culprit for the slowdown. Right now one AllocSetContext struct fits precisely into three cachelines. After your change it doesn't anymore. I'm no PPC64 expert, but I thought the cache lines are 128 B on that platform, since at least Power6? Also, if I'm counting right, the MemoryContextData structure is 56B without the 'mem_allocated' field (and without the allowInCritSection), and 64B with it (at that particular place). sizeof() seems to confirm that. (But I'm on x86, so maybe the alignment on PPC64 is different?). Consider not counting memory in bytes but blocks and adding it directly after the NodeTag. That'd leave the size the same as right now. I suppose you mean kbytes, because the block size is not fixed. regards Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.5: Better memory accounting, towards memory-bounded HashAgg
On 2014-11-17 19:42:25 +0100, Tomas Vondra wrote: On 17.11.2014 18:04, Andres Freund wrote: Hi, On 2014-11-16 23:31:51 -0800, Jeff Davis wrote: *** a/src/include/nodes/memnodes.h --- b/src/include/nodes/memnodes.h *** *** 60,65 typedef struct MemoryContextData --- 60,66 MemoryContext nextchild;/* next child of same parent */ char *name; /* context name (just for debugging) */ boolisReset;/* T = no space alloced since last reset */ + uint64 mem_allocated; /* track memory allocated for this context */ #ifdef USE_ASSERT_CHECKING boolallowInCritSection; /* allow palloc in critical section */ #endif That's quite possibly one culprit for the slowdown. Right now one AllocSetContext struct fits precisely into three cachelines. After your change it doesn't anymore. I'm no PPC64 expert, but I thought the cache lines are 128 B on that platform, since at least Power6? Hm, might be true. Also, if I'm counting right, the MemoryContextData structure is 56B without the 'mem_allocated' field (and without the allowInCritSection), and 64B with it (at that particular place). sizeof() seems to confirm that. (But I'm on x86, so maybe the alignment on PPC64 is different?). The MemoryContextData struct is embedded into AllocSetContext. Consider not counting memory in bytes but blocks and adding it directly after the NodeTag. That'd leave the size the same as right now. I suppose you mean kbytes, because the block size is not fixed. Some coarser size than bytes. Don't really care about the granularity. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] using custom scan nodes to prototype parallel sequential scan
On 18 November 2014 05:19, Simon Riggs si...@2ndquadrant.com wrote: On 14 November 2014 11:02, Kouhei Kaigai kai...@ak.jp.nec.com wrote: I'd like to throw community folks a question. Did someone have a discussion about the challenge of aggregate push-down across relation joins in the past? It potentially reduces the number of rows to be joined. If we already did, I'd like to check up on the discussion at that time. Yes, I was looking at aggregate pushdown. I think it needs the same changes to aggregates discussed upthread. I have something that I've been working on locally. It's far from ready, but it does work in very simple cases, and shows a nice performance boost. I'll start another thread soon and copy you both in. Perhaps we can share some ideas. Regards David Rowley
Re: [HACKERS] [GENERAL] Performance issue with libpq prepared queries on 9.3 and 9.4
On Thu, Nov 13, 2014 at 7:34 PM, Tom Lane t...@sss.pgh.pa.us wrote: One thing that occurs to me is that if the generic plan estimate comes out much cheaper than the custom one, maybe we should assume that the generic's cost estimate is bogus. Right offhand I can't think of a reason for a custom plan to look worse than a generic one, unless there's a statistical quirk like this one. That's an interesting idea, but what do we do after deciding that it's bogus? The generic plan really can't be cheaper than the custom plan, but it could be the same price, or as close as makes no difference. I have long thought that maybe the custom vs. generic plan decision should have something to do with whether the parameters are MCVs of the table columns they're being compared against. This case is interesting because it demonstrates that querying for a *non*-MCV can require a switch to a custom plan, which is something I don't think I've seen before. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] group locking: incomplete patch, just for discussion
On Thu, Nov 13, 2014 at 4:57 PM, Jeff Davis pg...@j-davis.com wrote: On Thu, Nov 13, 2014 at 11:26 AM, Robert Haas robertmh...@gmail.com wrote: On Thu, Nov 13, 2014 at 3:38 AM, Jeff Davis pg...@j-davis.com wrote: If two backends both have an exclusive lock on the relation for a join operation, that implies that they need to do their own synchronization, because obviously the lock manager is not doing it for them. This doesn't make sense to me. Why would they need to synchronize access to a relation in order to join it? Unfortunate typo: that was supposed to be joint operation, just meaning that they are working together for something (e.g. CLUSTER, VACUUM FULL as you suggest). Sorry for the confusion. Ah, well, that's different. I meant that the backends need to divide up the work somehow. And if each operator needs to divide up the work before operating, that means we need to change every operator. I think I see your point, which it just so happens Amit articulated to me in different words while we were chatting about this problem this morning. We want to avoid waiving the mutual exclusion provided by the lock manager only to end up re-implementing something very much like the current lock manager to arbitrate resource contention among backends within a parallel group. However, we also don't want the mutual exclusion that the lock manager provides to serve as a barrier to implementing useful parallelism; that is, if the parallel backends want to manage conflicts themselves instead of letting the lock manager do it, that should be possible. In http://www.postgresql.org/message-id/ca+tgmoygplojo+lg1bebos8gdjwjtq0qdmxsyj4ihfyj11t...@mail.gmail.com I propose a new approach. The basic idea is that any heavyweight lock that we hold at the start of a parallel phase can't represent a bona fide conflict between the activities of two different backends, so we arrange to waive conflicts over those locks. 
The problem (as Andres points out) is that those locks could be subsequently re-taken by operations that are not parallel-safe, and the new locks would be granted locally regardless of the shared lock table because the current backend would already hold them. However, I think it's fine to make it the job of the parallelism infrastructure to plug that hole, rather than trying to enforce it in the lock manager. There are LOTS of things that need to be prohibited in parallel mode and only allowed once they've been made sufficiently safe, and in the first version, that's certainly going to include - among other things - any operation that writes data. I think it's OK to make it the job of the people who want to relax those prohibitions to deal with making it safe to do so. To reiterate the basic problem here, if we do nothing at all about the lock manager, a parallel backend can stall trying to grab an AccessExclusiveLock that the user backend already holds, either because the user backend holds an AccessExclusiveLock as well or because some other process is waiting for one; either way, we'll deadlock and not notice. We have to do *something* about that. I'd like to arrive at some consensus on how to move forward here as soon as possible, because the clock is ticking here. http://www.postgresql.org/message-id/ca+tgmoygplojo+lg1bebos8gdjwjtq0qdmxsyj4ihfyj11t...@mail.gmail.com articulates what I currently believe to be the best plan. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.5: Better memory accounting, towards memory-bounded HashAgg
On 17.11.2014 19:46, Andres Freund wrote: On 2014-11-17 19:42:25 +0100, Tomas Vondra wrote: On 17.11.2014 18:04, Andres Freund wrote: Hi, On 2014-11-16 23:31:51 -0800, Jeff Davis wrote: *** a/src/include/nodes/memnodes.h --- b/src/include/nodes/memnodes.h *** *** 60,65 typedef struct MemoryContextData --- 60,66 MemoryContext nextchild;/* next child of same parent */ char *name; /* context name (just for debugging) */ boolisReset;/* T = no space alloced since last reset */ + uint64 mem_allocated; /* track memory allocated for this context */ #ifdef USE_ASSERT_CHECKING boolallowInCritSection; /* allow palloc in critical section */ #endif That's quite possibly one culprit for the slowdown. Right now one AllocSetContext struct fits precisely into three cachelines. After your change it doesn't anymore. I'm no PPC64 expert, but I thought the cache lines are 128 B on that platform, since at least Power6? Hm, might be true. Also, if I'm counting right, the MemoryContextData structure is 56B without the 'mem_allocated' field (and without the allowInCritSection), and 64B with it (at that particular place). sizeof() seems to confirm that. (But I'm on x86, so maybe the alignment on PPC64 is different?). The MemoryContextData struct is embedded into AllocSetContext. Oh, right. That makes is slightly more complicated, though, because AllocSetContext adds 6 x 8B fields plus an in-line array of freelist pointers. Which is 11x8 bytes. So in total 56+56+88=200B, without the additional field. There might be some difference because of alignment, but I still don't see how that one additional field might impact cachelines? But if we separated the freelist, that might actually make it faster, at least for calls that don't touch the freelist at all, no? Because most of the palloc() calls will be handled from the current block. 
I tried this on the patch I sent on 23/8 (with the MemoryAccounting hierarchy parallel to the contexts), and I got this (running 10x the reindex, without restarts or such): without the patch - ... 1723933 KB used: CPU 1.35s/4.33u sec elapsed 10.38 sec ... 1723933 KB used: CPU 1.39s/4.19u sec elapsed 9.89 sec ... 1723933 KB used: CPU 1.39s/4.26u sec elapsed 9.90 sec ... 1723933 KB used: CPU 1.45s/4.22u sec elapsed 10.15 sec ... 1723933 KB used: CPU 1.36s/4.20u sec elapsed 9.84 sec ... 1723933 KB used: CPU 1.32s/4.20u sec elapsed 9.96 sec ... 1723933 KB used: CPU 1.50s/4.23u sec elapsed 9.91 sec ... 1723933 KB used: CPU 1.47s/4.24u sec elapsed 9.90 sec ... 1723933 KB used: CPU 1.38s/4.23u sec elapsed 10.09 sec ... 1723933 KB used: CPU 1.37s/4.19u sec elapsed 9.84 sec with the patch (no tracking enabled) ... 1723933 KB used: CPU 1.40s/4.28u sec elapsed 10.79 sec ... 1723933 KB used: CPU 1.39s/4.32u sec elapsed 10.14 sec ... 1723933 KB used: CPU 1.40s/4.16u sec elapsed 9.99 sec ... 1723933 KB used: CPU 1.32s/4.21u sec elapsed 10.16 sec ... 1723933 KB used: CPU 1.37s/4.34u sec elapsed 10.03 sec ... 1723933 KB used: CPU 1.35s/4.22u sec elapsed 9.97 sec ... 1723933 KB used: CPU 1.43s/4.20u sec elapsed 9.99 sec ... 1723933 KB used: CPU 1.39s/4.17u sec elapsed 9.71 sec ... 1723933 KB used: CPU 1.44s/4.16u sec elapsed 9.91 sec ... 1723933 KB used: CPU 1.40s/4.25u sec elapsed 9.99 sec So it's about 9,986 vs 10,068 for averages, and 9,905 vs 9,99 for a median. I.e. ~0.8% difference. That's on x86, though. I wonder what'd be the effect on the PPC machine. Patch attached - it's not perfect, though. For example it should free the freelists at some point, but I feel a bit lazy today. 
Tomas diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c index 743455e..d2ecb5b 100644 --- a/src/backend/utils/mmgr/aset.c +++ b/src/backend/utils/mmgr/aset.c @@ -175,7 +175,7 @@ typedef struct AllocSetContext MemoryContextData header; /* Standard memory-context fields */ /* Info about storage allocated in this context: */ AllocBlock blocks; /* head of list of blocks in this set */ - AllocChunk freelist[ALLOCSET_NUM_FREELISTS]; /* free chunk lists */ + AllocChunk *freelist; /* free chunk lists */ /* Allocation parameters for this context: */ Size initBlockSize; /* initial block size */ Size maxBlockSize; /* maximum block size */ @@ -242,6 +242,8 @@ typedef struct AllocChunkData #define AllocChunkGetPointer(chk) \ ((AllocPointer)(((char *)(chk)) + ALLOC_CHUNKHDRSZ)) +static void update_allocation(MemoryContext context, int64 size); + /* * These functions implement the MemoryContext API for AllocSet contexts. */ @@ -250,7 +252,7 @@ static void AllocSetFree(MemoryContext context, void *pointer); static void *AllocSetRealloc(MemoryContext context, void *pointer, Size size); static void AllocSetInit(MemoryContext
Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()
On Mon, Nov 17, 2014 at 10:15 AM, Robert Haas robertmh...@gmail.com wrote: I'm grumpy about the distanceName() function. That seems too generic. If we're going to keep this as it is, I suggest something like computeRTEColumnDistance(). But see below. Fair point. On a related note, I'm also grumpy about this comment: +/* Charge half as much per deletion as per insertion or per substitution */ +return varstr_levenshtein_less_equal(actual, len, match, match_len, + 2, 1, 2, max); The purpose of a code comment is to articulate WHY we did something, rather than simply to restate what the code quite obviously does. I haven't heard a compelling argument for why this should be 2, 1, 2 rather than the default 1, 1, 1; and I'm inclined to do the latter unless you can make some very good argument for this combination of weights. And if you can make such an argument, then there should be comments so that the next person to come along and look at this code doesn't go, "huh, that's whacky", and change it. Okay. I agree that that deserves a comment. The actual argument for this formulation is that it just seems to work better that way. For example:

postgres=# \d orderlines
    Table "public.orderlines"
   Column    |   Type   | Modifiers
-------------+----------+-----------
 orderlineid | integer  | not null
 orderid     | integer  | not null
 prod_id     | integer  | not null
 quantity    | smallint | not null
 orderdate   | date     | not null

postgres=# select qty from orderlines ;
ERROR: 42703: column "qty" does not exist
LINE 1: select qty from orderlines ;
               ^
HINT: Perhaps you meant to reference the column "orderlines.quantity".

The point is that the fact that the user-supplied qty string has so many fewer characters than what was obviously intended - quantity - deserves to be weighed less. If you change the costing to weigh character deletion as being equal to substitution/addition, this example breaks. I also think it's pretty common to have noise words in every attribute (e.g.
every column in the orderlines table matches orderlines_*), which might otherwise mess things up by overcharging for deletion. Having extra characters in the correctly spelled column name seems legitimately less significant to me. Or, in other words: having actual characters from the misspelling match the correct spelling (and having actual characters given not fail to match) is most important. What was given by the user is more important than what was not given but should have been, which is not generally true for uses of Levenshtein distance. I reached this conclusion through trying out the patch with a couple of real schemas, and seeing what works best. It's hard to express that idea tersely, in a comment, but I guess I'll try. + int location, FuzzyAttrMatchState *rtestate) I suggest calling this fuzzystate rather than rtestate; it's not the state of the RTE, but the state of the fuzzy matching. The idea here was to differentiate this state from the overall range table state (in general, FuzzyAttrMatchState may be one or the other). But okay. Within the scanRTEForColumn block, we have a rather large chunk of code protected by if (rtestate), which contains the only call to distanceName(). I suggest that we move all of this logic to a separate, static function, and merge distanceName into it. I also suggest testing against NULL explicitly instead of implicitly. So this block of code would end up as something like: if (fuzzystate != NULL) updateFuzzyAttrMatchState(rte, attcolname, colname, fuzzystate); Okay. In searchRangeTableForCol, I'm fairly certain that you've changed the behavior by adding a check for if (rte-rtekind == RTE_JOIN) before the call to scanRTEForColumn(). Why not instead put this check into updateFuzzyAttrMatchState? Then you can be sure you're not changing the behavior in any other case. 
I thought that I had avoided changing things (beyond what was advertised as changed in relation to this most recent revision) because I also changed things WRT multiple matches per RTE. It's fuzzy. Anyway, yeah, I could do it there instead. On a similar note, I think the dropped-column test should happen early as well, probably again in updateFuzzyAttrMatchState(). There's little point in adding a suggestion only to throw it away again. Agreed. +/* + * Charge extra (for inexact matches only) when an alias was + * specified that differs from what might have been used to + * correctly qualify this RTE's closest column + */ +if (wrongalias) +rtestate.distance += 3; I don't understand what situation this is catering to. Can you explain? It seems to account for a good deal of complexity. Two cases: 1. Distinguishing between the case where there was an exact match to a column that isn't visible
Re: [HACKERS] [TODO] Track number of files ready to be archived in pg_stat_archiver
On 21 August 2014 09:17, Julien Rouhaud julien.rouh...@dalibo.com wrote: Track number of WAL files ready to be archived in pg_stat_archiver Would it be OK to ask what the purpose of this TODO item is? pg_stat_archiver already has columns for last_archived_wal and last_failed_wal, so you can already work out how many files there must be between then and now. Perhaps that can be added directly to the view, to assist the user in calculating it. Reading the directory itself to count the files is unnecessary, except as a diagnostic. Please don't take it that a TODO item is generally accepted as something that makes sense. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Turning off HOT/Cleanup sometimes
What happened to this patch? I'm going over something that could use the concept of "clean some stuff up when reading this page, but only if we're already writing" or similar. I see some cases were presented that had a performance decrease. Did we get any numbers for the increase in performance in some other interesting cases? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] Performance issue with libpq prepared queries on 9.3 and 9.4
Robert Haas robertmh...@gmail.com writes: On Thu, Nov 13, 2014 at 7:34 PM, Tom Lane t...@sss.pgh.pa.us wrote: One thing that occurs to me is that if the generic plan estimate comes out much cheaper than the custom one, maybe we should assume that the generic's cost estimate is bogus. Right offhand I can't think of a reason for a custom plan to look worse than a generic one, unless there's a statistical quirk like this one. That's an interesting idea, but what do we do after deciding that it's bogus? Keep using custom plans. It's possible that the estimate that's in error is the custom one, but that's not the way to bet IMO, since the custom plan estimate is based on better information. The generic plan really can't be cheaper than the custom plan, but it could be the same price, or as close as makes no difference. Right, and what we want to do is use the generic plan as long as it's close to the same cost (close enough to not justify replanning effort). The trick here is to not be fooled by estimation errors. Can we assume that generic cost < custom cost is always an estimation error? Another idea that occurred to me is to run a planning cycle in which the actual parameter values are made available to the planner, but as estimates not hard constants (this facility already exists, it's just not being used by plancache.c). This would yield cost estimates that are more safely comparable to the custom plan. But I'm not sure that we'd want to expend yet another planning cycle to do this, nor am I sure that we'd want to use such a plan as The Generic Plan. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] Performance issue with libpq prepared queries on 9.3 and 9.4
One interesting option would be kicking in an extra, more expensive planning cycle after the Nth run of the query; in general a lot of these planned queries run 1000s of times, so if you add some extra cost at run 100 it may not be prohibitive cost-wise. On Tue, Nov 18, 2014 at 8:27 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Thu, Nov 13, 2014 at 7:34 PM, Tom Lane t...@sss.pgh.pa.us wrote: One thing that occurs to me is that if the generic plan estimate comes out much cheaper than the custom one, maybe we should assume that the generic's cost estimate is bogus. Right offhand I can't think of a reason for a custom plan to look worse than a generic one, unless there's a statistical quirk like this one. That's an interesting idea, but what do we do after deciding that it's bogus? Keep using custom plans. It's possible that the estimate that's in error is the custom one, but that's not the way to bet IMO, since the custom plan estimate is based on better information. The generic plan really can't be cheaper than the custom plan, but it could be the same price, or as close as makes no difference. Right, and what we want to do is use the generic plan as long as it's close to the same cost (close enough to not justify replanning effort). The trick here is to not be fooled by estimation errors. Can we assume that generic cost < custom cost is always an estimation error? Another idea that occurred to me is to run a planning cycle in which the actual parameter values are made available to the planner, but as estimates not hard constants (this facility already exists, it's just not being used by plancache.c). This would yield cost estimates that are more safely comparable to the custom plan. But I'm not sure that we'd want to expend yet another planning cycle to do this, nor am I sure that we'd want to use such a plan as The Generic Plan. 
regards, tom lane
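The plan-choice heuristic being debated above can be sketched as a pure decision function. This is only an illustrative model of the ideas in the thread, not the actual plancache.c code; the function name, the 0.5 "bogus estimate" threshold, and the explicit replan-cost parameter are all assumptions made up for the example.

```c
#include <stdbool.h>

/*
 * Illustrative sketch: decide whether to keep building custom plans.
 * generic_cost is the estimated cost of the cached generic plan;
 * avg_custom_cost is the running average cost of custom plans built
 * with actual parameter values; replan_cost models the per-execution
 * planning overhead that a custom plan must pay.
 */
static bool
choose_custom_plan(double generic_cost, double avg_custom_cost,
				   double replan_cost)
{
	/*
	 * Per Tom's point in the thread: the generic plan really can't be
	 * cheaper than the custom plan, since the custom estimate is based
	 * on better information.  So a generic estimate far below the
	 * custom one is treated as bogus, and we keep using custom plans.
	 * The 0.5 factor is an arbitrary illustrative threshold.
	 */
	if (generic_cost < avg_custom_cost * 0.5)
		return true;

	/*
	 * Otherwise, reuse the generic plan whenever it is close enough in
	 * cost that the replanning effort is not justified.
	 */
	return generic_cost > avg_custom_cost + replan_cost;
}
```

Usage note: with this shape, a generic estimate of 10 against a custom average of 100 is distrusted (custom plans keep winning), while a generic estimate of 102 against 100 plus a replan cost of 5 is "close enough" and the generic plan is reused.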
Re: [HACKERS] Turning off HOT/Cleanup sometimes
On 17 November 2014 21:09, Alvaro Herrera alvhe...@2ndquadrant.com wrote: What happened to this patch? I'm going over something that could use the concept of "clean some stuff up when reading this page, but only if we're already writing" or similar. I see some cases were presented that had a performance decrease. Did we get any numbers for the increase in performance in some other interesting cases? It's not dead; it just needs more work. Maybe for next CF, or you can pick it up now. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Better support of exported snapshots with pg_dump
On 28 October 2014 11:34, Michael Paquier michael.paqu...@gmail.com wrote: On Tue, Oct 28, 2014 at 8:08 PM, Petr Jelinek p...@2ndquadrant.com wrote: Hi, On 28/10/14 03:15, Michael Paquier wrote: Updated patch with those comments addressed is attached. Great, I have no further comments so I consider this patch ready for committer (and will mark it so momentarily). OK thanks! Just as a note - there is the issue you raised yourself about the less than perfect error message, but I really don't think it's worth the trouble to have special handling for it as the message coming from the server is clear enough IMHO. Yeah thinking the same. Let's see what others think (btw, if this code gets committed, be sure to mark Simon as a co-author). Committed. Thanks very much for pushing forwards with this. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] proposal: plpgsql - Assert statement
Great, looks good to me, marking as ready for committer. What is wrong with using IF? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
[HACKERS] Fwd: ability to return number of rows inserted into child partition tables request
Hello. I was trying to get postgres to return the correct number of rows inserted for batch inserts to a partitioned table [using the triggers as suggested here http://www.postgresql.org/docs/9.1/static/ddl-partitioning.html results in it always returning 0 by default]. What I ideally wanted it to do is to be able to insert into just the child partitions, and return the number of rows inserted. It seems the state of the art is either to return the NEW row from the insert trigger [which causes it to also be saved to the parent master table], then define an extra trigger to remove the row from the parent table. So 2 inserts and 1 delete for an insert. [1] Or you can use an unconditional rule and it will return the number of rows updated [however, in this case, since we're using partitioning, I think we need multiple rules, one for each child table]. It is possible for a view to use a trigger and still return the number of rows updated, which provides another work around. (See bottom of [1].) Is there some more elegant way here? It seems odd that partitioned tables basically cannot, without a *lot* of massaging, return the number of rows updated; am I missing something or do I understand ok? [Today this requires people to put in lots of work arounds, like *not* checking the number of rows returned for batch inserts, etc., which is potentially dangerous as well.] Is there, for instance, some work around, like a way to manually cause the count of the number of rows affected by the command to be incremented here? Or possibly conditional rules could be made able to return the output string with number of rows affected (feature request)? I guess this has come up before, FWIW. http://grokbase.com/t/postgresql/pgsql-general/0863bjzths/insert-into-master-table-0-rows-affected-hibernate-problems One way of fixing this would be to allow do instead rules on normal tables, instead of only on views (otherwise we are forced to use a rule, correct me if I'm wrong). 
I'd wager there would be other viable options as well. Thanks! -roger- [1] http://stackoverflow.com/questions/83093/hibernate-insert-batch-with-partitioned-postgresql
[HACKERS] vacuumdb: Help text for --analyze-only.
Hello there, I observe that the help text of vacuumdb for --analyze, --analyze-only, and --analyze-in-stages could do with a little clarification in order to be self-documenting and thus improve the user experience of vacuumdb. The problem is that the sole addition of the word "only" to an otherwise identical text for --analyze and --analyze-only seems rather obscure. My suggestion follows. Best regards, Mats Erik Andersson diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c index 86e6ab3..a07f081 100644 --- a/src/bin/scripts/vacuumdb.c +++ b/src/bin/scripts/vacuumdb.c @@ -435,9 +435,9 @@ help(const char *progname) printf(_("  -v, --verbose                   write a lot of output\n")); printf(_("  -V, --version                   output version information, then exit\n")); printf(_("  -z, --analyze                   update optimizer statistics\n")); - printf(_("  -Z, --analyze-only              only update optimizer statistics\n")); - printf(_("  --analyze-in-stages             only update optimizer statistics, in multiple\n" "                                  stages for faster results\n")); + printf(_("  -Z, --analyze-only              only update optimizer statistics; no vacuum\n")); + printf(_("  --analyze-in-stages             only update statistics, but in multiple\n" "                                  stages for faster results; no vacuum\n")); printf(_("  -?, --help                      show this help, then exit\n")); printf(_("\nConnection options:\n")); printf(_("  -h, --host=HOSTNAME      database server host or socket directory\n"));
Re: [HACKERS] logical decoding - reading a user catalog table
On 11/17/2014 11:34 AM, Andres Freund wrote: Hi, Kevin: CCed you, because it doesn't really look like a logical decoding related issue. On 2014-11-17 11:25:40 -0500, Steve Singer wrote: On 11/17/2014 10:37 AM, Andres Freund wrote: On 2014-11-13 22:23:02 -0500, Steve Singer wrote: Also since updating (to 2c267e47afa4f9a7c) I've seen an assertion failure in a normal client connection, not the walsender #3 0x006b4978 in GetSerializableTransactionSnapshotInt ( snapshot=snapshot@entry=0xbfa8a0 <CurrentSnapshotData>, sourcexid=sourcexid@entry=0) at predicate.c:1738 #4 0x006b66c3 in GetSafeSnapshot (origSnapshot=<optimized out>) at predicate.c:1517 #5 GetSerializableTransactionSnapshot ( snapshot=0xbfa8a0 <CurrentSnapshotData>) at predicate.c:1598 #6 0x007d16dd in GetTransactionSnapshot () at snapmgr.c:200 #7 0x006c0e35 in exec_simple_query ( query_string=0x1fd01b8 "select ev_origin, ev_seqno, ev_timestamp, ev_snapshot, \"pg_catalog\".txid_snapshot_xmin(ev_snapshot), \"pg_catalog\".txid_snapshot_xmax(ev_snapshot), coalesce(ev_provider_xid,\"...) at postgres.c:959 #8 PostgresMain (argc=<optimized out>, argv=argv@entry=0x1f5ab50, I have no idea if this has anything to do with your recent changes or some other change. I haven't so far been able to replicate that since the first time I saw it. That crash is decidedly odd. Any chance you still have the full backtrace around? Yes I still have the core file Cool, could you show the full thing? Or did you just snip it because it's just the Assert/ExceptionalCondition thing? I just snipped the exception stuff. 
Here is the full thing (gdb) backtrace #0 0x7f6fb22e8295 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56 #1 0x7f6fb22eb438 in __GI_abort () at abort.c:90 #2 0x007a08e7 in ExceptionalCondition ( conditionName=conditionName@entry=0x918e88 "!(TransactionIdFollows(snapshot->xmin, PredXact->SxactGlobalXmin))", errorType=errorType@entry=0x7da390 "FailedAssertion", fileName=fileName@entry=0x9182c3 "predicate.c", lineNumber=lineNumber@entry=1738) at assert.c:54 #3 0x006b4978 in GetSerializableTransactionSnapshotInt ( snapshot=snapshot@entry=0xbfa8a0 <CurrentSnapshotData>, sourcexid=sourcexid@entry=0) at predicate.c:1738 #4 0x006b66c3 in GetSafeSnapshot (origSnapshot=<optimized out>) at predicate.c:1517 #5 GetSerializableTransactionSnapshot ( snapshot=0xbfa8a0 <CurrentSnapshotData>) at predicate.c:1598 #6 0x007d16dd in GetTransactionSnapshot () at snapmgr.c:200 #7 0x006c0e35 in exec_simple_query ( query_string=0x1fd01b8 "select ev_origin, ev_seqno, ev_timestamp, ev_snapshot, \"pg_catalog\".txid_snapshot_xmin(ev_snapshot), \"pg_catalog\".txid_snapshot_xmax(ev_snapshot), coalesce(ev_provider_xid,\"...) at postgres.c:959 #8 PostgresMain (argc=<optimized out>, argv=argv@entry=0x1f5ab50, ---Type <return> to continue, or q <return> to quit--- dbname=0x1f5ab30 "test2", username=<optimized out>) at postgres.c:4016 #9 0x0046a08e in BackendRun (port=0x1f83010) at postmaster.c:4123 #10 BackendStartup (port=0x1f83010) at postmaster.c:3797 #11 ServerLoop () at postmaster.c:1576 #12 0x00665eae in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x1f59d00) at postmaster.c:1223 #13 0x0046ab58 in main (argc=3, argv=0x1f59d00) at main.c:227 Could you print *snapshot in frame #3? (gdb) p *snapshot $1 = {satisfies = 0x7d0330 <HeapTupleSatisfiesMVCC>, xmin = 412635, xmax = 412637, xip = 0x1f86e40, xcnt = 1, subxcnt = 0, subxip = 0x1fd2190, suboverflowed = 0 '\000', takenDuringRecovery = 0 '\000', copied = 0 '\000', curcid = 0, active_count = 0, regd_count = 0} (gdb) This is in the SSI code... 
I'm not immediately seeing how this could be related to logical decoding. It can't even be an imported snapshot, because the exported snapshot is marked repeatable read. Are you actually using serializable transactions? If so, how and why? Yes, the test client that performs the 'simulated workload' does some serializable transactions. It connects as a normal client via JDBC and does a prepareStatement(SET TRANSACTION ISOLATION LEVEL SERIALIZABLE) and then does some DML. Why? because it seemed like a good thing to include in the test suite. Yes, it's a good reason as the above backtrace proves ;). I'm just trying to understand under which circumstances it happens. The above backtrace looks like it can only be triggered if you do a BEGIN TRANSACTION SERIALIZABLE DEFERRABLE READ ONLY; Is that something you do? The slony remote listener connection does this when it selects from sl_event. We switched to this shortly after 9.1 came out.
Re: [HACKERS] Better support of exported snapshots with pg_dump
On Tue, Nov 18, 2014 at 7:13 AM, Simon Riggs si...@2ndquadrant.com wrote: Committed. Thanks very much for pushing forwards with this. Thanks. -- Michael
Re: [HACKERS] Review of Refactoring code for sync node detection
On Tue, Nov 18, 2014 at 3:03 AM, Fujii Masao masao.fu...@gmail.com wrote: On Mon, Nov 17, 2014 at 2:20 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Sun, Nov 16, 2014 at 9:07 PM, Michael Paquier michael.paqu...@gmail.com wrote: I'll send an updated patch tomorrow. Here are updated versions. I divided the patch into two for clarity, the first one is the actual refactoring patch, renaming SyncRepGetSynchronousNode to SyncRepGetSynchronousStandby (+alpha, like updating synchronous to sync in the comments as you mentioned) so that the namings have no conflicts. The second one updates the syncrep code, including WAL sender and WAL receiver, and its surroundings to use the term node instead of standby. This brings in the code the idea that a node using replication APIs is not necessarily a standby, making it more generic. This can be applied on top of the refactoring patch. If any other folks (Heikki or Fujii-san?) have comments about this idea feel free. I've not read the patches yet. But while I was reading the code around sync node detection, I thought that it might be a good idea to treat the node with priority 1 as special. That is, if we find the active node with priority 1, we can immediately go out of the sync-node-detection loop. In many cases we can expect that the sync standby has priority 1. If yes, treating priority 1 as special is a good way to do it, I think. Thoughts? That would really save some cycles. Now we still need to fetch the priority numbers of all nodes for pg_stat_get_wal_senders, so in this case scanning all the entries in the WAL sender array is necessary. This optimization is orthogonal to the refactoring patch btw, no? -- Michael
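The early-exit idea Fujii-san suggests can be sketched in miniature. This is not the actual syncrep.c code; the struct, the field names, and the array-based scan are simplified stand-ins for the real WAL sender shared-memory array.

```c
#define INVALID_PRIORITY 0		/* async or not a sync candidate */

/* Simplified stand-in for a WAL sender slot */
typedef struct WalSenderInfo
{
	int			priority;		/* 1 = highest sync priority */
	int			is_active;		/* has reported a flush position */
} WalSenderInfo;

/*
 * Illustrative sketch: find the active node with the lowest priority
 * number, breaking out as soon as priority 1 is seen, since no better
 * candidate can exist.  Returns the array index, or -1 if no sync
 * candidate is active.
 */
static int
find_sync_node(const WalSenderInfo *senders, int n)
{
	int			best = -1;
	int			best_priority = 0;

	for (int i = 0; i < n; i++)
	{
		int			prio = senders[i].priority;

		if (!senders[i].is_active || prio == INVALID_PRIORITY)
			continue;
		if (best == -1 || prio < best_priority)
		{
			best = i;
			best_priority = prio;
			if (prio == 1)
				break;			/* cannot do better than priority 1 */
		}
	}
	return best;
}
```

As Michael notes, this shortcut only helps callers that want just the sync node; pg_stat_get_wal_senders still has to visit every slot to report each node's priority.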
Re: [HACKERS] proposal: plpgsql - Assert statement
On 11/17/14, 4:58 PM, Simon Riggs wrote: Great, looks good to me, marking as ready for committer. What is wrong with using IF? It's a hell of a lot wordier. I've previously created a more sophisticated assert framework to allow more control over things, but ended up also using it just for simple sanity checking because it was much nicer than typing IF THEN RAISE ERROR END IF. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
[HACKERS] Obsolete debug #define in pg_config_manual.h
Attached patch removes obsolete debugging #define in pg_config_manual.h (RTDEBUG). This was last used in commit 2a8d3d, from 2005. Apparently RTDEBUG is a contraction of R-Tree Debug. -- Peter Geoghegan diff --git a/src/include/pg_config_manual.h b/src/include/pg_config_manual.h index 9e25ce0..265dae0 100644 --- a/src/include/pg_config_manual.h +++ b/src/include/pg_config_manual.h @@ -321,7 +321,6 @@ */ /* #define HEAPDEBUGALL */ /* #define ACLDEBUG */ -/* #define RTDEBUG */ /* * Automatic configuration file name for ALTER SYSTEM. 
Re: [HACKERS] vacuumdb: Help text for --analyze-only.
Mats Erik Andersson wrote: Hello there, I observe that the help text of vacuumdb for --analyze, --analyze-only, and --analyze-in-stages could do with a little clarification in order to be self-documenting and thus improve the user experience of vacuumdb. The problem is that the sole addition of the word "only" to an otherwise identical text for --analyze and --analyze-only seems rather obscure. My suggestion follows. -1 for the changes as proposed. I'm not particularly convinced this is bad since the help text rightly presumes the knowledge that the vacuum process also performs analyze, and all the flags do is toggle between the three possible combinations of these two actions - and the special in-stages version. The word "only" quickly communicates that the vacuum does not occur - only the secondary analyze. Help is not typically meant to be self-documenting; it presumes one has read the docs and/or man page for the relevant program and concepts and simply needs a quick reminder as to syntax and capabilities. All that said, I would drop the word "only" and add "without vacuuming" instead if you feel that "only" is too vague. I do not like the phrase "no vacuum"... David J.
Re: [HACKERS] Index scan optimization
On 17 November 2014 16:05, Simon Riggs wrote: I confirm 5% improvement in both short and long index scans, on the least beneficial datatype. Looks likely to be a very positive win overall. Thanks a lot for testing and confirmation. The language used in the comments is not clear enough. I'll do my best to improve that, then look to commit this in about 5 hours. Thanks a lot for support. Please let me know if I also need to add something. Thanks and Regards, Kumar Rajeev Rastogi
Re: [HACKERS] proposal: plpgsql - Assert statement
2014-11-17 23:58 GMT+01:00 Simon Riggs si...@2ndquadrant.com: Great, looks good to me, marking as ready for committer. What is wrong with using IF? It significantly increases code length and decreases readability when you make intensive use of the pattern IF THEN RAISE END IF, checking every parameter, checking every result. RAISE ... WHEN ... is shorter, with the full power of the RAISE statement and possibilities for future enhancement. Regards Pavel -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] using custom scan nodes to prototype parallel sequential scan
On 18 November 2014 05:19, Simon Riggs si...@2ndquadrant.com wrote: On 14 November 2014 11:02, Kouhei Kaigai kai...@ak.jp.nec.com wrote: I'd like to throw community folks a question. Did someone have a discussion on the challenge of aggregate push-down across relation joins in the past? It potentially reduces the number of rows to be joined. If we already had one, I'd like to check up on the discussion at that time. Yes, I was looking at aggregate pushdown. I think it needs the same changes to aggregates discussed upthread. I have something that I've been working on locally. It's far from ready, but it does work in very simple cases, and shows a nice performance boost. I'll start another thread soon and copy you both in. Perhaps we can share some ideas. Great, it's exactly the kind of functionality that would be valuable to have in core. -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei kai...@ak.jp.nec.com
Re: [HACKERS] [TODO] Track number of files ready to be archived in pg_stat_archiver
On Tue, Nov 18, 2014 at 5:47 AM, Simon Riggs si...@2ndquadrant.com wrote: On 21 August 2014 09:17, Julien Rouhaud julien.rouh...@dalibo.com wrote: Track number of WAL files ready to be archived in pg_stat_archiver Would it be OK to ask what the purpose of this TODO item is? pg_stat_archiver already has a column for last_archived_wal and last_failed_wal, so you can already work out how many files there must be between then and now. Perhaps that can be added directly to the view, to assist the user in calculating it. Reading the directory itself to count the files is unnecessary, except as a diagnostic. Not sure if this holds true in a node freshly started from a base backup with a set of WAL files, or with files manually copied by an operator. Please don't take its presence as a TODO item to mean it is generally accepted that this makes sense. On systems where WAL archiving is slower than WAL generation at peak time, the DBA may want to know how long the queue of WAL files waiting to be archived is. That's IMO something we simply forgot in the first implementation of pg_stat_archiver, and the most direct way to know that is to count the .ready files in archive_status. -- Michael
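The "count the .ready files" approach Michael describes (and later argues should live in the archiver, not in pg_stat_get_archiver()) amounts to a directory scan like the one below. This is an illustrative sketch, not the pgarch.c code; the directory path is a parameter here, whereas a real server would look in pg_xlog/archive_status.

```c
#include <dirent.h>
#include <string.h>

/*
 * Count files whose names end in ".ready" in the given directory.
 * Returns the count, or -1 if the directory cannot be opened.
 * Sketch only: the real archiver also validates the base file name.
 */
static long
count_ready_files(const char *dirpath)
{
	DIR		   *dir = opendir(dirpath);
	struct dirent *de;
	long		count = 0;

	if (dir == NULL)
		return -1;

	while ((de = readdir(dir)) != NULL)
	{
		size_t		len = strlen(de->d_name);

		/* match a trailing ".ready" suffix */
		if (len > 6 && strcmp(de->d_name + len - 6, ".ready") == 0)
			count++;
	}
	closedir(dir);
	return count;
}
```

The cost concern in the follow-up message is visible here: the scan is O(number of status files), so calling it from a monitoring view on a backlogged archiver could be expensive, which is why the patch instead has the archiver track the count incrementally.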
Re: [HACKERS] [TODO] Track number of files ready to be archived in pg_stat_archiver
On Wed, Oct 22, 2014 at 12:50 AM, Brightwell, Adam adam.brightw...@crunchydatasolutions.com wrote: Though, I would think that the general desire would be to keep the patch relevant ONLY to the necessary changes. I would not qualify making those types of changes as relevant, IMHO. I do think this is potential for cleanup, however, I would suspect that would be best done in a separate patch. But again, I'd defer to a committer whether such changes are even necessary/acceptable. I have been looking at this patch, and I think that it is a mistake to count the .ready files present in archive_status when calling pg_stat_get_archiver(). If there are many files waiting to be archived, this penalizes the run time of this function, and the application behind it relying on those results, not to mention that the loop used to count the .ready files is actually a copy of what is in pgarch.c. Hence I think that we should simply count them in pgarch_readyXlog, and then return a value back to pgarch_ArchiverCopyLoop, a value that could be decremented by 1 each time a file is successfully archived to keep the stats as precise as possible, and to let the user see useful information while the archiver process is within a single loop of pgarch_ArchiverCopyLoop. This way, we just need to extend PgStat_MsgArchiver with a new counter to track this number. The attached patch, based on v2 sent previously, does so. Thoughts? -- Michael From e12a1aff3f1b423da5277cccf2a76ec09318567a Mon Sep 17 00:00:00 2001 From: Michael Paquier mich...@otacoo.com Date: Tue, 18 Nov 2014 16:30:23 +0900 Subject: [PATCH] Track number of files marked as ready for archiving in pg_stat_archiver This number of files is directly tracked by the archiver process, which then reports the number it finds to the stats machinery. Note that when the archiver marks a file as successfully archived, it decrements by one the number of files waiting to be archived, giving more precise information to the user. 
--- doc/src/sgml/monitoring.sgml | 5 + src/backend/catalog/system_views.sql | 1 + src/backend/postmaster/pgarch.c | 33 +++-- src/backend/postmaster/pgstat.c | 6 +- src/backend/utils/adt/pgstatfuncs.c | 21 +++-- src/include/catalog/pg_proc.h | 2 +- src/include/pgstat.h | 5 - src/test/regress/expected/rules.out | 3 ++- 8 files changed, 52 insertions(+), 24 deletions(-) diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index b29e5e6..4f4ac73 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -870,6 +870,11 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser <entry>Time of the last failed archival operation</entry> </row> <row> + <entry><structfield>ready_count</></entry> + <entry><type>bigint</type></entry> + <entry>Number of files waiting to be archived</entry> + </row> + <row> <entry><structfield>stats_reset</></entry> <entry><type>timestamp with time zone</type></entry> <entry>Time at which these statistics were last reset</entry> diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index a819952..195769c 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -729,6 +729,7 @@ CREATE VIEW pg_stat_archiver AS s.failed_count, s.last_failed_wal, s.last_failed_time, + s.ready_count, s.stats_reset FROM pg_stat_get_archiver() s; diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c index 6a5c5b0..7f5b813 100644 --- a/src/backend/postmaster/pgarch.c +++ b/src/backend/postmaster/pgarch.c @@ -100,7 +100,7 @@ static void pgarch_waken_stop(SIGNAL_ARGS); static void pgarch_MainLoop(void); static void pgarch_ArchiverCopyLoop(void); static bool pgarch_archiveXlog(char *xlog); -static bool pgarch_readyXlog(char *xlog); +static int64 pgarch_readyXlog(char *xlog); static void pgarch_archiveDone(char *xlog); @@ -440,6 +440,7 @@ static void pgarch_ArchiverCopyLoop(void) { char xlog[MAX_XFN_CHARS + 1]; + int64 ready_count; /* * loop through all xlogs with archive_status of 
.ready and archive @@ -447,7 +448,7 @@ pgarch_ArchiverCopyLoop(void) * some backend will add files onto the list of those that need archiving * while we are still copying earlier archives */ - while (pgarch_readyXlog(xlog)) + while ((ready_count = pgarch_readyXlog(xlog)) != 0) { int failures = 0; @@ -488,11 +489,16 @@ pgarch_ArchiverCopyLoop(void) pgarch_archiveDone(xlog); /* + * File has been archived, reducing by one the entries waiting + * to be archived. + */ +ready_count--; + +/* * Tell the collector about the WAL file that we successfully * archived */ -pgstat_send_archiver(xlog, false); - +pgstat_send_archiver(xlog, ready_count, false); break;
Re: [HACKERS] 9.5: Better memory accounting, towards memory-bounded HashAgg
On Mon, 2014-11-17 at 18:04 +0100, Andres Freund wrote: That's quite possibly one culprit for the slowdown. Right now one AllocSetContext struct fits precisely into three cachelines. After your change it doesn't anymore. Thank you! I made the same mistake as Tomas: I saw that MemoryContextData went to 64 bytes and thought that should be fine while ignoring AllocSetContext. I did some tests in the past between a version of my patch that made MemoryContextData 72 bytes and one that made it 64 bytes, but I may have neglected to test against the original 56 byte version. I'm still not able to test to see if this is the real problem, but it seems best to eliminate it anyway. Consider not counting memory in bytes but blocks and adding it directly after the NodeTag. That'd leave the size the same as right now. I can also just move isReset there, and keep mem_allocated as a uint64. That way, if I find later that I want to track the aggregated value for the child contexts as well, I can split it into two uint32s. I'll hold off on any such optimizations until I see some numbers from HashAgg though. Attached new version. Regards, Jeff Davis *** a/src/backend/utils/mmgr/aset.c --- b/src/backend/utils/mmgr/aset.c *** *** 438,451 AllocSetContextCreate(MemoryContext parent, Size initBlockSize, Size maxBlockSize) { ! AllocSet context; /* Do the type-independent part of context creation */ ! context = (AllocSet) MemoryContextCreate(T_AllocSetContext, ! sizeof(AllocSetContext), ! &AllocSetMethods, ! parent, ! name); /* * Make sure alloc parameters are reasonable, and save them. --- 438,454 Size initBlockSize, Size maxBlockSize) { ! AllocSet set; ! MemoryContext context; /* Do the type-independent part of context creation */ ! context = MemoryContextCreate(T_AllocSetContext, ! sizeof(AllocSetContext), ! &AllocSetMethods, ! parent, ! name); ! ! set = (AllocSet) context; /* * Make sure alloc parameters are reasonable, and save them. 
*** *** 459,467 AllocSetContextCreate(MemoryContext parent, if (maxBlockSize < initBlockSize) maxBlockSize = initBlockSize; Assert(AllocHugeSizeIsValid(maxBlockSize)); /* must be safe to double */ ! context->initBlockSize = initBlockSize; ! context->maxBlockSize = maxBlockSize; ! context->nextBlockSize = initBlockSize; /* * Compute the allocation chunk size limit for this context. It can't be --- 462,470 if (maxBlockSize < initBlockSize) maxBlockSize = initBlockSize; Assert(AllocHugeSizeIsValid(maxBlockSize)); /* must be safe to double */ ! set->initBlockSize = initBlockSize; ! set->maxBlockSize = maxBlockSize; ! set->nextBlockSize = initBlockSize; /* * Compute the allocation chunk size limit for this context. It can't be *** *** 477,486 AllocSetContextCreate(MemoryContext parent, * and actually-allocated sizes of any chunk must be on the same side of * the limit, else we get confused about whether the chunk is big. */ ! context->allocChunkLimit = ALLOC_CHUNK_LIMIT; ! while ((Size) (context->allocChunkLimit + ALLOC_CHUNKHDRSZ) > (Size) ((maxBlockSize - ALLOC_BLOCKHDRSZ) / ALLOC_CHUNK_FRACTION)) ! context->allocChunkLimit >>= 1; /* * Grab always-allocated space, if requested --- 480,489 * and actually-allocated sizes of any chunk must be on the same side of * the limit, else we get confused about whether the chunk is big. */ ! set->allocChunkLimit = ALLOC_CHUNK_LIMIT; ! while ((Size) (set->allocChunkLimit + ALLOC_CHUNKHDRSZ) > (Size) ((maxBlockSize - ALLOC_BLOCKHDRSZ) / ALLOC_CHUNK_FRACTION)) ! set->allocChunkLimit >>= 1; /* * Grab always-allocated space, if requested *** *** 500,519 AllocSetContextCreate(MemoryContext parent, errdetail("Failed while creating memory context \"%s\".", name))); } ! block->aset = context; block->freeptr = ((char *) block) + ALLOC_BLOCKHDRSZ; block->endptr = ((char *) block) + blksize; ! block->next = context->blocks; ! context->blocks = block; /* Mark block as not to be released at reset time */ ! 
context->keeper = block; /* Mark unallocated space NOACCESS; leave the block header alone. */ VALGRIND_MAKE_MEM_NOACCESS(block->freeptr, blksize - ALLOC_BLOCKHDRSZ); } ! return (MemoryContext) context; } /* --- 503,525 errdetail("Failed while creating memory context \"%s\".", name))); } ! ! context->mem_allocated += blksize; ! ! block->aset = set; block->freeptr = ((char *) block) + ALLOC_BLOCKHDRSZ; block->endptr = ((char *) block) + blksize; ! block->next = set->blocks; ! set->blocks = block; /* Mark block as not to be released at reset time */ ! set->keeper = block; /* Mark unallocated space
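The accounting the patch above adds can be modeled in miniature: the context bumps a counter whenever it obtains a block from malloc and decrements it when the block is returned, so the total is maintained incrementally instead of walking every block at query time. This is a toy illustration only; the struct and function names are made up, and the real AllocSetContext code has far more machinery (freelists, keeper blocks, chunk headers).

```c
#include <stdint.h>
#include <stdlib.h>

/* Toy stand-in for the mem_allocated field added to MemoryContextData */
typedef struct ToyContext
{
	uint64_t	mem_allocated;	/* bytes currently obtained from malloc */
} ToyContext;

/* Allocate a block and account for it in the context. */
static void *
toy_alloc_block(ToyContext *cxt, size_t blksize)
{
	void	   *block = malloc(blksize);

	if (block != NULL)
		cxt->mem_allocated += blksize;
	return block;
}

/* Free a block and reverse the accounting. */
static void
toy_free_block(ToyContext *cxt, void *block, size_t blksize)
{
	free(block);
	cxt->mem_allocated -= blksize;
}
```

The cacheline discussion in the message is about exactly this field: adding an 8-byte counter to a struct that previously fit in three cachelines can cost measurable performance, which motivates the suggestions to count blocks instead of bytes or to repack the fields.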
Re: [HACKERS] Obsolete debug #define in pg_config_manual.h
On 11/18/2014 03:24 AM, Peter Geoghegan wrote: Attached patch removes obsolete debugging #define in pg_config_manual.h (RTDEBUG). This was last used in commit 2a8d3d, from 2005. Apparently RTDEBUG is a contraction of R-Tree Debug. Removed, thanks. - Heikki