date:20150115

Robert Haas robertmh...@gmail.com writes:
 On Tue, Jan 13, 2015 at 8:16 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Not sure whether to just commit this to HEAD and call it a day, or to
 risk back-patching.

 I think we need to back-patch something; that's a pretty nasty
 regression, and I have some EDB-internal reports that might be from
 the same cause.

OK.  I went ahead and made the necessary code changes to avoid changing
the struct size in released branches, as per Heikki's idea.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

On Thu, Jan 15, 2015 at 6:02 AM, Merlin Moncure mmonc...@gmail.com wrote:
 Question: Coming in this morning I did an immediate restart and logged
 into the database and queried pg_class via index.   Everything was
 fine, and the leftright verify returns nothing.  How did it repair
 itself without a reindex?

Maybe the relevant WAL records are more or less correct, and recovery
restores the database to its correct state, as opposed to the state it
was actually in at the time of the crash due to a race condition or
whatever. Could you possibly send bt_page_items() against block 9 of
the index after recovery? It probably looks totally sane, but it would
be nice to be sure, if you happen to still have the data.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] s_lock.h default definitions are rather confused

On 2015-01-15 11:56:24 -0500, Tom Lane wrote:
 Andres Freund and...@anarazel.de writes:
  On 2015-01-15 10:57:10 -0500, Tom Lane wrote:
  While I'll not cry too hard when we decide to break C89 compatibility,
  I don't want it to happen accidentally; so having a pretty old-school
  compiler in the farm seems important to me.
 
  I'd worked on setting up a modern gcc (or was it clang?) with the
  appropriate flags to warn about !C89 stuff some time back, but failed
  because of configure bugs.
 
 My recollection is that there isn't any reasonable way to get gcc to
 warn about C89 violations as such.  -ansi -pedantic is not very fit
 for the purpose.

It was clang, which has -Wc99-extensions/-Wc11-extensions.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] compress method for spgist - 2


On 01/15/2015 09:28 AM, Michael Paquier wrote:

Marking this patch as returned with feedback because it is waiting for
input from the author for now a couple of weeks. Heikki, the
refactoring patch has some value, are you planning to push it?


I think you're mixing up with the other thread, btree_gin and ranges. 
I pushed the refactoring patch I posted to that thread 
(http://www.postgresql.org/message-id/54983cf2.80...@vmware.com) 
already. I haven't proposed any refactoring related to spgist.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] XLOG_PARAMETER_CHANGE handling of wal_log_hints


On 01/07/2015 11:53 AM, Petr Jelinek wrote:

On 07/01/15 00:59, Michael Paquier wrote:

On Wed, Jan 7, 2015 at 4:24 AM, Petr Jelinek p...@2ndquadrant.com wrote:

Hi,

when I was fixing how commit_ts handles the XLOG_PARAMETER_CHANGE I noticed
that for wal_log_hints we assign the value in ControFile to current value
instead of value that comes from WAL.

ISTM it has just information value so it should not have any practical
impact, but it looks like a bug anyway so here is simple patch to change
that.

Right. That's something that should get fixed in 9.4 as well..

   ControlFile-track_commit_timestamp = track_commit_timestamp;
The new value of track_commit_timestamp should be taken as well from
xlrec on master btw.


That's part of the larger fix for CommitTs that I sent separately in
response to the bug report from Fujii - there is much more to be done
there for CommitTs than just this.


Pushed this part now. I'll let you and Fujii handle that larger fix. Thanks!

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] s_lock.h default definitions are rather confused

Andres Freund and...@anarazel.de writes:
 On 2015-01-15 10:57:10 -0500, Tom Lane wrote:
 While I'll not cry too hard when we decide to break C89 compatibility,
 I don't want it to happen accidentally; so having a pretty old-school
 compiler in the farm seems important to me.

 I'd worked on setting up a modern gcc (or was it clang?) with the
 appropriate flags to warn about !C89 stuff some time back, but failed
 because of configure bugs.

My recollection is that there isn't any reasonable way to get gcc to
warn about C89 violations as such.  -ansi -pedantic is not very fit
for the purpose.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

On Thu, Jan 15, 2015 at 8:02 AM, Merlin Moncure mmonc...@gmail.com wrote:
 On Thu, Jan 15, 2015 at 6:04 AM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:
 On 01/15/2015 03:23 AM, Peter Geoghegan wrote:

 So now the question is: how did that inconsistency arise? It didn't
 necessarily arise at the time of the (presumed) split of block 2 to
 create 9. It could be that the opaque area was changed by something
 else, some time later. I'll investigate more.


 Merlin, could you re-run the test with a WAL archive (if you don't have one
 already), and then run pg_xlogdump, filtering it to show only the changes to
 the index? That should show us how the index got to be the way it is. Also,
 if you could post a copy of the raw relation file for pg_class_oid_index; I
 assume it's not too large.

 Something like:

 pg_xlogdump -r Btree -p walarchive/ -s 0/20035D0 | grep 11917

 11917 is the relfilenode of pg_class_oid_index on a freshly initdb'd
 cluster. In case it's not the same on your system, you can use oid2name to
 find it out.

 I'm on it.  Will try this first, then patch removal.

 Question: Coming in this morning I did an immediate restart and logged
 into the database and queried pg_class via index.   Everything was
 fine, and the leftright verify returns nothing.  How did it repair
 itself without a reindex?

The plot thickens!  I looped the test, still stock 9.4 as of this time
and went to lunch. When I came back, the database was in recovery
mode.  Here is the rough sequence of events.

1) REINDEXED pg_class (wanted clean slate for full reproduction)
2) before too long (unfortunately did not configure timestamps in the
log) starting seeing:

ERROR:  root page 3 of index pg_class_oid_index has level 0,
expected 1 at character 8
QUERY:  UPDATE CDSRunTable SET
  Finished = clock_timestamp(),
  DidSucceed = _DidSucceed,
  ErrorMessage = _ErrorMessage
WHERE CDSRunTableId = _CdsRunTableId
CONTEXT:  PL/pgSQL function cdsreconcileruntable(bigint) line 82 at
SQL statement
STATEMENT:  SELECT CDSReconcileRunTable(2020)

..and again with a FATAL
FATAL:  root page 3 of index pg_class_oid_index has level 0, expected 1
ERROR:  root page 3 of index pg_class_oid_index has level 0, expected 1
CONTEXT:  SQL statement UPDATE CDSRunTable SET
  Finished = clock_timestamp(),
  DidSucceed = _DidSucceed,
  ErrorMessage = _ErrorMessage
WHERE CDSRunTableId = _CdsRunTableId
PL/pgSQL function cdsreconcileruntable(bigint) line 82 at SQL statement


3) shortly (?) after that, I got:

WARNING:  did not find subXID 14955 in MyProc
CONTEXT:  PL/pgSQL function cdsreconcileruntable(bigint) line 35
during exception cleanup
WARNING:  you don't own a lock of type RowExclusiveLock
CONTEXT:  PL/pgSQL function cdsreconcileruntable(bigint) line 35
during exception cleanup
LOG:  could not send data to client: Broken pipe
CONTEXT:  PL/pgSQL function cdsreconcileruntable(bigint) line 35
during exception cleanup
STATEMENT:  SELECT CDSReconcileRunTable(2151)
WARNING:  ReleaseLockIfHeld: failed??
CONTEXT:  PL/pgSQL function cdsreconcileruntable(bigint) line 35
during exception cleanup
ERROR:  failed to re-find shared proclock object
CONTEXT:  PL/pgSQL function cdsreconcileruntable(bigint) line 35
during exception cleanup
STATEMENT:  SELECT CDSReconcileRunTable(2151)
WARNING:  AbortSubTransaction while in ABORT state
WARNING:  did not find subXID 14955 in MyProc
WARNING:  you don't own a lock of type AccessShareLock
WARNING:  ReleaseLockIfHeld: failed??
ERROR:  failed to re-find shared proclock object
WARNING:  AbortSubTransaction while in ABORT state
WARNING:  did not find subXID 14955 in MyProc
WARNING:  you don't own a lock of type AccessShareLock
WARNING:  ReleaseLockIfHeld: failed??
WARNING:  you don't own a lock of type ShareLock
TRAP: FailedAssertion(!(FastPathStrongRelationLocks-count[fasthashcode]
 0), File: lock.c, Line: 1240)
LOG:  server process (PID 10117) was terminated by signal 6: Aborted
LOG:  terminating any other active server processes

4) while I was still at lunch, the database started up again, and
quickly fell over:
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started
ERROR:  could not open relation with OID 3597
STATEMENT:  SELECT CDSReconcileRunTable(2268)
ERROR:  could not open relation with OID 3597
STATEMENT:  SELECT CDSReconcileRunTable(2270)
ERROR:  could not open relation with OID 3597
STATEMENT:  SELECT CDSReconcileRunTable(2272)
ERROR:  could not open relation with OID 3597
STATEMENT:  SELECT CDSReconcileRunTable(2273)
TRAP: FailedAssertion(!(( ((void) ((bool) ((! assert_enabled) || !
(!(((bool) (((const void*)(htup-t_ctid) != ((void *)0)) 
((htup-t_ctid)-ip_posid != 0)
LOG:  server process (PID 30520) was terminated by signal 6: Aborted
LOG:  terminating any other active server processes

this repeated a very large number of times.   Occasionally, but
infrequently, I'd see during the database startup something

Re: [HACKERS] [PATCH] explain sortorder

Timmer, Marius marius.tim...@uni-muenster.de writes:
 attached is version 8, fixing remaining issues, adding docs and tests as 
 requested/agreed.

I'll pick this up --- I've been a bit lax about helping with this
commitfest.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Safe memory allocation functions

On Thu, Jan 15, 2015 at 8:42 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2015-01-15 08:40:34 -0500, Robert Haas wrote:
 I do think that safe is the wrong suffix.  Maybe palloc_soft_fail()
 or palloc_null() or palloc_no_oom() or palloc_unsafe().

 palloc_or_null()?

That'd work for me, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Safe memory allocation functions

2015-01-15 Thread Alvaro Herrera

Robert Haas wrote:

 Hmm, I understood Tom to be opposing the idea of a palloc variant that
 returns NULL on failure, and I understand you to be supporting it.
 But maybe I'm confused.

Your understanding seems correct to me.  I was just saying that your
description of Tom's argument to dislike the idea seemed at odds with
what he was actually saying.

 Anyway, I support it.  I agree that there are
 systems (or circumstances?) where malloc is going to succeed and then
 the world will blow up later on anyway, but I don't think that means
 that an out-of-memory error is the only sensible response to a palloc
 failure; returning NULL seems like a sometimes-useful alternative.
 
 I do think that safe is the wrong suffix.  Maybe palloc_soft_fail()
 or palloc_null() or palloc_no_oom() or palloc_unsafe().

I liked palloc_noerror() better myself FWIW.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

2015-01-15 Thread Alvaro Herrera

Michael Paquier wrote:

 Andres, this patch needs more effort from the author, right? So
 marking it as returned with feedback.

I will give this patch a look in the current commitfest, if you can
please set as 'needs review' instead with me as reviewer, so that I
don't forget, I would appreciate it.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH] explain sortorder (fwd)

2015-01-15 Thread Mike Blackwell



 From: Timmer, Marius marius.tim...@uni-muenster.de

 Hi,

 attached is version 8, fixing remaining issues, adding docs and tests as
 requested/agreed.


 Marius  Arne


This looks good to me.  Test coverage seems complete.  Doc updates are
included.  Output format looks like it should be acceptable to Heikki.

I'll mark this as ready for committer.

Thanks for the patch!

Mike

Re: [HACKERS] EvalPlanQual behaves oddly for FDW queries involving system columns

2015-01-15 Thread Alvaro Herrera

Etsuro Fujita wrote:

 ***
 *** 817,826  InitPlan(QueryDesc *queryDesc, int eflags)
 --- 818,833 
   break;
   case ROW_MARK_COPY:
   /* there's no real table here ... */
 + relkind = rt_fetch(rc-rti, 
 rangeTable)-relkind;
 + if (relkind == RELKIND_FOREIGN_TABLE)
 + relid = getrelid(rc-rti, rangeTable);
 + else
 + relid = InvalidOid;
   relation = NULL;
   break;
   default:
   elog(ERROR, unrecognized markType: %d, 
 rc-markType);
 + relid = InvalidOid;
   relation = NULL;/* keep compiler quiet 
 */
   break;
   }

[ ... ]

 --- 2326,2342 
   
   /* build a temporary HeapTuple control structure */
   tuple.t_len = HeapTupleHeaderGetDatumLength(td);
 ! /* if relid is valid, rel is a foreign table; set 
 system columns */
 ! if (OidIsValid(erm-relid))
 ! {
 ! tuple.t_self = td-t_ctid;
 ! tuple.t_tableOid = erm-relid;
 ! }
 ! else
 ! {
 ! ItemPointerSetInvalid((tuple.t_self));
 ! tuple.t_tableOid = InvalidOid;
 ! }
   tuple.t_data = td;
   
   /* copy and store tuple */

I find this arrangement confusing and unnecessary -- surely if you have
access to the ExecRowMark here, you could just obtain the relid with
RelationGetRelid instead of saving the OID beforehand?  And if you have
the Relation, you could just consult the relkind at that point instead
of relying on the relid being set or not as a flag to indicate whether
the rel is foreign.

I didn't look at anything else in the patch so I can't comment more on
it, but the change to ExecRowMark caught my attention.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] parallel mode and parallel contexts

On Thu, Jan 15, 2015 at 9:09 AM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Thu, Jan 15, 2015 at 6:52 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Jan 15, 2015 at 7:00 AM, Amit Kapila amit.kapil...@gmail.com
 wrote:
  +HandleParallelMessages(void)
  +{
  ..
  ..
  + for (i = 0; i  pcxt-nworkers; ++i)
  + {
  + /*
  + * Read messages for as long as we have an error queue; if we
  + * have hit (or hit while reading) ReadyForQuery, this will go to
  + * NULL.
  + */
  + while (pcxt-worker[i].error_mqh != NULL)
  + {
  + shm_mq_result res;
  +
  + CHECK_FOR_INTERRUPTS();
  +
  + res = shm_mq_receive(pcxt-worker[i].error_mqh, nbytes,
  + data, true);
  + if (res == SHM_MQ_SUCCESS)
 
  Here we are checking the error queue for all the workers and this loop
  will continue untill all have sent ReadyForQuery() message ('Z') which
  will make this loop continue till all workers have finished their work.
  Assume situation where first worker has completed the work and sent
  'Z' message and second worker is still sending some tuples, now above
  code will keep on waiting for 'Z' message from second worker and won't
  allow to receive tuples sent by second worker till it send 'Z' message.
 
  As each worker send its own 'Z' message after completion, so ideally
  the above code should receive the message only for worker which has
  sent the message.  I think for that it needs worker information who has
  sent the message.

 Are you talking about HandleParallelMessages() or
 WaitForParallelWorkersToFinish()?  The former doesn't wait for
 anything; it just handles any messages that are available now.

 I am talking about HandleParallelMessages().  It doesn't wait but
 it is looping which will make it run for longer time as explained
 above.  Just imagine a case where there are two workers and first
 worker has sent 'Z' message and second worker is doing some
 work, now in such a scenario loop will not finish until second worker
 also send 'Z' message or error.  Am I missing something?

Blah.  You're right.  I intended to write this loop so that it only
runs until shm_mq_receive() returns SHM_MQ_WOULD_BLOCK.  But that's
not what I did.  Will fix.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH] explain sortorder

2015-01-15 Thread Timmer, Marius

Hi,

attached is version 8, fixing remaining issues, adding docs and tests as 
requested/agreed.


Marius  Arne




---
Marius Timmer
Zentrum für Informationsverarbeitung
Westfälische Wilhelms-Universität Münster
Einsteinstraße 60

mtimm...@uni-muenster.de

Am 14.01.2015 um 17:42 schrieb Arne Scheffer sche...@uni-muenster.de:

 Hi,

 we will also remove the following is lc_collate hint in the next version, 
 showing only mandatory info as suggested.

/* for those who use COLLATE although their default is already 
 the wanted */
if (strcmp(collname, localeptr) == 0)
{
appendStringInfo(sortorderInformation,  (%s is 
 LC_COLLATE), collname);
}

 Anybody insisting on that?

 Arne

 Note: I see, at the moment we use the wrong default for DESC. We'll fix that.

 On Wed, 14 Jan 2015, Heikki Linnakangas wrote:

 On 01/14/2015 05:26 PM, Timmer, Marius wrote:
 Hello Heikki,
 abbreviated version:
 Sorry, the problem is only the unhandy patch text format, not different 
 opinions how to proceed.
 Long version:
 The v7 patch file already addressed your suggestions,
 but the file contained serveral (old) local commits,
 the new ones at the end of the patch text/file.

 Ah, missed that. I stopped reading when I saw the old stuff there :-).

 v7.1 is attached and addresses this issue providing a clean patch file.

 Ok, thanks, will take a look.

 V8 will - as mentioned - add missing docs and regression tests,

 Great!

 - Heikki





explain_sortorder-v8.patch
Description: explain_sortorder-v8.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

On Thu, Jan 15, 2015 at 1:15 PM, Andres Freund and...@2ndquadrant.com wrote:
 Hi,

 The plot thickens!  I looped the test, still stock 9.4 as of this time
 and went to lunch. When I came back, the database was in recovery
 mode.  Here is the rough sequence of events.


 Whoa. That looks scary. Did you see (some of) those errors before? Most
 of them should have been emitted independently of being built with
 assertions.

Nope.

 1) REINDEXED pg_class (wanted clean slate for full reproduction)
 2) before too long (unfortunately did not configure timestamps in the
 log) starting seeing:

 ERROR:  root page 3 of index pg_class_oid_index has level 0,
 expected 1 at character 8
 QUERY:  UPDATE CDSRunTable SET
   Finished = clock_timestamp(),
   DidSucceed = _DidSucceed,
   ErrorMessage = _ErrorMessage
 WHERE CDSRunTableId = _CdsRunTableId
 CONTEXT:  PL/pgSQL function cdsreconcileruntable(bigint) line 82 at
 SQL statement
 STATEMENT:  SELECT CDSReconcileRunTable(2020)

 ..and again with a FATAL
 FATAL:  root page 3 of index pg_class_oid_index has level 0, expected 1
 ERROR:  root page 3 of index pg_class_oid_index has level 0, expected 1
 CONTEXT:  SQL statement UPDATE CDSRunTable SET
   Finished = clock_timestamp(),
   DidSucceed = _DidSucceed,
   ErrorMessage = _ErrorMessage
 WHERE CDSRunTableId = _CdsRunTableId
 PL/pgSQL function cdsreconcileruntable(bigint) line 82 at SQL statement

 Just to be absolutely clear, those happened *before* the following
 errors? And there were no 'during exception cleanup' like errors before?

correct.  unfortunately, I wasn't logging times so I'm not 100% sure
if the 'root page' error happened during the event or sometime before
it.  I'm addressing that with the log prefix.

 Were there any 'LOG: Handling deadlock detected on CdsRunTableId' log
 entries before? It's hard to know from here, but the 'during exception
 cleanup' indicates a problem in abort handling.  Were there any deadlock
 detected errors closeby?

 You're catching deadlock errors in a subtransaction. Hm.

yup, exactly 1:
WARNING:  LOG: Handling deadlock detected on CdsRunTableId 1626

Based on the id that is proven to be before the crashing started,
although at least one manual server restart happened in the interim.

Since it's possible the database is a loss, do you see any value in
bootstrappinng it again with checksums turned on?  One point of note
is that this is a brand spanking new SSD, maybe we nee to rule out
hardware based corruption?

 lternatively were there any 'LOG: CdsRunTableId %s Failed' messages? If
so, what was the cause?

There was one, but unfortunately since the error was trapped the
relevant detail to catch the other side of the deadlock is missed.
Barring this analysis, I'm inclined to remove that guard: it was
mostly there to deal with what turned out to be some bugs in the code
(specifically nonconstrained update on CDSTable).

 * Do you also use lock_timeout/statement_timeout?
not on this database -- this is a coding environment and not set up
with standard configuration

  were any processes killed at that time?
later on, yes, by manual restart, and the query was cancelled first: to wit:

WARNING:  LOG: Handling deadlock detected on CdsRunTableId 1626
ERROR:  canceling statement due to user request
STATEMENT:  SELECT CDSReconcileRunTable(1626)

followed by restart -m fast

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] advance local xmin more aggressively

On Thu, Jan 15, 2015 at 3:08 AM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Mon, Dec 22, 2014 at 7:31 PM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:
 Here's an updated version, rebased over the pairing heap code that I just
 committed, and fixing those bugs.
 So, are we reaching an outcome for the match happening here?

Well, I still like using the existing ResourceOwner pointers to find
the snapshots more than introducing a separate data structure just for
that.  But I'm OK with Heikki committing his version and, if anybody
notices the new code becoming a hotspot, we can revisit the issue.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [BUGS] BUG #12070: hstore extension: hstore_to_json_loose produces invalid JSON

2015-01-15 Thread Bruce Momjian

On Tue, Jan 13, 2015 at 10:56:48AM -0500, Peter Eisentraut wrote:
 On 11/30/14 11:45 AM, Tom Lane wrote:
  The API break isn't a big issue imo.  The net effect would be that eg
  hstore 9.3.6 wouldn't work against a 9.3.5 server.  We do that sort of
  thing *all the time* --- at least twice in the past year, according to
  a quick scan of the commit logs.  If you were changing or removing a
  function that third-party code might depend on, it'd be problematic,
  but an addition has no such risk.
 
 This sort of things is actually a bit of an annoyance, because it means
 that for minor-version upgrades, you need to stop the server before
 unpacking the new version, otherwise the old running server will try to
 load the new hstore module and fail with a symbol lookup.  This can
 increase the downtime significantly.
 
 Yes, we've done this before, and people have gotten bitten by it before.

Uh, do we ever support installing new binaries while the server is
running?  I would hope not.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

On 2015-01-15 20:15:42 +0100, Andres Freund wrote:
  WARNING:  did not find subXID 14955 in MyProc
  CONTEXT:  PL/pgSQL function cdsreconcileruntable(bigint) line 35
  during exception cleanup
  WARNING:  you don't own a lock of type RowExclusiveLock
  CONTEXT:  PL/pgSQL function cdsreconcileruntable(bigint) line 35
  during exception cleanup
  LOG:  could not send data to client: Broken pipe
  CONTEXT:  PL/pgSQL function cdsreconcileruntable(bigint) line 35
  during exception cleanup
  STATEMENT:  SELECT CDSReconcileRunTable(2151)
  WARNING:  ReleaseLockIfHeld: failed??
  CONTEXT:  PL/pgSQL function cdsreconcileruntable(bigint) line 35
  during exception cleanup
  ERROR:  failed to re-find shared proclock object
  CONTEXT:  PL/pgSQL function cdsreconcileruntable(bigint) line 35
  during exception cleanup
  STATEMENT:  SELECT CDSReconcileRunTable(2151)
  WARNING:  AbortSubTransaction while in ABORT state
  WARNING:  did not find subXID 14955 in MyProc
  WARNING:  you don't own a lock of type AccessShareLock
  WARNING:  ReleaseLockIfHeld: failed??
  ERROR:  failed to re-find shared proclock object
  WARNING:  AbortSubTransaction while in ABORT state
  WARNING:  did not find subXID 14955 in MyProc
  WARNING:  you don't own a lock of type AccessShareLock
  WARNING:  ReleaseLockIfHeld: failed??
  WARNING:  you don't own a lock of type ShareLock
  TRAP: FailedAssertion(!(FastPathStrongRelationLocks-count[fasthashcode]
   0), File: lock.c, Line: 1240)
  LOG:  server process (PID 10117) was terminated by signal 6: Aborted
  LOG:  terminating any other active server processes
 
 Ick.
 
 Were there any 'LOG: Handling deadlock detected on CdsRunTableId' log
 entries before? It's hard to know from here, but the 'during exception
 cleanup' indicates a problem in abort handling.  Were there any deadlock
 detected errors closeby?

Alternatively were there any 'LOG: CdsRunTableId %s Failed' messages? If
so, what was the cause?

 You're catching deadlock errors in a subtransaction. Hm.

A couple questions:
* Do you also use lock_timeout/statement_timeout? If so, what are their
  settings + deadlock_timeout?
* were any processes killed at that time?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] segmentation fault in execTuples.c#ExecStoreVirtualTuple

Manuel Kniep man...@adjust.com writes:
 ok after lotâs of testing I could create a test caseÂ 
 which can be found here https://gist.github.com/rapimo/3c8c1b35270e5854c524Â 
 itâs written in ruby an depends on the gem activerecord pg and parallelÂ 

Hm.  I don't see a segfault from this.  I do see the CREATE TEMP TABLE
command failing with ctid is NULL, which probably shouldn't be happening
... but no segfault.  I tried with current HEAD and with current 9.3
branch tip, so it's possible there's been some relevant bugfix since
9.3.5.

 So my feeling is that the row exclusive lock is not taken properly on 
 inherited tables.Â 

I think it's much more likely something to do with EvalPlanQual.  There
have been some bugs fixed in that area since 9.3.5, but none of them
look real promising as an explanation for this ...

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [BUGS] BUG #12070: hstore extension: hstore_to_json_loose produces invalid JSON

2015-01-15 Thread Peter Eisentraut

On 1/15/15 2:29 PM, Bruce Momjian wrote:
 On Tue, Jan 13, 2015 at 10:56:48AM -0500, Peter Eisentraut wrote:
 On 11/30/14 11:45 AM, Tom Lane wrote:
 The API break isn't a big issue imo.  The net effect would be that eg
 hstore 9.3.6 wouldn't work against a 9.3.5 server.  We do that sort of
 thing *all the time* --- at least twice in the past year, according to
 a quick scan of the commit logs.  If you were changing or removing a
 function that third-party code might depend on, it'd be problematic,
 but an addition has no such risk.

 This sort of things is actually a bit of an annoyance, because it means
 that for minor-version upgrades, you need to stop the server before
 unpacking the new version, otherwise the old running server will try to
 load the new hstore module and fail with a symbol lookup.  This can
 increase the downtime significantly.

 Yes, we've done this before, and people have gotten bitten by it before.
 
 Uh, do we ever support installing new binaries while the server is
 running?  I would hope not.

Effectively, we don't, but it's not unreasonable to expect it.  Check
how your operating system upgrades other server packages such as apache
or openssh.




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] attribute for non-gcc compilers

On Wed, Jan 14, 2015 at 5:54 PM, Andres Freund and...@2ndquadrant.com wrote:
 I think I'd for now simply not define pg_attribute_aligned() on
 platforms where it's not supported, instead of defining it empty. If we
 need a softer variant we can name it pg_attribute_aligned_if_possible or
 something.

 Sounds sane?

Yes, that sounds like a much better plan.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

On Thu, Jan 15, 2015 at 1:32 PM, Merlin Moncure mmonc...@gmail.com wrote:
 Since it's possible the database is a loss, do you see any value in
 bootstrappinng it again with checksums turned on?  One point of note
 is that this is a brand spanking new SSD, maybe we nee to rule out
 hardware based corruption?

hm!  I bootstrapped a new database with checksums on, and lo,

[cds2 18347 2015-01-15 15:58:29.955 CST 1779]WARNING:  page
verification failed, calculated checksum 28520 but expected 28541
[cds2 18347 2015-01-15 15:58:29.955 CST 1779]CONTEXT:  SQL statement
COPY lease_delete FROM
'/tmp/C1188732_onesiteproperty.lease_delete.csv' CSV HEADER

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] infinite loop in _bt_getstackbuf

On Thu, Oct 30, 2014 at 10:46 AM, Robert Haas robertmh...@gmail.com wrote:
 (9.3.5 problem report)

I think I saw a similar issue, by a 9.3.5 instance that was affected
by the in pg_upgrade, remove pg_multixact files left behind by
initdb issue (I ran the remediation recommended in the 9.3.5 release
notes). Multiple anti-wraparound vacuums were stuck following a PITR.
I resolved this (as far as I can tell) by killing the autovacuum
workers, and manually running VACUUM FREEZE. I have yet to do any root
cause analysis, but I think I could reproduce the problem.

 The fundamental structure of that function is an infinite loop.  We
 break out of that loop when BTEntrySame(item, stack-bts_btentry) or
 P_RIGHTMOST(opaque) and I'm sure that it's correct to think that, in
 theory, one of those things will eventually happen.

Not in theory - only in practice. LY specifically state:

We wish to point out here that our algorithms do not prevent the
possibility of livelock (where one process rrms indefinitely). This
can happen if a process never terminates because it keeps having to
follow link pointers created by other processes. This might happen in
the case of a process being run on a (relatively) very slow processor
in a multiprocessor system.

 But the index
 could be corrupted, most obviously by having a page where
 opaque-btpo_next points pack to the current block number.  If that
 happens, you need an immediate shutdown (or some clever gdb hackery)
 to terminate the VACUUM.  That's unfortunate and unnecessary.

Merlin reported a bug that looked exactly like this. Hardware failure
may now explain the problem.

 It also looks likes something we can fix, at a minimum by adding a
 CHECK_FOR_INTERRUPTS() at the top of that loop, or in some function
 that it calls, like _bt_getbuf(), so that if it goes into an infinite
 loop, it can at least be killed.

I think that it might be a good idea to have circular _bt_moveright()
moves (the direct offender in Merlin's case, which has very similar
logic to your _bt_getstackbuf() problem case) detected. I'm pretty
sure that it's exceptional for there to be more than 2 or 3 retries in
_bt_moveright(). It would probably be fine to consider the possibility
that we'll never finish once we get past 5 retries or something like
that. We'd then start keeping track of blocks visited, and raise an
error when a page was visited a second time.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Reducing lock strength of trigger and foreign key DDL

2015-01-15 Thread Andreas Karlsson


On 01/14/2015 08:48 AM, Michael Paquier wrote:

All those things gathered give the patch attached. Andreas, if you are
fine with it I think that we could pass it to a committer.


Excellent changes. Thanks for the patch and the reviews.

--
Andreas Karlsson


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Bug in pg_dump

2015-01-15 Thread Jim Nasby


On 1/15/15 5:26 AM, Gilles Darold wrote:

Hello,

There's a long pending issue with pg_dump and extensions that have table 
members with foreign keys. This was previously reported in this thread 
http://www.postgresql.org/message-id/ca+tgmoyvzkadmgh_8el7uvm472geru0b4pnnfjqye6ss1k9...@mail.gmail.com
 and discuss by Robert. All PostgreSQL users that use the PostGis extension 
postgis_topology are facing the issue because the two members tables (topology 
and layer) are linked by foreign keys.

If you dump a database with this extension and try to import it you will 
experience this error:

pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 3345; 0 157059176 TABLE 
DATA layer gilles
pg_restore: [archiver (db)] COPY failed for table layer: ERROR: insert or update on table 
layer violates foreign key constraint layer_topology_id_fkey
DETAIL:  Key (topology_id)=(1) is not present in table topology.
WARNING: errors ignored on restore: 1


The problem is that, whatever export type you choose (plain/custom and full-export/data-only) the 
data of tables topology and layer are always exported in alphabetic order. 
I think this is a bug because outside extension, in data-only export, pg_dump is able to find 
foreign keys dependency and dump table's data in the right order but not with extension's members. 
Default is alphabetic order but that should not be the case with extension's members because 
constraints are recreated during the CREATE EXTENSION order. I hope I am clear enough.

Here we have three solutions:

 1/ Inform developers of extensions to take care to alphabetical order when 
they have member tables using foreign keys.
 2/ Inform DBAs that they have to restore the failing table independently. 
The use case above can be resumed using the following command:

  pg_restore -h localhost -n topology -t layer -Fc -d testdb_empty 
testdump.dump

 3/ Inform DBAs that they have to restore the schema first then the data 
only using --disable-triggers


I don't like 1-3, and I doubt anyone else does...


 4/ Patch pg_dump to solve this issue.


5. Disable FK's during load.
This is really a bigger item than just extensions. It would have the nice 
benefit of doing a wholesale FK validation instead of firing per-row triggers, 
but it would leave the database in a weird state if a restore failed...


I attach a patch that solves the issue in pg_dump, let me know if it might be 
included in Commit Fest or if the three other solutions are a better choice. I 
also join a sample extension (test_fk_in_ext) to be able to reproduce the issue 
and test the patch. Note that it might exists a simpler solution than the one I 
used in this patch, if this is the case please point me on the right way, I 
will be pleased to rewrite and send an other patch.


The only problem I see with this approach is circular FK's:

decibel@decina.local=# create table a(a_id serial primary key, b_id int);
CREATE TABLE
decibel@decina.local=# create table b(b_id serial primary key, a_id int 
references a);
CREATE TABLE
decibel@decina.local=# alter table a add foreign key(b_id) references b;
ALTER TABLE
decibel@decina.local=#

That's esoteric enough that I think it's OK not to directly support them, but 
pg_dump shouldn't puke on them (and really should throw a warning). Though it 
looks like it doesn't handle that in the data-only case anyway...
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

On Thu, Jan 15, 2015 at 4:03 PM, Merlin Moncure mmonc...@gmail.com wrote:
 On Thu, Jan 15, 2015 at 1:32 PM, Merlin Moncure mmonc...@gmail.com wrote:
 Since it's possible the database is a loss, do you see any value in
 bootstrappinng it again with checksums turned on?  One point of note
 is that this is a brand spanking new SSD, maybe we nee to rule out
 hardware based corruption?

 hm!  I bootstrapped a new database with checksums on, and lo,

 [cds2 18347 2015-01-15 15:58:29.955 CST 1779]WARNING:  page
 verification failed, calculated checksum 28520 but expected 28541
 [cds2 18347 2015-01-15 15:58:29.955 CST 1779]CONTEXT:  SQL statement
 COPY lease_delete FROM
 '/tmp/C1188732_onesiteproperty.lease_delete.csv' CSV HEADER

Running this test on another set of hardware to verify -- if this
turns out to be a false alarm which it may very well be, I can only
offer my apologies!  I've never had a new drive fail like that, in
that manner.  I'll burn the other hardware in overnight and report
back.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] can you have any idea about toast missing chunk issu resolution

2015-01-15 Thread Jim Nasby


On 1/15/15 6:22 AM, M Tarkeshwar Rao wrote:

We are getting following error message on doing any action on the table 
like(Select or open from pgadmin).


Error reports should go to pgsql-general. I'm moving the discussion there (and 
BCC'ing -hackers).


Please suggest.

ERROR:  missing chunk number 0 for toast value 54787 in pg_toast_2619

** Error **

ERROR: missing chunk number 0 for toast value 54787 in pg_toast_2619

SQL state: XX000


That means that the database tried to detoast a value and it couldn't find it 
in the toast table. Likely causes:

- Someone manually modified the toast table. Hard to do, but not impossible.
- The toast index is corrupted. toast_fetch_datum() will always use a toast 
index so the only way to see if this is the issue is to try REINDEXing.
- The index is fine and the toast table is corrupted.
- The base table is corrupted. I think a corrupt index on the base table could 
also cause this, but I'm not sure.
- You've found some bug in either the toast or detoast code.

Note that when I say 'corrupted', that could also mean that the data is there 
and simply isn't being considered as visible per MVCC rules. Also, the actual 
condition throwing this error is

if (nextidx != numchunks)

where

numchunks = ((ressize - 1) / TOAST_MAX_CHUNK_SIZE) + 1;

and nextidx is incremented with every chunk that's read.

The easiest thing to try right now is a REINDEX and see if that fixes things. 
It would be best if you stopped the server and made a filesystem copy of the 
data directory before doing that (or at least pg_clog, pg_xlog and the relevant 
toast heap and index files).
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Additional role attributes superuser review

2015-01-15 Thread Adam Brightwell

All,

Attached is a patch that proposes the following additional role attributes
for review:

* ONLINE_BACKUP - allows role to perform backup operations
  - originally proposed as BACKUP - due to concern for the use of that term
in relation to other potential backup related permissions this form is in
line with the documentation as it describes the affected backup operations
as being 'online backups'.
  - applies only to the originally proposed backup functions.
* XLOG_REPLAY - allows role to perform pause and resume on xlog_replay
operations ('pg_xlog_replay_pause' and 'pg_xlog_replay_resume')
  - following the recommendation from Stephen and Magnus.
* LOG - allows role to rotate log files - remains broad enough to consider
future log related operations
* MONITOR - allows role to view pg_stat_* details (as originally proposed)
* SIGNAL - allows role to signal backend processes (as originally proposed)

The documentation still needs to be updated.  If this these attributes and
the capabilities they provide are acceptable, then I'll begin moving
forward on making those updates as well.

Regarding the discussion on a DUMP/READONLY permission. I believe that
topic needs much further discussion and decided it is probably best to keep
it as a separate patch/effort.  I'd certainly be willing to continue that
discussion and assist in moving any related effort forward, therefore,
please let me know if there is anything I can do to help.

Thanks,
Adam

-- 
Adam Brightwell - adam.brightw...@crunchydatasolutions.com
Database Engineer - www.crunchydatasolutions.com
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
new file mode 100644
index 2179bf7..aaf13c1
*** a/src/backend/access/transam/xlogfuncs.c
--- b/src/backend/access/transam/xlogfuncs.c
***
*** 27,32 
--- 27,33 
  #include miscadmin.h
  #include replication/walreceiver.h
  #include storage/smgr.h
+ #include utils/acl.h
  #include utils/builtins.h
  #include utils/numeric.h
  #include utils/guc.h
*** pg_start_backup(PG_FUNCTION_ARGS)
*** 54,63 
  
  	backupidstr = text_to_cstring(backupid);
  
! 	if (!superuser()  !has_rolreplication(GetUserId()))
  		ereport(ERROR,
  (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
! 		   errmsg(must be superuser or replication role to run a backup)));
  
  	startpoint = do_pg_start_backup(backupidstr, fast, NULL, NULL);
  
--- 55,65 
  
  	backupidstr = text_to_cstring(backupid);
  
! 	if (!has_replication_privilege(GetUserId())
! 		 !has_online_backup_privilege(GetUserId()))
  		ereport(ERROR,
  (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
!  errmsg(must be superuser, replication role or online backup role to run a backup)));
  
  	startpoint = do_pg_start_backup(backupidstr, fast, NULL, NULL);
  
*** pg_stop_backup(PG_FUNCTION_ARGS)
*** 82,91 
  {
  	XLogRecPtr	stoppoint;
  
! 	if (!superuser()  !has_rolreplication(GetUserId()))
  		ereport(ERROR,
  (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
! 		 (errmsg(must be superuser or replication role to run a backup;
  
  	stoppoint = do_pg_stop_backup(NULL, true, NULL);
  
--- 84,94 
  {
  	XLogRecPtr	stoppoint;
  
! 	if (!has_replication_privilege(GetUserId())
! 		 !has_online_backup_privilege(GetUserId()))
  		ereport(ERROR,
  (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
!  errmsg(must be superuser, replication role or online backup role to run a backup)));
  
  	stoppoint = do_pg_stop_backup(NULL, true, NULL);
  
*** pg_switch_xlog(PG_FUNCTION_ARGS)
*** 100,109 
  {
  	XLogRecPtr	switchpoint;
  
! 	if (!superuser())
  		ereport(ERROR,
  (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
! 			 (errmsg(must be superuser to switch transaction log files;
  
  	if (RecoveryInProgress())
  		ereport(ERROR,
--- 103,112 
  {
  	XLogRecPtr	switchpoint;
  
! 	if (!has_online_backup_privilege(GetUserId()))
  		ereport(ERROR,
  (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
!  errmsg(must be superuser or online backup role to switch transaction log files)));
  
  	if (RecoveryInProgress())
  		ereport(ERROR,
*** pg_create_restore_point(PG_FUNCTION_ARGS
*** 129,138 
  	char	   *restore_name_str;
  	XLogRecPtr	restorepoint;
  
! 	if (!superuser())
  		ereport(ERROR,
  (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
!  (errmsg(must be superuser to create a restore point;
  
  	if (RecoveryInProgress())
  		ereport(ERROR,
--- 132,141 
  	char	   *restore_name_str;
  	XLogRecPtr	restorepoint;
  
! 	if (!has_online_backup_privilege(GetUserId()))
  		ereport(ERROR,
  (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
!  (errmsg(must be superuser or online backup role to create a restore point;
  
  	if (RecoveryInProgress())
  		ereport(ERROR,
*** pg_xlogfile_name(PG_FUNCTION_ARGS)
*** 338,347 
  Datum
  pg_xlog_replay_pause(PG_FUNCTION_ARGS)
  {
! 	if (!superuser())
  		ereport(ERROR,

Re: [HACKERS] Safe memory allocation functions

On Fri, Jan 16, 2015 at 12:57 AM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 I do think that safe is the wrong suffix.  Maybe palloc_soft_fail()
 or palloc_null() or palloc_no_oom() or palloc_unsafe().

 I liked palloc_noerror() better myself FWIW.
Voting for palloc_noerror() as well.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

On Thu, Jan 15, 2015 at 3:00 PM, Merlin Moncure mmonc...@gmail.com wrote:
 Running this test on another set of hardware to verify -- if this
 turns out to be a false alarm which it may very well be, I can only
 offer my apologies!  I've never had a new drive fail like that, in
 that manner.  I'll burn the other hardware in overnight and report
 back.

Thanks.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] segmentation fault in execTuples.c#ExecStoreVirtualTuple

I wrote:
 Manuel Kniep man...@adjust.com writes:
 ok after lotâs of testing I could create a test caseÂ 
 which can be found here https://gist.github.com/rapimo/3c8c1b35270e5854c524Â 
 itâs written in ruby an depends on the gem activerecord pg and parallelÂ 

 Hm.  I don't see a segfault from this.  I do see the CREATE TEMP TABLE
 command failing with ctid is NULL, which probably shouldn't be happening
 ... but no segfault.

The reason turns out to be that this is a dangling-pointer bug, and I was
using a memory-clobber-enabled build so it was pretty predictable what the
pointer would be pointing at.  I've got no doubt that hard-to-reproduce
misbehavior, including segfaults, would ensue without CLOBBER_FREED_MEMORY
turned on.

You need this patch:
http://git.postgresql.org/gitweb/?p=postgresql.git;a=patch;h=34668c8eca065d745bf1166a92c9efc588e7aee2

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Check that streaming replica received all data after master shutdown

2015-01-15 Thread Sameer Kumar

On Thu, Jan 15, 2015 at 6:19 PM, Kyotaro HORIGUCHI 
horiguchi.kyot...@lab.ntt.co.jp wrote:

  On Wed, Jan 14, 2015 at 2:11 AM, Heikki Linnakangas 
 hlinnakan...@vmware.com
   wrote:
 
   On 01/13/2015 12:11 PM, Vladimir Borodin wrote:
  
  
   05 янв. 2015 г., в 18:15, Vladimir Borodin r...@simply.name
 написал(а):
  
Hi all.
  
   I have a simple script for planned switchover of PostgreSQL (9.3 and
   9.4) master to one of its replicas. This script checks a lot of
 things
   before doing it and one of them is that all data from master has been
   received by replica that is going to be promoted. Right now the
 check is
   done like below:
  
   On the master:
  
   postgres@pgtest03d ~ $ psql -t -A -c 'select
   pg_current_xlog_location();'
   0/3390
   postgres@pgtest03d ~ $ /usr/pgsql-9.3/bin/pg_ctl stop -m fast
   waiting for server to shut down done
   server stopped
   postgres@pgtest03d ~ $ /usr/pgsql-9.3/bin/pg_controldata | head
   pg_control version number:937
   Catalog version number:   201306121
   Database system identifier:   6061800518091528182
   Database cluster state:   shut down
   pg_control last modified: Mon 05 Jan 2015 06:47:57 PM MSK
   Latest checkpoint location:   0/3428
   Prior checkpoint location:0/3328
   Latest checkpoint's REDO location:0/3428
   Latest checkpoint's REDO WAL file:001B0034
   Latest checkpoint's TimeLineID:   27
   postgres@pgtest03d ~ $
  
   On the replica (after shutdown of master):
  
   postgres@pgtest03g ~ $ psql -t -A -c select
   pg_xlog_location_diff(pg_last_xlog_replay_location(), '0/3428');
   104
   postgres@pgtest03g ~ $
  
   These 104 bytes seems to be the size of shutdown checkpoint record
 (as I
   can understand from pg_xlogdump output).
  
   postgres@pgtest03g ~/9.3/data/pg_xlog $
 /usr/pgsql-9.3/bin/pg_xlogdump
   -s 0/3390 -t 27
   rmgr: XLOGlen (rec/tot):  0/32, tx:  0, lsn:
   0/3390, prev 0/3328, bkp: , desc: xlog switch
   rmgr: XLOGlen (rec/tot): 72/   104, tx:  0, lsn:
   0/3428, prev 0/3390, bkp: , desc: checkpoint: redo
 0/3428;
   tli 27; prev tli 27; fpw true; xid 0/6010; oid 54128; multi 1;
 offset 0;
   oldest xid 1799 in DB 1; oldest multi 1 in DB 1; oldest running xid
 0;
   shutdown
   pg_xlogdump: FATAL:  error in WAL record at 0/3428: record with
 zero
   length at 0/3490
  
   postgres@pgtest03g ~/9.3/data/pg_xlog $
  
   I’m not sure that these 104 bytes will always be 104 bytes to have a
   strict equality while checking. Could it change in the future? Or is
 there
   a better way to understand that streaming replica received all data
 after
   master shutdown? The check that pg_xlog_location_diff returns 104
 bytes
   seems a bit strange.
  
  
   Don't rely on it being 104 bytes. It can vary across versions, and
 across
   different architectures.
  
   You could simply check that the standby's
 pg_last_xlog_replay_location() 
   master's Latest checkpoint location, and not care about the exact
   difference.
  
 
   I believe there were some changes made in v9.3 which will wait for
 pending
  WALs to be replicated before a fast and smart shutdown (of master) can
  close the replication connection.
 
 
 http://git.postgresql.org/pg/commitdiff/985bd7d49726c9f178558491d31a570d47340459

 I don't understand the relation between it and 104 bytes, it says
 that the change is backpatched up to 9.1. Since it assures all
 xlog records to be transferred if no trouble happens. Relying on
 the mechanism, you don't need to check that if master is known to
 have gracefully shut down and had no trouble around the
 environment. Judging from that you want this check, I suppose
 you're not guaranteed not to have trouble or not trusting the
 mechanism itself.

 Right! I was coming from the point that if master has shutdown gracefully
then you don't really need to worry about ensuring with such checks on
Standby (it is supposed to get the pending WAL before master goes down.

This obviously (as rightly pointed out by you), would not work if master
has not shutdown gracefully or if there is a connection issue between
master and slave while master is being shutdown (even if it is smart or
fast shutdown).



 Given the condition, as Alvaro said upthread, verifying that the
 last record is a shutdown checkpoint should raise a lot the
 chance for the all record being received except for the exteme
 case such that the master have upped and downed while replication
 connection cannot be made.


I am not sure if this would cover the cases where the master has gone down
abruptly or has crashed (or the service has been killed).



 For the case, I think there's no means
 to confirm that by standby alone, you should at least compare the
 next LSN to the last xlog record with the old master by any
 means.


That is the method

Re: [HACKERS] orangutan seizes up during isolation-check

2015-01-15 Thread Noah Misch

On Thu, Jan 15, 2015 at 09:24:01AM -0500, Robert Haas wrote:
 On Thu, Jan 15, 2015 at 1:04 AM, Noah Misch n...@leadboat.com wrote:
  On Wed, Jan 14, 2015 at 04:48:53PM -0500, Peter Eisentraut wrote:
  What I'm seeing now is that the unaccent regression tests when run under
  make check-world abort with
 
  FATAL:  postmaster became multithreaded during startup
  HINT:  Set the LC_ALL environment variable to a valid locale.
 
  contrib/unaccent/Makefile sets NO_LOCALE=1, so that makes sense.  I expect 
  the
  patch over here will fix it:
  http://www.postgresql.org/message-id/20150109063015.ga2491...@tornado.leadboat.com
 
 I just hit this same problem; are you going to commit that patch soon?
  It's rather annoying to have make check-world fail.

Sure, done.  Dave, orangutan should now be able to pass with --enable-nls.
Would you restore that option?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

On Fri, Jan 16, 2015 at 12:53 AM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Michael Paquier wrote:

 Andres, this patch needs more effort from the author, right? So
 marking it as returned with feedback.

 I will give this patch a look in the current commitfest, if you can
 please set as 'needs review' instead with me as reviewer, so that I
 don't forget, I would appreciate it.
Fine for me, done this way.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_rewind in contrib

2015-01-15 Thread Peter Eisentraut

Here is a random bag of comments for the v5 patch:

pg_xlogdump fails to build:

  CC   xlogreader.o
  CC   rmgrdesc.o
../../src/include/access/rmgrlist.h:32:46: error: 'dbase_desc' undeclared here 
(not in a function)
 PG_RMGR(RM_DBASE_ID, Database, dbase_redo, dbase_desc, dbase_identify, NULL, 
NULL)
  ^
rmgrdesc.c:33:10: note: in definition of macro 'PG_RMGR'
  { name, desc, identify},
  ^
../../src/include/access/rmgrlist.h:32:58: error: 'dbase_identify' undeclared 
here (not in a function)
 PG_RMGR(RM_DBASE_ID, Database, dbase_redo, dbase_desc, dbase_identify, NULL, 
NULL)
  ^
rmgrdesc.c:33:16: note: in definition of macro 'PG_RMGR'
  { name, desc, identify},
^
../../src/Makefile.global:732: recipe for target 'rmgrdesc.o' failed
make[2]: *** [rmgrdesc.o] Error 1


SGML files should use 1-space indentation, not 2.

In high-availability.sgml, pg_rewind could be a link.

In ref/pg_rewind.sgml,

-P option listed twice.

The option --source-server had be confused at first, because the entry
above under --source-pgdata also talks about a source server.  Maybe
--source-connection would be clearer?

Reference pages have standardized top-level headers, so Theory of
operation should be under something like Notes.

Similarly for Restrictions, but that seems important enough to go
into the description.


src/bin/pg_rewind/.gitignore lists files for pg_regress use, which is not used 
anymore.

src/bin/pg_rewind/Makefile says Makefile for src/bin/pg_basebackup.

There should be an installcheck target.

RewindTest.pm should be in the t/ directory.

Code like this:

+   if (map-bitmap == NULL)
+   map-bitmap = pg_malloc(newsize);
+   else
+   map-bitmap = pg_realloc(map-bitmap, newsize);

is unnecessary.  You can just write

map-bitmap = pg_realloc(map-bitmap, newsize);

because realloc handles NULL.

Instead of FILE_TYPE_DIRECTORY etc., why not use S_IFDIR etc.?

About this code

+   if (!exists)
+   action = FILE_ACTION_CREATE;
+   else
+   action = FILE_ACTION_NONE;

action is earlier initialized as FILE_ACTION_NONE, so the second
branch is redundant.  Alternatively, remove the earlier
initialization, so that maybe the compiler can detect if action is not
assigned in some paths.

Error messages should not end with a period.

Some calls to pg_fatal() don't have the string end with a newline.

In libpqProcessFileList(), I would tend to put the comment outside of
the SQL command string.

Mkvcbuild.pm changes still refer to pg_rewind in contrib.



TestLib.pm addition command_is sounds a bit wrong.  It's evidently
modelled after command_like, but that now sounds wrong too.  How about
command_stdout_is?

The test suite needs to silence all non-TAP output.  So psql needs to
be run with -q pg_ctl with -s etc.  Any important output needs to be
through diag() or note().

Test cases like

ok 6 - psql -A -t --no-psqlrc -c port=10072 -c SELECT datname FROM pg_database 
exit code 0

should probably get a real name.

The whole structure of the test suite still looks too much like the
old hack.  I'll try to think of other ways to structure it.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

On Sun, Jan 4, 2015 at 10:57 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-12-31 18:35:38 +0530, Amit Kapila wrote:
 +  termoption-j replaceable 
 class=parameterjobs/replaceable/option/term
 +  termoption--jobs=replaceable 
 class=parameternjobs/replaceable/option/term
 +  listitem
 +   para
 +Number of concurrent connections to perform the operation.
 +This option will enable the vacuum operation to run on asynchronous
 +connections, at a time one table will be operated on one connection.
 +So at one time as many tables will be vacuumed parallely as number 
 of
 +jobs.  If number of jobs given are more than number of tables then
 +number of jobs will be set to number of tables.

 asynchronous connections isn't a very well defined term. Also, the
 second part of that sentence doesn't seem to be gramattically correct.

 +   /para
 +   para
 +applicationvacuumdb/application will open
 +replaceable class=parameter njobs/replaceable connections to 
 the
 +database, so make sure your xref linkend=guc-max-connections
 +setting is high enough to accommodate all connections.
 +   /para

 Isn't it njobs+1?

 @@ -141,6 +199,7 @@ main(int argc, char *argv[])
   }
   }

 + optind++;

 Hm, where's that coming from?

 + PQsetnonblocking(connSlot[0].connection, 1);
 +
 + for (i = 1; i  concurrentCons; i++)
 + {
 + connSlot[i].connection = connectDatabase(dbname, host, port, 
 username,
 +   
 prompt_password, progname, false);
 +
 + PQsetnonblocking(connSlot[i].connection, 1);
 + connSlot[i].isFree = true;
 + connSlot[i].sock = PQsocket(connSlot[i].connection);
 + }

 Are you sure about this global PQsetnonblocking()? This means that you
 might not be able to send queries... And you don't seem to be waiting
 for sockets waiting for writes in the select loop - which means you
 might end up being stuck waiting for reads when you haven't submitted
 the query.

 I think you might need a more complex select() loop. On nonfree
 connections also wait for writes if PQflush() returns != 0.


 +/*
 + * GetIdleSlot
 + * Process the slot list, if any free slot is available then return
 + * the slotid else perform the select on all the socket's and wait
 + * until atleast one slot becomes available.
 + */
 +static int
 +GetIdleSlot(ParallelSlot *pSlot, int max_slot, const char *dbname,
 + const char *progname, bool completedb)
 +{
 + int i;
 + fd_set  slotset;


 Hm, you probably need to limit -j to FD_SETSIZE - 1 or so.

 + int firstFree = -1;
 + pgsocket maxFd;
 +
 + for (i = 0; i  max_slot; i++)
 + if (pSlot[i].isFree)
 + return i;

 + FD_ZERO(slotset);
 +
 + maxFd = pSlot[0].sock;
 +
 + for (i = 0; i  max_slot; i++)
 + {
 + FD_SET(pSlot[i].sock, slotset);
 + if (pSlot[i].sock  maxFd)
 + maxFd = pSlot[i].sock;
 + }

 So we're waiting for idle connections?

 I think you'll have to have to use two fdsets here, and set the write
 set based on PQflush() != 0.

 +/*
 + * A select loop that repeats calling select until a descriptor in the read
 + * set becomes readable. On Windows we have to check for the termination 
 event
 + * from time to time, on Unix we can just block forever.
 + */

 Should a) mention why we have to check regularly on windows b) that on
 linux we don't have to because we send a cancel event from the signal
 handler.

 +static int
 +select_loop(int maxFd, fd_set *workerset)
 +{
 + int i;
 + fd_set  saveSet = *workerset;

 +#ifdef WIN32
 + /* should always be the master */

 Hm?


 I have to say, this is a fairly large patch for such a minor feature...
Andres, this patch needs more effort from the author, right? So
marking it as returned with feedback.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] HINTing on UPDATE foo SET foo.bar = ..;

On Thu, Jan 8, 2015 at 2:28 AM, Marko Tiikkaja ma...@joh.to wrote:
 Yeah.  (The CF entry is also set to Waiting on Author, which seems
 appropriate.)
Seeing nothing happening here for quite some time, marked as returned
with feedback..
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] advance local xmin more aggressively

On Mon, Dec 22, 2014 at 7:31 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Here's an updated version, rebased over the pairing heap code that I just
 committed, and fixing those bugs.
So, are we reaching an outcome for the match happening here?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Async execution of postgres_fdw.

2015-01-15 Thread Kyotaro HORIGUCHI

Hello,

  I'll look into the case after this, but I'd like to send a
  revised patch at this point.
 Hm. Seems like this patch is not completely baked yet. Horiguchi-san,
 as you are obviously still working on it, would you agree to move it
 to the next CF?

Yes, that's fine with me. Thank you.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: multivariate statistics / proof of concept

On Mon, Dec 15, 2014 at 11:55 AM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Wed, Dec 10, 2014 at 5:15 AM, Tomas Vondra t...@fuzzy.cz wrote:
 I agree with moving the patch to the next CF - I'm working on the patch,
 but I will take a bit more time to submit a new version and I can do
 that in the next CF.
 OK cool. I just moved it by myself. I didn't see it yet registered in 2014-12.
Marked as returned with feedback. No new version showed up in the last
month and this patch was waiting for input from author.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Bug in pg_dump

2015-01-15 Thread Gilles Darold

Hello,

There's a long pending issue with pg_dump and extensions that have table
members with foreign keys. This was previously reported in this thread
http://www.postgresql.org/message-id/ca+tgmoyvzkadmgh_8el7uvm472geru0b4pnnfjqye6ss1k9...@mail.gmail.com
and discuss by Robert. All PostgreSQL users that use the PostGis
extension postgis_topology are facing the issue because the two members
tables (topology and layer) are linked by foreign keys.

If you dump a database with this extension and try to import it you will
experience this error:

pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 3345; 0 157059176
TABLE DATA layer gilles
pg_restore: [archiver (db)] COPY failed for table layer: ERROR: 
insert or update on table layer violates foreign key constraint
layer_topology_id_fkey
DETAIL:  Key (topology_id)=(1) is not present in table topology.
WARNING: errors ignored on restore: 1


The problem is that, whatever export type you choose (plain/custom and
full-export/data-only) the data of tables topology and layer are
always exported in alphabetic order. I think this is a bug because
outside extension, in data-only export, pg_dump is able to find foreign
keys dependency and dump table's data in the right order but not with
extension's members. Default is alphabetic order but that should not be
the case with extension's members because constraints are recreated
during the CREATE EXTENSION order. I hope I am clear enough.

Here we have three solutions:

1/ Inform developers of extensions to take care to alphabetical
order when they have member tables using foreign keys.
2/ Inform DBAs that they have to restore the failing table
independently. The use case above can be resumed using the following
command:

 pg_restore -h localhost -n topology -t layer -Fc -d
testdb_empty testdump.dump

3/ Inform DBAs that they have to restore the schema first then the
data only using --disable-triggers
4/ Patch pg_dump to solve this issue.

I attach a patch that solves the issue in pg_dump, let me know if it
might be included in Commit Fest or if the three other solutions are a
better choice. I also join a sample extension (test_fk_in_ext) to be
able to reproduce the issue and test the patch. Note that it might
exists a simpler solution than the one I used in this patch, if this is
the case please point me on the right way, I will be pleased to rewrite
and send an other patch.

In the test extension attached, there is a file called
test_fk_in_ext/SYNOPSIS.txt that describe all actions to reproduce the
issue and test the patch. Here is the SQL part of the test extension:

CREATE TABLE IF NOT EXISTS b_test_fk_in_ext1 (
id int PRIMARY KEY
);

CREATE TABLE IF NOT EXISTS a_test_fk_in_ext1 (
id int REFERENCES b_test_fk_in_ext1(id)
);

SELECT pg_catalog.pg_extension_config_dump('b_test_fk_in_ext1', '');
SELECT pg_catalog.pg_extension_config_dump('a_test_fk_in_ext1', '');



Best regards,

-- 
Gilles Darold
Consultant PostgreSQL
http://dalibo.com - http://dalibo.org

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index dc062e6..49889ce 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -209,6 +209,7 @@ static void addBoundaryDependencies(DumpableObject **dobjs, int numObjs,
 
 static void getDomainConstraints(Archive *fout, TypeInfo *tyinfo);
 static void getTableData(DumpOptions *dopt, TableInfo *tblinfo, int numTables, bool oids);
+static bool hasExtensionMember(TableInfo *tblinfo, int numTables);
 static void makeTableDataInfo(DumpOptions *dopt, TableInfo *tbinfo, bool oids);
 static void buildMatViewRefreshDependencies(Archive *fout);
 static void getTableDataFKConstraints(void);
@@ -730,9 +731,17 @@ main(int argc, char **argv)
 
 	if (!dopt.schemaOnly)
 	{
+		bool has_ext_member;
+
 		getTableData(dopt, tblinfo, numTables, dopt.oids);
+		/* Search if there is dumpable tables member of and extension */
+		has_ext_member = hasExtensionMember(tblinfo, numTables);
 		buildMatViewRefreshDependencies(fout);
-		if (dopt.dataOnly)
+		/*
+		 * Always get FK constraints even with schema+data, extension's
+		 * members can have FK so tables need to be dump-ordered.
+		 */
+		if (dopt.dataOnly || has_ext_member)
 			getTableDataFKConstraints();
 	}
 
@@ -1852,6 +1861,25 @@ getTableData(DumpOptions *dopt, TableInfo *tblinfo, int numTables, bool oids)
 }
 
 /*
+ * hasExtensionMember -
+ *	  set up dumpable objects representing the contents of tables
+ */
+static bool
+hasExtensionMember(TableInfo *tblinfo, int numTables)
+{
+	int			i;
+
+	for (i = 0; i  numTables; i++)
+	{
+		if (tblinfo[i].dobj.ext_member)
+			return true;
+	}
+
+	return false;
+}
+
+
+/*
  * Make a dumpable object for the data of this specific table
  *
  * Note: we make a TableDataInfo if and only if we are going to dump the
@@ -2024,12 +2052,14 @@

Re: [HACKERS] Overhauling our interrupt handling

Hi,

On 2015-01-15 15:05:08 +0900, Kyotaro HORIGUCHI wrote:
 Hello, I'd synced up this at last.
 
 I think I should finilize my commitfest item for this issue, with
 .. Rejected?

Fine with me.

  All the patches in the series up to 0008 hav ecommit messages providing
  more detail. A short description of each patch follows:
  
  0001: Replace walsender's latch with the general shared latch.
  
New patch that removes ImmediateInteruptOK behaviour from walsender. I
think that's a rather good idea, because walsender currently seems to
assume WaitLatchOrSocket is reentrant - which I don't think is really
guaranteed.
Hasn't been reviewed yet, but I think it's not far from being
committable.
 
 Deesn't this patchset containing per-socket basis non-blocking
 control for win32? It should make the code (above the win32
 socket layer itself) more simpler.

I don't think so - we still rely on it unfortunately.

  0004: Process 'die' interrupts while reading/writing from the client socket.
  
This is the reason Horiguchi-san started this thread.
  
I think the important debate here is whether we think it's
acceptable that there are situations (a full socket buffer, but a
alive connection) where the client previously got an error, but
not anymore afterwards. I think that's much better than having
unkillable connections, but I can see others being of a different
opinion.
 
 
 This patch yields a code a bit confusion like following.
 
 | secure_raw_write(Port *port, const void *ptr, size_t len)
 | {
 ..
 | w = WaitLatchOrSocket(MyLatch,
 | if (w  WL_LATCH_SET)
 ...
 | errno = EINTR;
 | else if (w  WL_SOCKET_WRITEABLE)
 | goto wloop;
 |
 | errno = save_errno;
 
 The errno set when WL_LATCH_SET always vanishes. Specifically,
 the EINTR set by SIGTERM(WL_LATCH_SET) is overwritten by
 EAGAIN. As the result, pg_terminte_backend() cannot kill the
 backend till the write blocking is released. errno = save_errno
 should be the alternative of the line errno = EINTR and I
 confirmed that the change leads to the desirable (as of me)
 behavior.

Ugh, that's the result stupid last minute cleanup. You're right.

  0006: Don't allow immediate interupts during authentication anymore.
  
So far we've set ImmediateInterruptOK to true during large parts
of the client authentication - that's not all that pretty,
interrupts might arrive while we're in some system routines.
  
Due to patches 0003/0004 we now are able to safely serve
interrupts during client communication which is the major are
where we want to adhere to authentication_timeout.
  
I additionally placed some CHECK_FOR_INTERRUPTS()s in some
somewhat randomly chosen places in auth.c. Those don't completely
get back the previous 'appruptness' (?) of reacting to
interrupts, but that's partially for the better, because we don't
interrupt foreign code anymore.
 
 Simplly as a comment on style, this patch introduces checks of
 ClientAuthInProgress twice successively into
 ProcessInterrupts(). Isn't it better to make it like following?
 
 | /* As in quickdie, ...
 | if (ClientAuthInProgress)
 | {
 |if (whereToSendOutput == DestRemote) whereToSendOutput = DestNone;
 |ereport(FATAL,

Hm, yes.

  0008: Remove remnants of ImmediateInterruptOK handling.
  
Now that ImmediateInterruptOK is never set to true anymore, we can
remove related code and comments.
  
New and not reviewed.
 
 walreceiver.c still has WalRcvImmediateInterruptOK as mentioned
 below, apart from whether it should be changed or not, the
 following comment remains.

 | * This is very much like what regular backends do with ImmediateInterruptOK,
 | * ProcessInterrupts() etc.

Yep. As mentioned below, it doesn't use the same infrastructure, so I'd
rather treat this separately. This set is more than big enough.

Thanks for looking!

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Check that streaming replica received all data after master shutdown

2015-01-15 Thread Kyotaro HORIGUCHI

Hi,

 On Wed, Jan 14, 2015 at 2:11 AM, Heikki Linnakangas hlinnakan...@vmware.com
  wrote:
 
  On 01/13/2015 12:11 PM, Vladimir Borodin wrote:
 
 
  05 янв. 2015 г., в 18:15, Vladimir Borodin r...@simply.name написал(а):
 
   Hi all.
 
  I have a simple script for planned switchover of PostgreSQL (9.3 and
  9.4) master to one of its replicas. This script checks a lot of things
  before doing it and one of them is that all data from master has been
  received by replica that is going to be promoted. Right now the check is
  done like below:
 
  On the master:
 
  postgres@pgtest03d ~ $ psql -t -A -c 'select
  pg_current_xlog_location();'
  0/3390
  postgres@pgtest03d ~ $ /usr/pgsql-9.3/bin/pg_ctl stop -m fast
  waiting for server to shut down done
  server stopped
  postgres@pgtest03d ~ $ /usr/pgsql-9.3/bin/pg_controldata | head
  pg_control version number:937
  Catalog version number:   201306121
  Database system identifier:   6061800518091528182
  Database cluster state:   shut down
  pg_control last modified: Mon 05 Jan 2015 06:47:57 PM MSK
  Latest checkpoint location:   0/3428
  Prior checkpoint location:0/3328
  Latest checkpoint's REDO location:0/3428
  Latest checkpoint's REDO WAL file:001B0034
  Latest checkpoint's TimeLineID:   27
  postgres@pgtest03d ~ $
 
  On the replica (after shutdown of master):
 
  postgres@pgtest03g ~ $ psql -t -A -c select
  pg_xlog_location_diff(pg_last_xlog_replay_location(), '0/3428');
  104
  postgres@pgtest03g ~ $
 
  These 104 bytes seems to be the size of shutdown checkpoint record (as I
  can understand from pg_xlogdump output).
 
  postgres@pgtest03g ~/9.3/data/pg_xlog $ /usr/pgsql-9.3/bin/pg_xlogdump
  -s 0/3390 -t 27
  rmgr: XLOGlen (rec/tot):  0/32, tx:  0, lsn:
  0/3390, prev 0/3328, bkp: , desc: xlog switch
  rmgr: XLOGlen (rec/tot): 72/   104, tx:  0, lsn:
  0/3428, prev 0/3390, bkp: , desc: checkpoint: redo 0/3428;
  tli 27; prev tli 27; fpw true; xid 0/6010; oid 54128; multi 1; offset 0;
  oldest xid 1799 in DB 1; oldest multi 1 in DB 1; oldest running xid 0;
  shutdown
  pg_xlogdump: FATAL:  error in WAL record at 0/3428: record with zero
  length at 0/3490
 
  postgres@pgtest03g ~/9.3/data/pg_xlog $
 
  I’m not sure that these 104 bytes will always be 104 bytes to have a
  strict equality while checking. Could it change in the future? Or is there
  a better way to understand that streaming replica received all data after
  master shutdown? The check that pg_xlog_location_diff returns 104 bytes
  seems a bit strange.
 
 
  Don't rely on it being 104 bytes. It can vary across versions, and across
  different architectures.
 
  You could simply check that the standby's pg_last_xlog_replay_location() 
  master's Latest checkpoint location, and not care about the exact
  difference.
 
 
  I believe there were some changes made in v9.3 which will wait for pending
 WALs to be replicated before a fast and smart shutdown (of master) can
 close the replication connection.
 
 http://git.postgresql.org/pg/commitdiff/985bd7d49726c9f178558491d31a570d47340459

I don't understand the relation between it and 104 bytes, it says
that the change is backpatched up to 9.1. Since it assures all
xlog records to be transferred if no trouble happens. Relying on
the mechanism, you don't need to check that if master is known to
have gracefully shut down and had no trouble around the
environment. Judging from that you want this check, I suppose
you're not guaranteed not to have trouble or not trusting the
mechanism itself.

Given the condition, as Alvaro said upthread, verifying that the
last record is a shutdown checkpoint should raise a lot the
chance for the all record being received except for the exteme
case such that the master have upped and downed while replication
connection cannot be made. For the case, I think there's no means
to confirm that by standby alone, you should at least compare the
next LSN to the last xlog record with the old master by any
means. Or doing any sanity check of the database on the standby
utilizing the nature of the data instead?


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

On Wed, Jan 14, 2015 at 8:50 PM, Peter Geoghegan p...@heroku.com wrote:
I am mistaken on one detail here - blocks 2 and 9 are actually fully
identical. I still have no idea why, though.

So, I've looked at it in more detail and it appears that the page of
block 2 split at some point, thereby creating a new page (the block 9
page). There is a sane downlink in the root page for the new rightlink
page. The root page looks totally sane, as does every other page - as
I said, the problem is only that block 9 is spuriously identical to
block 2. So the (correct) downlink in the root page, to block 9, is
the same as the (incorrect) high key value in block 9 - Oid value
69924. To be clear: AFAICT everything is perfect except block 9, which
is bizarrely identical to block 2.

Now, since the sane page downlink located in the root (like every
downlink, a lower bound on items in its child) is actually a copy of
the high key on the page that is the child's left link (that is to
say, it comes from the original target of a page split - it shares the
target's high key value, Oid value 69924), there may have never been
sane data in block 9, even though its downlink is sane (so maybe the
page split patch is implicated). But it's hard to see how that could
be true. The relevant code wasn't really what was changed about page
splits in 9.4 anyway (plus this wasn't a non-leaf split, since there
aren't enough pages for those to be a factor). There just isn't that
many items on page 2 (or its bizarre identical twin, page 9), so a
recent split seems unlikely. And, the target and new right page are
locked together throughout both the split and down link insertion
(even though there are two atomic operations/WAL inserts). So to
reiterate, a close by page split that explains the problem seems
unlikely.

I'm going to focus on the page deletion patch for the time being.
Merlin - it would be great if you could revert all the page split
commits (which came after the page deletion fix). All the follow-up
page split commits [1] were fairly straightforward bugs with recovery,
so it should be easy enough to totally remove the page split stuff
from 9.4 for the purposes of isolating the bug.

[1]
http://www.postgresql.org/message-id/cam3swzspj6m9hfhksjuiuof30auwxyyb56fjbw1_dogqkbe...@mail.gmail.com
--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Check that streaming replica received all data after master shutdown

2015-01-15 Thread Sameer Kumar

On Wed, Jan 14, 2015 at 2:11 AM, Heikki Linnakangas hlinnakan...@vmware.com
 wrote:

 On 01/13/2015 12:11 PM, Vladimir Borodin wrote:


 05 янв. 2015 г., в 18:15, Vladimir Borodin r...@simply.name написал(а):

  Hi all.

 I have a simple script for planned switchover of PostgreSQL (9.3 and
 9.4) master to one of its replicas. This script checks a lot of things
 before doing it and one of them is that all data from master has been
 received by replica that is going to be promoted. Right now the check is
 done like below:

 On the master:

 postgres@pgtest03d ~ $ psql -t -A -c 'select
 pg_current_xlog_location();'
 0/3390
 postgres@pgtest03d ~ $ /usr/pgsql-9.3/bin/pg_ctl stop -m fast
 waiting for server to shut down done
 server stopped
 postgres@pgtest03d ~ $ /usr/pgsql-9.3/bin/pg_controldata | head
 pg_control version number:937
 Catalog version number:   201306121
 Database system identifier:   6061800518091528182
 Database cluster state:   shut down
 pg_control last modified: Mon 05 Jan 2015 06:47:57 PM MSK
 Latest checkpoint location:   0/3428
 Prior checkpoint location:0/3328
 Latest checkpoint's REDO location:0/3428
 Latest checkpoint's REDO WAL file:001B0034
 Latest checkpoint's TimeLineID:   27
 postgres@pgtest03d ~ $

 On the replica (after shutdown of master):

 postgres@pgtest03g ~ $ psql -t -A -c select
 pg_xlog_location_diff(pg_last_xlog_replay_location(), '0/3428');
 104
 postgres@pgtest03g ~ $

 These 104 bytes seems to be the size of shutdown checkpoint record (as I
 can understand from pg_xlogdump output).

 postgres@pgtest03g ~/9.3/data/pg_xlog $ /usr/pgsql-9.3/bin/pg_xlogdump
 -s 0/3390 -t 27
 rmgr: XLOGlen (rec/tot):  0/32, tx:  0, lsn:
 0/3390, prev 0/3328, bkp: , desc: xlog switch
 rmgr: XLOGlen (rec/tot): 72/   104, tx:  0, lsn:
 0/3428, prev 0/3390, bkp: , desc: checkpoint: redo 0/3428;
 tli 27; prev tli 27; fpw true; xid 0/6010; oid 54128; multi 1; offset 0;
 oldest xid 1799 in DB 1; oldest multi 1 in DB 1; oldest running xid 0;
 shutdown
 pg_xlogdump: FATAL:  error in WAL record at 0/3428: record with zero
 length at 0/3490

 postgres@pgtest03g ~/9.3/data/pg_xlog $

 I’m not sure that these 104 bytes will always be 104 bytes to have a
 strict equality while checking. Could it change in the future? Or is there
 a better way to understand that streaming replica received all data after
 master shutdown? The check that pg_xlog_location_diff returns 104 bytes
 seems a bit strange.


 Don't rely on it being 104 bytes. It can vary across versions, and across
 different architectures.

 You could simply check that the standby's pg_last_xlog_replay_location() 
 master's Latest checkpoint location, and not care about the exact
 difference.




I believe there were some changes made in v9.3 which will wait for pending
WALs to be replicated before a fast and smart shutdown (of master) can
close the replication connection.

http://git.postgresql.org/pg/commitdiff/985bd7d49726c9f178558491d31a570d47340459

Re: [HACKERS] Fillfactor for GIN indexes

2015-01-15 Thread Alexander Korotkov

On Thu, Jan 15, 2015 at 10:19 AM, Michael Paquier michael.paqu...@gmail.com
 wrote:

 On Thu, Jan 8, 2015 at 2:03 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:
  On Thu, Jan 8, 2015 at 6:31 AM, Alexander Korotkov aekorot...@gmail.com
 wrote:
  On Wed, Jan 7, 2015 at 4:11 PM, Michael Paquier 
 michael.paqu...@gmail.com
  I am attaching an updated patch, with the default fillfactor value at
  75%, and with the page split code using the fillfactor rate.
  Thoughts?
  Rewritten version of patch is attached. I made following changes:
 
  Thanks! With this patch (and my previous version as well) GIN indexes
  with default fillfactor have a size higher than 9.4 indexes, 9.4
  behavior being consistent only with fillfactor=100 and not the default
  of 90. Are we fine with that?
 IMO, this patch has value to control random updates on GIN indexes,
 but we should have a default fillfactor of 100 to have index size
 consistent with 9.4. Thoughts?


I'm not sure. On the one hand it's unclear why fillfactor should be
different from 9.4.
On the other hand it's unclear why it should be different from btree.
I propose marking this ready for committer. So, committer can make a
final decision.

--
With best regards,
Alexander Korotkov.

Re: [HACKERS] parallel mode and parallel contexts

On Thu, Jan 15, 2015 at 7:00 AM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Tue, Jan 13, 2015 at 1:33 AM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Jan 8, 2015 at 6:52 AM, Amit Kapila amit.kapil...@gmail.com
 wrote:
  + seg = dsm_attach(DatumGetInt32(main_arg));
 
  Here, I think DatumGetUInt32() needs to be used instead of
  DatumGetInt32() as the segment handle is uint32.

 OK, I'll change that in the next version.


 No issues, I have another question related to below code:

 +HandleParallelMessages(void)
 +{
 ..
 ..
 + for (i = 0; i  pcxt-nworkers; ++i)
 + {
 + /*
 + * Read messages for as long as we have an error queue; if we
 + * have hit (or hit while reading) ReadyForQuery, this will go to
 + * NULL.
 + */
 + while (pcxt-worker[i].error_mqh != NULL)
 + {
 + shm_mq_result res;
 +
 + CHECK_FOR_INTERRUPTS();
 +
 + res = shm_mq_receive(pcxt-worker[i].error_mqh, nbytes,
 + data, true);
 + if (res == SHM_MQ_SUCCESS)

 Here we are checking the error queue for all the workers and this loop
 will continue untill all have sent ReadyForQuery() message ('Z') which
 will make this loop continue till all workers have finished their work.
 Assume situation where first worker has completed the work and sent
 'Z' message and second worker is still sending some tuples, now above
 code will keep on waiting for 'Z' message from second worker and won't
 allow to receive tuples sent by second worker till it send 'Z' message.

 As each worker send its own 'Z' message after completion, so ideally
 the above code should receive the message only for worker which has
 sent the message.  I think for that it needs worker information who has
 sent the message.

Are you talking about HandleParallelMessages() or
WaitForParallelWorkersToFinish()?  The former doesn't wait for
anything; it just handles any messages that are available now.  The
latter does wait for all workers to finish, but the intention is that
you only call it when you're ready to wind up the entire parallel
operation, so that's OK.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop


On 01/15/2015 03:23 AM, Peter Geoghegan wrote:

So now the question is: how did that inconsistency arise? It didn't
necessarily arise at the time of the (presumed) split of block 2 to
create 9. It could be that the opaque area was changed by something
else, some time later. I'll investigate more.


Merlin, could you re-run the test with a WAL archive (if you don't have 
one already), and then run pg_xlogdump, filtering it to show only the 
changes to the index? That should show us how the index got to be the 
way it is. Also, if you could post a copy of the raw relation file for 
pg_class_oid_index; I assume it's not too large.


Something like:

pg_xlogdump -r Btree -p walarchive/ -s 0/20035D0 | grep 11917

11917 is the relfilenode of pg_class_oid_index on a freshly initdb'd 
cluster. In case it's not the same on your system, you can use oid2name 
to find it out.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fillfactor for GIN indexes

Alexander Korotkov wrote:
 I'm not sure. On the one hand it's unclear why fillfactor should be
 different from 9.4.
 On the other hand it's unclear why it should be different from btree.
 I propose marking this ready for committer. So, committer can make a final
 decision.
OK let's do so then. My preference is to fully pack the index at
build. GIN compression has been one of the headlines of 9.4.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] can you have any idea about toast missing chunk issu resolution

2015-01-15 Thread M Tarkeshwar Rao

Hi all,

We are getting following error message on doing any action on the table 
like(Select or open from pgadmin).

Please suggest.

ERROR:  missing chunk number 0 for toast value 54787 in pg_toast_2619
** Error **

ERROR: missing chunk number 0 for toast value 54787 in pg_toast_2619
SQL state: XX000


CREATE TABLE mm_activealarm
(
  alarm_id integer NOT NULL,
  source_address character varying(255) NOT NULL,
  alarm_instance_id integer NOT NULL,
  alarm_raise_time bigint,
  alarm_update_time bigint,
  alarm_cease_time bigint,
  alarm_count integer,
  alarm_severity integer NOT NULL,
  source_type character varying(40) NOT NULL,
  alarm_state integer NOT NULL,
  event_type integer,
  notification_id integer NOT NULL,
  probable_cause integer NOT NULL,
  specific_problem integer NOT NULL,
  alarm_additional_text character varying(10240),
  alarm_ack_time bigint,
  alarm_ack_user character varying(100) NOT NULL,
  alarm_ack_system character varying(100) NOT NULL,
  alarm_proposed_repair_action character varying(10240) NOT NULL,
  CONSTRAINT mm_activealarm_pk PRIMARY KEY (alarm_id, source_address)
  USING INDEX TABLESPACE mgrdata
)
WITH (
  OIDS=FALSE
)
TABLESPACE mgrdata;
ALTER TABLE ss_activealarm
  OWNER TO ss_super;

Regards
Tarkeshwar

Re: [HACKERS] Minor configure tweak to simplify adjusting gcc warnings

On 2015-01-14 09:34:23 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2015-01-13 22:19:30 -0500, Tom Lane wrote:
  A slightly more complicated change could be applied to make sure that
  *all* of the CFLAGS forcibly inserted by configure appear before any
  externally-sourced CFLAGS, allowing any of them to be overridden from the
  environment variable.  I'm not sure if it's worth the trouble to do that,
  but if there's interest I could make it happen.
 
  I think it'd be good idea, but unless you're enthusiastic I guess there
  are more important things.
 
 Nah, I'm fine with doing it, it's just a couple more lines of code.
 I feared people might think it wasn't worth adding extra complexity for,
 but as long as someone else likes the idea I'll go do it.

FWIW, if we moved the
CFLAGS=$CFLAGS $user_CFLAGS
further down, it'd have advantage that compiling with -Werror would be
more realistic. Right now doing so breaks about half of the feature
checking configure checks because of warnings. E.g. on my platform it
fails to detect 64bit integers, inline, ...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] parallel mode and parallel contexts

On Tue, Jan 13, 2015 at 1:33 AM, Robert Haas robertmh...@gmail.com wrote:

 On Thu, Jan 8, 2015 at 6:52 AM, Amit Kapila amit.kapil...@gmail.com
wrote:
  + seg = dsm_attach(DatumGetInt32(main_arg));
 
  Here, I think DatumGetUInt32() needs to be used instead of
  DatumGetInt32() as the segment handle is uint32.

 OK, I'll change that in the next version.


No issues, I have another question related to below code:

+HandleParallelMessages(void)
+{
..
..
+ for (i = 0; i  pcxt-nworkers; ++i)
+ {
+ /*
+ * Read messages for as long as we have an error queue; if we
+ * have hit (or hit while reading) ReadyForQuery, this will go to
+ * NULL.
+ */
+ while (pcxt-worker[i].error_mqh != NULL)
+ {
+ shm_mq_result res;
+
+ CHECK_FOR_INTERRUPTS();
+
+ res = shm_mq_receive(pcxt-worker[i].error_mqh, nbytes,
+ data, true);
+ if (res == SHM_MQ_SUCCESS)


Here we are checking the error queue for all the workers and this loop
will continue untill all have sent ReadyForQuery() message ('Z') which
will make this loop continue till all workers have finished their work.
Assume situation where first worker has completed the work and sent
'Z' message and second worker is still sending some tuples, now above
code will keep on waiting for 'Z' message from second worker and won't
allow to receive tuples sent by second worker till it send 'Z' message.

As each worker send its own 'Z' message after completion, so ideally
the above code should receive the message only for worker which has
sent the message.  I think for that it needs worker information who has
sent the message.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] pg_rewind in contrib

On 2015-01-15 13:21:56 +, Greg Stark wrote:
 I must have missed this, how did you some the hint bit problem with
 pg_rewind? Last I understood you ran the risk that the server has unlogged
 hint bit updates that you wouldn't know to rewind.

wal_log_hints = on

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_rewind in contrib


On 01/15/2015 03:21 PM, Greg Stark wrote:

I must have missed this, how did you some the hint bit problem with
pg_rewind? Last I understood you ran the risk that the server has unlogged
hint bit updates that you wouldn't know to rewind.


There's a new GUC in 9.4, wal_log_hints, for that. It has to be turned 
on for pg_rewind to work. Or data checksums must be enabled, which also 
causes hint bit updates to be logged.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Parallel Seq Scan

On Mon, Jan 12, 2015 at 3:25 AM, Robert Haas robertmh...@gmail.com wrote:

 On Sat, Jan 10, 2015 at 11:14 PM, Amit Kapila amit.kapil...@gmail.com
wrote:
  I don't think you should be switching queues.  The tuples should be
  sent to the tuple queue, and errors and notices to the error queue.
  To achieve what you said (The tuples should be sent to the tuple
  queue, and errors and notices to the error queue.), we need to
  switch the queues.
  The difficulty here is that once we set the queue (using
  pq_redirect_to_shm_mq()) through which the communication has to
  happen, it will use the same unless we change again the queue
  using pq_redirect_to_shm_mq().  For example, assume we have
  initially set error queue (using pq_redirect_to_shm_mq()) then to
  send tuples, we need to call pq_redirect_to_shm_mq() to
  set the tuple queue as the queue that needs to be used for communication
  and again if error happens then we need to do the same for error
  queue.
  Do you have any other idea to achieve the same?

 Yeah, you need two separate global variables pointing to shm_mq
 objects, one of which gets used by pqmq.c for errors and the other of
 which gets used by printtup.c for tuples.


Okay, I will try to change the way as suggested without doing
switching, but this way we need to do it separately for 'T', 'D', and
'C' messages.

I have moved this patch to next CF as apart from above still I
have to work on execution strategy and optimizer related changes
as discussed in this thread


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] initdb -S and tablespaces

On 2015-01-15 11:02:43 +0530, Abhijit Menon-Sen wrote:
 At 2015-01-14 11:59:08 +0100, and...@2ndquadrant.com wrote:
 
   + if (ControlFile-state != DB_SHUTDOWNED 
   + ControlFile-state != DB_SHUTDOWNED_IN_RECOVERY)
   + perform_fsync(data_directory);
   +
  
  a) Please think of a slightly more descriptive name than perform_fsync
 
 OK. (I just copied the name initdb uses, because at the time I was still
 thinking in terms of a later patch moving this to src/common.) What do
 you think of fsync_recursively? fsync_pgdata?

I like fsync_pgdata/datadir or something.

Note that I think you'll have to check/handle pg_xlog being a symlink -
we explicitly support that as a usecase...

  c) I'm wondering if we should add fsync to the control file and also
 perform an fsync if the last shutdown was clear, but fsync was
 disabled.
 
 Explain? Add fsync to the control file means store the value of the
 fsync GUC setting in the control file?

Yes.

 And would the fsync you mention be dependent on the setting, or unconditional?

What I am thinking of is that, currently, if you start the server for
initial loading with fsync=off, and then restart it, you're open to data
loss. So when the current config file setting is changed from off to on,
we should fsync the data directory. Even if there was no crash restart.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Safe memory allocation functions

On Wed, Jan 14, 2015 at 9:42 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Robert Haas wrote:
 On Tue, Jan 13, 2015 at 10:10 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  However, there is a larger practical problem with this whole concept,
  which is that experience should teach us to be very wary of the assumption
  that asking for memory the system can't give us will just lead to nice
  neat malloc-returns-NULL behavior.  Any small perusal of the mailing list
  archives will remind you that very often the end result will be SIGSEGV,
  OOM kills, unrecoverable trap-on-write when the kernel realizes it can't
  honor a copy-on-write promise, yadda yadda.  Agreed that it's arguable
  that these only occur in misconfigured systems ... but misconfiguration
  appears to be the default in a depressingly large fraction of systems.
  (This is another reason for _safe not being the mot juste :-()

 I don't really buy this.  It's pretty incredible to think that after a
 malloc() failure there is absolutely no hope of carrying on sanely.
 If that were true, we wouldn't be able to ereport() out-of-memory
 errors at any severity less than FATAL, but of course it doesn't work
 that way.  Moreover, AllocSetAlloc() contains malloc() and, if that
 fails, calls malloc() again with a smaller value, without even
 throwing an error.

 I understood Tom's point differently: instead of malloc() failing,
 malloc() will return a supposedly usable pointer, but later usage of it
 will lead to a crash of some sort.  We know this does happen in reality,
 because people do report it; but we also know how to fix it.  And for
 systems that have been correctly set up, the new behavior (using some
 plan B for when malloc actually fails instead of spuriously succeeding
 only to cause a later crash) will be much more convenient.

Hmm, I understood Tom to be opposing the idea of a palloc variant that
returns NULL on failure, and I understand you to be supporting it.
But maybe I'm confused.  Anyway, I support it.  I agree that there are
systems (or circumstances?) where malloc is going to succeed and then
the world will blow up later on anyway, but I don't think that means
that an out-of-memory error is the only sensible response to a palloc
failure; returning NULL seems like a sometimes-useful alternative.

I do think that safe is the wrong suffix.  Maybe palloc_soft_fail()
or palloc_null() or palloc_no_oom() or palloc_unsafe().

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Safe memory allocation functions

On 2015-01-15 08:40:34 -0500, Robert Haas wrote:
 I do think that safe is the wrong suffix.  Maybe palloc_soft_fail()
 or palloc_null() or palloc_no_oom() or palloc_unsafe().

palloc_or_null()?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

On Thu, Jan 15, 2015 at 6:04 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 On 01/15/2015 03:23 AM, Peter Geoghegan wrote:

 So now the question is: how did that inconsistency arise? It didn't
 necessarily arise at the time of the (presumed) split of block 2 to
 create 9. It could be that the opaque area was changed by something
 else, some time later. I'll investigate more.


 Merlin, could you re-run the test with a WAL archive (if you don't have one
 already), and then run pg_xlogdump, filtering it to show only the changes to
 the index? That should show us how the index got to be the way it is. Also,
 if you could post a copy of the raw relation file for pg_class_oid_index; I
 assume it's not too large.

 Something like:

 pg_xlogdump -r Btree -p walarchive/ -s 0/20035D0 | grep 11917

 11917 is the relfilenode of pg_class_oid_index on a freshly initdb'd
 cluster. In case it's not the same on your system, you can use oid2name to
 find it out.

I'm on it.  Will try this first, then patch removal.

Question: Coming in this morning I did an immediate restart and logged
into the database and queried pg_class via index.   Everything was
fine, and the leftright verify returns nothing.  How did it repair
itself without a reindex?

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] parallel mode and parallel contexts

On Thu, Jan 15, 2015 at 6:52 PM, Robert Haas robertmh...@gmail.com wrote:

 On Thu, Jan 15, 2015 at 7:00 AM, Amit Kapila amit.kapil...@gmail.com
wrote:
  +HandleParallelMessages(void)
  +{
  ..
  ..
  + for (i = 0; i  pcxt-nworkers; ++i)
  + {
  + /*
  + * Read messages for as long as we have an error queue; if we
  + * have hit (or hit while reading) ReadyForQuery, this will go to
  + * NULL.
  + */
  + while (pcxt-worker[i].error_mqh != NULL)
  + {
  + shm_mq_result res;
  +
  + CHECK_FOR_INTERRUPTS();
  +
  + res = shm_mq_receive(pcxt-worker[i].error_mqh, nbytes,
  + data, true);
  + if (res == SHM_MQ_SUCCESS)
 
  Here we are checking the error queue for all the workers and this loop
  will continue untill all have sent ReadyForQuery() message ('Z') which
  will make this loop continue till all workers have finished their work.
  Assume situation where first worker has completed the work and sent
  'Z' message and second worker is still sending some tuples, now above
  code will keep on waiting for 'Z' message from second worker and won't
  allow to receive tuples sent by second worker till it send 'Z' message.
 
  As each worker send its own 'Z' message after completion, so ideally
  the above code should receive the message only for worker which has
  sent the message.  I think for that it needs worker information who has
  sent the message.

 Are you talking about HandleParallelMessages() or
 WaitForParallelWorkersToFinish()?  The former doesn't wait for
 anything; it just handles any messages that are available now.

I am talking about HandleParallelMessages().  It doesn't wait but
it is looping which will make it run for longer time as explained
above.  Just imagine a case where there are two workers and first
worker has sent 'Z' message and second worker is doing some
work, now in such a scenario loop will not finish until second worker
also send 'Z' message or error.  Am I missing something?



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] s_lock.h default definitions are rather confused

and...@anarazel.de (Andres Freund) writes:
 On 2015-01-14 19:31:18 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
 Right now I think a #ifdef/undef S_UNLOCK in the relevant gcc section
 sufficient and acceptable. It's after all the HPPA section that doesn't
 really play by the rules.

 Works for me.

 Pushed something like that. Gaur has the note 'Runs infrequently' - I'm
 not sure whether that means we'll see the results anytime soon...

That means it runs when I boot it up and launch a run ;-) ... the
machine's old enough (and noisy enough) that I don't want to leave
it turned on 24x7.

I've launched a run now, expect results from gcc HEAD in an hour and
a half or so.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] hung backends stuck in spinlock heavy endless loop

Hi,

 The plot thickens!  I looped the test, still stock 9.4 as of this time
 and went to lunch. When I came back, the database was in recovery
 mode.  Here is the rough sequence of events.


Whoa. That looks scary. Did you see (some of) those errors before? Most
of them should have been emitted independently of being built with
assertions.

 1) REINDEXED pg_class (wanted clean slate for full reproduction)
 2) before too long (unfortunately did not configure timestamps in the
 log) starting seeing:
 
 ERROR:  root page 3 of index pg_class_oid_index has level 0,
 expected 1 at character 8
 QUERY:  UPDATE CDSRunTable SET
   Finished = clock_timestamp(),
   DidSucceed = _DidSucceed,
   ErrorMessage = _ErrorMessage
 WHERE CDSRunTableId = _CdsRunTableId
 CONTEXT:  PL/pgSQL function cdsreconcileruntable(bigint) line 82 at
 SQL statement
 STATEMENT:  SELECT CDSReconcileRunTable(2020)
 
 ..and again with a FATAL
 FATAL:  root page 3 of index pg_class_oid_index has level 0, expected 1
 ERROR:  root page 3 of index pg_class_oid_index has level 0, expected 1
 CONTEXT:  SQL statement UPDATE CDSRunTable SET
   Finished = clock_timestamp(),
   DidSucceed = _DidSucceed,
   ErrorMessage = _ErrorMessage
 WHERE CDSRunTableId = _CdsRunTableId
 PL/pgSQL function cdsreconcileruntable(bigint) line 82 at SQL statement

Just to be absolutely clear, those happened *before* the following
errors? And there were no 'during exception cleanup' like errors before?

 3) shortly (?) after that, I got:
 
 WARNING:  did not find subXID 14955 in MyProc
 CONTEXT:  PL/pgSQL function cdsreconcileruntable(bigint) line 35
 during exception cleanup
 WARNING:  you don't own a lock of type RowExclusiveLock
 CONTEXT:  PL/pgSQL function cdsreconcileruntable(bigint) line 35
 during exception cleanup
 LOG:  could not send data to client: Broken pipe
 CONTEXT:  PL/pgSQL function cdsreconcileruntable(bigint) line 35
 during exception cleanup
 STATEMENT:  SELECT CDSReconcileRunTable(2151)
 WARNING:  ReleaseLockIfHeld: failed??
 CONTEXT:  PL/pgSQL function cdsreconcileruntable(bigint) line 35
 during exception cleanup
 ERROR:  failed to re-find shared proclock object
 CONTEXT:  PL/pgSQL function cdsreconcileruntable(bigint) line 35
 during exception cleanup
 STATEMENT:  SELECT CDSReconcileRunTable(2151)
 WARNING:  AbortSubTransaction while in ABORT state
 WARNING:  did not find subXID 14955 in MyProc
 WARNING:  you don't own a lock of type AccessShareLock
 WARNING:  ReleaseLockIfHeld: failed??
 ERROR:  failed to re-find shared proclock object
 WARNING:  AbortSubTransaction while in ABORT state
 WARNING:  did not find subXID 14955 in MyProc
 WARNING:  you don't own a lock of type AccessShareLock
 WARNING:  ReleaseLockIfHeld: failed??
 WARNING:  you don't own a lock of type ShareLock
 TRAP: FailedAssertion(!(FastPathStrongRelationLocks-count[fasthashcode]
  0), File: lock.c, Line: 1240)
 LOG:  server process (PID 10117) was terminated by signal 6: Aborted
 LOG:  terminating any other active server processes

Ick.

Were there any 'LOG: Handling deadlock detected on CdsRunTableId' log
entries before? It's hard to know from here, but the 'during exception
cleanup' indicates a problem in abort handling.  Were there any deadlock
detected errors closeby?

You're catching deadlock errors in a subtransaction. Hm.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] orangutan seizes up during isolation-check

On Thu, Jan 15, 2015 at 1:04 AM, Noah Misch n...@leadboat.com wrote:
 On Wed, Jan 14, 2015 at 04:48:53PM -0500, Peter Eisentraut wrote:
 What I'm seeing now is that the unaccent regression tests when run under
 make check-world abort with

 FATAL:  postmaster became multithreaded during startup
 HINT:  Set the LC_ALL environment variable to a valid locale.

 contrib/unaccent/Makefile sets NO_LOCALE=1, so that makes sense.  I expect the
 patch over here will fix it:
 http://www.postgresql.org/message-id/20150109063015.ga2491...@tornado.leadboat.com

I just hit this same problem; are you going to commit that patch soon?
 It's rather annoying to have make check-world fail.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Minor configure tweak to simplify adjusting gcc warnings

Andres Freund and...@2ndquadrant.com writes:
 FWIW, if we moved the
 CFLAGS=$CFLAGS $user_CFLAGS
 further down, it'd have advantage that compiling with -Werror would be
 more realistic. Right now doing so breaks about half of the feature
 checking configure checks because of warnings. E.g. on my platform it
 fails to detect 64bit integers, inline, ...

Given the way autoconf works, I think trying to run the configure tests
with -Werror is a fool's errand.  OTOH, not applying the user's CFLAGS
during configure is a nonstarter as well.  So rather than trying to inject
-Werror via generic CFLAGS, it would likely be better to have some means
of injecting it only into the actual build and not into the configure run.

There is at least one way to do that already (Makefile.custom).  Not
sure if it's worth inventing an --enable-warnings-as-errors type of
switch to do it more directly.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Minor configure tweak to simplify adjusting gcc warnings

On 2015-01-15 09:25:29 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  FWIW, if we moved the
  CFLAGS=$CFLAGS $user_CFLAGS
  further down, it'd have advantage that compiling with -Werror would be
  more realistic. Right now doing so breaks about half of the feature
  checking configure checks because of warnings. E.g. on my platform it
  fails to detect 64bit integers, inline, ...
 
 Given the way autoconf works, I think trying to run the configure tests
 with -Werror is a fool's errand.

Yea, agreed.

 OTOH, not applying the user's CFLAGS  during configure is a nonstarter
 as well.

Fair enough. What about just filtering out -Werror during configure
alone? Or just specifying -Wno-error during it? Given that it really
can't work properly, that seems like a relatively simple solution.

 So rather than trying to inject -Werror via generic CFLAGS, it would
 likely be better to have some means of injecting it only into the
 actual build and not into the configure run.
 
 There is at least one way to do that already (Makefile.custom).  Not
 sure if it's worth inventing an --enable-warnings-as-errors type of
 switch to do it more directly.

I think Makefile.custom is really rather hard to discover for new
developers. That its inclusion is commented with
# NOTE:  Makefile.custom is from the pre-Autoconf days of PostgreSQL.
# You are liable to shoot yourself in the foot if you use it without
# knowing exactly what you're doing.  The preferred (and more
# reliable) method is to communicate what you want to do to the
# configure script, and leave the makefiles alone.
doesn't help...

I'd also like to have a easy way of adding CFLAGS to configure, instead
of overwriting them. There's COPT for make, but that doesn't persist...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Merging postgresql.conf and postgresql.auto.conf