Re: [HACKERS] More Norwegian trouble
On Thu, Jan 08, 2015 at 04:37:37PM +0200, Heikki Linnakangas wrote:
> setlocale(LC_COLLATE, NULL) -> "Norwegian (Bokmål)_Norway"
>
> but:
>
> setlocale(LC_COLLATE, "norwegian-bokmal_Norway") -> "Norwegian_Norway"
>
> Apparently the behavior changed when I upgraded the toolchain. IIRC, I used
> to use "Microsoft Windows SDK 7.1", with "Microsoft Visual C++ Compilers
> 2010 Standard Edition" that came with it. I'm now using "Microsoft Visual
> Studio Community Edition 2013 Update 4", with "Microsoft Visual C++
> Compilers 2010 SP Standard". I don't know what part of the upgrade broke
> this. Could also have been something else; I don't keep track of my build
> environment that carefully.

MSVCR110 (Visual Studio 2012) locale handling departed significantly from
that of its predecessors; see the comments at IsoLocaleName().

> Now, what should we do about this? I'd like to know if others are seeing
> this, with whatever compiler versions you are using.

VS2012 x64 behaves roughly as you describe:

setlocale(LC_COLLATE, NULL) -> "Norwegian (Bokmål)_Norway"
setlocale(LC_COLLATE, "norwegian-bokmal_Norway") -> "Norwegian_Norway.1252"
setlocale(LC_COLLATE, "Norwegian_Norway") -> "Norwegian_Norway.1252"
setlocale(LC_COLLATE, "Norwegian (Bokmål)_Norway") -> "Norwegian (Bokmål)_Norway"

I see the traditional behavior with 64-bit MinGW-w64 (MSVCRT):

setlocale(LC_COLLATE, NULL) -> "Norwegian (Bokmål)_Norway"
setlocale(LC_COLLATE, "norwegian-bokmal_Norway") -> "Norwegian (Bokmål)_Norway"
setlocale(LC_COLLATE, "Norwegian_Norway") -> "Norwegian (Bokmål)_Norway"

> In particular, I wonder
> if the builds included in the EnterpriseDB installers are experiencing this.

I strongly suspect those builds use VS2012 for some of the newer branches, so
they will be affected.

> Perhaps the nicest fix would be to change the mapping code to map the
> problematic locale name to "Norwegian_Norway" instead of "norwegian-bokmal".
> That's assuming that it is in fact the same locale, and that it's accepted
> on all supported Windows versions.

I bet it is always accepted and always refers to the same locale. IIRC,
interpretation of these names falls entirely within the CRT; the Windows
system libraries have no concept of these naming schemes.

> It would be good to do something about this before the next minor release,
> as the original mapping commit has not been released yet.

+1

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
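For anyone who wants to reproduce this kind of probing without a full Windows
build environment, the same setlocale() round-trip can be sketched with
Python's locale module, which wraps the C runtime's setlocale(). This is only
an illustration of the probing technique; the Norwegian locale names from this
thread will only resolve on a Windows CRT, and on other platforms the probe
simply reports them as unsupported.

```python
import locale

def probe(name):
    """Try to set LC_COLLATE to `name`; return the name the C runtime
    canonicalizes it to, or None if the C library rejects the name."""
    try:
        return locale.setlocale(locale.LC_COLLATE, name)
    except locale.Error:
        return None

if __name__ == "__main__":
    # Per the C standard, a process starts in the "C" locale until
    # someone calls setlocale().
    print(locale.setlocale(locale.LC_COLLATE))
    # Names discussed in this thread; what they resolve to (if anything)
    # depends entirely on the CRT version, which is the bug being chased.
    for name in ("norwegian-bokmal_Norway", "Norwegian_Norway",
                 "Norwegian (Bokmål)_Norway"):
        print(name, "->", probe(name))
```

Running this under the different CRTs is essentially what the comparison
tables above show.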
Re: [HACKERS] orangutan seizes up during isolation-check
On Thu, Jan 15, 2015 at 09:24:01AM -0500, Robert Haas wrote:
> On Thu, Jan 15, 2015 at 1:04 AM, Noah Misch wrote:
> > On Wed, Jan 14, 2015 at 04:48:53PM -0500, Peter Eisentraut wrote:
> > > What I'm seeing now is that the unaccent regression tests when run under
> > > make check-world abort with
> > >
> > > FATAL: postmaster became multithreaded during startup
> > > HINT: Set the LC_ALL environment variable to a valid locale.
> >
> > contrib/unaccent/Makefile sets NO_LOCALE=1, so that makes sense. I expect
> > the patch over here will fix it:
> > http://www.postgresql.org/message-id/20150109063015.ga2491...@tornado.leadboat.com
>
> I just hit this same problem; are you going to commit that patch soon?
> It's rather annoying to have make check-world fail.

Sure, done. Dave, orangutan should now be able to pass with --enable-nls.
Would you restore that option?
Re: [HACKERS] Check that streaming replica received all data after master shutdown
On Thu, Jan 15, 2015 at 6:19 PM, Kyotaro HORIGUCHI < horiguchi.kyot...@lab.ntt.co.jp> wrote: > > On Wed, Jan 14, 2015 at 2:11 AM, Heikki Linnakangas < > hlinnakan...@vmware.com > > > wrote: > > > > > On 01/13/2015 12:11 PM, Vladimir Borodin wrote: > > > > > >> > > >> 05 янв. 2015 г., в 18:15, Vladimir Borodin > написал(а): > > >> > > >> Hi all. > > >>> > > >>> I have a simple script for planned switchover of PostgreSQL (9.3 and > > >>> 9.4) master to one of its replicas. This script checks a lot of > things > > >>> before doing it and one of them is that all data from master has been > > >>> received by replica that is going to be promoted. Right now the > check is > > >>> done like below: > > >>> > > >>> On the master: > > >>> > > >>> postgres@pgtest03d ~ $ psql -t -A -c 'select > > >>> pg_current_xlog_location();' > > >>> 0/3390 > > >>> postgres@pgtest03d ~ $ /usr/pgsql-9.3/bin/pg_ctl stop -m fast > > >>> waiting for server to shut down done > > >>> server stopped > > >>> postgres@pgtest03d ~ $ /usr/pgsql-9.3/bin/pg_controldata | head > > >>> pg_control version number:937 > > >>> Catalog version number: 201306121 > > >>> Database system identifier: 6061800518091528182 > > >>> Database cluster state: shut down > > >>> pg_control last modified: Mon 05 Jan 2015 06:47:57 PM MSK > > >>> Latest checkpoint location: 0/3428 > > >>> Prior checkpoint location:0/3328 > > >>> Latest checkpoint's REDO location:0/3428 > > >>> Latest checkpoint's REDO WAL file:001B0034 > > >>> Latest checkpoint's TimeLineID: 27 > > >>> postgres@pgtest03d ~ $ > > >>> > > >>> On the replica (after shutdown of master): > > >>> > > >>> postgres@pgtest03g ~ $ psql -t -A -c "select > > >>> pg_xlog_location_diff(pg_last_xlog_replay_location(), '0/3428');" > > >>> 104 > > >>> postgres@pgtest03g ~ $ > > >>> > > >>> These 104 bytes seems to be the size of shutdown checkpoint record > (as I > > >>> can understand from pg_xlogdump output). 
> > >>> > > >>> postgres@pgtest03g ~/9.3/data/pg_xlog $ > /usr/pgsql-9.3/bin/pg_xlogdump > > >>> -s 0/3390 -t 27 > > >>> rmgr: XLOGlen (rec/tot): 0/32, tx: 0, lsn: > > >>> 0/3390, prev 0/3328, bkp: , desc: xlog switch > > >>> rmgr: XLOGlen (rec/tot): 72/ 104, tx: 0, lsn: > > >>> 0/3428, prev 0/3390, bkp: , desc: checkpoint: redo > 0/3428; > > >>> tli 27; prev tli 27; fpw true; xid 0/6010; oid 54128; multi 1; > offset 0; > > >>> oldest xid 1799 in DB 1; oldest multi 1 in DB 1; oldest running xid > 0; > > >>> shutdown > > >>> pg_xlogdump: FATAL: error in WAL record at 0/3428: record with > zero > > >>> length at 0/3490 > > >>> > > >>> postgres@pgtest03g ~/9.3/data/pg_xlog $ > > >>> > > >>> I’m not sure that these 104 bytes will always be 104 bytes to have a > > >>> strict equality while checking. Could it change in the future? Or is > there > > >>> a better way to understand that streaming replica received all data > after > > >>> master shutdown? The check that pg_xlog_location_diff returns 104 > bytes > > >>> seems a bit strange. > > >>> > > >> > > > Don't rely on it being 104 bytes. It can vary across versions, and > across > > > different architectures. > > > > > > You could simply check that the standby's > pg_last_xlog_replay_location() > > > > master's "Latest checkpoint location", and not care about the exact > > > difference. > > > > > > > I believe there were some changes made in v9.3 which will wait for > pending > > WALs to be replicated before a fast and smart shutdown (of master) can > > close the replication connection. > > > > > http://git.postgresql.org/pg/commitdiff/985bd7d49726c9f178558491d31a570d47340459 > > I don't understand the relation between it and 104 bytes, it says > that the change is backpatched up to 9.1. Since it assures all > xlog records to be transferred if no trouble happens. Relying on > the mechanism, you don't need to check that if master is known to > have gracefully shut down and had no trouble around the > environment. 
> Judging from that you want this check, I suppose
> you're not guaranteed not to have trouble or not trusting the
> mechanism itself.

Right! I was coming from the point that if the master has shut down
gracefully then you don't really need to worry about ensuring this with such
checks on the standby (it is supposed to get the pending WAL before the
master goes down). This obviously (as rightly pointed out by you) would not
work if the master has not shut down gracefully, or if there is a connection
issue between master and slave while the master is being shut down (even with
a smart or fast shutdown).

> Given the condition, as Alvaro said upthread, verifying that the
> last record is a shutdown checkpoint should greatly raise the
> chance that all records have been received, except for the extreme
> case such that the master have upped and downed while
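The advice upthread—compare the standby's pg_last_xlog_replay_location()
against the master's "Latest checkpoint location" rather than counting on a
fixed 104-byte difference—boils down to simple arithmetic on WAL locations.
A small standalone helper (the function names are mine, not a PostgreSQL
API) that parses the textual 'X/Y' form and compares two positions could look
like this:

```python
def lsn_to_int(lsn):
    """Convert a textual WAL location like '16/B374D848' to a byte
    position: the part before the slash is the high 32 bits, the part
    after it the low 32 bits."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def lsn_diff(a, b):
    """Analogue of pg_xlog_location_diff(a, b): number of bytes by which
    location a is ahead of location b (negative if behind)."""
    return lsn_to_int(a) - lsn_to_int(b)

def replica_caught_up(replay_lsn, last_checkpoint_lsn):
    """After a clean master shutdown, the standby has everything once its
    replay location is at or past the master's latest checkpoint location.
    The exact byte difference (the shutdown checkpoint record) varies by
    version and architecture, so only the sign of the diff is checked."""
    return lsn_diff(replay_lsn, last_checkpoint_lsn) >= 0
```

This is the same comparison Heikki suggests, just done client-side instead of
via pg_xlog_location_diff() on the server.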
Re: [HACKERS] Merging postgresql.conf and postgresql.auto.conf
On Thu, Jan 15, 2015 at 9:48 PM, Sawada Masahiko wrote:
> On Thu, Jan 15, 2015 at 2:02 PM, Amit Kapila wrote:
> >
> > One thought I have in this line is that currently there doesn't seem to be
> > a way to know if the setting has an entry both in postgresql.conf and
> > postgresql.auto.conf. If we had some way of knowing the same
> > (pg_settings?), then it could be convenient for the user to decide if the
> > value in postgresql.auto.conf is useful or not, and if it's not useful
> > then use ALTER SYSTEM ... RESET to remove it from postgresql.auto.conf.
>
> I think one way is for pg_settings to carry the file name of each variable,
> but that would not reflect the current contents of postgresql.conf, so we
> would need to parse postgresql.conf again at that time.

Yeah, that could be a possibility, but I think that will break the existing
commands, as this is the common infrastructure used for the SHOW commands as
well, which display the GUC value used by the current session rather than
the value in postgresql.conf. I don't know how appealing it would be to
others, but a new view like pg_file_settings, which would display the
settings in the file, could be meaningful for your need. Another way is that
the user can do pg_reload_conf() to see the latest values (excluding values
for server-startup parameters).

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
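The core of what is being asked for—telling the user that a GUC has an entry
in both postgresql.conf and postgresql.auto.conf—can be prototyped outside
the server by parsing both files. The sketch below is only an illustration
of that idea (the tiny parser is mine and handles far less syntax than the
server's real GUC file parser):

```python
import re

def parse_conf(text):
    """Minimal parser for postgresql.conf-style 'name = value' lines.
    Comments and blank lines are ignored; a later occurrence of a name
    overrides an earlier one, as in the real files."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        m = re.match(r"(\w+)\s*=\s*(.+)", line)
        if m:
            settings[m.group(1)] = m.group(2).strip()
    return settings

def duplicated_settings(conf_text, auto_conf_text):
    """Names set in both files. postgresql.auto.conf is applied last at
    server start, so these are exactly the entries a user might want to
    remove with ALTER SYSTEM ... RESET."""
    return sorted(set(parse_conf(conf_text)) & set(parse_conf(auto_conf_text)))
```

A pg_file_settings-style view would surface the same information from inside
the server without a second client-side parse.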
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On 04 January 2015 07:27, Andres Freund wrote:

> On 2014-12-31 18:35:38 +0530, Amit Kapila wrote:
> > + -j jobs
> > + --jobs=njobs
> > +
> > + Number of concurrent connections to perform the operation.
> > + This option will enable the vacuum operation to run on asynchronous
> > + connections, at a time one table will be operated on one connection.
> > + So at one time as many tables will be vacuumed parallely as number of
> > + jobs. If number of jobs given are more than number of tables then
> > + number of jobs will be set to number of tables.
>
> "asynchronous connections" isn't a very well defined term. Also, the
> second part of that sentence doesn't seem to be grammatically correct.

I have changed this to concurrent connections; is this OK?

> > + vacuumdb will open njobs connections to the
> > + database, so make sure your setting is high enough to accommodate all
> > + connections.
>
> Isn't it njobs+1?

The main connection that we use for getting the table information is reused
as the first slot connection, so the total number of connections is still
njobs.

> > @@ -141,6 +199,7 @@ main(int argc, char *argv[])
> > }
> > }
> >
> > + optind++;
>
> Hm, where's that coming from?

This is wrong; I have removed it.

> > + PQsetnonblocking(connSlot[0].connection, 1);
> > +
> > + for (i = 1; i < concurrentCons; i++)
> > + {
> > + connSlot[i].connection = connectDatabase(dbname, host, port, username,
> > + prompt_password, progname, false);
> > +
> > + PQsetnonblocking(connSlot[i].connection, 1);
> > + connSlot[i].isFree = true;
> > + connSlot[i].sock = PQsocket(connSlot[i].connection);
> > + }
>
> Are you sure about this global PQsetnonblocking()? This means that you
> might not be able to send queries... And you don't seem to be waiting
> for sockets waiting for writes in the select loop - which means you
> might end up being stuck waiting for reads when you haven't submitted
> the query.
> I think you might need a more complex select() loop. On nonfree
> connections also wait for writes if PQflush() returns != 0.

1. In GetIdleSlot we make sure that we only wait on a connection if it is
busy, i.e. only if we have sent a query on that connection.
2. When all the connections are busy, we select() on all the FDs to wait for
a response on some connection. When select() returns, we consume the input
and check whether the connection is idle or whether it was just an
intermediate response; if it is no longer busy we process all the results
and mark it as free.

> > +/*
> > + * GetIdleSlot
> > + * Process the slot list, if any free slot is available then return
> > + * the slotid else perform the select on all the socket's and wait
> > + * until atleast one slot becomes available.
> > + */
> > +static int
> > +GetIdleSlot(ParallelSlot *pSlot, int max_slot, const char *dbname,
> > + const char *progname, bool completedb)
> > +{
> > + int i;
> > + fd_set slotset;
>
> Hm, you probably need to limit -j to FD_SETSIZE - 1 or so.

I will change this in the next patch.

> > + int firstFree = -1;
> > + pgsocket maxFd;
> > +
> > + for (i = 0; i < max_slot; i++)
> > + if (pSlot[i].isFree)
> > + return i;
> > +
> > + FD_ZERO(&slotset);
> > +
> > + maxFd = pSlot[0].sock;
> > +
> > + for (i = 0; i < max_slot; i++)
> > + {
> > + FD_SET(pSlot[i].sock, &slotset);
> > + if (pSlot[i].sock > maxFd)
> > + maxFd = pSlot[i].sock;
> > + }
>
> So we're waiting for idle connections?
>
> I think you'll have to use two fdsets here, and set the write set based
> on PQflush() != 0.

I did not get this. The logic here is: we are waiting for any connection to
respond, using select() on all FDs. When select() returns, we check which
sockets are no longer busy and mark all the finished connections as idle at
once. If no connection is free, we go back to select(); otherwise we return
the first idle connection.
> > +/*
> > + * A select loop that repeats calling select until a descriptor in the read
> > + * set becomes readable. On Windows we have to check for the termination event
> > + * from time to time, on Unix we can just block forever.
> > + */
>
> Should a) mention why we have to check regularly on windows b) that on
> linux we don't have to because we send a cancel event from the signal
> handler.

I have added the comments.

> > +static int
> > +select_loop(int maxFd, fd_set *workerset)
> > +{
> > + int i;
> > + fd_set saveSet = *workerset;
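Andres's two-fdset suggestion separates "wait until I can read a result" from
"wait until I can finish sending the query": with nonblocking sockets,
PQflush() may leave unsent output buffered, and such a connection must also be
watched for writability or the query may never be fully submitted. The slot
bookkeeping reduces to something like the sketch below (slot fields are
modeled loosely after the patch; the flush_pending flag stands in for
"PQflush() returned nonzero"):

```python
def build_wait_sets(slots):
    """Given slot dicts with 'sock', 'is_free', and 'flush_pending'
    fields, return (read_set, write_set) of sockets to hand to select().
    A busy connection always waits for readability (a result may arrive);
    if its output buffer is not fully flushed it must additionally wait
    for writability so the rest of the query can be sent."""
    read_set, write_set = set(), set()
    for slot in slots:
        if slot["is_free"]:
            continue               # nothing outstanding on this connection
        read_set.add(slot["sock"])
        if slot["flush_pending"]:  # i.e. PQflush() != 0 on this connection
            write_set.add(slot["sock"])
    return read_set, write_set
```

After select() returns, writable flush-pending sockets get another PQflush()
call, and readable ones get PQconsumeInput() followed by the PQisBusy() check
described above.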
Re: [HACKERS] Overhauling our interrupt handling
Hello,

> > I think I should finalize my commitfest item for this issue, with
> > .. "Rejected"?
>
> Fine with me.

Done.

> > > 0001: Replace walsender's latch with the general shared latch.
> > >
> > > New patch that removes ImmediateInterruptOK behaviour from
> > > walsender. I think that's a rather good idea, because walsender
> > > currently seems to assume WaitLatchOrSocket is reentrant - which I
> > > don't think is really guaranteed.
> > > Hasn't been reviewed yet, but I think it's not far from being
> > > committable.
> >
> > Doesn't this patchset contain per-socket non-blocking control for
> > win32? It should make the code (above the win32 socket layer itself)
> > simpler.
>
> I don't think so - we still rely on it unfortunately.

Does "it" mean win32_noblock? Or the nonblocking bare win32 socket? The
win32-per-sock-blkng-cntl patch in the below message should cover both of
them.

http://www.postgresql.org/message-id/54060ae5.5020...@vmware.com

If you are saying it should be a patch separate from this, I'll do so.

regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
[HACKERS] A minor typo in brin.c
Hi,

Here's a patch that does:

  * For each new index tuple inserted, *numSummarized (if not NULL) is
 - * incremented; for each existing tuple, numExisting (if not NULL) is
 + * incremented; for each existing tuple, *numExisting (if not NULL) is
  * incremented.
  */

Thanks,
Amit

diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index c93422a..2b5fb8d 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -1066,7 +1066,7 @@ summarize_range(IndexInfo *indexInfo, BrinBuildState *state, Relation heapRel,
  * least ShareUpdateExclusiveLock mode.
  *
  * For each new index tuple inserted, *numSummarized (if not NULL) is
- * incremented; for each existing tuple, numExisting (if not NULL) is
+ * incremented; for each existing tuple, *numExisting (if not NULL) is
  * incremented.
  */
 static void
Re: [HACKERS] pg_rewind in contrib
Here is a random bag of comments for the v5 patch:

pg_xlogdump fails to build:

CC xlogreader.o
CC rmgrdesc.o
../../src/include/access/rmgrlist.h:32:46: error: 'dbase_desc' undeclared here (not in a function)
 PG_RMGR(RM_DBASE_ID, "Database", dbase_redo, dbase_desc, dbase_identify, NULL, NULL)
rmgrdesc.c:33:10: note: in definition of macro 'PG_RMGR'
 { name, desc, identify},
../../src/include/access/rmgrlist.h:32:58: error: 'dbase_identify' undeclared here (not in a function)
 PG_RMGR(RM_DBASE_ID, "Database", dbase_redo, dbase_desc, dbase_identify, NULL, NULL)
rmgrdesc.c:33:16: note: in definition of macro 'PG_RMGR'
 { name, desc, identify},
../../src/Makefile.global:732: recipe for target 'rmgrdesc.o' failed
make[2]: *** [rmgrdesc.o] Error 1

SGML files should use 1-space indentation, not 2.

In high-availability.sgml, pg_rewind could be a link.

In ref/pg_rewind.sgml, the -P option is listed twice.

The option --source-server had me confused at first, because the entry above
under --source-pgdata also talks about a "source server". Maybe
--source-connection would be clearer?

Reference pages have standardized top-level headers, so "Theory of
operation" should be under something like "Notes". Similarly for
"Restrictions", but that seems important enough to go into the description.

src/bin/pg_rewind/.gitignore lists files for pg_regress use, which is not
used anymore.

src/bin/pg_rewind/Makefile says "Makefile for src/bin/pg_basebackup". There
should be an installcheck target.

RewindTest.pm should be in the t/ directory.

Code like this:

+ if (map->bitmap == NULL)
+ map->bitmap = pg_malloc(newsize);
+ else
+ map->bitmap = pg_realloc(map->bitmap, newsize);

is unnecessary. You can just write

map->bitmap = pg_realloc(map->bitmap, newsize);

because realloc handles NULL.

Instead of FILE_TYPE_DIRECTORY etc., why not use S_IFDIR etc.?
About this code

+ if (!exists)
+ action = FILE_ACTION_CREATE;
+ else
+ action = FILE_ACTION_NONE;

action is earlier initialized as FILE_ACTION_NONE, so the second branch is
redundant. Alternatively, remove the earlier initialization, so that maybe
the compiler can detect if action is not assigned in some paths.

Error messages should not end with a period. Some calls to pg_fatal() don't
have the string end with a newline.

In libpqProcessFileList(), I would tend to put the comment outside of the
SQL command string.

Mkvcbuild.pm changes still refer to pg_rewind in contrib.

The TestLib.pm addition command_is sounds a bit wrong. It's evidently
modelled after command_like, but that now sounds wrong too. How about
command_stdout_is?

The test suite needs to silence all non-TAP output. So psql needs to be run
with -q, pg_ctl with -s, etc. Any important output needs to go through
diag() or note().

Test cases like

ok 6 - psql -A -t --no-psqlrc -c port=10072 -c SELECT datname FROM pg_database exit code 0

should probably get a real name.

The whole structure of the test suite still looks too much like the old
hack. I'll try to think of other ways to structure it.
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Fri, Jan 16, 2015 at 12:53 AM, Alvaro Herrera wrote:
> Michael Paquier wrote:
>
> > Andres, this patch needs more effort from the author, right? So
> > marking it as returned with feedback.
>
> I will give this patch a look in the current commitfest, if you can
> please set as 'needs review' instead with me as reviewer, so that I
> don't forget, I would appreciate it.

Fine for me, done this way.

--
Michael
Re: [HACKERS] can you have any idea about toast missing chunk issue resolution
On 1/15/15 6:22 AM, M Tarkeshwar Rao wrote:
> We are getting following error message on doing any action on the table
> like (select, or open from pgAdmin).

Error reports should go to pgsql-general. I'm moving the discussion there
(and BCC'ing -hackers).

> Please suggest.
>
> ERROR: missing chunk number 0 for toast value 54787 in pg_toast_2619
> ** Error **
> ERROR: missing chunk number 0 for toast value 54787 in pg_toast_2619
> SQL state: XX000

That means that the database tried to detoast a value and it couldn't find
it in the toast table. Likely causes:

- Someone manually modified the toast table. Hard to do, but not impossible.
- The toast index is corrupted. toast_fetch_datum() will always use a toast
  index, so the only way to see if this is the issue is to try REINDEXing.
- The index is fine and the toast table is corrupted.
- The base table is corrupted. I think a corrupt index on the base table
  could also cause this, but I'm not sure.
- You've found some bug in either the toast or detoast code.

Note that when I say 'corrupted', that could also mean that the data is
there and simply isn't being considered as visible per MVCC rules.

Also, the actual condition throwing this error is

if (nextidx != numchunks)

where

numchunks = ((ressize - 1) / TOAST_MAX_CHUNK_SIZE) + 1;

and nextidx is incremented with every chunk that's read.

The easiest thing to try right now is a REINDEX and see if that fixes
things. It would be best if you stopped the server and made a filesystem
copy of the data directory before doing that (or at least pg_clog, pg_xlog
and the relevant toast heap and index files).

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
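The condition Jim quotes is just a count check: the detoaster expects
ceil(ressize / TOAST_MAX_CHUNK_SIZE) chunks and complains when one (here,
chunk number 0) is missing. The arithmetic can be illustrated standalone; the
chunk size used below is only an example, since the real TOAST_MAX_CHUNK_SIZE
is a build-time constant that depends on the block size.

```python
def expected_chunks(ressize, toast_max_chunk_size=1996):
    """Number of chunks the detoaster expects for a toasted value of
    `ressize` bytes: ((ressize - 1) / TOAST_MAX_CHUNK_SIZE) + 1 in the C
    source, i.e. ceiling division. 1996 is used here purely as an
    illustrative chunk size."""
    return ((ressize - 1) // toast_max_chunk_size) + 1

def chunks_complete(chunks_read, ressize, toast_max_chunk_size=1996):
    """Mirror of the nextidx != numchunks sanity check: every chunk index
    from 0 .. expected-1 must have been fetched from the toast table."""
    return chunks_read == expected_chunks(ressize, toast_max_chunk_size)
```

So "missing chunk number 0" means the very first chunk of value 54787 could
not be found via the toast index, which is why REINDEXing the toast index is
the first thing to try.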
Re: [HACKERS] Bug in pg_dump
On 1/15/15 5:26 AM, Gilles Darold wrote:
> Hello,
>
> There's a long-pending issue with pg_dump and extensions that have member
> tables with foreign keys. This was previously reported in this thread
> http://www.postgresql.org/message-id/ca+tgmoyvzkadmgh_8el7uvm472geru0b4pnnfjqye6ss1k9...@mail.gmail.com
> and discussed by Robert. All PostgreSQL users of the PostGIS extension
> postgis_topology are facing the issue because its two member tables
> (topology and layer) are linked by foreign keys. If you dump a database
> with this extension and try to import it you will experience this error:
>
> pg_restore: [archiver (db)] Error while PROCESSING TOC:
> pg_restore: [archiver (db)] Error from TOC entry 3345; 0 157059176 TABLE DATA layer gilles
> pg_restore: [archiver (db)] COPY failed for table "layer": ERROR: insert
> or update on table "layer" violates foreign key constraint
> "layer_topology_id_fkey"
> DETAIL: Key (topology_id)=(1) is not present in table "topology".
> WARNING: errors ignored on restore: 1
>
> The problem is that, whatever export type you choose (plain/custom and
> full-export/data-only), the data of tables "topology" and "layer" is
> always exported in alphabetical order. I think this is a bug because,
> outside extensions, in a data-only export pg_dump is able to find foreign
> key dependencies and dump table data in the right order, but not with
> extension members. The default is alphabetical order, but that should not
> be the case with extension members because constraints are recreated
> during the CREATE EXTENSION step. I hope I am clear enough. Here we have
> three solutions:
>
> 1/ Inform developers of extensions to take care about alphabetical order
> when they have member tables using foreign keys.
>
> 2/ Inform DBAs that they have to restore the failing table independently.
> The use case above can be resolved using the following command:
>
> pg_restore -h localhost -n topology -t layer -Fc -d testdb_empty testdump.dump
>
> 3/ Inform DBAs that they have to restore the schema first, then the data
> only using --disable-triggers.

I don't like 1-3, and I doubt anyone else does...

> 4/ Patch pg_dump to solve this issue.

5. Disable FKs during load. This is really a bigger item than just
extensions. It would have the nice benefit of doing a wholesale FK
validation instead of firing per-row triggers, but it would leave the
database in a weird state if a restore failed...

> I attach a patch that solves the issue in pg_dump; let me know if it might
> be included in the Commitfest or if the three other solutions are a better
> choice. I also attach a sample extension (test_fk_in_ext) to be able to
> reproduce the issue and test the patch. Note that there might exist a
> simpler solution than the one I used in this patch; if this is the case
> please point me in the right direction and I will be pleased to rewrite
> and send another patch.

The only problem I see with this approach is circular FKs:

decibel@decina.local=# create table a(a_id serial primary key, b_id int);
CREATE TABLE
decibel@decina.local=# create table b(b_id serial primary key, a_id int references a);
CREATE TABLE
decibel@decina.local=# alter table a add foreign key(b_id) references b;
ALTER TABLE

That's esoteric enough that I think it's OK not to directly support them,
but pg_dump shouldn't puke on them (and really should throw a warning).
Though it looks like it doesn't handle that in the data-only case anyway...

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
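What option 4 asks of pg_dump is essentially a topological sort of the member
tables by their foreign-key dependencies, with circular references (like the
a/b example above) detected and warned about rather than crashed on. A
standalone sketch of that ordering logic (this is my illustration, not
pg_dump's actual code) might look like:

```python
def fk_dump_order(tables, fks):
    """Order `tables` so that every referenced table is dumped before the
    tables referencing it. `fks` maps a table name to the set of table
    names it references. Returns (ordered, leftover): `leftover` is
    non-empty when a circular FK chain makes a complete ordering
    impossible - the case where pg_dump should emit a warning."""
    remaining = set(tables)
    ordered = []
    while remaining:
        # Ready: no outstanding reference to another not-yet-dumped table
        # (a self-reference never blocks a table's own dump).
        ready = sorted(t for t in remaining
                       if not (fks.get(t, set()) & (remaining - {t})))
        if not ready:
            return ordered, sorted(remaining)  # circular dependency
        ordered.extend(ready)
        remaining -= set(ready)
    return ordered, []
```

For the PostGIS case, "layer" references "topology", so the sort dumps
"topology" first; for the circular a/b example, both tables come back in the
leftover set.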
Re: [HACKERS] segmentation fault in execTuples.c#ExecStoreVirtualTuple
I wrote:
> Manuel Kniep writes:
> > ok, after lots of testing I could create a test case,
> > which can be found here https://gist.github.com/rapimo/3c8c1b35270e5854c524
> > it's written in ruby and depends on the gems activerecord, pg and parallel
>
> Hm. I don't see a segfault from this. I do see the CREATE TEMP TABLE
> command failing with "ctid is NULL", which probably shouldn't be happening
> ... but no segfault.

The reason turns out to be that this is a dangling-pointer bug, and I was
using a memory-clobber-enabled build, so it was pretty predictable what the
pointer would be pointing at. I've got no doubt that hard-to-reproduce
misbehavior, including segfaults, would ensue without CLOBBER_FREED_MEMORY
turned on. You need this patch:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=patch;h=34668c8eca065d745bf1166a92c9efc588e7aee2

regards, tom lane
Re: [HACKERS] Safe memory allocation functions
On Fri, Jan 16, 2015 at 12:57 AM, Alvaro Herrera wrote:
> > I do think that "safe" is the wrong suffix. Maybe palloc_soft_fail()
> > or palloc_null() or palloc_no_oom() or palloc_unsafe().
>
> I liked palloc_noerror() better myself FWIW.

Voting for palloc_noerror() as well.

--
Michael
Re: [HACKERS] PATCH: Reducing lock strength of trigger and foreign key DDL
On 01/14/2015 08:48 AM, Michael Paquier wrote:
> All those things gathered give the patch attached. Andreas, if you are
> fine with it I think that we could pass it to a committer.

Excellent changes. Thanks for the patch and the reviews.

--
Andreas Karlsson
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Thu, Jan 15, 2015 at 3:00 PM, Merlin Moncure wrote:
> Running this test on another set of hardware to verify -- if this
> turns out to be a false alarm which it may very well be, I can only
> offer my apologies! I've never had a new drive fail like that, in
> that manner. I'll burn the other hardware in overnight and report
> back.

Thanks.

--
Peter Geoghegan
Re: [HACKERS] Additional role attributes && superuser review
All,

Attached is a patch that proposes the following additional role attributes
for review:

* ONLINE_BACKUP - allows role to perform backup operations
  - originally proposed as BACKUP; due to concern over the use of that term
    in relation to other potential backup-related permissions, this form is
    in line with the documentation, which describes the affected backup
    operations as being 'online backups'.
  - applies only to the originally proposed backup functions.
* XLOG_REPLAY - allows role to pause and resume xlog replay
  ('pg_xlog_replay_pause' and 'pg_xlog_replay_resume')
  - following the recommendation from Stephen and Magnus.
* LOG - allows role to rotate log files
  - remains broad enough to cover future log-related operations
* MONITOR - allows role to view pg_stat_* details (as originally proposed)
* SIGNAL - allows role to signal backend processes (as originally proposed)

The documentation still needs to be updated. If these attributes and the
capabilities they provide are acceptable, then I'll begin moving forward on
making those updates as well.

Regarding the discussion on a DUMP/READONLY permission: I believe that topic
needs much further discussion and decided it is probably best to keep it as
a separate patch/effort. I'd certainly be willing to continue that
discussion and assist in moving any related effort forward, so please let me
know if there is anything I can do to help.
Thanks, Adam

--
Adam Brightwell - adam.brightw...@crunchydatasolutions.com
Database Engineer - www.crunchydatasolutions.com

diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
new file mode 100644
index 2179bf7..aaf13c1
*** a/src/backend/access/transam/xlogfuncs.c
--- b/src/backend/access/transam/xlogfuncs.c
***************
*** 27,32 ****
--- 27,33 ----
  #include "miscadmin.h"
  #include "replication/walreceiver.h"
  #include "storage/smgr.h"
+ #include "utils/acl.h"
  #include "utils/builtins.h"
  #include "utils/numeric.h"
  #include "utils/guc.h"
*** pg_start_backup(PG_FUNCTION_ARGS)
*** 54,63 ****
  	backupidstr = text_to_cstring(backupid);
! 	if (!superuser() && !has_rolreplication(GetUserId()))
  		ereport(ERROR,
  				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
! 				 errmsg("must be superuser or replication role to run a backup")));
  	startpoint = do_pg_start_backup(backupidstr, fast, NULL, NULL);
--- 55,65 ----
  	backupidstr = text_to_cstring(backupid);
! 	if (!has_replication_privilege(GetUserId())
! 		&& !has_online_backup_privilege(GetUserId()))
  		ereport(ERROR,
  				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
! 				 errmsg("must be superuser, replication role or online backup role to run a backup")));
  	startpoint = do_pg_start_backup(backupidstr, fast, NULL, NULL);
*** pg_stop_backup(PG_FUNCTION_ARGS)
*** 82,91 ****
  {
  	XLogRecPtr	stoppoint;
! 	if (!superuser() && !has_rolreplication(GetUserId()))
  		ereport(ERROR,
  				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
! 				 (errmsg("must be superuser or replication role to run a backup"))));
  	stoppoint = do_pg_stop_backup(NULL, true, NULL);
--- 84,94 ----
  {
  	XLogRecPtr	stoppoint;
! 	if (!has_replication_privilege(GetUserId())
! 		&& !has_online_backup_privilege(GetUserId()))
  		ereport(ERROR,
  				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
! 				 errmsg("must be superuser, replication role or online backup role to run a backup")));
  	stoppoint = do_pg_stop_backup(NULL, true, NULL);
*** pg_switch_xlog(PG_FUNCTION_ARGS)
*** 100,109 ****
  {
  	XLogRecPtr	switchpoint;
! 	if (!superuser())
  		ereport(ERROR,
  				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
! 				 (errmsg("must be superuser to switch transaction log files"))));
  	if (RecoveryInProgress())
  		ereport(ERROR,
--- 103,112 ----
  {
  	XLogRecPtr	switchpoint;
! 	if (!has_online_backup_privilege(GetUserId()))
  		ereport(ERROR,
  				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
! 				 errmsg("must be superuser or online backup role to switch transaction log files")));
  	if (RecoveryInProgress())
  		ereport(ERROR,
*** pg_create_restore_point(PG_FUNCTION_ARGS)
*** 129,138 ****
  	char	   *restore_name_str;
  	XLogRecPtr	restorepoint;
! 	if (!superuser())
  		ereport(ERROR,
  				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
! 				 (errmsg("must be superuser to create a restore point"))));
  	if (RecoveryInProgress())
  		ereport(ERROR,
--- 132,141 ----
  	char	   *restore_name_str;
  	XLogRecPtr	restorepoint;
! 	if (!has_online_backup_privilege(GetUserId()))
  		ereport(ERROR,
  				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
! 				 (errmsg("must be superuser or online backup role to create a restore point"))));
  	if (RecoveryInProgress())
  		ereport(ERROR,
*** pg_xlogfile_name(PG_FUNCTION_ARGS)
*** 338,347 ****
  Datum
  pg_xlog_replay_pause(PG_FUNCTION_ARGS)
  {
! 	if (!superuser())
  		ereport
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Thu, Jan 15, 2015 at 4:03 PM, Merlin Moncure wrote: > On Thu, Jan 15, 2015 at 1:32 PM, Merlin Moncure wrote: >> Since it's possible the database is a loss, do you see any value in >> bootstrapping it again with checksums turned on? One point of note >> is that this is a brand spanking new SSD; maybe we need to rule out >> hardware-based corruption? > > hm! I bootstrapped a new database with checksums on, and lo, > > [cds2 18347 2015-01-15 15:58:29.955 CST 1779]WARNING: page > verification failed, calculated checksum 28520 but expected 28541 > [cds2 18347 2015-01-15 15:58:29.955 CST 1779]CONTEXT: SQL statement > "COPY lease_delete FROM > '/tmp/C1188732_onesiteproperty.lease_delete.csv' CSV HEADER" Running this test on another set of hardware to verify -- if this turns out to be a false alarm, which it may very well be, I can only offer my apologies! I've never had a new drive fail in that manner. I'll burn in the other hardware overnight and report back. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] infinite loop in _bt_getstackbuf
On Thu, Oct 30, 2014 at 10:46 AM, Robert Haas wrote: > (9.3.5 problem report) I think I saw a similar issue on a 9.3.5 instance that was affected by the "in pg_upgrade, remove pg_multixact files left behind by initdb" issue (I ran the remediation recommended in the 9.3.5 release notes). Multiple anti-wraparound vacuums were stuck following a PITR. I resolved this (as far as I can tell) by killing the autovacuum workers and manually running VACUUM FREEZE. I have yet to do any root cause analysis, but I think I could reproduce the problem. > The fundamental structure of that function is an infinite loop. We > break out of that loop when BTEntrySame(item, &stack->bts_btentry) or > P_RIGHTMOST(opaque) and I'm sure that it's correct to think that, in > theory, one of those things will eventually happen. Not in theory - only in practice. L&Y specifically state: "We wish to point out here that our algorithms do not prevent the possibility of livelock (where one process runs indefinitely). This can happen if a process never terminates because it keeps having to follow link pointers created by other processes. This might happen in the case of a process being run on a (relatively) very slow processor in a multiprocessor system". > But the index > could be corrupted, most obviously by having a page where > opaque->btpo_next points back to the current block number. If that > happens, you need an immediate shutdown (or some clever gdb hackery) > to terminate the VACUUM. That's unfortunate and unnecessary. Merlin reported a bug that looked exactly like this. Hardware failure may now explain the problem. > It also looks like something we can fix, at a minimum by adding a > CHECK_FOR_INTERRUPTS() at the top of that loop, or in some function > that it calls, like _bt_getbuf(), so that if it goes into an infinite > loop, it can at least be killed. 
I think it might be a good idea to detect circular _bt_moveright() moves (the direct offender in Merlin's case, whose logic is very similar to your _bt_getstackbuf() problem case). I'm pretty sure that it's exceptional for there to be more than 2 or 3 retries in _bt_moveright(). It would probably be fine to consider the possibility that we'll never finish once we get past 5 retries or so. We'd then start keeping track of blocks visited, and raise an error when a page was visited a second time. -- Peter Geoghegan
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Thu, Jan 15, 2015 at 1:32 PM, Merlin Moncure wrote: > Since it's possible the database is a loss, do you see any value in > bootstrapping it again with checksums turned on? One point of note > is that this is a brand spanking new SSD; maybe we need to rule out > hardware-based corruption? hm! I bootstrapped a new database with checksums on, and lo, [cds2 18347 2015-01-15 15:58:29.955 CST 1779]WARNING: page verification failed, calculated checksum 28520 but expected 28541 [cds2 18347 2015-01-15 15:58:29.955 CST 1779]CONTEXT: SQL statement "COPY lease_delete FROM '/tmp/C1188732_onesiteproperty.lease_delete.csv' CSV HEADER" merlin
Re: [HACKERS] segmentation fault in execTuples.c#ExecStoreVirtualTuple
Manuel Kniep writes: > ok, after lots of testing I could create a test case > which can be found here https://gist.github.com/rapimo/3c8c1b35270e5854c524 > it's written in Ruby and depends on the gems activerecord, pg, and parallel Hm. I don't see a segfault from this. I do see the CREATE TEMP TABLE command failing with "ctid is NULL", which probably shouldn't be happening ... but no segfault. I tried with current HEAD and with current 9.3 branch tip, so it's possible there's been some relevant bugfix since 9.3.5. > So my feeling is that the row exclusive lock is not taken properly on > inherited tables. I think it's much more likely something to do with EvalPlanQual. There have been some bugs fixed in that area since 9.3.5, but none of them look real promising as an explanation for this ... regards, tom lane
Re: [HACKERS] [BUGS] BUG #12070: hstore extension: hstore_to_json_loose produces invalid JSON
On 1/15/15 2:29 PM, Bruce Momjian wrote: > On Tue, Jan 13, 2015 at 10:56:48AM -0500, Peter Eisentraut wrote: >> On 11/30/14 11:45 AM, Tom Lane wrote: >>> The "API break" isn't a big issue imo. The net effect would be that eg >>> hstore 9.3.6 wouldn't work against a 9.3.5 server. We do that sort of >>> thing *all the time* --- at least twice in the past year, according to >>> a quick scan of the commit logs. If you were changing or removing a >>> function that third-party code might depend on, it'd be problematic, >>> but an addition has no such risk. >> >> This sort of thing is actually a bit of an annoyance, because it means >> that for minor-version upgrades, you need to stop the server before >> unpacking the new version; otherwise the old running server will try to >> load the new hstore module and fail with a symbol lookup error. This can >> increase the downtime significantly. >> >> Yes, we've done this before, and people have gotten bitten by it before. > > Uh, do we ever support installing new binaries while the server is > running? I would hope not. Effectively, we don't, but it's not unreasonable to expect it. Check how your operating system upgrades other server packages such as apache or openssh.
Re: [HACKERS] __attribute__ for non-gcc compilers
On Wed, Jan 14, 2015 at 5:54 PM, Andres Freund wrote: > I think I'd for now simply not define pg_attribute_aligned() on > platforms where it's not supported, instead of defining it empty. If we > need a softer variant we can name it pg_attribute_aligned_if_possible or > something. > > Sounds sane? Yes, that sounds like a much better plan. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] advance local xmin more aggressively
On Thu, Jan 15, 2015 at 3:08 AM, Michael Paquier wrote: > On Mon, Dec 22, 2014 at 7:31 PM, Heikki Linnakangas > wrote: >> Here's an updated version, rebased over the pairing heap code that I just >> committed, and fixing those bugs. > So, are we reaching an outcome for the match happening here? Well, I still like using the existing ResourceOwner pointers to find the snapshots more than introducing a separate data structure just for that. But I'm OK with Heikki committing his version and, if anybody notices the new code becoming a hotspot, we can revisit the issue. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Thu, Jan 15, 2015 at 1:15 PM, Andres Freund wrote: > Hi, > >> The plot thickens! I looped the test, still stock 9.4 as of this time >> and went to lunch. When I came back, the database was in recovery >> mode. Here is the rough sequence of events. >> > > Whoa. That looks scary. Did you see (some of) those errors before? Most > of them should have been emitted independently of being built with > assertions. Nope. >> 1) REINDEXED pg_class (wanted clean slate for full reproduction) >> 2) before too long (unfortunately did not configure timestamps in the >> log) starting seeing: >> >> ERROR: root page 3 of index "pg_class_oid_index" has level 0, >> expected 1 at character 8 >> QUERY: UPDATE CDSRunTable SET >> Finished = clock_timestamp(), >> DidSucceed = _DidSucceed, >> ErrorMessage = _ErrorMessage >> WHERE CDSRunTableId = _CdsRunTableId >> CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 82 at >> SQL statement >> STATEMENT: SELECT CDSReconcileRunTable(2020) >> >> ..and again with a FATAL >> FATAL: root page 3 of index "pg_class_oid_index" has level 0, expected 1 >> ERROR: root page 3 of index "pg_class_oid_index" has level 0, expected 1 >> CONTEXT: SQL statement "UPDATE CDSRunTable SET >> Finished = clock_timestamp(), >> DidSucceed = _DidSucceed, >> ErrorMessage = _ErrorMessage >> WHERE CDSRunTableId = _CdsRunTableId" >> PL/pgSQL function cdsreconcileruntable(bigint) line 82 at SQL statement > > Just to be absolutely clear, those happened *before* the following > errors? And there were no 'during exception cleanup' like errors before? correct. unfortunately, I wasn't logging times so I'm not 100% sure if the 'root page' error happened during the event or sometime before it. I'm addressing that with the log prefix. > Were there any 'LOG: Handling deadlock detected on CdsRunTableId' log > entries before? It's hard to know from here, but the 'during exception > cleanup' indicates a problem in abort handling. 
Were there any deadlock > detected errors close by? > > You're catching deadlock errors in a subtransaction. Hm. yup, exactly 1: WARNING: LOG: Handling deadlock detected on CdsRunTableId 1626 Based on the id, that is proven to be from before the crashing started, although at least one manual server restart happened in the interim. Since it's possible the database is a loss, do you see any value in bootstrapping it again with checksums turned on? One point of note is that this is a brand spanking new SSD; maybe we need to rule out hardware-based corruption? > Alternatively were there any 'LOG: CdsRunTableId %s Failed' messages? If so, what was the cause? There was one, but unfortunately, since the error was trapped, the relevant detail to catch the other side of the deadlock is missed. Barring this analysis, I'm inclined to remove that guard: it was mostly there to deal with what turned out to be some bugs in the code (specifically a nonconstrained update on CDSTable). > * Do you also use lock_timeout/statement_timeout? not on this database -- this is a coding environment and not set up with standard configuration > were any processes killed at that time? later on, yes, by manual restart, and the query was cancelled first, to wit: WARNING: LOG: Handling deadlock detected on CdsRunTableId 1626 ERROR: canceling statement due to user request STATEMENT: SELECT CDSReconcileRunTable(1626) followed by restart -m fast merlin
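For context, the guard being discussed — trapping deadlock_detected inside a plpgsql inner block, which runs as a subtransaction — looks roughly like this. The function name and body are a hypothetical reconstruction; the real CDSReconcileRunTable source is not shown in the thread:

```sql
-- Hypothetical sketch of the pattern under discussion; not the real
-- CDSReconcileRunTable body.
CREATE OR REPLACE FUNCTION cds_reconcile(_run_id bigint) RETURNS void AS $$
BEGIN
    BEGIN  -- inner block with EXCEPTION clause => runs as a subtransaction
        UPDATE cdsruntable
           SET didsucceed = true
         WHERE cdsruntableid = _run_id;
    EXCEPTION
        WHEN deadlock_detected THEN
            RAISE LOG 'Handling deadlock detected on CdsRunTableId %', _run_id;
    END;
END;
$$ LANGUAGE plpgsql;
```

The relevance to the crash reports: aborting the subtransaction is what exercises the lock-release and subXID cleanup paths that the "during exception cleanup" warnings point at.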
Re: [HACKERS] [BUGS] BUG #12070: hstore extension: hstore_to_json_loose produces invalid JSON
On Tue, Jan 13, 2015 at 10:56:48AM -0500, Peter Eisentraut wrote: > On 11/30/14 11:45 AM, Tom Lane wrote: > > The "API break" isn't a big issue imo. The net effect would be that eg > > hstore 9.3.6 wouldn't work against a 9.3.5 server. We do that sort of > > thing *all the time* --- at least twice in the past year, according to > > a quick scan of the commit logs. If you were changing or removing a > > function that third-party code might depend on, it'd be problematic, > > but an addition has no such risk. > > This sort of thing is actually a bit of an annoyance, because it means > that for minor-version upgrades, you need to stop the server before > unpacking the new version; otherwise the old running server will try to > load the new hstore module and fail with a symbol lookup error. This can > increase the downtime significantly. > > Yes, we've done this before, and people have gotten bitten by it before. Uh, do we ever support installing new binaries while the server is running? I would hope not. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On 2015-01-15 20:15:42 +0100, Andres Freund wrote: > > WARNING: did not find subXID 14955 in MyProc > > CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 > > during exception cleanup > > WARNING: you don't own a lock of type RowExclusiveLock > > CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 > > during exception cleanup > > LOG: could not send data to client: Broken pipe > > CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 > > during exception cleanup > > STATEMENT: SELECT CDSReconcileRunTable(2151) > > WARNING: ReleaseLockIfHeld: failed?? > > CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 > > during exception cleanup > > ERROR: failed to re-find shared proclock object > > CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 > > during exception cleanup > > STATEMENT: SELECT CDSReconcileRunTable(2151) > > WARNING: AbortSubTransaction while in ABORT state > > WARNING: did not find subXID 14955 in MyProc > > WARNING: you don't own a lock of type AccessShareLock > > WARNING: ReleaseLockIfHeld: failed?? > > ERROR: failed to re-find shared proclock object > > WARNING: AbortSubTransaction while in ABORT state > > WARNING: did not find subXID 14955 in MyProc > > WARNING: you don't own a lock of type AccessShareLock > > WARNING: ReleaseLockIfHeld: failed?? > > WARNING: you don't own a lock of type ShareLock > > TRAP: FailedAssertion("!(FastPathStrongRelationLocks->count[fasthashcode] > > > 0)", File: "lock.c", Line: 1240) > > LOG: server process (PID 10117) was terminated by signal 6: Aborted > > LOG: terminating any other active server processes > > Ick. > > Were there any 'LOG: Handling deadlock detected on CdsRunTableId' log > entries before? It's hard to know from here, but the 'during exception > cleanup' indicates a problem in abort handling. Were there any deadlock > detected errors closeby? Alternatively were there any 'LOG: CdsRunTableId %s Failed' messages? If so, what was the cause? 
> You're catching deadlock errors in a subtransaction. Hm. A couple of questions: * Do you also use lock_timeout/statement_timeout? If so, what are their settings + deadlock_timeout? * Were any processes killed at that time? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
Hi, > The plot thickens! I looped the test, still stock 9.4 as of this time > and went to lunch. When I came back, the database was in recovery > mode. Here is the rough sequence of events. > Whoa. That looks scary. Did you see (some of) those errors before? Most of them should have been emitted independently of being built with assertions. > 1) REINDEXED pg_class (wanted clean slate for full reproduction) > 2) before too long (unfortunately did not configure timestamps in the > log) starting seeing: > > ERROR: root page 3 of index "pg_class_oid_index" has level 0, > expected 1 at character 8 > QUERY: UPDATE CDSRunTable SET > Finished = clock_timestamp(), > DidSucceed = _DidSucceed, > ErrorMessage = _ErrorMessage > WHERE CDSRunTableId = _CdsRunTableId > CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 82 at > SQL statement > STATEMENT: SELECT CDSReconcileRunTable(2020) > > ..and again with a FATAL > FATAL: root page 3 of index "pg_class_oid_index" has level 0, expected 1 > ERROR: root page 3 of index "pg_class_oid_index" has level 0, expected 1 > CONTEXT: SQL statement "UPDATE CDSRunTable SET > Finished = clock_timestamp(), > DidSucceed = _DidSucceed, > ErrorMessage = _ErrorMessage > WHERE CDSRunTableId = _CdsRunTableId" > PL/pgSQL function cdsreconcileruntable(bigint) line 82 at SQL statement Just to be absolutely clear, those happened *before* the following errors? And there were no 'during exception cleanup' like errors before? > 3) shortly (?) 
after that, I got: > > WARNING: did not find subXID 14955 in MyProc > CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 > during exception cleanup > WARNING: you don't own a lock of type RowExclusiveLock > CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 > during exception cleanup > LOG: could not send data to client: Broken pipe > CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 > during exception cleanup > STATEMENT: SELECT CDSReconcileRunTable(2151) > WARNING: ReleaseLockIfHeld: failed?? > CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 > during exception cleanup > ERROR: failed to re-find shared proclock object > CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 > during exception cleanup > STATEMENT: SELECT CDSReconcileRunTable(2151) > WARNING: AbortSubTransaction while in ABORT state > WARNING: did not find subXID 14955 in MyProc > WARNING: you don't own a lock of type AccessShareLock > WARNING: ReleaseLockIfHeld: failed?? > ERROR: failed to re-find shared proclock object > WARNING: AbortSubTransaction while in ABORT state > WARNING: did not find subXID 14955 in MyProc > WARNING: you don't own a lock of type AccessShareLock > WARNING: ReleaseLockIfHeld: failed?? > WARNING: you don't own a lock of type ShareLock > TRAP: FailedAssertion("!(FastPathStrongRelationLocks->count[fasthashcode] > > 0)", File: "lock.c", Line: 1240) > LOG: server process (PID 10117) was terminated by signal 6: Aborted > LOG: terminating any other active server processes Ick. Were there any 'LOG: Handling deadlock detected on CdsRunTableId' log entries before? It's hard to know from here, but the 'during exception cleanup' indicates a problem in abort handling. Were there any deadlock detected errors closeby? You're catching deadlock errors in a subtransaction. Hm. 
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] XLOG_PARAMETER_CHANGE handling of wal_log_hints
On 01/07/2015 11:53 AM, Petr Jelinek wrote: On 07/01/15 00:59, Michael Paquier wrote: On Wed, Jan 7, 2015 at 4:24 AM, Petr Jelinek wrote: Hi, when I was fixing how commit_ts handles XLOG_PARAMETER_CHANGE I noticed that for wal_log_hints we assign the value in ControlFile to the current value instead of the value that comes from WAL. ISTM it has only informational value, so it should not have any practical impact, but it looks like a bug anyway, so here is a simple patch to change that. Right. That's something that should get fixed in 9.4 as well.. ControlFile->track_commit_timestamp = track_commit_timestamp; The new value of track_commit_timestamp should be taken as well from xlrec on master btw. That's part of the larger fix for CommitTs that I sent separately in response to the bug report from Fujii - there is much more to be done there for CommitTs than just this. Pushed this part now. I'll let you and Fujii handle that larger fix. Thanks! - Heikki
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Thu, Jan 15, 2015 at 8:02 AM, Merlin Moncure wrote: > On Thu, Jan 15, 2015 at 6:04 AM, Heikki Linnakangas > wrote: >> On 01/15/2015 03:23 AM, Peter Geoghegan wrote: >>> >>> So now the question is: how did that inconsistency arise? It didn't >>> necessarily arise at the time of the (presumed) split of block 2 to >>> create 9. It could be that the opaque area was changed by something >>> else, some time later. I'll investigate more. >> >> >> Merlin, could you re-run the test with a WAL archive (if you don't have one >> already), and then run pg_xlogdump, filtering it to show only the changes to >> the index? That should show us how the index got to be the way it is. Also, >> if you could post a copy of the raw relation file for pg_class_oid_index; I >> assume it's not too large. >> >> Something like: >> >> pg_xlogdump -r Btree -p walarchive/ -s 0/20035D0 | grep 11917 >> >> 11917 is the relfilenode of pg_class_oid_index on a freshly initdb'd >> cluster. In case it's not the same on your system, you can use oid2name to >> find it out. > > I'm on it. Will try this first, then patch removal. > > Question: Coming in this morning I did an immediate restart and logged > into the database and queried pg_class via index. Everything was > fine, and the leftright verify returns nothing. How did it repair > itself without a reindex? The plot thickens! I looped the test, still stock 9.4 as of this time and went to lunch. When I came back, the database was in recovery mode. Here is the rough sequence of events. 
1) REINDEXED pg_class (wanted clean slate for full reproduction) 2) before too long (unfortunately did not configure timestamps in the log) starting seeing: ERROR: root page 3 of index "pg_class_oid_index" has level 0, expected 1 at character 8 QUERY: UPDATE CDSRunTable SET Finished = clock_timestamp(), DidSucceed = _DidSucceed, ErrorMessage = _ErrorMessage WHERE CDSRunTableId = _CdsRunTableId CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 82 at SQL statement STATEMENT: SELECT CDSReconcileRunTable(2020) ..and again with a FATAL FATAL: root page 3 of index "pg_class_oid_index" has level 0, expected 1 ERROR: root page 3 of index "pg_class_oid_index" has level 0, expected 1 CONTEXT: SQL statement "UPDATE CDSRunTable SET Finished = clock_timestamp(), DidSucceed = _DidSucceed, ErrorMessage = _ErrorMessage WHERE CDSRunTableId = _CdsRunTableId" PL/pgSQL function cdsreconcileruntable(bigint) line 82 at SQL statement 3) shortly (?) after that, I got: WARNING: did not find subXID 14955 in MyProc CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 during exception cleanup WARNING: you don't own a lock of type RowExclusiveLock CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 during exception cleanup LOG: could not send data to client: Broken pipe CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 during exception cleanup STATEMENT: SELECT CDSReconcileRunTable(2151) WARNING: ReleaseLockIfHeld: failed?? CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 during exception cleanup ERROR: failed to re-find shared proclock object CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35 during exception cleanup STATEMENT: SELECT CDSReconcileRunTable(2151) WARNING: AbortSubTransaction while in ABORT state WARNING: did not find subXID 14955 in MyProc WARNING: you don't own a lock of type AccessShareLock WARNING: ReleaseLockIfHeld: failed?? 
ERROR: failed to re-find shared proclock object WARNING: AbortSubTransaction while in ABORT state WARNING: did not find subXID 14955 in MyProc WARNING: you don't own a lock of type AccessShareLock WARNING: ReleaseLockIfHeld: failed?? WARNING: you don't own a lock of type ShareLock TRAP: FailedAssertion("!(FastPathStrongRelationLocks->count[fasthashcode] > 0)", File: "lock.c", Line: 1240) LOG: server process (PID 10117) was terminated by signal 6: Aborted LOG: terminating any other active server processes 4) while I was still at lunch, the database started up again, and quickly fell over: LOG: database system is ready to accept connections LOG: autovacuum launcher started ERROR: could not open relation with OID 3597 STATEMENT: SELECT CDSReconcileRunTable(2268) ERROR: could not open relation with OID 3597 STATEMENT: SELECT CDSReconcileRunTable(2270) ERROR: could not open relation with OID 3597 STATEMENT: SELECT CDSReconcileRunTable(2272) ERROR: could not open relation with OID 3597 STATEMENT: SELECT CDSReconcileRunTable(2273) TRAP: FailedAssertion("!(( ((void) ((bool) ((! assert_enabled) || ! (!(((bool) (((const void*)(&htup->t_ctid) != ((void *)0)) && ((&htup->t_ctid)->ip_posid != 0) LOG: server process (PID 30520) was terminated by signal 6: Aborted LOG: terminating any other active server processes this repeated a very large number of times. Occasionally, but infrequently, I'd see duri
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Thu, Jan 15, 2015 at 6:02 AM, Merlin Moncure wrote: > Question: Coming in this morning I did an immediate restart and logged > into the database and queried pg_class via index. Everything was > fine, and the leftright verify returns nothing. How did it repair > itself without a reindex? Maybe the relevant WAL records are more or less correct, and recovery restores the database to its correct state, as opposed to the state it was actually in at the time of the crash due to a race condition or whatever. Could you possibly send bt_page_items() against block 9 of the index after recovery? It probably looks totally sane, but it would be nice to be sure, if you happen to still have the data. -- Peter Geoghegan
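For reference, the query being asked for would look something like this, assuming the pageinspect extension is available:

```sql
CREATE EXTENSION IF NOT EXISTS pageinspect;

-- dump the item pointers and key data of block 9 of the index
SELECT itemoffset, ctid, itemlen, nulls, vars, data
FROM bt_page_items('pg_class_oid_index', 9);
```

The `data` column holds the raw index tuple bytes, which is enough to check whether the page's keys and right-links look sane after recovery.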
Re: [HACKERS] OOM on EXPLAIN with lots of nodes
Robert Haas writes: > On Tue, Jan 13, 2015 at 8:16 PM, Tom Lane wrote: >> Not sure whether to just commit this to HEAD and call it a day, or to >> risk back-patching. > I think we need to back-patch something; that's a pretty nasty > regression, and I have some EDB-internal reports that might be from > the same cause. OK. I went ahead and made the necessary code changes to avoid changing the struct size in released branches, as per Heikki's idea. regards, tom lane
Re: [HACKERS] compress method for spgist - 2
On 01/15/2015 09:28 AM, Michael Paquier wrote: Marking this patch as returned with feedback because it has been waiting for input from the author for a couple of weeks now. Heikki, the refactoring patch has some value, are you planning to push it? I think you're mixing this up with the other thread, "btree_gin and ranges". I already pushed the refactoring patch I posted to that thread (http://www.postgresql.org/message-id/54983cf2.80...@vmware.com). I haven't proposed any refactoring related to spgist. - Heikki
Re: [HACKERS] [PATCH] explain sortorder
"Timmer, Marius" writes: > attached is version 8, fixing remaining issues, adding docs and tests as > requested/agreed. I'll pick this up --- I've been a bit lax about helping with this commitfest. regards, tom lane
Re: [HACKERS] segmentation fault in execTuples.c#ExecStoreVirtualTuple
On 6. Januar 2015 at 07:20:21, Michael Paquier (michael.paqu...@gmail.com) wrote: > On Tue, Jan 6, 2015 at 12:39 AM, Manuel Kniep wrote: > > Hi, > > > > we are running postgres 9.3.5 on gentoo linux kernel 3.16.5, compiled with > > gcc 4.8.3 > > Any ideas ? > > > #17 0x0062bb9d in SPI_execute_with_args ( > src=0x22b880bb0 "\n CREATE TEMPORARY TABLE > > [...] > > #33 0x7f363555ab97 in plpgsql_exec_function (func=0xd888c8, > > fcinfo=0x7aa89a60) > at pl_exec.c:321 > > #34 0x7f3632be in plpgsql_call_handler (fcinfo=0x7aa89a60) at > > pl_handler.c:129 > > [...] > > #46 0x0072e4eb in exec_simple_query ( > > query_string=0xd633b0 "SELECT 'event' as item, '2014-12-30' as date, > > 'Backends::Backend9' > as backend, '33' as bucket, * FROM materialize_events('2014-11-20', > '2014-12-30')") > at postgres.c:1048 > From the backtrace you are showing, you are creating a temporary table > with CREATE TABLE AS within a plpgsql function. Could you provide a > self-contained test case? > -- > Michael ok, after lots of testing I could create a test case, which can be found here: https://gist.github.com/rapimo/3c8c1b35270e5854c524 It's written in Ruby and depends on the gems activerecord, pg, and parallel. The issue is basically the combination of a writable CTE on a parent table and concurrent updates on the child table. The test case creates 3 child tables with 1e6 rows each: CREATE TABLE test_data( tracker_id integer not null, created_at timestamp, processed boolean ); and then runs the CTE command CREATE TEMPORARY TABLE outtbl ON COMMIT DROP AS WITH affected AS ( UPDATE test_data t SET processed = true WHERE tracker_id BETWEEN 200 AND 300 AND NOT processed RETURNING date(t.created_at) as date, tracker_id, created_at ) SELECT * FROM affected while this command is running, 5 other concurrent updates do a UPDATE test_data SET tracker_id = 400 WHERE tracker_id = 200 and created_at = '2015-01-02 00:03:20' and date(created_at) = '2015-01-02' So my feeling is that the row exclusive lock is not taken 
properly on inherited tables. I tried the test case without inheritance and it did not segfault. If you don't like Ruby I might be able to rewrite the test case in bash. thanks Manuel
Re: [HACKERS] s_lock.h default definitions are rather confused
On 2015-01-15 11:56:24 -0500, Tom Lane wrote: > Andres Freund writes: > > On 2015-01-15 10:57:10 -0500, Tom Lane wrote: > >> While I'll not cry too hard when we decide to break C89 compatibility, > >> I don't want it to happen accidentally; so having a pretty old-school > >> compiler in the farm seems important to me. > > > I'd worked on setting up a modern gcc (or was it clang?) with the > > appropriate flags to warn about !C89 stuff some time back, but failed > > because of configure bugs. > > My recollection is that there isn't any reasonable way to get gcc to > warn about C89 violations as such. -ansi -pedantic is not very fit > for the purpose. It was clang, which has -Wc99-extensions/-Wc11-extensions. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] s_lock.h default definitions are rather confused
Andres Freund writes: > On 2015-01-15 10:57:10 -0500, Tom Lane wrote: >> While I'll not cry too hard when we decide to break C89 compatibility, >> I don't want it to happen accidentally; so having a pretty old-school >> compiler in the farm seems important to me. > I'd worked on setting up a modern gcc (or was it clang?) with the > appropriate flags to warn about !C89 stuff some time back, but failed > because of configure bugs. My recollection is that there isn't any reasonable way to get gcc to warn about C89 violations as such. -ansi -pedantic is not very fit for the purpose. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] explain sortorder (fwd)
> > > From: "Timmer, Marius" > > Hi, > > attached is version 8, fixing remaining issues, adding docs and tests as > requested/agreed. > > > Marius & Arne > > This looks good to me. Test coverage seems complete. Doc updates are included. Output format looks like it should be acceptable to Heikki. I'll mark this as ready for committer. Thanks for the patch! Mike
Re: [HACKERS] [PATCH] explain sortorder
Hi, attached is version 8, fixing remaining issues, adding docs and tests as requested/agreed. Marius & Arne --- Marius Timmer Zentrum für Informationsverarbeitung Westfälische Wilhelms-Universität Münster Einsteinstraße 60 mtimm...@uni-muenster.de On 14.01.2015 at 17:42, Arne Scheffer wrote: > Hi, > > we will also remove the following "is LC_COLLATE" hint in the next version, > showing only mandatory info as suggested. > >/* for those who use COLLATE although their default is already > the wanted */ >if (strcmp(collname, localeptr) == 0) >{ >appendStringInfo(sortorderInformation, " (%s is > LC_COLLATE)", collname); >} > > Anybody insisting on that? > > Arne > > Note: I see that at the moment we use the wrong default for DESC. We'll fix that. > > On Wed, 14 Jan 2015, Heikki Linnakangas wrote: > >> On 01/14/2015 05:26 PM, Timmer, Marius wrote: >>> Hello Heikki, >>> abbreviated version: >>> Sorry, the problem is only the unhandy patch text format, not different >>> opinions on how to proceed. >>> Long version: >>> The v7 patch file already addressed your suggestions, >>> but the file contained several (old) local commits, >>> the new ones at the end of the patch text/file. >> >> Ah, missed that. I stopped reading when I saw the old stuff there :-). >> >>> v7.1 is attached and addresses this issue, providing a clean patch file. >> >> Ok, thanks, will take a look. >> >>> V8 will - as mentioned - add missing docs and regression tests, >> >> Great! >> >> - Heikki >> >> explain_sortorder-v8.patch Description: explain_sortorder-v8.patch -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] EvalPlanQual behaves oddly for FDW queries involving system columns
Etsuro Fujita wrote: > *** > *** 817,826 InitPlan(QueryDesc *queryDesc, int eflags) > --- 818,833 > break; > case ROW_MARK_COPY: > /* there's no real table here ... */ > + relkind = rt_fetch(rc->rti, > rangeTable)->relkind; > + if (relkind == RELKIND_FOREIGN_TABLE) > + relid = getrelid(rc->rti, rangeTable); > + else > + relid = InvalidOid; > relation = NULL; > break; > default: > elog(ERROR, "unrecognized markType: %d", > rc->markType); > + relid = InvalidOid; > relation = NULL;/* keep compiler quiet > */ > break; > } [ ... ] > --- 2326,2342 > > /* build a temporary HeapTuple control structure */ > tuple.t_len = HeapTupleHeaderGetDatumLength(td); > ! /* if relid is valid, rel is a foreign table; set > system columns */ > ! if (OidIsValid(erm->relid)) > ! { > ! tuple.t_self = td->t_ctid; > ! tuple.t_tableOid = erm->relid; > ! } > ! else > ! { > ! ItemPointerSetInvalid(&(tuple.t_self)); > ! tuple.t_tableOid = InvalidOid; > ! } > tuple.t_data = td; > > /* copy and store tuple */ I find this arrangement confusing and unnecessary -- surely if you have access to the ExecRowMark here, you could just obtain the relid with RelationGetRelid instead of saving the OID beforehand? And if you have the Relation, you could just consult the relkind at that point instead of relying on the relid being set or not as a flag to indicate whether the rel is foreign. I didn't look at anything else in the patch so I can't comment more on it, but the change to ExecRowMark caught my attention. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Safe memory allocation functions
Robert Haas wrote: > Hmm, I understood Tom to be opposing the idea of a palloc variant that > returns NULL on failure, and I understand you to be supporting it. > But maybe I'm confused. Your understanding seems correct to me. I was just saying that your description of Tom's argument to dislike the idea seemed at odds with what he was actually saying. > Anyway, I support it. I agree that there are > systems (or circumstances?) where malloc is going to succeed and then > the world will blow up later on anyway, but I don't think that means > that an out-of-memory error is the only sensible response to a palloc > failure; returning NULL seems like a sometimes-useful alternative. > > I do think that "safe" is the wrong suffix. Maybe palloc_soft_fail() > or palloc_null() or palloc_no_oom() or palloc_unsafe(). I liked palloc_noerror() better myself FWIW. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
Michael Paquier wrote: > Andres, this patch needs more effort from the author, right? So > marking it as returned with feedback. I will give this patch a look in the current commitfest. If you could please set it to 'needs review' instead, with me as reviewer so that I don't forget, I would appreciate it. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Merging postgresql.conf and postgresql.auto.conf
On Thu, Jan 15, 2015 at 2:02 PM, Amit Kapila wrote: > On Wed, Jan 14, 2015 at 9:01 PM, Sawada Masahiko > wrote: >> >> Hi all, >> >> The postgresql.auto.conf is loaded after loading of postgresql.conf >> whenever configuration file is loaded or reloaded. >> This means that parameter in postgresql.auto.conf is quite high >> priority, so the parameter in postgresql.conf does not work at all >> even if user set it manually. >> >> If user want to change stopped postgresql server then user need to >> merge two configuration file(postgresql.conf and postgresql.auto.conf) >> while maintaining the consistency manually. >> > > I think one way to currently do it is with the help of Alter System .. Reset > command. Basically user can check via pg_settings, if the variable > belongs to postgresql.auto.conf and he knows already a duplicate copy > of same is available in postgresql.conf and that is the value he want to > use, then RESET command could be used to remove the setting from > postgresql.auto.conf > That sounds convenient, e.g. it could reset one variable with a RESET-style command, and reset all variables with RESET ALL. >> From an operational perspective having a written config with duplicate >> entries is not good thing. >> I think we need to merge two configuration file into one (or more than >> one, if it uses like 'include' word) >> >> The one solution is to add merging tool/command which merges two >> configuration file while maintaining the consistency. > > If you want to think of some solution which can make the usage of Alter > System more convenient, then we should think only in terms of change/ > remove the value in postgresql.auto.conf as that is the place where we > have added values programmatically. 
> > One thought I have in this line is that currently there doesn't seem to be > a way to know if the setting has an entry both in postgresql.conf and > postgresql.auto.conf, if we can have some way of knowing the same > (pg_settings?), then it could be convenient for user to decide if the value > in postgresql.auto.conf is useful or not and if it's not useful then use > Alter System .. Reset command to remove the same from > postgresql.auto.conf. One way would be for pg_settings to show the file name each variable came from. But that would not reflect the current contents of postgresql.conf, so we would need to parse postgresql.conf again at that time. Regards, --- Sawada Masahiko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] s_lock.h default definitions are rather confused
On 2015-01-15 10:57:10 -0500, Tom Lane wrote: > * I've got gaur configured so it will throw "array subscript of type char" > complaints whenever somebody forgets to cast a function argument > to unsigned char. But, but. That would never happen to anyone (hides). > While I'll not cry too hard when we decide to break C89 compatibility, > I don't want it to happen accidentally; so having a pretty old-school > compiler in the farm seems important to me. Yea, agreed. I also don't think we want to adopt all of C99 at once, but rather do it piecemeal, feature by feature. I'd worked on setting up a modern gcc (or was it clang?) with the appropriate flags to warn about !C89 stuff some time back, but failed because of configure bugs. I think Robert has committed most of the fixes since, and I now actually could do the rest one of these days... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] s_lock.h default definitions are rather confused
I wrote: > I've launched a run now, expect results from gcc HEAD in an hour and > a half or so. ... and it's happy. Thanks! BTW, the reason I went to the trouble of cranking up the buildfarm scripts on that machine (and it was painful :-() is that I don't believe any other buildfarm members are running compilers old enough to complain about some of the things these will. In particular: * I've got gaur configured so it will throw "array subscript of type char" complaints whenever somebody forgets to cast a function argument to unsigned char. * pademelon will complain about // comments, variable-sized local arrays, flexible array syntax, non-static function definition after static declaration, and probably some other C89 violations that I am not remembering right now. While I'll not cry too hard when we decide to break C89 compatibility, I don't want it to happen accidentally; so having a pretty old-school compiler in the farm seems important to me. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] parallel mode and parallel contexts
On Thu, Jan 15, 2015 at 9:09 AM, Amit Kapila wrote: > On Thu, Jan 15, 2015 at 6:52 PM, Robert Haas wrote: >> On Thu, Jan 15, 2015 at 7:00 AM, Amit Kapila >> wrote: >> > +HandleParallelMessages(void) >> > +{ >> > .. >> > .. >> > + for (i = 0; i < pcxt->nworkers; ++i) >> > + { >> > + /* >> > + * Read messages for as long as we have an error queue; if we >> > + * have hit (or hit while reading) ReadyForQuery, this will go to >> > + * NULL. >> > + */ >> > + while (pcxt->worker[i].error_mqh != NULL) >> > + { >> > + shm_mq_result res; >> > + >> > + CHECK_FOR_INTERRUPTS(); >> > + >> > + res = shm_mq_receive(pcxt->worker[i].error_mqh, &nbytes, >> > + &data, true); >> > + if (res == SHM_MQ_SUCCESS) >> > >> > Here we are checking the error queue for all the workers and this loop >> > will continue untill all have sent ReadyForQuery() message ('Z') which >> > will make this loop continue till all workers have finished their work. >> > Assume situation where first worker has completed the work and sent >> > 'Z' message and second worker is still sending some tuples, now above >> > code will keep on waiting for 'Z' message from second worker and won't >> > allow to receive tuples sent by second worker till it send 'Z' message. >> > >> > As each worker send its own 'Z' message after completion, so ideally >> > the above code should receive the message only for worker which has >> > sent the message. I think for that it needs worker information who has >> > sent the message. >> >> Are you talking about HandleParallelMessages() or >> WaitForParallelWorkersToFinish()? The former doesn't wait for >> anything; it just handles any messages that are available now. > > I am talking about HandleParallelMessages(). It doesn't wait but > it is looping which will make it run for longer time as explained > above. 
Just imagine a case where there are two workers and first > worker has sent 'Z' message and second worker is doing some > work, now in such a scenario loop will not finish until second worker > also send 'Z' message or error. Am I missing something? Blah. You're right. I intended to write this loop so that it only runs until shm_mq_receive() returns SHM_MQ_WOULD_BLOCK. But that's not what I did. Will fix. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Safe memory allocation functions
On Thu, Jan 15, 2015 at 8:42 AM, Andres Freund wrote: > On 2015-01-15 08:40:34 -0500, Robert Haas wrote: >> I do think that "safe" is the wrong suffix. Maybe palloc_soft_fail() >> or palloc_null() or palloc_no_oom() or palloc_unsafe(). > > palloc_or_null()? That'd work for me, too. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Out-of-bounds write and incorrect detection of trigger file in pg_standby
On Wed, Jan 14, 2015 at 9:27 PM, Michael Paquier wrote: > On Thu, Jan 15, 2015 at 7:13 AM, Robert Haas wrote: >> Instead of doing this: >> >> if (len < sizeof(buf)) >> buf[len] = '\0'; >> >> ...I would suggest making the size of the buffer one greater than the >> size of the read(), and then always nul-terminating the buffer. It >> seems to me that would make the code easier to reason about. > How about the attached then? This way we still detect the same way any > invalid values: > - if ((len = read(fd, buf, sizeof(buf))) < 0) > + if ((len = read(fd, buf, sizeof(buf) - 1)) < 0) Committed and back-patched all the way. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Minor configure tweak to simplify adjusting gcc warnings
On 2015-01-15 09:25:29 -0500, Tom Lane wrote: > Andres Freund writes: > > FWIW, if we moved the > > CFLAGS="$CFLAGS $user_CFLAGS" > > further down, it'd have advantage that compiling with -Werror would be > > more realistic. Right now doing so breaks about half of the feature > > checking configure checks because of warnings. E.g. on my platform it > > fails to detect 64bit integers, inline, ... > > Given the way autoconf works, I think trying to run the configure tests > with -Werror is a fool's errand. Yea, agreed. > OTOH, not applying the user's CFLAGS during configure is a nonstarter > as well. Fair enough. What about just filtering out -Werror during configure alone? Or just specifying -Wno-error during it? Given that it really can't work properly, that seems like a relatively simple solution. > So rather than trying to inject -Werror via generic CFLAGS, it would > likely be better to have some means of injecting it only into the > actual build and not into the configure run. > > There is at least one way to do that already (Makefile.custom). Not > sure if it's worth inventing an --enable-warnings-as-errors type of > switch to do it more directly. I think Makefile.custom is really rather hard to discover for new developers. That its inclusion is commented with # NOTE: Makefile.custom is from the pre-Autoconf days of PostgreSQL. # You are liable to shoot yourself in the foot if you use it without # knowing exactly what you're doing. The preferred (and more # reliable) method is to communicate what you want to do to the # configure script, and leave the makefiles alone. doesn't help... I'd also like to have a easy way of adding CFLAGS to configure, instead of overwriting them. There's COPT for make, but that doesn't persist... 
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Minor configure tweak to simplify adjusting gcc warnings
Andres Freund writes: > FWIW, if we moved the > CFLAGS="$CFLAGS $user_CFLAGS" > further down, it'd have advantage that compiling with -Werror would be > more realistic. Right now doing so breaks about half of the feature > checking configure checks because of warnings. E.g. on my platform it > fails to detect 64bit integers, inline, ... Given the way autoconf works, I think trying to run the configure tests with -Werror is a fool's errand. OTOH, not applying the user's CFLAGS during configure is a nonstarter as well. So rather than trying to inject -Werror via generic CFLAGS, it would likely be better to have some means of injecting it only into the actual build and not into the configure run. There is at least one way to do that already (Makefile.custom). Not sure if it's worth inventing an --enable-warnings-as-errors type of switch to do it more directly. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] orangutan seizes up during isolation-check
On Thu, Jan 15, 2015 at 1:04 AM, Noah Misch wrote: > On Wed, Jan 14, 2015 at 04:48:53PM -0500, Peter Eisentraut wrote: >> What I'm seeing now is that the unaccent regression tests when run under >> make check-world abort with >> >> FATAL: postmaster became multithreaded during startup >> HINT: Set the LC_ALL environment variable to a valid locale. > > contrib/unaccent/Makefile sets NO_LOCALE=1, so that makes sense. I expect the > patch over here will fix it: > http://www.postgresql.org/message-id/20150109063015.ga2491...@tornado.leadboat.com I just hit this same problem; are you going to commit that patch soon? It's rather annoying to have make check-world fail. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] s_lock.h default definitions are rather confused
and...@anarazel.de (Andres Freund) writes: > On 2015-01-14 19:31:18 -0500, Tom Lane wrote: >> Andres Freund writes: >>> Right now I think a #ifdef/undef S_UNLOCK in the relevant gcc section >>> sufficient and acceptable. It's after all the HPPA section that doesn't >>> really play by the rules. >> Works for me. > Pushed something like that. Gaur has the note 'Runs infrequently' - I'm > not sure whether that means we'll see the results anytime soon... That means it runs when I boot it up and launch a run ;-) ... the machine's old enough (and noisy enough) that I don't want to leave it turned on 24x7. I've launched a run now, expect results from gcc HEAD in an hour and a half or so. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] parallel mode and parallel contexts
On Thu, Jan 15, 2015 at 6:52 PM, Robert Haas wrote: > > On Thu, Jan 15, 2015 at 7:00 AM, Amit Kapila wrote: > > +HandleParallelMessages(void) > > +{ > > .. > > .. > > + for (i = 0; i < pcxt->nworkers; ++i) > > + { > > + /* > > + * Read messages for as long as we have an error queue; if we > > + * have hit (or hit while reading) ReadyForQuery, this will go to > > + * NULL. > > + */ > > + while (pcxt->worker[i].error_mqh != NULL) > > + { > > + shm_mq_result res; > > + > > + CHECK_FOR_INTERRUPTS(); > > + > > + res = shm_mq_receive(pcxt->worker[i].error_mqh, &nbytes, > > + &data, true); > > + if (res == SHM_MQ_SUCCESS) > > > > Here we are checking the error queue for all the workers and this loop > > will continue untill all have sent ReadyForQuery() message ('Z') which > > will make this loop continue till all workers have finished their work. > > Assume situation where first worker has completed the work and sent > > 'Z' message and second worker is still sending some tuples, now above > > code will keep on waiting for 'Z' message from second worker and won't > > allow to receive tuples sent by second worker till it send 'Z' message. > > > > As each worker send its own 'Z' message after completion, so ideally > > the above code should receive the message only for worker which has > > sent the message. I think for that it needs worker information who has > > sent the message. > > Are you talking about HandleParallelMessages() or > WaitForParallelWorkersToFinish()? The former doesn't wait for > anything; it just handles any messages that are available now. I am talking about HandleParallelMessages(). It doesn't wait but it is looping which will make it run for longer time as explained above. Just imagine a case where there are two workers and first worker has sent 'Z' message and second worker is doing some work, now in such a scenario loop will not finish until second worker also send 'Z' message or error. Am I missing something? With Regards, Amit Kapila. 
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Thu, Jan 15, 2015 at 6:04 AM, Heikki Linnakangas wrote: > On 01/15/2015 03:23 AM, Peter Geoghegan wrote: >> >> So now the question is: how did that inconsistency arise? It didn't >> necessarily arise at the time of the (presumed) split of block 2 to >> create 9. It could be that the opaque area was changed by something >> else, some time later. I'll investigate more. > > > Merlin, could you re-run the test with a WAL archive (if you don't have one > already), and then run pg_xlogdump, filtering it to show only the changes to > the index? That should show us how the index got to be the way it is. Also, > if you could post a copy of the raw relation file for pg_class_oid_index; I > assume it's not too large. > > Something like: > > pg_xlogdump -r Btree -p walarchive/ -s 0/20035D0 | grep 11917 > > 11917 is the relfilenode of pg_class_oid_index on a freshly initdb'd > cluster. In case it's not the same on your system, you can use oid2name to > find it out. I'm on it. Will try this first, then patch removal. Question: Coming in this morning I did an immediate restart and logged into the database and queried pg_class via index. Everything was fine, and the leftright verify returns nothing. How did it repair itself without a reindex? merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Safe memory allocation functions
On 2015-01-15 08:40:34 -0500, Robert Haas wrote: > I do think that "safe" is the wrong suffix. Maybe palloc_soft_fail() > or palloc_null() or palloc_no_oom() or palloc_unsafe(). palloc_or_null()? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Safe memory allocation functions
On Wed, Jan 14, 2015 at 9:42 PM, Alvaro Herrera wrote: > Robert Haas wrote: >> On Tue, Jan 13, 2015 at 10:10 AM, Tom Lane wrote: >> > However, there is a larger practical problem with this whole concept, >> > which is that experience should teach us to be very wary of the assumption >> > that asking for memory the system can't give us will just lead to nice >> > neat malloc-returns-NULL behavior. Any small perusal of the mailing list >> > archives will remind you that very often the end result will be SIGSEGV, >> > OOM kills, unrecoverable trap-on-write when the kernel realizes it can't >> > honor a copy-on-write promise, yadda yadda. Agreed that it's arguable >> > that these only occur in misconfigured systems ... but misconfiguration >> > appears to be the default in a depressingly large fraction of systems. >> > (This is another reason for "_safe" not being the mot juste :-() >> >> I don't really buy this. It's pretty incredible to think that after a >> malloc() failure there is absolutely no hope of carrying on sanely. >> If that were true, we wouldn't be able to ereport() out-of-memory >> errors at any severity less than FATAL, but of course it doesn't work >> that way. Moreover, AllocSetAlloc() contains malloc() and, if that >> fails, calls malloc() again with a smaller value, without even >> throwing an error. > > I understood Tom's point differently: instead of malloc() failing, > malloc() will return a supposedly usable pointer, but later usage of it > will lead to a crash of some sort. We know this does happen in reality, > because people do report it; but we also know how to fix it. And for > systems that have been correctly set up, the new behavior (using some > plan B for when malloc actually fails instead of spuriously succeeding > only to cause a later crash) will be much more convenient. Hmm, I understood Tom to be opposing the idea of a palloc variant that returns NULL on failure, and I understand you to be supporting it. But maybe I'm confused. 
Anyway, I support it. I agree that there are systems (or circumstances?) where malloc is going to succeed and then the world will blow up later on anyway, but I don't think that means that an out-of-memory error is the only sensible response to a palloc failure; returning NULL seems like a sometimes-useful alternative. I do think that "safe" is the wrong suffix. Maybe palloc_soft_fail() or palloc_null() or palloc_no_oom() or palloc_unsafe(). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] initdb -S and tablespaces
On 2015-01-15 11:02:43 +0530, Abhijit Menon-Sen wrote: > At 2015-01-14 11:59:08 +0100, and...@2ndquadrant.com wrote: > > > > > + if (ControlFile->state != DB_SHUTDOWNED && > > > + ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY) > > > + perform_fsync(data_directory); > > > + > > > > a) Please think of a slightly more descriptive name than perform_fsync > > OK. (I just copied the name initdb uses, because at the time I was still > thinking in terms of a later patch moving this to src/common.) What do > you think of fsync_recursively? fsync_pgdata? I like fsync_pgdata/datadir or something. Note that I think you'll have to check/handle pg_xlog being a symlink - we explicitly support that as a usecase... > > c) I'm wondering if we should add fsync to the control file and also > >perform an fsync if the last shutdown was clear, but fsync was > >disabled. > > Explain? "Add fsync to the control file" means store the value of the > fsync GUC setting in the control file? Yes. > And would the fsync you mention be dependent on the setting, or unconditional? What I am thinking of is that, currently, if you start the server for initial loading with fsync=off, and then restart it, you're open to data loss. So when the current config file setting is changed from off to on, we should fsync the data directory. Even if there was no crash restart. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_rewind in contrib
On 01/15/2015 03:21 PM, Greg Stark wrote: I must have missed this, how did you some the hint bit problem with pg_rewind? Last I understood you ran the risk that the server has unlogged hint bit updates that you wouldn't know to rewind. There's a new GUC in 9.4, wal_log_hints, for that. It has to be turned on for pg_rewind to work. Or data checksums must be enabled, which also causes hint bit updates to be logged. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
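For reference, the setting Heikki mentions lives in postgresql.conf and can only be set at server start; the alternative he notes is initializing the cluster with data checksums (initdb -k):

```
# postgresql.conf (PostgreSQL 9.4+); requires a server restart.
# Needed by pg_rewind unless data checksums are enabled.
wal_log_hints = on
```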
Re: [HACKERS] Parallel Seq Scan
On Mon, Jan 12, 2015 at 3:25 AM, Robert Haas wrote: > > On Sat, Jan 10, 2015 at 11:14 PM, Amit Kapila wrote: > >> I don't think you should be "switching" queues. The tuples should be > >> sent to the tuple queue, and errors and notices to the error queue. > > To achieve what you said (The tuples should be sent to the tuple > > queue, and errors and notices to the error queue.), we need to > > switch the queues. > > The difficulty here is that once we set the queue (using > > pq_redirect_to_shm_mq()) through which the communication has to > > happen, it will use the same unless we change again the queue > > using pq_redirect_to_shm_mq(). For example, assume we have > > initially set error queue (using pq_redirect_to_shm_mq()) then to > > send tuples, we need to call pq_redirect_to_shm_mq() to > > set the tuple queue as the queue that needs to be used for communication > > and again if error happens then we need to do the same for error > > queue. > > Do you have any other idea to achieve the same? > > Yeah, you need two separate global variables pointing to shm_mq > objects, one of which gets used by pqmq.c for errors and the other of > which gets used by printtup.c for tuples. > Okay, I will try to change it as suggested, without switching queues, though this way we need to handle the 'T', 'D', and 'C' messages separately. I have moved this patch to the next CF since, apart from the above, I still have to work on the execution strategy and optimizer-related changes discussed in this thread. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] pg_rewind in contrib
On 2015-01-15 13:21:56 +0000, Greg Stark wrote:
> I must have missed this, how did you solve the hint bit problem with
> pg_rewind? Last I understood you ran the risk that the server has unlogged
> hint bit updates that you wouldn't know to rewind.

wal_log_hints = on

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] parallel mode and parallel contexts
On Thu, Jan 15, 2015 at 7:00 AM, Amit Kapila wrote: > On Tue, Jan 13, 2015 at 1:33 AM, Robert Haas wrote: >> On Thu, Jan 8, 2015 at 6:52 AM, Amit Kapila >> wrote: >> > + seg = dsm_attach(DatumGetInt32(main_arg)); >> > >> > Here, I think DatumGetUInt32() needs to be used instead of >> > DatumGetInt32() as the segment handle is uint32. >> >> OK, I'll change that in the next version. >> > > No issues, I have another question related to below code: > > +HandleParallelMessages(void) > +{ > .. > .. > + for (i = 0; i < pcxt->nworkers; ++i) > + { > + /* > + * Read messages for as long as we have an error queue; if we > + * have hit (or hit while reading) ReadyForQuery, this will go to > + * NULL. > + */ > + while (pcxt->worker[i].error_mqh != NULL) > + { > + shm_mq_result res; > + > + CHECK_FOR_INTERRUPTS(); > + > + res = shm_mq_receive(pcxt->worker[i].error_mqh, &nbytes, > + &data, true); > + if (res == SHM_MQ_SUCCESS) > > Here we are checking the error queue for all the workers and this loop > will continue untill all have sent ReadyForQuery() message ('Z') which > will make this loop continue till all workers have finished their work. > Assume situation where first worker has completed the work and sent > 'Z' message and second worker is still sending some tuples, now above > code will keep on waiting for 'Z' message from second worker and won't > allow to receive tuples sent by second worker till it send 'Z' message. > > As each worker send its own 'Z' message after completion, so ideally > the above code should receive the message only for worker which has > sent the message. I think for that it needs worker information who has > sent the message. Are you talking about HandleParallelMessages() or WaitForParallelWorkersToFinish()? The former doesn't wait for anything; it just handles any messages that are available now. 
The latter does wait for all workers to finish, but the intention is that you only call it when you're ready to wind up the entire parallel operation, so that's OK. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_rewind in contrib
I must have missed this, how did you solve the hint bit problem with pg_rewind? Last I understood you ran the risk that the server has unlogged hint bit updates that you wouldn't know to rewind.
Re: [HACKERS] Minor configure tweak to simplify adjusting gcc warnings
On 2015-01-14 09:34:23 -0500, Tom Lane wrote:
> Andres Freund writes:
> > On 2015-01-13 22:19:30 -0500, Tom Lane wrote:
> >> A slightly more complicated change could be applied to make sure that
> >> *all* of the CFLAGS forcibly inserted by configure appear before any
> >> externally-sourced CFLAGS, allowing any of them to be overridden from the
> >> environment variable. I'm not sure if it's worth the trouble to do that,
> >> but if there's interest I could make it happen.
>
> > I think it'd be a good idea, but unless you're enthusiastic I guess there
> > are more important things.
>
> Nah, I'm fine with doing it, it's just a couple more lines of code.
> I feared people might think it wasn't worth adding extra complexity for,
> but as long as someone else likes the idea I'll go do it.

FWIW, if we moved the CFLAGS="$CFLAGS $user_CFLAGS" assignment further down, it'd have the advantage that compiling with -Werror would be more realistic. Right now doing so breaks about half of configure's feature checks because of warnings. E.g. on my platform it fails to detect 64-bit integers, inline, ...

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] s_lock.h default definitions are rather confused
On 2015-01-14 19:31:18 -0500, Tom Lane wrote:
> Andres Freund writes:
> > Right now I think an #ifdef/undef S_UNLOCK in the relevant gcc section
> > is sufficient and acceptable. It's after all the HPPA section that doesn't
> > really play by the rules.
>
> Works for me.

Pushed something like that. Gaur has the note 'Runs infrequently' - I'm not sure whether that means we'll see the results anytime soon...

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Any idea about toast missing chunk issue resolution?
Hi all,

We are getting the following error message when doing anything with the table (SELECT, or opening it from pgAdmin). Please suggest.

ERROR: missing chunk number 0 for toast value 54787 in pg_toast_2619
** Error **
ERROR: missing chunk number 0 for toast value 54787 in pg_toast_2619
SQL state: XX000

CREATE TABLE mm_activealarm
(
  alarm_id integer NOT NULL,
  source_address character varying(255) NOT NULL,
  alarm_instance_id integer NOT NULL,
  alarm_raise_time bigint,
  alarm_update_time bigint,
  alarm_cease_time bigint,
  alarm_count integer,
  alarm_severity integer NOT NULL,
  source_type character varying(40) NOT NULL,
  alarm_state integer NOT NULL,
  event_type integer,
  notification_id integer NOT NULL,
  probable_cause integer NOT NULL,
  specific_problem integer NOT NULL,
  alarm_additional_text character varying(10240),
  alarm_ack_time bigint,
  alarm_ack_user character varying(100) NOT NULL,
  alarm_ack_system character varying(100) NOT NULL,
  alarm_proposed_repair_action character varying(10240) NOT NULL,
  CONSTRAINT mm_activealarm_pk PRIMARY KEY (alarm_id, source_address)
  USING INDEX TABLESPACE mgrdata
)
WITH (
  OIDS=FALSE
)
TABLESPACE mgrdata;

ALTER TABLE ss_activealarm OWNER TO ss_super;

Regards
Tarkeshwar
Re: [HACKERS] ereport bug
Hello all,

I saw your patch, Tom. It works on my PostgreSQL 9.4 (Intel x86_64). I will write again once I can test it on other platforms. Thank you!

14.01.2015, 21:40, "Tom Lane":
> Robert Haas writes:
>> On Mon, Jan 12, 2015 at 6:27 AM, Dmitry Voronin
>> wrote:
>>> I am attaching to this letter a test case that shows the behavior of the
>>> errcontext() macro and the way to fix it.
>> So the upshot of this is that given errfinish(A, B, C), where A, B,
>> and C are expressions, my gcc is choosing to evaluate C, then B, then
>> A, then the errfinish call itself. But whoever wrote the errcontext()
>> macro evidently thought, in this kind of situation, the compiler would
>> be certain to evaluate A, then B, then C, then errfinish. But it
>> doesn't.
>
> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=1f9bf05e539646103c518bcbb49c04919b238f7a
>
> regards, tom lane

-- Best regards, Dmitry Voronin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fillfactor for GIN indexes
Alexander Korotkov wrote: > I'm not sure. On the one hand it's unclear why fillfactor should be > different from 9.4. > On the other hand it's unclear why it should be different from btree. > I propose marking this "ready for committer". So, committer can make a final > decision. OK let's do so then. My preference is to fully pack the index at build. GIN compression has been one of the headlines of 9.4. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On 01/15/2015 03:23 AM, Peter Geoghegan wrote:
> So now the question is: how did that inconsistency arise? It didn't
> necessarily arise at the time of the (presumed) split of block 2 to create
> 9. It could be that the opaque area was changed by something else, some
> time later. I'll investigate more.

Merlin, could you re-run the test with a WAL archive (if you don't have one already), and then run pg_xlogdump, filtering it to show only the changes to the index? That should show us how the index got to be the way it is. Could you also post a copy of the raw relation file for pg_class_oid_index? I assume it's not too large. Something like:

pg_xlogdump -r Btree -p walarchive/ -s 0/20035D0 | grep 11917

11917 is the relfilenode of pg_class_oid_index on a freshly initdb'd cluster. In case it's not the same on your system, you can use oid2name to find it out.

- Heikki

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] parallel mode and parallel contexts
On Tue, Jan 13, 2015 at 1:33 AM, Robert Haas wrote:
> On Thu, Jan 8, 2015 at 6:52 AM, Amit Kapila wrote:
> > + seg = dsm_attach(DatumGetInt32(main_arg));
> >
> > Here, I think DatumGetUInt32() needs to be used instead of
> > DatumGetInt32() as the segment handle is uint32.
>
> OK, I'll change that in the next version.

No issues. I have another question related to the code below:

+HandleParallelMessages(void)
+{
..
..
+	for (i = 0; i < pcxt->nworkers; ++i)
+	{
+		/*
+		 * Read messages for as long as we have an error queue; if we
+		 * have hit (or hit while reading) ReadyForQuery, this will go to
+		 * NULL.
+		 */
+		while (pcxt->worker[i].error_mqh != NULL)
+		{
+			shm_mq_result res;
+
+			CHECK_FOR_INTERRUPTS();
+
+			res = shm_mq_receive(pcxt->worker[i].error_mqh, &nbytes,
+								 &data, true);
+			if (res == SHM_MQ_SUCCESS)

Here we are checking the error queue for all the workers, and this loop will continue until all of them have sent the ReadyForQuery message ('Z'), which makes the loop run until all workers have finished their work. Assume a situation where the first worker has completed its work and sent its 'Z' message while the second worker is still sending tuples: the code above will keep waiting for the 'Z' message from the second worker and won't allow the tuples sent by the second worker to be received until it sends its 'Z' message.

As each worker sends its own 'Z' message on completion, ideally the code above should receive messages only for the worker that has actually sent one. I think for that it needs to know which worker sent the message.

With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Overhauling our interrupt handling
Hi, On 2015-01-15 15:05:08 +0900, Kyotaro HORIGUCHI wrote: > Hello, I'd synced up this at last. > > I think I should finilize my commitfest item for this issue, with > .. "Rejected"? Fine with me. > > All the patches in the series up to 0008 hav ecommit messages providing > > more detail. A short description of each patch follows: > > > > 0001: Replace walsender's latch with the general shared latch. > > > > New patch that removes ImmediateInteruptOK behaviour from walsender. I > > think that's a rather good idea, because walsender currently seems to > > assume WaitLatchOrSocket is reentrant - which I don't think is really > > guaranteed. > > Hasn't been reviewed yet, but I think it's not far from being > > committable. > > Deesn't this patchset containing per-socket basis non-blocking > control for win32? It should make the code (above the win32 > socket layer itself) more simpler. I don't think so - we still rely on it unfortunately. > > 0004: Process 'die' interrupts while reading/writing from the client socket. > > > > This is the reason Horiguchi-san started this thread. > > > > I think the important debate here is whether we think it's > > acceptable that there are situations (a full socket buffer, but a > > alive connection) where the client previously got an error, but > > not anymore afterwards. I think that's much better than having > > unkillable connections, but I can see others being of a different > > opinion. > > > This patch yields a code a bit confusion like following. > > | secure_raw_write(Port *port, const void *ptr, size_t len) > | { > .. > | w = WaitLatchOrSocket(MyLatch, > | if (w & WL_LATCH_SET) > ... > | errno = EINTR; > | else if (w & WL_SOCKET_WRITEABLE) > | goto wloop; > | > | errno = save_errno; > > The errno set when WL_LATCH_SET always vanishes. Specifically, > the EINTR set by SIGTERM(WL_LATCH_SET) is overwritten by > EAGAIN. As the result, pg_terminte_backend() cannot kill the > backend till the write blocking is released. 
errno = save_errno > should be the alternative of the line "errno = EINTR" and I > confirmed that the change leads to the desirable (as of me) > behavior. Ugh, that's the result stupid last minute "cleanup". You're right. > > 0006: Don't allow immediate interupts during authentication anymore. > > > > So far we've set ImmediateInterruptOK to true during large parts > > of the client authentication - that's not all that pretty, > > interrupts might arrive while we're in some system routines. > > > > Due to patches 0003/0004 we now are able to safely serve > > interrupts during client communication which is the major are > > where we want to adhere to authentication_timeout. > > > > I additionally placed some CHECK_FOR_INTERRUPTS()s in some > > somewhat randomly chosen places in auth.c. Those don't completely > > get back the previous 'appruptness' (?) of reacting to > > interrupts, but that's partially for the better, because we don't > > interrupt foreign code anymore. > > Simplly as a comment on style, this patch introduces checks of > ClientAuthInProgress twice successively into > ProcessInterrupts(). Isn't it better to make it like following? > > | /* As in quickdie, ... > | if (ClientAuthInProgress) > | { > |if (whereToSendOutput == DestRemote) whereToSendOutput = DestNone; > |ereport(FATAL, Hm, yes. > > 0008: Remove remnants of ImmediateInterruptOK handling. > > > > Now that ImmediateInterruptOK is never set to true anymore, we can > > remove related code and comments. > > > > New and not reviewed. > > walreceiver.c still has WalRcvImmediateInterruptOK as mentioned > below, apart from whether it should be changed or not, the > following comment remains. > | * This is very much like what regular backends do with ImmediateInterruptOK, > | * ProcessInterrupts() etc. Yep. As mentioned below, it doesn't use the same infrastructure, so I'd rather treat this separately. This set is more than big enough. Thanks for looking! 
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Bug in pg_dump
Hello,

There's a long-pending issue with pg_dump and extensions that have member tables with foreign keys. This was previously reported in this thread http://www.postgresql.org/message-id/ca+tgmoyvzkadmgh_8el7uvm472geru0b4pnnfjqye6ss1k9...@mail.gmail.com and discussed by Robert. All PostgreSQL users of the PostGIS extension postgis_topology are facing the issue, because its two member tables (topology and layer) are linked by foreign keys. If you dump a database with this extension and try to import it, you will experience this error:

pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 3345; 0 157059176 TABLE DATA layer gilles
pg_restore: [archiver (db)] COPY failed for table "layer": ERROR: insert or update on table "layer" violates foreign key constraint "layer_topology_id_fkey"
DETAIL: Key (topology_id)=(1) is not present in table "topology".
WARNING: errors ignored on restore: 1

The problem is that, whatever export type you choose (plain/custom and full-export/data-only), the data of tables "topology" and "layer" are always exported in alphabetical order. I think this is a bug: outside of extensions, in a data-only export pg_dump is able to find foreign key dependencies and dump table data in the right order, but not with extension members. The default is alphabetical order, but that should not be the case with extension members, because the constraints are recreated during CREATE EXTENSION. I hope I am clear enough. Here we have four solutions:

1/ Inform developers of extensions to take care of alphabetical order when they have member tables using foreign keys.

2/ Inform DBAs that they have to restore the failing table independently.
The use case above can be summarized by the following command:

pg_restore -h localhost -n topology -t layer -Fc -d testdb_empty testdump.dump

3/ Inform DBAs that they have to restore the schema first, then the data only, using --disable-triggers.

4/ Patch pg_dump to solve this issue.

I attach a patch that solves the issue in pg_dump; let me know if it might be included in the Commit Fest or if the three other solutions are a better choice. I also attach a sample extension (test_fk_in_ext) to be able to reproduce the issue and test the patch. Note that there might exist a simpler solution than the one I used in this patch; if that is the case, please point me in the right direction and I will be pleased to rewrite and send another patch. In the attached test extension there is a file called test_fk_in_ext/SYNOPSIS.txt that describes all the actions needed to reproduce the issue and test the patch. Here is the SQL part of the test extension:

CREATE TABLE IF NOT EXISTS b_test_fk_in_ext1 (
  id int PRIMARY KEY
);
CREATE TABLE IF NOT EXISTS a_test_fk_in_ext1 (
  id int REFERENCES b_test_fk_in_ext1(id)
);
SELECT pg_catalog.pg_extension_config_dump('b_test_fk_in_ext1', '');
SELECT pg_catalog.pg_extension_config_dump('a_test_fk_in_ext1', '');

Best regards,

-- Gilles Darold Consultant PostgreSQL http://dalibo.com - http://dalibo.org

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index dc062e6..49889ce 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -209,6 +209,7 @@ static void addBoundaryDependencies(DumpableObject **dobjs, int numObjs,
 static void getDomainConstraints(Archive *fout, TypeInfo *tyinfo);
 static void getTableData(DumpOptions *dopt, TableInfo *tblinfo, int numTables, bool oids);
+static bool hasExtensionMember(TableInfo *tblinfo, int numTables);
 static void makeTableDataInfo(DumpOptions *dopt, TableInfo *tbinfo, bool oids);
 static void buildMatViewRefreshDependencies(Archive *fout);
 static void getTableDataFKConstraints(void);
@@ -730,9 +731,17 @@ main(int argc, char **argv)
 	if (!dopt.schemaOnly)
 	{
+		bool		has_ext_member;
+
 		getTableData(&dopt, tblinfo, numTables, dopt.oids);
+		/* Search if there is dumpable tables member of an extension */
+		has_ext_member = hasExtensionMember(tblinfo, numTables);
 		buildMatViewRefreshDependencies(fout);
-		if (dopt.dataOnly)
+		/*
+		 * Always get FK constraints even with schema+data, extension's
+		 * members can have FK so tables need to be dump-ordered.
+		 */
+		if (dopt.dataOnly || has_ext_member)
 			getTableDataFKConstraints();
 	}
@@ -1852,6 +1861,25 @@ getTableData(DumpOptions *dopt, TableInfo *tblinfo, int numTables, bool oids)
 }

 /*
+ * hasExtensionMember -
+ *	  set up dumpable objects representing the contents of tables
+ */
+static bool
+hasExtensionMember(TableInfo *tblinfo, int numTables)
+{
+	int			i;
+
+	for (i = 0; i < numTables; i++)
+	{
+		if (tblinfo[i].dobj.ext_member)
+			return true;
+	}
+
+	return false;
+}
+
+
+/*
  * Make a dumpable object for the data of this specific table
  *
  * Note: we make a TableDataInfo if and only if we are going to dump the
@@ -2024,12 +2052,14 @@ buildMatViewRe
Re: [HACKERS] Check that streaming replica received all data after master shutdown
Hi, > On Wed, Jan 14, 2015 at 2:11 AM, Heikki Linnakangas > wrote: > > > On 01/13/2015 12:11 PM, Vladimir Borodin wrote: > > > >> > >> 05 янв. 2015 г., в 18:15, Vladimir Borodin написал(а): > >> > >> Hi all. > >>> > >>> I have a simple script for planned switchover of PostgreSQL (9.3 and > >>> 9.4) master to one of its replicas. This script checks a lot of things > >>> before doing it and one of them is that all data from master has been > >>> received by replica that is going to be promoted. Right now the check is > >>> done like below: > >>> > >>> On the master: > >>> > >>> postgres@pgtest03d ~ $ psql -t -A -c 'select > >>> pg_current_xlog_location();' > >>> 0/3390 > >>> postgres@pgtest03d ~ $ /usr/pgsql-9.3/bin/pg_ctl stop -m fast > >>> waiting for server to shut down done > >>> server stopped > >>> postgres@pgtest03d ~ $ /usr/pgsql-9.3/bin/pg_controldata | head > >>> pg_control version number:937 > >>> Catalog version number: 201306121 > >>> Database system identifier: 6061800518091528182 > >>> Database cluster state: shut down > >>> pg_control last modified: Mon 05 Jan 2015 06:47:57 PM MSK > >>> Latest checkpoint location: 0/3428 > >>> Prior checkpoint location:0/3328 > >>> Latest checkpoint's REDO location:0/3428 > >>> Latest checkpoint's REDO WAL file:001B0034 > >>> Latest checkpoint's TimeLineID: 27 > >>> postgres@pgtest03d ~ $ > >>> > >>> On the replica (after shutdown of master): > >>> > >>> postgres@pgtest03g ~ $ psql -t -A -c "select > >>> pg_xlog_location_diff(pg_last_xlog_replay_location(), '0/3428');" > >>> 104 > >>> postgres@pgtest03g ~ $ > >>> > >>> These 104 bytes seems to be the size of shutdown checkpoint record (as I > >>> can understand from pg_xlogdump output). 
> >>> > >>> postgres@pgtest03g ~/9.3/data/pg_xlog $ /usr/pgsql-9.3/bin/pg_xlogdump > >>> -s 0/3390 -t 27 > >>> rmgr: XLOGlen (rec/tot): 0/32, tx: 0, lsn: > >>> 0/3390, prev 0/3328, bkp: , desc: xlog switch > >>> rmgr: XLOGlen (rec/tot): 72/ 104, tx: 0, lsn: > >>> 0/3428, prev 0/3390, bkp: , desc: checkpoint: redo 0/3428; > >>> tli 27; prev tli 27; fpw true; xid 0/6010; oid 54128; multi 1; offset 0; > >>> oldest xid 1799 in DB 1; oldest multi 1 in DB 1; oldest running xid 0; > >>> shutdown > >>> pg_xlogdump: FATAL: error in WAL record at 0/3428: record with zero > >>> length at 0/3490 > >>> > >>> postgres@pgtest03g ~/9.3/data/pg_xlog $ > >>> > >>> I’m not sure that these 104 bytes will always be 104 bytes to have a > >>> strict equality while checking. Could it change in the future? Or is there > >>> a better way to understand that streaming replica received all data after > >>> master shutdown? The check that pg_xlog_location_diff returns 104 bytes > >>> seems a bit strange. > >>> > >> > > Don't rely on it being 104 bytes. It can vary across versions, and across > > different architectures. > > > > You could simply check that the standby's pg_last_xlog_replay_location() > > > master's "Latest checkpoint location", and not care about the exact > > difference. > > > > I believe there were some changes made in v9.3 which will wait for pending > WALs to be replicated before a fast and smart shutdown (of master) can > close the replication connection. > > http://git.postgresql.org/pg/commitdiff/985bd7d49726c9f178558491d31a570d47340459 I don't understand the relation between it and 104 bytes, it says that the change is backpatched up to 9.1. Since it assures all xlog records to be transferred if no trouble happens. Relying on the mechanism, you don't need to check that if master is known to have gracefully shut down and had no trouble around the environment. 
Judging from the fact that you want this check, I suppose you're either not guaranteed to be free of trouble or not trusting the mechanism itself. Given that, as Alvaro said upthread, verifying that the last record is a shutdown checkpoint should greatly raise the chance that all records have been received, except for the extreme case where the master has come up and gone down again while the replication connection could not be made. For that case, I think there's no way to confirm it from the standby alone; you should at least compare, by some means, the LSN following the last xlog record with the old master's. Or perhaps do a sanity check of the database on the standby, exploiting the nature of the data, instead?

regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fillfactor for GIN indexes
On Thu, Jan 15, 2015 at 10:19 AM, Michael Paquier wrote: > On Thu, Jan 8, 2015 at 2:03 PM, Michael Paquier > wrote: > > On Thu, Jan 8, 2015 at 6:31 AM, Alexander Korotkov > wrote: > >> On Wed, Jan 7, 2015 at 4:11 PM, Michael Paquier < > michael.paqu...@gmail.com> > >>> I am attaching an updated patch, with the default fillfactor value at > >>> 75%, and with the page split code using the fillfactor rate. > >>> Thoughts? > >> Rewritten version of patch is attached. I made following changes: > > > > Thanks! With this patch (and my previous version as well) GIN indexes > > with default fillfactor have a size higher than 9.4 indexes, 9.4 > > behavior being consistent only with fillfactor=100 and not the default > > of 90. Are we fine with that? > IMO, this patch has value to control random updates on GIN indexes, > but we should have a default fillfactor of 100 to have index size > consistent with 9.4. Thoughts? > I'm not sure. On the one hand it's unclear why fillfactor should be different from 9.4. On the other hand it's unclear why it should be different from btree. I propose marking this "ready for committer". So, committer can make a final decision. -- With best regards, Alexander Korotkov.
Re: [HACKERS] hung backends stuck in spinlock heavy endless loop
On Wed, Jan 14, 2015 at 8:50 PM, Peter Geoghegan wrote: > I am mistaken on one detail here - blocks 2 and 9 are actually fully > identical. I still have no idea why, though. So, I've looked at it in more detail and it appears that the page of block 2 split at some point, thereby creating a new page (the block 9 page). There is a sane downlink in the root page for the new rightlink page. The root page looks totally sane, as does every other page - as I said, the problem is only that block 9 is spuriously identical to block 2. So the (correct) downlink in the root page, to block 9, is the same as the (incorrect) high key value in block 9 - Oid value 69924. To be clear: AFAICT everything is perfect except block 9, which is bizarrely identical to block 2. Now, since the sane page downlink located in the root (like every downlink, a lower bound on items in its child) is actually a copy of the high key on the page that is the child's left link (that is to say, it comes from the original target of a page split - it shares the target's high key value, Oid value 69924), there may have never been sane data in block 9, even though its downlink is sane (so maybe the page split patch is implicated). But it's hard to see how that could be true. The relevant code wasn't really what was changed about page splits in 9.4 anyway (plus this wasn't a non-leaf split, since there aren't enough pages for those to be a factor). There just isn't that many items on page 2 (or its bizarre identical twin, page 9), so a recent split seems unlikely. And, the target and new right page are locked together throughout both the split and down link insertion (even though there are two atomic operations/WAL inserts). So to reiterate, a close by page split that explains the problem seems unlikely. I'm going to focus on the page deletion patch for the time being. Merlin - it would be great if you could revert all the page split commits (which came after the page deletion fix). 
All the follow-up page split commits [1] were fairly straightforward bugs with recovery, so it should be easy enough to totally remove the page split stuff from 9.4 for the purposes of isolating the bug. [1] http://www.postgresql.org/message-id/cam3swzspj6m9hfhksjuiuof30auwxyyb56fjbw1_dogqkbe...@mail.gmail.com -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Check that streaming replica received all data after master shutdown
On Wed, Jan 14, 2015 at 2:11 AM, Heikki Linnakangas wrote: > On 01/13/2015 12:11 PM, Vladimir Borodin wrote: > >> >> 05 янв. 2015 г., в 18:15, Vladimir Borodin написал(а): >> >> Hi all. >>> >>> I have a simple script for planned switchover of PostgreSQL (9.3 and >>> 9.4) master to one of its replicas. This script checks a lot of things >>> before doing it and one of them is that all data from master has been >>> received by replica that is going to be promoted. Right now the check is >>> done like below: >>> >>> On the master: >>> >>> postgres@pgtest03d ~ $ psql -t -A -c 'select >>> pg_current_xlog_location();' >>> 0/3390 >>> postgres@pgtest03d ~ $ /usr/pgsql-9.3/bin/pg_ctl stop -m fast >>> waiting for server to shut down done >>> server stopped >>> postgres@pgtest03d ~ $ /usr/pgsql-9.3/bin/pg_controldata | head >>> pg_control version number:937 >>> Catalog version number: 201306121 >>> Database system identifier: 6061800518091528182 >>> Database cluster state: shut down >>> pg_control last modified: Mon 05 Jan 2015 06:47:57 PM MSK >>> Latest checkpoint location: 0/3428 >>> Prior checkpoint location:0/3328 >>> Latest checkpoint's REDO location:0/3428 >>> Latest checkpoint's REDO WAL file:001B0034 >>> Latest checkpoint's TimeLineID: 27 >>> postgres@pgtest03d ~ $ >>> >>> On the replica (after shutdown of master): >>> >>> postgres@pgtest03g ~ $ psql -t -A -c "select >>> pg_xlog_location_diff(pg_last_xlog_replay_location(), '0/3428');" >>> 104 >>> postgres@pgtest03g ~ $ >>> >>> These 104 bytes seems to be the size of shutdown checkpoint record (as I >>> can understand from pg_xlogdump output). 
>>> >>> postgres@pgtest03g ~/9.3/data/pg_xlog $ /usr/pgsql-9.3/bin/pg_xlogdump >>> -s 0/3390 -t 27 >>> rmgr: XLOGlen (rec/tot): 0/32, tx: 0, lsn: >>> 0/3390, prev 0/3328, bkp: , desc: xlog switch >>> rmgr: XLOGlen (rec/tot): 72/ 104, tx: 0, lsn: >>> 0/3428, prev 0/3390, bkp: , desc: checkpoint: redo 0/3428; >>> tli 27; prev tli 27; fpw true; xid 0/6010; oid 54128; multi 1; offset 0; >>> oldest xid 1799 in DB 1; oldest multi 1 in DB 1; oldest running xid 0; >>> shutdown >>> pg_xlogdump: FATAL: error in WAL record at 0/3428: record with zero >>> length at 0/3490 >>> >>> postgres@pgtest03g ~/9.3/data/pg_xlog $ >>> >>> I’m not sure that these 104 bytes will always be 104 bytes to have a >>> strict equality while checking. Could it change in the future? Or is there >>> a better way to understand that streaming replica received all data after >>> master shutdown? The check that pg_xlog_location_diff returns 104 bytes >>> seems a bit strange. >>> >> > Don't rely on it being 104 bytes. It can vary across versions, and across > different architectures. > > You could simply check that the standby's pg_last_xlog_replay_location() > > master's "Latest checkpoint location", and not care about the exact > difference. > > > I believe there were some changes made in v9.3 which will wait for pending WALs to be replicated before a fast and smart shutdown (of master) can close the replication connection. http://git.postgresql.org/pg/commitdiff/985bd7d49726c9f178558491d31a570d47340459
Re: [HACKERS] advance local xmin more aggressively
On Mon, Dec 22, 2014 at 7:31 PM, Heikki Linnakangas wrote:
> Here's an updated version, rebased over the pairing heap code that I just
> committed, and fixing those bugs.

So, are we reaching an outcome for the patch discussed here? -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] HINTing on UPDATE foo SET foo.bar = ..;
On Thu, Jan 8, 2015 at 2:28 AM, Marko Tiikkaja wrote:
> Yeah. (The CF entry is also set to Waiting on Author, which seems
> appropriate.)

Seeing nothing happening here for quite some time, I have marked this as
returned with feedback.
--
Michael
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Sun, Jan 4, 2015 at 10:57 AM, Andres Freund wrote:
> On 2014-12-31 18:35:38 +0530, Amit Kapila wrote:
>> +        -j <replaceable class="parameter">jobs</replaceable>
>> +        --jobs=<replaceable class="parameter">njobs</replaceable>
>> +
>> +        Number of concurrent connections to perform the operation.
>> +        This option will enable the vacuum operation to run on asynchronous
>> +        connections, at a time one table will be operated on one connection.
>> +        So at one time as many tables will be vacuumed parallely as number of
>> +        jobs. If number of jobs given are more than number of tables then
>> +        number of jobs will be set to number of tables.
>
> "asynchronous connections" isn't a very well defined term. Also, the
> second part of that sentence doesn't seem to be grammatically correct.
>
>> +        vacuumdb will open
>> +        <replaceable class="parameter">njobs</replaceable> connections to the
>> +        database, so make sure your <xref linkend="guc-max-connections">
>> +        setting is high enough to accommodate all connections.
>
> Isn't it njobs+1?
>
>> @@ -141,6 +199,7 @@ main(int argc, char *argv[])
>>  			}
>>  		}
>>
>> +	optind++;
>
> Hm, where's that coming from?
>
>> +	PQsetnonblocking(connSlot[0].connection, 1);
>> +
>> +	for (i = 1; i < concurrentCons; i++)
>> +	{
>> +		connSlot[i].connection = connectDatabase(dbname, host, port, username,
>> +												 prompt_password, progname, false);
>> +
>> +		PQsetnonblocking(connSlot[i].connection, 1);
>> +		connSlot[i].isFree = true;
>> +		connSlot[i].sock = PQsocket(connSlot[i].connection);
>> +	}
>
> Are you sure about this global PQsetnonblocking()? This means that you
> might not be able to send queries... And you don't seem to be waiting
> for sockets waiting for writes in the select loop - which means you
> might end up being stuck waiting for reads when you haven't submitted
> the query.
>
> I think you might need a more complex select() loop. On nonfree
> connections also wait for writes if PQflush() returns != 0.
>
>> +/*
>> + * GetIdleSlot
>> + *		Process the slot list; if any free slot is available then return
>> + *		the slot id, else perform select() on all the sockets and wait
>> + *		until at least one slot becomes available.
>> + */
>> +static int
>> +GetIdleSlot(ParallelSlot *pSlot, int max_slot, const char *dbname,
>> +			const char *progname, bool completedb)
>> +{
>> +	int			i;
>> +	fd_set		slotset;
>
> Hm, you probably need to limit -j to FD_SETSIZE - 1 or so.
>
>> +	int			firstFree = -1;
>> +	pgsocket	maxFd;
>> +
>> +	for (i = 0; i < max_slot; i++)
>> +		if (pSlot[i].isFree)
>> +			return i;
>
>> +	FD_ZERO(&slotset);
>> +
>> +	maxFd = pSlot[0].sock;
>> +
>> +	for (i = 0; i < max_slot; i++)
>> +	{
>> +		FD_SET(pSlot[i].sock, &slotset);
>> +		if (pSlot[i].sock > maxFd)
>> +			maxFd = pSlot[i].sock;
>> +	}
>
> So we're waiting for idle connections?
>
> I think you'll have to use two fdsets here, and set the write
> set based on PQflush() != 0.
>
>> +/*
>> + * A select loop that repeats calling select until a descriptor in the read
>> + * set becomes readable. On Windows we have to check for the termination event
>> + * from time to time, on Unix we can just block forever.
>> + */
>
> Should a) mention why we have to check regularly on windows b) that on
> linux we don't have to because we send a cancel event from the signal
> handler.
>
>> +static int
>> +select_loop(int maxFd, fd_set *workerset)
>> +{
>> +	int			i;
>> +	fd_set		saveSet = *workerset;
>>
>> +#ifdef WIN32
>> +	/* should always be the master */
>
> Hm?
>
> I have to say, this is a fairly large patch for such a minor feature...

Andres, this patch needs more effort from the author, right? So I am marking
it as returned with feedback.
--
Michael
Re: [HACKERS] WIP: multivariate statistics / proof of concept
On Mon, Dec 15, 2014 at 11:55 AM, Michael Paquier wrote:
> On Wed, Dec 10, 2014 at 5:15 AM, Tomas Vondra wrote:
>> I agree with moving the patch to the next CF - I'm working on the patch,
>> but I will take a bit more time to submit a new version and I can do
>> that in the next CF.
> OK, cool. I just moved it myself; I didn't see it registered in 2014-12 yet.

Marked as returned with feedback: no new version has shown up in the last
month, and this patch was waiting for input from its author.
--
Michael
Re: [HACKERS] Async execution of postgres_fdw.
Hello,

>> I'll look into the case after this, but I'd like to send a
>> revised patch at this point.
>
> Hm. Seems like this patch is not completely baked yet. Horiguchi-san,
> as you are obviously still working on it, would you agree to move it
> to the next CF?

Yes, that's fine with me. Thank you.

regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center