Re: [HACKERS] checkpointer continuous flushing

2015-06-23 Thread Fabien COELHO



It'd be interesting to see numbers for tiny, without the overly small
checkpoint timeout value. 30s is below the OS's writeback time.


Here are some tests with longer timeout:

tiny2: scale=10 shared_buffers=1GB checkpoint_timeout=5min
max_wal_size=1GB warmup=600 time=4000

 flsh |  full speed tps  | percent of late tx, 4 clients, for tps:
 /srt |  1 client  |  4 clients  |  100 |  200 |  400 |  800 | 1200 | 1600
 N/N  | 930 +- 124 | 2560 +- 394 | 0.70 | 1.03 | 1.27 | 1.56 | 2.02 | 2.38
 N/Y  | 924 +- 122 | 2612 +- 326 | 0.63 | 0.79 | 0.94 | 1.15 | 1.45 | 1.67
 Y/N  | 907 +- 112 | 2590 +- 315 | 0.58 | 0.83 | 0.68 | 0.71 | 0.81 | 1.26
 Y/Y  | 915 +- 114 | 2590 +- 317 | 0.60 | 0.68 | 0.70 | 0.78 | 0.88 | 1.13

There seems to be a small 1-2% performance benefit with 4 clients, which is
reversed for 1 client. There are significantly and consistently fewer late
transactions when the options are activated, and the performance is more stable
(standard deviation reduced by 10-18%).

The db is about 200 MB (~25000 pages); at 2500+ tps it is written over about
40 times in 5 minutes, so the checkpoint basically writes everything, spread
over 220 seconds, i.e. about 0.9 MB/s. Given the preload phase, the buffers may
be more or less in order in memory, so they would be written out in order.



medium2: scale=300 shared_buffers=5GB checkpoint_timeout=30min
 max_wal_size=4GB warmup=1200 time=7500

 flsh |  full speed tps   | percent of late tx, 4 clients
 /srt |  1 client   |  4 clients  |   100 |   200 |   400 |
  N/N | 173 +- 289* | 198 +- 531* | 27.61 | 43.92 | 61.16 |
  N/Y | 458 +- 327* | 743 +- 920* |  7.05 | 14.24 | 24.07 |
  Y/N | 169 +- 166* | 187 +- 302* |  4.01 | 39.84 | 65.70 |
  Y/Y | 546 +- 143  | 681 +- 459  |  1.55 |  3.51 |  2.84 |

The effect of sorting is very positive (+150% to 270% tps). On this run,
flushing has a positive (+20% with 1 client) or negative (-8% with 4
clients) effect on throughput, and late transactions are reduced by 92-95% when
both options are activated.


At 550 tps, checkpoints are xlog-triggered and write about 1/3 of the
database (17 buffers to write every 220-260 seconds, 4 MB/s).


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] less log level for success dynamic background workers for 9.5

2015-06-23 Thread Pavel Stehule
2015-06-23 15:20 GMT+02:00 Robert Haas robertmh...@gmail.com:

 On Mon, Jun 22, 2015 at 9:47 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Michael Paquier michael.paqu...@gmail.com writes:
  On Tue, Jun 23, 2015 at 10:07 AM, Robert Haas wrote:
  On Mon, Jun 22, 2015 at 8:19 PM, Jim Nasby wrote:
  Anything ever happen with this? I agree that LOG is too high for
  reporting most (if not all) of these things.
 
  I think we should consider having a flag for this behavior rather than
  changing the behavior across the board.
  But then again, maybe we should just change it.
 
  What do others think?
 
  A GUC just for that looks like overkill to me; this log is useful
  when debugging. And one could always have the bgworker call elog
  itself at startup and before exiting to provide more or less the same
  information.
 
  I agree that we don't need YAGUC here, particularly not one that applies
  indiscriminately to all bgworkers.  I'd vote for just decreasing the log
  level.  The current coding is appropriate for a facility that's basically
  experimental; but as it moves towards being something that would be used
  routinely in production, the argument for being noisy in the log gets
  weaker and weaker.

 I was thinking of a background worker flag, not a GUC.
 BGWORKER_QUIET, or something like that.  But I guess we ought to just
 change it.


I don't have any problem with a bgworker flag. The only question is what
the default should be.

Pavel



 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company



Re: [HACKERS] proposal: row_to_array function

2015-06-23 Thread Pavel Stehule
2015-06-23 1:56 GMT+02:00 Jim Nasby jim.na...@bluetreble.com:

 On 6/22/15 2:46 AM, Pavel Stehule wrote:


 FOREACH key, val IN RECORD myrow
 LOOP
IF pg_typeof(val) IN ('int4', 'double precision', 'numeric') THEN
  val := val + 1; -- these variables can be mutable
  -- or maybe in the future
 myrow[key] := val + 1;
END IF;
 END LOOP;

 What is important: val is an automatic variable, and it can have a
 different type in each step.

 It is a little bit strange, but impossible to solve, so we cannot
 support row[var] as a right value (without immutable casting). But we can
 do it as a left value.


 Actually, you can (theoretically) solve it for the right value as well,
 if val is an actual type and you have operators on that type that know
 to search for a specific operator given the actual types that are involved.
 So if val is int4, val + 1 becomes int4 + int4.

 The problem I've run into with this is by the time you've added enough
 casts to make this workable you've probably created a situation where val +
 something is going to recurse back to itself. I've partially solved this in
 [1], and intend to finish it by calling back in via SPI to do the final
 resolution, the same way the RI triggers do.

 What would be a lot better is if we had better control over function and
 operator resolution.

 [1]
 https://github.com/decibel/variant/commit/2b99067744a405da8a325de1ebabd213106f794f#diff-8aa2db4a577ee4201d6eb9041c2a457eR846


The solution with dynamic operators changes the philosophy by about 180° - and
I am afraid about performance.

Now, thinking about the possibilities - it is probably solvable on the right
side too. Two steps would be needed:

1. a parametrized record reference syntax - something like SELECT $1[$]
2. the possibility to discard the cached plan if the result has a different
type than the cache expects.


Pavel



 --
 Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX

 Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: [HACKERS] pg_rewind failure by file deletion in source server

2015-06-23 Thread Heikki Linnakangas

On 06/23/2015 07:51 AM, Michael Paquier wrote:

So... Attached are a set of patches dedicated at fixing this issue:


Thanks for working on this!


- 0001, add if_not_exists to pg_tablespace_location, returning NULL if
path does not exist
- 0002, same with pg_stat_file, returning NULL if file does not exist
- 0003, same with pg_read_*file. I added them to all the existing
functions for consistency.
- 0004, pg_ls_dir extended with if_not_exists and include_dot_dirs
(thanks Robert for the naming!)
- 0005, as things get complex, a set of regression tests aimed at
covering those things. pg_tablespace_location is platform-dependent,
so there are no tests for it.
- 0006, the fix for pg_rewind, using what has been implemented before.


With these patches, pg_read_file() will return NULL for any failure to
open the file, which makes pg_rewind assume that the file doesn't
exist in the source server, and remove the file from the
destination. That's dangerous; those functions should check specifically
for ENOENT.
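
For illustration, a minimal sketch of such a check (hypothetical code, not the 
actual patch; filename and if_not_exists are assumed variable names): only a 
missing file is mapped to NULL, every other failure still raises an error.

    int         fd;

    fd = open(filename, O_RDONLY | PG_BINARY);
    if (fd < 0)
    {
        /* a genuinely missing file is the only case reported as NULL */
        if (errno == ENOENT && if_not_exists)
            PG_RETURN_NULL();

        /* anything else (EACCES, EIO, ...) is still an error */
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not open file \"%s\": %m", filename)));
    }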


There's still a small race condition with tablespaces. If you run CREATE 
TABLESPACE in the source server while pg_rewind is running, it's 
possible that the recursive query that pg_rewind uses sees the symlink 
in the pg_tblspc/ directory, but its snapshot doesn't see the row in 
pg_tablespace yet. It will think that the symlink is a regular file, try 
to read it, and fail (if we checked for ENOENT).


Actually, I think we need to try to deal with symlinks a bit harder.
Currently, pg_rewind assumes that anything in pg_tblspc/ that has a
matching row in pg_tablespace is a symlink, and nothing else is. I think
we should handle symlinks to directories as well. I just noticed that
pg_rewind fails miserably if pg_xlog is a symlink, because of that:



The servers diverged at WAL position 0/3023F08 on timeline 1.
Rewinding from last common checkpoint at 0/260 on timeline 1

data-master//pg_xlog is not a directory
Failure, exiting


I think we need to add a column to pg_stat_file output, to indicate 
symbolic links, and add a pg_readlink() function. That still leaves a 
race condition if the type of a file changes, i.e. a file is deleted and 
a directory with the same name is created in its place, but that seems 
acceptable. I don't think PostgreSQL ever does such a thing, so that 
could only happen if you mess with the data directory manually while the 
server is running.


I just realized another problem: We recently learned the hard way that 
some people have files in the data directory that are not writeable by 
the 'postgres' user 
(http://www.postgresql.org/message-id/20150523172627.ga24...@msg.df7cb.de). 
pg_rewind will try to overwrite all files it doesn't recognize as 
relation files, so it's going to fail on those. A straightforward fix 
would be to first open the destination file in read-only mode, and 
compare its contents, and only open the file in write mode if it has 
changed. It would still fail when the files really differ, but I think 
that's acceptable.
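
For what it's worth, a rough sketch of that idea as a hypothetical frontend 
helper (not actual pg_rewind code; pg_malloc/pg_free/pg_fatal are the frontend 
utilities pg_rewind already uses):

    static void
    write_file_if_changed(const char *path, const char *newdata, size_t newlen)
    {
        bool    changed = true;
        FILE   *f = fopen(path, "rb");

        if (f != NULL)
        {
            char   *olddata = pg_malloc(newlen);
            size_t  oldlen = fread(olddata, 1, newlen, f);

            /* identical length and identical bytes: nothing to write */
            if (oldlen == newlen && fgetc(f) == EOF &&
                memcmp(olddata, newdata, newlen) == 0)
                changed = false;

            pg_free(olddata);
            fclose(f);
        }

        if (!changed)
            return;

        /* only now do we need the file to be writable */
        f = fopen(path, "wb");
        if (f == NULL)
            pg_fatal("could not open \"%s\" for writing: %s\n",
                     path, strerror(errno));
        if (fwrite(newdata, 1, newlen, f) != newlen)
            pg_fatal("could not write \"%s\": %s\n", path, strerror(errno));
        fclose(f);
    }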


I note that pg_rewind doesn't need to distinguish between an empty and a 
non-existent directory, so it's quite silly for it to pass 
include_dot_dirs=true, and then filter out . and .. from the result 
set. The documentation should mention the main reason for including . 
and ..: to distinguish between an empty and non-existent directory.
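
For illustration, assuming the pg_ls_dir(dirname, if_not_exists, 
include_dot_dirs) signature proposed above, a caller could tell the two cases 
apart roughly like this (sketch only):

    -- with if_not_exists = true, a missing directory yields no rows at all,
    -- while an empty directory still yields the '.' and '..' entries
    SELECT count(*) AS entries
      FROM pg_ls_dir('some_dir', true, true);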


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_rewind failure by file deletion in source server

2015-06-23 Thread Heikki Linnakangas

On 06/23/2015 05:03 PM, Fujii Masao wrote:

On Tue, Jun 23, 2015 at 9:19 PM, Heikki Linnakangas hlinn...@iki.fi wrote:

On 06/23/2015 07:51 AM, Michael Paquier wrote:


So... Attached are a set of patches dedicated at fixing this issue:



Thanks for working on this!


- 0001, add if_not_exists to pg_tablespace_location, returning NULL if
path does not exist
- 0002, same with pg_stat_file, returning NULL if file does not exist
- 0003, same with pg_read_*file. I added them to all the existing
functions for consistency.
- 0004, pg_ls_dir extended with if_not_exists and include_dot_dirs
(thanks Robert for the naming!)
 - 0005, as things get complex, a set of regression tests aimed at
 covering those things. pg_tablespace_location is platform-dependent,
so there are no tests for it.
- 0006, the fix for pg_rewind, using what has been implemented before.



With these patches, pg_read_file() will return NULL for any failure to open
the file, which makes pg_rewind assume that the file doesn't exist in the
source server, and remove the file from the destination. That's
dangerous; those functions should check specifically for ENOENT.


I'm wondering if using pg_read_file() to copy the file from the source server
is reasonable. ISTM that it has two problems, as follows.

1. It cannot read a very large file, like a 1GB file. So if such a large file
 was created in the source server after failover, pg_rewind would not be able
 to copy the file. No?


pg_read_binary_file() handles large files just fine. It cannot return 
more than 1GB in one call, but you can call it several times and 
retrieve the file in chunks. That's what pg_rewind does, except for 
reading the control file, which is known to be small.
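
For illustration, a chunked fetch could look roughly like this (the file path 
and the 1 MB chunk size are made up; pg_read_binary_file(filename, offset, 
length) is the existing three-argument form):

    -- first megabyte of a (made-up) file path
    SELECT pg_read_binary_file('base/12345/16384', 0, 1048576);
    -- next megabyte, and so on
    SELECT pg_read_binary_file('base/12345/16384', 1048576, 1048576);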



2. Many users may not allow a remote client to connect to the
 PostgreSQL server as a superuser for some security reasons. IOW,
 there would be no entry in pg_hba.conf for such a connection.
 In this case, pg_rewind always fails because pg_read_file() needs
 superuser privilege. No?

I'm tempted to implement a replication command version of
pg_read_file(). That is, it would read and send the data like the
BASE_BACKUP replication command does...


Yeah, that would definitely be nice. Peter suggested it back in January 
(http://www.postgresql.org/message-id/54ac4801.7050...@gmx.net). I think 
it's way too late to do that for 9.5, however. I'm particularly worried 
that if we design the required API in a rush, we're not going to get it 
right, and will have to change it again soon. That might be difficult in 
a minor release. Using pg_read_file() and friends is quite flexible,
even though we just found out that they're not quite flexible enough
right now (the ENOENT problem).


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hash index creation warning

2015-06-23 Thread Robert Haas
On Mon, Jun 22, 2015 at 8:46 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Tue, Jun 23, 2015 at 9:06 AM, Jim Nasby jim.na...@bluetreble.com wrote:
 On 6/12/15 5:00 PM, Thom Brown wrote:

 On 18 October 2014 at 15:36, Bruce Momjian br...@momjian.us wrote:

 On Fri, Oct 17, 2014 at 02:36:55PM -0400, Bruce Momjian wrote:

 On Fri, Oct 17, 2014 at 12:56:52PM -0400, Tom Lane wrote:

 David G Johnston david.g.johns...@gmail.com writes:

 The question is whether we explain the implications of not being
 WAL-logged
 in an error message or simply state the fact and let the documentation
 explain the hazards - basically just output:
 hash indexes are not WAL-logged and their use is discouraged


 +1.  The warning message is not the place to be trying to explain all
 the
 details.


 OK, updated patch attached.


 Patch applied.


 I only just noticed this item when I read the release notes.  Should
 we bother warning when used on an unlogged table?


 Not really; but I think the bigger question at this point is if we want to
 change it this late in the game.

 Changing it even during beta looks acceptable to me. I think that it
 is mainly a matter of having a patch (here is one) and someone to push
 it, as everybody here seems to agree that for UNLOGGED tables this
 warning makes little sense.

I think you should be testing RelationNeedsWAL(), not the
relpersistence directly.  The same point applies for temporary
indexes.
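
Something along these lines, presumably (a sketch only, not the committed 
patch; it assumes DefineIndex still has the heap relation and the access 
method name at hand):

    /* warn only when the hash index would actually need WAL */
    if (strcmp(accessMethodName, "hash") == 0 && RelationNeedsWAL(rel))
        ereport(WARNING,
                (errmsg("hash indexes are not WAL-logged and their use is discouraged")));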

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] less log level for success dynamic background workers for 9.5

2015-06-23 Thread Robert Haas
On Tue, Jun 23, 2015 at 10:53 AM, Pavel Stehule pavel.steh...@gmail.com wrote:
 2015-06-23 15:20 GMT+02:00 Robert Haas robertmh...@gmail.com:
 I was thinking of a background worker flag, not a GUC.
 BGWORKER_QUIET, or something like that.  But I guess we ought to just
 change it.

 I don't have any problem with a bgworker flag. The only question is what
 the default should be.

Well, if the flag is BGWORKER_QUIET, then the default behavior remains
unchanged, but when that flag is used, the log level is reduced to
DEBUG1.  That has the advantage of not breaking backward
compatibility.  But I'm not sure whether anyone cares if we just break
it, and it's certainly simpler without the flag.
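
For illustration, a sketch of how such a flag might look (BGWORKER_QUIET and 
its bit value are purely hypothetical at this point; bgw_flags and bgw_name 
are existing BackgroundWorker fields):

    /* hypothetical flag value, next to the existing BGWORKER_* flags */
    #define BGWORKER_QUIET      0x0004

    /* when logging worker registration/startup, pick the level from the flag */
    ereport((worker->bgw_flags & BGWORKER_QUIET) ? DEBUG1 : LOG,
            (errmsg("registering background worker \"%s\"",
                    worker->bgw_name)));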

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A couple of newlines missing in pg_rewind log entries

2015-06-23 Thread Heikki Linnakangas

On 06/23/2015 07:39 AM, Michael Paquier wrote:

Hi all,

Some grepping shows that a couple of newlines are missing in
pg_rewind, leading to unreadable log entries:
libpq_fetch.c:pg_log(PG_DEBUG, "getting file chunks");


Fixed.


logging.c:pg_log(PG_PROGRESS, "%*s/%s kB (%d%%) copied",


This one was on purpose; note the printf("\r") after that line. That's
supposed to be a progress indicator that is updated in place on a single line.



filemap.c:pg_fatal("could not stat file \"%s\": %s",


Peter fixed this already.

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] upper planner path-ification

2015-06-23 Thread Robert Haas
On Tue, Jun 23, 2015 at 4:41 AM, Kouhei Kaigai kai...@ak.jp.nec.com wrote:
 So, unless we find a solution around the planner, 2-phase aggregation is
 like a curry without rice

Simon and I spoke with Tom about this upper planner path-ification
problem at PGCon, and he indicated that he intended to work on it
soon, and that we should bug him about it if he doesn't.

I'm pleased to hear this, as I seem to have a special talent in that area.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] less log level for success dynamic background workers for 9.5

2015-06-23 Thread Robert Haas
On Mon, Jun 22, 2015 at 9:47 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Michael Paquier michael.paqu...@gmail.com writes:
 On Tue, Jun 23, 2015 at 10:07 AM, Robert Haas wrote:
 On Mon, Jun 22, 2015 at 8:19 PM, Jim Nasby wrote:
 Anything ever happen with this? I agree that LOG is too high for reporting
 most (if not all) of these things.

 I think we should consider having a flag for this behavior rather than
 changing the behavior across the board.
 But then again, maybe we should just change it.

 What do others think?

 A GUC just for that looks like overkill to me; this log is useful
 when debugging. And one could always have the bgworker call elog
 itself at startup and before exiting to provide more or less the same
 information.

 I agree that we don't need YAGUC here, particularly not one that applies
 indiscriminately to all bgworkers.  I'd vote for just decreasing the log
 level.  The current coding is appropriate for a facility that's basically
 experimental; but as it moves towards being something that would be used
 routinely in production, the argument for being noisy in the log gets
 weaker and weaker.

I was thinking of a background worker flag, not a GUC.
BGWORKER_QUIET, or something like that.  But I guess we ought to just
change it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] introduce XLogLockBlockRangeForCleanup()

2015-06-23 Thread Abhijit Menon-Sen
At 2014-08-20 11:07:44 +0300, hlinnakan...@vmware.com wrote:

 I don't think the new GetBufferWithoutRelcache function is in line
 with the existing ReadBuffer API. I think it would be better to add a
 new ReadBufferMode - RBM_CACHE_ONLY? - that only returns the buffer if
 it's already in cache, and InvalidBuffer otherwise.

Hi Heikki.

I liked the idea of having a new ReadBufferMode of RBM_CACHE_ONLY, but I
wasn't happy with the result when I tried to modify the code to suit. It
didn't make the code easier for me to follow.

(I'll say straight up that it can be done the way you said, and if the
consensus is that it would be an improvement, I'll do it that way. I'm
just not convinced about it, hence this mail.)

For example:

 static Buffer
 ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
   BlockNumber blockNum, ReadBufferMode mode,
   BufferAccessStrategy strategy, bool *hit)
 {
 volatile BufferDesc *bufHdr;
 Block       bufBlock;
 bool        found;
 bool        isExtend;
 bool        isLocalBuf = SmgrIsTemp(smgr);
 
 *hit = false;
 
 /* Make sure we will have room to remember the buffer pin */
 ResourceOwnerEnlargeBuffers(CurrentResourceOwner);
 
 isExtend = (blockNum == P_NEW);

isLocalBuf and isExtend will both be false for the RBM_CACHE_ONLY case,
which is good for us. But that probably needs to be mentioned in a
comment here.

 TRACE_POSTGRESQL_BUFFER_READ_START(forkNum, blockNum,
                                    smgr->smgr_rnode.node.spcNode,
                                    smgr->smgr_rnode.node.dbNode,
                                    smgr->smgr_rnode.node.relNode,
                                    smgr->smgr_rnode.backend,
                                    isExtend);

We know we're not going to be doing IO in the RBM_CACHE_ONLY case, but
that's probably OK? I don't understand this TRACE_* stuff. But we need
to either skip this, or also do the corresponding TRACE_*_DONE later.

 if (isLocalBuf)
 {
  bufHdr = LocalBufferAlloc(smgr, forkNum, blockNum, &found);
 if (found)
 pgBufferUsage.local_blks_hit++;
 else
 pgBufferUsage.local_blks_read++;
 }
 else
 {
 /*
  * lookup the buffer.  IO_IN_PROGRESS is set if the requested block is
  * not currently in memory.
  */
 bufHdr = BufferAlloc(smgr, relpersistence, forkNum, blockNum,
   strategy, &found);
 if (found)
 pgBufferUsage.shared_blks_hit++;
 else
 pgBufferUsage.shared_blks_read++;
 }

The nicest option I could think of here was to copy the shared_buffers
lookup from BufferAlloc into a new BufferAlloc_shared function, and add
a new branch for RBM_CACHE_ONLY. That would look like:

if (isLocalBuf)
…
else if (mode == RBM_CACHE_ONLY)
{
/*
 * We check to see if the buffer is already cached, and if it's
 * not, we return InvalidBuffer because we know it's not pinned.
 */
    bufHdr = BufferAlloc_shared(…, &found);
if (!found)
return InvalidBuffer;
LockBufferForCleanup(BufferDescriptorGetBuffer(bufHdr));
return BufferDescriptorGetBuffer(bufHdr);
}
else
{
/*
 * lookup the buffer …
}

…or if we went with the rest of the code and didn't do the lock/return
immediately, we would fall through to this code:

 /* At this point we do NOT hold any locks. */
 
 /* if it was already in the buffer pool, we're done */
 if (found)
 {
 if (!isExtend)
 {
 /* Just need to update stats before we exit */
 *hit = true;
 VacuumPageHit++;
 
 if (VacuumCostActive)
 VacuumCostBalance += VacuumCostPageHit;
 
  TRACE_POSTGRESQL_BUFFER_READ_DONE(forkNum, blockNum,
                                    smgr->smgr_rnode.node.spcNode,
                                    smgr->smgr_rnode.node.dbNode,
                                    smgr->smgr_rnode.node.relNode,
                                    smgr->smgr_rnode.backend,
                                    isExtend,
                                    found);
 
 /*
  * In RBM_ZERO_AND_LOCK mode the caller expects the page to be
  * locked on return.
  */
 if (!isLocalBuf)
 {
 if (mode == RBM_ZERO_AND_LOCK)
  LWLockAcquire(bufHdr->content_lock, LW_EXCLUSIVE);
 else if (mode == RBM_ZERO_AND_CLEANUP_LOCK)
 LockBufferForCleanup(BufferDescriptorGetBuffer(bufHdr));
 }
 
 return BufferDescriptorGetBuffer(bufHdr);
 }

Some of this isn't applicable to RBM_CACHE_ONLY, so we could 

Re: [HACKERS] get_relation_info comment out of sync

2015-06-23 Thread Robert Haas
On Sun, Jun 21, 2015 at 8:10 AM, Thomas Munro
thomas.mu...@enterprisedb.com wrote:
 The comment for get_relation_info should probably include serverid in
 the list of rel members that it can update (see attached).

Committed, thanks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SSL TAP tests and chmod

2015-06-23 Thread Robert Haas
On Sun, Jun 21, 2015 at 9:50 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Fri, Jun 19, 2015 at 2:49 PM, Michael Paquier wrote:
 The TAP tests in src/test/ssl are using system() in combination with
 chmod, but perl has a command chmod integrated into it, and it would
 work better on Windows as well.
 The attached patch is aimed at fixing that.

 For the sake of the archives, this has been committed by Robert (thanks!):
 commit: ca3f43aa48a83013ea50aeee7cd193a5859c4587
 author: Robert Haas rh...@postgresql.org
 date: Fri, 19 Jun 2015 10:46:30 -0400
 Change TAP test framework to not rely on having a chmod executable.

 This might not work at all on Windows, and is not very efficient anyway.

Whoops, sorry, I thought I had replied to this thread, but I guess I forgot.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Memory context at PG_init call ?

2015-06-23 Thread Robert Haas
On Tue, Jun 23, 2015 at 7:41 AM, Sandro Santilli s...@keybit.net wrote:
 Empirically, I seem to be getting the _PG_init call for a module while
 the active memory context lifetime is that of the function call which
 first needed to load the shared object.

 Is this the case? Is it documented anywhere?
 Initializing memory meant to stay alive for the whole lifetime of a backend
 in that function is a bit complex if that's confirmed.

If you want something that lasts for the lifetime of the backend, just do

MemoryContext oldctx = MemoryContextSwitchTo(TopMemoryContext);
...
MemoryContextSwitchTo(oldctx);

I'm not sure what context you should expect in _PG_init(), although
what you mention doesn't sound unreasonable.  But generally if you
want a long-lived context you should establish that explicitly
yourself.  We try to keep short-lived contexts active whenever
possible because that avoids long-term leaks.
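
To make that concrete, a minimal hypothetical _PG_init along those lines 
(module state and names invented):

    #include "postgres.h"
    #include "fmgr.h"
    #include "utils/memutils.h"

    PG_MODULE_MAGIC;

    /* backend-lifetime state for this (hypothetical) module */
    static char *my_module_state = NULL;

    void
    _PG_init(void)
    {
        /*
         * Don't rely on whatever short-lived context happens to be active
         * when the library is loaded; switch explicitly for long-lived data.
         */
        MemoryContext oldctx = MemoryContextSwitchTo(TopMemoryContext);

        my_module_state = pstrdup("initialized once per backend");

        MemoryContextSwitchTo(oldctx);
    }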

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Insufficient locking for ALTER DEFAULT PRIVILEGES

2015-06-23 Thread Robert Haas
On Sun, Jun 21, 2015 at 11:11 AM, Andres Freund and...@anarazel.de wrote:
 On 2015-06-21 11:45:24 -0300, Alvaro Herrera wrote:
 Alvaro Herrera wrote:
  Now that I actually check with a non-relation object, I see pretty much
  the same error.  So probably, instead of some narrow bug fix, what we
  need is some general solution for all object types.  I know this has
  been discussed a number of times ...  Anyway I see now that we should
  not consider this a backpatchable bug fix, and I'm not doing the coding
  either, at least not now.

 Discussed this with a couple of 2ndQ colleagues and it became evident
 that MVCC catalog scans probably make this problem much more prominent.
 So historical branches are not affected all that much, but it's a real
 issue on 9.4+.

 Hm. I don't see how those would make a marked difference. The snapshots
 for catalog scans are taken afresh for each scan (unless
 cached). There'll probably be some difference, but it'll be small.

Yeah, I think the same.  If those changes introduced a problem we
didn't have before, I'd like to see a reproducible test case.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_rewind failure by file deletion in source server

2015-06-23 Thread Fujii Masao
On Tue, Jun 23, 2015 at 9:19 PM, Heikki Linnakangas hlinn...@iki.fi wrote:
 On 06/23/2015 07:51 AM, Michael Paquier wrote:

 So... Attached are a set of patches dedicated at fixing this issue:


 Thanks for working on this!

 - 0001, add if_not_exists to pg_tablespace_location, returning NULL if
 path does not exist
 - 0002, same with pg_stat_file, returning NULL if file does not exist
 - 0003, same with pg_read_*file. I added them to all the existing
 functions for consistency.
 - 0004, pg_ls_dir extended with if_not_exists and include_dot_dirs
 (thanks Robert for the naming!)
 - 0005, as things get complex, a set of regression tests aimed at
 covering those things. pg_tablespace_location is platform-dependent,
 so there are no tests for it.
 - 0006, the fix for pg_rewind, using what has been implemented before.


 With these patches, pg_read_file() will return NULL for any failure to open
 the file, which makes pg_rewind assume that the file doesn't exist in the
 source server, and remove the file from the destination. That's
 dangerous; those functions should check specifically for ENOENT.

I'm wondering if using pg_read_file() to copy the file from the source server
is reasonable. ISTM that it has two problems, as follows.

1. It cannot read a very large file, like a 1GB file. So if such a large file
was created in the source server after failover, pg_rewind would not be able
to copy the file. No?

2. Many users may not allow a remote client to connect to the
PostgreSQL server as a superuser for some security reasons. IOW,
there would be no entry in pg_hba.conf for such a connection.
In this case, pg_rewind always fails because pg_read_file() needs
superuser privilege. No?

I'm tempted to implement a replication command version of
pg_read_file(). That is, it would read and send the data like the
BASE_BACKUP replication command does...

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] NULL passed as an argument to memcmp() in parse_func.c

2015-06-23 Thread Tom Lane
Glen Knowles gknow...@ieee.org writes:
 It appears that, according to the standard, passing NULL to memcmp is
 undefined behavior, even if the count is 0. See
 http://stackoverflow.com/questions/16362925/can-i-pass-a-null-pointer-to-memcmp
 for C99 and C++ standard references.

Hmm ... looks like that's correct.  I had not noticed the introductory
paragraphs.  For those following along at home, the relevant text in
C99 is in 7.21.1 String function conventions:

   [#2]  Where  an  argument declared as size_t n specifies the
   length of the array for a function, n  can  have  the  value
   zero  on  a call to that function.  Unless explicitly stated
   otherwise in the description of  a  particular  function  in
   this subclause, pointer arguments on such a call shall still
   have valid values, as described in 7.1.4.  On such a call, a
   function  that  locates  a  character finds no occurrence, a
   function that compares two character sequences returns zero,
   and   a   function   that   copies  characters  copies  zero
   characters.

and the relevant text from 7.1.4 is

   [#1]   Each  of  the  following  statements  applies  unless
   explicitly stated otherwise  in  the  detailed  descriptions |
   that  follow:  If  an  argument to a function has an invalid
   value (such as a value outside the domain of  the  function,
   or  a pointer outside the address space of the program, or a
   null pointer) or a type (after promotion) not expected by  a
   function  with variable number of arguments, the behavior is
   undefined.

So it looks like we'd better change it.

I am not sure whether to put in the nargs == 0 test suggested yesterday
or to just insist that callers not pass NULL.  A quick grep suggests that
there is only one such caller right now, namely this bit in ruleutils.c:

appendStringInfo(buf, "EXECUTE PROCEDURE %s(",
                 generate_function_name(trigrec->tgfoid, 0,
                                        NIL, NULL,
                                        false, NULL, EXPR_KIND_NONE));

You could certainly argue that that's taking an unwarranted shortcut.
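
For reference, the nargs == 0 guard being discussed would look roughly like 
this (a sketch only; the surrounding variable names are assumptions, not the 
actual parse_func.c code):

    /*
     * Skip the comparison entirely when there are no arguments, so that a
     * NULL argtypes pointer is never handed to memcmp().
     */
    if (nargs == 0 ||
        memcmp(argtypes, candidate->args, nargs * sizeof(Oid)) == 0)
    {
        /* argument types match */
    }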

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] btree_gin and BETWEEN

2015-06-23 Thread Jeff Janes
If I use the btree_gin extension to build a gin index on a scalar value, it
doesn't work well with BETWEEN queries.  It looks like it scans the whole
index, with the part of the index between the endpoints getting scanned
twice.  It is basically executed as if col1 between x and y were col1
between -inf and y and col1 between x and +inf.

It puts the correct tuples into the bitmap, because whichever inequality is
not being used to set the query endpoint currently is used as a filter
instead.

So I could just not build that index.  But I want it for other reasons, and
the problem is that the planner thinks the index can implement the BETWEEN
query efficiently.  So even if it has truly better options available, it
switches to using a falsely attractive btree_gin index.

create table foo as select random() as btree, random() as gin from
generate_series(1,300);
create index on foo using gin (gin);
create index on foo using btree (btree);
explain ( analyze, buffers) select count(*) from foo where btree between
0.001 and 0.00105;
explain ( analyze, buffers) select count(*) from foo where gin between
0.001 and 0.00105;

It would be nice if btree_gin supported BETWEEN and other range queries
efficiently, or at least if the planner knew it couldn't support them
efficiently.  But I don't see where to begin on either one of these tasks.
Is either one of them plausible?

Cheers,

Jeff


Re: [HACKERS] pg_rewind failure by file deletion in source server

2015-06-23 Thread Fujii Masao
On Tue, Jun 23, 2015 at 11:21 PM, Heikki Linnakangas hlinn...@iki.fi wrote:
 On 06/23/2015 05:03 PM, Fujii Masao wrote:

 On Tue, Jun 23, 2015 at 9:19 PM, Heikki Linnakangas hlinn...@iki.fi
 wrote:

 On 06/23/2015 07:51 AM, Michael Paquier wrote:


 So... Attached are a set of patches dedicated at fixing this issue:



 Thanks for working on this!

 - 0001, add if_not_exists to pg_tablespace_location, returning NULL if
 path does not exist
 - 0002, same with pg_stat_file, returning NULL if file does not exist
 - 0003, same with pg_read_*file. I added them to all the existing
 functions for consistency.
 - 0004, pg_ls_dir extended with if_not_exists and include_dot_dirs
 (thanks Robert for the naming!)
 - 0005, as things get complex, a set of regression tests aimed at
 covering those things. pg_tablespace_location is platform-dependent,
 so there are no tests for it.
 - 0006, the fix for pg_rewind, using what has been implemented before.



 With these patches, pg_read_file() will return NULL for any failure to open
 the file, which makes pg_rewind assume that the file doesn't exist in the
 source server, and remove the file from the destination. That's
 dangerous; those functions should check specifically for ENOENT.


 I'm wondering if using pg_read_file() to copy the file from source server
 is reasonable. ISTM that it has two problems as follows.

 1. It cannot read very large file like 1GB file. So if such large file was
  created in source server after failover, pg_rewind would not be able
  to copy the file. No?


 pg_read_binary_file() handles large files just fine. It cannot return more
 than 1GB in one call, but you can call it several times and retrieve the
 file in chunks. That's what pg_rewind does, except for reading the control
 file, which is known to be small.

Yeah, you're right.

I found that pg_rewind creates a temporary table to fetch the file in chunks.
This would prevent pg_rewind from using the *hot standby* server as a source
server at all. This is of course a limitation of pg_rewind, but we might want
to alleviate it in the future.

 2. Many users may not allow a remote client to connect to the
  PostgreSQL server as a superuser for some security reasons. IOW,
  there would be no entry in pg_hba.conf for such connection.
  In this case, pg_rewind always fails because pg_read_file() needs
  superuser privilege. No?

 I'm tempted to implement a replication command version of
 pg_read_file(). That is, it would read and send the data like the
 BASE_BACKUP replication command does...


 Yeah, that would definitely be nice. Peter suggested it back in January
 (http://www.postgresql.org/message-id/54ac4801.7050...@gmx.net). I think
 it's way too late to do that for 9.5, however. I'm particularly worried that
 if we design the required API in a rush, we're not going to get it right,
 and will have to change it again soon. That might be difficult in a minor
 release. Using pg_read_file() and friends is quite flexible, even though we
 just found out that they're not quite flexible enough right now (the ENOENT
 problem).

I agree that it's too late to do what I said...

But just using pg_read_file() cannot address the #2 problem that I pointed
out in my previous email. Also, requiring superuser privilege for pg_rewind
really conflicts with the motivation for adding the replication privilege.

So should we change pg_read_file() so that even a replication user can
read the file?
Or should a replication-user version of pg_read_file() be implemented?

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] less log level for success dynamic background workers for 9.5

2015-06-23 Thread Alvaro Herrera
Robert Haas wrote:
 On Tue, Jun 23, 2015 at 1:21 PM, Tom Lane t...@sss.pgh.pa.us wrote:

  However, I'm not real sure we need a flag.  I think the use-case of
  wanting extra logging for a bgworker under development is unlikely to be
  satisfied very well by just causing existing start/stop logging messages
  to come out at higher priority.  You're likely to be wanting to log other,
  bgworker-specific, events, and so you'll probably end up writing a bunch
  of your own elog calls anyway (which you'll eventually remove, #ifdef out,
  or decrease the log levels of).
 
 Yeah.  So let's just change it.

+1

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] less log level for success dynamic background workers for 9.5

2015-06-23 Thread Alvaro Herrera
Robert Haas wrote:
 On Tue, Jun 23, 2015 at 10:53 AM, Pavel Stehule pavel.steh...@gmail.com 
 wrote:
  2015-06-23 15:20 GMT+02:00 Robert Haas robertmh...@gmail.com:
  I was thinking of a background worker flag, not a GUC.
  BGWORKER_QUIET, or something like that.  But I guess we ought to just
  change it.
 
  I don't have any problem with a bgworker flag. The only question is what
  the default should be.
 
 Well, if the flag is BGWORKER_QUIET, then the default behavior remains
 unchanged, but when that flag is used, the log level is reduced to
 DEBUG1.  That has the advantage of not breaking backward
 compatibility.  But I'm not sure whether anyone cares if we just break
 it, and it's certainly simpler without the flag.

I vote we do it the other way around, that is have a flag BGWORKER_VERBOSE.
This breaks backwards compatibility (I don't think there's too much
value in that in this case), but it copes with the more common use case
that you want to have the flag while the worker is being developed; and
things that are already working don't need to change in order to get the
natural behavior.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] less log level for success dynamic background workers for 9.5

2015-06-23 Thread Tom Lane
Alvaro Herrera alvhe...@2ndquadrant.com writes:
 Robert Haas wrote:
 Well, if the flag is BGWORKER_QUIET, then the default behavior remains
 unchanged, but when that flag is used, the log level is reduced to
 DEBUG1.  That has the advantage of not breaking backward
 compatibility.  But I'm not sure whether anyone cares if we just break
 it, and it's certainly simpler without the flag.

 I vote we do it the other way around, that is have a flag BGWORKER_VERBOSE.
 This breaks backwards compatibility (I don't think there's too much
 value in that in this case), but it copes with the more common use case
 that you want to have the flag while the worker is being developed; and
 things that are already working don't need to change in order to get the
 natural behavior.

I concur: if we're to have a flag at all, it should work as Alvaro says.

However, I'm not real sure we need a flag.  I think the use-case of
wanting extra logging for a bgworker under development is unlikely to be
satisfied very well by just causing existing start/stop logging messages
to come out at higher priority.  You're likely to be wanting to log other,
bgworker-specific, events, and so you'll probably end up writing a bunch
of your own elog calls anyway (which you'll eventually remove, #ifdef out,
or decrease the log levels of).

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] less log level for success dynamic background workers for 9.5

2015-06-23 Thread Robert Haas
On Tue, Jun 23, 2015 at 1:21 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Alvaro Herrera alvhe...@2ndquadrant.com writes:
 Robert Haas wrote:
 Well, if the flag is BGWORKER_QUIET, then the default behavior remains
 unchanged, but when that flag is used, the log level is reduced to
 DEBUG1.  That has the advantage of not breaking backward
 compatibility.  But I'm not sure whether anyone cares if we just break
 it, and it's certainly simpler without the flag.

 I vote we do it the other way around, that is have a flag BGWORKER_VERBOSE.
 This breaks backwards compatibility (I don't think there's too much
 value in that in this case), but it copes with the more common use case
 that you want to have the flag while the worker is being developed; and
 things that are already working don't need to change in order to get the
 natural behavior.

 I concur: if we're to have a flag at all, it should work as Alvaro says.

 However, I'm not real sure we need a flag.  I think the use-case of
 wanting extra logging for a bgworker under development is unlikely to be
 satisfied very well by just causing existing start/stop logging messages
 to come out at higher priority.  You're likely to be wanting to log other,
 bgworker-specific, events, and so you'll probably end up writing a bunch
 of your own elog calls anyway (which you'll eventually remove, #ifdef out,
 or decrease the log levels of).

Yeah.  So let's just change it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_*_columns?

2015-06-23 Thread Magnus Hagander
On Tue, Jun 23, 2015 at 3:01 AM, Robert Haas robertmh...@gmail.com wrote:

 On Sun, Jun 21, 2015 at 11:43 AM, Magnus Hagander mag...@hagander.net
 wrote:
  On Sat, Jun 20, 2015 at 11:55 PM, Robert Haas robertmh...@gmail.com
 wrote:
  On Sat, Jun 20, 2015 at 7:01 PM, Tom Lane t...@sss.pgh.pa.us wrote:
   But if the structure
   got too big to map (on a 32-bit system), then you'd be sort of hosed,
   because there's no way to attach just part of it.  That might not be
   worth worrying about, but it depends on how big it's likely to get -
 a
   32-bit system is very likely to choke on a 1GB mapping, and maybe
 even
   on a much smaller one.
  
   Yeah, I'm quite worried about assuming that we can map a data
 structure
   that might be of very significant size into shared memory on 32-bit
   machines.  The address space just isn't there.
 
  Considering the advantages of avoiding message queues, I think we
  should think a little bit harder about whether we can't find some way
  to skin this cat.  As I think about this a little more, I'm not sure
  there's really a problem with one stats DSM per database.  Sure, the
  system might have 100,000 databases in some crazy pathological case,
  but the maximum number of those that can be in use is bounded by
  max_connections, which means the maximum number of stats file DSMs we
  could ever need at one time is also bounded by max_connections.  There
  are a few corner cases to think about, like if the user writes a
  client that connects to all 100,000 databases in very quick
  succession, we've got to jettison the old DSMs fast enough to make
  room for the new DSMs before we run out of slots, but that doesn't
  seem like a particularly tough nut to crack.  If the stats collector
  ensures that it never attaches to more than MaxBackends stats DSMs at
  a time, and each backend ensures that it never attaches to more than
  one stats DSM at a time, then 2 * MaxBackends stats DSMs is always
  enough.  And that's just a matter of bumping
  PG_DYNSHMEM_SLOTS_PER_BACKEND from 2 to 4.
 
  In more realistic cases, it will probably be normal for many or all
  backends to be connected to the same database, and the number of stats
  DSMs required will be far smaller.
 
  What about a combination along the lines of something like this: the stats
  collector keeps the statistics in local memory as before. But when a backend
  needs to get a snapshot of its data, it uses a shared memory queue to request
  it. What the stats collector does in this case is allocate a new DSM, copy
  the data into that DSM, and hand the DSM over to the backend. At this point
  the stats collector can forget about it, and it's up to the backend to get
  rid of it when it's done with it.

 Well, there seems to be little point in having the stats collector
 forget about a DSM that it could equally well have shared with the
 next guy who wants a stats snapshot for the same database.  That case
 is surely *plenty* common enough to be worth optimizing for.


Right, we only need to drop it once we have received a stats message for it,
i.e. once something has changed. And possibly combine that with a minimum time
as well, as we have now, if we want to limit the potential churn.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


[HACKERS] Should we back-patch SSL renegotiation fixes?

2015-06-23 Thread Tom Lane
Those of you who have been following
http://www.postgresql.org/message-id/flat/1d3bc192-970d-4b70-a5fe-38d2a9f76...@me.com
are aware that Red Hat shipped a rather broken version of openssl last
week.  While waiting for them to fix it, I've been poking at the behavior,
and have found out that PG 9.4 and later are much less badly broken than
older branches.  In the newer branches you'll see a failure only after
transmitting 2GB within a session, whereas the older branches fail at
the second renegotiation attempt, which would typically be 1GB of data
and could be a lot less.

I do not know at this point whether these behaviors are really the same
bug or not, but I wonder whether it's time to consider back-patching the
renegotiation fixes we did in 9.4.  Specifically, I think maybe we should
back-patch 31cf1a1a4, 86029b31e, and 36a3be654.  (There are more changes
in master, but since those haven't yet shipped in any released branch,
and there's been a lot of other rework in the same area, those probably
are not back-patch candidates.)

Thoughts?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should we back-patch SSL renegotiation fixes?

2015-06-23 Thread Robert Haas
On Tue, Jun 23, 2015 at 2:33 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Those of you who have been following
 http://www.postgresql.org/message-id/flat/1d3bc192-970d-4b70-a5fe-38d2a9f76...@me.com
 are aware that Red Hat shipped a rather broken version of openssl last
 week.  While waiting for them to fix it, I've been poking at the behavior,
 and have found out that PG 9.4 and later are much less badly broken than
 older branches.  In the newer branches you'll see a failure only after
 transmitting 2GB within a session, whereas the older branches fail at
 the second renegotiation attempt, which would typically be 1GB of data
 and could be a lot less.

 I do not know at this point whether these behaviors are really the same
 bug or not, but I wonder whether it's time to consider back-patching the
 renegotiation fixes we did in 9.4.  Specifically, I think maybe we should
 back-patch 31cf1a1a4, 86029b31e, and 36a3be654.  (There are more changes
 in master, but since those haven't yet shipped in any released branch,
 and there's been a lot of other rework in the same area, those probably
 are not back-patch candidates.)

 Thoughts?

I have no clear idea how safe it is to back-port these fixes.

Just as a point of reference, we had a customer hit a problem similar
to bug #12769 on 9.3.x.  I think (but am not sure) that 272923a0a may
have been intended to fix that issue.  In a quick search, I didn't
find any other complaints about renegotiation-related issues from our
customers.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should we back-patch SSL renegotiation fixes?

2015-06-23 Thread Alvaro Herrera
Robert Haas wrote:
 On Tue, Jun 23, 2015 at 2:33 PM, Tom Lane t...@sss.pgh.pa.us wrote:

  I do not know at this point whether these behaviors are really the same
  bug or not, but I wonder whether it's time to consider back-patching the
  renegotiation fixes we did in 9.4.  Specifically, I think maybe we should
  back-patch 31cf1a1a4, 86029b31e, and 36a3be654.  (There are more changes
  in master, but since those haven't yet shipped in any released branch,
  and there's been a lot of other rework in the same area, those probably
  are not back-patch candidates.)

Yes, +1 for backpatching.  Don't forget 5674460b and b1aebbb6.

I could reproduce problems trivially with COPY in psql without that and
a small renegotiation limit, as I recall.
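
Presumably something along these lines (a reproduction sketch, not Alvaro's 
exact recipe; the table name is made up, and ssl_renegotiation_limit is the 
pre-9.5 GUC):

    -- in an SSL-enabled psql session:
    SET ssl_renegotiation_limit = '32kB';
    -- then push enough data through the connection to force several
    -- renegotiations, e.g.
    \copy some_big_table TO '/tmp/some_big_table.out'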

  Thoughts?
 
 I have no clear idea how safe it is to back-port these fixes.
 
 Just as a point of reference, we had a customer hit a problem similar
 to bug #12769 on 9.3.x.  I think (but am not sure) that 272923a0a may
 have been intended to fix that issue.

Maybe we should *also* backpatch that, then (and if so, then also its
fixup 1c2b7c087).  I do not think that the failure was introduced by
the fixes cited above.

 In a quick search, I didn't find any other complaints about
 renegotiation-related issues from our customers.

Other issues Andres was investigating seemed related to nonblocking
connections (which as I recall are used mostly by replication code).
Bug #12769 contained a link to the previous discussion.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should we back-patch SSL renegotiation fixes?

2015-06-23 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, Jun 23, 2015 at 2:33 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I do not know at this point whether these behaviors are really the same
 bug or not, but I wonder whether it's time to consider back-patching the
 renegotiation fixes we did in 9.4.  Specifically, I think maybe we should
 back-patch 31cf1a1a4, 86029b31e, and 36a3be654.  (There are more changes
 in master, but since those haven't yet shipped in any released branch,
 and there's been a lot of other rework in the same area, those probably
 are not back-patch candidates.)
 
 Thoughts?

 I have no clear idea how safe it is to back-port these fixes.

Well, it would mean that pre-9.5 branches all behave the same, which
would be an improvement in my book.  Also, ISTM that the 9.4 code
for renegotiation assumes a whole lot less than prior branches about
OpenSSL's internal behavior; so it ought to be more robust, even if
some bugs remain.

 Just as a point of reference, we had a customer hit a problem similar
 to bug #12769 on 9.3.x.  I think (but am not sure) that 272923a0a may
 have been intended to fix that issue.  In a quick search, I didn't
 find any other complaints about renegotiation-related issues from our
 customers.

The problem with trying to adopt code from HEAD is that it probably
depends on the rather invasive changes explained here:
http://www.postgresql.org/message-id/20150126101405.ga31...@awork2.anarazel.de
Even assuming that there's no dependency on the immediate-interrupt
changes, I'm afraid to back-patch anything that invasive.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_rewind failure by file deletion in source server

2015-06-23 Thread Michael Paquier
On Wed, Jun 24, 2015 at 1:40 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Tue, Jun 23, 2015 at 11:21 PM, Heikki Linnakangas hlinn...@iki.fi wrote:
 On 06/23/2015 05:03 PM, Fujii Masao wrote:

 On Tue, Jun 23, 2015 at 9:19 PM, Heikki Linnakangas hlinn...@iki.fi
 wrote:

 On 06/23/2015 07:51 AM, Michael Paquier wrote:


 So... Attached are a set of patches dedicated at fixing this issue:



 Thanks for working on this!

 - 0001, add if_not_exists to pg_tablespace_location, returning NULL if
 path does not exist
 - 0002, same with pg_stat_file, returning NULL if file does not exist
 - 0003, same with pg_read_*file. I added them to all the existing
 functions for consistency.
 - 0004, pg_ls_dir extended with if_not_exists and include_dot_dirs
 (thanks Robert for the naming!)
 - 0005, as things get complex, a set of regression tests aimed at
 covering those things. pg_tablespace_location is platform-dependent,
 so there are no tests for it.
 - 0006, the fix for pg_rewind, using what has been implemented before.



 With these patches, pg_read_file() will return NULL for any failure to open
 the file, which makes pg_rewind assume that the file doesn't exist in the
 source server, and remove the file from the destination. That's
 dangerous; those functions should check specifically for ENOENT.


 I'm wondering if using pg_read_file() to copy the file from source server
 is reasonable. ISTM that it has two problems as follows.

 1. It cannot read very large file like 1GB file. So if such large file was
  created in source server after failover, pg_rewind would not be able
  to copy the file. No?


 pg_read_binary_file() handles large files just fine. It cannot return more
 than 1GB in one call, but you can call it several times and retrieve the
 file in chunks. That's what pg_rewind does, except for reading the control
 file, which is known to be small.

 Yeah, you're right.

 I found that pg_rewind creates a temporary table to fetch the file in chunks.
 This would prevent pg_rewind from using the *hot standby* server as a source
 server at all. This is of course a limitation of pg_rewind, but we might want
 to alleviate it in the future.

This is something that a replication command could address properly.

 2. Many users may not allow a remote client to connect to the
  PostgreSQL server as a superuser for some security reasons. IOW,
  there would be no entry in pg_hba.conf for such connection.
  In this case, pg_rewind always fails because pg_read_file() needs
  superuser privilege. No?

 I'm tempted to implement a replication command version of
 pg_read_file(). That is, it would read and send the data like the
 BASE_BACKUP replication command does...


 Yeah, that would definitely be nice. Peter suggested it back in January
 (http://www.postgresql.org/message-id/54ac4801.7050...@gmx.net). I think
 it's way too late to do that for 9.5, however. I'm particularly worried that
 if we design the required API in a rush, we're not going to get it right,
 and will have to change it again soon. That might be difficult in a minor
 release. Using pg_read_file() and friends is quite flexible, even though we
 just found out that they're not quite flexible enough right now (the ENOENT
 problem).

 I agree that it's too late to do what I said...

 But just using pg_read_file() cannot address the #2 problem that I pointed
 in my previous email. Also requiring a superuser privilege on pg_rewind
 really conflicts with the motivation why we added replication privilege.

 So we should change pg_read_file() so that even replication user can
 read the file?

From the security perspective, a replication user can take a base
backup, so it can already easily retrieve the contents of PGDATA. Hence
I guess that it would be fine. However, what about cases where
pg_hba.conf authorizes access for a given replication user via psql and
blocks it for the replication protocol? We could say that the OP should
not give out replication access that easily, but in this case the user
would have access to the contents of PGDATA even though he should not. Is
that unrealistic?

 Or should a replication-user version of pg_read_file() be implemented?

You mean a new function? How would it be different from authorizing
pg_read_file() usage for a replication user?

Honestly, I can live with this superuser restriction in 9.5. And come
back to the replication user restriction in 9.6 once things cool down
a bit.
-- 
Michael




Re: [HACKERS] less log level for success dynamic background workers for 9.5

2015-06-23 Thread Jim Nasby

On 6/23/15 12:21 PM, Tom Lane wrote:

I concur: if we're to have a flag at all, it should work as Alvaro says.

However, I'm not real sure we need a flag.  I think the use-case of
wanting extra logging for a bgworker under development is unlikely to be
satisfied very well by just causing existing start/stop logging messages
to come out at higher priority.  You're likely to be wanting to log other,
bgworker-specific, events, and so you'll probably end up writing a bunch
of your own elog calls anyway (which you'll eventually remove, #ifdef out,
or decrease the log levels of).


FWIW, I have this problem *constantly* with plpgsql. I put RAISE DEBUGs 
in, but once you have those in enough places SET 
client_min_messages=debug becomes next to useless because of the huge 
volume of spew.


What I'd like is a way to add an identifier to ereport/RAISE so you 
could turn on individual reports. If we had that we'd just make these 
particular ereports DEBUG1 and not worry about it because you could 
easily turn them on when needed.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] pg_stat_*_columns?

2015-06-23 Thread Andres Freund
On 2015-06-22 21:05:52 -0400, Robert Haas wrote:
 Interesting idea.  We could consider the set of stats files a database
 unto itself and reserve a low-numbered OID for it.  The obvious thing
 to do is use the database's OID as the relfilenode, but then how do
 you replace the stats file?

The relmapper infrastructure should be able to take care of that.

Regards,

Andres




Re: [HACKERS] Should we back-patch SSL renegotiation fixes?

2015-06-23 Thread Tom Lane
Alvaro Herrera alvhe...@2ndquadrant.com writes:
 On Tue, Jun 23, 2015 at 2:33 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I do not know at this point whether these behaviors are really the same
 bug or not, but I wonder whether it's time to consider back-patching the
 renegotiation fixes we did in 9.4.  Specifically, I think maybe we should
 back-patch 31cf1a1a4, 86029b31e, and 36a3be654.

 Yes, +1 for backpatching.  Don't forget 5674460b and b1aebbb6.

Huh?  5674460b is ancient, and we concluded that b1aebbb6 didn't represent
anything much more than cosmetic fixes.

regards, tom lane




Re: [HACKERS] proposal: row_to_array function

2015-06-23 Thread Jim Nasby

On 6/23/15 9:45 AM, Pavel Stehule wrote:


2015-06-23 1:56 GMT+02:00 Jim Nasby jim.na...@bluetreble.com:

On 6/22/15 2:46 AM, Pavel Stehule wrote:


FOREACH key, val IN RECORD myrow
LOOP
IF pg_typeof(val) IN ('int4', 'double precision', 'numeric')
THEN
  val := val + 1; -- these variables can be mutable
  -- or maybe in the future
 myrow[key] := val + 1;
END IF;
END LOOP;

What is important - val is an automatic variable, and it can have a
different type at each step.

It is a little bit strange, but impossible to solve, so we cannot
support row[var] as a right-hand value (without immutable casting). But
we can do it as a left-hand value.


Actually, you can (theoretically) solve it for the right value as
well, if val is an actual type and you have operators on that
type that know to search for a specific operator given the actual
types that are involved. So if val is int4, val + 1 becomes int4 + int4.

The problem I've run into with this is by the time you've added
enough casts to make this workable you've probably created a
situation where val + something is going to recurse back to itself.
I've partially solved this in [1], and intend to finish it by
calling back in via SPI to do the final resolution, the same way the
RI triggers do.

What would be a lot better is if we had better control over function
and operator resolution.

[1]

https://github.com/decibel/variant/commit/2b99067744a405da8a325de1ebabd213106f794f#diff-8aa2db4a577ee4201d6eb9041c2a457eR846


The solution of dynamic operators changes the philosophy by about 180° - and I
am afraid about performance.

Now, thinking about the possibilities - it is probably solvable on the
right side too. It needs two steps:

1. a parametrized record reference syntax - something like SELECT $1[$]
2. the possibility to throw away the cached plan if the result has a different
type than the cache expects.


Well, the other option is we allow for cases where we don't know in 
advance what the type will be. That would handle this, JSON, variant, 
and possibly some other scenarios.


BTW, I think this relates to the desire to be able to do more OO-ish 
things in the database. Like do X to all elements in this array. And 
to have actual classes, private members, real arrays of arrays. It seems 
like there's a bigger need here that's only being addressed piecemeal. :/

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] btree_gin and BETWEEN

2015-06-23 Thread Tom Lane
Jeff Janes jeff.ja...@gmail.com writes:
 If I use the btree_gin extension to build a gin index on a scalar value, it
 doesn't work well with BETWEEN queries.  It looks like it scans the whole
 index, with the part of the index between the endpoints getting scanned
 twice.  It is basically executed as if "col1 between x and y" were
 "col1 between -inf and y and col1 between x and +inf".

 It puts the correct tuples into the bitmap, because whichever inequality is
 not being used to set the query endpoint currently is used as a filter
 instead.

 So I could just not build that index.  But I want it for other reasons, and
 the problem is that the planner thinks the index can implement the BETWEEN
 query efficiently.  So even if it has truly better options available, it
 switches to using a falsely attractive btree_gin index.

 create table foo as select random() as btree, random() as gin from
 generate_series(1,300);
 create index on foo using gin (gin);
 create index on foo using btree (btree);
 explain ( analyze, buffers) select count(*) from foo where btree between
 0.001 and 0.00105;
 explain ( analyze, buffers) select count(*) from foo where gin between
 0.001 and 0.00105;

 It would be nice if btree_gin supported BETWEEN and other range queries
 efficiently, or at least if the planner knew it couldn't support them
 efficiently.  But I don't see where to begin on either one of these tasks.
 Is either one of them plausible?

ISTM this is a bug/oversight in the core GIN code: it has multiple
GinScanEntry's for the same index column, but fails to realize that it
could start the index scans at the largest match value of any of those
GinScanEntry's.  This did not matter so much before the partial-match
feature went in; but with a partial match we might be talking about
scanning a large fraction of the index, and it's important to optimize
the start point if possible.  I think there is also a failure to notice
when we could stop the scans because one of a group of AND'ed entries
says no more matches are possible.

I'm not sure where the best place to hack in this consideration would
be, though.  Oleg, Teodor, any thoughts?

regards, tom lane




Re: [HACKERS] pg_stat_*_columns?

2015-06-23 Thread Jim Nasby

On 6/22/15 8:05 PM, Robert Haas wrote:

In totally different crazy way we could just use the existing buffer
manager we have and simply put the stats file in
shared_buffers. Inventing a per-database relfilenode that doesn't
conflict doesn't seem impossible. With some care it shouldn't be hard to
make that stats file readable from all sessions, even if they're not
connected to the database (e.g. autovacuum launcher).

Interesting idea.  We could consider the set of stats files a database
unto itself and reserve a low-numbered OID for it.  The obvious thing
to do is use the database's OID as the relfilenode, but then how do
you replace the stats file?


I think one of the biggest use cases for the stats is to collect them 
over time and put them in a database. It's rather tempting to just say 
that's what we should be doing, and provide a knob for how often to 
record the values and how long to keep the data for. That would 
eliminate a lot of these problems.
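
A minimal user-space version of that idea could look like the SQL below
(table name, source view and retention interval are all illustrative):

-- snapshot table: same columns as the stats view, plus a capture timestamp
CREATE TABLE stats_history AS
SELECT now() AS captured_at, * FROM pg_stat_user_tables LIMIT 0;

-- run this at whatever interval the knob would control (cron, a bgworker, ...)
INSERT INTO stats_history SELECT now(), * FROM pg_stat_user_tables;

-- and prune according to the retention knob
DELETE FROM stats_history WHERE captured_at < now() - interval '30 days';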


The one part I don't see how to solve is the case where bad stuff is 
happening *right now* and you don't/can't wait for the next reporting 
interval. Presumably handling that requires all the stuff that's 
discussed already and you might as well just handle the recording in 
user space. But maybe someone has some clever ideas there...

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] proposal: row_to_array function

2015-06-23 Thread Jim Nasby

On 6/23/15 3:22 PM, Merlin Moncure wrote:

I would rephrase that to: do X to all fields of an object.
Array handling is pretty good now (minus arrays of arrays, but arrays


Except that still won't make it easy to do something to each element of 
an array in SQL, which I think would be nice to have.



of objects containing arrays is 'good enough' for most real world
cases).  We've suffered for a while now with hstore/json as a
temporary container to handle operations that are not well supported
by postgres's particularly strongly typed flavor SQL.   The OO of
postgres has been gradually diluting away; it's not a 'object
relational' database anymore and the OO features, very much a product
of the silly 90's OO hysteria, have been recast into more useful
features like inheritance and/or pruned back.


Admittedly I've never played with an OO database, but I think our data 
features are pretty good [1]. Where I do think we can improve though is 
developing/coding things in the database. For example, I'd love to have 
the equivalent to a class. Perhaps that could be accomplished by 
allowing multiple instances of an extension. I'd also like stronger 
support for private objects (permissions don't really fit that bill).



I don't mind having to push everything to jsonb and back for tuple
manipulation and I expect that's how these types of things are going
to be done moving forwards. jsonb has clearly caught a bid judging by
what I'm reading in the blogosphere and will continue to accrete
features things like this.


I think it's unfortunate to lose the strong typing that we have. That 
can be especially important for something like numbers (was it 
originally a float or a numeric?). But maybe JSON is good enough.



[1] The one OO-ish data feature I'd like is the ability to de-reference 
a foreign key pointer. So if


CREATE TABLE b( a_id int REFERENCES a);

then have

SELECT a_id.some_field FROM b;

transform to

SELECT a.some_field FROM b JOIN a ...;
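
Spelled out, and assuming a's primary key is a column named id (the snippet
above leaves it implicit), the idea is roughly:

CREATE TABLE a (id int PRIMARY KEY, some_field text);
CREATE TABLE b (a_id int REFERENCES a);

-- the proposed dereference syntax ...
--   SELECT a_id.some_field FROM b;
-- ... would simply be sugar for:
SELECT a.some_field FROM b JOIN a ON a.id = b.a_id;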
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] checkpointer continuous flushing

2015-06-23 Thread Jim Nasby

On 6/22/15 11:59 PM, Fabien COELHO wrote:

which might not be helpful for cases where the checkpoint could have
flushed soon-to-be-recycled buffers. I think flushing the sorted
buffers w.r.t. tablespaces is a good idea, but not giving any
preference to the clock-sweep point makes me think that we would lose in
some cases with this new change.


I do not see how to do both, as these two orders seem more or less
unrelated?  The traditional assumption is that I/O is very slow
and they are to be optimized first, so going for buffer ordering to be
nice to the disk looks like the priority.


The point is that it's already expensive for backends to advance the 
clock; if they then have to wait on IO as well it gets REALLY expensive. 
So we want to avoid that.


Other than that though, it is pretty orthogonal, so perhaps another 
indication that the clock should be handled separately from both 
backends and bgwriter...

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Should we back-patch SSL renegotiation fixes?

2015-06-23 Thread Robert Haas
On Tue, Jun 23, 2015 at 3:48 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Tue, Jun 23, 2015 at 2:33 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I do not know at this point whether these behaviors are really the same
 bug or not, but I wonder whether it's time to consider back-patching the
 renegotiation fixes we did in 9.4.  Specifically, I think maybe we should
 back-patch 31cf1a1a4, 86029b31e, and 36a3be654.  (There are more changes
 in master, but since those haven't yet shipped in any released branch,
 and there's been a lot of other rework in the same area, those probably
 are not back-patch candidates.)

 Thoughts?

 I have no clear idea how safe it is to back-port these fixes.

 Well, it would mean that pre-9.5 branches all behave the same, which
 would be an improvement in my book.  Also, ISTM that the 9.4 code
 for renegotiation assumes a whole lot less than prior branches about
 OpenSSL's internal behavior; so it ought to be more robust, even if
 some bugs remain.

 Just as a point of reference, we had a customer hit a problem similar
 to bug #12769 on 9.3.x.  I think (but am not sure) that 272923a0a may
 have been intended to fix that issue.  In a quick search, I didn't
 find any other complaints about renegotiation-related issues from our
 customers.

 The problem with trying to adopt code from HEAD is that it probably
 depends on the rather invasive changes explained here:
 http://www.postgresql.org/message-id/20150126101405.ga31...@awork2.anarazel.de
 Even assuming that there's no dependency on the immediate-interrupt
 changes, I'm afraid to back-patch anything that invasive.

What commits actually resulted from that?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] proposal: row_to_array function

2015-06-23 Thread Jim Nasby

On 6/23/15 3:40 PM, Pavel Stehule wrote:

BTW, I think this relates to the desire to be able to do more OO-ish
things in the database. Like do X to all elements in this array.
And to have actual classes, private members, real arrays of arrays.
It seems like there's a bigger need here that's only being addressed
piecemeal. :/


I would not open this box - and I would not throw away or redesign
almost all of the PostgreSQL type handling system. I am sure it is not
necessary. A PL can be relatively static if the dynamic part is covered by the
query language. The few features can be implemented without the necessity of
redesigning everything. Still, there are other PLs - and we are not forced to
design a new Perl, JavaScript, ...


By that argument why are we putting it into plpgsql either? You can 
easily do the stuff we've been talking about in plperl (and presumably 
most other pl's). So why mess around with adding it to plpgsql?


More importantly, these are things that would be extremely useful at the 
SQL level. When it comes to records for example, we frequently know 
exactly what's in them, so why do we force users to statically specify 
that at the SQL level? This is why we don't support pivot tables (which 
in the BI world is a Big Deal).
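
(For context, the closest tool today is tablefunc's crosstab(), and it
illustrates the point: the output columns have to be spelled out statically.
The sales table and its columns below are made up.)

CREATE EXTENSION IF NOT EXISTS tablefunc;

SELECT *
FROM crosstab(
  $$ SELECT region, quarter, sum(amount) FROM sales GROUP BY 1, 2 ORDER BY 1, 2 $$,
  $$ VALUES ('Q1'), ('Q2'), ('Q3'), ('Q4') $$
) AS ct(region text, q1 numeric, q2 numeric, q3 numeric, q4 numeric);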


I think it's a mistake to try and solve this strictly through plpgsql
without recognizing the larger desire and trying to move the ball in that
direction. I'm not saying a first effort should boil the ocean, but if
we keep piecemealing this without more thought we're going to keep
getting more warts (like a lot of the gotchas we have with arrays).

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] pg_stat_*_columns?

2015-06-23 Thread Robert Haas
On Tue, Jun 23, 2015 at 3:47 PM, Andres Freund and...@anarazel.de wrote:
 On 2015-06-22 21:05:52 -0400, Robert Haas wrote:
 Interesting idea.  We could consider the set of stats files a database
 unto itself and reserve a low-numbered OID for it.  The obvious thing
 to do is use the database's OID as the relfilenode, but then how do
 you replace the stats file?

 The relmapper infrastructure should be able to take care of that.

How?  I think it assumes that the number of mapped entries is pretty small.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] proposal: row_to_array function

2015-06-23 Thread Pavel Stehule
2015-06-23 21:57 GMT+02:00 Jim Nasby jim.na...@bluetreble.com:

 On 6/23/15 9:45 AM, Pavel Stehule wrote:


 2015-06-23 1:56 GMT+02:00 Jim Nasby jim.na...@bluetreble.com:


 On 6/22/15 2:46 AM, Pavel Stehule wrote:


 FOREACH key, val IN RECORD myrow
 LOOP
 IF pg_typeof(val) IN ('int4', 'double precision', 'numeric')
 THEN
   val := val + 1; -- these variables can be mutable
   -- or maybe in the future
  myrow[key] := val + 1;
 END IF;
 END LOOP;

 What is important - val is automatic variable, and it can has
 different type in any step.

 It is little bit strange, but impossible to solve, so we cannot to
 support row[var] as right value (without immutable casting). But
 we can
 do it with left value.


 Actually, you can (theoretically) solve it for the right value as
 well with if val is an actual type and you have operators on that
 type that know to search for a specific operator given the actual
 types that are involved. So if val is int4, val + 1 becomes int4 +
 int4.

 The problem I've run into with this is by the time you've added
 enough casts to make this workable you've probably created a
 situation where val + something is going to recurse back to itself.
 I've partially solved this in [1], and intend to finish it by
 calling back in via SPI to do the final resolution, the same way the
 RI triggers do.

 What would be a lot better is if we had better control over function
 and operator resolution.

 [1]

 https://github.com/decibel/variant/commit/2b99067744a405da8a325de1ebabd213106f794f#diff-8aa2db4a577ee4201d6eb9041c2a457eR846


 The solution of dynamic operators changes philosophy about 180° - and I
 afraid about a performance.

 Now if I am thinking about possibilities - probably it is solvable on
 right side too. It needs to solve two steps:

 1. parametrized record reference syntax - some like SELECT $1[$]
 2. possibility to throw plan cache, if result has different type than is
 expected in cache.


 Well, the other option is we allow for cases where we don't know in
 advance what the type will be. That would handle this, JSON, variant, and
 possibly some other scenarios.

 BTW, I think this relates to the desire to be able to do more OO-ish
 things in the database. Like do X to all elements in this array. And to
 have actual classes, private members, real arrays of arrays. It seems like
 there's a bigger need here that's only being addressed piecemeal. :/


I would not to open this box - and I would not to throw or redesign almost
all PostgreSQL type handling system. I am sure, so it is not necessary. PL
can be relative static if the dynamic is covered by query language. The few
features can implemented without to necessity to redesign all. Still there
are other PL - and we have not force to design new Perl, JavaScript, ...


 --
 Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
 Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: [HACKERS] 9.5 release notes

2015-06-23 Thread Kevin Grittner
Andres Freund and...@anarazel.de wrote:
  <listitem>
   <para>
Improve concurrent locking and buffer scan performance (Andres
Freund, Kevin Grittner)
   </para>
  </listitem>

 If this is ab5194e6f, I don't think it makes sense to mention buffer
 scan - it's just any lwlock, and buffer locks aren't the primary
 benefit (ProcArrayLock, buffer mapping lock probably are that). I also

 don't think Kevin was involved?

It seems likely that 2ed5b87f9 was combined with something else in
this reference.  By reducing buffer pins and buffer content locking
during btree index scans it shows a slight performance gain in
btree scans and avoids some blocking of btree index vacuuming.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] proposal: row_to_array function

2015-06-23 Thread Merlin Moncure
On Tue, Jun 23, 2015 at 2:57 PM, Jim Nasby jim.na...@bluetreble.com wrote:
 On 6/23/15 9:45 AM, Pavel Stehule wrote:
 1. parametrized record reference syntax - some like SELECT $1[$]
 2. possibility to throw plan cache, if result has different type than is
 expected in cache.


 Well, the other option is we allow for cases where we don't know in advance
 what the type will be. That would handle this, JSON, variant, and possibly
 some other scenarios.

 BTW, I think this relates to the desire to be able to do more OO-ish things
 in the database. Like do X to all elements in this array. And to have
 actual classes, private members, real arrays of arrays. It seems like
 there's a bigger need here that's only being addressed piecemeal. :/

I would rephrase that to: do X to all fields of an object.
Array handling is pretty good now (minus arrays of arrays, but arrays
of objects containing arrays is 'good enough' for most real world
cases).  We've suffered for a while now with hstore/json as a
temporary container to handle operations that are not well supported
by postgres's particularly strongly typed flavor of SQL.   The OO side of
postgres has been gradually diluting away; it's not an 'object
relational' database anymore, and the OO features, very much a product
of the silly 90's OO hysteria, have been recast into more useful
features like inheritance and/or pruned back.

I don't mind having to push everything to jsonb and back for tuple
manipulation and I expect that's how these types of things are going
to be done moving forwards. jsonb has clearly caught a bid judging by
what I'm reading in the blogosphere and will continue to accrete
features like this.

merlin




[HACKERS] Making sure this isn't a new recovery bug ...

2015-06-23 Thread Josh Berkus
Hackers,

At one site, I have some duplicate row corruption in a staging database.
 This database was created by a binary backup from 9.3.5, which was
restored via PITR with a timestamp target to 9.3.5, so known-bad
versions.  The strange thing about the duplicate rows is that they were
all frozen prior to the original binary backup, as far as I can tell.
Here's a sample:

id| b_xmax | b_xmin |b_ctid
--+++--
 48871010 |  0 |  2 | (1529664,1)
 48871010 |  0 |  2 | (1529696,1)
 48871024 |  0 |  2 | (1529697,1)
 48871024 |  0 |  2 | (1529665,1)
 48871033 |  0 |  2 | (1529666,1)
 48871033 |  0 |  2 | (1529698,1)
 48871041 |  0 |  2 | (1529697,2)
 48871041 |  0 |  2 | (1529665,2)
 48871043 |  0 |  2 | (1529698,2)
 48871043 |  0 |  2 | (1529666,2)
 48871049 |  0 |  2 | (1529667,1)
 48871049 |  0 |  2 | (1529699,1)

This is about 1000 rows of a 100m row table.
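
A query of roughly this shape surfaces them (the table name is made up here,
and the b_* columns above are presumably just aliases for the usual xmin,
xmax and ctid system columns):

SELECT id, xmax AS b_xmax, xmin AS b_xmin, ctid AS b_ctid
FROM the_big_table
WHERE id IN (SELECT id FROM the_big_table GROUP BY id HAVING count(*) > 1)
ORDER BY id;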

So my question is, is this still likely to be one of the known multixact
issues?  I'm asking because this whole system will get scrubbed and
replaced with 9.3.9 in a couple days, so if it's a bug we want to
investigate, I need to do forensics now.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] proposal: row_to_array function

2015-06-23 Thread Merlin Moncure
On Tue, Jun 23, 2015 at 3:45 PM, Jim Nasby jim.na...@bluetreble.com wrote:
 On 6/23/15 3:22 PM, Merlin Moncure wrote:

 I would rephrase that to: do X to all fields of an object.
 Array handling is pretty good now (minus arrays of arrays, but arrays


 Except that still won't make it easy to do something to each element of an
 array in SQL, which I think would be nice to have.

Maybe, or maybe we're framing the problem incorrectly.  To me, it's
not really all that difficult to do:
select foo(x) from unnest(bar) x;

Unless you have to maintain state inside of foo(), in which case I'd
probably use the highly underutilized 'window over custom aggregate'
technique.
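
A toy illustration of that technique, with made-up names - a custom aggregate
carrying state across the unnested elements, used as a window function:

CREATE FUNCTION concat_step(text, text) RETURNS text
  LANGUAGE sql IMMUTABLE
  AS $$ SELECT coalesce($1 || ',', '') || $2 $$;

CREATE AGGREGATE concat_agg(text) (sfunc = concat_step, stype = text);

SELECT x, concat_agg(x) OVER (ORDER BY ord) AS running
FROM unnest(ARRAY['a','b','c']) WITH ORDINALITY AS t(x, ord);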

merlin




Re: [HACKERS] checkpointer continuous flushing

2015-06-23 Thread Amit Kapila
On Tue, Jun 23, 2015 at 10:29 AM, Fabien COELHO coe...@cri.ensmp.fr wrote:


 Hello Amit,

  medium2: scale=300 shared_buffers=5GB checkpoint_timeout=30min
   max_wal_size=4GB warmup=1200 time=7500

   flsh |  full speed tps   | percent of late tx, 4 clients
   /srt |  1 client   |  4 clients  |   100 |   200 |   400 |
N/N | 173 +- 289* | 198 +- 531* | 27.61 | 43.92 | 61.16 |
N/Y | 458 +- 327* | 743 +- 920* |  7.05 | 14.24 | 24.07 |
Y/N | 169 +- 166* | 187 +- 302* |  4.01 | 39.84 | 65.70 |
Y/Y | 546 +- 143  | 681 +- 459  |  1.55 |  3.51 |  2.84 |

 The effect of sorting is very positive (+150% to 270% tps). On this run,

 flushing has a positive (+20% with 1 client) or negative (-8 % with 4
 clients) on throughput, and late transactions are reduced by 92-95% when
 both options are activated.

 Why there is dip in performance with multiple clients,


 I'm not sure to see the dip. The performances are better with 4 clients
 compared to 1 client?


What do you mean by negative (-8% with 4 clients) on throughput
in the above sentence?  I thought you meant that there is a dip
in TPS with the patch as compared to HEAD at 4 clients.

Also, I am not completely sure what +- means in your data above.


  can it be due to reason that we started doing more stuff after holding
 bufhdr lock in below code?


 I think it is very unlikely that the buffer being locked would be
 simultaneously requested by one of the 4 clients for an UPDATE, so I do not
 think it should have a significant impact.

  BufferSync() [...]


  BufferSync()
 {
 ..
 - buf_id = StrategySyncStart(NULL, NULL);
 - num_to_scan = NBuffers;
 + active_spaces = nb_spaces;
 + space = 0;
  num_written = 0;
 - while (num_to_scan--  0)
 +
 + while (active_spaces != 0)
 ..
 }

 The changed code doesn't seem to give any consideration to the
 clock-sweep point


 Indeed.

  which might not be helpful for cases where the checkpoint could have flushed
 soon-to-be-recycled buffers. I think flushing the sorted buffers w.r.t.
 tablespaces is a good idea, but not giving any preference to the clock-sweep
 point makes me think that we would lose in some cases with this new change.


 I do not see how to do both, as these two orders seem more or less
 unrelated?


I understand your point, and I also don't have any specific answer
for it at this moment; the worry is that it should not lead
to degradation in certain cases as compared to the current algorithm.
The workload where it could have an effect is when your data doesn't fit
in shared buffers, but can fit in RAM.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] pg_rewind failure by file deletion in source server

2015-06-23 Thread Fujii Masao
On Wed, Jun 24, 2015 at 3:36 AM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Wed, Jun 24, 2015 at 1:40 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Tue, Jun 23, 2015 at 11:21 PM, Heikki Linnakangas hlinn...@iki.fi wrote:
 On 06/23/2015 05:03 PM, Fujii Masao wrote:

 On Tue, Jun 23, 2015 at 9:19 PM, Heikki Linnakangas hlinn...@iki.fi
 wrote:

 On 06/23/2015 07:51 AM, Michael Paquier wrote:


 So... Attached are a set of patches dedicated to fixing this issue:



 Thanks for working on this!

 - 0001, add if_not_exists to pg_tablespace_location, returning NULL if
 path does not exist
 - 0002, same with pg_stat_file, returning NULL if file does not exist
 - 0003, same with pg_read_*file. I added them to all the existing
 functions for consistency.
 - 0004, pg_ls_dir extended with if_not_exists and include_dot_dirs
 (thanks Robert for the naming!)
 - 0005, as things get complex, a set of regression tests aimed at
 covering those things. pg_tablespace_location is platform-dependent,
 so there are no tests for it.
 - 0006, the fix for pg_rewind, using what has been implemented before.



 With these patches, pg_read_file() will return NULL for any failure to open
 the file, which makes pg_rewind assume that the file doesn't exist in the
 source server, and it will remove the file from the destination. That's
 dangerous; those functions should check specifically for ENOENT.


 I'm wondering if using pg_read_file() to copy the file from source server
 is reasonable. ISTM that it has two problems as follows.

 1. It cannot read very large file like 1GB file. So if such large file was
  created in source server after failover, pg_rewind would not be able
  to copy the file. No?


 pg_read_binary_file() handles large files just fine. It cannot return more
 than 1GB in one call, but you can call it several times and retrieve the
 file in chunks. That's what pg_rewind does, except for reading the control
 file, which is known to be small.

 Yeah, you're right.

 I found that pg_rewind creates a temporary table to fetch the file in chunks.
 This would prevent pg_rewind from using the *hot standby* server as a source
 server at all. This is of course a limitation of pg_rewind, but we might want
 to alleviate it in the future.

 This is something that a replication command could address properly.

 2. Many users may not allow a remote client to connect to the
  PostgreSQL server as a superuser for some security reasons. IOW,
  there would be no entry in pg_hba.conf for such connection.
  In this case, pg_rewind always fails because pg_read_file() needs
  superuser privilege. No?

 I'm tempted to implement a replication-command version of
 pg_read_file(). That is, it would read and send the data the way the
 BASE_BACKUP replication command does...


 Yeah, that would definitely be nice. Peter suggested it back in January
 (http://www.postgresql.org/message-id/54ac4801.7050...@gmx.net). I think
 it's way too late to do that for 9.5, however. I'm particularly worried that
 if we design the required API in a rush, we're not going to get it right,
 and will have to change it again soon. That might be difficult in a minor
 release. Using pg_read_file() and friends is quite flexible, even though we
 just find out that they're not quite flexible enough right now (the ENOENT
 problem).

 I agree that it's too late to do what I said...

 But just using pg_read_file() cannot address the #2 problem that I pointed
 out in my previous email. Also, requiring superuser privileges for pg_rewind
 really conflicts with the motivation for adding the replication privilege.

 So we should change pg_read_file() so that even replication user can
 read the file?

 From the security perspective, a replication user can take a base
 backup, so it can already easily retrieve the contents of PGDATA. Hence
 I guess that it would be fine. However, what about cases where
 pg_hba.conf authorizes access for a given replication user via psql but
 blocks it for the replication protocol? We could say that the OP should
 not give out replication access that easily, but in this case the user
 would have access to the contents of PGDATA even though he should not. Is
 that unrealistic?

The most realistic case is that both source and target servers have
a pg_hba.conf containing the following authentication setting for
replication. That is, each server allows the other to use the
replication user to connect via the replication protocol.

# TYPE  DATABASE  USER ADDRESSMETHOD
host  replication  repuser  X.X.X.X/Y  md5

This case makes me think that allowing even a replication user to
call pg_read_file() may not be a good solution for us, because
in that case the replication user needs to log in to a real database
like postgres to call pg_read_file(), but usually there would be
no pg_hba.conf entry allowing the replication user to connect to any
real database. So if we want to address this problem, a replication
command version of 

Re: [HACKERS] upper planner path-ification

2015-06-23 Thread Kouhei Kaigai
 -Original Message-
 From: David Rowley [mailto:david.row...@2ndquadrant.com]
 Sent: Tuesday, June 23, 2015 2:06 PM
 To: Kaigai Kouhei(海外 浩平)
 Cc: Robert Haas; pgsql-hackers@postgresql.org; Tom Lane
 Subject: Re: [HACKERS] upper planner path-ification
 
 
 On 23 June 2015 at 13:55, Kouhei Kaigai kai...@ak.jp.nec.com wrote:
 
 
   Once we support adding aggregation paths during path consideration,
   we need to pay attention to morphing the final target-list according
   to the intermediate path combination tentatively chosen.
   For example, if partial aggregation makes sense from a cost perspective -
   like SUM(NRows) over a partial COUNT(*) AS NRows instead of COUNT(*) over a
   billion rows - the planner also has to adjust the final target-list
   according to the interim paths. In this case, the final output shall be
   SUM() instead of COUNT().
 
 
 
 
 This sounds very much like what's been discussed here:
 
 http://www.postgresql.org/message-id/CA+U5nMJ92azm0Yt8TT=hNxFP=VjFhDqFpaWfmj
 +66-4zvcg...@mail.gmail.com
 
 
 The basic concept is that we add another function set to aggregates that allow
 the combination of 2 states. For the case of MIN() and MAX() this will just be
 the same as the transfn. SUM() is similar for many types, more complex for 
 others.
 I've quite likely just borrowed SUM(BIGINT)'s transfer functions to allow
 COUNT()'s to be combined.

STDDEV, VARIANCE and related aggregates can be constructed using nrows, sum(X)
and sum(X^2).
REGR_*, COVAR_* and related aggregates can be constructed using nrows, sum(X),
sum(Y), sum(X^2), sum(Y^2) and sum(X*Y).
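
As a toy SQL sketch of that reconstruction, with per-group rows standing in
for whatever split (parallel workers, remote nodes, pre-join groups) produced
the partial states (n, sum(x), sum(x^2)):

WITH partial AS (
  SELECT count(*) AS n, sum(x) AS sx, sum(x * x) AS sxx
  FROM generate_series(1, 1000) AS g(x)
  GROUP BY x % 4
)
SELECT (sum(sxx) - sum(sx) ^ 2 / sum(n)) / (sum(n) - 1) AS sample_variance
FROM partial;
-- should match var_samp(x) over the same data, up to rounding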

Let me point out a positive side effect of this approach.
Because the final aggregate function processes values that are already
partially aggregated, the difference between the state value and the
transition value gets relatively small.
This reduces accidental errors in floating-point calculations. :-)

 More time does need to be spent inventing the new combining functions that
 don't currently exist, but that shouldn't matter as it can be done later.
 
 Commitfest link to patch here https://commitfest.postgresql.org/5/131/
 
 I see you've signed up to review it!

Yes, we are all looking in the same direction.

The problem is, we have to cross the mountain of planner enhancement to reach
all the valuable things:
 - parallel aggregation
 - aggregation before join
 - remote aggregation via FDW

So, unless we find a solution around the planner, 2-phase aggregation is
like curry without rice.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com




[HACKERS] how is a query passed between a coordinator and a datanode

2015-06-23 Thread Rui Hai Jiang


Hello, 

I'm trying to figure out how a query and its result are passed between a
coordinator and a datanode. I know there are many messages passed between them
to finish a query.


I did a test against the coordinator by adding a row to a table and the sql 
was, insert into hg1(id, name) values(1,'tom'). 

I found a command 'P' was sent from the coordinator to a datanode and there was 
a remote statement as following,


stmt_name=p_1_25af_f
query_string=Remote Subplan
plan_string={REMOTESTMT :commandType 3 :hasReturning false ...}


My questions are,
 1-does the coordinator use the remote statement to tell a datanode what to do? 
If so, how is the plan string created by the coordinator and how is the 
plan_string parsed by the datanode?

 2-if there are multiple rows in the result of the query, how are the rows of 
data passed from the datanode to the coordinator? Does the datanode just send 
all the rows of data to the coordinator, or does the coordinator get each row
of data by sending a query?


 Thank you very much!

 Rui Hai  


Re: [HACKERS] PGXS check target forcing an install ?

2015-06-23 Thread Sandro Santilli
On Tue, Jun 23, 2015 at 02:31:03PM +0900, Michael Paquier wrote:
 On Tue, Jun 23, 2015 at 12:11 AM, Sandro Santilli s...@keybit.net wrote:
  I've noted that upgrading from PostgreSQL 9.3 to 9.5 I'm suddenly
  unable to specify a check rule in the Makefile that includes the
  PGXS one. The error is:
 
   $ make check
   rm -rf ''/tmp_install
   make -C '/home/postgresql-9.5/lib/pgxs/src/makefiles/../..' 
  DESTDIR=''/tmp_install install
   make[1]: Entering directory `/home/postgresql-9.5/lib/pgxs'
   make[1]: *** No rule to make target `install'.  Stop.
   make[1]: Leaving directory `/home/postgresql-9.5/lib/pgxs'
   make: *** [temp-install] Error 2
 
  I tracked the dangerous -rf to come from Makefile.global, and its empty
  string is due to abs_top_builddir not being defined in my own Makefile.
  But besides that, which I can probably fix, it doesn't sound correct
  that a check rule insists on finding an install rule.
 
 Oops, this is a regression, and a dangerous one indeed. This is caused
 by dcae5fac.
 
 One fix is to use NO_TEMP_INSTALL=yes in Makefile.global in the
 context of PGXS, like in the patch attached, this variable needing to
 be set before Makefile.global is loaded. We could as well use directly
 PGXS in the section Testing, but that does not sound appealing for
 Makefile.global's readability.

Thanks, setting NO_TEMP_INSTALL=yes in the including Makefile fixes
this issue.

  I'm also
  surprised that there's no warning coming out from the make invocation
  given I'm defining a check rule myself (after inclusion).
 
 Why? It looks perfectly normal to me to be able to define your own
 check rule. That's more flexible this way.

I'm surprised because I used to get warnings on overrides, and I actually
still get them for other rules. For example:

Makefile:192: warning: overriding commands for target `install'
/home/postgresql-9.3.4/lib/pgxs/src/makefiles/pgxs.mk:120: warning: ignoring 
old commands for target `install'

The same warning isn't raised for the check rule, while it is clearly
defined in some upper Makefile (as shown by the forced-install-bug).

 Thoughts?

Mixed... On one hand I'm happy to implement my own rules, but in this
specific case the lack of a warning left me with no hint about where
the offending check rule was defined.

--strk;




Re: [HACKERS] NULL passed as an argument to memcmp() in parse_func.c

2015-06-23 Thread Glen Knowles
It appears that, according to the standard, passing NULL to memcmp is
undefined behavior, even if the count is 0. See
http://stackoverflow.com/questions/16362925/can-i-pass-a-null-pointer-to-memcmp
for C99 and C++ standard references. I didn't see a good reference for C89
but I find it almost impossible to believe it was changed from defined to
undefined behavior between C89 and C99.


On Mon, Jun 22, 2015 at 2:31 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Robert Haas robertmh...@gmail.com writes:
  On Mon, Jun 22, 2015 at 2:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  If I recall that code correctly, the assumption was that if the third
  argument is zero then memcmp() must not fetch any bytes (not should not,
  but MUST not) and therefore it doesn't matter if we pass a NULL.  Are
  you seeing any observable problem here, and if so what is it?

  I dunno, this seems like playing with fire to me.  A null-test would
  be pretty cheap insurance.

 A null test would be a pretty cheap way of masking a bug in that logic,
 if we ever introduced one; to wit, that it would cause a call with
 argtypes==NULL to match anything.

 Possibly saner is

 if (nargs == 0 ||
 memcmp(argtypes, best_candidate->args, nargs * sizeof(Oid)) == 0)
 break;

 I remain unconvinced that this is necessary, though.  It looks a *whole*
 lot like the guards we have against old Solaris' bsearch-of-zero-entries
 bug.  I maintain that what glibc has done is exactly to introduce a bug
 for the zero-entries case, and that Piotr ought to complain to them
 about it.  At the very least, if you commit this please annotate it
 as working around a memcmp bug.

 regards, tom lane





Re: [HACKERS] Time to get rid of PQnoPasswordSupplied?

2015-06-23 Thread Craig Ringer
On 22 June 2015 at 22:00, Tom Lane t...@sss.pgh.pa.us wrote:

 I do not follow Craig's argument that this is somehow connected to the
 wire protocol version.

Upon revisiting  it, neither do I. You know when you read code and
think what idiot wrote this ... then git blame says it was you? I,
at least, know that feeling... and that's what reading that part of
that email was like.

Tom's right, of course. It's libpq-to-client. The string "fe_sendauth:
no password supplied" never goes over the wire; it's a response libpq
generates in response to an auth failure ErrorResponse message from
the server. It's only relevant for libpq-based clients.

Let's change it. PgAdmin-III will need a patch, but that's about the
only client I found that would care.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] 9.5 make world failing due to sgml tools missing

2015-06-23 Thread Peter Eisentraut
On 6/23/15 1:06 AM, Keith Fiske wrote:
 
 
 
 
 On Sun, Jun 21, 2015 at 10:56 AM, Peter Eisentraut pete...@gmx.net wrote:
 
 On 6/18/15 8:54 AM, Tom Lane wrote:
  Sure; the point is that libxml2 has suddenly been reclassified as a
  documentation build tool, which is at least a surprising categorization.
 
 libxml2 has been a required documentation build tool since PostgreSQL
 9.0.  The only thing that's new is that xmllint is in a different
 subpackage on some systems.  So just install that and you're all set for
 the foreseeable future.
 
 
 Well, something is different in 9.5. On this same system (Linux Mint
 17.1) I can build all previous versions with make world and I do not
 get this error.

Hence the use of the phrase "The only thing that's new"  ;-)




[HACKERS] Memory context at PG_init call ?

2015-06-23 Thread Sandro Santilli
Empirically, I seem to be getting the _PG_init call for a module while
the active memory context's lifetime is that of the function call which
first needed to load the shared object.

Is this the case? Is it documented anywhere?
If that's confirmed, initializing memory meant to stay alive for the whole
lifetime of a backend in that function is a bit complex.

--strk;




Re: [HACKERS] checkpointer continuous flushing

2015-06-23 Thread Fabien COELHO



  flsh |  full speed tps   | percent of late tx, 4 clients
  /srt |  1 client   |  4 clients  |   100 |   200 |   400 |
   N/N | 173 +- 289* | 198 +- 531* | 27.61 | 43.92 | 61.16 |
   N/Y | 458 +- 327* | 743 +- 920* |  7.05 | 14.24 | 24.07 |
   Y/N | 169 +- 166* | 187 +- 302* |  4.01 | 39.84 | 65.70 |
   Y/Y | 546 +- 143  | 681 +- 459  |  1.55 |  3.51 |  2.84 |

The effect of sorting is very positive (+150% to 270% tps). On this run,


flushing has a positive (+20% with 1 client) or negative (-8 % with 4
clients) on throughput, and late transactions are reduced by 92-95% when
both options are activated.

Why there is dip in performance with multiple clients,


I'm not sure to see the dip. The performances are better with 4 clients
compared to 1 client?


What do you mean by negative (-8% with 4 clients) on throughput in the
above sentence? I thought you meant that there is a dip in TPS with the
patch as compared to HEAD at 4 clients.


Ok, I misunderstood your question. I thought you meant a dip between 1
client and 4 clients. I meant that when flush is turned on, tps goes down
by 8% (743 to 681 tps) on this particular run. Basically the tps improvements
mostly come from sorting, while flushing has an uncertain effect on tps
(throughput) but a much larger one on latency and performance stability
(lower late rate, lower standard deviation).


Note that I'm not comparing to HEAD in the above tests, but to the new
options deactivated, which should be more or less comparable to current
HEAD, i.e. there is no sorting nor flushing done, but this is not strictly
speaking HEAD behavior. Probably I should get some figures with HEAD as
well to check that "more or less comparable" assumption.



Also, I am not completely sure what +- means in your data above.


The first figure, before the +-, is the tps; the one after is its standard
deviation computed from per-second traces. Some runs are very bad, with
pgbench stuck at times, and result in a stddev larger than the average; they
are noted with *.



I understand your point, and I also don't have any specific answer
for it at this moment; the worry is that it should not lead
to degradation in certain cases as compared to the current algorithm.
The workload where it could have an effect is when your data doesn't fit
in shared buffers, but can fit in RAM.


Hmmm. My point of view is still that the logical priority is to optimize 
for disk IO first, then look for compatible RAM optimisations later.


I can run tests with a small shared_buffers, but probably it would just 
trigger a lot of checkpoints, or worse rely on the bgwriter to find space, 
which would generate random IOs.


--
Fabien.




Re: [HACKERS] git push hook to check for outdated timestamps

2015-06-23 Thread Noah Misch
On Sun, Jun 14, 2015 at 12:37:00PM +0200, Magnus Hagander wrote:
 On Fri, Jun 12, 2015 at 4:33 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Andrew Dunstan and...@dunslane.net writes:
   On 06/12/2015 09:31 AM, Robert Haas wrote:
   Could we update our git hook to refuse a push of a new commit whose
   timestamp is more than, say, 24 hours in the past?  Our commit history
   has some timestamps in it now that are over a month off, and it's
   really easy to do, because when you rebase a commit, it keeps the old
   timestamp.  If you then merge or cherry-pick that into an official
   branch rather than patch + commit, you end up with this problem unless
   you are careful to fix it by hand.  It would be nice to prevent
   further mistakes of this sort, as they create confusion.
 
   I think 24 hours is probably fairly generous,
 
  Yeah ... if we're going to try to enforce a linear-looking history, ISTM
  the requirement ought to be newer than the latest commit on the same
  branch.  Perhaps that would be unduly hard to implement though.
 
 
 From a quick look at our existing script, I think that's doable, but I'd
 have to do some more detailed verification before I'm sure. And we'd have
 to figure out some way to deal with a push with multiple commits in it, but
 it should certainly be doable if the first one is.
 
 Would we in that case want to enforce linearity *and* recent-ness, or just
 linearity? as in, do we care about the commit time even if it doesn't
 change the order?

If a recency check is essentially free, let's check both.  Otherwise,
enforcing linearity alone is a 95% solution that implicitly bounds recency.

  FWIW, our git_changelog script tries to avoid this problem by paying
  attention to CommitDate not Date.  But I agree that it's confusing when
  those fields are far apart.
 
 
 That brings it back to the enforcing - would we want to enforce both those?

May as well.  AuthorDate is the main source of trouble.  You would need to go
out of your way (e.g. git filter-branch) to push an old CommitDate, but let's
check it just the same.




[HACKERS] Foreign join pushdown vs EvalPlanQual

2015-06-23 Thread Etsuro Fujita
Hi,

While reviewing the foreign join pushdown core patch, I noticed that the
patch doesn't perform an EvalPlanQual recheck properly.  The example
that crashes the server will be shown below (it uses the postgres_fdw
patch [1]).  I think the reason is that the ForeignScan node
performing the foreign join remotely has scanrelid = 0 while
ExecScanFetch assumes that its scan node has scanrelid > 0.

I think this is a bug.  I've not figured out how to fix this yet, but I
thought we would also need another plan that evaluates the join locally
for the test tuples for EvalPlanQual.  Maybe I'm missing something, though.

Create an environment:

postgres=# create table tab (a int, b int);
CREATE TABLE
postgres=# create foreign table foo (a int) server myserver options
(table_name 'foo');
CREATE FOREIGN TABLE
postgres=# create foreign table bar (a int) server myserver options
(table_name 'bar');
CREATE FOREIGN TABLE
postgres=# insert into tab values (1, 1);
INSERT 0 1
postgres=# insert into foo values (1);
INSERT 0 1
postgres=# insert into bar values (1);
INSERT 0 1
postgres=# analyze tab;
ANALYZE
postgres=# analyze foo;
ANALYZE
postgres=# analyze bar;
ANALYZE

Run the example:

[Terminal 1]
postgres=# begin;
BEGIN
postgres=# update tab set b = b + 1 where a = 1;
UPDATE 1

[Terminal 2]
postgres=# explain verbose select tab.* from tab, foo, bar where tab.a =
foo.a and foo.a = bar.a for update;

 QUERY PLAN



 LockRows  (cost=100.00..101.18 rows=4 width=70)
   Output: tab.a, tab.b, tab.ctid, foo.*, bar.*
   ->  Nested Loop  (cost=100.00..101.14 rows=4 width=70)
 Output: tab.a, tab.b, tab.ctid, foo.*, bar.*
 Join Filter: (foo.a = tab.a)
 ->  Seq Scan on public.tab  (cost=0.00..1.01 rows=1 width=14)
   Output: tab.a, tab.b, tab.ctid
 ->  Foreign Scan  (cost=100.00..100.08 rows=4 width=64)
   Output: foo.*, foo.a, bar.*, bar.a
   Relations: (public.foo) INNER JOIN (public.bar)
   Remote SQL: SELECT l.a1, l.a2, r.a1, r.a2 FROM (SELECT
ROW(l.a9), l.a9 FROM (SELECT a a9 FROM public.foo FOR UPDATE) l) l (a1,
a2) INNER
JOIN (SELECT ROW(r.a9), r.a9 FROM (SELECT a a9 FROM public.bar FOR
UPDATE) r) r (a1, a2) ON ((l.a2 = r.a2))
(11 rows)

postgres=# select tab.* from tab, foo, bar where tab.a = foo.a and foo.a
= bar.a for update;

[Terminal 1]
postgres=# commit;
COMMIT

[Terminal 2]
(After the commit in Terminal 1, Terminal 2 will show the following.)
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!

Best regards,
Etsuro Fujita

[1]
http://www.postgresql.org/message-id/CAEZqfEe9KGy=1_wagh2rgzpg0o4pqgd+iauyaj8wtze+cyj...@mail.gmail.com




Re: [HACKERS] upper planner path-ification

2015-06-23 Thread Kouhei Kaigai
 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org
 [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Robert Haas
 Sent: Tuesday, June 23, 2015 10:18 PM
 To: Kaigai Kouhei(海外 浩平)
 Cc: David Rowley; pgsql-hackers@postgresql.org; Tom Lane
 Subject: Re: [HACKERS] upper planner path-ification
 
 On Tue, Jun 23, 2015 at 4:41 AM, Kouhei Kaigai kai...@ak.jp.nec.com wrote:
  So, unless we find a solution around the planner, 2-phase aggregation is
  like curry without rice.
 
 Simon and I spoke with Tom about this upper planner path-ification
 problem at PGCon, and he indicated that he intended to work on it
 soon, and that we should bug him about it if he doesn't.
 
 I'm pleased to hear this, as I seem to have a special talent in that area.

That's fantastic support.

So, it may be more productive to put my effort into investigating
David Rowley's patch, based on the assumption that we can add
partial/final-aggregation path nodes just as we do scan/join paths.

My interest also stands on his infrastructure.
  https://cs.uwaterloo.ca/research/tr/1993/46/file.pdf
It is a somewhat old paper, but a good starting point.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com



Re: [HACKERS] 9.5 release notes

2015-06-23 Thread Robert Haas
On Tue, Jun 23, 2015 at 5:48 PM, Kevin Grittner kgri...@ymail.com wrote:
 Andres Freund and...@anarazel.de wrote:
  <listitem>
   <para>
Improve concurrent locking and buffer scan performance (Andres
Freund, Kevin Grittner)
   </para>
  </listitem>

 If this is ab5194e6f, I don't think it makes sense to mention buffer
 scan - it's just any lwlock, and buffer locks aren't the primary
 benefit (ProcArrayLock, buffer mapping lock probably are that). I also

 don't think Kevin was involved?

 It seems likely that 2ed5b87f9 was combined with something else in
 this reference.  By reducing buffer pins and buffer content locking
 during btree index scans it shows a slight performance gain in
 btree scans and avoids some blocking of btree index vacuuming.

I think maybe we should separate that back out.  The list needs to be
user-accessible, but if it's hard to understand what it's referring
to, that's not good either.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] less log level for success dynamic background workers for 9.5

2015-06-23 Thread Craig Ringer
On 24 June 2015 at 03:23, Jim Nasby jim.na...@bluetreble.com wrote:
 On 6/23/15 12:21 PM, Tom Lane wrote:

 I concur: if we're to have a flag at all, it should work as Alvaro says.

 However, I'm not real sure we need a flag.  I think the use-case of
 wanting extra logging for a bgworker under development is unlikely to be
 satisfied very well by just causing existing start/stop logging messages
 to come out at higher priority.  You're likely to be wanting to log other,
 bgworker-specific, events, and so you'll probably end up writing a bunch
 of your own elog calls anyway (which you'll eventually remove, #ifdef out,
 or decrease the log levels of).


 FWIW, I have this problem *constantly* with plpgsql. I put RAISE DEBUGs in,
 but once you have those in enough places SET client_min_messages=debug
 becomes next to useless because of the huge volume of spew.

 What I'd like is a way to add an identifier to ereport/RAISE so you could
 turn on individual reports. If we had that we'd just make these particular
 ereports DEBUG1 and not worry about it because you could easily turn them on
 when needed.

So, log identifiers/classifiers, essentially.

Message numbers have been discussed before with regard to core and
consistently rejected. I don't think it's really come up in terms of
PLs and user-defined functions.

I've certainly had similar issues to yours w.r.t. debug messages from
user-level code, and wanted to be able to enable one particular log
line, all log output from a particular function, or all log output
from a particular extension / all functions in a schema.
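
Something along these lines is what I have in mind for the extension case.
To be clear, this is only a sketch and nothing like it exists in core today;
my_ext, my_ext.debug and MYEXT_DEBUG are made-up names. The extension
registers its own boolean GUC and routes its DEBUG output through a macro
gated on that GUC, so you can flip one setting instead of drowning in
everything at client_min_messages=debug:

    /* Sketch only: a hypothetical extension-local debug switch. */
    #include "postgres.h"
    #include "fmgr.h"
    #include "utils/guc.h"

    PG_MODULE_MAGIC;

    static bool my_ext_debug = false;   /* per-extension debug switch */

    /* Emit a DEBUG1 message only when this extension's switch is on. */
    #define MYEXT_DEBUG(...) \
        do { \
            if (my_ext_debug) \
                ereport(DEBUG1, (errmsg_internal(__VA_ARGS__))); \
        } while (0)

    void
    _PG_init(void)
    {
        DefineCustomBoolVariable("my_ext.debug",
                                 "Enable my_ext's own debug messages.",
                                 NULL,
                                 &my_ext_debug,
                                 false,
                                 PGC_USERSET,
                                 0,
                                 NULL, NULL, NULL);
    }

Then SET my_ext.debug = on enables just that extension's messages, without
touching anything else's output.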

I think it's a whole separate topic from Pavel's original proposal
though, and really merits a separate thread. For Pavel's issue I'm all
in favour of just lowering the message's log level; I think LOG is way too
high for something that's an internal implementation detail.
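
Concretely, the kind of change I'd expect there is nothing more than dropping
the level on the routine start/stop reports. A minimal sketch (the message
text and the worker_name variable are illustrative, not a diff against the
real bgworker code):

    /* Before: routine lifecycle events land in every default log. */
    ereport(LOG,
            (errmsg("starting background worker process \"%s\"",
                    worker_name)));

    /* After: only visible when log_min_messages is DEBUG1 or more verbose. */
    ereport(DEBUG1,
            (errmsg("starting background worker process \"%s\"",
                    worker_name)));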

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] how is a query passed between a coordinator and a datanode

2015-06-23 Thread Robert Haas
On Tue, Jun 23, 2015 at 5:07 AM, Rui Hai Jiang ruihaiji...@msn.com wrote:
 I'm trying to figure out how a query and its result are passed between a 
 coordinator and a datanode. I know there are many messages passed between 
 them to finish a query.


 I did a test against the coordinator by adding a row to a table; the SQL was:
 insert into hg1(id, name) values(1,'tom').

 I found a command 'P' was sent from the coordinator to a datanode and there 
 was a remote statement as follows:


 stmt_name=p_1_25af_f
 query_string=Remote Subplan
 plan_string={REMOTESTMT :commandType 3 :hasReturning false ...}


 My questions are,
  1-does the coordinator use the remote statement to tell a datanode what to 
 do? If so, how is the plan string created by the coordinator and how is the 
 plan_string parsed by the datanode?

  2-if there are multiple rows in the result of the query, how are the rows of 
 data passed from the datanode to the coordinator? Does the datanode just send 
 all the rows of data to the coordinator? Or does the coordinator get each row of 
 data by sending a query?


  Thank you very much!

This is probably an appropriate question for postgres-xl-developers,
but not pgsql-hackers.  Those concepts do not exist in PostgreSQL,
only in Postgres-XC or one of its proliferating set of forks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] Oh, this is embarrassing: init file logic is still broken

2015-06-23 Thread Tom Lane
Chasing a problem identified by my Salesforce colleagues led me to the
conclusion that my commit f3b5565dd (Use a safer method for determining
whether relcache init file is stale) is rather borked.  It causes
pg_trigger_tgrelid_tgname_index to be omitted from the relcache init file,
because that index is not used by any syscache.  I had been aware of that
actually, but considered it a minor issue.  It's not so minor though,
because RelationCacheInitializePhase3 marks that index as nailed for
performance reasons, and includes it in NUM_CRITICAL_LOCAL_INDEXES.
That means that load_relcache_init_file *always* decides that the init
file is busted and silently(!) ignores it.  So we're taking a nontrivial
hit in backend startup speed as of the last set of minor releases.

Clearly, my shortcut of defining the init file contents as exactly the set
of syscache-supporting rels was too short.  I think the cleanest fix is
to create a wrapper function in relcache.c that encapsulates the knowledge
that pg_trigger_tgrelid_tgname_index is also to be included in the init
file.  But to prevent this type of problem from recurring, we need some
defenses against the wrapper getting out of sync with reality elsewhere.
I suggest that:

1. write_relcache_init_file ought to bitch if it concludes that a nailed
relation is not to be written to the init file.  Probably an Assert is
enough.

2. load_relcache_init_file ought to complain if it takes the
wrong-number-of-nailed-relations exit path.  Here, it seems like there might be a
chance of the path being taken validly, for example if a minor release
were to decide to nail a rel that wasn't nailed before, or vice versa.
So what I'm thinking about here is an elog(WARNING) complaint, which would
make logic bugs like this pretty visible in development builds.  If we
ever did make a change like that in a minor release, people might get a
one-time WARNING when upgrading, but I think that would be acceptable.
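
In code terms I'm imagining something like the following. This is only a
sketch: the include_in_init_file() predicate and the nailed_rels/
expected_nailed variables are stand-ins for whatever we actually settle on,
not a patch against relcache.c as it stands.

    /* 1. In write_relcache_init_file(), at the point where a relation is
     *    skipped: a nailed rel that never gets written dooms the init file,
     *    so make assert-enabled builds notice immediately. */
    if (!include_in_init_file(rel))
    {
        Assert(!rel->rd_isnailed);
        continue;
    }

    /* 2. In load_relcache_init_file(), at the bail-out taken when the count
     *    of nailed relations doesn't match expectations: warn instead of
     *    discarding the file silently. */
    if (nailed_rels != expected_nailed)
    {
        elog(WARNING,
             "relcache init file has %d nailed relations, expected %d; ignoring it",
             nailed_rels, expected_nailed);
        goto read_failed;    /* the existing give-up path */
    }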

Thoughts, better ideas?

regards, tom lane

