Re: [HACKERS] checkpointer continuous flushing
It'd be interesting to see numbers for tiny, without the overly small checkpoint timeout value. 30s is below the OS's writeback time.

Here are some tests with a longer timeout:

tiny2: scale=10 shared_buffers=1GB checkpoint_timeout=5min max_wal_size=1GB warmup=600 time=4000

 flsh |      full speed tps      |   percent of late tx, 4 clients, for tps:
 /srt |  1 client  |  4 clients |  100 |  200 |  400 |  800 | 1200 | 1600
  N/N | 930 +- 124 | 2560 +- 394 | 0.70 | 1.03 | 1.27 | 1.56 | 2.02 | 2.38
  N/Y | 924 +- 122 | 2612 +- 326 | 0.63 | 0.79 | 0.94 | 1.15 | 1.45 | 1.67
  Y/N | 907 +- 112 | 2590 +- 315 | 0.58 | 0.83 | 0.68 | 0.71 | 0.81 | 1.26
  Y/Y | 915 +- 114 | 2590 +- 317 | 0.60 | 0.68 | 0.70 | 0.78 | 0.88 | 1.13

There seems to be a small 1-2% performance benefit with 4 clients; this is reversed for 1 client. There are significantly and consistently fewer late transactions when the options are activated, and the performance is more stable (standard deviation reduced by 10-18%).

The db is about 200 MB ~ 25000 pages; at 2500+ tps it is written 40 times over in 5 minutes, so the checkpoint basically writes everything over 220 seconds, i.e. 0.9 MB/s. Given the preload phase, the buffers may be more or less in order in memory, so they would be written out in order.

medium2: scale=300 shared_buffers=5GB checkpoint_timeout=30min max_wal_size=4GB warmup=1200 time=7500

 flsh |       full speed tps       | percent of late tx, 4 clients
 /srt |  1 client   |  4 clients  |   100 |   200 |   400
  N/N | 173 +- 289* | 198 +- 531* | 27.61 | 43.92 | 61.16
  N/Y | 458 +- 327* | 743 +- 920* |  7.05 | 14.24 | 24.07
  Y/N | 169 +- 166* | 187 +- 302* |  4.01 | 39.84 | 65.70
  Y/Y | 546 +- 143  | 681 +- 459  |  1.55 |  3.51 |  2.84

The effect of sorting is very positive (+150% to +270% tps). On this run, flushing has a positive (+20% with 1 client) or negative (-8% with 4 clients) effect on throughput, and late transactions are reduced by 92-95% when both options are activated.
At 550 tps checkpoints are xlog-triggered and write about 1/3 of the database (17 buffers to write every 220-260 seconds, 4 MB/s). -- Fabien. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] less log level for success dynamic background workers for 9.5
2015-06-23 15:20 GMT+02:00 Robert Haas robertmh...@gmail.com: On Mon, Jun 22, 2015 at 9:47 PM, Tom Lane t...@sss.pgh.pa.us wrote: Michael Paquier michael.paqu...@gmail.com writes: On Tue, Jun 23, 2015 at 10:07 AM, Robert Haas wrote: On Mon, Jun 22, 2015 at 8:19 PM, Jim Nasby wrote: Anything ever happen with this?

I agree that LOG is too high for reporting most (if not all) of these things. I think we should consider having a flag for this behavior rather than changing the behavior across the board. But then again, maybe we should just change it. What do others think?

A GUC just for that looks like overkill to me; this log is useful when debugging. And one could always have the bgworker call elog by itself at startup and before leaving to provide more or less similar information.

I agree that we don't need YAGUC here, particularly not one that applies indiscriminately to all bgworkers. I'd vote for just decreasing the log level. The current coding is appropriate for a facility that's basically experimental; but as it moves towards being something that would be used routinely in production, the argument for being noisy in the log gets weaker and weaker.

I was thinking of a background worker flag, not a GUC. BGWORKER_QUIET, or something like that. But I guess we ought to just change it.

I don't have any problem with a bgworker flag. The only question is what the default should be.

Pavel

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] proposal: row_to_array function
2015-06-23 1:56 GMT+02:00 Jim Nasby jim.na...@bluetreble.com: On 6/22/15 2:46 AM, Pavel Stehule wrote:

    FOREACH key, val IN RECORD myrow
    LOOP
      IF pg_typeof(val) IN ('int4', 'double precision', 'numeric') THEN
        val := val + 1; -- these variables can be mutable
        -- or maybe in future: myrow[key] := val + 1;
      END IF;
    END LOOP;

What is important: val is an automatic variable, and it can have a different type in each step. It is a little bit strange, but impossible to solve, so we cannot support row[var] as a right value (without immutable casting). But we can do it as a left value.

Actually, you can (theoretically) solve it for the right value as well, if val is an actual type and you have operators on that type that know to search for a specific operator given the actual types that are involved. So if val is int4, val + 1 becomes int4 + int4. The problem I've run into with this is that by the time you've added enough casts to make this workable, you've probably created a situation where val + something is going to recurse back to itself. I've partially solved this in [1], and intend to finish it by calling back in via SPI to do the final resolution, the same way the RI triggers do. What would be a lot better is if we had better control over function and operator resolution.

[1] https://github.com/decibel/variant/commit/2b99067744a405da8a325de1ebabd213106f794f#diff-8aa2db4a577ee4201d6eb9041c2a457eR846

The solution of dynamic operators changes the philosophy by about 180° - and I am afraid about the performance. Now that I am thinking about the possibilities, it is probably solvable on the right side too. It needs two steps: 1. a parametrized record reference syntax - something like SELECT $1[$]; 2. the possibility to throw away the plan cache if the result has a different type than is expected in the cache.

Pavel

-- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] pg_rewind failure by file deletion in source server
On 06/23/2015 07:51 AM, Michael Paquier wrote: So... Attached are a set of patches dedicated to fixing this issue:

Thanks for working on this!

- 0001, add if_not_exists to pg_tablespace_location, returning NULL if the path does not exist
- 0002, same with pg_stat_file, returning NULL if the file does not exist
- 0003, same with pg_read_*file. I added them to all the existing functions for consistency.
- 0004, pg_ls_dir extended with if_not_exists and include_dot_dirs (thanks Robert for the naming!)
- 0005, as things get complex, a set of regression tests aimed at covering those things. pg_tablespace_location is platform-dependent, so there are no tests for it.
- 0006, the fix for pg_rewind, using what has been implemented before.

With these patches, pg_read_file() will return NULL for any failure to open the file, which makes pg_rewind assume that the file doesn't exist in the source server, and it will remove the file from the destination. That's dangerous; those functions should check specifically for ENOENT.

There's still a small race condition with tablespaces. If you run CREATE TABLESPACE in the source server while pg_rewind is running, it's possible that the recursive query that pg_rewind uses sees the symlink in the pg_tblspc/ directory, but its snapshot doesn't see the row in pg_tablespace yet. It will think that the symlink is a regular file, try to read it, and fail (if we checked for ENOENT).

Actually, I think we need to try to deal with symlinks a bit harder. Currently, pg_rewind assumes that anything in pg_tblspc that has a matching row in pg_tablespace is a symlink, and nothing else is. I think we should instead detect symlinks to directories directly. I just noticed that pg_rewind fails miserably if pg_xlog is a symlink, because of that:

    The servers diverged at WAL position 0/3023F08 on timeline 1.
    Rewinding from last common checkpoint at 0/260 on timeline 1
    data-master//pg_xlog is not a directory
    Failure, exiting

I think we need to add a column to the pg_stat_file output to indicate symbolic links, and add a pg_readlink() function. That still leaves a race condition if the type of a file changes, i.e. a file is deleted and a directory with the same name is created in its place, but that seems acceptable. I don't think PostgreSQL ever does such a thing, so that could only happen if you mess with the data directory manually while the server is running.

I just realized another problem: We recently learned the hard way that some people have files in the data directory that are not writeable by the 'postgres' user (http://www.postgresql.org/message-id/20150523172627.ga24...@msg.df7cb.de). pg_rewind will try to overwrite all files it doesn't recognize as relation files, so it's going to fail on those. A straightforward fix would be to first open the destination file in read-only mode and compare its contents, and only open the file in write mode if it has changed. It would still fail when the files really differ, but I think that's acceptable.

I note that pg_rewind doesn't need to distinguish between an empty and a non-existent directory, so it's quite silly for it to pass include_dot_dirs=true and then filter out . and .. from the result set. The documentation should mention the main reason for including . and ..: to distinguish between an empty and a non-existent directory.

- Heikki
Re: [HACKERS] pg_rewind failure by file deletion in source server
On 06/23/2015 05:03 PM, Fujii Masao wrote: On Tue, Jun 23, 2015 at 9:19 PM, Heikki Linnakangas hlinn...@iki.fi wrote: On 06/23/2015 07:51 AM, Michael Paquier wrote: So... Attached are a set of patches dedicated to fixing this issue:

Thanks for working on this!

- 0001, add if_not_exists to pg_tablespace_location, returning NULL if the path does not exist
- 0002, same with pg_stat_file, returning NULL if the file does not exist
- 0003, same with pg_read_*file. I added them to all the existing functions for consistency.
- 0004, pg_ls_dir extended with if_not_exists and include_dot_dirs (thanks Robert for the naming!)
- 0005, as things get complex, a set of regression tests aimed at covering those things. pg_tablespace_location is platform-dependent, so there are no tests for it.
- 0006, the fix for pg_rewind, using what has been implemented before.

With these patches, pg_read_file() will return NULL for any failure to open the file, which makes pg_rewind assume that the file doesn't exist in the source server, and will remove the file from the destination. That's dangerous; those functions should check specifically for ENOENT.

I'm wondering if using pg_read_file() to copy the file from the source server is reasonable. ISTM that it has two problems, as follows.

1. It cannot read a very large file, like a 1GB file. So if such a large file was created in the source server after failover, pg_rewind would not be able to copy the file. No?

pg_read_binary_file() handles large files just fine. It cannot return more than 1GB in one call, but you can call it several times and retrieve the file in chunks. That's what pg_rewind does, except for reading the control file, which is known to be small.

2. Many users may not allow a remote client to connect to the PostgreSQL server as a superuser for security reasons. IOW, there would be no entry in pg_hba.conf for such a connection. In this case, pg_rewind always fails because pg_read_file() needs superuser privilege. No?

I'm tempted to implement a replication command version of pg_read_file(). That is, it reads and sends the data like the BASE_BACKUP replication command does...

Yeah, that would definitely be nice. Peter suggested it back in January (http://www.postgresql.org/message-id/54ac4801.7050...@gmx.net). I think it's way too late to do that for 9.5, however. I'm particularly worried that if we design the required API in a rush, we're not going to get it right, and will have to change it again soon. That might be difficult in a minor release. Using pg_read_file() and friends is quite flexible, even though we just found out that they're not quite flexible enough right now (the ENOENT problem).

- Heikki
Re: [HACKERS] Hash index creation warning
On Mon, Jun 22, 2015 at 8:46 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Tue, Jun 23, 2015 at 9:06 AM, Jim Nasby jim.na...@bluetreble.com wrote: On 6/12/15 5:00 PM, Thom Brown wrote: On 18 October 2014 at 15:36, Bruce Momjian br...@momjian.us wrote: On Fri, Oct 17, 2014 at 02:36:55PM -0400, Bruce Momjian wrote: On Fri, Oct 17, 2014 at 12:56:52PM -0400, Tom Lane wrote: David G Johnston david.g.johns...@gmail.com writes: The question is whether we explain the implications of not being WAL-logged in an error message or simply state the fact and let the documentation explain the hazards - basically just output: hash indexes are not WAL-logged and their use is discouraged

+1. The warning message is not the place to be trying to explain all the details.

OK, updated patch attached.

Patch applied.

I only just noticed this item when I read the release notes. Should we bother warning when used on an unlogged table?

Not really; but I think the bigger question at this point is whether we want to change it this late in the game.

Changing it even during beta looks acceptable to me. I think that it is mainly a matter of having a patch (here is one) and someone to push it, as everybody here seems to agree that for UNLOGGED tables this warning makes little sense.

I think you should be testing RelationNeedsWAL(), not the relpersistence directly. The same point applies for temporary indexes.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] less log level for success dynamic background workers for 9.5
On Tue, Jun 23, 2015 at 10:53 AM, Pavel Stehule pavel.steh...@gmail.com wrote: 2015-06-23 15:20 GMT+02:00 Robert Haas robertmh...@gmail.com: I was thinking of a background worker flag, not a GUC. BGWORKER_QUIET, or something like that. But I guess we ought to just change it.

I don't have any problem with a bgworker flag. The only question is what the default should be.

Well, if the flag is BGWORKER_QUIET, then the default behavior remains unchanged, but when that flag is used, the log level is reduced to DEBUG1. That has the advantage of not breaking backward compatibility. But I'm not sure whether anyone cares if we just break it, and it's certainly simpler without the flag.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] A couple of newlines missing in pg_rewind log entries
On 06/23/2015 07:39 AM, Michael Paquier wrote: Hi all, Some grepping shows that a couple of newlines are missing in pg_rewind, leading to unreadable log entries:

    libpq_fetch.c:pg_log(PG_DEBUG, "getting file chunks");

Fixed.

    logging.c:pg_log(PG_PROGRESS, "%*s/%s kB (%d%%) copied",

This one was on purpose; note the printf("\r") after that line. That's supposed to be a progress indicator that is updated on a single line.

    filemap.c:pg_fatal("could not stat file \"%s\": %s",

Peter fixed this already.

- Heikki
Re: [HACKERS] upper planner path-ification
On Tue, Jun 23, 2015 at 4:41 AM, Kouhei Kaigai kai...@ak.jp.nec.com wrote: So, unless we find a solution around the planner, 2-phase aggregation is like a curry without rice.

Simon and I spoke with Tom about this upper planner path-ification problem at PGCon, and he indicated that he intended to work on it soon, and that we should bug him about it if he doesn't. I'm pleased to hear this, as I seem to have a special talent in that area.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] less log level for success dynamic background workers for 9.5
On Mon, Jun 22, 2015 at 9:47 PM, Tom Lane t...@sss.pgh.pa.us wrote: Michael Paquier michael.paqu...@gmail.com writes: On Tue, Jun 23, 2015 at 10:07 AM, Robert Haas wrote: On Mon, Jun 22, 2015 at 8:19 PM, Jim Nasby wrote: Anything ever happen with this?

I agree that LOG is too high for reporting most (if not all) of these things. I think we should consider having a flag for this behavior rather than changing the behavior across the board. But then again, maybe we should just change it. What do others think?

A GUC just for that looks like overkill to me; this log is useful when debugging. And one could always have the bgworker call elog by itself at startup and before leaving to provide more or less similar information.

I agree that we don't need YAGUC here, particularly not one that applies indiscriminately to all bgworkers. I'd vote for just decreasing the log level. The current coding is appropriate for a facility that's basically experimental; but as it moves towards being something that would be used routinely in production, the argument for being noisy in the log gets weaker and weaker.

I was thinking of a background worker flag, not a GUC. BGWORKER_QUIET, or something like that. But I guess we ought to just change it.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [PATCH] introduce XLogLockBlockRangeForCleanup()
At 2014-08-20 11:07:44 +0300, hlinnakan...@vmware.com wrote: I don't think the new GetBufferWithoutRelcache function is in line with the existing ReadBuffer API. I think it would be better to add a new ReadBufferMode - RBM_CACHE_ONLY? - that only returns the buffer if it's already in cache, and InvalidBuffer otherwise.

Hi Heikki.

I liked the idea of having a new ReadBufferMode of RBM_CACHE_ONLY, but I wasn't happy with the result when I tried to modify the code to suit. It didn't make the code easier for me to follow. (I'll say straight up that it can be done the way you said, and if the consensus is that it would be an improvement, I'll do it that way. I'm just not convinced about it, hence this mail.)

For example:

    static Buffer
    ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
                      BlockNumber blockNum, ReadBufferMode mode,
                      BufferAccessStrategy strategy, bool *hit)
    {
        volatile BufferDesc *bufHdr;
        Block       bufBlock;
        bool        found;
        bool        isExtend;
        bool        isLocalBuf = SmgrIsTemp(smgr);

        *hit = false;

        /* Make sure we will have room to remember the buffer pin */
        ResourceOwnerEnlargeBuffers(CurrentResourceOwner);

        isExtend = (blockNum == P_NEW);

isLocalBuf and isExtend will both be false for the RBM_CACHE_ONLY case, which is good for us. But that probably needs to be mentioned in a comment here.

        TRACE_POSTGRESQL_BUFFER_READ_START(forkNum, blockNum,
                                           smgr->smgr_rnode.node.spcNode,
                                           smgr->smgr_rnode.node.dbNode,
                                           smgr->smgr_rnode.node.relNode,
                                           smgr->smgr_rnode.backend,
                                           isExtend);

We know we're not going to be doing IO in the RBM_CACHE_ONLY case, but that's probably OK? I don't understand this TRACE_* stuff. But we need to either skip this, or also do the corresponding TRACE_*_DONE later.

        if (isLocalBuf)
        {
            bufHdr = LocalBufferAlloc(smgr, forkNum, blockNum, &found);
            if (found)
                pgBufferUsage.local_blks_hit++;
            else
                pgBufferUsage.local_blks_read++;
        }
        else
        {
            /*
             * lookup the buffer.  IO_IN_PROGRESS is set if the requested block
             * is not currently in memory.
             */
            bufHdr = BufferAlloc(smgr, relpersistence, forkNum, blockNum,
                                 strategy, &found);
            if (found)
                pgBufferUsage.shared_blks_hit++;
            else
                pgBufferUsage.shared_blks_read++;
        }

The nicest option I could think of here was to copy the shared_buffers lookup from BufferAlloc into a new BufferAlloc_shared function, and add a new branch for RBM_CACHE_ONLY. That would look like:

        if (isLocalBuf)
            …
        else if (mode == RBM_CACHE_ONLY)
        {
            /*
             * We check to see if the buffer is already cached, and if it's
             * not, we return InvalidBuffer because we know it's not pinned.
             */
            bufHdr = BufferAlloc_shared(…, &found);
            if (!found)
                return InvalidBuffer;
            LockBufferForCleanup(BufferDescriptorGetBuffer(bufHdr));
            return BufferDescriptorGetBuffer(bufHdr);
        }
        else
        {
            /*
             * lookup the buffer …
             */
        }

…or if we went with the rest of the code and didn't do the lock/return immediately, we would fall through to this code:

        /* At this point we do NOT hold any locks. */

        /* if it was already in the buffer pool, we're done */
        if (found)
        {
            if (!isExtend)
            {
                /* Just need to update stats before we exit */
                *hit = true;
                VacuumPageHit++;

                if (VacuumCostActive)
                    VacuumCostBalance += VacuumCostPageHit;

                TRACE_POSTGRESQL_BUFFER_READ_DONE(forkNum, blockNum,
                                                  smgr->smgr_rnode.node.spcNode,
                                                  smgr->smgr_rnode.node.dbNode,
                                                  smgr->smgr_rnode.node.relNode,
                                                  smgr->smgr_rnode.backend,
                                                  isExtend, found);

                /*
                 * In RBM_ZERO_AND_LOCK mode the caller expects the page to be
                 * locked on return.
                 */
                if (!isLocalBuf)
                {
                    if (mode == RBM_ZERO_AND_LOCK)
                        LWLockAcquire(bufHdr->content_lock, LW_EXCLUSIVE);
                    else if (mode == RBM_ZERO_AND_CLEANUP_LOCK)
                        LockBufferForCleanup(BufferDescriptorGetBuffer(bufHdr));
                }

                return BufferDescriptorGetBuffer(bufHdr);
            }

Some of this isn't applicable to RBM_CACHE_ONLY, so we could
Re: [HACKERS] get_relation_info comment out of sync
On Sun, Jun 21, 2015 at 8:10 AM, Thomas Munro thomas.mu...@enterprisedb.com wrote: The comment for get_relation_info should probably include serverid in the list of rel members that it can update (see attached). Committed, thanks. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] SSL TAP tests and chmod
On Sun, Jun 21, 2015 at 9:50 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Fri, Jun 19, 2015 at 2:49 PM, Michael Paquier wrote: The TAP tests in src/test/ssl are using system() in combination with chmod, but perl has a chmod command integrated into it, and it would work better on Windows as well. The attached patch is aimed at fixing that.

For the sake of the archives, this has been committed by Robert (thanks!):

    commit: ca3f43aa48a83013ea50aeee7cd193a5859c4587
    author: Robert Haas rh...@postgresql.org
    date: Fri, 19 Jun 2015 10:46:30 -0400
    Change TAP test framework to not rely on having a chmod executable.
    This might not work at all on Windows, and is not ever efficient.

Whoops, sorry, I thought I had replied to this thread, but I guess I forgot.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Memory context at PG_init call ?
On Tue, Jun 23, 2015 at 7:41 AM, Sandro Santilli s...@keybit.net wrote: Empirically, I seem to be getting the _PG_init call for a module while the active memory context lifetime is that of the function call which first needed to load the shared object. Is this the case? Is it documented anywhere? Initializing memory meant to be alive for the whole lifetime of a backend in that function is a bit complex if that's confirmed.

If you want something that lasts for the lifetime of the backend, just do

    MemoryContext oldctx = MemoryContextSwitchTo(TopMemoryContext);
    ...
    MemoryContextSwitchTo(oldctx);

I'm not sure what context you should expect in _PG_init(), although what you mention doesn't sound unreasonable. But generally, if you want a long-lived context, you should establish that explicitly yourself. We try to keep short-lived contexts active whenever possible because that avoids long-term leaks.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Insufficient locking for ALTER DEFAULT PRIVILEGES
On Sun, Jun 21, 2015 at 11:11 AM, Andres Freund and...@anarazel.de wrote: On 2015-06-21 11:45:24 -0300, Alvaro Herrera wrote: Alvaro Herrera wrote: Now that I actually check with a non-relation object, I see pretty much the same error. So probably instead of some narrow bug fix, what we need is some general solution for all object types. I know this has been discussed a number of times ... Anyway I see now that we should not consider this a backpatchable bug fix, and I'm not doing the coding either, at least not now.

Discussed this with a couple of 2ndQ colleagues and it became evident that MVCC catalog scans probably make this problem much more prominent. So historical branches are not affected all that much, but it's a real issue on 9.4+.

Hm. I don't see how those would make a marked difference. The snapshot for catalog scans is taken afresh for each scan (unless cached). There'll probably be some difference, but it'll be small.

Yeah, I think the same. If those changes introduced a problem we didn't have before, I'd like to see a reproducible test case.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] pg_rewind failure by file deletion in source server
On Tue, Jun 23, 2015 at 9:19 PM, Heikki Linnakangas hlinn...@iki.fi wrote: On 06/23/2015 07:51 AM, Michael Paquier wrote: So... Attached are a set of patches dedicated to fixing this issue:

Thanks for working on this!

- 0001, add if_not_exists to pg_tablespace_location, returning NULL if the path does not exist
- 0002, same with pg_stat_file, returning NULL if the file does not exist
- 0003, same with pg_read_*file. I added them to all the existing functions for consistency.
- 0004, pg_ls_dir extended with if_not_exists and include_dot_dirs (thanks Robert for the naming!)
- 0005, as things get complex, a set of regression tests aimed at covering those things. pg_tablespace_location is platform-dependent, so there are no tests for it.
- 0006, the fix for pg_rewind, using what has been implemented before.

With these patches, pg_read_file() will return NULL for any failure to open the file, which makes pg_rewind assume that the file doesn't exist in the source server, and will remove the file from the destination. That's dangerous; those functions should check specifically for ENOENT.

I'm wondering if using pg_read_file() to copy the file from the source server is reasonable. ISTM that it has two problems, as follows.

1. It cannot read a very large file, like a 1GB file. So if such a large file was created in the source server after failover, pg_rewind would not be able to copy the file. No?

2. Many users may not allow a remote client to connect to the PostgreSQL server as a superuser for security reasons. IOW, there would be no entry in pg_hba.conf for such a connection. In this case, pg_rewind always fails because pg_read_file() needs superuser privilege. No?

I'm tempted to implement a replication command version of pg_read_file(). That is, it reads and sends the data like the BASE_BACKUP replication command does...

Regards,

-- Fujii Masao
Re: [HACKERS] NULL passed as an argument to memcmp() in parse_func.c
Glen Knowles gknow...@ieee.org writes: It appears that, according to the standard, passing NULL to memcmp is undefined behavior, even if the count is 0. See http://stackoverflow.com/questions/16362925/can-i-pass-a-null-pointer-to-memcmp for C99 and C++ standard references.

Hmm ... looks like that's correct. I had not noticed the introductory paragraphs. For those following along at home, the relevant text in C99 is in 7.21.1 "String function conventions":

    [#2] Where an argument declared as size_t n specifies the length of the
    array for a function, n can have the value zero on a call to that
    function. Unless explicitly stated otherwise in the description of a
    particular function in this subclause, pointer arguments on such a call
    shall still have valid values, as described in 7.1.4. On such a call, a
    function that locates a character finds no occurrence, a function that
    compares two character sequences returns zero, and a function that
    copies characters copies zero characters.

and the relevant text from 7.1.4 is

    [#1] Each of the following statements applies unless explicitly stated
    otherwise in the detailed descriptions that follow: If an argument to a
    function has an invalid value (such as a value outside the domain of
    the function, or a pointer outside the address space of the program, or
    a null pointer) or a type (after promotion) not expected by a function
    with variable number of arguments, the behavior is undefined.

So it looks like we'd better change it. I am not sure whether to put in the nargs == 0 test suggested yesterday or to just insist that callers not pass NULL. A quick grep suggests that there is only one such caller right now, namely this bit in ruleutils.c:

    appendStringInfo(buf, "EXECUTE PROCEDURE %s(",
                     generate_function_name(trigrec->tgfoid, 0, NIL, NULL,
                                            false, NULL, EXPR_KIND_NONE));

You could certainly argue that that's taking an unwarranted shortcut.

regards, tom lane
[HACKERS] btree_gin and BETWEEN
If I use the btree_gin extension to build a GIN index on a scalar value, it doesn't work well with BETWEEN queries. It looks like it scans the whole index, with the part of the index between the endpoints getting scanned twice. It is basically executed as if col1 between x and y were col1 between -inf and y and col1 between x and +inf. It puts the correct tuples into the bitmap, because whichever inequality is not currently being used to set the query endpoint is used as a filter instead.

So I could just not build that index. But I want it for other reasons, and the problem is that the planner thinks the index can implement the BETWEEN query efficiently. So even if it has truly better options available, it switches to using a falsely attractive btree_gin index.

    create table foo as select random() as btree, random() as gin from generate_series(1,300);
    create index on foo using gin (gin);
    create index on foo using btree (btree);
    explain (analyze, buffers) select count(*) from foo where btree between 0.001 and 0.00105;
    explain (analyze, buffers) select count(*) from foo where gin between 0.001 and 0.00105;

It would be nice if btree_gin supported BETWEEN and other range queries efficiently, or at least if the planner knew it couldn't support them efficiently. But I don't see where to begin on either one of these tasks. Is either one of them plausible?

Cheers, Jeff
Re: [HACKERS] pg_rewind failure by file deletion in source server
On Tue, Jun 23, 2015 at 11:21 PM, Heikki Linnakangas hlinn...@iki.fi wrote: On 06/23/2015 05:03 PM, Fujii Masao wrote: On Tue, Jun 23, 2015 at 9:19 PM, Heikki Linnakangas hlinn...@iki.fi wrote: On 06/23/2015 07:51 AM, Michael Paquier wrote: So... Attached are a set of patches dedicated at fixing this issue: Thanks for working on this! - 0001, add if_not_exists to pg_tablespace_location, returning NULL if path does not exist - 0002, same with pg_stat_file, returning NULL if file does not exist - 0003, same with pg_read_*file. I added them to all the existing functions for consistency. - 0004, pg_ls_dir extended with if_not_exists and include_dot_dirs (thanks Robert for the naming!) - 0005, as things get complex, a set of regression tests aimed to covering those things. pg_tablespace_location is platform-dependent, so there are no tests for it. - 0006, the fix for pg_rewind, using what has been implemented before. With thes patches, pg_read_file() will return NULL for any failure to open the file, which makes pg_rewind to assume that the file doesn't exist in the source server, and will remove the file from the destination. That's dangerous, those functions should check specifically for ENOENT. I'm wondering if using pg_read_file() to copy the file from source server is reasonable. ISTM that it has two problems as follows. 1. It cannot read very large file like 1GB file. So if such large file was created in source server after failover, pg_rewind would not be able to copy the file. No? pg_read_binary_file() handles large files just fine. It cannot return more than 1GB in one call, but you can call it several times and retrieve the file in chunks. That's what pg_rewind does, except for reading the control file, which is known to be small. Yeah, you're right. I found that pg_rewind creates a temporary table to fetch the file in chunks. This would prevent pg_rewind from using the *hot standby* server as a source server at all. 
This is of course a limitation of pg_rewind, but we might want to alleviate it in the future. 2. Many users may not allow a remote client to connect to the PostgreSQL server as a superuser for some security reasons. IOW, there would be no entry in pg_hba.conf for such a connection. In this case, pg_rewind always fails because pg_read_file() needs superuser privilege. No? I'm tempted to implement the replication command version of pg_read_file(). That is, it reads and sends the data like the BASE_BACKUP replication command does... Yeah, that would definitely be nice. Peter suggested it back in January (http://www.postgresql.org/message-id/54ac4801.7050...@gmx.net). I think it's way too late to do that for 9.5, however. I'm particularly worried that if we design the required API in a rush, we're not going to get it right, and will have to change it again soon. That might be difficult in a minor release. Using pg_read_file() and friends is quite flexible, even though we just found out that they're not quite flexible enough right now (the ENOENT problem). I agree that it's too late to do what I said... But just using pg_read_file() cannot address the #2 problem that I pointed out in my previous email. Also, requiring superuser privilege for pg_rewind really conflicts with the motivation for adding the replication privilege. So should we change pg_read_file() so that even a replication user can read the file? Or should a replication-user version of pg_read_file() be implemented? Regards, -- Fujii Masao
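To make the chunked-retrieval pattern Heikki describes concrete, here is a small Python sketch. It is illustrative only: `read_chunk` stands in for issuing `SELECT pg_read_binary_file(path, offset, length)` against the source server (pg_rewind itself does this in C over libpq); here it just slices a local bytes object.

```python
CHUNK = 4  # pg_rewind would use a chunk size up to 1GB; tiny here for clarity

def make_reader(data: bytes):
    """Stand-in for SELECT pg_read_binary_file(path, offset, length)
    against the source server; here it just slices a local bytes object."""
    def read_chunk(offset: int, length: int) -> bytes:
        return data[offset:offset + length]
    return read_chunk

def fetch_file(read_chunk) -> bytes:
    """Fetch a file of arbitrary size by calling read_chunk repeatedly,
    since a single call cannot return more than CHUNK bytes."""
    parts, offset = [], 0
    while True:
        chunk = read_chunk(offset, CHUNK)
        parts.append(chunk)
        offset += len(chunk)
        if len(chunk) < CHUNK:   # short read means end of file
            return b"".join(parts)
```

The loop terminates on the first short read, so it also handles files that are an exact multiple of the chunk size (the extra call returns zero bytes).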
Re: [HACKERS] less log level for success dynamic background workers for 9.5
Robert Haas wrote: On Tue, Jun 23, 2015 at 1:21 PM, Tom Lane t...@sss.pgh.pa.us wrote: However, I'm not real sure we need a flag. I think the use-case of wanting extra logging for a bgworker under development is unlikely to be satisfied very well by just causing existing start/stop logging messages to come out at higher priority. You're likely to be wanting to log other, bgworker-specific, events, and so you'll probably end up writing a bunch of your own elog calls anyway (which you'll eventually remove, #ifdef out, or decrease the log levels of). Yeah. So let's just change it. +1 -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] less log level for success dynamic background workers for 9.5
Robert Haas wrote: On Tue, Jun 23, 2015 at 10:53 AM, Pavel Stehule pavel.steh...@gmail.com wrote: 2015-06-23 15:20 GMT+02:00 Robert Haas robertmh...@gmail.com: I was thinking of a background worker flag, not a GUC. BGWORKER_QUIET, or something like that. But I guess we ought to just change it. I don't have any problem with a bgworker flag. The only question is what the default should be. Well, if the flag is BGWORKER_QUIET, then the default behavior remains unchanged, but when that flag is used, the log level is reduced to DEBUG1. That has the advantage of not breaking backward compatibility. But I'm not sure whether anyone cares if we just break it, and it's certainly simpler without the flag. I vote we do it the other way around, that is, have a flag BGWORKER_VERBOSE. This breaks backwards compatibility (I don't think there's too much value in that in this case), but it copes with the more common use case that you want to have the flag while the worker is being developed; and things that are already working don't need to change in order to get the natural behavior. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] less log level for success dynamic background workers for 9.5
Alvaro Herrera alvhe...@2ndquadrant.com writes: Robert Haas wrote: Well, if the flag is BGWORKER_QUIET, then the default behavior remains unchanged, but when that flag is used, the log level is reduced to DEBUG1. That has the advantage of not breaking backward compatibility. But I'm not sure whether anyone cares if we just break it, and it's certainly simpler without the flag. I vote we do it the other way around, that is have a flag BGWORKER_VERBOSE. This breaks backwards compatibility (I don't think there's too much value in that in this case), but it copes with the more common use case that you want to have the flag while the worker is being developed; and things that are already working don't need to change in order to get the natural behavior. I concur: if we're to have a flag at all, it should work as Alvaro says. However, I'm not real sure we need a flag. I think the use-case of wanting extra logging for a bgworker under development is unlikely to be satisfied very well by just causing existing start/stop logging messages to come out at higher priority. You're likely to be wanting to log other, bgworker-specific, events, and so you'll probably end up writing a bunch of your own elog calls anyway (which you'll eventually remove, #ifdef out, or decrease the log levels of). regards, tom lane
Re: [HACKERS] less log level for success dynamic background workers for 9.5
On Tue, Jun 23, 2015 at 1:21 PM, Tom Lane t...@sss.pgh.pa.us wrote: Alvaro Herrera alvhe...@2ndquadrant.com writes: Robert Haas wrote: Well, if the flag is BGWORKER_QUIET, then the default behavior remains unchanged, but when that flag is used, the log level is reduced to DEBUG1. That has the advantage of not breaking backward compatibility. But I'm not sure whether anyone cares if we just break it, and it's certainly simpler without the flag. I vote we do it the other way around, that is have a flag BGWORKER_VERBOSE. This breaks backwards compatibility (I don't think there's too much value in that in this case), but it copes with the more common use case that you want to have the flag while the worker is being developed; and things that are already working don't need to change in order to get the natural behavior. I concur: if we're to have a flag at all, it should work as Alvaro says. However, I'm not real sure we need a flag. I think the use-case of wanting extra logging for a bgworker under development is unlikely to be satisfied very well by just causing existing start/stop logging messages to come out at higher priority. You're likely to be wanting to log other, bgworker-specific, events, and so you'll probably end up writing a bunch of your own elog calls anyway (which you'll eventually remove, #ifdef out, or decrease the log levels of). Yeah. So let's just change it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] pg_stat_*_columns?
On Tue, Jun 23, 2015 at 3:01 AM, Robert Haas robertmh...@gmail.com wrote: On Sun, Jun 21, 2015 at 11:43 AM, Magnus Hagander mag...@hagander.net wrote: On Sat, Jun 20, 2015 at 11:55 PM, Robert Haas robertmh...@gmail.com wrote: On Sat, Jun 20, 2015 at 7:01 PM, Tom Lane t...@sss.pgh.pa.us wrote: But if the structure got too big to map (on a 32-bit system), then you'd be sort of hosed, because there's no way to attach just part of it. That might not be worth worrying about, but it depends on how big it's likely to get - a 32-bit system is very likely to choke on a 1GB mapping, and maybe even on a much smaller one. Yeah, I'm quite worried about assuming that we can map a data structure that might be of very significant size into shared memory on 32-bit machines. The address space just isn't there. Considering the advantages of avoiding message queues, I think we should think a little bit harder about whether we can't find some way to skin this cat. As I think about this a little more, I'm not sure there's really a problem with one stats DSM per database. Sure, the system might have 100,000 databases in some crazy pathological case, but the maximum number of those that can be in use is bounded by max_connections, which means the maximum number of stats file DSMs we could ever need at one time is also bounded by max_connections. There are a few corner cases to think about, like if the user writes a client that connects to all 100,000 databases in very quick succession, we've got to jettison the old DSMs fast enough to make room for the new DSMs before we run out of slots, but that doesn't seem like a particularly tough nut to crack. If the stats collector ensures that it never attaches to more than MaxBackends stats DSMs at a time, and each backend ensures that it never attaches to more than one stats DSM at a time, then 2 * MaxBackends stats DSMs is always enough. And that's just a matter of bumping PG_DYNSHMEM_SLOTS_PER_BACKEND from 2 to 4. 
In more realistic cases, it will probably be normal for many or all backends to be connected to the same database, and the number of stats DSMs required will be far smaller. What about a combination along the lines of something like this: the stats collector keeps the statistics in local memory as before. But when a backend needs to get a snapshot of its data, it uses a shared memory queue to request it. What the stats collector does in this case is allocate a new DSM, copy the data into that DSM, and hand the DSM over to the backend. At this point the stats collector can forget about it, and it's up to the backend to get rid of it when it's done with it. Well, there seems to be little point in having the stats collector forget about a DSM that it could equally well have shared with the next guy who wants a stats snapshot for the same database. That case is surely *plenty* common enough to be worth optimizing for. Right, we only need to drop it once we have received a stats message for it so something changed. And possibly that with a minimum time as well, as we have now, if we want to limit the potential churn. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
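As a toy illustration of the bound Robert describes (this is not PostgreSQL code; the collector-side cache and its capacity are modeled abstractly), a collector that never keeps more than MaxBackends per-database segments attached, jettisoning the least recently used one, stays within the slot budget even if a client hops across 100,000 databases in quick succession:

```python
from collections import OrderedDict

MAX_BACKENDS = 8  # stand-in for max_connections; the real value is the GUC

class StatsDSMCache:
    """Toy model of the per-database stats DSM idea: the collector keeps
    at most MAX_BACKENDS segments attached, jettisoning the least
    recently used one when a new database's segment is needed. With each
    backend attached to at most one segment, 2 * MaxBackends slots
    always suffice overall."""
    def __init__(self):
        self.attached = OrderedDict()  # db_oid -> segment handle (stand-in)

    def get_segment(self, db_oid):
        if db_oid in self.attached:
            self.attached.move_to_end(db_oid)      # mark as recently used
        else:
            if len(self.attached) >= MAX_BACKENDS:
                self.attached.popitem(last=False)  # jettison the oldest DSM
            self.attached[db_oid] = object()
        return self.attached[db_oid]

cache = StatsDSMCache()
for oid in range(100_000):  # a client hopping across many databases
    cache.get_segment(oid)
```

The point of the model is only that eviction keeps the attachment count bounded regardless of how many databases exist, which is the nut Robert says is not particularly tough to crack.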
[HACKERS] Should we back-patch SSL renegotiation fixes?
Those of you who have been following http://www.postgresql.org/message-id/flat/1d3bc192-970d-4b70-a5fe-38d2a9f76...@me.com are aware that Red Hat shipped a rather broken version of openssl last week. While waiting for them to fix it, I've been poking at the behavior, and have found out that PG 9.4 and later are much less badly broken than older branches. In the newer branches you'll see a failure only after transmitting 2GB within a session, whereas the older branches fail at the second renegotiation attempt, which would typically be 1GB of data and could be a lot less. I do not know at this point whether these behaviors are really the same bug or not, but I wonder whether it's time to consider back-patching the renegotiation fixes we did in 9.4. Specifically, I think maybe we should back-patch 31cf1a1a4, 86029b31e, and 36a3be654. (There are more changes in master, but since those haven't yet shipped in any released branch, and there's been a lot of other rework in the same area, those probably are not back-patch candidates.) Thoughts? regards, tom lane
Re: [HACKERS] Should we back-patch SSL renegotiation fixes?
On Tue, Jun 23, 2015 at 2:33 PM, Tom Lane t...@sss.pgh.pa.us wrote: Those of you who have been following http://www.postgresql.org/message-id/flat/1d3bc192-970d-4b70-a5fe-38d2a9f76...@me.com are aware that Red Hat shipped a rather broken version of openssl last week. While waiting for them to fix it, I've been poking at the behavior, and have found out that PG 9.4 and later are much less badly broken than older branches. In the newer branches you'll see a failure only after transmitting 2GB within a session, whereas the older branches fail at the second renegotiation attempt, which would typically be 1GB of data and could be a lot less. I do not know at this point whether these behaviors are really the same bug or not, but I wonder whether it's time to consider back-patching the renegotiation fixes we did in 9.4. Specifically, I think maybe we should back-patch 31cf1a1a4, 86029b31e, and 36a3be654. (There are more changes in master, but since those haven't yet shipped in any released branch, and there's been a lot of other rework in the same area, those probably are not back-patch candidates.) Thoughts? I have no clear idea how safe it is to back-port these fixes. Just as a point of reference, we had a customer hit a problem similar to bug #12769 on 9.3.x. I think (but am not sure) that 272923a0a may have been intended to fix that issue. In a quick search, I didn't find any other complaints about renegotiation-related issues from our customers. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Should we back-patch SSL renegotiation fixes?
Robert Haas wrote: On Tue, Jun 23, 2015 at 2:33 PM, Tom Lane t...@sss.pgh.pa.us wrote: I do not know at this point whether these behaviors are really the same bug or not, but I wonder whether it's time to consider back-patching the renegotiation fixes we did in 9.4. Specifically, I think maybe we should back-patch 31cf1a1a4, 86029b31e, and 36a3be654. (There are more changes in master, but since those haven't yet shipped in any released branch, and there's been a lot of other rework in the same area, those probably are not back-patch candidates.) Yes, +1 for backpatching. Don't forget 5674460b and b1aebbb6. I could reproduce problems trivially with COPY in psql without that and a small renegotiation limit, as I recall. Thoughts? I have no clear idea how safe it is to back-port these fixes. Just as a point of reference, we had a customer hit a problem similar to bug #12769 on 9.3.x. I think (but am not sure) that 272923a0a may have been intended to fix that issue. Maybe we should *also* backpatch that, then (and if so, then also its fixup 1c2b7c087). I do not think that the failure was introduced by the fixes cited above. In a quick search, I didn't find any other complaints about renegotiation-related issues from our customers. Other issues Andres was investigating seemed related to nonblocking connections (which as I recall are used mostly by replication code). Bug #12769 contained a link to the previous discussion. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] Should we back-patch SSL renegotiation fixes?
Robert Haas robertmh...@gmail.com writes: On Tue, Jun 23, 2015 at 2:33 PM, Tom Lane t...@sss.pgh.pa.us wrote: I do not know at this point whether these behaviors are really the same bug or not, but I wonder whether it's time to consider back-patching the renegotiation fixes we did in 9.4. Specifically, I think maybe we should back-patch 31cf1a1a4, 86029b31e, and 36a3be654. (There are more changes in master, but since those haven't yet shipped in any released branch, and there's been a lot of other rework in the same area, those probably are not back-patch candidates.) Thoughts? I have no clear idea how safe it is to back-port these fixes. Well, it would mean that pre-9.5 branches all behave the same, which would be an improvement in my book. Also, ISTM that the 9.4 code for renegotiation assumes a whole lot less than prior branches about OpenSSL's internal behavior; so it ought to be more robust, even if some bugs remain. Just as a point of reference, we had a customer hit a problem similar to bug #12769 on 9.3.x. I think (but am not sure) that 272923a0a may have been intended to fix that issue. In a quick search, I didn't find any other complaints about renegotiation-related issues from our customers. The problem with trying to adopt code from HEAD is that it probably depends on the rather invasive changes explained here: http://www.postgresql.org/message-id/20150126101405.ga31...@awork2.anarazel.de Even assuming that there's no dependency on the immediate-interrupt changes, I'm afraid to back-patch anything that invasive. regards, tom lane
Re: [HACKERS] pg_rewind failure by file deletion in source server
On Wed, Jun 24, 2015 at 1:40 AM, Fujii Masao masao.fu...@gmail.com wrote: On Tue, Jun 23, 2015 at 11:21 PM, Heikki Linnakangas hlinn...@iki.fi wrote: On 06/23/2015 05:03 PM, Fujii Masao wrote: On Tue, Jun 23, 2015 at 9:19 PM, Heikki Linnakangas hlinn...@iki.fi wrote: On 06/23/2015 07:51 AM, Michael Paquier wrote: So... Attached are a set of patches dedicated to fixing this issue: Thanks for working on this! - 0001, add if_not_exists to pg_tablespace_location, returning NULL if path does not exist - 0002, same with pg_stat_file, returning NULL if file does not exist - 0003, same with pg_read_*file. I added them to all the existing functions for consistency. - 0004, pg_ls_dir extended with if_not_exists and include_dot_dirs (thanks Robert for the naming!) - 0005, as things get complex, a set of regression tests aimed at covering those things. pg_tablespace_location is platform-dependent, so there are no tests for it. - 0006, the fix for pg_rewind, using what has been implemented before. With these patches, pg_read_file() will return NULL for any failure to open the file, which makes pg_rewind assume that the file doesn't exist in the source server, and will remove the file from the destination. That's dangerous; those functions should check specifically for ENOENT. I'm wondering if using pg_read_file() to copy the file from the source server is reasonable. ISTM that it has two problems, as follows. 1. It cannot read a very large file, like a 1GB file. So if such a large file was created in the source server after failover, pg_rewind would not be able to copy the file. No? pg_read_binary_file() handles large files just fine. It cannot return more than 1GB in one call, but you can call it several times and retrieve the file in chunks. That's what pg_rewind does, except for reading the control file, which is known to be small. Yeah, you're right. I found that pg_rewind creates a temporary table to fetch the file in chunks. 
This would prevent pg_rewind from using the *hot standby* server as a source server at all. This is of course a limitation of pg_rewind, but we might want to alleviate it in the future. This is something that a replication command could address properly. 2. Many users may not allow a remote client to connect to the PostgreSQL server as a superuser for some security reasons. IOW, there would be no entry in pg_hba.conf for such a connection. In this case, pg_rewind always fails because pg_read_file() needs superuser privilege. No? I'm tempted to implement the replication command version of pg_read_file(). That is, it reads and sends the data like the BASE_BACKUP replication command does... Yeah, that would definitely be nice. Peter suggested it back in January (http://www.postgresql.org/message-id/54ac4801.7050...@gmx.net). I think it's way too late to do that for 9.5, however. I'm particularly worried that if we design the required API in a rush, we're not going to get it right, and will have to change it again soon. That might be difficult in a minor release. Using pg_read_file() and friends is quite flexible, even though we just found out that they're not quite flexible enough right now (the ENOENT problem). I agree that it's too late to do what I said... But just using pg_read_file() cannot address the #2 problem that I pointed out in my previous email. Also, requiring a superuser privilege for pg_rewind really conflicts with the motivation for adding the replication privilege. So should we change pg_read_file() so that even a replication user can read the file? From the security perspective, a replication user can take a base backup, so it can already easily retrieve the contents of PGDATA. Hence I guess that it would be fine. However, what about cases where pg_hba.conf authorizes access to a given replication user via psql and blocks it for the replication protocol? 
We could say that the OP should not give out replication access that easily, but in this case the user would have access to the contents of PGDATA even if he should not. Is that unrealistic? Or should a replication-user version of pg_read_file() be implemented? You mean a new function? How is that different from authorizing pg_read_file usage for a replication user? Honestly, I can live with this superuser restriction in 9.5, and come back to the replication user restriction in 9.6 once things cool down a bit. -- Michael
Re: [HACKERS] less log level for success dynamic background workers for 9.5
On 6/23/15 12:21 PM, Tom Lane wrote: I concur: if we're to have a flag at all, it should work as Alvaro says. However, I'm not real sure we need a flag. I think the use-case of wanting extra logging for a bgworker under development is unlikely to be satisfied very well by just causing existing start/stop logging messages to come out at higher priority. You're likely to be wanting to log other, bgworker-specific, events, and so you'll probably end up writing a bunch of your own elog calls anyway (which you'll eventually remove, #ifdef out, or decrease the log levels of). FWIW, I have this problem *constantly* with plpgsql. I put RAISE DEBUGs in, but once you have those in enough places SET client_min_messages=debug becomes next to useless because of the huge volume of spew. What I'd like is a way to add an identifier to ereport/RAISE so you could turn on individual reports. If we had that we'd just make these particular ereports DEBUG1 and not worry about it because you could easily turn them on when needed. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] pg_stat_*_columns?
On 2015-06-22 21:05:52 -0400, Robert Haas wrote: Interesting idea. We could consider the set of stats files a database unto itself and reserve a low-numbered OID for it. The obvious thing to do is use the database's OID as the relfilenode, but then how do you replace the stats file? The relmapper infrastructure should be able to take care of that. Regards, Andres
Re: [HACKERS] Should we back-patch SSL renegotiation fixes?
Alvaro Herrera alvhe...@2ndquadrant.com writes: On Tue, Jun 23, 2015 at 2:33 PM, Tom Lane t...@sss.pgh.pa.us wrote: I do not know at this point whether these behaviors are really the same bug or not, but I wonder whether it's time to consider back-patching the renegotiation fixes we did in 9.4. Specifically, I think maybe we should back-patch 31cf1a1a4, 86029b31e, and 36a3be654. Yes, +1 for backpatching. Don't forget 5674460b and b1aebbb6. Huh? 5674460b is ancient, and we concluded that b1aebbb6 didn't represent anything much more than cosmetic fixes. regards, tom lane
Re: [HACKERS] proposal: row_to_array function
On 6/23/15 9:45 AM, Pavel Stehule wrote: 2015-06-23 1:56 GMT+02:00 Jim Nasby jim.na...@bluetreble.com: On 6/22/15 2:46 AM, Pavel Stehule wrote: FOREACH key, val IN RECORD myrow LOOP IF pg_typeof(val) IN ('int4', 'double precision', 'numeric') THEN val := val + 1; -- these variables can be mutable -- or maybe in future myrow[key] := val + 1; END IF; END LOOP; What is important: val is an automatic variable, and it can have a different type at each step. It is a little bit strange, but impossible to solve, so we cannot support row[var] as a right value (without immutable casting). But we can do it with a left value. Actually, you can (theoretically) solve it for the right value as well, if val is an actual type and you have operators on that type that know to search for a specific operator given the actual types that are involved. So if val is int4, val + 1 becomes int4 + int4. The problem I've run into with this is that by the time you've added enough casts to make this workable you've probably created a situation where val + something is going to recurse back to itself. I've partially solved this in [1], and intend to finish it by calling back in via SPI to do the final resolution, the same way the RI triggers do. What would be a lot better is if we had better control over function and operator resolution. [1] https://github.com/decibel/variant/commit/2b99067744a405da8a325de1ebabd213106f794f#diff-8aa2db4a577ee4201d6eb9041c2a457eR846 The solution of dynamic operators changes the philosophy by about 180° - and I'm afraid about performance. Now if I am thinking about possibilities - it is probably solvable on the right side too. It needs two steps: 1. parametrized record reference syntax - something like SELECT $1[$] 2. the possibility to throw away the plan cache if the result has a different type than the cache expects. Well, the other option is we allow for cases where we don't know in advance what the type will be. 
That would handle this, JSON, variant, and possibly some other scenarios. BTW, I think this relates to the desire to be able to do more OO-ish things in the database. Like do X to all elements in this array. And to have actual classes, private members, real arrays of arrays. It seems like there's a bigger need here that's only being addressed piecemeal. :/ -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Data in Trouble? Get it in Treble! http://BlueTreble.com
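For readers following along, the per-field loop Pavel proposes behaves roughly like this Python sketch. To be clear, the plpgsql FOREACH-over-RECORD syntax quoted above is a proposal, not an existing feature; the function name here is purely illustrative:

```python
def bump_numeric_fields(row: dict) -> dict:
    """Rough analogue of the proposed FOREACH key, val IN RECORD loop:
    visit each field of a row and, where the value is numeric, replace
    it with val + 1; other fields pass through unchanged. The type of
    val differs from field to field, which is exactly the difficulty
    the thread discusses for plpgsql's static typing.
    (bool is excluded even though it subclasses int in Python.)"""
    return {k: (v + 1 if isinstance(v, (int, float)) and not isinstance(v, bool)
                else v)
            for k, v in row.items()}
```

In a dynamically typed language this is trivial; the thread's whole problem is reproducing it in a statically typed plpgsql where `val` must be re-resolved per field.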
Re: [HACKERS] btree_gin and BETWEEN
Jeff Janes jeff.ja...@gmail.com writes: If I use the btree_gin extension to build a gin index on a scalar value, it doesn't work well with BETWEEN queries. It looks like it scans the whole index, with the part of the index between the endpoints getting scanned twice. It is basically executed as if col1 between x and y were col1 between -inf and y and col1 between x and +inf. It puts the correct tuples into the bitmap, because whichever inequality is not being used to set the query endpoint currently is used as a filter instead. So I could just not build that index. But I want it for other reasons, and the problem is that the planner thinks the index can implement the BETWEEN query efficiently. So even if it has truly better options available, it switches to using a falsely attractive btree_gin index. create table foo as select random() as btree, random() as gin from generate_series(1,300); create index on foo using gin (gin); create index on foo using btree (btree); explain ( analyze, buffers) select count(*) from foo where btree between 0.001 and 0.00105; explain ( analyze, buffers) select count(*) from foo where gin between 0.001 and 0.00105; It would be nice if btree_gin supported BETWEEN and other range queries efficiently, or at least if the planner knew it couldn't support them efficiently. But I don't see where to begin on either one of these tasks. Is either one of them plausible? ISTM this is a bug/oversight in the core GIN code: it has multiple GinScanEntry's for the same index column, but fails to realize that it could start the index scans at the largest match value of any of those GinScanEntry's. This did not matter so much before the partial-match feature went in; but with a partial match we might be talking about scanning a large fraction of the index, and it's important to optimize the start point if possible. 
I think there is also a failure to notice when we could stop the scans because one of a group of AND'ed entries says no more matches are possible. I'm not sure where the best place to hack in this consideration would be, though. Oleg, Teodor, any thoughts? regards, tom lane
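The two optimizations Tom outlines — start all the AND'ed scans at the largest start value among the entries, and stop as soon as any entry is exhausted — can be sketched abstractly as a k-way intersection of sorted streams. This is illustrative Python, not the actual GIN code:

```python
import bisect

def intersect_sorted(streams):
    """Intersect several ascending item streams (one per AND'ed
    GinScanEntry, conceptually). (1) No match can precede the largest
    start value among the entries, so every scan begins there instead
    of at its own smallest key. (2) Once any stream is exhausted, no
    further matches are possible, so all scans stop."""
    if not streams or any(not s for s in streams):
        return []
    start = max(s[0] for s in streams)              # largest lower bound
    pos = [bisect.bisect_left(s, start) for s in streams]
    out = []
    while True:
        if any(p >= len(s) for p, s in zip(pos, streams)):
            return out                              # one entry says "no more"
        hi = max(s[p] for p, s in zip(pos, streams))
        if all(s[p] == hi for p, s in zip(pos, streams)):
            out.append(hi)                          # all streams agree: a match
            pos = [p + 1 for p in pos]
        else:
            # skip each lagging stream forward to the current frontier
            pos = [bisect.bisect_left(s, hi, p) for p, s in zip(pos, streams)]
```

With a partial match producing a long stream, starting at the other entry's lower bound is what avoids scanning a large fraction of the index, which is the point of the message above.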
Re: [HACKERS] pg_stat_*_columns?
On 6/22/15 8:05 PM, Robert Haas wrote: In a totally different crazy way, we could just use the existing buffer manager we have and simply put the stats file in shared_buffers. Inventing a per-database relfilenode that doesn't conflict doesn't seem impossible. With some care it shouldn't be hard to make that stats file readable from all sessions, even if they're not connected to the database (e.g. autovacuum launcher). Interesting idea. We could consider the set of stats files a database unto itself and reserve a low-numbered OID for it. The obvious thing to do is use the database's OID as the relfilenode, but then how do you replace the stats file? I think one of the biggest use cases for the stats is to collect them over time and put them in a database. It's rather tempting to just say that's what we should be doing, and provide a knob for how often to record the values and how long to keep the data for. That would eliminate a lot of these problems. The one part I don't see how to solve is the case where bad stuff is happening *right now* and you don't/can't wait for the next reporting interval. Presumably handling that requires all the stuff that's discussed already and you might as well just handle the recording in user space. But maybe someone has some clever ideas there... -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] proposal: row_to_array function
On 6/23/15 3:22 PM, Merlin Moncure wrote: I would rephrase that to: do X to all fields of an object. Array handling is pretty good now (minus arrays of arrays, but arrays of objects containing arrays is 'good enough' for most real world cases). Except that still won't make it easy to do something to each element of an array in SQL, which I think would be nice to have. We've suffered for a while now with hstore/json as a temporary container to handle operations that are not well supported by postgres's particularly strongly typed flavor of SQL. The OO of postgres has been gradually diluting away; it's not an 'object relational' database anymore and the OO features, very much a product of the silly 90's OO hysteria, have been recast into more useful features like inheritance and/or pruned back. Admittedly I've never played with an OO database, but I think our data features are pretty good [1]. Where I do think we can improve though is developing/coding things in the database. For example, I'd love to have the equivalent of a class. Perhaps that could be accomplished by allowing multiple instances of an extension. I'd also like stronger support for private objects (permissions don't really fit that bill). I don't mind having to push everything to jsonb and back for tuple manipulation and I expect that's how these types of things are going to be done moving forwards. jsonb has clearly caught a bid judging by what I'm reading in the blogosphere and will continue to accrete features like this. I think it's unfortunate to lose the strong typing that we have. That can be especially important for something like numbers (was it originally a float or a numeric?). But maybe JSON is good enough. [1] The one OO-ish data feature I'd like is the ability to de-reference a foreign key pointer. 
So if CREATE TABLE b( a_id int REFERENCES a); then have SELECT a_id.some_field FROM b; transform to SELECT a.some_field FROM b JOIN a ...; -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] checkpointer continuous flushing
On 6/22/15 11:59 PM, Fabien COELHO wrote:

which might not be helpful for cases when checkpoint could have flushed soon-to-be-recycled buffers. I think flushing the sorted buffers w.r.t tablespaces is a good idea, but not giving any preference to the clock-sweep point seems to me that we would lose in some cases by this new change.

I do not see how to do both, as these two orders seem more or less unrelated? The traditional assumption is that I/O is very slow and is to be optimized first, so going for buffer ordering to be nice to the disk looks like the priority.

The point is that it's already expensive for backends to advance the clock; if they then have to wait on IO as well it gets REALLY expensive. So we want to avoid that. Other than that though, it is pretty orthogonal, so perhaps another indication that the clock should be handled separately from both backends and bgwriter...

-- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] Should we back-patch SSL renegotiation fixes?
On Tue, Jun 23, 2015 at 3:48 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Tue, Jun 23, 2015 at 2:33 PM, Tom Lane t...@sss.pgh.pa.us wrote: I do not know at this point whether these behaviors are really the same bug or not, but I wonder whether it's time to consider back-patching the renegotiation fixes we did in 9.4. Specifically, I think maybe we should back-patch 31cf1a1a4, 86029b31e, and 36a3be654. (There are more changes in master, but since those haven't yet shipped in any released branch, and there's been a lot of other rework in the same area, those probably are not back-patch candidates.) Thoughts? I have no clear idea how safe it is to back-port these fixes. Well, it would mean that pre-9.5 branches all behave the same, which would be an improvement in my book. Also, ISTM that the 9.4 code for renegotiation assumes a whole lot less than prior branches about OpenSSL's internal behavior; so it ought to be more robust, even if some bugs remain. Just as a point of reference, we had a customer hit a problem similar to bug #12769 on 9.3.x. I think (but am not sure) that 272923a0a may have been intended to fix that issue. In a quick search, I didn't find any other complaints about renegotiation-related issues from our customers. The problem with trying to adopt code from HEAD is that it probably depends on the rather invasive changes explained here: http://www.postgresql.org/message-id/20150126101405.ga31...@awork2.anarazel.de Even assuming that there's no dependency on the immediate-interrupt changes, I'm afraid to back-patch anything that invasive. What commits actually resulted from that? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: row_to_array function
On 6/23/15 3:40 PM, Pavel Stehule wrote:

BTW, I think this relates to the desire to be able to do more OO-ish things in the database. Like do X to all elements in this array. And to have actual classes, private members, real arrays of arrays. It seems like there's a bigger need here that's only being addressed piecemeal. :/

I would not open this box - and I would not throw away or redesign almost all of the PostgreSQL type handling system. I am sure it is not necessary. A PL can stay relatively static if the dynamic behavior is covered by the query language. These few features can be implemented without the necessity of redesigning everything. There are still other PLs - and we are not forced to design a new Perl, JavaScript, ...

By that argument why are we putting it into plpgsql either? You can easily do the stuff we've been talking about in plperl (and presumably most other pl's). So why mess around with adding it to plpgsql? More importantly, these are things that would be extremely useful at the SQL level. When it comes to records for example, we frequently know exactly what's in them, so why do we force users to statically specify that at the SQL level? This is why we don't support pivot tables (which in the BI world is a Big Deal).

I think it's a mistake to try and solve this strictly through plpgsql without recognizing the larger desire and trying to move the ball in that direction. I'm not saying a first effort should boil the ocean, but if we keep piecemealing this without more thought we're going to keep getting more warts (like a lot of the gotchas we have with arrays).

-- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] pg_stat_*_columns?
On Tue, Jun 23, 2015 at 3:47 PM, Andres Freund and...@anarazel.de wrote: On 2015-06-22 21:05:52 -0400, Robert Haas wrote: Interesting idea. We could consider the set of stats files a database unto itself and reserve a low-numbered OID for it. The obvious thing to do is use the database's OID as the relfilenode, but then how do you replace the stats file? The relmapper infrastructure should be able to take care of that. How? I think it assumes that the number of mapped entries is pretty small. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: row_to_array function
2015-06-23 21:57 GMT+02:00 Jim Nasby jim.na...@bluetreble.com: On 6/23/15 9:45 AM, Pavel Stehule wrote: 2015-06-23 1:56 GMT+02:00 Jim Nasby jim.na...@bluetreble.com mailto:jim.na...@bluetreble.com: On 6/22/15 2:46 AM, Pavel Stehule wrote:

FOREACH key, val IN RECORD myrow
LOOP
  IF pg_typeof(val) IN ('int4', 'double precision', 'numeric') THEN
    val := val + 1; -- these variables can be mutable
    -- or maybe in future: myrow[key] := val + 1;
  END IF;
END LOOP;

What is important - val is an automatic variable, and it can have a different type at each step. It is a little bit strange, but impossible to solve, so we cannot support row[var] as a right value (without immutable casting). But we can do it as a left value.

Actually, you can (theoretically) solve it for the right value as well, if val is an actual type and you have operators on that type that know to search for a specific operator given the actual types that are involved. So if val is int4, val + 1 becomes int4 + int4. The problem I've run into with this is by the time you've added enough casts to make this workable you've probably created a situation where val + something is going to recurse back to itself. I've partially solved this in [1], and intend to finish it by calling back in via SPI to do the final resolution, the same way the RI triggers do. What would be a lot better is if we had better control over function and operator resolution.

[1] https://github.com/decibel/variant/commit/2b99067744a405da8a325de1ebabd213106f794f#diff-8aa2db4a577ee4201d6eb9041c2a457eR846

The solution of dynamic operators changes the philosophy by about 180° - and I am afraid about performance. Now that I am thinking about the possibilities - it is probably solvable on the right side too. It needs two steps: 1. a parametrized record reference syntax - something like SELECT $1[$] 2. the possibility to throw away the plan cache if the result has a different type than the cache expects.
Well, the other option is we allow for cases where we don't know in advance what the type will be. That would handle this, JSON, variant, and possibly some other scenarios.

BTW, I think this relates to the desire to be able to do more OO-ish things in the database. Like do X to all elements in this array. And to have actual classes, private members, real arrays of arrays. It seems like there's a bigger need here that's only being addressed piecemeal. :/

I would not open this box - and I would not throw away or redesign almost all of the PostgreSQL type handling system. I am sure it is not necessary. A PL can stay relatively static if the dynamic behavior is covered by the query language. These few features can be implemented without the necessity of redesigning everything. There are still other PLs - and we are not forced to design a new Perl, JavaScript, ...

-- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] 9.5 release notes
Andres Freund and...@anarazel.de wrote: listitem para Improve concurrent locking and buffer scan performance (Andres Freund, Kevin Grittner) /para /listitem If this is ab5194e6f, I don't think it makes sense to mention buffer scan - it's just any lwlock, and buffer locks aren't the primary benefit (ProcArrayLock, buffer mapping lock probably are that). I also don't think Kevin was involved? It seems likely that 2ed5b87f9 was combined with something else in this reference. By reducing buffer pins and buffer content locking during btree index scans it shows a slight performance gain in btree scans and avoids some blocking of btree index vacuuming. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: row_to_array function
On Tue, Jun 23, 2015 at 2:57 PM, Jim Nasby jim.na...@bluetreble.com wrote: On 6/23/15 9:45 AM, Pavel Stehule wrote:

1. parametrized record reference syntax - some like SELECT $1[$] 2. possibility to throw away the plan cache, if the result has a different type than is expected in the cache.

Well, the other option is we allow for cases where we don't know in advance what the type will be. That would handle this, JSON, variant, and possibly some other scenarios. BTW, I think this relates to the desire to be able to do more OO-ish things in the database. Like do X to all elements in this array. And to have actual classes, private members, real arrays of arrays. It seems like there's a bigger need here that's only being addressed piecemeal. :/

I would rephrase that to: do X to all fields of an object. Array handling is pretty good now (minus arrays of arrays, but arrays of objects containing arrays is 'good enough' for most real world cases). We've suffered for a while now with hstore/json as a temporary container to handle operations that are not well supported by postgres's particularly strongly typed flavor of SQL. The OO of postgres has been gradually diluting away; it's not an 'object relational' database anymore and the OO features, very much a product of the silly 90's OO hysteria, have been recast into more useful features like inheritance and/or pruned back.

I don't mind having to push everything to jsonb and back for tuple manipulation and I expect that's how these types of things are going to be done moving forwards. jsonb has clearly caught a bid judging by what I'm reading in the blogosphere and will continue to accrete features like this.

merlin
[HACKERS] Making sure this isn't a new recovery bug ...
Hackers, At one site, I have some duplicate row corruption in a staging database. This database was created by a binary backup from 9.3.5, which was restored via PITR with a timestamp target to 9.3.5, so known-bad versions. The strange thing about the duplicate rows is that they were all frozen prior to the original binary backup, as far as I can tell. Here's a sample:

    id    | b_xmax | b_xmin |   b_ctid
 ---------+--------+--------+-------------
 48871010 |      0 |      2 | (1529664,1)
 48871010 |      0 |      2 | (1529696,1)
 48871024 |      0 |      2 | (1529697,1)
 48871024 |      0 |      2 | (1529665,1)
 48871033 |      0 |      2 | (1529666,1)
 48871033 |      0 |      2 | (1529698,1)
 48871041 |      0 |      2 | (1529697,2)
 48871041 |      0 |      2 | (1529665,2)
 48871043 |      0 |      2 | (1529698,2)
 48871043 |      0 |      2 | (1529666,2)
 48871049 |      0 |      2 | (1529667,1)
 48871049 |      0 |      2 | (1529699,1)

This is about 1000 rows of a 100m row table. So my question is, is this still likely to be one of the known multixact issues? I'm asking because this whole system will get scrubbed and replaced with 9.3.9 in a couple days, so if it's a bug we want to investigate, I need to do forensics now.

-- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] proposal: row_to_array function
On Tue, Jun 23, 2015 at 3:45 PM, Jim Nasby jim.na...@bluetreble.com wrote: On 6/23/15 3:22 PM, Merlin Moncure wrote:

I would rephrase that to: do X to all fields of an object. Array handling is pretty good now (minus arrays of arrays, but arrays ...)

Except that still won't make it easy to do something to each element of an array in SQL, which I think would be nice to have.

Maybe, or maybe we're framing the problem incorrectly. To me, it's not really all that difficult to do: select foo(x) from unnest(bar) x; Unless you have to maintain state inside of foo(), in which case I'd probably use the highly underutilized 'window over custom aggregate' technique.

merlin
Re: [HACKERS] checkpointer continuous flushing
On Tue, Jun 23, 2015 at 10:29 AM, Fabien COELHO coe...@cri.ensmp.fr wrote: Hello Amit,

medium2: scale=300 shared_buffers=5GB checkpoint_timeout=30min max_wal_size=4GB warmup=1200 time=7500

 flsh |    full speed tps         | percent of late tx, 4 clients
 /srt |  1 client   | 4 clients   |  100  |  200  |  400  |
 N/N  | 173 +- 289* | 198 +- 531* | 27.61 | 43.92 | 61.16 |
 N/Y  | 458 +- 327* | 743 +- 920* |  7.05 | 14.24 | 24.07 |
 Y/N  | 169 +- 166* | 187 +- 302* |  4.01 | 39.84 | 65.70 |
 Y/Y  | 546 +- 143  | 681 +- 459  |  1.55 |  3.51 |  2.84 |

The effect of sorting is very positive (+150% to 270% tps). On this run, flushing has a positive (+20% with 1 client) or negative (-8% with 4 clients) effect on throughput, and late transactions are reduced by 92-95% when both options are activated.

Why is there a dip in performance with multiple clients?

I'm not sure I see the dip. Performance is better with 4 clients than with 1 client?

What do I mean by "negative (-8% with 4 clients) on throughput" in the above sentence? I thought by that you meant that there is a dip in TPS with the patch as compared to HEAD at 4 clients. Also, I am not completely sure what "+-" means in your data above. Can it be due to the fact that we started doing more stuff after holding the bufhdr lock in the below code?

I think it is very unlikely that the buffer being locked would be simultaneously requested by one of the 4 clients for an UPDATE, so I do not think it should have a significant impact.

BufferSync() [...]
BufferSync()
{
..
-   buf_id = StrategySyncStart(NULL, NULL);
-   num_to_scan = NBuffers;
+   active_spaces = nb_spaces;
+   space = 0;
    num_written = 0;
-   while (num_to_scan-- > 0)
+
+   while (active_spaces != 0)
..
}

The changed code doesn't seem to give any consideration to the clock-sweep point

Indeed.

which might not be helpful for cases when checkpoint could have flushed soon-to-be-recycled buffers.
I think flushing the sorted buffers w.r.t tablespaces is a good idea, but not giving any preference to the clock-sweep point seems to me that we would lose in some cases by this new change.

I do not see how to do both, as these two orders seem more or less unrelated?

I understand your point and I also don't have any specific answer for it at this moment; the point of worry is that it should not lead to degradation of certain cases as compared to the current algorithm. The workload where it could have an effect is when your data doesn't fit in shared buffers, but can fit in RAM.

With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] pg_rewind failure by file deletion in source server
On Wed, Jun 24, 2015 at 3:36 AM, Michael Paquier michael.paqu...@gmail.com wrote: On Wed, Jun 24, 2015 at 1:40 AM, Fujii Masao masao.fu...@gmail.com wrote: On Tue, Jun 23, 2015 at 11:21 PM, Heikki Linnakangas hlinn...@iki.fi wrote: On 06/23/2015 05:03 PM, Fujii Masao wrote: On Tue, Jun 23, 2015 at 9:19 PM, Heikki Linnakangas hlinn...@iki.fi wrote: On 06/23/2015 07:51 AM, Michael Paquier wrote:

So... Attached are a set of patches dedicated to fixing this issue:

Thanks for working on this!

- 0001, add if_not_exists to pg_tablespace_location, returning NULL if path does not exist
- 0002, same with pg_stat_file, returning NULL if file does not exist
- 0003, same with pg_read_*file. I added them to all the existing functions for consistency.
- 0004, pg_ls_dir extended with if_not_exists and include_dot_dirs (thanks Robert for the naming!)
- 0005, as things get complex, a set of regression tests aimed at covering those things. pg_tablespace_location is platform-dependent, so there are no tests for it.
- 0006, the fix for pg_rewind, using what has been implemented before.

With these patches, pg_read_file() will return NULL for any failure to open the file, which makes pg_rewind assume that the file doesn't exist in the source server, and it will remove the file from the destination. That's dangerous; those functions should check specifically for ENOENT.

I'm wondering if using pg_read_file() to copy the file from the source server is reasonable. ISTM that it has two problems, as follows. 1. It cannot read a very large file like a 1GB file. So if such a large file was created in the source server after failover, pg_rewind would not be able to copy the file. No?

pg_read_binary_file() handles large files just fine. It cannot return more than 1GB in one call, but you can call it several times and retrieve the file in chunks. That's what pg_rewind does, except for reading the control file, which is known to be small.

Yeah, you're right.
I found that pg_rewind creates a temporary table to fetch the file in chunks. This would prevent pg_rewind from using a *hot standby* server as a source server at all. This is of course a limitation of pg_rewind, but we might want to alleviate it in the future. This is something that a replication command could address properly.

2. Many users may not allow a remote client to connect to the PostgreSQL server as a superuser for security reasons. IOW, there would be no entry in pg_hba.conf for such a connection. In this case, pg_rewind always fails because pg_read_file() needs superuser privilege. No? I'm tempted to implement a replication command version of pg_read_file(). That is, it would read and send the data like the BASE_BACKUP replication command does...

Yeah, that would definitely be nice. Peter suggested it back in January (http://www.postgresql.org/message-id/54ac4801.7050...@gmx.net). I think it's way too late to do that for 9.5, however. I'm particularly worried that if we design the required API in a rush, we're not going to get it right, and will have to change it again soon. That might be difficult in a minor release. Using pg_read_file() and friends is quite flexible, even though we just found out that they're not quite flexible enough right now (the ENOENT problem).

I agree that it's too late to do what I said... But just using pg_read_file() cannot address the #2 problem that I pointed out in my previous email. Also, requiring superuser privilege for pg_rewind really conflicts with the motivation for adding the replication privilege. So should we change pg_read_file() so that even a replication user can read the file?

From the security perspective, a replication user can take a base backup, so it can already easily retrieve the contents of PGDATA. Hence I guess that it would be fine. However, what about cases where pg_hba.conf authorizes access for a given replication user via psql and blocks it for the replication protocol?
We could say that the OP should not give out replication access that easily, but in this case the user would have access to the contents of PGDATA even if he should not. Is that unrealistic?

The most realistic case is that both source and target servers have a pg_hba.conf containing the following authentication setting for replication. That is, each server allows the other to use the replication user to connect via the replication protocol.

# TYPE  DATABASE     USER     ADDRESS     METHOD
host    replication  repuser  X.X.X.X/Y   md5

This case makes me think that allowing even a replication user to call pg_read_file() may not be a good solution for us. Because in that case the replication user needs to log in to a real database like postgres to call pg_read_file(), but usually there would be no entry allowing the replication user to connect to any real database in pg_hba.conf. So if we want to address this problem, replication command version of
Re: [HACKERS] upper planner path-ification
-----Original Message-----
From: David Rowley [mailto:david.row...@2ndquadrant.com]
Sent: Tuesday, June 23, 2015 2:06 PM
To: Kaigai Kouhei(海外 浩平)
Cc: Robert Haas; pgsql-hackers@postgresql.org; Tom Lane
Subject: Re: [HACKERS] upper planner path-ification

On 23 June 2015 at 13:55, Kouhei Kaigai kai...@ak.jp.nec.com wrote:

Once we support adding an aggregation path during path consideration, we need to pay attention to morphing of the final target-list according to the intermediate path combination tentatively chosen. For example, if partial aggregation makes sense from a cost perspective, like SUM(NRows) of partial COUNT(*) AS NRows instead of COUNT(*) on a billion rows, the planner also has to adjust the final target-list according to the interim paths. In this case, the final output shall be SUM(), instead of COUNT().

This sounds very much like what's been discussed here: http://www.postgresql.org/message-id/CA+U5nMJ92azm0Yt8TT=hNxFP=VjFhDqFpaWfmj+66-4zvcg...@mail.gmail.com

The basic concept is that we add another function set to aggregates that allows the combination of 2 states. For the case of MIN() and MAX() this will just be the same as the transfn. SUM() is similar for many types, more complex for others. I've quite likely just borrowed SUM(BIGINT)'s transfer functions to allow COUNT()s to be combined. STDDEV, VARIANCE and related aggregates can be constructed using nrows, sum(X) and sum(X^2). REGR_*, COVAR_* and related aggregates can be constructed using nrows, sum(X), sum(Y), sum(X^2), sum(Y^2) and sum(X*Y).

Let me introduce a positive side effect of this approach. Because the final aggregate function processes values already aggregated partially, the difference between the state value and the transition value gets relatively small. It reduces accidental errors in floating-point calculations. :-)

More time does need to be spent inventing the new combining functions that don't currently exist, but that shouldn't matter as it can be done later.
Commitfest link to patch here https://commitfest.postgresql.org/5/131/ I see you've signed up to review it!

Yes, all of us are looking in the same direction. The problem is, we have to cross the mountain of planner enhancement to reach all the valuables:
- parallel aggregation
- aggregation before join
- remote aggregation via FDW
So, unless we find a solution around the planner, 2-phase aggregation is like curry without rice.

Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei kai...@ak.jp.nec.com
[HACKERS] how is a query passed between a coordinator and a datanode
Hello, I'm trying to figure out how a query and its result are passed between a coordinator and a datanode. I know there are many messages passed between them to finish a query. I ran a test against the coordinator by adding a row to a table; the sql was: insert into hg1(id, name) values(1,'tom'). I found a command 'P' was sent from the coordinator to a datanode, and there was a remote statement as follows:

stmt_name=p_1_25af_f query_string=Remote Subplan plan_string={REMOTESTMT :commandType 3 :hasReturning false ...}

My questions are: 1 - does the coordinator use the remote statement to tell a datanode what to do? If so, how is the plan string created by the coordinator and how is the plan_string parsed by the datanode? 2 - if there are multiple rows in the result of the query, how are the rows of data passed from the datanode to the coordinator? Does the datanode just send all the rows of data to the coordinator? Or does the coordinator get each row of data by sending a query? Thank you very much!

Rui Hai
Re: [HACKERS] PGXS check target forcing an install ?
On Tue, Jun 23, 2015 at 02:31:03PM +0900, Michael Paquier wrote: On Tue, Jun 23, 2015 at 12:11 AM, Sandro Santilli s...@keybit.net wrote:

I've noted that after upgrading from PostgreSQL 9.3 to 9.5 I'm suddenly unable to specify a check rule in a Makefile that includes the PGXS one. The error is:

$ make check
rm -rf ''/tmp_install
make -C '/home/postgresql-9.5/lib/pgxs/src/makefiles/../..' DESTDIR=''/tmp_install install
make[1]: Entering directory `/home/postgresql-9.5/lib/pgxs'
make[1]: *** No rule to make target `install'.  Stop.
make[1]: Leaving directory `/home/postgresql-9.5/lib/pgxs'
make: *** [temp-install] Error 2

I tracked the dangerous -rf down to Makefile.global, its empty-string argument being due to abs_top_builddir not being defined in my own Makefile. But beside that, which I can probably fix, it doesn't sound correct that a check rule insists on finding an install rule.

Oops, this is a regression, and a dangerous one indeed. This is caused by dcae5fac. One fix is to use NO_TEMP_INSTALL=yes in Makefile.global in the context of PGXS, like in the patch attached, this variable needing to be set before Makefile.global is loaded. We could as well use PGXS directly in the Testing section, but that does not sound appealing for Makefile.global's readability.

Thanks, setting NO_TEMP_INSTALL=yes in the including Makefile fixes this issue. I'm also surprised that there's no warning coming out of the make invocation, given that I'm defining a check rule myself (after inclusion).

Why? It looks perfectly normal to me to be able to define your own check rule. It's more flexible this way.

I'm surprised because I used to get warnings on overrides, and I actually still get them for other rules.
For example:

Makefile:192: warning: overriding commands for target `install'
/home/postgresql-9.3.4/lib/pgxs/src/makefiles/pgxs.mk:120: warning: ignoring old commands for target `install'

The same warning isn't raised for the check rule, while it is clearly defined in some upper Makefile (as shown by the forced-install bug). Thoughts?

Mixed... On one hand I'm happy to implement my own rules, but in this specific case the lack of a warning left me with no hint about where the offending check rule was defined.

--strk;
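For reference, a minimal extension Makefile applying Sandro's workaround might look like the fragment below. This is a hypothetical sketch (MODULES name and layout are made up); the only essential point from the thread is that NO_TEMP_INSTALL must be set before Makefile.global is read via the PGXS include:

```make
# Hypothetical minimal PGXS extension Makefile.
# NO_TEMP_INSTALL must be assigned before "include $(PGXS)",
# which pulls in Makefile.global and its temp-install machinery.
MODULES = myext
NO_TEMP_INSTALL = yes

PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)

# A custom check rule defined after inclusion, as in Sandro's setup.
check:
	@echo "running my own checks"
```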
Re: [HACKERS] NULL passed as an argument to memcmp() in parse_func.c
It appears that, according to the standard, passing NULL to memcmp is undefined behavior, even if the count is 0. See http://stackoverflow.com/questions/16362925/can-i-pass-a-null-pointer-to-memcmp for C99 and C++ standard references. I didn't see a good reference for C89 but I find it almost impossible to believe it was changed from defined to undefined behavior between C89 and C99.

On Mon, Jun 22, 2015 at 2:31 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Mon, Jun 22, 2015 at 2:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:

If I recall that code correctly, the assumption was that if the third argument is zero then memcmp() must not fetch any bytes (not should not, but MUST not) and therefore it doesn't matter if we pass a NULL. Are you seeing any observable problem here, and if so what is it?

I dunno, this seems like playing with fire to me. A null-test would be pretty cheap insurance.

A null test would be a pretty cheap way of masking a bug in that logic, if we ever introduced one; to wit, that it would cause a call with argtypes==NULL to match anything. Possibly saner is:

if (nargs == 0 ||
    memcmp(argtypes, best_candidate->args, nargs * sizeof(Oid)) == 0)
    break;

I remain unconvinced that this is necessary, though. It looks a *whole* lot like the guards we have against old Solaris' bsearch-of-zero-entries bug. I maintain that what glibc has done is exactly to introduce a bug for the zero-entries case, and that Piotr ought to complain to them about it. At the very least, if you commit this please annotate it as working around a memcmp bug.

regards, tom lane
Re: [HACKERS] Time to get rid of PQnoPasswordSupplied?
On 22 June 2015 at 22:00, Tom Lane t...@sss.pgh.pa.us wrote:

I do not follow Craig's argument that this is somehow connected to the wire protocol version.

Upon revisiting it, neither do I. You know when you read code and think "what idiot wrote this"... then git blame says it was you? I, at least, know that feeling... and that's what reading that part of that email was like. Tom's right, of course. It's libpq-to-client. The string fe_sendauth: no password supplied never goes over the wire; it's a response libpq generates in response to an auth failure ErrorResponse message from the server. It's only relevant for libpq-based clients. Let's change it. PgAdmin-III will need a patch, but that's about the only client I found that would care.

-- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] 9.5 make world failing due to sgml tools missing
On 6/23/15 1:06 AM, Keith Fiske wrote: http://www.keithf4.com On Sun, Jun 21, 2015 at 10:56 AM, Peter Eisentraut pete...@gmx.net mailto:pete...@gmx.net wrote: On 6/18/15 8:54 AM, Tom Lane wrote:

Sure; the point is that libxml2 has suddenly been reclassified as a documentation build tool, which is at least a surprising categorization.

libxml2 has been a required documentation build tool since PostgreSQL 9.0. The only thing that's new is that xmllint is in a different subpackage on some systems. So just install that and you're all set for the foreseeable future.

Well, something is different in 9.5. On this same system (Linux Mint 17.1) I can build all previous versions with make world and I do not get this error.

Hence the use of the phrase "The only thing that's new" ;-)
[HACKERS] Memory context at PG_init call ?
Empirically, I seem to be getting the _PG_init call for a module while the active memory context lifetime is that of the function call which first needed to load the shared object. Is this the case? Is it documented anywhere? Initializing memory meant to be alive for the whole lifetime of a backend in that function is a bit complex if that's confirmed. --strk;
Re: [HACKERS] checkpointer continuous flushing
>> flsh | full speed tps           | percent of late tx, 4 clients
>> /srt | 1 client    | 4 clients   |   100 |   200 |   400 |
>>  N/N | 173 +- 289* | 198 +- 531* | 27.61 | 43.92 | 61.16 |
>>  N/Y | 458 +- 327* | 743 +- 920* |  7.05 | 14.24 | 24.07 |
>>  Y/N | 169 +- 166* | 187 +- 302* |  4.01 | 39.84 | 65.70 |
>>  Y/Y | 546 +- 143  | 681 +- 459  |  1.55 |  3.51 |  2.84 |
>>
>> The effect of sorting is very positive (+150% to 270% tps). On this run, flushing has a positive (+20% with 1 client) or negative (-8% with 4 clients) effect on throughput, and late transactions are reduced by 92-95% when both options are activated.
>
> Why is there a dip in performance with multiple clients?

I'm not sure I see the dip: the performance is better with 4 clients than with 1 client?

> What do you mean by "negative (-8% with 4 clients) on throughput" in the above sentence? I thought you meant that there is a dip in TPS with the patch as compared to HEAD at 4 clients.

Ok, I misunderstood your question; I thought you meant a dip between 1 client and 4 clients. I meant that when flush is turned on, tps goes down by 8% (743 to 681 tps) on this particular run. Basically the tps improvements mostly come from sorting; flushing has uncertain effects on tps (throughput), but much stronger effects on latency and performance stability (lower late rate, lower standard deviation).

Note that I'm not comparing to HEAD in the above tests, but to the new options deactivated, which should be more or less comparable to current HEAD, i.e. there is no sorting nor flushing done, but it is not strictly speaking HEAD behavior. Probably I should get some figures with HEAD as well to check that "more or less" assumption.

> Also, I am not completely sure what "+-" means in your data above?

The first figure, before the +-, is the tps; the second, after it, is its standard deviation computed on per-second traces. Some runs are very bad, with pgbench stuck at times, and result in a stddev larger than the average; they are noted with *.
> I understand your point and I also don't have any specific answer for it at this moment; the point of worry is that it should not lead to degradation of certain cases as compared to the current algorithm. The workload where it could have an effect is when your data doesn't fit in shared buffers, but can fit in RAM.

Hmmm. My point of view is still that the logical priority is to optimize for disk IO first, then look for compatible RAM optimisations later. I can run tests with a small shared_buffers, but probably it would just trigger a lot of checkpoints, or worse, rely on the bgwriter to find space, which would generate random IOs.

-- 
Fabien.
Re: [HACKERS] git push hook to check for outdated timestamps
On Sun, Jun 14, 2015 at 12:37:00PM +0200, Magnus Hagander wrote:
> On Fri, Jun 12, 2015 at 4:33 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>> Andrew Dunstan and...@dunslane.net writes:
>>> On 06/12/2015 09:31 AM, Robert Haas wrote:
>>>> Could we update our git hook to refuse a push of a new commit whose timestamp is more than, say, 24 hours in the past? Our commit history has some timestamps in it now that are over a month off, and it's really easy to do, because when you rebase a commit, it keeps the old timestamp. If you then merge or cherry-pick that into an official branch rather than patch + commit, you end up with this problem unless you are careful to fix it by hand. It would be nice to prevent further mistakes of this sort, as they create confusion.
>>> I think 24 hours is probably fairly generous.
>> Yeah ... if we're going to try to enforce a linear-looking history, ISTM the requirement ought to be "newer than the latest commit on the same branch". Perhaps that would be unduly hard to implement though.
> From a quick look at our existing script, I think that's doable, but I'd have to do some more detailed verification before I'm sure. And we'd have to figure out some way to deal with a push with multiple commits in it, but it should certainly be doable if the first one is. Would we in that case want to enforce linearity *and* recent-ness, or just linearity? As in, do we care about the commit time even if it doesn't change the order?

If a recency check is essentially free, let's check both. Otherwise, enforcing linearity alone is a 95% solution that implicitly bounds recency.

>> FWIW, our git_changelog script tries to avoid this problem by paying attention to CommitDate not Date. But I agree that it's confusing when those fields are far apart.
> That brings it back to the enforcing - would we want to enforce both of those?

May as well. AuthorDate is the main source of trouble. You would need to go out of your way (e.g. git filter-branch) to push an old CommitDate, but let's check it just the same.
[HACKERS] Foreign join pushdown vs EvalPlanQual
Hi,

While reviewing the foreign join pushdown core patch, I noticed that the patch doesn't perform an EvalPlanQual recheck properly. An example that crashes the server is shown below (it uses the postgres_fdw patch [1]). I think the reason is that the ForeignScan node performing the foreign join remotely has scanrelid = 0, while ExecScanFetch assumes that its scan node has scanrelid > 0. I think this is a bug. I've not figured out how to fix this yet, but I thought we would also need another plan that evaluates the join locally for the test tuples for EvalPlanQual. Maybe I'm missing something, though.

Create an environment:

postgres=# create table tab (a int, b int);
CREATE TABLE
postgres=# create foreign table foo (a int) server myserver options (table_name 'foo');
CREATE FOREIGN TABLE
postgres=# create foreign table bar (a int) server myserver options (table_name 'bar');
CREATE FOREIGN TABLE
postgres=# insert into tab values (1, 1);
INSERT 0 1
postgres=# insert into foo values (1);
INSERT 0 1
postgres=# insert into bar values (1);
INSERT 0 1
postgres=# analyze tab;
ANALYZE
postgres=# analyze foo;
ANALYZE
postgres=# analyze bar;
ANALYZE

Run the example:

[Terminal 1]
postgres=# begin;
BEGIN
postgres=# update tab set b = b + 1 where a = 1;
UPDATE 1

[Terminal 2]
postgres=# explain verbose select tab.* from tab, foo, bar where tab.a = foo.a and foo.a = bar.a for update;
                                  QUERY PLAN
 LockRows  (cost=100.00..101.18 rows=4 width=70)
   Output: tab.a, tab.b, tab.ctid, foo.*, bar.*
   ->  Nested Loop  (cost=100.00..101.14 rows=4 width=70)
         Output: tab.a, tab.b, tab.ctid, foo.*, bar.*
         Join Filter: (foo.a = tab.a)
         ->  Seq Scan on public.tab  (cost=0.00..1.01 rows=1 width=14)
               Output: tab.a, tab.b, tab.ctid
         ->  Foreign Scan  (cost=100.00..100.08 rows=4 width=64)
               Output: foo.*, foo.a, bar.*, bar.a
               Relations: (public.foo) INNER JOIN (public.bar)
               Remote SQL: SELECT l.a1, l.a2, r.a1, r.a2 FROM (SELECT ROW(l.a9), l.a9 FROM (SELECT a a9 FROM public.foo FOR UPDATE) l) l (a1, a2) INNER JOIN (SELECT ROW(r.a9), r.a9 FROM (SELECT a a9 FROM public.bar FOR UPDATE) r) r (a1, a2) ON ((l.a2 = r.a2))
(11 rows)

postgres=# select tab.* from tab, foo, bar where tab.a = foo.a and foo.a = bar.a for update;

[Terminal 1]
postgres=# commit;
COMMIT

[Terminal 2]
(After the commit in Terminal 1, Terminal 2 will show the following.)
server closed the connection unexpectedly
        This probably means the server terminated abnormally before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!

Best regards,
Etsuro Fujita

[1] http://www.postgresql.org/message-id/CAEZqfEe9KGy=1_wagh2rgzpg0o4pqgd+iauyaj8wtze+cyj...@mail.gmail.com
Re: [HACKERS] upper planner path-ification
> -----Original Message-----
> From: Robert Haas
> Sent: Tuesday, June 23, 2015 10:18 PM
> Subject: Re: [HACKERS] upper planner path-ification
>
> On Tue, Jun 23, 2015 at 4:41 AM, Kouhei Kaigai kai...@ak.jp.nec.com wrote:
>> So, unless we find a solution around the planner, 2-phase aggregation is like a curry without rice.
> Simon and I spoke with Tom about this upper planner path-ification problem at PGCon, and he indicated that he intended to work on it soon, and that we should bug him about it if he doesn't.

I'm pleased to hear this, as I seem to have a special talent in that area; that's a fantastic supporting arm. So, it may be more productive to spend my effort investigating David Rowley's patch, on the assumption that we can add partial/final-aggregation path nodes just as we have scan/join paths. My interest also rests on his infrastructure.

https://cs.uwaterloo.ca/research/tr/1993/46/file.pdf

It is a bit of an old paper, but a good starting point.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com
Re: [HACKERS] 9.5 release notes
On Tue, Jun 23, 2015 at 5:48 PM, Kevin Grittner kgri...@ymail.com wrote:
> Andres Freund and...@anarazel.de wrote:
>>> <listitem>
>>> <para>
>>> Improve concurrent locking and buffer scan performance (Andres Freund, Kevin Grittner)
>>> </para>
>>> </listitem>
>> If this is ab5194e6f, I don't think it makes sense to mention "buffer scan" - it's just any lwlock, and buffer locks aren't the primary benefit (ProcArrayLock and the buffer mapping lock probably are). I also don't think Kevin was involved?
> It seems likely that 2ed5b87f9 was combined with something else in this reference. By reducing buffer pins and buffer content locking during btree index scans, it shows a slight performance gain in btree scans and avoids some blocking of btree index vacuuming.

I think maybe we should separate that back out. The list needs to be user-accessible, but if it's hard to understand what it's referring to, that's not good either.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] less log level for success dynamic background workers for 9.5
On 24 June 2015 at 03:23, Jim Nasby jim.na...@bluetreble.com wrote:
> On 6/23/15 12:21 PM, Tom Lane wrote:
>> I concur: if we're to have a flag at all, it should work as Alvaro says. However, I'm not real sure we need a flag. I think the use-case of wanting extra logging for a bgworker under development is unlikely to be satisfied very well by just causing existing start/stop logging messages to come out at higher priority. You're likely to be wanting to log other, bgworker-specific, events, and so you'll probably end up writing a bunch of your own elog calls anyway (which you'll eventually remove, #ifdef out, or decrease the log levels of).
> FWIW, I have this problem *constantly* with plpgsql. I put RAISE DEBUGs in, but once you have those in enough places, SET client_min_messages=debug becomes next to useless because of the huge volume of spew. What I'd like is a way to add an identifier to ereport/RAISE so you could turn on individual reports. If we had that, we'd just make these particular ereports DEBUG1 and not worry about it, because you could easily turn them on when needed.

So, log identifiers/classifiers, essentially. Message numbers have been discussed before with regard to core and rejected consistently. I don't think it's really come up in terms of PLs and user-defined functions. I've certainly had similar issues to yours w.r.t. debug messages from user-level code, and wanted to be able to enable one particular log line, all log output from a particular function, or all log output from a particular extension / all functions in a schema. I think it's a whole separate topic to Pavel's original proposal, though, and really merits a separate thread.

For Pavel's issue I'm all in favour of just changing the log message; I think LOG is way too high for something that's an internal implementation detail.
-- 
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] how is a query passed between a coordinator and a datanode
On Tue, Jun 23, 2015 at 5:07 AM, Rui Hai Jiang ruihaiji...@msn.com wrote:
> I'm trying to figure out how a query and its result are passed between a coordinator and a datanode. I know there are many messages passed between them to finish a query.
>
> I did a test against the coordinator by adding a row to a table; the SQL was: insert into hg1(id, name) values(1,'tom'). I found a command 'P' was sent from the coordinator to a datanode, with a remote statement as follows:
>
> stmt_name=p_1_25af_f query_string=Remote Subplan plan_string={REMOTESTMT :commandType 3 :hasReturning false ...}
>
> My questions are:
>
> 1. Does the coordinator use the remote statement to tell a datanode what to do? If so, how is the plan string created by the coordinator, and how is it parsed by the datanode?
> 2. If there are multiple rows in the result of the query, how are they passed from the datanode to the coordinator? Does the datanode just send all the rows to the coordinator, or does the coordinator fetch each row by sending a query?
>
> Thank you very much!

This is probably an appropriate question for postgres-xl-developers, but not pgsql-hackers. Those concepts do not exist in PostgreSQL, only in Postgres-XC or one of its proliferating set of forks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
[HACKERS] Oh, this is embarrassing: init file logic is still broken
Chasing a problem identified by my Salesforce colleagues led me to the conclusion that my commit f3b5565dd ("Use a safer method for determining whether relcache init file is stale") is rather borked. It causes pg_trigger_tgrelid_tgname_index to be omitted from the relcache init file, because that index is not used by any syscache. I had been aware of that actually, but considered it a minor issue.

It's not so minor though, because RelationCacheInitializePhase3 marks that index as nailed for performance reasons, and includes it in NUM_CRITICAL_LOCAL_INDEXES. That means that load_relcache_init_file *always* decides that the init file is busted and silently(!) ignores it. So we're taking a nontrivial hit in backend startup speed as of the last set of minor releases.

Clearly, my shortcut of defining the init file contents as exactly the set of syscache-supporting rels was too short. I think the cleanest fix is to create a wrapper function in relcache.c that encapsulates the knowledge that pg_trigger_tgrelid_tgname_index is also to be included in the init file. But to prevent this type of problem from recurring, we need some defenses against the wrapper getting out of sync with reality elsewhere. I suggest that:

1. write_relcache_init_file ought to bitch if it concludes that a nailed relation is not to be written to the init file. Probably an Assert is enough.

2. load_relcache_init_file ought to complain if it takes the wrong-number-of-nailed-relations exit path. Here, it seems like there might be a chance of the path being taken validly, for example if a minor release were to decide to nail a rel that wasn't nailed before, or vice versa. So what I'm thinking about here is an elog(WARNING) complaint, which would make logic bugs like this pretty visible in development builds. If we ever did make a change like that in a minor release, people might get a one-time WARNING when upgrading, but I think that would be acceptable.

Thoughts, better ideas?
			regards, tom lane
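For reference, the wrapper suggested above could be as small as the sketch below. This is hedged: it follows the existing catalog/syscache conventions (TriggerRelidNameIndexId from catalog/indexing.h, RelationSupportsSysCache from syscache.c), but it is a sketch of the idea, not a committed patch, and it cannot run outside the server:

```c
#include "postgres.h"
#include "catalog/indexing.h"
#include "utils/syscache.h"

/*
 * Would a relation with this OID be written to the relcache init file?
 *
 * The init file contains exactly the syscache-supporting relations, plus
 * pg_trigger_tgrelid_tgname_index, which is nailed for performance in
 * RelationCacheInitializePhase3 but backs no syscache.  Keeping that
 * knowledge here lets both write_relcache_init_file and
 * load_relcache_init_file consult a single source of truth.
 */
bool
RelationIdIsInInitFile(Oid relationId)
{
    if (relationId == TriggerRelidNameIndexId)
        return true;
    return RelationSupportsSysCache(relationId);
}
```

With this in place, the Assert in write_relcache_init_file (point 1) and the elog(WARNING) in load_relcache_init_file (point 2) would both check against the same predicate, so a future nailed-but-uncached index could not silently desynchronize the two paths.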