Re: [HACKERS] Patch: Show process IDs of processes holding a lock; show relation and tuple infos of a lock to acquire
On 23/01/14 14:45, Christian Kruse wrote: Well, is it context or detail? Those fields have reasonably well defined meanings IMO. I find the distinction somewhat blurry and think both would be appropriate. But since I wasn't sure, I changed it to detail. If we need errcontext_plural, let's add it, not adopt inferior solutions just because that isn't there for lack of previous need. I would have added it had I been sure. But having said that, I think this is indeed detail, not context. (I kinda wonder whether some of the stuff that's now in the primary message shouldn't be pushed to errdetail as well. It looks like some previous patches in this area have been lazy.) I agree, the primary message is not very well worded. On the other hand, finding an appropriate alternative seems hard to me. While I'm griping, this message isn't even trying to follow the project's message style guidelines. Detail or context messages are supposed to be complete sentence(s), with capitalization and punctuation to match. Hm, I hope I fixed it in this version of the patch. Lastly, is this information that we want to be shipping to clients? Perhaps from a security standpoint that's not such a wise idea, and errdetail_log() is what should be used. Fixed. I added an errdetail_log_plural() for this, too. I think you have attached the wrong patch. Thanks and Regards, Kumar Rajeev Rastogi -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
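For illustration, this is roughly the shape of the log-only detail message being discussed, assuming errdetail_log_plural() follows the same argument convention as the existing errdetail_plural(); the variable names are taken from the patch attached downthread, so treat this as a sketch rather than the patch text:

    ereport(LOG,
            (errmsg("process %d still waiting for %s on %s after %ld.%03d ms",
                    MyProcPid, modename, buf.data, msecs, usecs),
             errdetail_log_plural("Process holding the lock: %s. Request queue: %s.",
                                  "Processes holding the lock: %s. Request queue: %s.",
                                  lockHoldersNum,
                                  lock_holders_sbuf.data,
                                  lock_waiters_sbuf.data)));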
Re: [HACKERS] Proposal: variant of regclass
On 22 January 2014 13:09, Tatsuo Ishii is...@postgresql.org wrote: I, as a user would be happier if we also have to_regprocedure() and to_regoperator(). The following query looks a valid use-case where one needs to find if a particular function exists. Using to_regproc('sum') does not make sense here because it will return InvalidOid, which will not tell us whether that is because there is no such function or whether there are duplicate function names. select * from pg_proc where oid = to_regprocedure('sum(int)'); I doubt the value of the use case above. Hasn't psql already done an excellent job? test=# \df sum List of functions Schema | Name | Result data type | Argument data types | Type +--+--+-+-- pg_catalog | sum | numeric | bigint | agg pg_catalog | sum | double precision | double precision| agg pg_catalog | sum | bigint | integer | agg pg_catalog | sum | interval | interval| agg pg_catalog | sum | money| money | agg pg_catalog | sum | numeric | numeric | agg pg_catalog | sum | real | real| agg pg_catalog | sum | bigint | smallint| agg (8 rows) If you need simliar functionality in the backend, you could always define a view using the query generated by psql. * QUERY ** SELECT n.nspname as Schema, p.proname as Name, pg_catalog.pg_get_function_result(p.oid) as Result data type, pg_catalog.pg_get_function_arguments(p.oid) as Argument data types, CASE WHEN p.proisagg THEN 'agg' WHEN p.proiswindow THEN 'window' WHEN p.prorettype = 'pg_catalog.trigger'::pg_catalog.regtype THEN 'trigger' ELSE 'normal' END as Type FROM pg_catalog.pg_proc p LEFT JOIN pg_catalog.pg_namespace n ON n.oid = p.pronamespace WHERE p.proname ~ '^(sum)$' AND pg_catalog.pg_function_is_visible(p.oid) ORDER BY 1, 2, 4; ** I thought the general use case is to be able to use such a functionality using SQL queries (as against \df), so that the DBA can automate things, without having to worry about the query returning error. And hence, I thought to_regprocedure() can be used in a query just like how ::regprocedure is used. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
Re: [HACKERS] Patch: Show process IDs of processes holding a lock; show relation and tuple infos of a lock to acquire
Hi, I think you have attached wrong patch. Hurm. You are right, attached v3, not v4. Sorry. Best regards, -- Christian Kruse http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c index ee6c24c..6c648cf 100644 --- a/src/backend/storage/lmgr/proc.c +++ b/src/backend/storage/lmgr/proc.c @@ -1195,13 +1195,23 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable) */ if (log_lock_waits deadlock_state != DS_NOT_YET_CHECKED) { - StringInfoData buf; + StringInfoData buf, + lock_waiters_sbuf, + lock_holders_sbuf; const char *modename; long secs; int usecs; long msecs; + SHM_QUEUE *procLocks; + PROCLOCK *proclock; + bool first_holder = true, + first_waiter = true; + int lockHoldersNum = 0; initStringInfo(buf); + initStringInfo(lock_waiters_sbuf); + initStringInfo(lock_holders_sbuf); + DescribeLockTag(buf, locallock-tag.lock); modename = GetLockmodeName(locallock-tag.lock.locktag_lockmethodid, lockmode); @@ -1211,10 +1221,67 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable) msecs = secs * 1000 + usecs / 1000; usecs = usecs % 1000; + /* + * we loop over the lock's procLocks to gather a list of all + * holders and waiters. Thus we will be able to provide more + * detailed information for lock debugging purposes. + * + * lock-procLocks contains all processes which hold or wait for + * this lock. + */ + + LWLockAcquire(partitionLock, LW_SHARED); + + procLocks = (lock-procLocks); + proclock = (PROCLOCK *) SHMQueueNext(procLocks, procLocks, + offsetof(PROCLOCK, lockLink)); + + while (proclock) + { +/* + * we are a waiter if myProc-waitProcLock == proclock; we are + * a holder if it is NULL or something different + */ +if (proclock-tag.myProc-waitProcLock == proclock) +{ + if (first_waiter) + { + appendStringInfo(lock_waiters_sbuf, %d, + proclock-tag.myProc-pid); + first_waiter = false; + } + else + appendStringInfo(lock_waiters_sbuf, , %d, + proclock-tag.myProc-pid); +} +else +{ + if (first_holder) + { + appendStringInfo(lock_holders_sbuf, %d, + proclock-tag.myProc-pid); + first_holder = false; + } + else + appendStringInfo(lock_holders_sbuf, , %d, + proclock-tag.myProc-pid); + + lockHoldersNum++; +} + +proclock = (PROCLOCK *) SHMQueueNext(procLocks, proclock-lockLink, + offsetof(PROCLOCK, lockLink)); + } + + LWLockRelease(partitionLock); + if (deadlock_state == DS_SOFT_DEADLOCK) ereport(LOG, (errmsg(process %d avoided deadlock for %s on %s by rearranging queue order after %ld.%03d ms, - MyProcPid, modename, buf.data, msecs, usecs))); +MyProcPid, modename, buf.data, msecs, usecs), + (errdetail_log_plural(Process holding the lock: %s. Request queue: %s., + Processes holding the lock: %s. Request queue: %s., + lockHoldersNum, lock_holders_sbuf.data, lock_waiters_sbuf.data; else if (deadlock_state == DS_HARD_DEADLOCK) { /* @@ -1226,13 +1293,19 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable) */ ereport(LOG, (errmsg(process %d detected deadlock while waiting for %s on %s after %ld.%03d ms, - MyProcPid, modename, buf.data, msecs, usecs))); +MyProcPid, modename, buf.data, msecs, usecs), + (errdetail_log_plural(Process holding the lock: %s. Request queue: %s., + Processes holding lock: %s. 
Request queue: %s., + lockHoldersNum, lock_holders_sbuf.data, lock_waiters_sbuf.data; } if (myWaitStatus == STATUS_WAITING) ereport(LOG, (errmsg(process %d still waiting for %s on %s after %ld.%03d ms, - MyProcPid, modename, buf.data, msecs, usecs))); +MyProcPid, modename, buf.data, msecs, usecs), + (errdetail_log_plural(Process holding the lock: %s. Request queue: %s., + Processes holding the lock: %s. Request queue: %s., + lockHoldersNum, lock_holders_sbuf.data, lock_waiters_sbuf.data; else if (myWaitStatus == STATUS_OK) ereport(LOG, (errmsg(process %d acquired %s on %s after %ld.%03d ms, @@ -1252,7 +1325,10 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable) if (deadlock_state != DS_HARD_DEADLOCK) ereport(LOG, (errmsg(process %d failed to acquire %s on %s after %ld.%03d ms, - MyProcPid, modename, buf.data, msecs, usecs))); +MyProcPid, modename, buf.data, msecs, usecs), + (errdetail_log_plural(Process holding the lock: %s. Request queue: %s., + Processes holding the lock: %s. Request queue: %s., + lockHoldersNum, lock_holders_sbuf.data, lock_waiters_sbuf.data; } /* @@
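Since the operators, quotes and ampersands in the diff above were mangled in transit, here is a hedged reconstruction of what the central loop does; it shows the intent, not the literal patch text:

    /*
     * Walk the lock's procLocks queue and collect the PIDs of current
     * holders and waiters into two StringInfo buffers for the log detail.
     */
    LWLockAcquire(partitionLock, LW_SHARED);

    procLocks = &(lock->procLocks);
    proclock = (PROCLOCK *) SHMQueueNext(procLocks, procLocks,
                                         offsetof(PROCLOCK, lockLink));
    while (proclock)
    {
        PGPROC     *proc = proclock->tag.myProc;

        if (proc->waitProcLock == proclock)
        {
            /* this backend is waiting for the lock */
            appendStringInfo(&lock_waiters_sbuf,
                             first_waiter ? "%d" : ", %d", proc->pid);
            first_waiter = false;
        }
        else
        {
            /* this backend holds the lock */
            appendStringInfo(&lock_holders_sbuf,
                             first_holder ? "%d" : ", %d", proc->pid);
            first_holder = false;
            lockHoldersNum++;
        }

        proclock = (PROCLOCK *) SHMQueueNext(procLocks, &proclock->lockLink,
                                             offsetof(PROCLOCK, lockLink));
    }

    LWLockRelease(partitionLock);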
Re: [HACKERS] [PATCH] Relocation of tablespaces in pg_basebackup
Peter, On Thu, Jan 23, 2014 at 2:06 AM, Peter Eisentraut pete...@gmx.net wrote: I'm tempted to think it should be mandatory to specify this option in plain mode when tablespaces are present. Otherwise, creating a base backup is liable to create random files all over the place. Obviously, there would be backward compatibility concerns. That was my initial thought too, the one thing that speaks FOR a change in behaviour is that there isn't a lot of people who have moved over to pg_basebackup yet and even fewer who use multiple tablespaces. For me at least pg_basebackup isn't an option at the moment since I use more than one tablespace. I'm not totally happy with the choice of : as the mapping separator, because that would always require escaping on Windows. We could make it analogous to the path handling and make ; the separator on Windows. Then again, this is not a path, so maybe it should look like one. We pick something else altogether, like =. The option name --tablespace isn't very clear. It ought to be something like --tablespace-mapping. This design choice I made about -T (--tablespace) and colon as separator was copied from pg_barman, which is the way I back up my clusters at the moment. Renaming the option to --tablespace-mapping and changing the syntax to something like = is totally fine by me. I don't think we should require the new directory to be an absolute path. It should be relative to the current directory, just like the -D option does it. Accepting a relative path should be fine, I made a failed attempt using realpath(3) initially but I guess checking for [0] != '/' and prepending getcwd(3) would suffice. I would try to write this patch without using MAXPGPATH. I know existing code uses it, but we should try to use it less, because it overallocates memory and requires handling additional error conditions. For example, you catch overflow in tablespace_list_append() but later report that as invalid tablespace format. We now have psprintf() to make coding with dynamic memory allocation easier. Is overallocating memory in a cli application really an issue though? I will obviously rewrite the code to fix that if necessary. It looks like when you ignore an escaped : as a separator, you don't actually unescape it for use as a directory. + if (*arg == '\\' *(arg + 1) == ':') + ; This code handles that case, I could try to make that cleaner. Somehow, I had the subconscious assumption that this feature would do prefix matching on the directories, not only complete string matching. So if I had tablespaces in /mnt/data1, /mnt/data2, /mnt/data3, I could map them all with -T /mnt:mnt and be done. Not sure if that is useful to many, but it's worth a thought. I like that a lot, but I'm afraid the code would have to get a bit more complex to add that functionality. It would be an easier rewrite if we added a hint character, something like -T '/mnt/*:mnt'. Review style guide for error messages: http://www.postgresql.org/docs/devel/static/error-style-guide.html I will do that. We need to think about how to handle this on platforms without symlinks. I don't like just printing an error message and moving on. It should be either pass or fail or an option to choose between them. I hope someone with experience with those kind of systems can come up with suggestions on how to solve that. I only run postgres on Linux. Please put the options in the getopt handling in alphabetical order. It's not very alphabetical now, but between D and F is probably not the best place. ;-) Done. //Steeve
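To make the separator/escaping discussion concrete, below is a hedged, self-contained sketch (not the actual pg_basebackup code) of splitting an olddir=newdir argument on the first unescaped '=', as suggested above; split_tablespace_mapping is an invented helper name:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static int
    split_tablespace_mapping(const char *arg, char **olddir, char **newdir)
    {
        char       *buf = malloc(strlen(arg) + 1);
        char       *dst = buf;
        char      **cur = olddir;
        const char *src;

        if (buf == NULL)
            return -1;

        *olddir = buf;
        *newdir = NULL;

        for (src = arg; *src; src++)
        {
            if (src[0] == '\\' && src[1] == '=')
                *dst++ = *++src;        /* unescape "\=" */
            else if (*src == '=' && cur == olddir)
            {
                *dst++ = '\0';          /* terminate the old directory */
                *newdir = dst;
                cur = newdir;
            }
            else
                *dst++ = *src;
        }
        *dst = '\0';

        /* both sides must be non-empty */
        return (*newdir != NULL && **olddir != '\0' && **newdir != '\0') ? 0 : -1;
    }

    int
    main(void)
    {
        char       *olddir;
        char       *newdir;

        if (split_tablespace_mapping("/mnt/ts\\=1=./ts1", &olddir, &newdir) == 0)
            printf("old=[%s] new=[%s]\n", olddir, newdir);
        return 0;
    }

A relative new directory would then simply be prefixed with the current working directory, along the lines discussed above.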
Re: [HACKERS] WIP patch (v2) for updatable security barrier views
On 23 January 2014 06:13, KaiGai Kohei kai...@ak.jp.nec.com wrote: A minor comment is below: ! /* !* Make sure that the query is marked correctly if the added qual !* has sublinks. !*/ ! if (!parsetree-hasSubLinks) ! parsetree-hasSubLinks = checkExprHasSubLink(viewqual); Is this logic really needed? This flag says the top-level query contains SubLinks, however, the above condition checks whether a sub-query to be constructed shall contain SubLinks. Also, securityQuals is not attached to the parse tree right now. Thanks for looking at this. Yes that logic is needed. Without it the top-level query wouldn't be marked as having sublinks, which could cause all sorts of things to go wrong --- for example, without it fireRIRrules() wouldn't walk the query tree looking for SELECT rules in sublinks to expand, so a security barrier qual with a sublink subquery that selected from another view wouldn't work. It is not true to say that securityQuals is not attached to the parse tree. The securityQuals *are* part of the parse tree, they just happen to be held in a different place to keep them isolated from other quals. So the remaining rewriter code that walks or mutates the parse tree will process them just like any other quals, recursively expanding any rules they contain. Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
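For readers following the thread, a hedged sketch of the pattern Dean describes (variable names are illustrative, not the literal patch): the barrier qual lives in a separate securityQuals list on the RTE, but it is still part of the parse tree, so the Query's hasSubLinks flag has to account for any SubLinks the qual contains or later rewriter passes such as fireRIRrules would skip them.

    rte->securityQuals = lappend(rte->securityQuals, viewqual);

    /* keep the top-level Query's flag accurate */
    if (!parsetree->hasSubLinks)
        parsetree->hasSubLinks = checkExprHasSubLink(viewqual);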
Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)
On 2014-01-22 18:19:25 -0600, Jim Nasby wrote: On 1/21/14, 6:46 PM, Andres Freund wrote: On 2014-01-21 16:34:45 -0800, Peter Geoghegan wrote: On Tue, Jan 21, 2014 at 3:43 PM, Andres Freund and...@2ndquadrant.com wrote: I personally think this isn't worth complicating the code for. You're probably right. However, I don't see why the bar has to be very high when we're considering the trade-off between taking some emergency precaution against having a PANIC shutdown, and an assured PANIC shutdown. Well, the problem is that the tradeoff would very likely include making already complex code even more complex. None of the proposals, even the one just decreasing the likelihood of a PANIC, look like they'd end up being simple implementation-wise. And that additional complexity would hurt robustness and prevent things I find much more important than this. If we're not looking for perfection, what's wrong with Peter's idea of a ballast file? Presumably the check to see if that file still exists would be cheap, so we can do that before entering the appropriate critical section. That'd be noticeably expensive. Opening/stat'ing a file isn't cheap, especially if you do it via filename and not via fd, which we'd have to do. I am pretty sure it would be noticeable in single-client workloads, but it'd damned sure be noticeable on busy multi-socket workloads. I still think doing the checks in the WAL writer is the best bet, setting a flag that can then cheaply be tested in shared memory. When set, it will cause any further action that would write xlog to error out unless it's already in progress. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
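A hedged sketch of the WAL-writer approach described above, with entirely invented identifiers (none of this comes from an actual patch): the WAL writer monitors free space in the WAL directory and sets a flag in shared memory, which backends test cheaply before generating new WAL.

    /* In the WAL writer's main loop (hypothetical helper and threshold): */
    XLogCtl->walSpaceLow = (GetXLogDirFreeSpace() < wal_reserve_bytes);

    /* In backends, before an operation that would generate WAL (hypothetical flag): */
    if (XLogCtl->walSpaceLow)
        ereport(ERROR,
                (errcode(ERRCODE_DISK_FULL),
                 errmsg("refusing to generate new WAL: free disk space is below the reserve")));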
Re: [HACKERS] Changeset Extraction v7.0 (was logical changeset generation)
Hi, On 2014-01-22 13:00:44 -0500, Robert Haas wrote: Well, apparently, one is going to PANIC and reinitialize the system. I presume that upon reinitialization we'll decide that the slot is gone, and thus won't recreate it in shared memory. Yea, and if it's half-gone we'll continue deletion. And since yesterday evening we'll even fsync things during startup to handle scenarios similar to 20140122162115.gl21...@alap3.anarazel.de . Of course, if the entire system suffers a hard power failure after that and before the directory is succesfully fsync'd, then the slot could reappear on the next startup. Which is also exactly what would happen if we removed the slot from shared memory after doing the unlink, and then the system suffered a hard power failure before the directory contents made it to disk. Except that we also panicked. Yes, but that could only happen as long as no relevant data has been lost since we hold relevant locks during this. In the case of shared buffers, the way we handle fsync failures is by not allowing the system to checkpoint until all of the fsyncs succeed. I don't think shared buffers fsyncs are the apt comparison. It's more something like UpdateControlFile(). Which PANICs. I really don't get why you fight PANICs in general that much. There are some nasty PANICs in postgres which can happen in legitimate situations, which should be made to fail more gracefully, but this surely isn't one of them. We're doing rename(), unlink() and rmdir(). That's it. We should concentrate on the ones that legitimately can happen, not the ones created by an admin running a chmod -R 000 . ; rm -rf $PGDATA or mount -o remount,ro /. We don't increase reliability by a bit adding codepaths that will never get tested. If there's an OS-level reset before that happens, WAL replay will perform the same buffer modifications over again and the next checkpoint will again try to flush them to disk and will not complete unless it does. That forms a closed system where we never advance the redo pointer over the covering WAL record until the changes it covers are on the disk. But I don't think this code has any similar interlock; if it does, I missed it. No, it doesn't (until the first rename() at least), but the number of failure scenarios is far smaller. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
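For context on the rename()/unlink()/fsync() sequence being debated, here is a hedged illustration of the durable-rename pattern (the paths and the slotname variable are placeholders; it reuses the existing fsync_fname() helper):

    char        oldpath[MAXPGPATH];
    char        newpath[MAXPGPATH];

    snprintf(oldpath, sizeof(oldpath), "pg_replslot/%s.tmp", slotname);
    snprintf(newpath, sizeof(newpath), "pg_replslot/%s", slotname);

    if (rename(oldpath, newpath) != 0)
        ereport(PANIC,
                (errcode_for_file_access(),
                 errmsg("could not rename file \"%s\" to \"%s\": %m",
                        oldpath, newpath)));

    /* make the rename itself durable by fsyncing the containing directory */
    fsync_fname("pg_replslot", true);

The PANIC on failure mirrors the position taken above; whether it should be something softer is exactly what is being argued about.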
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
(2014/01/23 10:28), Peter Geoghegan wrote: On Wed, Jan 22, 2014 at 5:28 PM, KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp wrote: Oh, thanks to inform me. I think essential problem of my patch has bottle neck in sqrt() function and other division caluculation. Well, that's a pretty easy theory to test. Just stop calling them (and do something similar to what we do for current counter fields instead) and see how much difference it makes. What means calling them? I think that part of heavy you think is pg_stat_statement view that is called by select query, not a part of LWLock getting statistic by hook. Right? I tested my patch in pgbench, but I cannot find bottleneck of my latest patch. (Sorry, I haven't been test select query in pg_stat_statement view...) Here is a test result. * Result (Test result is represented by tps.) method| try1 | try2 | try3 without pgss | 130938 | 131558 | 131796 with pgss| 125164 | 125146 | 125358 with patched pgss| 126091 | 126681 | 126433 * Test Setting shared_buffers=1024MB checkpoint_segments = 300 checkpoint_timeout = 15min checkpoint_completion_target = 0.7 * pgbench script pgbench -h xxx.xxx.xxx.xxx mitsu-ko -i -s 100 psql -h xxx.xxx.xxx.xxx mitsu-ko -c 'CHECKPOINT' pgbench -h xxx.xxx.xxx.xxx mitsu-ko -c64 -j32 -S -T 180 pgbench -h xxx.xxx.xxx.xxx mitsu-ko -c64 -j32 -S -T 180 pgbench -h xxx.xxx.xxx.xxx mitsu-ko -c64 -j32 -S -T 180 * Server SPEC: CPU: Xeon E5-2670 1P/8C 2.6GHz #We don't have 32 core cpu... Memory: 24GB RAID: i420 2GB cache Disk: 15K * 6 (RAID 1+0) Attached is latest developping patch. It hasn't been test much yet, but sqrt caluclation may be faster. Regards, -- Mitsumasa KONDO NTT Open Source Software Center *** a/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql --- b/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql *** *** 19,24 CREATE FUNCTION pg_stat_statements( --- 19,27 OUT query text, OUT calls int8, OUT total_time float8, + OUT min_time float8, + OUT max_time float8, + OUT stdev_time float8, OUT rows int8, OUT shared_blks_hit int8, OUT shared_blks_read int8, *** *** 41,43 CREATE VIEW pg_stat_statements AS --- 44,52 SELECT * FROM pg_stat_statements(); GRANT SELECT ON pg_stat_statements TO PUBLIC; + + /* New Function */ + CREATE FUNCTION pg_stat_statements_reset_time() + RETURNS void + AS 'MODULE_PATHNAME' + LANGUAGE C; *** a/contrib/pg_stat_statements/pg_stat_statements--1.2.sql --- b/contrib/pg_stat_statements/pg_stat_statements--1.2.sql *** *** 9,14 RETURNS void --- 9,19 AS 'MODULE_PATHNAME' LANGUAGE C; + CREATE FUNCTION pg_stat_statements_reset_time() + RETURNS void + AS 'MODULE_PATHNAME' + LANGUAGE C; + CREATE FUNCTION pg_stat_statements( OUT userid oid, OUT dbid oid, *** *** 16,21 CREATE FUNCTION pg_stat_statements( --- 21,29 OUT query text, OUT calls int8, OUT total_time float8, + OUT min_time float8, + OUT max_time float8, + OUT stdev_time float8, OUT rows int8, OUT shared_blks_hit int8, OUT shared_blks_read int8, *** *** 42,44 GRANT SELECT ON pg_stat_statements TO PUBLIC; --- 50,53 -- Don't want this to be available to non-superusers. REVOKE ALL ON FUNCTION pg_stat_statements_reset() FROM PUBLIC; + REVOKE ALL ON FUNCTION pg_stat_statements_reset_time() FROM PUBLIC; *** a/contrib/pg_stat_statements/pg_stat_statements--unpackaged--1.0.sql --- b/contrib/pg_stat_statements/pg_stat_statements--unpackaged--1.0.sql *** *** 4,8 --- 4,9 \echo Use CREATE EXTENSION pg_stat_statements to load this file. 
\quit ALTER EXTENSION pg_stat_statements ADD function pg_stat_statements_reset(); + ALTER EXTENSION pg_stat_statements ADD function pg_stat_statements_reset_time(); ALTER EXTENSION pg_stat_statements ADD function pg_stat_statements(); ALTER EXTENSION pg_stat_statements ADD view pg_stat_statements; *** a/contrib/pg_stat_statements/pg_stat_statements.c --- b/contrib/pg_stat_statements/pg_stat_statements.c *** *** 78,84 static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100; #define USAGE_DECREASE_FACTOR (0.99) /* decreased every entry_dealloc */ #define STICKY_DECREASE_FACTOR (0.50) /* factor for sticky entries */ #define USAGE_DEALLOC_PERCENT 5 /* free this % of entries at once */ ! #define JUMBLE_SIZE1024 /* query serialization buffer size */ /* --- 78,85 #define USAGE_DECREASE_FACTOR (0.99) /* decreased every entry_dealloc */ #define STICKY_DECREASE_FACTOR (0.50) /* factor for sticky entries */ #define USAGE_DEALLOC_PERCENT 5 /* free this % of entries at once */ ! #define EXEC_TIME_INIT_MIN DBL_MAX /* initial execution min time */ ! #define EXEC_TIME_INIT_MAX -DBL_MAX /* initial execution max time */
Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)
On 23 January 2014 01:19, Jim Nasby j...@nasby.net wrote: On 1/21/14, 6:46 PM, Andres Freund wrote: On 2014-01-21 16:34:45 -0800, Peter Geoghegan wrote: On Tue, Jan 21, 2014 at 3:43 PM, Andres Freundand...@2ndquadrant.com wrote: I personally think this isn't worth complicating the code for. You're probably right. However, I don't see why the bar has to be very high when we're considering the trade-off between taking some emergency precaution against having a PANIC shutdown, and an assured PANIC shutdown Well, the problem is that the tradeoff would very likely include making already complex code even more complex. None of the proposals, even the one just decreasing the likelihood of a PANIC, like like they'd end up being simple implementation-wise. And that additional complexity would hurt robustness and prevent things I find much more important than this. If we're not looking for perfection, what's wrong with Peter's idea of a ballast file? Presumably the check to see if that file still exists would be cheap so we can do that before entering the appropriate critical section. There's still a small chance that we'd end up panicing, but it's better than today. I'd argue that even if it doesn't work for CoW filesystems it'd still be a win. I grant that it does sound simple enough for a partial stop gap. My concern is that it provides only a short delay before the eventual disk-full situation, which it doesn't actually prevent. IMHO the main issue now is how we clear down old WAL files. We need to perform a checkpoint to do that - and as has been pointed out in relation to my proposal, we cannot complete that because of locks that will be held for some time when we do eventually lock up. That issue is not solved by having a ballast file(s). IMHO we need to resolve the deadlock inherent in the disk-full/WALlock-up/checkpoint situation. My view is that can be solved in a similar way to the way the buffer pin deadlock was resolved for Hot Standby. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add CREATE support to event triggers
On 2014-01-15 02:11:11 -0300, Alvaro Herrera wrote: Then execute commands to your liking. So, I am giving a quick look, given that I very likely will have to write a consumer for it... * deparse_utility_command returns payload via parameter, not return value? * functions beginning with an underscore formally are reserved, we shouldn't add new places using such a convention. * I don't think dequote_jsonval is correct as is, IIRC that won't correctly handle unicode escapes and such. I think json.c needs to expose functionality for this. * I wonder if expand_jsonval_identifier shouldn't truncate as well? It should get handled by the individual created commands, right? * So, if I read things correctly, identifiers in json are never downcased, is that correct? I.e. they are implicitly quoted? * Why must we not schema qualify system types (c.f. expand_jsonval_typename)? It seems to be perfectly sensible to me to just use pg_catalog there. * It looks like pg_event_trigger_expand_command will misparse nested {, error out instead? * I wonder if deparseColumnConstraint couldn't somehow share a bit more code with ruleutils.c's pg_get_constraintdef_worker(). * I guess you know, but deparseColumnConstraint() doesn't handle foreign keys yet. * Is tcop/ the correct location for deparse_utility.c? I wonder if it shouldn't be alongside ruleutils.c instead. * shouldn't pg_event_trigger_get_creation_commands return command as json instead of text? * Not your department, but a builtin json pretty printer would be really, really handy. Something like CREATE FUNCTION json_prettify(j json) RETURNS TEXT AS $$ import json return json.dumps(json.loads(j), sort_keys=True, indent=4) $$ LANGUAGE PLPYTHONU; makes the json so much more readable. Some minimal tests: * CREATE SEQUENCE errors out with: NOTICE: JSON blob: {sequence:{relation:frakbar2,schema:public},persistence:,fmt:CREATE %{persistence}s SEQUENCE %{identity}D} ERROR: non-existant element identity in JSON formatting object *CREATE TABLE frakkbar2(id int); error out with: postgres=# CREATE TABLE frakkbar2(id int); NOTICE: JSON blob: {on_commit:{present:false,on_commit_value:null,fmt:ON COMMIT %{on_commit_value}s},tablespace:{present:false,tablespace:null,fmt:TABLESPACE %{tablespace}I},inherits:{present:false,parents:null,fmt:INHERITS (%{parents:, }D)},table_elements:[{collation:{present:false,fmt:COLLATE %{name}I},type:{typmod:,typename:integer,is_system:true,is_array:false},name:id,fmt:%{name}I %{type}T %{collation}s}],of_type:{present:false,of_type:null,fmt:OF %{of_type}T},if_not_exists:,identity:{relation:frakkbar2,schema:public},persistence:,fmt:CREATE %{persistence}s TABLE %{identity}D %{if_not_exists}s %{of_type}s (%{table_elements:, }s) %{inherits}s %{on_commit}s %{tablespace}s} ERROR: invalid NULL is_system flag in %T element CONTEXT: PL/pgSQL function snitch() line 8 at RAISE Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [Review] inherit support for foreign tables
Shigeru Hanada wrote: Attached patch allows a foreign table to be a child of a table. It also allows foreign tables to have CHECK constraints. Sorry for the delay. I started to look at this patch. Though this would be debatable, in current implementation, constraints defined on a foreign table (now only NOT NULL and CHECK are supported) are not enforced during INSERT or UPDATE executed against foreign tables. This means that retrieved data might violates the constraints defined on local side. This is debatable issue because integrity of data is important for DBMS, but in the first cut, it is just documented as a note. I agree with you, but we should introduce a new keyword such as ASSERTIVE or something in 9.4, in preparation for the enforcement of constraints on the local side in a future release? What I'm concerned about is, otherwise, users will have to rewrite those DDL queries at that point. No? Because I don't see practical case to have a foreign table as a parent, and it avoid a problem about recursive ALTER TABLE operation, foreign tables can't be a parent. I agree with you on that point. For other commands recursively processed such as ANALYZE, foreign tables in the leaf of inheritance tree are ignored. I'm not sure that in the processing of the ANALYZE command, we should ignore child foreign tables. It seems to me that the recursive processing is not that hard. Rather what I'm concerned about is that if the recursive processing is allowed, then autovacuum will probably have to access to forign tables on the far side in order to acquire those sample rows. It might be undesirable from the viewpoint of security or from the viewpoint of efficiency. --- a/doc/src/sgml/ref/create_foreign_table.sgml +++ b/doc/src/sgml/ref/create_foreign_table.sgml @@ -22,6 +22,7 @@ CREATE FOREIGN TABLE [ IF NOT EXISTS ] replaceable class=PARAMETERtable_name replaceable class=PARAMETERcolumn_name/replaceable replaceable class=PARAMETERdata_type/replaceable [ OPTIONS ( replaceable class=PA\ RAMETERoption/replaceable 'replaceable class=PARAMETERvalue/replaceable' [, ... ] ) ] [ COLLATE replaceablecollation/replaceable ] [ rep\ laceable class=PARAMETERcolumn_constraint/replaceable [ ... ] ] [, ... ] ] ) +[ INHERITS ( replaceableparent_table/replaceable [, ... ] ) ] SERVER replaceable class=parameterserver_name/replaceable [ OPTIONS ( replaceable class=PARAMETERoption/replaceable 'replaceable class=PARAMETERvalue/replaceable' [, ... ] ) ] @@ -159,6 +160,17 @@ CREATE FOREIGN TABLE [ IF NOT EXISTS ] replaceable class=PARAMETERtable_name /varlistentry varlistentry + termreplaceable class=PARAMETERparent_table/replaceable/term + listitem + para + The name of an existing table or foreign table from which the new foreign + table automatically inherits all columns, see + xref linkend=ddl-inherit for details of table inheritance. + /para + /listitem + /varlistentry Correct? I think we should not allow a foreign table to be a parent. I'll look at this patch more closely. Thanks, Best regards, Etsuro Fujita -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] commit fest 2014-01 week 1 report
We started this commit fest with 103 patches and got off to a flyer, with 20 patches committed during the first week. Somehow we ended up with 113 patches at the end of the first week, but let's please leave it at that. I'm tracking down patch authors who have not signed up for a review. I'll also be following up with reviewers to send in their first review soon. It looks like we could especially use a few reviewers for a number of (small) Windows-specific changes. If you have access to a Windows environment, please sign up. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Turning recovery.conf into GUCs
Hi, On 2014-01-15 02:00:51 -0500, Jaime Casanova wrote: On Mon, Nov 18, 2013 at 12:27 PM, Andres Freund and...@2ndquadrant.com wrote: Hi, On 2013-11-15 22:38:05 -0500, Jaime Casanova wrote: sorry, i clearly misunderstood you. attached a rebased patch with those functions removed. * --write-standby-enable seems to loose quite some functionality in comparison to --write-recovery-conf since it doesn't seem to set primary_conninfo, standby anymore. we can add code that looks for postgresql.conf in $PGDATA but if postgresql.conf is not there (like the case in debian, there is not much we can do about it) or we can write a file ready to be included in postgresql.conf, any sugestion? People might not like me for the suggestion, but I think we should simply always include a 'recovery.conf' in $PGDATA unconditionally. That'd make this easier. Alternatively we could pass a filename to --write-recovery-conf. * CheckRecoveryReadyFile() doesn't seem to be a very descriptive function name. I left it as CheckStartingAsStandby() but i still have a problem of this not being completely clear. this function is useful for standby or pitr. which leads me to the other problem i have: the recovery trigger file, i have left it as standby.enabled but i still don't like it. recovery.trigger (Andres objected on this name) forced_recovery.trigger user_forced_recovery.trigger stay_in_recovery.trigger? That'd be pretty clear for anybody involved in pg, but otherwise... * the description of archive_cleanup_command seems wrong to me. why? it seems to be the same that was in recovery.conf. where did you see the description you're complaining at? I dislike the description in guc.c + {archive_cleanup_command, PGC_POSTMASTER, WAL_ARCHIVE_RECOVERY, + gettext_noop(Sets the shell command that will be executed at every restartpoint.), + NULL + }, + archive_cleanup_command, previously it was: -# specifies an optional shell command to execute at every restartpoint. -# This can be useful for cleaning up the archive of a standby server. * Why the PG_TRY/PG_CATCH in check_recovery_target_time? Besides being really strangely formatted (multiline :? inside a function?) it doesn't a) seem to be correct to ignore potential memory allocation errors by just switching back into the context that just errored out, and continue to work there b) forgo cleanup by just continuing as if nothing happened. That's unlikely to be acceptable. the code that read recovery.conf didn't has that, so i just removed it Well, that's not necessarily correct. recovery.conf was only read during startup, while this is read on SIGHUP. * Why do you include xlog_internal.h in guc.c and not xlog.h? we actually need both but including xlog_internal.h also includes xlog.h i added xlog.h and if someone things is enough only putting xlog_internal.h let me know What's required from xlog_internal.h? diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index b53ae87..54f6a0d 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -64,11 +64,12 @@ extern uint32 bootstrap_data_checksum_version; /* File path names (all relative to $PGDATA) */ -#define RECOVERY_COMMAND_FILErecovery.conf -#define RECOVERY_COMMAND_DONErecovery.done +#define RECOVERY_ENABLE_FILE standby.enabled Imo the file and variable names should stay coherent. 
+/* recovery.conf is not supported anymore */ +#define RECOVERY_COMMAND_FILErecovery.conf +bool StandbyModeRequested = false; +static TimestampTz recoveryDelayUntilTime; This imo should be lowercase now, the majority of GUC variables are. +/* are we currently in standby mode? */ +bool StandbyMode = false; Why did you move this? - if (rtliGiven) + if (strcmp(recovery_target_timeline_string, ) != 0) { Why not have the convention that NULL indicates a unset target_timeline like you use for other GUCs? Mixing things like this is confusing. Why is recovery_target_timeline stored as a string? Because it's a unsigned int? If so, you should have an assign hook setting up a) rtliGiven, b) properly typed variable. - if (rtli) + if (recovery_target_timeline) { /* Timeline 1 does not have a history file, all else should */ - if (rtli != 1 !existsTimeLineHistory(rtli)) + if (recovery_target_timeline != 1 + !existsTimeLineHistory(recovery_target_timeline)) ereport(FATAL, (errmsg(recovery target timeline %u does not exist, - rtli))); - recoveryTargetTLI = rtli;
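A hedged sketch of the check/assign-hook shape being suggested for a string recovery_target_timeline GUC; the static variable and error wording are illustrative, not from the patch, and the usual backend headers (utils/guc.h, access/xlogdefs.h) are assumed:

    static bool
    check_recovery_target_timeline(char **newval, void **extra, GucSource source)
    {
        TimeLineID *myextra;
        uint32      tli = 0;

        if (*newval && strcmp(*newval, "") != 0 && strcmp(*newval, "latest") != 0)
        {
            errno = 0;
            tli = strtoul(*newval, NULL, 0);
            if (errno != 0 || tli == 0)
            {
                GUC_check_errdetail("recovery_target_timeline is not a valid number or \"latest\".");
                return false;
            }
        }

        /* "extra" must be malloc'd so guc.c can keep it around */
        myextra = (TimeLineID *) malloc(sizeof(TimeLineID));
        if (myextra == NULL)
        {
            GUC_check_errcode(ERRCODE_OUT_OF_MEMORY);
            return false;
        }
        *myextra = tli;
        *extra = myextra;

        return true;
    }

    static void
    assign_recovery_target_timeline(const char *newval, void *extra)
    {
        /* 0 stands for "unset"/"latest" in this sketch */
        requested_recovery_tli = *((TimeLineID *) extra);   /* illustrative variable */
    }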
Re: [HACKERS] dynamic shared memory and locks
Isn't it necessary to have an interface to initialize LWLock structure being allocated on a dynamic shared memory segment? Even though LWLock structure is exposed at lwlock.h, we have no common way to initialize it. How about to have a following function? void InitLWLock(LWLock *lock) { SpinLockInit(lock-lock.mutex); lock-lock.releaseOK = true; lock-lock.exclusive = 0; lock-lock.shared = 0; lock-lock.head = NULL; lock-lock.tail = NULL; } Thanks, 2014/1/22 KaiGai Kohei kai...@ak.jp.nec.com: (2014/01/22 1:37), Robert Haas wrote: On Mon, Jan 20, 2014 at 11:23 PM, KaiGai Kohei kai...@ak.jp.nec.com wrote: I briefly checked the patch. Most of lines are mechanical replacement from LWLockId to LWLock *, and compiler didn't claim anything with -Wall -Werror option. My concern is around LWLockTranche mechanism. Isn't it too complicated towards the purpose? For example, it may make sense to add char lockname[NAMEDATALEN]; at the tail of LWLock structure if LOCK_DEBUG is enabled. Probably, it also adds an argument of LWLockAssign() to gives the human readable name. Is the additional 64bytes (= NAMEDATALEN) per lock too large for recent hardware? Well, we'd need it when either LOCK_DEBUG was defined or when LWLOCK_STATS was defined or when --enable-dtrace was used, and while the first two are probably rarely enough used that that would be OK, I think the third case is probably fairly common, and I don't think we want to have such a potentially performance-critical difference between builds with and without dtrace. Also... yeah, it's a lot of memory. If we add an additional 64 bytes to the structure, then we're looking at 96 bytes per lwlock instead of 32, after padding out to a 32-byte boundary to avoid crossing cache lines. We need 2 lwlocks per buffer, so that's an additional 128 bytes per 8kB buffer. For shared_buffers = 8GB, that's an extra 128MB for storing lwlocks. I'm not willing to assume nobody cares about that. And while I agree that this is a bit complex, I don't think it's really as bad as all that. We've gotten by for a long time without people being able to put lwlocks in other parts of memory, and most extension authors have gotten by with LWLockAssign() just fine and can continue doing things that way; only users with particularly sophisticated needs should bother to worry about the tranche stuff. Hmm... 1/64 of main memory (if large buffer system) might not be an ignorable memory consumption. One idea I just had is to improve the dsm_toc module so that it can optionally set up a tranche of lwlocks for you, and provide some analogues of RequestAddinLWLocks and LWLockAssign for that case. That would probably make this quite a bit simpler to use, at least for people using it with dynamic shared memory. But I think that's a separate patch. I agree with this idea. It seems to me quite natural to keep properties of objects held on shared memory (LWLock) on shared memory. Also, a LWLock once assigned shall not be never released. So, I think dsm_toc can provide a well suitable storage for them. Thanks, -- OSS Promotion Center / The PG-Strom Project KaiGai Kohei kai...@ak.jp.nec.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- KaiGai Kohei kai...@kaigai.gr.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] dynamic shared memory and locks
On 2014-01-23 23:03:40 +0900, Kohei KaiGai wrote: Isn't it necessary to have an interface to initialize LWLock structure being allocated on a dynamic shared memory segment? Even though LWLock structure is exposed at lwlock.h, we have no common way to initialize it. There's LWLockInitialize() or similar in the patch afair. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
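For anyone wondering what that looks like from an extension's point of view, a hedged sketch using the API from the patch under discussion (tranche registration elided; MY_TRANCHE_ID and the segment size are placeholders):

    dsm_segment *seg;
    LWLock      *lock;

    seg = dsm_create(sizeof(LWLock));
    lock = (LWLock *) dsm_segment_address(seg);
    LWLockInitialize(lock, MY_TRANCHE_ID);

    LWLockAcquire(lock, LW_EXCLUSIVE);
    /* ... touch state in the segment protected by the lock ... */
    LWLockRelease(lock);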
Re: [HACKERS] dynamic shared memory and locks
2014/1/23 Andres Freund and...@2ndquadrant.com: On 2014-01-23 23:03:40 +0900, Kohei KaiGai wrote: Isn't it necessary to have an interface to initialize LWLock structure being allocated on a dynamic shared memory segment? Even though LWLock structure is exposed at lwlock.h, we have no common way to initialize it. There's LWLockInitialize() or similar in the patch afair. Ahh, I overlooked the code around the tranche-id. Sorry for the noise. Thanks, -- KaiGai Kohei kai...@kaigai.gr.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] pg_ctl always uses the same event source
On Thu, Jan 23, 2014 at 10:26 AM, Tom Lane t...@sss.pgh.pa.us wrote: Amit Kapila amit.kapil...@gmail.com writes: On Wed, Jan 22, 2014 at 9:03 PM, Tom Lane t...@sss.pgh.pa.us wrote: So? Anything which can know the value of a GUC parameter can certainly know the selected port number. 1. In case of registration of event source, either user has to pass the name or it uses hard coded default value, so if we use version number along with 'PostgreSQL', it can be consistent. I don't see any way pgevent.c can know port number to append it to default value, am I missing something here? I think what we might want to do is redefine the server's behavior as creating an event named after the concatenation of event_source and port number, or maybe even get rid of event_source entirely and just say it's PostgreSQL followed by the port number. To accomplish this behaviour, each time server starts and stops, we need to register and unregister event log using mechanism described at below link to ensure that there is no mismatch between what server uses and what OS knows. http://www.postgresql.org/docs/devel/static/event-log-registration.html IIUC, this will be a much larger behaviour change. What is the problem you are envisioning if we use version number, yesterday I have given some examples about other softwares which are registered in event log and uses version number, so I don't understand why it is big deal if we also use version number along with PostgreSQL as a default value and if user specifies any particular name then use the same. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 01/22/2014 11:33 PM, KONDO Mitsumasa wrote: (2014/01/23 12:00), Andrew Dunstan wrote: On 01/22/2014 08:28 PM, KONDO Mitsumasa wrote: (2014/01/22 22:26), Robert Haas wrote: On Wed, Jan 22, 2014 at 3:32 AM, KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp wrote: OK, Kondo, please demonstrate benchmarks that show we have 1% impact from this change. Otherwise we may need a config parameter to allow the calculation. OK, testing DBT-2 now. However, error range of benchmark might be 1% higher. So I show you detail HTML results. To see any impact from spinlock contention, I think you're pretty much going to need a machine with 32 cores, I think, and lots of concurrency. pgbench -S is probably a better test than DBT-2, because it leaves out all the writing, so percentage-wise more time will be spent doing things like updating the pgss hash table. Oh, thanks to inform me. I think essential problem of my patch has bottle neck in sqrt() function and other division caluculation. I will replcace sqrt() function in math.h to more faster algorithm. And moving unneccessary part of caluculation in LWlocks or other locks. It might take time to improvement, so please wait for a while. Umm, I have not read the patch, but are you not using Welford's method? Its per-statement overhead should be absolutely tiny (and should not compute a square root at all per statement - the square root should only be computed when the standard deviation is actually wanted, e.g. when a user examines pg_stat_statements) See for example http://www.johndcook.com/standard_deviation.html Thanks for your advice. I read your example roughly, however, I think calculating variance is not so heavy in my patch. Double based sqrt caluculation is most heavily in my mind. And I find fast square root algorithm that is used in 3D games. http://en.wikipedia.org/wiki/Fast_inverse_square_root This page shows inverse square root algorithm, but it can caluculate normal square root, and it is faster algorithm at the price of precision than general algorithm. I think we want to fast algorithm, so it will be suitable. According to the link I gave above: The most obvious way to compute variance then would be to have two sums: one to accumulate the sum of the x's and another to accumulate the sums of the squares of the x's. If the x's are large and the differences between them small, direct evaluation of the equation above would require computing a small number as the difference of two large numbers, a red flag for numerical computing. The loss of precision can be so bad that the expression above evaluates to a /negative/ number even though variance is always positive. As I read your patch that's what it seems to be doing. What is more, if the square root calculation is affecting your benchmarks, I suspect you are benchmarking the wrong thing. The benchmarks should not call for a single square root calculation. What we really want to know is what is the overhead from keeping these stats. But your total runtime code (i.e. code NOT from calling pg_stat_statements()) for stddev appears to be this: e-counters.total_sqtime += total_time * total_time; cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
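For reference, a self-contained sketch of the online (Welford) method referred to above; this is generic illustration, not pg_stat_statements code. The per-call cost is a couple of additions and one division, and the square root is only taken when the standard deviation is actually read out (e.g. when the view is queried):

    #include <math.h>
    #include <stdio.h>

    typedef struct RunningStats
    {
        long        n;
        double      mean;
        double      sum_var;    /* sum of squared deviations from the running mean */
    } RunningStats;

    static void
    stats_accum(RunningStats *s, double x)
    {
        double      delta = x - s->mean;

        s->n++;
        s->mean += delta / s->n;
        s->sum_var += delta * (x - s->mean);
    }

    static double
    stats_stddev(const RunningStats *s)
    {
        return (s->n > 1) ? sqrt(s->sum_var / (s->n - 1)) : 0.0;
    }

    int
    main(void)
    {
        RunningStats s = {0, 0.0, 0.0};
        double      samples[] = {10.0, 12.0, 9.5, 11.2, 10.8};
        int         i;

        for (i = 0; i < 5; i++)
            stats_accum(&s, samples[i]);

        printf("mean=%.3f stddev=%.3f\n", s.mean, stats_stddev(&s));
        return 0;
    }

Unlike the sum-of-squares approach quoted above, this never subtracts two large, nearly equal numbers, so it cannot produce a negative variance.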
Re: [HACKERS] commit fest 2014-01 week 1 report
On 01/23/2014 08:11 AM, Peter Eisentraut wrote: We started this commit fest with 103 patches and got off to a flyer, with 20 patches committed during the first week. Somehow we ended up with 113 patches at the end of the first week, but let's please leave it at that. I'm tracking down patch authors who have not signed up for a review. I'll also be following up with reviewers to send in their first review soon. It looks like we could especially use a few reviewers for a number of (small) Windows-specific changes. If you have access to that, please sign up. I have picked four of these. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add %z support to elog/ereport?
On 2014-01-21 11:33:40 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-01-21 12:11:23 -0300, Alvaro Herrera wrote: How difficult would it be to have expand_fmt_string deal with positional modifiers? I don't think we need anything from it other than the %n$ notation, so perhaps it's not so problematic. I don't think there's much reason to go there. I didn't go for the pg-supplied sprintf() because I thought it'd be considered to invasive. Since that's apparently not the case... Yeah, that would make expand_fmt_string considerably more complicated and so presumably slower. We don't really need that when we can make what I expect is a pretty trivial addition to snprintf.c. Also, fixing snprintf.c will make it safe to use the z flag in contexts other than ereport/elog. So, here's a patch adding %z support to port/snprintf.c including a configure check to test whether we need to fall back. I am not too happy about the runtime check as the test isn't all that meaningful, but I couldn't think of anything better. The second patch is a slightly updated version of a previously sent version which is starting to use %z in some more places. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services From 5c91e08cb2c4b3cc8779ed0cd89387eb38ea3690 Mon Sep 17 00:00:00 2001 From: Andres Freund and...@anarazel.de Date: Thu, 23 Jan 2014 15:45:36 +0100 Subject: [PATCH 1/2] Add support for the %z modifier in error messages and other usages of printf. The %z modifier allows to print size_t/Size variables without platform dependent modifiers like UINT64_FORMAT and is included in C99 and posix. As it's not present in all supported platforms add a configure check which tests whether it's supported and actually working to some degree and fall back to our existing snprintf.c fallback which now supports the modifier if not. --- config/c-library.m4 | 38 ++ configure | 52 configure.in| 8 src/port/snprintf.c | 26 ++ 4 files changed, 124 insertions(+) diff --git a/config/c-library.m4 b/config/c-library.m4 index 1e3997b..171d39c 100644 --- a/config/c-library.m4 +++ b/config/c-library.m4 @@ -303,6 +303,44 @@ int main() AC_MSG_RESULT([$pgac_cv_printf_arg_control]) ])# PGAC_FUNC_PRINTF_ARG_CONTROL +# PGAC_FUNC_PRINTF_SIZE_T_SUPPORT +# --- +# Determine if printf supports the z length modifier for printing +# sizeof(size_t) sized variables. That's supported by C99 and posix but not +# all platforms play ball, so we test whether it's working. +# Useful for printing sizes in error messages et al. without up/downcasting +# arguments. +# +AC_DEFUN([PGAC_FUNC_PRINTF_SIZE_T_SUPPORT], +[AC_MSG_CHECKING([whether printf supports the %z modifier]) +AC_CACHE_VAL(pgac_cv_printf_size_t_support, +[AC_TRY_RUN([#include stdio.h +#include string.h + +int main() +{ + char buf64[100]; + char bufz[100]; + + /* + * Check whether we print correctly using %z by printing the biggest + * unsigned number fitting in a size_t and using both %zu and the format for + * 64bit numbers. 
+ */ + + snprintf(bufz, 100, %zu, ~(size_t)0); + snprintf(buf64, 100, UINT64_FORMAT, ~(size_t)0); + if (strcmp(bufz, buf64) != 0) +return 1; + return 0; +}], +[pgac_cv_printf_size_t_support=yes], +[pgac_cv_printf_size_t_support=no], +[pgac_cv_printf_size_t_support=cross]) +])dnl AC_CACHE_VAL +AC_MSG_RESULT([$pgac_cv_printf_size_t_support]) +])# PGAC_FUNC_PRINTF_SIZE_T_SUPPORT + # PGAC_TYPE_LOCALE_T # -- diff --git a/configure b/configure index e1ff704..d786b17 100755 --- a/configure +++ b/configure @@ -13036,6 +13036,58 @@ cat confdefs.h _ACEOF _ACEOF +# Also force our snprintf if the system's doesn't support the %z modifier. +if test $pgac_need_repl_snprintf = no; then + { $as_echo $as_me:${as_lineno-$LINENO}: checking whether printf supports the %z modifier 5 +$as_echo_n checking whether printf supports the %z modifier... 6; } +if ${pgac_cv_printf_size_t_support+:} false; then : + $as_echo_n (cached) 6 +else + if test $cross_compiling = yes; then : + pgac_cv_printf_size_t_support=cross +else + cat confdefs.h - _ACEOF conftest.$ac_ext +/* end confdefs.h. */ +#include stdio.h +#include string.h + +int main() +{ + char buf64[100]; + char bufz[100]; + + /* + * Check whether we print correctly using %z by printing the biggest + * unsigned number fitting in a size_t and using both %zu and the format for + * 64bit numbers. + */ + + snprintf(bufz, 100, %zu, ~(size_t)0); + snprintf(buf64, 100, UINT64_FORMAT, ~(size_t)0); + if (strcmp(bufz, buf64) != 0) +return 1; + return 0; +} +_ACEOF +if ac_fn_c_try_run $LINENO; then : + pgac_cv_printf_size_t_support=yes +else + pgac_cv_printf_size_t_support=no
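Once the fallback is in place, the payoff is being able to write call sites like the following without UINT64_FORMAT gymnastics (illustrative only; nbytes is a placeholder variable):

    size_t      nbytes = 8192 * 16;

    elog(DEBUG1, "requested shared memory segment of %zu bytes", nbytes);

    ereport(ERROR,
            (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
             errmsg("invalid memory alloc request size %zu", nbytes)));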
Re: [HACKERS] dynamic shared memory and locks
On Wed, Jan 22, 2014 at 12:42 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-01-22 12:40:34 -0500, Robert Haas wrote: On Wed, Jan 22, 2014 at 12:11 PM, Tom Lane t...@sss.pgh.pa.us wrote: Andres Freund and...@2ndquadrant.com writes: Shouldn't we introduce a typedef LWLock* LWLockid; or something to avoid breaking external code using lwlocks? +1, in fact there's probably no reason to touch most *internal* code using that type name either. I thought about this but figured it was too much of a misnomer to refer to a pointer as an ID. But, if we're sure we want to go that route, I can go revise the patch along those lines. I personally don't care either way for internal code as long as external code continues to work. There's the argument of making the commit more readable by having less noise and less divergence in the branches and there's your argument of that being less clear. OK, well then, if no one objects violently, I'll stick with my current approach of getting rid of all core mentions of LWLockId in favor of LWLock *, but also add typedef LWLock *LWLockId with a comment that this is to minimize breakage of third-party code. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
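Something along these lines, i.e. a one-line alias kept purely for source compatibility (sketch of the proposal, not the committed wording):

    /* kept so third-party modules that still say LWLockId keep compiling */
    typedef LWLock *LWLockId;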
[HACKERS] Passing direct args of ordered-set aggs to the transition function
Hi, Is there a particular reason why the direct arguments of ordered-set aggregates are not passed to the transition function too? It seems that evaluating of some ordered-set aggregates would be much cheaper if we did that. For example, dense_rank() would then just need to count the number of rows smaller than the hypothetical row, AFAICS. Another example (that we don't currently provide, but still) would be a histogram aggregate which receives an array of buckets as direct args and returns a similarly shaped array of counters. best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add %z support to elog/ereport?
On 2014-01-23 11:14:05 -0500, Tom Lane wrote: OK, I'll take a look. Thanks. I am not too happy about the runtime check as the test isn't all that meaningful, but I couldn't think of anything better. Yeah, it's problematic for cross-compiles, but no more so than configure's existing test for %n$ support. In practice, since both these features are required by C99, I think it wouldn't be such an issue for most people. Currently we automatically fall back to our implementation if we're cross compiling unless I am missing something, that's a bit odd, but it should work ;) I was wondering more about the nature of the runtime check than the fact that it's a runtime check at all... E.g. snprintf.c simply skips over unknown format characters and might not have been detected as faulty on 32bit platforms by that check. Which might be considered a good thing :) Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add %z support to elog/ereport?
Andres Freund and...@2ndquadrant.com writes: So, here's a patch adding %z support to port/snprintf.c including a configure check to test whether we need to fall back. OK, I'll take a look. I am not too happy about the runtime check as the test isn't all that meaningful, but I couldn't think of anything better. Yeah, it's problematic for cross-compiles, but no more so than configure's existing test for %n$ support. In practice, since both these features are required by C99, I think it wouldn't be such an issue for most people. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GIN improvements part2: fast scan
On 01/14/2014 05:35 PM, Alexander Korotkov wrote: Attached version is rebased against last version of packed posting lists.

Thanks! I think we're missing a trick with multi-key queries. We know that when multiple scan keys are used, they are ANDed together, so we can do the skip optimization even without the new tri-state consistent function.

To get started, I propose the three attached patches. These only implement the optimization for the multi-key case, which doesn't require any changes to the consistent functions and hence no catalog changes. Admittedly this isn't anywhere near as useful in practice as the single key case, but let's go for the low-hanging fruit first. This nevertheless introduces some machinery that will be needed by the full patch anyway.

I structured the code somewhat differently than your patch. There is no separate fast-path for the case where the optimization applies. Instead, I'm passing the advancePast variable all the way down to where the next batch of items are loaded from the posting tree. keyGetItem is now responsible for advancing the entry streams, and the logic in scanGetItem has been refactored so that it advances advancePast aggressively, as soon as one of the key streams lets us conclude that no items before a certain point can match. scanGetItem might yet need to be refactored when we get to the full preconsistent check stuff, but one step at a time.

The first patch is the most interesting one, and contains the scanGetItem changes. The second patch allows seeking to the right segment in a posting tree page, and the third allows starting the posting tree scan from root, when skipping items (instead of just following the right-links).

Here are some simple performance test results, demonstrating the effect of each of these patches. This is a best-case scenario. I don't think these patches have any adverse effects even in the worst-case scenario, although I haven't actually tried hard to measure that.

I used this to create a test table:

create table foo (intarr int[]);
-- Every row contains 0 (frequent term), and a unique number.
insert into foo select array[0,g] from generate_series(1, 1000) g;
-- Add another tuple with 0, 1 combo physically to the end of the table.
insert into foo values (array[0,1]);

The query I used is this:

postgres=# select count(*) from foo where intarr @> array[0] and intarr @> array[1];
 count
-------
     2
(1 row)

I measured the time that query takes, and the number of pages hit, using explain (analyze, buffers true):

patches          time (ms)   buffers
-------------------------------------
unpatched        650         1316
patch 1          0.52        1316
patches 1+2      0.50        1316
patches 1+2+3    0.13        15

So, the second patch isn't doing much in this particular case. But it's trivial, and I think it will make a difference in other queries where you have the opportunity to skip, but return a lot of tuples overall.

In summary, these are fairly small patches, and useful on their own, so I think these should be committed now. But please take a look and see if the logic in scanGetItem/keyGetItem looks correct to you. After this, I think the main fast scan logic will go into keyGetItem.

PS. I find it a bit surprising that in your patch, you're completely bailing out if there are any partial-match keys involved. Is there some fundamental reason for that, or just not implemented?

- Heikki

From 53e33c931c41f5ff8bb22ecfc011e717d2dbb9fd Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas heikki.linnakan...@iki.fi
Date: Thu, 23 Jan 2014 15:41:43 +0200
Subject: [PATCH 1/3] Optimize GIN multi-key queries.

In a multi-key search, ie. something like col @> 'foo' AND col @> 'bar', as soon as we find the next item that matches the first criteria, we don't need to check the second criteria for TIDs smaller than the first match. That saves a lot of effort, especially if the first term is rare, while the second occurs very frequently.

Based on ideas from Alexander Korotkov's fast scan patch.
---
 src/backend/access/gin/ginget.c | 465 ++--
 1 file changed, 255 insertions(+), 210 deletions(-)

diff --git a/src/backend/access/gin/ginget.c b/src/backend/access/gin/ginget.c
index 4bdbd45..4de7a10 100644
--- a/src/backend/access/gin/ginget.c
+++ b/src/backend/access/gin/ginget.c
@@ -68,29 +68,6 @@ callConsistentFn(GinState *ginstate, GinScanKey key)
 }
 
 /*
- * Tries to refind previously taken ItemPointer on a posting page.
- */
-static bool
-needToStepRight(Page page, ItemPointer item)
-{
-	if (GinPageGetOpaque(page)->flags & GIN_DELETED)
-		/* page was deleted by concurrent vacuum */
-		return true;
-
-	if (ginCompareItemPointers(item, GinDataPageGetRightBound(page)) > 0 &&
-		!GinPageRightMost(page))
-	{
-		/*
-		 * the item we're looking is > the right bound of the page, so it
-		 * can't be on this page.
-		 */
-		return true;
-	}
-
-	return false;
-}
-
-/*
  * Goes to the next
Re: [HACKERS] Passing direct args of ordered-set aggs to the transition function
Florian Pflug f...@phlo.org writes: Is there a particular reason why the direct arguments of ordered-set aggregates are not passed to the transition function too? Because they have to be evaluated only once. I did consider evaluating them once at the start and saving the values, but that's a bit problematic from a memory management standpoint. Still, if you have a good solution to that and the cycles to produce a patch, it's not too late to reconsider. I concur that it's not that hard to think of cases where it'd be useful. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
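For illustration, here is what a call with a direct argument looks like (an example of mine, not from the thread; the table name is made up). The 0.5 is the direct argument, evaluated only once per aggregation, while the ORDER BY expression is evaluated for every input row:

-- percentile_cont(0.5) takes 0.5 as a direct argument
SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY x) FROM measurements;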
Re: [HACKERS] Add %z support to elog/ereport?
Andres Freund and...@2ndquadrant.com writes: I was wondering more about the nature of the runtime check than the fact that it's a runtime check at all... E.g. snprintf.c simply skips over unknown format characters and might not have been detected as faulty on 32bit platforms by that check. Which might be considered a good thing :) Oh ... gotcha. Yeah, it's possible that snprintf would behave in a way that masks the fact that it doesn't really recognize the z flag, but that seems rather unlikely to me. More likely it would abandon processing the %-sequence on grounds it's malformed. I will try the patch on my old HPUX dinosaur, which I'm pretty sure does not know z, and verify this is the case. Also, I'm guessing Windows' version of snprintf doesn't have z either. Could someone try the patch's configure test program on Windows and see what the result is? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Passing direct args of ordered-set aggs to the transition function
On Jan23, 2014, at 17:20 , Tom Lane t...@sss.pgh.pa.us wrote: Florian Pflug f...@phlo.org writes: Is there a particular reason why the direct arguments of ordered-set aggregates are not passed to the transition function too? Because they have to be evaluated only once. I did consider evaluating them once at the start and saving the values, but that's a bit problematic from a memory management standpoint.

Yeah, that's what I had in mind. I'm not sure I understand the memory management problems you mention though - couldn't we just copy them to some appropriate memory context, say aggcontext? What I'm more concerned about is whether it'd still be possible to have ordered_set_transition and ordered_set_transition_multi work for all the ordered-set aggregates we currently have. But I have yet to wrap my head fully around the VARIADIC any ANY stuff that goes on there...

Still, if you have a good solution to that and the cycles to produce a patch, it's not too late to reconsider. I concur that it's not that hard to think of cases where it'd be useful.

I'll see what I can do.

best regards, Florian Pflug

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
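For what Florian sketches above, the copying could look roughly like this (purely illustrative; directargs/ndirectargs and the surrounding state are made-up names, while datumCopy() and MemoryContextSwitchTo() are the real primitives):

	MemoryContext oldcxt;
	int			i;

	oldcxt = MemoryContextSwitchTo(aggcontext);
	for (i = 0; i < ndirectargs; i++)
	{
		/* copy each once-evaluated direct argument so it survives per-row resets */
		if (!directnulls[i])
			saveddirectargs[i] = datumCopy(directargs[i],
										   directtypbyval[i],
										   directtyplen[i]);
	}
	MemoryContextSwitchTo(oldcxt);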
Re: [HACKERS] Changeset Extraction v7.0 (was logical changeset generation)
On Thu, Jan 23, 2014 at 7:05 AM, Andres Freund and...@2ndquadrant.com wrote: I don't think shared buffers fsyncs are the apt comparison. It's more something like UpdateControlFile(). Which PANICs. I really don't get why you fight PANICs in general that much. There are some nasty PANICs in postgres which can happen in legitimate situations, which should be made to fail more gracefully, but this surely isn't one of them. We're doing rename(), unlink() and rmdir(). That's it. We should concentrate on the ones that legitimately can happen, not the ones created by an admin running a chmod -R 000 . ; rm -rf $PGDATA or mount -o remount,ro /. We don't increase reliability by a bit adding codepaths that will never get tested. Sorry, I don't buy it. Lots of people I know have stories that go like this $HORRIBLE happened, and PostgreSQL kept on running, and it didn't even lose my data!, where $HORRIBLE may be variously that the disk filled up, that disk writes started failing with I/O errors, that somebody changed the permissions on the data directory inadvertently, that the entire data directory got removed, and so on. I've been through some of those scenarios myself, and the care and effort that's been put into failure modes has saved my bacon more than a few times, too. We *do* increase reliability by worrying about what will happen even in code paths that very rarely get exercised. It's certainly true that our bug count there is higher there than for the parts of our code that get exercised more regularly, but it's also lower than it would be if we didn't make the effort, and the dividend that we get from that effort is that we have a well-deserved reputation for reliability. I think it's completely unacceptable for the failure of routine filesystem operations to result in a PANIC. I grant you that we have some existing cases where that can happen (like UpdateControlFile), but that doesn't mean we should add more. Right this very minute there is massive bellyaching on a nearby thread caused by the fact that a full disk condition while writing WAL can PANIC the server, while on this thread at the very same time you're arguing that adding more ways for a full disk to cause PANICs won't inconvenience anyone. The other thread is right, and your argument here is wrong. We have been able to - and have taken the time to - fix comparable problems in other cases, and we should do the same thing here. As for why I fight PANICs so much in general, there are two reasons. First, I believe that to be project policy. I welcome correction if I have misinterpreted our stance in that area. Second, I have encountered a few situations where customers had production servers that repeatedly PANICked due to some bug or other. If I've ever encountered angrier customers, I can't remember when. A PANIC is no big deal when it happens on your development box, but when it happens on a machine with 100 users connected to it, it's a big deal, especially if a single underlying cause makes it happen over and over again. I think we should be devoting our time to figuring how to improve this, not whether to improve it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add %z support to elog/ereport?
On 2014-01-23 11:25:56 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: I was wondering more about the nature of the runtime check than the fact that it's a runtime check at all... E.g. snprintf.c simply skips over unknown format characters and might not have been detected as faulty on 32bit platforms by that check. Which might be considered a good thing :) Oh ... gotcha. Yeah, it's possible that snprintf would behave in a way that masks the fact that it doesn't really recognize the z flag, but that seems rather unlikely to me. More likely it would abandon processing the %-sequence on grounds it's malformed.

Yea, hopefully.

I will try the patch on my old HPUX dinosaur, which I'm pretty sure does not know z, and verify this is the case.

I don't know how, but I've introduced a typo in the version I sent if you haven't noticed yet, there's a missing in PGAC_FUNC_PRINTF_SIZE_T_SUPPORT. %zu instead of %zu

Also, I'm guessing Windows' version of snprintf doesn't have z either. Could someone try the patch's configure test program on Windows and see what the result is?

I've attached a version of that here, for $windowsperson's convenience. I hope I got the llp stuff right...

Greetings,

Andres Freund

-- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

/* confdefs.h */
#define UINT64_FORMAT "%llu"
/* end confdefs.h. */
#include <stdio.h>
#include <string.h>

int
main()
{
	char		buf64[100];
	char		bufz[100];

	/*
	 * Check whether we print correctly using %z by printing the biggest
	 * unsigned number fitting in a size_t and using both %zu and the format for
	 * 64bit numbers.
	 */
	snprintf(bufz, 100, "%zu", ~(size_t) 0);
	snprintf(buf64, 100, UINT64_FORMAT, ~(size_t) 0);

	if (strcmp(bufz, buf64) != 0)
		fprintf(stderr, "no can do %%z\n");
	else
		fprintf(stderr, "can do %%z\n");
}

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changeset Extraction v7.0 (was logical changeset generation)
On 2014-01-23 11:50:57 -0500, Robert Haas wrote: On Thu, Jan 23, 2014 at 7:05 AM, Andres Freund and...@2ndquadrant.com wrote: I don't think shared buffers fsyncs are the apt comparison. It's more something like UpdateControlFile(). Which PANICs. I really don't get why you fight PANICs in general that much. There are some nasty PANICs in postgres which can happen in legitimate situations, which should be made to fail more gracefully, but this surely isn't one of them. We're doing rename(), unlink() and rmdir(). That's it. We should concentrate on the ones that legitimately can happen, not the ones created by an admin running a chmod -R 000 . ; rm -rf $PGDATA or mount -o remount,ro /. We don't increase reliability by a bit adding codepaths that will never get tested. Sorry, I don't buy it. Lots of people I know have stories that go like this $HORRIBLE happened, and PostgreSQL kept on running, and it didn't even lose my data!, where $HORRIBLE may be variously that the disk filled up, that disk writes started failing with I/O errors, that somebody changed the permissions on the data directory inadvertently, that the entire data directory got removed, and so on. Especially the not loosing data imo is because postgres is conservative with continuing in situations it doesn't know anything about. Most prominently the cluster wide restart after a segfault. I've been through some of those scenarios myself, and the care and effort that's been put into failure modes has saved my bacon more than a few times, too. We *do* increase reliability by worrying about what will happen even in code paths that very rarely get exercised. A part of thinking about them *is* restricting what happens in those cases by keeping the possible states to worry about to a minimum. Just splapping on an ERROR instead of PANIC can make things much worse. Not releasing space until a restart, without a chance to do anything about it because we failed to properly release the in-memory slot will just make the problem bigger because now the cleanup might take a week (VACUUM FULLing the entire cluster?). I think it's completely unacceptable for the failure of routine filesystem operations to result in a PANIC. I grant you that we have some existing cases where that can happen (like UpdateControlFile), but that doesn't mean we should add more. Right this very minute there is massive bellyaching on a nearby thread caused by the fact that a full disk condition while writing WAL can PANIC the server, while on this thread at the very same time you're arguing that adding more ways for a full disk to cause PANICs won't inconvenience anyone. A full disk won't cause any of the problems for the case we're discussing, will it? We're just doing rename(), unlink(), rmdir() here, all should succeed while the FS is full (afair rename() does on all common FSs because inodes are kept separately). The other thread is right, and your argument here is wrong. We have been able to - and have taken the time to - fix comparable problems in other cases, and we should do the same thing here. I don't think the WAL case is comparable at all. ENOSPC is something expected that can happen during normal operation and doesn't include malintended operator and is reasonably easy to test. unlink() or fsync() randomly failing is not. In fact, isn't the consequence out of that thread that we need a significant amount of extra complexity to handle the case? We shouldn't spend that effort for cases that don't deserve it because they are too unlikely in practice. 
And yes, there's not too many other places PANICing - because most can rely on WAL handling those tricky cases for them... Second, I have encountered a few situations where customers had production servers that repeatedly PANICked due to some bug or other. If I've ever encountered angrier customers, I can't remember when. A PANIC is no big deal when it happens on your development box, but when it happens on a machine with 100 users connected to it, it's a big deal, especially if a single underlying cause makes it happen over and over again. Sure. But blindly continuing and then, possibly quite a bit later, losing data, causing an outage that takes a long while to recover or something isn't any better. I think we should be devoting our time to figuring how to improve this, not whether to improve it. If you'd argue that creating a new slot should fail gracefully, ok, I can relatively easily be convinced of that. But trying to handle failures in the midst of deletion in cases that won't happen in reality is just inviting trouble imo. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Case sensitive mode in windows build option
On 01/13/2014 10:49 PM, Dilip kumar wrote: As per current behavior, if a user wants to build in debug mode on Windows, then he needs to give debug in capital letters (DEBUG). I think many users will make mistakes giving this option; in my opinion we can make it case insensitive. I have attached a small patch for the same (just converted the comparison to case insensitive).

I have committed this. It's in the commitfest as a bug fix, but I don't think it's strictly a bug. OTOH, it's pretty harmless. Do we want to backpatch it?

cheers

andrew

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] The problems of PQhost()
On Wed, Jan 22, 2014 at 11:48 PM, Fujii Masao masao.fu...@gmail.com wrote: Hi, I reported in other thread that PQhost() has three problems. http://www.postgresql.org/message-id/cahgqgwe77akyabywde5+8bjldopthp7cnswao_syedjogyv...@mail.gmail.com (1) PQhost() can return Unix-domain socket directory path even in the platform that doesn't support Unix-domain socket. (2) In the platform that doesn't support Unix-domain socket, when neither host nor hostaddr are specified, the default host 'localhost' is used to connect to the server and PQhost() must return that, but it doesn't. (3) PQhost() cannot return the hostaddr. As the result of these problems, you can see the incorrect result of \conninfo, for example. $ psql -d hostaddr=127.0.0.1 =# \conninfo You are connected to database postgres as user postgres via socket in /tmp at port 5432. Obviously /tmp should be 127.0.0.1. Attached patch fixes these problems. We can fix the problem (3) by changing PQhost() so that it also returns the hostaddr. But this change might break the existing application using PQhost(). So, I added new libpq function PQhostaddr() which returns the hostaddr, and changed \conninfo so that it reports correct connection information by using both PQhost() and PQhostaddr(). These problems exist in v9.3 or before. Basically we should backport the patch into those versions. But obviously it's too late to add new libpq function PQhostaddr() into those versions. Then, I'm thinking to backport only the change for (1) and (2) because we can fix them without adding new libpq function. BTW, I think that the patch also fixes the problem of \conninfo that MauMau reported in other thread. http://www.postgresql.org/message-id/8913B51B3B534F878D54B89E1EB964C6@maumau Committed. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
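As a small illustration of what clients get out of this (my sketch, not part of the patch; PQhostaddr() is the new function described above, the rest is standard libpq):

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
	PGconn	   *conn = PQconnectdb("hostaddr=127.0.0.1 dbname=postgres");

	if (PQstatus(conn) == CONNECTION_OK)
		printf("host=%s hostaddr=%s port=%s\n",
			   PQhost(conn), PQhostaddr(conn), PQport(conn));
	PQfinish(conn);
	return 0;
}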
Re: [HACKERS] [bug fix] psql's \conninfo reports incorrect destination on Windows
On Wed, Dec 4, 2013 at 9:21 PM, MauMau maumau...@gmail.com wrote: Hello, I've found a bug that psql's \conninfo displays incorrect information on Windows. Please find attached the patch and commit this. [Problem] When I run psql postgres on Windows and execute \conninfo, it outputs the text below. It reports that psql connected to the server via UNIX domain socket, but the actual connection is of course via TCP/IP socket to localhost. You are connected to database postgres as user Administrator via socket in /tmp at port 5432. It should output: You are connected to database postgres as user Administrator on host localhost at port 5432. I think that 77035fa8a92d8c39f4c689e54f46813f203f09a8 fixed this problem. Please check that. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add %z support to elog/ereport?
Andres Freund and...@2ndquadrant.com writes: snprintf(buf64, 100, UINT64_FORMAT, ~(size_t)0); Actually, that coding isn't gonna work at all on platforms where size_t isn't the same size as uint64. We could make it work by explicitly casting the argument to whatever type we've decided to use as uint64 ... but unless we want to include c.h here, that would require a lot of extra cruft, and I'm really not sure it's gaining anything anyway. I'm inclined to just print (size_t)0x and see if it produces the expected result. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add %z support to elog/ereport?
On 2014-01-23 12:54:22 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: snprintf(buf64, 100, UINT64_FORMAT, ~(size_t)0); Actually, that coding isn't gonna work at all on platforms where size_t isn't the same size as uint64. We could make it work by explicitly casting the argument to whatever type we've decided to use as uint64 ... but unless we want to include c.h here, that would require a lot of extra cruft, and I'm really not sure it's gaining anything anyway. Hm, yea, it should be casted. I think we should have the type ready in PG_INT64_TYPE, confdefs.h should contain it at that point. I'm inclined to just print (size_t)0x and see if it produces the expected result. Well, the reasoning, weak as it may be, was that that we want to see whether we successfully recognize z as a 64bit modifier or not. And I couldn't think of a better way to test that both for 32 and 64 bit platforms... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
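Concretely, the cast being discussed would make the test look something like this (a sketch only, assuming configure has already substituted PG_INT64_TYPE and UINT64_FORMAT into confdefs.h at that point):

	/* compare %zu output against the known-good 64-bit format, with an explicit cast */
	snprintf(buf64, 100, UINT64_FORMAT, (unsigned PG_INT64_TYPE) ~(size_t) 0);
	snprintf(bufz, 100, "%zu", ~(size_t) 0);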
Re: [HACKERS] Re: [BUGS] BUG #7873: pg_restore --clean tries to drop tables that don't exist
Pavel Stehule escribió: I thought about it too. But I didn't continue - the reasons were named by Dean - and RemoveObjects is not difficult code - a lot of it is mechanical - and it is not on the critical path.

I have pushed it after some editorialization.

-- Álvaro Herrera    http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Closing commitfest 2013-11
Alvaro Herrera wrote: With apologies to our beloved commitfest-mace-wielding CFM, commitfest 2013-11 intentionally still contains a few open patches. I think that CF is largely being ignored by most people now that we have CF 2014-01 in progress. If we don't want to do anything about these patches in the immediate future, I propose we move them to CF 2014-01. * shared memory message queues I closed this one as committed. * Shave a few instructions from child-process startup sequence * Widening application of indices. I marked these as returned with feedback. * fault tolerant DROP IF EXISTS Committed this one and marked as such. * SSL: better default ciphersuite This one was moved to 2014-01 (not by me). So there is nothing remaining in 2013-11 and we can (continue to) focus exclusively on 2014-01. Yay! -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add %z support to elog/ereport?
Andres Freund and...@2ndquadrant.com writes: On 2014-01-23 12:54:22 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: snprintf(buf64, 100, UINT64_FORMAT, ~(size_t)0); Actually, that coding isn't gonna work at all on platforms where size_t isn't the same size as uint64. We could make it work by explicitly casting the argument to whatever type we've decided to use as uint64 ... but unless we want to include c.h here, that would require a lot of extra cruft, and I'm really not sure it's gaining anything anyway. Hm, yea, it should be casted. I think we should have the type ready in PG_INT64_TYPE, confdefs.h should contain it at that point. Ah, I'd forgotten that configure defined any such symbol. Yeah, that will work. Well, the reasoning, weak as it may be, was that that we want to see whether we successfully recognize z as a 64bit modifier or not. I'm dubious that this is really adding much, but whatever. I checked on my HPUX box and find that what it prints for %zu is zu, confirming my thought that it'd just abandon processing of the %-sequence. (Interesting that it doesn't eat the z while doing so, though.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Case sensitive mode in windows build option
Andrew Dunstan and...@dunslane.net writes: I have committed this. It's in the commitfest as a bug fix, but I don't think it's strictly a bug. OTOH, it's pretty harmless. Do we want to backpatch it? Given the lack of field complaints, I'd say it's not worth the trouble. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Warning in new GIN code
The rather ancient gcc on my HPUX box is complaining thusly about HEAD:

ginbtree.c: In function `ginPlaceToPage':
ginbtree.c:602: warning: control reaches end of non-void function

I would imagine this would happen on any compiler that doesn't recognize the hint about elog(ERROR) not returning. Could we stick a dummy return at the end?

 	}
 	else
+	{
 		elog(ERROR, "unknown return code from GIN placeToPage method: %d", rc);
+		return false;		/* keep compiler quiet */
+	}
 }

regards, tom lane

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add CREATE support to event triggers
Andres Freund escribió: * Why must we not schema qualify system types (c.f. expand_jsonval_typename)? It seems to be perfectly sensible to me to just use pg_catalog there. So, the reason for doing things this way is to handle cases like varchar(10) being turned into character varying; and that name requires that the typename NOT be schema-qualified, otherwise it fails. But thinking about this again, I don't see a reason why this can't be returned simply as pg_catalog.varchar(10); this should work fine on the receiving end as well, and give the same result. The other cases I'm worried about are types like bit(1) vs. unadorned bit vs. double-quoted bit, and char, etc. I'm not sure I'm dealing with them correctly right now. So even if by the above paragraph I could make the is_system thingy go away, I might still need it to cover this case. Thanks for the review, I will post an updated version later after fixing the other issues you mentioned plus adding support for more commands. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
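For what it's worth, the schema-qualified spelling does accept a typmod, so both forms resolve to the same type (a quick illustration of mine, not from the patch):

SELECT pg_typeof('abc'::varchar(10));            -- character varying
SELECT pg_typeof('abc'::pg_catalog.varchar(10)); -- character varying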
[HACKERS] Re: [COMMITTERS] pgsql: Compress GIN posting lists, for smaller index size.
On 01/22/2014 06:28 PM, Heikki Linnakangas wrote: Compress GIN posting lists, for smaller index size. GIN posting lists are now encoded using varbyte-encoding, which allows them to fit in much smaller space than the straight ItemPointer array format used before. The new encoding is used for both the lists stored in-line in entry tree items, and in posting tree leaf pages. To maintain backwards-compatibility and keep pg_upgrade working, the code can still read old-style pages and tuples. Posting tree leaf pages in the new format are flagged with GIN_COMPRESSED flag, to distinguish old and new format pages. Likewise, entry tree tuples in the new format have a GIN_ITUP_COMPRESSED flag set in a bit that was previously unused. This patch bumps GIN_CURRENT_VERSION from 1 to 2. New indexes created with version 9.4 will therefore have version number 2 in the metapage, while old pg_upgraded indexes will have version 1. The code treats them the same, but it might become handy in the future, if we want to drop support for the uncompressed format. Alexander Korotkov and me. Reviewed by Tomas Vondra and Amit Langote.

it seems that this commit made spoonbill an unhappy animal: http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=spoonbill&dt=2014-01-23%2000%3A00%3A04

Stefan

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [patch] Client-only installation on Windows
On 12/06/2013 09:16 AM, MauMau wrote: Hello, According to this page, http://www.postgresql.org/docs/current/static/install-procedure.html client-only installation is possible on UNIX/Linux like this: gmake -C src/bin install gmake -C src/include install gmake -C src/interfaces install gmake -C doc install With the attached patch, you can do client-only installation on Windows like this: install.bat install_dir client This installs: * client applications (both core and contrib) * DLLs for libpq and ECPG * header files * import libraries * pg_service.conf.sample and psqlrc.sample * symbol files (*.pdb) for the above modules If the second argument is given as client or omitted, all files are installed. With 9.4, the whole installation takes up about 80 MB, and the client-only installation takes up only 24 MB. Any comments would be appreciated This looks OK, and I'll commit it after I have a chance to give it a quick test (probably at the same time as I test the VS2013 patch).. Is there any reason why pgbench is listed in @client_program_files as well as @client_contribs? AFAICT it should only be in the latter. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [COMMITTERS] pgsql: Compress GIN posting lists, for smaller index size.
On 01/23/2014 09:18 PM, Stefan Kaltenbrunner wrote: On 01/22/2014 06:28 PM, Heikki Linnakangas wrote: Compress GIN posting lists, for smaller index size. GIN posting lists are now encoded using varbyte-encoding, which allows them to fit in much smaller space than the straight ItemPointer array format used before. The new encoding is used for both the lists stored in-line in entry tree items, and in posting tree leaf pages. To maintain backwards-compatibility and keep pg_upgrade working, the code can still read old-style pages and tuples. Posting tree leaf pages in the new format are flagged with GIN_COMPRESSED flag, to distinguish old and new format pages. Likewise, entry tree tuples in the new format have a GIN_ITUP_COMPRESSED flag set in a bit that was previously unused. This patch bumps GIN_CURRENT_VERSION from 1 to 2. New indexes created with version 9.4 will therefore have version number 2 in the metapage, while old pg_upgraded indexes will have version 1. The code treats them the same, but it might be come handy in the future, if we want to drop support for the uncompressed format. Alexander Korotkov and me. Reviewed by Tomas Vondra and Amit Langote. it seems that this commit made spoonbill an unhappy animal: http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=spoonbilldt=2014-01-23%2000%3A00%3A04 Hmm, all the Sparcs. Some kind of an alignment issue, perhaps? I will investigate.. - Heikki -- - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add %z support to elog/ereport?
I wrote: I checked on my HPUX box and find that what it prints for %zu is zu, confirming my thought that it'd just abandon processing of the %-sequence. (Interesting that it doesn't eat the z while doing so, though.) Further testing on that box shows that its ancient gcc (2.95.3) doesn't know z either, which means that the patch produces a boatload of compile warnings like this: mcxt.c: In function `MemoryContextAllocZero': mcxt.c:605: warning: unknown conversion type character `z' in format mcxt.c:605: warning: too many arguments for format While I am not really expecting this gcc to compile PG cleanly anymore, the idea that we might get many dozen such warnings on more-current compilers is scarier, as that might well interfere with people's ability to do development on, say, Windows. Could somebody check whether MSVC for instance complains about format strings using z? Or shall I just commit this and we'll see what the buildfarm says? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Warning in new GIN code
On 01/23/2014 08:41 PM, Tom Lane wrote: The rather ancient gcc on my HPUX box is complaining thusly about HEAD:

ginbtree.c: In function `ginPlaceToPage':
ginbtree.c:602: warning: control reaches end of non-void function

I would imagine this would happen on any compiler that doesn't recognize the hint about elog(ERROR) not returning. Could we stick a dummy return at the end?

 	}
 	else
+	{
 		elog(ERROR, "unknown return code from GIN placeToPage method: %d", rc);
+		return false;		/* keep compiler quiet */
+	}
 }

Fixed, thanks.

- Heikki

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add CREATE support to event triggers
Alvaro Herrera alvhe...@2ndquadrant.com writes: So, the reason for doing things this way is to handle cases like varchar(10) being turned into character varying; and that name requires that the typename NOT be schema-qualified, otherwise it fails. But thinking about this again, I don't see a reason why this can't be returned simply as pg_catalog.varchar(10); this should work fine on the receiving end as well, and give the same result. I think people would be unhappy if we changed the output of, say, pg_dump that way. But it's presumably not a problem for strings inside event triggers. Once upon a time, the typmods would have been an issue, but now that we support them in generic typename syntax I think we're good. The other cases I'm worried about are types like bit(1) vs. unadorned bit vs. double-quoted bit, and char, etc. I'm not sure I'm dealing with them correctly right now. So even if by the above paragraph I could make the is_system thingy go away, I might still need it to cover this case. Yeah, there are some weird cases there. I think you can make them go away if you always print the fully qualified type name, ie don't omit pg_catalog, and use pg_type.typname not any converted version (and don't forget to double-quote anything that might be a reserved word). But I've not looked closely at the code. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Why do we let autovacuum give up?
Hello, I have run into yet again another situation where there was an assumption that autovacuum was keeping up and it wasn't. It was caused by autovacuum quitting because another process requested a lock. In turn we received a ton of bloat on pg_attribute which caused all kinds of other issues (as can be expected). The more I run into it, the more it seems like autovacuum should behave like vacuum, in that it gets precedence when it is running. First come, first serve as they say. Thoughts? JD -- Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc For my dreams of your image that blossoms a rose in the deeps of my heart. - W.B. Yeats -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [COMMITTERS] pgsql: Compress GIN posting lists, for smaller index size.
Heikki Linnakangas hlinnakan...@vmware.com writes: On 01/23/2014 09:18 PM, Stefan Kaltenbrunner wrote: it seems that this commit made spoonbill an unhappy animal: http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=spoonbilldt=2014-01-23%2000%3A00%3A04 Hmm, all the Sparcs. Some kind of an alignment issue, perhaps? I will investigate.. My HPUX box, which is also picky about alignment, is unhappy as well. It's crashing here: ginPostingListDecode (plist=0xc39efac1, ndecoded=0x7b03bee0) at ginpostinglist.c:263 263 return ginPostingListDecodeAllSegments(plist, (gdb) bt #0 ginPostingListDecode (plist=0xc39efac1, ndecoded=0x7b03bee0) at ginpostinglist.c:263 #1 0x205308 in ginReadTuple (ginstate=0xc39efac1, attnum=48864, itup=0x7b03bee0, nitems=0x403ee9a4) at ginentrypage.c:170 #2 0x21074c in startScanEntry (ginstate=0x403ec3ac, entry=0x403ee970) at ginget.c:463 #3 0x21086c in startScan (scan=0xc39efac1) at ginget.c:493 #4 0x212c14 in gingetbitmap (fcinfo=0xc39efac1) at ginget.c:1531 #5 0x5ffc50 in FunctionCall2Coll (flinfo=0xc39efac1, collation=2063843040, arg1=2063843040, arg2=1077864868) at fmgr.c:1323 #6 0x24ee5c in index_getbitmap (scan=0x40163878, bitmap=0x403ee620) at indexam.c:649 #7 0x3b9430 in MultiExecBitmapIndexScan (node=0x40163768) at nodeBitmapIndexscan.c:89 #8 0x3a5a3c in MultiExecProcNode (node=0x40163768) at execProcnode.c:562 #9 0x3b8610 in BitmapHeapNext (node=0x401628f0) at nodeBitmapHeapscan.c:104 #10 0x3ae5b0 in ExecScan (node=0x401628f0, accessMtd=0x4001a2c2 DINFINITY+3802, recheckMtd=0x4001a2ca DINFINITY+3810) at execScan.c:82 #11 0x3b8e9c in ExecBitmapHeapScan (node=0xc39efac1) at nodeBitmapHeapscan.c:441 #12 0x3a56e0 in ExecProcNode (node=0x401628f0) at execProcnode.c:414 ... (gdb) p debug_query_string $1 = 0x4006d4a8 SELECT * FROM array_index_op_test WHERE i @ '{38,34,32,89}' ORDER BY seqno; The problem appears to be due to the misaligned plist pointer (0xc39efac1 here). regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why do we let autovacuum give up?
Joshua D. Drake j...@commandprompt.com writes: I have run into yet again another situation where there was an assumption that autovacuum was keeping up and it wasn't. It was caused by autovacuum quitting because another process requested a lock. In turn we received a ton of bloat on pg_attribute which caused all kinds of other issues (as can be expected). The more I run into it, the more it seems like autovacuum should behave like vacuum, in that it gets precedence when it is running. First come, first serve as they say. 1. Back when it worked like that, things were worse. 2. What have you got that is requesting exclusive lock on pg_attribute? That seems like a pretty unfriendly behavior in itself. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why do we let autovacuum give up?
On 01/23/2014 12:34 PM, Joshua D. Drake wrote: Hello, I have run into yet again another situation where there was an assumption that autovacuum was keeping up and it wasn't. It was caused by autovacuum quitting because another process requested a lock. In turn we received a ton of bloat on pg_attribute which caused all kinds of other issues (as can be expected). The more I run into it, the more it seems like autovacuum should behave like vacuum, in that it gets precedence when it is running. First come, first serve as they say. Thoughts? If we let autovacuum block user activity, a lot more people would turn it off. Now, if you were to argue that we should have some way to monitor the tables which autovac can never touch because of conflicts, I would agree with you. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why do we let autovacuum give up?
On Thu, Jan 23, 2014 at 12:53 PM, Josh Berkus j...@agliodbs.com wrote: On 01/23/2014 12:34 PM, Joshua D. Drake wrote: Hello, I have run into yet again another situation where there was an assumption that autovacuum was keeping up and it wasn't. It was caused by autovacuum quitting because another process requested a lock. In turn we received a ton of bloat on pg_attribute which caused all kinds of other issues (as can be expected). The more I run into it, the more it seems like autovacuum should behave like vacuum, in that it gets precedence when it is running. First come, first serve as they say. Thoughts? If we let autovacuum block user activity, a lot more people would turn it off. Now, if you were to argue that we should have some way to monitor the tables which autovac can never touch because of conflicts, I would agree with you. Agree completely. Easy ways to monitor this would be great. Once you know there's a problem, tweaking autovacuum settings is very hard and misunderstood, and explaining how to be effective at it is a dark art too. -Harold -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
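As a starting point, the kind of monitoring query people end up writing by hand today looks something like this (illustrative only; the columns come from pg_stat_all_tables):

SELECT relname, n_dead_tup, last_vacuum, last_autovacuum, autovacuum_count
FROM pg_stat_all_tables
WHERE relname = 'pg_attribute';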
Re: [HACKERS] Why do we let autovacuum give up?
On 24/01/14 09:49, Tom Lane wrote: 2. What have you got that is requesting exclusive lock on pg_attribute? That seems like a pretty unfriendly behavior in itself. regards, tom lane

I've seen this sort of problem where every db session was busily creating temporary tables. I never got to find out *why* they needed to make so many, but it seemed like a bad idea.

Regards

Mark

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changeset Extraction v7.1
Patch 0001:

+			 errmsg("could not find free replication slot"),

Suggest: "all replication slots are in use"

+	elog(ERROR, "cannot aquire a slot while another slot has been acquired");

Suggest: "this backend has already acquired a replication slot". Or demote it to Assert(). I'm not really sure why this needs to be checked in non-assert builds. I also wonder if we should use the terminology "attach" instead of "acquire"; that pairs more naturally with "release". Then the message, if we want more than an assert, might be "this backend is already attached to a replication slot".

+	if (slot == NULL)
+	{
+		LWLockRelease(ReplicationSlotCtlLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_OBJECT),
+				 errmsg("could not find replication slot \"%s\"", name)));
+	}

The error will release the LWLock anyway; I'd get rid of the manual LWLockRelease, and the braces. Similarly in ReplicationSlotDrop.

+	/* acquire spinlock so we can test and set ->active safely */
+	SpinLockAcquire(&slot->mutex);
+
+	if (slot->active)
+	{
+		SpinLockRelease(&slot->mutex);
+		LWLockRelease(ReplicationSlotCtlLock);
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_IN_USE),
+				 errmsg("slot \"%s\" already active", name)));
+	}
+
+	/* we definitely have the slot, no errors possible anymore */
+	slot->active = true;
+	MyReplicationSlot = slot;
+	SpinLockRelease(&slot->mutex);

This doesn't need the LWLockRelease either. It does need the SpinLockRelease, but as I think I noted previously, the right way to write this is probably:

SpinLockAcquire(&slot->mutex);
was_active = slot->active;
slot->active = true;
SpinLockRelease(&slot->mutex);
if (was_active)
	ereport().
MyReplicationSlot = slot.

ReplicationSlotsComputeRequiredXmin still acquires ProcArrayLock, and the comment "Provide interlock against concurrent recomputations" doesn't seem adequate to me. I guess the idea here is that we regard ProcArrayLock as protecting ReplicationSlotCtl->catalog_xmin and ReplicationSlotCtl->data_xmin, but if that's the idea then we only need to hold the lock during the time when we actually update those values, not the loop where we compute them. Also, if that's the design, maybe they should be part of PROC_HDR *ProcGlobal rather than here. It seems weird to have some of the values protected by ProcArrayLock live in a completely different data structure managed almost entirely by some other part of the system.

It's pretty evident that what's currently patch #4 (only peg the xmin horizon for catalog tables during logical decoding) needs to become patch #1, because it doesn't make any sense to apply this before we do that. I'm still not 100% confident in that approach, but maybe I'd better try to look at it RSN and get confident, because too much of the rest of what you've got here hangs on that to proceed without it. Or to put all that another way, if for any reason we decide that the separate catalog xmin stuff is not viable, the rest of this is going to need a lot of rework, so we'd better sort that now rather than later.

With respect to the synchronize-slots-to-disk stuff we're arguing about on the other thread, I think the basic design problem here is that you assume that you can change stuff in memory and then change stuff on disk, without either set of changes being atomic. What I think you need to do is making atomic actions on disk correspond to atomic state changes in memory. IOW, suppose that creating a slot consists of two steps: mkdir() + fsync(). Then I think what you need to do is - do the mkdir(). If it errors out, fine. If it succeeds, then mark the slot half-created. 
This is just an assignment so it can done immediately after you learn that mkdir() worked with no risk of an intervening failure. Then, try to fsync(). If it errors out, the slot will get left in the half-created state. If it works, then immediately mark the slot as fully created. Now, when the next guy comes along and looks at the slot, he can tell what he needs to do. Specifically, if the slot is half-created, and he wants to do anything other than remove it, he's got to fsync() it first, and if that errors out, so be it. The next access to the slot will merely find it still half-created and simply try the fsync() yet again. Alternatively, since nearly everything we're trying to do here is a two-step operation - do something and then fsync - maybe we have a more generic fsync-pending flag, and each slot operation checks that and retries the fsync() if it's set. But it might be situation dependent which thing we need to fsync, since there are multiple files involved. Broadly, what you're trying to accomplish here is to have something that is crash-safe but without relying on WAL, so that it can work on
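To make the two-step idea concrete, a rough sketch (entirely illustrative; the SLOT_HALF_CREATED/SLOT_CREATED state names are made up, and fsync_fname() here just stands in for whatever "make this directory durable" helper ends up being used):

	/* step 1: create the directory; if this fails there is nothing to clean up */
	if (mkdir(path, S_IRWXU) < 0)
		ereport(ERROR,
				(errcode_for_file_access(),
				 errmsg("could not create directory \"%s\": %m", path)));
	slot->state = SLOT_HALF_CREATED;	/* just an assignment, cannot fail */

	/* step 2: make it durable; on failure the slot simply stays half-created */
	fsync_fname(path, true);
	slot->state = SLOT_CREATED;			/* atomic in-memory state change */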
Re: [HACKERS] Re: [COMMITTERS] pgsql: Compress GIN posting lists, for smaller index size.
On 01/23/2014 10:37 PM, Tom Lane wrote: Heikki Linnakangas hlinnakan...@vmware.com writes: On 01/23/2014 09:18 PM, Stefan Kaltenbrunner wrote: it seems that this commit made spoonbill an unhappy animal: http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=spoonbilldt=2014-01-23%2000%3A00%3A04 Hmm, all the Sparcs. Some kind of an alignment issue, perhaps? I will investigate.. My HPUX box, which is also picky about alignment, is unhappy as well. It's crashing here: ginPostingListDecode (plist=0xc39efac1, ndecoded=0x7b03bee0) at ginpostinglist.c:263 263 return ginPostingListDecodeAllSegments(plist, (gdb) bt #0 ginPostingListDecode (plist=0xc39efac1, ndecoded=0x7b03bee0) at ginpostinglist.c:263 #1 0x205308 in ginReadTuple (ginstate=0xc39efac1, attnum=48864, itup=0x7b03bee0, nitems=0x403ee9a4) at ginentrypage.c:170 #2 0x21074c in startScanEntry (ginstate=0x403ec3ac, entry=0x403ee970) at ginget.c:463 #3 0x21086c in startScan (scan=0xc39efac1) at ginget.c:493 #4 0x212c14 in gingetbitmap (fcinfo=0xc39efac1) at ginget.c:1531 #5 0x5ffc50 in FunctionCall2Coll (flinfo=0xc39efac1, collation=2063843040, arg1=2063843040, arg2=1077864868) at fmgr.c:1323 #6 0x24ee5c in index_getbitmap (scan=0x40163878, bitmap=0x403ee620) at indexam.c:649 #7 0x3b9430 in MultiExecBitmapIndexScan (node=0x40163768) at nodeBitmapIndexscan.c:89 #8 0x3a5a3c in MultiExecProcNode (node=0x40163768) at execProcnode.c:562 #9 0x3b8610 in BitmapHeapNext (node=0x401628f0) at nodeBitmapHeapscan.c:104 #10 0x3ae5b0 in ExecScan (node=0x401628f0, accessMtd=0x4001a2c2 DINFINITY+3802, recheckMtd=0x4001a2ca DINFINITY+3810) at execScan.c:82 #11 0x3b8e9c in ExecBitmapHeapScan (node=0xc39efac1) at nodeBitmapHeapscan.c:441 #12 0x3a56e0 in ExecProcNode (node=0x401628f0) at execProcnode.c:414 ... (gdb) p debug_query_string $1 = 0x4006d4a8 SELECT * FROM array_index_op_test WHERE i @ '{38,34,32,89}' ORDER BY seqno; The problem appears to be due to the misaligned plist pointer (0xc39efac1 here). Ah, thanks! Looks like I removed a SHORTALIGN from ginFormTuple that was in fact very much necessary.. Fixed now, let's see if that pacifies the sparcs. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
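For reference, the idiom involved is just the usual alignment bump before reading ItemPointerData (my paraphrase, not the actual commit):

	ptr = (Pointer) SHORTALIGN(ptr);	/* ItemPointers need 2-byte alignment */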
Re: [HACKERS] [bug fix] pg_ctl always uses the same event source
On Thu, Jan 23, 2014 at 9:23 AM, Amit Kapila amit.kapil...@gmail.com wrote: On Thu, Jan 23, 2014 at 10:26 AM, Tom Lane t...@sss.pgh.pa.us wrote: Amit Kapila amit.kapil...@gmail.com writes: On Wed, Jan 22, 2014 at 9:03 PM, Tom Lane t...@sss.pgh.pa.us wrote: So? Anything which can know the value of a GUC parameter can certainly know the selected port number. 1. In case of registration of event source, either user has to pass the name or it uses hard coded default value, so if we use version number along with 'PostgreSQL', it can be consistent. I don't see any way pgevent.c can know port number to append it to default value, am I missing something here? I think what we might want to do is redefine the server's behavior as creating an event named after the concatenation of event_source and port number, or maybe even get rid of event_source entirely and just say it's PostgreSQL followed by the port number. To accomplish this behaviour, each time server starts and stops, we need to register and unregister event log using mechanism described at below link to ensure that there is no mismatch between what server uses and what OS knows. http://www.postgresql.org/docs/devel/static/event-log-registration.html Why wouldn't that be necessary with your approach, too? I mean, if there's a GUC that controls the event source name, then it can be changed between restarts, regardless of what you call it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
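For reference, the registration step described at that documentation link boils down to something like the following (sketch only, with a hypothetical event source name):

	regsvr32 /n /i:"PostgreSQL 5432" pgevent.dll
	regsvr32 /u /i:"PostgreSQL 5432" pgevent.dll    (to unregister)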
Re: [HACKERS] Why do we let autovacuum give up?
On Thu, Jan 23, 2014 at 4:03 PM, Mark Kirkwood mark.kirkw...@catalyst.net.nz wrote: On 24/01/14 09:49, Tom Lane wrote: 2. What have you got that is requesting exclusive lock on pg_attribute? That seems like a pretty unfriendly behavior in itself. regards, tom lane I've seen this sort of problem where every db session was busily creating temporary tables. I never got to the find *why* they needed to make so many, but it seemed like a bad idea. But... how does that result on a vacuum-incompatible lock request against pg_attribute? I see that it'll insert lots of rows into pg_attribute, and maybe later delete them, but none of that blocks vacuum. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance
On 1/20/14 9:46 AM, Mel Gorman wrote: They could potentially be used to evalate any IO scheduler changes. For example -- deadline scheduler with these parameters has X transactions/sec throughput with average latency of Y millieseconds and a maximum fsync latency of Z seconds. Evaluate how well the out-of-box behaviour compares against it with and without some set of patches. At the very least it would be useful for tracking historical kernel performance over time and bisecting any regressions that got introduced. Once we have a test I think many kernel developers (me at least) can run automated bisections once a test case exists. That's the long term goal. What we used to get out of pgbench were things like 60 second latencies when a checkpoint hit with GBs of dirty memory. That does happen in the real world, but that's not a realistic case you can tune for very well. In fact, tuning for it can easily degrade performance on more realistic workloads. The main complexity I don't have a clear view of yet is how much unavoidable storage level latency there is in all of the common deployment types. For example, I can take a server with a 256MB battery-backed write cache and set dirty_background_bytes to be smaller than that. So checkpoint spikes go away, right? No. Eventually you will see dirty_background_bytes of data going into an already full 256MB cache. And when that happens, the latency will be based on how long it takes to write the cached 256MB out to the disks. If you have a single disk or RAID-1 pair, that random I/O could easily happen at 5MB/s or less, and that makes for a 51 second cache clearing time. This is a lot better now than it used to be because fsync hasn't flushed the whole cache in many years now. (Only RHEL5 systems still in the field suffer much from that era of code) But you do need to look at the distribution of latency a bit because of how the cache impact things, you can't just consider min/max values. Take the BBWC out of the equation, and you'll see latency proportional to how long it takes to clear the disk's cache out. It's fun upgrading from a disk with 32MB of cache to 64MB only to watch worst case latency double. At least the kernel does the right thing now, using that cache when it can while forcing data out when fsync calls arrive. (That's another important kernel optimization we'll never be able to teach the database) -- Greg Smith greg.sm...@crunchydatasolutions.com Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
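To make that concrete, the knob being discussed is the vm.dirty_background_bytes sysctl; capping it below the controller cache size looks something like this (numbers purely illustrative):

	# start background writeback well before a 256MB BBWC can fill up
	sysctl -w vm.dirty_background_bytes=134217728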
Re: [HACKERS] Why do we let autovacuum give up?
On 01/23/2014 01:03 PM, Mark Kirkwood wrote: On 24/01/14 09:49, Tom Lane wrote: 2. What have you got that is requesting exclusive lock on pg_attribute? That seems like a pretty unfriendly behavior in itself. regards, tom lane I've seen this sort of problem where every db session was busily creating temporary tables. I never got to the find *why* they needed to make so many, but it seemed like a bad idea. Yep... that's the one. They are creating lots and lots of temp tables. JD -- Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc For my dreams of your image that blossoms a rose in the deeps of my heart. - W.B. Yeats -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why do we let autovacuum give up?
Dne 23.1.2014 22:04 Mark Kirkwood mark.kirkw...@catalyst.net.nz napsal(a): On 24/01/14 09:49, Tom Lane wrote: 2. What have you got that is requesting exclusive lock on pg_attribute? That seems like a pretty unfriendly behavior in itself. regards, tom lane I've seen this sort of problem where every db session was busily creating temporary tables. I never got to find out *why* they needed to make so many, but it seemed like a bad idea.

Our customer had the same problem with temp tables created by intensively used plpgsql functions. Under higher load, temp tables are a performance and stability killer. Vacuum of pg_attribute has very ugly impacts :(

Regards

Pavel

After the redesign - without temp tables - his applications work well. We need global temp tables.

Regards Mark

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why do we let autovacuum give up?
Mark Kirkwood mark.kirkw...@catalyst.net.nz writes: On 24/01/14 09:49, Tom Lane wrote: 2. What have you got that is requesting exclusive lock on pg_attribute? That seems like a pretty unfriendly behavior in itself. I've seen this sort of problem where every db session was busily creating temporary tables. I never got to the find *why* they needed to make so many, but it seemed like a bad idea. That shouldn't result in any table-level exclusive locks on system catalogs, though. [ thinks... ] It's possible that what you saw is not the kick-out-autovacuum-entirely behavior, but the behavior added in commit bbb6e559c, whereby vacuum (auto or regular) will skip over pages that it can't immediately get an exclusive buffer lock on. On a heavily used table, we might skip the same page repeatedly, so that dead tuples don't get cleaned for a long time. To add insult to injury, despite having done that, vacuum would reset the pgstats dead-tuple count to zero, thus postponing the next autovacuum. I think commit 115f41412 may have improved the situation, but I'd want to see some testing of this theory before I'd propose back-patching it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why do we let autovacuum give up?
On 24/01/14 10:09, Robert Haas wrote: On Thu, Jan 23, 2014 at 4:03 PM, Mark Kirkwood mark.kirkw...@catalyst.net.nz wrote: On 24/01/14 09:49, Tom Lane wrote: 2. What have you got that is requesting exclusive lock on pg_attribute? That seems like a pretty unfriendly behavior in itself. regards, tom lane I've seen this sort of problem where every db session was busily creating temporary tables. I never got to find out *why* they needed to make so many, but it seemed like a bad idea. But... how does that result in a vacuum-incompatible lock request against pg_attribute? I see that it'll insert lots of rows into pg_attribute, and maybe later delete them, but none of that blocks vacuum. That was my thought too - if I see it happening again here (was a year or so ago that I saw some serious pg_attribute bloat) I'll dig deeper. regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] pg_ctl always uses the same event source
From: Amit Kapila amit.kapil...@gmail.com How about the below message: "event source event_source_name is already registered." OK, I added several lines for this. Please check the attached patch. What I had in mind was to change it during initdb, we are already doing it for some other parameter (unix_socket_directories), please refer below code in initdb.c Yes, it seems we can do this. However, could you forgive me for leaving this untouched? I'm afraid postgresql.conf.sample's issue is causing unnecessary war among people here. That doesn't affect the point of this patch --- make pg_ctl use the event_source setting. Anyway, not all parameters in postgresql.conf can be used just by uncommenting them. That's another issue. I'll update the CommitFest entry soon. Regards MauMau pg_ctl_eventsrc_v5.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] pg_ctl always uses the same event source
On 1/23/14, 4:08 PM, Robert Haas wrote: Why wouldn't that be necessary with your approach, too? I mean, if there's a GUC that controls the event source name, then it can be changed between restarts, regardless of what you call it. I don't know if it's practical, but the logical conclusion here would be to use an identifier that you cannot change, such as the system identifier. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] pg_ctl always uses the same event source
Peter Eisentraut pete...@gmx.net writes: On 1/23/14, 4:08 PM, Robert Haas wrote: Why wouldn't that be necessary with your approach, too? I mean, if there's a GUC that controls the event source name, then it can be changed between restarts, regardless of what you call it. I don't know if it's practical, but the logical conclusion here would be to use an identifier that you cannot change, such as the system identifier. That particular ID would be a horrid choice, because we don't try very hard to ensure it's unique. In particular, a standby server on the same machine as the master (not an uncommon case, at least for testing purposes) would be a guaranteed fail with that approach. I'm still not clear on why we can't just use the port number. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] pg_ctl always uses the same event source
Tom Lane escribió: Peter Eisentraut pete...@gmx.net writes: On 1/23/14, 4:08 PM, Robert Haas wrote: Why wouldn't that be necessary with your approach, too? I mean, if there's a GUC that controls the event source name, then it can be changed between restarts, regardless of what you call it. I don't know if it's practical, but the logical conclusion here would be to use an identifier that you cannot change, such as the system identifier. That particular ID would be a horrid choice, because we don't try very hard to ensure it's unique. In particular, a standby server on the same machine as the master (not an uncommon case, at least for testing purposes) would be a guaranteed fail with that approach. I'm still not clear on why we can't just use the port number. I wonder if it would make sense to generate a unique name at some initial point in the history of the service (perhaps at initdb time, or at the first postmaster start) and store this name in a special, separate file in PGDATA. On subsequent starts we read the name from there and always use it consistently. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun
On Thu, Nov 14, 2013 at 9:23 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Ok, here's a new version of the patch to handle incomplete B-tree splits. I finally got around to taking a look at this. Unlike with the as-yet uncommitted Race condition in b-tree page deletion patch that Kevin looked at, which there is a dependency on here, I could not really consider your patch in isolation. I'm really reviewing both patches, but with a particular focus on this recent set of additions (btree-incomplete-splits-2.patch), and the issues addressed by it. However, since the distinction between this patch and the patch that it has a dependency on is fuzzy, expect me to be fuzzy in differentiating the two. This patch is only very loosely an atomic unit. This is not a criticism - I understand that it just turned out that way. The first thing I noticed about this patchset is that it completely expunges btree_xlog_startup(), btree_xlog_cleanup() and btree_safe_restartpoint(). The post-recovery cleanup that previously occurred to address both sets of problems (the problem addressed by this patch, incomplete page splits, and the problem addressed by the dependency patch, incomplete page deletions) now both occur opportunistically/lazily. Notably, this means that there are now exactly zero entries in the resource manager list with a 'restartpoint' callback. I guess we should consider removing it entirely for that reason. As you said in the commit message where gin_safe_restartpoint was similarly retired (commit 631118fe): The post-recovery cleanup mechanism was never totally reliable, as insertion to the parent could fail e.g because of running out of memory or disk space, leaving the tree in an inconsistent state. I'm of the general impression that post-recovery cleanup is questionable. I'm surprised that you didn't mention this commit previously. You just mentioned the original analogous work on GiST, but this certainly seems to be something you've been thinking about *systematically* eliminating for a while. So while post-recovery callbacks no longer exist for any rmgr-managed-resource, 100% of remaining startup and cleanup callbacks concern the simple management of memory of AM-specific recovery contexts (for GiST, GiN and SP-GiST). I have to wonder if there isn't a better abstraction than that, such as a generic recovery memory context, allowing you to retire all 3 callbacks. I mean, StartupXLOG() just calls those callbacks for each resource at exactly the same time anyway, just as it initializes resource managers in precisely the same manner earlier on. Plus if you look at what those AM-local memory management routines do, it all seems very simple. I think you should bump XLOG_PAGE_MAGIC. Perhaps you should increase the elevel here: + elog(DEBUG1, finishing incomplete split of %u/%u, +BufferGetBlockNumber(lbuf), BufferGetBlockNumber(rbuf)); Since this is a very rare occurrence that involves some subtle interactions, I can't think why you wouldn't want to LOG this for forensic purposes. Why did you remove the local linkage of _bt_walk_left(), given that it is called in exactly the same way as before? I guess that that is just a vestige of some earlier revision. I think I see some bugs in _bt_moveright(). If you examine _bt_finish_split() in detail, you'll see that it doesn't just drop the write buffer lock that the caller will have provided (per its comments) - it also drops the buffer pin. 
It calls _bt_insert_parent() last, which was previously only called last thing by _bt_insertonpg() (some of the time), and _bt_insertonpg() is indeed documented to drop both the lock and the pin. And if you look at _bt_moveright() again, you'll see that there is a tacit assumption that the buffer pin isn't dropped, or at least that opaque (the BTPageOpaque BT special page area pointer) continues to point to the same page held in the same buffer, even though the code doesn't set the opaque' pointer again and it may not point to a pinned buffer or even the appropriate buffer. Ditto page. So opaque may point to the wrong thing on subsequent iterations - you don't do anything with the return value of _bt_getbuf(), unlike all of the existing call sites. I believe you should store its return value, and get the held page and the special pointer on the page, and assign that to the opaque pointer before the next iteration (an iteration in which you very probably really do make progress not limited to completing a page split, the occurrence of which the _bt_moveright() loop gets stuck on). You know, do what is done in the non-split-handling case. It may not always be the same buffer as before, obviously. For a moment I thought that you might have accounted for that here: *** _bt_insert_parent(Relation rel, *** 1685,1696 * 05/27/97 */ ItemPointerSet((stack-bts_btentry.t_tid), bknum, P_HIKEY); - pbuf = _bt_getstackbuf(rel,
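To make the concern concrete, here is a rough sketch of the kind of re-fetch being described for the _bt_moveright() loop; the nbtree function names are real, but the local variables and exact call signatures are assumptions for illustration, not the actual patch:

  if (P_INCOMPLETE_SPLIT(opaque))
  {
      BlockNumber blkno = BufferGetBlockNumber(buf);

      /* finishing the split releases both the write lock and the pin */
      _bt_finish_split(rel, buf, stack);

      /* so re-pin, re-lock, and re-derive page and opaque before looping */
      buf = _bt_getbuf(rel, blkno, access);
      page = BufferGetPage(buf);
      opaque = (BTPageOpaque) PageGetSpecialPointer(page);
      continue;
  }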
Re: [HACKERS] Why do we let autovacuum give up?
On 24/01/14 10:16, Mark Kirkwood wrote: On 24/01/14 10:09, Robert Haas wrote: On Thu, Jan 23, 2014 at 4:03 PM, Mark Kirkwood mark.kirkw...@catalyst.net.nz wrote: On 24/01/14 09:49, Tom Lane wrote: 2. What have you got that is requesting exclusive lock on pg_attribute? That seems like a pretty unfriendly behavior in itself. regards, tom lane I've seen this sort of problem where every db session was busily creating temporary tables. I never got to the find *why* they needed to make so many, but it seemed like a bad idea. But... how does that result on a vacuum-incompatible lock request against pg_attribute? I see that it'll insert lots of rows into pg_attribute, and maybe later delete them, but none of that blocks vacuum. That was my thought too - if I see it happening again here (was a year or so ago that I saw some serious pg_attribute bloat) I'll dig deeper. Actually not much digging required. Running the attached script via pgbench (8 sessions) against a default configured postgres 8.4 sees pg_attribute get to 1G after about 15 minutes. BEGIN; DROP TABLE IF EXISTS tab0; CREATE TEMP TABLE tab0 ( id INTEGER PRIMARY KEY, val TEXT); INSERT INTO tab0 SELECT generate_series(1,1000),'xx'; DROP TABLE IF EXISTS tab1; CREATE TEMP TABLE tab1 ( id INTEGER PRIMARY KEY, val TEXT); INSERT INTO tab1 SELECT generate_series(1,1000),'xx'; DROP TABLE IF EXISTS tab2; CREATE TEMP TABLE tab2 ( id INTEGER PRIMARY KEY, val TEXT); INSERT INTO tab2 SELECT generate_series(1,1000),'xx'; DROP TABLE IF EXISTS tab3; CREATE TEMP TABLE tab3 ( id INTEGER PRIMARY KEY, val TEXT); INSERT INTO tab3 SELECT generate_series(1,1000),'xx'; DROP TABLE IF EXISTS tab4; CREATE TEMP TABLE tab4 ( id INTEGER PRIMARY KEY, val TEXT); INSERT INTO tab4 SELECT generate_series(1,1000),'xx'; DROP TABLE IF EXISTS tab5; CREATE TEMP TABLE tab5 ( id INTEGER PRIMARY KEY, val TEXT); INSERT INTO tab5 SELECT generate_series(1,1000),'xx'; DROP TABLE IF EXISTS tab6; CREATE TEMP TABLE tab6 ( id INTEGER PRIMARY KEY, val TEXT); INSERT INTO tab6 SELECT generate_series(1,1000),'xx'; DROP TABLE IF EXISTS tab7; CREATE TEMP TABLE tab7 ( id INTEGER PRIMARY KEY, val TEXT); INSERT INTO tab7 SELECT generate_series(1,1000),'xx'; DROP TABLE IF EXISTS tab8; CREATE TEMP TABLE tab8 ( id INTEGER PRIMARY KEY, val TEXT); INSERT INTO tab8 SELECT generate_series(1,1000),'xx'; DROP TABLE IF EXISTS tab9; CREATE TEMP TABLE tab9 ( id INTEGER PRIMARY KEY, val TEXT); INSERT INTO tab9 SELECT generate_series(1,1000),'xx'; DROP TABLE IF EXISTS tab10; CREATE TEMP TABLE tab10 ( id INTEGER PRIMARY KEY, val TEXT); INSERT INTO tab10 SELECT generate_series(1,1000),'xx'; DROP TABLE IF EXISTS tab11; CREATE TEMP TABLE tab11 ( id INTEGER PRIMARY KEY, val TEXT); INSERT INTO tab11 SELECT generate_series(1,1000),'xx'; DROP TABLE IF EXISTS tab12; CREATE TEMP TABLE tab12 ( id INTEGER PRIMARY KEY, val TEXT); INSERT INTO tab12 SELECT generate_series(1,1000),'xx'; DROP TABLE IF EXISTS tab13; CREATE TEMP TABLE tab13 ( id INTEGER PRIMARY KEY, val TEXT); INSERT INTO tab13 SELECT generate_series(1,1000),'xx'; DROP TABLE IF EXISTS tab14; CREATE TEMP TABLE tab14 ( id INTEGER PRIMARY KEY, val TEXT); INSERT INTO tab14 SELECT generate_series(1,1000),'xx'; DROP TABLE IF EXISTS tab15; CREATE TEMP TABLE tab15 ( id INTEGER PRIMARY KEY, val TEXT); INSERT INTO tab15 SELECT generate_series(1,1000),'xx'; DROP TABLE IF EXISTS tab16; CREATE TEMP TABLE tab16 ( id INTEGER PRIMARY KEY, val TEXT); INSERT INTO tab16 SELECT generate_series(1,1000),'xx'; DROP TABLE IF EXISTS tab17; CREATE TEMP TABLE tab17 ( id INTEGER PRIMARY 
KEY, val TEXT); INSERT INTO tab17 SELECT generate_series(1,1000),'xx'; DROP TABLE IF EXISTS tab18; CREATE TEMP TABLE tab18 ( id INTEGER PRIMARY KEY, val TEXT); INSERT INTO tab18 SELECT generate_series(1,1000),'xx'; DROP TABLE IF EXISTS tab19; CREATE TEMP TABLE tab19 ( id INTEGER PRIMARY KEY, val TEXT); INSERT INTO tab19 SELECT generate_series(1,1000),'xx'; DROP TABLE IF EXISTS tab20; CREATE TEMP TABLE tab20 ( id INTEGER PRIMARY KEY, val TEXT); INSERT INTO tab20 SELECT generate_series(1,1000),'xx'; DROP TABLE IF EXISTS tab21; CREATE TEMP TABLE tab21 ( id INTEGER PRIMARY KEY, val TEXT); INSERT INTO tab21 SELECT generate_series(1,1000),'xx'; DROP TABLE IF EXISTS tab22; CREATE TEMP TABLE tab22 ( id INTEGER PRIMARY KEY, val TEXT); INSERT INTO tab22 SELECT generate_series(1,1000),'xx'; DROP TABLE IF EXISTS tab23; CREATE TEMP TABLE tab23 ( id INTEGER PRIMARY KEY, val TEXT); INSERT INTO tab23 SELECT generate_series(1,1000),'xx'; DROP TABLE IF EXISTS tab24; CREATE TEMP TABLE tab24 ( id INTEGER PRIMARY KEY, val TEXT); INSERT INTO tab24 SELECT generate_series(1,1000),'xx'; DROP TABLE IF EXISTS tab25; CREATE TEMP TABLE tab25 ( id INTEGER PRIMARY KEY, val TEXT); INSERT INTO tab25 SELECT
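For anyone wanting to reproduce this, the script above is the kind of thing one feeds to pgbench as a custom file; an invocation along these lines matches the 8 sessions and roughly 15 minute window mentioned (the file name and database are placeholders):

  pgbench -n -c 8 -T 900 -f temp_tables.sql postgres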
Re: [HACKERS] Changeset Extraction v7.1
I wonder if it wouldn't be better to get rid of the subdirectories for the individual slots, and just have a file pg_replslot/$SLOTNAME, or not. I know there are later patches that need subdirectories for their own private data, but they could just create pg_replslot/$SLOTNAME.dir and put whatever in it they like, without really implicating this code that much. The advantage of that is that there would be fewer intermediate states. The slot exists if the file exists, and not if it doesn't. You still need half-alive and half-dead until the fsync finishes, but you don't need to worry about tracking both the state of the directory and the state of the file. Why do we need directories at all? I know there might be subsidiary files to store stuff in separate files, but maybe we can just name files using the slot name (or a transformation thereof) as a prefix. It shouldn't be difficult to remove the right files whenever there's a need, and not having to worry about a directory that might need a separate fsync might make things easier. On the other hand, there might still be a need to fsync the parent directory, so maybe there is not that much gain. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
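The parent-directory point is the usual create/rename/fsync dance; a minimal sketch in plain C of what durably creating a slot file involves, independent of the actual patch (error handling omitted, names and payload are placeholders):

  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  void create_slot_file_durably(void)
  {
      int fd = open("pg_replslot/myslot.tmp", O_WRONLY | O_CREAT | O_EXCL, 0600);
      write(fd, "slot state", 10);               /* persist the slot's state */
      fsync(fd);                                 /* flush the file contents */
      close(fd);
      rename("pg_replslot/myslot.tmp", "pg_replslot/myslot");

      int dirfd = open("pg_replslot", O_RDONLY); /* the rename is only durable ... */
      fsync(dirfd);                              /* ... once the directory itself is fsynced */
      close(dirfd);
  }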
Re: [HACKERS] [bug fix] pg_ctl always uses the same event source
Alvaro Herrera alvhe...@2ndquadrant.com writes: Tom Lane escribió: That particular ID would be a horrid choice, because we don't try very hard to ensure it's unique. In particular, a standby server on the same machine as the master (not an uncommon case, at least for testing purposes) would be a guaranteed fail with that approach. I wonder if it would make sense to generate a unique name at some initial point in the history of the service (perhaps at initdb time, or at the first postmaster start) and store this name in a special, separate file in PGDATA. On subsequent starts we read the name from there and always use it consistently. Meh --- that would have the same behavior as the system identifier, ie it would propagate to slave servers without extraordinary (and easily bypassed) measures to prevent it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [COMMITTERS] pgsql: Compress GIN posting lists, for smaller index size.
Heikki Linnakangas hlinnakan...@vmware.com writes: On 01/23/2014 10:37 PM, Tom Lane wrote: The problem appears to be due to the misaligned plist pointer (0xc39efac1 here). Ah, thanks! Looks like I removed a SHORTALIGN from ginFormTuple that was in fact very much necessary.. Fixed now, let's see if that pacifies the sparcs. My HPPA box is happy again, anyway. Thanks. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why do we let autovacuum give up?
On Thu, Jan 23, 2014 at 10:00 PM, Harold Giménez har...@heroku.com wrote: On Thu, Jan 23, 2014 at 12:53 PM, Josh Berkus j...@agliodbs.com wrote: On 01/23/2014 12:34 PM, Joshua D. Drake wrote: Hello, I have run into yet again another situation where there was an assumption that autovacuum was keeping up and it wasn't. It was caused by autovacuum quitting because another process requested a lock. In turn we received a ton of bloat on pg_attribute which caused all kinds of other issues (as can be expected). The more I run into it, the more it seems like autovacuum should behave like vacuum, in that it gets precedence when it is running. First come, first serve as they say. Thoughts? If we let autovacuum block user activity, a lot more people would turn it off. Now, if you were to argue that we should have some way to monitor the tables which autovac can never touch because of conflicts, I would agree with you. Agree completely. Easy ways to monitor this would be great. Once you know there's a problem, tweaking autovacuum settings is very hard and misunderstood, and explaining how to be effective at it is a dark art too. FWIW, I have a patch around somewhere that I never cleaned up properly for submissions that simply added a counter to pg_stat_user_tables indicating how many times vacuum had aborted on that specific table. If that's enough info (it was for my case) to cover this case, I can try to dig it out again and clean it up... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
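Until a counter like that exists, about the best one can do today is watch per-table dead-tuple counts and autovacuum timestamps; a rough monitoring query along these lines (the LIMIT is arbitrary):

  SELECT relname, n_dead_tup, n_live_tup, last_autovacuum, last_vacuum
  FROM pg_stat_all_tables
  ORDER BY n_dead_tup DESC
  LIMIT 10;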
Re: [HACKERS] Add %z support to elog/ereport?
I wrote: the idea that we might get many dozen such warnings on more-current compilers is scarier, as that might well interfere with people's ability to do development on, say, Windows. Could somebody check whether MSVC for instance complains about format strings using z? Or shall I just commit this and we'll see what the buildfarm says? Given the lack of response, I've pushed the patch, and will canvass the buildfarm results later. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] pg_ctl always uses the same event source
From: Tom Lane t...@sss.pgh.pa.us I'm still not clear on why we can't just use the port number. It will be possible to use the port to set the default value of the event_source GUC when starting postmaster. But using the port during event source registration will involve much more. To use the port, we have to tell the location of $PGDATA to regsvr32.exe. However, regsvr32.exe can only take an argument through /i, and we are already using /i for the event source name. If we want to pass the data directory, we have to change the usage. Instead, we could probably have regsvr32.exe check the PGDATA environment variable and invoke postgres -C event_source, but that would require much more complicated code (e.g. for locating postgres.exe, because regsvr32.exe is in the Windows directory). Anyway, the point of my patch is just to make pg_ctl use the event_source GUC when writing to the event log. I want to rely on postgres -C, because pg_ctl already uses it for retrieving the data_directory GUC. I'd like to avoid further complication in code and discussion. If you request, I can revert the default value of event_source and regsvr32.exe /i to PostgreSQL. I'm okay with that, because syslog_ident also has the default value postgres, which doesn't contain the major release. Any (not complicated) suggestions? Regards MauMau -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] pg_ctl always uses the same event source
MauMau maumau...@gmail.com writes: From: Tom Lane t...@sss.pgh.pa.us I'm still not clear on why we can't just use the port number. To use port, we have to tell the location of $PGDATA to regsvr32.exe. [ scratches head... ] Exactly which of the other proposals *doesn't* require that? Certainly anything that involves parsing settings out of postgresql.conf will. A more significant problem, probably, is that even knowing $PGDATA doesn't tell you the port number with certainty, since the postmaster might end up taking the port number from some other source than postgresql.conf (command line, PGPORT environment, ...). We used to require that pg_ctl know the port accurately, and it was a big step forward in reliability when we got rid of that; so maybe going back to that is not such a good idea. I note though that pg_ctl does still need to know accurately where $PGDATA is. Is there any particular length limit on event source names? Maybe the full path to $PGDATA could be used instead of the port number. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why do we let autovacuum give up?
On 01/23/2014 02:17 PM, Magnus Hagander wrote: FWIW, I have a patch around somewhere that I never cleaned up properly for submissions that simply added a counter to pg_stat_user_tables indicating how many times vacuum had aborted on that specific table. If that's enough info (it was for my case) to cover this case, I can try to dig it out again and clean it up... It would be 100% more information than we currently have. How much more difficult would it be to count completed autovacuums as well? It's really the ratio of the two which matters ... -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why do we let autovacuum give up?
On 01/23/2014 02:55 PM, Josh Berkus wrote: On 01/23/2014 02:17 PM, Magnus Hagander wrote: FWIW, I have a patch around somewhere that I never cleaned up properly for submissions that simply added a counter to pg_stat_user_tables indicating how many times vacuum had aborted on that specific table. If that's enough info (it was for my case) to cover this case, I can try to dig it out again and clean it up... It would be 100% more information than we currently have. How much more difficult would it be to count completed autovacuums as well? It's really the ratio of the two which matters ... Actually, now that I think about it, the ratio of the two doesn't matter as much as whether the most recent autovacuum aborted or not. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why do we let autovacuum give up?
On Thu, Jan 23, 2014 at 1:41 PM, Mark Kirkwood mark.kirkw...@catalyst.net.nz wrote: On 24/01/14 10:16, Mark Kirkwood wrote: On 24/01/14 10:09, Robert Haas wrote: On Thu, Jan 23, 2014 at 4:03 PM, Mark Kirkwood mark.kirkw...@catalyst.net.nz wrote: On 24/01/14 09:49, Tom Lane wrote: 2. What have you got that is requesting exclusive lock on pg_attribute? That seems like a pretty unfriendly behavior in itself. regards, tom lane I've seen this sort of problem where every db session was busily creating temporary tables. I never got to the find *why* they needed to make so many, but it seemed like a bad idea. But... how does that result on a vacuum-incompatible lock request against pg_attribute? I see that it'll insert lots of rows into pg_attribute, and maybe later delete them, but none of that blocks vacuum. That was my thought too - if I see it happening again here (was a year or so ago that I saw some serious pg_attribute bloat) I'll dig deeper. Actually not much digging required. Running the attached script via pgbench (8 sessions) against a default configured postgres 8.4 sees pg_attribute get to 1G after about 15 minutes. At that rate, with default throttling, it will be a close race whether autovac can vacuum pages as fast as they are being added. Even if it never gets cancelled, it might not ever finish. Cheers, Jeff
Re: [HACKERS] Why do we let autovacuum give up?
On 24/01/14 12:13, Jeff Janes wrote: On Thu, Jan 23, 2014 at 1:41 PM, Mark Kirkwood mark.kirkw...@catalyst.net.nz wrote: On 24/01/14 10:16, Mark Kirkwood wrote: On 24/01/14 10:09, Robert Haas wrote: On Thu, Jan 23, 2014 at 4:03 PM, Mark Kirkwood mark.kirkw...@catalyst.net.nz wrote: On 24/01/14 09:49, Tom Lane wrote: 2. What have you got that is requesting exclusive lock on pg_attribute? That seems like a pretty unfriendly behavior in itself. regards, tom lane I've seen this sort of problem where every db session was busily creating temporary tables. I never got to the find *why* they needed to make so many, but it seemed like a bad idea. But... how does that result on a vacuum-incompatible lock request against pg_attribute? I see that it'll insert lots of rows into pg_attribute, and maybe later delete them, but none of that blocks vacuum. That was my thought too - if I see it happening again here (was a year or so ago that I saw some serious pg_attribute bloat) I'll dig deeper. Actually not much digging required. Running the attached script via pgbench (8 sessions) against a default configured postgres 8.4 sees pg_attribute get to 1G after about 15 minutes. At that rate, with default throttling, it will be a close race whether autovac can vacuum pages as fast as they are being added. Even if it never gets cancelled, it might not ever finish. Yes - I should have set the cost delay to 0 first (checking that now). -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changeset Extraction v7.1
Hi, On 2014-01-23 16:04:10 -0500, Robert Haas wrote: Patch 0001: +errmsg(could not find free replication slot), Suggest: all replication slots are in use That sounds better indeed. + elog(ERROR, cannot aquire a slot while another slot has been acquired); Suggest: this backend has already acquired a replication slot Or demote it to Assert(). I'm not really sure why this needs to be checked in non-assert builds. Hm. Fine with me, not sure why I went with an elog(). Maybe because I thought output plugin authors could have the idea of using another slot while inside one? I also wonder if we should use the terminology attach instead of acquire; that pairs more naturally with release. Then the message, if we want more than an assert, might be this backend is already attached to a replication slot. I went with Acquire/Release because our locking code does so, and it seemed sensible to be consistent. I don't have strong feelings about it. + if (slot == NULL) + { + LWLockRelease(ReplicationSlotCtlLock); + ereport(ERROR, + (errcode(ERRCODE_UNDEFINED_OBJECT), +errmsg(could not find replication slot \%s\, name))); + } The error will release the LWLock anyway; I'd get rid of the manual LWLockRelease, and the braces. Similarly in ReplicationSlotDrop. Unfortunately not. Inside the walsender there's currently no LWLockReleaseAll() for ERRORs since commands aren't run inside a transaction command... But maybe I should have fixed this by adding the release to WalSndErrorCleanup() instead? That'd still leave the problematic case that currently we try to delete a replication slot inside a CATCH when we fail while initializing the rest of logical replication... But I guess adding it would be a good idea independent of that. We could also do a StartTransactionCommand() but I'd rather not, that currently prevents code in that vicinity from doing anything it shouldn't via various Assert()s in existing code. + /* acquire spinlock so we can test and set -active safely */ + SpinLockAcquire(slot-mutex); + + if (slot-active) + { + SpinLockRelease(slot-mutex); + LWLockRelease(ReplicationSlotCtlLock); + ereport(ERROR, + (errcode(ERRCODE_OBJECT_IN_USE), +errmsg(slot \%s\ already active, name))); + } + + /* we definitely have the slot, no errors possible anymore */ + slot-active = true; + MyReplicationSlot = slot; + SpinLockRelease(slot-mutex); This doesn't need the LWLockRelease either. It does need the SpinLockRelease, but as I think I noted previously, the right way to write this is probably: SpinLockAcquire(slot-mutex); was_active = slot-active; slot-active = true; SpinLockRelease(slot-mutex); if (was_active) ereport(). MyReplicatonSlot = slot. That's not really simpler tho? But if you prefer I can go that way. ReplicationSlotsComputeRequiredXmin still acquires ProcArrayLock, and the comment Provide interlock against concurrent recomputations doesn't seem adequate to me. I guess the idea here is that we regard ProcArrayLock as protecting ReplicationSlotCtl-catalog_xmin and ReplicationSlotCtl-data_xmin, but if that's the idea then we only need to hold the lock during the time when we actually update those values, not the loop where we compute them. There's a comment someplace else to that end, but yes, that's essentially the idea. I decided to take it during the whole recomputation because we also take ProcArrayLock when creating a new decoding slot and initially setting -catalog_xmin. That's not strictly required but seemed simpler that way, and the code shouldn't be very hot. 
The code that initially computes the starting value for catalog_xmin when creating a new decoding slot has to take ProcArrayLock to be safe, that's why I though it'd be convenient to always use it for those values. In all other cases where we modify *_xmin we're only increasing it which doesn't need a lock (HS feedback never has taken one, and GetSnapshotData() modifies -xmin while holding a shared lock), the only potential danger is a slight delay in increasing the overall value. Also, if that's the design, maybe they should be part of PROC_HDR *ProcGlobal rather than here. It seems weird to have some of the values protected by ProcArrayLock live in a completely different data structure managed almost entirely by some other part of the system. Don't we already have cases of that? I seem to remember so. If you prefer having them there, I am certainly fine with doing that. This way they aren't allocated if slots are disabled but it's just two TransactionIds. It's pretty evident that what's currently patch #4 (only peg the xmin horizon for catalog tables during
Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)
On 2014-01-23 13:56:49 +0100, Simon Riggs wrote: IMHO we need to resolve the deadlock inherent in the disk-full/WALlock-up/checkpoint situation. My view is that can be solved in a similar way to the way the buffer pin deadlock was resolved for Hot Standby. I don't think that approach works here. We're not talking about mere buffer pins but the big bad exclusively locked buffer which is held by a backend in a critical section. Killing such a backend costs you a PANIC. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add %z support to elog/ereport?
On 2014-01-23 17:21:11 -0500, Tom Lane wrote: I wrote: the idea that we might get many dozen such warnings on more-current compilers is scarier, as that might well interfere with people's ability to do development on, say, Windows. Could somebody check whether MSVC for instance complains about format strings using z? Or shall I just commit this and we'll see what the buildfarm says? Given the lack of response, I've pushed the patch, and will canvass the buildfarm results later. Thanks, I was afk, otherwise I'd have tried pushing it to Windows via jenkins if that machine was running (it's running in Craig's flat...)… Do we have a real policy against indenting nested preprocessor statements? Just so I don't do those in future patches... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why do we let autovacuum give up?
On 24/01/14 12:28, Mark Kirkwood wrote: On 24/01/14 12:13, Jeff Janes wrote: On Thu, Jan 23, 2014 at 1:41 PM, Mark Kirkwood mark.kirkw...@catalyst.net.nz wrote: On 24/01/14 10:16, Mark Kirkwood wrote: On 24/01/14 10:09, Robert Haas wrote: On Thu, Jan 23, 2014 at 4:03 PM, Mark Kirkwood mark.kirkw...@catalyst.net.nz wrote: On 24/01/14 09:49, Tom Lane wrote: 2. What have you got that is requesting exclusive lock on pg_attribute? That seems like a pretty unfriendly behavior in itself. regards, tom lane I've seen this sort of problem where every db session was busily creating temporary tables. I never got to the find *why* they needed to make so many, but it seemed like a bad idea. But... how does that result on a vacuum-incompatible lock request against pg_attribute? I see that it'll insert lots of rows into pg_attribute, and maybe later delete them, but none of that blocks vacuum. That was my thought too - if I see it happening again here (was a year or so ago that I saw some serious pg_attribute bloat) I'll dig deeper. Actually not much digging required. Running the attached script via pgbench (8 sessions) against a default configured postgres 8.4 sees pg_attribute get to 1G after about 15 minutes. At that rate, with default throttling, it will be a close race whether autovac can vacuum pages as fast as they are being added. Even if it never gets cancelled, it might not ever finish. Yes - I should have set the cost delay to 0 first (checking that now). Doing that (and a few other autovac tweaks): autovacuum_max_workers = 4 autovacuum_naptime = 10s autovacuum_vacuum_scale_factor = 0.1 autovacuum_analyze_scale_factor = 0.1 autovacuum_vacuum_cost_delay = 0ms Stops excessive bloat - clearly autovacuum *is* able to vacuum pg_attribute in this case. Back to drawing board for a test case. Regards Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why do we let autovacuum give up?
On 2014-01-24 12:49:57 +1300, Mark Kirkwood wrote: autovacuum_max_workers = 4 autovacuum_naptime = 10s autovacuum_vacuum_scale_factor = 0.1 autovacuum_analyze_scale_factor = 0.1 autovacuum_vacuum_cost_delay = 0ms Stops excessive bloat - clearly autovacuum *is* able to vacuum pg_attribute in this case. Back to drawing board for a test case. Well, I think quite many people don't realize it might be necessary to tune autovac on busy workloads. As it very well might be the case in Josh's case. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why do we let autovacuum give up?
On 2014-01-23 16:15:50 -0500, Tom Lane wrote: [ thinks... ] It's possible that what you saw is not the kick-out-autovacuum-entirely behavior, but the behavior added in commit bbb6e559c, whereby vacuum (auto or regular) will skip over pages that it can't immediately get an exclusive buffer lock on. On a heavily used table, we might skip the same page repeatedly, so that dead tuples don't get cleaned for a long time. I don't think it's too likely as an explanation here. Such workloads are likely to fill a page with only dead tuples, right? Once all tuples are safely dead they will be killed from the btree which should cause the page not to be visited anymore and thus safely vacuumable. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add %z support to elog/ereport?
Andres Freund and...@2ndquadrant.com writes: Do we have a real policy against indenting nested preprocessor statments? Just so I don't do those in future patches... Indent 'em if you like, but pgindent will undo it. I ran the patch through pgindent to see what it would do with those, and it left-justified them, so I committed it that way. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why do we let autovacuum give up?
Andres Freund and...@2ndquadrant.com writes: On 2014-01-23 16:15:50 -0500, Tom Lane wrote: [ thinks... ] It's possible that what you saw is not the kick-out-autovacuum-entirely behavior, but the behavior added in commit bbb6e559c, whereby vacuum (auto or regular) will skip over pages that it can't immediately get an exclusive buffer lock on. On a heavily used table, we might skip the same page repeatedly, so that dead tuples don't get cleaned for a long time. I don't think it's too likely as an explanation here. Such workloads are likely to fill a page with only dead tuples, right? Once all tuples are safely dead they will be killed from the btree which should cause the page not to be visited anymore and thus safely vacuumable. I added some instrumentation to vacuumlazy.c to count the number of pages skipped in this way. You're right, it seems to be negligible, at least with Mark's test case. I saw at most two pages skipped per vacuum, and usually none; so there's no way that a whole lot of tuples are going unvacuumed because of this. (I wonder though if we ought to add such counting as a permanent feature ...) I concur with the other reports that the main problem in this test case is just that the default cost delay settings throttle autovacuum so hard that it has no chance of keeping up. If I reduce autovacuum_vacuum_cost_delay from the default 20ms to 2ms, it seems to keep up quite nicely, on my machine anyway. Probably other combinations of changes would do it too. Perhaps we need to back off the default cost delay settings a bit? We've certainly heard more than enough reports of table bloat in heavily-updated tables. A system that wasn't hitting the updates as hard as it could might not need this, but on the other hand it probably wouldn't miss the I/O cycles from a more aggressive autovacuum, either. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
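In postgresql.conf terms the change Tom tested is just this (2ms is the value from his run, not a tuned recommendation; per-table overrides such as ALTER TABLE ... SET (autovacuum_vacuum_cost_delay = 0) are another option for known-hot user tables):

  autovacuum_vacuum_cost_delay = 2ms    # default is 20ms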
Re: [HACKERS] Why do we let autovacuum give up?
On 2014-01-23 19:29:23 -0500, Tom Lane wrote: I saw at most two pages skipped per vacuum, and usually none; so there's no way that a whole lot of tuples are going unvacuumed because of this. (I wonder though if we ought to add such counting as a permanent feature ...) I generally think we need to work a bit on the data reported back by vacuum. Adding data and also making the data output when using autovacuum more consistent with what VACUUM VERBOSE reports. The latter curiously often has less detail than autovacuum. I had hoped to get to that for 9.4, but it doesn't look like it. I concur with the other reports that the main problem in this test case is just that the default cost delay settings throttle autovacuum so hard that it has no chance of keeping up. If I reduce autovacuum_vacuum_cost_delay from the default 20ms to 2ms, it seems to keep up quite nicely, on my machine anyway. Probably other combinations of changes would do it too. Perhaps we need to back off the default cost delay settings a bit? We've certainly heard more than enough reports of table bloat in heavily-updated tables. A system that wasn't hitting the updates as hard as it could might not need this, but on the other hand it probably wouldn't miss the I/O cycles from a more aggressive autovacuum, either. Yes, I think adjusting the default makes sense, most setups that have enough activity that costing plays a role have to greatly increase the values. I'd rather increase the cost limit than reduce cost delay so drastically though, but that's admittedly just gut feeling. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why do we let autovacuum give up?
Andres Freund and...@2ndquadrant.com writes: On 2014-01-23 19:29:23 -0500, Tom Lane wrote: I concur with the other reports that the main problem in this test case is just that the default cost delay settings throttle autovacuum so hard that it has no chance of keeping up. If I reduce autovacuum_vacuum_cost_delay from the default 20ms to 2ms, it seems to keep up quite nicely, on my machine anyway. Probably other combinations of changes would do it too. Perhaps we need to back off the default cost delay settings a bit? We've certainly heard more than enough reports of table bloat in heavily-updated tables. A system that wasn't hitting the updates as hard as it could might not need this, but on the other hand it probably wouldn't miss the I/O cycles from a more aggressive autovacuum, either. Yes, I think adjusting the default makes sense, most setups that have enough activity that costing plays a role have to greatly increase the values. I'd rather increase the cost limit than reduce cost delay so drastically though, but that's admittedly just gut feeling. Well, I didn't experiment with intermediate values, I was just trying to test the theory that autovac could keep up given less-extreme throttling. I'm not taking any position on just where we need to set the values, only that what we've got is probably too extreme. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why do we let autovacuum give up?
On 01/24/2014 07:52 AM, Andres Freund wrote: On 2014-01-24 12:49:57 +1300, Mark Kirkwood wrote: autovacuum_max_workers = 4 autovacuum_naptime = 10s autovacuum_vacuum_scale_factor = 0.1 autovacuum_analyze_scale_factor = 0.1 autovacuum_vacuum_cost_delay = 0ms Stops excessive bloat - clearly autovacuum *is* able to vacuum pg_attribute in this case. Back to drawing board for a test case. Well, I think quite many people don't realize it might be necessary to tune autovac on busy workloads. As it very well might be the case in Josh's case. Oh, lots of people realise it's a good idea to tune autovac on busy workloads. They just do it in the wrong direction, making it run less often and less aggressively, causing more bloat, and making their problem worse. I've seen this enough times that I'm starting to think the autovacuum tuning knobs need a child safety lock ;-) More seriously, people don't understand autovacuum, how it works, or why they need it. They notice it when things are already bad, see that it's doing lots of work and doing lots of I/O that competes with queries, and turn it off to solve the problem. I'm not sure how to tackle that. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why do we let autovacuum give up?
On Thu, Jan 23, 2014 at 10:38 PM, Craig Ringer cr...@2ndquadrant.com wrote: Stops excessive bloat - clearly autovacuum *is* able to vacuum pg_attribute in this case. Back to drawing board for a test case. Well, I think quite many people don't realize it might be necessary to tune autovac on busy workloads. As it very well might be the case in Josh's case. Oh, lots of people realise it's a good idea to tune autovac on busy workloads. They just do it in the wrong direction, making it run less often and less aggressively, causing more bloat, and making their problem worse. I've seen this enough times that I'm starting to think the autovauum tuning knobs need a child safety lock ;-) More seriously, people don't understand autovacuum, how it works, or why they need it. They notice it when things are already bad, see that it's doing lots of work and doing lots of I/O that competes with queries, and turn it off to solve the problem. AFAIK, tuning down autovacuum is common advice **when compounded with manually scheduled vacuuming**. The problem of autovacuum is that it always picks the wrong time to work. That is, when the DB is the most active. Because statistically that's when the thresholds are passed. If you ask me, I'd like autovac to know when not to run (or rather wait a bit, not forever), perhaps by checking load factors or some other tell-tale of an already-saturated I/O system. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GIN improvements part2: fast scan
On 23.1.2014 17:22, Heikki Linnakangas wrote: On 01/14/2014 05:35 PM, Alexander Korotkov wrote: Attached version is rebased against last version of packed posting lists. Thanks! I think we're missing a trick with multi-key queries. We know that when multiple scan keys are used, they are ANDed together, so we can do the skip optimization even without the new tri-state consistent function. To get started, I propose the three attached patches. These only implement the optimization for the multi-key case, which doesn't require any changes to the consistent functions and hence no catalog changes. Admittedly this isn't anywhere near as useful in practice as the single key case, but let's go for the low-hanging fruit first. This nevertheless introduces some machinery that will be needed by the full patch anyway. I structured the code somewhat differently than your patch. There is no separate fast-path for the case where the optimization applies. Instead, I'm passing the advancePast variable all the way down to where the next batch of items are loaded from the posting tree. keyGetItem is now responsible for advancing the entry streams, and the logic in scanGetItem has been refactored so that it advances advancePast aggressively, as soon as one of the key streams lets us conclude that no items before a certain point can match. scanGetItem might yet need to be refactored when we get to the full preconsistent check stuff, but one step at a time. The first patch is the most interesting one, and contains the scanGetItem changes. The second patch allows seeking to the right segment in a posting tree page, and the third allows starting the posting tree scan from the root when skipping items (instead of just following the right-links). Here are some simple performance test results, demonstrating the effect of each of these patches. This is a best-case scenario. I don't think these patches have any adverse effects even in the worst-case scenario, although I haven't actually tried hard to measure that. I used this to create a test table: create table foo (intarr int[]); -- Every row contains 0 (frequent term), and a unique number. insert into foo select array[0,g] from generate_series(1, 1000) g; -- Add another tuple with 0, 1 combo physically to the end of the table. insert into foo values (array[0,1]); The query I used is this: postgres=# select count(*) from foo where intarr @> array[0] and intarr @> array[1]; count --- 2 (1 row) I measured the time that query takes, and the number of pages hit, using explain (analyze, buffers true):

patches       | time (ms) | buffers
--------------+-----------+--------
unpatched     | 650       | 1316
patch 1       | 0.52      | 1316
patches 1+2   | 0.50      | 1316
patches 1+2+3 | 0.13      | 15

So, the second patch isn't doing much in this particular case. But it's trivial, and I think it will make a difference in other queries where you have the opportunity to skip, but return a lot of tuples overall. In summary, these are fairly small patches, and useful on their own, so I think these should be committed now. But please take a look and see if the logic in scanGetItem/keyGetItem looks correct to you. After this, I think the main fast scan logic will go into keyGetItem. PS. I find it a bit surprising that in your patch, you're completely bailing out if there are any partial-match keys involved. Is there some fundamental reason for that, or just not implemented? I've done some initial testing (with all the three patches applied) today to see if there are any crashes or obvious failures and found none so far.
The only issue I've noticed is this LOG message in ginget.c: elog(LOG, "entryLoadMoreItems, %u/%u, skip: %d", ItemPointerGetBlockNumber(advancePast), ItemPointerGetOffsetNumber(advancePast), !stepright); which produces an enormous amount of messages. I suppose that was used for debugging purposes and shouldn't be there? I plan to do more thorough testing over the weekend, but I'd like to make sure I understand what to expect. My understanding is that this patch should: - give the same results as the current code (e.g. the fulltext search should not return different rows / change the ts_rank etc.) - improve the performance of fulltext queries Are there any obvious rules for which queries will benefit most from this? The queries generated by the tool I'm using for testing are mostly of this form: SELECT id FROM messages WHERE body_tsvector @@ plainto_tsquery('english', 'word1 word2 ...') ORDER BY ts_rank(...) DESC LIMIT :n; with varying numbers of words and LIMIT values. During the testing today I haven't noticed any obvious performance difference, but I haven't spent much time on that. regards Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
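A quick way to see whether the skipping helps a given query is the same explain (analyze, buffers true) approach Heikki used above; a sketch against Tomas's example query (the table, column, and search terms are just placeholders taken from his message):

  EXPLAIN (ANALYZE, BUFFERS)
  SELECT id FROM messages
  WHERE body_tsvector @@ plainto_tsquery('english', 'word1 word2')
  ORDER BY ts_rank(body_tsvector, plainto_tsquery('english', 'word1 word2')) DESC
  LIMIT 10;

Comparing the shared hit/read counts in the Buffers lines with and without the patches shows how many index pages the scan actually touched.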
Re: [HACKERS] Change authentication error message (patch)
On Wed, Jun 19, 2013 at 01:27:39PM -0700, Joshua D. Drake wrote: On 06/19/2013 01:18 PM, Markus Wanner wrote: Authentication failed or password has expired for user \"%s\" "Authentication failed" covers any combination of a username/password being wrong and obviously "password has expired" covers the other. Works for me. Considering the password to be the thing that expires (rather than the account) is probably more accurate as well. It is also how it is worded in the docs (which is why I used it). Patch below. I have developed the attached patch to fix this problem. Do I need to say "invalid user" or "invalid or expired password"? --- -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c new file mode 100644 index 882dc8f..fa96238 *** a/src/backend/libpq/auth.c --- b/src/backend/libpq/auth.c *** auth_failed(Port *port, int status) *** 245,251 break; case uaPassword: case uaMD5: ! errstr = gettext_noop("password authentication failed for user \"%s\""); /* We use it to indicate if a .pgpass password failed. */ errcode_return = ERRCODE_INVALID_PASSWORD; break; --- 245,251 break; case uaPassword: case uaMD5: ! errstr = gettext_noop("password authentication failed for user \"%s\": invalid or expired password"); /* We use it to indicate if a .pgpass password failed. */ errcode_return = ERRCODE_INVALID_PASSWORD; break; -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [v9.4] row level security
On 01/24/2014 10:12 AM, Craig Ringer wrote: (Re-sending; I forgot to cc the list) On 01/20/2014 02:15 PM, Craig Ringer wrote: On 01/20/2014 09:58 AM, Craig Ringer wrote: As it is I'm spending today reworking the RLS patch on top of the new approach to updatable security barrier views. To get that rolling I've split the RLS patch up into chunks, so we can argue about the catalogs, ALTER syntax, and the actual row-filtering implementation separately ;-) It's currently on g...@github.com:ringerc/postgres.git in the branch rls-9.4-split, which is subject to rebasing. I'm still going through it making sure each chunk at least compiles and preferably passes make check. That branch is now pretty stable, and passes checks at every stage up to the new RLS regression tests. I've pushed a new version to branch rls-9.4-split. Further updates will rebase this branch. The tag rls-9.4-split-v5 identifies this particular push, and won't get rebased away. Pushed a new rebase to the main working branch, merging in the fixes I made to KaiGai's patch last time. Tagged rls-9.4-split-v6 I haven't bothered with a patchset for this one, I'll be replacing it again soon. This is just for anybody following along. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why do we let autovacuum give up?
Claudio Freire escribió: If you ask me, I'd like autovac to know when not to run (or rather wait a bit, not forever), perhaps by checking load factors or some other tell-tale of an already-saturated I/O system. We had a proposed design to tell autovac when not to run (or rather, when to switch settings very high so that in practice it'd never run). At some point somebody said but we can just change autovacuum=off in postgresql.conf via crontab when the high load period starts, and turn it back on afterwards --- and that was the end of it. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why do we let autovacuum give up?
Alvaro Herrera alvhe...@2ndquadrant.com writes: Claudio Freire escribió: If you ask me, I'd like autovac to know when not to run (or rather wait a bit, not forever), perhaps by checking load factors or some other tell-tale of an already-saturated I/O system. We had a proposed design to tell autovac when not to run (or rather, when to switch settings very high so that in practice it'd never run). At some point somebody said but we can just change autovacuum=off in postgresql.conf via crontab when the high load period starts, and turn it back on afterwards --- and that was the end of it. The hard part of this is that shutting down autovacuum during heavy load may be exactly the wrong thing to do. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why do we let autovacuum give up?
On 01/24/2014 11:32 AM, Tom Lane wrote:
Alvaro Herrera alvhe...@2ndquadrant.com writes:
Claudio Freire wrote:
If you ask me, I'd like autovac to know when not to run (or rather wait a bit, not forever), perhaps by checking load factors or some other tell-tale of an already-saturated I/O system.
We had a proposed design to tell autovac when not to run (or rather, when to switch settings very high so that in practice it'd never run). At some point somebody said "but we can just change autovacuum=off in postgresql.conf via crontab when the high load period starts, and turn it back on afterwards" --- and that was the end of it.
The hard part of this is that shutting down autovacuum during heavy load may be exactly the wrong thing to do.

Yep. In fact, it may be appropriate to limit or stop autovacuum's work on some big tables, while pushing its activity even higher for small, high-churn tables. If you stop autovacuum on a message-queue system when load gets high, you'll get a giant messy bloat explosion.

--
Craig Ringer  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
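[Editor's note: as an aside not taken from the thread, the asymmetric tuning Craig describes can already be expressed with per-table autovacuum storage parameters. The table names below are made up for illustration.]

    -- Hypothetical tables: push autovacuum harder on a small, high-churn
    -- queue table, and opt a huge archive table out of autovacuum entirely
    -- (vacuuming it manually during quiet periods instead).
    ALTER TABLE message_queue SET (
        autovacuum_vacuum_scale_factor = 0.01,  -- trigger at ~1% dead rows
        autovacuum_vacuum_threshold    = 50,
        autovacuum_vacuum_cost_delay   = 0      -- no I/O throttling for this table
    );

    ALTER TABLE big_archive SET (autovacuum_enabled = false);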
Re: [HACKERS] Postgresql for cygwin - 3rd
On Mon, Jun 10, 2013 at 11:08:36AM +0200, marco atzeri wrote:
On 3/6/2013 11:46 PM, Andrew Dunstan wrote:
Excellent. Will test it out soon. cheers andrew

Andrew, please find attached a full patch for cygwin relative to 9.3beta1:
- DLLTOOL/DLLWRAP are not used anymore, replaced by gcc also for postgres.exe (*)
- DLL versioning is added

Check failures:
- prepared_xacts is still freezing (the cygwin failure you highlighted was solved, so it should be something else)
- attached the 2 regression diffs:
  tsearch ... FAILED
  without_oid ... FAILED
  The second one seems to be a new one; not sure whether it is cygwin-specific.

Andrew, should this configuration/code patch be applied to 9.4?
http://www.postgresql.org/message-id/51b59794.3000...@gmail.com

I think we would have to make Cygwin-specific regression output to handle the regression failures, but frankly I am not even sure if they are right.

--
Bruce Momjian  br...@momjian.us  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: Fwd: [HACKERS] Proposal: variant of regclass
On Thu, 23 Jan 2014 13:19:37 +0200, Marti Raudsepp ma...@juffo.org wrote:
Resending to Tatsuo Ishii and Yugo Nagata, your email server was having problems yesterday:

Thanks for resending!

This is the mail system at host sraigw2.sra.co.jp.
yug...@sranhm.sra.co.jp: mail for srasce.sra.co.jp loops back to myself
t-is...@sra.co.jp: mail for srasce.sra.co.jp loops back to myself

-- Forwarded message --
From: Marti Raudsepp ma...@juffo.org
Date: Thu, Jan 23, 2014 at 3:39 AM
Subject: Re: [HACKERS] Proposal: variant of regclass
To: Yugo Nagata nag...@sraoss.co.jp
Cc: Tatsuo Ishii is...@postgresql.org, pgsql-hackers pgsql-hackers@postgresql.org, Vik Fearing vik.fear...@dalibo.com, Robert Haas robertmh...@gmail.com, Tom Lane t...@sss.pgh.pa.us, Pavel Golub pa...@gf.microolap.com, Pavel Golub pa...@microolap.com, Andres Freund and...@2ndquadrant.com, Pavel Stěhule pavel.steh...@gmail.com

On Wed, Jan 22, 2014 at 1:44 PM, Yugo Nagata nag...@sraoss.co.jp wrote:
On Wed, 22 Jan 2014 20:04:12 +0900 (JST), Tatsuo Ishii is...@postgresql.org wrote:
parseTypeString() is called by some other functions, and I avoided affecting them by modifying its definition, since this should raise errors in most cases. The same reason applies to the other *MissingOk functions in parse_type.c. Is it better to rewrite the definitions of these functions and all their callers?

Yes, for parseTypeString certainly. There have been many refactorings like that in the past and all of them use this pattern.

Ok. I'll rewrite the definitions and their callers.

typenameTypeIdAndMod is less clear since the code paths differ so much; maybe keep 2 versions (merging back to 1 function is OK too, but in any case you don't need 3).

I'll also fix this, either way, so as not to use typenameTypeIdAndMod_guts.

    typenameTypeIdAndModMissingOk(...)
    {
        Type    tup = LookupTypeName(pstate, typeName, typmod_p);

        if (tup == NULL || !((Form_pg_type) GETSTRUCT(tup))->typisdefined)
            *typeid_p = InvalidOid;
        else
            *typeid_p = HeapTupleGetOid(tup);

        if (tup)
            ReleaseSysCache(tup);
    }

    typenameTypeIdAndMod(...)
    {
        Type    tup = typenameType(pstate, typeName, typmod_p);

        *typeid_p = HeapTupleGetOid(tup);
        ReleaseSysCache(tup);
    }

Also, there's no need for else here:

    if (raiseError)
        ereport(ERROR, ...);
    else
        return InvalidOid;

Regards,
Marti

--
Yugo Nagata nag...@sraoss.co.jp

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
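[Editor's note: to make the pattern Marti refers to concrete for readers skimming the thread, here is a small self-contained toy. It is deliberately not PostgreSQL code; the names and the sentinel value are made up. The shape is: the lookup function itself gains a missing_ok flag, existing callers pass false to keep today's error-raising behavior, and new to_reg*-style callers pass true and test for the invalid sentinel.]

    #include <stdbool.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define INVALID_ID 0        /* stand-in for InvalidOid */

    /* Toy lookup: only "int4" "exists". */
    static unsigned int
    lookup_type_id(const char *name, bool missing_ok)
    {
        if (strcmp(name, "int4") == 0)
            return 23;

        if (!missing_ok)
        {
            fprintf(stderr, "type \"%s\" does not exist\n", name);
            exit(1);            /* stand-in for ereport(ERROR, ...) */
        }
        return INVALID_ID;      /* caller checks for the sentinel */
    }

    int
    main(void)
    {
        printf("%u\n", lookup_type_id("int4", false));   /* prints 23 */
        printf("%u\n", lookup_type_id("nosuch", true));  /* prints 0, no error */
        return 0;
    }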