Re: [HACKERS] inherit support for foreign tables
Hi Hanada-san,

While still reviewing this patch, I feel this patch has given enough consideration to interactions with other commands, but I found the following seemingly incorrect behavior:

postgres=# CREATE TABLE product (id INTEGER, description TEXT);
CREATE TABLE
postgres=# CREATE FOREIGN TABLE product1 () INHERITS (product) SERVER fs OPTIONS (filename '/home/foo/product1.csv', format 'csv');
CREATE FOREIGN TABLE
postgres=# ALTER TABLE product ALTER COLUMN description SET STORAGE EXTERNAL;
ERROR: product1 is not a table or materialized view

ISTM the ALTER TABLE simple recursion mechanism (i.e., ATSimpleRecursion()) should be modified for the ALTER COLUMN SET STORAGE case. I just wanted to tell you this quickly so you can take time to consider it.

Thanks,

Best regards,
Etsuro Fujita

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] A better way than tweaking NTUP_PER_BUCKET
On 25 January 2014 23:08, Simon Riggs si...@2ndquadrant.com wrote:
On 25 January 2014 22:33, Stephen Frost sfr...@snowman.net wrote:
* Tom Lane (t...@sss.pgh.pa.us) wrote: AFAICT, there was no consensus in this thread on what to do, which probably has something to do with the lack of concrete performance tests presented to back up any particular proposal.

This I entirely agree with: more testing and more information on how such a change impacts other workloads would be great. Unfortunately, while I've provided a couple of test cases and seen similar situations on IRC, this is very data-dependent, which makes it difficult to have concrete answers for every workload. Still, I'll try to spend some time with pgbench's schema definition and write up some larger queries to run through it (AIUI, the default set of queries won't typically result in a hashjoin) and see what happens there.

That action of some kind was needed was clear, for me. Hopefully some small improvement can be found from that investigation, even if the greatest gain is in some way under dispute.

I don't see anything for 9.4 in here now.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services
Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
Hackers,

Is somebody available to volunteer to review the custom-scan patch? Even though Hanada-san acknowledged it before, it seems to me this patch has potentially arguable implementations. Even if you don't have enough time to review the whole of the code, it helps me if you can comment on the following topics.

(1) Interface to add alternative paths instead of built-in join paths

This patch adds add_join_path_hook in add_paths_to_joinrel to allow extensions to provide alternative scan paths in addition to the built-in join paths. The custom-scan path being added is assumed to perform a scan on a (virtual) relation that is the result set of the joined relations. My concern is which arguments should be passed to the hook. This hook is declared as follows:

/* Hook for plugins to add custom join path, in addition to default ones */
typedef void (*add_join_path_hook_type)(PlannerInfo *root,
                                        RelOptInfo *joinrel,
                                        RelOptInfo *outerrel,
                                        RelOptInfo *innerrel,
                                        JoinType jointype,
                                        SpecialJoinInfo *sjinfo,
                                        List *restrictlist,
                                        List *mergeclause_list,
                                        SemiAntiJoinFactors *semifactors,
                                        Relids param_source_rels,
                                        Relids extra_lateral_rels);
extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook;

The arguments above restrictlist should probably be passed to extensions, because they are also arguments of add_paths_to_joinrel(). However, I'm not 100% certain how the other arguments should be handled. It probably makes sense to pass param_source_rels and extra_lateral_rels, to check whether the path is sensible for parameterized paths. On the other hand, I doubt whether mergeclause_list is useful to pass. (It may make sense if someone tries to implement their own merge-join implementation??) I'd like to see ideas for improving the current interface specification.
(2) CUSTOM_VAR for special Var reference

@@ -134,6 +134,7 @@ typedef struct Expr
 #define    INNER_VAR    65000    /* reference to inner subplan */
 #define    OUTER_VAR    65001    /* reference to outer subplan */
 #define    INDEX_VAR    65002    /* reference to index column */
+#define    CUSTOM_VAR   65003    /* reference to custom column */

I newly added CUSTOM_VAR to handle the case when a custom scan overrides join relations. Var nodes within a join plan are adjusted to refer to either ecxt_innertuple or ecxt_outertuple of ExprContext. That causes trouble if a custom scan runs instead of the built-in joins, because the tuples it fetches are usually stored in ecxt_scantuple, so Var nodes also need a varno that is neither inner nor outer. The SetPlanRefCustomScan callback, invoked from set_plan_refs, allows extensions to rewrite Var nodes within the custom-scan node to indicate ecxt_scantuple using CUSTOM_VAR, instead of inner or outer. For example, a Var node with varno=CUSTOM_VAR and varattno=3 references the third attribute of the tuple in ecxt_scantuple. I think this is a reasonable solution; however, I'm not 100% certain whether people have a more graceful idea for implementing it.

If you have comments on the above two topics, or others, please give your ideas.

Thanks,

-----Original Message-----
From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Kohei KaiGai
Sent: Tuesday, January 14, 2014 11:20 PM
To: Shigeru Hanada
Cc: Kaigai, Kouhei(海外, 浩平); Jim Mlodgenski; Robert Haas; Tom Lane; PgHacker; Peter Eisentraut
Subject: Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

Hello,

The attached patches are the ones rebased to the latest git tree, with no functional changes from the previous revision on the commit-fest:Nov. Hanada-san volunteered to review the series of patches, including the portion for postgres_fdw, then marked it as ready for committer on the last commit fest. So, I hope one of the committers will also volunteer to review the patches for final checking.
* Part-1 - CustomScan APIs

This patch provides a set of interfaces for extensions to interact with the query optimizer and executor. The new add_scan_path_hook and add_join_path_hook allow extensions to offer alternative ways to scan a particular relation or to join particular relations. Once an alternative way is chosen by the optimizer, the associated callbacks are invoked from the executor. In this case, the extension is responsible for returning a slot that holds a tuple (or an empty slot at end of scan) scanned from the underlying relation.

* Part-2 - contrib/ctidscan

This patch provides a simple example implementation of the CustomScan API. It makes it possible to skip pages when inequality operators are given on the ctid system column
Re: [HACKERS] GIN improvements part2: fast scan
On Sun, Jan 26, 2014 at 8:14 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote:
On 01/26/2014 08:24 AM, Tomas Vondra wrote:
Hi!

On 25.1.2014 22:21, Heikki Linnakangas wrote:
Attached is a new version of the patch set, with those bugs fixed.

I've done a bunch of tests with all the 4 patches applied, and it seems to work now. I've done tests with various conditions (AND/OR, number of words, number of conditions) and so far I did not get any crashes, infinite loops or anything like that. I've also compared the results to 9.3 - by dumping the database and running the same set of queries on both machines - and indeed I got a 100% match.

I also did some performance tests, and that's when I started to worry. For example, I generated and ran 1000 queries that look like this:

SELECT id FROM messages
 WHERE body_tsvector @@ to_tsquery('english','(header & 53 & 32 & useful & dropped)')
 ORDER BY ts_rank(body_tsvector, to_tsquery('english','(header & 53 & 32 & useful & dropped)')) DESC;

i.e. in this case the query always was 5 words connected by AND. This query is a pretty common pattern for fulltext search - search for a list of words and give me the best ranked results.

On 9.3, the script was running for ~23 seconds; on patched HEAD it was ~40. It's perfectly reproducible; I've repeated the test several times with exactly the same results. The test is CPU bound, there's no I/O activity at all. I got the same results with more queries (~100k).

Attached is a simple chart with the x-axis used for durations measured on 9.3.2 and the y-axis for durations measured on patched HEAD. It's obvious a vast majority of queries is up to 2x slower - that's pretty clear from the chart. Only about 50 queries are faster on HEAD, and 700 queries are more than 50% slower on HEAD (i.e. if the query took 100ms on 9.3, it takes 150ms on HEAD).
Typically, the EXPLAIN ANALYZE looks something like this (on 9.3): http://explain.depesz.com/s/5tv and on HEAD (same query): http://explain.depesz.com/s/1lI

Clearly the main difference is in the Bitmap Index Scan, which takes 60ms on 9.3 and 120ms on HEAD. On 9.3 the perf top looks like this:

34.79% postgres [.] gingetbitmap
28.96% postgres [.] ginCompareItemPointers
 9.36% postgres [.] TS_execute
 5.36% postgres [.] check_stack_depth
 3.57% postgres [.] FunctionCall8Coll

while on 9.4 it looks like this:

28.20% postgres [.] gingetbitmap
21.17% postgres [.] TS_execute
 8.08% postgres [.] check_stack_depth
 7.11% postgres [.] FunctionCall8Coll
 4.34% postgres [.] shimTriConsistentFn

Not sure how to interpret that, though. For example, where did ginCompareItemPointers go? I suspect it's thanks to inlining, and that it might be related to the performance decrease. Or maybe not.

Yeah, inlining makes it disappear from the profile, and spreads that time to the functions calling it.

The profile tells us that the consistent function is called a lot more than before. That is expected - with the fast scan feature, we're calling consistent not only for potential matches, but also to refute TIDs based on just a few entries matching. If that's effective, it allows us to skip many TIDs and avoid consistent calls, which compensates, but if it's not effective, it's just overhead. I would actually expect it to be fairly effective for that query, so that's a bit surprising. I added counters to see where the calls are coming from, and it seems that about 80% of the calls are actually coming from this little feature I explained earlier:

In addition to that, I'm using the ternary consistent function to check if minItem is a match, even if we haven't loaded all the entries yet. That's less important, but I think for something like rare1 | (rare2 & frequent) it might be useful. It would allow us to skip fetching 'frequent', when we already know that 'rare1' matches for the current item.
I'm not sure if that's worth the cycles, but it seemed like an obvious thing to do, now that we have the ternary consistent function.

So, that clearly isn't worth the cycles :-). At least not with an expensive consistent function; it might be worthwhile if we pre-build the truth-table, or cache the results of the consistent function. Attached is a quick patch to remove that, on top of all the other patches, if you want to test the effect.

Every single change you did in fast scan seems to be reasonable, but testing shows that something went wrong. Simple test with 3 words of different selectivities. After applying your patches:

# select count(*) from fts_test where fti @@ plainto_tsquery('english', 'gin index select');
 count
───────
   627
(1 row)

Time: 21,252 ms

In original fast-scan: # select
Re: [HACKERS] plpgsql.warn_shadow
On Sun, Jan 26, 2014 at 11:19 AM, Simon Riggs si...@2ndquadrant.com wrote:
For 9.4, we should cut down the patch so it has
plpgsql.warnings = none (default) | all | [individual item list]
plpgsql.warnings_as_errors = off (default) | on

I hope I'm not late for the bikeshedding :) Why not have 2 similar options:
plpgsql.warnings = none (default) | all | [individual item list]
plpgsql.errors = none (default) | all | [individual item list]

That would be cleaner, more flexible, and one less option to set if you just want errors and no warnings.

Regards,
Marti
Re: [HACKERS] plpgsql.warn_shadow
2014-01-27 Marti Raudsepp ma...@juffo.org:
On Sun, Jan 26, 2014 at 11:19 AM, Simon Riggs si...@2ndquadrant.com wrote:
For 9.4, we should cut down the patch so it has
plpgsql.warnings = none (default) | all | [individual item list]
plpgsql.warnings_as_errors = off (default) | on

I hope I'm not late for the bikeshedding :) Why not have 2 similar options:
plpgsql.warnings = none (default) | all | [individual item list]
plpgsql.errors = none (default) | all | [individual item list]

That would be cleaner, more flexible, and one less option to set if you just want errors and no warnings.

What would plpgsql.errors = none mean??? I strongly disagree that this design is clean.

Regards
Pavel
Re: [HACKERS] plpgsql.warn_shadow
On 27 January 2014 10:40, Marti Raudsepp ma...@juffo.org wrote:
On Sun, Jan 26, 2014 at 11:19 AM, Simon Riggs si...@2ndquadrant.com wrote:
For 9.4, we should cut down the patch so it has
plpgsql.warnings = none (default) | all | [individual item list]
plpgsql.warnings_as_errors = off (default) | on

I hope I'm not late for the bikeshedding :) Why not have 2 similar options:
plpgsql.warnings = none (default) | all | [individual item list]
plpgsql.errors = none (default) | all | [individual item list]

That would be cleaner, more flexible, and one less option to set if you just want errors and no warnings.

That would allow you to mis-set the parameters and then cause a runtime error for something that was only a warning at compile time. Florian's point was well made, and we must come up with something that allows warnings/errors at compile time and, once accepted, nothing at run time.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services
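For comparison, the two competing proposals might look like this in postgresql.conf. This is hypothetical syntax sketched from the thread, not committed behavior, and the item name 'shadow' is just an example:

```
# Simon's proposal: one warning list, escalated to errors wholesale
plpgsql.warnings = 'shadow'          # none (default) | all | item list
plpgsql.warnings_as_errors = off     # off (default) | on

# Marti's proposal: two independent lists per severity
plpgsql.warnings = 'shadow'          # none (default) | all | item list
plpgsql.errors = 'none'              # none (default) | all | item list
```

Simon's objection is that the second form lets a setting that was only a warning at compile time be re-classified into a runtime error later.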
Re: [HACKERS] Patch: Show process IDs of processes holding a lock; show relation and tuple infos of a lock to acquire
On 23/01/14, Christian Kruse wrote:
Well, is it context or detail? Those fields have reasonably well defined meanings IMO.

I find the distinction somewhat blurry and think both would be appropriate. But since I wasn't sure, I changed it to detail.

If we need errcontext_plural, let's add it, not adopt inferior solutions just because that isn't there for lack of previous need.

I would have added it had I been sure. But having said that, I think this is indeed detail, not context.

(I kinda wonder whether some of the stuff that's now in the primary message shouldn't be pushed to errdetail as well. It looks like some previous patches in this area have been lazy.)

I agree, the primary message is not very well worded. On the other hand, finding an appropriate alternative seems hard to me.

While I'm griping, this message isn't even trying to follow the project's message style guidelines. Detail or context messages are supposed to be complete sentence(s), with capitalization and punctuation to match.

Hm, I hope I fixed it in this version of the patch.

Lastly, is this information that we want to be shipping to clients? Perhaps from a security standpoint that's not such a wise idea, and errdetail_log() is what should be used.

Fixed. I added an errdetail_log_plural() for this, too.

I have checked the revised patch. It looks fine to me except one minor code formatting issue: in elog.c, two tabs are missing in the definition of the function errdetail_log_plural. Please run the pgindent tool to check the same.

Also I would like to highlight one behavior here: the process ID of the process trying to acquire the lock is also listed in the request queue. E.g.

session 1 with process id X:
BEGIN;
LOCK TABLE foo IN SHARE MODE;

session 2 with process id Y:
BEGIN;
LOCK TABLE foo IN EXCLUSIVE MODE;

On execution of LOCK in session 2, the log will include:
DETAIL: Process holding the lock: X. Request queue: Y.
Where Y is the process ID of the same process that was trying to acquire the lock. To me, this seems to be correct behavior, as the meaning of the message is the list of all processes already holding the lock plus the processes waiting in the queue, including the position of the current process in the wait list. In the above example, it indicates that process Y is at the top of the wait list.

Thanks and Regards,
Kumar Rajeev Rastogi
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
(2014/01/23 23:18), Andrew Dunstan wrote:
What is more, if the square root calculation is affecting your benchmarks, I suspect you are benchmarking the wrong thing.

I ran another test with two pgbench clients at the same time: one running a select-only query, and the other executing 'SELECT * FROM pg_stat_statements' once per second. I used the v6 patch in this test.

* Benchmark Commands
$bin/pgbench -h xxx.xxx.xxx.xxx mitsu-ko -c64 -j32 -S -T 180 -n
$bin/pgbench -h xxx.xxx.xxx.xxx mitsu-ko -T 180 -n -f file.sql

** file.sql
SELECT * FROM pg_stat_statements;
\sleep 1s

* Select-only-query Result (Test result is represented by tps.)
method            | try1   | try2   | try3
with pgss         | 125502 | 125818 | 125809
with patched pgss | 125909 | 125699 | 126040

This result shows my patch has almost the same performance as before.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
(2014/01/26 17:43), Mitsumasa KONDO wrote:
2014-01-26 Simon Riggs si...@2ndquadrant.com:
On 21 January 2014 19:48, Simon Riggs si...@2ndquadrant.com wrote:
On 21 January 2014 12:54, KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp wrote:
Rebased patch is attached.

Does this fix the Windows bug reported by Kumar on 20/11/2013?

Sorry, I was misunderstanding. The first name of Mr. Rajeev Rastogi is Kumar! I searched only the e-mail address and title by his name... I don't have a Windows compiler environment, but the attached patch might fix it. Could I ask Mr. Rajeev Rastogi to test my patch again?

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center

*** a/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql
--- b/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql
*** 19,24 **** CREATE FUNCTION pg_stat_statements(
--- 19,27 ----
  OUT query text,
  OUT calls int8,
  OUT total_time float8,
+ OUT min_time float8,
+ OUT max_time float8,
+ OUT stdev_time float8,
  OUT rows int8,
  OUT shared_blks_hit int8,
  OUT shared_blks_read int8,
*** 41,43 **** CREATE VIEW pg_stat_statements AS
--- 44,52 ----
  SELECT * FROM pg_stat_statements();
  GRANT SELECT ON pg_stat_statements TO PUBLIC;
+
+ /* New Function */
+ CREATE FUNCTION pg_stat_statements_reset_time()
+ RETURNS void
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C;

*** a/contrib/pg_stat_statements/pg_stat_statements--1.2.sql
--- b/contrib/pg_stat_statements/pg_stat_statements--1.2.sql
*** 9,14 **** RETURNS void
--- 9,19 ----
  AS 'MODULE_PATHNAME'
  LANGUAGE C;
+ CREATE FUNCTION pg_stat_statements_reset_time()
+ RETURNS void
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C;
+
  CREATE FUNCTION pg_stat_statements(
  OUT userid oid,
  OUT dbid oid,
*** 16,21 **** CREATE FUNCTION pg_stat_statements(
--- 21,29 ----
  OUT query text,
  OUT calls int8,
  OUT total_time float8,
+ OUT min_time float8,
+ OUT max_time float8,
+ OUT stdev_time float8,
  OUT rows int8,
  OUT shared_blks_hit int8,
  OUT shared_blks_read int8,
*** 42,44 **** GRANT SELECT ON pg_stat_statements TO PUBLIC;
--- 50,53 ----
  -- Don't want this to be available to non-superusers.
  REVOKE ALL ON FUNCTION pg_stat_statements_reset() FROM PUBLIC;
+ REVOKE ALL ON FUNCTION pg_stat_statements_reset_time() FROM PUBLIC;

*** a/contrib/pg_stat_statements/pg_stat_statements--unpackaged--1.0.sql
--- b/contrib/pg_stat_statements/pg_stat_statements--unpackaged--1.0.sql
*** 4,8 ****
--- 4,9 ----
  \echo Use CREATE EXTENSION pg_stat_statements to load this file. \quit
  ALTER EXTENSION pg_stat_statements ADD function pg_stat_statements_reset();
+ ALTER EXTENSION pg_stat_statements ADD function pg_stat_statements_reset_time();
  ALTER EXTENSION pg_stat_statements ADD function pg_stat_statements();
  ALTER EXTENSION pg_stat_statements ADD view pg_stat_statements;

*** a/contrib/pg_stat_statements/pg_stat_statements.c
--- b/contrib/pg_stat_statements/pg_stat_statements.c
*** 78,84 ****
  static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
  #define USAGE_DECREASE_FACTOR (0.99)  /* decreased every entry_dealloc */
  #define STICKY_DECREASE_FACTOR (0.50) /* factor for sticky entries */
  #define USAGE_DEALLOC_PERCENT 5       /* free this % of entries at once */
! #define JUMBLE_SIZE 1024              /* query serialization buffer size */
  /*
--- 78,85 ----
  #define USAGE_DECREASE_FACTOR (0.99)  /* decreased every entry_dealloc */
  #define STICKY_DECREASE_FACTOR (0.50) /* factor for sticky entries */
  #define USAGE_DEALLOC_PERCENT 5       /* free this % of entries at once */
! #define EXEC_TIME_INIT_MIN DBL_MAX    /* initial execution min time */
! #define EXEC_TIME_INIT_MAX -DBL_MAX   /* initial execution max time */
  #define JUMBLE_SIZE 1024              /* query serialization buffer size */
  /*
*** 114,119 **** typedef struct Counters
--- 115,123 ----
  {
  int64 calls;             /* # of times executed */
  double total_time;       /* total execution time, in msec */
+ double total_sqtime;     /* cumulated square execution time, in msec */
+ double min_time;         /* minimum execution time, in msec */
+ double max_time;         /* maximum execution time, in msec */
  int64 rows;              /* total # of retrieved or affected rows */
  int64 shared_blks_hit;   /* # of shared buffer hits */
  int64 shared_blks_read;  /* # of shared disk blocks read */
*** 237,245 ****
  void _PG_init(void);
--- 241,251 ----
  void _PG_fini(void);
  Datum pg_stat_statements_reset(PG_FUNCTION_ARGS);
+ Datum pg_stat_statements_reset_time(PG_FUNCTION_ARGS);
  Datum pg_stat_statements(PG_FUNCTION_ARGS);
  PG_FUNCTION_INFO_V1(pg_stat_statements_reset);
+
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 01/27/2014 07:09 AM, KONDO Mitsumasa wrote:
(2014/01/23 23:18), Andrew Dunstan wrote:
What is more, if the square root calculation is affecting your benchmarks, I suspect you are benchmarking the wrong thing.

I ran another test with two pgbench clients at the same time: one running a select-only query and the other executing 'SELECT * FROM pg_stat_statements' once per second. I used the v6 patch in this test.

The issue of concern is not the performance of pg_stat_statements, AIUI. The issue is whether this patch affects performance generally, i.e. is there a significant cost in collecting these extra stats. To test this you would compare two general pgbench runs, one with the patch applied and one without. I personally don't give a tinker's cuss about whether the patch slows down pg_stat_statements a bit.

cheers

andrew
Re: [HACKERS] [Review] inherit support for foreign tables
2014-01-27 Etsuro Fujita fujita.ets...@lab.ntt.co.jp:
(2014/01/25 11:27), Shigeru Hanada wrote:
Yeah, the consistency is essential for its ease of use. But I'm not sure that inherited stats ignoring foreign tables are actually useful for query optimization.

What I think about the consistency is a) the ANALYZE command with no table names skips ANALYZing inheritance trees that include at least one foreign table as a child, but b) the ANALYZE command with a table name indicating an inheritance tree that includes any foreign tables does compute the inherited stats in the same way as for an inheritance tree consisting of only ordinary tables, by acquiring the sample rows from each foreign table on the far side.

b) sounds a little complex to understand or explain. Is it too big a change to make the ANALYZE command handle foreign tables too, even when no table name is specified? IIRC, the performance issue was the reason to exclude foreign tables from auto-analyze processing, but ANALYZing a large database containing huge local data also takes a long time. One idea to avoid unexpectedly long processing is to add an option to foreign tables to mark them as not auto-analyzable. Thoughts?

--
Shigeru HANADA
Re: [HACKERS] Changeset Extraction v7.1
On Sat, Jan 25, 2014 at 5:25 PM, Andres Freund and...@2ndquadrant.com wrote:
Looking over patch 0002, I see that there's code to allow a walsender to create or drop a physical replication slot. Also, if we've acquired a replication slot, there's code to update it, and code to make sure we disconnect from it, but there's no code to acquire it. I think maybe the hunk in StartReplication() is supposed to be calling ReplicationSlotAcquire() instead of ReplicationSlotRelease(),

Uh. You had me worried here for a minute or two, a hunk or two earlier than the ReplicationSlotRelease() you mention. What probably confused you is that StartReplication only returns once all streaming is finished. Not my idea...

No, what confuses me is that there's no call to ReplicationSlotAcquire() in patch 0001 or patch 0002; the function is added but not called.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] inherit support for foreign tables
2014-01-27 Etsuro Fujita fujita.ets...@lab.ntt.co.jp:
While still reviewing this patch, I feel this patch has given enough consideration to interactions with other commands, but found the following seemingly incorrect behavior:

postgres=# CREATE TABLE product (id INTEGER, description TEXT);
CREATE TABLE
postgres=# CREATE FOREIGN TABLE product1 () INHERITS (product) SERVER fs OPTIONS (filename '/home/foo/product1.csv', format 'csv');
CREATE FOREIGN TABLE
postgres=# ALTER TABLE product ALTER COLUMN description SET STORAGE EXTERNAL;
ERROR: product1 is not a table or materialized view

ISTM the ALTER TABLE simple recursion mechanism (i.e., ATSimpleRecursion()) should be modified for the ALTER COLUMN SET STORAGE case. I just wanted to quickly tell you this for you to take time to consider.

Thanks for the review. It must be an oversight, so I'll fix it up soon.

--
Shigeru HANADA
Re: [HACKERS] Changeset Extraction v7.1
On Sat, Jan 25, 2014 at 5:25 PM, Andres Freund and...@2ndquadrant.com wrote:
I'm also wondering about whether we've got the right naming here. AFAICT, it's not the case that we're going to use the catalog xmin for catalogs and the data xmin for non-catalogs. Rather, the catalog xmin is going to always be included in globalxmin calculations, so IOW it should just be called xmin.

Well, not really. That's true for GetSnapshotData(), but not for GetOldestXmin(), where we calculate an xmin *not* including the catalog xmin. And the data_xmin is always used, regardless of catalog/non-catalog; we peg the xmin further for catalog tables, based on that value. The reason for doing things this way is that it makes all current usages of RecentGlobalXmin safe, since that is the more conservative value. Only in inspected locations can we use RecentGlobalDataXmin, which *does* include data_xmin but *not* catalog_xmin.

Well, OK, so I guess I'm turned around. But I guess my point is - if one of data_xmin and catalog_xmin is really just xmin, then I think it would be more clear to call that one xmin.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
2014-01-27 Andrew Dunstan and...@dunslane.net:
On 01/27/2014 07:09 AM, KONDO Mitsumasa wrote:
(2014/01/23 23:18), Andrew Dunstan wrote:
What is more, if the square root calculation is affecting your benchmarks, I suspect you are benchmarking the wrong thing.

I ran another test with two pgbench clients at the same time: one running a select-only query and the other executing 'SELECT * FROM pg_stat_statements' once per second. I used the v6 patch in this test.

The issue of concern is not the performance of pg_stat_statements, AIUI. The issue is whether this patch affects performance generally, i.e. is there a significant cost in collecting these extra stats. To test this you would compare two general pgbench runs, one with the patch applied and one without.

The other day I showed the first test result, which compared runs without pg_stat_statements and without the patch. They ran on the same server with the same benchmark settings (clients and scale factor) as today's result. If you merge and compare the results, you can confirm that my patch does not affect performance.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center
Re: [HACKERS] What is happening on buildfarm member crake?
On Sat, Jan 25, 2014 at 5:04 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Yeah. If Robert's diagnosis is correct, and it sounds pretty plausible, then this is really just one instance of a bug that's probably pretty widespread in our signal handlers. Somebody needs to go through 'em all and look for touches of shared memory.

I haven't made a comprehensive study of every signal handler we have, but looking at procsignal_sigusr1_handler, the list of functions that can get called from here is quite short: CheckProcSignal(), RecoveryConflictInterrupt(), SetLatch(), and latch_sigusr1_handler(). Taking those in reverse order:

- latch_sigusr1_handler() is fine. Nothing down this path touches shared memory; moreover, if we've already disowned our latch, the waiting flag won't be set and this will do nothing at all.
- The call to SetLatch() is problematic as we already know. This is new code in 9.4.
- RecoveryConflictInterrupt() does nothing if proc_exit_inprogress is set. So it's fine.
- CheckProcSignal() also appears problematic. If we've already detached shared memory, MyProcSignalSlot will be pointing to garbage, but we'll try to dereference it anyway.

I think maybe the best fix is to *clear* MyProc in ProcKill/AuxiliaryProcKill and MyProcSignalSlot in CleanupProcSignalState, as shown in the attached patch. Most places that dereference those pointers already check that they aren't null, and we can easily add a NULL guard to the SetLatch() call in procsignal_sigusr1_handler, which the attached patch also does. This might not be a complete fix to every problem of this type that exists anywhere in our code, but I think it's enough to make the world safe for procsignal_sigusr1_handler. We also have a *large* number of signal handlers that do little more than this:

if (MyProc)
    SetLatch(&MyProc->procLatch);

...and this change should make all of those safe as well. So I think this is a pretty good start.
Assuming nobody objects too much to this basic approach, should I back-patch the parts of this that apply pre-9.4? The problem with CleanupProcSignalState, at least, goes all the way back to 9.0, when the signal-multiplexing infrastructure was introduced. But the probability of an actual crash must be pretty low, or I imagine we would have noticed this sooner. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c index 6ebabce..1372a7e 100644 --- a/src/backend/storage/ipc/procsignal.c +++ b/src/backend/storage/ipc/procsignal.c @@ -149,6 +149,8 @@ CleanupProcSignalState(int status, Datum arg) slot = ProcSignalSlots[pss_idx - 1]; Assert(slot == MyProcSignalSlot); + MyProcSignalSlot = NULL; + /* sanity check */ if (slot-pss_pid != MyProcPid) { diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c index ee6c24c..f64e1c4 100644 --- a/src/backend/storage/lmgr/proc.c +++ b/src/backend/storage/lmgr/proc.c @@ -772,6 +772,7 @@ ProcKill(int code, Datum arg) { /* use volatile pointer to prevent code rearrangement */ volatile PROC_HDR *procglobal = ProcGlobal; + PGPROC *proc; Assert(MyProc != NULL); @@ -796,31 +797,34 @@ ProcKill(int code, Datum arg) */ LWLockReleaseAll(); - /* Release ownership of the process's latch, too */ - DisownLatch(MyProc-procLatch); + /* + * Clear MyProc first; then disown the process latch. This is so that + * signal handlers won't try to clear the process latch after it's no + * longer ours. 
+	 */
+	proc = MyProc;
+	MyProc = NULL;
+	DisownLatch(&proc->procLatch);
 
 	SpinLockAcquire(ProcStructLock);
 
 	/* Return PGPROC structure (and semaphore) to appropriate freelist */
 	if (IsAnyAutoVacuumProcess())
 	{
-		MyProc->links.next = (SHM_QUEUE *) procglobal->autovacFreeProcs;
-		procglobal->autovacFreeProcs = MyProc;
+		proc->links.next = (SHM_QUEUE *) procglobal->autovacFreeProcs;
+		procglobal->autovacFreeProcs = proc;
 	}
 	else if (IsBackgroundWorker)
 	{
-		MyProc->links.next = (SHM_QUEUE *) procglobal->bgworkerFreeProcs;
-		procglobal->bgworkerFreeProcs = MyProc;
+		proc->links.next = (SHM_QUEUE *) procglobal->bgworkerFreeProcs;
+		procglobal->bgworkerFreeProcs = proc;
 	}
 	else
 	{
-		MyProc->links.next = (SHM_QUEUE *) procglobal->freeProcs;
-		procglobal->freeProcs = MyProc;
+		proc->links.next = (SHM_QUEUE *) procglobal->freeProcs;
+		procglobal->freeProcs = proc;
 	}
 
-	/* PGPROC struct isn't mine anymore */
-	MyProc = NULL;
-
 	/* Update shared estimate of spins_per_delay */
 	procglobal->spins_per_delay = update_spins_per_delay(procglobal->spins_per_delay);
@@ -849,6 +853,7 @@ AuxiliaryProcKill(int code, Datum arg)
 {
 	int			proctype = DatumGetInt32(arg);
 	PGPROC	   *auxproc PG_USED_FOR_ASSERTS_ONLY;
+	PGPROC	   *proc;
 
 	Assert(proctype >= 0 && proctype < NUM_AUXILIARY_PROCS);
@@ -859,16 +864,19 @@ AuxiliaryProcKill(int code, Datum arg)
 	/* Release any LW locks I am holding (see notes above) */
Re: [HACKERS] inherit support for foreign tables
Sent from my iPad

On 27-Jan-2014, at 21:03, David Fetter da...@fetter.org wrote: On Mon, Jan 27, 2014 at 05:06:19PM +0900, Etsuro Fujita wrote: Hi Hanada-san, While still reviewing this patch, I feel this patch has given enough consideration to interactions with other commands, but found the following incorrect? behavior: postgres=# CREATE TABLE product (id INTEGER, description TEXT); CREATE TABLE postgres=# CREATE FOREIGN TABLE product1 () INHERITS (product) SERVER fs OPTIONS (filename '/home/foo/product1.csv', format 'csv'); CREATE FOREIGN TABLE postgres=# ALTER TABLE product ALTER COLUMN description SET STORAGE EXTERNAL; ERROR: product1 is not a table or materialized view ISTM the ALTER TABLE simple recursion mechanism (ie ATSimpleRecursion()) should be modified for the ALTER COLUMN SET STORAGE case. This points to a larger discussion about what precisely foreign tables can and cannot inherit from local ones. I don't think that a generic solution will be satisfactory, as the PostgreSQL FDW could, at least in principle, support many more of these operations than the CSV FDW, as shown above. In my estimation, the outcome of the discussion above is not a blocker for this

I wonder what the behavior should be when the foreign table is on a server which does not support *all* SQL features. Does an FDW need to have the possible inherit options mentioned in its documentation for this patch?

Regards, Atri -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Retain dynamic shared memory segments for postmaster lifetime
On Mon, Jan 27, 2014 at 11:18 PM, Amit Langote amitlangot...@gmail.com wrote: On Mon, Jan 13, 2014 at 2:50 PM, Amit Kapila amit.kapil...@gmail.com wrote: I have extended test (contrib) module dsm_demo such that now user can specify during dsm_demo_create the lifespan of segment. The values it can accept are 0 or 1. Default value is 0. 0 -- means segment will be accessible for session life time 1 -- means segment will be accessible for postmaster life time The behaviour is as below: Test -1 (Session life time) Session - 1 -- here it will create segment for session lifetime select dsm_demo_create('this message is from session-1', 0); dsm_demo_create - 82712 Session - 2 - select dsm_demo_read(82712); dsm_demo_read this message is from session-1 (1 row) Session-1 \q Session-2 postgres=# select dsm_demo_read(82712); dsm_demo_read --- (1 row) Conclusion of Test-1 : As soon as session which has created segment finished, the segment becomes non-accessible. Test -2 (Postmaster life time) Session - 1 -- here it will create segment for postmaster lifetime select dsm_demo_create('this message is from session-1', 1); dsm_demo_create - 82712 Session - 2 - select dsm_demo_read(82712); dsm_demo_read this message is from session-1 (1 row) Session-1 \q Session-2 postgres=# select dsm_demo_read(82712); dsm_demo_read --- this message is from session-1 (1 row) Conclusion of Test-2 : a. Segment is accessible for postmaster lifetime. b. if user restart server, segment is not accessible. Applied dsm_keep_segment_v1.patch and dsm_demo_v1.patch. Got the following warning when I tried above example: postgres=# select dsm_demo_create('this message is from session-new', 1); WARNING: dynamic shared memory leak: segment 1402373971 still referenced WARNING: dynamic shared memory leak: segment 1402373971 still referenced dsm_demo_create - 1402373971 (1 row) I see that PrintDSMLeakWarning() which emits this warning is for debugging. 
-- Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] pg_ctl always uses the same event source
Amit Kapila amit.kapil...@gmail.com writes: On Fri, Jan 24, 2014 at 9:10 AM, Amit Kapila amit.kapil...@gmail.com wrote: On Fri, Jan 24, 2014 at 2:38 AM, Robert Haas robertmh...@gmail.com wrote: I mean, if there's a GUC that controls the event source name, then it can be changed between restarts, regardless of what you call it. Yes, but not default values (when the user doesn't provide any value for event_source). Here the question is about the default value of event_source. To proceed with the review of this patch, I need to know whether appending the version number or any other constant to the default Event Source name is acceptable or not; else for now we can remove this part of the code from the patch and handle the non-default case, where the change will be that pg_ctl will enquire the non-default event_source value from the server. Could you please let me know your views about the same? Unless I'm missing something, this entire thread is a tempest in a teapot, because the default event_source value does not matter, because *by default we don't log to eventlog*. The presumption is that if the user turns on logging to eventlog, it's his responsibility to first make sure that event_source is set to something appropriate. And who's to say that plain PostgreSQL isn't what he wanted, anyway? Even if he's got multiple servers on one machine, maybe directing all their logs to the same place is okay by him. Also, those who don't run multiple servers are probably not going to thank us for moving their logs around unnecessarily. In short, I think we should just reject this idea as introducing more problems than it solves, and not fully solving even the problem it purports to solve. Possibly there's room for a documentation patch reminding users to make sure that event_source is set appropriately before they turn on eventlog. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Race condition in b-tree page deletion
On 12/17/2013 04:55 AM, Jim Nasby wrote: On 11/9/13, 10:02 AM, Heikki Linnakangas wrote: 3. Another approach would be to get rid of the can't delete rightmost child limitation. We currently have that limitation because it ensures that we never need to change the high-key of a page. If we delete a page that is the rightmost child of its parent, we transfer the deleted keyspace from the parent page to its right sibling. To do that, we need to update the high key of the parent, as well as the downlink of the right sibling at the grandparent level. That's a bit complicated, because updating the high key might require splitting the page. Is the rightmost child issue likely to affect indexes on increasing values, like a queue table? No. In a FIFO queue, you keep adding new items to the right-end of the index, so old pages become non-rightmost fairly quickly. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Standalone synchronous master
On Sun, Jan 26, 2014 at 10:56 PM, Rajeev rastogi rajeev.rast...@huawei.com wrote: On 01/25/2014, Josh Berkus wrote: ISTM the consensus is that we need better monitoring/administration interfaces so that people can script the behavior they want in external tools. Also, a new synchronous apply replication mode would be handy, but that'd be a whole different patch. We don't have a patch on the table that we could consider committing any time soon, so I'm going to mark this as rejected in the commitfest app. I don't feel that we'll never do auto-degrade is determinative; several hackers were for auto-degrade, and they have a good use-case argument. However, we do have consensus that we need more scaffolding than this patch supplies in order to make auto-degrade *safe*. I encourage the submitter to resubmit an improved version of this patch (one with more monitorability) for 9.5 CF1. That'll give us a whole dev cycle to argue about it. I shall rework to improve this patch. Below is a summary of all the discussions, which will be used as input for improving the patch:

1. Method of degrading the synchronous mode:
   a. Expose the configuration variable to a new SQL-callable function.
   b. Using ALTER SYSTEM SET.
   c. Auto-degrade using some sort of configuration parameter, as done in the current patch.
   d. Or maybe a combination of the above, which the DBA can use depending on their use-cases.
   We can discuss further to decide on one of these approaches.
2. Synchronous mode should be upgraded/restored after at least one synchronous standby comes up and has caught up with the master.
3. A better monitoring/administration interface, which can be even better if it is made a generic trap system. I shall propose a better approach for this.
4. Send committing clients a WARNING if they have committed a synchronous transaction and we are in degraded mode.
5. Please add more if I am missing something.
All of those things have been mentioned, but I'm not sure we have consensus on which of them we actually want to do, or how. Figuring that out seems like the next step. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch (v2) for updatable security barrier views
Simon Riggs si...@2ndquadrant.com writes: Am I right in thinking that we have this fully working now? I will look at this at some point during the CF, but have not yet, and probably won't as long as it's not marked ready for committer. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch (v2) for updatable security barrier views
On 27 January 2014 15:04, Dean Rasheed dean.a.rash...@gmail.com wrote: So for example, when planning the query to update an inheritance child, the rtable will contain an RTE for the parent, but it will not be referenced in the parse tree, and so it will not be expanded while planning the child update. Am I right in thinking that we have this fully working now? If we commit this aspect soon, we stand a chance of also touching upon RLS. AFAICS the only area of objection is the handling of inherited relations, which occurs within the planner in the current patch. I can see that would be a cause for concern since the planner is pluggable and it would then be possible to bypass security checks. Obviously installing a new planner isn't trivial, but doing so shouldn't cause collateral damage. We have long had restrictions around updateable views. My suggestion from here is that we accept the restriction that we cannot yet have the 3-way combination of updateable views, security views and views on inherited tables. Most people aren't using inherited tables and people that are have special measures in place for their apps. We won't lose much by accepting that restriction for 9.4 and re-addressing the issue in a later release. We need not adopt an all or nothing approach. Perhaps we might yet find a solution for 9.4, but again, that need not delay the rest of the patch. From a review perspective, I'd want to see some greatly expanded README comments, but given the Wiki entry, I think we can do that quickly. Other than that, the code seems clear, modular and well tested, so is something I could see me committing the uncontentious parts of. Thoughts? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] dynamic shared memory and locks
On Thu, Jan 23, 2014 at 11:10 AM, Robert Haas robertmh...@gmail.com wrote: On Wed, Jan 22, 2014 at 12:42 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-01-22 12:40:34 -0500, Robert Haas wrote: On Wed, Jan 22, 2014 at 12:11 PM, Tom Lane t...@sss.pgh.pa.us wrote: Andres Freund and...@2ndquadrant.com writes: Shouldn't we introduce a typedef LWLock* LWLockid; or something to avoid breaking external code using lwlocks? +1, in fact there's probably no reason to touch most *internal* code using that type name either. I thought about this but figured it was too much of a misnomer to refer to a pointer as an ID. But, if we're sure we want to go that route, I can go revise the patch along those lines. I personally don't care either way for internal code as long as external code continues to work. There's the argument of making the commit better readable by having less noise and less divergence in the branches and there's your argument of that being less clear. OK, well then, if no one objects violently, I'll stick my current approach of getting rid of all core mentions of LWLockId in favor of LWLock *, but also add typedef LWLock *LWLockId with a comment that this is to minimize breakage of third-party code. Hearing no objections, violent or otherwise, I've done that, made the other adjustments suggested by Andres and KaiGai, and committed this. Let's see what the buildfarm thinks... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
KaiGai Kohei, * Kouhei Kaigai (kai...@ak.jp.nec.com) wrote: Is somebody available to volunteer to review the custom-scan patch? I looked through it a bit and my first take away from it was that the patches to actually use the new hooks were also making more changes to the backend code, leaving me with the impression that the proposed interface isn't terribly stable. Perhaps those changes should have just been in the first patch, but they weren't and that certainly gave me pause. I'm also not entirely convinced that this is the direction to go in when it comes to pushing down joins to FDWs. While that's certainly a goal that I think we all share, this seems to be intending to add a completely different feature which happens to be able to be used for that. For FDWs, wouldn't we only present the FDW with the paths where the foreign tables for that FDW, or perhaps just a given foreign server, are being joined? Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Retain dynamic shared memory segments for postmaster lifetime
On Mon, Jan 13, 2014 at 2:50 PM, Amit Kapila amit.kapil...@gmail.com wrote: I have extended test (contrib) module dsm_demo such that now user can specify during dsm_demo_create the lifespan of segment. The values it can accept are 0 or 1. Default value is 0. 0 -- means segment will be accessible for session life time 1 -- means segment will be accessible for postmaster life time The behaviour is as below: Test -1 (Session life time) Session - 1 -- here it will create segment for session lifetime select dsm_demo_create('this message is from session-1', 0); dsm_demo_create - 82712 Session - 2 - select dsm_demo_read(82712); dsm_demo_read this message is from session-1 (1 row) Session-1 \q Session-2 postgres=# select dsm_demo_read(82712); dsm_demo_read --- (1 row) Conclusion of Test-1 : As soon as session which has created segment finished, the segment becomes non-accessible. Test -2 (Postmaster life time) Session - 1 -- here it will create segment for postmaster lifetime select dsm_demo_create('this message is from session-1', 1); dsm_demo_create - 82712 Session - 2 - select dsm_demo_read(82712); dsm_demo_read this message is from session-1 (1 row) Session-1 \q Session-2 postgres=# select dsm_demo_read(82712); dsm_demo_read --- this message is from session-1 (1 row) Conclusion of Test-2 : a. Segment is accessible for postmaster lifetime. b. if user restart server, segment is not accessible. Applied dsm_keep_segment_v1.patch and dsm_demo_v1.patch. Got the following warning when I tried above example: postgres=# select dsm_demo_create('this message is from session-new', 1); WARNING: dynamic shared memory leak: segment 1402373971 still referenced WARNING: dynamic shared memory leak: segment 1402373971 still referenced dsm_demo_create - 1402373971 (1 row) -- Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] shouldn't we log permission errors when accessing the configured trigger file?
On Mon, Jan 27, 2014 at 3:43 PM, Robert Haas robertmh...@gmail.com wrote: On Sun, Jan 26, 2014 at 1:03 PM, Andres Freund and...@2ndquadrant.com wrote: For some reason CheckForStandbyTrigger() doesn't report permission errors when stat()ing the trigger file. Shouldn't we fix that?

static bool
CheckForStandbyTrigger(void)
{
	...
	if (stat(TriggerFile, &stat_buf) == 0)
	{
		ereport(LOG,
				(errmsg("trigger file found: %s", TriggerFile)));
		unlink(TriggerFile);
		triggered = true;
		fast_promote = true;
		return true;
	}

Imo the stat() should warn about all errors but ENOENT? Seems reasonable. It could lead to quite a bit of log spam, I suppose, but the way things are now could be pretty mystifying if you've located your trigger file somewhere outside $PGDATA, and a parent directory is lacking permissions. +1. Since it actually indicates something that's quite broken (since with that you can never make the trigger work until you fix it), the log spam seems like it would be appropriate. (Logspam is never nice, but a single log line is also very easy to miss - this should log enough that you wouldn't) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 01/27/2014 08:48 AM, Mitsumasa KONDO wrote: The issue of concern is not the performance of pg_stat_statements, AIUI. The issue is whether this patch affects performance generally, i.e. is there a significant cost in collecting these extra stats. To test this you would compare two general pgbench runs, one with the patch applied and one without. I showed a first test result, compared against runs without pg_stat_statements and without the patch, the other day. They ran on the same server with the same benchmark settings (clients and scale factor) as today's result. When you merge and see the results, you can confirm that my patch does not affect performance. Yeah, sorry, I misread your message. I think this is good to go from a performance point of view, although I'm still a bit worried about the validity of the method (accumulating a sum of squares). OTOH, Welford's method probably requires slightly more per-statement overhead, and certainly requires one more accumulator per statement. I guess if we find ill conditioned results it wouldn't be terribly hard to change. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] shouldn't we log permission errors when accessing the configured trigger file?
On Sun, Jan 26, 2014 at 1:03 PM, Andres Freund and...@2ndquadrant.com wrote: For some reason CheckForStandbyTrigger() doesn't report permission errors when stat()ing the trigger file. Shouldn't we fix that?

static bool
CheckForStandbyTrigger(void)
{
	...
	if (stat(TriggerFile, &stat_buf) == 0)
	{
		ereport(LOG,
				(errmsg("trigger file found: %s", TriggerFile)));
		unlink(TriggerFile);
		triggered = true;
		fast_promote = true;
		return true;
	}

Imo the stat() should warn about all errors but ENOENT? Seems reasonable. It could lead to quite a bit of log spam, I suppose, but the way things are now could be pretty mystifying if you've located your trigger file somewhere outside $PGDATA, and a parent directory is lacking permissions. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch (v2) for updatable security barrier views
On 27 January 2014 07:54, Kouhei Kaigai kai...@ak.jp.nec.com wrote: Hello, I checked the latest updatable security barrier view patch. Even though I couldn't find a major design problem in this revision, here are two minor comments below. I think it needs to be reviewed by a committer to settle the direction for implementing this feature. Of course, even though I know Tom argued against the current design of this feature up-thread, it does not seem to me that Dean's design is unreasonable.

Thanks for looking at this.

Below are my minor comments:

@@ -932,9 +938,32 @@ inheritance_planner(PlannerInfo *root)
 		if (final_rtable == NIL)
 			final_rtable = subroot.parse->rtable;
 		else
-			final_rtable = list_concat(final_rtable,
+		{
+			List	   *tmp_rtable = NIL;
+			ListCell   *cell1, *cell2;
+
+			/*
+			 * Planning this new child may have turned some of the original
+			 * RTEs into subqueries (if they had security barrier quals).  If
+			 * so, we want to use these in the final rtable.
+			 */
+			forboth(cell1, final_rtable, cell2, subroot.parse->rtable)
+			{
+				RangeTblEntry *rte1 = (RangeTblEntry *) lfirst(cell1);
+				RangeTblEntry *rte2 = (RangeTblEntry *) lfirst(cell2);
+
+				if (rte1->rtekind == RTE_RELATION &&
+					rte1->securityQuals != NIL &&
+					rte2->rtekind == RTE_SUBQUERY)
+					tmp_rtable = lappend(tmp_rtable, rte2);
+				else
+					tmp_rtable = lappend(tmp_rtable, rte1);
+			}

Do we have a case where rte1 is a regular relation with securityQuals but rte2 is not a sub-query? If so, rte2->rtekind == RTE_SUBQUERY should be a condition in an Assert, rather than the third condition in the if-block.

Yes, it is possible for rte1 to be an RTE_RELATION with securityQuals and rte2 to be something other than an RTE_SUBQUERY, because the subquery expansion code in expand_security_quals() only expands RTEs that are actually used in the query. So for example, when planning the query to update an inheritance child, the rtable will contain an RTE for the parent, but it will not be referenced in the parse tree, and so it will not be expanded while planning the child update.
When a sub-query is simple enough (no qualifier and no projection on top of the underlying scan), it is pulled up even if it has the security-barrier attribute, isn't it? See the example below. The view v2 is defined as follows.

postgres=# CREATE VIEW v2 WITH (security_barrier) AS SELECT * FROM t2 WHERE x % 10 = 5;
CREATE VIEW
postgres=# EXPLAIN SELECT * FROM v2 WHERE f_leak(z);
                         QUERY PLAN
------------------------------------------------------------
 Subquery Scan on v2  (cost=0.00..3.76 rows=1 width=41)
   Filter: f_leak(v2.z)
   ->  Seq Scan on t2  (cost=0.00..3.50 rows=1 width=41)
         Filter: ((x % 10) = 5)
(4 rows)

postgres=# EXPLAIN SELECT * FROM v2;
                      QUERY PLAN
------------------------------------------------------
 Seq Scan on t2  (cost=0.00..3.50 rows=1 width=41)
   Filter: ((x % 10) = 5)
(2 rows)

The second explain result shows that the underlying sub-query is pulled up even though it has the security-barrier attribute. (IIRC, it was a new feature in v9.3.)

Actually what happens is that it is planned as a subquery scan, then at the very end of the planning process, it detects that the subquery scan is trivial (has no quals, and has a no-op targetlist) and it removes that plan node --- see set_subqueryscan_references() and trivial_subqueryscan(). That subquery scan removal code requires the targetlist to have exactly the same number of attributes, in exactly the same order as the underlying relation. As soon as you add anything non-trivial to the select list in the above queries, or even just change the order of its attributes, the subquery scan node is no longer removed.

On the other hand, this kind of optimization was not applied to a sub-query being extracted from a relation with securityQuals:

postgres=# EXPLAIN UPDATE v2 SET z = z;
                            QUERY PLAN
------------------------------------------------------------------
 Update on t2 t2_1  (cost=0.00..3.51 rows=1 width=47)
   ->  Subquery Scan on t2  (cost=0.00..3.51 rows=1 width=47)
         ->  Seq Scan on t2 t2_2  (cost=0.00..3.50 rows=1 width=47)
               Filter: ((x % 10) = 5)
(4 rows)

If it has no security_barrier option, the view reference is expanded in the rewriter stage, and it is pulled up as we expected.
postgres=# ALTER VIEW v2 RESET (security_barrier); ALTER VIEW postgres=# EXPLAIN UPDATE t2 SET z = z; QUERY PLAN --- Update on t2 (cost=0.00..3.00 rows=100 width=47) - Seq Scan on t2 (cost=0.00..3.00 rows=100 width=47) (2 rows) Probably, it
Re: [HACKERS] inherit support for foreign tables
On Mon, Jan 27, 2014 at 05:06:19PM +0900, Etsuro Fujita wrote: Hi Hanada-san, While still reviewing this patch, I feel this patch has given enough consideration to interactions with other commands, but found the following incorrect? behavior: postgres=# CREATE TABLE product (id INTEGER, description TEXT); CREATE TABLE postgres=# CREATE FOREIGN TABLE product1 () INHERITS (product) SERVER fs OPTIONS (filename '/home/foo/product1.csv', format 'csv'); CREATE FOREIGN TABLE postgres=# ALTER TABLE product ALTER COLUMN description SET STORAGE EXTERNAL; ERROR: product1 is not a table or materialized view ISTM the ALTER TABLE simple recursion mechanism (ie ATSimpleRecursion()) should be modified for the ALTER COLUMN SET STORAGE case. This points to a larger discussion about what precisely foreign tables can and cannot inherit from local ones. I don't think that a generic solution will be satisfactory, as the PostgreSQL FDW could, at least in principle, support many more of these operations than the CSV FDW, as shown above. In my estimation, the outcome of the discussion above is not a blocker for this patch. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] pg_ctl always uses the same event source
On Fri, Jan 24, 2014 at 9:10 AM, Amit Kapila amit.kapil...@gmail.com wrote: On Fri, Jan 24, 2014 at 2:38 AM, Robert Haas robertmh...@gmail.com wrote: On Thu, Jan 23, 2014 at 9:23 AM, Amit Kapila amit.kapil...@gmail.com wrote: On Thu, Jan 23, 2014 at 10:26 AM, Tom Lane t...@sss.pgh.pa.us wrote: I think what we might want to do is redefine the server's behavior as creating an event named after the concatenation of event_source and port number, or maybe even get rid of event_source entirely and just say it's PostgreSQL followed by the port number. To accomplish this behaviour, each time the server starts and stops, we need to register and unregister the event log using the mechanism described at the link below, to ensure that there is no mismatch between what the server uses and what the OS knows. http://www.postgresql.org/docs/devel/static/event-log-registration.html Why wouldn't that be necessary with your approach, too? Because in my approach we are using a compile-time constant: + #define DEFAULT_EVENT_SOURCE "PostgreSQL " PG_MAJORVERSION I mean, if there's a GUC that controls the event source name, then it can be changed between restarts, regardless of what you call it. Yes, but not default values (when the user doesn't provide any value for event_source). Here the question is about the default value of event_source. To proceed with the review of this patch, I need to know whether appending the version number or any other constant to the default Event Source name is acceptable or not, else for now we can remove this part of the code from the patch and handle the non-default case, where the change will be that pg_ctl will enquire the non-default event_source value from the server. Could you please let me know your views about the same? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GIN improvements part2: fast scan
On Sun, Jan 26, 2014 at 8:14 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: In addition to that, I'm using the ternary consistent function to check if minItem is a match, even if we haven't loaded all the entries yet. That's less important, but I think for something like rare1 | (rare2 & frequent) it might be useful. It would allow us to skip fetching 'frequent', when we already know that 'rare1' matches for the current item. I'm not sure if that's worth the cycles, but it seemed like an obvious thing to do, now that we have the ternary consistent function. So, that clearly isn't worth the cycles :-). At least not with an expensive consistent function; it might be worthwhile if we pre-build the truth-table, or cache the results of the consistent function. I believe caching consistent-function results is essentially the same thing as a lazily-built truth table. I think it's a good option to use with the two-state consistent function. However, I don't think it's a reason to reject the three-state consistent function, because the number of entries could be large. -- With best regards, Alexander Korotkov.
Re: [HACKERS] WIP patch (v2) for updatable security barrier views
On 27 January 2014 16:11, Tom Lane t...@sss.pgh.pa.us wrote: Simon Riggs si...@2ndquadrant.com writes: Am I right in thinking that we have this fully working now? I will look at this at some point during the CF, but have not yet, and probably won't as long as it's not marked ready for committer. I've marked it Ready for Committer, to indicate my personal opinion. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Fwd: Request for error explaination || Adding a new integer in indextupleData Structure
Hi All, I was trying to modify the IndexTupleData structure by adding an integer variable, but I faced an error message: psql: FATAL: could not find tuple for opclass 10032. Could anyone please help me in resolving this issue? Regards, Rohit Goyal -- Regards, Rohit Goyal
Re: [HACKERS] A better way than tweaking NTUP_PER_BUCKET
* Simon Riggs (si...@2ndquadrant.com) wrote: I don't see anything for 9.4 in here now. Attached is what I was toying with (thought I had attached it previously somewhere.. perhaps not), but in re-testing, it doesn't appear to do enough to move things in the right direction in all cases. I did play with this a fair bit yesterday and while it improved some cases by 20% (eg: a simple join between pgbench_accounts and pgbench_history), when we decide to *still* hash the larger side (as in my 'test_case2.sql'), it can cause a similarly-sized decrease in performance. Of course, if we can push that case to hash the smaller side (which I did by hand with cpu_tuple_cost), then it goes back to being a win to use a larger number of buckets. I definitely feel that there's room for improvement here but it's not an easily done thing, unfortunately. To be honest, I was pretty surprised when I saw that the larger number of buckets performed worse, even if it was when we picked the wrong side to hash, and I plan to look into that more closely to try and understand what's happening. My first guess would be what Tom had mentioned over the summer- if the size of the bucket array ends up being larger than the CPU cache, we can end up paying a great deal more to build the hash table than it costs to scan through the deeper buckets that we end up with as a result (particularly when we're scanning a smaller table). Of course, choosing to hash the larger table makes that more likely.. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] A better way than tweaking NTUP_PER_BUCKET
* Stephen Frost (sfr...@snowman.net) wrote: * Simon Riggs (si...@2ndquadrant.com) wrote: I don't see anything for 9.4 in here now. Attached [...] I'm apparently bad at this 'attaching' thing, particularly on this subject. Here it is. Thanks, Stephen

colordiff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
new file mode 100644
index 6a2f236..a8a8168
*** a/src/backend/executor/nodeHash.c
--- b/src/backend/executor/nodeHash.c
*** ExecChooseHashTableSize(double ntuples,
*** 489,496
  /* We expect the hashtable to fit in memory */
  double dbuckets;
! dbuckets = ceil(ntuples / NTUP_PER_BUCKET);
! dbuckets = Min(dbuckets, max_pointers);
  nbuckets = (int) dbuckets;
  nbatch = 1;
--- 489,495
  /* We expect the hashtable to fit in memory */
  double dbuckets;
! dbuckets = Min(ntuples, max_pointers);
  nbuckets = (int) dbuckets;
  nbatch = 1;

signature.asc Description: Digital signature
Re: [HACKERS] Fwd: Request for error explaination || Adding a new integer in indextupleData Structure
Rohit Goyal rhtgyl...@gmail.com writes: Hi All, I was trying to modify indextupledata structure by adding an integer variable. ButI faced an error message psql: FATAL: could not find tuple for opclass 10032. Could anyone please help me in resolving this issue. You broke a system catalog index. Without seeing what you changed and where, it's impossible to say just how, but that's the bottom line. In recent versions of PG, opclass 10032 is btree name_ops (unless you've also added/removed system catalog entries), which is a pretty plausible thing to be one of the first indexscanned fetches during relcache.c initialization, so I don't think there's any great significance in this particular error message. It's likely that you broke *all* indexscans not just one specific one. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] pg_ctl stop times out when it should respond quickly
On Tuesday 7 January 2014 17:05:03, Michael Paquier wrote: On Sun, Jan 5, 2014 at 3:49 PM, MauMau maumau...@gmail.com wrote: Could you confirm again and tell me what problem is happening? FWIW, I just quickly tested those two patches independently and got them correctly applied with patch -p1 $PATCH on master at edc4345. They compiled and passed make check as well. Regards, Both patches apply cleanly, I'll focus on the pg_stop_fail_v3 patch. Tests are running correctly. The bug this patch is supposed to fix has been reproduced on current HEAD, and cannot be reproduced using the patch. Previous concerns about using both get_pgpid and postmaster_is_alive are addressed. There are no regression tests covering this bugfix, although I don't know if it would be practical to implement them. -- Ronan Dunklau http://dalibo.com - http://dalibo.org signature.asc Description: This is a digitally signed message part.
Re: [HACKERS] A better way than tweaking NTUP_PER_BUCKET
2014-01-27 Stephen Frost sfr...@snowman.net * Simon Riggs (si...@2ndquadrant.com) wrote: I don't see anything for 9.4 in here now. Attached is what I was toying with (thought I had attached it previously somewhere.. perhaps not), but in re-testing, it doesn't appear to do enough to move things in the right direction in all cases. I did play with this a fair bit yesterday and while it improved some cases by 20% (eg: a simple join between pgbench_accounts and pgbench_history), when we decide to *still* hash the larger side (as in my 'test_case2.sql'), it can cause a similarly-sized decrease in performance. Of course, if we can push that case to hash the smaller side (which I did by hand with cpu_tuple_cost), then it goes back to being a win to use a larger number of buckets. I definitely feel that there's room for improvement here but it's not an easily done thing, unfortunately. To be honest, I was pretty surprised when I saw that the larger number of buckets performed worse, even if it was when we picked the wrong side to hash, and I plan to look into that more closely to try and understand what's happening. My first guess would be what Tom had mentioned over the summer: if the size of the bucket array ends up being larger than the CPU cache, we can end up paying a great deal more to build the hash table than it costs to scan through the deeper buckets that we end up with as a result (particularly when we're scanning a smaller table). Of course, choosing to hash the larger table makes that more likely.. This topic is interesting - we found very bad performance with hashing large tables with high work_mem. MergeJoin with quicksort was significantly faster. I didn't do deeper research - there is a possibility of virtualization overhead. Regards Pavel Thanks, Stephen
Re: [HACKERS] new json funcs
On Fri, Jan 24, 2014 at 3:26 PM, Josh Berkus j...@agliodbs.com wrote: On 01/24/2014 12:59 PM, Andrew Dunstan wrote: On 01/24/2014 03:40 PM, Laurence Rowe wrote: For consistency with the existing json functions (json_each, json_each_text, etc.) it might be better to add separate json_to_record_text and json_to_recordset_text functions in place of the nested_as_text parameter to json_to_record and json_to_recordset. It wouldn't be consistent with json_populate_record() and json_populate_recordset(), the two closest relatives, however. And yes, I appreciate that we have not been 100% consistent. Community design can be a bit messy that way. FWIW, I prefer the parameter to having differently named functions. +1. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add %z support to elog/ereport?
I wrote: the idea that we might get many dozen such warnings on more-current compilers is scarier, as that might well interfere with people's ability to do development on, say, Windows. Could somebody check whether MSVC for instance complains about format strings using z? Or shall I just commit this and we'll see what the buildfarm says? Given the lack of response, I've pushed the patch, and will canvass the buildfarm results later. Just to follow up on that, I couldn't find any related warnings in the buildfarm this morning (although there is one laggard machine, anole, which uses an unusual compiler and still hasn't reported in). regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] new json funcs
On 01/27/2014 12:43 PM, Merlin Moncure wrote: On Fri, Jan 24, 2014 at 3:26 PM, Josh Berkus j...@agliodbs.com wrote: On 01/24/2014 12:59 PM, Andrew Dunstan wrote: On 01/24/2014 03:40 PM, Laurence Rowe wrote: For consistency with the existing json functions (json_each, json_each_text, etc.) it might be better to add separate json_to_record_text and json_to_recordset_text functions in place of the nested_as_text parameter to json_to_record and json_to_recordset. It wouldn't be consistent with json_populate_record() and json_populate_recordset(), the two closest relatives, however. And yes, I appreciate that we have not been 100% consistent. Community design can be a bit messy that way. FWIW, I prefer the parameter to having differently named functions. +1. Note that we can only do this when the result type stays the same. It does not for json_each/json_each_text or json_extract_path/json_extract_path_text, which is why we have different functions for those cases. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe
On 24 January 2014 08:33, Simon Riggs si...@2ndquadrant.com wrote: On 24 January 2014 07:08, Peter Geoghegan p...@heroku.com wrote: On Wed, Jan 15, 2014 at 6:57 AM, Simon Riggs si...@2ndquadrant.com wrote: v15 to fix the above problem. v16 attached v17 attached This version adds a GUC called ddl_exclusive_locks which allows us to keep the 9.3 behaviour if we wish it. Some people may be surprised that their programs don't wait in the same places they used to. We hope that is a positive and useful behaviour, but it may not always be so. I'll commit this on Thurs 30 Jan unless I hear objections. Thanks -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Planning time in explain/explain analyze
On 01/13/2014 09:48 PM, Robert Haas wrote: What I'm saying is that if EXPLAIN reports something that's labelled Planning Time, it should *be* the planning time, and not anything else. When we retrieve a plan from cache, it would be sensible not to report the planning time at all, and IMHO it would also be sensible to report the time it actually took to plan whenever we originally did it. But reporting a value that is not the planning time and calling it the planning time does not seem like a good idea to me. Here is a patch which only prints the Planning time when a prepared statement actually planned a query. I do not really like how I check whether it was replanned, but I tried to avoid making changes in plancache.c. Does this idea look ok? -- Andreas Karlsson diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml new file mode 100644 index 2af1738..482490b *** a/doc/src/sgml/perform.sgml --- b/doc/src/sgml/perform.sgml *** EXPLAIN SELECT * FROM tenk1; *** 89,94 --- 89,95 QUERY PLAN - Seq Scan on tenk1 (cost=0.00..458.00 rows=1 width=244) + Planning time: 0.113 ms /screen /para *** EXPLAIN SELECT * FROM tenk1; *** 162,167 --- 163,174 /para para + The literalPlanning time/literal shown is the time it took to generate + the query plan from the parsed query and optimize it. It does not include + rewriting and parsing. 
+/para + +para Returning to our example: screen *** EXPLAIN SELECT * FROM tenk1; *** 170,175 --- 177,183 QUERY PLAN - Seq Scan on tenk1 (cost=0.00..458.00 rows=1 width=244) + Planning time: 0.113 ms /screen /para *** EXPLAIN SELECT * FROM tenk1 WHERE unique *** 198,203 --- 206,212 Seq Scan on tenk1 (cost=0.00..483.00 rows=7001 width=244) Filter: (unique1 lt; 7000) + Planning time: 0.104 ms /screen Notice that the commandEXPLAIN/ output shows the literalWHERE/ *** EXPLAIN SELECT * FROM tenk1 WHERE unique *** 234,239 --- 243,249 Recheck Cond: (unique1 lt; 100) -gt; Bitmap Index Scan on tenk1_unique1 (cost=0.00..5.04 rows=101 width=0) Index Cond: (unique1 lt; 100) + Planning time: 0.093 ms /screen Here the planner has decided to use a two-step plan: the child plan *** EXPLAIN SELECT * FROM tenk1 WHERE unique *** 262,267 --- 272,278 Filter: (stringu1 = 'xxx'::name) -gt; Bitmap Index Scan on tenk1_unique1 (cost=0.00..5.04 rows=101 width=0) Index Cond: (unique1 lt; 100) + Planning time: 0.089 ms /screen The added condition literalstringu1 = 'xxx'/literal reduces the *** EXPLAIN SELECT * FROM tenk1 WHERE unique *** 283,288 --- 294,300 - Index Scan using tenk1_unique1 on tenk1 (cost=0.29..8.30 rows=1 width=244) Index Cond: (unique1 = 42) + Planning time: 0.076 ms /screen In this type of plan the table rows are fetched in index order, which *** EXPLAIN SELECT * FROM tenk1 WHERE unique *** 311,316 --- 323,329 Index Cond: (unique1 lt; 100) -gt; Bitmap Index Scan on tenk1_unique2 (cost=0.00..19.78 rows=999 width=0) Index Cond: (unique2 gt; 9000) + Planning time: 0.094 ms /screen But this requires visiting both indexes, so it's not necessarily a win *** EXPLAIN SELECT * FROM tenk1 WHERE unique *** 331,336 --- 344,350 -gt; Index Scan using tenk1_unique2 on tenk1 (cost=0.29..71.27 rows=10 width=244) Index Cond: (unique2 gt; 9000) Filter: (unique1 lt; 100) + Planning time: 0.087 ms /screen /para *** WHERE t1.unique1 lt; 10 AND t1.unique2 *** 364,369 --- 378,384 Index Cond: 
(unique1 lt; 10) -gt; Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.29..7.91 rows=1 width=244) Index Cond: (unique2 = t1.unique2) + Planning time: 0.117 ms /screen /para *** WHERE t1.unique1 lt; 10 AND t2.unique2 *** 415,420 --- 430,436 -gt; Materialize (cost=0.29..8.51 rows=10 width=244) -gt; Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.29..8.46 rows=10 width=244) Index Cond: (unique2 lt; 10) + Planning time: 0.119 ms /screen The condition literalt1.hundred lt; t2.hundred/literal can't be *** WHERE t1.unique1 lt; 100 AND t1.unique2 *** 462,467 --- 478,484 Recheck Cond: (unique1 lt; 100) -gt; Bitmap Index Scan on
Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe
On 27 January 2014 17:58, Simon Riggs si...@2ndquadrant.com wrote: On 24 January 2014 08:33, Simon Riggs si...@2ndquadrant.com wrote: On 24 January 2014 07:08, Peter Geoghegan p...@heroku.com wrote: On Wed, Jan 15, 2014 at 6:57 AM, Simon Riggs si...@2ndquadrant.com wrote: v15 to fix the above problem. v16 attached v17 attached Frostbite... -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services reduce_lock_levels.v17.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] A better way than tweaking NTUP_PER_BUCKET
On 27 January 2014 17:44, Pavel Stehule pavel.steh...@gmail.com wrote: This topic is interesting - we found very bad performance with hashing large tables with high work_mem. MergeJoin with quicksort was significantly faster. I've seen this also. I didn't deeper research - there is a possibility of virtualization overhead. I took measurements and the effect was repeatable and happened for all sizes of work_mem, but nothing more to add. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun
On 01/23/2014 11:36 PM, Peter Geoghegan wrote: The first thing I noticed about this patchset is that it completely expunges btree_xlog_startup(), btree_xlog_cleanup() and btree_safe_restartpoint(). The post-recovery cleanup that previously occurred to address both sets of problems (the problem addressed by this patch, incomplete page splits, and the problem addressed by the dependency patch, incomplete page deletions) now both occur opportunistically/lazily. Notably, this means that there are now exactly zero entries in the resource manager list with a 'restartpoint' callback. I guess we should consider removing it entirely for that reason. As you said in the commit message where gin_safe_restartpoint was similarly retired (commit 631118fe): The post-recovery cleanup mechanism was never totally reliable, as insertion to the parent could fail e.g because of running out of memory or disk space, leaving the tree in an inconsistent state. I'm of the general impression that post-recovery cleanup is questionable. I'm surprised that you didn't mention this commit previously. You just mentioned the original analogous work on GiST, but this certainly seems to be something you've been thinking about *systematically* eliminating for a while. Yes, that's correct, these b-tree patches are part of a grand plan to eliminate post-recovery cleanup actions altogether. So while post-recovery callbacks no longer exist for any rmgr-managed-resource, 100% of remaining startup and cleanup callbacks concern the simple management of memory of AM-specific recovery contexts (for GiST, GiN and SP-GiST). I have to wonder if there isn't a better abstraction than that, such as a generic recovery memory context, allowing you to retire all 3 callbacks. I mean, StartupXLOG() just calls those callbacks for each resource at exactly the same time anyway, just as it initializes resource managers in precisely the same manner earlier on. 
Plus if you look at what those AM-local memory management routines do, it all seems very simple. I think you should bump XLOG_PAGE_MAGIC. Perhaps you should increase the elevel here: + elog(DEBUG1, finishing incomplete split of %u/%u, +BufferGetBlockNumber(lbuf), BufferGetBlockNumber(rbuf)); Since this is a very rare occurrence that involves some subtle interactions, I can't think why you wouldn't want to LOG this for forensic purposes. Hmm. I'm not sure I agree with that line of thinking in general - what is an admin supposed to do about the message? But there's a precedent in vacuumlazy.c, which logs a message when it finds all-zero pages in the heap. So I guess that should be a LOG. Why did you remove the local linkage of _bt_walk_left(), given that it is called in exactly the same way as before? I guess that that is just a vestige of some earlier revision. Yeah, reverted. I think I see some bugs in _bt_moveright(). If you examine _bt_finish_split() in detail, you'll see that it doesn't just drop the write buffer lock that the caller will have provided (per its comments) - it also drops the buffer pin. It calls _bt_insert_parent() last, which was previously only called last thing by _bt_insertonpg() (some of the time), and _bt_insertonpg() is indeed documented to drop both the lock and the pin. And if you look at _bt_moveright() again, you'll see that there is a tacit assumption that the buffer pin isn't dropped, or at least that opaque (the BTPageOpaque BT special page area pointer) continues to point to the same page held in the same buffer, even though the code doesn't set the opaque' pointer again and it may not point to a pinned buffer or even the appropriate buffer. Ditto page. So opaque may point to the wrong thing on subsequent iterations - you don't do anything with the return value of _bt_getbuf(), unlike all of the existing call sites. 
I believe you should store its return value, and get the held page and the special pointer on the page, and assign that to the opaque pointer before the next iteration (an iteration in which you very probably really do make progress not limited to completing a page split, the occurrence of which the _bt_moveright() loop gets stuck on). You know, do what is done in the non-split-handling case. It may not always be the same buffer as before, obviously. Yep, fixed. Do you suppose that there are similar problems in other direct callers of _bt_finish_split()? I'm thinking in particular of _bt_findinsertloc(). Hmm, no, the other callers look OK to me. I'm also not sure about the lock escalation that may occur within _bt_moveright() for callers with BT_READ access - there is nothing to prevent a caller from ending up being left with a write lock where before they only had a read lock if they find that !P_INCOMPLETE_SPLIT() with the write lock (after lock promotion) and conclude that there is no split finishing to be done after all. It looks like all callers currently won't care about that, but it seems like that
Re: [HACKERS] alternative back-end block formats
Hi Craig, On Sun, Jan 26, 2014 at 5:47 AM, Craig Ringer cr...@2ndquadrant.com wrote: On 01/21/2014 07:43 PM, Christian Convey wrote: Hi all, I'm playing around with Postgres, and I thought it might be fun to experiment with alternative formats for relation blocks, to see if I can get smaller files and/or faster server performance. It's not clear how you'd do this without massively rewriting the guts of Pg. Per the docs on internal structure, Pg has a block header, then tuples within the blocks, each with a tuple header and list of Datum values for the tuple. Each Datum has a generic Datum header (handling varlena vs fixed length values etc) then a type-specific on-disk representation controlled by the type output function for that type. I'm still in the process of getting familiar with the pg backend code, so I don't have a concrete plan yet. However, I'm working on the assumption that some set of macros and functions encapsulates the page layout. If/when I tackle this, I expect to add a layer of indirection somewhere around that boundary, so that some non-catalog tables, whose schemas meet certain simplifying assumptions, are read and modified using specialized code. I don't want to get into the specific optimizations I'd like to try, only because I haven't fully studied the code yet, so I don't want to put my foot in my mouth. What concrete problem do you mean to tackle? What idea do you want to explore or implement? My real motivation is that I'd like to get more familiar with the pg backend codebase, and tilting at this windmill seemed like an interesting way to accomplish that. If I was focused on really solving a real-world problem, I'd say that this lays the groundwork for table-schema-specific storage optimizations and optimized record-filtering code. But I'd only make that argument if I planned to (a) perform a careful study with statistically significant benchmarks, and/or (b) produce a merge-worthy patch. At this point I have no intentions of doing so. 
My main goal really is just to have fun with the code. Does anyone know if this has been done before with Postgres? I would have assumed yes, but I'm not finding anything in Google about people having done this. AFAIK (and I don't know much in this area) the storage manager isn't very pluggable compared to the rest of Pg. Thanks for the warning. Duly noted. Kind regards, Christian
Re: [HACKERS] [PATCH] Support for pg_stat_archiver view
On Sat, Jan 25, 2014 at 3:19 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Sat, Jan 25, 2014 at 5:41 AM, Fujii Masao masao.fu...@gmail.com wrote: On Thu, Jan 23, 2014 at 4:10 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Thu, Jan 9, 2014 at 6:36 AM, Gabriele Bartolini gabriele.bartol...@2ndquadrant.it wrote: Il 08/01/14 18:42, Simon Riggs ha scritto: Not sure I see why it needs to be an SRF. It only returns one row. Attached is version 4, which removes management of SRF stages. I have been looking at v4 a bit more, and found a couple of small things: - a warning in pgstatfuncs.c - some typos and format fixing in the sgml docs - some corrections in a couple of comments - Fixed an error message related to pg_stat_reset_shared referring only to bgwriter and not the new option archiver. Here is how the new message looks: =# select pg_stat_reset_shared('popo'); ERROR: 22023: unrecognized reset target: popo HINT: Target must be bgwriter or archiver A new version is attached containing those fixes. I played also with the patch and pgbench, simulating some archive failures and successes while running pgbench and the reports given by pg_stat_archiver were correct, so I am marking this patch as Ready for committer. I refactored the patch further. * Remove ArchiverStats global variable to simplify pgarch.c. * Remove stats_timestamp field from PgStat_ArchiverStats struct because it's not required. Thanks, pgstat_send_archiver is cleaner now. I have some review comments: +s.archived_wals, +s.last_archived_wal, +s.last_archived_wal_time, +s.failed_attempts, +s.last_failed_wal, +s.last_failed_wal_time, The column names of pg_stat_archiver look not consistent at least to me. What about the followings? archive_count last_archived_wal last_archived_time fail_count last_failed_wal last_failed_time And what about archived_count and failed_count instead of respectively archive_count and failed_count? The rest of the names are better now indeed. 
I think that it's time to rename all the variables related to pg_stat_bgwriter. For example, it's better to change PgStat_GlobalStats to PgStat_Bgwriter. I think that it's okay to make this change as separate patch, though. Yep, this is definitely a different patch. +char last_archived_wal[MAXFNAMELEN];/* last WAL file archived */ +TimestampTz last_archived_wal_timestamp;/* last archival success */ +PgStat_Counter failed_attempts; +char last_failed_wal[MAXFNAMELEN];/* last WAL file involved in failure */ Some hackers don't like the increase of the size of the statsfile. In order to reduce the size as much as possible, we should use the exact size of WAL file here instead of MAXFNAMELEN? The first versions of the patch used a more limited size length more inline with what you say: +#define MAX_XFN_CHARS40 But this is inconsistent with xlog_internal.h. Using MAX_XFN_CHARS instead of MAXFNAMELEN has a clear advantage to reduce the size of the statistics file. Though I'm not sure how much this change improve the performance of the statistics collector, basically I'd like to use MAX_XFN_CHARS here. +(errmsg(corrupted statistics file (global) \%s\, statfile))); I think that it's better to use the view name pg_stat_bgwriter instead of internal name global here. The view name is easier-to-understand to a user. +(errmsg(corrupted statistics file (archiver) \%s\, statfile))); Same as above. What using pg_stat_archiver instead of just archive? +if (fread(myArchiverStats, 1, sizeof(myArchiverStats), + fpin) != sizeof(myArchiverStats)) +{ +ereport(pgStatRunningInCollector ? LOG : WARNING, +(errmsg(corrupted statistics file \%s\, statfile))); Same as above. Isn't it better to add something like pg_stat_archiver even here? Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Standalone synchronous master
On 01/26/2014 07:56 PM, Rajeev rastogi wrote: I shall rework to improve this patch. Below are the summarization of all discussions, which will be used as input for improving the patch: 1. Method of degrading the synchronous mode: a. Expose the configuration variable to a new SQL-callable functions. b. Using ALTER SYSTEM SET. c. Auto-degrade using some sort of configuration parameter as done in current patch. d. Or may be combination of above, which DBA can use depending on their use-cases. We can discuss further to decide on one of the approach. 2. Synchronous mode should upgraded/restored after at-least one synchronous standby comes up and has caught up with the master. 3. A better monitoring/administration interfaces, which can be even better if it is made as a generic trap system. I shall propose a better approach for this. 4. Send committing clients, a WARNING if they have committed a synchronous transaction and we are in degraded mode. 5. Please add more if I am missing something. I think we actually need two degrade modes: A. degrade once: if the sync standby connection is ever lost, degrade and do not resync. B. reconnect: if the sync standby catches up again, return it to sync status. The reason you'd want degrade once is to avoid the flaky network issue where you're constantly degrading then reattaching the sync standby, resulting in horrible performance. If we did offer degrade once though, we'd need some easy way to determine that the master was in a state of permanent degrade, and a command to make it resync. Discuss? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun
On Mon, Jan 27, 2014 at 10:27 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: I think I see some bugs in _bt_moveright(). If you examine _bt_finish_split() in detail, you'll see that it doesn't just drop the write buffer lock that the caller will have provided (per its comments) - it also drops the buffer pin. It calls _bt_insert_parent() last, which was previously only called last thing by _bt_insertonpg() (some of the time), and _bt_insertonpg() is indeed documented to drop both the lock and the pin. And if you look at _bt_moveright() again, you'll see that there is a tacit assumption that the buffer pin isn't dropped, or at least that opaque (the BTPageOpaque BT special page area pointer) continues to point to the same page held in the same buffer, even though the code doesn't set the opaque' pointer again and it may not point to a pinned buffer or even the appropriate buffer. Ditto page. So opaque may point to the wrong thing on subsequent iterations - you don't do anything with the return value of _bt_getbuf(), unlike all of the existing call sites. I believe you should store its return value, and get the held page and the special pointer on the page, and assign that to the opaque pointer before the next iteration (an iteration in which you very probably really do make progress not limited to completing a page split, the occurrence of which the _bt_moveright() loop gets stuck on). You know, do what is done in the non-split-handling case. It may not always be the same buffer as before, obviously. Yep, fixed. Can you explain what the fix was, please? -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)
Heikki Linnakangas wrote: I spent some time whacking this around, new patch version attached. I moved the mmap() code into a new function, that leaves the PGSharedMemoryCreate more readable. Did this patch go anywhere? Someone just pinged me about a kernel scalability problem in Linux with huge pages; if someone did performance measurements with this patch, perhaps it'd be good to measure again with the kernel patch in place. https://lkml.org/lkml/2014/1/26/227 -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PoC: Partial sort
Hi! On Tue, Jan 21, 2014 at 3:24 AM, Marti Raudsepp ma...@juffo.org wrote: On Tue, Jan 14, 2014 at 5:49 PM, Alexander Korotkov aekorot...@gmail.com wrote: On Tue, Jan 14, 2014 at 12:54 AM, Marti Raudsepp ma...@juffo.org wrote: I've been trying it out in a few situations. I implemented a new enable_partialsort GUC to make it easier to turn on/off, this way it's a lot easier to test. The attached patch applies on top of partial-sort-5.patch I thought about such an option. Generally not because of testing convenience, but because of the overhead of planning. The way you implemented it is quite naive :) I don't understand. I had another look at this and cost_sort still seems like the best place to implement this, since that's where the patch decides how many pre-sorted columns to use. Both mergejoin and simple order-by plans call into it. If enable_partialsort=false then I skip all pre-sorted options except full sort, making cost_sort behave pretty much like it did before the patch. I could change pathkeys_common to return 0, but that seems like a generic function that shouldn't be tied to partialsort. The old code paths called pathkeys_contained_in anyway, which has similar complexity. (Apart from initial_cost_mergejoin, but that doesn't seem special enough to make an exception for). Or should I use?: enable_partialsort ? pathkeys_common(...) : 0 For instance, merge join relies on partial sort which will be replaced with simple sort. Are you saying that enable_partialsort=off should keep partialsort-based mergejoins enabled? Or are you saying that merge joins shouldn't use simple sort at all? But merge join was already able to use full Sort nodes before your patch. Sorry that I didn't explain it. In particular I mean the following: 1) With enable_partialsort = off all mergejoin logic should behave as without the partial sort patch. 2) With the partial sort patch the get_cheapest_fractional_path_for_pathkeys function is much more expensive to execute. 
With enable_partialsort = off it should be as cheap as without the partial sort patch. I'll try to implement this option this week. For now, I have attempted to fix the extra-columns-in-mergejoin problem. It would be nice if you could test it. -- With best regards, Alexander Korotkov. partial-sort-7.patch.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun
On Mon, Jan 27, 2014 at 10:58 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Okay, promise not to laugh. I did write a bunch of hacks to generate graphviz .dot files from the btree pages, and render them into pictures. It consists of multiple parts, all in the attached tarball. It's funny that you should say that, because I was thinking about building something similar over the past few days. I find it very useful to build ad-hoc tools like that, and I thought that it would be particularly useful to have something visual for testing btree code. Sure, you can come up with a test and keep the details straight in your head while coordinating everything, but that won't scale as new testing scenarios inevitably occur to you. I've done plenty of things with contrib/pageinspect along these lines in the past, but this is better. Anything that reduces the friction involved in building an accurate mental model of things is a very good thing. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe
On Mon, Jan 27, 2014 at 12:58 PM, Simon Riggs si...@2ndquadrant.com wrote: On 24 January 2014 08:33, Simon Riggs si...@2ndquadrant.com wrote: On 24 January 2014 07:08, Peter Geoghegan p...@heroku.com wrote: On Wed, Jan 15, 2014 at 6:57 AM, Simon Riggs si...@2ndquadrant.com wrote: v15 to fix the above problem. v16 attached v17 attached This version adds a GUC called ddl_exclusive_locks which allows us to keep the 9.3 behaviour if we wish it. Some people may be surprised that their programs don't wait in the same places they used to. We hope that is a positive and useful behaviour, but it may not always be so. I'll commit this on Thurs 30 Jan unless I hear objections. I haven't reviewed the patch, but -1 for adding a GUC. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe
On Mon, Jan 27, 2014 at 12:25 PM, Robert Haas robertmh...@gmail.com wrote: I haven't reviewed the patch, but -1 for adding a GUC. I'm pretty surprised that it's been suggested that some people might prefer AccessExclusiveLocks. Why would anyone prefer that? -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Storing pg_stat_statements query texts externally, pg_stat_statements in core
Peter Geoghegan p...@heroku.com writes: On Sun, Jan 26, 2014 at 10:38 AM, Tom Lane t...@sss.pgh.pa.us wrote: Meh. This line of argument seems to reduce to we don't need to worry about performance of this code path because it won't be reached often. I think I may have over-elaborated, giving you the false impression that this was something I felt strongly about. I'm glad that the overhead has been shown to be quite low, and I think that lexing without the lock held will be fine. OK. Committed after a couple of small further revisions. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] extension_control_path
Hi, Sergey Muraviov sergey.k.murav...@gmail.com writes: Now patch applies cleanly and works. :-) Cool ;-) But I have some notes: 1. There is an odd underscore character in functions find_in_extension_control_path and list_extension_control_paths: \extension_control__path\ Fixed in the new version of the patch, attached. 2. If we have several versions of one extension in different directories (which are listed in the extension_control_path parameter) then we get strange output from the pg_available_extensions and pg_available_extension_versions views (information about an extension whose path is at the beginning of the list is duplicated), and only one version of the extension can be created. Fixed. 3. It would be fine to see an extension control path in the pg_available_extensions and pg_available_extension_versions views (in a separate column or within the extension name). I think the on-disk location is an implementation detail and decided in the attached version not to change those system view definitions. 4. Perhaps the CREATE EXTENSION command should be improved to allow creation of the required version of the extension. So we can use different versions of extensions in different databases. Fixed in the attached. I also fixed ALTER EXTENSION UPDATE to search for update scripts in the same directory where the main control file is found, but I suspect this part requires more thinking. When we ALTER EXTENSION UPDATE we might now have several places where we find extname.control files, with possibly different default_version properties. In the attached, we select the directory containing the control file whose default_version matches the already installed extension version. That matches a model where the new version of the extension changes the default_version in an auxiliary file. We might instead want to match the default_version in the control file against the new version we are asked to upgrade to. 
Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support *** a/doc/src/sgml/config.sgml --- b/doc/src/sgml/config.sgml *** *** 5773,5778 SET XML OPTION { DOCUMENT | CONTENT }; --- 5773,5827 variablelist + varlistentry id=guc-extension-control-path xreflabel=extension_control_path + termvarnameextension_control_path/varname (typestring/type)/term + indexterm +primaryvarnameextension_control_path/ configuration parameter/primary + /indexterm + indextermprimaryextension packaging// + listitem +para + The command commandCREATE EXTENSION/ searches for the extension + control file in order to install it. The value + for varnameextension_control_path/varname is used to search for + the literalname.control/literal files. +/para + +para + The value for varnameextension_control_path/varname must be a list + of absolute directory paths separated by colons (or semi-colons on + Windows). If a list element starts with the special + string literal$extdir/literal, the + compiled-in productnamePostgreSQL/productname package extension + directory is substituted for literal$extdir/literal; this is where + the extensions provided by the standard + productnamePostgreSQL/productname distribution are installed. + (Use literalpg_config --extdir/literal to find out the name of + this directory.) For example: + programlisting + extension_control_path = '/usr/local/postgresql/extension:/home/my_project:$extdir' + /programlisting + or, in a Windows environment: + programlisting + extension_control_path = 'C:\tools\postgresql;H:\my_project\lib;$extdir' + /programlisting +/para + +para + The default value for this parameter is literal'$extdir'/literal. +/para + +para + This parameter can be changed at run time by superusers, but a + setting done that way will only persist until the end of the + client connection, so this method should be reserved for + development purposes. 
The recommended way to set this parameter + is in the filenamepostgresql.conf/filename configuration + file. +/para + /listitem + /varlistentry + varlistentry id=guc-dynamic-library-path xreflabel=dynamic_library_path termvarnamedynamic_library_path/varname (typestring/type)/term indexterm *** a/src/backend/commands/extension.c --- b/src/backend/commands/extension.c *** *** 25,30 --- 25,31 #include dirent.h #include limits.h + #include sys/stat.h #include unistd.h #include access/htup_details.h *** *** 60,71 --- 61,76 bool creating_extension = false; Oid CurrentExtensionObject = InvalidOid; + /* GUC extension_control_path */ + char *Extension_control_path; + /* *
Re: [HACKERS] Storing pg_stat_statements query texts externally, pg_stat_statements in core
On Mon, Jan 27, 2014 at 12:39 PM, Tom Lane t...@sss.pgh.pa.us wrote: OK. Committed after a couple of small further revisions. Great. Thank you. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe
Peter Geoghegan p...@heroku.com writes: On Mon, Jan 27, 2014 at 12:25 PM, Robert Haas robertmh...@gmail.com wrote: I haven't reviewed the patch, but -1 for adding a GUC. I'm pretty surprised that it's been suggested that some people might prefer AccessExclusiveLocks. Why would anyone prefer that? For one thing, so they can back this out if it proves to be broken, as the last committed version was. Given that this patch was marked (by its author) as Ready for Committer without any review in the current CF, I can't say that I have any faith in it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] new json funcs
Andrew Dunstan escribió: Note that we can only do this when the result type stays the same. It does not for json_each/json_each_text or json_extract_path/json_extract_path_text, which is why we have different functions for those cases. In C code, if I extract a value using json_object_field or json_array_element, is there a way to turn it into the dequoted version, that is, the value that I would have gotten had I called json_object_field_text or json_array_element_text instead? I wrote a quick and dirty hack in the event triggers patch that just removes the outermost quotes and turns any \" into ", but that's probably incomplete. Does jsonfuncs.c offer any way to do this? That might be useful for the crowd that cares about the detail being discussed in this subthread. Thanks, -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe
On 27 January 2014 20:35, Peter Geoghegan p...@heroku.com wrote: On Mon, Jan 27, 2014 at 12:25 PM, Robert Haas robertmh...@gmail.com wrote: I haven't reviewed the patch, but -1 for adding a GUC. I'm pretty surprised that it's been suggested that some people might prefer AccessExclusiveLocks. Why would anyone prefer that? Nobody has said "prefer". I said: "Some people may be surprised that their programs don't wait in the same places they used to. We hope that is a positive and useful behaviour, but it may not always be so." We get the new behaviour by default and I expect we'll be very happy with it. A second thought is that if we have problems of some kind in the field as a result of the new lock modes then we will be able to turn them off. I'm happy to fix any problems that occur, but that doesn't mean there won't be any. If everybody is confident that we've foreseen every bug, then no problem, let's remove it. I recall being asked to add hot_standby = on | off for similar reasons. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] new json funcs
On 01/27/2014 03:53 PM, Alvaro Herrera wrote: Andrew Dunstan escribió: Note that we can only do this when the result type stays the same. It does not for json_each/json_each_text or json_extract_path/json_extract_path_text, which is why we have different functions for those cases. In C code, if I extract a value using json_object_field or json_array_element, is there a way to turn it into the dequoted version, that is, the value that I would have gotten had I called json_object_field_text or json_array_element_text instead? I wrote a quick and dirty hack in the event triggers patch that just removes the outermost and turns any \ into , but that's probably incomplete. Does jsonfuncs.c offer any way to do this? That might be useful for the crowd that cares about the detail being discussed in this subthread. I'm not sure I understand the need. This is the difference between the _text variants and their parents. Why would you call json_object_field when you want the dequoted text? cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] new json funcs
Andrew Dunstan escribió: I'm not sure I understand the need. This is the difference between the _text variants and their parents. Why would you call json_object_field when you want the dequoted text? Because I first need to know its type. Sometimes it's an array, or an object, or a boolean, and for those I won't call the _text version afterwards but just use the original. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe
On 27 January 2014 20:47, Tom Lane t...@sss.pgh.pa.us wrote: Peter Geoghegan p...@heroku.com writes: On Mon, Jan 27, 2014 at 12:25 PM, Robert Haas robertmh...@gmail.com wrote: I haven't reviewed the patch, but -1 for adding a GUC. I'm pretty surprised that it's been suggested that some people might prefer AccessExclusiveLocks. Why would anyone prefer that? For one thing, so they can back this out if it proves to be broken, as the last committed version was. Agreed Given that this patch was marked (by its author) as Ready for Committer without any review in the current CF True. The main review happened in a previous commitfest and there was a small addition for this CF. It was my understanding that you wanted us to indicate that to allow you to review. I am happy to set status differently, as you wish, presumably back to needs review. I can't say that I have any faith in it. That's a shame. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Freezing without write I/O
On 26 January 2014 12:58, Andres Freund and...@2ndquadrant.com wrote: On 2014-01-25 20:26:16 -0800, Peter Geoghegan wrote: Shouldn't this patch be in the January commitfest? I think we previously concluded that there wasn't much chance to get this into 9.4 and there's significant work to be done on the patch before new reviews are required, so not submitting it imo makes sense. I think we should make this a priority feature for 9.5 -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On Mon, Jan 27, 2014 at 4:45 AM, Andrew Dunstan and...@dunslane.net wrote: I personally don't give a tinker's cuss about whether the patch slows down pg_stat_statements a bit. Why not? The assurance that the overhead is generally very low is what makes it possible to install it widely, usually without a second thought. That's hugely valuable. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INTERVAL overflow detection is terribly broken
On Sun, Jan 26, 2014 at 02:13:03PM +0100, Florian Pflug wrote: On Jan26, 2014, at 03:50 , Bruce Momjian br...@momjian.us wrote: Patch attached. + if ((float) tm->tm_year * MONTHS_PER_YEAR + tm->tm_mon > INT_MAX) + return -1; Is this bullet-proof? If float and int are both 32-bit, the float's mantissa will be less than 32-bit (24 or so, I think), and thus won't be able to represent INT_MAX accurately. This means there's a value of tm_year * MONTHS_PER_YEAR which is less than INT_MAX, yet for which tm_year * MONTHS_PER_YEAR + tm_mon will return tm_year * MONTHS_PER_YEAR unmodified if tm_mon is small enough (e.g., 1). But if tm_year * MONTHS_PER_YEAR was close enough to INT_MAX, the actual value of tm_year * MONTHS_PER_YEAR + tm_mon might still be larger than INT_MAX; the floating-point computation just won't detect it. In that case, the check would succeed, yet the actual integer computation would still overflow. It should be safe with double instead of float. You are correct. I have adjusted the code to use double. + if (SAMESIGN(span1->month, span2->month) && + !SAMESIGN(result->month, span1->month)) This assumes wrapping semantics for signed integral types, which isn't guaranteed by the C standard - it says overflows of signed integral types produce undefined results. We currently depend on wrapping semantics for these types in more places, and therefore need GCC's -fwrapv anyway, but I still wonder if adding more of these kinds of checks is a good idea. Well, I copied this from int.c, so what I did was to mention the other function I got this code from. If we ever change int.c we would need to change this too. The updated attached patch has overflow detection for interval subtraction, multiplication, and negation. There are also some comparison cleanups. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. 
+ diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml new file mode 100644 index 6bf4cf6..b7d7d80 *** a/doc/src/sgml/datatype.sgml --- b/doc/src/sgml/datatype.sgml *** SELECT E'\\xDEADBEEF'; *** 1587,1593 </row> <row> <entry><type>interval [ <replaceable>fields</replaceable> ] [ (<replaceable>p</replaceable>) ]</type></entry> ! <entry>12 bytes</entry> <entry>time interval</entry> <entry>-17800 years</entry> <entry>17800 years</entry> --- 1587,1593 </row> <row> <entry><type>interval [ <replaceable>fields</replaceable> ] [ (<replaceable>p</replaceable>) ]</type></entry> ! <entry>16 bytes</entry> <entry>time interval</entry> <entry>-17800 years</entry> <entry>17800 years</entry> diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c new file mode 100644 index a61b40e..70284e9 *** a/src/backend/utils/adt/datetime.c --- b/src/backend/utils/adt/datetime.c *** DecodeInterval(char **field, int *ftype, *** 2976,2981 --- 2976,2983 type = DTK_MONTH; if (*field[i] == '-') val2 = -val2; + if (((double) val * MONTHS_PER_YEAR + val2) > INT_MAX) + return DTERR_FIELD_OVERFLOW; val = val * MONTHS_PER_YEAR + val2; fval = 0; } diff --git a/src/backend/utils/adt/timestamp.c b/src/backend/utils/adt/timestamp.c new file mode 100644 index 4581862..3b8e8e8 *** a/src/backend/utils/adt/timestamp.c --- b/src/backend/utils/adt/timestamp.c *** *** 41,46 --- 41,47 #error -ffast-math is known to break this code #endif + #define SAMESIGN(a,b) (((a) < 0) == ((b) < 0)) /* Set at postmaster start */ TimestampTz PgStartTime; *** interval2tm(Interval span, struct pg_tm *** 1694,1700 #ifdef HAVE_INT64_TIMESTAMP tfrac = time / USECS_PER_HOUR; time -= tfrac * USECS_PER_HOUR; ! tm->tm_hour = tfrac; /* could overflow ... */ tfrac = time / USECS_PER_MINUTE; time -= tfrac * USECS_PER_MINUTE; tm->tm_min = tfrac; --- 1695,1705 #ifdef HAVE_INT64_TIMESTAMP tfrac = time / USECS_PER_HOUR; time -= tfrac * USECS_PER_HOUR; ! tm->tm_hour = tfrac; ! if (!SAMESIGN(tm->tm_hour, tfrac)) ! ereport(ERROR, ! (errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE), ! 
errmsg("interval out of range"))); tfrac = time / USECS_PER_MINUTE; time -= tfrac * USECS_PER_MINUTE; tm->tm_min = tfrac; *** recalc: *** 1725,1731 int tm2interval(struct pg_tm * tm, fsec_t fsec, Interval *span) { ! span->month = tm->tm_year * MONTHS_PER_YEAR + tm->tm_mon; span->day = tm->tm_mday; #ifdef HAVE_INT64_TIMESTAMP span->time = (tm->tm_hour * INT64CONST(60)) + --- 1730,1740 int tm2interval(struct pg_tm * tm, fsec_t fsec, Interval *span) { ! double total_months = (double) tm->tm_year * MONTHS_PER_YEAR + tm->tm_mon; ! ! if (total_months > INT_MAX || total_months < INT_MIN) ! return -1; ! span->month
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 01/27/2014 04:37 PM, Peter Geoghegan wrote: On Mon, Jan 27, 2014 at 4:45 AM, Andrew Dunstan and...@dunslane.net wrote: I personally don't give a tinker's cuss about whether the patch slows down pg_stat_statements a bit. Why not? The assurance that the overhead is generally very low is what makes it possible to install it widely usually without a second thought. That's hugely valuable. I care very much what the module does to the performance of all statements. But I don't care much if selecting from pg_stat_statements itself is a bit slowed. Perhaps I didn't express myself as clearly as I could have. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INTERVAL overflow detection is terribly broken
On Mon, Jan 27, 2014 at 04:47:22PM -0500, Bruce Momjian wrote: The updated attached patch has overflow detection for interval subtraction, multiply, and negate. There are also some comparison cleanups. Oh, one odd thing about this patch. I found I needed to use INT64_MAX, but I don't see it used anywhere else in our codebase. Is this OK? Is there a better way? -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On 25 January 2014 02:21, Florian Pflug f...@phlo.org wrote: I've now split this up into invtrans_base: Basic machinery plus tests with SQL-language aggregates invtrans_arith: COUNT(),SUM(),AVG(),STDDEV() and the like invtrans_minmax: MIN(),MAX(),BOOL_AND(),BOOL_OR() invtrans_collecting: ARRAY_AGG(), STRING_AGG() invtrans_docs: Documentation Thanks. That makes it a bit easier to review, and hopefully easier to commit. The thing that I'm currently trying to get my head round is the null/not null, strict/not strict handling in this patch, which I find a bit odd, particularly when transfn and inv_transfn have different strictness settings: strict transfn vs non-strict inv_transfn This case is explicitly forbidden, both in CREATE AGGREGATE and in the executor. To me, that seems overly restrictive --- if transfn is strict, then you know for sure that no NULL values are added to the aggregate state, so surely it doesn't matter whether or not inv_transfn is strict. It will never be asked to remove NULL values. I think there are definite use-cases where a user might want to use a pre-existing function as the inverse transition function, so it seems harsh to force them to wrap it in a strict function in this case. AFAICS, the code in advance/retreat_windowaggregate should just work if those checks are removed. non-strict transfn vs strict inv_transfn At first this seems as though it must be an error --- the forwards transition function allows NULL values into the aggregate state, and the inverse transition function is strict, so it cannot remove them. But actually what the patch is doing in this case is treating the forwards transition function as partially strict --- it won't be passed NULL values (at least in a window context), but it may be passed a NULL state in order to build the initial state when the first non-NULL value arrives. 
It looks like this behaviour is intended to support aggregates that ignore NULL values, but cannot actually be made to have a strict transition function. I think typically this is because the aggregate's initial state is NULL and its state type differs from the type of the values being aggregated, and so can't be automatically created from the first non-NULL value. That all seems quite ugly though, because now you have a transition function that is not strict, which is passed NULL values when used in a normal aggregate context, and not passed NULL values when used in a window context (whether sliding or not), except for the NULL state for the first non-NULL value. I'm not sure if there is a better way to do it though. If we disallow this case, these aggregates would have to use non-strict functions for both the forward and inverse transition functions, which means they would have to implement their own non-NULL value counting. Alternatively, allowing strict transition functions for these aggregates would require that we provide some other way to initialise the state from the first non-NULL input, such as a new initfn. Thoughts? Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] A minor correction in comment in heaptuple.c
Bruce Momjian br...@momjian.us wrote: On Tue, Jun 18, 2013 at 11:04:25AM -0700, Kevin Grittner wrote: D'Arcy J.M. Cain da...@druid.net Although, the more I think about it, the more I think that the comment is both confusing and superfluous. The code itself is much clearer. Seriously, if there is any comment there at all, it should be a succinct explanation for why we didn't do this Is everyone OK with me applying this patch from Kevin, attached? I guess my intent was misunderstood -- what I was trying to get across was that the comment added exactly nothing to what you could get more quickly by reading the code below it. I felt the comment should either be removed entirely, or a concise explanation of why it was the right thing to do should be there rather than just echoing the code in English. I wasn't suggesting applying the mini-patch, but suggesting that *if* we have a comment there at all, it should make clear why such a patch would be wrong; i.e., why it is OK to have attnum > tupdesc->natts here. How would we get to such a state, and why is NULL the right thing for it? Such a comment would actually help someone trying to understand the code, rather than wasting their time. After all, in the same file we have this: /* Check for caller error */ if (attnum <= 0 || attnum > slot->tts_tupleDescriptor->natts) elog(ERROR, "invalid attribute number %d", attnum); An explanation of why it is caller error one place and not another isn't a waste of space. (which passes `make check-world`) And I was disappointed that our regression tests don't actually exercise that code path, which would be another good way to make the point. So anyway, *I* would object to applying that; it was meant to illustrate what the comment, if any, should cover; not to be an actual code change. I don't think the change that was pushed helps that comment carry its own weight, either. If we can't do better than that, we should just drop it. 
I guess I won't try to illustrate a point *that* particular way again -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On Mon, Jan 27, 2014 at 2:01 PM, Andrew Dunstan and...@dunslane.net wrote: I care very much what the module does to the performance of all statements. But I don't care much if selecting from pg_stat_statements itself is a bit slowed. Perhaps I didn't express myself as clearly as I could have. Oh, I see. Of course the overhead of calling the pg_stat_statements() function is much less important. Actually, I think that calling that function is going to add a lot of noise to any benchmark aimed at measuring added overhead as the counters struct is expanded. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INTERVAL overflow detection is terribly broken
Bruce Momjian br...@momjian.us writes: Oh, one odd thing about this patch. I found I needed to use INT64_MAX, but I don't see it used anywhere else in our codebase. Is this OK? Is there a better way? Most of the overflow tests in int.c and int8.c are coded to avoid relying on the MIN or MAX constants; which seemed like better style at the time. I'm not sure whether relying on INT64_MAX to exist is portable. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
Hi Stephen, Thanks for your comments. * Kouhei Kaigai (kai...@ak.jp.nec.com) wrote: Is somebody available to volunteer to review the custom-scan patch? I looked through it a bit and my first take away from it was that the patches to actually use the new hooks were also making more changes to the backend code, leaving me with the impression that the proposed interface isn't terribly stable. Perhaps those changes should have just been in the first patch, but they weren't and that certainly gave me pause. Yes, the part-1 patch provides the interface portion that mediates between the backend code and extension code. The rest, parts 2 and 3, are contrib modules that implement their features on top of the custom-scan API. I'm also not entirely convinced that this is the direction to go in when it comes to pushing down joins to FDWs. While that's certainly a goal that I think we all share, this seems to be intending to add a completely different feature which happens to be able to be used for that. For FDWs, wouldn't we only present the FDW with the paths where the foreign tables for that FDW, or perhaps just a given foreign server, are being joined? FDW join push-down is one of the valuable use-cases of this interface, but not the only one. As you might know, my motivation is to implement a GPU acceleration feature on top of this interface, which offers an alternative way to scan or join relations, or potentially to sort or aggregate. It would probably be too much of a stretch to implement radix sort on top of FDW. I'd like you to understand that the part-3 patch (FDW join push-down) is a demonstration of the custom-scan interface, not something designed for one special purpose. Right now, I put all the logic for the interaction between CSI and the FDW driver on the postgres_fdw side; it might be an idea to have common code (like the logic to check whether both relations being joined belong to the same foreign server) on the backend side as something like a gateway between them. 
As an aside, what should be the scope of the FDW interface? In my understanding, it allows an extension to implement something on behalf of a particular data structure declared with CREATE FOREIGN TABLE. In other words, the extension's responsibility is to present a view of something according to PostgreSQL's internal data structures, rather than the object itself. On the other hand, the custom-scan interface allows extensions to implement alternative methods to scan or join particular relations, but not to act as a target referenced in queries. In other words, it provides methods to access objects. It is natural that both features look similar, because both of them let extensions hook the planner and executor; however, their purposes are different. Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei kai...@ak.jp.nec.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Race condition in b-tree page deletion
On Sat, Jan 18, 2014 at 11:45 AM, Kevin Grittner kgri...@ymail.com wrote: Heikki Linnakangas hlinnakan...@vmware.com wrote: Here's a patch implementing this scheme. I thought I'd weigh in here too, since this is closely tied to the page split patch, which is dependent on this patch. The basic approach is sound. The papers on which we based our btree implementation did not discuss entry deletion, and our current approach introduces new possible states into the on-disk index structure which are not covered by the papers and are not always handled correctly by the current code. Lehman and Yao don't discuss deletion. But V. Lanin and D. Shasha do, in the paper "A Symmetric Concurrent B-Tree Algorithm" [1]; as the README notes, we currently follow a simplified version of that. Apparently a number of people proposed solutions to address the omission in L&Y, as one survey of those approaches notes [2]. One of the approaches appearing in that survey is L&S. The classic L&Y algorithm only has right-links, and not left-links. The same applies to L&S. But I'd describe this patch as bringing what we do closer to L&S, which is more or less a refinement of B-Link trees (that is, L&Y B-Trees). L&S do have a concept of an outlink, crucial to their symmetric deletion approach, which is something that we don't need, because we already have left-links (we have them primarily to support backwards index scans). L&S say: "In general, the link technique calls for processes that change sections of a data structure in a way that would cause other processes to fail to provide a link to another section of the data structure where recovery is possible." So, loosely speaking, we're doing a conceptual _bt_moveleft() for deletion as the inverse of a literal _bt_moveright() for insertion. L&S say a deletion needs no more than two write locks at a time during its ascent (the descent is just a garden-variety _bt_search() insertion scankey descent).
However, we currently may obtain as many as 4 write locks for page deletion. As the README notes: "To delete an empty page, we acquire write lock on its left sibling (if any), the target page itself, the right sibling (there must be one), and the parent page, in that order." We obtain 2 write locks in the first phase (on target and parent). In the second phase, we do what is described above: lock 4 pages in that order. However, the reason for this difference from the L&S paper is made clear in section 2.5, "Freeing Empty Nodes", which kind of cops out of addressing how to *unlink* empty/half-dead pages, with a couple of brief descriptions of approaches that others have come up with that are not at all applicable to us. For that reason, might I suggest that you change the new "Page Deletion" heading in the README to read "Page Deletion and Unlinking", to emphasize the distinction. The general feeling is that this patch is not suitable for back-patching. I agree with this, but this is a bug which could lead to index corruption and which exists in all supported branches. If nobody can suggest an alternative which we can back-patch, we might want to reconsider this after this has been in production releases for several months. That was what I thought. Let's revisit the question of back-patching this when 9.4 has been released for 6 months or so. Other than that, I have not found any problems. Incidentally, during my research of these techniques, I discovered that Johnson and Shasha argue that for most concurrent B-tree applications, merge-at-empty produces significantly lower restructuring rates, and only slightly lower space utilization, than merge-at-half. This is covered in the paper "B-trees with inserts, deletes, and modifies: Why Free-at-empty is Better than Merge-at-half" [3]. I was pleased to learn that there was a sound, well-regarded theoretical basis for that behavior, even if the worst-case bloat can be unpleasant.
Though having said that, that paper doesn't weigh what we do in the event of non-HOT updates, and that could change the equation. That isn't an argument against merge-at-empty; it's an argument against the current handling of non-HOT updates. Currently, the comments above _bt_doinsert() say at one point: "* This routine is called by the public interface routines, btbuild * and btinsert." This is obsolete; since commit 89bda95d, only the insert public interface routine calls _bt_doinsert(). Perhaps fix this in passing. [1] L&S: https://archive.org/stream/symmetricconcurr00lani/symmetricconcurr00lani_djvu.txt [2] http://www.dtic.mil/get-tr-doc/pdf?AD=ADA232287&Location=U2&doc=GetTRDoc.pdf (Chapter 3, The Coherent Shared Memory Algorithm) [3] http://cs.nyu.edu/shasha/papers/bjournal.ps -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] git-review: linking commits to review discussion in git
Hello All, We have linked peer review discussions on 'pgsql-hackers' to their respective commits within the main postgresql.git repository. You can view the linked reviews from 2012 until present in the GitHub repo at https://github.com/mmukadam/postgres/tree/review If you want to work with these reviews locally, you can use our git-review tool. It allows you to create reviews and attach them to commits in git. We didn't modify git; instead, we added some scripts that use standard git commands. git-review is beta, but since it only adds a detached 'review' branch and modifies the contents of this branch, it has minimal impact and can easily be removed by deleting the 'review' branch and scripts. The online man-page is here: http://users.encs.concordia.ca/~m_mukada/git-review.html In order to install git-review, you need to clone the repository: https://github.com/mmukadam/git-review.git The online tutorial is available here: http://users.encs.concordia.ca/~m_mukada/git-review-tutorial.html The clone of postgresql.git with linked review discussion is here (new review discussions are linked nightly) https://github.com/mmukadam/postgres This work is part of my Master's thesis. If you'd like us to change the tool to better suit your review process, have another git repo you'd like us to link commits with review discussions, or have other feedback, please let us know. Cheers, Murtuza -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New option for pg_basebackup, to specify a different directory for pg_xlog
On 11/30/13, 6:59 AM, Haribabu kommi wrote: To detect provided data and xlog directories are same or not, I reused the Existing make_absolute_path() code as follows. I note that initdb does not detect whether the data and xlog directories are the same. I think there is no point in addressing this only in pg_basebackup. If we want to forbid it, it should be done in initdb foremost. I'm not sure it's worth the trouble, but if I were to do it, I'd just stat() the two directories and compare their inodes. That seems much easier and more robust than comparing path strings. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
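The stat()-and-compare-inodes idea suggested above can be sketched as follows. This is a hypothetical helper, not code from initdb or pg_basebackup, and `same_directory` is an illustrative name:

```c
#include <stdbool.h>
#include <sys/stat.h>

/*
 * Return true if the two paths refer to the same file system object.
 * On POSIX systems two paths name the same object iff their
 * (device, inode) pairs match, which is robust against symlinks,
 * "..", and trailing slashes -- unlike comparing path strings.
 */
bool
same_directory(const char *path1, const char *path2)
{
	struct stat st1,
				st2;

	if (stat(path1, &st1) != 0 || stat(path2, &st2) != 0)
		return false;			/* a nonexistent path can't collide */

	return st1.st_dev == st2.st_dev && st1.st_ino == st2.st_ino;
}
```

With something like this, initdb (or pg_basebackup) could reject an xlog directory that resolves to the data directory itself, no matter how the two paths are spelled.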
Re: [HACKERS] Freezing without write I/O
On Mon, Jan 27, 2014 at 4:16 PM, Simon Riggs si...@2ndquadrant.com wrote: On 26 January 2014 12:58, Andres Freund and...@2ndquadrant.com wrote: On 2014-01-25 20:26:16 -0800, Peter Geoghegan wrote: Shouldn't this patch be in the January commitfest? I think we previously concluded that there wasn't much chance to get this into 9.4 and there's significant work to be done on the patch before new reviews are required, so not submitting it imo makes sense. I think we should make this a priority feature for 9.5 +1. I can't think of many things we might do that would be more important. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add force option to dropdb
On January 17, 2014 at 0:56, salah jubeh s_ju...@yahoo.com wrote: If the user owns objects, that will prevent this from working also. I have the feeling that adding DROP OWNED BY and/or REASSIGN OWNED BY calls to this utility would be a bit excessive, but who knows. Please find attached the first attempt to drop the client connections. I have added an option -k, --kill instead of force, since killing client connections does not guarantee a forced drop. Regards On Tuesday, January 14, 2014 8:06 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: salah jubeh wrote: For the sake of completeness: 1. I think I also need to temporarily disallow connecting to the database, is that right? 2. Are there other factors that can hinder dropping a database? If the user owns objects, that will prevent this from working also. I have the feeling that adding DROP OWNED BY and/or REASSIGN OWNED BY calls to this utility would be a bit excessive, but who knows. 3. Should I write two patches, one for pg_version >= 9.2 and one for pg_version < 9.2? No point -- nothing gets applied to branches older than current development anyway. Thank you for the patch. And sorry for the delay in reviewing. I started to look at this patch, so the following is my first review comment. - This patch does not apply to the master branch I tried to apply this patch file to the master branch, but I got the following error. $ cd postgresql $ patch -d. -p1 ../dropdb.patch can't find file to patch at input line 3 Perhaps you used the wrong -p or --strip option? The text leading up to this was: -- |--- dropdb_org.c 2014-01-16 |+++ dropdb.c 2014-01-16 -- There is no dropdb_org.c. I think you made a mistake when the patch was created. - This patch does not follow the coding conventions For example, line 71 of the patch: //new connections are not allowed It should be: /* new connections are not allowed */ (Comment blocks that need specific line breaks should be formatted as block comments, where the comment starts as /*--.) Please refer to the coding conventions.
http://wiki.postgresql.org/wiki/Developer_FAQ#What.27s_the_formatting_style_used_in_PostgreSQL_source_code.3F Regards, --- Sawada Masahiko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Fix comment typo in /src/backend/command/cluster.c
Hi all, Attached patch fixes the typo which is in src/backend/command/cluster.c. Regards, --- Sawada Masahiko fix_typo-cluster.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fix comment typo in /src/backend/command/cluster.c
On 01/27/2014 10:24 PM, Sawada Masahiko wrote: Hi all, Attached patch fixes the typo which is in src/backend/command/cluster.c. Are you sure that's a typo? "iff" is usually shorthand for "if and only if". cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch (v2) for updatable security barrier views
On 01/28/2014 12:09 AM, Simon Riggs wrote: On 27 January 2014 15:04, Dean Rasheed dean.a.rash...@gmail.com wrote: So for example, when planning the query to update an inheritance child, the rtable will contain an RTE for the parent, but it will not be referenced in the parse tree, and so it will not be expanded while planning the child update. Am I right in thinking that we have this fully working now? I haven't found any further problems, though I've been focusing more on reworking RLS on top of it. AFAICS the only area of objection is the handling of inherited relations, which occurs within the planner in the current patch. I can see that would be a cause for concern since the planner is pluggable and it would then be possible to bypass security checks. Obviously installing a new planner isn't trivial, but doing so shouldn't cause collateral damage. FWIW, I don't see any way _not_ to do that in RLS. If there are security quals on a child table, they must be added, and that can only happen once inheritance expansion happens. That's in the planner. I don't see it as acceptable to ignore security quals on child tables, and if we can't, we've got to do some work in the planner. (I'm starting to really loathe inheritance). We have long had restrictions around updateable views. My suggestion from here is that we accept the restriction that we cannot yet have the 3-way combination of updateable views, security views and views on inherited tables. That prevents the use of updatable security barrier views over partitioned tables, and therefore prevents row-security use on inherited tables. That seems like a very big thing to close off. I'm perfectly happy having that limitation for 9.4, we just need to make it possible to remove the limitation later. Most people aren't using inherited tables Again, because we (ab)use them for partitioning, I'm not sure they're as little-used as I'd like. and people that are have special measures in place for their apps.
We won't lose much by accepting that restriction for 9.4 and re-addressing the issue in a later release. Yep, I'd be happy with that. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] pg_ctl always uses the same event source
On Mon, Jan 27, 2014 at 9:01 PM, Tom Lane t...@sss.pgh.pa.us wrote: Amit Kapila amit.kapil...@gmail.com writes: On Fri, Jan 24, 2014 at 9:10 AM, Amit Kapila amit.kapil...@gmail.com wrote: To proceed with the review of this patch, I need to know whether appending a version number or any other constant to the default event source name is acceptable or not; otherwise, for now we can remove this part of the code from the patch and handle the non-default case, where the change will be that pg_ctl will inquire about the non-default event_source value from the server. Could you please let me know your views on the same? Unless I'm missing something, this entire thread is a tempest in a teapot, because the default event_source value does not matter, because *by default we don't log to eventlog*. The presumption is that if the user turns on logging to eventlog, it's his responsibility to first make sure that event_source is set to something appropriate. And who's to say that plain PostgreSQL isn't what he wanted, anyway? Even if he's got multiple servers on one machine, maybe directing all their logs to the same place is okay by him. I think it's a matter of user preference, how exactly he wants the setup, and as we currently don't have any strong reason to change the default, let's keep it intact. Also, those who don't run multiple servers are probably not going to thank us for moving their logs around unnecessarily. In short, I think we should just reject this idea as introducing more problems than it solves, and not fully solving even the problem it purports to solve. Possibly there's room for a documentation patch reminding users to make sure that event_source is set appropriately before they turn on eventlog. Okay, but in that case also, right now pg_ctl doesn't know the value of the event source, so I think that's a clear bug and we should go ahead and fix it. As you said, I think we can improve the documentation in this regard so that the user will be able to set up event logging without any such problems.
As part of this patch we can fix the issue (make pg_ctl aware of the event source name) and improve the documentation. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Mon, Jan 27, 2014 at 12:03 PM, Amit Kapila amit.kapil...@gmail.com wrote: I think that's a good thing to try. Can you code it up? I have tried to improve the algorithm in another way so that we can get the benefit of matching chunks during the match search (something similar to LZ). The main change is to consider chunks at a fixed boundary (4 bytes) and, after finding a match, try to find whether there is a longer match than the current chunk. While finding a longer match, it still takes care that the next bigger match should start at a chunk boundary. I am not completely sure about the chunk boundary; maybe 8 or 16 bytes can give better results. I think we can now run this patch once on the high-end machine. Here are the results I got on the community PPC64 box. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company pgrb-v5 test.ods Description: application/vnd.oasis.opendocument.spreadsheet -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
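As a rough illustration of the fixed-boundary matching described above (not the actual patch code; `find_chunk_match` and the 4-byte `CHUNK` constant are made up for the example): candidate match positions in the old tuple are restricted to chunk boundaries, and once a whole chunk matches, the match is extended greedily, which is what lets the encoder find matches longer than a single chunk.

```c
#include <stddef.h>

#define CHUNK 4					/* fixed chunk boundary, per the proposal */

/*
 * Find the longest match in the old tuple for the bytes starting at
 * newp[npos], considering only old-tuple offsets that lie on CHUNK-byte
 * boundaries.  A match must cover at least one whole chunk; after that
 * it is extended byte by byte.  Returns the match length (0 if none)
 * and stores the old-tuple offset in *old_off.  The real algorithm
 * hashes the chunks instead of scanning them all linearly.
 */
size_t
find_chunk_match(const char *oldp, size_t old_len,
				 const char *newp, size_t new_len, size_t npos,
				 size_t *old_off)
{
	size_t		best_len = 0;

	for (size_t o = 0; o + CHUNK <= old_len; o += CHUNK)
	{
		size_t		len = 0;

		while (o + len < old_len && npos + len < new_len &&
			   oldp[o + len] == newp[npos + len])
			len++;
		if (len >= CHUNK && len > best_len)
		{
			best_len = len;
			*old_off = o;
		}
	}
	return best_len;
}
```

The encoder would emit a copy instruction (offset, length) when this returns nonzero, and a literal byte otherwise, exactly as in LZ-style delta compression.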
Re: [HACKERS] PoC: Partial sort
On Mon, Jan 27, 2014 at 9:26 PM, Alexander Korotkov aekorot...@gmail.com wrote: For now, I have attempt to fix extra columns in mergejoin problem. It would be nice if you test it. Yes, it solves the test cases I was trying with, thanks. 1) With enable_partialsort = off all mergejoin logic should behave as without partial sort patch. 2) With partial sort patch get_cheapest_fractional_path_for_pathkeys function is much more expensive to execute. With enable_partialsort = off it should be as cheap as without partial sort patch. When it comes to planning time, I really don't think you should bother. The planner enable_* settings are meant for troubleshooting, debugging and learning about the planner. You should not expect people to disable them in a production setting. It's not worth complicating the code for that rare case. This is stated in the documentation (http://www.postgresql.org/docs/current/static/runtime-config-query.html) and repeatedly on the mailing lists. But some benchmarks of planning performance are certainly warranted. Regards, Marti -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fix comment typo in /src/backend/command/cluster.c
On 28/01/14 16:33, Andrew Dunstan wrote: On 01/27/2014 10:24 PM, Sawada Masahiko wrote: Hi all, Attached patch fixes the typo which is in src/backend/command/cluster.c. Are you sure that's a typo? "iff" is usually shorthand for "if and only if". cheers andrew Certainly, that is how I would interpret it. I came across that abbreviation in a first-year Maths course, Principles of Mathematics, in 1968 at the University of Auckland. Cheers, Gavin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INTERVAL overflow detection is terribly broken
On Mon, Jan 27, 2014 at 07:19:21PM -0500, Tom Lane wrote: Bruce Momjian br...@momjian.us writes: Oh, one odd thing about this patch. I found I needed to use INT64_MAX, but I don't see it used anywhere else in our codebase. Is this OK? Is there a better way? Most of the overflow tests in int.c and int8.c are coded to avoid relying on the MIN or MAX constants, which seemed like better style at the time. Yes, I looked at those but they seemed like overkill for interval. For a case where there was an int64 multiplied by a double, then cast back to an int64, I checked the double against INT64_MAX before casting to an int64. I'm not sure whether relying on INT64_MAX to exist is portable. The only use I found was in pgbench: #ifndef INT64_MAX #define INT64_MAX INT64CONST(0x7FFFFFFFFFFFFFFF) #endif so I have just added that to my patch, and INT64_MIN: #ifndef INT64_MIN #define INT64_MIN (-INT64CONST(0x7FFFFFFFFFFFFFFF) - 1) #endif This is only used for HAVE_INT64_TIMESTAMP. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
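The check described above — guarding a double result against the int64 range before casting back — could look like the following sketch. This is illustrative only, not the committed interval code; `mul_int64_double` is a hypothetical name:

```c
#include <stdbool.h>
#include <stdint.h>
#include <math.h>

/*
 * Multiply an int64 by a double, reporting overflow instead of
 * invoking undefined behavior on the cast back.  (double) INT64_MAX
 * rounds up to 2^63, which is not representable as int64, so the >=
 * comparison correctly rejects it; the <= on the negative side is
 * slightly conservative (it also rejects exactly INT64_MIN), which
 * errs in the safe direction.
 */
bool
mul_int64_double(int64_t a, double factor, int64_t *result)
{
	double		prod = (double) a * factor;

	if (isnan(prod) || prod >= (double) INT64_MAX || prod <= (double) INT64_MIN)
		return false;			/* would overflow */
	*result = (int64_t) prod;
	return true;
}
```

The caller would then report an "interval out of range" style error whenever this returns false, rather than silently wrapping.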
Re: [HACKERS] Fix comment typo in /src/backend/command/cluster.c
On 01/27/2014 10:24 PM, Sawada Masahiko wrote: Hi all, Attached patch fixes the typo which is in src/backend/command/cluster.c. Are you sure that's a typo? "iff" is usually shorthand for "if and only if". Oops, I made a mistake. Thanks! Regards, --- Sawada Masahiko
[HACKERS] pg_basebackup and pg_stat_tmp directory
Hi, The files in pg_stat_tmp directory don't need to be backed up because they are basically reset at the archive recovery. So I think it's worth changing pg_basebackup so that it skips any files in pg_stat_tmp directory. Thought? Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] inherit support for foreign tables
(2014/01/28 0:55), Atri Sharma wrote: On 27-Jan-2014, at 21:03, David Fetter da...@fetter.org wrote: On Mon, Jan 27, 2014 at 05:06:19PM +0900, Etsuro Fujita wrote: Hi Hanada-san, While still reviewing this patch, I feel this patch has given enough consideration to interactions with other commands, but found the following incorrect? behavior: postgres=# CREATE TABLE product (id INTEGER, description TEXT); CREATE TABLE postgres=# CREATE FOREIGN TABLE product1 () INHERITS (product) SERVER fs OPTIONS (filename '/home/foo/product1.csv', format 'csv'); CREATE FOREIGN TABLE postgres=# ALTER TABLE product ALTER COLUMN description SET STORAGE EXTERNAL; ERROR: product1 is not a table or materialized view ISTM the ALTER TABLE simple recursion mechanism (i.e., ATSimpleRecursion()) should be modified for the ALTER COLUMN SET STORAGE case. This points to a larger discussion about what precisely foreign tables can and cannot inherit from local ones. I don't think that a generic solution will be satisfactory, as the PostgreSQL FDW could, at least in principle, support many more features than the CSV FDW, as shown above. In my estimation, the outcome of the discussion above is not a blocker for this I just thought that among the properties that local tables can alter, the ones that foreign tables can also alter via ALTER FOREIGN TABLE are inherited, and the others are not. So for the case shown above, I thought that we would silently skip executing the ALTER COLUMN SET STORAGE command for the foreign table. I wonder if that would be the first step. I wonder what the cases shall be when the foreign table is on a server which does not support *all* SQL features. Does an FDW need to have the possible inherit options mentioned in its documentation for this patch? The answer is no, in my understanding. The altering operation simply declares some changes for foreign tables in the inheritance tree and does nothing to the underlying storage.
Thanks, Best regards, Etsuro Fujita -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup and pg_stat_tmp directory
On Tue, Jan 28, 2014 at 12:56 PM, Fujii Masao masao.fu...@gmail.com wrote: Hi, The files in pg_stat_tmp directory don't need to be backed up because they are basically reset at the archive recovery. So I think it's worth changing pg_basebackup so that it skips any files in pg_stat_tmp directory. Thought? Skipping pgstat_temp_directory in basebackup.c would make more sense than directly touching pg_basebackup. My 2c. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
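A minimal sketch of what skipping the directory inside basebackup.c's tree walk could look like (illustrative only; `skip_in_basebackup` is a made-up helper, and a real version would consult the stats_temp_directory GUC rather than hard-coding the default location):

```c
#include <stdbool.h>
#include <string.h>

/*
 * Decide whether a top-level entry in the data directory should be
 * omitted from a base backup.  The statistics files under pg_stat_tmp
 * are rebuilt from scratch during archive recovery, so shipping them
 * only wastes bandwidth.
 */
bool
skip_in_basebackup(const char *entry_name)
{
	return strcmp(entry_name, "pg_stat_tmp") == 0;
}
```

The directory walk would then `continue` past any entry for which this returns true, in the same way it already special-cases files like postmaster.pid.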
Re: [HACKERS] WIP patch (v2) for updatable security barrier views
AFAICS the only area of objection is the handling of inherited relations, which occurs within the planner in the current patch. I can see that would be a cause for concern since the planner is pluggable and it would then be possible to bypass security checks. Obviously installing a new planner isn't trivial, but doing so shouldn't cause collateral damage. FWIW, I don't see any way _not_ to do that in RLS. If there are security quals on a child table, they must be added, and that can only happen once inheritance expansion happens. That's in the planner. I don't see it as acceptable to ignore security quals on child tables, and if we can't, we've got to do some work in the planner. (I'm starting to really loathe inheritance). Let me ask an elemental question. What is the reason why the inheritance expansion logic should be located at the planner stage, rather than at the tail of the rewriter? A reference to a relation with children is very similar to a reference to multiple tables using UNION ALL. Is it a bad idea to move the logic into the rewriter stage (if we have no technical reason to keep it where it is)? Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei kai...@ak.jp.nec.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] inherit support for foreign tables
I wonder what shall be the cases when the foreign table is on a server which does not support *all* SQL features. Does an FDW need to have the possible inherit options mentioned in its documentation for this patch? The answer is no, in my understanding. The altering operation simply declares some changes for foreign tables in the inheritance tree and does nothing to the underlying storage. It seems to me Atri was mentioning the idea of enforcing constraints when a partitioned foreign table is referenced... BTW, isn't it feasible to leave this decision to the FDW drivers? If a foreign table has a check constraint (X BETWEEN 101 AND 200), postgres_fdw would be capable of running this check on the remote server, but file_fdw would not. Handling qualifiers attached to a Path node is their usual job, however. The thing to do is probably simple: if the backend appends the configured check constraint as if it were a user-given qualifier, the FDW driver will handle it appropriately. Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei kai...@ak.jp.nec.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Retain dynamic shared memory segments for postmaster lifetime
On Mon, Jan 27, 2014 at 7:48 PM, Amit Langote amitlangot...@gmail.com wrote: On Mon, Jan 13, 2014 at 2:50 PM, Amit Kapila amit.kapil...@gmail.com wrote: I have extended the test (contrib) module dsm_demo such that the user can now specify, during dsm_demo_create, the lifespan of the segment. Applied dsm_keep_segment_v1.patch and dsm_demo_v1.patch. Got the following warning when I tried the above example: postgres=# select dsm_demo_create('this message is from session-new', 1); WARNING: dynamic shared memory leak: segment 1402373971 still referenced WARNING: dynamic shared memory leak: segment 1402373971 still referenced dsm_demo_create - 1402373971 (1 row) Thanks for the verification. The reason is that the resource owner holds a reference to the segment until commit time, which leads to this warning. The fix is to remove the reference from the resource owner when the user calls this new API dsm_keep_segment, similar to what dsm_keep_mapping currently does. The updated patch fixing this problem is attached to this mail. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com dsm_keep_segment_v2.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup and pg_stat_tmp directory
On Tue, Jan 28, 2014 at 1:08 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Tue, Jan 28, 2014 at 12:56 PM, Fujii Masao masao.fu...@gmail.com wrote: Hi, The files in the pg_stat_tmp directory don't need to be backed up because they are basically reset during archive recovery. So I think it's worth changing pg_basebackup so that it skips any files in the pg_stat_tmp directory. Thoughts? Skipping pgstat_temp_directory in basebackup.c would make more sense than directly touching pg_basebackup. My 2c. Yeah, that's what I was thinking :) Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup and pg_stat_tmp directory
On Tue, Jan 28, 2014 at 9:26 AM, Fujii Masao masao.fu...@gmail.com wrote: Hi, The files in the pg_stat_tmp directory don't need to be backed up because they are basically reset during archive recovery. So I think it's worth changing pg_basebackup so that it skips any files in the pg_stat_tmp directory. Thoughts? I think this is a good idea, but can't it also skip PGSTAT_STAT_PERMANENT_TMPFILE along with the temp files in pg_stat_tmp? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fix comment typo in /src/backend/command/cluster.c
On Tue, Jan 28, 2014 at 04:48:35PM +1300, Gavin Flower wrote: On 28/01/14 16:33, Andrew Dunstan wrote: On 01/27/2014 10:24 PM, Sawada Masahiko wrote: Hi all, Attached patch fixes the typo which is in src/backend/command/cluster.c. Are you sure that's a typo? iff is usually shorthand for if and only if. cheers andrew Certainly, that is how I would interpret it. I came across that abbreviation in a first-year Maths course, Principles of Mathematics, in 1968 at the University of Auckland. By my rough count (ack -l '\biff\b' |wc -l), it's used to mean equivalence in 81 files in the source tree. Should we have a glossary of such terms? Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers