Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall
Because of this refactoring of database handling between pg_dump and pg_dumpall, the latest pg_dump TAP tests are failing in the following scenarios.

1. CREATE DATABASE postgres

Before this patch, pg_dump used to dump the CREATE DATABASE command for postgres, but pg_dumpall did not. In this refactoring, the approach I took for pg_dump's --create option follows pg_dumpall's behavior of not printing CREATE DATABASE commands for the "postgres" and "template1" databases; it just prints ALTER DATABASE commands to set the tablespace for those two databases.

Solution 1) Just skip dumping these CREATE DATABASE commands, and document that the user must create the "postgres" and "template1" databases in the target if they don't exist. Such cases should be very rare.

Solution 2) Add a new command line option (or some other setting) to indicate that the pg_dump execution comes from pg_dumpall and follow the current refactored behavior in that case; otherwise follow the earlier pg_dump behavior in handling CREATE DATABASE commands for "postgres" and "template1".

2. In the dumpDatabases function, before calling the runPgDump command

Before the refactoring, it used to connect to the database and dump "SET default_transaction_read_only = off;" to prevent accidental overwrites of the target. I fixed this in the attached patch by removing the connection and dumping the SET command directly. Does this need the same approach as solution 2) above, i.e. dumping "SET default_transaction_read_only = off;" whenever the CREATE DATABASE and \connect commands are issued?

The documentation is yet to be updated to reflect the above changes.

Regards,
Hari Babu
Fujitsu Australia

pg_dump_changes_3.patch
Description: Binary data
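To make the behavior described above concrete, the dump output for the two special databases would look roughly like this (a sketch only; 'ts1' is a made-up tablespace name, and the exact output of the patch may differ):

    -- instead of CREATE DATABASE postgres / CREATE DATABASE template1:
    ALTER DATABASE postgres SET TABLESPACE ts1;
    ALTER DATABASE template1 SET TABLESPACE ts1;
    -- and, per point 2 above, emitted before restoring into each database:
    SET default_transaction_read_only = off;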
[HACKERS] Partition-wise aggregation/grouping
Hi all,

Declarative partitioning is supported in PostgreSQL 10 and work is already in progress to support partition-wise joins. Here is a proposal for partition-wise aggregation/grouping. Our initial performance measurement has shown a 7x improvement when partitions are on foreign servers and approximately 15% improvement when partitions are local.

Partition-wise aggregation/grouping computes aggregates for each partition separately. If the group clause contains the partition key, all the rows belonging to a given group come from one partition, thus allowing aggregates to be computed completely for each partition. Otherwise, partial aggregates computed for each partition are combined across the partitions to produce the final aggregates. This technique improves performance because:
i. When partitions are located on a foreign server, we can push down the aggregate to the foreign server.
ii. If the hash table for each partition fits in memory, but that for the whole relation does not, each partition-wise aggregate can use an in-memory hash table.
iii. Aggregation at the level of partitions can exploit properties of the partitions, like indexes, their storage, etc.

Attached is an experimental patch for the same, based on the partition-wise join patches posted in [1]. This patch currently implements partition-wise aggregation when the group clause contains the partitioning key. A query below, involving a partitioned table with 3 partitions containing 1M rows each, producing 30 groups in total, showed 15% improvement over non-partition-wise aggregation. The same query showed a 7x improvement when the partitions were located on foreign servers. Here is the sample plan:

postgres=# set enable_partition_wise_agg to true;
SET
postgres=# EXPLAIN ANALYZE SELECT a, count(*) FROM plt1 GROUP BY a;
                                 QUERY PLAN
------------------------------------------------------------------------------
 Append  (cost=5100.00..61518.90 rows=30 width=12) (actual time=324.837..944.804 rows=30 loops=1)
   ->  Foreign Scan  (cost=5100.00..20506.30 rows=10 width=12) (actual time=324.837..324.838 rows=10 loops=1)
         Relations: Aggregate on (public.fplt1_p1 plt1)
   ->  Foreign Scan  (cost=5100.00..20506.30 rows=10 width=12) (actual time=309.954..309.956 rows=10 loops=1)
         Relations: Aggregate on (public.fplt1_p2 plt1)
   ->  Foreign Scan  (cost=5100.00..20506.30 rows=10 width=12) (actual time=310.002..310.004 rows=10 loops=1)
         Relations: Aggregate on (public.fplt1_p3 plt1)
 Planning time: 0.370 ms
 Execution time: 945.384 ms
(9 rows)

postgres=# set enable_partition_wise_agg to false;
SET
postgres=# EXPLAIN ANALYZE SELECT a, count(*) FROM plt1 GROUP BY a;
                                 QUERY PLAN
------------------------------------------------------------------------------
 HashAggregate  (cost=121518.01..121518.31 rows=30 width=12) (actual time=6498.452..6498.459 rows=30 loops=1)
   Group Key: plt1.a
   ->  Append  (cost=0.00..106518.00 rows=301 width=4) (actual time=0.595..5769.592 rows=300 loops=1)
         ->  Seq Scan on plt1  (cost=0.00..0.00 rows=1 width=4) (actual time=0.007..0.007 rows=0 loops=1)
         ->  Foreign Scan on fplt1_p1  (cost=100.00..35506.00 rows=100 width=4) (actual time=0.587..1844.506 rows=100 loops=1)
         ->  Foreign Scan on fplt1_p2  (cost=100.00..35506.00 rows=100 width=4) (actual time=0.384..1839.633 rows=100 loops=1)
         ->  Foreign Scan on fplt1_p3  (cost=100.00..35506.00 rows=100 width=4) (actual time=0.402..1876.505 rows=100 loops=1)
 Planning time: 0.251 ms
 Execution time: 6499.018 ms
(9 rows)

The patch needs a lot of improvement, including:
1. Support for partial partition-wise aggregation
2. Estimating the number of groups for every partition
3. Estimating the cost of partition-wise aggregation based on sample partitions, similar to partition-wise join
and much more.
In order to support partial aggregation on foreign partitions, we need support to fetch partially aggregated results from the foreign server. That can be handled as a separate follow-on patch.

Though there is a lot of work to be done, I would like to get suggestions/opinions from hackers.

I would like to thank Ashutosh Bapat for providing a draft patch and helping me off-list on this feature while he is busy working on the partition-wise join feature.

[1] https://www.postgresql.org/message-id/CAFjFpRcbY2QN3cfeMTzVEoyF5Lfku-ijyNR%3DPbXj1e%3D9a%3DqMoQ%40mail.gmail.com

Thanks
--
Jeevan Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

pg_partwise_agg_WIP.patch
Description: application/download
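For readers wanting to reproduce a setup like the one behind the plans above, here is a minimal sketch (the partition bounds, the server name 'loopback', and the remote table names are assumptions; the actual test schema is not included in this mail):

    CREATE EXTENSION postgres_fdw;
    CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (dbname 'postgres');
    CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;

    -- parent with three foreign partitions; remote tables plt1_p1..p3
    -- are assumed to hold 1M rows each, with 10 distinct values of a per
    -- partition (30 groups in total, matching the plans above)
    CREATE TABLE plt1 (a int, b int) PARTITION BY RANGE (a);
    CREATE FOREIGN TABLE fplt1_p1 PARTITION OF plt1
        FOR VALUES FROM (0) TO (10)
        SERVER loopback OPTIONS (table_name 'plt1_p1');
    CREATE FOREIGN TABLE fplt1_p2 PARTITION OF plt1
        FOR VALUES FROM (10) TO (20)
        SERVER loopback OPTIONS (table_name 'plt1_p2');
    CREATE FOREIGN TABLE fplt1_p3 PARTITION OF plt1
        FOR VALUES FROM (20) TO (30)
        SERVER loopback OPTIONS (table_name 'plt1_p3');

    SET enable_partition_wise_agg TO true;  -- GUC added by the patch
    EXPLAIN ANALYZE SELECT a, count(*) FROM plt1 GROUP BY a;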
Re: [HACKERS] btree_gin and btree_gist for enums
28.02.2017 00:22, Andrew Dunstan:
> OK, here's the whole series of patches.
> Patch 1 adds the CallerFInfoFunctionCall{1,2} functions.
> Patch 2 adds btree_gist support for their use for non-varlena types
> Patch 3 does the same for varlena types (Not required for patch 4, but
> better to be consistent, I think.)
> Patch 4 adds enum support to btree_gist
> Patch 5 adds enum support to btree_gin
>
> cheers
> andrew

Thank you for implementing the feature!

These patches seem to have some merge conflicts with a recent commit:

commit c7a9fa399d557c6366222e90b35db31e45d25678
Author: Stephen Frost
Date:   Wed Mar 15 11:16:25 2017 -0400

    Add support for EUI-64 MAC addresses as macaddr8

Patch 0002 also needs to be updated to consider macaddr8; I attached the diff. A complete patch (including the 0002_addition fix) rebased to the current master is attached as well.

It works as expected, and the code itself looks clear and well documented. The only issue I've found is a make check failure in the contrib/btree_gin subdirectory:

test numeric  ... ok
test enum     ... /bin/sh: 1: cannot open /home/anastasia/newprojects/original_repo/postgres/contrib/btree_gin/sql/enum.sql: No such file
diff: /home/anastasia/newprojects/original_repo/postgres/contrib/btree_gin/results/enum.out: No such file or directory
diff command failed with status 512: diff "/home/anastasia/newprojects/original_repo/postgres/contrib/btree_gin/expected/enum.out" "/home/anastasia/newprojects/original_repo/postgres/contrib/btree_gin/results/enum.out" > "/home/anastasia/newprojects/original_repo/postgres/contrib/btree_gin/results/enum.out.diff"

Please add a regression test for btree_gin, and then this patch can be considered "Ready for committer".

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/contrib/btree_gist/btree_macaddr8.c b/contrib/btree_gist/btree_macaddr8.c
index 13238ef..96afbcd 100644
--- a/contrib/btree_gist/btree_macaddr8.c
+++ b/contrib/btree_gist/btree_macaddr8.c
@@ -28,37 +28,37 @@ PG_FUNCTION_INFO_V1(gbt_macad8_same);
 
 static bool
-gbt_macad8gt(const void *a, const void *b)
+gbt_macad8gt(const void *a, const void *b, FmgrInfo *flinfo)
 {
 	return DatumGetBool(DirectFunctionCall2(macaddr8_gt, PointerGetDatum(a), PointerGetDatum(b)));
 }
 
 static bool
-gbt_macad8ge(const void *a, const void *b)
+gbt_macad8ge(const void *a, const void *b, FmgrInfo *flinfo)
 {
 	return DatumGetBool(DirectFunctionCall2(macaddr8_ge, PointerGetDatum(a), PointerGetDatum(b)));
 }
 
 static bool
-gbt_macad8eq(const void *a, const void *b)
+gbt_macad8eq(const void *a, const void *b, FmgrInfo *flinfo)
 {
 	return DatumGetBool(DirectFunctionCall2(macaddr8_eq, PointerGetDatum(a), PointerGetDatum(b)));
 }
 
 static bool
-gbt_macad8le(const void *a, const void *b)
+gbt_macad8le(const void *a, const void *b, FmgrInfo *flinfo)
 {
 	return DatumGetBool(DirectFunctionCall2(macaddr8_le, PointerGetDatum(a), PointerGetDatum(b)));
 }
 
 static bool
-gbt_macad8lt(const void *a, const void *b)
+gbt_macad8lt(const void *a, const void *b, FmgrInfo *flinfo)
 {
 	return DatumGetBool(DirectFunctionCall2(macaddr8_lt, PointerGetDatum(a), PointerGetDatum(b)));
 }
 
 static int
-gbt_macad8key_cmp(const void *a, const void *b)
+gbt_macad8key_cmp(const void *a, const void *b, FmgrInfo *flinfo)
 {
 	mac8KEY    *ia = (mac8KEY *) (((const Nsrt *) a)->t);
 	mac8KEY    *ib = (mac8KEY *) (((const Nsrt *) b)->t);
@@ -142,7 +142,7 @@ gbt_macad8_consistent(PG_FUNCTION_ARGS)
 	key.upper = (GBT_NUMKEY *) &kkk->upper;
 
 	PG_RETURN_BOOL(
-		gbt_num_consistent(&key, (void *) query, &strategy, GIST_LEAF(entry), &tinfo)
+		gbt_num_consistent(&key, (void *) query, &strategy, GIST_LEAF(entry), &tinfo, fcinfo->flinfo)
 	);
 }
@@ -154,7 +154,7 @@ gbt_macad8_union(PG_FUNCTION_ARGS)
 	void	   *out = palloc0(sizeof(mac8KEY));
 
 	*(int *) PG_GETARG_POINTER(1) = sizeof(mac8KEY);
-	PG_RETURN_POINTER(gbt_num_union((void *) out, entryvec, &tinfo));
+	PG_RETURN_POINTER(gbt_num_union((void *) out, entryvec, &tinfo, fcinfo->flinfo));
 }
@@ -184,7 +184,7 @@ gbt_macad8_picksplit(PG_FUNCTION_ARGS)
 	PG_RETURN_POINTER(gbt_num_picksplit(
 										(GistEntryVector *) PG_GETARG_POINTER(0),
 										(GIST_SPLITVEC *) PG_GETARG_POINTER(1),
-										&tinfo
+										&tinfo, fcinfo->flinfo
 										));
 }
@@ -195,6 +195,6 @@ gbt_macad8_same(PG_FUNCTION_ARGS)
 	mac8KEY    *b2 = (mac8KEY *) PG_GETARG_POINTER(1);
 	bool	   *result = (bool *) PG_GETARG_POINTER(2);
 
-	*result = gbt_num_same((void *) b1, (void *) b2, &tinfo);
+	*result = gbt_num_same((void *) b1, (void *) b2, &tinfo, fcinfo->flinfo);
 	PG_RETURN_POINTER(result);
 }

commit 50f4a4efcb9ed3ae5e0378676459c6d1ce455bd2
Author: Anastasia
Date:   Tue Mar 21 10:11:37 2017 +0300

    complete btree_gin_gist_enum patch

diff --git a/contrib/btree_gin/Makefile b/contrib/btree_gin/Makefile
index f22e4af..589755f 100644
--- a/contrib/btree_gin/Make
Re: [HACKERS] Proposal: GetOldestXminExtend for ignoring arbitrary vacuum flags
On Tue, Mar 21, 2017 at 3:16 PM, Seki, Eiji wrote:
>
> Thank you for your review.
>
> I reflected your comment and attach the updated patch.

Thanks for the updated patch.

+/* Use these flags in GetOldestXmin as "flags" */

How about something like the following?

/* Use the following flags as an input "flags" to the GetOldestXmin function */

+/* Ignore vacuum backends (Note: this also ignores analyze with vacuum backends) */
+#define PROCARRAY_FLAGS_VACUUM PROCARRAY_FLAGS_DEFAULT | PROCARRAY_VACUUM_FLAG

+/* Ignore analyze backends (Note: this also ignores vacuum with analyze backends) */
+#define PROCARRAY_FLAGS_ANALYZE PROCARRAY_FLAGS_DEFAULT | PROCARRAY_ANALYZE_FLAG

Whenever the above flags are passed to the GetOldestXmin() function, it just verifies whether any one of the flags is set in the backend flags or not. Also, the PROC_IN_ANALYZE flag is actually set when an analyze operation is started and reset at the end. I feel it is not required to mention the Note section.

+/* Ignore vacuum backends and analyze ones */

How about changing it to "Ignore both vacuum and analyze backends"?

Regards,
Hari Babu
Fujitsu Australia
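As a quick illustration of how callers would use the proposed flags (a sketch only — the GetOldestXmin signature here is the one introduced by the patch, and the variable names are made up):

	TransactionId	oldestXmin;

	/* a vacuum-side caller that wants to ignore other vacuum backends */
	oldestXmin = GetOldestXmin(rel, PROCARRAY_FLAGS_VACUUM);

	/* a caller that must not ignore any backends */
	oldestXmin = GetOldestXmin(rel, PROCARRAY_FLAGS_DEFAULT);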
[HACKERS] segfault in hot standby for hash indexes
Against an unmodified HEAD (17fa3e8), I got a segfault in the hot standby.

Using the attached files, I start the test case like this:

nice sh do_nocrash_sr.sh >& do_nocrash_sr.err &

And start the replica like:

rm -r /tmp/data2_replica/ ; psql -p 9876 -c "select pg_create_physical_replication_slot('foo')"; ~/pgsql/pure_origin/bin/pg_basebackup -p 9876 -D /tmp/data2_replica -R -S foo; ~/pgsql/pure_origin/bin/pg_ctl start -D /tmp/data2_replica/ -o "--port=9874"

After less than a minute, the replica fails.

#0  0x004b85fe in hash_xlog_vacuum_get_latestRemovedXid (record=0x1925418) at hash_xlog.c:1006
1006		itemid = PageGetItemId(ipage, unused[i]);
(gdb) bt
#0  0x004b85fe in hash_xlog_vacuum_get_latestRemovedXid (record=0x1925418) at hash_xlog.c:1006
#1  0x004b881f in hash_xlog_vacuum_one_page (record=0x1925418) at hash_xlog.c:1113
#2  0x004b8bed in hash_redo (record=0x1925418) at hash_xlog.c:1217
#3  0x0051a96c in StartupXLOG () at xlog.c:7152
#4  0x00789ffb in StartupProcessMain () at startup.c:216
#5  0x0052d617 in AuxiliaryProcessMain (argc=2, argv=0x7fffe7661500) at bootstrap.c:421
#6  0x007890cf in StartChildProcess (type=StartupProcess) at postmaster.c:5256
#7  0x0078419d in PostmasterMain (argc=4, argv=0x18fc300) at postmaster.c:1329
#8  0x006cd78e in main (argc=4, argv=0x18fc300) at main.c:228

'unused' is NULL at that point.

Cheers,
Jeff

count.pl
Description: Binary data

do_nocrash_sr.sh
Description: Bourne shell script
Re: [HACKERS] Partition-wise aggregation/grouping
Jeevan Chalke wrote:
> Declarative partitioning is supported in PostgreSQL 10 and work is already in
> progress to support partition-wise joins. Here is a proposal for partition-wise
> aggregation/grouping. Our initial performance measurement has shown 7 times
> performance when partitions are on foreign servers and approximately 15% when
> partitions are local.
>
> Partition-wise aggregation/grouping computes aggregates for each partition
> separately. If the group clause contains the partition key, all the rows
> belonging to a given group come from one partition, thus allowing aggregates
> to be computed completely for each partition. Otherwise, partial aggregates
> computed for each partition are combined across the partitions to produce the
> final aggregates. This technique improves performance because:
> i. When partitions are located on foreign server, we can push down the
> aggregate to the foreign server.
> ii. If hash table for each partition fits in memory, but that for the whole
> relation does not, each partition-wise aggregate can use an in-memory hash
> table.
> iii. Aggregation at the level of partitions can exploit properties of
> partitions like indexes, their storage etc.

I suspect this overlaps with

https://www.postgresql.org/message-id/29111.1483984605%40localhost

I'm working on the next version of the patch, which will be able to aggregate the result of both base relation scans and joins. I'm trying hard to make the next version available before an urgent vacation that I'll have to take at a random date between today and early April.

I suggest that we coordinate the effort; it's a lot of work in any case.

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at
Re: [HACKERS] segfault in hot standby for hash indexes
On Tue, Mar 21, 2017 at 1:28 PM, Jeff Janes wrote:
> Against an unmodified HEAD (17fa3e8), I got a segfault in the hot standby.

I think I see the problem in hash_xlog_vacuum_get_latestRemovedXid(). It seems to me that we are using one block_id for registering the deleted items in the XLOG_HASH_VACUUM_ONE_PAGE record and a different block_id for fetching those items in hash_xlog_vacuum_get_latestRemovedXid(). So matching those should fix this issue (instead of fetching the block number and items from block_id 1, we should use block_id 0).

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
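For concreteness, the suggested fix amounts to something like the following in hash_xlog_vacuum_get_latestRemovedXid() (a sketch, not a tested patch; the surrounding code is abbreviated, and the registered-buffer layout should be double-checked against where XLOG_HASH_VACUUM_ONE_PAGE is emitted):

	/*
	 * The deleted offsets were registered as part of block 0 of the
	 * XLOG_HASH_VACUUM_ONE_PAGE record, so fetch them from block 0,
	 * not block 1.
	 */
	XLogRecGetBlockTag(record, 0, &rnode, NULL, &blkno);
	unused = (OffsetNumber *) XLogRecGetBlockData(record, 0, &len);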
[HACKERS] Implementing delete in columnar store fdw
Hello,

I want to implement delete functionality for a column store FDW in Postgres, similar to file_fdw. I want to use the "AddForeignUpdateTargets" function to implement this, but the junk filter shouldn't be a column present in the table. Is it possible to add an Expr/Var to the junk filter that is not present in the schema of the table, one that I will generate on the fly? If yes, how can I do that?

Thanks in advance.

Regards,
Sri Harsha
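For reference, here is roughly what postgres_fdw does in its postgresAddForeignUpdateTargets() (the columnar function name below is a placeholder): it adds a resjunk TargetEntry wrapping a Var for the ctid system column — which is likewise not one of the table's declared columns — so a row identifier the FDW resolves itself at scan time can be added the same way:

	/* sketch; needs nodes/makefuncs.h, access/sysattr.h, catalog/pg_type.h */
	static void
	columnarAddForeignUpdateTargets(Query *parsetree,
									RangeTblEntry *target_rte,
									Relation target_relation)
	{
		Var		   *var;
		TargetEntry *tle;

		/* Make a Var for the row identifier; postgres_fdw uses ctid here */
		var = makeVar(parsetree->resultRelation,
					  SelfItemPointerAttributeNumber,
					  TIDOID,
					  -1,
					  InvalidOid,
					  0);

		/* Wrap it in a resjunk TLE with a name the FDW can find later... */
		tle = makeTargetEntry((Expr *) var,
							  list_length(parsetree->targetList) + 1,
							  pstrdup("ctid"),
							  true);

		/* ...and add it to the query's targetlist */
		parsetree->targetList = lappend(parsetree->targetList, tle);
	}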
Re: [HACKERS] Asymmetry between parent and child wrt "false" quals
On 2017/03/21 14:59, Ashutosh Bapat wrote:
> When I run a query like below on a child-less table, the plan comes out to be
>
> explain verbose SELECT * FROM uprt1_l WHERE a = 1 AND a = 2;
>                              QUERY PLAN
> ----------------------------------------------------------------------
>  Result  (cost=0.00..11.50 rows=1 width=13)
>    Output: a, b, c
>    One-Time Filter: false
>    ->  Seq Scan on public.uprt1_l  (cost=0.00..11.50 rows=1 width=13)
>          Output: a, b, c
>          Filter: (uprt1_l.a = 1)
> (6 rows)
>
> whereas the same query run on a parent with children gives this plan:
>
> postgres=# \d prt1_l
>                 Table "public.prt1_l"
>  Column |       Type        | Collation | Nullable | Default
> --------+-------------------+-----------+----------+---------
>  a      | integer           |           | not null |
>  b      | integer           |           |          |
>  c      | character varying |           |          |
> Partition key: RANGE (a)
> Number of partitions: 3 (Use \d+ to list them.)
>
> postgres=# explain verbose SELECT * FROM prt1_l WHERE a = 1 AND a = 2;
>                  QUERY PLAN
> ---------------------------------------------
>  Result  (cost=0.00..0.00 rows=0 width=40)
>    Output: prt1_l.a, prt1_l.b, prt1_l.c
>    One-Time Filter: false
> (3 rows)
>
> For a parent table with children, set_append_rel_size() evaluates
> restrictions in the loop
>
>  880     foreach(l, root->append_rel_list)
>  881     {
>  882         AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
>
> starting at 1021. If any of the restrictions evaluates to false,
> it sets the child as dummy. If all children are dummy, the appendrel is
> set to dummy.
>
> But for a child-less table, even if the "false" qual is available in
> baserestrictinfo in set_rel_size(), we do not mark the relation as
> dummy. Instead, paths are created for it and only at planning time do
> we add the gating plan when there are pseudo-constant quals.
> Why do we have different behaviours in these two cases?

I think the case where there is no child table would not be handled by set_append_rel_size(), because rte->inh would be false. Instead, I thought the test at the beginning of relation_excluded_by_constraints() would have detected this somehow; the comment there says the following:

    /*
     * Regardless of the setting of constraint_exclusion, detect
     * constant-FALSE-or-NULL restriction clauses. Because const-folding will
     * reduce "anything AND FALSE" to just "FALSE", any such case should
     * result in exactly one baserestrictinfo entry.

But the qual (a = 1 and a = 2) is *not* reduced to exactly one constant-false-or-null baserestrictinfo entry; instead I see that there are two RestrictInfos at that point, viz. a = 1 and const-FALSE. I think the const-folding mentioned in the above comment does not occur after equivalence class processing, which would be required to conclude that (a = 1 and a = 2) reduces to constant-false. OTOH, (a = 1 and false) can be reduced to constant-false much earlier, when performing preprocess_qual_conditions().

That said, I am not sure whether it's worthwhile to modify the test at the beginning of relation_excluded_by_constraints() to iterate over rel->baserestrictinfo looking for any const-FALSE quals, instead of doing it only when the const-FALSE qual is the *only* entry.

Thanks,
Amit
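The modification contemplated in that last paragraph would look roughly like this (a sketch only, not a proposed patch):

	/* in relation_excluded_by_constraints(): check every entry, rather
	 * than only the single-constant-FALSE-or-NULL baserestrictinfo case */
	foreach(lc, rel->baserestrictinfo)
	{
		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
		Expr	   *clause = rinfo->clause;

		if (clause && IsA(clause, Const) &&
			(((Const *) clause)->constisnull ||
			 !DatumGetBool(((Const *) clause)->constvalue)))
			return true;		/* relation can be proven empty */
	}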
Re: [HACKERS] [PATCH]: fix bug in SP-GiST box_ops
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           tested, passed
Documentation:            not tested

As far as I can see, this bugfix was already discussed and reviewed. All mentioned issues are fixed in the latest version of the patch, so I mark it "Ready for committer". Thank you for fixing it!

The new status of this patch is: Ready for Committer
Re: [HACKERS] Other formats in pset like markdown, rst, mediawiki
Hi,

I like your suggestion and took a look at your patch, though I'm not an expert on psql.

I like the idea of taking advantage of the linestyle utilities to implement the rst and markdown formats efficiently, instead of newly developing pset format machinery. But I think the two comments below need to be changed to something not focused only on linestyle. That's because they really take care of both '\pset linestyle' and '\pset format', and that may mislead readers:

---
/* Line style control structures */
const printTextFormat pg_markdown =

/* get selected or default line style */
const printTextFormat *
get_line_style(const printTableOpt *opt)
---

The remaining points are about code style conventions:

- There are some indents with white spaces around skip_leading_spaces_print(), but the PostgreSQL convention says indents should be 4-column tabs:
https://www.postgresql.org/docs/devel/static/source-format.html

- On the other hand, in the docs there are some tab indents, but white-space indents are preferable; looking around the sgml files, white space is used.

- Some multi-line comment style also needs fixing according to the above documentation (link).

- And I also found the patch cannot be applied to the current master.

Regards,
Ideriha, Takeshi

From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Jan Michalek
Sent: Sunday, March 19, 2017 5:10 AM
To: Peter Eisentraut
Cc: Pavel Stehule; PostgreSQL mailing lists
Subject: Re: [HACKERS] Other formats in pset like markdown, rst, mediawiki

2017-03-12 19:21 GMT+01:00 Jan Michálek <godzilalal...@gmail.com>:

2017-03-10 9:43 GMT+01:00 Jan Michálek <godzilalal...@gmail.com>:

2017-03-09 20:10 GMT+01:00 Peter Eisentraut <peter.eisentr...@2ndquadrant.com>:
This is looking pretty neat. I played around with it a bit. There are a couple of edge cases that you need to address, I think.

Thanks, the original code is very synoptical and well prepared for adding new formats.

- Does not support \x

I know; I don't know if \x makes sense in this case. I will look at how it is done in other formats like html. I think that it should work in the sense that a table generated to rst should give similar output after processing to the output of the html format.

I prepared something like this (I have no prepared diff, I need to do some other changes). There are a few things I need to do. The first problem is bold column names; I should do it in the same fashion as "RECORD", but I need to do some research about the length of the column. A bigger problem is with tab indent; the rst processor doesn't work with this in this case.

In the new diff, \x is added for rst, and leading spaces are skipped in rst in both. make check passed

Jan

jelen=# execute q \g | xclip
+---------+--------------------------------------------------------------+
| **RECORD 1**                                                           |
+---------+--------------------------------------------------------------+
| column1 | Elephant, kangaroo,                                          |
|         | squirrel, gorilla                                            |
+---------+--------------------------------------------------------------+
| column2 | 121                                                          |
+---------+--------------------------------------------------------------+
| column3 | 1.0035971223021583                                           |
+---------+--------------------------------------------------------------+
| column4 | 0.                                                           |
+---------+--------------------------------------------------------------+
| column5 | Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello |
+---------+--------------------------------------------------------------+
| **RECORD 2**                                                           |
+---------+--------------------------------------------------------------+
| column1 | goat, rhinoceros,                                            |
|         | monkey, ape                                                  |
+---------+--------------------------------------------------------------+
| column2 | 11121                                                        |
+---------+--------------------------------------------------------------+
| column3 | 1.0007824726134585                                           |
+---------+--------------------------------------------------------------+
| column4 | 5.                                                           |
+---------+--------------------------------------------------------------+
| column5 | xx xx xx xx xx xx xx xx xx xx                                |
+--
Re: [HACKERS] extended statistics: n-distinct
Thank you for finishing this.

At Mon, 20 Mar 2017 16:02:20 -0300, Alvaro Herrera wrote in <20170320190220.ixlaueanxegqd5gr@alvherre.pgsql>
> Here is a closer to final version of the multivariate statistics series,
> last posted at
> https://www.postgresql.org/message-id/20170316222033.ncdi7nidah2gdzjx%40alvherre.pgsql

I'm sorry, but this seems to conflict with the current master (17fa3e8) and a3eac988c267, which is the base of v28.

> If you've always wanted to review multivariate stats, but never found a
> good reason to, now is a terrific time to do so! (In other words: I
> plan to get this pushed in the not too distant future.)

Great! But sorry for not having contributed much.

> This is a new thread to present a version of the n-distinct patch that
> IMO is close enough to commit. There are some work items still.
> There's some discussion on the topic of cross-column statistics:
> https://wiki.postgresql.org/wiki/Cross_Columns_Stats
>
> This problem is important enough that Kyotaro Horiguchi submitted
> another patch that does the same thing:
> https://www.postgresql.org/message-id/flat/20150828.173334.114731693.horiguchi.kyotaro%40lab.ntt.co.jp
> This patch aims to provide the same functionality, keeping the design
> general enough that other kinds of statistics can be added later (such
> as functional dependencies, histograms and MCVs, all of which have been
> previously submitted as patches by Tomas).

I may be stupid, but I don't get the picture here, specifically about the relation to Tomas's patch. Does this work as infrastructure for Tomas's mv patch? Or do they stand in some other relationship?

> To recap, what this patch provides is a new command of the form
>    CREATE STATISTICS statname [WITH (opts)] ON (columns) FROM table
>
> Note that we put the table name in a separate FROM clause instead of
> together with the column name, so that this is more readily extensible
> to things that are not just columns, for example expressions that might
> involve more than one table (per review from Dean Rasheed). Currently,
> only one table is supported.
>
> In this patch, the "opts" can only be "ndistinct", which creates a
> pg_statistic_ext row with the number of distinct groups found in all
> possible combinations across that set of columns. This can be used when
> a GROUP BY or a DISTINCT clause needs to estimate the number of distinct
> groups in an aggregation.
>
> Some things left to change:
>
> * Currently, we use the ndistinct value only if the grouping uses
> exactly the set of columns covered by a statistics. For example, if we
> have stats on (a,b,c) and the grouping is on (a,b,c,d), we fall back to
> the old method, which may result in worse results than if we used the
> number we know about (a,b,c) then applied a fixup to consider the
> distinctness of (d).

Are you planning to implement corrected estimation of joins that are thrown off by strong correlations?

> * Also, estimate_num_groups() looks a bit patchy. With slightly more
> invasive changes we can make it look more natural.
>
> * I'm not terribly happy with the header organization. I think
> VacAttrStats should be in its own (new) src/include/statistics/analyze.h
> for example (which cleans up a bunch of existing stuff a bit), and the
> new files could do with some slight makeover.
>
> * The current code uses AttrNumber * and int2vector, in places where it
> would be more convenient to use Bitmapsets.
>
> * We currently try to keep a stats object even if a column in it is
> dropped -- for example, if we have stats on (a,b,c) and drop (b), then
> we still have stats on (a,c). While this is nice, it creates a bunch of
> weird corner cases, so I'm going to rip that out and just drop the
> statistics instead. If the user wants stats on (a,c) to remain, they
> can create it after (or before) dropping the column.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center
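To illustrate the quoted syntax (the table, column, and statistics names here are invented; note that this is the patch's WITH (opts) form, which may differ from what finally gets committed):

    CREATE TABLE zipcodes (zip text, city text, state text);

    CREATE STATISTICS s_zip_city WITH (ndistinct) ON (zip, city) FROM zipcodes;

    -- the planner can then use the multivariate ndistinct estimate for e.g.
    SELECT zip, city, count(*) FROM zipcodes GROUP BY zip, city;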
Re: [HACKERS] Partitioned tables and relfilenode
On 2017/03/21 1:16, Robert Haas wrote:
> On Fri, Mar 17, 2017 at 4:57 AM, Amit Langote wrote:
>>> Yes, but on the flip side, you're having to add code in a lot of
>>> places -- I think I counted 7 -- where you turn around and ignore
>>> those AppendRelInfos.
>>
>> Perhaps you were looking at the previous version with "minimal" appinfos
>> containing the child_is_partitioned field?
>
> Yes, I think I was.  I think this version looks a lot better.

Just to clarify, I assume you reviewed the latest version, which does not create AppendRelInfos but instead creates PartitionedChildRelInfos (as is also evident from your comments below). Sorry about the confusion.

>     /*
>      * Close the root partitioned rel if we opened it above, but keep the
>      * lock.
>      */
>     if (rel != mtstate->resultRelInfo->ri_RelationDesc)
>         heap_close(rel, NoLock);
>
> We didn't take a lock above, though, so drop everything in the comment
> from "but" onward.

Oh, right.

> -    add_paths_to_append_rel(root, rel, live_childrels);
> +    add_paths_to_append_rel(root, rel, live_childrels, partitioned_rels);
>
> I think it would make more sense to put the new logic into
> add_paths_to_append_rel, instead of passing this down as an additional
> parameter.

OK, done.

> + * do not appear anywhere else in the plan. Situation is exactly the
>
> The situation is...

Fixed.

> +    if (parent_rte->relkind == RELKIND_PARTITIONED_TABLE)
> +    {
> +        foreach(lc, root->pcinfo_list)
> +        {
> +            PartitionedChildRelInfo *pc = lfirst(lc);
> +
> +            if (pc->parent_relid == parentRTindex)
> +            {
> +                partitioned_rels = pc->child_rels;
> +                break;
> +            }
> +        }
> +    }
>
> You seem to have a few copies of this logic. I think it would be
> worth factoring it out into a separate function.

OK, done. Put the definition in planner.c.

> +    root->glob->nonleafResultRelations =
> +        list_concat(root->glob->nonleafResultRelations,
> +                    list_copy(splan->partitioned_rels));
>
> Please add a brief comment. One line is fine.

Done.

> +    newrc->isParent = childrte->relkind == RELKIND_PARTITIONED_TABLE;
>
> I'm not sure what project style is, but I personally find these kinds
> of assignments easier to read with an extra set of parentheses:
>
> newrc->isParent = (childrte->relkind == RELKIND_PARTITIONED_TABLE);

Ah, done.

> +    if (partitioned_rels == NIL)
> +        return;
> +
> +    foreach(lc, partitioned_rels)
>
> I think the if-test is pointless; the foreach loop is going to start
> by comparing the initial value with NIL.

Right, fixed.

> Why doesn't ExecSerializePlan() need to transfer a proper value for
> nonleafResultRelations to workers? Seems like it should.

It doesn't transfer resultRelations either, presumably because workers do not handle result relations yet. Also, both resultRelations and nonleafResultRelations are set *only* if there is a ModifyTable node.

Attached updated patches.

Thanks,
Amit

From 28fe97f6a831485f5f85d20ec6a3e0df066ad612 Mon Sep 17 00:00:00 2001
From: amit
Date: Fri, 10 Mar 2017 13:48:31 +0900
Subject: [PATCH 1/2] Avoid creating scan nodes for partitioned tables

There are quite a few changes here:

* Currently, we create scan nodes for inheritance parents in their role
as an inheritance set member. Partitioned tables do not contain any data,
so it's useless to create scan nodes for them. So, it's not necessary to
create AppendRelInfos for partitioned child tables in
expand_inherited_rtentry(). Although, create the RTEs and record the RT
indexes in a list. The list is associated with the parent_relid using the
new PartitionedChildRelInfo nodes, which are kept in a global list called
pcinfo_list in the root PlannerInfo, similar to append_rel_list. For a
given parent RTE, add_paths_to_append_rel() and inheritance_planner()
must copy the aforementioned list of partitioned child RT indexes to
Append/MergeAppend and ModifyTable paths, respectively. The list is then
carried over to a new field called partitioned_rels in Append,
MergeAppend, and ModifyTable nodes.

* For partitioned parent RTEs with inh=false, set_rel_size() must not try
to do set_plain_rel_size(). Instead mark such rels as "dummy", because
they are empty. Happens with partitioned tables without any leaf
partitions.

* In case of UPDATE/DELETE on a partitioned table, the executor can
already know the leaf result relations with PlannedStmt.resultRelations,
but not the non-leaf relations. InitPlan must lock *all* the result
relations before initializing the plan tree to avoid lock upgrade
hazards, so we must include non-leaf result relations as well. For that,
add a new field nonleafResultRelations to PlannedStmt, which is
initialized by copying ModifyTable.partitioned_rels.

* PlanRowMar
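For reference, the factored-out helper mentioned in the review above ends up looking roughly like the following (a sketch assembled from the snippet quoted in the review; the function name and its placement in planner.c follow the discussion):

	/*
	 * get_partitioned_child_rels
	 *		Return the list of RT indexes of partitioned child relations
	 *		recorded for the given parent RT index, or NIL if none.
	 */
	List *
	get_partitioned_child_rels(PlannerInfo *root, Index rti)
	{
		List	   *result = NIL;
		ListCell   *l;

		foreach(l, root->pcinfo_list)
		{
			PartitionedChildRelInfo *pc = (PartitionedChildRelInfo *) lfirst(l);

			if (pc->parent_relid == rti)
			{
				result = pc->child_rels;
				break;
			}
		}

		return result;
	}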
Re: [HACKERS] postgres_fdw: support parameterized foreign joins
On 2017/03/16 22:23, Arthur Zakirov wrote:
> 2017-02-27 12:40 GMT+03:00 Etsuro Fujita:
>> I'd like to propose to support parameterized foreign joins. Attached is a
>> patch for that, which has been created on top of [1].
>
> Can you rebase the patch? It does not apply now.

OK, will do. Thanks for the report!

Best regards,
Etsuro Fujita
Re: [HACKERS] postgres_fdw bug in 9.6
On 2017/03/17 0:37, David Steele wrote:
> This patch does not apply cleanly at cccbdde:
>
> Marked "Waiting on Author".

OK, I'll update the patch. One thing I'd like to revise in addition to that is: (1) add to JoinPathExtraData a flag member to indicate whether to give the FDW a chance to consider a remote join, which will be set to true if the joinrel's fdwroutine is not NULL and the fdwroutine's GetForeignJoinPaths is not NULL; and (2) if the flag is true, save the info needed to create an alternative local join path, such as the hashclauses and mergeclauses proposed in the patch, into JoinPathExtraData in add_paths_to_joinrel. This would avoid useless overhead in saving such info into JoinPathExtraData when we don't give the FDW that chance.

Best regards,
Etsuro Fujita
Re: [HACKERS] Push down more full joins in postgres_fdw
On 2017/03/17 2:35, Robert Haas wrote:
> And ... I don't see anything to complain about, so, committed.

Thanks for committing, Robert! Thanks for reviewing, Ashutosh and David!

Best regards,
Etsuro Fujita
Re: [HACKERS] Other formats in pset like markdown, rst, mediawiki
2017-03-21 9:59 GMT+01:00 Ideriha, Takeshi:
> Hi,
>
> I like your suggestion and took a look at your patch though I'm not the
> expert about psql.
>
> I like the idea taking advantage of linestyle utilities to implement rst
> and markdown format efficiently instead of newly developing pset format
> things.
>
> But I'm thinking two comments below need change to something about not
> focusing only on linestyle. That's because they really take care of both
> '\pset linestyle' and '\pset format' and it may lead to misunderstanding
> for readers.

I am not sure if linestyle is enough - the markdown aligning is not visual.

Regards

Pavel

> [...]
Re: [HACKERS] Other formats in pset like markdown, rst, mediawiki
2017-03-21 9:59 GMT+01:00 Ideriha, Takeshi:
> Hi,
>
> I like your suggestion and took a look at your patch though I'm not the
> expert about psql.
>
> [...]
>
> ---
> /* Line style control structures */
> const printTextFormat pg_markdown =
>
> /* get selected or default line style */
> const printTextFormat *
> get_line_style(const printTableOpt *opt)
> ---

It is in command.c?

I have done it so that \pset format changes the linestyle:

psql (9.6.2, server 9.6.1)
Type "help" for help.

jelen=# \pset linestyle ascii
Line style is ascii.
jelen=# \pset format rst
Output format is rst.
jelen=# \pset linestyle
Line style is rst.
jelen=#

Peter wrote that this is not right, but I don't know how it should look, because most of this is done in the linestyle; format is used only for the switch from the console.

> The rest things are about code style convention.
>
> - there are some indents with white spaces around skip_leading_spaces_print()
> but Postgresql conventions say indents should be with 4 column tab.
> https://www.postgresql.org/docs/devel/static/source-format.html

Thanks, I often use 4 whitespaces (I have it in vim), but in other code I found mostly 8 whitespaces used. I will look at this. I used code from other functions (fputnbytes, print_html_escaped) as a template.

> - On the other hand, in docs there are some tab indents
> but white space indents are preferable. Looking around sgml files, white
> space is used.
>
> - some multi-line comment style also needs fixing according to the above
> documentation (link)

I will look at the comments; this is only a working version, and coding style issues will be corrected (I have some comments only for my own orientation in the code).

> - And I also found patch cannot be applied to current master.

I have the 9.6.2 source code. Is it not correct? Where do I find the source code I should use?

Have a nice day
Jan

> [...]
Re: [HACKERS] WIP: Covering + unique indexes.
Patch rebased to the current master is in attachments.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

commit 497d52b713dd8f926b465ddf22f21db7229b12e3
Author: Anastasia
Date:   Tue Mar 21 12:58:13 2017 +0300

    include_columns_10.0_v4.patch

diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index c1e9089..5c80e3b 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -100,7 +100,7 @@ static remoteConn *getConnectionByName(const char *name);
 static HTAB *createConnHash(void);
 static void createNewConnection(const char *name, remoteConn *rconn);
 static void deleteConnection(const char *name);
-static char **get_pkey_attnames(Relation rel, int16 *numatts);
+static char **get_pkey_attnames(Relation rel, int16 *indnkeyatts);
 static char **get_text_array_contents(ArrayType *array, int *numitems);
 static char *get_sql_insert(Relation rel, int *pkattnums, int pknumatts, char **src_pkattvals, char **tgt_pkattvals);
 static char *get_sql_delete(Relation rel, int *pkattnums, int pknumatts, char **tgt_pkattvals);
@@ -1480,7 +1480,7 @@ PG_FUNCTION_INFO_V1(dblink_get_pkey);
 Datum
 dblink_get_pkey(PG_FUNCTION_ARGS)
 {
-	int16		numatts;
+	int16		indnkeyatts;
 	char	  **results;
 	FuncCallContext *funcctx;
 	int32		call_cntr;
@@ -1506,7 +1506,7 @@ dblink_get_pkey(PG_FUNCTION_ARGS)
 		rel = get_rel_from_relname(PG_GETARG_TEXT_PP(0), AccessShareLock, ACL_SELECT);
 
 		/* get the array of attnums */
-		results = get_pkey_attnames(rel, &numatts);
+		results = get_pkey_attnames(rel, &indnkeyatts);
 
 		relation_close(rel, AccessShareLock);
@@ -1526,9 +1526,9 @@ dblink_get_pkey(PG_FUNCTION_ARGS)
 		attinmeta = TupleDescGetAttInMetadata(tupdesc);
 		funcctx->attinmeta = attinmeta;
 
-		if ((results != NULL) && (numatts > 0))
+		if ((results != NULL) && (indnkeyatts > 0))
 		{
-			funcctx->max_calls = numatts;
+			funcctx->max_calls = indnkeyatts;
 
 			/* got results, keep track of them */
 			funcctx->user_fctx = results;
@@ -2016,10 +2016,10 @@ dblink_fdw_validator(PG_FUNCTION_ARGS)
  * get_pkey_attnames
  *
  * Get the primary key attnames for the given relation.
- * Return NULL, and set numatts = 0, if no primary key exists.
+ * Return NULL, and set indnkeyatts = 0, if no primary key exists.
  */
 static char **
-get_pkey_attnames(Relation rel, int16 *numatts)
+get_pkey_attnames(Relation rel, int16 *indnkeyatts)
 {
 	Relation	indexRelation;
 	ScanKeyData skey;
@@ -2029,8 +2029,8 @@ get_pkey_attnames(Relation rel, int16 *numatts)
 	char	  **result = NULL;
 	TupleDesc	tupdesc;
 
-	/* initialize numatts to 0 in case no primary key exists */
-	*numatts = 0;
+	/* initialize indnkeyatts to 0 in case no primary key exists */
+	*indnkeyatts = 0;
 
 	tupdesc = rel->rd_att;
@@ -2051,12 +2051,12 @@ get_pkey_attnames(Relation rel, int16 *numatts)
 		/* we're only interested if it is the primary key */
 		if (index->indisprimary)
 		{
-			*numatts = index->indnatts;
-			if (*numatts > 0)
+			*indnkeyatts = index->indnkeyatts;
+			if (*indnkeyatts > 0)
 			{
-				result = (char **) palloc(*numatts * sizeof(char *));
+				result = (char **) palloc(*indnkeyatts * sizeof(char *));
 
-				for (i = 0; i < *numatts; i++)
+				for (i = 0; i < *indnkeyatts; i++)
 					result[i] = SPI_fname(tupdesc, index->indkey.values[i]);
 			}
 			break;
diff --git a/contrib/tcn/tcn.c b/contrib/tcn/tcn.c
index 1241108..943e410 100644
--- a/contrib/tcn/tcn.c
+++ b/contrib/tcn/tcn.c
@@ -138,9 +138,9 @@ triggered_change_notification(PG_FUNCTION_ARGS)
 		/* we're only interested if it is the primary key and valid */
 		if (index->indisprimary && IndexIsValid(index))
 		{
-			int			numatts = index->indnatts;
+			int			indnkeyatts = index->indnkeyatts;
 
-			if (numatts > 0)
+			if (indnkeyatts > 0)
 			{
 				int			i;
 
@@ -150,7 +150,7 @@ triggered_change_notification(PG_FUNCTION_ARGS)
 				appendStringInfoCharMacro(payload, ',');
 				appendStringInfoCharMacro(payload, operation);
 
-				for (i = 0; i < numatts; i++)
+				for (i = 0; i < indnkeyatts; i++)
 				{
 					int			colno = index->indkey.values[i];
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index df0435c..e196e20 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -3620,6 +3620,14 @@
       pg_class.relnatts)
 
+      indnkeyatts
+      int2
+
+      The number of key columns in the index. "Key columns" are ordinary
+      index columns (as opposed to "included" columns).
+
       indisunique
       bool
diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index ac51258..f9539e9 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@@ -114,6 +114,8 @@ typedef struct IndexAmRoutine
 	bool		amcanparallel;
 	/* type of data stored in index, or InvalidOid if variable */
 	Oid			amkeytype;
+	/* does AM support columns included with clause INCLUDE? */
+	bool		amca
Re: [HACKERS] Other formats in pset like markdown, rst, mediawiki
2017-03-21 10:59 GMT+01:00 Jan Michálek:
> 2017-03-21 9:59 GMT+01:00 Ideriha, Takeshi:
>> [...]
>>
>> - And I also found patch cannot be applied to current master.
>
> I have the 9.6.2 source code. Is it not correct? Where do I find the
> source code I should use?

Please use git master

https://wiki.postgresql.org/wiki/Working_with_Git

Regards

Pavel

> [...]
Re: [HACKERS] Enabling parallelism for queries coming from SQL or other PL functions
On Wed, Mar 15, 2017 at 8:55 PM, Robert Haas wrote:
> Note this:
>
>     if (completed || !fcache->returnsSet)
>         postquel_end(es);
>
> When the SQL function doesn't return a set, then we can allow
> parallelism even when lazyEval is set, because we'll only call
> ExecutorStart() once. But my impression is that something like this:

Well, when I test the following SQL function, I see it cannot be parallelised because lazyEval is true for it, even though it does not return a set:

CREATE OR REPLACE FUNCTION not_parallel() RETURNS bigint AS $$
BEGIN
    SELECT count(distinct i) FROM t WHERE j = 12;
END;
$$ LANGUAGE sql;

Query Text: SELECT count(distinct i) FROM t WHERE j = 12;
Aggregate  (cost=34.02..34.02 rows=1 width=8) (actual time=0.523..0.523 rows=1 loops=1)
  ->  Seq Scan on t  (cost=0.00..34.01 rows=1 width=4) (actual time=0.493..0.493 rows=0 loops=1)
        Filter: (j = 12)
        Rows Removed by Filter: 2001
2017-03-21 15:24:03.378 IST [117823] CONTEXT:  SQL function "already_parallel" statement 1
2017-03-21 15:24:03.378 IST [117823] LOG:  duration: 94868.181 ms  plan:
Query Text: select already_parallel();
Result  (cost=0.00..0.26 rows=1 width=8) (actual time=87981.047..87981.048 rows=1 loops=1)

 already_parallel
------------------
                0
(1 row)

As far as my understanding goes, in this case lazyEvalOK is set irrespective of whether the function returns a set or not, in fmgr_sql:

    else
    {
        randomAccess = false;
        lazyEvalOK = true;
    }

then it is passed to init_sql_fcache, which passes it on to init_execution_state, where fcache->lazyEval is set:

    if (lasttages && fcache->junkFilter)
    {
        lasttages->setsResult = true;
        if (lazyEvalOK &&
            lasttages->stmt->commandType == CMD_SELECT &&
            !lasttages->stmt->hasModifyingCTE)
            fcache->lazyEval = lasttages->lazyEval = true;
    }

Finally, this lazyEval is passed to ExecutorRun in postquel_getnext, which restricts parallelism by setting execute_once = 0:

    /* Run regular commands to completion unless lazyEval */
    uint64 count = (es->lazyEval) ? 1 : 0;

    ExecutorRun(es->qd, ForwardScanDirection, count, !es->lazyEval);

So, this is my concern: why should such a query not execute in parallel when in a SQL function? If I run this same query from a PL/pgSQL function, then it can run in parallel:

CREATE OR REPLACE FUNCTION not_parallel() RETURNS bigint AS $$
declare
    cnt int := 0;
BEGIN
    SELECT count(distinct i) into cnt FROM t WHERE j = 12;
    RETURN cnt;
END;
$$ LANGUAGE plpgsql;

select not_parallel();

2017-03-21 15:28:56.282 IST [123086] LOG:  duration: 0.003 ms  plan:
Query Text: SELECT count(distinct i) FROM t WHERE j = 12
Parallel Seq Scan on t  (cost=0.00..19.42 rows=1 width=4) (actual time=0.001..0.001 rows=0 loops=1)
  Filter: (j = 12)
2017-03-21 15:28:56.282 IST [123087] LOG:  duration: 0.003 ms  plan:
Query Text: SELECT count(distinct i) FROM t WHERE j = 12
Parallel Seq Scan on t  (cost=0.00..19.42 rows=1 width=4) (actual time=0.001..0.001 rows=0 loops=1)
  Filter: (j = 12)
2017-03-21 15:28:57.530 IST [117823] LOG:  duration: 1745.372 ms  plan:
Query Text: SELECT count(distinct i) FROM t WHERE j = 12
Aggregate  (cost=19.42..19.43 rows=1 width=8) (actual time=1255.743..1255.743 rows=1 loops=1)
  ->  Gather  (cost=0.00..19.42 rows=1 width=4) (actual time=1255.700..1255.700 rows=0 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Parallel Seq Scan on t  (cost=0.00..19.42 rows=1 width=4) (actual time=418.443..418.443 rows=0 loops=3)
              Filter: (j = 12)
              Rows Removed by Filter: 667
2017-03-21 15:28:57.530 IST [117823] CONTEXT:  SQL statement "SELECT count(distinct i) FROM t WHERE j = 12"
PL/pgSQL function not_parallel() line 4 at SQL statement
2017-03-21 15:28:57.531 IST [117823] LOG:  duration: 2584.282 ms  plan:
Query Text: select not_parallel();
Result  (cost=0.00..0.26 rows=1 width=8) (actual time=2144.315..2144.316 rows=1 loops=1)

 not_parallel
--------------
            0
(1 row)

Hence, it appears lazyEval is the main reason behind it, and in my opinion it should definitely be fixed.

Please enlighten me with your comments/opinions.

Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/
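If Robert's observation above holds, the change might be as small as the following in postquel_getnext() (a sketch of the idea only, not a tested patch; fcache is already available in that function):

	/* a non-set-returning function still runs ExecutorRun only once,
	 * so it can be treated as execute-once even when lazyEval is set */
	uint64		count = (es->lazyEval) ? 1 : 0;

	ExecutorRun(es->qd, ForwardScanDirection, count,
				!fcache->returnsSet || !es->lazyEval);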
Re: [HACKERS] pageinspect and hash indexes
On Tue, Mar 21, 2017 at 10:15 AM, Ashutosh Sharma wrote:
>> I think it is not just happening for freed overflow pages but also for newly
>> allocated bucket pages. It would be good if we could mark a freed
>> overflow page as an UNUSED page rather than just initialising its header
>> portion and leaving the page type in the special area as it is. Attached
>> is the patch '0001-mark_freed_ovflpage_as_UNUSED_pagetype.patch' that
>> marks a freed overflow page as an unused page.
>>
>> Also, I have now changed the pageinspect module to handle unused and empty
>> hash index pages. Attached is the patch
>> (0002-allow_pageinspect_handle_UNUSED_OR_EMPTY_hash_pages.patch) for
>> the same.
>>
>> [1] -
>> https://www.postgresql.org/message-id/CAE9k0P%3DN%2BJjzqnHqrURE7ZQMgySRpv%3DBkjafbz%3DpeD4cbCgbhA%40mail.gmail.com

I have yet to review your patches; let's first try to settle the other points before doing so.

> I think when expanding the hash index table we only initialise the
> pages that will be in use after the split, or the last block in the index.
> The initial few pages, where tuples will be moved from the old to the new
> pages, have no issues, but the last block, which is just initialised and is
> not marked as a hash page, needs to be handled along with the freed
> overflow page.
> I think the PageIsEmpty() check in hashfuncs.c is sufficient for that.

I don't see any value in treating the last page allocated during _hash_alloc_buckets() any differently from the pages before it, which will get allocated when required.

-- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
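[A rough sketch of the PageIsEmpty() check mentioned above, as it might look in contrib/pageinspect's hashfuncs.c; the surrounding context and error wording are illustrative assumptions, not taken from the attached patches:]

    /* 'page' is the raw page handed to the pageinspect hash function.
     * An empty page has no line pointers (pd_lower <= header size), so
     * don't try to interpret its special space as hash metadata. */
    if (PageIsEmpty(page))
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                 errmsg("page is empty")));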
Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables
> On Mon, Mar 20, 2017 at 1:19 PM, Ashutosh Bapat > wrote: I have created some test to cover partition wise joins with postgres_fdw, also verified make check. patch attached. Thanks & Regards, Rajkumar Raghuwanshi QMG, EnterpriseDB Corporation diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index 059c5c3..f0b1a32 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -7181,3 +7181,149 @@ AND ftoptions @> array['fetch_size=6']; (1 row) ROLLBACK; +-- === +-- test partition-wise-joins +-- === +SET enable_partition_wise_join=on; +--range partition +CREATE TABLE fprt1 (a int, b int, c varchar) PARTITION BY RANGE(a); +CREATE TABLE fprt1_p1 (a int, b int, c text); +CREATE TABLE fprt1_p2 (a int, b int, c text); +CREATE FOREIGN TABLE ftprt1_p1 PARTITION OF fprt1 FOR VALUES FROM (0) TO (250) +SERVER loopback OPTIONS (TABLE_NAME 'fprt1_p1'); +CREATE FOREIGN TABLE ftprt1_p2 PARTITION OF fprt1 FOR VALUES FROM (250) TO (500) +SERVER loopback OPTIONS (TABLE_NAME 'fprt1_p2'); +INSERT INTO fprt1_p1 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(0, 249, 2) i; +INSERT INTO fprt1_p2 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(250, 499, 2) i; +ANALYZE fprt1; +CREATE TABLE fprt2 (a int, b int, c varchar) PARTITION BY RANGE(b); +CREATE TABLE fprt2_p1 (a int, b int, c text); +CREATE TABLE fprt2_p2 (a int, b int, c text); +CREATE FOREIGN TABLE ftprt2_p1 PARTITION OF fprt2 FOR VALUES FROM (0) TO (250) +SERVER loopback OPTIONS (TABLE_NAME 'fprt2_p1'); +CREATE FOREIGN TABLE ftprt2_p2 PARTITION OF fprt2 FOR VALUES FROM (250) TO (500) +SERVER loopback OPTIONS (TABLE_NAME 'fprt2_p2'); +INSERT INTO fprt2_p1 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(0, 249, 3) i; +INSERT INTO fprt2_p2 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(250, 499, 3) i; +ANALYZE fprt2; +-- inner join three tables, all join qualified +EXPLAIN (COSTS OFF) +SELECT t1.a,t2.b,t3.c FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) INNER JOIN fprt1 t3 ON (t2.b = t3.a) WHERE t1.a % 25 =0 ORDER BY 1,2,3; + QUERY PLAN + + Sort + Sort Key: t1.a, t3.c + -> Append + -> Foreign Scan + Relations: ((public.ftprt1_p1 t1) INNER JOIN (public.ftprt2_p1 t2)) INNER JOIN (public.ftprt1_p1 t3) + -> Foreign Scan + Relations: ((public.ftprt1_p2 t1) INNER JOIN (public.ftprt2_p2 t2)) INNER JOIN (public.ftprt1_p2 t3) +(7 rows) + +SELECT t1.a,t2.b,t3.c FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) INNER JOIN fprt1 t3 ON (t2.b = t3.a) WHERE t1.a % 25 =0 ORDER BY 1,2,3; + a | b | c +-+-+-- + 0 | 0 | + 150 | 150 | 0003 + 250 | 250 | 0005 + 400 | 400 | 0008 +(4 rows) + +-- left outer join + nullable clasue +EXPLAIN (COSTS OFF) +SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a < 10 ORDER BY 1,2,3; + QUERY PLAN +- + Merge Append + Sort Key: t1.a, ftprt2_p1.b, ftprt2_p1.c + -> Foreign Scan + Relations: (public.ftprt1_p1 t1) LEFT JOIN (public.ftprt2_p1 fprt2) +(4 rows) + +SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a < 10 ORDER BY 1,2,3; + a | b | c +---+---+-- + 0 | 0 | + 2 | | + 4 | | + 6 | 6 | + 8 | | +(5 rows) + +-- full outer join + right outer join +EXPLAIN (COSTS OFF) +SELECT t1.a,t2.b,t3.c FROM fprt1 t1 FULL JOIN fprt2 t2 ON (t1.a = t2.b and t1.b = t2.a) RIGHT JOIN fprt1 t3 ON (t2.a = t3.b and t2.a = t3.b) WHERE t1.a % 25 =0 ORDER BY 1,2,3; + QUERY PLAN +- 
+ Sort + Sort Key: t1.a, t3.c + -> Hash Join + Hash Cond: (t3.b = t1.b) + -> Append + -> Seq Scan on fprt1 t3 + -> Foreign Scan on ftprt1_p1 t3_1 + -> Foreign Scan on ftprt1_p2 t3_2 + -> Hash + -> Append + -> Foreign Scan + Relations: (public.ftprt1_p1 t1) INNER JOIN (public.ftprt2_p1 t2) + -> Foreign Scan + Relations: (public.ftprt1_p2 t1) INNER JOIN (public.ftprt2_p2 t2) +(14 rows) + +SELECT t1.a,t2.b,t3.c FROM fprt1 t1 FULL JOIN fprt2 t2 ON
Re: [HACKERS] Logical Replication and Character encoding
At Mon, 6 Mar 2017 11:13:48 +0100, Petr Jelinek wrote in > >>> Well the length is necessary to be able to add binary format support in > >>> future so it's definitely not an omission. > >> > >> Right. So it appears the right function to use here is > >> pq_sendcountedtext(). However, this sends the strings without null > >> termination, so we'd have to do extra processing on the receiving end. > >> Or we could add an option to pq_sendcountedtext() to make it do what we > >> want. I'd rather stick to standard pqformat.c functions for handling > >> the protocol. > > > > It seems reasonable. I changed the prototype of > > pg_sendcountedtext as the following, > > > > | extern void pq_sendcountedtext(StringInfo buf, const char *str, int slen, > > | bool countincludesself, bool binary); > > > > I think 'binary' seems fine for the parameter here. The patch > > size increased a bit. > > > > Hmm I find it bit weird that binary true means NULL terminated while > false means not NULL terminated, I would think that the behaviour would > be exactly opposite given that text strings are the ones where NULL > termination matters? You're right. I renamed it as with more straightforward name. Addition to that, it contained a stupid bug that sends a garbage byte when non-null-terminated string is not converted. And the comment is edited to reflect the new behavior. > > By the way, I noticed that postmaster launches logical > > replication launcher even if wal_level < logical. Is it right? > > Yes, that came up couple of times in various threads. The logical > replication launcher is needed only to start subscriptions and > subscriptions don't need wal_level to be logical, only publication needs > that. regards, -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/access/common/printsimple.c b/src/backend/access/common/printsimple.c index 5fe1c72..62fa70d 100644 --- a/src/backend/access/common/printsimple.c +++ b/src/backend/access/common/printsimple.c @@ -96,7 +96,7 @@ printsimple(TupleTableSlot *slot, DestReceiver *self) pq_sendcountedtext(&buf, VARDATA_ANY(t), VARSIZE_ANY_EXHDR(t), - false); + false, false); } break; @@ -106,7 +106,7 @@ printsimple(TupleTableSlot *slot, DestReceiver *self) char str[12]; /* sign, 10 digits and '\0' */ pg_ltoa(num, str); - pq_sendcountedtext(&buf, str, strlen(str), false); + pq_sendcountedtext(&buf, str, strlen(str), false, false); } break; @@ -116,7 +116,7 @@ printsimple(TupleTableSlot *slot, DestReceiver *self) char str[23]; /* sign, 21 digits and '\0' */ pg_lltoa(num, str); - pq_sendcountedtext(&buf, str, strlen(str), false); + pq_sendcountedtext(&buf, str, strlen(str), false, false); } break; diff --git a/src/backend/access/common/printtup.c b/src/backend/access/common/printtup.c index a2ca2d7..1ebc50f 100644 --- a/src/backend/access/common/printtup.c +++ b/src/backend/access/common/printtup.c @@ -353,7 +353,8 @@ printtup(TupleTableSlot *slot, DestReceiver *self) char *outputstr; outputstr = OutputFunctionCall(&thisState->finfo, attr); - pq_sendcountedtext(&buf, outputstr, strlen(outputstr), false); + pq_sendcountedtext(&buf, outputstr, strlen(outputstr), + false, false); } else { @@ -442,7 +443,7 @@ printtup_20(TupleTableSlot *slot, DestReceiver *self) Assert(thisState->format == 0); outputstr = OutputFunctionCall(&thisState->finfo, attr); - pq_sendcountedtext(&buf, outputstr, strlen(outputstr), true); + pq_sendcountedtext(&buf, outputstr, strlen(outputstr), true, false); } pq_endmessage(&buf); diff --git a/src/backend/libpq/pqformat.c 
b/src/backend/libpq/pqformat.c index c8cf67c..a83c73e 100644 --- a/src/backend/libpq/pqformat.c +++ b/src/backend/libpq/pqformat.c @@ -123,30 +123,39 @@ pq_sendbytes(StringInfo buf, const char *data, int datalen) * The data sent to the frontend by this routine is a 4-byte count field * followed by the string. The count includes itself or not, as per the * countincludesself flag (pre-3.0 protocol requires it to include itself). - * The passed text string need not be null-terminated, and the data sent - * to the frontend isn't either. + * The passed text string need not be null-terminated but should not contain + * null as a part. sendterminator instructs to send null-terminator. * */ void pq_sendcountedtext(StringInfo buf, const char *str, int slen, - bool countincludesself) + bool countincludesself, bool sendterminator) { - int extra = countincludesself ? 4 : 0; + int extra_self = countincludesself ? 4 : 0; + int extra_term = sendterminator ? 1 : 0; char *p; p = pg_server_to_client(str, slen); if (p != str)/* actual conversion has been done? */ { slen = strlen(p); - pq_sendint(buf, slen + extra, 4); + pq_sendint(buf, slen +
Re: [HACKERS] Logical Replication and Character encoding
Mmm. I sent the previous mail off halfway through.

At Mon, 6 Mar 2017 11:13:48 +0100, Petr Jelinek wrote in
> > By the way, I noticed that postmaster launches the logical
> > replication launcher even if wal_level < logical. Is it right?
>
> Yes, that came up a couple of times in various threads. The logical
> replication launcher is needed only to start subscriptions, and
> subscriptions don't need wal_level to be logical; only publication needs
> that.

I understand, thanks. Anyway, the message seems to be hidden from the startup messages :)

-- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling parallelism for queries coming from SQL or other PL functions
On Tue, Mar 21, 2017 at 3:36 PM, Rafia Sabih wrote:
> On Wed, Mar 15, 2017 at 8:55 PM, Robert Haas wrote:
>> Note this:
>>
>> if (completed || !fcache->returnsSet)
>> postquel_end(es);
>>
>> When the SQL function doesn't return a set, then we can allow
>> parallelism even when lazyEval is set, because we'll only call
>> ExecutorStart() once. But my impression is that something like this:

How about basing the execute_once decision on fcache->returnsSet instead of on lazyEval? That is, change

ExecutorRun(es->qd, ForwardScanDirection, count, !es->lazyEval);

to

ExecutorRun(es->qd, ForwardScanDirection, count, !fcache->returnsSet);

IMHO, Robert had the same thing in mind.

> SELECT * FROM blah() LIMIT 3
>
> ...will trigger three separate calls to ExecutorRun(), which is a
> problem if the plan is a parallel plan.

And you also need to test the case Robert mentioned upthread.

-- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Other formats in pset like markdown, rst, mediawiki
2017-03-21 11:01 GMT+01:00 Pavel Stehule : > > > 2017-03-21 10:59 GMT+01:00 Jan Michálek : > >> >> >> 2017-03-21 9:59 GMT+01:00 Ideriha, Takeshi > m>: >> >>> Hi, >>> >>> I like your suggestion and took a look at your patch though I’m not the >>> expert about psql. >>> >>> >>> >>> I like the idea taking advantage of linestyle utilities >>> >>> to implement rst and markdown format efficiently instead of newly >>> developing pset format things. >>> >>> But I'm thinking two comments below needs change to something about not >>> focusing only linestyle. >>> >>> That's because they really take care of both '\pset linestyle and \pset >>> format' and it may lead to misunderstanding to readers. >>> >>> >>> >>> --- >>> >>> /* Line style control structures */ >>> >>> const printTextFormat pg_markdown = >>> >>> >>> >>> /* get selected or default line style */ >>> >>> const printTextFormat * >>> >>> get_line_style(const printTableOpt *opt) >>> >>> --- >>> >> >> It is in command.c? >> >> I have it done that \pset format changes linestyle >> >> psql (9.6.2, server 9.6.1) >> Type "help" for help. >> >> jelen=# \pset linestyle ascii >> Line style is ascii. >> jelen=# \pset format rst >> Output format is rst. >> jelen=# \pset linestyle >> Line style is rst. >> jelen=# >> >> Peter wrote that this is not right, but i don`t know how it should like, >> because most of this is done on linestyle, format is used only for switch >> from console. >> >> >> >>> >>> >>> The rest things are about code style convention. >>> >>> - there are some indents with white spaces around >>> skip_leading_spaces_print() >>> >>> but Postgresql conventions says indents should be with 4 column tab. >>> >>> https://www.postgresql.org/docs/devel/static/source-format.html >>> >> >> Thanks, i often using 4 whitespaces (i have it in vim) but in other code >> i found mostly used 8 whitespaces. >> I will look on this. I use code from another functions (fputnbytes, >> print_html_escaped) as template. >> >> >>> >>> >>> - On the other hand, in docs there are some tab indent >>> >>> but white space indenet is preferable. Looking around sgml files, >>> white space is used. >>> >>> >>> >>> - some multi-line comment style also needs fix according to the above >>> documentation (link) >>> >> >> I will look on the comments, this is only work version, coding style >> issues will be corrected (i have some comments only my orientation in code). >> >> >>> >>> >>> - And I also found patch cannot be applied to current master. >>> >> >> I have 9.6.2 source code. It is not correct? Where i find source code i >> should use? >> > > Please use git master https://wiki.postgresql.org/wiki/Working_with_Git > > Thanks, i will look on this on weekend. > Regards > > Pavel > >> >> Have nice day >> Jan >> >> >>> >>> >>> Regards, >>> >>> Ideriha, Takeshi >>> >>> >>> >>> *From:* pgsql-hackers-ow...@postgresql.org [mailto: >>> pgsql-hackers-ow...@postgresql.org] *On Behalf Of *Jan Michalek >>> *Sent:* Sunday, March 19, 2017 5:10 AM >>> *To:* Peter Eisentraut >>> *Cc:* Pavel Stehule ; PostgreSQL mailing lists >>> >>> *Subject:* Re: [HACKERS] Other formats in pset like markdown, rst, >>> mediawiki >>> >>> >>> >>> >>> >>> >>> >>> 2017-03-12 19:21 GMT+01:00 Jan Michálek : >>> >>> >>> >>> >>> >>> 2017-03-10 9:43 GMT+01:00 Jan Michálek : >>> >>> >>> >>> >>> >>> 2017-03-09 20:10 GMT+01:00 Peter Eisentraut < >>> peter.eisentr...@2ndquadrant.com>: >>> >>> This is looking pretty neat. I played around with it a bit. 
There are >>> a couple of edge cases that you need to address, I think. >>> >>> >>> >>> Thanks, original code is very synoptical and and well prepared for >>> adding new formats. >>> >>> >>> >>> >>> - Does not support \x >>> >>> >>> >>> I know, i dnot`t know, if \x make sense in this case. I will look, how >>> it is done in other formats like html. I think, that it should work in >>> sense, that table generated to rst should give similar output after >>> processing like output of html format. >>> >>> >>> >>> I prepared something like this (i have no prepared diff, i need do some >>> another changes) >>> >>> There a few things I need to do. First problem is bold column names, i >>> should do it in sme fashin as "RECORD", but i need to do some research >>> about length of column. >>> >>> Bigger problem is with tab indent, rst processor doesn`t work with this >>> in this case. >>> >>> >>> In new diff is added \x for rst and skipping leading spaces in rst in >>> both. make check passed >>> >>> >>> >>> Jan >>> >>> >>> >>> >>> >>> jelen=# execute q \g | xclip >>> +-+- >>> ---+ >>> | **RECORD 1** >>> | >>> +-+- >>> ---+ >>> | column1 | Elephant, kangaroo, >>> | >>> | | squirrel, gorilla >>> | >>> +-+- >>> ---+ >>> | column2 | 121 >>> | >>> +---
Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)
On Fri, Mar 10, 2017 at 11:37 PM, Alvaro Herrera wrote: > Robert Haas wrote: >> On Wed, Mar 8, 2017 at 2:30 PM, Alvaro Herrera >> wrote: >> > Not really -- it's a bit slower actually in a synthetic case measuring >> > exactly the slowed-down case. See >> > https://www.postgresql.org/message-id/cad__ougk12zqmwwjzim-yyud1y8jmmy6x9yectnif3rpp6h...@mail.gmail.com >> > I bet in normal cases it's unnoticeable. If WARM flies, then it's going >> > to provide a larger improvement than is lost to this. >> >> Hmm, that test case isn't all that synthetic. It's just a single >> column bulk update, which isn't anything all that crazy, > > The problem is that the update touches the second indexed column. With > the original code we would have stopped checking at that point, but with > the patched code we continue to verify all the other indexed columns for > changes. > > Maybe we need more than one bitmapset to be given -- multiple ones for > for "any of these" checks (such as HOT, KEY and Identity) which can be > stopped as soon as one is found, and one for "all of these" (for WARM, > indirect indexes) which needs to be checked to completion. > How will that help to mitigate the regression? I think what might help here is if we fetch the required columns for WARM only when we know hot_update is false. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)
On Thu, Mar 9, 2017 at 8:43 AM, Robert Haas wrote:
> On Wed, Mar 8, 2017 at 2:30 PM, Alvaro Herrera wrote:
>> Not really -- it's a bit slower actually in a synthetic case measuring
>> exactly the slowed-down case. See
>> https://www.postgresql.org/message-id/cad__ougk12zqmwwjzim-yyud1y8jmmy6x9yectnif3rpp6h...@mail.gmail.com
>> I bet in normal cases it's unnoticeable. If WARM flies, then it's going
>> to provide a larger improvement than is lost to this.
>
> Hmm, that test case isn't all that synthetic. It's just a single
> column bulk update, which isn't anything all that crazy, and 5-10%
> isn't nothing.
>
> I'm kinda surprised it made that much difference, though.

I think it is because heap_getattr() is not that cheap. We noticed a similar problem during development of the scan-key push-down work [1].

[1] - https://commitfest.postgresql.org/12/850/

-- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] segfault in hot standby for hash indexes
Hi Jeff,

On Tue, Mar 21, 2017 at 1:54 PM, Amit Kapila wrote:
> On Tue, Mar 21, 2017 at 1:28 PM, Jeff Janes wrote:
>> Against an unmodified HEAD (17fa3e8), I got a segfault in the hot standby.
>
> I think I see the problem in hash_xlog_vacuum_get_latestRemovedXid().
> It seems to me that we are using one block_id for registering
> the deleted items in the XLOG_HASH_VACUUM_ONE_PAGE record and then using
> a different block_id for fetching those items in
> hash_xlog_vacuum_get_latestRemovedXid(). So probably matching those
> will fix this issue (instead of fetching the block number and items from
> block_id 1, we should use block_id 0).

Thanks for reporting this issue. As Amit said, it is happening due to a block_id mismatch. Attached is a patch that fixes it. I apologise for such a silly mistake. Please note that I was not able to reproduce the issue on my local machine using the test script you shared. Could you please check whether you still see the issue with the attached patch? Thanks in advance.

-- With Regards, Ashutosh Sharma EnterpriseDB:http://www.enterprisedb.com

corrected_block_id_reference_in_hash_vacuum_get_latestRemovedXid.patch Description: application/download

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
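[A sketch of the correction being described; beyond the essential point that WAL registration and replay must agree on the block reference, the surrounding variable names are assumptions, not the attached patch itself:]

    /* In hash_xlog_vacuum_get_latestRemovedXid (sketch): the deleted
     * items were registered under block reference 0 in the
     * XLOG_HASH_VACUUM_ONE_PAGE record, so fetch them from block 0,
     * not block 1. */
    (void) XLogRecGetBlockTag(record, 0, &rnode, NULL, &blkno);  /* was: 1 */
    ptr = XLogRecGetBlockData(record, 0, &len);                  /* was: 1 */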
[HACKERS] Problem in Parallel Bitmap Heap Scan?
Hi, I noticed a failure in the inet.sql test while running the regression tests with parallelism cranked up, and can reproduce it interactively as follows. After an spgist index is created and the plan changes to the one shown below, the query returns no rows. regression=# set force_parallel_mode = regress; SET regression=# set max_parallel_workers_per_gather = 2; SET regression=# set parallel_tuple_cost = 0; SET regression=# set parallel_setup_cost = 0; SET regression=# set min_parallel_table_scan_size = 0; SET regression=# set min_parallel_index_scan_size = 0; SET regression=# set enable_seqscan = off; SET regression=# SELECT * FROM inet_tbl WHERE i <> '192.168.1.0/24'::cidr ORDER BY i; c |i +-- 10.0.0.0/8 | 9.1.2.3/8 10.0.0.0/8 | 10.1.2.3/8 10.0.0.0/32| 10.1.2.3/8 10.0.0.0/8 | 10.1.2.3/8 10.1.0.0/16| 10.1.2.3/16 10.1.2.0/24| 10.1.2.3/24 10.1.2.3/32| 10.1.2.3 10.0.0.0/8 | 11.1.2.3/8 192.168.1.0/24 | 192.168.1.226/24 192.168.1.0/24 | 192.168.1.255/24 192.168.1.0/24 | 192.168.1.0/25 192.168.1.0/24 | 192.168.1.255/25 192.168.1.0/26 | 192.168.1.226 10.0.0.0/8 | 10::/8 :::1.2.3.4/128 | ::4.3.2.1/24 10:23::f1/128 | 10:23::f1/64 10:23::8000/113| 10:23:: (17 rows) regression=# CREATE INDEX inet_idx3 ON inet_tbl using spgist (i); CREATE INDEX regression=# SELECT * FROM inet_tbl WHERE i <> '192.168.1.0/24'::cidr ORDER BY i; c | i ---+--- (0 rows) regression=# explain SELECT * FROM inet_tbl WHERE i <> '192.168.1.0/24'::cidr ORDER BY i; QUERY PLAN - Gather Merge (cost=16.57..16.67 rows=10 width=64) Workers Planned: 1 -> Sort (cost=16.56..16.58 rows=10 width=64) Sort Key: i -> Parallel Bitmap Heap Scan on inet_tbl (cost=12.26..16.39 rows=10 width=64) Recheck Cond: (i <> '192.168.1.0/24'::inet) -> Bitmap Index Scan on inet_idx3 (cost=0.00..12.26 rows=17 width=0) Index Cond: (i <> '192.168.1.0/24'::inet) (8 rows) -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] postgres_fdw: correct regression test for parameterized scan for foreign table
Hi, While working on adding support for parameterized foreign joins to postgres_fdw, I noticed that commit e4106b2528727c4b48639c0e12bf2f70a766b910 forgot to modify a test query for execution of a parameterized foreign scan for a foreign table: --- parameterized remote path +-- parameterized remote path for foreign table EXPLAIN (VERBOSE, COSTS false) - SELECT * FROM ft2 a, ft2 b WHERE a.c1 = 47 AND b.c1 = a.c2; + SELECT * FROM "S 1"."T 1" a, ft2 b WHERE a."C 1" = 47 AND b.c1 = a.c2; SELECT * FROM ft2 a, ft2 b WHERE a.c1 = 47 AND b.c1 = a.c2; The query in the last line as-is would test to see if the remote join works correctly, which isn't the right thing to do here. Attached is a small patch for addressing that. Best regards, Etsuro Fujita postgresql-fdw-regress.patch Description: binary/octet-stream -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Create replication slot in pg_basebackup if requested and not yet present
Hi, Am Dienstag, den 21.03.2017, 11:03 +0900 schrieb Michael Paquier: > On Tue, Mar 21, 2017 at 1:32 AM, Michael Banck > wrote: > /* > + * Try to create a permanent replication slot if one is specified. Do > + * not error out if the slot already exists, other errors are already > + * reported by CreateReplicationSlot(). > + */ > +if (!stream->temp_slot && stream->replication_slot) > +{ > +if (!CreateReplicationSlot(conn, stream->replication_slot, > NULL, true, true)) > +return false; > +} > This could be part of an else taken from the previous if where > temp_slot is checked. Not sure how this would work, as ISTM the if clause above only sets the name for param->temp_slot (if supported), but doesn't distinguish which kind of slot (if any) will actually be created. I've done it for the second (refactoring) patch though, but I had to make `no_slot' a global variable to not have the logic be too complicated. > There should be some TAP tests included. And +1 for sharing the same > behavior as pg_receivewal. Well, I've adjusted the TAP tests in so far as it's now checking that the physical slot got created, previously it checked for the opposite: |-$node->command_fails( |+$node->command_ok( |[ 'pg_basebackup', '-D', |"$tempdir/backupxs_sl_fail", '-X', |'stream','-S', |- 'slot1' ], |- 'pg_basebackup fails with nonexistent replication slot'); |+ 'slot0' ], |+ 'pg_basebackup runs with nonexistent replication slot'); |+ |+my $lsn = $node->safe_psql('postgres', |+ q{SELECT restart_lsn FROM pg_replication_slots WHERE slot_name |= 'slot0'} |+); |+like($lsn, qr!^0/[0-9A-F]{7,8}$!, 'slot is present'); I have now made the message a bit clearer by saying "pg_basebackup -S creates previously nonexistent replication slot". Probably there could be additional TAP tests, but off the top of my head I can't think of any? New patches attached. Michael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.ba...@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer From 93337fe0ef320fe8ef68a9b64c4df85a63c4346c Mon Sep 17 00:00:00 2001 From: Michael Banck Date: Sun, 19 Mar 2017 10:58:13 +0100 Subject: [PATCH 1/2] Create replication slot in pg_basebackup if requested and not yet present. If a replication slot is explicitly requested, try to create it before starting to replicate from it. --- src/bin/pg_basebackup/pg_basebackup.c| 15 +++ src/bin/pg_basebackup/t/010_pg_basebackup.pl | 19 --- 2 files changed, 27 insertions(+), 7 deletions(-) diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c index 0a4944d..69ca4be 100644 --- a/src/bin/pg_basebackup/pg_basebackup.c +++ b/src/bin/pg_basebackup/pg_basebackup.c @@ -581,6 +581,21 @@ StartLogStreamer(char *startpos, uint32 timeline, char *sysidentifier) else param->temp_slot = temp_replication_slot; + /* + * Try to create a permanent replication slot if one is specified. Do + * not error out if the slot already exists, other errors are already + * reported by CreateReplicationSlot(). 
+ */ + if (!temp_replication_slot && replication_slot) + { + if (verbose) + fprintf(stderr, _("%s: creating replication slot \"%s\"\n"), + progname, replication_slot); + + if (!CreateReplicationSlot(param->bgconn, replication_slot, NULL, true, true)) + return 1; + } + if (format == 'p') { /* diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl index 14bd813..f50005f 100644 --- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl +++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl @@ -4,7 +4,7 @@ use Cwd; use Config; use PostgresNode; use TestLib; -use Test::More tests => 72; +use Test::More tests => 73; program_help_ok('pg_basebackup'); program_version_ok('pg_basebackup'); @@ -246,17 +246,22 @@ $node->command_ok( 'pg_basebackup -X stream runs with --no-slot'); $node->command_fails( - [ 'pg_basebackup', '-D', "$tempdir/fail", '-S', 'slot1' ], + [ 'pg_basebackup', '-D', "$tempdir/fail", '-X', 'none', '-S', 'slot0' ], 'pg_basebackup with replication slot fails without -X stream'); -$node->command_fails( +$node->command_ok( [ 'pg_basebackup', '-D', "$tempdir/backupxs_sl_fail", '-X', 'stream','-S', - 'slot1' ], - 'pg_basebackup fails with nonexistent replication slot'); + 'slot0' ], + 'pg_basebackup -S creates previously nonexistent replication slot'); + +my $lsn = $node->safe_psql('postgres', + q{SELECT restart_lsn FROM pg_replication_slots WHERE slot_name = 'slot0'} +); +like($lsn, qr!^0/[0-9A-F]{7,8}$!, 'slot is present'); $node->saf
Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)
On Tue, Mar 21, 2017 at 6:56 AM, Amit Kapila wrote: >> Hmm, that test case isn't all that synthetic. It's just a single >> column bulk update, which isn't anything all that crazy, and 5-10% >> isn't nothing. >> >> I'm kinda surprised it made that much difference, though. >> > > I think it is because heap_getattr() is not that cheap. We have > noticed the similar problem during development of scan key push down > work [1]. Yeah. So what's the deal with this? Is somebody working on figuring out a different approach that would reduce this overhead? Are we going to defer WARM to v11? Or is the intent to just ignore the 5-10% slowdown on a single column update and commit everything anyway? (A strong -1 on that course of action from me.) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: [[Parallel] Shared] Hash
Hi, Here is a new version of the patch series addressing complaints from Rafia, Peter, Andres and Robert -- see below. First, two changes not already covered in this thread: 1. Today Robert asked me a question off-list that I hadn't previously considered: since I am sharing tuples between backends, don't I have the same kind of transient record remapping problems that tqueue.c has to deal with? The answer must be yes, and in fact it's a trickier version because there are N 'senders' and N 'receivers' communicating via the shared hash table. So I decided to avoid the problem by not planning shared hash tables if the tuples could include transient RECORD types: see tlist_references_transient_type() in 0007-hj-shared-single-batch-v8.patch. Perhaps in the future we can find a way for parallel query to keep local types in sync, so this restriction could be lifted. (I've tested this with a specially modified build, because I couldn't figure out how to actually get any transient types to be considered in a parallel query, but if someone has a suggestion for a good query for that I'd love to put one into the regression test.) 2. Earlier versions included support for Shared Hash (= one worker builds, other workers wait, but we get to use work_mem * P memory) and Parallel Shared Hash (= all workers build). Shared Hash is by now quite hard to reach, since so many hash join inner plans are now parallelisable. I decided to remove support for it from the latest patch series: I think it adds cognitive load and patch lines for little or no gain. With time running out, I thought that it would be better to rip it out for now to try to simplify things and avoid some difficult questions about how to cost that mode. It could be added with a separate patch after some more study if it really does make some sense. >> On Mon, Mar 13, 2017 at 8:40 PM, Rafia Sabih >> wrote: >>> In an attempt to test v7 of this patch on TPC-H 20 scale factor I found a >>> few regressions, >>> Q21: 52 secs on HEAD and 400 secs with this patch As already mentioned there is a planner bug which we can fix separately from this patch series. Until that is resolved, please see that other thread[1] for the extra tweak require for sane results when testing Q21. Even with that tweak, there was a slight regression with fewer than 3 workers at 1GB for Q21. That turned out to be because the patched version was not always using as many workers as unpatched. To fix that, I had to rethink the deadlock avoidance system to make it a bit less conservative about giving up workers: see src/backend/utils/misc/leader_gate.c in 0007-hj-shared-single-batch-v8.patch. Here are some speed-up numbers comparing master to patched that I recorded on TPCH scale 10 with work_mem = 1GB. These are the queries whose plans change with the patch. Both master and v8 were patched with fix-neqseljoin-for-semi-joins.patch. 
query | w = 0 | w = 1 | w = 2 | w = 3 | w = 4 | w = 5 | w = 6 | w = 7 | w = 8 ---+---+---+---+---+---+---+---+---+--- Q3| 0.94x | 1.06x | 1.25x | 1.46x | 1.64x | 1.87x | 1.99x | 1.67x | 1.67x Q5| 1.17x | 1.03x | 1.23x | 1.27x | 1.44x | 0.56x | 0.95x | 0.94x | 1.16x Q7| 1.13x | 1.04x | 1.31x | 1.06x | 1.15x | 1.28x | 1.31x | 1.35x | 1.13x Q8| 0.99x | 1.13x | 1.23x | 1.22x | 1.36x | 0.42x | 0.82x | 0.78x | 0.81x Q9| 1.16x | 0.95x | 1.92x | 1.68x | 1.90x | 1.89x | 2.02x | 2.05x | 1.81x Q10 | 1.01x | 1.03x | 1.08x | 1.10x | 1.16x | 1.17x | 1.09x | 1.01x | 1.07x Q12 | 1.03x | 1.19x | 1.42x | 0.75x | 0.74x | 1.00x | 0.99x | 1.00x | 1.01x Q13 | 1.10x | 1.66x | 1.99x | 1.00x | 1.12x | 1.00x | 1.12x | 1.01x | 1.13x Q14 | 0.97x | 1.13x | 1.22x | 1.45x | 1.43x | 1.55x | 1.55x | 1.50x | 1.45x Q16 | 1.02x | 1.13x | 1.07x | 1.09x | 1.10x | 1.10x | 1.13x | 1.10x | 1.11x Q18 | 1.05x | 1.43x | 1.33x | 1.21x | 1.07x | 1.57x | 1.76x | 1.09x | 1.09x Q21 | 0.99x | 1.01x | 1.07x | 1.18x | 1.28x | 1.37x | 1.63x | 1.26x | 1.60x These tests are a bit short and noisy and clearly there are some strange dips in there that need some investigation but the trend is positive. Here are some numbers from some simple test joins, so that you can see the raw speedup of large hash joins without all the other things going on in those TPCH plans. I executed 1-join, 2-join and 3-join queries like this: CREATE TABLE simple AS SELECT generate_series(1, 1000) AS id, 'aa'; ANALYZE simple; SELECT COUNT(*) FROM simple r JOIN simple s USING (id); SELECT COUNT(*) FROM simple r JOIN simple s USING (id) JOIN simple t USING (id); SELECT COUNT(*) FROM simple r JOIN simple s USING (id) JOIN simple t USING (id) JOIN simple u USING (id); Unpatched master can make probing go faster by adding workers, but not building, so in these self-joins the ability to scale with more CPUs is limited (here w = 1 shows the speedup compared to
Re: [HACKERS] Removing binaries
On Mon, Mar 20, 2017 at 6:15 PM, David Steele wrote: > On 3/20/17 3:40 PM, Jan de Visser wrote: >> On Monday, March 20, 2017 3:30:49 PM EDT Robert Haas wrote: >>> That would annoy me, because I use these constantly. I also think >>> that they solve a problem for users, which is this: >>> >>> [rhaas ~]$ psql >>> psql: FATAL: database "rhaas" does not exist >>> [rhaas ~]$ psql -c 'create database rhaas;' >>> psql: FATAL: database "rhaas" does not exist >>> [rhaas ~]$ gosh, i know i need to connect to a database in order to >>> create the database to which psql tries to connect by default, so >>> there must be an existing database with some name, but what exactly is >>> that name, anyway? >>> -bash: gosh,: command not found >>> >>> There was an occasion when this exact problem almost caused me to give >>> up on using PostgreSQL. Everybody here presumably knows that >>> template1 and postgres are the magic words you can add to the end of >>> that command line to make it work, but that is NOT self-evident to >>> newcomers. >> >> Same here. I worked on a system with a shrink-wrap installer which >> installed >> pgsql as well and initialized it for use by the system user of our >> software. >> If a tester or sales engineer wanted to play with the DB, it would be >> about 30 >> minutes before they would end up at my desk, in tears. > > How about adding a hint? I think it's tricky to do that, because the error happens in response to the connection attempt and at that point you don't know that the command the user plans to send is CREATE DATABASE. If we could somehow detect that the user is trying to CREATE DATABASE against a nonexistent database and hint in that case, that would be *great*, but I don't see a way to make it work. Here's another idea: what if we always created the default database at initdb time? For example, if I initdb as rhaas, maybe it should create an "rhaas" database for me, so that this works: initdb pg_ctl start psql I think a big part of the usability problem here comes from the fact that the default database for connections is based on the username, but the default databases that get created have fixed names (postgres, template1). So the default configuration is one where you can't connect. Why the heck do we do it that way? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
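[A hypothetical sketch, in initdb.c terms, of the idea above. make_userdb(), the call site, and the identifier quoting are all illustrative; initdb's real bootstrap steps look like its existing make_postgres(), and proper identifier escaping is glossed over here:]

    /* Hypothetical initdb step: create a database named after the
     * bootstrap user so that a bare "psql" connects out of the box. */
    static void
    make_userdb(FILE *cmdfd)
    {
        PG_CMD_PRINTF1("CREATE DATABASE \"%s\";\n\n", username);
    }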
Re: [HACKERS] Create replication slot in pg_basebackup if requested and not yet present
Am Dienstag, den 21.03.2017, 12:52 +0100 schrieb Michael Banck: > New patches attached. On second though, there was a superfluous whitespace change in t/010_pg_basebackup.pl, and I've moved the check-for-hex regex fix to the second patch as it's cosmetic and not related to changing the --slot creation behaviour. Michael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.ba...@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer From 2ab1be27a341ecdcd2d6a3dd5ce535aba5e16b03 Mon Sep 17 00:00:00 2001 From: Michael Banck Date: Sun, 19 Mar 2017 10:58:13 +0100 Subject: [PATCH 01/29] Create replication slot in pg_basebackup if requested and not yet present. If a replication slot is explicitly requested, try to create it before starting to replicate from it. --- src/bin/pg_basebackup/pg_basebackup.c| 15 +++ src/bin/pg_basebackup/t/010_pg_basebackup.pl | 15 ++- 2 files changed, 25 insertions(+), 5 deletions(-) diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c index 0a4944d..69ca4be 100644 --- a/src/bin/pg_basebackup/pg_basebackup.c +++ b/src/bin/pg_basebackup/pg_basebackup.c @@ -581,6 +581,21 @@ StartLogStreamer(char *startpos, uint32 timeline, char *sysidentifier) else param->temp_slot = temp_replication_slot; + /* + * Try to create a permanent replication slot if one is specified. Do + * not error out if the slot already exists, other errors are already + * reported by CreateReplicationSlot(). + */ + if (!temp_replication_slot && replication_slot) + { + if (verbose) + fprintf(stderr, _("%s: creating replication slot \"%s\"\n"), + progname, replication_slot); + + if (!CreateReplicationSlot(param->bgconn, replication_slot, NULL, true, true)) + return 1; + } + if (format == 'p') { /* diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl index 14bd813..70c3284 100644 --- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl +++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl @@ -4,7 +4,7 @@ use Cwd; use Config; use PostgresNode; use TestLib; -use Test::More tests => 72; +use Test::More tests => 73; program_help_ok('pg_basebackup'); program_version_ok('pg_basebackup'); @@ -246,14 +246,19 @@ $node->command_ok( 'pg_basebackup -X stream runs with --no-slot'); $node->command_fails( - [ 'pg_basebackup', '-D', "$tempdir/fail", '-S', 'slot1' ], + [ 'pg_basebackup', '-D', "$tempdir/fail", '-X', 'none', '-S', 'slot0' ], 'pg_basebackup with replication slot fails without -X stream'); -$node->command_fails( +$node->command_ok( [ 'pg_basebackup', '-D', "$tempdir/backupxs_sl_fail", '-X', 'stream','-S', - 'slot1' ], - 'pg_basebackup fails with nonexistent replication slot'); + 'slot0' ], + 'pg_basebackup -S creates previously nonexistent replication slot'); + +my $lsn = $node->safe_psql('postgres', + q{SELECT restart_lsn FROM pg_replication_slots WHERE slot_name = 'slot0'} +); +like($lsn, qr!^0/[0-9A-F]{7,8}$!, 'slot is present'); $node->safe_psql('postgres', q{SELECT * FROM pg_create_physical_replication_slot('slot1')}); -- 2.1.4 From 5f0f31f61bf668de937868840a97c0a56dc2cd5d Mon Sep 17 00:00:00 2001 From: Michael Banck Date: Tue, 21 Mar 2017 12:50:22 +0100 Subject: [PATCH 02/29] Refactor replication slot creation in pg_basebackup et al. 
Add new argument `is_temporary' to CreateReplicationSlot() in streamutil.c which specifies whether a physical replication slot is temporary or not. Update all callers. Move the creation of the temporary replication slot for pg_basebackup from receivelog.c to pg_basebackup.c. At the same time, also create the temporary slot via CreateReplicationSlot() instead of creating the temporary slot with an explicit SQL command. --- src/bin/pg_basebackup/pg_basebackup.c| 28 +--- src/bin/pg_basebackup/pg_receivewal.c| 2 +- src/bin/pg_basebackup/pg_recvlogical.c | 2 +- src/bin/pg_basebackup/receivelog.c | 18 -- src/bin/pg_basebackup/streamutil.c | 18 +- src/bin/pg_basebackup/streamutil.h | 2 +- src/bin/pg_basebackup/t/010_pg_basebackup.pl | 2 +- 7 files changed, 34 insertions(+), 38 deletions(-) diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c index 69ca4be..c7ae281 100644 --- a/src/bin/pg_basebackup/pg_basebackup.c +++ b/src/bin/pg_basebackup/pg_basebackup.c @@ -92,6 +92,7 @@ static pg_time_t last_progress_report = 0; static int32 maxrate = 0; /* no limit by default */ static char *replication_slot = NULL; static bool temp_replication_slot = true; +static bool no_slot = false; static bool success = false; static bool made_new_pgdat
Re: [HACKERS] BRIN cost estimate
On 17 March 2017 at 23:50, Emre Hasegeli wrote:
>> 1.
>>
>> + Assert(nnumbers == 1);
>>
>> I think it's a bad idea to Assert() this. The stats tuple can come from
>> a plugin which could do anything. Seems like if we need to be certain
>> of that then it should be an elog(ERROR), maybe mentioning that we
>> expected a 1-element array but got n elements.
>
> But it is not coming from a plugin which can do anything. It is the
> plugin we know. The Assert() is exactly the same coding as btcostestimate().

Not sure what you mean here. I'm not speaking of the brin index am; I mean the get_index_stats_hook call which you've added. An extension can be loaded which sets this hook and provides its own tuple, which may cause the Assert to fail. Asserts are normally used in assert-enabled builds to verify conditions whose violation would cause the following code to crash or not do what it's meant to. Normally they're used to help track down bugs in our own software; they're not meant to track bugs in software we have no control over.

>> 2.
>>
>> + Assert(varCorrelation >= 0 && varCorrelation <= 1);
>>
>> same goes for that. I don't think we want to Assert() that either.
>> elog(ERROR) seems better, or perhaps we should fall back on the old
>> estimation method in this case?
>
> Again the correlation value is expected to be in this range. I don't
> think we even need to bother with Assert(). Garbage in, garbage out.

That's fine. Let's just not crash on garbage.

>> The Asserted range also seems to contradict the range mentioned in
>> pg_statistic.h:
>
> We are using Abs() of the value.

I missed that.

>> 3.
>>
>> + numBlocks = 1 + baserel->pages / BrinGetPagesPerRange(indexRel);
>>
>> should numBlocks be named numRanges? After all we call the option
>> "pages_per_range".
>
> It was named this way before.

Hmm, before what exactly? Before your patch it didn't exist. You introduced it into brincostestimate(). Here's the line of code:

+ double numBlocks;

If we want to have a variable which stores the number of ranges, then I think numRanges is better than numBlocks. I can't imagine many people would disagree there.

>> 6.
>>
>> Might want to consider not applying this estimate when the statistics
>> don't contain any STATISTIC_KIND_CORRELATION array.
>
> I am not sure what would be better than using the value as 0 in this case.

At the very least please write a comment to explain this in the code. Right now it looks broken. If I noticed this, then one day in the future someone else will. If you write a comment then that future person will likely read it, and then not raise any questions about the otherwise questionable code.

> I can look into supporting expression indexes, if you think something
> very much incomplete like this has a chance to get committed.

I do strongly agree that the estimates need improving here. I've personally had issues with bad brin estimates before, and I'd like to see them improved. I think the patch is not quite complete without it also considering stats on expression indexes. If you have time to go and do that, I'd suggest you go ahead with it.

I've not looked at the new patch yet, but thanks for making some of those changes.

-- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
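[For review point 1, a sketch of how the Assert() could be replaced, as the reviewer suggests; the message wording is illustrative:]

    /* The stats tuple may come from get_index_stats_hook, so report
     * an unexpected shape instead of asserting on it. */
    if (nnumbers != 1)
        elog(ERROR, "brincostestimate: expected a 1-element correlation array, got %d elements",
             nnumbers);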
Re: [HACKERS] Problem in Parallel Bitmap Heap Scan?
On Tue, Mar 21, 2017 at 4:47 PM, Thomas Munro wrote:
> I noticed a failure in the inet.sql test while running the regression
> tests with parallelism cranked up, and can reproduce it interactively
> as follows. After an spgist index is created and the plan changes to
> the one shown below, the query returns no rows.

Thanks for reporting. It seems we have issues related to TBM_ONE_PAGE and TBM_EMPTY. I think this area needs more testing; the reason these cases are not tested properly is that they are not the natural case for parallel bitmap scans. In the next few days I will test more such cases by forcing parallel bitmap plans.

Here is the patch to fix the issue at hand. I have also run the regression suite with force_parallel_mode=regress and all the tests pass.

Results after the fix:

postgres=# explain analyze SELECT * FROM inet_tbl WHERE i <> '192.168.1.0/24'::cidr ORDER BY i;
QUERY PLAN
--
Gather Merge (cost=16.53..16.62 rows=9 width=64) (actual time=4.467..4.478 rows=16 loops=1)
  Workers Planned: 1
  Workers Launched: 1
  -> Sort (cost=16.52..16.54 rows=9 width=64) (actual time=0.090..0.093 rows=8 loops=2)
       Sort Key: i
       Sort Method: quicksort Memory: 25kB
       -> Parallel Bitmap Heap Scan on inet_tbl (cost=12.26..16.37 rows=9 width=64) (actual time=0.048..0.050 rows=8 loops=2)
            Recheck Cond: (i <> '192.168.1.0/24'::inet)
            Heap Blocks: exact=1
            -> Bitmap Index Scan on inet_idx3 (cost=0.00..12.25 rows=16 width=0) (actual time=0.016..0.016 rows=16 loops=1)
                 Index Cond: (i <> '192.168.1.0/24'::inet)
Planning time: 0.149 ms
Execution time: 5.143 ms
(13 rows)

postgres=# SELECT * FROM inet_tbl WHERE i <> '192.168.1.0/24'::cidr ORDER BY i;
c |i
+--
10.0.0.0/8 | 9.1.2.3/8
10.0.0.0/8 | 10.1.2.3/8
10.0.0.0/32| 10.1.2.3/8
10.0.0.0/8 | 10.1.2.3/8
10.1.0.0/16| 10.1.2.3/16
10.1.2.0/24| 10.1.2.3/24
10.1.2.3/32| 10.1.2.3
10.0.0.0/8 | 11.1.2.3/8
192.168.1.0/24 | 192.168.1.226/24
192.168.1.0/24 | 192.168.1.255/24
192.168.1.0/24 | 192.168.1.0/25
192.168.1.0/24 | 192.168.1.255/25
192.168.1.0/26 | 192.168.1.226
:::1.2.3.4/128 | ::4.3.2.1/24
10:23::f1/128 | 10:23::f1/64
10:23::8000/113| 10:23::
(16 rows)

-- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com

fix_parallel_bitmap_handling_onepage.patch Description: Binary data

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE command progress checker
On Tue, Mar 21, 2017 at 3:41 PM, vinayak wrote:
> Thank you for testing the patch on Windows platform.

Thanks for the updated patch. It works well for a normal relation, but for a relation that contains child tables, PROGRESS_ANALYZE_NUM_ROWS_SAMPLED produces wrong results. How about adding another phase called PROGRESS_ANALYZE_PHASE_COLLECT_INHERIT_SAMPLE_ROWS and setting this phase only when it is an inheritance analyze operation? Adding some explanation to the ROWS_SAMPLED phase about how the sampled rows are calculated for inheritance tables would also give a clear picture of analyze progress for relations that contain child relations.

Regards, Hari Babu Fujitsu Australia
Re: [HACKERS] BRIN cost estimate
> Not sure what you mean here. I'm not speaking of the brin index am; I
> mean the get_index_stats_hook call which you've added.

I see. Actually this part was from Alvaro. I hadn't noticed the get_index_stats_hook call before, but it is still the same coding as btcostestimate(): btcostestimate() also calls get_index_stats_hook, and then Asserts nnumbers == 1.

> Hmm, before what exactly? Before your patch it didn't exist. You
> introduced it into brincostestimate().

I got confused by looking at the changes in my repository, which I made on top of Alvaro's. I will rename it in the next version.

> At the very least please write a comment to explain this in the code.
> Right now it looks broken. If I noticed this, then one day in the
> future someone else will. If you write a comment then that future
> person will likely read it, and then not raise any questions about the
> otherwise questionable code.

Will do.

> I do strongly agree that the estimates need improving here. I've
> personally had issues with bad brin estimates before, and I'd like to
> see them improved. I think the patch is not quite complete without it
> also considering stats on expression indexes. If you have time to go
> and do that, I'd suggest you go ahead with it.

I will look into it this week.

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)
On Tue, Mar 21, 2017 at 5:34 PM, Robert Haas wrote:
> On Tue, Mar 21, 2017 at 6:56 AM, Amit Kapila wrote:
> >> Hmm, that test case isn't all that synthetic. It's just a single
> >> column bulk update, which isn't anything all that crazy, and 5-10%
> >> isn't nothing.
> >>
> >> I'm kinda surprised it made that much difference, though.
> >
> > I think it is because heap_getattr() is not that cheap. We have
> > noticed the similar problem during development of scan key push down
> > work [1].
>
> Yeah. So what's the deal with this? Is somebody working on figuring
> out a different approach that would reduce this overhead? Are we
> going to defer WARM to v11? Or is the intent to just ignore the 5-10%
> slowdown on a single column update and commit everything anyway?

I think I should clarify something. The test case does a single-column update, but it also has columns which are very wide, and an index on many columns (and it updates a column early in that list). In addition, in the test Mithun updated all 10 million rows of the table in a single transaction, used an UNLOGGED table, and had fsync turned off. TBH, I see many artificial scenarios here. It would be very useful if he could rerun the query with some of these restrictions lifted. I'm all for addressing whatever we can, but I am not sure this test demonstrates real-world usage.

Having said that, maybe we can do a few things to reduce the overhead:

- Check if the page has enough free space to perform a HOT/WARM update. If not, don't look for all index keys (a rough sketch of this idea follows below).
- Pass bitmaps separately for each index and bail out early if we conclude that neither HOT nor WARM is possible. In this case, since there is just one index, as soon as we check the second column we know neither HOT nor WARM is possible, and we would return early. It might complicate the API a lot, but I can give it a shot if that's what is needed to make progress.

Any other ideas?

Thanks, Pavan

-- Pavan Deolasee http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
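[The sketch referenced above for the first idea. All variable and helper names here (consider_hot_or_warm, compute_modified_index_columns, etc.) are illustrative, not from the WARM patch, and the free-space test is only approximate since pruning could still free space on the page:]

    /* Bail out of HOT/WARM consideration before paying for any
     * per-index-column comparisons if the new tuple cannot fit on
     * the page anyway. */
    bool consider_hot_or_warm =
        PageGetHeapFreeSpace(page) >= MAXALIGN(newtup->t_len);

    if (consider_hot_or_warm)
        modified_attrs = compute_modified_index_columns(relation,
                                                        interesting_attrs,
                                                        &oldtup, newtup);
    else
        modified_attrs = NULL;    /* take the plain (non-HOT) update path */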
[HACKERS] Implementing delete in columnar store fdw
Hello,

I want to implement delete functionality for a column-store FDW in Postgres, similar to file_fdw. I want to use the "AddForeignUpdateTargets" function to implement this, but the junk filter shouldn't be a column present in the table. Is it possible to add an Expr/Var to the junk filter that is not present in the schema of the table, one that I will generate on the fly? If yes, how can I do that?

Thanks in advance.

Regards, Harikrishnan

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
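[For reference, this is approximately how postgres_fdw implements the callback in the current API: it adds a resjunk targetlist entry for the existing ctid system column. The junk expression is a Var, which has to refer to an attribute the scan can actually produce; that is exactly why a purely synthetic, on-the-fly column is awkward here:]

    static void
    postgresAddForeignUpdateTargets(Query *parsetree,
                                    RangeTblEntry *target_rte,
                                    Relation target_relation)
    {
        /* Make a Var referencing the ctid system column of the target. */
        Var *var = makeVar(parsetree->resultRelation,
                           SelfItemPointerAttributeNumber,
                           TIDOID, -1, InvalidOid, 0);

        /* Wrap it in a resjunk TLE with a recognizable name... */
        TargetEntry *tle = makeTargetEntry((Expr *) var,
                                           list_length(parsetree->targetList) + 1,
                                           pstrdup("ctid"),
                                           true);

        /* ...and append it to the query's targetlist. */
        parsetree->targetList = lappend(parsetree->targetList, tle);
    }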
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Mon, Mar 20, 2017 at 8:27 AM, Robert Haas wrote:
> On Fri, Mar 17, 2017 at 2:30 AM, Amit Kapila wrote:
>>> I was wondering about doing an explicit test: if the XID being
>>> committed matches the one in the PGPROC, and nsubxids matches, and the
>>> actual list of XIDs matches, then apply the optimization. That could
>>> replace the logic that you've proposed to exclude non-commit cases,
>>> gxact cases, etc. and it seems fundamentally safer. But it might be a
>>> more expensive test, too, so I'm not sure.
>>
>> I think if the number of subxids is very small, let us say under 5 or
>> so, then such a check might not matter, but otherwise it could be
>> expensive.
>
> We could find out by testing it. We could also restrict the
> optimization to cases with just a few subxids, because if you've got a
> large number of subxids this optimization probably isn't buying much
> anyway.

Yes, and I have modified the patch to compare xids and subxids for the group update. In initial short tests (with small client counts), it seems that up to 3 savepoints we still win, while from 10 savepoints onwards there is some regression, or at the very least no apparent benefit. We need more tests to identify the safe number, but I thought it better to share the patch first to see whether we agree on the changes, because if not, the whole round of testing would need to be repeated. Let me know what you think about the attached.

-- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com

group_update_clog_v11.patch Description: Binary data

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
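[A sketch of the membership test being described. The field names follow the usual PGPROC/PGXACT layout, but treat the details, including the group-update entry point, as assumptions rather than the attached patch itself:]

    /* Take the group XID-status-update path only when the XID and
     * subxids being committed exactly match what this backend's
     * PGPROC advertises. */
    if (TransactionIdEquals(xid, pgxact->xid) &&
        nsubxids == pgxact->nxids &&
        memcmp(subxids, proc->subxids.xids,
               nsubxids * sizeof(TransactionId)) == 0)
        use_group_update = true;    /* illustrative flag */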
Re: [HACKERS] increasing the default WAL segment size
PFA an updated patch. This fixes an issue reported by Tushar internally. Since the patch changes the way min and max wal_size is stored internally from segment count to size in kb, it limited the maximum size of min and max_wal_size to 2GB in 32 bit systems. The minimum required segment is 2 and the minimum wal size is 1MB. So the lowest possible value of the min/max wal_size is 2MB. Hence, I have changed the internal representation to MB instead of KB so that we can increase the range. Also, for any wal-seg-size, it retains the default seg count as 5 and 64 for min and max wal_size. Consequently, the size of the variables increase automatically according to the wal_segment_size. This behaviour is similar to that of existing code. I have also run pg_indent on the files. On Mon, Mar 20, 2017 at 11:37 PM, Beena Emerson wrote: > Hello, > > PFA the updated patch. > > On Fri, Mar 17, 2017 at 6:40 AM, Robert Haas > wrote: > >> On Tue, Mar 14, 2017 at 1:44 AM, Beena Emerson >> wrote: >> > Attached is the updated patch. It fixes the issues and also updates few >> code >> > comments. >> >> I did an initial readthrough of this patch tonight just to get a >> feeling for what's going on. Based on that, here are a few review >> comments: >> >> The changes to pg_standby seem to completely break the logic to wait >> until the file has attained the correct size. I don't know how to >> salvage that logic off-hand, but just breaking it isn't acceptable. >> > > Using, the XLogLongPageHeader->xlp_seg_size, all the original checks have > been retained. This methid is even used in pg_waldump. > > > >> >> + Note that changing this value requires an initdb. >> >> Instead, maybe say something like "Note that this value is fixed for >> the lifetime of the database cluster." >> > > Corrected. > > >> >> -intmax_wal_size = 64;/* 1 GB */ >> -intmin_wal_size = 5;/* 80 MB */ >> +intwal_segment_size = 2048;/* 16 MB */ >> +intmax_wal_size = 1024 * 1024;/* 1 GB */ >> +intmin_wal_size = 80 * 1024;/* 80 MB */ >> >> If wal_segment_size is now measured in multiple of XLOG_BLCKSZ, then >> it's not the case that 2048 is always 16MB. If the other values are >> now measured in kB, perhaps rename the variables to add _kb, to avoid >> confusion with the way it used to work (and in general). The problem >> with leaving this as-is is that any existing references to >> max_wal_size in core or extension code will silently break; you want > > it to break in a noticeable way so that it gets fixed. >> >> > The wal_segment_size now is DEFAULT_XLOG_SEG_SIZE / XLOG_BLCKSZ; > min and max wal_size have _kb postfix > > >> + * UsableBytesInSegment: It is set in assign_wal_segment_size and stores >> the >> + * number of bytes in a WAL segment usable for WAL data. >> >> The comment doesn't need to say where it gets set, and it doesn't need >> to repeat the variable name. Just say "The number of bytes in a..." >> > > Done. > > >> >> +assign_wal_segment_size(int newval, void *extra) >> >> Why does a PGC_INTERNAL GUC need an assign hook? I think the GUC >> should only be there to expose the value; it shouldn't have >> calculation logic associated with it. >> > > Removed the function and called the functions in ReadControlFile. > > >> >> /* >> + * initdb passes the WAL segment size in an environment variable. We >> don't >> + * bother doing any sanity checking, we already check in initdb that >> the >> + * user gives a sane value. >> + */ >> +XLogSegSize = pg_atoi(getenv("XLOG_SEG_SIZE"), sizeof(uint32), 0); >> >> I think we should bother. 
I don't like the idea of the postmaster >> crashing in flames without so much as a reasonable error message if >> this parameter-passing mechanism goes wrong. >> > > I have rechecked the XLogSegSize. > > >> >> +{"wal-segsize", required_argument, NULL, 'Z'}, >> >> When adding an option with no documented short form, generally one >> picks a number that isn't a character for the value at the end. See >> pg_regress.c or initdb.c for examples. >> > > Done. > > >> >> + wal_segment_size = atoi(str_wal_segment_size); >> >> So, you're comfortable interpreting --wal-segsize=1TB or >> --wal-segsize=1GB as 1? Implicitly, 1MB? >> > > Imitating the current behaviour of the configure option --with-wal-segsize, I > have used strtol to throw an error if the value contains anything other than digits. > > >> >> + * ControlFile is not accessible here so use SHOW wal_segment_size >> command >> + * to set the XLogSegSize >> >> Breaks compatibility with pre-9.6 servers. >> > > Added a version check; the SHOW command will be run only on v10 and > above. Previous versions do not need this. > > >> >> -- > Thank you, > > Beena Emerson > > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > -- Thank you, Beena Emerson EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
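To make the strtol-based validation concrete, here is a minimal sketch in C. It is not the patch's code; the function name, the messages, and the power-of-two range of 1..1024 MB are assumptions for illustration:

    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical sketch of strict parsing for an option like --wal-segsize. */
    static int
    parse_wal_segsize(const char *arg)
    {
        char       *endptr;
        long        segsize_mb;

        errno = 0;
        segsize_mb = strtol(arg, &endptr, 10);
        /* reject empty input, trailing garbage such as "1GB", and overflow */
        if (errno != 0 || endptr == arg || *endptr != '\0')
        {
            fprintf(stderr, "argument of --wal-segsize must be an integer\n");
            exit(1);
        }
        /* must be a power of 2 within the assumed range, in MB */
        if (segsize_mb < 1 || segsize_mb > 1024 ||
            (segsize_mb & (segsize_mb - 1)) != 0)
        {
            fprintf(stderr, "argument of --wal-segsize must be a power of 2 between 1 and 1024\n");
            exit(1);
        }
        return (int) segsize_mb;
    }

The same kind of check could guard the XLOG_SEG_SIZE environment variable on the server side, so a garbled value produces a clean error instead of a crash.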
Re: [HACKERS] increasing the default WAL segment size
Robert, * Robert Haas (robertmh...@gmail.com) wrote: > On Mon, Mar 20, 2017 at 7:23 PM, David Steele wrote: > > With 16MB WAL segments the filename neatly aligns with the LSN. For > > example: > > > > WAL FILE 0000000100000001000000FE = LSN 1/FE000000 > > > > This no longer holds true with this patch. > > It is already possible to change the WAL segment size using the > configure option --with-wal-segsize, and I think the patch should be > consistent with whatever that existing option does. Considering how little usage that option has likely seen (I can't say I've ever run into usage of it so far...), I'm not really sure that it makes sense to treat it as final when we're talking about changing the default here. In short, I'm also concerned about this change to make WAL file names no longer match up with LSNs and also about the odd stepping that you get as a result of this change when it comes to WAL file names. Thanks! Stephen signature.asc Description: Digital signature
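To make the file-name/LSN correspondence concrete, here is a small standalone sketch of the arithmetic (it mirrors what the XLogFileName macro computes; the program itself and its variable names are illustrative, not PostgreSQL code). With a different wal_segment_size, segs_per_id changes and the neat alignment breaks, which is exactly the concern raised above:

    #include <stdint.h>
    #include <stdio.h>

    int
    main(void)
    {
        /* 16MB segments: 256 segments per 4GB "xlogid"; timeline 1 assumed */
        uint64_t    seg_size = 16 * 1024 * 1024;
        uint64_t    segs_per_id = UINT64_C(0x100000000) / seg_size;    /* 256 */
        uint64_t    lsn = (UINT64_C(1) << 32) | 0xFE000000;    /* LSN 1/FE000000 */
        uint64_t    segno = lsn / seg_size;
        uint32_t    tli = 1;

        /* same layout as XLogFileName: TLI, segno / segs_per_id, segno % segs_per_id */
        printf("%08X%08X%08X\n", (unsigned int) tli,
               (unsigned int) (segno / segs_per_id),
               (unsigned int) (segno % segs_per_id));
        /* prints 0000000100000001000000FE */
        return 0;
    }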
Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)
On Tue, Mar 21, 2017 at 8:41 AM, Pavan Deolasee wrote: >> Yeah. So what's the deal with this? Is somebody working on figuring >> out a different approach that would reduce this overhead? Are we >> going to defer WARM to v11? Or is the intent to just ignore the 5-10% >> slowdown on a single column update and commit everything anyway? > > I think I should clarify something. The test case does a single column > update, but it also has columns which are very wide, has an index on many > columns (and it updates a column early in the list). In addition, in the > test Mithun updated all 10 million rows of the table in a single transaction, > used an UNLOGGED table and fsync was turned off. > > TBH I see many artificial scenarios here. It will be very useful if he can > rerun the query with some of these restrictions lifted. I'm all for > addressing whatever we can, but I am not sure if this test demonstrates a > real world usage. That's a very fair point, but if these patches - or some of them - are going to get committed then these things need to get discussed. Let's not just have nothing, nothing, nothing, then a giant unagreed code drop. I think that very wide columns and highly indexed tables are not particularly unrealistic, nor do I think updating all the rows is particularly unrealistic. Sure, it's not everything, but it's something. Now, I would agree that all of that PLUS unlogged tables with fsync=off is not too realistic. What kind of regression would we observe if we eliminated those last two variables? > Having said that, maybe we can do a few things to reduce the overhead. > > - Check if the page has enough free space to perform a HOT/WARM update. If > not, don't look for all index keys. > - Pass bitmaps separately for each index and bail out early if we conclude > neither HOT nor WARM is possible. In this case since there is just one index > and as soon as we check the second column we know neither HOT nor WARM is > possible, we will return early. It might complicate the API a lot, but I can > give it a shot if that's what is needed to make progress. I think that whether the code ends up getting contorted is an important consideration here. For example, if the first of the things you mention can be done without making the code ugly, then I think that would be worth doing; it's likely to help fairly often in real-world cases. The problem with making the code contorted and ugly, as you say that the second idea would require, is that it can easily mask bugs. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH]: fix bug in SP-GiST box_ops
Thank you, pushed. I just made the test table permanent. Anastasia Lubennikova wrote: The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: tested, passed Documentation: not tested As I can see, this bugfix was already discussed and reviewed. All mentioned issues are fixed in the latest version of the patch, so I mark it "Ready for committer". Thank you for fixing it! The new status of this patch is: Ready for Committer -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)
Amit Kapila wrote: > I think it is because heap_getattr() is not that cheap. We have > noticed a similar problem during development of the scan key push down > work [1]. One possibility to reduce the cost of that is to deform the whole tuple once instead of making repeated individual heap_getattr() calls. Since we don't actually need *all* attrs, we can create a version of heap_deform_tuple that takes an attribute number as an argument and decodes up to that point. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
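A sketch of the suggested variant, assuming PostgreSQL backend headers. The name heap_deform_tuple_upto is hypothetical, and the body below is a functional stand-in only; a real implementation would copy heap_deform_tuple()'s left-to-right decoding loop and break out after natts attributes, making the cost proportional to natts rather than to tupleDesc->natts:

    #include "postgres.h"
    #include "access/htup_details.h"

    /*
     * Like heap_deform_tuple(), but only the first 'natts' attributes are
     * wanted.  This stand-in deforms everything and keeps the prefix; the
     * proposed version would stop decoding once 'natts' attributes are done.
     */
    static void
    heap_deform_tuple_upto(HeapTuple tuple, TupleDesc tupleDesc,
                           Datum *values, bool *isnull, int natts)
    {
        Datum      *allvalues = palloc(sizeof(Datum) * tupleDesc->natts);
        bool       *allnulls = palloc(sizeof(bool) * tupleDesc->natts);

        Assert(natts <= tupleDesc->natts);

        heap_deform_tuple(tuple, tupleDesc, allvalues, allnulls);
        memcpy(values, allvalues, sizeof(Datum) * natts);
        memcpy(isnull, allnulls, sizeof(bool) * natts);
        pfree(allvalues);
        pfree(allnulls);
    }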
Re: [HACKERS] exposing wait events for non-backends (was: Tracking wait event for latches)
On Tue, Mar 21, 2017 at 10:52 AM, Michael Paquier wrote: Thank you for the review. > Unfortunately this is true only for background workers that connect to > a database. And this would break for bgworkers that do not do that. > The point to fix is here: > + if (MyBackendId != InvalidBackendId) > + { > [...] > + else if (IsBackgroundWorker) > + { > + /* bgworker */ > + beentry->st_backendType = B_BG_WORKER; > + } > Your code is assuming that a bgworker will always be setting > MyBackendId which is not necessarily true, and you would trigger the > assertion down: > Assert(MyAuxProcType != NotAnAuxProcess); > So you need to rely on IsBackgroundWorker in priority I think. I would > suggest as well to give up calling pgstat_bestart() in > BackgroundWorkerInitializeConnection[ByOid] and let the workers do > this work by themselves. This gives more flexibility. You would need > to update the logical replication worker and worker_spi for that as > well. We reserve a slot for each possible BackendId, plus one for each possible auxiliary process type. For a non-auxiliary process, BackendId is used to refer to the backend status in the PgBackendStatus array. So, a bgworker without any BackendId can't initialize its entry in the PgBackendStatus array. In simple terms, it will not be shown in pg_stat_activity. I've added some comments regarding this in pgstat_bestart(). And, any bgworker having a valid BackendId will have either a valid userid or BOOTSTRAP_SUPERUSERID. > If you want to test this configuration, feel free to use this background > worker: > https://github.com/michaelpq/pg_plugins/tree/master/hello_world > This just prints an entry to the logs every 10s, and does not connect > to any database. Adding a call to pgstat_bestart() in hello_main > triggers the assertion. > In this case, pgstat_bestart() shouldn't be called from this module as the spawned bgworker will have an invalid BackendId. By the way, thanks for sharing the repo link. Found a lot of interesting things to explore and learn. :) >>> @@ -808,6 +836,7 @@ pg_stat_get_activity(PG_FUNCTION_ARGS) >>> nulls[12] = true; >>> nulls[13] = true; >>> nulls[14] = true; >>> + nulls[23] = true; >>> } >>> That's not possible to have backend_type set as NULL, no? >> >> Yes, backend_type can't be null. But, I'm not sure whether it should >> be visible to a user with insufficient privileges. Anyway, I've made >> it visible to all users for now. >> >> Please find the updated patches in the attachment. > > Yeah, hiding it may make sense... Modified. > /* The autovacuum launcher is done here */ > if (IsAutoVacuumLauncherProcess()) > + { > return; > + } > Unnecessary noise here. > Fixed. > Except for the two issues pointed out in this email, I am pretty cool > with this patch. Nice work. Thank you. :) Please find the updated patches. -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com 0001-Infra-to-expose-all-backend-processes-in-pg_stat_get.patch Description: application/download 0002-Expose-stats-for-all-backends.patch Description: application/download 0003-Add-backend_type-column-in-pg_stat_get_activity.patch Description: application/download -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioned tables and relfilenode
On Tue, Mar 21, 2017 at 5:05 AM, Amit Langote wrote: > Attached updated patches. Committed 0001 after removing a comma. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Freeze on Cygwin w/ concurrency
On Mon, Mar 20, 2017 at 11:47 PM, Noah Misch wrote: > "pgbench -i -s 50; pgbench -S -j2 -c16 -T900 -P5" freezes consistently on > Cygwin 2.2.1 and Cygwin 2.6.0. (I suspect most other versions are affected.) > I've pinged[1] the Cygwin bug thread with some additional detail. Ouch. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Multiple false-positive warnings from Valgrind
Hello. I need a little help. Recently I've decided to run PostgreSQL under Valgrind according to the wiki description [1]. Lots of warnings are generated [2] but it is my understanding that all of them are false-positive. For instance I've found these two reports particularly interesting:

```
==00:00:40:40.161 7677== Use of uninitialised value of size 8
==00:00:40:40.161 7677==at 0xA15FF5: pg_b64_encode (base64.c:68)
==00:00:40:40.161 7677==by 0x6FFE85: scram_build_verifier (auth-scram.c:348)
==00:00:40:40.161 7677==by 0x6F3F76: encrypt_password (crypt.c:171)
==00:00:40:40.161 7677==by 0x68F40C: CreateRole (user.c:403)
==00:00:40:40.161 7677==by 0x85D53A: standard_ProcessUtility (utility.c:716)
==00:00:40:40.161 7677==by 0x85CCC7: ProcessUtility (utility.c:353)
==00:00:40:40.161 7677==by 0x85BD22: PortalRunUtility (pquery.c:1165)
==00:00:40:40.161 7677==by 0x85BF20: PortalRunMulti (pquery.c:1308)
==00:00:40:40.161 7677==by 0x85B4A0: PortalRun (pquery.c:788)
==00:00:40:40.161 7677==by 0x855672: exec_simple_query (postgres.c:1101)
==00:00:40:40.161 7677==by 0x8597BB: PostgresMain (postgres.c:4066)
==00:00:40:40.161 7677==by 0x7C6322: BackendRun (postmaster.c:4317)
==00:00:40:40.161 7677== Uninitialised value was created by a stack allocation
==00:00:40:40.161 7677==at 0x6FFDB7: scram_build_verifier (auth-scram.c:328)
==00:00:40:40.593 7677== Use of uninitialised value of size 8
==00:00:40:40.593 7677==at 0x8A7C36: hex_encode (encode.c:132)
==00:00:40:40.593 7677==by 0x6FFEF5: scram_build_verifier (auth-scram.c:355)
==00:00:40:40.593 7677==by 0x6F3F76: encrypt_password (crypt.c:171)
==00:00:40:40.593 7677==by 0x68F40C: CreateRole (user.c:403)
==00:00:40:40.593 7677==by 0x85D53A: standard_ProcessUtility (utility.c:716)
==00:00:40:40.593 7677==by 0x85CCC7: ProcessUtility (utility.c:353)
==00:00:40:40.593 7677==by 0x85BD22: PortalRunUtility (pquery.c:1165)
==00:00:40:40.593 7677==by 0x85BF20: PortalRunMulti (pquery.c:1308)
==00:00:40:40.593 7677==by 0x85B4A0: PortalRun (pquery.c:788)
==00:00:40:40.593 7677==by 0x855672: exec_simple_query (postgres.c:1101)
==00:00:40:40.593 7677==by 0x8597BB: PostgresMain (postgres.c:4066)
==00:00:40:40.593 7677==by 0x7C6322: BackendRun (postmaster.c:4317)
==00:00:40:40.593 7677== Uninitialised value was created by a stack allocation
==00:00:40:40.593 7677==at 0x6FFDB7: scram_build_verifier (auth-scram.c:328)
==00:00:40:40.593 7677==
```

And here is what I see in GDB [3]:

```
0x00a160b4 in pg_b64_encode (src=0xffefffb10 [...] at base64.c:80
80 *p++ = _base64[(buf >> 12) & 0x3f];
(gdb) monitor get_vbits 0xffefffb10
10
0x008a7c36 in hex_encode ( src=0xffefffbc0 [...] at encode.c:132
132 *dst++ = hextbl[(*src >> 4) & 0xF];
(gdb) monitor get_vbits 0xffefffbc0
32
```

So Valgrind thinks that in both cases the first argument is completely uninitialized, which is very doubtful to say the least :) There are also lots of memory leak reports which could be found in [2]. I got a strong feeling that maybe I'm doing something wrong. Here are the exact scripts I'm using to build [4] and to install and run PostgreSQL under Valgrind [5]. Naturally, USE_VALGRIND is declared in pg_config_manual.h. The Valgrind version is 3.12 and the environment in general is Arch Linux. Could you please give a little piece of advice? Or maybe the wiki page is just a bit outdated?
[1] https://wiki.postgresql.org/wiki/Valgrind [2] http://afiskon.ru/s/8a/390698e914_valgrind.tgz [3] http://afiskon.ru/s/09/c4f6231679_pgvg.txt [4] https://github.com/afiskon/pgscripts/blob/master/quick-build.sh [5] https://github.com/afiskon/pgscripts/blob/master/valgrind.sh -- Best regards, Aleksander Alekseev signature.asc Description: PGP signature
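One way to narrow down such reports from inside the server (assuming a USE_VALGRIND build) is Valgrind's client-request API from <valgrind/memcheck.h>, which can ask whether a buffer is fully defined right before the call being complained about. The helper below is a sketch; the macro expands to a no-op when the program is not running under Valgrind:

    #include <stddef.h>
    #include <valgrind/memcheck.h>

    /*
     * Report (with a stack trace) the first undefined byte in 'buf', if any.
     * Dropping a call like this in front of pg_b64_encode() or hex_encode()
     * helps decide whether a report is a false positive.
     */
    static void
    check_defined(const void *buf, size_t len)
    {
        VALGRIND_CHECK_MEM_IS_DEFINED(buf, len);
    }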
Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)
On Tue, Mar 21, 2017 at 6:55 PM, Robert Haas wrote: > On Tue, Mar 21, 2017 at 8:41 AM, Pavan Deolasee > wrote: >>> Yeah. So what's the deal with this? Is somebody working on figuring >>> out a different approach that would reduce this overhead? Are we >>> going to defer WARM to v11? Or is the intent to just ignore the 5-10% >>> slowdown on a single column update and commit everything anyway? >> >> I think I should clarify something. The test case does a single column >> update, but it also has columns which are very wide, has an index on many >> columns (and it updates a column early in the list). In addition, in the >> test Mithun updated all 10 million rows of the table in a single transaction, >> used an UNLOGGED table and fsync was turned off. >> >> TBH I see many artificial scenarios here. It will be very useful if he can >> rerun the query with some of these restrictions lifted. I'm all for >> addressing whatever we can, but I am not sure if this test demonstrates a >> real world usage. > > That's a very fair point, but if these patches - or some of them - are > going to get committed then these things need to get discussed. Let's > not just have nothing, nothing, nothing, then a giant unagreed code drop. > > I think that very wide columns and highly indexed tables are not > particularly unrealistic, nor do I think updating all the rows is > particularly unrealistic. Sure, it's not everything, but it's > something. Now, I would agree that all of that PLUS unlogged tables > with fsync=off is not too realistic. What kind of regression would we > observe if we eliminated those last two variables? > Sure, we can try that. I think we need to try it with synchronous_commit = off, otherwise, WAL writes completely overshadow everything. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)
On Tue, Mar 21, 2017 at 10:01 AM, Amit Kapila wrote: >> I think that very wide columns and highly indexed tables are not >> particularly unrealistic, nor do I think updating all the rows is >> particularly unrealistic. Sure, it's not everything, but it's >> something. Now, I would agree that all of that PLUS unlogged tables >> with fsync=off is not too realistic. What kind of regression would we >> observe if we eliminated those last two variables? > > Sure, we can try that. I think we need to try it with > synchronous_commit = off, otherwise, WAL writes completely overshadow > everything. synchronous_commit = off is a much more realistic scenario than fsync = off. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)
On Tue, Mar 21, 2017 at 6:55 PM, Robert Haas wrote: > > I think that very wide columns and highly indexed tables are not > particularly unrealistic, nor do I think updating all the rows is > particularly unrealistic. Ok. But would those who update 10M rows in a single transaction really notice a 5-10% variation? I think it probably makes sense to run those updates in smaller transactions and see if the regression is still visible (otherwise tweaking synchronous_commit is moot anyway). > Sure, it's not everything, but it's > something. Now, I would agree that all of that PLUS unlogged tables > with fsync=off is not too realistic. What kind of regression would we > observe if we eliminated those last two variables? > Hard to say. I didn't find any regression on the machines available to me even with the original test case that I used, which was a pretty bad case to start with (sure, Mithun tweaked it further to create an even worse scenario). Maybe on the kind of machines he has access to, it might show up even with those changes. > > > I think that whether the code ends up getting contorted is an > important consideration here. For example, if the first of the things > you mention can be done without making the code ugly, then I think > that would be worth doing; it's likely to help fairly often in > real-world cases. The problem with making the code contorted and > ugly, as you say that the second idea would require, is that it can > easily mask bugs. Agree. That's probably one reason why Alvaro wrote the patch to start with. I'll give the first of those two options a try. Thanks, Pavan -- Pavan Deolasee http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
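A sketch of that first option (the page-free-space check), assuming backend headers. The helper name and its call site are assumptions; PageGetHeapFreeSpace() and MAXALIGN() are the existing facilities such a check would likely use:

    #include "postgres.h"
    #include "storage/bufpage.h"

    /*
     * If the updated tuple cannot fit on the same page, the update cannot be
     * HOT or WARM anyway, so the caller can skip the comparatively expensive
     * scan of indexed attributes entirely.
     */
    static bool
    page_allows_hot_or_warm(Page page, Size newtuplen)
    {
        return PageGetHeapFreeSpace(page) >= MAXALIGN(newtuplen);
    }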
Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)
Robert Haas wrote: > On Tue, Mar 21, 2017 at 10:01 AM, Amit Kapila wrote: > > Sure, we can try that. I think we need to try it with > > synchronous_commit = off, otherwise, WAL writes completely overshadow > > everything. > > synchronous_commit = off is a much more realistic scenario than fsync = off. Sure, synchronous_commit=off is a reasonable case. But I say if we lose a few % on the case where you update only the first indexed column of a large number of very wide columns, all indexed, and this is only noticeable if you don't write WAL and only if you update all the rows in the table, then I don't see much reason for concern. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical decoding on standby
On 21 March 2017 at 02:21, Craig Ringer wrote: > On 20 March 2017 at 17:33, Andres Freund wrote: > >>> Subject: [PATCH 2/3] Follow timeline switches in logical decoding >> >> FWIW, the title doesn't really seem accurate to me. > > Yeah, it's not really at the logical decoding layer at all. > > "Teach xlogreader to follow timeline switches" ? Happy with that. I think Craig has addressed Andres' issues with this patch, so I will apply it later today as planned, using that name. The longer Logical Decoding on Standby patch will not be applied yet, and not without further changes, per review. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables
Thanks Rajkumar. Added those in the latest set of patches. On Tue, Mar 21, 2017 at 3:52 PM, Rajkumar Raghuwanshi wrote: >> On Mon, Mar 20, 2017 at 1:19 PM, Ashutosh Bapat >> wrote: > > I have created some test to cover partition wise joins with > postgres_fdw, also verified make check. > patch attached. > > Thanks & Regards, > Rajkumar Raghuwanshi > QMG, EnterpriseDB Corporation -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Removing binaries
On Tue, Mar 21, 2017 at 5:12 AM, Robert Haas wrote: > > Here's another idea: what if we always created the default database at > initdb time? For example, if I initdb as rhaas, maybe it should > create an "rhaas" database for me, so that this works: > > initdb > pg_ctl start > psql > > I think a big part of the usability problem here comes from the fact > that the default database for connections is based on the username, > but the default databases that get created have fixed names (postgres, > template1). So the default configuration is one where you can't > connect. Why the heck do we do it that way? > > I'd be curious to estimate how many users that have difficulties learning how all this works actually run a manual initdb prior to beginning their experimentation. I suspect the percentage is fairly low. Doing away with "the default database for psql is one named after the user" seems like it would be more widely applicable. I for one tend to name things after what they do, or are used for, and thus have never benefited from this particular design decision. David J.
Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables
On Tue, Mar 21, 2017 at 7:41 AM, Ashutosh Bapat wrote: > On Mon, Mar 20, 2017 at 10:17 PM, Ashutosh Bapat > wrote: >>> >>> On further testing of this patch I found another case where it shows a >>> regression: the time taken with the patch is around 160 secs and >>> without it is 125 secs. >>> Another minor thing to note is that planning time almost doubles with >>> this patch, though I understand that this is for scenarios with really >>> big 'big data' so this may not be a serious issue in such cases, but >>> it'd be good if we can keep an eye on this that it doesn't exceed the >>> computational bounds for a really large number of tables. >> >> Right, planning time would be proportional to the number of partitions >> at least in the first version. We may improve upon it later. >> >>> Please find the attached .out file to check the output I witnessed and >>> let me know if any more information is required. >>> Schema and data were similar to the previously shared schema with the >>> addition of more data for this case, parameter settings used were: >>> work_mem = 1GB >>> random_page_cost = seq_page_cost = 0.1 > > This doesn't look good. Why do you set both these costs to the same value? That's a perfectly reasonable configuration if the data is in memory on a medium with fast random access, like an SSD. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)
On Tue, Mar 21, 2017 at 10:21 AM, Alvaro Herrera wrote: > Robert Haas wrote: >> On Tue, Mar 21, 2017 at 10:01 AM, Amit Kapila >> wrote: > >> > Sure, we can try that. I think we need to try it with >> > synchronous_commit = off, otherwise, WAL writes completely overshadows >> > everything. >> >> synchronous_commit = off is a much more realistic scenario than fsync = off. > > Sure, synchronous_commit=off is a reasonable case. But I say if we lose > a few % on the case where you update only the first indexed of a large > number of very wide columns all indexed, and this is only noticeable if > you don't write WAL and only if you update all the rows in the table, > then I don't see much reason for concern. If the WAL writing hides the loss, then I agree that's not a big concern. But if the loss is still visible even when WAL is written, then I'm not so sure. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] increasing the default WAL segment size
On 3/21/17 9:04 AM, Stephen Frost wrote: Robert, * Robert Haas (robertmh...@gmail.com) wrote: On Mon, Mar 20, 2017 at 7:23 PM, David Steele wrote: With 16MB WAL segments the filename neatly aligns with the LSN. For example: WAL FILE 0000000100000001000000FE = LSN 1/FE000000 This no longer holds true with this patch. It is already possible to change the WAL segment size using the configure option --with-wal-segsize, and I think the patch should be consistent with whatever that existing option does. Considering how little usage that option has likely seen (I can't say I've ever run into usage of it so far...), I'm not really sure that it makes sense to treat it as final when we're talking about changing the default here. +1. A seldom-used compile-time option does not necessarily provide a good model for a user-facing feature. In short, I'm also concerned about this change to make WAL file names no longer match up with LSNs and also about the odd stepping that you get as a result of this change when it comes to WAL file names. I can't decide which way I like best. I like the filenames corresponding to LSNs as they do now, but it seems like a straight sequence might be easier to understand. Either way you need to know that different segment sizes mean different numbers of segments per lsn.xlogid. Even now the correspondence is a bit tenuous. I've always thought: 00000001000000010000000F Should be: 00000001000000010F000000 I'm really excited to (hopefully) have this feature in v10. I just want to be sure we discuss this as it will be a big change for tool authors and just about anybody who looks at WAL. Thanks, -- -David da...@pgmasters.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Removing binaries
On 3/21/17 10:30 AM, David G. Johnston wrote: On Tue, Mar 21, 2017 at 5:12 AM, Robert Haas (robertmh...@gmail.com) wrote: Here's another idea: what if we always created the default database at initdb time? For example, if I initdb as rhaas, maybe it should create an "rhaas" database for me, so that this works: initdb pg_ctl start psql I think a big part of the usability problem here comes from the fact that the default database for connections is based on the username, but the default databases that get created have fixed names (postgres, template1). So the default configuration is one where you can't connect. Why the heck do we do it that way? I'd be curious to estimate how many users who have difficulties learning how all this works actually run a manual initdb prior to beginning their experimentation. I suspect the percentage is fairly low. Doing away with "the default database for psql is one named after the user" seems like it would be more widely applicable. I for one tend to name things after what they do, or are used for, and thus have never benefited from this particular design decision. I suppose it would be too big a change to have psql try the username and then fall back to postgres on failure? -- -David da...@pgmasters.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Removing binaries
On Tue, Mar 21, 2017 at 10:43 AM, David Steele wrote: > I suppose it would be too big a change to have psql try the username and > then fallback to postgres on failure? It's not so much that the change would be too big as that the resulting semantics would be confusing, at least IMHO. Imagine: 1. Person A logs into the service account, runs psql, starts creating stuff in postgres DB without realizing it. 2. Person B logs into the service account, runs psql -l, sees that the usual DB name is not there, runs createdb. 3. Person A reruns psql and says "where'd all my stuff go?". -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Add missing support for new node fields
Andres Freund writes: > On 2017-03-21 07:22:57 +0100, Fabien COELHO wrote: >> I've been annoyed by these stupid functions and forgetting to update them >> since I run into them while trying to fix an issue in pg_stat_statement some >> time ago. >> >> I've started to develop a perl script to generate most of them from headers. >> It is not done yet, but it looks that it can work in the end with limited >> effort. Currently it works for copy & equal. > It'd have to do out/read as well imo. Yeah. A partial solution is pretty much useless. Even with out/read support as well, I'm not sure it's not useless, because you'd still have to remember to consider places like expression_tree_walker and expression_tree_mutator. Those are not really amenable to automatic generation AFAICS, because they have to understand the actual semantics of each field. It's conceivable that you could get somewhere if the starting point were some marked-up representation of every node type's field list, rather than just a C declaration. (IOW, include/nodes/*.h would become generated files as well.) But really, isn't that just moving the error locale from "forgetting to update equalfuncs.c" to "forgetting to include the correct marker keywords"? And people would have to learn about this generator tool and its input language. In the end there's very little substitute for checking every reference to a struct type when you add/modify one of its fields. grep is your friend. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Add missing support for new node fields
On Tue, Mar 21, 2017 at 10:56 AM, Tom Lane wrote: > Andres Freund writes: >> On 2017-03-21 07:22:57 +0100, Fabien COELHO wrote: >>> I've been annoyed by these stupid functions and forgetting to update them >>> since I run into them while trying to fix an issue in pg_stat_statement some >>> time ago. >>> >>> I've started to develop a perl script to generate most of them from headers. >>> It is not done yet, but it looks that it can work in the end with limited >>> effort. Currently it works for copy & equal. > >> It'd have to do out/read as well imo. > > Yeah. A partial solution is pretty much useless. Even with out/read > support as well, I'm not sure it's not useless, because you'd still > have to remember to consider places like expression_tree_walker and > expression_tree_mutator. Those are not really amenable to automatic > generation AFAICS, because they have to understand the actual semantics > of each field. Well, let's not move the goalposts into the outer solar system. There are plenty of changes to node structure that don't require expression_tree_walker or expression_tree_mutator to be touched at all. Expression nodes aren't touched all that frequently compared to path, plan, etc. nodes. IMHO, what would be a lot more useful than something that generates {read,equal,copy,out}funcs.c automatically would be something that just checks them for trivial errors of omission. For example, if you read a list of structure members from the appropriate header files and cross-checked it against the list of structure members mentioned in the body of a copy/equal/read/outfunc, you'd catch probably 80% of the mistakes people make right there. If you could also check for a copy/read/equal/outfunc that ought to have been present but was omitted entirely, that'd probably up it to 90%. The idea would be that if you added a field that wasn't supposed to be copied, you'd have to add something to copyfuncs.c that said, e.g. /* NOTCOPIED: mymember */ ...and the checking script would ignore things so flagged. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Faster Expression Processing v4
Andres Freund writes: > On 2017-03-20 16:06:27 -0400, Tom Lane wrote: >> ... is there a reason why resultnum for EEOP_ASSIGN_* steps is declared >> size_t and not just int? Since it's an array index, and one that >> certainly can't be bigger than AttrNumber, that seems rather confusing. > Not that I can see, no. I guess I might have "overcompensated" when > changing it from AttrNumber - AttrNumber isn't a good idea because that > needs an extra move-zero-extend, because 16bit indexing isn't that well > supported on x86. But that doesn't mean it should be a 64bit number - > to the contrary actually. OK, will fix in the edits I'm working on. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree
Thank you for your suggestions; do not hesitate to ask any questions, as concurrency and GIN are both very interesting topics. I had a look at the patch and found an issue. Look at ginvacuum.c around line 387, function ginVacuumPostingTreeLeaves():

/*
 * All subtree is empty - just return TRUE to indicate that parent must
 * do a cleanup. Unless we are ROOT an there is way to go upper.
 */

if(isChildHasVoid && !isAnyNonempy && !isRoot)
    return TRUE;

if(isChildHasVoid)
{
    ...
    ginScanToDelete(gvs, blkno, TRUE, &root, InvalidOffsetNumber);
}

In the first 'if' clause I see !isRoot, so the second 'if' and the corresponding ginScanToDelete() could be called only for the root of the posting tree. If so, it seems to me it wasn't a good idea to move ginScanToDelete() out of ginVacuumPostingTree(); the patch should just remove the cleanup lock around ginVacuumPostingTreeLeaves(), and if that returns an indication that the tree contains an empty page, then lock the whole posting tree to do the ginScanToDelete() work. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] kNN for SP-GiST
Hi Nikita, On 3/9/17 8:52 AM, Alexander Korotkov wrote: I take a look to this patchset. My first notes are following. This thread has been idle for quite a while. Please respond and/or post a new patch by 2017-03-24 00:00 AoE (UTC-12) or this submission will be marked "Returned with Feedback". Thanks, -- -David da...@pgmasters.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Potential data loss of 2PC files
If that can happen, don't we have the same problem in many other places? Like, all the SLRUs? They don't fsync the directory either. Right, pg_commit_ts and pg_clog fall into this category. Implemented as attached. Is unlink() guaranteed to be durable, without fsyncing the directory? If not, then we need to fsync() the directory even if there are no files in it at the moment, because some might've been removed earlier in the checkpoint cycle. What is the protection if pg crashes after unlink() but before fsync()? Right, it's a rather small window for such a scenario, but isn't it better to have another protection? Like WAL-logging of WAL segment removal... -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
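For reference, the durability pattern under discussion looks roughly like this on POSIX systems. This is a sketch only; PostgreSQL's actual fsync handling lives in fd.c and is more elaborate:

    #include <fcntl.h>
    #include <unistd.h>

    /*
     * An unlink() is not guaranteed to survive a crash until the parent
     * directory's own metadata has reached disk, so fsync the directory
     * after removing the file.
     */
    static int
    durable_unlink_sketch(const char *path, const char *dirpath)
    {
        int         fd;

        if (unlink(path) < 0)
            return -1;

        fd = open(dirpath, O_RDONLY);
        if (fd < 0)
            return -1;
        if (fsync(fd) < 0)
        {
            close(fd);
            return -1;
        }
        return close(fd);
    }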
Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
On Sun, Mar 19, 2017 at 9:03 PM, Peter Geoghegan wrote: > On Sun, Mar 12, 2017 at 3:05 PM, Peter Geoghegan wrote: >> I attach my V9 of the patch. I came up with some stuff for the design of >> resource management that I think meets every design goal that we have >> for shared/unified BufFiles: > > Commit 2609e91fc broke the parallel CREATE INDEX cost model. I should > now pass -1 as the index block argument to compute_parallel_worker(), > just as all callers that aren't parallel index scan do after that > commit. This issue caused V9 to never choose parallel CREATE INDEX > within nbtsort.c. There was also a small amount of bitrot. > > Attached V10 fixes this regression. I also couldn't resist adding a > few new assertions that I thought were worth having to buffile.c, plus > dedicated wait events for parallel tuplesort. And, I fixed a silly bug > added in V9 around where worker_wait() should occur. Some initial review comments: - * This code is moderately slow (~10% slower) compared to the regular - * btree (insertion) build code on sorted or well-clustered data. On - * random data, however, the insertion build code is unusable -- the - * difference on a 60MB heap is a factor of 15 because the random - * probes into the btree thrash the buffer pool. (NOTE: the above - * "10%" estimate is probably obsolete, since it refers to an old and - * not very good external sort implementation that used to exist in - * this module. tuplesort.c is almost certainly faster.) While I agree that the old comment is probably inaccurate, I don't think dropping it without comment in a patch to implement parallel sorting is the way to go. How about updating it to be more current as a separate patch? +/* Magic numbers for parallel state sharing */ +#define PARALLEL_KEY_BTREE_SHARED UINT64CONST(0xA000000000000001) +#define PARALLEL_KEY_TUPLESORT UINT64CONST(0xA000000000000002) +#define PARALLEL_KEY_TUPLESORT_SPOOL2 UINT64CONST(0xA000000000000003) 1, 2, and 3 would probably work just as well. The parallel infrastructure uses high-numbered values to avoid conflict with plan_node_id values, but this is a utility statement so there's no such problem. But it doesn't matter very much. + * Note: caller had better already hold some type of lock on the table and + * index. + */ +int +plan_create_index_workers(Oid tableOid, Oid indexOid) Caller should pass down the Relation rather than the Oid. That is better both because it avoids unnecessary work and because it more or less automatically avoids the problem mentioned in the note. Why is this being put in planner.c rather than something specific to creating indexes? Not sure that's a good idea. + * This should be called when workers have flushed out temp file buffers and + * yielded control to caller's process. Workers should hold open their + * BufFiles at least until the caller's process is able to call here and + * assume ownership of BufFile. The general pattern is that workers make + * available data from their temp files to one nominated process; there is + * no support for workers that want to read back data from their original + * BufFiles following writes performed by the caller, or any other + * synchronization beyond what is implied by caller contract. All + * communication occurs in one direction. All output is made available to + * caller's process exactly once by workers, following call made here at the + * tail end of processing. Thomas has designed a system for sharing files among cooperating processes that lacks several of these restrictions. 
With his system, it's still necessary for all data to be written and flushed by the writer before anybody tries to read it. But the restriction that the worker has to hold its BufFile open until the leader can assume ownership goes away. That's a good thing; it avoids the need for workers to sit around waiting for the leader to assume ownership of a resource instead of going away faster and freeing up worker slots for some other query, or moving on to some other computation. The restriction that the worker can't reread the data after handing off the file also goes away. The files can be read and written by any participant in any order, as many times as you like, with only the restriction that the caller must guarantee that data will be written and flushed from private buffers before it can be read. I don't see any reason to commit both his system and your system, and his is more general so I think you should use it. That would cut hundreds of lines from this patch with no real disadvantage that I can see -- including things like worker_wait(), which are only needed because of the shortcomings of the underlying mechanism. + * run. Parallel workers always use quicksort, however. Comment fails to mention a reason. +elog(LOG, "%d using " INT64_FORMAT " KB of memory for read buffers among %d input tapes", + state->worker, state->availMem / 1024, numInputTapes);
Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)
On 2017-03-21 08:04:11 -0400, Robert Haas wrote: > On Tue, Mar 21, 2017 at 6:56 AM, Amit Kapila wrote: > >> Hmm, that test case isn't all that synthetic. It's just a single > >> column bulk update, which isn't anything all that crazy, and 5-10% > >> isn't nothing. > >> > >> I'm kinda surprised it made that much difference, though. > >> > > > > I think it is because heap_getattr() is not that cheap. We have > > noticed a similar problem during development of the scan key push down > > work [1]. > > Yeah. So what's the deal with this? Is somebody working on figuring > out a different approach that would reduce this overhead? I think one reasonable thing would be to use slots here, and use slot_getsomeattrs(), with a pre-computed offset, for doing the deforming. Given that more than one place has run into the issue with deforming cost via heap_*, that seems like something we're going to have to do. Additionally the patches I had for JITed deforming all integrated at the slot layer, so it'd be a good thing from that angle as well. Deforming all columns at once would also be a boon for the accompanying index_getattr calls. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
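A sketch of that suggestion, assuming backend headers; slot_getsomeattrs() is the existing executor facility, while the wrapper name is an assumption for illustration:

    #include "postgres.h"
    #include "executor/tuptable.h"

    /*
     * Deform once, left to right, up to the highest attribute number needed,
     * then read the slot's arrays directly instead of paying for repeated
     * heap_getattr() calls.
     */
    static Datum
    fetch_attr_via_slot(TupleTableSlot *slot, int attnum, bool *isnull)
    {
        slot_getsomeattrs(slot, attnum);    /* ensures attrs 1..attnum are deformed */

        *isnull = slot->tts_isnull[attnum - 1];
        return slot->tts_values[attnum - 1];
    }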
Re: [HACKERS] Partitioned tables and relfilenode
On 16 March 2017 at 10:03, Amit Langote wrote: > On 2017/03/15 7:09, Robert Haas wrote: >> I think that eliding the Append node when there's only one child may >> be unsafe in the case where the child's attribute numbers are >> different from the parent's attribute numbers. I remember Tom making >> some comment about this when I was working on MergeAppend, although I >> no longer remember the specific details. > > Append node elision does not occur in the one-child case. With the patch: ... > create table q1 partition of q for values in (1); > explain select * from q; > QUERY PLAN > > Append (cost=0.00..35.50 rows=2550 width=4) > -> Seq Scan on q1 (cost=0.00..35.50 rows=2550 width=4) > (2 rows) > > Maybe that should be done, but this patch doesn't implement that. Robert raises the possible problem that removing the Append wouldn't work when the parent and child attribute numbers don't match. Surely that never happens with partitions, by definition? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Add missing support for new node fields
> > IMHO, what would be a lot more useful than something that generates > {read,equal,copy,out}funcs.c automatically would be something that > just checks them for trivial errors of omission. For example, if you > read a list of structure members from the appropriate header files and > cross-checked it against the list of structure members mentioned in > the body of a copy/equal/read/outfunc, you'd catch probably 80% of the > mistakes people make right there. If you could also check for a > copy/read/equal/outfunc that ought to have been present but was > omitted entirely, that'd probably up it to 90%. > > +1 Maintaining the node functions is not too much work, but a smart check like that would be nice. Regards Pavel > The idea would be that if you added a field that wasn't supposed to be > copied, you'd have to add something to copyfuncs.c that said, e.g. > > /* NOTCOPIED: mymember */ > > ...and the checking script would ignore things so flagged. > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers >
Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)
On 2017-03-21 19:49:07 +0530, Pavan Deolasee wrote: > On Tue, Mar 21, 2017 at 6:55 PM, Robert Haas wrote: > > > > > I think that very wide columns and highly indexed tables are not > > particularly unrealistic, nor do I think updating all the rows is > > particularly unrealistic. > > > Ok. But those who update 10M rows in a single transaction, would they > really notice 5-10% variation? Yes. It's very common in ETL, and that's quite performance sensitive. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] segfault in hot standby for hash indexes
On Tue, Mar 21, 2017 at 4:00 AM, Ashutosh Sharma wrote: > Hi Jeff, > > On Tue, Mar 21, 2017 at 1:54 PM, Amit Kapila > wrote: > > On Tue, Mar 21, 2017 at 1:28 PM, Jeff Janes > wrote: > >> Against an unmodified HEAD (17fa3e8), I got a segfault in the hot > standby. > >> > > > > I think I see the problem in hash_xlog_vacuum_get_latestRemovedXid(). > > It seems to me that we are using one block_id for registering > > the deleted items in the xlog record XLOG_HASH_VACUUM_ONE_PAGE and then using > > a different block_id for fetching those items in > > hash_xlog_vacuum_get_latestRemovedXid(). So probably matching those > > will fix this issue (instead of fetching the block number and items from > > block_id 1, we should use block_id 0). > > > > Thanks for reporting this issue. As Amit said, it is happening due to > a block_id mismatch. Attached is the patch that fixes the same. I > apologise for such a silly mistake. Please note that I was not able > to reproduce the issue on my local machine using the test script you > shared. Could you please check with the attached patch if you are > still seeing the issue? Thanks in advance. > Hi Ashutosh, I can confirm that that fixes the seg faults for me. Did you mean you couldn't reproduce the problem in the first place, or that you could reproduce it and now the patch fixes it? If the first of those: I forgot to say that you do have to wait for hot standby to reach consistency and become open for connections, and then connect to the standby ("psql -p 9874"), before the seg fault will be triggered. But, there are places where hash_xlog_vacuum_get_latestRemovedXid diverges from btree_xlog_delete_get_latestRemovedXid, and I don't understand the reason for the divergence. Is there a reason we dropped the PANIC if we have not reached consistency? That is a case which should never happen, but it seems worth preserving the PANIC. And why does this code get 'unused' from XLogRecGetBlockData(record, 0, &len), while the btree code gets it from xlrec? Is that because the record being replayed is structured differently between btree and hash, or is there some other reason? Thanks, Jeff
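In sketch form, the mismatch described above looks like this (abridged, and based on the thread's description rather than the final committed code):

    /* WAL-insert time: the deletable offsets are attached to registered block 0 */
    XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
    XLogRegisterBufData(0, (char *) deletable,
                        ndeletable * sizeof(OffsetNumber));

    /* redo time: so they must be read back from block 0, not block 1 */
    ptr = XLogRecGetBlockData(record, 0, &len);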
Re: [HACKERS] Partitioned tables and relfilenode
On Tue, Mar 21, 2017 at 12:19 PM, Simon Riggs wrote: > On 16 March 2017 at 10:03, Amit Langote wrote: >> On 2017/03/15 7:09, Robert Haas wrote: > >>> I think that eliding the Append node when there's only one child may >>> be unsafe in the case where the child's attribute numbers are >>> different from the parent's attribute numbers. I remember Tom making >>> some comment about this when I was working on MergeAppend, although I >>> no longer remember the specific details. >> >> Append node elision does not occur in the one-child case. With the patch: > ... >> create table q1 partition of q for values in (1); >> explain select * from q; >> QUERY PLAN >> >> Append (cost=0.00..35.50 rows=2550 width=4) >>-> Seq Scan on q1 (cost=0.00..35.50 rows=2550 width=4) >> (2 rows) >> >> Maybe that should be done, but this patch doesn't implement that. > > Robert raises the possible problem that removing the Append wouldn't > work when the parent and child attribute numbers don't match. Surely > that never happens with partitions, by definition? No, the attribute numbers don't have to match. This decision was made a long time ago, and there have been a whole bunch of followup commits since the original partitioning patch that were dedicated to fixing up cases where that wasn't working properly in the original commit. It seems superficially attractive to require the attribute numbers to match, but it creates some really unpleasant cases. For example, suppose a user tries to creates a partitioned table, drops a column, then creates a standalone table which matches the apparent column list of the partitioned table, then tries to attach it as a partition. ERROR: the columns you previously dropped from the parent that you can't see and don't know about aren't the same as the ones dropped from the standalone table you're trying to attach as a partition DETAIL: Try recreating your proposed new partition with a pass-by-value column of width 8 after the third column, and then dropping that column before trying to attach it as a partition. HINT: Abandon all hope, ye who enter here. Not cool with that. The decision not to require the attribute numbers to match doesn't necessarily mean we can't get rid of the Append node, though. First of all, in a lot of practical cases the attribute numbers will all match. Second, if they don't, the most that would be required is a projection step, which could usually be done without a separate node because most nodes are projection-capable. And maybe not even that much is needed; I'd have to go back and look at what Tom was worried about the last time this came up. (Hmm, maybe the problem had to do with varnos matching, rather than attribute numbers?) Another and independent problem with eliding the Append node is that, if we did that, we'd still have to guarantee that the parent relation corresponding to the Append node got locked somehow. Otherwise, we'll be accessing the tuple routing information for a table on which we don't have a lock. That's probably a solvable problem, too, but it hasn't been solved yet. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Do we create a new roadmap page for development?
On Tue, Mar 21, 2017 at 2:36 AM, Tsunakawa, Takayuki wrote: > I'd like to share our roadmap for PostgreSQL development, as other companies > and individuals do in the following page. But this page is for PostgreSQL 10. > > PostgreSQL10 Roadmap > https://wiki.postgresql.org/wiki/PostgreSQL10_Roadmap > > Should I create a page for PostgreSQL 11 likewise? Or, do you want a more > stable page named "PostgreSQL Roadmap" instead of "PostgreSQL11 Roadmap" and > attach the target version to each feature as in "Feature X (V11)", so that we > can represent more longer-term goals? I think a page that's just called PostgreSQL Roadmap will get out of date pretty quickly. Ultimately it'll end up like the TODO list, which is not really a list of things TO DO. The advantage of the version-specific pages is that old information kind of ages its way out of the system. > And, why don't we add a link to the above roadmap page in the "Development > information" page? > > Development information > https://wiki.postgresql.org/wiki/Development_information I'm sure nobody will object to you doing that. It's a wiki, and intended to be edited. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)
On Tue, Mar 21, 2017 at 09:25:49AM -0400, Robert Haas wrote: > On Tue, Mar 21, 2017 at 8:41 AM, Pavan Deolasee > > TBH I see many artificial scenarios here. It will be very useful if he can > > rerun the query with some of these restrictions lifted. I'm all for > > addressing whatever we can, but I am not sure if this test demonstrates a > > real world usage. > > That's a very fair point, but if these patches - or some of them - are > going to get committed then these things need to get discussed. Let's > not just have nothing-nothing-nothing giant unagreed code drop. First, let me say I love this feature for PG 10, along with multi-variate statistics. However, not to be a bummer on this, but the persistent question I have is whether we are locking ourselves into a feature that can only do _one_ index-change per WARM chain before a lazy vacuum is required. Are we ever going to figure out how to do more changes per WARM chain in the future, and is our use of so many bits for this feature going to restrict our ability to do that in the future? I know we have talked about it, but not recently, and if everyone else is fine with it, I am too, but I have to ask these questions. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Automatic cleanup of oldest WAL segments with pg_receivexlog
Hi Michael, On 3/10/17 9:15 AM, Peter Eisentraut wrote: On 3/9/17 17:03, Michael Paquier wrote: Having something like --limit-retained-segments partially addresses it, as long as there is a way to define an automatic mode, based on statvfs() obviously. But that is not portable/usable enough, as we have determined, I think. Have you looked into using inotify for implementing your use case? This thread has been idle for quite a while. Please respond and/or post a new patch by 2017-03-24 00:00 AoE (UTC-12) or this submission will be marked "Returned with Feedback". Thanks, -- -David da...@pgmasters.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)
On Tue, Mar 21, 2017 at 12:49 PM, Bruce Momjian wrote:
> On Tue, Mar 21, 2017 at 09:25:49AM -0400, Robert Haas wrote:
>> On Tue, Mar 21, 2017 at 8:41 AM, Pavan Deolasee
>> > TBH I see many artificial scenarios here. It will be very useful if he can
>> > rerun the query with some of these restrictions lifted. I'm all for
>> > addressing whatever we can, but I am not sure if this test demonstrates a
>> > real world usage.
>>
>> That's a very fair point, but if these patches - or some of them - are
>> going to get committed then these things need to get discussed. Let's
>> not just have nothing-nothing-nothing giant unagreed code drop.
>
> First, let me say I love this feature for PG 10, along with
> multi-variate statistics.
>
> However, not to be a bummer on this, but the persistent question I have
> is whether we are locking ourselves into a feature that can only do
> _one_ index-change per WARM chain before a lazy vacuum is required. Are
> we ever going to figure out how to do more changes per WARM chain in the
> future, and is our use of so many bits for this feature going to
> restrict our ability to do that in the future?
>
> I know we have talked about it, but not recently, and if everyone else
> is fine with it, I am too, but I have to ask these questions.

I think that's a good question. I previously expressed similar concerns. On the one hand, it's hard to ignore the fact that, in the cases where this wins, it already buys us a lot of performance improvement. On the other hand, as you say (and as I said), it eats up a lot of bits, and that limits what we can do in the future. On the one hand, there is a saying that a bird in the hand is worth two in the bush. On the other hand, there is also a saying that one should not paint oneself into the corner.

I'm not sure we've had any really substantive discussion of these issues. Pavan's response to my previous comments was basically "well, I think it's worth it", which is entirely reasonable, because he presumably wouldn't have written the patch that way if he thought it sucked. But it might not be the only opinion.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] make async slave to wait for lsn to be replayed
Hi Ivan,

On 3/12/17 10:20 PM, Thomas Munro wrote:
On Fri, Mar 10, 2017 at 1:49 AM, Ivan Kartyshov wrote:
Here I attached the rebased patch waitlsn_10dev_v3 (core feature). I will leave the choice of implementation (core/contrib) to the discretion of the community. I will be glad to hear your suggestions about the syntax, code and patch.
Hi Ivan, Here is some feedback based on a first read-through of the v4 patch. I haven't tested it yet.

This thread has been idle for over a week. Please respond and/or post a new patch by 2017-03-24 00:00 AoE (UTC-12) or this submission will be marked "Returned with Feedback".

Thanks,
--
-David
da...@pgmasters.net
Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)
On Tue, Mar 21, 2017 at 10:04 AM, Robert Haas wrote:
> I think that's a good question. I previously expressed similar
> concerns. On the one hand, it's hard to ignore the fact that, in the
> cases where this wins, it already buys us a lot of performance
> improvement. On the other hand, as you say (and as I said), it eats
> up a lot of bits, and that limits what we can do in the future. On
> the one hand, there is a saying that a bird in the hand is worth two
> in the bush. On the other hand, there is also a saying that one
> should not paint oneself into the corner.

Are we really saying that there can be no incompatible change to the on-disk representation for the rest of eternity? I can see why that's something to avoid indefinitely, but I wouldn't like to rule it out.

--
Peter Geoghegan
Re: [HACKERS] PassDownLimitBound for ForeignScan/CustomScan [take-2]
Hi,

On 3/13/17 3:25 AM, Jeevan Chalke wrote:
I have reviewed this patch further and here are my comments:

This thread has been idle for over a week. Please respond and/or post a new patch by 2017-03-24 00:00 AoE (UTC-12) or this submission will be marked "Returned with Feedback".

Thanks,
--
-David
da...@pgmasters.net
Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)
On Tue, Mar 21, 2017 at 1:08 PM, Peter Geoghegan wrote:
> On Tue, Mar 21, 2017 at 10:04 AM, Robert Haas wrote:
>> I think that's a good question. I previously expressed similar
>> concerns. On the one hand, it's hard to ignore the fact that, in the
>> cases where this wins, it already buys us a lot of performance
>> improvement. On the other hand, as you say (and as I said), it eats
>> up a lot of bits, and that limits what we can do in the future. On
>> the one hand, there is a saying that a bird in the hand is worth two
>> in the bush. On the other hand, there is also a saying that one
>> should not paint oneself into the corner.
>
> Are we really saying that there can be no incompatible change to the
> on-disk representation for the rest of eternity? I can see why that's
> something to avoid indefinitely, but I wouldn't like to rule it out.

Well, I don't want to rule it out either, but if we do a release to which you can't pg_upgrade, it's going to be really painful for a lot of users. Many users can't realistically upgrade using pg_dump, ever. So they'll be stuck on the release before the one that breaks compatibility for a very long time.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] free space map and visibility map
On Mon, Mar 20, 2017 at 11:28 PM, Robert Haas wrote:
> On Sat, Mar 18, 2017 at 5:42 PM, Jeff Janes wrote:
>> Isn't HEAP2_CLEAN only issued before an intended HOT update? (Which then
>> can't leave the block as all visible or all frozen.) I think the issue
>> here is HEAP2_VISIBLE or HEAP2_FREEZE_PAGE. Am I reading this correctly,
>> that neither of those ever update the FSM, regardless of FPI?
>
> Yes, updates to the FSM are never logged. Forcing replay of
> HEAP2_FREEZE_PAGE to update the FSM might be a good idea.

I think I was missing something. I imagined your situation was that the FPI is replayed during crash recovery after the crashed server had vacuumed the page and marked it as all-frozen. But that situation is also resolved by this solution.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
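For what it's worth, the fix Robert floats would presumably mirror the pattern the HEAP2_CLEAN redo path already uses; a sketch only, with page, buffer, rnode and blkno assumed to come from the surrounding redo function:

/*
 * Sketch: at the end of heap_xlog_freeze_page() redo, report the
 * page's free space so the FSM does not go stale on the standby.
 * XLogRecordPageWithFreeSpace() is the existing recovery-side helper.
 */
Size    freespace = PageGetHeapFreeSpace(page);

UnlockReleaseBuffer(buffer);
XLogRecordPageWithFreeSpace(rnode, blkno, freespace);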
Re: [HACKERS] [PATCH] Generic type subscripting
Hi Dmitry,

On 3/14/17 7:10 PM, Tom Lane wrote:
Dmitry Dolgov <9erthali...@gmail.com> writes:
[ generic_type_subscription_v7.patch ]
I looked through this a bit.

This thread has been idle for over a week. Please respond and/or post a new patch by 2017-03-24 00:00 AoE (UTC-12) or this submission will be marked "Returned with Feedback".

Thanks,
--
-David
da...@pgmasters.net
Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)
On Tue, Mar 21, 2017 at 01:04:14PM -0400, Robert Haas wrote:
> > I know we have talked about it, but not recently, and if everyone else
> > is fine with it, I am too, but I have to ask these questions.
>
> I think that's a good question. I previously expressed similar
> concerns. On the one hand, it's hard to ignore the fact that, in the
> cases where this wins, it already buys us a lot of performance
> improvement. On the other hand, as you say (and as I said), it eats
> up a lot of bits, and that limits what we can do in the future. On
> the one hand, there is a saying that a bird in the hand is worth two
> in the bush. On the other hand, there is also a saying that one
> should not paint oneself into the corner.
>
> I'm not sure we've had any really substantive discussion of these
> issues. Pavan's response to my previous comments was basically "well,
> I think it's worth it", which is entirely reasonable, because he
> presumably wouldn't have written the patch that way if he thought it
> sucked. But it might not be the only opinion.

Early in the discussion we talked about allowing multiple changes per WARM chain if they all changed the same index and were in the same direction so there were no duplicates, but it was complicated. There was also discussion about checking the index during INSERT/UPDATE to see if there was a duplicate. However, those ideas never led to further discussion.

I know the current patch yields good results, but only on a narrow test case, so I am not ready to just stop asking questions based on the opinion of the author or test results alone. If someone came to me and said, "We have thought about allowing more than one index change per WARM chain, and if we can ever do it, it will probably be done this way, and we have the bits for it," I would be more comfortable.

One interesting side-issue is that indirect indexes have a similar problem with duplicate index entries, and there is no plan on how to fix that either. I guess I just don't feel we have explored the duplicate-index-entry problem enough for me to be comfortable.

--
Bruce Momjian  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                     Ancient Roman grave inscription +
Re: [HACKERS] make async slave to wait for lsn to be replayed
On Sun, Mar 12, 2017 at 10:20 PM, Thomas Munro wrote:
> Maybe someone can think of a clever way for an extension to insert a
> wait for a user-supplied LSN *before* acquiring a snapshot so it can
> work for the higher levels, or maybe the hooks should go into core
> PostgreSQL so that the extension can exist as an external project not
> requiring a patched PostgreSQL installation, or maybe this should be
> done with new core syntax that extends transaction commands. Do other
> people have views on this?

IMHO, trying to do this using a function-based interface is a really bad idea for exactly the reasons you mention. I don't see why we'd resist the idea of core syntax here; transactions are a core part of PostgreSQL. There is, of course, the question of whether making LSNs such a user-visible thing is a good idea in the first place, but that's a separate question from the issue of what syntax for such a thing is best.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
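To make the hook alternative concrete, it would presumably amount to something like the following. This is entirely hypothetical: no such hook exists in core, and every name here is invented for illustration.

/* hypothetical hook an extension could set from its _PG_init() */
typedef void (*pre_snapshot_wait_hook_type) (void);
extern PGDLLIMPORT pre_snapshot_wait_hook_type pre_snapshot_wait_hook;

/* hypothetical call site in snapmgr.c, just before a snapshot is taken */
if (pre_snapshot_wait_hook != NULL)
    (*pre_snapshot_wait_hook) ();

The awkward part is feeding the user-supplied target LSN to the hook, which is much the same awkwardness as the function-based interface has, and part of why dedicated syntax looks cleaner.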
Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)
On Tue, Mar 21, 2017 at 01:14:00PM -0400, Robert Haas wrote:
> On Tue, Mar 21, 2017 at 1:08 PM, Peter Geoghegan wrote:
> > On Tue, Mar 21, 2017 at 10:04 AM, Robert Haas wrote:
> >> I think that's a good question. I previously expressed similar
> >> concerns. On the one hand, it's hard to ignore the fact that, in the
> >> cases where this wins, it already buys us a lot of performance
> >> improvement. On the other hand, as you say (and as I said), it eats
> >> up a lot of bits, and that limits what we can do in the future. On
> >> the one hand, there is a saying that a bird in the hand is worth two
> >> in the bush. On the other hand, there is also a saying that one
> >> should not paint oneself into the corner.
> >
> > Are we really saying that there can be no incompatible change to the
> > on-disk representation for the rest of eternity? I can see why that's
> > something to avoid indefinitely, but I wouldn't like to rule it out.
>
> Well, I don't want to rule it out either, but if we do a release to
> which you can't pg_upgrade, it's going to be really painful for a lot
> of users. Many users can't realistically upgrade using pg_dump, ever.
> So they'll be stuck on the release before the one that breaks
> compatibility for a very long time.

Right. If we weren't setting tuple and tid bits we could improve it easily in PG 11, but if we use them for a single-change WARM chain for PG 10, we might need bits that are not available to improve it later.

--
Bruce Momjian  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                     Ancient Roman grave inscription +
Re: [HACKERS] Partitioned tables and relfilenode
On 21 March 2017 at 16:33, Robert Haas wrote:
> On Tue, Mar 21, 2017 at 12:19 PM, Simon Riggs wrote:
>> On 16 March 2017 at 10:03, Amit Langote wrote:
>>> On 2017/03/15 7:09, Robert Haas wrote:
>>>> I think that eliding the Append node when there's only one child may be
>>>> unsafe in the case where the child's attribute numbers are different from
>>>> the parent's attribute numbers. I remember Tom making some comment about
>>>> this when I was working on MergeAppend, although I no longer remember the
>>>> specific details.
>>>
>>> Append node elision does not occur in the one-child case. With the patch:
>>>
>>> create table q1 partition of q for values in (1);
>>> explain select * from q;
>>>                        QUERY PLAN
>>> ----------------------------------------------------------
>>> Append (cost=0.00..35.50 rows=2550 width=4)
>>>   -> Seq Scan on q1 (cost=0.00..35.50 rows=2550 width=4)
>>> (2 rows)
>>>
>>> Maybe that should be done, but this patch doesn't implement that.
>>
>> Robert raises the possible problem that removing the Append wouldn't
>> work when the parent and child attribute numbers don't match. Surely
>> that never happens with partitions, by definition?
>
> No, the attribute numbers don't have to match. This decision was made
> a long time ago, and there have been a whole bunch of followup commits
> since the original partitioning patch that were dedicated to fixing up
> cases where that wasn't working properly in the original commit. It
> seems superficially attractive to require the attribute numbers to
> match, but it creates some really unpleasant cases. For example,
> suppose a user tries to create a partitioned table, drops a column,
> then creates a standalone table which matches the apparent column list
> of the partitioned table, then tries to attach it as a partition.
>
> ERROR: the columns you previously dropped from the parent that you
> can't see and don't know about aren't the same as the ones dropped
> from the standalone table you're trying to attach as a partition
> DETAIL: Try recreating your proposed new partition with a
> pass-by-value column of width 8 after the third column, and then
> dropping that column before trying to attach it as a partition.
> HINT: Abandon all hope, ye who enter here.
>
> Not cool with that.

Thanks for the explanation.

> The decision not to require the attribute numbers to match doesn't
> necessarily mean we can't get rid of the Append node, though. First
> of all, in a lot of practical cases the attribute numbers will all
> match. Second, if they don't, the most that would be required is a
> projection step, which could usually be done without a separate node
> because most nodes are projection-capable. And maybe not even that
> much is needed; I'd have to go back and look at what Tom was worried
> about the last time this came up. (Hmm, maybe the problem had to do
> with varnos matching, rather than attribute numbers?)

There used to be some code there to fix them up; I'm not sure where that went.

> Another and independent problem with eliding the Append node is that,
> if we did that, we'd still have to guarantee that the parent relation
> corresponding to the Append node got locked somehow. Otherwise, we'll
> be accessing the tuple routing information for a table on which we
> don't have a lock. That's probably a solvable problem, too, but it
> hasn't been solved yet.

Hmm, why would we need to access tuple routing information?
--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] GUC for cleanup indexes threshold.
On Wed, Mar 15, 2017 at 4:50 PM, Amit Kapila wrote:
> On Thu, Mar 9, 2017 at 10:21 PM, Masahiko Sawada
> wrote:
>> On Wed, Mar 8, 2017 at 1:43 AM, Peter Geoghegan wrote:
>>> On Sat, Mar 4, 2017 at 1:30 AM, Amit Kapila wrote:
>>>> > While I can't see this explained anywhere, I'm
>>>> > pretty sure that that's supposed to be impossible, which this patch
>>>> > changes.
>>>>
>>>> What makes you think that patch will allow pg_class.relfrozenxid to be
>>>> advanced past opaque->btpo.xact which was previously not possible?
>>>
>>> By not reliably recycling pages in a timely manner, we won't change
>>> anything about the dead page. It just sticks around. This is mostly
>>> fine, but we still need VACUUM to be able to reason about it (to
>>> determine if it is in fact recyclable), which is what I'm concerned
>>> about breaking here. It still needs to be *possible* to recycle any
>>> recyclable page at some point (e.g., when we find it convenient).
>>>
>>> pg_class.relfrozenxid is InvalidTransactionId for indexes because
>>> indexes generally don't store XIDs. This is the one exception that I'm
>>> aware of, presumably justified by the fact that it's only for
>>> recyclable pages anyway, and those are currently *guaranteed* to get
>>> recycled quickly. In particular, they're guaranteed to get recycled by
>>> the next VACUUM. They may be recycled in the course of anti-wraparound
>>> VACUUM, even if VACUUM has no garbage tuples to kill (even if we only
>>> do lazy_cleanup_index() instead of lazy_vacuum_index()). This is the
>>> case that this patch proposes to have us skip touching indexes for.
>>
>> To prevent this, I think we need to not skip lazy_cleanup_index during
>> an anti-wraparound vacuum (aggressive = true), even if the number of
>> scanned pages is less than the threshold. This can ensure that
>> pg_class.relfrozenxid is not advanced past opaque->btpo.xact with
>> minimal pain. Even if the btree dead page is left behind, subsequent
>> modifications create garbage on the heap, and autovacuum then recycles
>> that page during index vacuuming (lazy_vacuum_index).
>
> What about if somebody does manual vacuum and there are no garbage
> tuples to clean, won't in that case also you want to avoid skipping
> the lazy_cleanup_index?

Yes, in that case lazy_cleanup_index will be skipped.

> Another option could be to skip updating the
> relfrozenxid if we have skipped the index cleanup.

This could make anti-wraparound VACUUM occur at high frequency, and then we could not skip lazy_cleanup_index during aggressive vacuums after all.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
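Spelled out, the proposal upthread amounts to a condition along these lines in the vacuum path. This is a sketch only; the variable and threshold names are illustrative, not taken from the patch:

/*
 * Sketch: always do index cleanup for aggressive (anti-wraparound)
 * vacuums so the btpo.xact reasoning stays safe; otherwise skip it
 * unless enough of the heap was scanned to make cleanup worthwhile.
 */
if (aggressive ||
    vacrelstats->scanned_pages >= cleanup_index_threshold * nblocks)
    lazy_cleanup_index(indrel, indstats, vacrelstats);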
Re: [HACKERS] GUC for cleanup indexes threshold.
Hi,

On 3/15/17 9:50 AM, Amit Kapila wrote:
What about if somebody does manual vacuum and there are no garbage tuples to clean, won't in that case also you want to avoid skipping the lazy_cleanup_index? Another option could be to skip updating the relfrozenxid if we have skipped the index cleanup.

This thread has been idle for six days. Please respond and/or post a new patch by 2017-03-24 00:00 AoE (UTC-12) or this submission will be marked "Returned with Feedback".

Thanks,
--
-David
da...@pgmasters.net
Re: [HACKERS] Measuring replay lag
Hi Thomas,

On 3/15/17 8:38 PM, Simon Riggs wrote:
On 16 March 2017 at 08:02, Thomas Munro wrote:
I agree that these states exist, but we disagree on what 'lag' really means, or, rather, which of several plausible definitions would be the most useful here. My proposal is that the *_lag columns should always report how long it took for recently written, flushed and applied WAL to be written, flushed and applied (and for the primary to know about it). By this definition, sent LSN = applied LSN is not a special case: we simply report how long that LSN took to be written, flushed and applied. Your proposal is that the *_lag columns should report how far in the past the standby is at each of the three stages with respect to the current end of WAL. By this definition, when sent LSN = applied LSN we are currently in the 'A' state meaning 'caught up' and should show 00:00:00.

I accept your proposal for how we handle these, on condition that you write up some docs that explain the subtle difference between the two, so we can just show people the URL. That needs to explain clearly, in an impartial way, the difference between "what is the most recent lag measurement" and "how long until we are caught up" as possible interpretations of these values. Thanks.

This thread has been idle for six days. Please respond and/or post a new patch by 2017-03-24 00:00 AoE (UTC-12) or this submission will be marked "Returned with Feedback".

Thanks,
--
-David
da...@pgmasters.net
Re: [HACKERS] PATCH: recursive json_populate_record()
On 3/16/17 11:54 AM, David Steele wrote:
On 2/1/17 12:53 AM, Michael Paquier wrote:
On Thu, Jan 26, 2017 at 6:49 AM, Tom Lane wrote:
Nikita Glukhov writes:
[ generic_type_subscription_v7.patch ]
I looked through this a bit.

On 25.01.2017 23:58, Tom Lane wrote:
I think you need to take a second look at the code you're producing and realize that it's not so clean either. This extract from populate_record_field, for example, is pretty horrid:

But what if we introduce some helper macros like this:

#define JsValueIsNull(jsv) \
    ((jsv)->is_json ? !(jsv)->val.json.str \
        : !(jsv)->val.jsonb || (jsv)->val.jsonb->type == jbvNull)

#define JsValueIsString(jsv) \
    ((jsv)->is_json ? (jsv)->val.json.type == JSON_TOKEN_STRING \
        : (jsv)->val.jsonb && (jsv)->val.jsonb->type == jbvString)

Yeah, I was wondering about that too. I'm not sure that you can make a reasonable set of helper macros that will fix this, but if you want to try, go for it.

BTW, just as a stylistic thing, I find "a?b:c||d" unreadable: I have to go back to the manual every darn time to convince myself whether that means (a?b:c)||d or a?b:(c||d). It's better not to rely on the reader (... or the author) having memorized C's precedence rules in quite that much detail. Extra parens help.

Moved to CF 2017-03 as discussion is going on and more review is needed on the last set of patches.

The current patches do not apply cleanly at cccbdde:

$ git apply ../other/0001-introduce-JsonContainerXxx-macros-v04.patch
error: patch failed: src/backend/utils/adt/jsonb_util.c:328
error: src/backend/utils/adt/jsonb_util.c: patch does not apply
error: patch failed: src/backend/utils/adt/jsonfuncs.c:1266
error: src/backend/utils/adt/jsonfuncs.c: patch does not apply
error: patch failed: src/include/utils/jsonb.h:218
error: src/include/utils/jsonb.h: patch does not apply

In addition, it appears a number of suggestions have been made by Tom that warrant new patches. Marked "Waiting on Author".

This thread has been idle for months since Tom's review. The submission has been marked "Returned with Feedback". Please feel free to resubmit to a future commitfest.

Thanks,
--
-David
da...@pgmasters.net
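On the precedence point: the conditional operator binds looser than ||, so the unparenthesized form already means the second reading below; a minimal example, with x, a, b, c, d standing in for any expressions:

/* these two statements are equivalent in C: ?: has lower precedence than || */
x = a ? b : c || d;
x = a ? b : (c || d);

The parens cost nothing and settle the question for the reader, which is exactly Tom's point.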
[HACKERS] Re: [PATCH] guc-ify the formerly hard-coded MAX_SEND_SIZE to max_wal_send
On 3/16/17 11:56 AM, David Steele wrote:
My recommendation is that we mark this patch "Returned with Feedback" to allow you time to test and refine the patch. You can resubmit once it is ready.

This submission has been marked "Returned with Feedback". Please feel free to resubmit to a future commitfest.

Thanks,
--
-David
da...@pgmasters.net
Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)
On Mon, Mar 20, 2017 at 1:38 AM, Craig Ringer wrote:
> On 14 March 2017 at 19:57, Robert Haas wrote:
>> On Mon, Mar 13, 2017 at 10:22 PM, Craig Ringer wrote:
>>> I'll introduce a new LWLock, ClogTruncationLock, which will be held
>>> from when we advance the new clogOldestXid field through to when clog
>>> truncation completes.
>>
>> +1.
>
> OK, here's the revised patch. Relevant comments added where needed.
>
> It still changes the xlog record for clog truncation to carry the new
> xid, but I've left advancing oldestXid until the checkpoint as is
> currently the case, and only advanced oldestClogXid when we replay
> clog truncation.

/me smacks forehead. Actually, it should be CLogTruncationLock, with a capital L, for consistency with CLogControlLock.

The new lock needs to be added to the table in monitoring.sgml.

I don't think the new header comments in TransactionIdDidCommit and TransactionIdDidAbort are super-clear. I'm not sure you're going to be able to explain it there in a reasonable number of words, but I think that speaking of "testing against oldestClogXid" will leave people wondering what exactly that means. Maybe just write "caller is responsible for ensuring that the clog records covering the XID being looked up can't be truncated away while the lookup is in progress", and then leave the bit about CLogTruncationLock to be explained by the callers that do that. Or you could drop these comments entirely.

Overall, though, I think that 0001 looks far better than any previous iteration. It's simple. It looks safe. It seems unlikely to break anything that works now. Woo hoo!

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
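For concreteness, the caller pattern being described presumably looks something like this. A sketch only, assuming CLogTruncationLock ends up defined alongside CLogControlLock as discussed; the status handling is simplified relative to the actual txid_status() patch:

/* hold off clog truncation while we look up an arbitrary XID */
LWLockAcquire(CLogTruncationLock, LW_SHARED);
if (TransactionIdDidCommit(xid))
    status = "committed";
else if (TransactionIdDidAbort(xid))
    status = "aborted";
else
    status = "in progress";
LWLockRelease(CLogTruncationLock);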
Re: [HACKERS] logical replication apply to run with sync commit off by default
On Mon, Mar 20, 2017 at 7:56 PM, Petr Jelinek wrote:
> On 18/03/17 13:31, Petr Jelinek wrote:
>> On 07/03/17 06:23, Petr Jelinek wrote:
>>> there has been discussion at the logical replication initial copy thread
>>> [1] about making apply work with sync commit off by default for
>>> performance reasons and adding option to change that per subscription.
>>>
>>> Here I propose patch to implement this - it adds boolean column
>>> subssynccommit to pg_subscription catalog which decides if
>>> synchronous_commit should be off or local for apply. And it adds
>>> SYNCHRONOUS_COMMIT = boolean to the list of WITH options for CREATE and
>>> ALTER SUBSCRIPTION. When nothing is specified it will set it to false.
>>>
>>> The patch is built on top of copy patch currently as there are conflicts
>>> between the two and this helps a bit with testing of copy patch.
>>>
>>> [1] https://www.postgresql.org/message-id/CA+TgmoY7Lk2YKArcp4O=Qu=xoor8j71mad1oteojawmuje3...@mail.gmail.com
>>
>> I rebased this patch against recent changes and the latest version of
>> copy patch.
>
> And another rebase after pg_dump tests commit.

+else if (strcmp(defel->defname, "nosynchronous_commit") == 0 && synchronous_commit)
+{
+    if (synchronous_commit_given)
+        ereport(ERROR,
+                (errcode(ERRCODE_SYNTAX_ERROR),
+                 errmsg("conflicting or redundant options")));
+
+    synchronous_commit_given = true;
+    *synchronous_commit = !defGetBoolean(defel);
+}

Uh, what's this nosynchronous_commit thing?

+ local otherwise to false. The
+ default value is false independently of the default
+ synchronous_commit setting for the instance.

This phrasing isn't very clear or accurate, IMHO. I'd say something like "The value of this parameter overrides the synchronous_commit setting. The default value is false." And I'd make the word synchronous_commit in that sentence a link to the GUC, so that it's absolutely unmistakable what we mean by "the synchronous_commit setting".

 /*
+ * We need to make new connection to new slot if slot name has changed
+ * so exit here as well if that's the case.
+ */
+if (strcmp(newsub->slotname, MySubscription->slotname) != 0)
+{
+    ereport(LOG,
+            (errmsg("logical replication worker for subscription \"%s\" will "
+                    "restart because the replication slot name was changed",
+                    MySubscription->name)));
+
+    walrcv_disconnect(wrconn);
+    proc_exit(0);
+}

Looks unrelated.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
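For comparison, the conventional spelling of a boolean option in this parser style would be roughly the following; a sketch only, since whether the "no" prefix above was intentional is exactly the open question:

else if (strcmp(defel->defname, "synchronous_commit") == 0 && synchronous_commit)
{
    if (synchronous_commit_given)
        ereport(ERROR,
                (errcode(ERRCODE_SYNTAX_ERROR),
                 errmsg("conflicting or redundant options")));

    synchronous_commit_given = true;
    /* no negation: the option value is stored as the user gave it */
    *synchronous_commit = defGetBoolean(defel);
}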
Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree
Hi, Teodor!

2017-03-21 20:32 GMT+05:00 Teodor Sigaev :
> I had a look on patch

That's great, thanks!

>
> /*
>  * All subtree is empty - just return TRUE to indicate that parent must
>  * do a cleanup. Unless we are ROOT an there is way to go upper.
>  */
>
> if(isChildHasVoid && !isAnyNonempy && !isRoot)
>     return TRUE;
>
> if(isChildHasVoid)
> {
>     ...
>     ginScanToDelete(gvs, blkno, TRUE, &root, InvalidOffsetNumber);
> }
>
> In first 'if' clause I see !isRoot, so second if and corresponding
> ginScanToDelete() could be called only for root in posting tree.

No, the second conditional block will be called for any subtree that contains a totally empty subtree. The !isRoot check covers the case when the entire posting tree should be erased: we cannot just quit out of the recursive cleanup; we have to make a scan here, starting from the root.

The variable isChildHasVoid probably has a somewhat confusing name. This flag indicates that some subtree:
1. Had empty pages
2. Did not bother deleting them, because there is a chance that it is part of a bigger empty subtree.
Maybe it'd be better to call the variable "someChildIsVoidSubtree".

> just remove lock for cleanup over ginVacuumPostingTreeLeaves() and if it
> returns that tree contains empty page then lock the whole posting tree to do
> ginScanToDelete() work.

It is, indeed, a viable approach. But the currently proposed patch is capable of dealing with small page deletions without much locking fuss, even in 4-5 level trees. Which way do you think we should take?

Best regards, Andrey Borodin.
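For clarity, the alternative Teodor describes amounts to roughly this control flow. A sketch with simplified signatures; hasEmptyPages is a hypothetical name for the flag ginVacuumPostingTreeLeaves() would return:

/* first pass: clean leaf pages under ordinary page-level locks */
hasEmptyPages = ginVacuumPostingTreeLeaves(gvs, rootBlkno, TRUE, &rootBuffer);

if (hasEmptyPages)
{
    /* escalate: lock out all concurrent scans of this posting tree */
    LockBufferForCleanup(rootBuffer);
    ginScanToDelete(gvs, rootBlkno, TRUE, &root, InvalidOffsetNumber);
    UnlockReleaseBuffer(rootBuffer);
}

The trade-off is that the whole-tree cleanup lock makes even small page deletions block concurrent scans, which is what the proposed patch tries to avoid.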