Re: [HACKERS] extend pgbench expressions with functions
v40 is yet another rebase.

-- Fabien.

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index c6d1454..4ceddae 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -815,9 +815,10 @@ pgbench options dbname

-      Sets variable varname to an integer value calculated
+      Sets variable varname to a value calculated
       from expression.
       The expression may contain integer constants such as 5432,
+      double constants such as 3.14159,
       references to variables :variablename,
       unary operators (+, -) and binary operators (+, -, *, /, %)

@@ -830,7 +831,7 @@ pgbench options dbname
       Examples:
 \set ntellers 10 * :scale
-\set aid (1021 * :aid) % (10 * :scale) + 1
+\set aid (1021 * random(1, 10 * :scale)) % (10 * :scale) + 1

@@ -850,66 +851,35 @@ pgbench options dbname

-      By default, or when uniform is specified, all values in the
-      range are drawn with equal probability.  Specifying gaussian
-      or exponential options modifies this behavior; each
-      requires a mandatory parameter which determines the precise shape
-      of the distribution.
+      \setrandom n 1 10 or \setrandom n 1 10 uniform
+      is equivalent to \set n random(1, 10) and uses a uniform
+      distribution.

-      For a Gaussian distribution, the interval is mapped onto a standard
-      normal distribution (the classical bell-shaped Gaussian curve)
-      truncated at -parameter on the left and +parameter on the right.
-      Values in the middle of the interval are more likely to be drawn.
-      To be precise, if PHI(x) is the cumulative distribution function of
-      the standard normal distribution, with mean mu defined as
-      (max + min) / 2.0, with
-
-f(x) = PHI(2.0 * parameter * (x - mu) / (max - min + 1)) /
-       (2.0 * PHI(parameter) - 1.0)
-
-      then value i between min and max inclusive is drawn with
-      probability: f(i + 0.5) - f(i - 0.5).
-      Intuitively, the larger parameter, the more frequently values close
-      to the middle of the interval are drawn, and the less frequently
-      values close to the min and max bounds.  About 67% of values are
-      drawn from the middle 1.0 / parameter, that is a relative
-      0.5 / parameter around the mean, and 95% in the middle
-      2.0 / parameter, that is a relative 1.0 / parameter around the
-      mean; for instance, if parameter is 4.0, 67% of values are drawn
-      from the middle quarter (1.0 / 4.0) of the interval (i.e. from
-      3.0 / 8.0 to 5.0 / 8.0) and 95% from the middle half (2.0 / 4.0)
-      of the interval (second and third quartiles).  The minimum
-      parameter is 2.0 for performance of the Box-Muller transform.
+\setrandom n 1 10 exponential 3.0 is equivalent to
+\set n random_exponential(1, 10, 3.0) and uses an
+exponential distribution.

-      For an exponential distribution, parameter controls the
-      distribution by truncating a quickly-decreasing exponential
-      distribution at parameter, and then projecting onto integers
-      between the bounds.  To be precise, with
-
-f(x) = exp(-parameter * (x - min) / (max - min + 1)) /
-       (1.0 - exp(-parameter))
-
-      Then value i between min and max inclusive is drawn with
-      probability: f(x) - f(x + 1).
-      Intuitively, the larger parameter, the more frequently values close
-      to min are accessed, and the less frequently values close to max
-      are accessed.  The closer to 0 parameter, the flatter (more
-      uniform) the access distribution.  A crude approximation of the
-      distribution is that the most frequent 1% values in the range,
-      close to min, are drawn parameter% of the time.  parameter value
-      must be strictly positive.
+\setrandom n 1 10 gaussian 2.0 is equivalent to
+\set n random_gaussian(1, 10, 2.0), and uses a gaussian
+distribution.

+      See the documentation of these functions below for further
+      information about the precise shape of these distributions,
+      depending on the value of the parameter.

@@ -990,34 +960,6 @@ f(x) = exp(-parameter * (x - min) / (max - min + 1)) / (1.0 - exp(-parameter))

-      As an example, the full definition of the built-in TPC-B-like
-      transaction is:
-
-\set nbranches :scale
-\set ntellers 10 * :scale
-\set naccounts 10 * :scale
-\setrandom aid 1 :naccounts
-\setrandom bid 1 :nbranches
-\setrandom tid 1 :ntellers
-\setrandom delta -5000 5000
-BEGIN;
-UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
-SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
-UPDATE pgbench_tellers SET tbalance = tbal
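For readers following the hunk above: draws matching the two distributions being documented can be made with standard techniques. Here is a compact C sketch (a sketch only: uniform_double() is a hypothetical [0,1) source standing in for pg_erand48(); this is not pgbench's actual code):

/* hypothetical uniform source in [0,1); pgbench itself uses pg_erand48() */
#include <math.h>

extern double uniform_double(void);

/* truncated exponential over [min,max], per the f(x) in the removed text */
static long
sample_exponential(long min, long max, double parameter)
{
	/* inverse CDF of an exponential density truncated at "parameter" */
	double		cut = exp(-parameter);
	double		u = uniform_double();	/* in [0, 1) */
	double		x = -log(1.0 - u * (1.0 - cut)) / parameter;	/* in [0, 1) */

	return min + (long) ((max - min + 1) * x);
}

/* truncated gaussian over [min,max]: Box-Muller, rejecting |z| outside the cut */
static long
sample_gaussian(long min, long max, double parameter)
{
	double		z;

	do
	{
		double		u1 = 1.0 - uniform_double();	/* in (0, 1], avoids log(0) */
		double		u2 = uniform_double();

		z = sqrt(-2.0 * log(u1)) * cos(2.0 * M_PI * u2);
	} while (z < -parameter || z >= parameter);

	/* map [-parameter, parameter) onto [min, max] */
	return min + (long) ((max - min + 1) * (z + parameter) / (2.0 * parameter));
}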
Re: [HACKERS] Two division by 0 errors in optimizer/plan/planner.c and optimizer/path/costsize.c
On 2016-03-26 19:29, Piotr Stefaniak wrote:
> I'm not saying this is necessarily a bug since the whole function deals
> with floats, but perhaps it's interesting to note that ndistinct can be
> 0 in src/backend/utils/adt/selfuncs.c:estimate_hash_bucketsize:

On the exact same note, something like this (again reduced from what
sqlsmith produced):

select 1
from unnest('{}'::boolean[]) a (x)
left join (
    select *
    from unnest('{}'::boolean[])
    where false
) b (x) on a.x = b.x;

leads to vardata.rel->tuples being zero here:

	if (vardata.rel)
		ndistinct *= vardata.rel->rows / vardata.rel->tuples;

Back-trace attached.

#0  estimate_hash_bucketsize (root=0xf3cf98, hashkey=0xf44398, nbuckets=1024) at src/backend/utils/adt/selfuncs.c:3543
#1  0x007141a4 in final_cost_hashjoin (root=0xf3cf98, path=0xf47118, workspace=0x7fffd950, sjinfo=0xf45460, semifactors=0x7fffdae8) at src/backend/optimizer/path/costsize.c:2856
#2  0x0075c134 in create_hashjoin_path (root=0xf3cf98, joinrel=0xf46438, jointype=JOIN_LEFT, workspace=0x7fffd950, sjinfo=0xf45460, semifactors=0x7fffdae8, outer_path=0xf461a0, inner_path=0xf45a98, restrict_clauses=0xf466f0, required_outer=0x0, hashclauses=0xf471d8) at src/backend/optimizer/util/pathnode.c:2133
#3  0x0072074a in try_hashjoin_path (root=0xf3cf98, joinrel=0xf46438, outer_path=0xf461a0, inner_path=0xf45a98, hashclauses=0xf471d8, jointype=JOIN_LEFT, extra=0x7fffdad0) at src/backend/optimizer/path/joinpath.c:523
#4  0x007219e2 in hash_inner_and_outer (root=0xf3cf98, joinrel=0xf46438, outerrel=0xf44cd0, innerrel=0xf44fa8, jointype=JOIN_LEFT, extra=0x7fffdad0) at src/backend/optimizer/path/joinpath.c:1403
#5  0x00720058 in add_paths_to_joinrel (root=0xf3cf98, joinrel=0xf46438, outerrel=0xf44cd0, innerrel=0xf44fa8, jointype=JOIN_LEFT, sjinfo=0xf45460, restrictlist=0xf466f0) at src/backend/optimizer/path/joinpath.c:211
#6  0x00722e38 in make_join_rel (root=0xf3cf98, rel1=0xf44cd0, rel2=0xf44fa8) at src/backend/optimizer/path/joinrels.c:771
#7  0x007222dd in make_rels_by_clause_joins (root=0xf3cf98, old_rel=0xf44cd0, other_rels=0xf463b8) at src/backend/optimizer/path/joinrels.c:274
#8  0x00721f86 in join_search_one_level (root=0xf3cf98, level=2) at src/backend/optimizer/path/joinrels.c:96
#9  0x0070dc4d in standard_join_search (root=0xf3cf98, levels_needed=2, initial_rels=0xf46380) at src/backend/optimizer/path/allpaths.c:2124
#10 0x0070dbc6 in make_rel_from_joinlist (root=0xf3cf98, joinlist=0xf452c8) at src/backend/optimizer/path/allpaths.c:2055
#11 0x0070b304 in make_one_rel (root=0xf3cf98, joinlist=0xf452c8) at src/backend/optimizer/path/allpaths.c:175
#12 0x007352be in query_planner (root=0xf3cf98, tlist=0xf440b8, qp_callback=0x739ec7 , qp_extra=0x7fffde10) at src/backend/optimizer/plan/planmain.c:246
#13 0x00737d89 in grouping_planner (root=0xf3cf98, inheritance_update=0 '\000', tuple_fraction=0) at src/backend/optimizer/plan/planner.c:1673
#14 0x00736535 in subquery_planner (glob=0xf3ad88, parse=0xf3a458, parent_root=0x0, hasRecursion=0 '\000', tuple_fraction=0) at src/backend/optimizer/plan/planner.c:758
#15 0x00735688 in standard_planner (parse=0xf3a458, cursorOptions=256, boundParams=0x0) at src/backend/optimizer/plan/planner.c:307
#16 0x007353ca in planner (parse=0xf3a458, cursorOptions=256, boundParams=0x0) at src/backend/optimizer/plan/planner.c:177
#17 0x00800d28 in pg_plan_query (querytree=0xf3a458, cursorOptions=256, boundParams=0x0) at src/backend/tcop/postgres.c:798
#18 0x00800ddb in pg_plan_queries (querytrees=0xf3cf60, cursorOptions=256, boundParams=0x0) at src/backend/tcop/postgres.c:857
#19 0x00801080 in exec_simple_query (query_string=0xf077a8 "select 1\nfrom unnest('{}'::boolean[]) a (x)\nleft join (\n    select *\n    from unnest('{}'::boolean[])\n    where false\n) b (x) on a.x = b.x;") at src/backend/tcop/postgres.c:1022
#20 0x00805342 in PostgresMain (argc=1, argv=0xe958d0, dbname=0xe95730 "postgres", username=0xe95710 "me") at src/backend/tcop/postgres.c:4059
#21 0x0077ed31 in BackendRun (port=0xeb2950) at src/backend/postmaster/postmaster.c:4258
#22 0x0077e495 in BackendStartup (port=0xeb2950) at src/backend/postmaster/postmaster.c:3932
#23 0x0077ac19 in ServerLoop () at src/backend/postmaster/postmaster.c:1690
#24 0x0077a24e in PostmasterMain (argc=3, argv=0xe94850) at src/backend/postmaster/postmaster.c:1298
#25 0x006c6216 in main (argc=3, argv=0xe94850) at src/backend/main/main.c:228

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
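A minimal defensive guard for the spot identified above would be to skip the adjustment when the tuple estimate is zero (a sketch; the committed fix may well look different):

/* in estimate_hash_bucketsize(): avoid a 0/0 NaN when the relation is
 * estimated to contain no tuples at all */
if (vardata.rel && vardata.rel->tuples > 0)
    ndistinct *= vardata.rel->rows / vardata.rel->tuples;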
Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics
On Sat, Mar 26, 2016 at 1:26 AM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> Thank you very much for testing!
> I also got access to a 4 x 18 Intel server with 144 threads. I'm going to
> post results of tests on this server next Monday.

I've run pgbench tests on this machine: pgbench -s 1000 -c $clients -j 100
-M prepared -T 300. See results in the table and chart.

clients  master   v3       v5
1        11671    12507    12679
2        24650    26005    25010
4        49631    48863    49811
8        96790    96441    99946
10       121275   119928   124100
20       243066   243365   246432
30       359616   342241   357310
40       431375   415310   441619
50       489991   489896   500590
60       538057   636473   554069
70       588659   714426   738535
80       405008   923039   902632
90       295443   1181247  1155918
100      258695   1323125  1325019
110      238842   1393767  1410274
120      226018   1432504  1474982
130      215102   1465459  1503241
140      206415   1470454  1505380
150      197850   1475479  1519908
160      190935   1420915  1484868
170      185835   1438965  1453128
180      182519   1416252  1453098

My conclusions are the following:
1) We don't observe any regression in v5 in comparison to master.
2) v5 in most cases slightly outperforms v3.

I'm going to do some code cleanup of v5 on Monday.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Relation extension scalability
On Sat, Mar 26, 2016 at 3:18 PM, Dilip Kumar wrote:

>> We could go further still and have GetPageWithFreeSpace() always
>> search the last, say, two pages of the FSM in all cases. But that
>> might be expensive. The extra call to RelationGetNumberOfBlocks seems
>> cheap enough here because the alternative is to wait for a contended
>> heavyweight lock.
>
> I will try the test with this also and post the results.

** Something went wrong with the last mail -- it seems to have become a
separate thread -- so resending the same mail. **

I have changed v14 as per this suggestion and the results are the same as
v14. I have again measured the relation size, this time directly using the
size function, so the results are easier to understand.

Relation Size
-------------
INSERT: 16000 transactions from 32 clients

          Base   v13    v14_1
          ----   ----   -----
TPS       37     255    112
Rel Size  17GB   17GB   18GB

COPY: 32000 transactions from 32 clients

          Base   v13    v14_1
          ----   ----   -----
TPS       121    823    427
Rel Size  11GB   11GB   11GB

Scripts are attached to the mail:
=================================
test_size_ins.sh  --> run the insert test and calculate the relation size.
test_size_copy.sh --> run the copy test and calculate the relation size.
copy_script       --> copy pgbench script used by test_size_copy.sh
insert_script     --> insert pgbench script used by test_size_ins.sh
multi_extend_v14_poc_v1.patch --> modified version of the v14 patch.

I also tried modifying v14 from different angles.

One is like below -->
---------------------
In AddExtraBlock
{
   I add pages to the FSM one by one, like v13 does,
   then update the full FSM tree up to the root.
}

Results:
--------
1. With this, performance is a little less than v14, but the problem of
   the extra relation size is solved.
2. From this we can conclude that the extra relation size in v14 arises
   because, while extending, some pages are not immediately visible in the
   FSM, and at the end some of the pages are left unused.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

copy_script Description: Binary data
insert_script Description: Binary data
multi_extend_v14_poc_v1.patch Description: Binary data
test_size_copy.sh Description: Bourne shell script
test_size_ins.sh Description: Bourne shell script

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
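For readers following along, here is a minimal C sketch of the "one page at a time, then fix the tree" variant described in the mail above, using the existing FSM API (the helper's name and shape are hypothetical; the real patch's AddExtraBlocks does more):

#include "postgres.h"
#include "storage/freespace.h"

/* hypothetical sketch of the variant Dilip describes */
static void
AddExtraBlocksOneByOne(Relation rel, BlockNumber firstBlock, int nblocks,
					   Size freespace)
{
	int			i;

	/* advertise each new page in the FSM leaf, like v13 does */
	for (i = 0; i < nblocks; i++)
		RecordPageWithFreeSpace(rel, firstBlock + i, freespace);

	/*
	 * RecordPageWithFreeSpace only touches the FSM leaf level; one vacuum
	 * pass propagates the new values up to the root so that concurrent
	 * backends can find the pages through a normal top-down search.
	 */
	FreeSpaceMapVacuum(rel);
}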
Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics
On 2016-03-25 23:02:11 +0530, Dilip Kumar wrote:
> On Fri, Mar 25, 2016 at 8:09 PM, Alexander Korotkov <
> a.korot...@postgrespro.ru> wrote:
>
> > Could anybody run benchmarks? Feature freeze is soon, but it would be
> > *very nice* to fit it into the 9.6 release cycle, because it greatly
> > improves scalability on large machines. Without this patch PostgreSQL
> > 9.6 will be significantly behind competitors like MySQL 5.7.
>
> I have run the performance tests and here are the results. With the
> latest patch I did not see any regression at lower client counts (median
> of 3 readings).
>
> scale factor 1000, shared buffers 8GB, read-only
>
> Client   Base     patch
> 1        12957    13068
> 2        24931    25816
> 4        46311    48767
> 32       300921   310062
> 64       387623   493843
> 128      249635   583513
>
> scale factor 300, shared buffers 8GB, read-only
>
> Client   Base     patch
> 1        14537    14586   --> this number looks a little low; generally
>                               I get ~18000 (will recheck).
> 2        34703    33929   --> may be run-to-run variance (once I get
>                               time, will recheck)
> 4        67744    69069
> 32       312575   336012
> 64       213312   539056
> 128      190139   380122
>
> Summary:
>
> Actually, with 64 clients we have seen ~470,000 TPS with head also, by
> reverting commit 6150a1b0.
> Refer to this thread: (
> http://www.postgresql.org/message-id/caa4ek1+zeb8pmwwktf+3brs0pt4ux6rs6aom0uip8c6shjw...@mail.gmail.com
> )
>
> I haven't tested this patch after reverting commit 6150a1b0, so I'm not
> sure whether this patch could give even better performance.
>
> It also points to the case Andres has mentioned in this thread.
> http://www.postgresql.org/message-id/20160226191158.3vidtk3ktcmhi...@alap3.anarazel.de

On what hardware did you run these tests?

Thanks,

Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics
On 2016-03-27 12:38:25 +0300, Alexander Korotkov wrote:
> On Sat, Mar 26, 2016 at 1:26 AM, Alexander Korotkov <
> a.korot...@postgrespro.ru> wrote:
>
> > Thank you very much for testing!
> > I also got access to a 4 x 18 Intel server with 144 threads. I'm going
> > to post results of tests on this server next Monday.
>
> I've run pgbench tests on this machine: pgbench -s 1000 -c $clients
> -j 100 -M prepared -T 300.
> See results in the table and chart.
>
> clients  master   v3       v5
> 1        11671    12507    12679
> 2        24650    26005    25010
> 4        49631    48863    49811
> 8        96790    96441    99946
> 10       121275   119928   124100
> 20       243066   243365   246432
> 30       359616   342241   357310
> 40       431375   415310   441619
> 50       489991   489896   500590
> 60       538057   636473   554069
> 70       588659   714426   738535
> 80       405008   923039   902632
> 90       295443   1181247  1155918
> 100      258695   1323125  1325019
> 110      238842   1393767  1410274
> 120      226018   1432504  1474982
> 130      215102   1465459  1503241
> 140      206415   1470454  1505380
> 150      197850   1475479  1519908
> 160      190935   1420915  1484868
> 170      185835   1438965  1453128
> 180      182519   1416252  1453098
>
> My conclusions are the following:
> 1) We don't observe any regression in v5 in comparison to master.
> 2) v5 in most cases slightly outperforms v3.

What commit did you base these tests on? I guess something recent, after
98a64d0bd?

> I'm going to do some code cleanup of v5 on Monday

Ok, I'll try to do a review and possibly commit after that.

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics
On Sun, Mar 27, 2016 at 5:37 PM, Andres Freund wrote:

> On what hardware did you run these tests?

IBM POWER 8 MACHINE.

Architecture:          ppc64le
Byte Order:            Little Endian
CPU(s):                192
Thread(s) per core:    8
Core(s) per socket:    1
Socket(s):             24
NUMA node(s):          4

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Performance degradation in commit 6150a1b0
Hi,

On 2016-03-27 02:34:32 +0530, Ashutosh Sharma wrote:
> As mentioned in my earlier mail I was not able to apply
> pinunpin-cas-5.patch on commit 6150a1b0,

That's not surprising; that's pretty old.

> therefore I thought of applying it on the latest commit and I was
> able to do it successfully. I have now taken the performance readings
> at the latest commit, i.e. 76281aa9, with and without applying
> pinunpin-cas-5.patch, and my observations are as follows:
>
> 1. I can still see that the current performance lags by 2-3% from the
> expected performance when pinunpin-cas-5.patch is applied on commit
> 76281aa9.
> 2. When pinunpin-cas-5.patch is ignored and performance is measured at
> commit 76281aa9, the overall performance lags by 50-60% from the
> expected performance.
>
> Note: Here, the expected performance is the performance observed before
> commit 6150a1b0 when ac1d794 is reverted.

Thanks for doing these benchmarks. What's the performance if you revert
6150a1b0 on top of a recent master? There've been a lot of other patches
influencing performance since 6150a1b0, so minor performance differences
aren't necessarily meaningful; especially when that older version then
had other patches reverted.

Thanks,

Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics
On 2016-03-27 17:45:52 +0530, Dilip Kumar wrote:
> On Sun, Mar 27, 2016 at 5:37 PM, Andres Freund wrote:
>
> > On what hardware did you run these tests?
>
> IBM POWER 8 MACHINE.
>
> Architecture:          ppc64le
> Byte Order:            Little Endian
> CPU(s):                192
> Thread(s) per core:    8
> Core(s) per socket:    1
> Socket(s):             24
> NUMA node(s):          4

What's sizeof(BufferDesc) after applying these patches? It had better be
<= 64...

Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
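One cheap way to keep that cache-line invariant from regressing silently would be a compile-time check (a sketch; it assumes the assertion lives somewhere like InitBufferPool() in buf_init.c):

/* fail the build if a buffer descriptor outgrows a typical cache line */
StaticAssertStmt(sizeof(BufferDesc) <= 64,
				 "BufferDesc exceeds one cache line");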
Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index
>>> I'll try to explain with two-dimensional example over points. ASCII-art:
>>
>> Thank you for the explanation. Should we incorporate this with the patch?
>
> added

I have worked on the comments of the patch. It is attached. I hope it
looks clearer than it did before.

>>> + cmp_double(const double a, const double b)
>>
>> Is this function necessary? We can just compare the doubles.
>
> Yes, it compares with limited precision, as is done by the geometry
> operations. Renamed to match the actual arguments.

I meant that we could use the FP macros directly instead of this function.
Doing so is also more correct. I haven't tried to produce the problem, but
this function is not the same as using the macros directly. For
differences smaller than the epsilon, it can return different results. I
have removed it in the attached version.

>>> + boxPointerToRangeBox(BOX *box, RangeBox * rectangle)
>>
>> The "rectangle" variable in here should be renamed.
>
> fixed

I found a bunch of those too and renamed them for clarity. I have also
reordered the arguments of the helper functions to keep them consistent.

> evalRangeBox() is used to initialize fields of RangeBox in evalRectBox().
> If evalRangeBox() will palloc its result then we need to copy its result
> into the RangeBox struct and free it. Let both functions use the same
> interface.

I found it nicer to copy and edit the existing value than creating it in
two steps like this. It is also attached.

> it works until the allthesame branch contains only one inner node.
> Otherwise traversalValue will be freed several times, see spgWalk().

It just works when traversalValues is not set. It is also attached.

I have also added the missing regression tests. While doing that I noticed
that some operators are missing and also added support for them. It is
also attached with the updated version of my test script.

I hope I haven't changed the patch too much. I have also pushed the
changes here:

https://github.com/hasegeli/postgres/commits/box-spgist

q4d-5.patch Description: Binary data
box-index-test.sql Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
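For context, the FP macros being referred to live in src/include/utils/geo_decls.h and compare with a fixed epsilon, which is why a hand-rolled three-way comparator can disagree with them for sub-epsilon differences. They look roughly like this (abridged from the real header):

/* abridged from src/include/utils/geo_decls.h */
#define EPSILON			1.0E-06

#define FPzero(A)		(fabs(A) <= EPSILON)
#define FPeq(A, B)		(fabs((A) - (B)) <= EPSILON)
#define FPlt(A, B)		((B) - (A) > EPSILON)
#define FPgt(A, B)		((A) - (B) > EPSILON)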
Re: [HACKERS] Alter or rename enum value
On 03/27/2016 12:43 AM, Christophe Pettus wrote:

> On Mar 26, 2016, at 7:40 AM, Andrew Dunstan wrote:
>> It would be nice if we could find a less broad brush approach to
>> dealing with the issue.
>
> I don't know how doable this is, but could we use the existing mechanism
> of marking an index invalid if it contains an enum type to which a value
> was added, and the transaction was rolled back? For the 90% use case,
> that would be acceptable, I would expect.

The more I think about this the more I bump up against the fact that
almost anything we might want to do to ameliorate the situation is going
to be rolled back. The only approach I can think of that doesn't suffer
from this is to abort if an insert/update will affect an index on a
modified enum, i.e. we prevent the possible corruption from happening in
the first place, as we do now, but in a much more fine-grained way.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics
On Sun, Mar 27, 2016 at 3:10 PM, Andres Freund wrote:

> On 2016-03-27 12:38:25 +0300, Alexander Korotkov wrote:
> > On Sat, Mar 26, 2016 at 1:26 AM, Alexander Korotkov <
> > a.korot...@postgrespro.ru> wrote:
> >
> > > Thank you very much for testing!
> > > I also got access to a 4 x 18 Intel server with 144 threads. I'm
> > > going to post results of tests on this server next Monday.
> >
> > I've run pgbench tests on this machine: pgbench -s 1000 -c $clients
> > -j 100 -M prepared -T 300.
> > See results in the table and chart.
> >
> > clients  master   v3       v5
> > 1        11671    12507    12679
> > 2        24650    26005    25010
> > 4        49631    48863    49811
> > 8        96790    96441    99946
> > 10       121275   119928   124100
> > 20       243066   243365   246432
> > 30       359616   342241   357310
> > 40       431375   415310   441619
> > 50       489991   489896   500590
> > 60       538057   636473   554069
> > 70       588659   714426   738535
> > 80       405008   923039   902632
> > 90       295443   1181247  1155918
> > 100      258695   1323125  1325019
> > 110      238842   1393767  1410274
> > 120      226018   1432504  1474982
> > 130      215102   1465459  1503241
> > 140      206415   1470454  1505380
> > 150      197850   1475479  1519908
> > 160      190935   1420915  1484868
> > 170      185835   1438965  1453128
> > 180      182519   1416252  1453098
> >
> > My conclusions are the following:
> > 1) We don't observe any regression in v5 in comparison to master.
> > 2) v5 in most cases slightly outperforms v3.
>
> What commit did you base these tests on? I guess something recent, after
> 98a64d0bd?

Yes, more recent than 98a64d0bd. It was based on 676265eb7b.

> > I'm going to do some code cleanup of v5 on Monday
>
> Ok, I'll try to do a review and possibly commit after that.

Sounds good.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] Alter or rename enum value
Andrew Dunstan writes: > The more I think about this the more I bump up against the fact that > almost anything we do might want to do to ameliorate the situation is > going to be rolled back. The only approach I can think of that doesn't > suffer from this is to abort if an insert/update will affect an index on > a modified enum. i.e. we prevent the possible corruption from happening > in the first place, as we do now, but in a much more fine grained way. Perhaps, instead of forbidding ALTER ENUM ADD in a transaction, we could allow that, but not allow the new value to be *used* until it's committed? This could be checked cheaply during enum value lookup (ie, is xmin of the pg_enum row committed). What you really need is to prevent the new value from being inserted into any indexes, but checking that directly seems far more difficult, ugly, and expensive than the above. I do not know whether this would be a meaningful improvement for common use-cases, though. (It'd help if people were more specific about the use-cases they need to work.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
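A sketch of what that lookup-time check might look like (hypothetical function, placement, and error-code choice; the real thing would need care about where in the enum-lookup path it runs):

#include "postgres.h"
#include "access/htup_details.h"
#include "access/transam.h"
#include "storage/procarray.h"

/*
 * Hypothetical guard, run whenever an enum value's pg_enum row is looked
 * up: reject values whose creating transaction has not committed yet.
 */
static void
check_enum_value_committed(HeapTuple enum_tup)
{
	TransactionId xmin = HeapTupleHeaderGetXmin(enum_tup->t_data);

	/* rows created by an already-committed transaction are always safe */
	if (!TransactionIdIsInProgress(xmin) && TransactionIdDidCommit(xmin))
		return;

	ereport(ERROR,
			(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
			 errmsg("unsafe use of new value of enum type"),
			 errhint("New enum values must be committed before they can be used.")));
}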
Re: [HACKERS] AssertArg failure in src/backend/executor/functions.c:check_sql_fn_retval()
Piotr Stefaniak writes: > using sqlsmith I found a way to induce an AssertArg failure in > src/backend/executor/functions.c:check_sql_fn_retval() for > assert-enabled builds. It boils down to creating a function and calling > it like this: > CREATE FUNCTION bad_argument_assert(anyarray, integer) RETURNS anyarray > LANGUAGE sql AS $$select $1$$; > SELECT bad_argument_assert(CAST(NULL AS ANYARRAY), 0); Hm. I would argue that it should have rejected CAST(NULL AS ANYARRAY). That's a pseudotype and so there should never be an actual value of that type, not even a null value. The Assert is positing that the type resolution system took care of mapping some actual array type into ANYARRAY, but here the parser didn't notice that it had anything to resolve, because it had an exact match of input data type for the function. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GinPageIs* don't actually return a boolean
On 2016-03-18 14:36:23 +0300, Yury Zhuravlev wrote:
> Robert Haas wrote:
> >On Wed, Mar 2, 2016 at 9:48 PM, Peter Eisentraut wrote:
> >>On 2/11/16 9:30 PM, Michael Paquier wrote: ...
> >
> >We need to decide what to do about this. I disagree with Peter: I
> >think that regardless of stdbool, what we've got right now is sloppy
> >coding - bad style if nothing else. Furthermore, I think that while C
> >lets you use any non-zero value to represent true, our bool type is
> >supposed to contain only one of those two values. Therefore, I think
> >we should commit the full patch, back-patch it as far as somebody has
> >the energy for, and move on. But regardless, this patch can't keep
> >sitting in the CommitFest - we either have to take it or reject it,
> >and soon.
>
> I know that we are trying to do the right thing. But right now there is
> an error only in ginStepRight. Maybe for now we fix this place, and
> think about "bool" later? The patch is attached (small and simple).
>
> Thanks.
>
> --
> Yury Zhuravlev
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company

> diff --git a/src/backend/access/gin/ginbtree.c b/src/backend/access/gin/ginbtree.c
> index 06ba9cb..30113d0 100644
> --- a/src/backend/access/gin/ginbtree.c
> +++ b/src/backend/access/gin/ginbtree.c
> @@ -162,8 +162,8 @@ ginStepRight(Buffer buffer, Relation index, int lockmode)
>  {
>  	Buffer		nextbuffer;
>  	Page		page = BufferGetPage(buffer);
> -	bool		isLeaf = GinPageIsLeaf(page);
> -	bool		isData = GinPageIsData(page);
> +	uint8		isLeaf = GinPageIsLeaf(page);
> +	uint8		isData = GinPageIsData(page);
>  	BlockNumber blkno = GinPageGetOpaque(page)->rightlink;
>
>  	nextbuffer = ReadBuffer(index, blkno);

I've pushed the gin specific stuff (although I fixed the macros instead
of the callsites) to all branches.

I plan to commit the larger patch (which has grown since last posting
it) after the minor releases; it's somewhat large and has a lot of
conflicts. This way at least the confirmed issue is fixed in the next
set of minor releases.

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
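For reference, fixing it at the macro level (the route mentioned above) means forcing the flag test to a proper 0/1 result at the source, rather than patching every call site; schematically (a sketch, not the exact committed text):

/* before: the bitwise AND yields the raw flag bit, not a 0/1 value */
#define GinPageIsLeaf(page)    ( GinPageGetOpaque(page)->flags & GIN_LEAF )

/* after: coerce to a genuine boolean where the value is produced */
#define GinPageIsLeaf(page)    ( (GinPageGetOpaque(page)->flags & GIN_LEAF) != 0 )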
Re: [HACKERS] Alter or rename enum value
On Mar 27, 2016, at 7:20 AM, Tom Lane wrote: > I do not know whether this would be a meaningful improvement for > common use-cases, though. It would certainly be a step forward over the current situation. It would mean that a specific imaginable use-case (inserting a new enum value, then populating a dimension table for it) would have to be done as two migrations rather than one, but that is much more doable in most tools than having a migration run without a transaction at all. -- -- Christophe Pettus x...@thebuild.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [PATCH] Alter or rename enum value
ilm...@ilmari.org (Dagfinn Ilmari Mannsåker) writes:

> I was bored and thought "how hard could it be?", and a few hours'
> hacking later, I have something that seems to work. It doesn't do IF
> NOT EXISTS yet, and the error messaging could do with some improvement,
> and there are no docs. The patch is attached, as well as at
> https://github.com/ilmari/postgres/commit/enum-alter-value

Here's v3 of the patch, which I consider ready for proper review.
It now features:

- IF (NOT) EXISTS support
- Transaction support
- Documentation
- Improved error reporting (renaming a non-existent value to an existing
  one complains about the former, not the latter)

From 0fee3bdc2e1f4022344e050969b993c963889f55 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?=
Date: Thu, 24 Mar 2016 17:50:58 +
Subject: [PATCH] =?UTF-8?q?Add=20ALTER=20TYPE=20=E2=80=A6=20ALTER=20VALUE?=
 =?UTF-8?q?=20=E2=80=A6=20TO=20=E2=80=A6=20for=20enums?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Unlike adding values to an enum, altering an existing value doesn't
affect the OID values or ordering, so can be done in a transaction.

IF EXISTS and IF NOT EXISTS are supported for ignoring the non-existence
of the old value or the existance of the new one, respectively.
---
 doc/src/sgml/ref/alter_type.sgml   |  24 +++-
 src/backend/catalog/pg_enum.c      | 117 +
 src/backend/commands/typecmds.c    |  51 +---
 src/backend/parser/gram.y          |  18 ++
 src/include/catalog/pg_enum.h      |   3 +
 src/include/nodes/parsenodes.h     |   2 +
 src/test/regress/expected/enum.out |  63
 src/test/regress/sql/enum.sql      |  30 ++
 8 files changed, 284 insertions(+), 24 deletions(-)

diff --git a/doc/src/sgml/ref/alter_type.sgml b/doc/src/sgml/ref/alter_type.sgml
index 9789881..f6b4e66 100644
--- a/doc/src/sgml/ref/alter_type.sgml
+++ b/doc/src/sgml/ref/alter_type.sgml
@@ -29,6 +29,7 @@ ALTER TYPE name RENAME ATTRIBUTE name RENAME TO new_name
 ALTER TYPE name SET SCHEMA new_schema
 ALTER TYPE name ADD VALUE [ IF NOT EXISTS ] new_enum_value [ { BEFORE | AFTER } existing_enum_value ]
+ALTER TYPE name ALTER VALUE [ IF EXISTS ] existing_enum_value TO new_enum_value [ IF NOT EXISTS ]

 where action is one of:

@@ -124,6 +125,23 @@ ALTER TYPE name ADD VALUE [ IF NOT

+ALTER VALUE [ IF EXISTST ] TO [ IF NOT EXISTS ]
+
+      This form renames a value in an enum type. The value's place in the
+      enum's ordering is not affected.
+
+      If IF EXISTS or is IF NOT EXISTS
+      is specified, it is not an error if the type doesn't contain the old
+      value or already contains the new value, respectively: a notice is
+      issued but no other action is taken. Otherwise, an error will occur
+      if the old value is not alreeady present or the new value is.
+
 CASCADE

@@ -241,7 +259,8 @@ ALTER TYPE name ADD VALUE [ IF NOT
 new_enum_value

-    The new value to be added to an enum type's list of values.
+    The new value to be added to an enum type's list of values,
+    or to rename an existing one to.
     Like all enum literals, it needs to be quoted.

@@ -252,7 +271,8 @@
     The existing enum value that the new value should be added immediately
-    before or after in the enum type's sort ordering.
+    before or after in the enum type's sort ordering,
+    or renamed from.
     Like all enum literals, it needs to be quoted.

diff --git a/src/backend/catalog/pg_enum.c b/src/backend/catalog/pg_enum.c
index af89daa..2e78294 100644
--- a/src/backend/catalog/pg_enum.c
+++ b/src/backend/catalog/pg_enum.c
@@ -464,6 +464,123 @@ restart:

 /*
+ * RenameEnumLabel
+ *		Add a new label to the enum set. By default it goes at
+ *		the end, but the user can choose to place it before or
+ *		after any existing set member.
+ */
+void
+RenameEnumLabel(Oid enumTypeOid,
+				const char *oldVal,
+				const char *newVal,
+				bool skipIfNotExists,
+				bool skipIfExists)
+{
+	Relation	pg_enum;
+	HeapTuple	old_tup = NULL;
+	HeapTuple	enum_tup;
+	CatCList   *list;
+	int			nelems;
+	int			nbr_index;
+	Form_pg_enum nbr_en;
+	bool		found_new = false;
+
+	/* check length of new label is ok */
+	if (strlen(newVal) > (NAMEDATALEN - 1))
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_NAME),
+				 errmsg("invalid enum label \"%s\"", newVal),
+				 errdetail("Labels must be %d characters or less.",
+						   NAMEDATALEN - 1)));
+
+	/*
+	 * Acquire a lock on the enum type, which we won't release until commit.
+	 * This ensures that two backends aren't concurrently modifying the same
+	 * enum type. Without that, we couldn't be sure to get a consistent view
+	 * of the enum members v
Re: [HACKERS] [PATCH] Alter or rename enum value
On 2016-03-27 19:30, Dagfinn Ilmari Mannsåker wrote:
> ilm...@ilmari.org (Dagfinn Ilmari Mannsåker) writes:
>
>> I was bored and thought "how hard could it be?", and a few hours'
>> hacking later, I have something that seems to work. It doesn't do IF
>> NOT EXISTS yet, and the error messaging could do with some improvement,
>> and there are no docs. The patch is attached, as well as at
>> https://github.com/ilmari/postgres/commit/enum-alter-value
>
> Here's v3 of the patch, which I consider ready for proper review.

A couple of trivial comments below.

  +ALTER VALUE [ IF EXISTST ] TO [ IF NOT EXISTS ]

typo EXISTST

  +    If IF EXISTS or is IF NOT EXISTS
  +    is specified, it is not an error if the type doesn't contain the old

double is

  +    if the old value is not alreeady present or the new value is.

typo alreeady

  + * RenameEnumLabel
  + *		Add a new label to the enum set. By default it goes at

copypaste-o?

  + if (!stmt->oldVal) {

not project curly brace style

.m

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Alter or rename enum value
Marko Tiikkaja writes: > On 2016-03-27 19:30, Dagfinn Ilmari Mannsåker wrote: >> ilm...@ilmari.org (Dagfinn Ilmari Mannsåker) writes: >> >>> I was bored and thought "how hard could it be?", and a few hours' >>> hacking later, I have something that seems to work. It doesn't do IF >>> NOT EXISTS yet, and the error messaging could do with some improvement, >>> and there are no docs. The patch is attached, as well as at >>> https://github.com/ilmari/postgres/commit/enum-alter-value >> >> Here's v3 of the patch of the patch, which I consider ready for proper >> review. > > A couple of trivial comments below. Thanks, all fixed locally and will be in the next version of the patch. -- "The surreality of the universe tends towards a maximum" -- Skud's Law "Never formulate a law or axiom that you're not prepared to live with the consequences of." -- Skud's Meta-Law -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] AssertArg failure in src/backend/executor/functions.c:check_sql_fn_retval()
On 2016-03-27 16:40, Tom Lane wrote: Hm. I would argue that it should have rejected CAST(NULL AS ANYARRAY). That's a pseudotype and so there should never be an actual value of that type, not even a null value. I'm a little confused about what you mean here. I thought reject was exactly what's happening; normally you'd get "ERROR: return type anyarray is not supported for SQL functions". If you mean specifically to forbid CAST(NULL AS ANYARRAY) in general then I'd like to point out that there are columns of type anyarray, at least pg_catalog.pg_statistic.stavalues1 is, so the cast is not the only way to trigger this. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Relation extension scalability
On Sun, Mar 27, 2016 at 8:00 AM, Dilip Kumar wrote:
> Relation Size
> -------------
> INSERT: 16000 transactions from 32 clients
>
>           Base   v13    v14_1
>           ----   ----   -----
> TPS       37     255    112
> Rel Size  17GB   17GB   18GB
>
> COPY: 32000 transactions from 32 clients
>
>           Base   v13    v14_1
>           ----   ----   -----
> TPS       121    823    427
> Rel Size  11GB   11GB   11GB
>
> Scripts are attached to the mail:
> =================================
> test_size_ins.sh  --> run the insert test and calculate the relation size.
> test_size_copy.sh --> run the copy test and calculate the relation size.
> copy_script       --> copy pgbench script used by test_size_copy.sh
> insert_script     --> insert pgbench script used by test_size_ins.sh
> multi_extend_v14_poc_v1.patch --> modified version of the v14 patch.
>
> I also tried modifying v14 from different angles.
>
> One is like below -->
> ---------------------
> In AddExtraBlock
> {
>    I add pages to the FSM one by one, like v13 does,
>    then update the full FSM tree up to the root.
> }

Not following this. Did you attach this version?

> Results:
> --------
> 1. With this, performance is a little less than v14, but the problem of
>    the extra relation size is solved.
> 2. From this we can conclude that the extra relation size in v14 arises
>    because, while extending, some pages are not immediately visible in
>    the FSM, and at the end some of the pages are left unused.

I agree with that conclusion. I'm not quite sure where that leaves us,
though. We can go back to v13, but why isn't that producing extra pages?
It seems like it should: whenever a bulk extend rolls over to a new FSM
page, concurrent backends will search either the old or the new one but
not both.

Maybe we could do this - not sure if it's what you were suggesting above:

1. Add the pages one at a time, and do RecordPageWithFreeSpace after each
one.
2. After inserting them all, go back and update the upper levels of the
FSM tree up to the root.

Another idea is:

If ConditionalLockRelationForExtension fails to get the lock immediately,
search the last *two* pages of the FSM for a free page.

Just brainstorming here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
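A sketch of the second idea above, in the rough shape of hio.c's page-selection loop (GetPageWithFreeSpaceFromLastN is hypothetical: a variant that scans only the last n FSM heap pages instead of doing a full top-down search):

if (!ConditionalLockRelationForExtension(relation, ExclusiveLock))
{
	/*
	 * Someone else is extending the relation.  Before sleeping on the
	 * lock, re-probe the tail of the FSM: the concurrent extender may
	 * already have advertised fresh pages there.
	 */
	targetBlock = GetPageWithFreeSpaceFromLastN(relation, len, 2);
	if (targetBlock != InvalidBlockNumber)
		goto loop;				/* retry the normal page-selection path */

	/* nothing there yet; wait our turn to extend the relation */
	LockRelationForExtension(relation, ExclusiveLock);
}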
[HACKERS] Automatically add -Wold-style-definition?
Hi,

Does somebody see a reason not to automatically detect and use
-Wold-style-definition? Per e.g.
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=9f7c527af308dcdaba2f0ff9d362d672e8886fb1
that'd be useful; it's an easy-to-make and automatically detectable
mistake, and it doesn't have false positives.

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
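For anyone unfamiliar with the warning: it flags function definitions whose parameter list doesn't form a prototype. A minimal illustration (the two forms are shown side by side for contrast; function names are made up):

/*
 * Old-style (K&R) definition: empty parens declare an *unspecified*
 * parameter list, so callers aren't type-checked.  -Wold-style-definition
 * fires here.
 */
static void
do_cleanup_old()
{
}

/* Prototype-style definition: what the warning steers you toward. */
static void
do_cleanup_new(void)
{
}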
[HACKERS] Sync timezone code with upstream release tzcode2016c
Well, that was just about as tedious as I feared it might be, but
attached is a patch for $SUBJECT. We should apply this, and probably
eventually back-patch it, but it'd be wise to let it age awhile in HEAD
first. Is anyone interested in reviewing it, or shall I just push it and
see what the buildfarm thinks?

A note about comparing this to upstream: I found that the best way to do
that was to run the IANA source files through a sed filter like this:

sed -r \
    -e 's/^([ ]*)\*\*([ ])/\1 *\2/' \
    -e 's/^([ ]*)\*\*$/\1 */' \
    -e 's|^\*/| */|' \
    -e 's/register //g' \
    -e 's/int_fast32_t/int32/g' \
    -e 's/int_fast64_t/int64/g' \
    -e 's/struct tm\b/struct pg_tm/g' \
    -e 's/\btime_t\b/pg_time_t/g' \

and then pgindent them. (If you pgindent without this, it'll make a hash
of their preferred block-comment format with double **'s. As long as I
had to do that, I figured I could make the filter deal with substituting
typedef names and getting rid of their overenthusiasm for "register".)

regards, tom lane

sync-timezone-code-2016c.patch.gz Description: sync-timezone-code-2016c.patch.gz

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
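To illustrate what the first two sed expressions are for: IANA's tzcode sources use double-asterisk block comments, which the filter rewrites into the project's style before pgindent sees them. Roughly:

/*
** An IANA-style block comment, as found in the upstream tzcode
** sources; pgindent makes a hash of this form if left alone.
*/

/*
 * The same comment after the sed filter, now in PostgreSQL's
 * conventional block-comment style, which pgindent preserves.
 */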
Re: [HACKERS] Automatically add -Wold-style-definition?
Andres Freund writes: > Does somebody see a reason not to automatically detect and use > -Wold-style-definition? +1 ... unless we have some that are that way intentionally, which I kinda doubt, but you could soon find out. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Automatically add -Wold-style-definition?
On 2016-03-27 17:16:11 -0400, Tom Lane wrote: > Andres Freund writes: > > Does somebody see a reason not to automatically detect and use > > -Wold-style-definition? > > +1 ... unless we have some that are that way intentionally, which > I kinda doubt, but you could soon find out. We don't, I've been running with it enabled for a long while (which is how I found issue pointed out upthread). Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_rewind just doesn't fsync *anything*?
On 2016-03-09 19:43:52 -0800, Andres Freund wrote:
> Hi,
>
> how come the only comment in pg_rewind about fsyncing is
>
> void
> close_target_file(void)
> {
> ...
> 	/* fsync? */
> }
>
> Isn't that a bit, uh, minimal for a utility that's likely to be used in
> failover scenarios?
>
> I think we might actually be "saved" due to
> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=2ce439f33
> because pg_rewind appears to leave the cluster in
>
> 	ControlFile_new.state = DB_IN_ARCHIVE_RECOVERY;
> 	updateControlFile(&ControlFile_new);
>
> a state that StartupXLOG will treat as needing recovery:
>
> 	if (ControlFile->state != DB_SHUTDOWNED &&
> 		ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
> 		SyncDataDirectory();
>
> but that code went in after pg_rewind, so this certainly can't be an
> intentional save.
>
> I also don't think it's OK that you need to start the cluster to make it
> safe against a crash.
>
> I guess the easiest fix would be to shell out to initdb -S?

I've pushed a modified version of the fix that Michael posted in
http://archives.postgresql.org/message-id/CAB7nPqRmM%2BCX6bVxw0Y7mMVGMFj1S8kwhevt8TaP83yeFRfbXA%40mail.gmail.com

Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Two division by 0 errors in optimizer/plan/planner.c and optimizer/path/costsize.c
Piotr Stefaniak writes: > I'm not saying this is necessarily a bug since the whole function deals > with floats, but perhaps it's interesting to note that ndistinct can be > 0 in src/backend/utils/adt/selfuncs.c:estimate_hash_bucketsize: I think it's basically cosmetic unless you've got a machine that traps zero divide, but still that's good to fix. Thanks for the report! regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Two division by 0 errors in optimizer/plan/planner.c and optimizer/path/costsize.c
Piotr Stefaniak writes: > On the exact same note, something like this (again reduced from what > sqlsmith produced): > leads to vardata.rel->tuples being zero here: > if (vardata.rel) > ndistinct *= vardata.rel->rows / vardata.rel->tuples; Ugh. That's a bit worse because it'll be 0/0, ie you get a NaN. Thanks for the report. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] silent data loss with ext4 / all current versions
Hi,

On 2016-03-18 15:08:32 +0900, Michael Paquier wrote:
> +/*
> + * Sync data directory to ensure that what has been generated up to now is
> + * persistent in case of a crash, and this is done once globally for
> + * performance reasons as sync requests on individual files would be
> + * a negative impact on the running time of pg_rewind. This is invoked at
> + * the end of processing once everything has been processed and written.
> + */
> +static void
> +syncTargetDirectory(const char *argv0)
> +{
> +	int			ret;
> +	char		exec_path[MAXPGPATH];
> +	char		cmd[MAXPGPATH];
> +
> +	if (dry_run)
> +		return;

I think it makes more sense to return after detecting the binary, so
you'd find out about problems around not finding initdb during the dry
run, not later.

> +	/* Grab and invoke initdb to perform the sync */
> +	if ((ret = find_other_exec(argv0, "initdb",
> +							   "initdb (PostgreSQL) " PG_VERSION "\n",
> +							   exec_path)) < 0)
> +	{
> +		char		full_path[MAXPGPATH];
> +
> +		if (find_my_exec(argv0, full_path) < 0)
> +			strlcpy(full_path, progname, sizeof(full_path));
> +
> +		if (ret == -1)
> +			pg_fatal("The program \"initdb\" is needed by %s but was \n"
> +					 "not found in the same directory as \"%s\".\n"
> +					 "Check your installation.\n", progname, full_path);
> +		else
> +			pg_fatal("The program \"postgres\" was found by \"%s\" but was \n"
> +					 "not the same version as %s.\n"
> +					 "Check your installation.\n", progname, full_path);

Wrong binary name.

> +	}
> +
> +	/* now run initdb */
> +	if (debug)
> +		snprintf(cmd, MAXPGPATH, "\"%s\" -D \"%s\" -S",
> +				 exec_path, datadir_target);
> +	else
> +		snprintf(cmd, MAXPGPATH, "\"%s\" -D \"%s\" -S \"%s\"",
> +				 exec_path, datadir_target, DEVNULL);
> +
> +	if (system(cmd) != 0)
> +		pg_fatal("sync of target directory with initdb -S failed\n");
> +	pg_log(PG_PROGRESS, "sync of target directory with initdb -S done\n");
> +}

Don't see need for emitting "done", for now at least.

> /*
> + * Wrapper of rename() similar to the backend version with the same function
> + * name aimed at making the renaming durable on disk. Note that this version
> + * does not fsync the old file before the rename as all the code paths leading
> + * to this function are already doing this operation. The new file is also
> + * normally not present on disk before the renaming so there is no need to
> + * bother about it.

I don't think it's a good idea to skip fsyncing the old file based on
that; it's way too likely that that'll not be done for the next user of
durable_rename.

> + */
> +static int
> +durable_rename(const char *oldfile, const char *newfile)
> +{
> +	int			fd;
> +	char		parentpath[MAXPGPATH];
> +
> +	if (rename(oldfile, newfile) != 0)
> +	{
> +		/* durable_rename produced a log entry */

Uh?

> +		fprintf(stderr, _("%s: could not rename file \"%s\": %s\n"),
> +				progname, current_walfile_name, strerror(errno));

current_walfile_name doesn't look right, that's a largely independent
global variable.

> +	if (fsync(fd) != 0)
> +	{
> +		close(fd);
> +		fprintf(stderr, _("%s: could not fsync file \"%s\": %s\n"),
> +				progname, newfile, strerror(errno));
> +		return -1;

Close should be after the strerror() call (yes, that mistake already
existed once in receivelog.c).

> +	fd = open(parentpath, O_RDONLY | PG_BINARY, 0);
> +
> +	/*
> +	 * Some OSs don't allow us to open directories at all (Windows returns
> +	 * EACCES), just ignore the error in that case. If desired also silently
> +	 * ignoring errors about unreadable files. Log others.
> +	 */

Comment is not applicable as a whole.

> +	if (fd < 0 && (errno == EISDIR || errno == EACCES))
> +		return 0;
> +	else if (fd < 0)
> +	{
> +		fprintf(stderr, _("%s: could not open file \"%s\": %s\n"),
> +				progname, parentpath, strerror(errno));
> +		return -1;
> +	}
> +
> +	if (fsync(fd) != 0)
> +	{
> +		close(fd);
> +		fprintf(stderr, _("%s: could not fsync file \"%s\": %s\n"),
> +				progname, parentpath, strerror(errno));
> +		return -1;

close() needs to be moved again.

It'd be easier to apply these if the rate of trivial issues were lower
:(.

Attached is a heavily revised version of the patch. Besides the above,
I've:
* copied fsync_fname_ext from initdb, I think it's easier to understand
  code this way, and it'll make unifying the code easier
* added fsyncing of the directory in mark_file_as_archived()
* The WAL files also need to be fsynced when created in open_walfile():
  - otherwise the directory entry itself isn't safely persistent, as we
    don't fsync the directory in the stream->synchronous fsync() cases.
  - we refuse to resume in open_walfile(), if a file isn't 16MB when
    restarting. Without an fsync that's actually not unlikely after a
    crash. Even with an fsync that's not guaranteed not to happen, but
    the chance of it is much lower.

I'm too tired to push this at the eleventh hour. Besides a heavily
revised patch, backpatching will likely include a number of conflicts.
If somebody in the US has the energy to take care of this...

So we're going to have another round of fsync stuff in the next set of
releases anyway...

Andres
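For reference, the point about not skipping the old-file fsync is that a fully durable rename needs four steps, mirroring the backend's durable_rename(): flush the old file, rename, flush the file under its new name, then flush the containing directory. A condensed frontend-style sketch (most error handling and cleanup elided):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* minimal sketch of a fully durable rename for a frontend tool */
static int
durable_rename_sketch(const char *oldfile, const char *newfile,
					  const char *parentdir)
{
	int			fd;

	/* 1. flush the old file so its contents survive the rename */
	fd = open(oldfile, O_RDWR, 0);
	if (fd < 0 || fsync(fd) != 0)
		return -1;
	close(fd);

	/* 2. the rename itself */
	if (rename(oldfile, newfile) != 0)
		return -1;

	/* 3. flush the file under its new name */
	fd = open(newfile, O_RDWR, 0);
	if (fd < 0 || fsync(fd) != 0)
		return -1;
	close(fd);

	/* 4. flush the directory so the new directory entry is durable */
	fd = open(parentdir, O_RDONLY, 0);
	if (fd >= 0)				/* some OSs refuse to open directories */
	{
		fsync(fd);
		close(fd);
	}
	return 0;
}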
[HACKERS] backup tools ought to ensure created backups are durable
Hi,

As pointed out in
http://www.postgresql.org/message-id/20160327232509.v5wgac5vskuse...@awork2.anarazel.de
our backup tools (i.e. pg_basebackup, pg_dump[all]) currently don't make
any effort to ensure their output is durable.

I think for backup tools handling possibly critical data, that's pretty
much unacceptable. There are cases where we can't ensure durability
(i.e. pg_dump | gzip > file), but then it's out of our hands.

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_rewind just doesn't fsync *anything*?
On Mon, Mar 28, 2016 at 6:52 AM, Andres Freund wrote: > I've pushed a modified version of the fix that Michael posted in > http://archives.postgresql.org/message-id/CAB7nPqRmM%2BCX6bVxw0Y7mMVGMFj1S8kwhevt8TaP83yeFRfbXA%40mail.gmail.com Thanks. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] backup tools ought to ensure created backups are durable
On Mon, Mar 28, 2016 at 8:30 AM, Andres Freund wrote:
> As pointed out in
> http://www.postgresql.org/message-id/20160327232509.v5wgac5vskuse...@awork2.anarazel.de
> our backup tools (i.e. pg_basebackup, pg_dump[all]) currently don't
> make any effort to ensure their output is durable.
>
> I think for backup tools handling possibly critical data, that's pretty
> much unacceptable.

Definitely agreed: once a backup/dump has been taken and those utilities
exit, we had better ensure that it is durably on disk. For pg_basebackup
and pg_dump we really can make things better, as everything except
pg_dump/plain requires a target directory for the location of the
output.

--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Relation extension scalability
On Mon, Mar 28, 2016 at 1:55 AM, Robert Haas wrote:

> > One is like below -->
> > ---------------------
> > In AddExtraBlock
> > {
> >    I add pages to the FSM one by one, like v13 does,
> >    then update the full FSM tree up to the root.
> > }
>
> Not following this. Did you attach this version?

No, I did not attach this. It was a rough experiment; I tried it but did
not produce a patch. I will send it.

> > Results:
> > --------
> > 1. With this, performance is a little less than v14, but the problem
> >    of the extra relation size is solved.
> > 2. From this we can conclude that the extra relation size in v14
> >    arises because, while extending, some pages are not immediately
> >    visible in the FSM, and at the end some of the pages are left
> >    unused.
>
> I agree with that conclusion. I'm not quite sure where that leaves us,
> though. We can go back to v13, but why isn't that producing extra
> pages? It seems like it should: whenever a bulk extend rolls over to
> a new FSM page, concurrent backends will search either the old or the
> new one but not both.
>
> Maybe we could do this - not sure if it's what you were suggesting above:
>
> 1. Add the pages one at a time, and do RecordPageWithFreeSpace after
> each one.
> 2. After inserting them all, go back and update the upper levels of
> the FSM tree up to the root.

Yes, the same; that is what I was trying to explain above.

> Another idea is:
>
> If ConditionalLockRelationForExtension fails to get the lock
> immediately, search the last *two* pages of the FSM for a free page.
>
> Just brainstorming here.

I think this is the better option: since we will search the last two
pages of the FSM tree, there is no need to update the upper levels of
the FSM tree. Right? I will test and post the results with this option.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] VS 2015 support in src/tools/msvc
On Fri, Mar 25, 2016 at 9:48 PM, Andrew Dunstan wrote: > OK, sounds good. Just a side-note. Andres has pushed the fix for the GinIs* macros as af4472bc, making patch 0003 from the last series useless now. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Draft release notes for next week's releases
On Sat, Mar 26, 2016 at 4:34 PM, Tom Lane wrote: > Probably the most discussion-worthy item is whether we can say > anything more about the strxfrm mess. Should we make a wiki > page about that and have the release note item link to it? I think that there is an argument against doing so, which is that right now, all we have to offer on that are weasel words. However, I'm still in favor of a Wiki page, because I would not be at all surprised if our understanding of this problem evolved, and we were able to offer better answers in several weeks. Realistically, it will probably take at least that long before affected users even start to think about this. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Draft release notes for next week's releases
On Sun, Mar 27, 2016 at 8:43 PM, Peter Geoghegan wrote:

> On Sat, Mar 26, 2016 at 4:34 PM, Tom Lane wrote:
> > Probably the most discussion-worthy item is whether we can say
> > anything more about the strxfrm mess. Should we make a wiki
> > page about that and have the release note item link to it?
>
> I think that there is an argument against doing so, which is that
> right now, all we have to offer on that are weasel words. However, I'm
> still in favor of a Wiki page, because I would not be at all surprised
> if our understanding of this problem evolved, and we were able to
> offer better answers in several weeks. Realistically, it will probably
> take at least that long before affected users even start to think
> about this.

One question to debate is whether publishing a list of "known" failures
(collated from the program runs lots of people performed) would do more
harm than good. Personally I'd rather see a list of known failures and
evaluate my situation objectively (i.e., large index but no reported
problem on my combination of locale and platform). I understand that a
lack of evidence is not proof that I am unaffected at this stage in the
game.

Having something I can execute on my server to try and verify behavior -
irrespective of the correctness of the indexes themselves - would be
welcomed.

David J.
[HACKERS] Nested function
Hi,

Is there any way to create a nested function? An Oracle-to-Postgres migration requires referencing the outer (super) function's variable inside a nested function without passing it as a nested-function parameter.

Oracle sample:
---
create or replace function f1(n number) return number
is
  vs number := 1;
  function nf1(m number) return number is
  begin
    return vs + m + n;
  end;
begin
  return nf1(2);
end;
/

run:

SQL> select f1(9) from dual;

     F1(9)
----------
        12

Thanks,
Sridhar BN
Re: [HACKERS] Nested function
Hi,

2016-03-28 6:14 GMT+02:00 Sridhar N Bamandlapally:
> Hi
>
> Is there any way to create a nested function?

Some languages, like plv8, support this feature, but plpgsql does not. You have to use two functions and some implementation of session variables (see the sketch below).

Regards,
Pavel

> An Oracle-to-Postgres migration requires referencing the outer (super)
> function's variable inside a nested function without passing it as a
> nested-function parameter.
>
> Oracle sample:
> ---
> create or replace function f1(n number) return number
> is
>   vs number := 1;
>   function nf1(m number) return number is
>   begin
>     return vs + m + n;
>   end;
> begin
>   return nf1(2);
> end;
> /
>
> run:
>
> SQL> select f1(9) from dual;
>
>      F1(9)
> ----------
>         12
>
> Thanks,
> Sridhar BN
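A minimal sketch of that workaround, using a custom GUC (here arbitrarily named "f1.vs") via set_config()/current_setting() as the session variable; the outer argument n is still passed explicitly, only vs goes through the session variable:

CREATE OR REPLACE FUNCTION nf1(m integer, n integer) RETURNS integer AS $$
BEGIN
    -- read the "session variable" set by the outer function
    RETURN current_setting('f1.vs')::integer + m + n;
END;
$$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION f1(n integer) RETURNS integer AS $$
BEGIN
    PERFORM set_config('f1.vs', '1', true);  -- transaction-local vs := 1
    RETURN nf1(2, n);
END;
$$ LANGUAGE plpgsql;

SELECT f1(9);  -- returns 12, matching the Oracle sample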
Re: [HACKERS] Nested function
On Sun, Mar 27, 2016 at 9:14 PM, Sridhar N Bamandlapally <sridhar@gmail.com> wrote:
> Hi
>
> Is there any way to create a nested function?
>
> An Oracle-to-Postgres migration requires referencing the outer (super)
> function's variable inside a nested function without passing it as a
> nested-function parameter.
>
> Oracle sample:
> ---
> create or replace function f1(n number) return number
> is
>   vs number := 1;
>   function nf1(m number) return number is
>   begin
>     return vs + m + n;
>   end;
> begin
>   return nf1(2);
> end;
> /
>
> run:
>
> SQL> select f1(9) from dual;
>
>      F1(9)
> ----------
>         12

PostgreSQL's pl/pgsql language doesn't grok closures. You are welcome to write "f1" in a language that does. Perl and Python are built-in and many others are available. I assume at least one of them can do this.

David J.
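For instance, a sketch of the Oracle sample in PL/Python, assuming the plpython3u extension is installed; the function body can define a nested function that closes over both the local vs and the argument n:

CREATE OR REPLACE FUNCTION f1(n integer) RETURNS integer AS $$
vs = 1
def nf1(m):
    return vs + m + n   # closes over vs and n from the outer body
return nf1(2)
$$ LANGUAGE plpython3u;

SELECT f1(9);  -- returns 12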
Re: [HACKERS] silent data loss with ext4 / all current versions
On Mon, Mar 28, 2016 at 8:25 AM, Andres Freund wrote:
> On 2016-03-18 15:08:32 +0900, Michael Paquier wrote:
>> + fprintf(stderr, _("%s: could not rename file \"%s\": %s\n"),
>> +         progname, current_walfile_name, strerror(errno));
>
> current_walfile_name doesn't look right, that's a largely independent
> global variable.

Stupid mistake.

>> + if (fsync(fd) != 0)
>> + {
>> +     close(fd);
>> +     fprintf(stderr, _("%s: could not fsync file \"%s\": %s\n"),
>> +             progname, newfile, strerror(errno));
>> +     return -1;
>
> Close should be after the strerror() call (yes, that mistake already
> existed once in receivelog.c).

Right.

> It'd be easier to apply these if the rate of trivial issues were lower :(.

Sorry about that. I'll be more careful.

> Attached is a heavily revised version of the patch. Besides the above, I've:
> * copied fsync_fname_ext from initdb, I think it's easier to understand
>   code this way, and it'll make unifying the code easier

OK.

> * added fsyncing of the directory in mark_file_as_archived()
> * The WAL files also need to be fsynced when created in open_walfile():
>   - otherwise the directory entry itself isn't safely persistent, as we
>     don't fsync the directory in the stream->synchronous fsync() cases.
>   - we refuse to resume in open_walfile() if a file isn't 16MB when
>     restarting. Without an fsync that's actually not unlikely after a
>     crash. Even with an fsync that's not guaranteed not to happen, but
>     the chance of it is much lower.
> I'm too tired to push this at the eleventh hour. Besides a heavily
> revised patch, backpatching will likely include a number of conflicts.
> If somebody in the US has the energy to take care of this...

Close enough to the US. Attached are backpatchable versions based on the corrected version you sent. 9.3 and 9.4 share the same patch; more work was necessary for 9.2, but that's not huge either.

> So we're going to have another round of fsync stuff in the next set of
> releases anyway...

Yes. Seeing how 9.5.2 is close by, I think it would be wiser to push this stuff after the upcoming minor release.

--
Michael

Attachments: pg_receivexlog-sync-94-93.patch, pg_receivexlog-sync-92.patch, pg_receivexlog-sync-95.patch
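For reference, the corrected ordering for that fragment (a sketch based on the quoted hunk, not the committed code, assuming the surrounding pg_receivexlog context for _(), progname and newfile) moves close() after the strerror() call, since close() may clobber errno:

if (fsync(fd) != 0)
{
    fprintf(stderr, _("%s: could not fsync file \"%s\": %s\n"),
            progname, newfile, strerror(errno));
    close(fd);              /* after strerror(), so errno is preserved */
    return -1;
}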
Re: [HACKERS] Relation extension scalability
On Mon, Mar 28, 2016 at 1:55 AM, Robert Haas wrote:
> On Sun, Mar 27, 2016 at 8:00 AM, Dilip Kumar wrote:
>> Results:
>> --------
>> 1. With this, performance is a little lower than v14, but the problem of
>> extra relation size is solved.
>> 2. With this we can conclude that the extra relation size in v14 arises
>> because, while extending, some pages are not immediately available, and
>> at the end some of the pages are left unused.
>
> I agree with that conclusion. I'm not quite sure where that leaves
> us, though. We can go back to v13, but why isn't that producing extra
> pages? It seems like it should: whenever a bulk extend rolls over to
> a new FSM page, concurrent backends will search either the old or the
> new one but not both.

I have not debugged the flow, but looking at the v13 code, it seems it will search both old and new. In GetPageWithFreeSpaceExtended()->fsm_search_from_addr()->fsm_search_avail(), the basic idea of the search is: start from the target slot; at every step, move one node to the right, then climb up to the parent; stop when we reach a node with enough free space (as we must, since the root has enough space). So shouldn't it be able to find the new FSM page where the bulk extend rolls over?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
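A simplified, self-contained sketch of that climb-and-descend search (the real fsm_search_avail() in src/backend/storage/freespace/fsmpage.c also handles wraparound and concurrent updates; here nodes[] is a complete binary max-tree in array form, with the leaves starting at index leafoff and nnodes nodes in total):

#include <stdint.h>

static int
slot_tree_search(const uint8_t *nodes, int leafoff, int nnodes,
                 int target, uint8_t needed)
{
    int nodeno = leafoff + target;

    /* climb: move one node right, then up, until a subtree has room */
    while (nodes[nodeno] < needed)
    {
        if (nodeno == 0)
            return -1;              /* even the root lacks space */
        if (nodeno + 1 < nnodes)
            nodeno++;               /* move one node to the right */
        nodeno = (nodeno - 1) / 2;  /* climb up to the parent */
    }

    /* descend: follow a child with enough space until we hit a leaf */
    while (nodeno < leafoff)
    {
        int child = 2 * nodeno + 1; /* left child */

        nodeno = (nodes[child] >= needed) ? child : child + 1;
    }
    return nodeno - leafoff;        /* slot number within this FSM page */
}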
Re: [HACKERS] Relation extension scalability
On Mon, Mar 28, 2016 at 11:00 AM, Amit Kapila wrote:
> I have not debugged the flow, but looking at the v13 code, it seems it
> will search both old and new. In
> GetPageWithFreeSpaceExtended()->fsm_search_from_addr()->fsm_search_avail(),
> the basic idea of the search is: start from the target slot; at every
> step, move one node to the right, then climb up to the parent; stop when
> we reach a node with enough free space (as we must, since the root has
> enough space).
> So shouldn't it be able to find the new FSM page where the bulk extend
> rolls over?

This is actually a multi-level tree, so each FSM page contains one slot tree. fsm_search_avail() searches only the slot tree inside a single FSM page, but we want to move on to the next FSM page.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
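A hedged sketch of the two-page fallback being agreed on here, assuming the usual postgres.h environment; search_last_fsm_page(rel, pages_back, needed) is a hypothetical helper, not a real freespace.c function:

/*
 * If ConditionalLockRelationForExtension fails, probe the last two FSM
 * leaf pages rather than only the last one, so a bulk extend that rolled
 * over onto a new FSM page is still found without fixing up the upper
 * levels of the FSM tree.
 */
static BlockNumber
find_recently_extended_block(Relation rel, Size spaceNeeded)
{
    BlockNumber result;

    result = search_last_fsm_page(rel, 0, spaceNeeded);     /* newest page */
    if (result == InvalidBlockNumber)
        result = search_last_fsm_page(rel, 1, spaceNeeded); /* one before */
    return result;
}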
Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics
On Sun, Mar 27, 2016 at 5:48 PM, Andres Freund wrote:
> What's sizeof(BufferDesc) after applying these patches? It should better
> be <= 64...

It is 72.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
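When a hot struct like this cannot be squeezed under the cache-line size, one pattern PostgreSQL already uses elsewhere (see LWLockPadded in lwlock.h) is to pad each array element to a full line. A minimal, self-contained sketch with placeholder fields and an assumed 64-byte line (C11 for _Static_assert):

#define CACHELINE_SIZE 64       /* assumed line size, as in the "<= 64" above */

typedef struct BufferDesc       /* stand-in; fields are illustrative only */
{
    char        fields[56];     /* tag, state, locks, ... */
} BufferDesc;

/* pad to a full cache line so array elements never straddle two lines */
typedef union BufferDescPadded
{
    BufferDesc  bufferdesc;
    char        pad[CACHELINE_SIZE];
} BufferDescPadded;

/* compile-time check mirroring the "<= 64" requirement in the thread */
_Static_assert(sizeof(BufferDesc) <= CACHELINE_SIZE,
               "BufferDesc must fit in one cache line");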