Re: [HACKERS] extend pgbench expressions with functions

2016-03-27 Thread Fabien COELHO


v40 is yet another rebase.

--
Fabien.

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index c6d1454..4ceddae 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -815,9 +815,10 @@ pgbench  options  dbname
 
 
  
-  Sets variable varname to an integer value calculated
+  Sets variable varname to a value calculated
   from expression.
   The expression may contain integer constants such as 5432,
+  double constants such as 3.14159,
   references to variables :variablename,
   unary operators (+, -) and binary operators
   (+, -, *, /,
@@ -830,7 +831,7 @@ pgbench  options  dbname
   Examples:
 
 \set ntellers 10 * :scale
-\set aid (1021 * :aid) % (10 * :scale) + 1
+\set aid (1021 * random(1, 10 * :scale)) % (10 * :scale) + 1
 
 

@@ -850,66 +851,35 @@ pgbench  options  dbname
  
 
  
-  By default, or when uniform is specified, all values in the
-  range are drawn with equal probability.  Specifying gaussian
-  or  exponential options modifies this behavior; each
-  requires a mandatory parameter which determines the precise shape of the
-  distribution.
- 
+  
+   
+
+ \setrandom n 1 10 or \setrandom n 1 10 uniform
+ is equivalent to \set n random(1, 10) and uses a uniform
+ distribution.
+
+   
 
- 
-  For a Gaussian distribution, the interval is mapped onto a standard
-  normal distribution (the classical bell-shaped Gaussian curve) truncated
-  at -parameter on the left and +parameter
-  on the right.
-  Values in the middle of the interval are more likely to be drawn.
-  To be precise, if PHI(x) is the cumulative distribution
-  function of the standard normal distribution, with mean mu
-  defined as (max + min) / 2.0, with
-
- f(x) = PHI(2.0 * parameter * (x - mu) / (max - min + 1)) /
-(2.0 * PHI(parameter) - 1.0)
-
-  then value i between min and
-  max inclusive is drawn with probability:
-  f(i + 0.5) - f(i - 0.5).
-  Intuitively, the larger parameter, the more
-  frequently values close to the middle of the interval are drawn, and the
-  less frequently values close to the min and
-  max bounds. About 67% of values are drawn from the
-  middle 1.0 / parameter, that is a relative
-  0.5 / parameter around the mean, and 95% in the middle
-  2.0 / parameter, that is a relative
-  1.0 / parameter around the mean; for instance, if
-  parameter is 4.0, 67% of values are drawn from the
-  middle quarter (1.0 / 4.0) of the interval (i.e. from
-  3.0 / 8.0 to 5.0 / 8.0) and 95% from
-  the middle half (2.0 / 4.0) of the interval (second and
-  third quartiles). The minimum parameter is 2.0 for
-  performance of the Box-Muller transform.
- 
+  
+   
+\setrandom n 1 10 exponential 3.0 is equivalent to
+\set n random_exponential(1, 10, 3.0) and uses an
+exponential distribution.
+   
+  
 
- 
-  For an exponential distribution, parameter
-  controls the distribution by truncating a quickly-decreasing
-  exponential distribution at parameter, and then
-  projecting onto integers between the bounds.
-  To be precise, with
-
-f(x) = exp(-parameter * (x - min) / (max - min + 1)) / (1.0 - exp(-parameter))
-
-  Then value i between min and
-  max inclusive is drawn with probability:
-  f(x) - f(x + 1).
-  Intuitively, the larger parameter, the more
-  frequently values close to min are accessed, and the
-  less frequently values close to max are accessed.
-  The closer to 0 parameter, the flatter (more uniform)
-  the access distribution.
-  A crude approximation of the distribution is that the most frequent 1%
-  values in the range, close to min, are drawn
-  parameter% of the time.
-  parameter value must be strictly positive.
+  
+   
+\setrandom n 1 10 gaussian 2.0 is equivalent to
+\set n random_gaussian(1, 10, 2.0), and uses a gaussian
+distribution.
+   
+  
+ 
+
+   See the documentation of these functions below for further information
+   about the precise shape of these distributions, depending on the value
+   of the parameter.
  
 
  
@@ -990,34 +960,6 @@ f(x) = exp(-parameter * (x - min) / (max - min + 1)) / (1.0 - exp(-parameter))
 

   
-
-  
-   As an example, the full definition of the built-in TPC-B-like
-   transaction is:
-
-
-\set nbranches :scale
-\set ntellers 10 * :scale
-\set naccounts 10 * :scale
-\setrandom aid 1 :naccounts
-\setrandom bid 1 :nbranches
-\setrandom tid 1 :ntellers
-\setrandom delta -5000 5000
-BEGIN;
-UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
-SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
-UPDATE pgbench_tellers SET tbalance = tbal
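
A minimal usage sketch of the new functions (the function names are taken
from the diff above; the script variables are illustrative only):

\set ntellers 10 * :scale
\set tid random(1, :ntellers)
\set aid random_gaussian(1, 10 * :scale, 4.0)
\set delta random_exponential(1, 5000, 3.0)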

Re: [HACKERS] Two division by 0 errors in optimizer/plan/planner.c and optimizer/path/costsize.c

2016-03-27 Thread Piotr Stefaniak

On 2016-03-26 19:29, Piotr Stefaniak wrote:

I'm not saying this is necessarily a bug since the whole function deals
with floats, but perhaps it's interesting to note that ndistinct can be
0 in src/backend/utils/adt/selfuncs.c:estimate_hash_bucketsize:


On the exact same note, something like this (again reduced from what 
sqlsmith produced):


select 1
from unnest('{}'::boolean[]) a (x)
left join (
  select *
  from unnest('{}'::boolean[])
  where false
) b (x) on a.x = b.x;

leads to vardata.rel->tuples being zero here:

	if (vardata.rel)
		ndistinct *= vardata.rel->rows / vardata.rel->tuples;
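
One possible shape of a defensive fix (a sketch only, not the committed
change) would be to guard the division:

	if (vardata.rel)
	{
		/* sketch: avoid 0/0 (a NaN) when the relation is estimated empty */
		if (vardata.rel->tuples > 0)
			ndistinct *= vardata.rel->rows / vardata.rel->tuples;
	}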

Back-trace attached.

#0  estimate_hash_bucketsize (root=0xf3cf98, hashkey=0xf44398, nbuckets=1024) 
at src/backend/utils/adt/selfuncs.c:3543
#1  0x007141a4 in final_cost_hashjoin (root=0xf3cf98, path=0xf47118, 
workspace=0x7fffd950, sjinfo=0xf45460, semifactors=0x7fffdae8) at 
src/backend/optimizer/path/costsize.c:2856
#2  0x0075c134 in create_hashjoin_path (root=0xf3cf98, 
joinrel=0xf46438, jointype=JOIN_LEFT, workspace=0x7fffd950, 
sjinfo=0xf45460, semifactors=0x7fffdae8, outer_path=0xf461a0, 
inner_path=0xf45a98, restrict_clauses=0xf466f0, required_outer=0x0, 
hashclauses=0xf471d8) at src/backend/optimizer/util/pathnode.c:2133
#3  0x0072074a in try_hashjoin_path (root=0xf3cf98, joinrel=0xf46438, 
outer_path=0xf461a0, inner_path=0xf45a98, hashclauses=0xf471d8, 
jointype=JOIN_LEFT, extra=0x7fffdad0) at 
src/backend/optimizer/path/joinpath.c:523
#4  0x007219e2 in hash_inner_and_outer (root=0xf3cf98, 
joinrel=0xf46438, outerrel=0xf44cd0, innerrel=0xf44fa8, jointype=JOIN_LEFT, 
extra=0x7fffdad0) at src/backend/optimizer/path/joinpath.c:1403
#5  0x00720058 in add_paths_to_joinrel (root=0xf3cf98, 
joinrel=0xf46438, outerrel=0xf44cd0, innerrel=0xf44fa8, jointype=JOIN_LEFT, 
sjinfo=0xf45460, restrictlist=0xf466f0) at 
src/backend/optimizer/path/joinpath.c:211
#6  0x00722e38 in make_join_rel (root=0xf3cf98, rel1=0xf44cd0, 
rel2=0xf44fa8) at src/backend/optimizer/path/joinrels.c:771
#7  0x007222dd in make_rels_by_clause_joins (root=0xf3cf98, 
old_rel=0xf44cd0, other_rels=0xf463b8) at 
src/backend/optimizer/path/joinrels.c:274
#8  0x00721f86 in join_search_one_level (root=0xf3cf98, level=2) at 
src/backend/optimizer/path/joinrels.c:96
#9  0x0070dc4d in standard_join_search (root=0xf3cf98, levels_needed=2, 
initial_rels=0xf46380) at src/backend/optimizer/path/allpaths.c:2124
#10 0x0070dbc6 in make_rel_from_joinlist (root=0xf3cf98, 
joinlist=0xf452c8) at src/backend/optimizer/path/allpaths.c:2055
#11 0x0070b304 in make_one_rel (root=0xf3cf98, joinlist=0xf452c8) at 
src/backend/optimizer/path/allpaths.c:175
#12 0x007352be in query_planner (root=0xf3cf98, tlist=0xf440b8, 
qp_callback=0x739ec7 , qp_extra=0x7fffde10) at 
src/backend/optimizer/plan/planmain.c:246
#13 0x00737d89 in grouping_planner (root=0xf3cf98, inheritance_update=0 
'\000', tuple_fraction=0) at src/backend/optimizer/plan/planner.c:1673
#14 0x00736535 in subquery_planner (glob=0xf3ad88, parse=0xf3a458, 
parent_root=0x0, hasRecursion=0 '\000', tuple_fraction=0) at 
src/backend/optimizer/plan/planner.c:758
#15 0x00735688 in standard_planner (parse=0xf3a458, cursorOptions=256, 
boundParams=0x0) at src/backend/optimizer/plan/planner.c:307
#16 0x007353ca in planner (parse=0xf3a458, cursorOptions=256, 
boundParams=0x0) at src/backend/optimizer/plan/planner.c:177
#17 0x00800d28 in pg_plan_query (querytree=0xf3a458, cursorOptions=256, 
boundParams=0x0) at src/backend/tcop/postgres.c:798
#18 0x00800ddb in pg_plan_queries (querytrees=0xf3cf60, 
cursorOptions=256, boundParams=0x0) at src/backend/tcop/postgres.c:857
#19 0x00801080 in exec_simple_query (query_string=0xf077a8 "select 
1\nfrom unnest('{}'::boolean[]) a (x)\nleft join (\n  select *\n  from 
unnest('{}'::boolean[])\n  where false\n) b (x) on a.x = b.x;") at 
src/backend/tcop/postgres.c:1022
#20 0x00805342 in PostgresMain (argc=1, argv=0xe958d0, dbname=0xe95730 
"postgres", username=0xe95710 "me") at src/backend/tcop/postgres.c:4059
#21 0x0077ed31 in BackendRun (port=0xeb2950) at 
src/backend/postmaster/postmaster.c:4258
#22 0x0077e495 in BackendStartup (port=0xeb2950) at 
src/backend/postmaster/postmaster.c:3932
#23 0x0077ac19 in ServerLoop () at 
src/backend/postmaster/postmaster.c:1690
#24 0x0077a24e in PostmasterMain (argc=3, argv=0xe94850) at 
src/backend/postmaster/postmaster.c:1298
#25 0x006c6216 in main (argc=3, argv=0xe94850) at 
src/backend/main/main.c:228


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-27 Thread Alexander Korotkov
On Sat, Mar 26, 2016 at 1:26 AM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> Thank you very much for testing!
> I also got access to a 4 x 18 Intel server with 144 threads. I'm going to
> post results of tests on this server next Monday.
>

I've run pgbench tests on this machine: pgbench -s 1000 -c $clients -j 100
-M prepared -T 300.
See results in the table and chart.

clients master  v3  v5
1   11671   12507   12679
2   24650   26005   25010
4   49631   48863   49811
8   96790   96441   99946
10  121275  119928  124100
20  243066  243365  246432
30  359616  342241  357310
40  431375  415310  441619
50  489991  489896  500590
60  538057  636473  554069
70  588659  714426  738535
80  405008  923039  902632
90  295443  1181247 1155918
100 258695  1323125 1325019
110 238842  1393767 1410274
120 226018  1432504 1474982
130 215102  1465459 1503241
140 206415  1470454 1505380
150 197850  1475479 1519908
160 190935  1420915 1484868
170 185835  1438965 1453128
180 182519  1416252 1453098

My conclusions are as follows:
1) We don't observe any regression in v5 in comparison to master.
2) v5 in most cases slightly outperforms v3.
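
For reference, a sketch of a driver for this sweep (the client list and
database name are assumed):

for c in 1 2 4 8 10 20 30 40 50 60 70 80 90 100 110 120 130 140 150 160 170 180
do
    pgbench -s 1000 -c $c -j 100 -M prepared -T 300 postgres
done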

I'm going to do some code cleanup of v5 on Monday.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] Relation extension scalability

2016-03-27 Thread Dilip Kumar
On Sat, Mar 26, 2016 at 3:18 PM, Dilip Kumar  wrote:

> We could go further still and have GetPageWithFreeSpace() always
>> search the last, say, two pages of the FSM in all cases.  But that
>> might be expensive. The extra call to RelationGetNumberOfBlocks seems
>> cheap enough here because the alternative is to wait for a contended
>> heavyweight lock.
>>
>
> I will try the test with this also and post the results.
>

** Something went wrong with the last mail; it seems to have become a
separate thread, so I am resending the same mail. **

I have changed v14 as per this suggestion, and the results are the same as v14.

I have again measured the relation size, this time directly using the size
function, so the results are easier to understand.

Relation Size
-------------
INSERT: 16000 transactions from 32 clients

            Base    v13   v14_1
           -----  -----  ------
TPS           37    255     112
Rel Size    17GB   17GB    18GB

COPY: 32000 transactions from 32 clients

            Base    v13   v14_1
           -----  -----  ------
TPS          121    823     427
Rel Size    11GB   11GB    11GB


Scripts attached to this mail
=============================
test_size_ins.sh  --> runs the INSERT test and calculates the relation size.
test_size_copy.sh --> runs the COPY test and calculates the relation size.
copy_script       --> COPY pgbench script used by test_size_copy.sh
insert_script     --> INSERT pgbench script used by test_size_ins.sh
multi_extend_v14_poc_v1.patch --> modified patch of v14.

I also tried modifying v14 from different angles.

One is like below:
------------------
In AddExtraBlock
{
    add pages to the FSM one by one, as v13 does;
    then update the full FSM tree up to the root
}

Results:
--------
1. With this, performance is a little lower than with v14, but the problem of
extra relation size is solved.
2. From this we can conclude that the extra relation size in v14 arises
because, while extending, some pages are not immediately available, and at
the end some of the pages are left unused.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


copy_script
Description: Binary data


insert_script
Description: Binary data


multi_extend_v14_poc_v1.patch
Description: Binary data


test_size_copy.sh
Description: Bourne shell script


test_size_ins.sh
Description: Bourne shell script



Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-27 Thread Andres Freund
On 2016-03-25 23:02:11 +0530, Dilip Kumar wrote:
> On Fri, Mar 25, 2016 at 8:09 PM, Alexander Korotkov <
> a.korot...@postgrespro.ru> wrote:
> 
> > Could anybody run benchmarks?  Feature freeze is soon, but it would be
> > *very nice* to fit it into 9.6 release cycle, because it greatly improves
> > scalability on large machines.  Without this patch PostgreSQL 9.6 will be
> > significantly behind competitors like MySQL 5.7.
> 
> 
> I have run the performance tests and here are the results. With the latest
> patch I did not see any regression at lower client counts (median of 3 readings).
> 
> scale factor 1000, shared_buffers 8GB, read-only
>
> Client     Base     Patch
> 1         12957     13068
> 2         24931     25816
> 4         46311     48767
> 32       300921    310062
> 64       387623    493843
> 128      249635    583513
> scale factor 300, shared_buffers 8GB, read-only
>
> Client     Base     Patch
> 1         14537     14586   --> the one-thread number looks a little low;
>                                 generally I get ~18000 (will recheck).
> 2         34703     33929   --> may be run-to-run variance (once I get
>                                 time, I will recheck)
> 4         67744     69069
> 32       312575    336012
> 64       213312    539056
> 128      190139    380122
> 
> Summary:
> 
> Actually, with 64 clients we have seen ~470,000 TPS with head as well, by
> reverting commit 6150a1b0.
> Refer to this thread: (
> http://www.postgresql.org/message-id/caa4ek1+zeb8pmwwktf+3brs0pt4ux6rs6aom0uip8c6shjw...@mail.gmail.com
> )
> 
> I haven't tested this patch after reverting commit 6150a1b0, so I am not
> sure whether this patch can give even better performance.
> 
> It also points to the case that Andres has mentioned in this thread:
> 
> http://www.postgresql.org/message-id/20160226191158.3vidtk3ktcmhi...@alap3.anarazel.de

On what hardware did you run these tests?

Thanks,

Andres




Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-27 Thread Andres Freund
On 2016-03-27 12:38:25 +0300, Alexander Korotkov wrote:
> On Sat, Mar 26, 2016 at 1:26 AM, Alexander Korotkov <
> a.korot...@postgrespro.ru> wrote:
> 
> > Thank you very much for testing!
> > I also got access to a 4 x 18 Intel server with 144 threads. I'm going to
> > post results of tests on this server next Monday.
> >
> 
> I've run pgbench tests on this machine: pgbench -s 1000 -c $clients -j 100
> -M prepared -T 300.
> See results in the table and chart.
> 
> clients master  v3  v5
> 1   11671   12507   12679
> 2   24650   26005   25010
> 4   49631   48863   49811
> 8   96790   96441   99946
> 10  121275  119928  124100
> 20  243066  243365  246432
> 30  359616  342241  357310
> 40  431375  415310  441619
> 50  489991  489896  500590
> 60  538057  636473  554069
> 70  588659  714426  738535
> 80  405008  923039  902632
> 90  295443  1181247 1155918
> 100 258695  1323125 1325019
> 110 238842  1393767 1410274
> 120 226018  1432504 1474982
> 130 215102  1465459 1503241
> 140 206415  1470454 1505380
> 150 197850  1475479 1519908
> 160 190935  1420915 1484868
> 170 185835  1438965 1453128
> 180 182519  1416252 1453098
> 
> My conclusions are as follows:
> 1) We don't observe any regression in v5 in comparison to master.
> 2) v5 in most cases slightly outperforms v3.

What commit did you base these tests on? I guess something recent, after
98a64d0bd?

> I'm going to do some code cleanup of v5 on Monday

Ok, I'll try to do a review and possibly commit after that.

Greetings,

Andres Freund




Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-27 Thread Dilip Kumar
On Sun, Mar 27, 2016 at 5:37 PM, Andres Freund  wrote:

> On what hardware did you run these tests?


An IBM POWER8 machine:

Architecture:          ppc64le
Byte Order:            Little Endian
CPU(s):                192
Thread(s) per core:    8
Core(s) per socket:    1
Socket(s):             24
NUMA node(s):          4

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Performance degradation in commit 6150a1b0

2016-03-27 Thread Andres Freund
Hi,

On 2016-03-27 02:34:32 +0530, Ashutosh Sharma wrote:
> As mentioned in my earlier mail, I was not able to apply
> pinunpin-cas-5.patch on commit 6150a1b0;

That's not surprising; that's pretty old.

> therefore I thought of applying it on the latest commit, and I was
> able to do it successfully. I have now taken the performance readings
> at the latest commit, i.e. 76281aa9, with and without applying
> pinunpin-cas-5.patch, and my observations are as follows:
>
> 1. I can still see that the current performance lags by 2-3% from the
> expected performance when pinunpin-cas-5.patch is applied on commit
> 76281aa9.
> 2. When pinunpin-cas-5.patch is ignored and performance is measured at
> commit 76281aa9, the overall performance lags by 50-60% from the expected
> performance.
> 
> Note: Here, the expected performance is the performance observed before
> commit 6150a1b0, with ac1d794 reverted.

Thanks for doing these benchmarks. What's the performance if you revert
6150a1b0 on top of a recent master? There've been a lot of other patches
influencing performance since 6150a1b0, so minor performance differences
aren't necessarily meaningful; especially when that older version then
had other patches reverted.

Thanks,

Andres




Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-27 Thread Andres Freund
On 2016-03-27 17:45:52 +0530, Dilip Kumar wrote:
> On Sun, Mar 27, 2016 at 5:37 PM, Andres Freund  wrote:
> 
> > On what hardware did you run these tests?
> 
> 
> An IBM POWER8 machine:
> 
> Architecture:          ppc64le
> Byte Order:            Little Endian
> CPU(s):                192
> Thread(s) per core:    8
> Core(s) per socket:    1
> Socket(s):             24
> NUMA node(s):          4

What's sizeof(BufferDesc) after applying these patches? It had better
be <= 64...
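
A compile-time guard in that spirit, as a sketch (assuming the 64-byte
cache-line target; StaticAssertStmt has to sit inside a function body,
e.g. buffer-pool initialization):

	StaticAssertStmt(sizeof(BufferDesc) <= 64,
					 "BufferDesc does not fit in a single cache line");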

Andres




Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index

2016-03-27 Thread Emre Hasegeli
>>> I'll try to explain with a two-dimensional example over points. ASCII art:
>>
>> Thank you for the explanation.  Should we incorporate this with the patch?
>
> added

I have worked on the comments of the patch.  It is attached.  I hope
it is clearer than it was before.

>>> + cmp_double(const double a, const double b)
>>
>> Is this function necessary?  We can just compare the doubles.
>
> Yes, it compares with limited precision, as the geometry operations do.
> Renamed to reflect the actual arguments.

I meant that we could use the FP macros directly instead of this function.
Doing so is also more correct.  I haven't tried to produce the
problem, but this function is not the same as using the macros directly.
For differences smaller than the epsilon, it can return different
results.  I have removed it in the attached version.
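
To illustrate the difference, a standalone sketch (EPSILON and FPeq are
copied from src/include/utils/geo_decls.h; the sample values are made up):

#include <math.h>
#include <stdio.h>

/* As in src/include/utils/geo_decls.h: */
#define EPSILON 1.0E-06
#define FPeq(A,B) (fabs((A) - (B)) <= EPSILON)

int
main(void)
{
	double		a = 0.0, b = 0.9e-6, c = 1.8e-6;

	/* fuzzy equality is not transitive near the epsilon boundary */
	printf("%d %d %d\n", FPeq(a, b), FPeq(b, c), FPeq(a, c));	/* 1 1 0 */
	return 0;
}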

>>> + boxPointerToRangeBox(BOX *box, RangeBox * rectangle)
>>
>> The "rectangle" variable in here should be renamed.
>
> fixed

I found a bunch of those too and renamed for clarity.  I have also
reordered the arguments of the helper functions to keep them
consistent.

> evalRangeBox() is used to initialize fields of RangeBox in evalRectBox(). If
> evalRangeBox() pallocs its result, then we need to copy its result into the
> RangeBox struct and free it. Let both functions use the same interface.

I found it nicer to copy and edit the existing value than to create it
in two steps like this.  It is also attached.

> it works as long as the allTheSame branch contains only one inner node.
> Otherwise traversalValue will be freed several times; see spgWalk().

It just works when traversalValues is not set.  It is also attached.

I have also added the missing regression tests.  While doing that I
noticed that some operators are missing and also added support for
them.  It is also attached with the updated version of my test script.

I hope I haven't changed the patch too much.  I have also pushed the
changes here:

https://github.com/hasegeli/postgres/commits/box-spgist


q4d-5.patch
Description: Binary data


box-index-test.sql
Description: Binary data



Re: [HACKERS] Alter or rename enum value

2016-03-27 Thread Andrew Dunstan



On 03/27/2016 12:43 AM, Christophe Pettus wrote:

On Mar 26, 2016, at 7:40 AM, Andrew Dunstan  wrote:

It would be nice if we could find a less broad-brush approach to dealing with
the issue.

I don't know how doable this is, but could we use the existing mechanism of 
marking an index invalid if it contains an enum type to which a value was 
added, and the transaction was rolled back?  For the 90% use case, that would 
be acceptable, I would expect.




The more I think about this, the more I bump up against the fact that
almost anything we might want to do to ameliorate the situation is
going to be rolled back. The only approach I can think of that doesn't
suffer from this is to abort if an insert/update will affect an index on
a modified enum, i.e. we prevent the possible corruption from happening
in the first place, as we do now, but in a much more fine-grained way.


cheers

andrew




Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-27 Thread Alexander Korotkov
On Sun, Mar 27, 2016 at 3:10 PM, Andres Freund  wrote:

> On 2016-03-27 12:38:25 +0300, Alexander Korotkov wrote:
> > On Sat, Mar 26, 2016 at 1:26 AM, Alexander Korotkov <
> > a.korot...@postgrespro.ru> wrote:
> >
> > > Thank you very much for testing!
> > > I also got access to a 4 x 18 Intel server with 144 threads. I'm going to
> > > post results of tests on this server next Monday.
> > >
> >
> > I've run pgbench tests on this machine: pgbench -s 1000 -c $clients -j
> 100
> > -M prepared -T 300.
> > See results in the table and chart.
> >
> > clients master  v3  v5
> > 1   11671   12507   12679
> > 2   24650   26005   25010
> > 4   49631   48863   49811
> > 8   96790   96441   99946
> > 10  121275  119928  124100
> > 20  243066  243365  246432
> > 30  359616  342241  357310
> > 40  431375  415310  441619
> > 50  489991  489896  500590
> > 60  538057  636473  554069
> > 70  588659  714426  738535
> > 80  405008  923039  902632
> > 90  295443  1181247 1155918
> > 100 258695  1323125 1325019
> > 110 238842  1393767 1410274
> > 120 226018  1432504 1474982
> > 130 215102  1465459 1503241
> > 140 206415  1470454 1505380
> > 150 197850  1475479 1519908
> > 160 190935  1420915 1484868
> > 170 185835  1438965 1453128
> > 180 182519  1416252 1453098
> >
> > My conclusions are as follows:
> > 1) We don't observe any regression in v5 in comparison to master.
> > 2) v5 in most cases slightly outperforms v3.
>
> What commit did you base these tests on? I guess something recent, after
> 98a64d0bd?
>

Yes, more recent than 98a64d0bd. It was based on 676265eb7b.


> > I'm going to do some code cleanup of v5 on Monday
>
> Ok, I'll try to do a review and possibly commit after that.
>

Sounds good.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Alter or rename enum value

2016-03-27 Thread Tom Lane
Andrew Dunstan  writes:
> The more I think about this, the more I bump up against the fact that
> almost anything we might want to do to ameliorate the situation is
> going to be rolled back. The only approach I can think of that doesn't
> suffer from this is to abort if an insert/update will affect an index on
> a modified enum, i.e. we prevent the possible corruption from happening
> in the first place, as we do now, but in a much more fine-grained way.

Perhaps, instead of forbidding ALTER ENUM ADD in a transaction, we could
allow that, but not allow the new value to be *used* until it's committed?
This could be checked cheaply during enum value lookup (ie, is xmin of the
pg_enum row committed).
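
A sketch of what that lookup-time check could look like (hypothetical
code; "tup" and "label" stand for the looked-up pg_enum tuple and value,
and a real version would need to treat our own in-progress transaction
and other edge cases carefully):

	if (!TransactionIdDidCommit(HeapTupleHeaderGetXmin(tup->t_data)))
		ereport(ERROR,
				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
				 errmsg("enum value \"%s\" is not yet committed", label)));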

What you really need is to prevent the new value from being inserted
into any indexes, but checking that directly seems far more difficult,
ugly, and expensive than the above.

I do not know whether this would be a meaningful improvement for
common use-cases, though.  (It'd help if people were more specific
about the use-cases they need to work.)

regards, tom lane




Re: [HACKERS] AssertArg failure in src/backend/executor/functions.c:check_sql_fn_retval()

2016-03-27 Thread Tom Lane
Piotr Stefaniak  writes:
> using sqlsmith I found a way to induce an AssertArg failure in 
> src/backend/executor/functions.c:check_sql_fn_retval() for 
> assert-enabled builds. It boils down to creating a function and calling 
> it like this:

> CREATE FUNCTION bad_argument_assert(anyarray, integer) RETURNS anyarray 
> LANGUAGE sql AS $$select $1$$;
> SELECT bad_argument_assert(CAST(NULL AS ANYARRAY), 0);

Hm.  I would argue that it should have rejected CAST(NULL AS ANYARRAY).
That's a pseudotype and so there should never be an actual value of that
type, not even a null value.

The Assert is positing that the type resolution system took care of
mapping some actual array type into ANYARRAY, but here the parser
didn't notice that it had anything to resolve, because it had an
exact match of input data type for the function.

regards, tom lane




Re: [HACKERS] GinPageIs* don't actually return a boolean

2016-03-27 Thread Andres Freund
On 2016-03-18 14:36:23 +0300, Yury Zhuravlev wrote:
> Robert Haas wrote:
> >On Wed, Mar 2, 2016 at 9:48 PM, Peter Eisentraut  wrote:
> >>On 2/11/16 9:30 PM, Michael Paquier wrote: ...
> >
> >We need to decide what to do about this.  I disagree with Peter: I
> >think that regardless of stdbool, what we've got right now is sloppy
> >coding - bad style if nothing else.  Furthermore, I think that while C
> >lets you use any non-zero value to represent true, our bool type is
> >supposed to contain only one of those two values.  Therefore, I think
> >we should commit the full patch, back-patch it as far as somebody has
> >the energy for, and move on.  But regardless, this patch can't keep
> >sitting in the CommitFest - we either have to take it or reject it,
> >and soon.
> >
> 
> I know that we are trying to do the right thing. But right now there is an
> error only in ginStepRight. Maybe we fix this place now, and we will think
> about "bool" later? The patch is attached (small and simple).
> 
> Thanks.
> 
> 
> -- 
> Yury Zhuravlev
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company

> diff --git a/src/backend/access/gin/ginbtree.c 
> b/src/backend/access/gin/ginbtree.c
> index 06ba9cb..30113d0 100644
> --- a/src/backend/access/gin/ginbtree.c
> +++ b/src/backend/access/gin/ginbtree.c
> @@ -162,8 +162,8 @@ ginStepRight(Buffer buffer, Relation index, int lockmode)
>  {
> 	Buffer		nextbuffer;
> 	Page		page = BufferGetPage(buffer);
> -	bool		isLeaf = GinPageIsLeaf(page);
> -	bool		isData = GinPageIsData(page);
> +	uint8		isLeaf = GinPageIsLeaf(page);
> +	uint8		isData = GinPageIsData(page);
> BlockNumber blkno = GinPageGetOpaque(page)->rightlink;
>  
> nextbuffer = ReadBuffer(index, blkno);

I've pushed the gin specific stuff (although I fixed the macros instead
of the callsites) to all branches. I plan to commit the larger patch
(which has grown since last posting it) after the minor releases; it's
somewhat large and has a lot of conflicts. This way at least the
confirmed issue is fixed in the next set of minor releases.
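
For reference, the shape of the macro-side fix, sketched (not the literal
committed diff): compare the masked flag bits against zero, so the macro
yields a genuine 0/1 rather than bits that may be truncated when stored
into a uint8/bool:

-#define GinPageIsLeaf(page)    ( GinPageGetOpaque(page)->flags & GIN_LEAF )
+#define GinPageIsLeaf(page)    ((GinPageGetOpaque(page)->flags & GIN_LEAF) != 0)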

Greetings,

Andres Freund




Re: [HACKERS] Alter or rename enum value

2016-03-27 Thread Christophe Pettus

On Mar 27, 2016, at 7:20 AM, Tom Lane  wrote:
> I do not know whether this would be a meaningful improvement for
> common use-cases, though.

It would certainly be a step forward over the current situation.  It would mean 
that a specific imaginable use-case (inserting a new enum value, then 
populating a dimension table for it) would have to be done as two migrations 
rather than one, but that is much more doable in most tools than having a 
migration run without a transaction at all.

--
-- Christophe Pettus
   x...@thebuild.com





[HACKERS] [PATCH] Alter or rename enum value

2016-03-27 Thread Dagfinn Ilmari Mannsåker
ilm...@ilmari.org (Dagfinn Ilmari Mannsåker) writes:

> I was bored and thought "how hard could it be?", and a few hours'
> hacking later, I have something that seems to work.  It doesn't do IF
> NOT EXISTS yet, and the error messaging could do with some improvement,
> and there are no docs.  The patch is attached, as well as at
> https://github.com/ilmari/postgres/commit/enum-alter-value

Here's v3 of the patch, which I consider ready for proper
review.  It now features:

- IF (NOT) EXISTS support
- Transaction support
- Documentation
- Improved error reporting (renaming a non-existent value to an existing
  one complains about the former, not the latter)

From 0fee3bdc2e1f4022344e050969b993c963889f55 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= 
Date: Thu, 24 Mar 2016 17:50:58 +
Subject: [PATCH] =?UTF-8?q?Add=20ALTER=20TYPE=20=E2=80=A6=20ALTER=20VALUE?=
 =?UTF-8?q?=20=E2=80=A6=20TO=20=E2=80=A6=20for=20enums?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Unlike adding values to an enum, altering an existing value doesn't
affect the OID values or ordering, so can be done in a transaction.

IF EXISTS and IF NOT EXISTS are supported for ignoring the non-existence
of the old value or the existance of the new one, respectively.
---
 doc/src/sgml/ref/alter_type.sgml   |  24 +++-
 src/backend/catalog/pg_enum.c  | 117 +
 src/backend/commands/typecmds.c|  51 +---
 src/backend/parser/gram.y  |  18 ++
 src/include/catalog/pg_enum.h  |   3 +
 src/include/nodes/parsenodes.h |   2 +
 src/test/regress/expected/enum.out |  63 
 src/test/regress/sql/enum.sql  |  30 ++
 8 files changed, 284 insertions(+), 24 deletions(-)

diff --git a/doc/src/sgml/ref/alter_type.sgml b/doc/src/sgml/ref/alter_type.sgml
index 9789881..f6b4e66 100644
--- a/doc/src/sgml/ref/alter_type.sgml
+++ b/doc/src/sgml/ref/alter_type.sgml
@@ -29,6 +29,7 @@ ALTER TYPE name RENAME ATTRIBUTE name RENAME TO new_name
 ALTER TYPE name SET SCHEMA new_schema
 ALTER TYPE name ADD VALUE [ IF NOT EXISTS ] new_enum_value [ { BEFORE | AFTER } existing_enum_value ]
+ALTER TYPE name ALTER VALUE [ IF EXISTS ] existing_enum_value TO new_enum_value [ IF NOT EXISTS ]
 
 where action is one of:
 
@@ -124,6 +125,23 @@ ALTER TYPE name ADD VALUE [ IF NOT

 

+ALTER VALUE [ IF EXISTST ] TO [ IF NOT EXISTS ]
+
+ 
+  This form renames a value in an enum type.  The value's place in the
+  enum's ordering is not affected.
+ 
+ 
+  If IF EXISTS or is IF NOT EXISTS
+  is specified, it is not an error if the type doesn't contain the old
+  value or already contains the new value, respectively: a notice is
+  issued but no other action is taken.  Otherwise, an error will occur
+  if the old value is not alreeady present or the new value is.
+ 
+
+   
+
+   
 CASCADE
 
  
@@ -241,7 +259,8 @@ ALTER TYPE name ADD VALUE [ IF NOT
   new_enum_value
   

-The new value to be added to an enum type's list of values.
+The new value to be added to an enum type's list of values,
+or to rename an existing one to.
 Like all enum literals, it needs to be quoted.

   
@@ -252,7 +271,8 @@ ALTER TYPE name ADD VALUE [ IF NOT
   

 The existing enum value that the new value should be added immediately
-before or after in the enum type's sort ordering.
+before or after in the enum type's sort ordering,
+or renamed from.
 Like all enum literals, it needs to be quoted.

   
diff --git a/src/backend/catalog/pg_enum.c b/src/backend/catalog/pg_enum.c
index af89daa..2e78294 100644
--- a/src/backend/catalog/pg_enum.c
+++ b/src/backend/catalog/pg_enum.c
@@ -464,6 +464,123 @@ restart:
 
 
 /*
+ * RenameEnumLabel
+ *		Add a new label to the enum set. By default it goes at
+ *		the end, but the user can choose to place it before or
+ *		after any existing set member.
+ */
+void
+RenameEnumLabel(Oid enumTypeOid,
+const char *oldVal,
+const char *newVal,
+bool skipIfNotExists,
+bool skipIfExists)
+{
+	Relation	pg_enum;
+	HeapTuple	old_tup = NULL;
+	HeapTuple	enum_tup;
+	CatCList   *list;
+	int			nelems;
+	int			nbr_index;
+	Form_pg_enum nbr_en;
+	bool		found_new = false;
+
+	/* check length of new label is ok */
+	if (strlen(newVal) > (NAMEDATALEN - 1))
+		ereport(ERROR,
+(errcode(ERRCODE_INVALID_NAME),
+ errmsg("invalid enum label \"%s\"", newVal),
+ errdetail("Labels must be %d characters or less.",
+		   NAMEDATALEN - 1)));
+
+	/*
+	 * Acquire a lock on the enum type, which we won't release until commit.
+	 * This ensures that two backends aren't concurrently modifying the same
+	 * enum type.  Without that, we couldn't be sure to get a consistent view
+	 * of the enum members v

Re: [HACKERS] [PATCH] Alter or rename enum value

2016-03-27 Thread Marko Tiikkaja

On 2016-03-27 19:30, Dagfinn Ilmari Mannsåker wrote:

ilm...@ilmari.org (Dagfinn Ilmari Mannsåker) writes:


I was bored and thought "how hard could it be?", and a few hours'
hacking later, I have something that seems to work.  It doesn't do IF
NOT EXISTS yet, and the error messaging could do with some improvement,
and there are no docs.  The patch is attached, as well as at
https://github.com/ilmari/postgres/commit/enum-alter-value


Here's v3 of the patch, which I consider ready for proper
review.


A couple of trivial comments below.

  +ALTER VALUE [ IF EXISTST ] TO [ IF NOT EXISTS ]


typo EXISTST

  +  If IF EXISTS or is IF NOT EXISTS
  +  is specified, it is not an error if the type doesn't contain the old


double is

  +  if the old value is not alreeady present or the new value is.

typo alreeady

  + * RenameEnumLabel
  + *   Add a new label to the enum set. By default it goes at

copypaste-o?

  + if (!stmt->oldVal) {

not project curly brace style


.m




Re: [HACKERS] [PATCH] Alter or rename enum value

2016-03-27 Thread Dagfinn Ilmari Mannsåker
Marko Tiikkaja  writes:

> On 2016-03-27 19:30, Dagfinn Ilmari Mannsåker wrote:
>> ilm...@ilmari.org (Dagfinn Ilmari Mannsåker) writes:
>>
>>> I was bored and thought "how hard could it be?", and a few hours'
>>> hacking later, I have something that seems to work.  It doesn't do IF
>>> NOT EXISTS yet, and the error messaging could do with some improvement,
>>> and there are no docs.  The patch is attached, as well as at
>>> https://github.com/ilmari/postgres/commit/enum-alter-value
>>
>> Here's v3 of the patch, which I consider ready for proper
>> review.
>
> A couple of trivial comments below.

Thanks, all fixed locally and will be in the next version of the patch.

-- 
"The surreality of the universe tends towards a maximum" -- Skud's Law
"Never formulate a law or axiom that you're not prepared to live with
 the consequences of."  -- Skud's Meta-Law




Re: [HACKERS] AssertArg failure in src/backend/executor/functions.c:check_sql_fn_retval()

2016-03-27 Thread Piotr Stefaniak

On 2016-03-27 16:40, Tom Lane wrote:

Hm.  I would argue that it should have rejected CAST(NULL AS ANYARRAY).
That's a pseudotype and so there should never be an actual value of that
type, not even a null value.


I'm a little confused about what you mean here. I thought rejecting it was
exactly what's happening; normally you'd get "ERROR:  return type
anyarray is not supported for SQL functions".


If you mean specifically to forbid CAST(NULL AS ANYARRAY) in general,
then I'd like to point out that there are columns of type anyarray (at
least pg_catalog.pg_statistic.stavalues1 is one), so the cast is not the
only way to trigger this.






Re: [HACKERS] Relation extension scalability

2016-03-27 Thread Robert Haas
On Sun, Mar 27, 2016 at 8:00 AM, Dilip Kumar  wrote:
> Relation Size
> -------------
> INSERT: 16000 transactions from 32 clients
>
>             Base    v13   v14_1
>            -----  -----  ------
> TPS           37    255     112
> Rel Size    17GB   17GB    18GB
>
> COPY: 32000 transactions from 32 clients
>
>             Base    v13   v14_1
>            -----  -----  ------
> TPS          121    823     427
> Rel Size    11GB   11GB    11GB
>
>
> Scripts attached to this mail
> =============================
> test_size_ins.sh  --> runs the INSERT test and calculates the relation size.
> test_size_copy.sh --> runs the COPY test and calculates the relation size.
> copy_script       --> COPY pgbench script used by test_size_copy.sh
> insert_script     --> INSERT pgbench script used by test_size_ins.sh
> multi_extend_v14_poc_v1.patch --> modified patch of v14.
>
> I also tried modifying v14 from different angles.
>
> One is like below:
> ------------------
> In AddExtraBlock
> {
>     add pages to the FSM one by one, as v13 does;
>     then update the full FSM tree up to the root
> }

Not following this.  Did you attach this version?

> Results:
> --------
> 1. With this, performance is a little lower than with v14, but the problem
> of extra relation size is solved.
> 2. From this we can conclude that the extra relation size in v14 arises
> because, while extending, some pages are not immediately available, and at
> the end some of the pages are left unused.

I agree with that conclusion.  I'm not quite sure where that leaves
us, though.  We can go back to v13, but why isn't that producing extra
pages?  It seems like it should: whenever a bulk extend rolls over to
a new FSM page, concurrent backends will search either the old or the
new one but not both.

Maybe we could do this - not sure if it's what you were suggesting above:

1. Add the pages one at a time, and do RecordPageWithFreeSpace after each one.
2. After inserting them all, go back and update the upper levels of
the FSM tree up to the root, as sketched below.
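
A sketch of that two-pass idea (hypothetical code; the variable names are
assumed, and FreeSpaceMapVacuum is just the closest existing way to rebuild
the upper levels; a real patch might do something more targeted):

	/* pass 1: record each newly added page in the FSM leaf pages */
	for (blockNum = firstBlock; blockNum < firstBlock + nblocks; blockNum++)
		RecordPageWithFreeSpace(relation, blockNum, freespace);

	/* pass 2: propagate the new values up to the FSM root, once */
	FreeSpaceMapVacuum(relation);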

Another idea is:

If ConditionalLockRelationForExtension fails to get the lock
immediately, search the last *two* pages of the FSM for a free page.

Just brainstorming here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] Automatically add -Wold-style-definition?

2016-03-27 Thread Andres Freund
Hi,

Does somebody see a reason not to automatically detect and use
-Wold-style-definition?  Per e.g. 
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=9f7c527af308dcdaba2f0ff9d362d672e8886fb1
that'd be useful: it's an easy-to-make and automatically detectable
mistake, and it doesn't have false positives.
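
For anyone unfamiliar with the flag, it warns about the first definition
style below but not the second (the function names are illustrative):

static void
reset_state()               /* old style: () leaves the parameters undeclared */
{
}

static void
reset_state_v2(void)        /* new style: explicit (void) prototype */
{
}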

Greetings,

Andres Freund




[HACKERS] Sync timezone code with upstream release tzcode2016c

2016-03-27 Thread Tom Lane
Well, that was just about as tedious as I feared it might be, but
attached is a patch for $SUBJECT.  We should apply this, and
probably eventually back-patch it, but it'd be wise to let it
age awhile in HEAD first.  Is anyone interested in reviewing it,
or shall I just push it and see what the buildfarm thinks?

A note about comparing this to upstream: I found that the best
way to do that was to run the IANA source files through a sed
filter like this:

sed -r \
-e 's/^([   ]*)\*\*([   ])/\1 *\2/' \
-e 's/^([   ]*)\*\*$/\1 */' \
-e 's|^\*/| */|' \
-e 's/register //g' \
-e 's/int_fast32_t/int32/g' \
-e 's/int_fast64_t/int64/g' \
-e 's/struct tm\b/struct pg_tm/g' \
-e 's/\btime_t\b/pg_time_t/g' \

and then pgindent them.  (If you pgindent without this, it'll make
a hash of their preferred block-comment format with double **'s.
As long as I had to do that, I figured I could make the filter
deal with substituting typedef names and getting rid of their
overenthusiasm for "register".)

regards, tom lane



sync-timezone-code-2016c.patch.gz
Description: sync-timezone-code-2016c.patch.gz



Re: [HACKERS] Automatically add -Wold-style-definition?

2016-03-27 Thread Tom Lane
Andres Freund  writes:
> Does somebody see a reason not to automatically detect and use
> -Wold-style-definition?

+1 ... unless we have some that are that way intentionally, which
I kinda doubt, but you could soon find out.

regards, tom lane




Re: [HACKERS] Automatically add -Wold-style-definition?

2016-03-27 Thread Andres Freund
On 2016-03-27 17:16:11 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > Does somebody see a reason not to automatically detect and use
> > -Wold-style-definition?
> 
> +1 ... unless we have some that are that way intentionally, which
> I kinda doubt, but you could soon find out.

We don't; I've been running with it enabled for a long while (which is
how I found the issue pointed out upthread).

Andres




Re: [HACKERS] pg_rewind just doesn't fsync *anything*?

2016-03-27 Thread Andres Freund
On 2016-03-09 19:43:52 -0800, Andres Freund wrote:
> Hi,
> 
> how come that the only comment in pg_rewind about fsyncing is '
> void
> close_target_file(void)
> {
> ...
>   /* fsync? */
> }
> 
> Isn't that a bit, uh, minimal for a utility that's likely to be used in
> failover scenarios?
> 
> I think we might actually be "saved" due to
> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=2ce439f33
> because pg_rewind appears to leave the cluster in
> 
> ControlFile_new.state = DB_IN_ARCHIVE_RECOVERY;
> updateControlFile(&ControlFile_new);
> 
> a state that StartupXLOG will treat as needing recovery:
> 
> if (ControlFile->state != DB_SHUTDOWNED &&
> ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
> SyncDataDirectory();
> 
> but that code went in after pg_rewind, so this certainly can't be an
> intentional save.
> 
> I also don't think it's ok that you need to start the cluster to make it
> safe against a crash?
> 
> I guess the easiest fix would be to shell out to initdb -S?

I've pushed a modified version of the fix that Michael posted in
http://archives.postgresql.org/message-id/CAB7nPqRmM%2BCX6bVxw0Y7mMVGMFj1S8kwhevt8TaP83yeFRfbXA%40mail.gmail.com

Andres




Re: [HACKERS] Two division by 0 errors in optimizer/plan/planner.c and optimizer/path/costsize.c

2016-03-27 Thread Tom Lane
Piotr Stefaniak  writes:
> I'm not saying this is necessarily a bug since the whole function deals 
> with floats, but perhaps it's interesting to note that ndistinct can be 
> 0 in src/backend/utils/adt/selfuncs.c:estimate_hash_bucketsize:

I think it's basically cosmetic unless you've got a machine that traps
zero divide, but still that's good to fix.  Thanks for the report!

regards, tom lane




Re: [HACKERS] Two division by 0 errors in optimizer/plan/planner.c and optimizer/path/costsize.c

2016-03-27 Thread Tom Lane
Piotr Stefaniak  writes:
> On the exact same note, something like this (again reduced from what 
> sqlsmith produced):
> leads to vardata.rel->tuples being zero here:
>   if (vardata.rel)
>   ndistinct *= vardata.rel->rows / vardata.rel->tuples;

Ugh.  That's a bit worse because it'll be 0/0, ie you get a NaN.
Thanks for the report.

regards, tom lane




Re: [HACKERS] silent data loss with ext4 / all current versions

2016-03-27 Thread Andres Freund
Hi,

On 2016-03-18 15:08:32 +0900, Michael Paquier wrote:
> +/*
> + * Sync data directory to ensure that what has been generated up to now is
> + * persistent in case of a crash, and this is done once globally for
> + * performance reasons as sync requests on individual files would be
> + * a negative impact on the running time of pg_rewind. This is invoked at
> + * the end of processing once everything has been processed and written.
> + */
> +static void
> +syncTargetDirectory(const char *argv0)
> +{
> + int ret;
> + charexec_path[MAXPGPATH];
> + charcmd[MAXPGPATH];
> +
> + if (dry_run)
> + return;

I think it makes more sense to return after detecting the binary, so
you'd find out about problems around not finding initdb during the dry
run, not later.


> + /* Grab and invoke initdb to perform the sync */
> + if ((ret = find_other_exec(argv0, "initdb",
> +"initdb (PostgreSQL) 
> " PG_VERSION "\n",
> +exec_path)) < 0)
> + {
> + charfull_path[MAXPGPATH];
> +
> + if (find_my_exec(argv0, full_path) < 0)
> + strlcpy(full_path, progname, sizeof(full_path));
> +
> + if (ret == -1)
> + pg_fatal("The program \"initdb\" is needed by %s but 
> was \n"
> +  "not found in the same directory as 
> \"%s\".\n"
> +  "Check your installation.\n", 
> progname, full_path);
> + else
> + pg_fatal("The program \"postgres\" was found by \"%s\" 
> but was \n"
> +  "not the same version as %s.\n"
> +  "Check your installation.\n", 
> progname, full_path);

Wrong binary name.


> + }
> +
> + /* now run initdb */
> + if (debug)
> + snprintf(cmd, MAXPGPATH, "\"%s\" -D \"%s\" -S",
> +  exec_path, datadir_target);
> + else
> + snprintf(cmd, MAXPGPATH, "\"%s\" -D \"%s\" -S > \"%s\"",
> +  exec_path, datadir_target, DEVNULL);
> +
> + if (system(cmd) != 0)
> + pg_fatal("sync of target directory with initdb -S failed\n");
> + pg_log(PG_PROGRESS, "sync of target directory with initdb -S done\n");
> +}

Don't see need for emitting "done", for now at least.


>  /*
> + * Wrapper of rename() similar to the backend version with the same function
> + * name aimed at making the renaming durable on disk. Note that this version
> + * does not fsync the old file before the rename as all the code paths 
> leading
> + * to this function are already doing this operation. The new file is also
> + * normally not present on disk before the renaming so there is no need to
> + * bother about it.

I don't think it's a good idea to skip fsyncing the old file based on
that; it's way too likely that that'll not be done for the next user of
durable_rename.


> + */
> +static int
> +durable_rename(const char *oldfile, const char *newfile)
> +{
> + int fd;
> + charparentpath[MAXPGPATH];
> +
> + if (rename(oldfile, newfile) != 0)
> + {
> + /* durable_rename produced a log entry */

Uh?


> + fprintf(stderr, _("%s: could not rename file \"%s\": %s\n"),
> + progname, current_walfile_name, 
> strerror(errno));

current_walfile_name doesn't look right, that's a largely independent
global variable.


> + if (fsync(fd) != 0)
> + {
> + close(fd);
> + fprintf(stderr, _("%s: could not fsync file \"%s\": %s\n"),
> + progname, newfile, strerror(errno));
> + return -1;

Close should be after the strerror() call (yes, that mistake already
existed once in receivelog.c).


> + fd = open(parentpath, O_RDONLY | PG_BINARY, 0);
> +
> + /*
> +  * Some OSs don't allow us to open directories at all (Windows returns
> +  * EACCES), just ignore the error in that case.  If desired also 
> silently
> +  * ignoring errors about unreadable files. Log others.
> +  */

Comment is not applicable as a whole.


> + if (fd < 0 && (errno == EISDIR || errno == EACCES))
> + return 0;
> + else if (fd < 0)
> + {
> + fprintf(stderr, _("%s: could not open file \"%s\": %s\n"),
> + progname, parentpath, strerror(errno));
> + return -1;
> + }
> +
> + if (fsync(fd) != 0)
> + {
> + close(fd);
> + fprintf(stderr, _("%s: could not fsync file \"%s\": %s\n"),
> + progname, parentpath, strerror(errno));
> + return -1;

close() needs to be moved again.


It'd be easier to apply these if the rate of trivial issues were lower
:(.


Attached is an heavil

[HACKERS] backup tools ought to ensure created backups are durable

2016-03-27 Thread Andres Freund
Hi,

As pointed out in
http://www.postgresql.org/message-id/20160327232509.v5wgac5vskuse...@awork2.anarazel.de
our backup tools (i.e. pg_basebackup, pg_dump[all]), currently don't
make any efforts to ensure their output is durable.

I think for backup tools of possibly critical data, that's pretty much
unacceptable.

There are cases where we can't ensure durability (e.g. pg_dump | gzip >
file), but it's out of our hands in that case.
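
The missing step, for tools that write a named output file, is roughly the
following (a sketch; error handling is reduced to return codes, and a
complete fix would also fsync the containing directory):

#include <fcntl.h>
#include <unistd.h>

static int
fsync_output_file(const char *path)
{
	int			fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	if (fsync(fd) != 0)
	{
		close(fd);
		return -1;
	}
	return close(fd);
}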

Greetings,

Andres Freund




Re: [HACKERS] pg_rewind just doesn't fsync *anything*?

2016-03-27 Thread Michael Paquier
On Mon, Mar 28, 2016 at 6:52 AM, Andres Freund  wrote:
> I've pushed a modified version of the fix that Michael posted in
> http://archives.postgresql.org/message-id/CAB7nPqRmM%2BCX6bVxw0Y7mMVGMFj1S8kwhevt8TaP83yeFRfbXA%40mail.gmail.com

Thanks.
-- 
Michael




Re: [HACKERS] backup tools ought to ensure created backups are durable

2016-03-27 Thread Michael Paquier
On Mon, Mar 28, 2016 at 8:30 AM, Andres Freund  wrote:
> As pointed out in
> http://www.postgresql.org/message-id/20160327232509.v5wgac5vskuse...@awork2.anarazel.de
> our backup tools (i.e. pg_basebackup, pg_dump[all]), currently don't
> make any efforts to ensure their output is durable.
>
> I think for backup tools of possibly critical data, that's pretty much
> unacceptable.

Definitely agreed; once a backup/dump has been taken and those
utilities exit, we had better ensure that it is durably on disk.
For pg_basebackup and pg_dump, as everything except pg_dump/plain
requires a target directory for the location of the output result, we
really can make things better.
-- 
Michael




Re: [HACKERS] Relation extension scalability

2016-03-27 Thread Dilip Kumar
On Mon, Mar 28, 2016 at 1:55 AM, Robert Haas  wrote:

> > One is like below:
> > ------------------
> > In AddExtraBlock
> > {
> >     add pages to the FSM one by one, as v13 does;
> >     then update the full FSM tree up to the root
> > }
>
> Not following this.  Did you attach this version?
>

No, I did not attach it. I tried this during a rough experiment and did not
produce a patch; I will send it.


> > Results:
> > --------
> > 1. With this, performance is a little lower than with v14, but the problem
> > of extra relation size is solved.
> > 2. From this we can conclude that the extra relation size in v14 arises
> > because, while extending, some pages are not immediately available, and
> > at the end some of the pages are left unused.
>
> I agree with that conclusion.  I'm not quite sure where that leaves
> us, though.  We can go back to v13, but why isn't that producing extra
> pages?  It seems like it should: whenever a bulk extend rolls over to
> a new FSM page, concurrent backends will search either the old or the
> new one but not both.
>
> Maybe we could do this - not sure if it's what you were suggesting above:
>
> 1. Add the pages one at a time, and do RecordPageWithFreeSpace after each
> one.
> 2. After inserting them all, go back and update the upper levels of
> the FSM tree up to the root.
>

Yes, exactly; that is what I wanted to explain above.


> Another idea is:
>
> If ConditionalLockRelationForExtension fails to get the lock
> immediately, search the last *two* pages of the FSM for a free page.
>
> Just brainstorming here.


I think this is the better option: since we will search the last *two* pages
of the FSM tree, there is no need to update the upper levels of the FSM tree.
Right?

I will test and post the result with this option.
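
For reference, the first approach above - add pages one at a time, then fix
the upper levels - might look roughly like this (a sketch only: the function
name and free-space constant are placeholders, and FreeSpaceMapVacuum stands
in for "update the upper levels of the FSM tree up to the root"):

/* Sketch: record each newly extended page in the FSM one at a time ... */
static void
add_extra_blocks_sketch(Relation relation, BlockNumber firstBlock, int nblocks)
{
	int			i;

	for (i = 0; i < nblocks; i++)
		RecordPageWithFreeSpace(relation, firstBlock + i,
								BLCKSZ - SizeOfPageHeaderData);

	/* ... then fix the upper levels of the FSM tree in one pass. */
	FreeSpaceMapVacuum(relation);
}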


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] VS 2015 support in src/tools/msvc

2016-03-27 Thread Michael Paquier
On Fri, Mar 25, 2016 at 9:48 PM, Andrew Dunstan  wrote:
> OK, sounds good.

Just a side-note. Andres has pushed the fix for the GinIs* macros as
af4472bc, making patch 0003 from the last series useless now.
-- 
Michael


Re: [HACKERS] Draft release notes for next week's releases

2016-03-27 Thread Peter Geoghegan
On Sat, Mar 26, 2016 at 4:34 PM, Tom Lane  wrote:
> Probably the most discussion-worthy item is whether we can say
> anything more about the strxfrm mess.  Should we make a wiki
> page about that and have the release note item link to it?

I think that there is an argument against doing so, which is that
right now, all we have to offer on that are weasel words. However, I'm
still in favor of a Wiki page, because I would not be at all surprised
if our understanding of this problem evolved, and we were able to
offer better answers in several weeks. Realistically, it will probably
take at least that long before affected users even start to think
about this.


-- 
Peter Geoghegan


Re: [HACKERS] Draft release notes for next week's releases

2016-03-27 Thread David G. Johnston
On Sun, Mar 27, 2016 at 8:43 PM, Peter Geoghegan  wrote:

> On Sat, Mar 26, 2016 at 4:34 PM, Tom Lane  wrote:
> > Probably the most discussion-worthy item is whether we can say
> > anything more about the strxfrm mess.  Should we make a wiki
> > page about that and have the release note item link to it?
>
> I think that there is an argument against doing so, which is that
> right now, all we have to offer on that are weasel words. However, I'm
> still in favor of a Wiki page, because I would not be at all surprised
> if our understanding of this problem evolved, and we were able to
> offer better answers in several weeks. Realistically, it will probably
> take at least that long before affected users even start to think
> about this.
>

One question to debate is whether publishing a list of "known" failures
(collated from the program runs lots of people performed) would do more harm
than good.  Personally, I'd rather see a list of known failures and evaluate
my situation objectively (i.e., a large index but no reported problem on my
combination of locale and platform).  I understand that a lack of evidence is
not proof that I am unaffected at this stage in the game.  Having something I
can execute on my server to try to verify behavior - irrespective of the
correctness of the indexes themselves - would be welcome.

David J.


[HACKERS] Nested function

2016-03-27 Thread Sridhar N Bamandlapally
Hi

Is there any way to create a nested function?

An Oracle-to-Postgres migration requires referencing the outer (parent)
function's variables inside the nested function, without passing them in as
nested-function parameters.

Oracle sample:
---
create or replace function f1(n number) return number
is
  vs number := 1;
  function nf1(m number) return number is
  begin
    return vs + m + n;
  end;
begin
  return nf1(2);
end;
/

run:

SQL> select f1(9) from dual;

     F1(9)
----------
        12



Thanks
Sridhar BN


Re: [HACKERS] Nested function

2016-03-27 Thread Pavel Stehule
Hi

2016-03-28 6:14 GMT+02:00 Sridhar N Bamandlapally :

> Hi
>
> Is there any way to create a nested function?
>

Some languages support this feature, like plv8, but plpgsql doesn't
support it.

You have to use two functions and some implementation of session variables.
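
For instance, a minimal sketch of that workaround using custom settings as
the session variables (the parameter names myapp.vs and myapp.n are made up
for illustration):

-- The "nested" function reads the outer function's state from settings.
CREATE OR REPLACE FUNCTION nf1(m integer) RETURNS integer AS $$
BEGIN
  RETURN current_setting('myapp.vs')::integer + m
       + current_setting('myapp.n')::integer;
END;
$$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION f1(n integer) RETURNS integer AS $$
BEGIN
  PERFORM set_config('myapp.vs', '1', true);     -- local to the transaction
  PERFORM set_config('myapp.n', n::text, true);
  RETURN nf1(2);
END;
$$ LANGUAGE plpgsql;

-- select f1(9);  returns 12, matching the Oracle sample.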

Regards

Pavel

>
> An Oracle-to-Postgres migration requires referencing the outer (parent)
> function's variables inside the nested function, without passing them in as
> nested-function parameters.
>
> Oracle sample:
> ---
> create or replace function f1(n number) return number
> is
>   vs number := 1;
>   function nf1(m number) return number is
>   begin
>     return vs + m + n;
>   end;
> begin
>   return nf1(2);
> end;
> /
>
> run:
>
> SQL> select f1(9) from dual;
>
>      F1(9)
> ----------
>         12
>
>
>
> Thanks
> Sridhar BN
>


Re: [HACKERS] Nested function

2016-03-27 Thread David G. Johnston
On Sun, Mar 27, 2016 at 9:14 PM, Sridhar N Bamandlapally <
sridhar@gmail.com> wrote:

> Hi
>
> Is there any way to create a nested function?
>
> An Oracle-to-Postgres migration requires referencing the outer (parent)
> function's variables inside the nested function, without passing them in as
> nested-function parameters.
>
> Oracle sample:
> ---
> create or replace function f1(n number) return number
> is
>   vs number := 1;
>   function nf1(m number) return number is
>   begin
>     return vs + m + n;
>   end;
> begin
>   return nf1(2);
> end;
> /
>
> run:
>
> SQL> select f1(9) from dual;
>
>      F1(9)
> ----------
>         12
>
>
PostgreSQL's pl/pgsql language doesn't grok closures.  You are welcome to
write "f1" in a language that does.  Perl and Python are built-in and many
others are available.  I assume at least one of them can do this.
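
For instance, a sketch in PL/Python (assuming the plpythonu extension is
available), where the inner function simply closes over the outer
function's names:

CREATE EXTENSION IF NOT EXISTS plpythonu;

CREATE OR REPLACE FUNCTION f1(n integer) RETURNS integer AS $$
vs = 1
def nf1(m):
    # a real closure: vs and n come from the enclosing scope
    return vs + m + n
return nf1(2)
$$ LANGUAGE plpythonu;

-- select f1(9);  returns 12, matching the Oracle sample.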

David J.


Re: [HACKERS] silent data loss with ext4 / all current versions

2016-03-27 Thread Michael Paquier
On Mon, Mar 28, 2016 at 8:25 AM, Andres Freund  wrote:
> On 2016-03-18 15:08:32 +0900, Michael Paquier wrote:
>> + fprintf(stderr, _("%s: could not rename file \"%s\": %s\n"),
>> + progname, current_walfile_name, strerror(errno));
>
> current_walfile_name doesn't look right, that's a largely independent
> global variable.

Stupid mistake.

>> + if (fsync(fd) != 0)
>> + {
>> + close(fd);
>> + fprintf(stderr, _("%s: could not fsync file \"%s\": %s\n"),
>> + progname, newfile, strerror(errno));
>> + return -1;
>
> Close should be after the strerror() call (yes, that mistake already
> existed once in receivelog.c).

Right.
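
The corrected ordering, for clarity (report while errno is still intact,
then close):

if (fsync(fd) != 0)
{
	fprintf(stderr, _("%s: could not fsync file \"%s\": %s\n"),
			progname, newfile, strerror(errno));
	close(fd);			/* close() may clobber errno, so report first */
	return -1;
}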

> It'd be easier to apply these if the rate of trivial issues were lower
> :(.

Sorry about that. I'll be more careful.

> Attached is a heavily revised version of the patch. Besides the above,
> I've:
> * copied fsync_fname_ext from initdb, I think it's easier to understand
>   code this way, and it'll make unifying the code easier

OK.

> * added fsyncing of the directory in mark_file_as_archived()
> * The WAL files also need to be fsynced when created in open_walfile():
>   - otherwise the directory entry itself isn't safely persistent, as we
> don't fsync the directory in the stream->synchronous fsync() cases.
>   - we refuse to resume in open_walfile(), if a file isn't 16MB when
> restarting. Without an fsync that's actually not unlikely after a
> crash. Even with an fsync that's not guaranteed not to happen, but
> the chance of it is much lower.
> I'm too tired to push this at the eleventh hour. Besides a heavily
> revised patch, backpatching will likely include a number of conflicts.
> If somebody in the US has the energy to take care of this...

Close enough to the US. Attached are backpatchable versions based on
the corrected version you sent. 9.3 and 9.4 share the same patch; more
work was needed for 9.2, but that's not huge either.
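
In outline, the rename-plus-directory-fsync pattern these patches are
after looks like this (a sketch, not the patches themselves, assuming the
platform lets a directory be opened read-only for fsync, as Linux does;
the file itself is assumed to have been fsynced already):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Sketch: make the rename of a finished WAL segment durable. */
static int
durable_rename_sketch(const char *oldpath, const char *newpath,
					  const char *dirpath)
{
	int			fd;

	if (rename(oldpath, newpath) != 0)
		return -1;

	/* fsync the directory so the renamed entry itself is on disk */
	fd = open(dirpath, O_RDONLY);
	if (fd < 0)
		return -1;
	if (fsync(fd) != 0)
	{
		close(fd);
		return -1;
	}
	close(fd);
	return 0;
}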

> So we're going to have another round of fsync stuff in the next set of
> releases anyway...

Yes, seeing how 9.5.2 is close by, I think that it would be wiser to
push this stuff after the upcoming minor release.
-- 
Michael


pg_receivexlog-sync-94-93.patch
Description: application/octet-stream


pg_receivexlog-sync-92.patch
Description: application/octet-stream


pg_receivexlog-sync-95.patch
Description: application/octet-stream


Re: [HACKERS] Relation extension scalability

2016-03-27 Thread Amit Kapila
On Mon, Mar 28, 2016 at 1:55 AM, Robert Haas  wrote:
>
> On Sun, Mar 27, 2016 at 8:00 AM, Dilip Kumar 
wrote:
>
> > Results:
> > --------
> > 1. With this, performance is a little less than with v14, but the problem
> > of extra relation size is solved.
> > 2. With this, we can conclude that the extra relation size in v14 arises
> > because, while extending, some pages are not immediately available, and
> > at the end some pages are left unused.
>
> I agree with that conclusion.  I'm not quite sure where that leaves
> us, though.  We can go back to v13, but why isn't that producing extra
> pages?  It seems like it should: whenever a bulk extend rolls over to
> a new FSM page, concurrent backends will search either the old or the
> new one but not both.
>

I have not debugged the flow, but looking at the v13 code, it looks like it
will search both the old and the new.  In
GetPageWithFreeSpaceExtended() -> fsm_search_from_addr() -> fsm_search_avail(),
the basic idea of the search is: start from the target slot; at every step,
move one node to the right, then climb up to the parent; stop when we reach a
node with enough free space (as we must, since the root has enough space).
So shouldn't it be able to find the new FSM page where the bulk extend rolls
over?
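
That idea, in a heavily simplified sketch (not the actual fsm_search_avail
code, which also handles wraparound, locking, and descending back down to a
leaf):

/*
 * Slot-tree search sketch over one FSM page's binary tree, stored as an
 * array with the root at index 0 (parent of node i is (i - 1) / 2).
 */
static int
slot_tree_search_sketch(const unsigned char *nodes, int target, int min_cat)
{
	int			nodeno = target;

	while (nodes[nodeno] < min_cat)
	{
		if (nodeno == 0)
			return -1;			/* even the root lacks enough space */
		nodeno++;				/* move one node to the right ... */
		nodeno = (nodeno - 1) / 2;	/* ... then climb up to the parent */
	}
	return nodeno;				/* a node with enough free space */
}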


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Relation extension scalability

2016-03-27 Thread Dilip Kumar
On Mon, Mar 28, 2016 at 11:00 AM, Amit Kapila 
wrote:

> I have not debugged the flow, but looking at the v13 code, it looks like it
> will search both the old and the new.  In
> GetPageWithFreeSpaceExtended() -> fsm_search_from_addr() -> fsm_search_avail(),
> the basic idea of the search is: start from the target slot; at every step,
> move one node to the right, then climb up to the parent; stop when we reach a
> node with enough free space (as we must, since the root has enough space).
> So shouldn't it be able to find the new FSM page where the bulk extend rolls
> over?
>

The FSM is actually a multi-level tree, so each FSM page contains one slot
tree.

fsm_search_avail() searches only the slot tree inside one FSM page, but here
we want to move on to the next FSM page.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-27 Thread Dilip Kumar
On Sun, Mar 27, 2016 at 5:48 PM, Andres Freund  wrote:

>
> What's sizeof(BufferDesc) after applying these patches? It had better
> be <= 64...
>

It is 72.
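
For context, 64 bytes is the usual cache-line size: a descriptor that fits
in one line can be laid out so that pinning one buffer never dirties a
neighbor's line. A self-contained illustration of the padding idea (the
types here are stand-ins, not the real BufferDesc):

#define CACHE_LINE_SIZE 64

typedef struct BufferDescLike		/* stand-in: pretend sizeof is 72 */
{
	char		payload[72];
} BufferDescLike;

/* Pad each array element to a cache-line multiple to avoid false sharing. */
typedef union BufferDescPaddedLike
{
	BufferDescLike desc;
	char		pad[2 * CACHE_LINE_SIZE];	/* 72 > 64 forces two lines */
} BufferDescPaddedLike;

_Static_assert(sizeof(BufferDescPaddedLike) % CACHE_LINE_SIZE == 0,
			   "buffer descriptors should be cache-line sized");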


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com