Re: [HACKERS] Relation extension scalability

2015-03-29 Thread Amit Kapila
On Mon, Mar 30, 2015 at 12:26 AM, Andres Freund and...@2ndquadrant.com
wrote:

 Hello,

 Currently bigger shared_buffers settings don't combine well with
 relations being extended frequently. Especially if many/most pages have
 a high usagecount and/or are dirty and the system is IO constrained.

 As a quick recap, relation extension basically works like:
 1) We lock the relation for extension
 2) ReadBuffer*(P_NEW) is being called, to extend the relation
 3) smgrnblocks() is used to find the new target block
 4) We search for a victim buffer (via BufferAlloc()) to put the new
block into
 5) If dirty the victim buffer is cleaned
 6) The relation is extended using smgrextend()
 7) The page is initialized


 The problems come from 4) and 5) potentially each taking a fair
 while. If the working set mostly fits into shared_buffers 4) can
 require iterating over all shared buffers several times to find a
 victim buffer. If the IO subsystem is busy and/or we've hit the kernel's
 dirty limits 5) can take a couple seconds.


In the past, I have observed in one of the write-oriented tests that
backends have to flush pages by themselves many times, so in the above
situation that can lead to a more severe bottleneck.

 I've prototyped solving this for heap relations moving the smgrnblocks()
 + smgrextend() calls to RelationGetBufferForTuple(). With some care
 (including a retry loop) it's possible to only do those two under the
 extension lock. That indeed fixes problems in some of my tests.


So does this mean that the problem is due to contention on the extension
lock?

 I'm not sure whether the above is the best solution however.

Another thing to note here is that during extension we extend by just
one block.  Wouldn't it make sense to increment by some bigger number
(we could even take input from the user, so the user can specify how
much to autoextend a relation when it doesn't have any empty space)?
During mdextend(), we might still increase by just one block, but we
could register a request for a background process to increase the size,
similar to what is done for fsync.
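
For illustration only, here is a minimal sketch of what multi-block
extension could look like; the helper name, the fixed block count, and the
overall shape are invented for this example and are not from any posted
patch (it assumes the usual backend headers):

    #include "postgres.h"
    #include "storage/lmgr.h"
    #include "storage/smgr.h"
    #include "utils/rel.h"

    /*
     * Hypothetical sketch: extend the relation by nblocks zero-filled pages
     * at once while holding the extension lock, so later inserters find
     * free pages without re-taking the lock.
     */
    static BlockNumber
    extend_relation_by(Relation relation, int nblocks)
    {
        char        zerobuf[BLCKSZ];
        BlockNumber first;
        int         i;

        memset(zerobuf, 0, sizeof(zerobuf));

        LockRelationForExtension(relation, ExclusiveLock);
        RelationOpenSmgr(relation);
        first = smgrnblocks(relation->rd_smgr, MAIN_FORKNUM);
        for (i = 0; i < nblocks; i++)
            smgrextend(relation->rd_smgr, MAIN_FORKNUM, first + i,
                       zerobuf, false);
        UnlockRelationForExtension(relation, ExclusiveLock);

        return first;       /* caller initializes pages starting here */
    }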

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Relation extension scalability

2015-03-29 Thread Andres Freund
On 2015-03-29 20:02:06 -0400, Robert Haas wrote:
 On Sun, Mar 29, 2015 at 2:56 PM, Andres Freund and...@2ndquadrant.com
  As a quick recap, relation extension basically works like:
  1) We lock the relation for extension
  2) ReadBuffer*(P_NEW) is being called, to extend the relation
  3) smgrnblocks() is used to find the new target block
  4) We search for a victim buffer (via BufferAlloc()) to put the new
 block into
  5) If dirty the victim buffer is cleaned
  6) The relation is extended using smgrextend()
  7) The page is initialized
 
  The problems come from 4) and 5) potentially each taking a fair
  while. If the working set mostly fits into shared_buffers 4) can
  require iterating over all shared buffers several times to find a
  victim buffer. If the IO subsystem is busy and/or we've hit the kernel's
  dirty limits 5) can take a couple seconds.
 
 Interesting.  I had always assumed the bottleneck was waiting for the
 filesystem to extend the relation.

That might be the case sometimes, but it's not what I've actually
observed so far. I think most modern filesystems, by doing preallocation,
have resolved this to some degree.

  Secondly I think we could maybe remove the requirement of needing an
  extension lock altogether. It's primarily required because we're
  worried that somebody else can come along, read the page, and initialize
  it before us. ISTM that could be resolved by *not* writing any data via
  smgrextend()/mdextend(). If we instead only do the write once we've read
  in & locked the page exclusively there's no need for the extension
  lock. We probably still should write out the new page to the OS
  immediately once we've initialized it; to avoid creating sparse files.
 
  The other reason we need the extension lock is that code like
  lazy_scan_heap() and btvacuumscan() that tries to avoid initializing
  pages that are about to be initialized by the extending backend. I think
  we should just remove that code and deal with the problem by retrying in
  the extending backend; that's why I think moving extension to a
  different file might be helpful.
 
 I thought the primary reason we did this is because we wanted to
 write-and-fsync the block so that, if we're out of disk space, any
 attendant failure will happen before we put data into the block.

Well, we only write and register an fsync. AFAICS we don't actually
perform the fsync at that point. I don't think having to do the
fsync() necessarily precludes removing the extension lock.

 Once we've initialized the block, a subsequent failure to write or
 fsync it will be hard to recover from;

At the very least the buffer shouldn't become dirty before we've
successfully written it once, right. It seems quite doable to achieve that
without the lock though. We'll have to do the write without going
through the buffer manager, but that seems doable.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Rounding to even for numeric data type

2015-03-29 Thread Michael Paquier
On Mon, Mar 30, 2015 at 4:51 AM, James Cloos cl...@jhcloos.com wrote:
 MP == Michael Paquier michael.paqu...@gmail.com writes:

 MP So, attached is a patch that does 1) and 2) to make clear to the
 MP user how numeric and double precision behave regarding rounding.
 MP I am adding it to CF 2015-06 to keep track of it...

 Given that the examples show -2.5 rounds to -3, the IEEE term is
 roundTiesToAway, and the typical conversational English is round ties
 away from zero.

Ah, thanks for the correct wording. Fixed in the attached.

 RoundUp means towards +Infinity.

 754 specifies that for decimal, either roundTiesToEven or roundTiesToAway
 are acceptable defaults, and which of the two applies is language dependent.
 Does ANSI SQL say anything about how numeric should round?

 In general, for decimals (or anything other than binary), there are
 twelve possible roundings:

  ToEven ToOdd AwayFromZero ToZero Up Down
  TiesToEven TiesToOdd TiesAwayFromZero TiesToZero TiesUp TiesDown

 (Up is the same as ceil(3), Down as floor(3).)

Well, I am not sure about that... But reading this thread, changing the
default rounding sounds unwelcome. So it may be better to just document
the rounding method used now, with perhaps a mention that it is not
completely in line with the SQL spec, if that turns out to be the case.
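
As a side note on where the difference comes from (this is my reading of
the code, not something verified in this thread): float8 round() appears to
go through the C library's rint(), which in the default rounding mode
rounds ties to the nearest even value, while numeric rounding is done in
numeric.c and rounds ties away from zero. A tiny standalone C check of the
rint() half of that:

    #include <math.h>
    #include <stdio.h>

    int
    main(void)
    {
        /* with the default FE_TONEAREST mode, ties go to the even neighbour */
        printf("rint(1.5) = %.0f, rint(2.5) = %.0f\n", rint(1.5), rint(2.5));
        /* prints: rint(1.5) = 2, rint(2.5) = 2 */
        return 0;
    }
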
-- 
Michael
From ae28d91519854e6d47d2c864fa26b65c70bb0526 Mon Sep 17 00:00:00 2001
From: Michael Paquier michael@otacoo.com
Date: Sun, 29 Mar 2015 19:46:50 +0900
Subject: [PATCH] Precise rounding behavior of numeric and double precision in
 docs

Regression tests improving the coverage in this area are added as well.
---
 doc/src/sgml/datatype.sgml| 19 +++
 src/test/regress/expected/int2.out| 20 
 src/test/regress/expected/int4.out| 20 
 src/test/regress/expected/int8.out| 20 
 src/test/regress/expected/numeric.out | 24 
 src/test/regress/sql/int2.sql | 10 ++
 src/test/regress/sql/int4.sql | 10 ++
 src/test/regress/sql/int8.sql | 10 ++
 src/test/regress/sql/numeric.sql  | 10 ++
 9 files changed, 143 insertions(+)

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index da1f25f..eb131c3 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -612,6 +612,25 @@ NUMERIC
  equivalent.  Both types are part of the <acronym>SQL</acronym>
  standard.
 </para>
+
+<para>
+ When using the <function>round</function> function, the <type>numeric</type>
+ type rounds ties away from zero, and the <type>double precision</type>
+ type rounds ties to the nearest even number.
+
+<programlisting>
+SELECT round(1.5::numeric), round(2.5::numeric);
+ round | round
+-------+-------
+     2 |     3
+(1 row)
+SELECT round(1.5::double precision), round(2.5::double precision);
+ round | round
+-------+-------
+     2 |     2
+(1 row)
+</programlisting>
+</para>
 </sect2>
 
 
diff --git a/src/test/regress/expected/int2.out b/src/test/regress/expected/int2.out
index 311fe73..3ea4ed9 100644
--- a/src/test/regress/expected/int2.out
+++ b/src/test/regress/expected/int2.out
@@ -286,3 +286,23 @@ FROM (VALUES (-2.5::float8),
   2.5 |  2
 (7 rows)
 
+-- check rounding when casting from numeric
+SELECT x, x::int2 AS int2_value
+FROM (VALUES (-2.5::numeric),
+ (-1.5::numeric),
+ (-0.5::numeric),
+ (0.0::numeric),
+ (0.5::numeric),
+ (1.5::numeric),
+ (2.5::numeric)) t(x);
+  x   | int2_value 
+--+
+ -2.5 | -3
+ -1.5 | -2
+ -0.5 | -1
+  0.0 |  0
+  0.5 |  1
+  1.5 |  2
+  2.5 |  3
+(7 rows)
+
diff --git a/src/test/regress/expected/int4.out b/src/test/regress/expected/int4.out
index 83fe022..372fd4d 100644
--- a/src/test/regress/expected/int4.out
+++ b/src/test/regress/expected/int4.out
@@ -383,3 +383,23 @@ FROM (VALUES (-2.5::float8),
   2.5 |  2
 (7 rows)
 
+-- check rounding when casting from numeric
+SELECT x, x::int4 AS int4_value
+FROM (VALUES (-2.5::numeric),
+ (-1.5::numeric),
+ (-0.5::numeric),
+ (0.0::numeric),
+ (0.5::numeric),
+ (1.5::numeric),
+ (2.5::numeric)) t(x);
+  x   | int4_value 
+--+
+ -2.5 | -3
+ -1.5 | -2
+ -0.5 | -1
+  0.0 |  0
+  0.5 |  1
+  1.5 |  2
+  2.5 |  3
+(7 rows)
+
diff --git a/src/test/regress/expected/int8.out b/src/test/regress/expected/int8.out
index da8be51..ed0bd34 100644
--- a/src/test/regress/expected/int8.out
+++ b/src/test/regress/expected/int8.out
@@ -866,3 +866,23 @@ FROM (VALUES (-2.5::float8),
   2.5 |  2
 (7 rows)
 
+-- check rounding when casting from numeric
+SELECT x, x::int8 AS int8_value
+FROM (VALUES (-2.5::numeric),
+ (-1.5::numeric),
+ 

Re: [HACKERS] Relation extension scalability

2015-03-29 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 I thought the primary reason we did this is because we wanted to
 write-and-fsync the block so that, if we're out of disk space, any
 attendant failure will happen before we put data into the block.  Once
 we've initialized the block, a subsequent failure to write or fsync it
 will be hard to recover from; basically, we won't be able to
 checkpoint any more.  If we discover the problem while the block is
 still all-zeroes, the transaction that uncovers the problem errors
 out, but the system as a whole is still OK.

Yeah.  As Andres says, the fsync is not an important part of that,
but we do expect that ENOSPC will happen during the initial write()
if it's going to happen.

To some extent that's an obsolete assumption, I'm afraid --- I believe
that some modern filesystems don't necessarily overwrite the previous
version of a block, which would mean that they are capable of failing
with ENOSPC even during a re-write of a previously-written block.
However, the possibility of filesystem misfeasance of that sort doesn't
excuse us from having a clear recovery strategy for failures during
relation extension.

regards, tom lane




Re: [HACKERS] PATCH: pgbench - merging transaction logs

2015-03-29 Thread Fabien COELHO


Hello Tomas,


The results are as follows:

 * 1 thread 33 runs median tps (average is consistent):
 - no logging:22062
 - separate logging:  19360  (-12.2%)
 - merged logging:19326  (-12.4%, not significant from previous)


Interesting. What hardware is this?


Dell PowerEdge R720 with 2 Intel Xeon CPU E5-2660 2.20GHz (8 cores and 16 
threads per processor, so 32 threads in total), running with Linux 3.13 
(Ubuntu trusty).



I wouldn't be surprised by this behavior on a multi-socket system, [...]


There are 2 sockets.


So my overall conclusion is:

(1) The simple thread-shared file approach would save pgbench from
post-processing merge-sort heavy code, for a reasonable cost.


No it wouldn't - you're missing the fact that the proposed approach
(shared file + fprintf) only works with the raw transaction log.

It does not work with the aggregated log - the threads would have to
track the progress of the other threads somehow, in a very non-trivial
way (e.g. what if one of the threads executes a long query, and thus
does not send the results for a long time?).


The counters are updated when the transaction is finished anyway?

Another option would be to update shared aggregated results, but that 
requires locking.


That is what I had in mind. ISTM that the locking impact would be much
lower than for logging: the data is just locked for a counter update, and
if counters are per-thread, a conflict may only occur when the data are
gathered for actual logging, which would be rather infrequent. Even if the
counters are shared, the lock would be held for such a short time that the
conflict probability would be low. So I do not see this one as a
significant performance issue.
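
As a minimal sketch of the kind of locking being discussed here (purely
illustrative; the struct and function names are invented and this is not
pgbench code), per-thread counters could be merged into shared totals under
a mutex held only for a couple of additions:

    #include <pthread.h>
    #include <stdint.h>

    typedef struct AggCounters
    {
        int64_t cnt;        /* transactions in the interval */
        int64_t lat_sum;    /* summed latency in the interval */
    } AggCounters;

    static AggCounters      shared_totals;
    static pthread_mutex_t  totals_lock = PTHREAD_MUTEX_INITIALIZER;

    /* called by each thread at the end of an aggregation interval */
    static void
    flush_interval(const AggCounters *local)
    {
        pthread_mutex_lock(&totals_lock);   /* held for two additions only */
        shared_totals.cnt += local->cnt;
        shared_totals.lat_sum += local->lat_sum;
        pthread_mutex_unlock(&totals_lock);
    }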



(2) The feature would not be available for the thread-emulation with
this approach, but I do not see this as a particular issue as I
think that it is pretty much only dead code and a maintenance burden.


I'm not willing to investigate that, nor am I willing to implement
another feature that works only sometimes (I've done that in the past,
and I find it a bad practice).


Hmmm. Keeping an obsolete feature that has a significant impact on how
other features can be implemented, and is thus basically a maintenance
burden, does not look like best practice *either*.



If someone else is willing to try to eliminate the thread emulation, I
won't object to that.


Hmmm. I'll try to trigger a discussion in another thread to test the idea.

But as I pointed out above, simple fprintf is not going to work for the 
aggregated log - solving that will need more code (e.g. maintaining 
aggregated results for all threads, requiring additional locking etc).


The code for that is probably simple and short, and my wish is to avoid
an external merge-sort post-processing step if possible, which is not
especially cheap anyway, either in code or in time.



(3) Optimizing doLog from its current fprintf-based implementation
may be a good thing.


That's probably true. The simplest thing we can do right now is
buffering the data into larger chunks and writing those chunks.
That amortizes the costs of locking.


If it is buffered in the process, that would mean more locking. If it is
buffered per thread, that would result in out-of-order logs. Would that be
an issue? It would be fine with me.



Using O_APPEND, as suggested by Andres, seems like a promising idea.


I tried that with a shared file handle, but the impact seemed negligible.
The figures I reported used it, btw.


I also tried to open the same file in append mode from all threads, with
positive performance effects, but then the flush did not occur at line
boundaries, so there was some mangling in the result.
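
To make the line-boundary issue concrete, here is an illustrative C sketch
(not pgbench code; the function name is invented): with O_APPEND each
write() call is appended atomically, so emitting one complete line per
write() avoids the mid-line interleaving described above, whereas stdio may
flush at arbitrary buffer boundaries.

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* each thread opens its own descriptor on the shared log file:
     *   int fd = open("pgbench_log", O_WRONLY | O_CREAT | O_APPEND, 0644);
     */
    static void
    log_one_line(int fd, long client_id, double latency_us)
    {
        char line[128];
        int  len = snprintf(line, sizeof(line), "%ld %.3f\n",
                            client_id, latency_us);

        if (len > 0 && len < (int) sizeof(line))
            (void) write(fd, line, (size_t) len);  /* one line, one write() */
    }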


--
Fabien.




Re: [HACKERS] PATCH: pgbench - merging transaction logs

2015-03-29 Thread Fabien COELHO



I also implemented a quick and dirty version for a merge log based on
sharing a file handle (append mode + sprintf + fputs).


I tried the append + per-thread 2KB buffered sprintf + fputs when full,
with the same number of runs. The logs are out of order by chunks, the
overhead seems higher with 1 thread, but there is no extra overhead
with 12 threads.
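
For reference, a rough sketch of that per-thread buffering scheme (purely
illustrative, with invented names; not the code that was actually tested):

    #include <stdio.h>
    #include <string.h>

    #define LOG_BUF_SIZE 2048

    typedef struct ThreadLog
    {
        char   buf[LOG_BUF_SIZE];
        size_t used;
    } ThreadLog;

    /* append one formatted line; flush the whole chunk when it would not fit */
    static void
    buffered_log_line(ThreadLog *tl, FILE *logfile, const char *line)
    {
        size_t len = strlen(line);

        if (len >= sizeof(tl->buf))
        {
            /* oversized line: flush pending data, then write it directly */
            if (tl->used > 0)
            {
                fwrite(tl->buf, 1, tl->used, logfile);
                tl->used = 0;
            }
            fputs(line, logfile);
            return;
        }
        if (tl->used + len > sizeof(tl->buf) - 1)
        {
            fwrite(tl->buf, 1, tl->used, logfile);  /* one chunk at a time */
            tl->used = 0;
        }
        memcpy(tl->buf + tl->used, line, len);
        tl->used += len;
        tl->buf[tl->used] = '\0';
    }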


 The results are as follows:

 * 1 thread 33 runs median tps (average is consistent):
 - no logging:22062
 - separate logging:  19360  (-12.2%)
 - merged logging:19326  (-12.4%, not significant from previous)


 - buf merged logging:   18751 (-15%, seems significant)


The worst overhead I could trigger is with 12 threads:

 * 12 threads 35 runs median tps (average is consistent)
 - no logging:   155917
 - separate logging: 124695 (-20.0%)
 - merged logging:   119349 (-23.5%)


 - buf merged logging: 124914 (-19.9%, not significant from separate)

--
Fabien.




Re: [HACKERS] TRAP: BadArgument - mcxt.c, Line: 813

2015-03-29 Thread Erik Rijkers
On Sun, March 29, 2015 17:40, Erik Rijkers wrote:
 With the latest server

 testdb=# select version();
version
 --
  PostgreSQL 9.5devel_HEAD_20150329_0154_2c33e0fbceb0 on 
 x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.9.2, 64-bit
 (1 row)

 I get a crash while restoring a trgm GIN index.


I forgot to say that this crash only happens with an assert-enabled server.

./configure --prefix=/home/aardvark/pg_stuff/pg_installations/pgsql.HEAD \
 --bindir=/home/aardvark/pg_stuff/pg_installations/pgsql.HEAD/bin \
 --libdir=/home/aardvark/pg_stuff/pg_installations/pgsql.HEAD/lib \
 --with-pgport=6545 --quiet --enable-depend --enable-cassert \
 --enable-debug --with-extra-version=_HEAD_20150329_1726_2c33e0fbceb0 \
 --with-openssl --with-perl --with-libxml --with-libxslt --with-zlib


thanks,

Erik Rijkers








Re: [HACKERS] getting rid of thread fork emulation in pgbench?

2015-03-29 Thread Tom Lane
Fabien COELHO coe...@cri.ensmp.fr writes:
 There is a thread fork emulation hack which allows pgbench to be 
 compiled without threads by emulating them with forks, obviously with 
 limited capabilities. This feature is triggered by configuring postgresql 
 with --disable-thread-safety.

 I'm not aware of platforms which would still benefit from this feature (we 
 are probably talking of a PostgreSQL 9.6 maybe released in 2016), and this 
 thread emulation is a real pain to maintain and extend: some 
 features cannot be implemented, or must be implemented twice, or must be 
 implemented in a heavy way because it cannot be assumed to be in a 
 threaded implementation.

 My question is: would someone still object strongly to the removal of this 
 thread fork emulation hack in pgbench?

By my count, the pthread-emulation code amounts to about 175 lines out of
the nearly 4000 in pgbench.c.  That's not an undue burden IMO (it's not,
for comparison's sake, very much more than the amount of WIN32-specific
code in that file).  And what will you do instead?  It would be fine
I think for pgbench to not allow --jobs different from 1 on a threadless
platform, but not for it to fail to work altogether.  It looks to me like
allowing it to compile without that code would take nearly as much
effort/mess as what's there now.

 So the underlying question is, does Tom and Robert's opinion have changed 
 from 2 years ago?

Not mine.

regards, tom lane




[HACKERS] TRAP: BadArgument - mcxt.c, Line: 813

2015-03-29 Thread Erik Rijkers
With the latest server

testdb=# select version();
   version
--
 PostgreSQL 9.5devel_HEAD_20150329_0154_2c33e0fbceb0 on 
x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.9.2, 64-bit
(1 row)

I get a crash while restoring a trgm GIN index.


The attached bash creates 2 test tables (t5, t6).

A dumpfile, created thus:
  pg_dump -F c -v  -h /tmp -p 6545 -t t6 -f ~/pg_stuff/dumps/testdb_t6.dump 
testdb

cannot be restored in an empty database: it crashed the server (see below).

A restore of the smaller table t5 does not crash the system; the indexing is 
then OK.

Restore:
  pg_restore -v -O -x -d testdb ~/pg_stuff/dumps/testdb_t6.dump

I get this crash from restoring :

2015-03-29 17:32:55.846 CEST 8016 LOG:  database system is ready to accept 
connections
TRAP: BadArgument("!(((CurrentMemoryContext) != ((void *)0) && ((((const
Node*)((CurrentMemoryContext)))->type) ==
T_AllocSetContext)))", File: "mcxt.c", Line: 813)
2015-03-29 17:35:02.459 CEST 8016 LOG:  server process (PID 8062) was 
terminated by signal 6: Aborted
2015-03-29 17:35:02.459 CEST 8016 DETAIL:  Failed process was running: CREATE 
INDEX t6_trgm_re_idx ON t6 USING gin (txt
gin_trgm_ops);



2015-03-29 17:35:02.459 CEST 8016 LOG:  terminating any other active server 
processes
2015-03-29 17:35:02.502 CEST 8016 LOG:  all server processes terminated; 
reinitializing
2015-03-29 17:35:02.710 CEST 8114 LOG:  database system was interrupted; last 
known up at 2015-03-29 17:32:55 CEST
2015-03-29 17:35:02.735 CEST 8114 LOG:  database system was not properly shut 
down; automatic recovery in progress
2015-03-29 17:35:02.828 CEST 8114 LOG:  redo starts at 0/270E550
2015-03-29 17:35:05.665 CEST 8114 LOG:  invalid magic number  in log 
segment 00010010, offset 7716864
2015-03-29 17:35:05.665 CEST 8114 LOG:  redo done at 0/1075AD90
2015-03-29 17:35:05.665 CEST 8114 LOG:  last completed transaction was at log 
time 2015-03-29 17:34:11.149606+02
2015-03-29 17:35:10.296 CEST 8016 LOG:  database system is ready to accept 
connections


So it would seem the CREATE INDEX is the actual problem (I haven't tried that 
command separately).


Erik Rijkers


create_data.sh
Description: application/shellscript



Re: [HACKERS] getting rid of thread fork emulation in pgbench?

2015-03-29 Thread Robert Haas
On Sun, Mar 29, 2015 at 1:49 PM, Fabien COELHO coe...@cri.ensmp.fr wrote:
 Now if we have threads, it is simpler (if the overhead is reasonable) that
 threads share a file handle and just generate one file, so there is no
 merging.

I really, really wish you'd stop arguing against the patch to allow
merging of pgbench logs on this basis.  There may or may not be other
reasons to reject that patch, but arguing that we can or should reject
it because of the hypothetical possibility that sharing a file handle
will work out just isn't right.  People who work hard to develop a
patch that does something useful don't deserve to have it kicked to
the curb because someone can imagine a development path that would
eventually obsolete it.  That patch deserves to be evaluated on its
own merits.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Relation extension scalability

2015-03-29 Thread Robert Haas
On Sun, Mar 29, 2015 at 2:56 PM, Andres Freund and...@2ndquadrant.com
 As a quick recap, relation extension basically works like:
 1) We lock the relation for extension
 2) ReadBuffer*(P_NEW) is being called, to extend the relation
 3) smgrnblocks() is used to find the new target block
 4) We search for a victim buffer (via BufferAlloc()) to put the new
block into
 5) If dirty the victim buffer is cleaned
 6) The relation is extended using smgrextend()
 7) The page is initialized

 The problems come from 4) and 5) potentially each taking a fair
 while. If the working set mostly fits into shared_buffers 4) can
 require iterating over all shared buffers several times to find a
 victim buffer. If the IO subsystem is busy and/or we've hit the kernel's
 dirty limits 5) can take a couple seconds.

Interesting.  I had always assumed the bottleneck was waiting for the
filesystem to extend the relation.

 Secondly I think we could maybe remove the requirement of needing an
 extension lock altogether. It's primarily required because we're
 worried that somebody else can come along, read the page, and initialize
 it before us. ISTM that could be resolved by *not* writing any data via
 smgrextend()/mdextend(). If we instead only do the write once we've read
 in & locked the page exclusively there's no need for the extension
 lock. We probably still should write out the new page to the OS
 immediately once we've initialized it; to avoid creating sparse files.

 The other reason we need the extension lock is that code like
 lazy_scan_heap() and btvacuumscan() that tries to avoid initializing
 pages that are about to be initialized by the extending backend. I think
 we should just remove that code and deal with the problem by retrying in
 the extending backend; that's why I think moving extension to a
 different file might be helpful.

I thought the primary reason we did this is because we wanted to
write-and-fsync the block so that, if we're out of disk space, any
attendant failure will happen before we put data into the block.  Once
we've initialized the block, a subsequent failure to write or fsync it
will be hard to recover from; basically, we won't be able to
checkpoint any more.  If we discover the problem while the block is
still all-zeroes, the transaction that uncovers the problem errors
out, but the system as a whole is still OK.

Or at least, so I think.  Maybe I'm misunderstanding.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] cache lookup error for shell type creation with incompatible output function (DDL deparsing bug)

2015-03-29 Thread Michael Paquier
Hi all,

I just bumped into the following problem in HEAD (1c41e2a):
=# create type my_array_float (INPUT = array_in, OUTPUT = array_out,
ELEMENT = float4, INTERNALLENGTH = 32);
ERROR:  XX000: cache lookup failed for type 0
LOCATION:  format_type_internal, format_type.c:135

First note that in ~9.4 the error message is correct:
=# create type my_array_float (INPUT = array_in, OUTPUT = array_out,
ELEMENT = float4, INTERNALLENGTH = 32);
ERROR:  42883: function array_out(my_array_float) does not exist
LOCATION:  findTypeOutputFunction, typecmds.c:1709

Now, the problem is caused by findTypeOutputFunction() in DefineType(),
because the process passes InvalidOid as the type OID when creating a type
shell, the root cause being a2e35b5 because of this:
+   address = TypeShellMake(typeName, typeNamespace, GetUserId());

The fix consists in making sure that typoid uses the OID of the type
shell created, i.e. the OID stored in address.objectId. Attached is a
patch with a regression test checking shell type creation with
incompatible input/output functions (the failure here is caused by the
output function), which exercises this code path.
Regards,
-- 
Michael
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 67e2ae2..37bcd53 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -217,6 +217,7 @@ DefineType(List *names, List *parameters)
 		address = TypeShellMake(typeName, typeNamespace, GetUserId());
 		/* Make new shell type visible for modification below */
 		CommandCounterIncrement();
+		typoid = address.objectId;
 
 		/*
 		 * If the command was a parameterless CREATE TYPE, we're done ---
@@ -1720,6 +1721,8 @@ findTypeOutputFunction(List *procname, Oid typeOid)
 	if (OidIsValid(procOid))
 		return procOid;
 
+	Assert(OidIsValid(typeOid));
+
 	/* No luck, try it with OPAQUE */
 	argList[0] = OPAQUEOID;
 
diff --git a/src/test/regress/expected/create_type.out b/src/test/regress/expected/create_type.out
index 35e8f5d..3646c2b 100644
--- a/src/test/regress/expected/create_type.out
+++ b/src/test/regress/expected/create_type.out
@@ -107,6 +107,12 @@ ERROR:  type text_w_default already exists
 DROP TYPE default_test_row CASCADE;
 NOTICE:  drop cascades to function get_default_test()
 DROP TABLE default_test;
+-- Check shell type create with input/output incompatibility
+CREATE TYPE not_existing_type (INPUT = array_in,
+OUTPUT = array_out,
+ELEMENT = int,
+INTERNALLENGTH = 32);
+ERROR:  function array_out(not_existing_type) does not exist
 -- Check usage of typmod with a user-defined type
 -- (we have borrowed numeric's typmod functions)
 CREATE TEMP TABLE mytab (foo widget(42,13,7)); -- should fail
diff --git a/src/test/regress/sql/create_type.sql b/src/test/regress/sql/create_type.sql
index 96a075b..0c79050 100644
--- a/src/test/regress/sql/create_type.sql
+++ b/src/test/regress/sql/create_type.sql
@@ -106,6 +106,12 @@ DROP TYPE default_test_row CASCADE;
 
 DROP TABLE default_test;
 
+-- Check shell type create with input/output incompatibility
+CREATE TYPE not_existing_type (INPUT = array_in,
+OUTPUT = array_out,
+ELEMENT = int,
+INTERNALLENGTH = 32);
+
 -- Check usage of typmod with a user-defined type
 -- (we have borrowed numeric's typmod functions)
 



Re: [HACKERS] Combining Aggregates

2015-03-29 Thread David Rowley
On 18 December 2014 at 02:48, Simon Riggs si...@2ndquadrant.com wrote:

 David, if you can update your patch with some docs to explain the
 behaviour, it looks complete enough to think about committing it in
 early January, to allow other patches that depend upon it to stand a
 chance of getting into 9.5. (It is not yet ready, but I see it could
 be).


I've been thinking of bumping this patch to the June commitfest as the
patch only exists to provide the basic infrastructure for things like
parallel aggregation, aggregate before join, and perhaps auto updating
materialised views.

It seems unlikely that any of those things will happen for 9.5.
Does anybody object to me moving this to June's commitfest?

Regards

David Rowley


Re: [HACKERS] Combining Aggregates

2015-03-29 Thread Michael Paquier
On Mon, Mar 30, 2015 at 2:08 PM, David Rowley dgrowle...@gmail.com wrote:


 On 18 December 2014 at 02:48, Simon Riggs si...@2ndquadrant.com wrote:

 David, if you can update your patch with some docs to explain the
 behaviour, it looks complete enough to think about committing it in
 early January, to allow other patches that depend upon it to stand a
 chance of getting into 9.5. (It is not yet ready, but I see it could
 be).


 I've been thinking of bumping this patch to the June commitfest as the
 patch only exists to provide the basic infrastructure for things like
 parallel aggregation, aggregate before join, and perhaps auto updating
 materialised views.

 It seems unlikely that any of those things will happen for 9.5.


Yeah, I guess so...


 Does anybody object to me moving this to June's commitfest?


Not from my side FWIW. I think it actually makes sense.
-- 
Michael


Re: [HACKERS] TRAP: BadArgument - mcxt.c, Line: 813

2015-03-29 Thread Tom Lane
Erik Rijkers e...@xs4all.nl writes:
 I get a crash while restoring a trgm GIN index.

Hm, interesting: this seems to be the first negative fallout from
commit eaa5808e8ec4e82ce1a87103a6b6f687666e4e4c, which made
MemoryContextReset() delete, rather than reset, child contexts.  ginbuild()
is creating a funcCtx as a child of its tmpCtx but then supposing
that it can reset the tmpCtx without damaging the other one.
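
To spell out the hazard pattern (a minimal sketch, not the actual
ginbuild() code; the context names are only illustrative, and this assumes
the 9.5-era AllocSetContextCreate() signature from utils/memutils.h):

    MemoryContext tmpCtx = AllocSetContextCreate(CurrentMemoryContext,
                                                 "tmp context",
                                                 ALLOCSET_DEFAULT_MINSIZE,
                                                 ALLOCSET_DEFAULT_INITSIZE,
                                                 ALLOCSET_DEFAULT_MAXSIZE);
    MemoryContext funcCtx = AllocSetContextCreate(tmpCtx,
                                                  "func context",
                                                  ALLOCSET_DEFAULT_MINSIZE,
                                                  ALLOCSET_DEFAULT_INITSIZE,
                                                  ALLOCSET_DEFAULT_MAXSIZE);

    for (;;)
    {
        /* ... allocate work data in tmpCtx and funcCtx ... */

        /* Before eaa5808e this only reset funcCtx; now it deletes it, so
         * any later use of funcCtx is a use-after-free. */
        MemoryContextReset(tmpCtx);
    }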

Easy enough to fix, but it's a bit worrisome that we did not find
this in regression testing.  Apparently the regression tests do
not exercise the early dump code path in ginBuildCallback...

regards, tom lane




Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission

2015-03-29 Thread Pavel Stehule
Hi

I checked this patch. I like the functionality and behavior.

There is a minor issue with an outdated regression test:

test rules... FAILED

I have no objections.

Regards

Pavel



2015-03-27 9:23 GMT+01:00 Haribabu Kommi kommi.harib...@gmail.com:

 On Fri, Mar 13, 2015 at 1:33 PM, Peter Eisentraut pete...@gmx.net wrote:
  On 3/4/15 1:34 AM, Haribabu Kommi wrote:
  On Wed, Mar 4, 2015 at 12:35 PM, Haribabu Kommi
  kommi.harib...@gmail.com wrote:
  + foreach(line, parsed_hba_lines)
 
  In the above for loop it is better to add check_for_interrupts to
  avoid it looping
  if the parsed_hba_lines are more.
 
  Updated patch is attached with the addition of check_for_interrupts in
  the for loop.
 
  I tried out your latest patch.  I like that it updates even in running
  sessions when the file is reloaded.

 Thanks for the review. Sorry for late reply.

  The permission checking is faulty, because unprivileged users can
  execute pg_hba_settings() directly.

 corrected.

  Check the error messages against the style guide (especially
  capitalization).

 corrected.

  I don't like that there is a hard-coded limit of 16 options 5 pages away
  from where it is actually used.  That could be done better.

 changed to 12 instead of 16.

  I'm not sure about the name pg_hba_settings.  Why not just pg_hba or
  pg_hba_conf if you're going for a representation that is close to the
  file (as opposed to pg_settings, which is really a lot more abstract
  than any particular file).

 changed to pg_hba_conf.

  I would put the line_number column first.

 changed.

  I continue to think that it is a serious mistake to stick special values
  like 'all' and 'replication' into the arrays without additional
  decoration.  That makes this view useless or at least dangerous for
  editing tools or tools that want to reassemble the original file.
  Clearly at least one of those has to be a use case.  Otherwise we can
  just print out the physical lines without interpretation.

 It is possible to provide more than one keyword for databases or users.
 Is it fine to use a text array for keyword databases and keyword users?

  The mask field can go, because address is of type inet, which can
  represent masks itself.  (Or should it be cidr then?  Who knows.)  The
  preferred visual representation of masks in pg_hba.conf has been
  address/mask for a while now, so we should preserve that.
  Additionally, you can then use the existing inet/cidr operations to do
  things like checking whether some IP address is contained in an address
  specification.

 removed.

  I can't tell from the documentation what the compare_method field is
  supposed to do.  I see it on the code, but that is not a natural
  representation of pg_hba.conf.  In fact, this just supports my earlier
  statement.  Why are special values in the address field special, but not
  in the user or database fields?
 
  uaImplicitReject is not a user-facing authentication method, so it
  shouldn't be shown (or showable).

 removed.

  I would have expected options to be split into keys and values.
 
  All that code to reassemble the options from the parsed struct
  representation seems crazy to me.  Surely, we could save the strings as
  we parse them?

 I didn't get this point clearly. Can you explain it a bit more?

  I can't make sense of the log message pg_hba.conf not reloaded,
  pg_hba_settings may show stale information.  If load_hba() failed, the
  information isn't stale, is it?
 
  In any case, printing this to the server log seems kind of pointless,
  because someone using the view is presumably doing so because they can't
  or don't want to read the server log.  The proper place might be a
  warning when the view is actually called.

 Changed accordingly: if the reload fails, further selects on the
 view throw a warning,
 as the view data may not be correct, like below.

 There was some failure in reloading pg_hba.conf file. The pg_hba.conf
 settings data may contains stale information

 Here I attached latest patch with the fixes other than keyword columns.
 I will provide the patch with keyword columns and documentation changes
 later.

 Regards,
 Hari Babu
 Fujitsu Australia






Re: [HACKERS] improve pgbench syntax error messages

2015-03-29 Thread Fabien COELHO



Here is a v5.


Here is v6, just a rebase.

--
Fabien.diff --git a/contrib/pgbench/exprparse.y b/contrib/pgbench/exprparse.y
index e68631e..68c85c9 100644
--- a/contrib/pgbench/exprparse.y
+++ b/contrib/pgbench/exprparse.y
@@ -26,6 +26,13 @@ static PgBenchExpr *make_op(char operator, PgBenchExpr *lexpr,
 %expect 0
 %name-prefix=expr_yy
 
+/*
+// improve bison-generated syntax error messages
+// *but* not available with PostgreSQL's minimal bison 1.875
+%define parse.lac full
+%define parse.error verbose
+*/
+
 %union
 {
 	int64		ival;
diff --git a/contrib/pgbench/exprscan.l b/contrib/pgbench/exprscan.l
index 6e3331d..5331ab7 100644
--- a/contrib/pgbench/exprscan.l
+++ b/contrib/pgbench/exprscan.l
@@ -17,6 +17,13 @@ static int	yyline = 0, yycol = 0;
 static YY_BUFFER_STATE scanbufhandle;
 static char *scanbuf;
 static int	scanbuflen;
+
+/* context information for error reporting */
+static char *expr_source = NULL;
+static int expr_lineno = 0;
+static char *expr_full_line = NULL;
+static char *expr_command = NULL;
+static int expr_col = 0;
 %}
 
 %option 8bit
@@ -56,7 +63,9 @@ space			[ \t\r\f]
 
 .{
 	yycol += yyleng;
-	fprintf(stderr, "unexpected character \"%s\"\n", yytext);
+	syntax_error(expr_source, expr_lineno, expr_full_line, expr_command,
+				 "unexpected character", yytext, expr_col + yycol);
+	/* dead code, exit is called from syntax_error */
 	return CHAR_ERROR;
 }
 %%
@@ -64,20 +73,27 @@ space			[ \t\r\f]
 void
 yyerror(const char *message)
 {
-	/* yyline is always 1 as pgbench calls the parser for each line...
-	 * so the interesting location information is the column number */
-	fprintf(stderr, "%s at column %d\n", message, yycol);
-	/* go on to raise the error from pgbench with more information */
+	syntax_error(expr_source, expr_lineno, expr_full_line, expr_command,
+ message, NULL, expr_col + yycol);
 }
 
 /*
  * Called before any actual parsing is done
  */
 void
-expr_scanner_init(const char *str)
+expr_scanner_init(const char *str, const char *source,
+  const int lineno, const char *line,
+  const char *cmd, const int ecol)
 {
 	Size	slen = strlen(str);
 
+	/* save context informations for error messages */
+	expr_source = (char *) source;
+	expr_lineno = (int) lineno;
+	expr_full_line = (char *) line;
+	expr_command = (char *) cmd;
+	expr_col = (int) ecol;
+
 	/*
 	 * Might be left over after error
 	 */
@@ -105,4 +121,9 @@ expr_scanner_finish(void)
 {
 	yy_delete_buffer(scanbufhandle);
 	pg_free(scanbuf);
+	expr_source = NULL;
+	expr_lineno = 0;
+	expr_full_line = NULL;
+	expr_command = NULL;
+	expr_col = 0;
 }
diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 822adfd..6a4805e 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -287,6 +287,7 @@ typedef struct
 	int			type;			/* command type (SQL_COMMAND or META_COMMAND) */
 	int			argc;			/* number of command words */
 	char	   *argv[MAX_ARGS]; /* command word list */
+	int			cols[MAX_ARGS]; /* corresponding column starting from 1 */
 	PgBenchExpr *expr;			/* parsed expression */
 } Command;
 
@@ -2185,6 +2186,29 @@ parseQuery(Command *cmd, const char *raw_sql)
 	return true;
 }
 
+void
+syntax_error(const char *source, const int lineno,
+			 const char *line, const char *command,
+			 const char *msg, const char *more, const int column)
+{
+	fprintf(stderr, "%s:%d: %s", source, lineno, msg);
+	if (more != NULL)
+		fprintf(stderr, " (%s)", more);
+	if (column != -1)
+		fprintf(stderr, " at column %d", column);
+	fprintf(stderr, " in command \"%s\"\n", command);
+	if (line != NULL) {
+		fprintf(stderr, "%s\n", line);
+		if (column != -1) {
+			int i;
+			for (i = 0; i < column - 1; i++)
+				fprintf(stderr, " ");
+			fprintf(stderr, "^ error found here\n");
+		}
+	}
+	exit(1);
+}
+
 /* Parse a command; return a Command struct, or NULL if it's a comment */
 static Command *
 process_commands(char *buf, const char *source, const int lineno)
@@ -2229,6 +2253,7 @@ process_commands(char *buf, const char *source, const int lineno)
 
 		while (tok != NULL)
 		{
+			my_commands->cols[j] = tok - buf + 1;
 			my_commands->argv[j++] = pg_strdup(tok);
 			my_commands->argc++;
 			if (max_args >= 0 && my_commands->argc >= max_args)
@@ -2246,9 +2271,10 @@ process_commands(char *buf, const char *source, const int lineno)
 
 			if (my_commands->argc < 4)
 			{
-				fprintf(stderr, "%s: missing argument\n", my_commands->argv[0]);
-				exit(1);
+				syntax_error(source, lineno, my_commands->line, my_commands->argv[0],
+							 "missing arguments", NULL, -1);
 			}
+
 			/* argc >= 4 */
 
 			if (my_commands->argc == 4 || /* uniform without/with uniform keyword */
@@ -2263,41 +2289,38 @@ process_commands(char *buf, const char *source, const int lineno)
 			{
 				if (my_commands->argc < 6)
 				{
-					fprintf(stderr, "%s(%s): missing threshold argument\n", my_commands->argv[0], my_commands->argv[4]);
-					exit(1);
+					syntax_error(source, lineno, my_commands->line, my_commands->argv[0],
+								 "missing 

Re: [HACKERS] extend pgbench expressions with functions

2015-03-29 Thread Fabien COELHO



Here is a small v2 update:


v3, just a rebase.

--
Fabien.diff --git a/contrib/pgbench/exprparse.y b/contrib/pgbench/exprparse.y
index e68631e..049937f 100644
--- a/contrib/pgbench/exprparse.y
+++ b/contrib/pgbench/exprparse.y
@@ -20,6 +20,7 @@ static PgBenchExpr *make_integer_constant(int64 ival);
 static PgBenchExpr *make_variable(char *varname);
 static PgBenchExpr *make_op(char operator, PgBenchExpr *lexpr,
 		PgBenchExpr *rexpr);
+static PgBenchExpr *make_func(const char *fname, PgBenchExpr *arg1);
 
 %}
 
@@ -35,9 +36,9 @@ static PgBenchExpr *make_op(char operator, PgBenchExpr *lexpr,
 
 %type <expr> expr
 %type <ival> INTEGER
-%type <str> VARIABLE
+%type <str> VARIABLE FUNCTION
 
-%token INTEGER VARIABLE
+%token INTEGER VARIABLE FUNCTION
 %token CHAR_ERROR /* never used, will raise a syntax error */
 
 /* Precedence: lowest to highest */
@@ -59,6 +60,7 @@ expr: '(' expr ')'			{ $$ = $2; }
 	| expr '%' expr			{ $$ = make_op('%', $1, $3); }
 	| INTEGER{ $$ = make_integer_constant($1); }
 	| VARIABLE { $$ = make_variable($1); }
+	| FUNCTION '(' expr ')' { $$ = make_func($1, $3); pg_free($1); }
 	;
 
 %%
@@ -95,4 +97,33 @@ make_op(char operator, PgBenchExpr *lexpr, PgBenchExpr *rexpr)
 	return expr;
 }
 
+static struct {
+	char * fname;
+	int nargs;
+	PgBenchFunction tag;
+} PGBENCH_FUNCTIONS[] = {
+	{ "abs", 1, PGBENCH_ABS }
+};
+
+static PgBenchExpr *
+make_func(const char * fname, PgBenchExpr *arg1)
+{
+	PgBenchExpr *expr = pg_malloc(sizeof(PgBenchExpr));
+	int nfunctions = sizeof(PGBENCH_FUNCTIONS) / sizeof(PGBENCH_FUNCTIONS[0]);
+	int i;
+
+	expr->etype = ENODE_FUNCTION;
+	expr->u.function.function = PGBENCH_NONE;
+
+	for (i = 0; i < nfunctions; i++)
+		if (pg_strcasecmp(fname, PGBENCH_FUNCTIONS[i].fname) == 0)
+			expr->u.function.function = PGBENCH_FUNCTIONS[i].tag;
+
+	if (expr->u.function.function == PGBENCH_NONE)
+		yyerror("unexpected function name");
+
+	expr->u.function.arg1 = arg1;
+	return expr;
+}
+
 #include exprscan.c
diff --git a/contrib/pgbench/exprscan.l b/contrib/pgbench/exprscan.l
index 6e3331d..8fa7499 100644
--- a/contrib/pgbench/exprscan.l
+++ b/contrib/pgbench/exprscan.l
@@ -50,6 +50,11 @@ space			[ \t\r\f]
 	yylval.ival = strtoint64(yytext);
 	return INTEGER;
 }
+[a-zA-Z]+   {
+	yycol += yyleng;
+	yylval.str = pg_strdup(yytext);
+	return FUNCTION;
+}
 
 [\n]			{ yycol = 0; yyline++; }
 {space}+		{ yycol += yyleng; /* ignore */ }
diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 822adfd..5a8f376 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -955,6 +955,26 @@ evaluateExpr(CState *st, PgBenchExpr *expr, int64 *retval)
 return false;
 			}
 
+		case ENODE_FUNCTION:
+			{
+switch (expr->u.function.function)
+{
+	case PGBENCH_ABS:
+	{
+		int64 arg1;
+		if (!evaluateExpr(st, expr->u.function.arg1, &arg1))
+			return false;
+
+		*retval = arg1 > 0 ? arg1 : -arg1;
+		return true;
+	}
+	default:
+		fprintf(stderr, "bad function (%d)\n",
+				expr->u.function.function);
+		return false;
+}
+			}
+
 		default:
 			break;
 	}
diff --git a/contrib/pgbench/pgbench.h b/contrib/pgbench/pgbench.h
index 0396e55..86d1ca1 100644
--- a/contrib/pgbench/pgbench.h
+++ b/contrib/pgbench/pgbench.h
@@ -15,11 +15,18 @@ typedef enum PgBenchExprType
 {
 	ENODE_INTEGER_CONSTANT,
 	ENODE_VARIABLE,
-	ENODE_OPERATOR
+	ENODE_OPERATOR,
+	ENODE_FUNCTION
 } PgBenchExprType;
 
 typedef struct PgBenchExpr PgBenchExpr;
 
+typedef enum PgBenchFunction
+{
+	PGBENCH_NONE,
+	PGBENCH_ABS
+} PgBenchFunction;
+
 struct PgBenchExpr
 {
 	PgBenchExprType	etype;
@@ -39,6 +46,11 @@ struct PgBenchExpr
 			PgBenchExpr	*lexpr;
 			PgBenchExpr *rexpr;
 		} operator;
+		struct
+		{
+			PgBenchFunction function;
+			PgBenchExpr *arg1;
+		} function;
 	} u;
 };
 
diff --git a/doc/src/sgml/pgbench.sgml b/doc/src/sgml/pgbench.sgml
index ed12e27..0c58412 100644
--- a/doc/src/sgml/pgbench.sgml
+++ b/doc/src/sgml/pgbench.sgml
@@ -762,7 +762,7 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
   references to variables <literal>:</><replaceable>variablename</>,
   and expressions composed of unary (<literal>-</>) or binary operators
   (<literal>+</>, <literal>-</>, <literal>*</>, <literal>/</>, <literal>%</>)
-  with their usual associativity, and parentheses.
+  with their usual associativity, function <literal>abs</> and parentheses.
  /para
 
  para



Re: [HACKERS] getting rid of thread fork emulation in pgbench?

2015-03-29 Thread Fabien COELHO


Hello Tom,


My question is: would someone still object strongly to the removal of this
thread fork emulation hack in pgbench?


By my count, the pthread-emulation code amounts to about 175 lines out of
the nearly 4000 in pgbench.c.  That's not an undue burden IMO (it's not,
for comparison's sake, very much more than the amount of WIN32-specific
code in that file).


From my point of view, the burden is how things can be implemented if you
have to assume no threads.

In the discussion at hand, which motivates this renewed request, the
question is whether an external merge sort, implying rescanning and
reprinting files over and over, should be implemented in pgbench so as to
recombine log files.


Now if we have threads, it is simpler (if the overhead is reasonable) that
threads share a file handle and just generate one file, so there is no
merging. That is not possible with processes, at least for the aggregated
logs, because the per-process data must be combined; it would require that
pgbench use shared memory and so on, which is really a lot of bother
for a limited feature.


The alternative is that the feature would not be available without threads,
but then people object to that, on principle.


And what will you do instead? It would be fine I think for pgbench to 
not allow --jobs different from 1 on a threadless platform, but not for 
it to fail to work altogether.


Sure. No thread really means working with only one thread.

It looks to me like allowing it to compile without that code would take 
nearly as much effort/mess as what's there now.


My motivation is to simplify how things are done by simply assuming that
threads are available and can share data, esp for counters.


So the underlying question is, does Tom and Robert's opinion have changed
from 2 years ago?


Not mine.


Ok! I'll ask again in 2017, and probably again in 2019:-)

--
Fabien.




Re: [HACKERS] proposal: row_to_array function

2015-03-29 Thread Tom Lane
Pavel Stehule pavel.steh...@gmail.com writes:
 here is rebased patch.
 It contains both patches - row_to_array function and foreach array support.

While I don't have a problem with hstore_to_array, I don't think that
row_to_array is a very good idea; it's basically encouraging people to
throw away SQL datatypes altogether and imagine that everything is text.
They've already bought into that concept if they are using hstore or
json, so smashing elements of those containers to text is not a problem.
But that doesn't make this version a good thing.

(In any case, those who insist can get there through row_to_json, no?)

Also, could we please *not* mix up these two very independent features?
foreach array as implemented here may or may not be a good thing, but
it should get its own discussion.

regards, tom lane




Re: [HACKERS] proposal: row_to_array function

2015-03-29 Thread Pavel Stehule
Hi

here is rebased patch.

It contains both patches - row_to_array function and foreach array support.

This design is in conformity with hstore functions. There can be good
synergy.

Regards

Pavel

2015-03-28 23:53 GMT+01:00 Jeff Janes jeff.ja...@gmail.com:

 On Tue, Jan 27, 2015 at 10:58 AM, Pavel Stehule pavel.steh...@gmail.com
 wrote:

 Hi

 2015-01-27 11:41 GMT+01:00 Pavel Stehule pavel.steh...@gmail.com:



 2015-01-26 21:44 GMT+01:00 Jim Nasby jim.na...@bluetreble.com:

 On 1/25/15 4:23 AM, Pavel Stehule wrote:


 I tested a concept iteration over array in format [key1, value1, key2,
 value2, .. ] - what is nice, it works for [[key1,value1],[key2, value2],
 ...] too

 It is only a few lines more to current code, and this change doesn't
 break a compatibility.

 Do you think, so this patch is acceptable?

 Ideas, comments?


 Aside from fixing the comments... I think this needs more tests on
 corner cases. For example, what happens when you do

 foreach a, b, c in array(array(1,2),array(3,4)) ?


 it is relatively simple behavior -- empty values are NULL

 array(1,2),array(3,4) -- do you think ARRAY[[1,2],[3,4]] is effectively
 ARRAY[1,2,3,4]



 Or the opposite case of

 foreach a,b in array(array(1,2,3))

 Also, what about:

 foreach a,b in '{{{1,2},{3,4}},{{5,6},{7,8}}}'::int[] ?



  postgres=# select array(select
 unnest('{{{1,2},{3,4}},{{5,6},{7,8}}}'::int[]));
array
 ---
  {1,2,3,4,5,6,7,8}
 (1 row)

 so it generates pairs {1,2},{3,4},{5,6},{7,8}


 I fixed the situation when the array does not have enough elements.



 This no longer applies due to conflicts in src/pl/plpgsql/src/pl_exec.c
 caused by e524cbdc45ec6d677b1dd49

 Also, what is the relationship of this patch to the row_to_array patch?
 Are they independent, or does one depend on the other?  row_to_array by
 itself applies but doesn't compile.

 Cheers,

 Jeff

diff --git a/contrib/hstore/expected/hstore.out b/contrib/hstore/expected/hstore.out
new file mode 100644
index 9749e45..e44532e
*** a/contrib/hstore/expected/hstore.out
--- b/contrib/hstore/expected/hstore.out
*** select %% 'aa=>1, cq=>l, b=>g, fg=>NULL'
*** 1148,1153 
--- 1148,1169 
   {b,g,aa,1,cq,l,fg,NULL}
  (1 row)
  
+ -- fast iteration over keys
+ do $$
+ declare
+   key text;
+   value text;
+ begin
+   foreach key, value in array hstore_to_array('aa=>1, cq=>l, b=>g, fg=>NULL'::hstore)
+   loop
+ raise notice 'key: %, value: %', key, value;
+   end loop;
+ end;
+ $$;
+ NOTICE:  key: b, value: g
+ NOTICE:  key: aa, value: 1
+ NOTICE:  key: cq, value: l
+ NOTICE:  key: fg, value: NULL
  select hstore_to_matrix('aa=>1, cq=>l, b=>g, fg=>NULL'::hstore);
  hstore_to_matrix 
  -
diff --git a/contrib/hstore/sql/hstore.sql b/contrib/hstore/sql/hstore.sql
new file mode 100644
index 5a9e9ee..7b9eb09
*** a/contrib/hstore/sql/hstore.sql
--- b/contrib/hstore/sql/hstore.sql
*** select avals('');
*** 257,262 
--- 257,275 
  select hstore_to_array('aa=>1, cq=>l, b=>g, fg=>NULL'::hstore);
  select %% 'aa=>1, cq=>l, b=>g, fg=>NULL';
  
+ -- fast iteration over keys
+ do $$
+ declare
+   key text;
+   value text;
+ begin
+   foreach key, value in array hstore_to_array('aa=>1, cq=>l, b=>g, fg=>NULL'::hstore)
+   loop
+ raise notice 'key: %, value: %', key, value;
+   end loop;
+ end;
+ $$;
+ 
  select hstore_to_matrix('aa=>1, cq=>l, b=>g, fg=>NULL'::hstore);
  select %# 'aa=>1, cq=>l, b=>g, fg=>NULL';
  
diff --git a/doc/src/sgml/plpgsql.sgml b/doc/src/sgml/plpgsql.sgml
new file mode 100644
index d36acf6..e4abb97
*** a/doc/src/sgml/plpgsql.sgml
--- b/doc/src/sgml/plpgsql.sgml
*** NOTICE:  row = {7,8,9}
*** 2505,2510 
--- 2505,2533 
  NOTICE:  row = {10,11,12}
  </programlisting>
  </para>
+ 
+ <para>
+  A <literal>FOREACH</> cycle can be used for iteration over a record. You
+  need the <xref linkend="hstore"> extension. In this case the clause
+  <literal>SLICE</literal> should not be used. <literal>FOREACH</literal>
+  statements support a list of target variables. When the source array is
+  an array of composites, the composite array element is saved to the target
+  variables. When the array is an array of scalar values, the target
+  variables are filled item by item.
+ <programlisting>
+ CREATE FUNCTION trig_function() RETURNS TRIGGER AS $$
+ DECLARE
+   key text; value text;
+ BEGIN
+   FOREACH key, value IN ARRAY hstore_to_array(hstore(NEW))
+   LOOP
+ RAISE NOTICE 'key = %, value = %', key, value;
+   END LOOP;
+   RETURN NEW;
+ END;
+ $$ LANGUAGE plpgsql;
+ /programlisting
+ /para
 /sect2
  
  <sect2 id="plpgsql-error-trapping">
diff --git a/src/backend/utils/adt/rowtypes.c b/src/backend/utils/adt/rowtypes.c
new file mode 100644
index a65e18d..1a64d8e
*** a/src/backend/utils/adt/rowtypes.c
--- b/src/backend/utils/adt/rowtypes.c
***
*** 21,26 
--- 21,27 
  #include "catalog/pg_type.h"
  #include "funcapi.h"
  #include "libpq/pqformat.h"
+ #include "utils/array.h"
  

Re: [HACKERS] getting rid of thread fork emulation in pgbench?

2015-03-29 Thread Tom Lane
Fabien COELHO coe...@cri.ensmp.fr writes:
 And what will you do instead? It would be fine I think for pgbench to 
 not allow --jobs different from 1 on a threadless platform, but not for 
 it to fail to work altogether.

 Sure. No thread really means working with only one thread.

 It looks to me like allowing it to compile without that code would take 
 nearly as much effort/mess as what's there now.

 My motivation is to simplify how things are done by simply assuming that
 threads are available and can share data, esp for counters.

pgbench does not get to assume that threads are available, at least
not as long as the project as a whole supports --disable-thread-safety.

As I said, I wouldn't have a problem with restricting the --jobs setting
on threadless platforms, which seems like it would fix your problem since
you wouldn't need to have more than one process involved.

regards, tom lane




[HACKERS] Relation extension scalability

2015-03-29 Thread Andres Freund
Hello,

Currently bigger shared_buffers settings don't combine well with
relations being extended frequently. Especially if many/most pages have
a high usagecount and/or are dirty and the system is IO constrained.

As a quick recap, relation extension basically works like:
1) We lock the relation for extension
2) ReadBuffer*(P_NEW) is being called, to extend the relation
3) smgrnblocks() is used to find the new target block
4) We search for a victim buffer (via BufferAlloc()) to put the new
   block into
5) If dirty the victim buffer is cleaned
6) The relation is extended using smgrextend()
7) The page is initialized


The problems come from 4) and 5) potentially each taking a fair
while. If the working set mostly fits into shared_buffers 4) can
require iterating over all shared buffers several times to find a
victim buffer. If the IO subsystem is busy and/or we've hit the kernel's
dirty limits 5) can take a couple seconds.

I've prototyped solving this for heap relations moving the smgrnblocks()
+ smgrextend() calls to RelationGetBufferForTuple(). With some care
(including a retry loop) it's possible to only do those two under the
extension lock. That indeed fixes problems in some of my tests.

I'm not sure whether the above is the best solution however. For one I
think it's not necessarily a good idea to opencode this in hio.c - I've
not observed it, but this probably can happen for btrees and such as
well. For another, this is still a exclusive lock while we're doing IO:
smgrextend() wants a page to write out, so we have to be careful not to
overwrite things.

There's two things that seem to make sense to me:

First, decouple relation extension from ReadBuffer*, i.e. remove P_NEW
and introduce a bufmgr function specifically for extension.
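
To illustrate the shape of that first idea, a hypothetical bufmgr entry
point could look like the following; the name and signature are invented
for this sketch and are not part of the attached POC:

    /* Return a new, exclusively locked, zeroed buffer appended to the end
     * of the relation, so callers never pass P_NEW to ReadBuffer*. */
    extern Buffer BufferExtendRelation(Relation relation,
                                       ForkNumber forkNum,
                                       BufferAccessStrategy strategy);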

Secondly I think we could maybe remove the requirement of needing an
extension lock altogether. It's primarily required because we're
worried that somebody else can come along, read the page, and initialize
it before us. ISTM that could be resolved by *not* writing any data via
smgrextend()/mdextend(). If we instead only do the write once we've read
in & locked the page exclusively there's no need for the extension
lock. We probably still should write out the new page to the OS
immediately once we've initialized it; to avoid creating sparse files.

The other reason we need the extension lock is that code like
lazy_scan_heap() and btvacuumscan() tries to avoid initializing
pages that are about to be initialized by the extending backend. I think
we should just remove that code and deal with the problem by retrying in
the extending backend; that's why I think moving extension to a
different file might be helpful.

I've attached my POC for heap extension, but it's really just a POC at
this point.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From f1f38829160d0f2998cd187f2f920cdc7b1fa709 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Sun, 29 Mar 2015 20:55:32 +0200
Subject: [PATCH] WIP: Saner heap extension.

---
 src/backend/access/heap/hio.c | 110 --
 1 file changed, 63 insertions(+), 47 deletions(-)

diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index 6d091f6..178c417 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -15,6 +15,8 @@
 
 #include "postgres.h"
 
+#include "miscadmin.h"
+
 #include "access/heapam.h"
 #include "access/hio.h"
 #include "access/htup_details.h"
@@ -420,63 +422,77 @@ RelationGetBufferForTuple(Relation relation, Size len,
 	/*
 	 * Have to extend the relation.
 	 *
-	 * We have to use a lock to ensure no one else is extending the rel at the
-	 * same time, else we will both try to initialize the same new page.  We
-	 * can skip locking for new or temp relations, however, since no one else
-	 * could be accessing them.
+	 * To avoid, as it used to be the case, holding the extension lock during
+	 * victim buffer search for the new buffer, we extend the relation here
+	 * instead of relying on bufmgr.c. We still have to hold the extension
+	 * lock to prevent a race between two backends initializing the same page.
 	 */
-	needLock = !RELATION_IS_LOCAL(relation);
+	while(true)
+	{
+		char		emptybuf[BLCKSZ];
 
-	if (needLock)
-		LockRelationForExtension(relation, ExclusiveLock);
+		/*
+		 * We have to use a lock to ensure no one else is extending the rel at
+		 * the same time, else we will both try to initialize the same new
+		 * page.  We can skip locking for new or temp relations, however,
+		 * since no one else could be accessing them.
+		 */
+		needLock = !RELATION_IS_LOCAL(relation);
+		RelationOpenSmgr(relation);
 
-	/*
-	 * XXX This does an lseek - rather expensive - but at the moment it is the
-	 * only way to accurately determine how many blocks are in a relation.  Is
-	 * it worth keeping an accurate file length in 

Re: [HACKERS] How about to have relnamespace and relrole?

2015-03-29 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 I have just claimed this as committer in the CF, but on reviewing the 
 emails it looks like there is disagreement about the need for it at all, 
 especially from Tom and Robert.

 I confess I have often wanted regnamespace, particularly, and 
 occasionally regrole, simply as a convenience. But I'm not going to 
 commit it against substantial opposition.

 Do we need a vote?

My concern about it is basically that I don't see where we stop.
The existing regFOO alias types are provided for object classes which
have nontrivial naming conventions (schema qualification, overloaded
argument types, etc), so that you can't just do select ... from
catalog where objectname = 'blah'.  That doesn't apply to namespaces
or roles.  So I'm afraid that once this precedent is established,
there will be demands for regFOO for every object class we have,
and I don't want that much clutter.

It may be that these two cases are so much more useful than any other
conceivable cases that we can do them and stop, but I don't think that
argument has been made convincingly.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] getting rid of thread fork emulation in pgbench?

2015-03-29 Thread Fabien COELHO



As I said, I wouldn't have a problem with restricting the --jobs setting
on threadless platforms, which seems like it would fix your problem since
you wouldn't need to have more than one process involved.


Indeed!

So I understand that thread fork emulation could eventually go from your 
point of view, provided that pgbench compiles and runs with one job.


I'll try to submit a patch to that effect, maybe to June CF.
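
Roughly, I imagine the restriction could look something like this in the
option checking (just a sketch, assuming the existing ENABLE_THREAD_SAFETY
configure symbol keeps being used; the actual variable names and wording
will be whatever the patch ends up with):

#ifndef ENABLE_THREAD_SAFETY
	if (nthreads > 1)
	{
		fprintf(stderr, "threads are not supported on this platform; use -j1\n");
		exit(1);
	}
#endif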

--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rounding to even for numeric data type

2015-03-29 Thread Michael Paquier
On Sun, Mar 29, 2015 at 9:21 AM, Gavin Flower
gavinflo...@archidevsys.co.nz wrote:
 On 29/03/15 13:07, David G. Johnston wrote:

 On Sat, Mar 28, 2015 at 3:59 PM, Michael Paquier
  michael.paqu...@gmail.com wrote:

 On Sun, Mar 29, 2015 at 5:34 AM, Gavin Flower
  gavinflo...@archidevsys.co.nz wrote:
  On 28/03/15 21:58, Dean Rasheed wrote:
  [...]
 
 
  Andrew mentioned that there have been complaints from people doing
  calculations with monetary data that we don't implement
  round-to-nearest-even (Banker's) rounding. It's actually the
 case that
  various different financial calculations demand different specific
  rounding modes, so it wouldn't be enough to simply change the
 default
  - we would have to provide a choice of modes.
 
  [...]
 
  Could the 2 current round functions have cousins that included
 an extra char
  parameter (or string), that indicated the type of rounding?
 
  So we don't end up with an explosion of rounding functions, yet
 could cope
  with a limited set of additional rounding modes initially, and
 possibly
  others in the future.

 Instead of extending round, isn't what we are looking at here a new
 data type? I have doubts that we only want to have a way to switch
 round() between different modes. Hence, what we could do is:
 1) Mention in the docs that numeric does round-half-away-from-zero
 2) Add regression tests for numeric(n,m) and round(numeric)
 3) Add a TODO item for something like numeric2, doing rounding-at-even
 (this could be an extension as well), but with the amount of
 duplication that it may have with numeric, an in-core type would make
 sense; to facilitate things, exposing some of its key structures
 would help.


 So, create a numeric type for each possible rounding mode? That implies at
 least two types, round-half-even and round-half-away-from-zero, with
 suitable abbreviations: numeric_rhe, numeric_raz.

The existing numeric now does half-up rounding.

 If the goal is to make plain numeric IEEE standard conforming then giving
 the user a way to switch all existing numeric types to numeric_raz would be
 nice.

 Implicit casts between each of the various numeric types would be needed
 and understandable.

That's exactly the thing I think would be helpful.

 I'm pondering calling them numeric_eng and numeric_bus (for engineering
 and business respectively)...

 In Java, there are 8 rounding modes specified:

 https://docs.oracle.com/javase/8/docs/api/java/math/RoundingMode.html
 Some of these may be relevant to pg.

That's interesting. I didn't recall those details.
Regards,
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rounding to even for numeric data type

2015-03-29 Thread Michael Paquier
On Sun, Mar 29, 2015 at 7:59 AM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Sun, Mar 29, 2015 at 5:34 AM, Gavin Flower
 gavinflo...@archidevsys.co.nz wrote:
 On 28/03/15 21:58, Dean Rasheed wrote:
 [...]


 Andrew mentioned that there have been complaints from people doing
 calculations with monetary data that we don't implement
 round-to-nearest-even (Banker's) rounding. It's actually the case that
 various different financial calculations demand different specific
 rounding modes, so it wouldn't be enough to simply change the default
 - we would have to provide a choice of modes.

 [...]

 Could the 2 current round functions have cousins that included an extra char
 parameter (or string), that indicated the type of rounding?

 So we don't end up with an explosion of rounding functions, yet could cope
 with a limited set of additional rounding modes initially, and possibly
 others in the future.

 Instead of extending round, isn't what we are looking at here a new
 data type? I have doubts that we only want to have a way to switch
 round() between different modes. Hence, what we could do is:
 1) Mention in the docs that numeric does round-half-away-from-zero
 2) Add regression tests for numeric(n,m) and round(numeric)
 3) Add a TODO item for something like numeric2, doing rounding-at-even
 (this could be an extension as well), but with the amount of
 duplication that it may have with numeric, an in-core type would make
 sense; to facilitate things, exposing some of its key structures
 would help.

So, attached is a patch that does 1) and 2) to make clear to the user
how numeric and double precision behave regarding rounding. I am
adding it to CF 2015-06 to keep track of it...
-- 
Michael
From 21e2da3d8c480f28c2cb469a004dbc225a522725 Mon Sep 17 00:00:00 2001
From: Michael Paquier michael@otacoo.com
Date: Sun, 29 Mar 2015 19:46:50 +0900
Subject: [PATCH] Precise rounding behavior of numeric and double precision in
 docs

Regression tests improving the coverage in this area are added as well.
---
 doc/src/sgml/datatype.sgml| 18 ++
 src/test/regress/expected/int2.out| 20 
 src/test/regress/expected/int4.out| 20 
 src/test/regress/expected/int8.out| 20 
 src/test/regress/expected/numeric.out | 24 
 src/test/regress/sql/int2.sql | 10 ++
 src/test/regress/sql/int4.sql | 10 ++
 src/test/regress/sql/int8.sql | 10 ++
 src/test/regress/sql/numeric.sql  | 10 ++
 9 files changed, 142 insertions(+)

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index da1f25f..0342c8a 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -612,6 +612,24 @@ NUMERIC
  equivalent.  Both types are part of the <acronym>SQL</acronym>
  standard.
 </para>
+
+<para>
+ When using the <function>round</function> function, the <type>numeric</type>
+ type rounds half-up, and the <type>double precision</type> type half-even.
+
+<programlisting>
+SELECT round(1.5::numeric), round(2.5::numeric);
+ round | round
+-------+-------
+     2 |     3
+(1 row)
+SELECT round(1.5::double precision), round(2.5::double precision);
+ round | round
+-------+-------
+     2 |     2
+(1 row)
+</programlisting>
+</para>
 </sect2>
 
 
diff --git a/src/test/regress/expected/int2.out b/src/test/regress/expected/int2.out
index 311fe73..3ea4ed9 100644
--- a/src/test/regress/expected/int2.out
+++ b/src/test/regress/expected/int2.out
@@ -286,3 +286,23 @@ FROM (VALUES (-2.5::float8),
   2.5 |  2
 (7 rows)
 
+-- check rounding when casting from numeric
+SELECT x, x::int2 AS int2_value
+FROM (VALUES (-2.5::numeric),
+ (-1.5::numeric),
+ (-0.5::numeric),
+ (0.0::numeric),
+ (0.5::numeric),
+ (1.5::numeric),
+ (2.5::numeric)) t(x);
+  x   | int2_value 
+--+
+ -2.5 | -3
+ -1.5 | -2
+ -0.5 | -1
+  0.0 |  0
+  0.5 |  1
+  1.5 |  2
+  2.5 |  3
+(7 rows)
+
diff --git a/src/test/regress/expected/int4.out b/src/test/regress/expected/int4.out
index 83fe022..372fd4d 100644
--- a/src/test/regress/expected/int4.out
+++ b/src/test/regress/expected/int4.out
@@ -383,3 +383,23 @@ FROM (VALUES (-2.5::float8),
   2.5 |  2
 (7 rows)
 
+-- check rounding when casting from numeric
+SELECT x, x::int4 AS int4_value
+FROM (VALUES (-2.5::numeric),
+ (-1.5::numeric),
+ (-0.5::numeric),
+ (0.0::numeric),
+ (0.5::numeric),
+ (1.5::numeric),
+ (2.5::numeric)) t(x);
+  x   | int4_value 
+--+
+ -2.5 | -3
+ -1.5 | -2
+ -0.5 | -1
+  0.0 |  0
+  0.5 |  1
+  1.5 |  2
+  2.5 |  3
+(7 rows)
+
diff --git a/src/test/regress/expected/int8.out b/src/test/regress/expected/int8.out
index 

[HACKERS] getting rid of thread fork emulation in pgbench?

2015-03-29 Thread Fabien COELHO


Hello hackers,


This is take 2, I already suggested this 2 years ago...


There is a thread fork emulation hack which allows pgbench to be 
compiled without threads by emulating them with forks, obviously with 
limited capabilities. This feature is triggered by configuring postgresql 
with --disable-thread-safety.


I'm not aware of platforms which would still benefit from this feature (we 
are probably talking of a PostgreSQL 9.6 maybe released in 2016), and this 
thread emulation is a real pain to maintain and extend: some 
features cannot be implemented, or must be implemented twice, or must be 
implemented in a heavyweight way because a threaded implementation cannot 
be assumed.


My question is: would someone still object strongly to the removal of this 
thread fork emulation hack in pgbench?


That would mean arguing that it is more important than a simpler code base 
for pgbench, which would help both its maintenance and extension, and must 
be kept despite these costs, hence the strongly.



Tom's opinion, 2 years ago, was to object strongly: 
http://www.postgresql.org/message-id/24846.1372352...@sss.pgh.pa.us


I would object strongly to that, as it would represent a significant 
movement of the goalposts on what is required to build Postgres at all, ie 
platforms on which --enable-thread-safety is unavailable or expensive 
would be out in the cold.  Perhaps that set is approaching empty, but a 
project that's still standardized on C89 has little business making such a 
choice IMO.


However, about not providing a full feature under thread emulation:

Perhaps that's worth doing.  I agree with Fabien that full support of
this feature in the process model is more trouble than it's worth,
though, and I wouldn't scream loudly if we just didn't support it.
--disable-thread-safety doesn't have to be entirely penalty-free.


Robert Haas did also object strongly:
http://www.postgresql.org/message-id/CA+TgmoaJawAM0A4N1RnsndFhTOYYbwnt+7N7XWLcW=n-udc...@mail.gmail.com

I don't believe that to be an acceptable restriction.  We generally 
require features to work on all platforms we support.  We have made 
occasional compromises there, but generally only when the restriction is 
fundamental to the platform rather than for developer convenience.



So the underlying question is: have Tom's and Robert's opinions changed 
from 2 years ago?


If not, I would really like to know at least *one* current platform for 
which --disable-thread-safety is required, just to know why I waste time 
over such things.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing PG_VERSION_NUM in pg_config

2015-03-29 Thread Michael Paquier
On Wed, Mar 25, 2015 at 2:32 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Wed, Mar 25, 2015 at 8:26 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Hm.  We're all agreed that there's a use case for exposing PG_VERSION_NUM
 to the makefiles, but I did not hear one for adding it to pg_config; and
 doing the former takes about two lines whereas adding a pg_config option
 entails quite a lot of overhead (documentation, translatable help text,
 yadda yadda).  So I'm not in favor of doing the latter without a much
 more solid case than has been made.

 Well, I have no other cases than ones of the type mentioned upthread,
 and honestly I am fine as long as we do not apply maths to a version
 string. So attached is a patch that adds VERSION_NUM in
 Makefile.global.

Added entry in CF 2015-06 to not have this stuff fall into the void:
https://commitfest.postgresql.org/5/203/
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2015-03-29 Thread Andres Freund
On 2015-03-29 15:21:44 -0400, Tom Lane wrote:
  There's two things that seem to make sense to me:

  First, decouple relation extension from ReadBuffer*, i.e. remove P_NEW
  and introduce a bufmgr function specifically for extension.

 I think that removing P_NEW is likely to require a fair amount of
 refactoring of calling code, so I'm not thrilled with doing that.
 On the other hand, perhaps all that code would have to be touched
 anyway to modify the scope over which the extension lock is held.

It's not *that* many locations that need to extend relations. In my
playing around it seemed to me they all would need to be modified
anyway, if we want to remove/reduce the scope of extension locks, to deal
with the fact that somebody else could have started to use the buffer.

  Secondly I think we could maybe remove the requirement of needing an
  extension lock altogether. It's primarily required because we're
  worried that somebody else can come along, read the page, and initialize
  it before us. ISTM that could be resolved by *not* writing any data via
  smgrextend()/mdextend().

 I'm afraid this would break stuff rather thoroughly, in particular
 handling of out-of-disk-space cases.

Hm. Not a bad point.

 And I really don't see how you get
 consistent behavior at all for multiple concurrent callers if there's no
 locking.

What I was thinking is something like this:

while (true)
{
    targetBuffer = AcquireFromFSMEquivalent();

    if (targetBuffer == InvalidBuffer)
        targetBuffer = ExtendRelation();

    LockBuffer(targetBuffer, BUFFER_LOCK_EXCLUSIVE);

    if (BufferHasEnoughSpace(targetBuffer))
        break;

    LockBuffer(targetBuffer, BUFFER_LOCK_UNLOCK);
}

where ExtendRelation() would basically work like
while (true)
{
    targetBlock = (lseek(fd, BLCKSZ, SEEK_END) - BLCKSZ) / 8192;
    buffer = ReadBuffer(rel, targetBlock);
    LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
    page = BufferGetPage(buffer);
    if (PageIsNew(page))
    {
        PageInit(page);
        LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
        FlushBuffer(buffer);
        break;
    }
    else
    {
        LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
        continue;
    }
}

Obviously it's a tad more complex than that pseudocode, but I think that
basically should work. Except that, as you say, it leads to some
oddities with out-of-space handling. I think it should actually be OK,
it might just be confusing to the user.

I think we might be able to address those issues by not using
lseek(SEEK_SET) but instead
fcntl(fd, F_SETFL, O_APPEND, 1);
write(fd, pre-init-block, BLCKSZ);
fcntl(fd, F_SETFL, O_APPEND, 0);
newblock = (lseek(SEEK_CUR) - BLCKSZ)/BLCKSZ;

by using O_APPEND and a pre-initialized block we can be sure to write a
valid block at the end, and shouldn't run afoul of any out-of-space
issues that we don't already have.

Unfortunately I'm not sure whether fcntl for O_APPEND is portable :(
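
For what it's worth, the portable-ish spelling of that toggle is the usual
F_GETFL/F_SETFL dance; a minimal sketch of the idea (illustrative only, no
error handling, not part of any patch):

	char		zerobuf[BLCKSZ] = {0};	/* stand-in for the pre-initialized block image */
	int			flags = fcntl(fd, F_GETFL);
	off_t		endpos;
	BlockNumber	newblock;

	fcntl(fd, F_SETFL, flags | O_APPEND);
	write(fd, zerobuf, BLCKSZ);			/* appended atomically at EOF */
	fcntl(fd, F_SETFL, flags);			/* drop O_APPEND again */

	endpos = lseek(fd, 0, SEEK_CUR);	/* now just past the new block */
	newblock = (endpos - BLCKSZ) / BLCKSZ;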

 One idea that might help is to change smgrextend's API so that it doesn't
 need a buffer to write from, but just has an API of add a prezeroed block
 on-disk and tell me the number of the block you added.  On the other
 hand, that would then require reading in the block after allocating a
 buffer to hold it (I don't think you can safely assume otherwise) so the
 added read step might eat any savings.

Yea, I was thinking that as well. We simply could skip the reading step
by setting up the contents in the buffer manager without a read in this
case...

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rounding to even for numeric data type

2015-03-29 Thread James Cloos
 MP == Michael Paquier michael.paqu...@gmail.com writes:

MP So, attached is a patch that does 1) and 2) to make clear to the
MP user how numeric and double precision behave regarding rounding.
MP I am adding it to CF 2015-06 to keep track of it...

Given that the examples show -2.5 rounds to -3, the IEEE term is
roundTiesToAway, and the typical conversational English is "round ties
away from zero".

RoundUp means towards +Infinity.

754 specifies that for decimal, either roundTiesToEven or roundTiesToAway
are acceptable defaults, and which of the two applies is language dependent.
Does ANSI SQL say anything about how numeric should round?

In general, for decimals (or anything other than binary), there are
twelve possible roundings:

 ToEven ToOdd AwayFromZero ToZero Up Down
 TiesToEven TiesToOdd TiesAwayFromZero TiesToZero TiesUp TiesDown

(Up is the same as ceil(3), Down as floor(3).)
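
For illustration, C99 already exposes two of those for doubles, and AFAICS
they line up with the regression examples upthread (assuming the default
FE_TONEAREST rounding mode; the backend's round(float8) is implemented with
rint(), while numeric behaves like C's round()):

	#include <math.h>
	#include <stdio.h>

	int
	main(void)
	{
		double		vals[] = {-2.5, -1.5, -0.5, 0.5, 1.5, 2.5};
		int			i;

		for (i = 0; i < 6; i++)
			printf("%5.1f   ties-to-even: %3.0f   ties-away-from-zero: %3.0f\n",
				   vals[i], rint(vals[i]), round(vals[i]));
		return 0;
	}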

-JimC
-- 
James Cloos cl...@jhcloos.com OpenPGP: 0x997A9F17ED7DAEA6


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2015-03-29 Thread Andres Freund
On 2015-03-29 16:07:49 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2015-03-29 15:21:44 -0400, Tom Lane wrote:
  One idea that might help is to change smgrextend's API so that it doesn't
  need a buffer to write from, but just has an API of add a prezeroed block
  on-disk and tell me the number of the block you added.  On the other
  hand, that would then require reading in the block after allocating a
  buffer to hold it (I don't think you can safely assume otherwise) so the
  added read step might eat any savings.
 
  Yea, I was thinking that as well. We simply could skip the reading step
  by setting up the contents in the buffer manager without a read in this
  case...
 
 No, you can't, at least not if the point is to not be holding any
 exclusive lock by the time you go to talk to the buffer manager.  There
 will be nothing stopping some other backend from writing into that page of
 the file before you can get hold of it.  If the buffer they used to do the
 write has itself gotten recycled, there is nothing left at all to tell you
 your page image is out of date.

That's why I'd proposed restructuring things so that the actual
extension/write to the file only happens once we have the buffer manager
exclusive lock on the individual buffer. While not trivial to implement, it
doesn't look prohibitively complex.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] compute_index_stats is missing a CHECK_FOR_INTERRUPTS

2015-03-29 Thread Tom Lane
Jeff Janes jeff.ja...@gmail.com writes:
 On Sat, Mar 28, 2015 at 3:37 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Hm.  The other per-sample-row loops in analyze.c use vacuum_delay_point()
 rather than CHECK_FOR_INTERRUPTS() directly.  Ordinarily that wouldn't
 make much difference here, but maybe a slow index function might be
 incurring I/O?

 That isn't the case for me (and if it were, they wouldn't be going through
 the buffer manager anyway and so would not trigger delay criteria), but
 that seems like a valid concern in general.  It also explains why I
 couldn't find CHECK_FOR_INTERRUPTS in other loops of that file, because I
 was looking for the wrong spelling.

 Adding a vacuum_delay_point does solve the immediately observed problem,
 both the toy one and the more realistic one.

Committed, thanks.
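
For the archives, the shape of the fix is roughly the following (a sketch
of the idea only, not the committed diff; see analyze.c for the real
thing):

	/* per-sample-row loop in compute_index_stats() */
	for (rowno = 0; rowno < numrows; rowno++)
	{
		/* services the cost-based vacuum delay and also checks for interrupts */
		vacuum_delay_point();

		/* ... evaluate the index expressions for this sampled row ... */
	}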

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2015-03-29 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 As a quick recap, relation extension basically works like:
 1) We lock the relation for extension
 2) ReadBuffer*(P_NEW) is being called, to extend the relation
 3) smgrnblocks() is used to find the new target block
 4) We search for a victim buffer (via BufferAlloc()) to put the new
block into
 5) If dirty the victim buffer is cleaned
 6) The relation is extended using smgrextend()
 7) The page is initialized

 The problems come from 4) and 5) potentially each taking a fair
 while.

Right, so basically we want to get those steps out of the exclusive lock
scope.

 There's two things that seem to make sense to me:

 First, decouple relation extension from ReadBuffer*, i.e. remove P_NEW
 and introduce a bufmgr function specifically for extension.

I think that removing P_NEW is likely to require a fair amount of
refactoring of calling code, so I'm not thrilled with doing that.
On the other hand, perhaps all that code would have to be touched
anyway to modify the scope over which the extension lock is held.

 Secondly I think we could maybe remove the requirement of needing an
  extension lock altogether. It's primarily required because we're
 worried that somebody else can come along, read the page, and initialize
 it before us. ISTM that could be resolved by *not* writing any data via
 smgrextend()/mdextend().

I'm afraid this would break stuff rather thoroughly, in particular
handling of out-of-disk-space cases.  And I really don't see how you get
consistent behavior at all for multiple concurrent callers if there's no
locking.

One idea that might help is to change smgrextend's API so that it doesn't
need a buffer to write from, but just has an API of add a prezeroed block
on-disk and tell me the number of the block you added.  On the other
hand, that would then require reading in the block after allocating a
buffer to hold it (I don't think you can safely assume otherwise) so the
added read step might eat any savings.
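
Such an API might look roughly like this (hypothetical name and signature,
purely to make the idea concrete):

	/*
	 * Append one zeroed block to the given fork on disk and return the
	 * block number that was added; the caller allocates a shared buffer
	 * for it separately.
	 */
	extern BlockNumber smgrextend_zeroed(SMgrRelation reln, ForkNumber forknum,
										 bool skipFsync);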

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: row_to_array function

2015-03-29 Thread Pavel Stehule
2015-03-29 20:27 GMT+02:00 Tom Lane t...@sss.pgh.pa.us:

 Pavel Stehule pavel.steh...@gmail.com writes:
  here is rebased patch.
  It contains both patches - row_to_array function and foreach array
 support.

 While I don't have a problem with hstore_to_array, I don't think that
 row_to_array is a very good idea; it's basically encouraging people to
 throw away SQL datatypes altogether and imagine that everything is text.


This is a complement to the ARRAY API - we have row_to_json, will probably
have row_to_jsonb and row_to_hstore, so row_to_array is relatively logical.
Casting to text is not fast, but on the other hand, working with text arrays
is fast.

I know that casting to text is a problem, but if you iterate over a record's
fields, then you have to find a common shared type because of plan sharing -
and text arrays can be a simple solution.

Now, with the current possibilities, I would use a full SQL expression SELECT
key, value FROM each(hstore(ROW)), or FOREACH ... IN ARRAY
hstore_to_matrix(hstore(ROW)).

row_to_array(ROW) can reduce the hstore overhead.

Any other solution based on PL/Perl or PL/Python is slower, due to PL engine
startup and the same transformation to some form of structured text.




 They've already bought into that concept if they are using hstore or
 json, so smashing elements of those containers to text is not a problem.
 But that doesn't make this version a good thing.

 (In any case, those who insist can get there through row_to_json, no?)

 Also, could we please *not* mix up these two very independent features?
 foreach array as implemented here may or may not be a good thing, but
 it should get its own discussion.


ok, I'll send two patches.



 regards, tom lane



Re: [HACKERS] Relation extension scalability

2015-03-29 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2015-03-29 15:21:44 -0400, Tom Lane wrote:
 One idea that might help is to change smgrextend's API so that it doesn't
 need a buffer to write from, but just has an API of add a prezeroed block
 on-disk and tell me the number of the block you added.  On the other
 hand, that would then require reading in the block after allocating a
 buffer to hold it (I don't think you can safely assume otherwise) so the
 added read step might eat any savings.

 Yea, I was thinking that as well. We simply could skip the reading step
 by setting up the contents in the buffer manager without a read in this
 case...

No, you can't, at least not if the point is to not be holding any
exclusive lock by the time you go to talk to the buffer manager.  There
will be nothing stopping some other backend from writing into that page of
the file before you can get hold of it.  If the buffer they used to do the
write has itself gotten recycled, there is nothing left at all to tell you
your page image is out of date.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Removing INNER JOINs

2015-03-29 Thread Tom Lane
David Rowley dgrowle...@gmail.com writes:
 I'm not worried about the cost of the decision at plan init time. I was
 just confused about what Tom was recommending. I couldn't quite decide from
 his email if he meant to keep the alternative plan node around so that the
 executor must transition through it for each row, or to just choose the
 proper plan at executor init and return the actual root of the selected
 plan instead of returning the initialised AlternativePlan node (see
 nodeAlternativePlan.c)

TBH, I don't like anything about this patch and would prefer to reject it
altogether.  I think it's a disaster from a system structural standpoint,
will probably induce nasty bugs, and simply doesn't apply to enough
real-world queries to be worth either the pain of making it work or the
planning overhead it will add.

The structural damage could be reduced if you got rid of the far-too-cute
idea that you could cut the AlternativePlan node out of the plan tree at
executor startup time.  Just have it remember which decision it made and
pass down all the calls to the appropriate child node.  What you're doing
here violates the rule that planstate trees have a one-to-one relationship
to plan trees.  EXPLAIN used to iterate over those trees in lockstep, and
there probably still is code that does similar things (in third-party
modules if not core), so I don't think we should abandon that principle.
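
Concretely, the pass-through behavior would look something like this
(function and field names invented for illustration, following the
nodeAlternativePlan.c naming in the patch):

	static TupleTableSlot *
	ExecAlternativePlan(AlternativePlanState *node)
	{
		/* chosen_subplan was picked once during ExecInitAlternativePlan() */
		return ExecProcNode(node->chosen_subplan);
	}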

Also, the patch seems rather schizophrenic about whether it's relying
on run-time decisions or not; in particular I see no use of the
PlannedStmt.suitableFor field, and no reason to have one if there's
to be an AlternativePlan node making the decision.

I'm just about certain that you can't generate the two alternative plans
simply by calling make_one_rel() twice.  The planner has far too much
tendency to scribble on its input data structures for that to work
reliably.  It's possible you could dodge bugs of that sort, and save
some cycles, if you did base relation planning only once and created the
alternatives only while working at the join-relation level.  Even then
I think you'd need to do pushups like what GEQO does in order to repeat
join-level planning.  (While I'm looking at that: what used to be a
reasonably clear API between query_planner and grouping_planner now seems
like a complete mess, and I'm quite certain you've created multiple bugs
in grouping_planner, because it would need to track more than one e.g.
current_pathkeys value for the subplans created for the different
final_rels.  You can't just arbitrarily do some of those steps in one loop
and the others in a totally different loop.)

As far as the fundamental-bugs issue goes, I have no faith that "there are
no pending AFTER trigger events" is a sufficient condition for "all known
foreign-key constraints hold against whatever-random-snapshot-I'm-using";
and it's certainly not a *necessary* condition.  (BTW the patch should be
doing its best to localize knowledge about that, rather than spreading the
assumption far and wide through comments and choices of flag/variable
names.)  I think the best you can hope to say in that line is that if
there were no pending events at the time the snapshot was taken, it might
be all right.  Maybe.  But it's not hard to imagine the trigger queue
getting emptied while snapshots still exist that can see inconsistent
states that prevailed before the triggers fired.  (Remember that a trigger
might restore consistency by applying additional changes, not just by
throwing an error.)  STABLE functions are the pain point here since they
could execute queries with snapshots from quite some time back.

As for the cost issue, that's an awful lot of code you added to
remove_useless_joins().  I'm concerned about how much time it's going to
spend while failing to prove anything for most queries.  For starters,
it looks to be O(N^2) in the number of relations, which the existing
join_is_removable code is not; and in general it looks like it will do
work in many more common cases than the existing code has to.  I'm also
pretty unhappy about having get_relation_info() physically visiting the
pg_constraint catalog for every relation that's ever scanned by any query.
(You could probably teach the relcache to remember data about foreign-key
constraints, instead of doing it this way.)
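
A relcache-side cache might be as simple as remembering, per relation,
something like the following for each foreign key (hypothetical layout,
just to illustrate what get_relation_info() would read instead of scanning
pg_constraint itself):

	typedef struct ForeignKeyCacheEntry
	{
		Oid			conoid;						/* pg_constraint OID */
		Oid			confrelid;					/* referenced relation */
		int			nkeys;						/* number of key columns */
		AttrNumber	conkey[INDEX_MAX_KEYS];		/* referencing columns */
		AttrNumber	confkey[INDEX_MAX_KEYS];	/* referenced columns */
	} ForeignKeyCacheEntry;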

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers