Re: [HACKERS] Relation extension scalability
On Mon, Mar 30, 2015 at 12:26 AM, Andres Freund and...@2ndquadrant.com wrote: Hello, Currently bigger shared_buffers settings don't combine well with relations being extended frequently. Especially if many/most pages have a high usagecount and/or are dirty and the system is IO constrained. As a quick recap, relation extension basically works like: 1) We lock the relation for extension 2) ReadBuffer*(P_NEW) is being called, to extend the relation 3) smgrnblocks() is used to find the new target block 4) We search for a victim buffer (via BufferAlloc()) to put the new block into 5) If dirty the victim buffer is cleaned 6) The relation is extended using smgrextend() 7) The page is initialized The problems come from 4) and 5) potentially each taking a fair while. If the working set mostly fits into shared_buffers, 4) can require iterating over all shared buffers several times to find a victim buffer. If the IO subsystem is busy and/or we've hit the kernel's dirty limits, 5) can take a couple of seconds. In the past, I have observed in one of the write-oriented tests that backends have to flush the pages themselves many times, so in the above situation that can lead to a more severe bottleneck. I've prototyped solving this for heap relations by moving the smgrnblocks() + smgrextend() calls to RelationGetBufferForTuple(). With some care (including a retry loop) it's possible to only do those two under the extension lock. That indeed fixes problems in some of my tests. So does this mean that the problem is contention on the extension lock? I'm not sure whether the above is the best solution however. Another thing to note here is that during extension we extend by just one block; wouldn't it make sense to extend by some bigger number? (We could even take input from the user for this, specifying how much to autoextend a relation when it doesn't have any empty space.) During mdextend(), we might extend by just one block; however, we could register a request for a background process to increase the size, similar to what is done for fsync. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
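[For illustration only: a minimal sketch of the multi-block extension idea above. smgrnblocks() and smgrextend() are the real storage-manager calls named in the recap, but the extend_by_blocks knob is hypothetical, and locking/error handling are omitted.]

```c
/* Hedged sketch of multi-block extension; "extend_by_blocks" is invented. */
static char zerobuf[BLCKSZ];        /* zero-filled page image to append */

BlockNumber first_new;
int         nblocks;
int         i;

first_new = smgrnblocks(reln, MAIN_FORKNUM);
nblocks = Max(1, extend_by_blocks); /* hypothetical user-settable knob */

/* append nblocks zeroed pages in one go, under the extension lock */
for (i = 0; i < nblocks; i++)
    smgrextend(reln, MAIN_FORKNUM, first_new + i, zerobuf, false);
```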
Re: [HACKERS] Relation extension scalability
On 2015-03-29 20:02:06 -0400, Robert Haas wrote: On Sun, Mar 29, 2015 at 2:56 PM, Andres Freund and...@2ndquadrant.com As a quick recap, relation extension basically works like: 1) We lock the relation for extension 2) ReadBuffer*(P_NEW) is being called, to extend the relation 3) smgrnblocks() is used to find the new target block 4) We search for a victim buffer (via BufferAlloc()) to put the new block into 5) If dirty the victim buffer is cleaned 6) The relation is extended using smgrextend() 7) The page is initialized The problems come from 4) and 5) potentially each taking a fair while. If the working set mostly fits into shared_buffers, 4) can require iterating over all shared buffers several times to find a victim buffer. If the IO subsystem is busy and/or we've hit the kernel's dirty limits, 5) can take a couple of seconds. Interesting. I had always assumed the bottleneck was waiting for the filesystem to extend the relation. That might be the case sometimes, but it's not what I've actually observed so far. I think preallocation in most modern filesystems has resolved this to some degree. Secondly I think we could maybe remove the requirement of needing an extension lock altogether. It's primarily required because we're worried that somebody else can come along, read the page, and initialize it before us. ISTM that could be resolved by *not* writing any data via smgrextend()/mdextend(). If we instead only do the write once we've read in and locked the page exclusively, there's no need for the extension lock. We probably still should write out the new page to the OS immediately once we've initialized it, to avoid creating sparse files. The other reason we need the extension lock is the code in lazy_scan_heap() and btvacuumscan() that tries to avoid initializing pages that are about to be initialized by the extending backend. I think we should just remove that code and deal with the problem by retrying in the extending backend; that's why I think moving extension to a different file might be helpful. I thought the primary reason we did this is because we wanted to write-and-fsync the block so that, if we're out of disk space, any attendant failure will happen before we put data into the block. Well, we only write and register an fsync. AFAICS we don't actually perform the fsync at that point. I don't think having to do the fsync() necessarily precludes removing the extension lock. Once we've initialized the block, a subsequent failure to write or fsync it will be hard to recover from; At the very least the buffer shouldn't become dirty before we've successfully written it once, right. It seems quite doable to achieve that without the lock though. We'll have to do the write without going through the buffer manager, but that seems doable. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
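[A rough sketch of the lock-free scheme being discussed. This is not Andres's actual prototype; the two helpers marked hypothetical do not exist in PostgreSQL. It only illustrates the ordering: extend without writing data, then write once the new page is exclusively locked and initialized.]

```c
Buffer
extend_without_extension_lock(Relation rel)     /* illustrative only */
{
    BlockNumber blkno;
    Buffer      buf;
    Page        page;

    /* grow the file; no page contents are written yet (hypothetical call) */
    blkno = file_extend_without_write(rel);

    /* read the new, still-zeroed block and lock it before anyone else */
    buf = ReadBuffer(rel, blkno);
    LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);

    page = BufferGetPage(buf);
    if (PageIsNew(page))
        PageInit(page, BLCKSZ, 0);

    /*
     * Per the thread: write the initialized page out promptly (hypothetical
     * call, bypassing the buffer manager) to avoid sparse files, and only
     * mark the buffer dirty after that first write has succeeded.
     */
    write_page_once(rel, blkno, page);
    MarkBufferDirty(buf);

    return buf;                 /* returned still exclusively locked */
}
```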
Re: [HACKERS] Rounding to even for numeric data type
On Mon, Mar 30, 2015 at 4:51 AM, James Cloos cl...@jhcloos.com wrote: MP == Michael Paquier michael.paqu...@gmail.com writes: MP So, attached is a patch that does 1) and 2) to make clear to the MP user how numeric and double precision behave regarding rounding. MP I am adding it to CF 2015-06 to keep track of it... Given that the examples show -2.5 rounds to -3, the IEEE term is roundTiesToAway, and the typical conversational English is round ties away from zero. Ah, thanks for the correct wording. Fixed in the attached. RoundUp means towards +Infinity. 754 specifies that for decimal, either roundTiesToEven or roundTiesToAway are acceptable defaults, and which of the two applies is language dependent. Does ANSI SQL say anything about how numeric should round? In general, for decimals (or anything other than binary), there are twelve possible roundings: ToEven ToOdd AwayFromZero ToZero Up Down TiesToEven TiesToOdd TiesAwayFromZero TiesToZero TiesUp TiesDown (Up is the same as ceil(3), Down as floor(3).) Well, I am not sure about that... but from reading this thread, changing the default rounding sounds unwelcome. So it may be better to just put into words the rounding method used now in the docs, with perhaps a mention that this is not completely in line with the SQL spec, if that is indeed the case. -- Michael From ae28d91519854e6d47d2c864fa26b65c70bb0526 Mon Sep 17 00:00:00 2001 From: Michael Paquier michael@otacoo.com Date: Sun, 29 Mar 2015 19:46:50 +0900 Subject: [PATCH] Precise rounding behavior of numeric and double precision in docs Regression tests improving the coverage in this area are added as well. --- doc/src/sgml/datatype.sgml | 19 +++ src/test/regress/expected/int2.out | 20 src/test/regress/expected/int4.out | 20 src/test/regress/expected/int8.out | 20 src/test/regress/expected/numeric.out | 24 src/test/regress/sql/int2.sql | 10 ++ src/test/regress/sql/int4.sql | 10 ++ src/test/regress/sql/int8.sql | 10 ++ src/test/regress/sql/numeric.sql | 10 ++ 9 files changed, 143 insertions(+) diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml index da1f25f..eb131c3 100644 --- a/doc/src/sgml/datatype.sgml +++ b/doc/src/sgml/datatype.sgml @@ -612,6 +612,25 @@ NUMERIC equivalent. Both types are part of the <acronym>SQL</acronym> standard. </para> + + <para> + When using the <function>round</function> function, the <type>numeric</type> + type rounds ties away from zero, while the <type>double precision</type> + type rounds ties to even.
+ +<programlisting> +SELECT round(1.5::numeric), round(2.5::numeric); + round | round +-------+------- + 2 | 3 +(1 row) +SELECT round(1.5::double precision), round(2.5::double precision); + round | round +-------+------- + 2 | 2 +(1 row) +</programlisting> +</para> </sect2> diff --git a/src/test/regress/expected/int2.out b/src/test/regress/expected/int2.out index 311fe73..3ea4ed9 100644 --- a/src/test/regress/expected/int2.out +++ b/src/test/regress/expected/int2.out @@ -286,3 +286,23 @@ FROM (VALUES (-2.5::float8), 2.5 | 2 (7 rows) +-- check rounding when casting from numeric +SELECT x, x::int2 AS int2_value +FROM (VALUES (-2.5::numeric), + (-1.5::numeric), + (-0.5::numeric), + (0.0::numeric), + (0.5::numeric), + (1.5::numeric), + (2.5::numeric)) t(x); + x | int2_value +------+------------ + -2.5 | -3 + -1.5 | -2 + -0.5 | -1 + 0.0 | 0 + 0.5 | 1 + 1.5 | 2 + 2.5 | 3 +(7 rows) + diff --git a/src/test/regress/expected/int4.out b/src/test/regress/expected/int4.out index 83fe022..372fd4d 100644 --- a/src/test/regress/expected/int4.out +++ b/src/test/regress/expected/int4.out @@ -383,3 +383,23 @@ FROM (VALUES (-2.5::float8), 2.5 | 2 (7 rows) +-- check rounding when casting from numeric +SELECT x, x::int4 AS int4_value +FROM (VALUES (-2.5::numeric), + (-1.5::numeric), + (-0.5::numeric), + (0.0::numeric), + (0.5::numeric), + (1.5::numeric), + (2.5::numeric)) t(x); + x | int4_value +------+------------ + -2.5 | -3 + -1.5 | -2 + -0.5 | -1 + 0.0 | 0 + 0.5 | 1 + 1.5 | 2 + 2.5 | 3 +(7 rows) + diff --git a/src/test/regress/expected/int8.out b/src/test/regress/expected/int8.out index da8be51..ed0bd34 100644 --- a/src/test/regress/expected/int8.out +++ b/src/test/regress/expected/int8.out @@ -866,3 +866,23 @@ FROM (VALUES (-2.5::float8), 2.5 | 2 (7 rows) +-- check rounding when casting from numeric +SELECT x, x::int8 AS int8_value +FROM (VALUES (-2.5::numeric), + (-1.5::numeric), +
Re: [HACKERS] Relation extension scalability
Robert Haas robertmh...@gmail.com writes: I thought the primary reason we did this is because we wanted to write-and-fsync the block so that, if we're out of disk space, any attendant failure will happen before we put data into the block. Once we've initialized the block, a subsequent failure to write or fsync it will be hard to recover from; basically, we won't be able to checkpoint any more. If we discover the problem while the block is still all-zeroes, the transaction that uncovers the problem errors out, but the system as a whole is still OK. Yeah. As Andres says, the fsync is not an important part of that, but we do expect that ENOSPC will happen during the initial write() if it's going to happen. To some extent that's an obsolete assumption, I'm afraid --- I believe that some modern filesystems don't necessarily overwrite the previous version of a block, which would mean that they are capable of failing with ENOSPC even during a re-write of a previously-written block. However, the possibility of filesystem misfeasance of that sort doesn't excuse us from having a clear recovery strategy for failures during relation extension. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: pgbench - merging transaction logs
Hello Tomas, The results are as follows: * 1 thread, 33 runs, median tps (average is consistent): - no logging: 22062 - separate logging: 19360 (-12.2%) - merged logging: 19326 (-12.4%, not significantly different from previous) Interesting. What hardware is this? Dell PowerEdge R720 with 2 Intel Xeon CPU E5-2660 2.20GHz (8 cores and 16 threads per processor, so 32 threads in total), running Linux 3.13 (Ubuntu trusty). I wouldn't be surprised by this behavior on a multi-socket system, [...] There are 2 sockets. So my overall conclusion is: (1) The simple thread-shared file approach would save pgbench from post-processing merge-sort heavy code, for a reasonable cost. No it wouldn't - you're missing the fact that the proposed approach (shared file + fprintf) only works with the raw transaction log. It does not work with the aggregated log - the threads would have to somehow track the progress of the other threads, in a very non-trivial way (e.g. what if one of the threads executes a long query, and thus does not send results for a long time?). The counters are updated when the transaction is finished anyway? Another option would be to update shared aggregated results, but that requires locking. That is what I had in mind. ISTM that the locking impact would be much lower than for logging: the data is just locked for a counter update, and if the counters are per-thread, a conflict may only occur when the data are gathered for actual logging, which would be rather infrequent. Even if the counters are shared, the locking would be for such a short time that it would imply a low conflict probability. So I do not see this one as a significant performance issue. (2) The feature would not be available for the thread-emulation with this approach, but I do not see this as a particular issue as I think that it is pretty much only dead code and a maintenance burden. I'm not willing to investigate that, nor am I willing to implement another feature that works only sometimes (I've done that in the past, and I find it a bad practice). Hmmm. Keeping an obsolete feature with significant impact on how other features can be implemented, so basically a maintenance burden, does not look like best practice *either*. If someone else is willing to try to eliminate the thread emulation, I won't object to that. Hmmm. I'll try to trigger a discussion in another thread to test the idea. But as I pointed out above, a simple fprintf is not going to work for the aggregated log - solving that will need more code (e.g. maintaining aggregated results for all threads, requiring additional locking, etc). The code for that is probably simple and short, and my wish is to try to avoid external merge-sort post-processing if possible, which is not especially cheap anyway, in code or in time. (3) Optimizing doLog from its current fprintf-based implementation may be a good thing. That's probably true. The simplest thing we can do right now is buffering the data into larger chunks and writing those chunks. That amortizes the cost of locking. If it is buffered in the process, that would mean more locking. If it is buffered per thread, that would result in out-of-order logs. Would that be an issue? It would be fine with me. Using O_APPEND, as suggested by Andres, seems like a promising idea. I tried that with a shared file handle, but the impact seemed negligible. The figures I reported used it, btw.
I also tried to open the same file in append mode from all threads, with positive performance effects, but then the flush did not occur at line boundaries, so there was some mangling in the result. -- Fabien. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
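[To make the buffering scheme concrete, a minimal sketch, not pgbench code: each thread fills a private 2KB buffer and flushes whole lines with a single write() to a descriptor opened with O_APPEND. On local filesystems, POSIX makes each such write an atomic append, which is what keeps lines from interleaving; lines longer than the buffer are not handled in this sketch.]

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define LOGBUF_SIZE 2048            /* per-thread chunk, as in the tests above */

typedef struct ThreadLog
{
    int    fd;                      /* shared, opened once with O_APPEND */
    char   buf[LOGBUF_SIZE];
    size_t used;
} ThreadLog;

/* somewhere at startup:
 *   fd = open("pgbench_log", O_WRONLY | O_CREAT | O_APPEND, 0644);
 */

static void
log_line(ThreadLog *tl, const char *line, size_t len)
{
    /* flush when the next whole line would not fit */
    if (tl->used + len > sizeof(tl->buf))
    {
        write(tl->fd, tl->buf, tl->used);   /* one atomic append of whole lines */
        tl->used = 0;
    }
    memcpy(tl->buf + tl->used, line, len);
    tl->used += len;
}
```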
Re: [HACKERS] PATCH: pgbench - merging transaction logs
I also implemented a quick and dirty version for a merge log based on sharing a file handle (append mode + sprintf + fputs). I tried append plus a per-thread 2KB buffer, with sprintf + fputs when full, with the same number of runs. The logs are out of order by chunks; the overhead seems higher with 1 thread, but there is no extra overhead with 12 threads. The results are as follows: * 1 thread, 33 runs, median tps (average is consistent): - no logging: 22062 - separate logging: 19360 (-12.2%) - merged logging: 19326 (-12.4%, not significantly different from previous) - buf merged logging: 18751 (-15%, seems significant) The worst overhead I could trigger is with 12 threads: * 12 threads, 35 runs, median tps (average is consistent): - no logging: 155917 - separate logging: 124695 (-20.0%) - merged logging: 119349 (-23.5%) - buf merged logging: 124914 (-19.9%, not significantly different from separate) -- Fabien. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TRAP: BadArgument - mcxt.c, Line: 813
On Sun, March 29, 2015 17:40, Erik Rijkers wrote: With the latest server testdb=# select version(); version -- PostgreSQL 9.5devel_HEAD_20150329_0154_2c33e0fbceb0 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.9.2, 64-bit (1 row) I get a crash while restoring a trgm GIN index. I forgot to say that this crash only happens with an assert-enabled server. ./configure --prefix=/home/aardvark/pg_stuff/pg_installations/pgsql.HEAD \ --bindir=/home/aardvark/pg_stuff/pg_installations/pgsql.HEAD/bin \ --libdir=/home/aardvark/pg_stuff/pg_installations/pgsql.HEAD/lib \ --with-pgport=6545 --quiet --enable-depend --enable-cassert \ --enable-debug --with-extra-version=_HEAD_20150329_1726_2c33e0fbceb0 \ --with-openssl --with-perl --with-libxml --with-libxslt --with-zlib thanks, Erik Rijkers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] getting rid of thread fork emulation in pgbench?
Fabien COELHO coe...@cri.ensmp.fr writes: There is a thread fork emulation hack which allows pgbench to be compiled without threads by emulating them with forks, obviously with limited capabilities. This feature is triggered by configuring postgresql with --disable-thread-safety. I'm not aware of platforms which would still benefit from this feature (we are probably talking of a PostgreSQL 9.6 maybe released in 2016), and this thread emulation is a real pain to maintain and extend: some features cannot be implemented, or must be implemented twice, or must be implemented in a heavy way because a threaded implementation cannot be assumed. My question is: would someone still object strongly to the removal of this thread fork emulation hack in pgbench? By my count, the pthread-emulation code amounts to about 175 lines out of the nearly 4000 in pgbench.c. That's not an undue burden IMO (it's not, for comparison's sake, very much more than the amount of WIN32-specific code in that file). And what will you do instead? It would be fine I think for pgbench to not allow --jobs different from 1 on a threadless platform, but not for it to fail to work altogether. It looks to me like allowing it to compile without that code would take nearly as much effort/mess as what's there now. So the underlying question is: have Tom's and Robert's opinions changed from 2 years ago? Not mine. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] TRAP: BadArgument - mcxt.c, Line: 813
With the latest server testdb=# select version(); version -- PostgreSQL 9.5devel_HEAD_20150329_0154_2c33e0fbceb0 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.9.2, 64-bit (1 row) I get a crash while restoring a trgm GIN index. The attached bash script creates 2 test tables (t5, t6). A dumpfile, created thus: pg_dump -F c -v -h /tmp -p 6545 -t t6 -f ~/pg_stuff/dumps/testdb_t6.dump testdb cannot be restored into an empty database: it crashes the server (see below). A restore of the smaller table t5 does not crash the system; the indexing is then OK. Restore: pg_restore -v -O -x -d testdb ~/pg_stuff/dumps/testdb_t6.dump I get this crash from restoring: 2015-03-29 17:32:55.846 CEST 8016 LOG: database system is ready to accept connections TRAP: BadArgument("!(((CurrentMemoryContext) != ((void *) 0) && ((((const Node*)((CurrentMemoryContext)))->type) == T_AllocSetContext)))", File: "mcxt.c", Line: 813) 2015-03-29 17:35:02.459 CEST 8016 LOG: server process (PID 8062) was terminated by signal 6: Aborted 2015-03-29 17:35:02.459 CEST 8016 DETAIL: Failed process was running: CREATE INDEX t6_trgm_re_idx ON t6 USING gin (txt gin_trgm_ops); 2015-03-29 17:35:02.459 CEST 8016 LOG: terminating any other active server processes 2015-03-29 17:35:02.502 CEST 8016 LOG: all server processes terminated; reinitializing 2015-03-29 17:35:02.710 CEST 8114 LOG: database system was interrupted; last known up at 2015-03-29 17:32:55 CEST 2015-03-29 17:35:02.735 CEST 8114 LOG: database system was not properly shut down; automatic recovery in progress 2015-03-29 17:35:02.828 CEST 8114 LOG: redo starts at 0/270E550 2015-03-29 17:35:05.665 CEST 8114 LOG: invalid magic number in log segment 00010010, offset 7716864 2015-03-29 17:35:05.665 CEST 8114 LOG: redo done at 0/1075AD90 2015-03-29 17:35:05.665 CEST 8114 LOG: last completed transaction was at log time 2015-03-29 17:34:11.149606+02 2015-03-29 17:35:10.296 CEST 8016 LOG: database system is ready to accept connections So it would seem the CREATE INDEX is the actual problem (I haven't tried that command separately). Erik Rijkers create_data.sh Description: application/shellscript -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] getting rid of thread fork emulation in pgbench?
On Sun, Mar 29, 2015 at 1:49 PM, Fabien COELHO coe...@cri.ensmp.fr wrote: Now if we have threads, it is simpler (if the overhead is reasonable) that threads share a file handle and just generate one file, so there is no merging. I really, really wish you'd stop arguing against the patch to allow merging of pgbench logs on this basis. There may or may not be other reasons to reject that patch, but arguing that we can or should reject it because of the hypothetical possibility that sharing a file handle will work out just isn't right. People who work hard to develop a patch that does something useful don't deserve to have it kicked to the curb because someone can imagine a development path that would eventually obsolete it. That patch deserves to be evaluated on its own merits. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Relation extension scalability
On Sun, Mar 29, 2015 at 2:56 PM, Andres Freund and...@2ndquadrant.com As a quick recap, relation extension basically works like: 1) We lock the relation for extension 2) ReadBuffer*(P_NEW) is being called, to extend the relation 3) smgrnblocks() is used to find the new target block 4) We search for a victim buffer (via BufferAlloc()) to put the new block into 5) If dirty the victim buffer is cleaned 6) The relation is extended using smgrextend() 7) The page is initialized The problems come from 4) and 5) potentially each taking a fair while. If the working set mostly fits into shared_buffers, 4) can require iterating over all shared buffers several times to find a victim buffer. If the IO subsystem is busy and/or we've hit the kernel's dirty limits, 5) can take a couple of seconds. Interesting. I had always assumed the bottleneck was waiting for the filesystem to extend the relation. Secondly I think we could maybe remove the requirement of needing an extension lock altogether. It's primarily required because we're worried that somebody else can come along, read the page, and initialize it before us. ISTM that could be resolved by *not* writing any data via smgrextend()/mdextend(). If we instead only do the write once we've read in and locked the page exclusively, there's no need for the extension lock. We probably still should write out the new page to the OS immediately once we've initialized it, to avoid creating sparse files. The other reason we need the extension lock is the code in lazy_scan_heap() and btvacuumscan() that tries to avoid initializing pages that are about to be initialized by the extending backend. I think we should just remove that code and deal with the problem by retrying in the extending backend; that's why I think moving extension to a different file might be helpful. I thought the primary reason we did this is because we wanted to write-and-fsync the block so that, if we're out of disk space, any attendant failure will happen before we put data into the block. Once we've initialized the block, a subsequent failure to write or fsync it will be hard to recover from; basically, we won't be able to checkpoint any more. If we discover the problem while the block is still all-zeroes, the transaction that uncovers the problem errors out, but the system as a whole is still OK. Or at least, so I think. Maybe I'm misunderstanding. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] cache lookup error for shell type creation with incompatible output function (DDL deparsing bug)
Hi all, I just bumped into the following problem in HEAD (1c41e2a): =# create type my_array_float (INPUT = array_in, OUTPUT = array_out, ELEMENT = float4, INTERNALLENGTH = 32); ERROR: XX000: cache lookup failed for type 0 LOCATION: format_type_internal, format_type.c:135 First note that in ~9.4 the error message is correct: =# create type my_array_float (INPUT = array_in, OUTPUT = array_out, ELEMENT = float4, INTERNALLENGTH = 32); ERROR: 42883: function array_out(my_array_float) does not exist LOCATION: findTypeOutputFunction, typecmds.c:1709 Now, the problem is caused by findTypeOutputFunction() in DefineType() because the process passes InvalidOid as the type OID when creating a type shell, the root cause being a2e35b5, because of this: + address = TypeShellMake(typeName, typeNamespace, GetUserId()); The fix consists in making sure that typoid uses the OID of the type shell created, i.e. the OID stored in address.objectId. Attached is a patch with a regression test checking for shell creation with incompatible input/output functions (the failure here being caused by the output function), able to exercise this code path. Regards, -- Michael diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c index 67e2ae2..37bcd53 100644 --- a/src/backend/commands/typecmds.c +++ b/src/backend/commands/typecmds.c @@ -217,6 +217,7 @@ DefineType(List *names, List *parameters) address = TypeShellMake(typeName, typeNamespace, GetUserId()); /* Make new shell type visible for modification below */ CommandCounterIncrement(); + typoid = address.objectId; /* * If the command was a parameterless CREATE TYPE, we're done --- @@ -1720,6 +1721,8 @@ findTypeOutputFunction(List *procname, Oid typeOid) if (OidIsValid(procOid)) return procOid; + Assert(OidIsValid(typeOid)); + /* No luck, try it with OPAQUE */ argList[0] = OPAQUEOID; diff --git a/src/test/regress/expected/create_type.out b/src/test/regress/expected/create_type.out index 35e8f5d..3646c2b 100644 --- a/src/test/regress/expected/create_type.out +++ b/src/test/regress/expected/create_type.out @@ -107,6 +107,12 @@ ERROR: type text_w_default already exists DROP TYPE default_test_row CASCADE; NOTICE: drop cascades to function get_default_test() DROP TABLE default_test; +-- Check shell type create with input/output incompatibility +CREATE TYPE not_existing_type (INPUT = array_in, +OUTPUT = array_out, +ELEMENT = int, +INTERNALLENGTH = 32); +ERROR: function array_out(not_existing_type) does not exist -- Check usage of typmod with a user-defined type -- (we have borrowed numeric's typmod functions) CREATE TEMP TABLE mytab (foo widget(42,13,7)); -- should fail diff --git a/src/test/regress/sql/create_type.sql b/src/test/regress/sql/create_type.sql index 96a075b..0c79050 100644 --- a/src/test/regress/sql/create_type.sql +++ b/src/test/regress/sql/create_type.sql @@ -106,6 +106,12 @@ DROP TYPE default_test_row CASCADE; DROP TABLE default_test; +-- Check shell type create with input/output incompatibility +CREATE TYPE not_existing_type (INPUT = array_in, +OUTPUT = array_out, +ELEMENT = int, +INTERNALLENGTH = 32); + -- Check usage of typmod with a user-defined type -- (we have borrowed numeric's typmod functions) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Combining Aggregates
On 18 December 2014 at 02:48, Simon Riggs si...@2ndquadrant.com wrote: David, if you can update your patch with some docs to explain the behaviour, it looks complete enough to think about committing it in early January, to allow other patches that depend upon it to stand a chance of getting into 9.5. (It is not yet ready, but I see it could be). I've been thinking of bumping this patch to the June commitfest as the patch only exists to provide the basic infrastructure for things like parallel aggregation, aggregate before join, and perhaps auto updating materialised views. It seems unlikely that any of those things will happen for 9.5. Does anybody object to me moving this to June's commitfest? Regards David Rowley
Re: [HACKERS] Combining Aggregates
On Mon, Mar 30, 2015 at 2:08 PM, David Rowley dgrowle...@gmail.com wrote: On 18 December 2014 at 02:48, Simon Riggs si...@2ndquadrant.com wrote: David, if you can update your patch with some docs to explain the behaviour, it looks complete enough to think about committing it in early January, to allow other patches that depend upon it to stand a chance of getting into 9.5. (It is not yet ready, but I see it could be). I've been thinking of bumping this patch to the June commitfest as the patch only exists to provide the basic infrastructure for things like parallel aggregation, aggregate before join, and perhaps auto updating materialised views. It seems unlikely that any of those things will happen for 9.5. Yeah, I guess so... Does anybody object to me moving this to June's commitfest? Not from my side FWIW. I think it actually makes sense. -- Michael
Re: [HACKERS] TRAP: BadArgument - mcxt.c, Line: 813
Erik Rijkers e...@xs4all.nl writes: I get a crash while restoring a trgm GIN index. Hm, interesting: this seems to be the first negative fallout from commit eaa5808e8ec4e82ce1a87103a6b6f687666e4e4c, which made MemoryContextReset() delete not reset child contexts. ginbuild() is creating a funcCtx as a child of its tmpCtx but then supposing that it can reset the tmpCtx without damaging the other one. Easy enough to fix, but it's a bit worrisome that we did not find this in regression testing. Apparently the regression tests do not exercise the early dump code path in ginBuildCallback... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
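[For illustration, the hazard Tom describes looks roughly like this; it is a simplified sketch, not the literal ginbuild() code.]

```c
/*
 * Before commit eaa5808e, MemoryContextReset() merely reset child contexts;
 * after it, children are deleted outright.
 */
MemoryContext tmpCtx  = AllocSetContextCreate(CurrentMemoryContext,
                                              "Gin build temporary context",
                                              ALLOCSET_DEFAULT_MINSIZE,
                                              ALLOCSET_DEFAULT_INITSIZE,
                                              ALLOCSET_DEFAULT_MAXSIZE);
MemoryContext funcCtx = AllocSetContextCreate(tmpCtx,   /* child of tmpCtx! */
                                              "Gin function context",
                                              ALLOCSET_DEFAULT_MINSIZE,
                                              ALLOCSET_DEFAULT_INITSIZE,
                                              ALLOCSET_DEFAULT_MAXSIZE);

MemoryContextReset(tmpCtx);     /* now deletes funcCtx as well */
/* ... any later use of funcCtx touches freed memory -> the BadArgument trap */
```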
Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission
Hi I checked this patch. I like the functionality and behavior. There is a minor issue with an outdated regression test: test rules... FAILED I have no objections. Regards Pavel 2015-03-27 9:23 GMT+01:00 Haribabu Kommi kommi.harib...@gmail.com: On Fri, Mar 13, 2015 at 1:33 PM, Peter Eisentraut pete...@gmx.net wrote: On 3/4/15 1:34 AM, Haribabu Kommi wrote: On Wed, Mar 4, 2015 at 12:35 PM, Haribabu Kommi kommi.harib...@gmail.com wrote: + foreach(line, parsed_hba_lines) In the above for loop it is better to add check_for_interrupts to avoid long loops if there are many parsed_hba_lines. Updated patch is attached with the addition of check_for_interrupts in the for loop. I tried out your latest patch. I like that it updates even in running sessions when the file is reloaded. Thanks for the review. Sorry for the late reply. The permission checking is faulty, because unprivileged users can execute pg_hba_settings() directly. corrected. Check the error messages against the style guide (especially capitalization). corrected. I don't like that there is a hard-coded limit of 16 options 5 pages away from where it is actually used. That could be done better. changed to 12 instead of 16. I'm not sure about the name pg_hba_settings. Why not just pg_hba or pg_hba_conf if you're going for a representation that is close to the file (as opposed to pg_settings, which is really a lot more abstract than any particular file). changed to pg_hba_conf. I would put the line_number column first. changed. I continue to think that it is a serious mistake to stick special values like 'all' and 'replication' into the arrays without additional decoration. That makes this view useless or at least dangerous for editing tools or tools that want to reassemble the original file. Clearly at least one of those has to be a use case. Otherwise we can just print out the physical lines without interpretation. It is possible to provide more than one keyword for databases or users. Is it fine to use a text array for keyword databases and keyword users? The mask field can go, because address is of type inet, which can represent masks itself. (Or should it be cidr then? Who knows.) The preferred visual representation of masks in pg_hba.conf has been address/mask for a while now, so we should preserve that. Additionally, you can then use the existing inet/cidr operations to do things like checking whether some IP address is contained in an address specification. removed. I can't tell from the documentation what the compare_method field is supposed to do. I see it in the code, but that is not a natural representation of pg_hba.conf. In fact, this just supports my earlier statement. Why are special values in the address field special, but not in the user or database fields? uaImplicitReject is not a user-facing authentication method, so it shouldn't be shown (or showable). removed. I would have expected options to be split into keys and values. All that code to reassemble the options from the parsed struct representation seems crazy to me. Surely, we could save the strings as we parse them? I didn't get this point clearly. Can you explain it a bit more? I can't make sense of the log message "pg_hba.conf not reloaded, pg_hba_settings may show stale information". If load_hba() failed, the information isn't stale, is it? In any case, printing this to the server log seems kind of pointless, because someone using the view is presumably doing so because they can't or don't want to read the server log.
The proper place might be a warning when the view is actually called. Changed accordingly: if the reload fails, further selects on the view throw a warning, as the view data may not be accurate, like below: "There was some failure in reloading the pg_hba.conf file. The pg_hba.conf settings data may contain stale information." Here I have attached the latest patch with the fixes, other than the keyword columns. I will provide the patch with keyword columns and documentation changes later. Regards, Hari Babu Fujitsu Australia -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
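[For illustration, a query against the proposed view might look like this; the exact column set was still under review in this thread, so treat the names as provisional.]

```sql
-- Provisional usage of the proposed catalog view (names per this thread).
SELECT line_number, type, database, user_name, address, auth_method, options
FROM pg_hba_conf
ORDER BY line_number;
```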
Re: [HACKERS] improve pgbench syntax error messages
Here is a v5. Here is v6, just a rebase. -- Fabien. diff --git a/contrib/pgbench/exprparse.y b/contrib/pgbench/exprparse.y index e68631e..68c85c9 100644 --- a/contrib/pgbench/exprparse.y +++ b/contrib/pgbench/exprparse.y @@ -26,6 +26,13 @@ static PgBenchExpr *make_op(char operator, PgBenchExpr *lexpr, %expect 0 %name-prefix=expr_yy +/* +// improve bison generated syntax error messages +// *but* not available with PostgreSQL minimal bison 1.875 +%define parse.lac full +%define parse.error verbose +*/ + %union { int64 ival; diff --git a/contrib/pgbench/exprscan.l b/contrib/pgbench/exprscan.l index 6e3331d..5331ab7 100644 --- a/contrib/pgbench/exprscan.l +++ b/contrib/pgbench/exprscan.l @@ -17,6 +17,13 @@ static int yyline = 0, yycol = 0; static YY_BUFFER_STATE scanbufhandle; static char *scanbuf; static int scanbuflen; + +/* context information for error reporting */ +static char *expr_source = NULL; +static int expr_lineno = 0; +static char *expr_full_line = NULL; +static char *expr_command = NULL; +static int expr_col = 0; %} %option 8bit @@ -56,7 +63,9 @@ space [ \t\r\f] . { yycol += yyleng; - fprintf(stderr, "unexpected character \"%s\"\n", yytext); + syntax_error(expr_source, expr_lineno, expr_full_line, expr_command, + "unexpected character", yytext, expr_col + yycol); + /* dead code, exit is called from syntax_error */ return CHAR_ERROR; } %% @@ -64,20 +73,27 @@ space [ \t\r\f] void yyerror(const char *message) { - /* yyline is always 1 as pgbench calls the parser for each line... * so the interesting location information is the column number */ - fprintf(stderr, "%s at column %d\n", message, yycol); - /* go on to raise the error from pgbench with more information */ + syntax_error(expr_source, expr_lineno, expr_full_line, expr_command, + message, NULL, expr_col + yycol); } /* * Called before any actual parsing is done */ void -expr_scanner_init(const char *str) +expr_scanner_init(const char *str, const char *source, + const int lineno, const char *line, + const char *cmd, const int ecol) { Size slen = strlen(str); + /* save context information for error messages */ + expr_source = (char *) source; + expr_lineno = (int) lineno; + expr_full_line = (char *) line; + expr_command = (char *) cmd; + expr_col = (int) ecol; + /* * Might be left over after error */ @@ -105,4 +121,9 @@ expr_scanner_finish(void) { yy_delete_buffer(scanbufhandle); pg_free(scanbuf); + expr_source = NULL; + expr_lineno = 0; + expr_full_line = NULL; + expr_command = NULL; + expr_col = 0; } diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c index 822adfd..6a4805e 100644 --- a/contrib/pgbench/pgbench.c +++ b/contrib/pgbench/pgbench.c @@ -287,6 +287,7 @@ typedef struct int type; /* command type (SQL_COMMAND or META_COMMAND) */ int argc; /* number of command words */ char *argv[MAX_ARGS]; /* command word list */ + int cols[MAX_ARGS]; /* corresponding column starting from 1 */ PgBenchExpr *expr; /* parsed expression */ } Command; @@ -2185,6 +2186,29 @@ parseQuery(Command *cmd, const char *raw_sql) return true; } +void +syntax_error(const char *source, const int lineno, + const char *line, const char *command, + const char *msg, const char *more, const int column) +{ + fprintf(stderr, "%s:%d: %s", source, lineno, msg); + if (more != NULL) + fprintf(stderr, " (%s)", more); + if (column != -1) + fprintf(stderr, " at column %d", column); + fprintf(stderr, " in command \"%s\"\n", command); + if (line != NULL) { + fprintf(stderr, "%s\n", line); + if (column != -1) { + int i; + for (i = 0; i < column - 1; i++) +fprintf(stderr, " "); +
fprintf(stderr, "^ error found here\n"); + } + } + exit(1); +} + /* Parse a command; return a Command struct, or NULL if it's a comment */ static Command * process_commands(char *buf, const char *source, const int lineno) @@ -2229,6 +2253,7 @@ process_commands(char *buf, const char *source, const int lineno) while (tok != NULL) { + my_commands->cols[j] = tok - buf + 1; my_commands->argv[j++] = pg_strdup(tok); my_commands->argc++; if (max_args >= 0 && my_commands->argc >= max_args) @@ -2246,9 +2271,10 @@ process_commands(char *buf, const char *source, const int lineno) if (my_commands->argc < 4) { -fprintf(stderr, "%s: missing argument\n", my_commands->argv[0]); -exit(1); +syntax_error(source, lineno, my_commands->line, my_commands->argv[0], + "missing arguments", NULL, -1); } + /* argc >= 4 */ if (my_commands->argc == 4 || /* uniform without/with uniform keyword */ @@ -2263,41 +2289,38 @@ process_commands(char *buf, const char *source, const int lineno) { if (my_commands->argc < 6) { - fprintf(stderr, "%s(%s): missing threshold argument\n", my_commands->argv[0], my_commands->argv[4]); - exit(1); + syntax_error(source, lineno, my_commands->line, my_commands->argv[0], + missing
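[Reconstructed from the patch above for illustration, not verbatim program output: the new syntax_error() reports file, line, message, column, the offending command, and a caret under the error position.]

```
bench.sql:3: unexpected character ("?") at column 12 in command "set"
\set foo 1 ? 2
           ^ error found here
```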
Re: [HACKERS] extend pgbench expressions with functions
Here is a small v2 update: v3, just a rebase. -- Fabien. diff --git a/contrib/pgbench/exprparse.y b/contrib/pgbench/exprparse.y index e68631e..049937f 100644 --- a/contrib/pgbench/exprparse.y +++ b/contrib/pgbench/exprparse.y @@ -20,6 +20,7 @@ static PgBenchExpr *make_integer_constant(int64 ival); static PgBenchExpr *make_variable(char *varname); static PgBenchExpr *make_op(char operator, PgBenchExpr *lexpr, PgBenchExpr *rexpr); +static PgBenchExpr *make_func(const char *fname, PgBenchExpr *arg1); %} @@ -35,9 +36,9 @@ static PgBenchExpr *make_op(char operator, PgBenchExpr *lexpr, %type <expr> expr %type <ival> INTEGER -%type <str> VARIABLE +%type <str> VARIABLE FUNCTION -%token INTEGER VARIABLE +%token INTEGER VARIABLE FUNCTION %token CHAR_ERROR /* never used, will raise a syntax error */ /* Precedence: lowest to highest */ @@ -59,6 +60,7 @@ expr: '(' expr ')' { $$ = $2; } | expr '%' expr { $$ = make_op('%', $1, $3); } | INTEGER { $$ = make_integer_constant($1); } | VARIABLE { $$ = make_variable($1); } + | FUNCTION '(' expr ')' { $$ = make_func($1, $3); pg_free($1); } ; %% @@ -95,4 +97,33 @@ make_op(char operator, PgBenchExpr *lexpr, PgBenchExpr *rexpr) return expr; } +static struct { + char * fname; + int nargs; + PgBenchFunction tag; +} PGBENCH_FUNCTIONS[] = { + { "abs", 1, PGBENCH_ABS } +}; + +static PgBenchExpr * +make_func(const char * fname, PgBenchExpr *arg1) +{ + PgBenchExpr *expr = pg_malloc(sizeof(PgBenchExpr)); + int nfunctions = sizeof(PGBENCH_FUNCTIONS) / sizeof(PGBENCH_FUNCTIONS[0]); + int i; + + expr->etype = ENODE_FUNCTION; + expr->u.function.function = PGBENCH_NONE; + + for (i = 0; i < nfunctions; i++) + if (pg_strcasecmp(fname, PGBENCH_FUNCTIONS[i].fname) == 0) + expr->u.function.function = PGBENCH_FUNCTIONS[i].tag; + + if (expr->u.function.function == PGBENCH_NONE) + yyerror("unexpected function name"); + + expr->u.function.arg1 = arg1; + return expr; +} + #include "exprscan.c" diff --git a/contrib/pgbench/exprscan.l b/contrib/pgbench/exprscan.l index 6e3331d..8fa7499 100644 --- a/contrib/pgbench/exprscan.l +++ b/contrib/pgbench/exprscan.l @@ -50,6 +50,11 @@ space [ \t\r\f] yylval.ival = strtoint64(yytext); return INTEGER; } +[a-zA-Z]+ { + yycol += yyleng; + yylval.str = pg_strdup(yytext); + return FUNCTION; +} [\n] { yycol = 0; yyline++; } {space}+ { yycol += yyleng; /* ignore */ } diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c index 822adfd..5a8f376 100644 --- a/contrib/pgbench/pgbench.c +++ b/contrib/pgbench/pgbench.c @@ -955,6 +955,26 @@ evaluateExpr(CState *st, PgBenchExpr *expr, int64 *retval) return false; } + case ENODE_FUNCTION: + { +switch (expr->u.function.function) +{ + case PGBENCH_ABS: + { + int64 arg1; + if (!evaluateExpr(st, expr->u.function.arg1, &arg1)) + return false; + + *retval = arg1 > 0 ?
arg1: -arg1; + return true; + } + default: + fprintf(stderr, bad function (%d)\n, +expr-u.function.function); + return false; +} + } + default: break; } diff --git a/contrib/pgbench/pgbench.h b/contrib/pgbench/pgbench.h index 0396e55..86d1ca1 100644 --- a/contrib/pgbench/pgbench.h +++ b/contrib/pgbench/pgbench.h @@ -15,11 +15,18 @@ typedef enum PgBenchExprType { ENODE_INTEGER_CONSTANT, ENODE_VARIABLE, - ENODE_OPERATOR + ENODE_OPERATOR, + ENODE_FUNCTION } PgBenchExprType; typedef struct PgBenchExpr PgBenchExpr; +typedef enum PgBenchFunction +{ + PGBENCH_NONE, + PGBENCH_ABS +} PgBenchFunction; + struct PgBenchExpr { PgBenchExprType etype; @@ -39,6 +46,11 @@ struct PgBenchExpr PgBenchExpr *lexpr; PgBenchExpr *rexpr; } operator; + struct + { + PgBenchFunction function; + PgBenchExpr *arg1; + } function; } u; }; diff --git a/doc/src/sgml/pgbench.sgml b/doc/src/sgml/pgbench.sgml index ed12e27..0c58412 100644 --- a/doc/src/sgml/pgbench.sgml +++ b/doc/src/sgml/pgbench.sgml @@ -762,7 +762,7 @@ pgbench optional replaceableoptions/ /optional replaceabledbname/ references to variables literal:/replaceablevariablename/, and expressions composed of unary (literal-/) or binary operators (literal+/, literal-/, literal*/, literal//, literal%/) - with their usual associativity, and parentheses. + with their usual associativity, function literalabs/ and parentheses. /para para -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
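[With the patch applied, abs() can appear anywhere a \set expression is accepted; a hypothetical custom-script fragment:]

```
\set delta :aid - 50000
\set distance abs(:delta)
```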
Re: [HACKERS] getting rid of thread fork emulation in pgbench?
Hello Tom, My question is: would someone still object strongly to the removal of this thread fork emulation hack in pgbench? By my count, the pthread-emulation code amounts to about 175 lines out of the nearly 4000 in pgbench.c. That's not an undue burden IMO (it's not, for comparison's sake, very much more than the amount of WIN32-specific code in that file). From my point of view, the burden is how things can be implemented if you have to assume no threads. In the discussion at hand, which motivates this renewed request, the question is whether an external merge sort (implying rescanning and rewriting files over and over) should be implemented in pgbench so as to recombine log files. Now if we have threads, it is simpler (if the overhead is reasonable) to have threads share a file handle and just generate one file, so there is no merging. That is not possible with processes, at least for the aggregated logs, because the per-thread data must be combined; it would require pgbench to use shared memory and so on, which is really a lot of bother for a limited feature. The alternative is that the feature would not be available without threads, but then people object to that, on principle. And what will you do instead? It would be fine I think for pgbench to not allow --jobs different from 1 on a threadless platform, but not for it to fail to work altogether. Sure. No thread really means working with only one thread. It looks to me like allowing it to compile without that code would take nearly as much effort/mess as what's there now. My motivation is to simplify how things are done by simply assuming that threads are available and can share data, especially for counters. So the underlying question is: have Tom's and Robert's opinions changed from 2 years ago? Not mine. Ok! I'll ask again in 2017, and probably again in 2019 :-) -- Fabien. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: row_to_array function
Pavel Stehule pavel.steh...@gmail.com writes: here is rebased patch. It contains both patches - row_to_array function and foreach array support. While I don't have a problem with hstore_to_array, I don't think that row_to_array is a very good idea; it's basically encouraging people to throw away SQL datatypes altogether and imagine that everything is text. They've already bought into that concept if they are using hstore or json, so smashing elements of those containers to text is not a problem. But that doesn't make this version a good thing. (In any case, those who insist can get there through row_to_json, no?) Also, could we please *not* mix up these two very independent features? foreach array as implemented here may or may not be a good thing, but it should get its own discussion. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: row_to_array function
Hi here is the rebased patch. It contains both patches - the row_to_array function and foreach array support. This design is consistent with the hstore functions; there can be good synergy. Regards Pavel 2015-03-28 23:53 GMT+01:00 Jeff Janes jeff.ja...@gmail.com: On Tue, Jan 27, 2015 at 10:58 AM, Pavel Stehule pavel.steh...@gmail.com wrote: Hi 2015-01-27 11:41 GMT+01:00 Pavel Stehule pavel.steh...@gmail.com: 2015-01-26 21:44 GMT+01:00 Jim Nasby jim.na...@bluetreble.com: On 1/25/15 4:23 AM, Pavel Stehule wrote: I tested a concept of iteration over an array in the format [key1, value1, key2, value2, ..] - what is nice, it works for [[key1,value1],[key2,value2], ...] too. It is only a few lines more than the current code, and this change doesn't break compatibility. Do you think this patch is acceptable? Ideas, comments? Aside from fixing the comments... I think this needs more tests on corner cases. For example, what happens when you do foreach a, b, c in array(array(1,2),array(3,4)) ? the behavior is relatively simple -- empty values are NULL array(1,2),array(3,4) -- do you think ARRAY[[1,2],[3,4]] is effectively ARRAY[1,2,3,4] Or the opposite case of foreach a,b in array(array(1,2,3)) Also, what about: foreach a,b in '{{{1,2},{3,4}},{{5,6},{7,8}}}'::int[] ? postgres=# select array(select unnest('{{{1,2},{3,4}},{{5,6},{7,8}}}'::int[])); array --- {1,2,3,4,5,6,7,8} (1 row) so it generates pairs {1,2},{3,4},{5,6},{7,8} I fixed the situation where the array does not have enough elements. This no longer applies due to conflicts in src/pl/plpgsql/src/pl_exec.c caused by e524cbdc45ec6d677b1dd49 Also, what is the relationship of this patch to the row_to_array patch? Are they independent, or does one depend on the other? row_to_array by itself applies but doesn't compile. Cheers, Jeff diff --git a/contrib/hstore/expected/hstore.out b/contrib/hstore/expected/hstore.out new file mode 100644 index 9749e45..e44532e *** a/contrib/hstore/expected/hstore.out --- b/contrib/hstore/expected/hstore.out *** select %% 'aa=>1, cq=>l, b=>g, fg=>NULL' *** 1148,1153 --- 1148,1169 {b,g,aa,1,cq,l,fg,NULL} (1 row) + -- fast iteration over keys + do $$ + declare + key text; + value text; + begin + foreach key, value in array hstore_to_array('aa=>1, cq=>l, b=>g, fg=>NULL'::hstore) + loop + raise notice 'key: %, value: %', key, value; + end loop; + end; + $$; + NOTICE: key: b, value: g + NOTICE: key: aa, value: 1 + NOTICE: key: cq, value: l + NOTICE: key: fg, value: NULL select hstore_to_matrix('aa=>1, cq=>l, b=>g, fg=>NULL'::hstore); hstore_to_matrix - diff --git a/contrib/hstore/sql/hstore.sql b/contrib/hstore/sql/hstore.sql new file mode 100644 index 5a9e9ee..7b9eb09 *** a/contrib/hstore/sql/hstore.sql --- b/contrib/hstore/sql/hstore.sql *** select avals(''); *** 257,262 --- 257,275 select hstore_to_array('aa=>1, cq=>l, b=>g, fg=>NULL'::hstore); select %% 'aa=>1, cq=>l, b=>g, fg=>NULL'; + -- fast iteration over keys + do $$ + declare + key text; + value text; + begin + foreach key, value in array hstore_to_array('aa=>1, cq=>l, b=>g, fg=>NULL'::hstore) + loop + raise notice 'key: %, value: %', key, value; + end loop; + end; + $$; + select hstore_to_matrix('aa=>1, cq=>l, b=>g, fg=>NULL'::hstore); select %# 'aa=>1, cq=>l, b=>g, fg=>NULL'; diff --git a/doc/src/sgml/plpgsql.sgml b/doc/src/sgml/plpgsql.sgml new file mode 100644 index d36acf6..e4abb97 *** a/doc/src/sgml/plpgsql.sgml --- b/doc/src/sgml/plpgsql.sgml *** NOTICE: row = {7,8,9} *** 2505,2510 --- 2505,2533 NOTICE: row = {10,11,12} </programlisting> </para> + + <para> + The <literal>FOREACH</literal> loop can be used to iterate over a record.
You + need the <xref linkend="hstore"> extension. In this case the + <literal>SLICE</literal> clause should not be used. <literal>FOREACH</literal> + statements support a list of target variables. When the source array is + an array of composites, each composite array element is saved to the target + variables. When the array is an array of scalar values, the target + variables are filled item by item. + <programlisting> + CREATE FUNCTION trig_function() RETURNS TRIGGER AS $$ + DECLARE + key text; value text; + BEGIN + FOREACH key, value IN ARRAY hstore_to_array(hstore(NEW)) + LOOP + RAISE NOTICE 'key = %, value = %', key, value; + END LOOP; + RETURN NEW; + END; + $$ LANGUAGE plpgsql; + </programlisting> + </para> </sect2> <sect2 id="plpgsql-error-trapping"> diff --git a/src/backend/utils/adt/rowtypes.c b/src/backend/utils/adt/rowtypes.c new file mode 100644 index a65e18d..1a64d8e *** a/src/backend/utils/adt/rowtypes.c --- b/src/backend/utils/adt/rowtypes.c *** *** 21,26 --- 21,27 #include "catalog/pg_type.h" #include "funcapi.h" #include "libpq/pqformat.h" + #include "utils/array.h"
Re: [HACKERS] getting rid of thread fork emulation in pgbench?
Fabien COELHO coe...@cri.ensmp.fr writes: And what will you do instead? It would be fine I think for pgbench to not allow --jobs different from 1 on a threadless platform, but not for it to fail to work altogether. Sure. No thread really means working with only one thread. It looks to me like allowing it to compile without that code would take nearly as much effort/mess as what's there now. My motivation is to simplify how things are done by simply assuming that threads are available and can share data, esp for counters. pgbench does not get to assume that threads are available, at least not as long as the project as a whole supports --disable-thread-safety. As I said, I wouldn't have a problem with restricting the --jobs setting on threadless platforms, which seems like it would fix your problem since you wouldn't need to have more than one process involved. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Relation extension scalability
Hello, Currently bigger shared_buffers settings don't combine well with relations being extended frequently. Especially if many/most pages have a high usagecount and/or are dirty and the system is IO constrained. As a quick recap, relation extension basically works like: 1) We lock the relation for extension 2) ReadBuffer*(P_NEW) is being called, to extend the relation 3) smgrnblocks() is used to find the new target block 4) We search for a victim buffer (via BufferAlloc()) to put the new block into 5) If dirty the victim buffer is cleaned 6) The relation is extended using smgrextend() 7) The page is initialized The problems come from 4) and 5) potentially each taking a fair while. If the working set mostly fits into shared_buffers, 4) can require iterating over all shared buffers several times to find a victim buffer. If the IO subsystem is busy and/or we've hit the kernel's dirty limits, 5) can take a couple of seconds. I've prototyped solving this for heap relations by moving the smgrnblocks() + smgrextend() calls to RelationGetBufferForTuple(). With some care (including a retry loop) it's possible to only do those two under the extension lock. That indeed fixes problems in some of my tests. I'm not sure whether the above is the best solution however. For one I think it's not necessarily a good idea to opencode this in hio.c - I've not observed it, but this probably can happen for btrees and such as well. For another, this is still an exclusive lock while we're doing IO: smgrextend() wants a page to write out, so we have to be careful not to overwrite things. There are two things that seem to make sense to me: First, decouple relation extension from ReadBuffer*, i.e. remove P_NEW and introduce a bufmgr function specifically for extension. Secondly I think we could maybe remove the requirement of needing an extension lock altogether. It's primarily required because we're worried that somebody else can come along, read the page, and initialize it before us. ISTM that could be resolved by *not* writing any data via smgrextend()/mdextend(). If we instead only do the write once we've read in and locked the page exclusively, there's no need for the extension lock. We probably still should write out the new page to the OS immediately once we've initialized it, to avoid creating sparse files. The other reason we need the extension lock is the code in lazy_scan_heap() and btvacuumscan() that tries to avoid initializing pages that are about to be initialized by the extending backend. I think we should just remove that code and deal with the problem by retrying in the extending backend; that's why I think moving extension to a different file might be helpful. I've attached my POC for heap extension, but it's really just a POC at this point. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services From f1f38829160d0f2998cd187f2f920cdc7b1fa709 Mon Sep 17 00:00:00 2001 From: Andres Freund and...@anarazel.de Date: Sun, 29 Mar 2015 20:55:32 +0200 Subject: [PATCH] WIP: Saner heap extension.
--- src/backend/access/heap/hio.c | 110 -- 1 file changed, 63 insertions(+), 47 deletions(-) diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c index 6d091f6..178c417 100644 --- a/src/backend/access/heap/hio.c +++ b/src/backend/access/heap/hio.c @@ -15,6 +15,8 @@ #include postgres.h +#include miscadmin.h + #include access/heapam.h #include access/hio.h #include access/htup_details.h @@ -420,63 +422,77 @@ RelationGetBufferForTuple(Relation relation, Size len, /* * Have to extend the relation. * - * We have to use a lock to ensure no one else is extending the rel at the - * same time, else we will both try to initialize the same new page. We - * can skip locking for new or temp relations, however, since no one else - * could be accessing them. + * To avoid, as it used to be the case, holding the extension lock during + * victim buffer search for the new buffer, we extend the relation here + * instead of relying on bufmgr.c. We still have to hold the extension + * lock to prevent a race between two backends initializing the same page. */ - needLock = !RELATION_IS_LOCAL(relation); + while(true) + { + char emptybuf[BLCKSZ]; - if (needLock) - LockRelationForExtension(relation, ExclusiveLock); + /* + * We have to use a lock to ensure no one else is extending the rel at + * the same time, else we will both try to initialize the same new + * page. We can skip locking for new or temp relations, however, + * since no one else could be accessing them. + */ + needLock = !RELATION_IS_LOCAL(relation); + RelationOpenSmgr(relation); - /* - * XXX This does an lseek - rather expensive - but at the moment it is the - * only way to accurately determine how many blocks are in a relation. Is - * it worth keeping an accurate file length in
Re: [HACKERS] How about to have relnamespace and relrole?
Andrew Dunstan and...@dunslane.net writes:

I have just claimed this as committer in the CF, but on reviewing the emails it looks like there is disagreement about the need for it at all, especially from Tom and Robert. I confess I have often wanted regnamespace, particularly, and occasionally regrole, simply as a convenience. But I'm not going to commit it against substantial opposition. Do we need a vote?

My concern about it is basically that I don't see where we stop. The existing regFOO alias types are provided for object classes which have nontrivial naming conventions (schema qualification, overloaded argument types, etc), so that you can't just do select ... from catalog where objectname = 'blah'. That doesn't apply to namespaces or roles. So I'm afraid that once this precedent is established, there will be demands for regFOO for every object class we have, and I don't want that much clutter.

It may be that these two cases are so much more useful than any other conceivable cases that we can do them and stop, but I don't think that argument has been made convincingly.

regards, tom lane
Re: [HACKERS] getting rid of thread fork emulation in pgbench?
As I said, I wouldn't have a problem with restricting the --jobs setting on threadless platforms, which seems like it would fix your problem since you wouldn't need to have more than one process involved.

Indeed! So I understand that thread fork emulation could eventually go from your point of view, provided that pgbench compiles and runs with one job. I'll try to submit a patch to that effect, maybe for the June CF.

-- 
Fabien.
Re: [HACKERS] Rounding to even for numeric data type
On Sun, Mar 29, 2015 at 9:21 AM, Gavin Flower gavinflo...@archidevsys.co.nz wrote:

On 29/03/15 13:07, David G. Johnston wrote:

On Sat, Mar 28, 2015 at 3:59 PM, Michael Paquier michael.paqu...@gmail.com wrote:

On Sun, Mar 29, 2015 at 5:34 AM, Gavin Flower gavinflo...@archidevsys.co.nz wrote:

On 28/03/15 21:58, Dean Rasheed wrote:
[...]
Andrew mentioned that there have been complaints from people doing calculations with monetary data that we don't implement round-to-nearest-even (Banker's) rounding. It's actually the case that various different financial calculations demand different specific rounding modes, so it wouldn't be enough to simply change the default - we would have to provide a choice of modes.
[...]

Could the 2 current round functions have cousins that included an extra char parameter (or string) that indicated the type of rounding? That way we don't end up with an explosion of rounding functions, yet could cope with a limited set of additional rounding modes initially, and possibly others in the future.

Instead of extending round(), isn't what we are looking at here a new data type? I have doubts that we only want a way to switch round() between different modes. Hence, what we could do is:
1) Mention in the docs that numeric does round-half-away-from-zero
2) Add regression tests for numeric(n,m) and round(numeric)
3) Add a TODO item for something like numeric2, doing round-to-even (this could be an extension as well, but given the amount of duplication it may have with numeric, an in-core type would make sense; to facilitate things, exposing some of numeric's key structures would help)

So, create a numeric type for each possible rounding mode? That implies at least two types, round-half-even and round-half-away-from-zero, with suitable abbreviations: numeric_rhe, numeric_raz. The existing numeric now does half-up rounding. If the goal is to make plain numeric IEEE standard conforming, then giving the user a way to switch all existing numeric types to numeric_raz would be nice. Implicit casts between each of the various numeric types would be needed and understandable.

That's exactly the thing I think would be helpful. I'm pondering calling them numeric_eng and numeric_bus (for engineering and business respectively)...

In Java, there are 8 rounding modes specified: https://docs.oracle.com/javase/8/docs/api/java/math/RoundingMode.html Some of these may be relevant to pg.

That's interesting. I didn't recall those details.

Regards,
-- 
Michael
Re: [HACKERS] Rounding to even for numeric data type
On Sun, Mar 29, 2015 at 7:59 AM, Michael Paquier michael.paqu...@gmail.com wrote:

On Sun, Mar 29, 2015 at 5:34 AM, Gavin Flower gavinflo...@archidevsys.co.nz wrote:

On 28/03/15 21:58, Dean Rasheed wrote:
[...]
Andrew mentioned that there have been complaints from people doing calculations with monetary data that we don't implement round-to-nearest-even (Banker's) rounding. It's actually the case that various different financial calculations demand different specific rounding modes, so it wouldn't be enough to simply change the default - we would have to provide a choice of modes.
[...]

Could the 2 current round functions have cousins that included an extra char parameter (or string) that indicated the type of rounding? That way we don't end up with an explosion of rounding functions, yet could cope with a limited set of additional rounding modes initially, and possibly others in the future.

Instead of extending round(), isn't what we are looking at here a new data type? I have doubts that we only want a way to switch round() between different modes. Hence, what we could do is:
1) Mention in the docs that numeric does round-half-away-from-zero
2) Add regression tests for numeric(n,m) and round(numeric)
3) Add a TODO item for something like numeric2, doing round-to-even (this could be an extension as well, but given the amount of duplication it may have with numeric, an in-core type would make sense; to facilitate things, exposing some of numeric's key structures would help)

So, attached is a patch that does 1) and 2) to make clear to the user how numeric and double precision behave regarding rounding. I am adding it to CF 2015-06 to keep track of it...
-- 
Michael

From 21e2da3d8c480f28c2cb469a004dbc225a522725 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@otacoo.com>
Date: Sun, 29 Mar 2015 19:46:50 +0900
Subject: [PATCH] Precise rounding behavior of numeric and double precision
 in docs

Regression tests improving the coverage in this area are added as well.
---
 doc/src/sgml/datatype.sgml            | 18 ++++++++++++++++++
 src/test/regress/expected/int2.out    | 20 ++++++++++++++++++++
 src/test/regress/expected/int4.out    | 20 ++++++++++++++++++++
 src/test/regress/expected/int8.out    | 20 ++++++++++++++++++++
 src/test/regress/expected/numeric.out | 24 ++++++++++++++++++++++++
 src/test/regress/sql/int2.sql         | 10 ++++++++++
 src/test/regress/sql/int4.sql         | 10 ++++++++++
 src/test/regress/sql/int8.sql         | 10 ++++++++++
 src/test/regress/sql/numeric.sql      | 10 ++++++++++
 9 files changed, 142 insertions(+)

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index da1f25f..0342c8a 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -612,6 +612,24 @@
      <type>NUMERIC</type> equivalent.  Both types are part of the
      <acronym>SQL</acronym> standard.
     </para>
+
+    <para>
+     With using the <function>round</function> function, the <type>numeric</type>
+     type rounds half-up, and the <type>double precision</type> type half-even.
+
+<programlisting>
+SELECT round(1.5::numeric), round(2.5::numeric);
+ round | round
+-------+-------
+     2 |     3
+(1 row)
+SELECT round(1.5::double precision), round(2.5::double precision);
+ round | round
+-------+-------
+     2 |     2
+(1 row)
+</programlisting>
+    </para>
 
 </sect2>

diff --git a/src/test/regress/expected/int2.out b/src/test/regress/expected/int2.out
index 311fe73..3ea4ed9 100644
--- a/src/test/regress/expected/int2.out
+++ b/src/test/regress/expected/int2.out
@@ -286,3 +286,23 @@ FROM (VALUES (-2.5::float8),
  2.5 |          2
 (7 rows)
 
+-- check rounding when casting from numeric
+SELECT x, x::int2 AS int2_value
+FROM (VALUES (-2.5::numeric),
+             (-1.5::numeric),
+             (-0.5::numeric),
+             (0.0::numeric),
+             (0.5::numeric),
+             (1.5::numeric),
+             (2.5::numeric)) t(x);
+  x   | int2_value
+------+------------
+ -2.5 |         -3
+ -1.5 |         -2
+ -0.5 |         -1
+  0.0 |          0
+  0.5 |          1
+  1.5 |          2
+  2.5 |          3
+(7 rows)
+
diff --git a/src/test/regress/expected/int4.out b/src/test/regress/expected/int4.out
index 83fe022..372fd4d 100644
--- a/src/test/regress/expected/int4.out
+++ b/src/test/regress/expected/int4.out
@@ -383,3 +383,23 @@ FROM (VALUES (-2.5::float8),
  2.5 |          2
 (7 rows)
 
+-- check rounding when casting from numeric
+SELECT x, x::int4 AS int4_value
+FROM (VALUES (-2.5::numeric),
+             (-1.5::numeric),
+             (-0.5::numeric),
+             (0.0::numeric),
+             (0.5::numeric),
+             (1.5::numeric),
+             (2.5::numeric)) t(x);
+  x   | int4_value
+------+------------
+ -2.5 |         -3
+ -1.5 |         -2
+ -0.5 |         -1
+  0.0 |          0
+  0.5 |          1
+  1.5 |          2
+  2.5 |          3
+(7 rows)
+
diff --git a/src/test/regress/expected/int8.out b/src/test/regress/expected/int8.out
index
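The behavior the patch documents can be reproduced outside the backend. The following standalone C program is an illustration only; it assumes C99 round()/rint() and the default FE_TONEAREST rounding mode (PostgreSQL's float8 round() is implemented with rint(), while numeric breaks ties away from zero):

#include <math.h>
#include <stdio.h>

/*
 * Illustration (not PostgreSQL code): C99 round() breaks ties away from
 * zero, matching numeric's behavior; rint() under the default
 * FE_TONEAREST mode breaks ties to even, matching float8's round().
 */
int
main(void)
{
	double		vals[] = {-2.5, -1.5, -0.5, 0.5, 1.5, 2.5};
	int			i;

	for (i = 0; i < 6; i++)
		printf("% .1f  ties-away: % .0f  ties-to-even: % .0f\n",
			   vals[i], round(vals[i]), rint(vals[i]));
	return 0;
}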
[HACKERS] getting rid of thread fork emulation in pgbench?
Hello hackers,

This is take 2; I already suggested this 2 years ago...

There is a thread fork emulation hack which allows pgbench to be compiled without threads by emulating them with forks, obviously with limited capabilities. This feature is triggered by configuring postgresql with --disable-thread-safety.

I'm not aware of platforms which would still benefit from this feature (we are probably talking of a PostgreSQL 9.6 maybe released in 2016), and this thread emulation is a real pain for maintaining and extending pgbench: some features cannot be implemented, or must be implemented twice, or must be implemented in a heavyweight way because a threaded implementation cannot be assumed.

My question is: would someone still object strongly to the removal of this thread fork emulation hack in pgbench? That would mean arguing that it is more important than a simpler code base for pgbench, which would help both its maintenance and extension, and must be kept despite these costs - hence the "strongly".

Tom's opinion, 2 years ago, was to object strongly: http://www.postgresql.org/message-id/24846.1372352...@sss.pgh.pa.us

"I would object strongly to that, as it would represent a significant movement of the goalposts on what is required to build Postgres at all, ie platforms on which --enable-thread-safety is unavailable or expensive would be out in the cold. Perhaps that set is approaching empty, but a project that's still standardized on C89 has little business making such a choice IMO."

However, about not providing a full feature under thread emulation:

"Perhaps that's worth doing. I agree with Fabien that full support of this feature in the process model is more trouble than it's worth, though, and I wouldn't scream loudly if we just didn't support it. --disable-thread-safety doesn't have to be entirely penalty-free."

Robert Haas also objected strongly: http://www.postgresql.org/message-id/CA+TgmoaJawAM0A4N1RnsndFhTOYYbwnt+7N7XWLcW=n-udc...@mail.gmail.com

"I don't believe that to be an acceptable restriction. We generally require features to work on all platforms we support. We have made occasional compromises there, but generally only when the restriction is fundamental to the platform rather than for developer convenience."

So the underlying question is: have Tom's and Robert's opinions changed from 2 years ago? If not, I would really like to know at least *one* current platform for which --disable-thread-safety is required, just to know why I waste time over such things.

-- 
Fabien.
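For a flavor of the duplication in question, here is an illustrative sketch - not pgbench's actual code; the Thread struct and the threadRun() worker are assumptions for the example:

#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>
#ifdef ENABLE_THREAD_SAFETY
#include <pthread.h>
#endif

/* every thread operation needs a process twin under the emulation */
typedef struct Thread
{
#ifdef ENABLE_THREAD_SAFETY
	pthread_t	tid;
#else
	pid_t		pid;
	int			pipe_fd[2];		/* results travel by pipe, not memory */
#endif
	void	   *state;
} Thread;

extern void *threadRun(void *arg);	/* hypothetical worker function */

static int
startThread(Thread *thread)
{
#ifdef ENABLE_THREAD_SAFETY
	return pthread_create(&thread->tid, NULL, threadRun, thread->state);
#else
	if (pipe(thread->pipe_fd) < 0)
		return -1;
	thread->pid = fork();
	if (thread->pid == 0)
	{
		/* child: run the worker, then ship results back over the pipe */
		threadRun(thread->state);
		exit(0);
	}
	return (thread->pid < 0) ? -1 : 0;
#endif
}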
Re: [HACKERS] Exposing PG_VERSION_NUM in pg_config
On Wed, Mar 25, 2015 at 2:32 PM, Michael Paquier michael.paqu...@gmail.com wrote:

On Wed, Mar 25, 2015 at 8:26 AM, Tom Lane t...@sss.pgh.pa.us wrote:

Hm. We're all agreed that there's a use case for exposing PG_VERSION_NUM to the makefiles, but I did not hear one for adding it to pg_config; and doing the former takes about two lines whereas adding a pg_config option entails quite a lot of overhead (documentation, translatable help text, yadda yadda). So I'm not in favor of doing the latter without a much more solid case than has been made.

Well, I have no other cases than ones of the type mentioned upthread, and honestly I am fine as long as we do not apply maths to a version string. So attached is a patch that adds VERSION_NUM in Makefile.global.

Added entry in CF 2015-06 to not have this stuff fall into the void: https://commitfest.postgresql.org/5/203/
-- 
Michael
Re: [HACKERS] Relation extension scalability
On 2015-03-29 15:21:44 -0400, Tom Lane wrote:

There's two things that seem to make sense to me: First, decouple relation extension from ReadBuffer*, i.e. remove P_NEW and introduce a bufmgr function specifically for extension.

I think that removing P_NEW is likely to require a fair amount of refactoring of calling code, so I'm not thrilled with doing that. On the other hand, perhaps all that code would have to be touched anyway to modify the scope over which the extension lock is held.

It's not *that* many locations that need to extend relations. In my playing around it seemed to me they all would need to be modified anyway, if we want to remove/reduce the scope of extension locks to deal with the fact that somebody else could have started to use the buffer.

Secondly I think we could maybe remove the requirement of needing an extension lock altogether. It's primarily required because we're worried that somebody else can come along, read the page, and initialize it before us. ISTM that could be resolved by *not* writing any data via smgrextend()/mdextend().

I'm afraid this would break stuff rather thoroughly, in particular handling of out-of-disk-space cases.

Hm. Not a bad point.

And I really don't see how you get consistent behavior at all for multiple concurrent callers if there's no locking.

What I was thinking is something like this:

    while (true)
    {
        targetBuffer = AcquireFromFSMEquivalent();

        if (targetBuffer == InvalidBuffer)
            targetBuffer = ExtendRelation();

        LockBuffer(targetBuffer, BUFFER_LOCK_EXCLUSIVE);

        if (BufferHasEnoughSpace(targetBuffer))
            break;

        LockBuffer(targetBuffer, BUFFER_LOCK_UNLOCK);
    }

where ExtendRelation() would basically work like

    while (true)
    {
        targetBlock = (lseek(fd, BLCKSZ, SEEK_END) - BLCKSZ) / BLCKSZ;

        buffer = ReadBuffer(rel, targetBlock);
        LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
        page = BufferGetPage(buffer);

        if (PageIsNew(page))
        {
            PageInit(page);
            LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
            FlushBuffer(buffer);
            break;
        }
        else
        {
            LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
            continue;
        }
    }

Obviously it's a tad more complex than that pseudocode, but I think that basically should work. Except that, as you say, it could lead to some oddities with out-of-space handling. I think it should actually be ok, it might just be confusing to the user.

I think we might be able to address those issues by not using lseek(SEEK_END), but instead

    fcntl(fd, F_SETFL, O_APPEND, 1);
    write(fd, pre_init_block, BLCKSZ);
    fcntl(fd, F_SETFL, O_APPEND, 0);
    newblock = (lseek(fd, 0, SEEK_CUR) - BLCKSZ) / BLCKSZ;

By using O_APPEND and a pre-initialized block we can be sure to write a block at the end that's valid, and we shouldn't run afoul of any out-of-space issues that we don't already have. Unfortunately I'm not sure whether fcntl for O_APPEND is portable :(

One idea that might help is to change smgrextend's API so that it doesn't need a buffer to write from, but just has an API of "add a prezeroed block on-disk and tell me the number of the block you added". On the other hand, that would then require reading in the block after allocating a buffer to hold it (I don't think you can safely assume otherwise), so the added read step might eat any savings.

Yea, I was thinking that as well. We simply could skip the reading step by setting up the contents in the buffer manager without a read in this case...
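As an aside on the fcntl() sketch above: per POSIX, F_SETFL takes the complete new flag word rather than a (flag, on/off) pair, so a more literal version would be something like this (a minimal sketch; error handling omitted):

#include <fcntl.h>
#include <unistd.h>

/*
 * Append one pre-initialized block at EOF and return its block number.
 * Sketch only; as noted above, toggling O_APPEND this way may not be
 * portable everywhere.
 */
static long
append_block(int fd, const char *pre_init_block, long blcksz)
{
	int			flags = fcntl(fd, F_GETFL);

	fcntl(fd, F_SETFL, flags | O_APPEND);
	write(fd, pre_init_block, blcksz);
	fcntl(fd, F_SETFL, flags);	/* restore the original flags */

	/* after the append, the offset points just past our new block */
	return (lseek(fd, 0, SEEK_CUR) - blcksz) / blcksz;
}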
Greetings,

Andres Freund

-- 
Andres Freund	http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Rounding to even for numeric data type
MP == Michael Paquier michael.paqu...@gmail.com writes:

MP So, attached is a patch that does 1) and 2) to make clear to the
MP user how numeric and double precision behave regarding rounding.
MP I am adding it to CF 2015-06 to keep track of it...

Given that the examples show -2.5 rounds to -3, the IEEE term is roundTiesToAway, and the typical conversational English is "round ties away from zero". RoundUp means towards +Infinity.

754 specifies that for decimal, either roundTiesToEven or roundTiesToAway is an acceptable default, and which of the two applies is language dependent. Does ANSI SQL say anything about how numeric should round?

In general, for decimals (or anything other than binary), there are twelve possible roundings:

  ToEven      ToOdd      AwayFromZero      ToZero      Up      Down
  TiesToEven  TiesToOdd  TiesAwayFromZero  TiesToZero  TiesUp  TiesDown

(Up is the same as ceil(3), Down as floor(3).)

-JimC
-- 
James Cloos cl...@jhcloos.com OpenPGP: 0x997A9F17ED7DAEA6
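To make the tie-breaking taxonomy concrete, here is a hypothetical sketch (illustration only, not proposed code) of stripping one decimal digit under three of the twelve modes:

#include <stdint.h>
#include <stdlib.h>

typedef enum RoundMode
{
	ROUND_TIES_TO_EVEN,
	ROUND_TIES_AWAY_FROM_ZERO,
	ROUND_TIES_UP				/* ... the other nine modes omitted */
} RoundMode;

/*
 * Round val, a decimal scaled by 10 (so 25 means 2.5), to an integer
 * under the given tie-breaking mode.
 */
static int64_t
round_last_digit(int64_t val, RoundMode mode)
{
	int64_t		q = val / 10;
	int64_t		r = val % 10;	/* C99: remainder's sign follows val */

	if (r == 0)
		return q;				/* already an integer */

	if (abs((int) r) != 5)		/* not a tie: plain round-to-nearest */
		return q + (r > 5 ? 1 : (r < -5 ? -1 : 0));

	switch (mode)
	{
		case ROUND_TIES_TO_EVEN:
			return (q % 2 == 0) ? q : q + (r > 0 ? 1 : -1);
		case ROUND_TIES_AWAY_FROM_ZERO:
			return q + (r > 0 ? 1 : -1);
		case ROUND_TIES_UP:		/* break ties towards +Infinity */
			return (r > 0) ? q + 1 : q;
	}
	return q;					/* not reached */
}

For example, round_last_digit(25, ROUND_TIES_TO_EVEN) gives 2 while round_last_digit(25, ROUND_TIES_AWAY_FROM_ZERO) gives 3, matching the numeric-vs-float8 difference shown upthread.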
Re: [HACKERS] Relation extension scalability
On 2015-03-29 16:07:49 -0400, Tom Lane wrote:

Andres Freund and...@2ndquadrant.com writes:

On 2015-03-29 15:21:44 -0400, Tom Lane wrote:

One idea that might help is to change smgrextend's API so that it doesn't need a buffer to write from, but just has an API of "add a prezeroed block on-disk and tell me the number of the block you added". On the other hand, that would then require reading in the block after allocating a buffer to hold it (I don't think you can safely assume otherwise) so the added read step might eat any savings.

Yea, I was thinking that as well. We simply could skip the reading step by setting up the contents in the buffer manager without a read in this case...

No, you can't, at least not if the point is to not be holding any exclusive lock by the time you go to talk to the buffer manager. There will be nothing stopping some other backend from writing into that page of the file before you can get hold of it. If the buffer they used to do the write has itself gotten recycled, there is nothing left at all to tell you your page image is out of date.

That's why I'd proposed restructuring things so that the actual extension/write to the file only happens once we have the buffer manager exclusive lock on the individual buffer. While not trivial to implement, it doesn't look prohibitively complex.

Greetings,

Andres Freund

-- 
Andres Freund	http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] compute_index_stats is missing a CHECK_FOR_INTERRUPTS
Jeff Janes jeff.ja...@gmail.com writes:

On Sat, Mar 28, 2015 at 3:37 PM, Tom Lane t...@sss.pgh.pa.us wrote:

Hm. The other per-sample-row loops in analyze.c use vacuum_delay_point() rather than CHECK_FOR_INTERRUPTS() directly. Ordinarily that wouldn't make much difference here, but maybe a slow index function might be incurring I/O?

That isn't the case for me (and if it were, they wouldn't be going through the buffer manager anyway and so would not trigger the delay criteria), but that seems like a valid concern in general. It also explains why I couldn't find CHECK_FOR_INTERRUPTS in other loops of that file: I was looking for the wrong spelling. Adding a vacuum_delay_point does solve the immediately observed problem, both the toy one and the more realistic one.

Committed, thanks.

regards, tom lane
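For reference, the per-sample-row loop pattern being discussed looks roughly like this (a paraphrase of the shape used in analyze.c, not the committed hunk):

#include "postgres.h"

#include "commands/vacuum.h"

/* simplified shape of compute_index_stats()'s inner loop */
static void
per_sample_row_loop(int numrows)
{
	int			rowno;

	for (rowno = 0; rowno < numrows; rowno++)
	{
		/*
		 * vacuum_delay_point() both services CHECK_FOR_INTERRUPTS() and
		 * naps when cost-based vacuum delay is enabled, so a slow or
		 * IO-incurring index expression can still be cancelled.
		 */
		vacuum_delay_point();

		/* ... evaluate the index expressions for this sample row ... */
	}
}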
Re: [HACKERS] Relation extension scalability
Andres Freund and...@2ndquadrant.com writes:

As a quick recap, relation extension basically works like:
1) We lock the relation for extension
2) ReadBuffer*(P_NEW) is being called, to extend the relation
3) smgrnblocks() is used to find the new target block
4) We search for a victim buffer (via BufferAlloc()) to put the new block into
5) If dirty the victim buffer is cleaned
6) The relation is extended using smgrextend()
7) The page is initialized
The problems come from 4) and 5) potentially each taking a fair while.

Right, so basically we want to get those steps out of the exclusive lock scope.

There's two things that seem to make sense to me: First, decouple relation extension from ReadBuffer*, i.e. remove P_NEW and introduce a bufmgr function specifically for extension.

I think that removing P_NEW is likely to require a fair amount of refactoring of calling code, so I'm not thrilled with doing that. On the other hand, perhaps all that code would have to be touched anyway to modify the scope over which the extension lock is held.

Secondly I think we could maybe remove the requirement of needing an extension lock altogether. It's primarily required because we're worried that somebody else can come along, read the page, and initialize it before us. ISTM that could be resolved by *not* writing any data via smgrextend()/mdextend().

I'm afraid this would break stuff rather thoroughly, in particular handling of out-of-disk-space cases. And I really don't see how you get consistent behavior at all for multiple concurrent callers if there's no locking.

One idea that might help is to change smgrextend's API so that it doesn't need a buffer to write from, but just has an API of "add a prezeroed block on-disk and tell me the number of the block you added". On the other hand, that would then require reading in the block after allocating a buffer to hold it (I don't think you can safely assume otherwise), so the added read step might eat any savings.

regards, tom lane
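The suggested smgr API could look roughly like this - a sketch only; the name smgrextendzero is hypothetical, and as discussed it would still need the extension interlock or a retry scheme to be safe against concurrent extenders:

#include "postgres.h"

#include "storage/smgr.h"

/*
 * Append a prezeroed block to the given fork and report its number.
 * Sketch: smgrnblocks() + smgrextend() here are not atomic, so callers
 * still need the extension lock (or equivalent) for correctness.
 */
static BlockNumber
smgrextendzero(SMgrRelation reln, ForkNumber forknum)
{
	static char zerobuf[BLCKSZ];	/* static storage: all zeroes */
	BlockNumber blkno = smgrnblocks(reln, forknum);

	smgrextend(reln, forknum, blkno, zerobuf, false);
	return blkno;
}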
Re: [HACKERS] proposal: row_to_array function
2015-03-29 20:27 GMT+02:00 Tom Lane t...@sss.pgh.pa.us:

Pavel Stehule pavel.steh...@gmail.com writes: here is the rebased patch. It contains both patches - the row_to_array function and foreach-array support.

While I don't have a problem with hstore_to_array, I don't think that row_to_array is a very good idea; it's basically encouraging people to throw away SQL datatypes altogether and imagine that everything is text.

This complements the ARRAY API - we have row_to_json, will probably have row_to_jsonb and row_to_hstore, so row_to_array is relatively logical. Casting to text is not fast, but on the other hand, working with text arrays is fast. I know that casting to text is a problem, but if you iterate over a record's fields, then you have to find a common shared type (because plans are shared), and text arrays can be a simple solution.

Now, with current possibilities, I'd use a full SQL expression: SELECT key, value FROM each(hstore(ROW)), or FOREACH ARRAY hstore_to_matrix(hstore(ROW)). row_to_array(ROW) can reduce the hstore overhead. Any other solution based on PL/Perl or PL/Python is slower due to PL engine startup and the same transformation to some form of structured text.

They've already bought into that concept if they are using hstore or json, so smashing elements of those containers to text is not a problem. But that doesn't make this version a good thing. (In any case, those who insist can get there through row_to_json, no?)

Also, could we please *not* mix up these two very independent features? "foreach array" as implemented here may or may not be a good thing, but it should get its own discussion.

ok, I'll send two patches.

regards, tom lane
Re: [HACKERS] Relation extension scalability
Andres Freund and...@2ndquadrant.com writes:

On 2015-03-29 15:21:44 -0400, Tom Lane wrote:

One idea that might help is to change smgrextend's API so that it doesn't need a buffer to write from, but just has an API of "add a prezeroed block on-disk and tell me the number of the block you added". On the other hand, that would then require reading in the block after allocating a buffer to hold it (I don't think you can safely assume otherwise) so the added read step might eat any savings.

Yea, I was thinking that as well. We simply could skip the reading step by setting up the contents in the buffer manager without a read in this case...

No, you can't, at least not if the point is to not be holding any exclusive lock by the time you go to talk to the buffer manager. There will be nothing stopping some other backend from writing into that page of the file before you can get hold of it. If the buffer they used to do the write has itself gotten recycled, there is nothing left at all to tell you your page image is out of date.

regards, tom lane
Re: [HACKERS] Removing INNER JOINs
David Rowley dgrowle...@gmail.com writes:

I'm not worried about the cost of the decision at plan init time. I was just confused about what Tom was recommending. I couldn't quite decide from his email if he meant to keep the alternative plan node around so that the executor must transition through it for each row, or to just choose the proper plan at executor init and return the actual root of the selected plan instead of returning the initialised AlternativePlan node (see nodeAlternativePlan.c).

TBH, I don't like anything about this patch and would prefer to reject it altogether. I think it's a disaster from a system structural standpoint, will probably induce nasty bugs, and simply doesn't apply to enough real-world queries to be worth either the pain of making it work or the planning overhead it will add.

The structural damage could be reduced if you got rid of the far-too-cute idea that you could cut the AlternativePlan node out of the plan tree at executor startup time. Just have it remember which decision it made and pass down all the calls to the appropriate child node. What you're doing here violates the rule that planstate trees have a one-to-one relationship to plan trees. EXPLAIN used to iterate over those trees in lockstep, and there probably still is code that does similar things (in third-party modules if not core), so I don't think we should abandon that principle. Also, the patch seems rather schizophrenic about whether it's relying on run-time decisions or not; in particular I see no use of the PlannedStmt.suitableFor field, and no reason to have one if there's to be an AlternativePlan node making the decision.

I'm just about certain that you can't generate the two alternative plans simply by calling make_one_rel() twice. The planner has far too much tendency to scribble on its input data structures for that to work reliably. It's possible you could dodge bugs of that sort, and save some cycles, if you did base relation planning only once and created the alternatives only while working at the join-relation level. Even then I think you'd need to do pushups like what GEQO does in order to repeat join-level planning. (While I'm looking at that: what used to be a reasonably clear API between query_planner and grouping_planner now seems like a complete mess, and I'm quite certain you've created multiple bugs in grouping_planner, because it would need to track more than one e.g. current_pathkeys value for the subplans created for the different final_rels. You can't just arbitrarily do some of those steps in one loop and the others in a totally different loop.)

As far as the fundamental-bugs issue goes, I have no faith that "there are no pending AFTER trigger events" is a sufficient condition for "all known foreign-key constraints hold against whatever-random-snapshot-I'm-using"; and it's certainly not a *necessary* condition. (BTW, the patch should be doing its best to localize knowledge about that, rather than spreading the assumption far and wide through comments and choices of flag/variable names.) I think the best you can hope to say in that line is that if there were no pending events at the time the snapshot was taken, it might be all right. Maybe. But it's not hard to imagine the trigger queue getting emptied while snapshots still exist that can see inconsistent states that prevailed before the triggers fired. (Remember that a trigger might restore consistency by applying additional changes, not just by throwing an error.)
STABLE functions are the pain point here, since they could execute queries with snapshots from quite some time back.

As for the cost issue, that's an awful lot of code you added to remove_useless_joins(). I'm concerned about how much time it's going to spend while failing to prove anything for most queries. For starters, it looks to be O(N^2) in the number of relations, which the existing join_is_removable code is not; and in general it looks like it will do work in many more common cases than the existing code has to.

I'm also pretty unhappy about having get_relation_info() physically visiting the pg_constraint catalog for every relation that's ever scanned by any query. (You could probably teach the relcache to remember data about foreign-key constraints, instead of doing it this way.)

regards, tom lane
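The pass-through structure suggested above would look roughly like this - a sketch only; AlternativePlanState's exact layout is an assumption, following the patch's nodeAlternativePlan.c naming:

#include "postgres.h"

#include "executor/executor.h"

/* assumed node layout, following the patch's naming */
typedef struct AlternativePlanState
{
	PlanState	ps;				/* its first field, as for all nodes */
	PlanState  *subplans[2];	/* both alternatives stay initialized */
	int			chosen;			/* decided once at executor startup */
} AlternativePlanState;

static TupleTableSlot *
ExecAlternativePlan(AlternativePlanState *node)
{
	/*
	 * Always delegate to the chosen child instead of splicing ourselves
	 * out of the tree: the planstate tree keeps its one-to-one
	 * correspondence with the plan tree, so EXPLAIN-style code that walks
	 * both trees in lockstep keeps working.
	 */
	return ExecProcNode(node->subplans[node->chosen]);
}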