[HACKERS] numeric and float comparison oddities
I saw some strange results:

postgres=# select '1.1'::numeric = '1.1'::float8;
 ?column?
----------
 t
(1 row)

postgres=# select '1.1'::numeric = '1.1'::float4;
 ?column?
----------
 f
(1 row)

When I looked into it, I saw that the numeric is being cast to a float8, making the first statement trivially true. Why does the cast go from numeric to float if that direction loses precision? One reason is because float supports +/- infinity, but that seems more like convenience than a good reason. Is there another reason?

Have we considered adding +/- infinity to numeric so that it can represent every float value? That might make the numeric hierarchy a little cleaner and less surprising.

Regards,
	Jeff Davis

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
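[Editor's illustration, not part of the original mail: a minimal standalone C sketch of the promotion issue behind the second result above. Promoting the float4 value to double preserves its float4 rounding error, so it cannot equal the double rounding of 1.1. Variable names are only examples.]

#include <stdio.h>

int main(void)
{
    float  f4 = 1.1f;   /* analogous to '1.1'::float4 */
    double f8 = 1.1;    /* analogous to '1.1'::float8 */

    /* The cast keeps the float4 rounding of 1.1, which differs from the
     * float8 rounding, so the comparison is false. */
    printf("%d\n", (double) f4 == f8);                 /* prints 0 */
    printf("%.17g vs %.17g\n", (double) f4, f8);
    return 0;
}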
Re: [HACKERS] gaussian distribution pgbench -- splits v4
Hello,

>> Version one is k' = 1 + (a * k + b) modulo n with a prime with respect
>> to n, n being the number of keys. This is nearly possible, but for the
>> modulo operator which is currently missing, and that I'm planning to
>> submit for this very reason, but probably another time.
>
> That's pretty crude,

Yep. It is very simple, it is much better than nothing, and for a database test it may be good enough.

> although I don't object to a modulo operator. It would be nice to be
> able to use a truly random permutation, which is not hard to generate
> but probably requires O(n) storage, likely a problem for large scale
> factors.

That is indeed the actual issue in my mind. I was thinking of permutations with a formula, which are not so easy to find and may end up looking like (a*k+b)%n anyway. I had the same issue for generating random data for a schema (see http://www.coelho.net/datafiller.html).

> Maybe somebody who knows more math than I do (like you, probably!) can
> come up with something more clever.

I can certainly suggest other formulas, but that does not mean beautiful code, thus it would probably be rejected. I'll see.

An alternative to this whole process may be to hash/modulo a non-uniform random value:

	id = 1 + hash(some-random()) % n

But the hashing changes the distribution as it adds collisions, so I have to think about how to control the distribution in that case, and what hash function to use.

--
Fabien.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
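[Editor's illustration, not from the mail: a standalone C sketch of the affine scrambling k' = 1 + (a*k + b) mod n discussed above. The constants are arbitrary examples; the mapping is a permutation of 1..n only when gcd(a, n) == 1, and a * (n - 1) is assumed to fit in an int64.]

#include <stdint.h>

/* Map a key k in 1..n to another key in 1..n via an affine permutation. */
static int64_t
scramble_key(int64_t k, int64_t n)
{
    const int64_t a = 1000003;   /* example multiplier; must be coprime with n */
    const int64_t b = 12345;     /* example offset */

    /* shift keys to 0..n-1, permute, shift back to 1..n */
    return 1 + ((a * (k - 1) + b) % n);
}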
Re: [HACKERS] gaussian distribution pgbench -- splits v4
Hi,

2014-08-01 16:26 GMT+09:00 Fabien COELHO coe...@cri.ensmp.fr:
>> Maybe somebody who knows more math than I do (like you, probably!) can
>> come up with something more clever.
>
> I can certainly suggest other formulas, but that does not mean beautiful
> code, thus it would probably be rejected. I'll see.
>
> An alternative to this whole process may be to hash/modulo a non-uniform
> random value:
>
> 	id = 1 + hash(some-random()) % n
>
> But the hashing changes the distribution as it adds collisions, so I have
> to think about how to control the distribution in that case, and what
> hash function to use.

I think that we have to consider and select a reproducible method, because a benchmark always needs robust and reproducible results. And if we realize this idea, we might need a more accurate random generator, something like the Mersenne twister algorithm; the erand48 algorithm is slow and not very accurate.

By the way, I don't see how this topic relates to the command line option... Well, whatever...

Regards,
--
Mitsumasa KONDO
[HACKERS] Index-only scans for GIST
Hi, hackers!

I work on a GSoC project "Index-only scans for GIST":
https://wiki.postgresql.org/wiki/Support_for_Index-only_scans_for_GIST_GSoC_2014

Repository is https://github.com/lubennikovaav/postgres/tree/indexonlygist2
Patch is in attachments. It includes index-only scans for multicolumn GIST and a new regression test. The Fetch() method is implemented for the box and point opclasses. Documentation is not updated yet, but I'm going to do it by the end of GSoC.

I've got one question about a query with an OR condition. It is the last query in the regression test gist_indexonly. It doesn't fail, but it doesn't use index-only scans. Could someone explain to me how it works? It seems to depend on the build_paths_for_OR function
(http://doxygen.postgresql.org/indxpath_8c.html#ae660d2e886355e53ed3b9ec693e4afd2),
but I couldn't understand how.

--
Best regards,
Lubennikova Anastasia

indexonlyscan_gist.patch
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Introducing coarse grain parallelism by postgres_fdw.
Hello, this is the new version which is complete to some extent of parallelism based on postgres_fdw. This compares the costs for parallel and non-parallel execution and choose parallel one if it is faster by some extent specified by GUCs. The attached files are, 0001_parallel_exec_planning_v0.patch: - PostgreSQL body stuff for parallel execution planning. 0002_enable_postgres_fdw_to_run_in_parallel_v0.patch: - postgres_fdw parallelization. 0003_file_fdw_changes_to_avoid_error.patch: - error avoidig stuff for file_fdw (not necessary for this patch) env.sql: - simple test script to try this patch. = - planner stuff to handle cost of parallel execution. Including indication of parallel execution. - GUCs to control how easy to go parallel. parallel_cost_threshold is the threshold of path total cost where to enable parallel execution. prallel_ratio_threshond is the threshold of the ratio of parallel cost to non-parallel cost where to choose the parallel path. - postgres_fdw which can run in multiple sessions using snapshot export and fetches in parallel for foreign scans on dedicated connections. foreign server has a new option 'max_aux_connections', which limits the number of connections for parallel execution per (server, user) pairs. - change file_fdw to follow the changes of planner stuff. Whth the patch attached, the attached sql script shows the following result (after some line breaks are added). postgres=# EXPLAIN ANALYZE SELECT a.a, a.b, b.c FROM fvs1 a join fvs1_2 b on (a.a = b.a); QUERY PLAN Hash Join (cost=9573392.96..9573393.34 rows=1 width=40 parallel) (actual time=2213.400..2213.407 rows=12 loops=1) Hash Cond: (a.a = b.a) - Foreign Scan on fvs1 a (cost=9573392.96..9573393.29 rows=10 width=8 parallel) (actual time=2199.992..2199.993 rows=10 loops=1) - Hash (cost=9573393.29..9573393.29 rows=10 width=36) (actual time=13.388..13.388 rows=10 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 6kB - Foreign Scan on fvs1_2 b (cost=9573392.96..9573393.29 rows=10 width=36 parallel) (actual time=13.376..13.379 rows=10 loops=1) Planning time: 4.761 ms Execution time: 2227.462 ms (8 rows) postgres=# SET parallel_ratio_threshold to 0.0; postgres=# EXPLAIN ANALYZE SELECT a.a, a.b, b.c FROM fvs1 a join fvs1 b on (a.a = b.a); QUERY PLAN -- Hash Join (cost=318084.32..318084.69 rows=1 width=40) (actual time=4302.913..4302.928 rows=12 loops=1) Hash Cond: (a.a = b.a) - Foreign Scan on fvs1 a (cost=159041.93..159042.26 rows=10 width=8) (actual time=2122.989..2122.992 rows=10 loops=1) - Hash (cost=159042.26..159042.26 rows=10 width=500) (actual time=2179.900..2179.900 rows=10 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 6kB - Foreign Scan on fvs1 b (cost=159041.93..159042.26 rows=10 width=500) (actual time=2179.856..2179.864 rows=10 loops=1) Planning time: 5.085 ms Execution time: 4303.728 ms (8 rows) Where, parallel indicates that the node includes nodes run in parallel. The latter EXPLAIN shows the result when parallel execution is inhibited. Since the lack of time, sorry that the details for this patch is comming later. Is there any suggestions or opinions? 
regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 781a736..d810c3c 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -1168,16 +1168,18 @@ ExplainNode(PlanState *planstate, List *ancestors,
 	{
 		if (es->format == EXPLAIN_FORMAT_TEXT)
 		{
-			appendStringInfo(es->str, " (cost=%.2f..%.2f rows=%.0f width=%d)",
+			appendStringInfo(es->str, " (cost=%.2f..%.2f rows=%.0f width=%d%s)",
 							 plan->startup_cost, plan->total_cost,
-							 plan->plan_rows, plan->plan_width);
+							 plan->plan_rows, plan->plan_width,
+							 plan->parallel_start ? " parallel" : "");
 		}
 		else
 		{
 			ExplainPropertyFloat("Startup Cost", plan->startup_cost, 2, es);
 			ExplainPropertyFloat("Total Cost", plan->total_cost, 2, es);
 			ExplainPropertyFloat("Plan Rows", plan->plan_rows, 0, es);
-			ExplainPropertyInteger("Plan Width", plan->plan_width, es);
+			ExplainPropertyText("Parallel Exec",
+								plan->parallel_start ? "true" : "false", es);
 		}
 	}
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index c81efe9..85d78b4 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -53,6 +53,8 @@ typedef struct pushdown_safety_info
 /* These
Re: [HACKERS] Introducing coarse grain parallelism by postgres_fdw.
Hello, Hello, this is the new version which is complete to some extent of parallelism based on postgres_fdw. This compares the costs for parallel and non-parallel execution and choose parallel one if it is faster by some extent specified by GUCs. The attached files are, 0001_parallel_exec_planning_v0.patch: - PostgreSQL body stuff for parallel execution planning. 0002_enable_postgres_fdw_to_run_in_parallel_v0.patch: - postgres_fdw parallelization. 0003_file_fdw_changes_to_avoid_error.patch: - error avoidig stuff for file_fdw (not necessary for this patch) env.sql: - simple test script to try this patch. = - planner stuff to handle cost of parallel execution. Including indication of parallel execution. - GUCs to control how easy to go parallel. parallel_cost_threshold is the threshold of path total cost where to enable parallel execution. prallel_ratio_threshond is the threshold of the ratio of parallel cost to non-parallel cost where to choose the parallel path. - postgres_fdw which can run in multiple sessions using snapshot export and fetches in parallel for foreign scans on dedicated connections. But now the effect of async execution of FETCH'es is omitted during planning. foreign server has a new option 'max_aux_connections', which limits the number of connections for parallel execution per (server, user) pairs. - change file_fdw to follow the changes of planner stuff. Whth the patch attached, the attached sql script shows the following result (after some line breaks are added). postgres=# EXPLAIN ANALYZE SELECT a.a, a.b, b.c FROM fvs1 a join fvs1_2 b on (a.a = b.a); QUERY PLAN Hash Join (cost=9573392.96..9573393.34 rows=1 width=40 parallel) (actual time=2213.400..2213.407 rows=12 loops=1) Hash Cond: (a.a = b.a) - Foreign Scan on fvs1 a (cost=9573392.96..9573393.29 rows=10 width=8 parallel) (actual time=2199.992..2199.993 rows=10 loops=1) - Hash (cost=9573393.29..9573393.29 rows=10 width=36) (actual time=13.388..13.388 rows=10 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 6kB - Foreign Scan on fvs1_2 b (cost=9573392.96..9573393.29 rows=10 width=36 parallel) (actual time=13.376..13.379 rows=10 loops=1) Planning time: 4.761 ms Execution time: 2227.462 ms (8 rows) postgres=# SET parallel_ratio_threshold to 0.0; postgres=# EXPLAIN ANALYZE SELECT a.a, a.b, b.c FROM fvs1 a join fvs1 b on (a.a = b.a); QUERY PLAN -- Hash Join (cost=318084.32..318084.69 rows=1 width=40) (actual time=4302.913..4302.928 rows=12 loops=1) Hash Cond: (a.a = b.a) - Foreign Scan on fvs1 a (cost=159041.93..159042.26 rows=10 width=8) (actual time=2122.989..2122.992 rows=10 loops=1) - Hash (cost=159042.26..159042.26 rows=10 width=500) (actual time=2179.900..2179.900 rows=10 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 6kB - Foreign Scan on fvs1 b (cost=159041.93..159042.26 rows=10 width=500) (actual time=2179.856..2179.864 rows=10 loops=1) Planning time: 5.085 ms Execution time: 4303.728 ms (8 rows) Where, parallel indicates that the node includes nodes run in parallel. The latter EXPLAIN shows the result when parallel execution is inhibited. Since the lack of time, sorry that the details for this patch is comming later. Is there any suggestions or opinions? regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
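[Editor's illustration, not taken from the patch: a hypothetical sketch of how the two GUCs described above might interact when choosing a path, assuming they mean what the mail literally says (parallel execution is only considered above a total-cost threshold, and the parallel path is chosen when the ratio of parallel cost to non-parallel cost is below the ratio threshold). All names and the exact rule are illustrative only.]

#include <stdbool.h>

/* Hypothetical decision rule for the GUCs described in the mail. */
static bool
choose_parallel_path(double parallel_cost, double nonparallel_cost,
                     double parallel_cost_threshold,
                     double parallel_ratio_threshold)
{
    if (nonparallel_cost < parallel_cost_threshold)
        return false;    /* plan too cheap to justify extra connections */

    /* e.g. SET parallel_ratio_threshold TO 0.0 inhibits parallel paths */
    return parallel_cost < nonparallel_cost * parallel_ratio_threshold;
}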
Re: [HACKERS] SKIP LOCKED DATA (work in progress)
On Tue, Jul 29, 2014 at 1:35 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: David Rowley wrote: I've also been looking at the isolation tests and I see that you've added a series of tests for NOWAIT. I was wondering why you did that as that's really existing code, probably if you thought the tests were a bit thin around NOWAIT then maybe that should be a separate patch? The isolation tester is new so we don't have nearly enough tests for it. Adding more meaningful tests is good even if they're unrelated to the patch at hand. I completely agree that some more isolation tests coverage would be a good thing. I just saw it as something not directly related to this feature, so thought it would be better as a separate patch. From my experience with the project, normally when I try to sneak something extra in, it either does not make the final commit, or gets added in a separate commit. Regards David Rowley
Re: [HACKERS] SKIP LOCKED DATA (work in progress)
On Tue, Jul 29, 2014 at 9:48 AM, Thomas Munro mu...@ip9.org wrote: On 27 July 2014 23:19, Thomas Munro mu...@ip9.org wrote: On the subject of isolation tests, I think skip-locked.spec is only producing schedules that reach third of the three 'return HeapTupleWouldBlock' statements in heap_lock_tuple. I will follow up with some more thorough isolation tests in the next week or so to cover the other two, and some other scenarios and interactions with other feature. Now with extra isolation tests so that the three different code branches that can skip rows are covered. I temporarily added some logging lines to double check that the expected branches are reached by each permutation while developing the specs. They change the output and are not part of the patch -- attaching separately. I've had a look over the isolation tests now and I can't see too much wrong, just a couple of typos... * skip-locked-2.spec # s2 againt skips because it can't acquired a multi-xact lock againt should be again also mixed use of multixact and multi-xact, probably would be better to stick to just 1. Also this file would probably be slightly easier to read with a new line after each permutation line. * skip_locked-3.spec # s3 skips to second record due to tuple lock held by s2 There's a missing the after skips to Also, won't the lock be held by s1 not s2? There's just a couple of other tiny things. * Some whitespace issues shown by git diff --check src/backend/parser/gram.y:9221: trailing whitespace. +opt_nowait_or_skip: src/backend/rewrite/rewriteHandler.c:65: trailing whitespace. + LockClauseStrength strength, LockWaitPolicy waitPolicy, * analyze.c The StaticAssertStmt's I think these should be removed. The only place where this behaviour can be changed is in lockwaitpolicy.h, I think it would be better to just strengthen the comment on the enum's definition. Perhaps something more along the lines of: Policy for what to do when a row lock cannot be obtained immediately. The enum values defined here have critical importance to how the parser treats multiple FOR UPDATE/SHARE statements in different nested levels of the query. Effectively if multiple locking clauses are defined in the query then the one with the highest enum value takes precedence over the others. Apart from this I can't see any other problems with the patch and I'd be very inclined, once the above are fixed up to mark the patch ready for commiter. Good work Regards David Rowley
Re: [HACKERS] [BUGS] BUG #10823: Better REINDEX syntax.
On 07/30/2014 07:46 PM, Tom Lane wrote: Bruce Momjian br...@momjian.us writes: On Wed, Jul 30, 2014 at 01:29:31PM -0400, Tom Lane wrote: I don't find it all that odd. We should not be encouraging routine database-wide reindexes. Uh, do we encourage database-wide VACUUM FULL or CLUSTER, as we use them there with no parameter. Is there a reason REINDEX should be harder, and require a dummy argument to run? I believe that REINDEX on system catalogs carries a risk of deadlock failures against other processes --- there was a recent example of that in the mailing lists. VACUUM FULL has such risks too, but that's been pretty well deprecated for many years. (I think CLUSTER is probably relatively safe on this score because it's not going to think any system catalogs are clustered.) If there were a variant of REINDEX that only hit user tables, I'd be fine with making that easy to invoke. Here are two patches for this. The first one, reindex_user_tables.v1.patch, implements the variant that only hits user tables, as suggested by you. The second one, reindex_no_dbname.v1.patch, allows the three database-wide variants to omit the database name (voted for by Daniel Migowski, Bruce, and myself; voted against by you). This patch is to be applied on top of the first one. -- Vik *** a/doc/src/sgml/ref/reindex.sgml --- b/doc/src/sgml/ref/reindex.sgml *** *** 21,27 PostgreSQL documentation refsynopsisdiv synopsis ! REINDEX { INDEX | TABLE | DATABASE | SYSTEM } replaceable class=PARAMETERname/replaceable [ FORCE ] /synopsis /refsynopsisdiv --- 21,27 refsynopsisdiv synopsis ! REINDEX { INDEX | TABLE | DATABASE | SYSTEM | USER TABLES } replaceable class=PARAMETERname/replaceable [ FORCE ] /synopsis /refsynopsisdiv *** *** 126,139 REINDEX { INDEX | TABLE | DATABASE | SYSTEM } replaceable class=PARAMETERnam /varlistentry varlistentry termreplaceable class=PARAMETERname/replaceable/term listitem para The name of the specific index, table, or database to be reindexed. Index and table names can be schema-qualified. ! Presently, commandREINDEX DATABASE/ and commandREINDEX SYSTEM/ ! can only reindex the current database, so their parameter must match ! the current database's name. /para /listitem /varlistentry --- 126,151 /varlistentry varlistentry + termliteralUSER TABLES/literal/term + listitem + para + Recreate all indexes on user tables within the current database. + Indexes on system catalogs are not processed. + This form of commandREINDEX/command cannot be executed inside a + transaction block. + /para + /listitem +/varlistentry + +varlistentry termreplaceable class=PARAMETERname/replaceable/term listitem para The name of the specific index, table, or database to be reindexed. Index and table names can be schema-qualified. ! Presently, commandREINDEX DATABASE/, commandREINDEX SYSTEM/, ! and commandREINDEX USER TABLES/ can only reindex the current ! database, so their parameter must match the current database's name. 
/para /listitem /varlistentry *** a/doc/src/sgml/ref/reindexdb.sgml --- b/doc/src/sgml/ref/reindexdb.sgml *** *** 65,70 PostgreSQL documentation --- 65,80 /group arg choice=optreplaceabledbname/replaceable/arg /cmdsynopsis + + cmdsynopsis +commandreindexdb/command +arg rep=repeatreplaceableconnection-option/replaceable/arg +group choice=plain + arg choice=plainoption--usertables/option/arg + arg choice=plainoption-u/option/arg +/group +arg choice=optreplaceabledbname/replaceable/arg + /cmdsynopsis /refsynopsisdiv *** *** 173,178 PostgreSQL documentation --- 183,198 /listitem /varlistentry + varlistentry + termoption-u//term + termoption--usertables//term + listitem +para + Reindex database's user tables. +/para + /listitem + /varlistentry + varlistentry termoption-V//term termoption--version//term *** a/src/backend/parser/gram.y --- b/src/backend/parser/gram.y *** *** 6984,6989 ReindexStmt: --- 6984,6999 n-do_user = true; $$ = (Node *)n; } + | REINDEX USER TABLES name opt_force + { + ReindexStmt *n = makeNode(ReindexStmt); + n-kind = OBJECT_DATABASE; + n-name = $4; + n-relation = NULL; + n-do_system = false; + n-do_user = true; + $$ = (Node *)n; + } ; reindex_type: *** a/src/bin/psql/tab-complete.c --- b/src/bin/psql/tab-complete.c *** *** 3145,3151 psql_completion(const char *text, int start, int end) else if (pg_strcasecmp(prev_wd, REINDEX) == 0) { static const char
[HACKERS] Bug of pg_receivexlog -v
Hi,

In 9.2, pg_receivexlog -v has emitted the messages as follows at the end of each WAL file.

pg_receivexlog: finished segment at 0/2000000 (timeline 1)
pg_receivexlog: finished segment at 0/3000000 (timeline 1)
pg_receivexlog: finished segment at 0/4000000 (timeline 1)

But, while reviewing the pg_receivexlog patch written by Furuya, I found that 0b63291 had broken this feature, and now pg_receivexlog -v doesn't emit such messages at all. Attached patch fixes this problem. This needs to be back-patched to 9.3.

Regards,

--
Fujii Masao

*** a/src/bin/pg_basebackup/receivelog.c
--- b/src/bin/pg_basebackup/receivelog.c
***************
*** 991,997 **** HandleCopyStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline,
  			xlogoff = 0;

! 			if (still_sending && stream_stop(blockpos, timeline, false))
  			{
  				if (PQputCopyEnd(conn, NULL) <= 0 || PQflush(conn))
  				{
--- 991,997 ----
  			xlogoff = 0;

! 			if (still_sending && stream_stop(blockpos, timeline, true))
  			{
  				if (PQputCopyEnd(conn, NULL) <= 0 || PQflush(conn))
  				{

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] numeric and float comparison oddities
Jeff Davis pg...@j-davis.com wrote:

> I saw some strange results:
>
> postgres=# select '1.1'::numeric = '1.1'::float8;
>  ?column?
> ----------
>  t
> (1 row)
>
> postgres=# select '1.1'::numeric = '1.1'::float4;
>  ?column?
> ----------
>  f
> (1 row)

The part I find strange is that the first one evaluates to true, since numeric can exactly represent 1.1 and float8 cannot. It also seems inconsistent with this:

test=# set extra_float_digits = 3;
SET
test=# select '1.1'::float4;
   float4
------------
 1.10000002
(1 row)

test=# select '1.1'::float8;
       float8
---------------------
 1.10000000000000009
(1 row)

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] numeric and float comparison oddities
Kevin Grittner kgri...@ymail.com writes: Jeff Davis pg...@j-davis.com wrote: I saw some strange results: The part I find strange is that the first one evaluates to true, since numeric can exactly represent 1.1 and float8 cannot. The reason is that the numeric input is converted to float8 for comparison: regression=# create table ttt(f4 float4, f8 float8, fn numeric); CREATE TABLE regression=# explain verbose select f4=fn, f8=fn from ttt; QUERY PLAN Seq Scan on public.ttt (cost=0.00..32.00 rows=1100 width=44) Output: (f4 = (fn)::double precision), (f8 = (fn)::double precision) Planning time: 0.325 ms (3 rows) Were it not, you'd hardly ever get equality. I think that years ago we concluded this behavior was required by SQL spec (see the language about exact vs inexact numeric types). regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SKIP LOCKED DATA (work in progress)
On 1 August 2014 10:37, David Rowley dgrowle...@gmail.com wrote: * skip-locked-2.spec # s2 againt skips because it can't acquired a multi-xact lock againt should be again Fixed. also mixed use of multixact and multi-xact, probably would be better to stick to just 1. Changed to multixact as seen in other places. Also this file would probably be slightly easier to read with a new line after each permutation line. Done. * skip_locked-3.spec # s3 skips to second record due to tuple lock held by s2 There's a missing the after skips to Fixed. Also, won't the lock be held by s1 not s2? s1 holds the row lock, and s2 holds the tuple lock (because it is at the head of the queue waiting for the row lock). s3 skips because it couldn't acquire the tuple lock held by s2. The other two specs are about not being able to acquire a row lock and only need two sessions. This one tests the code path when there is already a queue forming and you can't even get the tuple lock, which requires an extra session. I have updated the comment to make that clearer. There's just a couple of other tiny things. * Some whitespace issues shown by git diff --check src/backend/parser/gram.y:9221: trailing whitespace. +opt_nowait_or_skip: src/backend/rewrite/rewriteHandler.c:65: trailing whitespace. + LockClauseStrength strength, LockWaitPolicy waitPolicy, Fixed. * analyze.c The StaticAssertStmt's I think these should be removed. The only place where this behaviour can be changed is in lockwaitpolicy.h, I think it would be better to just strengthen the comment on the enum's definition. Removed. Perhaps something more along the lines of: Policy for what to do when a row lock cannot be obtained immediately. The enum values defined here have critical importance to how the parser treats multiple FOR UPDATE/SHARE statements in different nested levels of the query. Effectively if multiple locking clauses are defined in the query then the one with the highest enum value takes precedence over the others. Added something along those lines. Apart from this I can't see any other problems with the patch and I'd be very inclined, once the above are fixed up to mark the patch ready for commiter. Good work Thanks for all the guidance, I appreciate it! My review karma account is now well overdrawn. Best regards, Thomas Munro diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml index 231dc6a..0469705 100644 --- a/doc/src/sgml/ref/select.sgml +++ b/doc/src/sgml/ref/select.sgml @@ -45,7 +45,7 @@ SELECT [ ALL | DISTINCT [ ON ( replaceable class=parameterexpression/replac [ LIMIT { replaceable class=parametercount/replaceable | ALL } ] [ OFFSET replaceable class=parameterstart/replaceable [ ROW | ROWS ] ] [ FETCH { FIRST | NEXT } [ replaceable class=parametercount/replaceable ] { ROW | ROWS } ONLY ] -[ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF replaceable class=parametertable_name/replaceable [, ...] ] [ NOWAIT ] [...] ] +[ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF replaceable class=parametertable_name/replaceable [, ...] ] [ NOWAIT | SKIP LOCKED ] [...] ] phrasewhere replaceable class=parameterfrom_item/replaceable can be one of:/phrase @@ -1283,7 +1283,7 @@ FETCH { FIRST | NEXT } [ replaceable class=parametercount/replaceable ] { The locking clause has the general form synopsis -FOR replaceablelock_strength/ [ OF replaceable class=parametertable_name/replaceable [, ...] ] [ NOWAIT ] +FOR replaceablelock_strength/ [ OF replaceable class=parametertable_name/replaceable [, ...] 
] [ NOWAIT | SKIP LOCKED ] /synopsis where replaceablelock_strength/ can be one of @@ -1359,11 +1359,17 @@ KEY SHARE para To prevent the operation from waiting for other transactions to commit, -use the literalNOWAIT/ option. With literalNOWAIT/, the statement -reports an error, rather than waiting, if a selected row -cannot be locked immediately. Note that literalNOWAIT/ applies only -to the row-level lock(s) mdash; the required literalROW SHARE/literal -table-level lock is still taken in the ordinary way (see +use either the literalNOWAIT/ or literalSKIP LOCKED/literal +option. With literalNOWAIT/, the statement reports an error, rather +than waiting, if a selected row cannot be locked immediately. +With literalSKIP LOCKED/literal, any selected rows that cannot be +immediately locked are skipped. Skipping locked rows provides an +inconsistent view of the data, so this is not suitable for general purpose +work, but can be used to avoid lock contention with multiple consumers +accessing a queue-like table. Note that literalNOWAIT/ +and literalSKIP LOCKED/literal apply only to the row-level lock(s) +mdash; the required literalROW SHARE/literal table-level lock is +still taken in the ordinary way (see xref linkend=mvcc). You can use xref
Re: [HACKERS] Proposal: Incremental Backup
On Fri, Aug 1, 2014 at 12:35 AM, Amit Kapila amit.kapil...@gmail.com wrote: c) the map is not crash safe by design, because it needs only for incremental backup to track what blocks needs to be backuped, not for consistency or recovery of the whole cluster, so it's not an heavy cost for the whole cluster to maintain it. we could think an option (but it's heavy) to write it at every flush on file to have crash-safe map, but I not think it's so usefull . I think it's acceptable, and probably it's better to force that, to say: if your db will crash, you need a fullbackup , I am not sure if your this assumption is right/acceptable, how can we say that in such a case users will be okay to have a fullbackup? In general, taking fullbackup is very heavy operation and we should try to avoid such a situation. Besides, the one taking the backup (ie: script) may not be aware of the need to take a full one. It's a bad design to allow broken backups at all, IMNSHO. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Index-only scans for GIST
On Fri, Aug 1, 2014 at 4:58 AM, Anastasia Lubennikova lubennikov...@gmail.com wrote: Hi, hackers! I work on a GSoC project Index-only scans for GIST https://wiki.postgresql.org/wiki/Support_for_Index-only_scans_for_GIST_GSoC_2014 Repository is https://github.com/lubennikovaav/postgres/tree/indexonlygist2 Patch is in attachments. It includes index-only scans for multicolumn GIST and new regression test. Fetch() method is realized for box and point opclasses. Documentation is not updated yet, but I'm going to do it till the end of GSoC. I've got one question about query with OR condition. It is the last query in regression test gist_indexonly. It doesn't fail but it doensn't use index-only scans. Could someone explain to me how it works? It seems to depend on build_paths_for_OR function. But I couldn't understand how. Very nice... please add your patch to the next commit fest [1]. Regards, [1] https://commitfest.postgresql.org/action/commitfest_view?id=23 -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog sobre TI: http://fabriziomello.blogspot.com Perfil Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello
Re: [HACKERS] WAL format and API changes (9.5)
Here's a new version of this patch, please review. I've cleaned up a lot of stuff, fixed all the bugs reported so far, and a bunch of others I found myself while testing. I'm not going to explain again what the patch does; the README and comments should now be complete enough to explain how it works. If not, please speak up. In the previous version of this patch, I made the XLogRegisterData function to copy the WAL data to a temporary buffer, instead of constructing the XLogRecData chain. I decided to revert back to the old way after all; I still think that copying would probably be OK from a performance point of view, but that'd need more testing. We can still switch to doing it that way later; the XLogRecData struct is no longer exposed to the functions that generate the WAL records, so it would be a very isolated change. I moved the new functions for constructing the WAL record into a new file, xlogconstruct.c. XLogInsert() now just calls a function called XLogRecordAssemble(), which returns the full XLogRecData chain that includes all the data and backup blocks, ready to be written to the WAL buffer. All the code to construct the XLogRecData chain is now in xlogrecord.c; this makes XLogInsert() simpler and more readable. One change resulting from that worth mentioning is that XLogInsert() now always re-constructs the XLogRecordData chain, if after acquiring the WALInsertLock it sees that RedoRecPtr changed (i.e. a checkpoint just started). Before, it would recheck the LSNs on the individual buffers to see if it's necessary. This is simpler, and I don't think it makes any difference to performance in practice. I ran this through my WAL page comparison tool to verify that all the WAL record types are replayed correctly (although I did some small cleanup after that, so it's not impossible that I broke it again; will re-test before committing). - Heikki wal-format-and-api-changes-5.patch.gz Description: application/gzip -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Incremental Backup
2014-08-01 18:20 GMT+02:00 Claudio Freire klaussfre...@gmail.com:
> On Fri, Aug 1, 2014 at 12:35 AM, Amit Kapila amit.kapil...@gmail.com wrote:
>>> c) the map is not crash safe by design, because it needs only for
>>> incremental backup to track what blocks needs to be backuped, not for
>>> consistency or recovery of the whole cluster, so it's not an heavy cost
>>> for the whole cluster to maintain it. we could think an option (but
>>> it's heavy) to write it at every flush on file to have crash-safe map,
>>> but I not think it's so usefull. I think it's acceptable, and probably
>>> it's better to force that, to say: if your db will crash, you need a
>>> fullbackup,
>>
>> I am not sure if your this assumption is right/acceptable, how can we
>> say that in such a case users will be okay to have a fullbackup?
>> In general, taking fullbackup is very heavy operation and we should
>> try to avoid such a situation.
>
> Besides, the one taking the backup (ie: script) may not be aware of the
> need to take a full one.
>
> It's a bad design to allow broken backups at all, IMNSHO.

Hi Claudio,
thanks for your observation.

First: this case arises after a crash of the database, and that's not something that happens every day or every week. It happens in rare conditions, or at least that is my experience. If it happens very often, there are probably other problems.

Second: to avoid the problem of knowing whether the db needs a full backup to rebuild the map, we could write the backup reference (with an id and an LSN reference, for example) in the map header; so if someone/something tries to do an incremental backup after a crash, the map header will have no full backup listed [because it will be empty], and it will automatically switch to a full one. I think that after a crash it's good practice to do a full backup anyway, to see whether there are problems with the files or the filesystems, but if I am wrong I am happy to know :).

Remember that I propose a map in RAM to reduce the impact on performance, but we could add an option to leave the choice to the user: if you want a crash-safe map, a map file is also updated at every flush; if not, the map stays in RAM.

Kind Regards

Mat
Re: [HACKERS] Index-only scans for GIST
Thank you for comment Patch is already added in Performance topic. 2014-08-01 20:25 GMT+04:00 Fabrízio de Royes Mello fabriziome...@gmail.com : On Fri, Aug 1, 2014 at 4:58 AM, Anastasia Lubennikova lubennikov...@gmail.com wrote: Hi, hackers! I work on a GSoC project Index-only scans for GIST https://wiki.postgresql.org/wiki/Support_for_Index-only_scans_for_GIST_GSoC_2014 Repository is https://github.com/lubennikovaav/postgres/tree/indexonlygist2 Patch is in attachments. It includes index-only scans for multicolumn GIST and new regression test. Fetch() method is realized for box and point opclasses. Documentation is not updated yet, but I'm going to do it till the end of GSoC. I've got one question about query with OR condition. It is the last query in regression test gist_indexonly. It doesn't fail but it doensn't use index-only scans. Could someone explain to me how it works? It seems to depend on build_paths_for_OR function. But I couldn't understand how. Very nice... please add your patch to the next commit fest [1]. Regards, [1] https://commitfest.postgresql.org/action/commitfest_view?id=23 -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog sobre TI: http://fabriziomello.blogspot.com Perfil Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello -- Best regards, Lubennikova Anastasia
Re: [HACKERS] Bug of pg_receivexlog -v
On Fri, Aug 1, 2014 at 8:35 PM, Fujii Masao masao.fu...@gmail.com wrote: Hi, In 9.2, pg_receivexlog -v has emitted the messages as follows at the end of each WAL file. pg_receivexlog: finished segment at 0/200 (timeline 1) pg_receivexlog: finished segment at 0/300 (timeline 1) pg_receivexlog: finished segment at 0/400 (timeline 1) But, while reviewing the pg_receivexlog patch written by Furuya, I found that 0b63291 had broken this feature, and then now pg_receivexlog -v doesn't emit such messages at all. Attached patch fixes this problem. This needs to be back-patched to 9.3. I found another problem on pg_receivexlog, which can cause the leak of PGresult. The leak is small and rare, but it's better to fix that. Patch attached. Regards, -- Fujii Masao *** a/src/bin/pg_basebackup/receivelog.c --- b/src/bin/pg_basebackup/receivelog.c *** *** 628,633 ReceiveXlogStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline, --- 628,634 fprintf(stderr, _(%s: unexpected termination of replication stream: %s), progname, PQresultErrorMessage(res)); + PQclear(res); goto error; } PQclear(res); *** *** 642,647 ReceiveXlogStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline, --- 643,650 } else if (PQresultStatus(res) == PGRES_COMMAND_OK) { + PQclear(res); + /* * End of replication (ie. controlled shut down of the server). * *** *** 663,668 ReceiveXlogStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline, --- 666,672 fprintf(stderr, _(%s: unexpected termination of replication stream: %s), progname, PQresultErrorMessage(res)); + PQclear(res); goto error; } } -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Incremental Backup
On Fri, Aug 1, 2014 at 1:43 PM, desmodemone desmodem...@gmail.com wrote: 2014-08-01 18:20 GMT+02:00 Claudio Freire klaussfre...@gmail.com: On Fri, Aug 1, 2014 at 12:35 AM, Amit Kapila amit.kapil...@gmail.com wrote: c) the map is not crash safe by design, because it needs only for incremental backup to track what blocks needs to be backuped, not for consistency or recovery of the whole cluster, so it's not an heavy cost for the whole cluster to maintain it. we could think an option (but it's heavy) to write it at every flush on file to have crash-safe map, but I not think it's so usefull . I think it's acceptable, and probably it's better to force that, to say: if your db will crash, you need a fullbackup , I am not sure if your this assumption is right/acceptable, how can we say that in such a case users will be okay to have a fullbackup? In general, taking fullbackup is very heavy operation and we should try to avoid such a situation. Besides, the one taking the backup (ie: script) may not be aware of the need to take a full one. It's a bad design to allow broken backups at all, IMNSHO. Hi Claudio, thanks for your observation First: the case it's after a crash of a database, and it's not something happens every day or every week. It's something that happens in rare conditions, or almost my experience is so. If it happens very often probably there are other problems. Not so much. In this case, the software design isn't software-crash safe, it's not that it's not hardware-crash safe. What I mean, is that an in-memory bitmap will also be out of sync if you kill -9 (or if one of the backends is killed by the OOM), or if it runs out of disk space too. Normally, a simple restart fixes it because pg will do crash recovery just fine, but now the bitmap is out of sync, and further backups are broken. It's not a situation I want to face unless there's a huge reason to go for such design. If you make it so that the commit includes flipping the bitmap, it can be done cleverly enough to avoid too much overhead (though it will have some), and you now have it so that any to-be-touched block is now part of the backup. You just apply all the bitmap changes in batch after a checkpoint, before syncing to disk, and before erasing the WAL segments. Simple, relatively efficient, and far more robust than an in-memory thing. Still, it *can* double checkpoint I/O on the worst case, and it's not an unfathomable case either. Second: to avoid the problem to know if the db needed to have a full backup to rebuild the map we could think to write in the map header the backup reference (with an id and LSN reference for example ) so if the someone/something try to do an incremental backup after a crash, the map header will not have noone full backup listed [because it will be empty] , and automaticcaly switch to a full one. I think after a crash it's a good practice to do a full backup, to see if there are some problems on files or on filesystems, but if I am wrong I am happy to know :) . After a crash I do not do a backup, I do a verification of the data (VACUUM and some data consistency checks usually), lest you have a useless backup. The backup goes after that. But, I'm not DBA guru. Remember that I propose a map in ram to reduce the impact on performances, but we could create an option to leave the choose to the user, if you want a crash safe map, at every flush will be updated also a map file , if not, the map will be in ram. 
I think the performance impact of a WAL-linked map isn't so big as to prefer the possibility of broken backups. I wouldn't even allow it. It's not free, making it crash safe, but it's not that expensive either. If you want to support incremental backups, you really really need to make sure those backups are correct and usable, and IMV anything short of full crash safety will be too fragile for that purpose. I don't want to be in a position of needing the backup and finding out it's inconsistent after the fact, and I don't want to encourage people to set themselves up for that by adding that faster but unsafe backups flag. I'd do it either safe, or not at all. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
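[Editor's illustration, not from the thread: one way the per-relation block map discussed above could be represented, a plain bitmap with one bit per block, with changed blocks marked as WAL is generated and the accumulated bits flushed in batch around checkpoints. All names and sizes here are hypothetical.]

#include <stdint.h>

#define MAP_BLOCKS  (1024 * 1024)           /* example capacity: 1M blocks */

typedef struct BlockChangeMap
{
    uint8_t bits[MAP_BLOCKS / 8];           /* one bit per relation block */
} BlockChangeMap;

/* Remember that a block was modified since the last (full or incremental) backup. */
static void
map_mark_block(BlockChangeMap *map, uint32_t blkno)
{
    map->bits[blkno / 8] |= (uint8_t) (1 << (blkno % 8));
}

/* Ask whether a block needs to be included in the next incremental backup. */
static int
map_block_changed(const BlockChangeMap *map, uint32_t blkno)
{
    return (map->bits[blkno / 8] >> (blkno % 8)) & 1;
}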
Re: [HACKERS] Supporting Windows SChannel as OpenSSL replacement
On 07/11/2014 08:39 PM, Alvaro Herrera wrote: Heikki Linnakangas wrote: I did again the refactoring you did back in 2006, patch attached. One thing I did differently: I moved the raw, non-encrypted, read/write functions to separate functions: pqsecure_raw_read and pqsecure_raw_write. Those functions encapsulate the SIGPIPE handling. The OpenSSL code implements a custom BIO, which calls to pqsecure_raw_read/write to do the low-level I/O. Similarly in the server-side, there are be_tls_raw_read and pg_tls_raw_write functions, which do the prepare_for_client_read()/client_read_ended() dance, so that the SSL implementation doesn't need to know about that. I'm skimming over this patch (0001). There are some issues: * You duplicated the long comment under the IDENTIFICATION tag that was in be-secure.c; it's now both in that file and also in be-secure-openssl.c. I think it should be removed from be-secure.c. Also, the hardcoded DH params are duplicated in be-secure.c, but they belong in -openssl.c only now. Hmm. Once we add other SSL implementations, shouldn't they share the hardcoded DH parameters? That would warrant keeping them in be-secure.c. * There is some mixup regarding USE_SSL and USE_OPENSSL in be-secure.c. I think anything that's OpenSSL-specific should be in be-secure-openssl.c only; any new SSL implementation will need to implement all those functions. For instance, be_tls_init(). Agreed. I imagine that if we select any SSL implementation, USE_SSL would get defined, and each SSL implementation would additionally define its own symbol. Yeah, that was the idea. * ssl_renegotiation_limit is also duplicated. But removing this one is probably not going to be as easy as deleting a line from be-secure.c, because guc.c depends on that one. I think that variable should be defined in be-secure.c (i.e. delete it from -openssl) and make sure that all SSL implementations enforce it on their own somehow. Agreed. The DISABLE_SIGPIPE thingy looks wrong in pqsecure_write. I think it should be like this: ssize_t pqsecure_write(PGconn *conn, const void *ptr, size_t len) { ssize_t n; #ifdef USE_SSL if (conn-ssl_in_use) { DECLARE_SIGPIPE_INFO(spinfo); DISABLE_SIGPIPE(conn, spinfo, return -1); n = pgtls_write(conn, ptr, len); RESTORE_SIGPIPE(spinfo); } else #endif /* USE_OPENSSL */ { n = pqsecure_raw_write(conn, ptr, len); } return n; } You are missing the restore call, and I moved the declaration inside the ssl_in_use block since otherwise it's not useful. The other path is fine since pqsecure_raw_write disables and restores the flag by itself. Also, you're missing DECLARE/DISABLE/RESTORE in the ssl_in_use block in pqsecure_read. (The original code does not have that code in the non-SSL path. I assume, without checking, that that's okay.) I also assume without checking that all SSL implementations would be fine with this SIGPIPE handling. Another thing that seems wrong is the REMEMBER_EPIPE stuff. The fe-secure-openssl.c code should be setting the flag, but AFAICS only the non-SSL code is doing it. I think you're missing a change to the way fe-secure-openssl.c now uses the OpenSSL library: it defines custom read/write functions, my_sock_read and my_sock_write, which in turn call pqsecure_raw_read/write. So all the actual I/O now goes through pqsecure_raw_read/write. I believe it's therefore enough to put do the REMEMBER_EPIPE in pqsecure_raw_write. Come to think of it, pqsecure_write() shouldn't be doing any SIGPIGE stuff at all anymore. 
- Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Supporting Windows SChannel as OpenSSL replacement
On 07/08/2014 08:11 PM, Jeff Janes wrote: Is there some recipe for testing the 0002 patch? Can it be tested on an MinGW environment, or does it need to use the MicroSoft supplied compilers? I used MSVC. It ought to work with MinGW, I think, although you might need to tweak the Makefiles to make it compile. Certainly we should eventually make it work, before committing. If you try it out, let me know how it goes. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] add modulo (%) operator to pgbench
This patch is pretty trivial.

Add modulo operator to pgbench. This is useful to compute a permutation for tests with non uniform accesses (exponential or gaussian), so as to avoid trivial correlations between neighbour keys.

--
Fabien.

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index ad55c3c..16744ac 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -1570,6 +1570,16 @@ top:
 				}
 				snprintf(res, sizeof(res), INT64_FORMAT, ope1 / ope2);
 			}
+			else if (strcmp(argv[3], "%") == 0)
+			{
+				if (ope2 == 0)
+				{
+					fprintf(stderr, "%s: division by zero in modulo\n", argv[0]);
+					st->ecnt++;
+					return true;
+				}
+				snprintf(res, sizeof(res), INT64_FORMAT, ope1 % ope2);
+			}
 			else
 			{
 				fprintf(stderr, "%s: unsupported operator %s\n", argv[0], argv[3]);
diff --git a/doc/src/sgml/pgbench.sgml b/doc/src/sgml/pgbench.sgml
index b7d88f3..9a6ee3b 100644
--- a/doc/src/sgml/pgbench.sgml
+++ b/doc/src/sgml/pgbench.sgml
@@ -735,7 +735,7 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
       Each <replaceable>operand</> is either an integer constant or a
       <literal>:</><replaceable>variablename</> reference to a variable
       having an integer value.  The <replaceable>operator</> can be
-      <literal>+</>, <literal>-</>, <literal>*</>, or <literal>/</>.
+      <literal>+</>, <literal>-</>, <literal>*</>, <literal>%</> or <literal>/</>.
      </para>
      <para>

psql test < pgbench-modulo-test-init.sql
./pgbench -M prepared -f pgbench-modulo-test-run.sql -P 1 -t 10 -n test
psql test < pgbench-modulo-test-check.sql

pgbench-modulo-test-init.sql
Description: application/sql

pgbench-modulo-test-run.sql
Description: application/sql

pgbench-modulo-test-check.sql
Description: application/sql

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] numeric and float comparison oddities
Tom Lane t...@sss.pgh.pa.us wrote:
> Kevin Grittner kgri...@ymail.com writes:
>> Jeff Davis pg...@j-davis.com wrote:
>>> I saw some strange results:
>>
>> The part I find strange is that the first one evaluates to true,
>> since numeric can exactly represent 1.1 and float8 cannot.
>
> The reason is that the numeric input is converted to float8 for comparison:
>
> regression=# create table ttt(f4 float4, f8 float8, fn numeric);
> CREATE TABLE
> regression=# explain verbose select f4=fn, f8=fn from ttt;
>                             QUERY PLAN
> ------------------------------------------------------------------
>  Seq Scan on public.ttt  (cost=0.00..32.00 rows=1100 width=44)
>    Output: (f4 = (fn)::double precision), (f8 = (fn)::double precision)
>  Planning time: 0.325 ms
> (3 rows)
>
> Were it not, you'd hardly ever get equality. I think that years ago we
> concluded this behavior was required by SQL spec (see the language about
> exact vs inexact numeric types).

I just looked at each point in the spec where they mention approximate numeric types, and while there was no direct mention of this (that I could find), casting the exact number to an approximate type would be in keeping with the spirit of other operations involving mixed data types.

While I think what we do is within bounds of the implementation specific choices we are allowed, I think we made a bad choice on this:

test=# select '1.1'::float8 = '1.1'::float4;
 ?column?
----------
 f
(1 row)

I know that neither value is exactly 1.1 (decimal) and that they are not the same. In fact, while '1.1'::numeric has no exact representation in float4 or float8, '1.1'::float4 and '1.1'::float8 both have exact representations in numeric -- at least for IEEE format. They are:

float4: 1.10000002384185791015625
float8: 1.100000000000000088817841970012523233890533447265625

OK, so those are not equal to each other, but neither is either of them equal to 1.1. It would be more consistent, ISTM, to cast float8 to float4 when those are compared, and to cast numeric to whichever type is on the other side of the comparison operator.

Obviously that would not be a change to back-patch; but it seems to me to be worth considering for 9.5.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
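[Editor's illustration, not from the mail: a standalone C program that prints the exact stored values quoted above, assuming IEEE 754 single and double precision and a libc whose printf produces full-precision decimal output (glibc does). Both values are dyadic rationals, so their decimal expansions terminate.]

#include <stdio.h>

int main(void)
{
    float  f4 = 1.1f;
    double f8 = 1.1;

    /* Printing with many fractional digits reveals the exact stored values. */
    printf("float4: %.60f\n", f4);   /* 1.10000002384185791015625... */
    printf("float8: %.60f\n", f8);   /* 1.10000000000000008881784197001252... */
    return 0;
}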
Re: [HACKERS] numeric and float comparison oddities
Kevin Grittner kgri...@ymail.com writes:
> It would be more consistent, ISTM, to cast float8 to float4 when those
> are compared, and to cast numeric to whichever type is on the other
> side of the comparison operator.

I would vote against that on the grounds of greatly increased risk of overflow failure. Admittedly, numeric->float8 can also fail, but float4 only goes to 1e37 or thereabouts.

			regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] numeric and float comparison oddities
Tom Lane t...@sss.pgh.pa.us wrote: Kevin Grittner kgri...@ymail.com writes: It would be more consistent, ISTM, to cast float8 to float4 when those are compared, and to cast numeric to whichever type is on the other side of the comparison operator. I would vote against that on the grounds of greatly increased risk of overflow failure. Admittedly, numeric-float8 can also fail, but float4 only goes to 1e37 or thereabouts. Since 1e28 is sufficient to measure the diameter of the universe in angstroms, I'm not sure I accept the adjective greatly there. I suspect that people are getting silently burned by current behavior more often than they would get overflow errors with the change. At least when you get an error it's pretty clear how to fix it with a cast. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Usability improvements for pg_stop_backup()
Hackers, Since Gabrielle has improved archiving with pg_stat_archiver in 9.4, I'd like to go further and improve the usability of pg_stop_backup(). However, based on my IRC discussion with Vik, there might not be consensus on what the right behavior *should* be. This is for 9.5, of course. Currently, if archive_command is failing, pg_stop_backup() will hang forever. The only way to figure out what's wrong with pg_stop_backup() is to tail the PostgreSQL logs. This is difficult for users to troubleshoot, and strongly resists any kind of automation. Yes, we can work around this by setting statement_timeout, but that has two issues (a) the user has to remember to do it before the problem occurs, and (b) it won't differentiate between archive failure and other reasons it might time out. As such, I propose that pg_stop_backup() should error with an appropriate error message (Could not archive WAL segments) after three archiving attempts. We could also add an optional parameter to raise the number of attempts from the default of three. An alternative, if we were doing this from scratch, would be for pg_stop_backup to return false or -1 or something if it couldn't archive; there are reasons why a user might not care that archive_command was failing (shared storage comes to mind). However, that would be a surprising break with backwards compatability, since currently users don't check the result value of pg_stop_backup(). Thoughts? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Thu, Jul 31, 2014 at 1:12 PM, Peter Geoghegan p...@heroku.com wrote: Abbreviated it is. Attached revision uses new terminology. I have abandoned configure tests entirely, preferring to explicitly discriminate against Mac OS X (Darwin) as a platform with a known odd locale implementation, just like pg_get_encoding_from_locale() does. Since Robert did not give a reason for discussing the basic fmgr-elision patch despite having already discussed it a couple of years ago, I have not split the patch into two (if I did, the first patch would be virtually identical to Robert's 2012 patch). I hope that's okay. I am willing to revisit this question if Robert feels that something is uncertain about the costs and benefits of a basic fmgr-eliding sort support for text. There are still some open items: * I have left the licensing question unresolved. There is plenty we can do about this, and if necessary we can ask the original author to changed his license from the MIT license to the PostgreSQL license. AFAICT, the BSD license is *more* restrictive than the MIT license, and the PostgreSQL license is identical. The wording is almost identical. I don't see the concern. If the PostgreSQL license had the “non-endorsement clause” of the New BSD license, or alternatively if the MIT license had a similar clause that the PostgreSQL licensed lacked, that would be a substantive and appreciable difference. That isn't the case. * I have not made aggregates use the optimization where they currently accidentally don't due to using datum tuplesort. I can artificially force them to use heap tuplesort where that's likely to help [1]. Let's defer this question until we have an abort algorithm that seems reasonable. There is a FIXME item. Improvements in full: * New terminology (Abbreviated keys). * Better explanations for why we don't use the additional optimization of abbreviated keys where we're using sort support, in analyze.c and nodeMergeAppend.c. * Better handling of NULLs. * Never use the optimization with bounded heap sorts. * Better abort strategy, that centers on the idea of measuring full key cardinality, and abbreviated key cardinality, and weighing how good a proxy the former is for the latter. This is heavily weighed when deciding if we should abort normalization as tuples are copied. Exact details are explained within bttext_abort(). As already mentioned, this can allow us to use the optimization when we're sorting a million tuples with only five distinct values. This will still win decisively, but it's obviously impossible to make work while only considering abbreviated key cardinality. Determining cardinality of both abbreviated keys and full keys appears to have very low overhead, and is exactly what we ought to care about, so that's what we do. While there is still some magic to the algorithm's inputs, my sense is that I'm much closer to characterizing the break-even point than before. I also attach a second patch, which adds additional debug instrumentation, and is intended to be applied on top of the real patch to help with review. Run Postgres with DEBUG1 output when it's applied. With the patch applied, the setting backslash_quote also controls whether or not the optimization is used. So with the debug instrumentation patch applied: backslash_quote = on - use optimization, but never abort backslash_quote = off - Never use optimization - set up shim (just like the win32 UTF-8 case). Equivalent to master's behavior. 
backslash_quote = safe_encoding - Use optimization, but actually abort if it doesn't work out, the behavior that is always seen without instrumentation. This is useful for testing the overhead of considering the optimization in cases where it didn't work out (i.e. it's useful to compare this with backslash_quote = off). I've found it useful to experiment with real-world data with the optimization dynamically enabled and disabled. Thoughts? [1] http://www.postgresql.org/message-id/CAM3SWZSf0Ftxy8QHGAKJh=S80vD2SBx83zkEzuJyZ6R=pty...@mail.gmail.com -- Peter Geoghegan diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c index 3fffe09..9df13e2 100644 --- a/src/backend/utils/adt/varlena.c +++ b/src/backend/utils/adt/varlena.c @@ -27,6 +27,7 @@ #include "libpq/pqformat.h" #include "miscadmin.h" #include "parser/scansup.h" +#include "parser/parser.h" #include "regex/regex.h" #include "utils/builtins.h" #include "utils/bytea.h" @@ -34,6 +35,8 @@ #include "utils/pg_locale.h" #include "utils/sortsupport.h" +#define DEBUG_ABBREV_KEYS + #ifdef DEBUG_ABBREV_KEYS #define DEBUG_elog_output DEBUG1 #endif @@ -93,9 +96,9 @@ static int bttextfastcmp_c(Datum x, Datum y, SortSupport ssup); static int bttextfastcmp_locale(Datum x, Datum y, SortSupport ssup); static Datum bttext_convert(Datum original, SortSupport ssup); static bool bttext_abort(int memtupcount, double rowhint, SortSupport ssup); -#ifdef WIN32 +//#ifdef WIN32 static void bttext_inject_shim(SortSupport ssup, Oid
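To exercise the instrumented build, a session along these lines should work (a hypothetical sketch: the table, data volume, and query are invented; only the GUC toggles come from the description above):

SET client_min_messages = debug1;  -- surface the DEBUG1 instrumentation output
SET backslash_quote = on;          -- debug patch: use the optimization, never abort
CREATE TABLE abbrev_test (v text);
INSERT INTO abbrev_test SELECT md5(i::text) FROM generate_series(1, 1000000) i;
SELECT v FROM abbrev_test ORDER BY v OFFSET 1000000;  -- sort a million texts, discard the output

Re-running the last query under backslash_quote = off should then give the master-equivalent baseline for comparison.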
[HACKERS] Proposed changing the definition of decade for date_trunc and extract
For a long time (since version 8.0), PostgreSQL has adopted the logical boundaries for centuries and millenniums in these functions. The calendar starts millennium and century 1 on year 1, directly after 1 BC. Unfortunately, decades are still reported rather simplistically by dividing the year by 10. Years 1-10 are logically the first decade and, working up from there, year 2014 should be counted as the 202nd decade. I've pushed code and documentation changes to reflect this, based on the master branch (9.5devel); it's on the branch new_decade_def at https://github.com/chungy/postgres.git -- In both the commit message and docs, I made note of the backwards compatibility change. I don't know how much of an impact this would have, but I suspect not many applications are really going to be affected by how decades are counted (should be simple to fix on their part, if any are...). -- Mike Swanson -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
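Concretely, only the decade field would change; century and millennium already count from year 1 (values per the current behavior and the proposal above):

SELECT extract(decade FROM DATE '2014-08-01');   -- 201 today, 202 under the proposed definition
SELECT extract(century FROM DATE '2014-08-01');  -- 21 either way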
[HACKERS] Small patch to fix windows build
ed802e7 seems to have broken the Windows builds. .\contrib\pgbench\pgbench.c(530): error C2065: 'M_PI' : undeclared identifier The attached fixes the problem. It seems all other uses of M_PI have added the same code to the .c file rather than to any header file; perhaps a better patch would fix up all 3 of these locations and put that code in a header file, but I don't really know where at the moment. This at least should get the build running again. Regards David Rowley pgbench_M_PI_fix.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
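Presumably the attached patch adds the same guard the existing call sites use, something along these lines (a sketch, not the patch itself):

#ifndef M_PI
/* Visual Studio's math.h only provides M_PI when _USE_MATH_DEFINES is set */
#define M_PI 3.14159265358979323846
#endif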
[HACKERS] Re: Proposed changing the definition of decade for date_trunc and extract
Mike Swanson wrote For a long time (since version 8.0), PostgreSQL has adopted the logical barriers for centuries and millenniums in these functions. The calendar starts millennium and century 1 on year 1, directly after 1 BC. Unfortunately decades are still reported rather simplistically by dividing the year by 10. Years 1-10 are logically the first decade and working up from there, year 2014 should be counted as 202nd decade. I've pushed code and documentation changes to reflect this, based on the master branch (9.5devel), it's on the branch new_decade_def at https://github.com/chungy/postgres.git -- In both the commit message and docs, I made note of the backwards compatibility change. I don't know how much of an impact this would have but I suspect not many applications are really going to be affected by how decades are counted (should be simple to fix on their part, if any are...). Floor ( Year / 10 ) = decade number feels right. Sure, the zero decade only has 9 years but after that everything is easy to read. Typical usage refers to decades such as the 80s and the 90s, but if you start counting at 1 the year ending in 0 would have a mismatched prefix. And date truncation would be weird... though I haven't tested the behavior, I assume it works by basically just dropping the last year digit and replacing it with zero... that at least would be the desired behavior for me. Any supporting arguments for 1-10 = 1st decade other than technical perfection? I guess if you use data around and before 1AD you care about this more, and rightly so, but given sound arguments for both methods, the one more useful to more users (who I suspect dominantly care about years 1900 and later) should win. So -1 to the change for breaking backward compatibility, and -1 because the current behavior seems to be more useful in everyday usage. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Proposed-changing-the-definition-of-decade-for-date-trunc-and-extract-tp5813578p5813580.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
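For what it's worth, the current truncation does behave that way; a quick sketch (the post-patch value is an assumption based on the proposal, not something tested against it):

SELECT date_trunc('decade', TIMESTAMP '1987-06-05');
-- 1980-01-01 00:00:00 today; presumably 1981-01-01 00:00:00 under the proposed definition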
Re: [HACKERS] SKIP LOCKED DATA (work in progress)
On Sat, Aug 2, 2014 at 3:55 AM, Thomas Munro mu...@ip9.org wrote: On 1 August 2014 10:37, David Rowley dgrowle...@gmail.com wrote: Apart from this I can't see any other problems with the patch and I'd be very inclined, once the above are fixed up, to mark the patch as ready for committer. Good work Thanks for all the guidance, I appreciate it! My review karma account is now well overdrawn. Ok, then I have nothing more so it's time to pass this one along. The only notes I can think to leave for the committer would be around the precedence order of the lock policy, especially around a query such as: SELECT * FROM (SELECT * FROM a FOR UPDATE SKIP LOCKED) a FOR UPDATE; -- skip locked wins Of course the current behaviour is that NOWAIT wins over the standard FOR UPDATE, but with NOWAIT, there's only a chance of an error, there's no chance of giving incorrect results. I checked what Oracle did in this situation and I see that they completely disallow FOR UPDATE inside of views and subqueries. I could see an argument here that the outermost FOR UPDATE clause should be used, but I guess that ship sailed when NOWAIT was added. Marking as ready for committer. Regards David Rowley
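To illustrate the semantics being discussed, a minimal two-session sketch (hypothetical jobs table and data; not taken from the patch's regression tests):

-- session 1
BEGIN;
SELECT * FROM jobs WHERE id = 1 FOR UPDATE;

-- session 2: with NOWAIT this errors out immediately because row 1 is locked;
-- with SKIP LOCKED it simply returns the remaining, unlocked rows
SELECT * FROM jobs ORDER BY id FOR UPDATE SKIP LOCKED;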
Re: [HACKERS] Re: Proposed changing the definition of decade for date_trunc and extract
On 08/01/2014 05:32 PM, David G Johnston wrote: Any supporting arguments for 1-10 = 1st decade other than technical perfection? I guess if you use data around and before 1AD you care about this more, and rightly so, but given sound arguments for both methods the one more useful to more users who I suspect dominantly care about years 1900. Well, I think most people in casual speech would consider The 80's to be 1980 to 1989. But if you ask a historian, the decade is 1981 to 1990 (or, if they're an American social historian, 1981 to 1988, but that's a different topic). So both ways of counting have valid, solid arguments behind them. So -1 to change for breaking backward compatibility and -1 because the current behavior seems to be more useful in everyday usage. If we were adding a new decade feature, then I'd probably side with Mike. However, it's hard for me to believe that this change is worth breaking backwards compatibility. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed changing the definition of decade for date_trunc and extract
On 02/08/14 12:32, David G Johnston wrote: Mike Swanson wrote For a long time (since version 8.0), PostgreSQL has adopted the logical barriers for centuries and millenniums in these functions. The calendar starts millennium and century 1 on year 1, directly after 1 BC. Unfortunately decades are still reported rather simplistically by dividing the year by 10. Years 1-10 are logically the first decade and working up from there, year 2014 should be counted as 202nd decade. I've pushed code and documentation changes to reflect this, based on the master branch (9.5devel), it's on the branch new_decade_def at https://github.com/chungy/postgres.git -- In both the commit message and docs, I made note of the backwards compatibility change. I don't know how much of an impact this would have but I suspect not many applications are really going to be affected by how decades are counted (should be simple to fix on their part, if any are...). Floor ( Year / 10 ) = decade number feels right. Sure, the zero decade only has 9 years but after that everything is easy to read. Typical usage refers to decades such as the 80s and the 90s but if you start counting at 1 the 0 year would have a mis-matched prefix. And date truncation would be weird...though I haven't tested the behavior I assume it works by basically just dropping the year digit and replacing it with zero...that at least would be the desired behavior for me. Any supporting arguments for 1-10 = 1st decade other than technical perfection? I guess if you use data around and before 1AD you care about this more, and rightly so, but given sound arguments for both methods the one more useful to more users who I suspect dominantly care about years 1900. So -1 to change for breaking backward compatibility and -1 because the current behavior seems to be more useful in everyday usage. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Proposed-changing-the-definition-of-decade-for-date-trunc-and-extract-tp5813578p5813580.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. Since there was no year zero: then it follows that the first decade comprises years 1 to 10, and the current Millennium started in 2001 - or am I being too logical??? :-) Cheers, Gavin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed changing the definition of decade for date_trunc and extract
On Fri, Aug 1, 2014 at 8:15 PM, Gavin Flower gavinflo...@archidevsys.co.nz wrote: On 02/08/14 12:32, David G Johnston wrote: Any supporting arguments for 1-10 = 1st decade other than technical perfection? I guess if you use data around and before 1AD you care about this more, and rightly so, but given sound arguments for both methods the one more useful to more users who I suspect dominantly care about years 1900. So -1 to change for breaking backward compatibility and -1 because the current behavior seems to be more useful in everyday usage. Since there was no year zero: then it follows that the first decade comprises years 1 to 10, and the current Millennium started in 2001 - or am I being too logical??? :-) This is SQL, only relational logic matters. All other logic can be superseded by committee consensus. IOW - and while I have no way of checking - this seems like something that may be governed by the SQL standard...in which case adherence to that would trump mathematical logic. David J.
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Fri, Aug 01, 2014 at 04:00:11PM -0700, Peter Geoghegan wrote: Since Robert did not give a reason for discussing the basic fmgr-elision patch despite having already discussed it a couple of years ago, I have not split the patch into two (if I did, the first patch would be virtually identical to Robert's 2012 patch). I hope that's okay. I am willing to revisit this question if Robert feels that something is uncertain about the costs and benefits of a basic fmgr-eliding sort support for text. Robert, Heikki and maybe Alvaro requested and/or explained this split back in April. The fact that the would-be first patch was discussed and rejected in the past does not counter that request. (I voted against Robert's 2012 patch. My opposition was mild even then, and the patch now being a stepping stone to a more-compelling benefit certainly tips the scale.) Given that the effect of that decision is a moderate procedural change only, even if the request were wrong, why continue debating it? There are still some open items: * I have left the licensing question unresolved. There is plenty we can do about this, and if necessary we can ask the original author to change his license from the MIT license to the PostgreSQL license. AFAICT, the BSD license is *more* restrictive than the MIT license, and the PostgreSQL license is identical. The wording is almost identical. I don't see the concern. If the PostgreSQL license had the “non-endorsement clause” of the New BSD license, or alternatively if the MIT license had a similar clause that the PostgreSQL license lacked, that would be a substantive and appreciable difference. That isn't the case. It's fine to incorporate MIT-licensed code; we have many cases of copying more-liberally-licensed code into our tree. However, it's important to have long-term clarity on the upstream license terms. This is insufficient: + /*- + * + * hyperloglog.c + *HyperLogLog cardinality estimator + * + * Portions Copyright (c) 2014, PostgreSQL Global Development Group + * + * Based on Hideaki Ohno's C++ implementation. This is probably not ideally + * suited to estimating the cardinality of very large sets; in particular, we Usually, the foreign file had a copyright/license notice, and we prepend the PostgreSQL notice. See src/port/fls.c for a trivial example. Thanks, nm -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Fri, Aug 1, 2014 at 8:57 PM, Noah Misch n...@leadboat.com wrote: Robert, Heikki and maybe Alvaro requested and/or explained this split back in April. The fact that the would-be first patch was discussed and rejected in the past does not counter that request. (I voted against Robert's 2012 patch. My opposition was mild even then, and the patch now being a stepping stone to a more-compelling benefit certainly tips the scale.) Given that the effect of that decision is a moderate procedural change only, even if the request were wrong, why continue debating it? Heikki asked for it, and Robert repeated the request in the past few days. I am not debating it. I merely don't understand the nature of the request. I think it's virtually a foregone conclusion that there should be fmgr elision for text, with very few decisions to be made on the details of how that would work. The rest of what I have here is trickier, and is something that must be discussed. Even the improvement of ~7% that fmgr-elision offers is valuable for such a common operation. I don't want to dredge up the details of the 2012 thread, but since you mention it, the fact that the patch was not committed centered on a silly disagreement on a very fine point, and nothing more. It was *not* rejected, and my sense at the time was that it was very close to being committed. I was fairly confident that everyone understood things around the 2012 patch that way, and I sought clarity on that point. It's a totally non-surprising and easy-to-reason-about patch. Clearly at least one person had some reservations about the basic idea at the time, and it was worth asking what the concern was before splitting the patch. It is easy to split the patch, but easier still to answer my question. It's fine to incorporate MIT-licensed code; we have many cases of copying more-liberally-licensed code into our tree. However, it's important to have long-term clarity on the upstream license terms. This is insufficient: + /*- + * + * hyperloglog.c + *HyperLogLog cardinality estimator + * + * Portions Copyright (c) 2014, PostgreSQL Global Development Group + * + * Based on Hideaki Ohno's C++ implementation. This is probably not ideally + * suited to estimating the cardinality of very large sets; in particular, we Usually, the foreign file had a copyright/license notice, and we prepend the PostgreSQL notice. See src/port/fls.c for a trivial example. I will be more careful about this in the future. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposed changing the definition of decade for date_trunc and extract
On Sat, 2014-08-02 at 15:15 +1200, Gavin Flower wrote: Since there was no year zero: then it follows that the first decade comprises years 1 to 10, and the current Millennium started in 2001 - or am I being too logical??? :-) This is pretty much the reason I'm sending this patch, because it makes mathematical sense, plus my OCD-sense tingles when Postgres handles centuries and millenniums correctly, whereas decades are not handled that way. I will concede if the compatibility breaks are too great, but I don't know how many people depend on the output of this. I didn't do any market research :) Besides, it seemed to me that if the other two were able to be fixed (albeit ~10 years ago), there's little reason to avoid fixing decade too. There are a few definitions of a decade: * Spans of ten years that start from year 1. * Spans of ten years defined by the second-to-the-right digit (years 1-9 would be in decade 0?) -- this is one of the colloquial versions when people refer to the (19)90s. * The other version tends to be less well-defined. The 1960s usually conjures up images of counterculture and the British Invasion and such; debatably occurring around 1964-1972 (this version used by culture can never be derived mathematically by a database, but it might be worth pointing out here). * Any span of approximately 10 years (the interval type is fine enough for this). I lack significant research but it's rare to hear people refer to 1990-1999 as the 199th decade in the same way they might refer to 1900-1999 (or 1901-2000) as the 20th century -- and it's worth noting that common usage for determining 20th/21st centuries generally follows the mathematical logic of them, even if some people are off-by-one when determining when they start and end. I'd also argue that the current function, basing its logic on definition #2, has limited use even when you want to use it for such. If you want to generate text for '(decades)s' you'd have to do: SELECT extract('year' from date_trunc('decade', now())) || 's'; Or with my patch: SELECT floor(extract('year' from now()) / 10) || '0s'; It's different, for sure, but I would actually think the second one is a bit less awkward. Plus it's shorter :) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
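For a concrete 2014 date, both forms render the same string, assuming the respective definitions above:

SELECT extract('year' from date_trunc('decade', TIMESTAMP '2014-08-02')) || 's';  -- '2010s' with the current floor-based decade
SELECT floor(extract('year' from TIMESTAMP '2014-08-02') / 10) || '0s';           -- '2010s' under either definition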