Re: [HACKERS] COMMENT ON, psql and access methods
On Thu, Jun 2, 2016 at 1:00 PM, Michael Paquier wrote:
> I have added an open item for 9.6 regarding this patch; it would be
> good to complete this work in this release for consistency with the
> other objects.

Doh. I forgot to update psql --help.
-- Michael

From 9901595fd000c3ac9e1c1ce8ee2f21218c4803c7 Mon Sep 17 00:00:00 2001
From: Michael Paquier
Date: Thu, 2 Jun 2016 12:47:46 +0900
Subject: [PATCH] Add support for COMMENT ON and psql meta-command for access methods

473b932 apparently forgot that access methods, being database objects, can have a description associated with them. A psql-level meta-command to list all the access methods available on the server was missing as well. I also noticed that psql's tab completion could be more complete regarding access methods; that is fixed here too.
---
 contrib/bloom/bloom--1.0.sql | 1 +
 doc/src/sgml/ref/comment.sgml | 1 +
 doc/src/sgml/ref/psql-ref.sgml | 13 +
 src/backend/parser/gram.y | 5 ++--
 src/bin/psql/command.c | 3 +++
 src/bin/psql/describe.c | 61 ++
 src/bin/psql/describe.h | 3 +++
 src/bin/psql/help.c | 1 +
 src/bin/psql/tab-complete.c | 30 -
 9 files changed, 109 insertions(+), 9 deletions(-)

diff --git a/contrib/bloom/bloom--1.0.sql b/contrib/bloom/bloom--1.0.sql index 7fa7513..87b5442 100644 --- a/contrib/bloom/bloom--1.0.sql +++ b/contrib/bloom/bloom--1.0.sql @@ -5,6 +5,7 @@ LANGUAGE C; -- Access method CREATE ACCESS METHOD bloom TYPE INDEX HANDLER blhandler; +COMMENT ON ACCESS METHOD bloom IS 'bloom index access method'; -- Opclasses diff --git a/doc/src/sgml/ref/comment.sgml b/doc/src/sgml/ref/comment.sgml index 3321d4b..9582de8 100644 --- a/doc/src/sgml/ref/comment.sgml +++ b/doc/src/sgml/ref/comment.sgml @@ -23,6 +23,7 @@ PostgreSQL documentation COMMENT ON { + ACCESS METHOD object_name | AGGREGATE aggregate_name ( aggregate_signature ) | CAST (source_type AS target_type) | COLLATION object_name | diff --git a/doc/src/sgml/ref/psql-ref.sgml
b/doc/src/sgml/ref/psql-ref.sgml index df79a37..4079ac6 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -1130,6 +1130,19 @@ testdb=> + +\dA[+] [ pattern ] + + + +Lists access methods. If pattern is specified, only access +methods whose names match the pattern are shown. If ++ is appended to the command name, each access +method is listed with its associated handler function and description. + + + \db[+] [ pattern ] diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index 20384db..5d646e7 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -5693,8 +5693,8 @@ opt_restart_seqs: * The COMMENT ON statement can take different forms based upon the type of * the object associated with the comment. The form of the statement is: * - * COMMENT ON [ [ CONVERSION | COLLATION | DATABASE | DOMAIN | - * EXTENSION | EVENT TRIGGER | FOREIGN DATA WRAPPER | + * COMMENT ON [ [ ACCESS METHOD | CONVERSION | COLLATION | DATABASE | + * DOMAIN | EXTENSION | EVENT TRIGGER | FOREIGN DATA WRAPPER | * FOREIGN TABLE | INDEX | [PROCEDURAL] LANGUAGE | * MATERIALIZED VIEW | POLICY | ROLE | SCHEMA | SEQUENCE | * SERVER | TABLE | TABLESPACE | @@ -5896,6 +5896,7 @@ comment_type: | TABLE{ $$ = OBJECT_TABLE; } | VIEW{ $$ = OBJECT_VIEW; } | MATERIALIZED VIEW { $$ = OBJECT_MATVIEW; } + | ACCESS METHOD { $$ = OBJECT_ACCESS_METHOD; } | COLLATION { $$ = OBJECT_COLLATION; } | CONVERSION_P { $$ = OBJECT_CONVERSION; } | TABLESPACE { $$ = OBJECT_TABLESPACE; } diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c index 693b531..543fe5f 100644 --- a/src/bin/psql/command.c +++ b/src/bin/psql/command.c @@ -402,6 +402,9 @@ exec_command(const char *cmd, /* standard listing of interesting things */ success = listTables("tvmsE", NULL, show_verbose, show_system); break; + case 'A': +success = describeAccessMethods(pattern, show_verbose); +break; case 'a': success = describeAggregates(pattern, show_verbose, show_system); break; diff --git 
a/src/bin/psql/describe.c b/src/bin/psql/describe.c index 0a771bd..4e9e6ac 100644 --- a/src/bin/psql/describe.c +++ b/src/bin/psql/describe.c @@ -129,6 +129,67 @@ describeAggregates(const char *pattern, bool verbose, bool showSystem) return true; } +/* \dA + * Takes an optional regexp to select particular access methods + */ +bool +describeAccessMethods(const char *pattern, bool verbose) +{ + PQExpBufferData buf; + PGresult *res; + printQueryOpt myopt = pset.popt; + + if (pset.sversion < 90600) + { + psql_error("
Re: [HACKERS] Tracking wait event for latches
On Thu, Jun 2, 2016 at 12:25 PM, Thomas Munro wrote:
> This patch allows identifiers to be specified by the WaitLatch and
> WaitLatchOrSocket calls, but not for WaitEventSetWait, which is the
> more general waiting primitive. I think it should be done by
> WaitEventSetWait, and merely passed down by those wrapper functions.

The advantage of having WaitEventSetWait do the tracking is that we could track the events of secure_read and secure_write. One potential problem with doing so is that those routines may be called during authentication. I don't recall that being the case, but this needs a double-check.

> These are actually names for *wait points*, not names for latches.

OK.

> Some of the language in this patch makes it sound like they are latch
> names/latch identifiers, which is inaccurate (the latches themselves
> wouldn't be very interesting, e.g. MyLatch). In some cases the main
> thing of interest is actually a socket or timer anyway, not a latch,
> so maybe it should appear as wait_event_type = WaitEventSet?

Hm. A latch could be waiting on several types of events at once, so don't you think we'd need to add more detail to what is reported to pg_stat_activity? There are two fields now, and in the context of this patch:
- wait_event_type, which I'd like to think of as associated with a latch, hence its name.
- wait_event, which is the code path we are waiting at, like SyncRep, the WAL writer main routine, etc.
Now, if you would like to get a list of all the things being waited for, shouldn't we add a third column to the set, with text[] as its return type? Let's name it wait_event_details; for a latch we would have the following:
- WL_LATCH_SET
- WL_POSTMASTER_DEATH
- WL_SOCKET_READABLE
- WL_SOCKET_WRITEABLE
Compared with all the other existing wait types that would be something new, and exclusive to latches, because they need a higher level of detail. Don't you think so? But actually I don't think it's necessary to go into that level of detail.
We already know what a latch is waiting for by looking at the code...

> Is there a reason you chose names like 'WALWriterMain'? That
> particular wait point is in the function WalWriterMain (note different
> case). It seems odd to use the containing function names to guide
> naming, but not use them exactly. Since this namespace needs to be
> able to name wait points anywhere in the postgres source tree (and
> maybe outside it too, for extensions), maybe it should be made
> hierarchical, like 'walwriter.WalWriterMain' (or some other
> organisational scheme).

Yeah, possibly this could be improved. I have put some thought into having clear names for each one of them, so I'd rather keep them simple.

> I personally think it would be very useful for extensions to be able
> to name their wait points. For example, I'd rather see
> 'postgres_fdw.pgfdw_get_result' or similar than a vague 'Extension'
> string which appears not only for all wait points in an extension but
> also for all extensions. I hope we can figure out a good way to do
> that. Clearly that would involve some shmem registry machinery to
> make the names visible across backends (a similar problem exists with
> lock tranche names).

This patch is shaped this way intentionally, based on the feedback I received at PGCon (from Robert and others). We could provide a routine that extensions call in _PG_init to register a new latch event name in shared memory, but I didn't see much use in doing so: take background workers, for example, where it is already possible to register a custom string for pg_stat_activity via pgstat_report_activity. One could take advantage of custom latch wait names in shared memory if an extension has wait points with latches, though... But I am still not sure that's worth the complexity.
-- Michael

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COMMENT ON, psql and access methods
On Wed, Jun 1, 2016 at 8:25 PM, Teodor Sigaev wrote:
>>>> As far as I can see, COMMENT ON has no support for access methods.
>>>> Wouldn't we want to add it as it is created by a command? On top of
>>>> that, perhaps we could have a backslash command in psql to list the
>>>> supported access methods, like \dam[S]? The system methods would be
>>>> in this case all the in-core ones.
>>>
>>> +1.
>>
>> Are there other opinions? That's not a 9.6 blocker IMO, so I could get
>> patches out for 10.0 if needed.
>
> I missed that in the review process, but I agree it should be implemented.

So, I have taken the time to hack on that, and finished with the attached patch, which is less intrusive than I first thought:

- COMMENT support is added for access methods.
- A meta-command \dA is added in psql to list the available access methods:

=# \dA
 List of access methods
  Name  | Type
--------+-------
 bloom  | Index
 brin   | Index
 btree  | Index
 gin    | Index
 gist   | Index
 hash   | Index
 spgist | Index
(7 rows)

=# \dA+
                      List of access methods
  Name  | Type  |   Handler   |              Description
--------+-------+-------------+-----------------------------------------
 bloom  | Index | blhandler   | bloom index access method
 brin   | Index | brinhandler | block range index (BRIN) access method
 btree  | Index | bthandler   | b-tree index access method
 gin    | Index | ginhandler  | GIN index access method
 gist   | Index | gisthandler | GiST index access method
 hash   | Index | hashhandler | hash index access method
 spgist | Index | spghandler  | SP-GiST index access method
(7 rows)

- A missing tab completion for DROP ACCESS METHOD is fixed.

I have added an open item for 9.6 regarding this patch; it would be good to complete this work in this release for consistency with the other objects.
Regards,
-- Michael

From 12c5d77c7d66cd8b33c697fc389cf9915d32765b Mon Sep 17 00:00:00 2001
From: Michael Paquier
Date: Thu, 2 Jun 2016 12:47:46 +0900
Subject: [PATCH] Add support for COMMENT ON and psql meta-command for access methods

473b932 apparently forgot that access methods, being database objects, can have a description associated with them. A psql-level meta-command to list all the access methods available on the server was missing as well. I also noticed that psql's tab completion could be more complete regarding access methods; that is fixed here too.
---
 contrib/bloom/bloom--1.0.sql | 1 +
 doc/src/sgml/ref/comment.sgml | 1 +
 doc/src/sgml/ref/psql-ref.sgml | 13 +
 src/backend/parser/gram.y | 5 ++--
 src/bin/psql/command.c | 3 +++
 src/bin/psql/describe.c | 61 ++
 src/bin/psql/describe.h | 3 +++
 src/bin/psql/tab-complete.c | 30 -
 8 files changed, 108 insertions(+), 9 deletions(-)

diff --git a/contrib/bloom/bloom--1.0.sql b/contrib/bloom/bloom--1.0.sql index 7fa7513..87b5442 100644 --- a/contrib/bloom/bloom--1.0.sql +++ b/contrib/bloom/bloom--1.0.sql @@ -5,6 +5,7 @@ LANGUAGE C; -- Access method CREATE ACCESS METHOD bloom TYPE INDEX HANDLER blhandler; +COMMENT ON ACCESS METHOD bloom IS 'bloom index access method'; -- Opclasses diff --git a/doc/src/sgml/ref/comment.sgml b/doc/src/sgml/ref/comment.sgml index 3321d4b..9582de8 100644 --- a/doc/src/sgml/ref/comment.sgml +++ b/doc/src/sgml/ref/comment.sgml @@ -23,6 +23,7 @@ PostgreSQL documentation COMMENT ON { + ACCESS METHOD object_name | AGGREGATE aggregate_name ( aggregate_signature ) | CAST (source_type AS target_type) | COLLATION object_name | diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index df79a37..4079ac6 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -1130,6 +1130,19 @@ testdb=> + +\dA[+] [ pattern ] + + + +Lists access methods. 
If pattern is specified, only access +methods whose names match the pattern are shown. If ++ is appended to the command name, each access +method is listed with its associated handler function and description. + + + \db[+] [ pattern ] diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index 20384db..5d646e7 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -5693,8 +5693,8 @@ opt_restart_seqs: * The COMMENT ON statement can take different forms based upon the type of * the object associated with the comment. The form of the statement is: * - * COMMENT ON [ [ CONVERSION | COLLATION | DATABASE | DOMAIN | - * EXTENSION | EVENT TRIGGER | FOREIGN DATA WRAPPER | + * COMMENT ON [ [ ACCESS METHOD | CONVERSION | COLLATION | DATABASE | + * DOMAIN | EXTENSION | EVENT TRIGGER | FOREIGN DATA WRAPPER | * FOREIGN TABLE | INDEX | [PROCEDURAL] LANGUAGE |
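For reference, the new \dA meta-command boils down to a query on the pg_am catalog. A rough approximation of what describeAccessMethods() sends for \dA+ (an illustrative sketch only; the exact SQL lives in src/bin/psql/describe.c, and this assumes a 9.6-level pg_am with the amtype and amhandler columns):

```sql
-- Approximate shape of the catalog query behind \dA+:
SELECT amname AS "Name",
       CASE amtype WHEN 'i' THEN 'Index' END AS "Type",
       amhandler AS "Handler",
       pg_catalog.obj_description(oid, 'pg_am') AS "Description"
FROM pg_catalog.pg_am
ORDER BY 1;
```

The plain \dA form would project only the first two columns; the verbose form pulls in the handler function and the description, which is why COMMENT ON ACCESS METHOD support belongs in the same patch.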
Re: [HACKERS] Tracking wait event for latches
On Fri, May 20, 2016 at 8:14 AM, Michael Paquier wrote: > Hi all, > > As I mentioned $subject a couple of months back after looking at the > wait event facility here: > http://www.postgresql.org/message-id/CAB7nPqTJpgAvOK4qSC96Fpm5W+aCtJ9D=3Vn9AfiEYsur=-j...@mail.gmail.com > I have actually taken some time to implement this idea. > > The particular case that I had in mind was to be able to track in > pg_stat_activity processes that are waiting on a latch for synchronous > replication, and here is what this patch gives in this case: > =# alter system set synchronous_standby_names = 'foo'; > ALTER SYSTEM > =# select pg_reload_conf(); > pg_reload_conf > > t > (1 row) > =# -- Do something > [...] > > And from another session: > =# select wait_event_type, wait_event from pg_stat_activity where pid = 83316; > wait_event_type | wait_event > -+ > Latch | SyncRep > (1 row) > > This is a boring patch, and it relies on the wait event facility that > has been added recently in 9.6. Note a couple of things though: > 1) There is something like 30 code paths calling WaitLatch in the > backend code, all those events are classified and documented similarly > to LWLock events. > 2) After discussing this stuff while at PGCon, it does not seem worth > to have any kind of APIs to be able to add in shared memory custom > latch names that extensions could load through _PG_init(). In > replacement to that, there is a latch type flag called "Extension" > that can be used for this purpose. > Comments are welcome, I am adding that in the 9.7 queue. This is very cool, and I can't wait to have this feature! It'll be useful for all kinds of developers and users. I wanted this today to help debug something I am working on, and remembered this patch. I have signed up as a reviewer for the September commitfest. 
But here is some initial feedback based on a quick glance at it:

This patch allows identifiers to be specified by the WaitLatch and WaitLatchOrSocket calls, but not for WaitEventSetWait, which is the more general waiting primitive. I think it should be done by WaitEventSetWait, and merely passed down by those wrapper functions.

These are actually names for *wait points*, not names for latches. Some of the language in this patch makes it sound like they are latch names/latch identifiers, which is inaccurate (the latches themselves wouldn't be very interesting, e.g. MyLatch). In some cases the main thing of interest is actually a socket or timer anyway, not a latch, so maybe it should appear as wait_event_type = WaitEventSet?

Is there a reason you chose names like 'WALWriterMain'? That particular wait point is in the function WalWriterMain (note the different case). It seems odd to use the containing function names to guide naming, but not use them exactly. Since this namespace needs to be able to name wait points anywhere in the postgres source tree (and maybe outside it too, for extensions), maybe it should be made hierarchical, like 'walwriter.WalWriterMain' (or some other organisational scheme).

I personally think it would be very useful for extensions to be able to name their wait points. For example, I'd rather see 'postgres_fdw.pgfdw_get_result' or similar than a vague 'Extension' string which appears not only for all wait points in an extension but also for all extensions. I hope we can figure out a good way to do that. Clearly that would involve some shmem registry machinery to make the names visible across backends (a similar problem exists with lock tranche names).

-- Thomas Munro http://www.enterprisedb.com
Re: [HACKERS] [sqlsmith] Failed assertions on parallel worker shutdown
On Thu, May 26, 2016 at 03:27:40PM +0530, Amit Kapila wrote: > I think the workers should stop processing tuples after the tuple queues > got detached. This will not only handle above situation gracefully, but > will allow to speed up the queries where Limit clause is present on top of > Gather node. Patch for the same is attached with mail (this was part of > original parallel seq scan patch, but was not applied and the reason as far > as I remember was we thought such an optimization might not be required for > initial version). > > Another approach to fix this issue could be to reset mqh_partial_bytes and > mqh_length_word_complete in shm_mq_sendv in case of SHM_MQ_DETACHED. > These are currently reset only incase of success. [Action required within 72 hours. This is a generic notification.] The above-described topic is currently a PostgreSQL 9.6 open item. Robert, since you committed the patch believed to have created it, you own this open item. If some other commit is more relevant or if this does not belong as a 9.6 open item, please let us know. Otherwise, please observe the policy on open item ownership[1] and send a status update within 72 hours of this message. Include a date for your subsequent status update. Testers may discover new open items at any time, and I want to plan to get them all fixed well in advance of shipping 9.6rc1. Consequently, I will appreciate your efforts toward speedy resolution. Thanks. [1] http://www.postgresql.org/message-id/20160527025039.ga447...@tornado.leadboat.com
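Amit's first proposal, where a worker stops producing tuples once its tuple queue is detached, can be sketched with a toy model (Python, purely illustrative: SHM_MQ_DETACHED and the sendv name come from PostgreSQL's shm_mq API; everything else here is invented):

```python
# Toy model of the proposed worker behavior: a parallel worker pushing
# tuples into a shared queue stops cleanly, rather than failing an
# assertion, as soon as the leader has detached the queue.

SHM_MQ_SUCCESS, SHM_MQ_DETACHED = "success", "detached"

class TupleQueue:
    """Stand-in for a shm_mq tuple queue between a worker and the leader."""
    def __init__(self):
        self.detached = False
        self.received = []

    def sendv(self, tup):
        # Like shm_mq_sendv: report SHM_MQ_DETACHED once the reader is gone.
        if self.detached:
            return SHM_MQ_DETACHED
        self.received.append(tup)
        return SHM_MQ_SUCCESS

def run_worker(queue, tuples):
    """Produce tuples until done, or until the queue reports a detach."""
    sent = 0
    for tup in tuples:
        if queue.sendv(tup) == SHM_MQ_DETACHED:
            break  # leader no longer wants tuples (e.g. Limit satisfied)
        sent += 1
    return sent
```

With this behavior, a Limit clause above a Gather lets the leader detach the queues as soon as it has enough rows, and the workers exit their scan loops early instead of tripping assertions on partially sent messages.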
Re: [HACKERS] [sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan)
On Thu, May 26, 2016 at 05:08:31PM +0530, Amit Kapila wrote: > Just to summarize, apart from above issue, we have discussed two different > issues related to parallel query in this thread. > a. Push down of parallel restricted clauses to nodes below gather. Patch > to fix same is posted upthread [1]. > b. Wrong assumption that target list can only contain Vars. Patch to fix > same is posted upthread [2]. Test which prove our assumption is wrong is > also posted upthread [3]. > > I will add this issue in list of open issues for 9.6 @ > https://wiki.postgresql.org/wiki/PostgreSQL_9.6_Open_Items > > [1] - > https://www.postgresql.org/message-id/CAA4eK1Ky2=HsTsT4hmfL=eal5rv0_t59tvwzvk9hqkvn6do...@mail.gmail.com > [2] - > https://www.postgresql.org/message-id/CAA4eK1L-Uo=s4=0jvvva51pj06u5wddvsqg673yuxj_ja+x...@mail.gmail.com > [3] - > https://www.postgresql.org/message-id/CAFiTN-vzg5BkK6kAh3OMhvgRu-uJvkjz47ybtopMAfGJp=z...@mail.gmail.com [Action required within 72 hours. This is a generic notification.] The above-described topic is currently a PostgreSQL 9.6 open item. Robert, since you committed the patch believed to have created it, you own this open item. If some other commit is more relevant or if this does not belong as a 9.6 open item, please let us know. Otherwise, please observe the policy on open item ownership[1] and send a status update within 72 hours of this message. Include a date for your subsequent status update. Testers may discover new open items at any time, and I want to plan to get them all fixed well in advance of shipping 9.6rc1. Consequently, I will appreciate your efforts toward speedy resolution. Thanks. [1] http://www.postgresql.org/message-id/20160527025039.ga447...@tornado.leadboat.com
[HACKERS] Re: pg9.6 segfault using simple query (related to use fk for join estimates)
On Sun, May 29, 2016 at 01:36:01AM -0400, Noah Misch wrote: > On Fri, May 06, 2016 at 03:06:01PM -0400, Robert Haas wrote: > > On Thu, May 5, 2016 at 10:48 AM, David Rowley > > wrote: > > > On 5 May 2016 at 16:04, David Rowley wrote: > > >> I've started making some improvements to this, but need to talk to > > >> Tomas. It's currently in the middle of his night, but will try to > > >> catch him in his morning to discuss this with him. > > > > > > Ok, so I spoke to Tomas about this briefly, and he's asked me to send > > > in this patch. He didn't get time to look over it due to some other > > > commitments he has today. > > > > > > I do personally feel that if the attached is not good enough, or not > > > very close to good enough then probably the best course of action is > > > to revert the whole thing. > > > > Tom, what do you think about this patch? Is it good enough, or should > > we revert the whole thing? > > [This is a generic notification.] > > The above-described topic is currently a PostgreSQL 9.6 open item. Simon, > since you committed the patch believed to have created it, you own this open > item. If some other commit is more relevant or if this does not belong as a > 9.6 open item, please let us know. Otherwise, please observe the policy on > open item ownership[1] and send a status update within 72 hours of this > message. Include a date for your subsequent status update. Testers may > discover new open items at any time, and I want to plan to get them all fixed > well in advance of shipping 9.6rc1. Consequently, I will appreciate your > efforts toward speedy resolution. Thanks. > > [1] > http://www.postgresql.org/message-id/20160527025039.ga447...@tornado.leadboat.com This PostgreSQL 9.6 open item is past due for your status update. Kindly send a status update within 24 hours, and include a date for your subsequent status update. 
Refer to the policy on open item ownership: http://www.postgresql.org/message-id/20160527025039.ga447...@tornado.leadboat.com
Re: [HACKERS] Perf Benchmarking and regression.
On 2016-06-01 15:33:18 -0700, Andres Freund wrote:
> Cpu: i7-6820HQ
> Ram: 24GB of memory
> Storage: Samsung SSD 850 PRO 1TB, encrypted
> postgres -c shared_buffers=6GB -c backend_flush_after=128 -c
> max_wal_size=100GB -c fsync=on -c synchronous_commit=off
> pgbench -M prepared -c 16 -j 16 -T 520 -P 1 -n -N

Using a scale-5000 database, with WAL compression enabled (otherwise the whole thing is too slow in both cases), and 64 clients gives:

disabled:
latency average = 11.896 ms
latency stddev = 42.187 ms
tps = 5378.025369 (including connections establishing)
tps = 5378.248569 (excluding connections establishing)

128:
latency average = 11.002 ms
latency stddev = 10.621 ms
tps = 5814.586813 (including connections establishing)
tps = 5814.840249 (excluding connections establishing)

With flushing disabled, roughly every 30s you see:
progress: 150.0 s, 6223.3 tps, lat 10.036 ms stddev 9.521
progress: 151.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 152.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 153.0 s, 4952.9 tps, lat 39.050 ms stddev 249.839
progress: 172.0 s, 4888.0 tps, lat 12.851 ms stddev 11.507
progress: 173.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 174.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 175.0 s, 4636.8 tps, lat 41.421 ms stddev 268.416
progress: 196.0 s, 1119.2 tps, lat 9.618 ms stddev 8.321
progress: 197.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 198.0 s, 1920.9 tps, lat 94.375 ms stddev 429.756
progress: 199.0 s, 5260.8 tps, lat 12.087 ms stddev 11.595

With backend flushing enabled there's not a single such pause. If you use spinning rust instead of SSDs, the pauses aren't 1-2s anymore, but easily 10-30s.

Andres
Re: [HACKERS] Does people favor to have matrix data type?
> -Original Message-
> From: Jim Nasby [mailto:jim.na...@bluetreble.com]
> Sent: Wednesday, June 01, 2016 11:32 PM
> To: Kaigai Kouhei(海外 浩平); Gavin Flower; Joe Conway; Ants Aasma; Simon Riggs
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Does people favor to have matrix data type?
>
> On 5/30/16 9:05 PM, Kouhei Kaigai wrote:
> > Due to performance reason, location of each element must be deterministic
> > without walking on the data structure. This approach guarantees we can
> > reach individual element with 2 steps.
>
> Agreed.
>
> On various other points...
>
> Yes, please keep the discussion here, even when it relates only to PL/R.
> Whatever is being done for R needs to be done for plpython as well. I've
> looked at ways to improve analytics in plpython related to this, and it
> looks like I need to take a look at the fast-path function stuff. One of
> the things I've pondered for storing ndarrays in Postgres is how to
> reduce or eliminate the need to copy data from one memory region to
> another. It would be nice if there was a way to take memory that was
> allocated by one manager (ie: python's) and transfer ownership of that
> memory directly to Postgres without having to copy everything. Obviously
> you'd want to go the other way as well. IIRC cython's memory manager is
> the same as palloc in regard to very large allocations basically being
> ignored completely, so this should be possible in that case.
>
> One thing I don't understand is why this type needs to be limited to 1
> or 2 dimensions? Isn't the important thing how many individual elements
> you can fit into GPU? So if you can fit a 1024x1024, you could also fit
> a 100x100x100, a 32x32x32x32, etc. At low enough values maybe that stops
> making sense, but I don't see why there needs to be an artificial limit.

Simply put, I have not looked at matrices of more than two dimensions. Is that a valid mathematical concept?
Because of the nature of GPUs, threads are mapped onto the X-, Y-, and Z-axes. That does not limit us to three dimensions, however, because a programmer can treat the upper 10 bits of the X-axis as a fourth dimension, for example.

> I think what's important for something like kNN is that the storage is
> optimized for this, which I think means treating the highest dimension
> as if it was a list. I don't know if it then matters whether the lower
> dimensions are C style vs FORTRAN style. Other algorithms might want
> different storage.

FORTRAN style is preferable, because it allows using BLAS libraries without a data format transformation. I'm not sure why you prefer a list structure for the highest dimension. A simple lookup table is enough, and suitable for massively parallel processors.

> Something else to consider is the 1G toast limit. I'm pretty sure that's
> why MADlib stores matrices as a table of vectors. I know for certain
> it's a problem they run into, because they've discussed it on their
> mailing list.
>
> BTW, take a look at MADlib svec[1]... ISTM that's just a special case of
> what you're describing with entire grids being zero (or vice-versa).
> There might be some commonality there.
>
> [1] https://madlib.incubator.apache.org/docs/v1.8/group__grp__svec.html

Once we try to use a table as the representation of a matrix, it goes beyond the scope of a data type in PostgreSQL. Users would likely have to do special work to handle it, beyond the operator functions that support matrix data types. For a matrix larger than the toast limit, my thought is that a large matrix consisting of multiple grids can carry multiple toast pointers, one per sub-matrix. Although each individual sub-matrix is limited to 1GB, the entire matrix, as a set of grids, can exceed 1GB. As I wrote in the previous message, the matrix structure's header will have an offset to each sub-matrix grid. A sub-matrix will be stored inline, or as external toast if VARATT_IS_1B_E(ptr).
Probably we also have to hack the row-deletion code so that it does not leak sub-matrices in the toast table.

Thanks,
-- NEC Business Creation Division / PG-Strom Project KaiGai Kohei
--
> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
> Experts in Analytics, Data Architecture and PostgreSQL
> Data in Trouble? Get it in Treble! http://BlueTreble.com
> 855-TREBLE2 (855-873-2532) mobile: 512-569-9461
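The layout described above, a deterministic element location reached in two steps, can be modeled with a small sketch (a toy model only; all names are invented, and it uses row-major order inside a grid for brevity, whereas the actual proposal favors FORTRAN/column-major order for BLAS compatibility):

```python
# Toy model of the grid-based matrix layout: element (i, j) is located in
# two deterministic steps -- pick the grid from an offset table, then
# compute the offset inside that grid.

GRID = 4  # grid edge length; a real implementation would use a far larger value

def make_matrix(rows, cols):
    gpr = (cols + GRID - 1) // GRID   # grids per row of grids
    gpc = (rows + GRID - 1) // GRID   # grids per column of grids
    return {"rows": rows, "cols": cols, "gpr": gpr,
            # stand-ins for per-grid buffers (or toast pointers, when large)
            "grids": [[0.0] * (GRID * GRID) for _ in range(gpr * gpc)]}

def locate(m, i, j):
    g = (i // GRID) * m["gpr"] + (j // GRID)   # step 1: which grid
    off = (i % GRID) * GRID + (j % GRID)       # step 2: offset within it
    return g, off

def get_elem(m, i, j):
    g, off = locate(m, i, j)
    return m["grids"][g][off]

def set_elem(m, i, j, v):
    g, off = locate(m, i, j)
    m["grids"][g][off] = v
```

Because both steps are pure arithmetic, no pointer chasing is needed, which is what makes the layout friendly to massively parallel processors; and since each grid is an independent buffer, a grid larger than the toast limit simply becomes its own externally toasted datum.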
Re: [HACKERS] Parallel safety tagging of extension functions
On 06/02/2016 01:41 AM, Michael Paquier wrote:
> On Thu, Jun 2, 2016 at 7:36 AM, Andreas Karlsson wrote:
>> Looked at this quickly and I do not think adding it would be what I consider
>> a small patch since we would essentially need to copy the validation logic
>> from DefineAggregate and AggregateCreate and modify it to fit the alter
>> case. I am leaning towards either leaving the aggregate functions
>> alone or updating the catalog manually.
>
> As this is proving to be a hassle, what would it cost to leave those
> operator classes as-is for 9.6 and come up with a cleaner solution at
> the DDL level with 10.0? Then we could still focus on the other extensions
> that have content that can be easily switched. That's more than 90% of
> the things that need to be marked parallel-safe.

I think at least three of the four aggregate functions are little used, so I do not think many users would be affected. And only min(citext) and max(citext) can make use of parallel aggregation. The functions are:

min(citext)
max(citext)
int_array_aggregate(int4)
rewrite(tsquery[])

It would be nice if we had support for this in ALTER AGGREGATE already in 9.6, since I bet there are external extensions which want to take advantage of parallel aggregation, but if I add this command I do not feel it would be a minor patch. If people disagree and are fine with copying the validation logic, then I am prepared to do the work. I would just rather not rush this if there is no chance anyway that it will get into 9.6.

Andreas
Re: [HACKERS] Parallel safety tagging of extension functions
On Thu, Jun 2, 2016 at 7:36 AM, Andreas Karlsson wrote:
> On 05/25/2016 03:32 AM, Tom Lane wrote:
>> Andreas Karlsson writes:
>>> On 05/25/2016 02:41 AM, Robert Haas wrote:
>>>> I'd rather see us extend ALTER AGGREGATE to do this.
>>>
>>> Wouldn't that prevent this from going into 9.6? I do not think changing
>>> ALTER AGGREGATE is 9.6 material.
>>
>> Well, it's debatable --- but if the patch to do it is small and the
>> alternatives are really ugly, that would be an acceptable choice IMO.
>> Certainly we'd want to add that capability eventually anyway.
>
> Looked at this quickly and I do not think adding it would be what I consider
> a small patch since we would essentially need to copy the validation logic
> from DefineAggregate and AggregateCreate and modify it to fit the alter
> case. I am leaning towards either leaving the aggregate functions
> alone or updating the catalog manually.

As this is proving to be a hassle, what would it cost to leave those operator classes as-is for 9.6 and come up with a cleaner solution at the DDL level with 10.0? Then we could still focus on the other extensions that have content that can be easily switched. That's more than 90% of the things that need to be marked parallel-safe.
-- Michael
Re: [HACKERS] PostmasterPid not marked with PGDLLIMPORT
On Thu, Jun 2, 2016 at 6:40 AM, Tom Lane wrote:
> I suggest that there's a more principled reason for refusing a back-patch
> here, which is that we don't back-patch new features, only bug fixes.
> This request is certainly not a bug fix. It's in support of a new feature
> --- and one that's not even ours, but a third-party extension.

Yes, that's not a bug fix. I agree on that.

> Considering that said extension isn't finished yet, it's hard to guess
> whether this will be the only blocking factor preventing it from working
> with older versions, but it might well be that there are other problems.
> Also, even if it would work, the author would be reduced to saying things
> like "it will work in 9.4, if it's 9.4.9 or later". Keeping track of that
> level of detail is no fun either for the author or for users.

The extension in this case is for 9.4, for a product yet to be released that is now on 9.4.8, so to be honest I don't care much about the support grid here.

> It'd be a lot simpler all around to just say "my spiffy new extension
> requires 9.6 or later".

Well, that's the only factor, as far as I saw, that prevented me from using this extension on Windows. But I won't fight you regarding a back-patch: carrying a custom patch to miscadmin.h on REL9_4_STABLE is not a huge burden, as this code never changes.

> In short, I'd vote for putting this change in HEAD, but I see no need to
> back-patch.

OK, fine for me.

-- Michael
Re: [HACKERS] Parallel safety tagging of extension functions
On 05/25/2016 03:32 AM, Tom Lane wrote:
> Andreas Karlsson writes:
>> On 05/25/2016 02:41 AM, Robert Haas wrote: I'd rather see us extend ALTER AGGREGATE to do this.
>> Wouldn't that prevent this from going into 9.6? I do not think changing
>> ALTER AGGREGATE is 9.6 material.
>
> Well, it's debatable --- but if the patch to do it is small and the
> alternatives are really ugly, that would be an acceptable choice IMO.
> Certainly we'd want to add that capability eventually anyway.

Looked at this quickly and I do not think adding it would be what I consider a small patch, since we would essentially need to copy the validation logic from DefineAggregate and AggregateCreate and modify it to fit the alter case. I am leaning towards either leaving the aggregate functions alone or updating the catalog manually.

Andreas
Re: [HACKERS] Perf Benchmarking and regression.
On 2016-05-31 16:03:46 -0400, Robert Haas wrote:
> On Fri, May 27, 2016 at 12:37 AM, Andres Freund wrote:
> > I don't think the situation is quite that simple. By *disabling* backend
> > flushing it's also easy to see massive performance regressions. In
> > situations where shared buffers was configured appropriately for the
> > workload (not the case here IIRC).
>
> On what kind of workload does setting backend_flush_after=0 represent
> a large regression vs. the default settings?
>
> I think we have to consider that pgbench and parallel copy are pretty
> common things to want to do, and a non-zero default setting hurts
> those workloads a LOT.

I don't think pgbench's workload has much to do with reality. Even less so in the setup presented here.

The slowdown comes from the fact that default pgbench randomly, but uniformly, updates a large table. Which is slower with backend_flush_after if the workload is considerably bigger than shared_buffers, but, and that's a very important restriction, the workload at the same time largely fits into less than /proc/sys/vm/dirty_ratio (20%; probably even 10%, /proc/sys/vm/dirty_background_ratio) of the free OS memory. The "trick" in that case is that very often, before a buffer has been written back to storage by the OS, it'll be re-dirtied by postgres. Which means triggering flushing from postgres increases the total amount of writes. That only matters if the kernel doesn't trigger writeback because of the above ratios, or because of time limits (30s / dirty_writeback_centisecs).

> I have a really hard time believing that the benefits on other
> workloads are large enough to compensate for the slowdowns we're
> seeing here.
As a random example, without looking for good parameters, on my laptop: pgbench -i -q -s 1000

CPU: i7-6820HQ
RAM: 24GB of memory
Storage: Samsung SSD 850 PRO 1TB, encrypted

postgres -c shared_buffers=6GB -c backend_flush_after=128 -c max_wal_size=100GB -c fsync=on -c synchronous_commit=off
pgbench -M prepared -c 16 -j 16 -T 520 -P 1 -n -N (note the -N)

disabled:
latency average = 2.774 ms
latency stddev = 10.388 ms
tps = 5761.883323 (including connections establishing)
tps = 5762.027278 (excluding connections establishing)

128:
latency average = 2.543 ms
latency stddev = 3.554 ms
tps = 6284.069846 (including connections establishing)
tps = 6284.184570 (excluding connections establishing)

Note the latency stddev, which is 3x better, and the improved throughput. That's for a workload which even fits into the OS memory. Without backend flushing there are several periods looking like:

progress: 249.0 s, 7237.6 tps, lat 1.997 ms stddev 4.365
progress: 250.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 251.0 s, 1880.6 tps, lat 17.761 ms stddev 169.682
progress: 252.0 s, 6904.4 tps, lat 2.328 ms stddev 3.256

i.e. moments in which no transactions are executed. And that's on storage that can do 500MB/sec and tens of thousands of IOPS.

If you change to a workload that uses synchronous_commit, is bigger than OS memory and/or doesn't have very fast storage, the differences can be a *LOT* bigger.

In general, any workload which doesn't a) meet the above criterion of likely re-dirtying blocks it already dirtied before kernel-triggered writeback happens, or b) concurrently COPY into an individual file, is likely to be faster (or unchanged if within s_b) with backend flushing. Which means that transactional workloads that are bigger than the OS memory, or which have a non-uniform distribution leading to some locality, are likely to be faster. In practice those are *hugely* more likely than the uniform distribution that pgbench has.
Similarly, this *considerably* reduces the impact a concurrently running VACUUM or COPY has on concurrent queries, because suddenly VACUUM/COPY can't create a couple gigabytes of dirty buffers which will be written back at some random point in time later, stalling everything.

I think the benefits of more predictable (and often faster!) performance in a bunch of actual real-world-ish workloads are higher than optimizing for benchmarks.

> We have nobody writing in to say that
> backend_flush_after>0 is making things way better for them, and
> Ashutosh and I have independently hit massive slowdowns on unrelated
> workloads.

Actually, we do have some evidence of that; just so far not in this thread, which I don't find particularly surprising.

- Andres
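The dirty_ratio arithmetic in the messages above can be sketched numerically. This is a back-of-envelope illustration only: the memory figure matches the 24GB laptop from the benchmark, the ratios are the usual Linux defaults, and the helper name is made up.

```python
# Back-of-envelope sketch of the kernel writeback thresholds discussed
# above. 10% / 20% are typical defaults for dirty_background_ratio and
# dirty_ratio; 24GB is the benchmark machine's RAM. Illustrative only.

def writeback_thresholds_gb(free_mem_gb, dirty_background_ratio=10, dirty_ratio=20):
    """Dirty-data sizes (GB) at which background and blocking writeback start."""
    background = free_mem_gb * dirty_background_ratio / 100
    hard = free_mem_gb * dirty_ratio / 100
    return background, hard

background, hard = writeback_thresholds_gb(24)
print(f"background writeback from ~{background:.1f}GB of dirty data")
print(f"writers blocked from ~{hard:.1f}GB of dirty data")

# The regression regime described above: the working set is bigger than
# shared_buffers (so backends evict and re-dirty pages), yet the dirty
# portion stays below these thresholds, so without backend flushing the
# kernel would have coalesced many of the repeated writes.
```

On such a box, postgres-triggered flushing starts costing extra writes once the dirty working set stays under roughly these few GB, which matches the pgbench setup in this thread.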
Re: [HACKERS] PostmasterPid not marked with PGDLLIMPORT
On Wed, Jun 1, 2016 at 5:40 PM, Tom Lane wrote:
> > On Wed, Jun 1, 2016 at 5:04 PM, Robert Haas wrote:
> >> Probably not, but yes, I do want to reduce the commit load. I also
> >> think that we essentially have a contract with our users to limit what
> >> we back-patch to critical bug fixes and security fixes. When we don't
> >> do that, people start asking to have individual fixes cherry-picked
> >> instead of just upgrading, and that's not good. We may know that such
> >> changes are low-risk, but that doesn't mean everyone else does.
>
> I suggest that there's a more principled reason for refusing a back-patch
> here, which is that we don't back-patch new features, only bug fixes.
> This request is certainly not a bug fix. It's in support of a new feature
> --- and one that's not even ours, but a third-party extension.

Maybe I don't understand PGDLLEXPORT... The PostgreSQL function/feature in question is already in place and can be accessed by someone using Linux or another unix-like variant. But it cannot be accessed by our Windows users because we failed to add a PGDLLEXPORT somewhere. If it is our goal to treat Windows and Linux/Unix equally, then that discrepancy is on its face a bug. The fact that we don't catch these until some third party points it out doesn't make it any less a bug.

> Considering that said extension isn't finished yet, it's hard to guess
> whether this will be the only blocking factor preventing it from working
> with older versions, but it might well be that there are other problems.

From my prior point, the reason someone wants to use the unexposed but documented public API shouldn't matter.

> Also, even if it would work, the author would be reduced to saying things
> like "it will work in 9.4, if it's 9.4.9 or later". Keeping track of that
> level of detail is no fun either for the author or for users.

This seems hollow.
"It works on the latest version of all officially supported PostgreSQL releases" is a reasonable policy for a third-party application to take. That it might work on older releases would be a bonus for some new users. It'd be > a lot simpler all around to just say "my spiffy new extension requires 9.6 > or later". > And it makes the available user base considerably smaller. A small change to get the other 80% of the users is something to take seriously. > > In short, I'd vote for putting this change in HEAD, but I see no need to > back-patch. > I see a need for back-patching and no technical reason why it cannot be done - easily. David J.
Re: [HACKERS] PostmasterPid not marked with PGDLLIMPORT
> On Wed, Jun 1, 2016 at 5:04 PM, Robert Haas wrote:
>> Probably not, but yes, I do want to reduce the commit load. I also
>> think that we essentially have a contract with our users to limit what
>> we back-patch to critical bug fixes and security fixes. When we don't
>> do that, people start asking to have individual fixes cherry-picked
>> instead of just upgrading, and that's not good. We may know that such
>> changes are low-risk, but that doesn't mean everyone else does.

I suggest that there's a more principled reason for refusing a back-patch here, which is that we don't back-patch new features, only bug fixes. This request is certainly not a bug fix. It's in support of a new feature --- and one that's not even ours, but a third-party extension.

Considering that said extension isn't finished yet, it's hard to guess whether this will be the only blocking factor preventing it from working with older versions, but it might well be that there are other problems. Also, even if it would work, the author would be reduced to saying things like "it will work in 9.4, if it's 9.4.9 or later". Keeping track of that level of detail is no fun either for the author or for users. It'd be a lot simpler all around to just say "my spiffy new extension requires 9.6 or later".

In short, I'd vote for putting this change in HEAD, but I see no need to back-patch.

regards, tom lane
Re: [HACKERS] Rename max_parallel_degree?
On 06/01/2016 02:21 PM, Robert Haas wrote:
> If you lined up ten people in a room all of whom were competent
> database professionals and none of whom knew anything about PostgreSQL
> and asked them to guess what a setting called work_mem does and what a
> setting called max_parallel_degree does, I will wager you $5 that
> they'd do better on the second one. Likewise, I bet the guesses for
> max_parallel_degree would be closer to the mark than the guesses for
> maintenance_work_mem or replacement_sort_tuples or commit_siblings or
> bgwriter_lru_multiplier.

Incidentally, the reason I didn't jump into this thread until the patches showed up is that I don't think it actually matters what the parameters are named. They're going to require documentation regardless; parallelism just isn't something people grok instinctively. I care about how the parameters *work*, and whether that's consistent across our various resource management settings.

-- Josh Berkus Red Hat OSAS (any opinions are my own)
Re: [HACKERS] Rename max_parallel_degree?
Robert Haas writes:
> I've largely given up hope of coming up with an alternative that can
> attract more than one vote and that is also at least mildly accurate,
> but one idea is max_parallel_workers_per_gather_node. That will be
> totally clear.

Given the reference to Gather nodes, couldn't you drop the word "parallel"? "node" might not be necessary either.

regards, tom lane
Re: [HACKERS] PostmasterPid not marked with PGDLLIMPORT
On Wed, Jun 1, 2016 at 5:04 PM, Robert Haas wrote:
> On Wed, Jun 1, 2016 at 12:24 PM, Alvaro Herrera wrote:
> > Robert Haas wrote:
> >> On Wed, Jun 1, 2016 at 12:06 AM, Craig Ringer wrote:
> >> > On 1 June 2016 at 11:48, Michael Paquier wrote:
> >> >> Could it be possible to mark PostmasterPid with PGDLLIMPORT on HEAD
> >> >> and back-branches?
> >> >
> >> > Sounds sensible to me.
> >>
> >> I don't really want to set a precedent that we'll back-patch
> >> PGDLLIMPORT markings every time somebody needs a new symbol for some
> >> extension they are writing, but I don't mind changing this in master.
> >
> > I wonder why is that -- just to reduce the commit load? I don't think
> > this kind of change is likely to break anything, is it?
>
> Probably not, but yes, I do want to reduce the commit load. I also
> think that we essentially have a contract with our users to limit what
> we back-patch to critical bug fixes and security fixes. When we don't
> do that, people start asking to have individual fixes cherry-picked
> instead of just upgrading, and that's not good. We may know that such
> changes are low-risk, but that doesn't mean everyone else does.

Are there two concerns here? One, that people will think we are back-patching stuff and destabilizing back branches; and two, that people will see increased back-patching and therefore make unreasonable requests of us to which we dislike saying "no".

The latter doesn't seem likely, and I'd say you can't stop people from having badly formed opinions, and that our track record on back-patching decisions is excellent. We want third-party tools to support our prior releases, and if we miss making one of our features available on Windows because of a missing PGDLLIMPORT, that's on our heads and should be fixed. If a user equates that to "please back-patch jsonb to 9.3 because I don't want to upgrade", I'm not going to feel much guilt saying "that's different, ain't gonna happen".
Informed people will understand the purpose of the back-patch and until I start hearing a vocal uninformed person start griping I'd rather give the community the benefit of the doubt. David J.
Re: [HACKERS] Rename max_parallel_degree?
On Wed, Jun 1, 2016 at 10:10 AM, Tom Lane wrote: > Amit Kapila writes: >> Your explanation is clear, however the name max_parallel_workers makes it >> sound like that parallelising an operation is all about workers. Yes it >> depends a lot on the number of workers allocated for parallel operation, >> but that is not everything. I think calling it max_parallelism as >> suggested by Alvaro upthread suits better than max_parallel_workers. > > I don't think that's a good direction at all. This entire discussion is > caused by the fact that it's not very clear what "max_parallel_degree" > measures. Fixing that problem by renaming the variable to something that > doesn't even pretend to tell you what it's counting is not an improvement. I agree with that, but I also think you're holding the name of this GUC to a ridiculously high standard. It's not like you can look at "work_mem" and know what it measures without reading the fine manual. If you lined up ten people in a room all of whom were competent database professionals and none of whom knew anything about PostgreSQL and asked them to guess what a setting called work_mem does and what a setting called max_parallel_degree does, I will wager you $5 that they'd do better on the second one. Likewise, I bet the guesses for max_parallel_degree would be closer to the mark than the guesses for maintenance_work_mem or replacement_sort_tuples or commit_siblings or bgwriter_lru_multiplier. I've largely given up hope of coming up with an alternative that can attract more than one vote and that is also at least mildly accurate, but one idea is max_parallel_workers_per_gather_node. That will be totally clear. Three possible problems, none of them fatal, are: - I have plans to eventually fix it so that the workers are shared across the whole plan rather than just the plan node. In most cases that won't matter, but there are corner cases where it does. 
Now when that happens, we can rename this to max_parallel_workers_per_query, breaking backward compatibility.

- It will force us to use a different GUC for utility commands.
- It's a bit long-winded.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
[HACKERS] Typo in comment in nbtree.h
Hi

Following along with a btree bug report, I saw a typo "referencd" in a comment. Also "we've" seems a bit odd here, but maybe it's just me. Maybe it should be like this?

--- a/src/include/access/nbtree.h
+++ b/src/include/access/nbtree.h
@@ -522,7 +522,7 @@ typedef struct BTScanPosData
 	Buffer		buf;		/* if valid, the buffer is pinned */
 	XLogRecPtr	lsn;		/* pos in the WAL stream when page was read */
-	BlockNumber	currPage;	/* page we've referencd by items array */
+	BlockNumber	currPage;	/* page referenced by items array */
 	BlockNumber	nextPage;	/* page's right link when we scanned it */

-- Thomas Munro http://www.enterprisedb.com
Re: [HACKERS] PostmasterPid not marked with PGDLLIMPORT
On Wed, Jun 1, 2016 at 12:24 PM, Alvaro Herrera wrote:
> Robert Haas wrote:
>> On Wed, Jun 1, 2016 at 12:06 AM, Craig Ringer wrote:
>> > On 1 June 2016 at 11:48, Michael Paquier wrote:
>> >> Could it be possible to mark PostmasterPid with PGDLLIMPORT on HEAD
>> >> and back-branches?
>> >
>> > Sounds sensible to me.
>>
>> I don't really want to set a precedent that we'll back-patch
>> PGDLLIMPORT markings every time somebody needs a new symbol for some
>> extension they are writing, but I don't mind changing this in master.
>
> I wonder why is that -- just to reduce the commit load? I don't think
> this kind of change is likely to break anything, is it?

Probably not, but yes, I do want to reduce the commit load. I also think that we essentially have a contract with our users to limit what we back-patch to critical bug fixes and security fixes. When we don't do that, people start asking to have individual fixes cherry-picked instead of just upgrading, and that's not good. We may know that such changes are low-risk, but that doesn't mean everyone else does.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
Paul Ramsey writes:
> One of the things people find annoying about postgis is that
> ST_Intersects(ST_Intersection(a, b), a) can come out as false (a
> derived point at a crossing of lines may not exactly intersect either
> of the input lines), which is a direct result of our use of exact math
> for the boolean intersects test.

That's an interesting comment, because it's more or less exactly the type of failure that we could expect to get if we remove fuzzy comparisons from the built-in types. How much of a problem is it in practice for PostGIS users? Do you have any plans to change it?

> Anyways, go forth and do whatever makes sense for PgSQL

I think we're trying to figure out what that is ...

regards, tom lane
Re: [HACKERS] JSON[B] arrays are second-class citizens
2016-06-01 17:55 GMT+02:00 Jim Nasby:
> On 5/31/16 7:04 PM, Peter van Hardenberg wrote:
>> The idea of converting a JSONB array to a PG array is appealing and
>> would potentially be more general-purpose than adding a new unnest. I'm
>> not sure how feasible either suggestion is.
>
> The one part I think is missing right now is unnest allows you to 'stitch'
> or 'zip' multiple arrays together into a single recordset via
> unnest(array1, array2, ...). Presumably that could be added to the json
> equivalents.
>
>> I will say that I think the current state of affairs is gratuitously
>> verbose and expects users to memorize a substantial number of long
>> function names to perform simple tasks.
>
> +100. It's *much* easier to deal with JSON in other languages because they
> have native support for the concept of a dictionary, so changing an element
> is as simple as json['foo'][3] = 'new'. Trying to do that in Postgres is
> horrible, partly because of the need to remember some odd operator, but
> more so because it's ultimately still an operator. What we need is a form of
> *addressing*. If you could directly access items in a JSON doc with []
> notation then a lot of the current functions could go away, *especially* if
> the [] notation allowed things like a slice and a list of values (ie:
> json['foo', 'bar', 'baz'] = '[42,{"my": "nice object"},"with a random
> string"]'. Or = row(42, ...).

These features I would like to see in Postgres too.

Regards Pavel
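For comparison, the dictionary-style addressing Jim describes looks like this in Python; the document itself is invented purely to show the notation:

```python
import json

# The point being made above: languages with native dictionaries let
# you address into a JSON document with plain subscripts, no special
# operators. The document is a made-up illustration.
doc = json.loads('{"foo": [10, 11, 12, "old"], "bar": {"baz": 1}}')

doc["foo"][3] = "new"      # the json['foo'][3] = 'new' case
doc["bar"]["baz"] = 42     # same notation for a nested object

print(json.dumps(doc, sort_keys=True))
```

The same update in SQL currently requires remembering a function or operator name rather than writing a path, which is the ergonomic gap being discussed.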
Re: [HACKERS] Misdesigned command/status APIs for parallel dump/restore
I wrote: > I am pretty strongly tempted to get rid of MasterStartParallelItem > altogether and just hard-code what it does in DispatchJobForTocEntry. > ... > A different line of thought would be to fix the worker-command-parsing > situation by allowing the parsing to happen in format-specific code, > but avoid duplicative coding by letting archive formats share a common > implementation function if they had no need for any custom data. In the attached patch for this, I took a middle ground of separating out the command and status string building and parsing functions. There isn't presently any provision for letting archive format modules override these, but that could easily be added if we ever need it. Meanwhile, this saves about 100 lines of rather messy code. Like the other couple of patches I've posted recently for parallel dump, this is against current HEAD. There will be minor merge conflicts when these patches are combined; but they should be reviewable independently, so I'll worry about the merge issues later. regards, tom lane diff --git a/src/bin/pg_dump/parallel.c b/src/bin/pg_dump/parallel.c index e9e8698..509a312 100644 *** a/src/bin/pg_dump/parallel.c --- b/src/bin/pg_dump/parallel.c *** *** 20,49 * the desired number of worker processes, which each enter WaitForCommands(). * * The master process dispatches an individual work item to one of the worker ! * processes in DispatchJobForTocEntry(). That calls ! * AH->MasterStartParallelItemPtr, a routine of the output format. This ! * function's arguments are the parents archive handle AH (containing the full ! * catalog information), the TocEntry that the worker should work on and a ! * T_Action value indicating whether this is a backup or a restore task. The ! * function simply converts the TocEntry assignment into a command string that ! * is then sent over to the worker process. In the simplest case that would be ! * something like "DUMP 1234", with 1234 being the TocEntry id. ! 
* * The worker process receives and decodes the command and passes it to the * routine pointed to by AH->WorkerJobDumpPtr or AH->WorkerJobRestorePtr, * which are routines of the current archive format. That routine performs ! * the required action (dump or restore) and returns a malloc'd status string. ! * The status string is passed back to the master where it is interpreted by ! * AH->MasterEndParallelItemPtr, another format-specific routine. That ! * function can update state or catalog information on the master's side, ! * depending on the reply from the worker process. In the end it returns a ! * status code, which is 0 for successful execution. * ! * Remember that we have forked off the workers only after we have read in ! * the catalog. That's why our worker processes can also access the catalog ! * information. (In the Windows case, the workers are threads in the same ! * process. To avoid problems, they work with cloned copies of the Archive ! * data structure; see init_spawned_worker_win32().) * * In the master process, the workerStatus field for each worker has one of * the following values: --- 20,41 * the desired number of worker processes, which each enter WaitForCommands(). * * The master process dispatches an individual work item to one of the worker ! * processes in DispatchJobForTocEntry(). We send a command string such as ! * "DUMP 1234" or "RESTORE 1234", where 1234 is the TocEntry ID. * The worker process receives and decodes the command and passes it to the * routine pointed to by AH->WorkerJobDumpPtr or AH->WorkerJobRestorePtr, * which are routines of the current archive format. That routine performs ! * the required action (dump or restore) and returns an integer status code. ! * This is passed back to the master which updates its state accordingly. * ! * In principle additional archive-format-specific information might be needed ! * in commands or worker status responses, but so far that hasn't proved ! 
* necessary, since workers have full copies of the ArchiveHandle/TocEntry ! * data structures. Remember that we have forked off the workers only after ! * we have read in the catalog. That's why our worker processes can also ! * access the catalog information. (In the Windows case, the workers are ! * threads in the same process. To avoid problems, they work with cloned ! * copies of the Archive data structure; see init_spawned_worker_win32().) * * In the master process, the workerStatus field for each worker has one of * the following values: *** ParallelBackupEnd(ArchiveHandle *AH, Par *** 703,708 --- 695,804 } /* + * These next four functions handle construction and parsing of the command + * strings and response strings for parallel workers. + * + * Currently, these can be the same regardless of which archive format we are + * processing. In future,
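The command/status exchange the revised comment describes is simple enough to sketch. The function and handler names below are illustrative stand-ins, not pg_dump's actual symbols:

```python
# Toy model of the master/worker protocol described above: the master
# sends "DUMP <id>" or "RESTORE <id>" (the TocEntry ID), the worker
# dispatches to a per-action handler and replies with an integer status
# code, 0 meaning success. All names here are hypothetical.

def build_command(action, toc_id):
    if action not in ("DUMP", "RESTORE"):
        raise ValueError("unknown action: %s" % action)
    return "%s %d" % (action, toc_id)

def parse_command(cmd):
    action, _, toc_id = cmd.partition(" ")
    if action not in ("DUMP", "RESTORE"):
        raise ValueError("unparsable command: %r" % cmd)
    return action, int(toc_id)

def worker_handle(cmd, handlers):
    action, toc_id = parse_command(cmd)
    return handlers[action](toc_id)  # integer status back to the master

# Master side: dispatch TocEntry 1234 for dumping.
handlers = {"DUMP": lambda toc_id: 0, "RESTORE": lambda toc_id: 0}
status = worker_handle(build_command("DUMP", 1234), handlers)
print(status)
```

The patch's simplification is visible here: because both sides share the full TOC, the wire format needs nothing beyond the action and the entry ID.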
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
On Wed, Jun 1, 2016 at 8:59 AM, Jim Nasby wrote:
> On 6/1/16 10:03 AM, Paul Ramsey wrote:
>> We don't depend on these, we have our own :/
>> The real answer for a GIS system is to have an explicit tolerance
>> parameter for calculations like distance/touching/containment, but
>> unfortunately we didn't do that so now we have our own
>> compatibility/boil the ocean problem if we ever wanted/were funded to
>> add one.
>
> Well it sounds like what's currently happening in Postgres is probably going
> to change, so how might we structure that to help PostGIS? Would simply
> lopping off the last few bits of the significand/mantissa work, or is that
> not enough when different GRSes are involved?

PostGIS doesn't look at all at what the PgSQL geotypes do, so go forward w/o fear.

Tolerance in the geo world is more than vertex rounding though; it's things like saying that when distance(pt,line) < epsilon then distance(pt,line) == 0, or similarly for shape touching, etc. One of the things people find annoying about PostGIS is that ST_Intersects(ST_Intersection(a, b), a) can come out as false (a derived point at a crossing of lines may not exactly intersect either of the input lines), which is a direct result of our use of exact math for the boolean intersects test.

Anyways, go forth and do whatever makes sense for PgSQL

P
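The exact-math gotcha Paul describes can be reproduced with plain floating point. This is an illustrative sketch with arbitrary coordinates, not PostGIS code: the crossing point of two segments, computed in doubles, may carry a rounding residual, so an exact point-on-line test can fail where a tolerant one succeeds.

```python
# Sketch of the failure mode described above: a derived intersection
# point need not satisfy an exact (eps=0) point-on-line test for either
# input line. Coordinates are arbitrary illustration values.

def line_through(p, q):
    """Coefficients (a, b, c) of a*x + b*y + c = 0 through p and q."""
    (x1, y1), (x2, y2) = p, q
    return (y2 - y1, x1 - x2, x2 * y1 - x1 * y2)

def intersection(l1, l2):
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    d = a1 * b2 - a2 * b1          # assumed nonzero: lines not parallel
    return ((b1 * c2 - b2 * c1) / d, (a2 * c1 - a1 * c2) / d)

def on_line(line, pt, eps=0.0):
    a, b, c = line
    return abs(a * pt[0] + b * pt[1] + c) <= eps

l1 = line_through((0.0, 0.0), (13.0, 7.1))
l2 = line_through((0.0, 7.1), (13.0, 0.0))
p = intersection(l1, l2)

# The residual of substituting p back into each line is tiny but may be
# nonzero, so the exact test can come out False; a small epsilon, the
# "tolerance parameter" discussed above, recovers the intuitive answer.
print(on_line(l1, p), on_line(l1, p, eps=1e-9))
```

This mirrors the ST_Intersects(ST_Intersection(a, b), a) surprise: whether the exact test passes depends on the rounding of the particular coordinates involved.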
Re: [HACKERS] JSON[B] arrays are second-class citizens
On Tue, May 31, 2016 at 06:15:32PM -0400, David G. Johnston wrote:
> I stand corrected. I was thinking you could somehow craft unnest(' value here>') but there is no way to auto-convert to "anyarray"...
>
> > The json_array_elements family manages to do the right thing. Why
> > would it be harder to make sure UNNEST and ROWS FROM() do so?
>
> I have apparently failed to understand your point. All I saw was that you
> wanted "unnest(jsonb)" to work in an identical fashion to
> "jsonb_array_elements(jsonb)". If there is some aspect beyond this being
> an aliasing situation then you have failed to communicate it such that I
> comprehended that fact.

Upon further investigation, I think UNNEST should Just Work™, which is to say that it should unnest arrays into their top-level constituent elements if the standard doesn't specify some other behavior.

Separately, I suppose, I think there needs to be an easy way to cast the output of UNNEST. Lacking knowledge of the intricacies of parsing, etc., I'd propose CAST(UNNEST(...) AS ...), or better yet, UNNEST(...):: at least in the case without WITH ORDINALITY.

Further out in the future, at least so it seems to me, it would be nice to have a feature where one could cast a column to an expanded row type, e.g.: SELECT my_jsonb::(i INT, t TEXT, p POINT), foo, bar FROM ... and get a result set with 5 columns in it.

Cheers, David.

-- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] PostmasterPid not marked with PGDLLIMPORT
Robert Haas wrote:
> On Wed, Jun 1, 2016 at 12:06 AM, Craig Ringer wrote:
> > On 1 June 2016 at 11:48, Michael Paquier wrote:
> >> Could it be possible to mark PostmasterPid with PGDLLIMPORT on HEAD
> >> and back-branches?
> >
> > Sounds sensible to me.
>
> I don't really want to set a precedent that we'll back-patch
> PGDLLIMPORT markings every time somebody needs a new symbol for some
> extension they are writing, but I don't mind changing this in master.

I wonder why is that -- just to reduce the commit load? I don't think this kind of change is likely to break anything, is it?

-- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] JSON[B] arrays are second-class citizens
On Tue, May 31, 2016 at 6:20 PM, Tom Lane wrote:
> David Fetter writes:
> > On Tue, May 31, 2016 at 05:06:00PM -0400, David G. Johnston wrote:
> > > While likely not that common, the introduction of an ambiguity raises
> > > the bar considerably.
> >
> > What ambiguity?
>
> You get that as a result of the recent introduction of unnest(tsvector),
> which we debated a few weeks ago and seem to have decided to leave as-is.
> But it failed before 9.6 too, with
>
> So at least in this particular case, adding unnest(jsonb) wouldn't be a
> problem from the standpoint of not being able to resolve calls that we
> could resolve before.
>
> Nonetheless, there *is* an ambiguity here, which is specific to json(b):
> what type of array are you expecting to get? The reason we have both
> json[b]_array_elements() and json[b]_array_elements_text() is that there
> are plausible use-cases for returning either json or plain text. It's not
> hard to imagine that somebody will want json[b]_array_elements_numeric()
> before long, too. If you want to have an unnest(jsonb) then you will need
> to make an arbitrary decision about which type it will return, and that
> doesn't seem like an especially great idea to me.

I'm on the fence given the presence of the tsvector overload and the lack of any syntactic concerns. I would either have it keep the same form as our main unnest function: and/or have two functions

    unnest(json, anyelement) : anyelement
    unnest(jsonb, anyelement) : anyelement

The first one cannot fail at runtime (due to type conversion) while the latter two can.

If you can't tell, I do like our introduction of what are basically Java generics into our idiomatic json implementation. I'd call this a better option for polymorphism than creating a new function every time someone wants typed output.

David J.
Re: [HACKERS] JSON[B] arrays are second-class citizens
On Wed, Jun 1, 2016 at 11:55 AM, Jim Nasby wrote:
> On 5/31/16 7:04 PM, Peter van Hardenberg wrote:
> > The idea of converting a JSONB array to a PG array is appealing and
> > would potentially be more general-purpose than adding a new unnest. I'm
> > not sure how feasible either suggestion is.
>
> The one part I think is missing right now is unnest allows you to 'stitch'
> or 'zip' multiple arrays together into a single recordset via
> unnest(array1, array2, ...). Presumably that could be added to the json
> equivalents.

You can just use "ROWS FROM" to get the same result.

https://www.postgresql.org/docs/9.6/static/queries-table-expressions.html#QUERIES-TABLEFUNCTIONS

"""
The special table function UNNEST may be called with any number of array parameters, and it returns a corresponding number of columns, as if UNNEST (Section 9.18) had been called on each parameter separately and combined using the ROWS FROM construct.
"""

Yes, the unnest form can be used within the target-list, but that argument is not going to get you very far, as that use of SRFs is greatly frowned upon now that we have LATERAL.

David J.
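The ROWS FROM equivalent of Jim's "zip" example looks like this — a sketch using the stock jsonb element functions, not any new syntax:

```sql
-- 'Zip' two jsonb arrays into one recordset; the shorter column is
-- padded with NULLs, just as with multi-argument UNNEST.
SELECT *
FROM ROWS FROM (
    jsonb_array_elements('[1, 2, 3]'::jsonb),
    jsonb_array_elements('["a", "b"]'::jsonb)
) WITH ORDINALITY AS t(x, y, n);
```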
Re: [HACKERS] Parallel safety tagging of extension functions
On 06/01/2016 05:15 PM, Tom Lane wrote:
> Andreas Karlsson writes:
> > On 06/01/2016 04:44 PM, Tom Lane wrote:
> > > I don't understand why you think you need the CREATE OR REPLACE FUNCTION
> > > commands? We only need to change proargtypes, and the updates did that.
> > > The catcache can take care of itself.
> >
> > Maybe I did something wrong (I try to avoid doing manual catalog
> > updates), but when I tested it, I needed to run the CREATE OR REPLACE
> > FUNCTION command to later be able to set the parallel safety. See the
> > example below.
>
> In this particular example, the problem seems to be that you forgot to
> update pronargs; it works for me when I add "pronargs = 2" to the UPDATE.
> Your "working" example is actually creating a new function, not replacing
> the old pg_proc entry.
>
> BTW, it strikes me that the pronamespace tests in these queries are
> redundant: the OID is unique, so what matters is the search path in the
> regprocedure lookups.

Thanks, I have fixed this.

> > The reason I use to_regprocedure is so that these scripts work for
> > people who have installed the extensions in beta1 and therefore only
> > have the new signatures. If they run these scripts they would get an
> > error if I used the cast. Is it ok if these scripts break for beta1 users?
>
> Ugh. This is more of a mess than I thought. I don't like update scripts
> that might silently fail to do what they're supposed to, but leaving beta1
> users in the lurch is not very nice either.
>
> I wonder ... could we get away with using regproc rather than
> regprocedure? These function names are probably unique anyway ...

Yeah, I would have strongly preferred to be able to use the cast. All these functions have unique names within core, but there is the small risk of someone adding a user function with the same name. I do not like either option.

The attached patch still uses to_regprocedure, but I can change to using ::regproc if that is what you prefer.
Andreas

gin-gist-signatures-v2.patch.gz
Description: application/gzip
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
On 6/1/16 10:03 AM, Paul Ramsey wrote:
> We don't depend on these, we have our own :/ The real answer for a GIS
> system is to have an explicit tolerance parameter for calculations like
> distance/touching/containment, but unfortunately we didn't do that, so
> now we have our own compatibility/boil-the-ocean problem if we ever
> wanted/were funded to add one.

Well, it sounds like what's currently happening in Postgres is probably going to change, so how might we structure that to help PostGIS? Would simply lopping off the last few bits of the significand/mantissa work, or is that not enough when different GRSes are involved?

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532) mobile: 512-569-9461
Re: [HACKERS] JSON[B] arrays are second-class citizens
On 5/31/16 7:04 PM, Peter van Hardenberg wrote:
> The idea of converting a JSONB array to a PG array is appealing and would
> potentially be more general-purpose than adding a new unnest. I'm not
> sure how feasible either suggestion is.

The one part I think is missing right now is unnest allows you to 'stitch' or 'zip' multiple arrays together into a single recordset via unnest(array1, array2, ...). Presumably that could be added to the json equivalents.

> I will say that I think the current state of affairs is gratuitously
> verbose and expects users to memorize a substantial number of long
> function names to perform simple tasks.

+100. It's *much* easier to deal with JSON in other languages because they have native support for the concept of a dictionary, so changing an element is as simple as json['foo'][3] = 'new'. Trying to do that in Postgres is horrible, partly because of the need to remember some odd operator, but more so because it's ultimately still an operator. What we need is a form of *addressing*.

If you could directly access items in a JSON doc with [] notation then a lot of the current functions could go away, *especially* if the [] notation allowed things like a slice and a list of values (ie: json['foo', 'bar', 'baz'] = '[42,{"my": "nice object"},"with a random string"]'). Or = row(42, ...).

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532) mobile: 512-569-9461
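To illustrate the kind of addressing Jim is describing, here is what it looks like in a language with native dictionary support — Python in this case, purely to show the ergonomics, not any proposed Postgres syntax:

```python
import json

# Parse a JSON document into a native dict/list structure.
doc = json.loads('{"foo": [10, 20, 30, 40], "bar": "x"}')

# Direct addressing: read and replace a nested element in place,
# the ergonomics that json['foo'][3] = 'new' would give in SQL.
doc["foo"][3] = "new"

# A multi-key read, loosely analogous to the json['foo', 'bar'] idea:
subset = {k: doc[k] for k in ("foo", "bar")}

print(json.dumps(doc["foo"]))  # prints [10, 20, 30, "new"]
```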
Re: [HACKERS] Rename max_parallel_degree?
On Wed, Jun 1, 2016 at 11:45 AM, Petr Jelinek wrote:
> That GUC also controls worker processes that are started by extensions,
> not just ones that parallel query starts. This is btw one thing I don't
> like at all about how the current limits work, the parallel query will
> fight for workers with extensions because they share the same limit.

Given that this models reality, the GUC is doing its job. Now, maybe we need additional knobs to give the end-user the ability to influence how those fights will turn out. But as far as a high-level setting goes, max_worker_processes seems to fit the bill -- and apparently fits within our existing cluster options naming convention. Parallel query uses workers to assist in query execution. Background tasks use workers during execution. Others.

David J.
Re: [HACKERS] Rename max_parallel_degree?
On 01/06/16 17:27, Jim Nasby wrote:
> On 5/31/16 8:48 PM, Robert Haas wrote:
> > On Tue, May 31, 2016 at 5:58 PM, Tom Lane wrote:
> > > Alvaro Herrera writes:
> > > > Robert Haas wrote:
> > > > > I just want to point out that if we change #1, we're breaking
> > > > > postgresql.conf compatibility for, IMHO, not a whole lot of
> > > > > benefit. I'd just leave it alone.
> > > >
> > > > We can add the old name as a synonym in guc.c to maintain
> > > > compatibility.
> > >
> > > I doubt this is much of an issue at this point; max_worker_processes
> > > has only been there a release or so, and surely there are very few
> > > people explicitly setting it, given its limited use-case up to now.
> > > It will be really hard to change it after 9.6, but I think we could
> > > still get away with that today.
> >
> > max_worker_processes was added in 9.4, so it's been there for two
> > releases, but it probably is true that few people have set it.
> > Nevertheless, I don't think there's much evidence that it is a bad
> > enough name that we really must change it.
>
> ISTM that all the confusion about parallel query would go away if the
> setting was max_parallel_assistants instead of _workers. It's exactly how
> parallel query works: there are helpers that *assist* the backend in
> executing the query. The big downside to "assistants" is it breaks all
> lexical connection to max_worker_processes. So what if we change that to
> max_assistant_processes? I think "assistant" and "worker" are close
> enough in meaning for "stand alone" uses of BG workers so as not to be
> confusing, and I don't see any options for parallelism that are any
> clearer.

That GUC also controls worker processes that are started by extensions, not just ones that parallel query starts. This is btw one thing I don't like at all about how the current limits work: the parallel query will fight for workers with extensions because they share the same limit.

--
Petr Jelinek  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Rename max_parallel_degree?
On 5/31/16 8:48 PM, Robert Haas wrote:
> On Tue, May 31, 2016 at 5:58 PM, Tom Lane wrote:
> > Alvaro Herrera writes:
> > > Robert Haas wrote:
> > > > I just want to point out that if we change #1, we're breaking
> > > > postgresql.conf compatibility for, IMHO, not a whole lot of benefit.
> > > > I'd just leave it alone.
> > >
> > > We can add the old name as a synonym in guc.c to maintain
> > > compatibility.
> >
> > I doubt this is much of an issue at this point; max_worker_processes
> > has only been there a release or so, and surely there are very few
> > people explicitly setting it, given its limited use-case up to now. It
> > will be really hard to change it after 9.6, but I think we could still
> > get away with that today.
>
> max_worker_processes was added in 9.4, so it's been there for two
> releases, but it probably is true that few people have set it.
> Nevertheless, I don't think there's much evidence that it is a bad enough
> name that we really must change it.

ISTM that all the confusion about parallel query would go away if the setting was max_parallel_assistants instead of _workers. It's exactly how parallel query works: there are helpers that *assist* the backend in executing the query.

The big downside to "assistants" is it breaks all lexical connection to max_worker_processes. So what if we change that to max_assistant_processes? I think "assistant" and "worker" are close enough in meaning for "stand alone" uses of BG workers so as not to be confusing, and I don't see any options for parallelism that are any clearer.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532) mobile: 512-569-9461
Re: [HACKERS] Parallel safety tagging of extension functions
Andreas Karlsson writes:
> On 06/01/2016 04:44 PM, Tom Lane wrote:
> > I don't understand why you think you need the CREATE OR REPLACE FUNCTION
> > commands? We only need to change proargtypes, and the updates did that.
> > The catcache can take care of itself.
>
> Maybe I did something wrong (I try to avoid doing manual catalog
> updates), but when I tested it, I needed to run the CREATE OR REPLACE
> FUNCTION command to later be able to set the parallel safety. See the
> example below.

In this particular example, the problem seems to be that you forgot to update pronargs; it works for me when I add "pronargs = 2" to the UPDATE. Your "working" example is actually creating a new function, not replacing the old pg_proc entry.

BTW, it strikes me that the pronamespace tests in these queries are redundant: the OID is unique, so what matters is the search path in the regprocedure lookups.

> The reason I use to_regprocedure is so that these scripts work for
> people who have installed the extensions in beta1 and therefore only
> have the new signatures. If they run these scripts they would get an
> error if I used the cast. Is it ok if these scripts break for beta1
> users?

Ugh. This is more of a mess than I thought. I don't like update scripts that might silently fail to do what they're supposed to, but leaving beta1 users in the lurch is not very nice either.

I wonder ... could we get away with using regproc rather than regprocedure? These function names are probably unique anyway ...

regards, tom lane
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
On 06/01/2016 07:52 AM, Jim Nasby wrote:
> On 6/1/16 9:27 AM, Tom Lane wrote:
> > Kevin Grittner writes:
> > > On Wed, Jun 1, 2016 at 8:08 AM, Robert Haas wrote:
> > > > Figuring out what to do about it is harder. Your proposal seems to
> > > > be to remove them except where we need the fuzzy behavior, which
> > > > doesn't sound unreasonable, but I don't personally understand why
> > > > we need it in some places and not others.
> > >
> > > +1
> > >
> > > My first inclination is to remove those macros in version 10, but it
> > > would be good to hear from some people using these types on what the
> > > impact of that would be.
> >
> > As I understand it, the key problem is that tests like "is point on
> > line" would basically never succeed except in the most trivial cases,
> > because of roundoff error. That's not very nice, and it might cascade
> > to larger problems like object-containment tests failing unexpectedly.
> > We would need to go through all the geometric operations and figure out
> > where that kind of gotcha is significant and what we can do about it.
> > Seems like a fair amount of work :-(. If somebody's willing to do that
> > kind of investigation, then sure, but I don't think just blindly
> > removing these macros is going to lead to anything good.
>
> I suspect another wrinkle here is that in the GIS world a single point
> can be represented in multiple reference/coordinate systems, and it
> would have different values in each of them. AIUI the transforms between
> those systems can be rather complicated if different projection methods
> are involved. I don't know if PostGIS depends on what these macros are
> doing or not. If it doesn't, perhaps it would be sufficient to lop off
> the last few bits of the significand. ISTM that'd be much better than
> what the macros currently do.

IIRC PostGIS uses a function from libgeos to do things like "point equals" (ST_Equals). I've never looked at that source, but would be unsurprised to find that it does something similar to this, although probably also more sophisticated.

(looks) yes -- ST_Equals calls GEOSEquals() after some sanity checking...

> BTW, I suspect the macro values were chosen specifically for dealing
> with LAT/LONG.

I think that is it exactly.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development

signature.asc
Description: OpenPGP digital signature
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
On Wed, Jun 1, 2016 at 7:52 AM, Jim Nasby wrote:
> I suspect another wrinkle here is that in the GIS world a single point
> can be represented in multiple reference/coordinate systems, and it would
> have different values in each of them. AIUI the transforms between those
> systems can be rather complicated if different projection methods are
> involved. I don't know if PostGIS depends on what these macros are doing
> or not. If it doesn't, perhaps it would be sufficient to lop off the last
> few bits of the significand. ISTM that'd be much better than what the
> macros currently do.

We don't depend on these, we have our own :/ The real answer for a GIS system is to have an explicit tolerance parameter for calculations like distance/touching/containment, but unfortunately we didn't do that, so now we have our own compatibility/boil-the-ocean problem if we ever wanted/were funded to add one.

P.

> BTW, I suspect the macro values were chosen specifically for dealing
> with LAT/LONG.
> --
> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
> Experts in Analytics, Data Architecture and PostgreSQL
> Data in Trouble? Get it in Treble! http://BlueTreble.com
> 855-TREBLE2 (855-873-2532) mobile: 512-569-9461
Re: [HACKERS] JSON[B] arrays are second-class citizens
On Tue, May 31, 2016 at 06:20:26PM -0400, Tom Lane wrote:
> David Fetter writes:
> > On Tue, May 31, 2016 at 05:06:00PM -0400, David G. Johnston wrote:
> > > While likely not that common, the introduction of an ambiguity raises
> > > the bar considerably.
> >
> > What ambiguity?
>
> My first thought about it was that
>
> select unnest('{1,2,3}');
>
> would start failing. But it turns out it already does fail:
>
> ERROR: function unnest(unknown) is not unique
>
> You get that as a result of the recent introduction of unnest(tsvector),
> which we debated a few weeks ago and seem to have decided to leave as-is.
> But it failed before 9.6 too, with
>
> ERROR: could not determine polymorphic type because input has type "unknown"
>
> So at least in this particular case, adding unnest(jsonb) wouldn't be a
> problem from the standpoint of not being able to resolve calls that we
> could resolve before.
>
> Nonetheless, there *is* an ambiguity here, which is specific to json(b):
> what type of array are you expecting to get? The reason we have both
> json[b]_array_elements() and json[b]_array_elements_text() is that there
> are plausible use-cases for returning either json or plain text. It's not
> hard to imagine that somebody will want json[b]_array_elements_numeric()
> before long, too. If you want to have an unnest(jsonb) then you will need
> to make an arbitrary decision about which type it will return, and that
> doesn't seem like an especially great idea to me.

How about making casts work? UNNEST(jsonb)::NUMERIC or similar, whatever won't make the parser barf.

> > UNNEST, and ROWS FROM have more capabilities including WITH ORDINALITY
> > than the json_array_elements-like functions do.
>
> AFAICT, this is nonsense. We did not tie WITH ORDINALITY to UNNEST;
> it works for any set-returning function.

Oops. My mistake. Sorry about the noise.

Cheers,
David.
--
David Fetter http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter  Skype: davidfetter
XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] Parallel safety tagging of extension functions
On 06/01/2016 04:44 PM, Tom Lane wrote:
> Andreas Karlsson writes:
> > It is the least ugly of all the ugly solutions I could think of. I have
> > attached a patch which fixes the signatures using this method. I use
> > CREATE OR REPLACE FUNCTION to update the catcache. What do you think?
> > Is it too ugly?
>
> I don't understand why you think you need the CREATE OR REPLACE FUNCTION
> commands? We only need to change proargtypes, and the updates did that.
> The catcache can take care of itself.

Maybe I did something wrong (I try to avoid doing manual catalog updates), but when I tested it, I needed to run the CREATE OR REPLACE FUNCTION command to later be able to set the parallel safety. See the example below.

CREATE FUNCTION f(text) RETURNS int LANGUAGE SQL AS $$ SELECT 1 $$;

BEGIN;
UPDATE pg_proc
SET proargtypes = array_to_string('{int,int}'::regtype[]::oid[], ' ')::oidvector
WHERE oid = to_regprocedure('f(text)')
  AND pronamespace = current_schema()::regnamespace;
ALTER FUNCTION f(int, int) STRICT;
COMMIT;

vs

CREATE FUNCTION f(text) RETURNS int LANGUAGE SQL AS $$ SELECT 1 $$;

BEGIN;
UPDATE pg_proc
SET proargtypes = array_to_string('{int,int}'::regtype[]::oid[], ' ')::oidvector
WHERE oid = to_regprocedure('f(text)')
  AND pronamespace = current_schema()::regnamespace;
CREATE OR REPLACE FUNCTION f(int, int) RETURNS int LANGUAGE SQL AS $$ SELECT 1 $$;
ALTER FUNCTION f(int, int) STRICT;
COMMIT;

> I think it would be good practice to be more careful about
> schema-qualifying all the pg_catalog table and type names.

I do not think we generally schema-qualify things in extension scripts, and instead rely on the CREATE/ALTER EXTENSION commands to set the search path correctly. Am I mistaken?

> I also think it's a bad idea to use to_regprocedure() rather than a cast
> to regprocedure. If the name isn't found, we want an error, not a silent
> NULL result leading to no update occurring.

The reason I use to_regprocedure is so that these scripts work for people who have installed the extensions in beta1 and therefore only have the new signatures. If they run these scripts they would get an error if I used the cast. Is it ok if these scripts break for beta1 users?

Andreas
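For the record, Tom's suggested fix applied to the first example above would look something like this — a sketch where only the pronargs assignment is new relative to Andreas's script:

```sql
-- With pronargs kept in sync with proargtypes, the ALTER FUNCTION
-- works without the intervening CREATE OR REPLACE FUNCTION.
BEGIN;
UPDATE pg_proc
SET proargtypes = array_to_string('{int,int}'::regtype[]::oid[], ' ')::oidvector,
    pronargs = 2
WHERE oid = to_regprocedure('f(text)');
ALTER FUNCTION f(int, int) STRICT;
COMMIT;
```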
[HACKERS] Removing overhead commands in parallel dump/restore
While testing parallel dump/restore over the past few days, I noticed that it seemed to do an awful lot of duplicative SET commands, which I traced to the fact that restore was doing _doSetFixedOutputState(AH) in the wrong place, ie, once per TOC entry not once per worker. Another thing that's useless overhead is that lockTableForWorker() is doing an actual SQL query to fetch the name of a table that we already have at hand. Poking around in this area also convinced me that it was pretty weird for CloneArchive to be managing encoding, and only encoding, when cloning a dump connection; that should be handled by setup_connection. I also noticed several unchecked strdup() calls that of course should be pg_strdup(). I put together the attached patch that cleans all this up. It's hard to show any noticeable performance difference, but the query log certainly looks cleaner. Any objections? regards, tom lane diff --git a/src/bin/pg_dump/parallel.c b/src/bin/pg_dump/parallel.c index e9e8698..6a808dc 100644 *** a/src/bin/pg_dump/parallel.c --- b/src/bin/pg_dump/parallel.c *** IsEveryWorkerIdle(ParallelState *pstate) *** 802,808 static void lockTableForWorker(ArchiveHandle *AH, TocEntry *te) { - Archive*AHX = (Archive *) AH; const char *qualId; PQExpBuffer query; PGresult *res; --- 802,807 *** lockTableForWorker(ArchiveHandle *AH, To *** 813,845 query = createPQExpBuffer(); ! /* ! * XXX this is an unbelievably expensive substitute for knowing how to dig ! * a table name out of a TocEntry. ! */ ! appendPQExpBuffer(query, ! "SELECT pg_namespace.nspname," ! " pg_class.relname " ! " FROM pg_class " ! " JOIN pg_namespace on pg_namespace.oid = relnamespace " ! " WHERE pg_class.oid = %u", te->catalogId.oid); ! ! res = PQexec(AH->connection, query->data); ! ! if (!res || PQresultStatus(res) != PGRES_TUPLES_OK) ! exit_horribly(modulename, ! "could not get relation name for OID %u: %s\n", ! te->catalogId.oid, PQerrorMessage(AH->connection)); ! ! resetPQExpBuffer(query); ! ! 
qualId = fmtQualifiedId(AHX->remoteVersion, ! PQgetvalue(res, 0, 0), ! PQgetvalue(res, 0, 1)); appendPQExpBuffer(query, "LOCK TABLE %s IN ACCESS SHARE MODE NOWAIT", qualId); - PQclear(res); res = PQexec(AH->connection, query->data); --- 812,821 query = createPQExpBuffer(); ! qualId = fmtQualifiedId(AH->public.remoteVersion, te->namespace, te->tag); appendPQExpBuffer(query, "LOCK TABLE %s IN ACCESS SHARE MODE NOWAIT", qualId); res = PQexec(AH->connection, query->data); diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c index ad8e132..32e1c82 100644 *** a/src/bin/pg_dump/pg_backup_archiver.c --- b/src/bin/pg_dump/pg_backup_archiver.c *** restore_toc_entries_postfork(ArchiveHand *** 3893,3898 --- 3893,3899 ropt->pghost, ropt->pgport, ropt->username, ropt->promptPassword); + /* re-establish fixed state */ _doSetFixedOutputState(AH); /* *** parallel_restore(ParallelArgs *args) *** 4071,4080 TocEntry *te = args->te; int status; - _doSetFixedOutputState(AH); - Assert(AH->connection != NULL); AH->public.n_errors = 0; /* Restore the TOC item */ --- 4072,4080 TocEntry *te = args->te; int status; Assert(AH->connection != NULL); + /* Count only errors associated with this TOC entry */ AH->public.n_errors = 0; /* Restore the TOC item */ *** CloneArchive(ArchiveHandle *AH) *** 4443,4452 --- 4443,4456 RestoreOptions *ropt = AH->public.ropt; Assert(AH->connection == NULL); + /* this also sets clone->connection */ ConnectDatabase((Archive *) clone, ropt->dbname, ropt->pghost, ropt->pgport, ropt->username, ropt->promptPassword); + + /* re-establish fixed state */ + _doSetFixedOutputState(clone); } else { *** CloneArchive(ArchiveHandle *AH) *** 4454,4460 char *pghost; char *pgport; char *username; - const char *encname; Assert(AH->connection != NULL); --- 4458,4463 *** CloneArchive(ArchiveHandle *AH) *** 4468,4485 pghost = PQhost(AH->connection); pgport = PQport(AH->connection); username = PQuser(AH->connection); - encname = 
pg_encoding_to_char(AH->public.encoding); /* this also sets clone->connection */ ConnectDatabase((Archive *) clone, dbname, pghost, pgport, username, TRI_NO); ! /* ! * Set the same encoding, whatever we set here is what we got from ! * pg_encoding_to_char(), so we really shouldn't run into an error ! * setting that very same value. Also see the comment in ! * SetupConnection(). ! */ ! PQsetClientEncoding(clone->connection, encname); } /* Let the
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
On 6/1/16 9:27 AM, Tom Lane wrote:
> Kevin Grittner writes:
> > On Wed, Jun 1, 2016 at 8:08 AM, Robert Haas wrote:
> > > Figuring out what to do about it is harder. Your proposal seems to be
> > > to remove them except where we need the fuzzy behavior, which doesn't
> > > sound unreasonable, but I don't personally understand why we need it
> > > in some places and not others.
> >
> > +1
> >
> > My first inclination is to remove those macros in version 10, but it
> > would be good to hear from some people using these types on what the
> > impact of that would be.
>
> As I understand it, the key problem is that tests like "is point on line"
> would basically never succeed except in the most trivial cases, because
> of roundoff error. That's not very nice, and it might cascade to larger
> problems like object-containment tests failing unexpectedly. We would
> need to go through all the geometric operations and figure out where that
> kind of gotcha is significant and what we can do about it. Seems like a
> fair amount of work :-(. If somebody's willing to do that kind of
> investigation, then sure, but I don't think just blindly removing these
> macros is going to lead to anything good.

I suspect another wrinkle here is that in the GIS world a single point can be represented in multiple reference/coordinate systems, and it would have different values in each of them. AIUI the transforms between those systems can be rather complicated if different projection methods are involved. I don't know if PostGIS depends on what these macros are doing or not. If it doesn't, perhaps it would be sufficient to lop off the last few bits of the significand. ISTM that'd be much better than what the macros currently do.

BTW, I suspect the macro values were chosen specifically for dealing with LAT/LONG.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532) mobile: 512-569-9461
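Jim's "lop off the last few bits of the significand" idea can be sketched directly on the IEEE-754 bit pattern. This is a hypothetical illustration, not anything from the Postgres source; the helper name and the choice of 16 dropped bits are made up:

```python
import struct

def truncate_significand(x: float, drop_bits: int) -> float:
    """Zero the low `drop_bits` bits of a double's 52-bit significand."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    mask = ~((1 << drop_bits) - 1) & 0xFFFFFFFFFFFFFFFF
    return struct.unpack("<d", struct.pack("<Q", bits & mask))[0]

# Two coordinates that are mathematically equal but differ in the
# low-order bits due to roundoff:
a = 0.1 + 0.2   # 0.30000000000000004
b = 0.3
assert a != b                                                      # exact compare fails
assert truncate_significand(a, 16) == truncate_significand(b, 16)  # truncated compare succeeds
```

Whether dropping a fixed number of bits is adequate once different coordinate reference systems are involved is exactly the open question Jim raises.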
Re: [HACKERS] Parallel safety tagging of extension functions
Andreas Karlsson writes:
> It is the least ugly of all the ugly solutions I could think of. I have
> attached a patch which fixes the signatures using this method. I use
> CREATE OR REPLACE FUNCTION to update the catcache. What do you think? Is
> it too ugly?

I don't understand why you think you need the CREATE OR REPLACE FUNCTION commands? We only need to change proargtypes, and the updates did that. The catcache can take care of itself.

I think it would be good practice to be more careful about schema-qualifying all the pg_catalog table and type names.

I also think it's a bad idea to use to_regprocedure() rather than a cast to regprocedure. If the name isn't found, we want an error, not a silent NULL result leading to no update occurring.

regards, tom lane
Re: [HACKERS] Does people favor to have matrix data type?
On 5/30/16 9:05 PM, Kouhei Kaigai wrote:
> Due to performance reason, location of each element must be deterministic
> without walking on the data structure. This approach guarantees we can
> reach individual element with 2 steps.

Agreed.

On various other points...

Yes, please keep the discussion here, even when it relates only to PL/R. Whatever is being done for R needs to be done for plpython as well. I've looked at ways to improve analytics in plpython related to this, and it looks like I need to take a look at the fast-path function stuff.

One of the things I've pondered for storing ndarrays in Postgres is how to reduce or eliminate the need to copy data from one memory region to another. It would be nice if there was a way to take memory that was allocated by one manager (ie: python's) and transfer ownership of that memory directly to Postgres without having to copy everything. Obviously you'd want to go the other way as well. IIRC cython's memory manager is the same as palloc in regard to very large allocations basically being ignored completely, so this should be possible in that case.

One thing I don't understand is why this type needs to be limited to 1 or 2 dimensions. Isn't the important thing how many individual elements you can fit into the GPU? So if you can fit a 1024x1024, you could also fit a 100x100x100, a 32x32x32x32, etc. At low enough values maybe that stops making sense, but I don't see why there needs to be an artificial limit.

I think what's important for something like kNN is that the storage is optimized for this, which I think means treating the highest dimension as if it was a list. I don't know if it then matters whether the lower dimensions are C style vs FORTRAN style. Other algorithms might want different storage.

Something else to consider is the 1G toast limit. I'm pretty sure that's why MADlib stores matrices as a table of vectors. I know for certain it's a problem they run into, because they've discussed it on their mailing list.

BTW, take a look at MADlib svec[1]... ISTM that's just a special case of what you're describing with entire grids being zero (or vice-versa). There might be some commonality there.

[1] https://madlib.incubator.apache.org/docs/v1.8/group__grp__svec.html

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532) mobile: 512-569-9461
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
Kevin Grittner writes:
> On Wed, Jun 1, 2016 at 8:08 AM, Robert Haas wrote:
>> Figuring out what to do about it is harder. Your proposal seems to be
>> to remove them except where we need the fuzzy behavior, which doesn't
>> sound unreasonable, but I don't personally understand why we need it
>> in some places and not others.
>
> +1
>
> My first inclination is to remove those macros in version 10, but
> it would be good to hear from some people using these types on what
> the impact of that would be.

As I understand it, the key problem is that tests like "is point on line" would basically never succeed except in the most trivial cases, because of roundoff error. That's not very nice, and it might cascade to larger problems like object-containment tests failing unexpectedly.

We would need to go through all the geometric operations and figure out where that kind of gotcha is significant and what we can do about it. Seems like a fair amount of work :-(. If somebody's willing to do that kind of investigation, then sure, but I don't think just blindly removing these macros is going to lead to anything good.

Also, I suppose this means that Robert promises not to make any of his usual complaints about breaking compatibility? Because we certainly would be.

regards, tom lane
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
On Wed, Jun 1, 2016 at 8:08 AM, Robert Haas wrote:
> On Fri, May 27, 2016 at 6:43 AM, Emre Hasegeli wrote:
>> There are those macros defined for the built-in geometric types:
>>
>>> #define EPSILON 1.0E-06
>>>
>>> #define FPzero(A) (fabs(A) <= EPSILON)
>>> #define FPeq(A,B) (fabs((A) - (B)) <= EPSILON)
>>> #define FPne(A,B) (fabs((A) - (B)) > EPSILON)
>>> #define FPlt(A,B) ((B) - (A) > EPSILON)
>>> #define FPle(A,B) ((A) - (B) <= EPSILON)
>>> #define FPgt(A,B) ((A) - (B) > EPSILON)
>>> #define FPge(A,B) ((B) - (A) <= EPSILON)
>>
>> with this warning:
>>
>>> * XXX These routines were not written by a numerical analyst.
>
> I agree that those macros look like a pile of suck.

+1

> It's unclear to me what purpose they're trying to accomplish, but
> regardless of what it is, it's hard for me to believe that they are
> accomplishing it. Whether 1.0E-06 is a correct fuzz factor presumably
> depends greatly on the scale of the number; e.g. if the values are all
> like "3" it's probably fine but if they are like "37142816124856" it's
> probably not enough fuzz and if they are all like ".0004" it's probably
> way too much fuzz.

Also, it's a pretty lame heuristic. It doesn't really eliminate *any* problems; it just aims to make them less frequent by shifting the edges around. In doing so, it creates whole new classes of problems:

test=# \set A '''(1.996,1.996),(1,1)''::box'
test=# \set B '''(2,2),(1,1)''::box'
test=# \set C '''(2.004,2.004),(1,1)''::box'
test=# select :A = :B, :B = :C, :A = :C;
 ?column? | ?column? | ?column?
----------+----------+----------
 t        | t        | f
(1 row)

Is the benefit we get from the macros worth destroying the transitive property of the comparison operators on these types?

> Figuring out what to do about it is harder. Your proposal seems to be
> to remove them except where we need the fuzzy behavior, which doesn't
> sound unreasonable, but I don't personally understand why we need it
> in some places and not others.
+1

My first inclination is to remove those macros in version 10, but it would be good to hear from some people using these types on what the impact of that would be.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Rename max_parallel_degree?
Amit Kapila writes:
> Your explanation is clear; however, the name max_parallel_workers makes
> it sound as though parallelising an operation is all about workers. Yes,
> it depends a lot on the number of workers allocated for the parallel
> operation, but that is not everything. I think calling it
> max_parallelism, as suggested by Alvaro upthread, suits it better than
> max_parallel_workers.

I don't think that's a good direction at all. This entire discussion is caused by the fact that it's not very clear what "max_parallel_degree" measures. Fixing that problem by renaming the variable to something that doesn't even pretend to tell you what it's counting is not an improvement.

regards, tom lane
Re: [HACKERS] Question and suggestion about application binary compatibility policy
> However, the problem I pointed out is that when the new library is
> incompatible with the older one, say the major version of libpq.dll
> changes from 5 to 6, the application user and/or developer cannot
> notice the incompatibility immediately and easily. On Unix/Linux,
> the application fails to start because the older library is not
> found. On the other hand, the application will start on Windows and
> probably cause difficult trouble due to the incompatibility.

I don't think this is a very good situation, but I have no idea if this can be solved. However, I'd prefer a technical solution over a documentation one.

Michael

--
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Meskes at (Debian|Postgresql) dot Org
Jabber: michael at xmpp dot meskes dot org
VfL Borussia! Força Barça! SF 49ers! Use Debian GNU/Linux, PostgreSQL
Re: [HACKERS] array of domain types
On 1 June 2016 at 14:20, Konstantin Knizhnik wrote:
> I wonder why domain types cannot be used for the specification of an
> array element:
>
> create domain objref as bigint;
> create table foo(x objref[]);
> ERROR: type "objref[]" does not exist
> create table foo(x bigint[]);
> CREATE TABLE
>
> Is there some principled problem here, or is it just not implemented?

It's not implemented, but patches welcome.

Thom
[HACKERS] array of domain types
I wonder why domain types cannot be used for the specification of an array element:

create domain objref as bigint;
create table foo(x objref[]);
ERROR: type "objref[]" does not exist
create table foo(x bigint[]);
CREATE TABLE

Is there some principled problem here, or is it just not implemented?

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] PostmasterPid not marked with PGDLLIMPORT
On Wed, Jun 1, 2016 at 9:00 AM, Robert Haas wrote:
> On Wed, Jun 1, 2016 at 12:06 AM, Craig Ringer wrote:
>> On 1 June 2016 at 11:48, Michael Paquier wrote:
>>> Could it be possible to mark PostmasterPid with PGDLLIMPORT on HEAD
>>> and back-branches?
>>
>> Sounds sensible to me.

+1

> I don't really want to set a precedent that we'll back-patch
> PGDLLIMPORT markings every time somebody needs a new symbol for some
> extension they are writing
> [...]

-1

David J.
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
On Fri, May 27, 2016 at 6:43 AM, Emre Hasegeli wrote:
> There are those macros defined for the built-in geometric types:
>
>> #define EPSILON 1.0E-06
>>
>> #define FPzero(A) (fabs(A) <= EPSILON)
>> #define FPeq(A,B) (fabs((A) - (B)) <= EPSILON)
>> #define FPne(A,B) (fabs((A) - (B)) > EPSILON)
>> #define FPlt(A,B) ((B) - (A) > EPSILON)
>> #define FPle(A,B) ((A) - (B) <= EPSILON)
>> #define FPgt(A,B) ((A) - (B) > EPSILON)
>> #define FPge(A,B) ((B) - (A) <= EPSILON)
>
> with this warning:
>
>> * XXX These routines were not written by a numerical analyst.

I agree that those macros look like a pile of suck. It's unclear to me what purpose they're trying to accomplish, but regardless of what it is, it's hard for me to believe that they are accomplishing it. Whether 1.0E-06 is a correct fuzz factor presumably depends greatly on the scale of the number; e.g. if the values are all like "3" it's probably fine, but if they are like "37142816124856" it's probably not enough fuzz, and if they are all like ".0004" it's probably way too much fuzz.

Figuring out what to do about it is harder. Your proposal seems to be to remove them except where we need the fuzzy behavior, which doesn't sound unreasonable, but I don't personally understand why we need it in some places and not others. It would be good if some of the people who are more numerically inclined than I am (and hate floats less, but then that's everyone) could jump in here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] PostmasterPid not marked with PGDLLIMPORT
On Wed, Jun 1, 2016 at 12:06 AM, Craig Ringer wrote:
> On 1 June 2016 at 11:48, Michael Paquier wrote:
>> Could it be possible to mark PostmasterPid with PGDLLIMPORT on HEAD
>> and back-branches?
>
> Sounds sensible to me.

I don't really want to set a precedent that we'll back-patch PGDLLIMPORT markings every time somebody needs a new symbol for some extension they are writing, but I don't mind changing this in master.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Rename synchronous_standby_names?
On Wed, Jun 1, 2016 at 9:49 AM, Michael Paquier wrote:
> On Wed, Jun 1, 2016 at 3:56 AM, David G. Johnston wrote:
>> On Tue, May 31, 2016 at 2:19 PM, Peter Eisentraut wrote:
>>> On 5/31/16 1:47 PM, Jaime Casanova wrote:
>>>> Are we going to change synchronous_standby_names? Certainly the GUC
>>>> is not *only* a list of names anymore. synchronous_standby_config?
>>>> synchronous_standbys (adjust to correct English if necessary)?
>>>
>>> If the existing values are still going to be accepted, then I would
>>> leave it as is.
>>
>> +1
>
> +1. We've gone to quite a lot of trouble to take an approach for the
> N-sync that is 100% backward-compatible; it would be good to not break
> that effort.

+1

--
Regards,
Masahiko Sawada
Re: [HACKERS] COMMENT ON, psql and access methods
> As far as I can see, COMMENT ON has no support for access methods.
> Wouldn't we want to add it as it is created by a command? On top of
> that, perhaps we could have a backslash command in psql to list the
> supported access methods, like \dam[S]? The system methods would be in
> this case all the in-core ones.
>
> +1. Are there other opinions? That's not a 9.6 blocker IMO, so I could
> get patches out for 10.0 if needed.

I missed that during the review process, but I agree it should be implemented.

--
Teodor Sigaev
E-mail: teo...@sigaev.ru
WWW: http://www.sigaev.ru/
Re: [HACKERS] Rename max_parallel_degree?
On Tue, May 31, 2016 at 11:30 PM, Tom Lane wrote:
>> I wrote:
>> I really think that a GUC named "max_parallel_workers", which in fact
>> limits the number of workers and not something else, is the way to go.
>
> To be concrete, I suggest comparing the attached documentation patch
> with Robert's. Which one is more understandable?

Your explanation is clear; however, the name max_parallel_workers makes it sound as though parallelising an operation is all about workers. Yes, it depends a lot on the number of workers allocated for the parallel operation, but that is not everything. I think calling it max_parallelism, as suggested by Alvaro upthread, suits it better than max_parallel_workers.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Change in order of criteria - reg
On 2016/06/01 13:07, sri harsha wrote:
> Hi,
>
> In PostgreSQL, does the order in which the criteria are given matter?
> For example
>
> Query 1: Select * from TABLE where a > 5 and b < 10;
>
> Query 2: Select * from TABLE where b < 10 and a > 5;
>
> Are query 1 and query 2 the same in PostgreSQL or different? If they
> are different, WHY?

tl;dr they are the same. As in, they obviously produce the same result and invoke the same plan.

Internally, the optimizer will order the application of those quals in the resulting plan based on the per-tuple cost of the individual quals. So a cheaper, more selective qual might short-circuit relatively expensive quals for a large number of rows in the table, saving some run-time cost. Also, if an index scan is chosen and quals are pushed down, the underlying index method might know to order the quals smartly.

However, the cost markings of the operators/functions involved in the quals had better match reality. By default, most operators/functions in a database are marked with a cost of 1 unit. Since a stable sort is used to order the quals, quals of equal cost are applied in their original order (i.e., the order in which they appear in the query). So if the first specified qual happens to be expensive but is marked as having the same cost as other, less expensive quals, one has to pay the price of evaluating it for all the rows, whereas correctly marking the costs could have avoided that (as explained above).

Note that I am not suggesting that ordering quals in the query by their perceived cost is the solution. Keep the optimizer informed by setting costs appropriately and it will do the right thing more often than not. :)

Thanks,
Amit
Re: [HACKERS] Reviewing freeze map code
On Sat, May 7, 2016 at 5:34 AM, Robert Haas wrote:
> On Mon, May 2, 2016 at 8:25 PM, Andres Freund wrote:
>> + * heap_tuple_needs_eventual_freeze
>> + *
>> + * Check to see whether any of the XID fields of a tuple (xmin, xmax, xvac)
>> + * will eventually require freezing. Similar to heap_tuple_needs_freeze,
>> + * but there's no cutoff, since we're trying to figure out whether freezing
>> + * will ever be needed, not whether it's needed now.
>> + */
>> +bool
>> +heap_tuple_needs_eventual_freeze(HeapTupleHeader tuple)
>>
>> Wouldn't redefining this to heap_tuple_is_frozen() and then inverting the
>> checks be easier to understand?
>
> I thought it much safer to keep this as close to a copy of
> heap_tuple_needs_freeze() as possible. Copying a function and
> inverting all of the return values is much more likely to introduce
> bugs, IME.

I agree.

>> + /*
>> +  * If xmax is a valid xact or multixact, this tuple is also not frozen.
>> +  */
>> + if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
>> + {
>> +     MultiXactId multi;
>> +
>> +     multi = HeapTupleHeaderGetRawXmax(tuple);
>> +     if (MultiXactIdIsValid(multi))
>> +         return true;
>> + }
>>
>> Hm. What's the test inside the if() for? There shouldn't be any case
>> where xmax is invalid if HEAP_XMAX_IS_MULTI is set. Now there's a
>> check like that outside of this commit, but it seems strange to me
>> (Alvaro, perhaps you could comment on this?).
>
> Here again I was copying existing code, with appropriate simplifications.
>
>> + *
>> + * Clearing both visibility map bits is not separately WAL-logged. The callers
>> * must make sure that whenever a bit is cleared, the bit is cleared on WAL
>> * replay of the updating operation as well.
>>
>> I think including "both" here makes things less clear, because it
>> differentiates clearing one bit from clearing both. There's no practical
>> difference atm, but still.
>
> I agree.

Fixed.
>> *
>> * VACUUM will normally skip pages for which the visibility map bit is set;
>> * such pages can't contain any dead tuples and therefore don't need vacuuming.
>> - * The visibility map is not used for anti-wraparound vacuums, because
>> - * an anti-wraparound vacuum needs to freeze tuples and observe the latest xid
>> - * present in the table, even on pages that don't have any dead tuples.
>> *
>>
>> I think the remaining sentence isn't entirely accurate, there's now more
>> than one bit, and they're different with regard to scan_all/!scan_all
>> vacuums (or will be - maybe this updated further in a later commit? But
>> if so, that sentence shouldn't yet be removed...).
>
> We can adjust the language, but I don't really see a big problem here.

This comment is not incorporated in this patch so far.

>> -/* Number of heap blocks we can represent in one byte. */
>> -#define HEAPBLOCKS_PER_BYTE 8
>> -
>> Hm, why was this moved to the header? Sounds like something the outside
>> shouldn't care about.
>
> Oh... yeah. Let's undo that.

Fixed.

>> #define HEAPBLK_TO_MAPBIT(x) (((x) % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK)
>>
>> Hm. This isn't really a mapping to an individual bit anymore - but I
>> don't really have a better name in mind. Maybe TO_OFFSET?
>
> Well, it sorta is... but we could change it, I suppose.
>
>> +static const uint8 number_of_ones_for_visible[256] = {
>> ...
>> +};
>> +static const uint8 number_of_ones_for_frozen[256] = {
>> ...
>> };
>>
>> Did somebody verify the new contents are correct?
>
> I admit that I didn't. It seemed like an unlikely place for a goof,
> but I guess we should verify.

>> /*
>> - * visibilitymap_clear - clear a bit in visibility map
>> + * visibilitymap_clear - clear all bits in visibility map
>> *
>>
>> This seems rather easy to misunderstand, as this really only clears all
>> the bits for one page, not actually all the bits.
>
> We could change "in" to "for one page in the".

Fixed.
>> * the bit for heapBlk, or InvalidBuffer. The caller is responsible for
>> - * releasing *buf after it's done testing and setting bits.
>> + * releasing *buf after it's done testing and setting bits, and must pass flags
>> + * for which it needs to check the value in visibility map.
>> *
>> * NOTE: This function is typically called without a lock on the heap page,
>> * so somebody else could change the bit just after we look at it. In fact,
>> @@ -327,17 +351,16 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
>>
>> I'm not seeing what flags the above comment change is referring to?
>
> Ugh. I think that's leftover cruft from an earlier patch version that
> should have been excised from what got committed.

Fixed.

>> /*
>> - * A single-bit read is atomic. There could be memory-ordering effects
>> + * A single byte read is atomic. There could be memory-ordering effects
>> * here, b