Re: [HACKERS] [parallel query] random server crash while running tpc-h query on power2
On Sat, Aug 13, 2016 at 11:10 AM, Rushabh Lathia wrote:
> Hi All,
>
> Recently, while running tpc-h queries on the postgresql master branch, I noticed
> random server crashes. Most of the time the crash comes while running tpc-h query
> number 3 (but it's very random).
>
> Here it is clear that worker_instrument is either corrupted or uninitialized, and
> that is the reason it ends up with a server crash.
>
> With a bit more debugging and a look at the git history, I found that the issue
> started appearing with commit af33039317ddc4a0e38a02e2255c2bf453115fd2.
> gather_readnext() calls ExecShutdownGatherWorkers() when nreaders == 0.
> ExecShutdownGatherWorkers() calls ExecParallelFinish(), which collects the
> instrumentation before marking ParallelExecutorInfo as finished.
> ExecParallelRetrieveInstrumentation() does the allocation of
> planstate->worker_instrument.
>
> With commit af33039317 we now call gather_readnext() with the per-tuple context,
> but with nreaders == 0 ExecShutdownGatherWorkers() ends up allocating
> planstate->worker_instrument in the per-tuple context - which is wrong.
>
> Now the fix can be:
>
> 1) Avoid calling ExecShutdownGatherWorkers() from gather_readnext() and let
> ExecEndGather() do those things.

I don't think we can wait till ExecEndGather() to collect statistics, as we need them before that for the explain path. However, we do call ExecShutdownNode() from ExecutePlan() when there are no more tuples, which can take care of ensuring the shutdown of the Gather node. I think the advantage of calling it in gather_readnext() is that it allows resources to be released early and populates the instrumentation/statistics as early as possible.

> But with this change, gather_readnext() and gather_getnext() depend on the
> planstate->reader structure to continue reading tuples. Now we can either change
> those conditions to depend on planstate->nreaders, or just pfree(planstate->reader)
> in gather_readnext() instead of calling ExecShutdownGatherWorkers().
>
> Attaching a patch which fixes the issue with approach 1).

AFAICS, your patch seems to be the right fix for this issue, unless we need the instrumentation information during execution (other than for explain) for some purpose.

-- 
With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] new autovacuum criterion for visible pages
On Thu, Aug 11, 2016 at 9:29 PM, Jeff Janes wrote:
> On Thu, Aug 11, 2016 at 8:32 AM, Amit Kapila wrote:
>> On Thu, Aug 11, 2016 at 2:09 AM, Jeff Janes wrote:
>>> I wanted to create a new relopt named something like
>>> autovacuum_vacuum_pagevisible_factor which would cause autovacuum to
>>> vacuum a table once less than a certain fraction of the relation's
>>> pages are marked allvisible.
>>
>> Why would it be more convenient for a user to set such a parameter as
>> compared to the existing parameters (autovacuum_vacuum_threshold +
>> autovacuum_vacuum_scale_factor)?
>
> Insertions and HOT-updates clear vm bits but don't increment the
> counters that those existing parameters are compared to.
>
> Also, the relationship between the number of updated/deleted rows and the
> number of hint-bits cleared can be hard to predict due to possible
> clustering of the updates into the same blocks. So it would be hard
> to know what to set the values to.

Okay. What I was slightly worried about was how many users can understand *pagevisible_* parameters as compared to what we have now (number of updated/deleted tuples). However, if we have some mechanism where autovacuum can be triggered automatically based on page visibility, then I think that would be quite beneficial (not sure if such a mechanism is feasible).

-- 
With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
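To make the comparison concrete, a sketch of how the proposal would sit next to the existing knobs (the pagevisible relopt is only proposed in this thread and does not exist; 'orders' is just an example table):

    -- Existing per-table thresholds, driven by counts of updated/deleted tuples:
    ALTER TABLE orders SET (autovacuum_vacuum_threshold = 1000,
                            autovacuum_vacuum_scale_factor = 0.05);

    -- Hypothetical relopt from this thread: vacuum once fewer than 90% of the
    -- table's pages are marked all-visible:
    ALTER TABLE orders SET (autovacuum_vacuum_pagevisible_factor = 0.9);

    -- The statistic it would be compared against is already tracked:
    SELECT relpages, relallvisible FROM pg_class WHERE relname = 'orders';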
Re: [HACKERS] handling unconvertible error messages
Victor>We don't have 190 message catalog translations in the PostgreSQL.
Victor>So problem with encoding for messages is quite limited.

Even though the number of translations is limited, there's a problem when trying to tell one "one-byte" encoding from another "one-byte" one. It would be so much better if ServerErrorMessages included the encoding right in the message itself.

For pgjdbc, I've implemented a workaround that relies on the following:

1) It knows what "FATAL" looks like in several translations, and it knows the encodings often used for those translations. For instance, for Russian it tries the CP1251, KOI8, and ALT encodings. It converts "ВАЖНО" (Russian for FATAL) using those three encodings and searches for that byte sequence in the error message. If there's a match, then the encoding is identified.

2) Unfortunately, that does not help for Japanese, as FATAL there is translated as "FATAL" (i.e. left untranslated). So I hard-coded several typical words like "database", "user", "role" (see [1]), and if those byte sequences are present, the message is assumed to be in Japanese. It would be great if someone could review those as I do not speak Japanese.

3) Then it tries different LATIN encodings.

Here's the commit: https://github.com/pgjdbc/pgjdbc/commit/ec5fb4f5a66b6598aea1c7ab8df3126ee77d15e2

Kyotaro> Is there any source to know the compatibility for any combination
Kyotaro> of language vs encoding? Maybe we need a ground for the list.

I use "locale -a" for that. For instance, for Japanese it prints the following on my machine (OS X 10.11.6):

    locale -a | grep ja
    ja_JP
    ja_JP.eucJP
    ja_JP.SJIS
    ja_JP.UTF-8

[1]: https://github.com/pgjdbc/pgjdbc/commit/ec5fb4f5a66b6598aea1c7ab8df3126ee77d15e2#diff-57ed15f90f50144391f1c134bf08a45cR47

Vladimir
Re: [HACKERS] handling unconvertible error messages
On Sat, 13 Aug 2016 09:24:47 + Vladimir Sitnikov wrote:
> Victor>We don't have 190 message catalog translations in the
> Victor>PostgreSQL. So problem with encoding for messages is quite
> Victor>limited.
>
> Even though the number of translations is limited, there's a problem
> when trying to tell one "one-byte" encoding from another "one-byte"
> one. It would be so much better if ServerErrorMessages included the
> encoding right in the message itself.

I think it is better to avoid such a problem and fix the system so the server would never send a message in an encoding different from the client's. It is not the client's job to convert encodings.

In most cases the server does know which encoding the client requests from the very first protocol message (if it is a startup message). So the server can easily tell whether it is able to convert NLS messages into the client's desired encoding, and if not, fall back to untranslated messages.

-- 
Victor Wagner

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Slowness of extended protocol
Shay>I don't know much about the Java world, but both pgbouncer and pgpool (the major pools?)

In the Java world, https://github.com/brettwooldridge/HikariCP is a very good connection pool. Neither pgbouncer nor pgpool is required. The architecture is: application <=> HikariCP <=> pgjdbc <=> PostgreSQL

The idea is that HikariCP pools pgjdbc connections, and server-prepared statements are per-pgjdbc connection, so there is no reason to "discard all" or "deallocate all" or whatever.

Shay>send DISCARD ALL by default. That is a fact, and it has nothing to do with any bugs or issues pgbouncer may have.

That is configurable. If pgbouncer/pgpool defaults to the "wrong configuration", why should we (driver developers, backend developers) try to accommodate that?

Shay>What? Do you mean you do implicit savepoints and autorollback too?

I mean that. On top of that it enables opt-in psql-like ON_ERROR_ROLLBACK functionality, so it can automatically roll back the latest statement if it failed. For instance, that might simplify migration from other DBs that have the same "rollback just one statement on error" semantics by default.

Shay>How does the driver decide when to do a savepoint?

By default it would set a savepoint when there's an open transaction and the query to be executed has been previously described. In other words, the default mode is to protect the user from the "cached plan must not change result type" error.

Shay>Is it on every single command?

Exactly (modulo the above). If the user manually sets "autosave=always", every command would get prepended with a savepoint, and rolled back on failure.

Shay>If you do a savepoint on every single command, that surely would impact performance even without extra roundtrips...?

My voltmeter tells me that the overhead is just 3us for a simple "SELECT" statement (see https://github.com/pgjdbc/pgjdbc/pull/477#issuecomment-239579547 ). I think it is acceptable to have it enabled by default; however, it would be nice if the database did not require a savepoint dance to heal its "not implemented" "cached plan must not change result type" error.

Shay>I'm not aware of other drivers that implicitly prepare statements,
Shay>and definitely of no drivers that internally create savepoints and
Shay>roll them back without explicit user APIs

Glad to hear that. Are you aware of other drivers that translate "insert into table(a,b,c) values(?,?,?)" =into=> an "insert into table(a,b,c) values(?,?,?),(?,?,?),...,(?,?,?)" statement on the fly?

That gives a 2-3x performance boost (and that includes client result processing as well!) for batched inserts, since "bind/exec" message processing is not that fast. That is why I say that investing in "bind/exec performance" makes more sense than investing in "improved execution of non-prepared statements".

Shay>you're doing something very different - not necessarily wrong - and not
Shay>attempt to impose your ideas on everyone as if it's the only true way
Shay>to write a db driver.

1) Feel free to pick the ideas you like.

2) I don't say "it is the only true way". I would change my mind if someone suggested a better solution. Everybody makes mistakes, and I have no problem with admitting that "I made a mistake" and moving forward. They say: "Don't cling to a mistake just because you spent a lot of time making it."

However, no-one has suggested a case where an application issues lots of unique SQL statements. The suggested alternative, "a new message for non-prepared extended query", might shave 5-10us per query for those who are too lazy to implement a query cache.
However, that is just 5-10ms per 1000 queries. Would that be noticeable by the end-users? I don't think so. Having a query cache can easily shave 5-10+ms for each query; that is why I suggest driver developers implement a query cache first, and only then ask for new performance-related messages.

3) For "performance related" issues, a business case and a voltmeter are required to prove there's an issue.

Shay>But the second query, which should be invalidated, has already been
Shay>sent to the server (because of batching), and boom

-- Doctor, it hurts me when I do that
-- Don't do that

When doing batched SQL, some of the queries might fail with "duplicate key" or "constraint violation". There's already an API that covers those kinds of cases and reports which statements did succeed (if any). In the case you described (one query in a batch somehow invalidates the subsequent one) it would just resort to generic error handling.

Shay>When would you send this ValidatePreparedStatement?
Shay>Before each execution as a separate roundtrip?
Shay>That would kill performance.

Why do you think the performance would degrade? Once again: the current problem is that the client thinks it knows "which columns will be returned by a particular server-prepared statement", but the table might get changed behind the scenes (e.g. an online schema upgrade), so the error occurs. That "return type" is already validated by the database (the time is al
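A minimal SQL-level sketch of the "savepoint dance" under discussion - what a driver in autosave mode effectively does around a statement inside an open transaction (the table name is just an example):

    BEGIN;

    SAVEPOINT autosave;            -- set implicitly by the driver before the statement
    SELECT a, b FROM t;            -- server-prepared statement
    -- If this fails with "cached plan must not change result type", the driver issues
    --   ROLLBACK TO SAVEPOINT autosave;
    -- re-describes/re-prepares the statement and retries it, leaving the rest of the
    -- transaction intact; on success it simply continues (or releases the savepoint).

    COMMIT;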
Re: [HACKERS] Slowness of extended protocol
On 11 August 2016 at 10:18, Shay Rojansky wrote:
> On Thu, Aug 11, 2016 at 1:22 PM, Vladimir Sitnikov <sitnikov.vladi...@gmail.com> wrote:
>> 2) The driver can use savepoints and autorollback to the good "right
>> before failure" state in case of a known failure. Here's the
>> implementation: https://github.com/pgjdbc/pgjdbc/pull/477
>> As far as I can remember, performance overheads are close to zero (no
>> extra roundtrips to create a savepoint)
>
> What? Do you mean you do implicit savepoints and autorollback too? How
> does the driver decide when to do a savepoint? Is it on every single
> command? If not, can commands get lost when an error is raised and you
> automatically roll back? If you do a savepoint on every single command,
> that surely would impact performance even without extra roundtrips...?
>
> You seem to have a very "unique" idea of what a database driver should do
> under-the-hood for its users. At the very least I can say that your concept
> is very far from almost any database driver I've seen up to now (PostgreSQL
> JDBC, psycopg, Npgsql, libpq...). I'm not aware of other drivers that
> implicitly prepare statements, and definitely of no drivers that internally
> create savepoints and roll them back without explicit user APIs. At the very
> least you should be aware (and also clearly admit!) that you're doing
> something very different - not necessarily wrong - and not attempt to
> impose your ideas on everyone as if it's the only true way to write a db
> driver.

A number of other drivers default to this behaviour, including at least MS-SQL and Oracle. psqlODBC also supports this behaviour with its statement rollback mode. And obviously PostgreSQL JDBC, which Vladimir is referring to.

Dave Cramer
da...@postgresintl.com
www.postgresintl.com
Re: [HACKERS] handling unconvertible error messages
Victor>It is not a client job to convert encodings.

Of course. However, there is a vast number of old PG versions deployed in the wild that send wrong data to clients. This makes for a bad user experience, so it is worth doing two things:

1) Implement a proper solution in future PostgreSQL versions (e.g. include the encoding name right in the error message).

2) Implement a workaround in current drivers, so clients get proper error messages even when connecting to an unpatched server.

Vladimir
Re: [HACKERS] No longer possible to query catalogs for index capabilities?
Latest patch. Names and scopes are as per discussion. New files for code and regression test. Docs included. -- Andrew (irc:RhodiumToad) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 7830334..4552a74 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -16290,6 +16290,18 @@ SELECT pg_type_is_visible('myschema.widget'::regtype); +pg_indexam_has_property + + + +pg_index_column_has_property + + + +pg_index_has_property + + + pg_options_to_table @@ -16477,6 +16489,21 @@ SELECT pg_type_is_visible('myschema.widget'::regtype); number of columns, pretty-printing is implied + pg_indexam_has_property(am_oid, prop_name) + boolean + Test whether an index access method has a specified property + + + pg_index_column_has_property(index_oid, column_no, prop_name) + boolean + Test whether an index column has a specified property + + + pg_index_has_property(index_oid, prop_name) + boolean + Test whether the access method for the specified index has a specified property + + pg_options_to_table(reloptions) setof record get the set of storage option name/value pairs @@ -16620,6 +16647,141 @@ SELECT pg_type_is_visible('myschema.widget'::regtype); + pg_indexam_has_property, + pg_index_has_property, and + pg_index_column_has_property return whether the + specified access method, index, or index column possesses the named + property. NULL is returned if the property name is not + known; true if the property is present, + false if it is not. Refer to +for column properties, +for index properties, and +for access method properties. + + + + Index Column Properties + + + NameDescription + + + + asc + Does the column sort in ascending order on a forward scan? + + + + desc + Does the column sort in descending order on a forward scan? + + + + nulls_first + Does the column sort with nulls first on a forward scan? + + + + nulls_last + Does the column sort with nulls last on a forward scan? + + + + orderable + Does the column possess any ordering properties such + as ASC or DESC + + + distance_orderable + Can the column be returned in order by a "distance" operator, + for example ORDER BY col <-> constant + + + returnable + Can the column value be returned in an index-only scan? + + + + search_array + Does the column support array queries with ANY + natively in the index AM? + + + search_nulls + Does the column support IS NULL or + IS NOT NULL conditions in the index? + + + + + + + + Index Properties + + + NameDescription + + + + clusterable + Can this index be used in a CLUSTER operation? + + + + backward_scan + Can this index be scanned in the reverse direction? + + + + index_scan + Does this index support plain (non-bitmap) scans? + + + + bitmap_scan + Does this index support bitmap scans? + + + + + + + + Index Access Method Properties + + + NameDescription + + + + can_order + Does this access method support ASC, + DESC and related keywords on columns in + CREATE INDEX? + + + + can_unique + Does this access method support + CREATE UNIQUE INDEX? + + + + can_multi_col + Does this access method support multiple columns? + + + + can_exclude + Does this access method support exclusion constraints? 
+ + + + + + + pg_options_to_table returns the set of storage option name/value pairs (option_name/option_value) when passed diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c index e8034b9..fece954 100644 --- a/src/backend/access/gist/gist.c +++ b/src/backend/access/gist/gist.c @@ -16,11 +16,14 @@ #include "access/gist_private.h" #include "access/gistscan.h" +#include "access/htup_details.h" #include "catalog/pg_collation.h" +#include "catalog/pg_opclass.h" #include "miscadmin.h" #include "utils/index_selfuncs.h" #include "utils/memutils.h" #include "utils/rel.h" +#include "utils/syscache.h" /* non-export function prototypes */ @@ -87,6 +90,7 @@ gisthandler(PG_FUNCTION_ARGS) amroutine->amendscan = gistendscan; amroutine->ammarkpos = NULL; amroutine->amrestrpos = NULL; + amroutine->amproperty = gistproperty; PG_RETURN_POINTER(amroutine); } @@ -1490,6 +1494,98 @@ freeGISTstate(GISTSTATE *giststate) MemoryContextDelete(giststate->scanCxt); } + +/* + * gistproperty() -
Re: [HACKERS] Logical Replication WIP
On 08/05/2016 11:00 AM, Petr Jelinek wrote:
> Hi, as promised here is WIP version of logical replication patch.

Thanks for keeping on with this. This is important work.

> Feedback is welcome.

+ Publication
+
+A Publication object can be defined on any master node, owned by one
+user. A Publication is a set of changes generated from a group of
+tables, and might also be described as a Change Set or Replication Set.
+Each Publication exists in only one database.

'A publication object can be defined on *any master node*'. I found this confusing the first time I read it because I thought it was circular (what makes a node a 'master' node? Having a publication object published from it?). On reflection I realized that you mean 'any *physical replication master*'. I think this might be better worded as 'A publication object can be defined on any node other than a standby node'. I think referring to 'master' in the context of logical replication might confuse people. I am raising this in the context of the larger terminology that we want to use and the potential confusion with the terminology we use for physical replication. I like the publication / subscription terminology you've gone with.

+Publications are different from table schema and do not affect
+how the table is accessed. Each table can be added to multiple
+Publications if needed. Publications may include both tables
+and materialized views. Objects must be added explicitly, except
+when a Publication is created for "ALL TABLES". There is no
+default name for a Publication which specifies all tables.
+
+The Publication is different from table schema, it does not affect
+how the table is accessed and each table can be added to multiple

Those two paragraphs seem to start the same way. I get the feeling that there is some point you're trying to express that I'm not catching onto. Of course a publication is different from a table's schema, or different from a function. The definition of publication you have on the CREATE PUBLICATION page seems better and should be repeated here (A publication is essentially a group of tables intended for managing logical replication. See Section 30.1 for details about how publications fit into logical replication setup.)

+Conflicts happen when the replicated changes is breaking any
+specified constraints (with the exception of foreign keys which are
+not checked). Currently conflicts are not resolved automatically and
+cause replication to be stopped with an error until the conflict is
+manually resolved.

What options are there for manually resolving conflicts? Is the only option to change the data on the subscriber to avoid the conflict? I assume there isn't a way to flag a particular row coming from the publisher and say "ignore it". I don't think this is something we need to support for the first version.

+ Architecture
+
+Logical replication starts by copying a snapshot of the data on
+the Provider database. Once that is done, the changes on Provider

I notice the use of 'Provider' above - do you intend to update that to 'Publisher', or does provider mean something different? If we like the 'publication' terminology then I think 'publishers' should publish them, not providers.
I'm trying to test a basic subscription. I did the following:

cluster 1:
create database test1;
create table a(id serial8 primary key, b text);
create publication testpub1;
alter publication testpub1 add table a;
insert into a(b) values ('1');

cluster 2:
create database test1;
create table a(id serial8 primary key, b text);
create subscription testsub2 publication testpub1 connection 'host=localhost port=5440 dbname=test1';
NOTICE: created replication slot "testsub2" on provider
NOTICE: synchronized table states
CREATE SUBSCRIPTION

This resulted in:

LOG: logical decoding found consistent point at 0/15625E0
DETAIL: There are no running transactions.
LOG: exported logical decoding snapshot: "0494-1" with 0 transaction IDs
LOG: logical replication apply for subscription testsub2 started
LOG: starting logical decoding for slot "testsub2"
DETAIL: streaming transactions committing after 0/1562618, reading WAL from 0/15625E0
LOG: logical decoding found consistent point at 0/15625E0
DETAIL: There are no running transactions.
LOG: logical replication sync for subscription testsub2, table a started
LOG: logical decoding found consistent point at 0/1562640
DETAIL: There are no running transactions.
LOG: exported logical decoding snapshot: "0495-1" with 0 transaction IDs
LOG: logical replication synchronization worker finished processing

The initial sync completed okay, then I did

insert into a(b) values ('2');

but the second insert never replicated. I had the following output:

LOG: terminating walsender process due to replication timeout

On cluster 1 I do select
Re: [HACKERS] max_parallel_degree > 0 for 9.6 beta
Michael Paquier wrote:
> Being cautious pays more in the long term, so seeing the number of
> bugs that showed up I'd rather vote for having it disabled by default
> in 9.6 stable, and enabled on master to aim at enabling it in 10.0.

I too prefer to keep it turned off in 9.6 and consider enabling it by default on a future release (10 is probably good). Interested users can carefully test the feature without endangering other unsuspecting users.

I agree with the idea of keeping it enabled in master, so that it'll get a modicum of testing there by hackers, too.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] handling unconvertible error messages
Victor Wagner writes: > I think it is better to avoid such a problem and fix system so server > would never send a message in the encoding, different from client one. Don't hold your breath waiting for that to happen. Quite aside from the question of whether we want to trust GUC settings from the startup packet before we've authenticated the user, there's a small problem that the server *can't* translate *any* encoding until it's successfully connected to a database and is able to read the pg_conversion catalog. We might be able to institute some rule like "examine the startup packet and see if it specifies a client_encoding different from what we inherited from the postmaster. If not, continue with current behavior (send messages localized per postmaster's settings). If so, fall back to English messages/C locale until startup is complete." This would preserve current functionality in cases where it actually, er, functions, while giving something at least passable in the cases that are broken today. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
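For reference, the settings involved once a session is fully up (the proposal above is about what to do before they can take effect):

    SHOW client_encoding;    -- what the client asked for in the startup packet
    SHOW lc_messages;        -- message language inherited from the postmaster
    SET lc_messages = 'C';   -- the "English messages" fallback described above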
Re: [HACKERS] handling unconvertible error messages
Tom> while giving something at least passable in the cases that are broken today.

Would you mind adding an explicit "encoding" field to the error message? At least it would give a clear explanation of how to parse that message without resorting to a guessing dance. The biggest problem is that the client has no idea how to parse backend error messages.

If implementing client_encoding properly is too hard at this point in time, then I would rather have the "encoding" field right in the startup error message. That "encoding" field would enable sending properly localized messages in the future if a "pre-connect client_encoding" were implemented somehow.

Vladimir
Re: [HACKERS] handling unconvertible error messages
On Sat, 13 Aug 2016 12:02:30 -0400 Tom Lane wrote:
> Victor Wagner writes:
> > I think it is better to avoid such a problem and fix system so
> > server would never send a message in the encoding, different from
> > client one.
>
> Don't hold your breath waiting for that to happen.
>
> Quite aside from the question of whether we want to trust GUC settings
> from the startup packet before we've authenticated the user, there's a

What's wrong with trusting this particular setting? I cannot think of any meaningful exploit. Really, there are lots of http servers out there which typically accept connections from anywhere (which is seldom the case for postgresql servers) and trust the Accept-Charset and Accept-Language client headers without any authentication. There can be attacks that exploit errors in the message parsing, but the startup message is parsed before authentication anyway.

> small problem that the server *can't* translate *any* encoding until
> it's successfully connected to a database and is able to read the
> pg_conversion catalog.
>
> We might be able to institute some rule like "examine the startup
> packet and see if it specifies a client_encoding different from what
> we inherited from the postmaster. If not, continue with current
> behavior (send messages localized per postmaster's settings). If so,
> fall back to English messages/C locale until startup is complete."
> This would preserve current functionality in cases where it actually,
> er, functions, while giving something at least passable in the cases
> that are broken today.

I think that we can do a bit more than this. We use the GNU gettext library to provide message translation. That library is able to perform a limited set of encoding conversions itself. So we can have a two-stage fallback here:

1. If the encoding is different from, but compatible with, the language inherited from the postmaster, ask gettext via the bind_textdomain_encoding function to provide messages in this encoding.

2. If that is not possible, fall back to English messages, which are compatible with any supported encoding. The same goes for a session which doesn't specify an encoding at all (i.e. CancelMessage).

> regards, tom lane

-- 
Victor Wagner

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] No longer possible to query catalogs for index capabilities?
Andrew Gierth writes: > Latest patch. > Names and scopes are as per discussion. New files for code and > regression test. Docs included. I'm starting to review this now. If anyone has objections to the property names/behaviors shown in Andrew's previous message, speak up now ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Slowness of extended protocol
Vladimir wrote:

> Shay>I don't know much about the Java world, but both pgbouncer and pgpool
> (the major pools?)
>
> In the Java world, https://github.com/brettwooldridge/HikariCP is a very good
> connection pool. Neither pgbouncer nor pgpool is required.
> The architecture is: application <=> HikariCP <=> pgjdbc <=> PostgreSQL
>
> The idea is that HikariCP pools pgjdbc connections, and server-prepared
> statements are per-pgjdbc connection, so there is no reason to "discard
> all" or "deallocate all" or whatever.
>
> Shay>send DISCARD ALL by default. That is a fact, and it has nothing to do
> with any bugs or issues pgbouncer may have.
>
> That is configurable. If pgbouncer/pgpool defaults to the "wrong
> configuration", why should we (driver developers, backend developers) try
> to accommodate that?

In a nutshell, that sentence represents much of the problem I have with this conversation: you consider that anything that's different from your approach is simply "wrong".

> Shay>If you do a savepoint on every single command, that surely would impact
> performance even without extra roundtrips...?
>
> My voltmeter tells me that the overhead is just 3us for a simple "SELECT"
> statement (see https://github.com/pgjdbc/pgjdbc/pull/477#issuecomment-239579547 ).
> I think it is acceptable to have it enabled by default; however, it would
> be nice if the database did not require a savepoint dance to heal its "not
> implemented" "cached plan must not change result type" error.

That's interesting (I don't mean that ironically). Aside from the simple client-observed speed which you're measuring, are you sure there aren't "hidden" costs on the PostgreSQL side for generating so many implicit savepoints? I don't know anything about PostgreSQL transaction/savepoint internals, but adding savepoints to each and every statement in a transaction seems like it would have some impact. It would be great to have the confirmation of someone who really knows the internals.

> Shay>I'm not aware of other drivers that implicitly prepare statements,
> Shay>and definitely of no drivers that internally create savepoints and
> Shay>roll them back without explicit user APIs
>
> Glad to hear that.
> Are you aware of other drivers that translate "insert into table(a,b,c)
> values(?,?,?)" =into=> an "insert into table(a,b,c) values(?,?,?),
> (?,?,?),...,(?,?,?)" statement on the fly?
>
> That gives a 2-3x performance boost (and that includes client result processing
> as well!) for batched inserts, since "bind/exec" message processing is not
> that fast. That is why I say that investing in "bind/exec performance"
> makes more sense than investing in "improved execution of non-prepared
> statements".

I'm not aware of any, but I also don't consider that part of the driver's job. There's nothing your driver is doing that the application developer can't do themselves - so your driver isn't faster than other drivers. It's faster only when used by lazy programmers.

What you're doing is optimizing developer code, with the assumption that developers can't be trusted to code efficiently - they're going to write bad SQL and forget to prepare their statements. That's your approach, and that's fine. The other approach, which is also fine, is that each software component has its own role, and that clearly-defined boundaries of responsibility should exist - that writing SQL is the developer's job, and that delivering it to the database is the driver's job.
To me this separation of layers/roles is a key part of software engineering: it results in simpler, leaner components which interact in predictable ways, and when there's a problem it's easier to know exactly where it originated. To be honest, the mere idea of having an SQL parser inside my driver makes me shiver. The HikariCP page you sent (thanks, I didn't know it) contains a great Dijkstra quote that summarizes what I think about this - "Simplicity is prerequisite for reliability." IMHO you're going the opposite way by implicitly rewriting SQL, preparing statements and creating savepoints. But at the end of the day it's two different philosophies. All I ask is that you respect the one that isn't yours, which means accepting the optimization of non-prepared statements. There's no mutual exclusion.

> 2) I don't say "it is the only true way". I would change my mind if
> someone suggested a better solution. Everybody makes mistakes, and I have
> no problem with admitting that "I made a mistake" and moving forward.
> They say: "Don't cling to a mistake just because you spent a lot of time
> making it."

:) So your way isn't the only true way, but others are just clinging to their mistakes... :)

> However, no-one has suggested a case where an application issues lots of unique
> SQL statements.

We did, you just dismissed or ignored them.

> 3) For "performance related" issues, a business case and a voltmeter are
> required to prove there's an issue.
>
> Shay>But the second query, which should be invalidated
[HACKERS] Undiagnosed bug in Bloom index
I am getting corrupted Bloom indexes in which a tuple in the table heap is not in the index.

I see it as early as commit a9284849b48b, with commit e13ac5586c49c cherry-picked onto it. I don't see it before a9284849b48b because the test case seg faults before anything interesting can happen. I think this is an ab initio bug, either in the bloom contrib or in the core index am. I see it as recently as 371b572, which is as new as I have tested.

The problem is that an update which should properly update exactly one row instead updates zero rows. It takes 5 to 16 hours to reproduce when run as 8 clients on 8 cores. I suspect it is some kind of race condition, and testing with more clients on more cores would make it happen faster. If you inject crash/recovery cycles into the system, it seems to happen sooner. But crash/recovery cycles are not necessary.

If you use the attached do_nocrash.sh script, the error will generate a message like:

child abnormal exit update did not update 1 row: key 6838 updated 0E0 at count.pl line 189.\n at count.pl line 197.

(And I've added code so that once this is detected, the script will soon terminate.)

If you want to run do_nocrash.sh, change the first few lines to hardcode the correct path for the binaries and the temp data directory (which will be ruthlessly deleted). It will run on an unpatched server, since crash injection is turned off. If you want to make it fork more clients, change the 8 in 'perl count.pl 8 0|| on_error;'

I have preserved a large tarball (215M) of a corrupt data directory. It was run with the a928484 compilation with e13ac5586 cherry-picked, and is at https://drive.google.com/open?id=0Bzqrh1SO9FcEci1FQTkwZW9ZU1U. Or, if you can tell me how to look for myself (pageinspect doesn't offer much for Bloom). With that tarball, the first query, which uses the index, returns nothing, while the second, which forces a seq scan, returns a row:

select * from foo where bloom = md5('6838');
select * from foo where bloom||'' = md5('6838');

The machinery posted here is probably much more elaborate than necessary to detect the problem. You could probably detect it with pgbench -N, except that that doesn't check the results to make sure the expected number of rows were actually selected/updated.

Cheers, Jeff

count.pl Description: Binary data
do_nocrash.sh Description: Bourne shell script

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Slowness of extended protocol
Shay>To be honest, the mere idea of having an SQL parser inside my driver makes me shiver.

Same for me. However, I cannot wait for the PostgreSQL 18 that does not need client-side parsing.

Shay>We did, you just dismissed or ignored them

Please prove me wrong, but I did provide a justified answer both to yours: https://www.postgresql.org/message-id/CAB%3DJe-FHSwrbJiTcTDeT4J3y_%2BWvN1d%2BS%2B26aesr85swocb7EA%40mail.gmail.com (starting with "Why the connection is reset") and to Robert's examples: https://www.postgresql.org/message-id/CAB%3DJe-GSAs_340dqdrJoTtP6KO6xxN067CtB6Y0ea5c8LRHC9Q%40mail.gmail.com

Shay>There's nothing your driver is doing that the application developer can't do themselves -
Shay>so your driver isn't faster than other drivers. It's faster only when used by lazy programmers.

I'm afraid you do not get the point. ORMs like Hibernate, EclipseLink, etc. send regular "insert ... values" statements via the batch API. For the developer, the only way to make use of "multivalues" is to implement either an "ORM fix", or "the driver fix", or a "postgresql fix". So the feature has very little to do with laziness of the programmers. The application developer just does not have full control of each SQL statement when working through an ORM.

Do you suggest "stop using ORMs"? Do you suggest fixing all the ORMs so each uses the optimal insert statement for each DB? Do you suggest fixing postgresql?

Once again, the "multivalues rewrite at the pgjdbc level" enables the feature transparently for all the users. If PostgreSQL 10/11 improves bind/exec performance, we could even drop that rewrite at the pgjdbc level and revert to the regular flow. That would again be transparent to the application.

Shay>are you sure there aren't "hidden" costs on the PostgreSQL side for generating so many implicit savepoints?

Technically speaking, I use the same savepoint name through bind/exec messages.

Shay>What you're doing is optimizing developer code, with the assumption that developers can't be trusted to code efficiently - they're going to write bad SQL and forget to prepare their statements

Please, be careful. "you are completely wrong here" he-he. Well, you list the wrong assumption. Why do you think my main assumption is "developers can't be trusted"? The proper assumption is: I follow the Java database API specification, and I optimize pgjdbc for the common use case (e.g. ORM or ORM-like).

For instance, if a Java application wants to use bind values (e.g. to prevent security issues), then the only way is to go through java.sql.PreparedStatement. Here's the documentation: https://docs.oracle.com/javase/8/docs/api/java/sql/Connection.html#prepareStatement-java.lang.String-

Here's a quote:

Javadoc> *Note:* This method is optimized for handling parametric SQL statements that benefit from precompilation. If the driver supports precompilation, the method prepareStatement will send the statement to the database for precompilation. Some drivers may not support precompilation. In this case, the statement may not be sent to the database until the PreparedStatement object is executed. This has no direct effect on users; however, it does affect which methods throw certain SQLException objects.

The most important part is "if the driver supports precompilation...". There's no API to enable/disable precompilation at all. So, when using Java, there is no such thing as "statement.enableServerPrepare=true". It is expected that the "driver" will "optimize" the handling somehow in the best possible way.
It is the Java API specification that enables me (as a driver developer) to be flexible and leverage database features so the end user gets the best experience.

Vladimir
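To make the rewrite concrete in SQL terms (a sketch; table and column names are illustrative), the driver-level batch fold amounts to:

    -- What an ORM hands the driver, executed N times through the batch API:
    insert into t(a, b, c) values (?, ?, ?);

    -- What the driver can send instead, folding the whole batch into a single
    -- statement with one bind/exec:
    insert into t(a, b, c) values (?, ?, ?), (?, ?, ?), (?, ?, ?);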
Re: [HACKERS] Add hint for function named "is"
On 8/12/16 1:40 PM, Tom Lane wrote:
> What this is telling us is that given input like, say,
> SELECT 'foo'::character varying
> Bison is no longer sure whether "varying" is meant as a type name modifier
> or a ColLabel. And indeed there is *no* principled answer to that that
> doesn't involve giving up the ability for "varying" to be a ColLabel.
> Just promoting it to a fully reserved word (which it is not today) wouldn't
> be enough, because right now even fully reserved words can be ColLabels.

FWIW, I've always disliked how some types can contain spaces without being quoted. AFAIK nothing else in the system allows that, and I don't see why character varying and timestamp with* should get a special pass. I doubt we could get rid of this in CREATE TABLE, but I wonder how many people actually cast using the unquoted form.

-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532) mobile: 512-569-9461

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
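For reference, the unquoted multi-word spellings in question, next to the forms most code uses in practice (all of these are accepted today):

    SELECT 'foo'::character varying;           -- SQL-standard spelling, unquoted spaces
    SELECT 'foo'::varchar;                     -- the usual abbreviation
    SELECT CAST('foo' AS character varying);   -- standard CAST syntax

    SELECT now()::timestamp with time zone;    -- another multi-word type name
    SELECT now()::timestamptz;                 -- its common abbreviation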
Re: [HACKERS] Add hint for function named "is"
Jim Nasby writes: > FWIW, I've always disliked how some types could contains spaces without > being quoted. AFAIK nothing else in the system allows that, and I don't > see why character varying and timestamp with* should get a special pass. Because the SQL standard says so. If we were going to start ignoring SQL-standard spellings, this area would not be very high on my list, actually. Their insistence on inventing crazy new special syntaxes for things that could be normal function calls is what bugs me ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] condition variables
On Fri, Aug 12, 2016 at 9:47 AM, Robert Haas wrote: > [condition-variable-v1.patch] Don't you need to set proc->cvSleeping = false in ConditionVariableSignal? -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] No longer possible to query catalogs for index capabilities?
Andrew Gierth writes: > Latest patch. > Names and scopes are as per discussion. New files for code and > regression test. Docs included. Pushed with (mostly) cosmetic adjustments. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
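For anyone wanting to try the new functions, a quick sketch based on the signatures documented in Andrew's patch upthread (the index name is just an example; unknown property names return NULL rather than raising an error):

    -- access method capabilities
    SELECT pg_indexam_has_property(a.oid, 'can_order')
      FROM pg_am a WHERE a.amname = 'btree';

    -- whole-index properties
    SELECT pg_index_has_property('pgbench_accounts_pkey'::regclass, 'clusterable');

    -- per-column properties (columns are numbered from 1)
    SELECT pg_index_column_has_property('pgbench_accounts_pkey'::regclass, 1, 'asc');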
Re: [HACKERS] Slowness of extended protocol
> Shay>I don't know much about the Java world, but both pgbouncer and pgpool
> (the major pools?)
>
> In the Java world, https://github.com/brettwooldridge/HikariCP is a very good
> connection pool. Neither pgbouncer nor pgpool is required.
> The architecture is: application <=> HikariCP <=> pgjdbc <=> PostgreSQL
>
> The idea is that HikariCP pools pgjdbc connections, and server-prepared
> statements are per-pgjdbc connection, so there is no reason to "discard
> all" or "deallocate all" or whatever.

Interesting. What would happen if a user changes some GUC parameters? Does a subsequent session accidentally inherit the changed GUC parameters?

> Shay>send DISCARD ALL by default. That is a fact, and it has nothing to do
> with any bugs or issues pgbouncer may have.

That is correct for pgpool as well.

> That is configurable. If pgbouncer/pgpool defaults to the "wrong
> configuration", why should we (driver developers, backend developers) try
> to accommodate that?

There's nothing wrong with DISCARD ALL. It's just not suitable for your program (and HikariCP?). I don't know about pgbouncer, but pgpool was created as a general-purpose connection pooler, which means it must behave as similarly as possible to the PostgreSQL backend from the frontend's point of view. "DISCARD ALL" is needed to simulate the behavior of the backend: it discards all properties set by a frontend, including prepared statements, when a session terminates. "DISCARD ALL" is perfect for this goal.

Best regards,
-- 
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
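For reference, DISCARD ALL is documented as being (approximately) equivalent to running the following, which is why it matches what the backend itself forgets when a session ends:

    SET SESSION AUTHORIZATION DEFAULT;
    RESET ALL;
    DEALLOCATE ALL;        -- this is what drops driver-side server-prepared statements
    CLOSE ALL;
    UNLISTEN *;
    SELECT pg_advisory_unlock_all();
    DISCARD PLANS;
    DISCARD SEQUENCES;
    DISCARD TEMP;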
[HACKERS] Reminder: upcoming 9.6 branch split and rc1 release
Just to make sure that everyone's on the same page: the schedule the release team has agreed to is that we'll branch off REL9_6_STABLE on Aug 15 (Monday!) and issue 9.6rc1 the week of Aug 29. Barring scary bugs emerging, this should keep us on track for 9.6.0 release in late September. I will take care of making the branch sometime Monday afternoon. I plan to do a pgindent run right before the branch, as per regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] to_date_valid()
On 27.07.2016 05:00, Joshua D. Drake wrote: On 07/26/2016 06:25 PM, Peter Eisentraut wrote: On 7/5/16 4:24 AM, Albe Laurenz wrote: But notwithstanding your feeling that you would like your application to break if it makes use of this behaviour, it is a change that might make some people pretty unhappy - nobody can tell how many. What is the use of the existing behavior? You get back an arbitrary implementation dependent value. We don't even guarantee what the value will be. If we changed it to return a different implementation dependent value, would users get upset? No they would not get upset because they wouldn't know. Can we just do the right thing? Attached is a patch to "do the right thing". The verification is in "to_date()" now, the extra function is removed. Regression tests are updated - two or three of them returned a wrong date before, and still passed. They fail now. Documentation is also updated. Regards, -- Andreas 'ads' Scherbaum German PostgreSQL User Group European PostgreSQL User Group - Board of Directors Volunteer Regional Contact, Germany - PostgreSQL Project to_date_valid.patch.gz Description: application/gzip -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
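A concrete example of the behavior the patch changes (a sketch; exact output can vary by version):

    -- Today: out-of-range input is silently adjusted rather than rejected
    SELECT to_date('2016-02-30', 'YYYY-MM-DD');   -- returns 2016-03-01, no error

    -- A plain cast already validates the value
    SELECT '2016-02-30'::date;                    -- ERROR: date/time field value out of range

    -- With the patch, to_date() raises an error for such input as well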
[HACKERS] WIP: Barriers
Hi hackers, I would like to propose "barriers" for Postgres processes. A barrier is a very simple mechanism for coordinating parallel computation, as found in many threading libraries. First, you initialise a Barrier object somewhere in shared memory, most likely in the DSM segment used by parallel query, by calling BarrierInit(&barrier, nworkers). Then workers can call BarrierWait(&barrier) when they want to block until all workers arrive at the barrier. When the final worker arrives, BarrierWait returns in all workers, releasing them to continue their work. One arbitrary worker receives a different return value as a way of "electing" it to perform serial phases of computation. For parallel phases of computation, the return value can be ignored. For example, there may be preparation, merging, or post-processing phases which must be done by just one worker, interspersed with phases where all workers do something. My use case for this is coordinating the phases of parallel hash joins, but I strongly suspect there are other cases. Parallel sort springs to mind, which is why I wanted to post this separately and earlier than my larger patch series, to get feedback from people working on other parallel features. A problem that I'm still grappling with is how to deal with workers that fail to launch. What I'm proposing so far is based on static worker sets, where you can only give the number of workers at initialisation time, just like pthread_barrier_init. Some other libraries allow for adjustable worker sets, and I'm wondering if a parallel leader might need to be able to adjust the barrier when it hears of a worker not starting. More on that soon. Please see the attached WIP patch. I had an earlier version with its own waitlists and signalling machinery etc, but I've now rebased it to depend on Robert Haas's proposed condition variables, making this code much shorter and sweeter. So it depends on his condition-variable-vX.patch[1], which in turn depends on my lwlocks-in-dsm-vX.patch[2] (for proclist). When Michaël Paquier's work on naming wait points[3] lands, I plan to include event IDs as an extra argument to BarrierWait which will be passed though so as to show up in pg_stat_activity. Then you'll be able to see where workers are waiting for each other! For now I didn't want to tangle this up with yet another patch. I thought about using a different name to avoid colliding with barrier.h and overloading the term: there are of course also compiler barriers and memory barriers. But then I realised that that header was basically vacant real estate, and 'barrier' is the super-well established standard term for this parallel computing primitive. I'd be grateful for any thoughts, feedback, flames etc. [1] https://www.postgresql.org/message-id/flat/CA%2BTgmoaj2aPti0yho7FeEf2qt-JgQPRWb0gci_o1Hfr%3DC56Xng%40mail.gmail.com [2] https://www.postgresql.org/message-id/flat/CAEepm%3D0Vvr9zgwHt67RwuTfwMEby1GiGptBk3xFPDbbgEtZgMg%40mail.gmail.com [3] https://www.postgresql.org/message-id/flat/cab7npqtghfouhag1ejrvskn8-e5fpqvhm7al0tafsdzjqg_...@mail.gmail.com -- Thomas Munro http://www.enterprisedb.com barrier-v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Slowness of extended protocol
Tatsuo>Interesting. What would happen if a user changes some of GUC parameters? Subsequent session accidentally inherits the changed GUC parameter?

Yes, that is what happens. The idea is not to mess with gucs.

Tatsuo>There's nothing wrong with DISCARD ALL
Tatsuo>"DISCARD ALL" is perfect for this goal.

It looks like you mean: "let's reset the connection state just in case". I see where it might help: e.g. when providing database access to random people who might screw up a connection in every possible way.

Just in case: do you think one should reset CPU caches, OS filesystem caches, and DNS caches, bounce the application, and bounce the OS in addition to "discard all"? Why reset only "connection properties" when we can reset everything to its pristine state? Just in case: PostgreSQL does not execute "discard all" on its own.

If you mean "pgpool is exactly like reconnecting to postgresql, but faster since the connection is already established", then OK, that might work in some cases (e.g. when response times/throughput are not important), but why force a "you must always start from scratch" execution model?

Developers are not that dumb, and they can understand that "changing gucs at random is bad". When a connection pool is dedicated to a particular application (or a set of applications), then it makes sense to reuse the statement cache for performance reasons. Of course, it requires some discipline like "don't mess with gucs", however that is acceptable and easily understandable by developers.

My application cannot afford re-parsing hot SQL statements, as it hurts performance. That is very visible in the end-to-end performance tests, so "discard all" is not used, and developers know not to mess with gucs.

Vladimir
Re: [HACKERS] Undiagnosed bug in Bloom index
Jeff Janes writes:
> I am getting corrupted Bloom indexes in which a tuple in the table
> heap is not in the index.

Hmm. I can trivially reproduce a problem, but I'm not entirely sure whether it matches yours. Same basic test case as the bloom regression test:

regression=# CREATE TABLE tst ( i int4, t text );
CREATE TABLE
regression=# CREATE INDEX bloomidx ON tst USING bloom (i, t) WITH (col1 = 3);
CREATE INDEX
regression=# INSERT INTO tst SELECT i%10, substr(md5(i::text), 1, 1) FROM generate_series(1,2000) i;
INSERT 0 2000
regression=# vacuum verbose tst;
...
INFO: index "bloomidx" now contains 2000 row versions in 5 pages
...
regression=# delete from tst;
DELETE 2000
regression=# vacuum verbose tst;
...
INFO: index "bloomidx" now contains 0 row versions in 5 pages
DETAIL: 2000 index row versions were removed.
...
regression=# INSERT INTO tst SELECT i%10, substr(md5(i::text), 1, 1) FROM generate_series(1,2000) i;
INSERT 0 2000
regression=# vacuum verbose tst;
...
INFO: index "bloomidx" now contains 1490 row versions in 5 pages
...

Ooops. (Note: this is done with some fixes already in place to make blvacuum.c return correct tuple counts for VACUUM VERBOSE; right now it tends to double-count during a VACUUM.)

The problem seems to be that (1) blvacuum marks all the index pages as BLOOM_DELETED, but doesn't bother to clear the notFullPage array on the metapage; (2) blinsert uses a page from notFullPage, failing to notice that it's inserting data into a BLOOM_DELETED page; (3) once we ask the FSM for a page, it returns a BLOOM_DELETED page that we've already put tuples into, which we happily blow away by reinit'ing the page.

A race-condition variant of this could be that after an autovacuum has marked a page BLOOM_DELETED, but before it's reached the point of updating the metapage, blinsert could stick data into the deleted page. That would make it possible to reach the problem without requiring the extreme edge case that blvacuum finds no partly-full pages to put into the metapage. If this does explain your problem, it's probably that variant.

Will push a fix in a bit.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Slowness of extended protocol
> Tatsuo>Interesting. What would happen if a user changes some of GUC
> parameters? Subsequent session accidentally inherits the changed GUC
> parameter?
>
> Yes, that is what happens.

Ouch.

> The idea is not to mess with gucs.
>
> Tatsuo>There's nothing wrong with DISCARD ALL
> Tatsuo>"DISCARD ALL" is perfect for this goal.
>
> It looks like you mean: "let's reset the connection state just in case".
> I see where it might help: e.g. when providing database access to random
> people who might screw up a connection in every possible way.

Yes, that's what clients of pgpool are expecting. Clients do not want to change their applications which were working with PostgreSQL without pgpool.

> Just in case: do you think one should reset CPU caches, OS filesystem
> caches, and DNS caches, bounce the application, and bounce the OS in addition
> to "discard all"?

Apparently not, because that is different from what PostgreSQL does when a backend exits.

> Why reset only "connection properties" when we can reset everything to its
> pristine state?

You can propose that kind of variant of the DISCARD command. PostgreSQL is an open source project.

> Just in case: PostgreSQL does not execute "discard all" on its own.
> If you mean "pgpool is exactly like reconnecting to postgresql, but faster
> since the connection is already established", then OK, that might work in some
> cases (e.g. when response times/throughput are not important), but why
> force a "you must always start from scratch" execution model?

I'm not doing that. I do not ask all the people to use pgpool. People can choose whatever tools they think suitable for their purpose.

> Developers are not that dumb, and they can understand that "changing gucs
> at random is bad".
>
> When a connection pool is dedicated to a particular application (or a set
> of applications), then it makes sense to reuse the statement cache for
> performance reasons.
> Of course, it requires some discipline like "don't mess with gucs", however
> that is acceptable and easily understandable by developers.

I'm not against such a developer's way. If you like it, you can do it. Nobody is stopping you. I just want to say that not all clients want it that way.

Best regards,
-- 
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Surprising behaviour of \set AUTOCOMMIT ON
On Sat, Aug 13, 2016 at 1:05 AM, David G. Johnston wrote:
> I'll admit I haven't tried to find fault with the idea (or discover better
> alternatives) nor how it would look in implementation. As a user, though,
> it would make sense if the system behaved in this way. That only AUTOCOMMIT
> needs this capability at the moment doesn't bother me. I'm also fine with
> making it an error and moving on - but if you want to accommodate both
> possibilities this seems like a cleaner solution than yet another
> environment variable that a user would need to consider.

I agree with your broad point that we should consider user convenience, but in this case I am not sure it will be convenient for a user to make the behaviour dependent on the value of ON_ERROR_STOP. I have checked this behaviour in one of the top commercial databases, and it just commits the previous transaction after the user turns Autocommit on and the first command completes. I am not saying that we should blindly follow that behaviour, but just want to indicate that it should be okay for users if we don't try to define multiple behaviours here based on the value of ON_ERROR_STOP.

-- 
With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
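The scenario under discussion, sketched as a psql session (the table name is illustrative):

    \set AUTOCOMMIT off
    INSERT INTO t VALUES (1);   -- psql implicitly opens a transaction here
    \set AUTOCOMMIT on
    INSERT INTO t VALUES (2);   -- still runs inside the transaction opened above
    -- the question is whether switching AUTOCOMMIT to on should commit,
    -- raise an error, or keep silently leaving the transaction open as today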
Re: [HACKERS] WIP: Barriers
Hi Thomas,

Barriers are really a very simple and convenient mechanism for process synchronization. But they are actually a special case of semaphores: given a semaphore primitive, it is trivial to implement a barrier. We have semaphores in Postgres, but... they cannot be used by extensions: there is a fixed number of semaphores, allocated based on the maximal number of connections, and there is no mechanism for requesting additional semaphores.

Robert has recently proposed condition variables, which are also very useful. Right now we have spinlocks, LW-locks and latches. From my point of view that is not enough. While creating various extensions for Postgres I have always felt the lack of such synchronization primitives as events (condition variables) and semaphores. Events and semaphores are similar, and it is possible to implement either of them based on the other. But from the user's point of view they have different semantics and use cases, so it is better to provide both of them.

I wonder if we should provide a system-abstraction layer (SAL) for Postgres, where we can isolate all system-dependent code and which will be available for core developers as well as for developers of extensions? The obvious candidates for SAL are:

1. Synchronization primitives (locks, events, semaphores, mutexes, spinlocks, latches, barriers)
2. Shared memory
3. File access
4. Network sockets
5. Process control (fork, ...)

Certainly it requires a lot of refactoring, but it will make the Postgres code much more elegant and easier to read and maintain. Also, it is not necessary to do all these changes in one step: here we do not need atomic transactions:) We can start, for example, with synchronization primitives, as in any case a lot of changes are proposed there.

Parallel execution is one of the most promising approaches to improving Postgres performance. I do not mean just parallel execution of a single query. Parallel vacuum, parallel index creation, parallel sort, ... And to implement all this stuff we definitely need convenient and efficient synchronization primitives. The set of such primitives can be discussed. IMHO it should include RW-locks (the current LW-locks), mutexes (spinlock + some mechanism to wait), events (condition variables), semaphores and barriers (based on semaphores). Latches can be left for backward compatibility or be replaced with events. I wonder if somebody has measured how many times slower latches (signal+socket) are than posix semaphores or condition variables?

On Aug 14, 2016, at 2:18 AM, Thomas Munro wrote:
> Hi hackers,
>
> I would like to propose "barriers" for Postgres processes. A barrier
> is a very simple mechanism for coordinating parallel computation, as
> found in many threading libraries.
>
> First, you initialise a Barrier object somewhere in shared memory,
> most likely in the DSM segment used by parallel query, by calling
> BarrierInit(&barrier, nworkers). Then workers can call
> BarrierWait(&barrier) when they want to block until all workers arrive
> at the barrier. When the final worker arrives, BarrierWait returns in
> all workers, releasing them to continue their work. One arbitrary
> worker receives a different return value as a way of "electing" it to
> perform serial phases of computation. For parallel phases of
> computation, the return value can be ignored. For example, there may
> be preparation, merging, or post-processing phases which must be done
> by just one worker, interspersed with phases where all workers do
> something.
> > My use case for this is coordinating the phases of parallel hash > joins, but I strongly suspect there are other cases. Parallel sort > springs to mind, which is why I wanted to post this separately and > earlier than my larger patch series, to get feedback from people > working on other parallel features. > > A problem that I'm still grappling with is how to deal with workers > that fail to launch. What I'm proposing so far is based on static > worker sets, where you can only give the number of workers at > initialisation time, just like pthread_barrier_init. Some other > libraries allow for adjustable worker sets, and I'm wondering if a > parallel leader might need to be able to adjust the barrier when it > hears of a worker not starting. More on that soon. > > Please see the attached WIP patch. I had an earlier version with its > own waitlists and signalling machinery etc, but I've now rebased it to > depend on Robert Haas's proposed condition variables, making this code > much shorter and sweeter. So it depends on his > condition-variable-vX.patch[1], which in turn depends on my > lwlocks-in-dsm-vX.patch[2] (for proclist). > > When Michaël Paquier's work on naming wait points[3] lands, I plan to > include event IDs as an extra argument to BarrierWait which will be > passed though so as to show up in pg_stat_activity. Then you'll be > able to see where workers are waiting for each other! For now I > didn't want to tangle this up