Re: [HACKERS] proposal: schema variables
2017-11-01 6:07 GMT+01:00 Serge Rielau: > "Although the syntax of CREATE TEMPORARY TABLE resembles that of the SQL > standard, the effect is not the same. In the standard, temporary tables are > defined just once and automatically exist (starting with empty contents) in > every session that needs them. PostgreSQL instead requires each session > to issue its own CREATE TEMPORARY TABLE command for each temporary table > to be used. This allows different sessions to use the same temporary table > name for different purposes, whereas the standard's approach constrains all > instances of a given temporary table name to have the same table structure.” > Yeah, that’s a DECLAREd table in my book. No wonder we didn’t link up. This is the known discussion about local / global temp tables in PostgreSQL, and the ToDo item: implementation of global temp tables in Postgres. This temporary behavior is a marginal part of the proposal - so I can remove it from the proposal - and later open a discussion about CREATE TEMPORARY VARIABLE versus DECLARE VARIABLE. Regards Pavel
Re: [HACKERS] proposal: schema variables
" Although the syntax of CREATE TEMPORARY TABLE resembles that of the SQL standard, the effect is not the same. In the standard, temporary tables are defined just once and automatically exist (starting with empty contents) in every session that needs them. PostgreSQL instead requires each session to issue its own CREATE TEMPORARY TABLE command for each temporary table to be used. This allows different sessions to use the same temporary table name for different purposes, whereas the standard's approach constrains all instances of a given temporary table name to have the same table structure.” Yeah, that’s a DECLAREd table in my book. No wonder we didn’t link up. Cheers Serge
Re: [HACKERS] REINDEX CONCURRENTLY 2.0
Here is a rebased version of the patch. Andreas diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml index a0ca2851e5..f8c59ea127 100644 --- a/doc/src/sgml/mvcc.sgml +++ b/doc/src/sgml/mvcc.sgml @@ -926,6 +926,7 @@ ERROR: could not serialize access due to read/write dependencies among transact Acquired by VACUUM (without FULL), ANALYZE, CREATE INDEX CONCURRENTLY, + REINDEX CONCURRENTLY, CREATE STATISTICS and ALTER TABLE VALIDATE and other ALTER TABLE variants (for full details see diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml index 2e053c4c24..4019bad4c2 100644 --- a/doc/src/sgml/ref/reindex.sgml +++ b/doc/src/sgml/ref/reindex.sgml @@ -21,7 +21,7 @@ PostgreSQL documentation -REINDEX [ ( VERBOSE ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } name +REINDEX [ ( VERBOSE ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } [ CONCURRENTLY ] name @@ -67,10 +67,7 @@ REINDEX [ ( VERBOSE ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } An index build with the CONCURRENTLY option failed, leaving an invalid index. Such indexes are useless but it can be - convenient to use REINDEX to rebuild them. Note that - REINDEX will not perform a concurrent build. To build the - index without interfering with production you should drop the index and - reissue the CREATE INDEX CONCURRENTLY command. + convenient to use REINDEX to rebuild them. @@ -151,6 +148,21 @@ REINDEX [ ( VERBOSE ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } + +CONCURRENTLY + + + When this option is used, PostgreSQL will rebuild the + index without taking any locks that prevent concurrent inserts, + updates, or deletes on the table; whereas a standard reindex build + locks out writes (but not reads) on the table until it's done. + There are several caveats to be aware of when using this option + see . + + + + VERBOSE @@ -231,6 +243,173 @@ REINDEX [ ( VERBOSE ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } + + Rebuilding Indexes Concurrently + + + index + rebuilding concurrently + + + +Rebuilding an index can interfere with regular operation of a database. +Normally PostgreSQL locks the table whose index is rebuilt +against writes and performs the entire index build with a single scan of the +table. Other transactions can still read the table, but if they try to +insert, update, or delete rows in the table they will block until the +index rebuild is finished. This could have a severe effect if the system is +a live production database. Very large tables can take many hours to be +indexed, and even for smaller tables, an index rebuild can lock out writers +for periods that are unacceptably long for a production system. + + + +PostgreSQL supports rebuilding indexes with minimum locking +of writes. This method is invoked by specifying the +CONCURRENTLY option of REINDEX. When this option +is used, PostgreSQL must perform two scans of the table +for each index that needs to be rebuild and in addition it must wait for +all existing transactions that could potentially use the index to +terminate. This method requires more total work than a standard index +rebuild and takes significantly longer to complete as it needs to wait +for unfinished transactions that might modify the index. However, since +it allows normal operations to continue while the index is rebuilt, this +method is useful for rebuilding indexes in a production environment. Of +course, the extra CPU, memory and I/O load imposed by the index rebuild +may slow down other operations. 
+ + + +The following steps occur in a concurrent index build, each in a separate +transaction except when the new index definitions are created, where all +the concurrent entries are created using only one transaction. Note that +if there are multiple indexes to be rebuilt then each step loops through +all the indexes we're rebuilding, using a separate transaction for each one. +REINDEX CONCURRENTLY proceeds as follows when rebuilding +indexes: + + + + + A new temporary index definition is added into the catalog + pg_index. This definition will be used to replace the + old index. This step is done as a single transaction for all the indexes + involved in this process, meaning that if + REINDEX CONCURRENTLY is run on a table with multiple + indexes, all the catalog entries of the new indexes are created within a + single transaction. A SHARE UPDATE EXCLUSIVE lock at + session level is taken on the indexes being reindexed as well as its + parent table to prevent any schema modification while processing. + + + + + A first
Re: [HACKERS] proposal: schema variables
2017-10-31 22:28 GMT+01:00 srielau: > Pavel, > > There is no > DECLARE TEMP CURSOR > or > DECLARE TEMP variable in PLpgSQL > and sure .. DECLARE TEMP makes no sense; I was talking about the similarity between DECLARE and CREATE TEMP > CREATE TEMP TABLE has a different meaning from what I understand you > envision for variables. > > But maybe I'm mistaken. Your original post did not describe the entire > syntax: > CREATE [TEMP] VARIABLE [IF NOT EXISTS] name AS type > [ DEFAULT expression ] [[NOT] NULL] > [ ON TRANSACTION END { RESET | DROP } ] > [ { VOLATILE | STABLE } ]; > > Especially the TEMP is not spelled out and how its presence affects or > doesn't ON TRANSACTION END. > So may be if you elaborate I understand where you are coming from. > TEMP has the same functionality (and implementation) as our temp tables - so at session end the temp variables are destroyed, but they can also be bound to a transaction. > > > > > -- > Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers- > f1928748.html > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers >
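For illustration, a minimal sketch of how the proposed syntax might be used (the variable names are hypothetical; the grammar is the one quoted above):

    -- persistent schema object whose value is private to each session
    CREATE VARIABLE app_user AS text DEFAULT NULL;

    -- TEMP variant: dropped at session end like a temp table,
    -- here with its value additionally reset at transaction end
    CREATE TEMP VARIABLE tx_flag AS boolean DEFAULT false
        ON TRANSACTION END RESET;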
Re: [HACKERS] Re: Anyone have experience benchmarking very high effective_io_concurrency on NVME's?
On 1 November 2017 at 11:49, Andres Freund wrote: > Right. It'd probably be good to be a bit more adaptive here. But it's > hard to do with posix_fadvise - we'd need an operation that actually > notifies us of IO completion. If we were using, say, asynchronous > direct IO, we could initiate the request and regularly check how many > blocks ahead of the current window are already completed and adjust the > queue based on that, rather than just firing off fadvises and hoping for > the best. In case it's of interest, I did some looking into using Linux's AIO support in Pg a while ago, when chasing some issues around fsync retries and handling of I/O errors. It was a pretty serious dead end; it was clear that fsync support in AIO is not only incomplete but inconsistent across kernel versions, let alone other platforms. But I see your name in the relevant threads, so you know that. To save others the time, see: * https://lwn.net/Articles/724198/ * https://lwn.net/Articles/671649/ -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
Tomas Vondra wrote: > FWIW I can reproduce this on 9.5, and I don't even need to run the > UPDATE part. That is, INSERT + VACUUM running concurrently is enough to > produce broken BRIN indexes :-( Hmm, I'm pretty sure we stress-tested brin in pretty much the same way. But I see this misbehavior too. Looking ... -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Anyone have experience benchmarking very high effective_io_concurrency on NVME's?
Hi, On 2017-10-31 18:47:06 +0100, Tomas Vondra wrote: > On 10/31/2017 04:48 PM, Greg Stark wrote: > > On 31 October 2017 at 07:05, Chris Travers wrote: > >> Hi; > >> > >> After Andres's excellent talk at PGConf we tried benchmarking > >> effective_io_concurrency on some of our servers and found that those > which > >> have a number of NVME storage volumes could not fill the I/O queue > even at > >> the maximum setting (1000). > > > > And was the system still i/o bound? If the cpu was 100% busy then > > perhaps Postgres just can't keep up with the I/O system. It would > > depend on workload though, if you start many very large sequential > > scans you may be able to push the i/o system harder. > > > > Keep in mind effective_io_concurrency only really affects bitmap > > index scans (and to a small degree index scans). It works by issuing > > posix_fadvise() calls for upcoming buffers one by one. That gets > > multiple spindles active but it's not really going to scale to many > > thousands of prefetches (and effective_io_concurrency of 1000 > > actually means 7485 prefetches). At some point those i/o are going > > to start completing before Postgres even has a chance to start > > processing the data. Note that even if they finish well before postgres gets around to looking at the block, you might still be seeing benefits. SSDs benefit from larger reads, and a deeper queue gives more chances for reordering / coalescing of requests. Won't benefit the individual reader, but might help the overall capacity of the system. > Yeah, initiating the prefetches is not expensive, but it's not free > either. So there's a trade-off between time spent on prefetching and > processing the data. Right. It'd probably be good to be a bit more adaptive here. But it's hard to do with posix_fadvise - we'd need an operation that actually notifies us of IO completion. If we were using, say, asynchronous direct IO, we could initiate the request and regularly check how many blocks ahead of the current window are already completed and adjust the queue based on that, rather than just firing off fadvises and hoping for the best. > I believe this may be actually illustrated using Amdahl's law - the I/O > is the parallel part, and processing the data is the serial part. And no > matter what you do, the device only has so much bandwidth, which defines > the maximum possible speedup (compared to "no prefetch" case). Right. > Furthermore, the device does not wait for all the I/O requests to be > submitted - it won't wait for 1000 requests and then go "OMG! There's a > lot of work to do!" It starts processing the requests as they arrive, > and some of them will complete before you're done with submitting the > rest, so you'll never see all the requests in the queue at once. It'd be interesting to see how much it helps to scale the size of readahead requests with the distance from the current read iterator. E.g. if you're less than 16 blocks away from the current head, issue size 1, up to 32 issue a 2 block request for consecutive blocks. I suspect it won't help because at least linux's block layer / io elevator seems quite successful at merging. E.g.
for the query: EXPLAIN ANALYZE SELECT sum(l_quantity) FROM lineitem where l_receiptdate between '1993-05-03' and '1993-08-03'; on a tpc-h scale dataset on my laptop, I see:

Device   r/s       w/s   rMB/s   wMB/s  rrqm/s    wrqm/s  %rrqm  %wrqm  r_await  w_await  aqu-sz  rareq-sz  wareq-sz  svctm  %util
sda      25702.00  0.00  495.27  0.00   37687.00  0.00    59.45  0.00   5.13     0.00     132.09  19.73     0.00      0.04   100.00

but it'd be worthwhile to see whether doing the merging ourselves allows for deeper queues. I think we really should start incorporating explicit prefetching in more places. Ordered indexscans might actually be one case that's not too hard to do in a simple manner - whenever at an inner node, prefetch the leaf nodes below it. We obviously could do better, but that might be a decent starting point to get some numbers. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
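For anyone reproducing the experiment, a rough sketch of the kind of session involved (the query is the lineitem one above; the GUC value is arbitrary, and the prefetching only kicks in when the planner chooses a bitmap heap scan):

    -- prefetch depth used for bitmap heap scans (0 disables the fadvise calls)
    SET effective_io_concurrency = 256;

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT sum(l_quantity) FROM lineitem
    WHERE l_receiptdate BETWEEN '1993-05-03' AND '1993-08-03';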
Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept
On Tue, Oct 31, 2017 at 6:17 PM, Alexander Korotkov wrote: > On Tue, Oct 31, 2017 at 5:16 AM, Masahiko Sawada > wrote: >> >> On Mon, Oct 30, 2017 at 10:16 PM, Robert Haas >> wrote: >> > On Tue, Oct 24, 2017 at 1:26 PM, Ivan Kartyshov >> > wrote: >> >> Hello. I made some bugfixes and rewrote the patch. >> > >> > I don't think it's a good idea to deliberately leave the state of the >> > standby different from the state of the master on the theory that it >> > won't matter. I feel like that's something that's likely to come back >> > to bite us. >> >> I agree with Robert. What happens if we intentionally don't apply the >> truncation WAL and switch over? If we insert a tuple on the new >> master server to a block that has been truncated on the old master, >> will the WAL apply on the new standby fail? I guess there are such >> corner cases causing failures of WAL replay after switch-over. > > > Yes, that looks dangerous. One approach to cope with that could be teaching the heap > redo function to handle such situations. But I expect this approach > to be criticized for bad design. And I understand the fairness of this > criticism. > > However, from the user's perspective, the current behavior of > hot_standby_feedback is just broken, because it both increases bloat and > doesn't guarantee that a read-only query on the standby won't be cancelled > because of vacuum. Therefore, we should be looking for a solution: if one > approach isn't good enough, then we should look for another approach. > > I can propose the following alternative approach: teach read-only queries on hot > standby to tolerate concurrent relation truncation. Then, when a > non-existent heap page is accessed on hot standby, we can know that it was > deleted by concurrent truncation and should be assumed to be empty. Any > thoughts? > Do you also mean that applying the WAL for AccessExclusiveLock is always skipped on standby servers to allow scans to access the relation? Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Dynamic result sets from procedures
On 1 November 2017 at 05:08, Peter Eisentraut wrote: > CREATE PROCEDURE pdrstest1() > LANGUAGE SQL > AS $$ > DECLARE c1 CURSOR WITH RETURN FOR SELECT * FROM cp_test2; > DECLARE c2 CURSOR WITH RETURN FOR SELECT * FROM cp_test3; > $$; > > CALL pdrstest1(); FWIW, this is similar to the model already used by PgJDBC to emulate multiple result sets, though the current support in the driver is rather crude. It detects a REFCURSOR in an output parameter / result set and transparently FETCHes the result set, making it look to the client app like it's a nested result set. This shouldn't conflict with what you're doing because the driver does not follow the JDBC standard behaviour of using Statement.getMoreResults() and Statement.getResultSet() for multiple result sets. That's currently only used by PgJDBC when fetching result sets from batch query executions. Instead, the multiple result set emulation requires the caller to 'getObject' the 'refcursor' field's result-object, then cast it to ResultSet, and treat it as a new (nested) result set. True multiple result sets would be exposed in PgJDBC via getMoreResults(). -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
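For comparison, a rough sketch of the refcursor pattern that PgJDBC's emulation builds on today (reusing the tables from Peter's example; the function name is made up):

    CREATE FUNCTION pdrstest_refcursors() RETURNS SETOF refcursor
    LANGUAGE plpgsql AS $$
    DECLARE
        c1 refcursor := 'c1';
        c2 refcursor := 'c2';
    BEGIN
        OPEN c1 FOR SELECT * FROM cp_test2;
        RETURN NEXT c1;
        OPEN c2 FOR SELECT * FROM cp_test3;
        RETURN NEXT c2;
    END;
    $$;

    BEGIN;
    SELECT pdrstest_refcursors();  -- returns the two cursor names
    FETCH ALL IN c1;               -- the driver issues these FETCHes transparently
    FETCH ALL IN c2;
    COMMIT;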
Re: [HACKERS] PATCH: enabling parallel execution for cursors explicitly (experimental)
> Now, I agree this is somewhat more limited than I hoped for, but OTOH it > still solves the issue I initially aimed for (processing large query > results in efficient way). I don't quite understand this part. We already send results to the client in a stream unless it's something we have to materialize, in which case a cursor won't help anyway. If the client wants to fetch in chunks it can use a portal and limited size fetches. That shouldn't (?) be parallel-unsafe, since nothing else can happen in the middle anyway. But in most cases the client can just fetch all, and let the socket buffering take care of things, reading results only when it wants them, and letting the server block when the windows are full. That's not to say that SQL-level cursor support wouldn't be nice. I'm just trying to better understand what it's solving. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Another oddity in handling of WCO constraints in postgres_fdw
On Wed, Oct 4, 2017 at 5:58 PM, Ashutosh Bapat wrote: > The view with WCO is local but the modification which violates WCO is > being made on remote server by a trigger on remote table. Trying to > control that doesn't seem to be a good idea, just like we can't > control what rows get inserted on the foreign server when they violate > local constraints. I think that's a fair point. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
On Tue, Oct 31, 2017 at 5:07 PM, Thomas Munro wrote: > So that's this bit: > > + pg_itoa(worker, filename); > + lts->pfile = BufFileCreateShared(fileset, filename); > > ... and: > > + pg_itoa(i, filename); > + file = BufFileOpenShared(fileset, filename); Right. > What's wrong with using a worker number like this? I guess nothing, though there is the question of discoverability for DBAs, etc. You do address this separately, by having (potentially) descriptive filenames, as you go into. > It's not random choice: buffile.c creates a uniquely named directory > (or directories, if you have more than one location configured in the > temp_tablespaces GUC) to hold all the backing files involved in each > BufFileSet. Naming of BufFiles within the BufFileSet is the caller's > problem, and a worker number seems like a reasonable choice to me. It > won't collide with a concurrent parallel CREATE INDEX because that'll > be using its own BufFileSet. Oh, I see. I may have jumped the gun on that one. > One complaint about the current coding that someone might object to: > MakeSharedSegmentPath() just dumps the caller's BufFile name into a > path without sanitisation: I should fix that so that we only accept > fairly limited strings here. Another complaint is that perhaps fd.c > knows too much about buffile.c's business. For example, > RemovePgTempFilesInDir() knows about the ".set" directories created by > buffile.c, which might be called a layering violation. Perhaps the > set/directory logic should move entirely into fd.c, so you'd call > FileSetInit(FileSet *), not BufFileSetInit(BufFileSet *), and then > BufFileOpenShared() would take a FileSet *, not a BufFileSet *. > Thoughts? I'm going to make an item on my personal TODO list for that. No useful insights on that right now, though. > 3. sharedtuplestore.c takes a caller-supplied BufFileSet and creates > its shared BufFiles in there. Earlier versions created and owned a > BufFileSet, but in the current Parallel Hash patch I create loads of > separate SharedTuplestore objects but I didn't want to create loads of > directories to back them, so you can give them all the same > BufFileSet. That works because SharedTuplestores are also given a > name, and it's the caller's job (in my case nodeHash.c) to make sure > the SharedTuplestores are given unique names within the same > BufFileSet. For Parallel Hash you'll see names like 'i3of8' (inner > batch 3 of 8). There is no need for any sort of > central registry for that though, because it rides on top of the > guarantees from 2 above: buffile.c will put those files into a > uniquely named directory, and that works as long as no one else is > allowed to create files or directories in the temp directory that > collide with its reserved pattern /^pgsql_tmp.+\.set$/. For the same > reason, parallel CREATE INDEX is free to use worker numbers as BufFile > names, since it has its own BufFileSet to work within. If the new standard is that you have temp file names that suggest the purpose of each temp file, then that may be something that parallel CREATE INDEX should buy into. > In an earlier version, BufFileSet was one of those annoying data > structures with a FLEXIBLE_ARRAY_MEMBER that you'd use as an > incomplete type (declared but not defined in the includable header), > and here it was being used "inside" (or rather after) SharedSort, > which *itself* had a FLEXIBLE_ARRAY_MEMBER.
The reason for the > variable sized object was that I needed all backends to agree on the > set of temporary tablespace OIDs, of which there could be any number, > but I also needed a 'flat' (pointer-free) object I could stick in > relocatable shared memory. In the newest version I changed that > flexible array to tablespaces[8], because 8 should be enough > tablespaces for anyone (TM). I guess that that's something that you'll need to take up with Andres, if you haven't already. I have a hard time imagining a single query needing to use more than that many tablespaces at once, so maybe this is fine. > I don't really believe anyone uses > temp_tablespaces for IO load balancing anymore and I hate code like > the above. So I think Rushabh should now remove the above-quoted code > and just use a BufFileSet directly as a member of SharedSort. FWIW, I agree with you that nobody uses temp_tablespaces this way these days. This seems like a discussion for your hash join patch, though. I'm happy to buy into that. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Adding column_constraint description in ALTER TABLE synopsis
On 2017/10/31 21:31, Stephen Frost wrote: > * Lætitia Avrot (laetitia.av...@gmail.com) wrote: >> As Amit Langote pointed out, the column_constraint definition is missing >> whereas it is used in ALTER TABLE synopsis. It can be easily found in the >> CREATE TABLE synopsis, but it's not very user friendly. > > Thanks, this looks pretty reasonable, but did you happen to look for any > other keywords in the ALTER TABLE that should really be in ALTER TABLE > also? > > I'm specifically looking at, at least, partition_bound_spec. Maybe you > could propose an updated patch which addresses that also, and any other > cases you find? Ah, yes. I remember having left out partition_bound_spec simply because I thought it was kind of how it was supposed to be done, seeing that neither column_constraint nor table_constraint was expanded in the ALTER TABLE synopsis. It seems that there are indeed a couple of other things that need to be brought over to the ALTER TABLE synopsis, including partition_bound_spec. 9f295c08f877 [1] added table_constraint, but missed adding the description of index_parameters and exclude_element which are referenced therein. Attached find an updated version of Lætitia's patch. Thanks, Amit [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=9f295c diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml index 41acda003f..e059f87875 100644 --- a/doc/src/sgml/ref/alter_table.sgml +++ b/doc/src/sgml/ref/alter_table.sgml @@ -85,6 +85,20 @@ ALTER TABLE [ IF EXISTS ] name OWNER TO { new_owner | CURRENT_USER | SESSION_USER } REPLICA IDENTITY { DEFAULT | USING INDEX index_name | FULL | NOTHING } +and column_constraint is: + +[ CONSTRAINT constraint_name ] +{ NOT NULL | + NULL | + CHECK ( expression ) [ NO INHERIT ] | + DEFAULT default_expr | + GENERATED { ALWAYS | BY DEFAULT } AS IDENTITY [ ( sequence_options ) ] | + UNIQUE index_parameters | + PRIMARY KEY index_parameters | + REFERENCES reftable [ ( refcolumn ) ] [ MATCH FULL | MATCH PARTIAL | MATCH SIMPLE ] +[ ON DELETE action ] [ ON UPDATE action ] } +[ DEFERRABLE | NOT DEFERRABLE ] [ INITIALLY DEFERRED | INITIALLY IMMEDIATE ] + and table_constraint is: [ CONSTRAINT constraint_name ] @@ -96,6 +110,15 @@ ALTER TABLE [ IF EXISTS ] name [ MATCH FULL | MATCH PARTIAL | MATCH SIMPLE ] [ ON DELETE action ] [ ON UPDATE action ] } [ DEFERRABLE | NOT DEFERRABLE ] [ INITIALLY DEFERRED | INITIALLY IMMEDIATE ] +index_parameters in UNIQUE, PRIMARY KEY, and EXCLUDE constraints are: + +[ WITH ( storage_parameter [= value] [, ... ] ) ] +[ USING INDEX TABLESPACE tablespace_name ] + +exclude_element in an EXCLUDE constraint is: + +{ column_name | ( expression ) } [ opclass ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] + and table_constraint_using_index is: [ CONSTRAINT constraint_name ] @@ -104,6 +127,13 @@ ALTER TABLE [ IF EXISTS ] name +and partition_bound_spec is: + +IN ( { numeric_literal | string_literal | NULL } [, ...] ) | +FROM ( { numeric_literal | string_literal | MINVALUE | MAXVALUE } [, ...] ) + TO ( { numeric_literal | string_literal | MINVALUE | MAXVALUE } [, ...] ) + + Description -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
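For context, the column_constraint grammar added above is what lets the synopsis cover statements like the following (hypothetical table and column):

    ALTER TABLE measurement
        ADD COLUMN status text NOT NULL DEFAULT 'ok'
            CHECK (status IN ('ok', 'failed'));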
[HACKERS] Removing LEFT JOINs in more cases
Hackers, Normally we'll only ever remove a LEFT JOIN relation if it's unused and there's no possibility that the join would cause row duplication. To check that the join wouldn't cause row duplication we make use of proofs, such as unique indexes, or for sub-queries, we make use of DISTINCT and GROUP BY clauses. There's another case that we don't handle, and it's VERY simple to test for. Quite simply, it seems we could remove the join in cases such as: create table t1 (id int primary key); create table t2 (id int primary key, b int not null); insert into t2 values(1,1),(2,1); insert into t1 values(1); select distinct t1.* from t1 left join t2 on t1.id=t2.b; and select t1.id from t1 left join t2 on t1.id=t2.b GROUP BY t1.id; but not: select t1.id,count(*) from t1 left join t2 on t1.id=t2.b GROUP BY t1.id; In this case, the join *can* cause row duplicates, but the distinct or group by would filter these out again anyway, so in these cases, we'd not only get the benefit of not joining but also not having to remove the duplicate rows caused by the join. Given how simple the code is to support this, it seems to me to be worth handling. A patch to do this is attached. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services 0001-Support-removing-LEFT-JOINs-with-DISTINCT-GROUP-BY.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] postgres.exe crashes with access violation on Windows while starting up
From: Michael Paquier [mailto:michael.paqu...@gmail.com] > On Tue, Oct 31, 2017 at 6:59 AM, Tsunakawa, Takayuki wrote: > > When CurrentMemoryContext is NULL, the message is logged with > ReportEventA(). This is similar to write_console(). > > My point is that as Postgres is running as a service, isn't it wrong to > write a message to stderr as a fallback if the memory context is not set? > You would lose a message. It seems to me that for an operation that can > happen at a low-level like the postmaster startup, we should really use > a low-level operation as well. I'm sorry I may not have been clear. With this patch, write_eventlog() outputs the message to the event log with ReportEventA() when CurrentMemoryContext is NULL. It doesn't write to stderr. So the message won't be lost. Regards Takayuki Tsunakawa -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
On Wed, Nov 1, 2017 at 11:29 AM, Peter Geoghegan wrote: > On Thu, Oct 26, 2017 at 4:22 AM, Rushabh Lathia > wrote: >> Attaching the rebased patch according to the v22 parallel-hash patch sets > > I took a quick look at this today, and noticed a few issues: > > * make_name() is used to name files in sharedtuplestore.c, which is > what is passed to BufFileOpenShared() for parallel hash join. You're > using your own logic for that within the equivalent logtape.c call to > BufFileOpenShared(), presumably because make_name() wants to identify > participants by PID rather than by an ordinal identifier number. So that's this bit: + pg_itoa(worker, filename); + lts->pfile = BufFileCreateShared(fileset, filename); ... and: + pg_itoa(i, filename); + file = BufFileOpenShared(fileset, filename); What's wrong with using a worker number like this? > I think that we need some kind of central registry for things that use > shared buffiles. It could be that sharedtuplestore.c is further > generalized to support this, or it could be that they both call > something else that takes care of naming. It's not okay to have this > left to random chance. It's not random choice: buffile.c creates a uniquely named directory (or directories, if you have more than one location configured in the temp_tablespaces GUC) to hold all the backing files involved in each BufFileSet. Naming of BufFiles within the BufFileSet is the caller's problem, and a worker number seems like a reasonable choice to me. It won't collide with a concurrent parallel CREATE INDEX because that'll be using its own BufFileSet. > You're going to have to ask Thomas about this. You should also use > MAXPGPATH for the char buffer on the stack. Here's a summary of the namespace management scheme I currently have at the three layers fd.c, buffile.c, sharedtuplestore.c: 1. fd.c has new lower-level functions PathNameCreateTemporaryFile(const char *path) and PathNameOpenTemporaryFile(const char *path). It also provides PathNameCreateTemporaryDir(). Clearly callers of these interfaces will need to be very careful about managing the names they use. Callers also own the problem of cleaning up files, since there is no automatic cleanup of files created this way. My intention was that these facilities would *only* be used by BufFileSet, since it has machinery to manage those things. 2. buffile.c introduces BufFileSet, which is conceptually a set of BufFiles that can be shared by multiple backends with DSM segment-scoped cleanup. It is implemented as a set of directories: one for each tablespace in temp_tablespaces. It controls the naming of those directories. The BufFileSet directories are named similarly to fd.c's traditional temporary file names using the usual recipe "pgsql_tmp" + PID + per-process counter but have an additional ".set" suffix. RemovePgTempFilesInDir() recognises directories with that prefix and suffix as junk left over from a crash when cleaning up. I suppose it's that knowledge about reserved name patterns and cleanup that you are thinking of as a central registry? As for the BufFiles that are in a BufFileSet, buffile.c has no opinion on that: the calling code (parallel CREATE INDEX, sharedtuplestore.c, ...) is responsible for coming up with its own scheme. If parallel CREATE INDEX wants to name shared BufFiles "walrus" and "banana", that's OK by me, and those files won't collide with anything in another BufFileSet because each BufFileSet has its own directory (-ies).
One complaint about the current coding that someone might object to: MakeSharedSegmentPath() just dumps the caller's BufFile name into a path without sanitisation: I should fix that so that we only accept fairly limited strings here. Another complaint is that perhaps fd.c knows too much about buffile.c's business. For example, RemovePgTempFilesInDir() knows about the ".set" directories created by buffile.c, which might be called a layering violation. Perhaps the set/directory logic should move entirely into fd.c, so you'd call FileSetInit(FileSet *), not BufFileSetInit(BufFileSet *), and then BufFileOpenShared() would take a FileSet *, not a BufFileSet *. Thoughts? 3. sharedtuplestore.c takes a caller-supplied BufFileSet and creates its shared BufFiles in there. Earlier versions created and owned a BufFileSet, but in the current Parallel Hash patch I create loads of separate SharedTuplestore objects but I didn't want to create loads of directories to back them, so you can give them all the same BufFileSet. That works because SharedTuplestores are also given a name, and it's the caller's job (in my case nodeHash.c) to make sure the SharedTuplestores are given unique names within the same BufFileSet. For Parallel Hash you'll see names like 'i3of8' (inner batch 3 of 8). There is no need for any sort of central registry for that though, because it rides on top of the guarantees from 2 above: buffile.c
Re: [HACKERS] Account for cost and selectivity of HAVING quals
"David G. Johnston"writes: > On Tue, Oct 31, 2017 at 4:31 PM, Tels wrote: >> That looks odd to me, it first uses output_tuples in a formula, then >> overwrites the value with a new value. Should these lines be swapped? > IIUC it is correct: the additional total_cost comes from processing every > output group to check whether it is qualified - since every group is > checked the incoming output_tuples from the prior grouping is used. Right --- we'll expend the effort to compute the HAVING expression once per group row, whether the row passes the qual or not. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Account for cost and selectivity of HAVING quals
On Tue, Oct 31, 2017 at 4:31 PM, Tels wrote: > > > That looks odd to me, it first uses output_tuples in a formula, then > overwrites the value with a new value. Should these lines be swapped? > IIUC it is correct: the additional total_cost comes from processing every output group to check whether it is qualified - since every group is checked the incoming output_tuples from the prior grouping is used. The side-effect of the effort is that the number of output_tuples has now been reduced to only those matching the qual - and so it now must take on a new value to represent this. David J.
[HACKERS] Proposal: generic WAL compression
Hackers, a few years ago generic WAL was proposed by Alexander Korotkov (https://www.postgresql.org/message-id/flat/CAPpHfdsXwZmojm6Dx%2BTJnpYk27kT4o7Ri6X_4OSWcByu1Rm%2BVA%40mail.gmail.com#capphfdsxwzmojm6dx+tjnpyk27kt4o7ri6x_4oswcbyu1rm...@mail.gmail.com) and was committed into PostgreSQL 9.6. One of the advantages of generic WAL is a common interface for safe interaction with WAL for modules and extensions. The interface allows a module to register a page, then change it, and then generic WAL generates and writes into the xlog the algorithm for changing the old version of the page into the new one. In the case of crash and recovery this algorithm may be applied. Bloom and RUM indexes use generic WAL. The issue is that the generic algorithm for turning the old page into the new one is not optimal with respect to the record size in the xlog. Currently the main idea of the approach is to find all positions in the page where the new version differs from the original one and write these changes into the xlog. It works well if the page was rewritten completely or if only a few bytes have been changed. Nevertheless, this approach produces too large a WAL record in the case of inserting or deleting a few bytes into the page. In this case there is almost no position where the old version and the new one are equal, so the record size is near the page size, but the actual amount of change in the page is small. For instance, inserting a single small item near the start of a page shifts everything behind it, so the bytewise delta approaches BLCKSZ even though the edit itself is tiny. This is exactly what happens often in RUM indexes. In order to overcome that issue, I would like to propose a patch which provides the possibility of using another approach to WAL record construction. If the new approach fails to produce a short enough record, it falls back to the original approach. The main idea of the new approach is not to perform a bytewise comparison of pages, but to find the minimal edit distance between the two pages and the corresponding editing algorithm. In the general case, finding the edit distance takes O(N^2) time and memory. But there is also an algorithm which requires only O(ND) time and O(D^2) memory, where D is the edit distance between the two sequences. Moreover, for a given D, the algorithm can show within the same time and memory bounds that the minimal edit distance between the two sequences is more than D. The special case of this algorithm which does not consider replace operations is described in the paper (http://www.xmailserver.org/diff2.pdf). The version of this algorithm which consumes O(ND) time and O(N) memory is used in the diff console command, but for our purposes we don't need to increase the time constant in order to decrease the memory complexity. For RUM indexes we usually have a small enough edit distance (less than 64), so D^2 is not too much to store. The results of experiments:

+--------------------------+--------------+--------------+--------------+--------------+
| Name                     | WAL_diff(MB) | WAL_orig(MB) | Time_diff(s) | Time_orig(s) |
+--------------------------+--------------+--------------+--------------+--------------+
| rum: make installcheck   | 38.9         | 82.5         | 4.37         | 4.16         |
+--------------------------+--------------+--------------+--------------+--------------+
| bloom: make installcheck | 1.0          | 1.0          | 0.66         | 0.53         |
+--------------------------+--------------+--------------+--------------+--------------+
| test00.sql               | 20.5         | 51.0         | 1.86         | 1.41         |
+--------------------------+--------------+--------------+--------------+--------------+
| test01.sql               | 207.1        | 219.7        | 8.06         | 6.89         |
+--------------------------+--------------+--------------+--------------+--------------+

We can see that the patch introduces a small slowdown, but compresses generic WAL efficiently for RUM indexes. I'm also going to do a few more experiments on this patch with other data. The patch was tested on Lubuntu 14.04, but should not contain any platform-specific items. The patch, the files and scripts for doing the experiments, and performance tests are attached.
Oleg Ivanov Postgres Professional The Russian PostgreSQL Company diff --git a/src/backend/access/transam/generic_xlog.c b/src/backend/access/transam/generic_xlog.c index 3adbf7b..cd0ed2a 100644 --- a/src/backend/access/transam/generic_xlog.c +++ b/src/backend/access/transam/generic_xlog.c @@ -43,9 +43,18 @@ * a full page's worth of data. *- */ -#define FRAGMENT_HEADER_SIZE (2 * sizeof(OffsetNumber)) +#define FRAGMENT_HEADER_SIZE (2 * (sizeof(OffsetNumber) + \ + sizeof(char) + sizeof(int))) #define MATCH_THRESHOLD FRAGMENT_HEADER_SIZE -#define MAX_DELTA_SIZE (BLCKSZ + 2 * FRAGMENT_HEADER_SIZE) +#define MAX_DELTA_SIZE (BLCKSZ + 2 * FRAGMENT_HEADER_SIZE) + sizeof(bool) +
Re: [HACKERS] Account for cost and selectivity of HAVING quals
Moin, On Tue, October 31, 2017 5:45 pm, Tom Lane wrote: > Pursuant to the discussion at > https://www.postgresql.org/message-id/20171029112420.8920b5f...@mx.zeyos.com > here's a patch to fix the planner so that eval costs and selectivity of > HAVING quals are factored into the appropriate plan node numbers. > Perhaps unsurprisingly, this doesn't change the results of any > existing regression tests. > > It's slightly annoying that this approach will result in calculating > the eval costs and selectivity several times, once for each aggregation > plan type we consider. I thought about inserting RestrictInfo nodes > into the havingQual so that those numbers could be cached, but that turned > out to break various code that doesn't expect to see such nodes there. > I'm not sure it's worth going to the trouble of fixing that; in the big > scheme of things, the redundant calculations likely don't cost much, since > we aren't going to have relevant statistics. > > Comments? If anyone wants to do a real review of this, I'm happy to stick > it into the upcoming CF; but without an expression of interest, I'll just > push it. I don't think there's anything terribly controversial here. Not a review, but the patch has this: + total_cost += qual_cost.startup + output_tuples * qual_cost.per_tuple; + + output_tuples = clamp_row_est(output_tuples * ... That looks odd to me, it first uses output_tuples in a formula, then overwrites the value with a new value. Should these lines be swapped? Best regards, Tels -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
On 10/31/2017 11:44 PM, Tomas Vondra wrote: > ... > Unfortunately, I think we still have a problem ... I've been wondering > if we end up producing correct indexes, so I've done a simple test. > > 1) create the table as before > > 2) let the insert + vacuum run for some time, to see if there are > crashes (result: no crashes after one hour, inserting ~92M rows) > > 3) do a bunch of random updates on the data (while still doing the > concurrent vacuum in another session) > > 4) run a bunch of simple queries to compare the results, essentially > >-- BRIN index >SET enable_bitmapscan = on; >SELECT COUNT(*) FROM brin_test WHERE a = $1; > > >-- seq scan >SET enable_bitmapscan = off; >SELECT COUNT(*) FROM brin_test WHERE a = $1; > > and unfortunately what I get is not particularly pleasant: > > test=# set enable_bitmapscan = on; > SET > test=# select count(*) from brin_test where a = 0; > count > --- > 9062 > (1 row) > > test=# set enable_bitmapscan = off; > SET > test=# select count(*) from brin_test where a = 0; > count > --- > 9175 > (1 row) > > Attached is a SQL script with commands I used. You'll need to copy the > commands into multiple psql sessions, though, to simulate concurrent > activity. > FWIW I can reproduce this on 9.5, and I don't even need to run the UPDATE part. That is, INSERT + VACUUM running concurrently is enough to produce broken BRIN indexes :-( regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL 10 parenthesized single-column updates can produce errors
"David G. Johnston"writes: > Definitely moderates my opinion in my concurrent email...though > postponement is not strictly bad given the seeming frequency of the > existing problematic syntax in the wild already. Yeah, I'd hoped to get some capability extensions done here before v10 shipped, in line with the theory I've expressed in the past that it's better if you can point to actual new features justifying a compatibility break. However, that didn't happen in time. I'm disinclined to revert the change though; if there are people making use of this oddity now, then the longer we leave it in place the more people are going to be hurt when we do break it. If I had a time machine, I'd go back and fix the original multi-column SET patch so that it required the word ROW in all cases --- then at least it'd have accepted a conformant subset of the standard syntax. Oh well. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: schema variables
On 31/10/2017 at 23:36, Gilles Darold wrote: > On 31/10/2017 at 22:28, srielau wrote: >> Pavel, >> >> There is no >> DECLARE TEMP CURSOR >> or >> DECLARE TEMP variable in PLpgSQL >> and >> CREATE TEMP TABLE has a different meaning from what I understand you >> envision for variables. >> >> But maybe I'm mistaken. Your original post did not describe the entire >> syntax: >> CREATE [TEMP] VARIABLE [IF NOT EXISTS] name AS type >> [ DEFAULT expression ] [[NOT] NULL] >> [ ON TRANSACTION END { RESET | DROP } ] >> [ { VOLATILE | STABLE } ]; >> >> Especially the TEMP is not spelled out and how its presence affects or >> doesn't ON TRANSACTION END. >> So may be if you elaborate I understand where you are coming from. > I think that the TEMP keyword can be removed. If I understand correctly, the > default scope for a variable is the session; every transaction in a > session will see the same value. For the transaction level - probably the > reason for the TEMP keyword - I think [ ON TRANSACTION END { RESET | > DROP } ] will allow restricting the scope to the transaction level > without needing the TEMP keyword. When a variable is created in a > transaction, it is temporary if "ON TRANSACTION END DROP" is used, > otherwise it will persist after the transaction ends. I guess that this > is the same as using the TEMP keyword? I forgot to say that in the last case the DECLARE statement can be used, so I don't see the need for this kind of "temporary" variable. Maybe a variable object like the one used in DB2 and defined in the document https://www.ibm.com/support/knowledgecenter/en/SSEPEK_11.0.0/sqlref/src/tpc/db2z_sql_createvariable.html could be enough to cover our needs. -- Gilles Darold Consultant PostgreSQL http://dalibo.com - http://dalibo.org -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL 10 parenthesized single-column updates can produce errors
On Tue, Oct 31, 2017 at 3:43 PM, Tom Lane wrote: > According to the spec, the elements of a parenthesized > SET list should be assigned from the fields of a composite RHS. If > there's just one element of the SET list, the RHS should be a single-field > composite value, and this syntax should result in extracting its field. > This patch doesn't do that; it preserves our previous broken behavior, > and thus just puts off the day of reckoning that inevitably will come. > Definitely moderates my opinion in my concurrent email...though postponement is not strictly bad given the seeming frequency of the existing problematic syntax in the wild already. David J.
Re: [HACKERS] PostgreSQL 10 parenthesized single-column updates can produce errors
On Tue, Oct 31, 2017 at 3:14 PM, Rob McColl wrote: > >> I believe that this is not an intended change or behavior, but is instead >> an unintentional side effect of 906bfcad7ba7cb3863fe0e2a7810be8e3cd84fbd >> Improve handling of "UPDATE ... SET (column_list) = row_constructor". ( >> https://github.com/postgres/postgres/commit/906bfcad7ba7cb3 >> 863fe0e2a7810be8e3cd84fbd). >> > At this point the intent of 906bfcad doesn't really matter - and given the number of complaints in the month since v10 went live I'm tending to lean toward bringing the pre-10 behavior back. There is no ambiguity involved here and the breakage of existing applications seems considerably worse than the technical oddity of allowing (val) to be interpreted as a row_constructor in this situation. From a standards perspective we are strictly more permissive so no new problem there. On a related note, the 10.0 syntax guide is wrong, it needs to break out the parenthesized single-column and multi-column variants separately: Presently: ( column_name [, ...] ) = [ ROW ] ( { expression | DEFAULT } [, ...] ) Basically one cannot specify only a single column_name AND omit ROW. It would have been nice if the syntax for that variant would have been: ( column_name, column_name [, ...] ) = [ ROW ] ( { expression | DEFAULT }, { expression | DEFAULT } [, ...] ) If the v10 behavior remains the above change should be made as well as adding: ( column_name ) = ROW ( expression | DEFAULT ) If we revert 10 to the pre-10 behavior the existing syntax will work. David J.
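To spell out the variants under discussion (using the table from the original report):

    CREATE TABLE test (id int PRIMARY KEY, data int);
    INSERT INTO test VALUES (1, 1);

    UPDATE test SET (data) = ROW(2) WHERE id = 1;  -- v10: works (explicit row_constructor)
    UPDATE test SET (data) = (2) WHERE id = 1;     -- 9.6: works; v10: multi-column UPDATE error
    UPDATE test SET data = 2 WHERE id = 1;         -- unparenthesized form works everywhere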
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
Hi, On 10/31/2017 08:46 PM, Tom Lane wrote: > I wrote: >> maybe >> we just have some run-of-the-mill bugs to find, like the off-the-end >> bug I spotted in brin_doupdate. There's apparently at least one >> more, but given the error message it must be something like not >> checking for a page to have turned into a revmap page. Shouldn't >> be too hard to find... > > Actually, I think it might be as simple as the attached. > brin_getinsertbuffer checks for the old page having turned into revmap, > but the "samepage" path in brin_doupdate does not :-( > > With this applied, Alvaro's version of the test case has survived > without error for quite a bit longer than its former MTBF. There > might still be some issues though in other code paths. > That does fix the crashes for me - I've been unable to reproduce any even after one hour (it took a couple of minutes to crash before). Unfortunately, I think we still have a problem ... I've been wondering if we end up producing correct indexes, so I've done a simple test. 1) create the table as before 2) let the insert + vacuum run for some time, to see if there are crashes (result: no crashes after one hour, inserting ~92M rows) 3) do a bunch of random updates on the data (while still doing the concurrent vacuum in another session) 4) run a bunch of simple queries to compare the results, essentially -- BRIN index SET enable_bitmapscan = on; SELECT COUNT(*) FROM brin_test WHERE a = $1; -- seq scan SET enable_bitmapscan = off; SELECT COUNT(*) FROM brin_test WHERE a = $1; and unfortunately what I get is not particularly pleasant: test=# set enable_bitmapscan = on; SET test=# select count(*) from brin_test where a = 0; count --- 9062 (1 row) test=# set enable_bitmapscan = off; SET test=# select count(*) from brin_test where a = 0; count --- 9175 (1 row) Attached is a SQL script with commands I used. You'll need to copy the commands into multiple psql sessions, though, to simulate concurrent activity. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services brin-test.sql Description: application/sql -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL 10 parenthesized single-column updates can produce errors
Rob McColl writes: > Attaching patch... :-/ The reason why hacking your way to a backwards-compatible solution is a bad idea is that it breaks the SQL standard compliance we're trying to achieve here. According to the spec, the elements of a parenthesized SET list should be assigned from the fields of a composite RHS. If there's just one element of the SET list, the RHS should be a single-field composite value, and this syntax should result in extracting its field. This patch doesn't do that; it preserves our previous broken behavior, and thus just puts off the day of reckoning that inevitably will come. As a concrete example, the spec says this should work: create table t (f1 int); update t set (f1) = row(4); and it does in v10, but (without having tested) your patch will break it. This is not so exciting for simple row constructors, where you could just leave off the word ROW; but it is a critical distinction if we're ever to get to the point of allowing other composite-returning constructs here, for example composite-returning functions. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: schema variables
On 31/10/2017 at 22:28, srielau wrote: > Pavel, > > There is no > DECLARE TEMP CURSOR > or > DECLARE TEMP variable in PLpgSQL > and > CREATE TEMP TABLE has a different meaning from what I understand you > envision for variables. > > But maybe I'm mistaken. Your original post did not describe the entire > syntax: > CREATE [TEMP] VARIABLE [IF NOT EXISTS] name AS type > [ DEFAULT expression ] [[NOT] NULL] > [ ON TRANSACTION END { RESET | DROP } ] > [ { VOLATILE | STABLE } ]; > > Especially the TEMP is not spelled out and how its presence affects or > doesn't ON TRANSACTION END. > So may be if you elaborate I understand where you are coming from. I think that the TEMP keyword can be removed. If I understand correctly, the default scope for a variable is the session; every transaction in a session will see the same value. For the transaction level - probably the reason for the TEMP keyword - I think the [ ON TRANSACTION END { RESET | DROP } ] clause will allow restricting the scope to the transaction level without needing the TEMP keyword. When a variable is created in a transaction, it is temporary if "ON TRANSACTION END DROP" is used, otherwise it will persist after the transaction ends. I guess that this is the same as using the TEMP keyword? -- Gilles Darold Consultant PostgreSQL http://dalibo.com - http://dalibo.org -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
On Thu, Oct 26, 2017 at 4:22 AM, Rushabh Lathia wrote: > Attaching the rebased patch according to the v22 parallel-hash patch sets I took a quick look at this today, and noticed a few issues: * make_name() is used to name files in sharedtuplestore.c, which is what is passed to BufFileOpenShared() for parallel hash join. You're using your own logic for that within the equivalent logtape.c call to BufFileOpenShared(), presumably because make_name() wants to identify participants by PID rather than by an ordinal identifier number. I think that we need some kind of central registry for things that use shared buffiles. It could be that sharedtuplestore.c is further generalized to support this, or it could be that they both call something else that takes care of naming. It's not okay to have this left to random chance. You're going to have to ask Thomas about this. You should also use MAXPGPATH for the char buffer on the stack. * This logtape.c comment needs to be updated, as it's no longer true: * successfully. In general, workers can take it that the leader will * reclaim space in files under their ownership, and so should not * reread from tape. * Robert hated the comment changes in the header of nbtsort.c. You might want to change it back, because he is likely to be the one that commits this. * You should look for similar comments in tuplesort.c (IIRC a couple of places will need to be revised). * tuplesort_begin_common() should actively reject a randomAccess parallel case using elog(ERROR). * tuplesort.h should note that randomAccess isn't supported, too. * What's this all about?: + /* Accessor for the SharedBufFileSet that is at the end of Sharedsort. */ + #define GetSharedBufFileSet(shared)\ + ((BufFileSet *) (&(shared)->tapes[(shared)->nTapes])) You can't just cast from one type to the other without regard for the underlying size of the shared memory buffer, which is what this looks like to me. This only fails to crash because you're only abusing the last member in the tapes array for this purpose, and there happens to be enough shared memory slop that you get away with it. I'm pretty sure that ltsUnify() ends up clobbering the last/leader tape, which is a place where BufFileSet is also used, so this is just wrong. You should rethink the shmem structure a little bit. * There is still that FIXME comment within leader_takeover_tapes(). I believe that you should still have a leader tape (at least in local memory in the leader), even though you'll never be able to do anything with it, since randomAccess is no longer supported. You can remove the FIXME, and just note that you have a leader tape to be consistent with the serial case, though recognize that it's not useful. Note that even with randomAccess, we always had the leader tape, so it's not that different, really. I suppose it might make sense to make shared->tapes not have a leader tape. It hardly matters -- perhaps you should leave it there in order to keep the code simple, as you'll be keeping the leader tape in local memory, too. (But it still won't fly to continue to clobber it, of course -- you still need to find a dedicated place for BufFileSet in shared memory.) That's all I have right now. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Restricting pg_rewind to data/wal dirs
On 10/30/17 6:36 AM, Michael Paquier wrote: > On Mon, Oct 30, 2017 at 10:15 AM, Chris Travers wrote: >> >> How does rep mgr or other programs using pg_rewind know what to exclude? > > Good question. Answers could come from folks such as David Steele > (pgBackRest) or Marco (barman) whom I am attaching in CC. pgBackRest does not use pg_rewind. However, pgBackRest does use the same exclusion list for backups as pg_basebackup. -- -David da...@pgmasters.net
Re: [HACKERS] PATCH: enabling parallel execution for cursors explicitly (experimental)
Hi, On 10/20/2017 03:23 PM, Robert Haas wrote: > > ... > > The main points I want to make clearly understood is the current > design relies on (1) functions being labeled correctly and (2) other > dangerous code paths being unreachable because there's nothing that > runs between EnterParallelMode and ExitParallelMode which could invoke > them, except by calling a mislabeled function. Your patch expands the > vulnerability surface from "executor code that can be reached without > calling a mislabeled function" to "any code that can be reached by > typing an SQL command". Just rejecting any queries that are > parallel-unsafe probably closes a good chunk of the holes, but that > still leaves a lot of code that's never been run in parallel mode > before potentially now running in parallel mode - e.g. any DDL command > you happen to type, transaction control commands, code that only runs > when the server is idle like idle_in_transaction_timeout, cursor > operations. A lot of that stuff is probably fine, but it's got to be > thought through. Error handling might be a problem, too: what happens > if a parallel worker is killed while the query is suspended? I > suspect that doesn't work very nicely at all. > OK, understood and thanks for explaining what may be the possible issues. I do appreciate that. I still think it'd be valuable to support this, though, so I'm going to spend more time on investigating what needs to be handled. But maybe there's a simpler option - what if we only allow fetches from the PARALLEL cursor while the cursor is open? That is, this would work: BEGIN; ... DECLARE x PARALLEL CURSOR FOR SELECT * FROM t2 WHERE ...; FETCH 1000 FROM x; FETCH 1000 FROM x; FETCH 1000 FROM x; CLOSE x; ... COMMIT; but adding any other command between the OPEN/CLOSE commands would fail. That should close all the holes with parallel-unsafe stuff, right? Of course, this won't solve the issue with error handling / killing suspended workers (which didn't occur to me before as a possible issue at all, so thanks for pointing that out). But that's a significantly more limited issue to fix than all the parallel-unsafe bits. Now, I agree this is somewhat more limited than I hoped for, but OTOH it still solves the issue I initially aimed for (processing large query results in an efficient way). regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
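To make the proposed restriction concrete, here is a sketch (PARALLEL CURSOR is proposed syntax only, and the failure shown is hypothetical behavior, not anything an existing server produces):

BEGIN;
DECLARE x PARALLEL CURSOR FOR SELECT * FROM t2 WHERE ...;
FETCH 1000 FROM x;
INSERT INTO t1 VALUES (1);   -- would be rejected: only FETCH/CLOSE are
                             -- allowed while the parallel cursor is open
FETCH 1000 FROM x;
CLOSE x;
COMMIT;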
Re: [HACKERS] PostgreSQL 10 parenthesized single-column updates can produce errors
Attaching patch... :-/ On Tue, Oct 31, 2017 at 4:27 PM, Rob McCollwrote: > Between 9.6.5 and 10, the handling of parenthesized single-column UPDATE > statements changed. In 9.6.5, they were treated identically to > unparenthesized single-column UPDATES. In 10, they are treated as > multiple-column updates. This results in this being valid in Postgres > 9.6.5, but an error in Postgres 10: > > CREATE TABLE test (id INT PRIMARY KEY, data INT); > INSERT INTO test VALUES (1, 1); > UPDATE test SET (data) = (2) WHERE id = 1; > > In 10 and the current master, this produces the error: > > errmsg("source for a multiple-column UPDATE item must be a sub-SELECT or > ROW() expression") > > I believe that this is not an intended change or behavior, but is instead > an unintentional side effect of 906bfcad7ba7cb3863fe0e2a7810be8e3cd84fbd > Improve handling of "UPDATE ... SET (column_list) = row_constructor". ( > https://github.com/postgres/postgres/commit/906bfcad7ba7cb3 > 863fe0e2a7810be8e3cd84fbd). > > This is a small patch to the grammar that restores the previous behavior > by adding a rule to the set_clause rule and modifying the final rule of the > set_clause rule to only match lists of more then one element. I'm not sure > if there are more elegant or preferred ways to address this. > > Compiled and tested on Ubuntu 17.04 Linux 4.10.0-33-generic x86_64. > > Regression test added under the update test to cover the parenthesized > single-column case. > > I see no reason this would affect performance. > > Thanks, > -rob > > -- > Rob McColl > @robmccoll > r...@robmccoll.com > 205.422.0909 <(205)%20422-0909> > diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y new file mode 100644 index 4c83a63..b5f4ccf *** a/src/backend/parser/gram.y --- b/src/backend/parser/gram.y *** set_clause: *** 10694,10712 $1->val = (Node *) $3; $$ = list_make1($1); } ! | '(' set_target_list ')' '=' a_expr { ! int ncolumns = list_length($2); int i = 1; ListCell *col_cell; /* Create a MultiAssignRef source for each target */ foreach(col_cell, $2) { ResTarget *res_col = (ResTarget *) lfirst(col_cell); MultiAssignRef *r = makeNode(MultiAssignRef); ! r->source = (Node *) $5; r->colno = i; r->ncolumns = ncolumns; res_col->val = (Node *) r; --- 10694,10720 $1->val = (Node *) $3; $$ = list_make1($1); } ! | '(' set_target ')' '=' a_expr { ! $2->val = (Node *) $5; ! $$ = list_make1($2); ! } ! | '(' set_target_list ',' set_target ')' '=' a_expr ! { ! int ncolumns; int i = 1; ListCell *col_cell; + $2 = lappend($2,$4); + ncolumns = list_length($2); + /* Create a MultiAssignRef source for each target */ foreach(col_cell, $2) { ResTarget *res_col = (ResTarget *) lfirst(col_cell); MultiAssignRef *r = makeNode(MultiAssignRef); ! r->source = (Node *) $7; r->colno = i; r->ncolumns = ncolumns; res_col->val = (Node *) r; diff --git a/src/test/regress/expected/update.out b/src/test/regress/expected/update.out new file mode 100644 index cef70b1..90d33ad *** a/src/test/regress/expected/update.out --- b/src/test/regress/expected/update.out *** SELECT * FROM update_test; *** 76,82 100 | 21 | (4 rows) ! UPDATE update_test SET (c,b,a) = ('bugle', b+11, DEFAULT) WHERE c = 'foo'; SELECT * FROM update_test; a | b | c -++--- --- 76,93 100 | 21 | (4 rows) ! -- parenthesized single column should be valid ! UPDATE update_test SET (c) = ('bungle') WHERE c = 'foo'; ! SELECT * FROM update_test; ! a | b | c ! -++ ! 100 | 20 | ! 100 | 21 | ! 100 | 20 | bungle ! 100 | 21 | bungle ! (4 rows) ! ! 
UPDATE update_test SET (c,b,a) = ('bugle', b+11, DEFAULT) WHERE c = 'bungle'; SELECT * FROM update_test; a | b | c -++--- diff --git a/src/test/regress/sql/update.sql b/src/test/regress/sql/update.sql new file mode 100644 index 66d1fec..0ed5f6a *** a/src/test/regress/sql/update.sql --- b/src/test/regress/sql/update.sql *** UPDATE update_test SET a = v.* FROM (VAL *** 51,57 INSERT INTO update_test SELECT a,b+1,c FROM update_test; SELECT * FROM update_test; ! UPDATE update_test SET (c,b,a) = ('bugle', b+11, DEFAULT) WHERE c = 'foo'; SELECT * FROM update_test; UPDATE update_test SET (c,b) = ('car', a+b), a = a + 1 WHERE a = 10; SELECT * FROM update_test; --- 51,60 INSERT INTO update_test SELECT a,b+1,c FROM update_test; SELECT * FROM update_test; ! -- parenthesized single column should be valid ! UPDATE update_test SET (c) = ('bungle') WHERE c = 'foo'; ! SELECT * FROM update_test; ! UPDATE update_test SET (c,b,a) = ('bugle', b+11, DEFAULT) WHERE c = 'bungle'; SELECT * FROM update_test; UPDATE
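For anyone hitting this in the meantime: the row-constructor form that the v10 error message asks for is accepted by an unpatched 10, so there is a workaround until the grammar fix lands (same table as in the report above):

CREATE TABLE test (id INT PRIMARY KEY, data INT);
INSERT INTO test VALUES (1, 1);
UPDATE test SET (data) = ROW(2) WHERE id = 1;  -- accepted by 10; equivalent to SET data = 2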
[HACKERS] Account for cost and selectivity of HAVING quals
Pursuant to the discussion at https://www.postgresql.org/message-id/20171029112420.8920b5f...@mx.zeyos.com here's a patch to fix the planner so that eval costs and selectivity of HAVING quals are factored into the appropriate plan node numbers. Perhaps unsurprisingly, this doesn't change the results of any existing regression tests. It's slightly annoying that this approach will result in calculating the eval costs and selectivity several times, once for each aggregation plan type we consider. I thought about inserting RestrictInfo nodes into the havingQual so that those numbers could be cached, but that turned out to break various code that doesn't expect to see such nodes there. I'm not sure it's worth going to the trouble of fixing that; in the big scheme of things, the redundant calculations likely don't cost much, since we aren't going to have relevant statistics. Comments? If anyone wants to do a real review of this, I'm happy to stick it into the upcoming CF; but without an expression of interest, I'll just push it. I don't think there's anything terribly controversial here. regards, tom lane diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c index ce32b8a..98fb16e 100644 *** a/src/backend/optimizer/path/costsize.c --- b/src/backend/optimizer/path/costsize.c *** void *** 1874,1879 --- 1874,1880 cost_agg(Path *path, PlannerInfo *root, AggStrategy aggstrategy, const AggClauseCosts *aggcosts, int numGroupCols, double numGroups, + List *quals, Cost input_startup_cost, Cost input_total_cost, double input_tuples) { *** cost_agg(Path *path, PlannerInfo *root, *** 1955,1960 --- 1956,1981 output_tuples = numGroups; } + /* + * If there are quals (HAVING quals), account for their cost and + * selectivity. + */ + if (quals) + { + QualCost qual_cost; + + cost_qual_eval(&qual_cost, quals, root); + startup_cost += qual_cost.startup; + total_cost += qual_cost.startup + output_tuples * qual_cost.per_tuple; + + output_tuples = clamp_row_est(output_tuples * + clauselist_selectivity(root, + quals, + 0, + JOIN_INNER, + NULL)); + } + path->rows = output_tuples; path->startup_cost = startup_cost; path->total_cost = total_cost; *** cost_windowagg(Path *path, PlannerInfo * *** 2040,2051 --- 2061,2075 void cost_group(Path *path, PlannerInfo *root, int numGroupCols, double numGroups, + List *quals, Cost input_startup_cost, Cost input_total_cost, double input_tuples) { + double output_tuples; Cost startup_cost; Cost total_cost; + output_tuples = numGroups; startup_cost = input_startup_cost; total_cost = input_total_cost; *** cost_group(Path *path, PlannerInfo *root *** 2055,2061 */ total_cost += cpu_operator_cost * input_tuples * numGroupCols; ! path->rows = numGroups; path->startup_cost = startup_cost; path->total_cost = total_cost; } --- 2079,2105 */ total_cost += cpu_operator_cost * input_tuples * numGroupCols; ! /* ! * If there are quals (HAVING quals), account for their cost and ! * selectivity. ! */ ! if (quals) ! { ! QualCost qual_cost; ! ! cost_qual_eval(&qual_cost, quals, root); ! startup_cost += qual_cost.startup; ! total_cost += qual_cost.startup + output_tuples * qual_cost.per_tuple; ! ! output_tuples = clamp_row_est(output_tuples * ! clauselist_selectivity(root, ! quals, ! 0, ! JOIN_INNER, ! NULL)); ! } ! ! 
path->rows = output_tuples; path->startup_cost = startup_cost; path->total_cost = total_cost; } diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c index 1c84a2c..f620243 100644 *** a/src/backend/optimizer/prep/prepunion.c --- b/src/backend/optimizer/prep/prepunion.c *** choose_hashed_setop(PlannerInfo *root, L *** 977,982 --- 977,983 */ cost_agg(&hashed_p, root, AGG_HASHED, NULL, numGroupCols, dNumGroups, + NIL, input_path->startup_cost, input_path->total_cost, input_path->rows); *** choose_hashed_setop(PlannerInfo *root, L *** 991,996 --- 992,998 input_path->rows, input_path->pathtarget->width, 0.0, work_mem, -1.0); cost_group(&sorted_p, root, numGroupCols, dNumGroups, + NIL, sorted_p.startup_cost, sorted_p.total_cost, input_path->rows); diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c index 2d491eb..eafec2a 100644 *** a/src/backend/optimizer/util/pathnode.c --- b/src/backend/optimizer/util/pathnode.c *** create_result_path(PlannerInfo *root, Re *** 1374,1379 --- 1374,1380 pathnode->path.startup_cost = target->cost.startup;
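To illustrate what the patch changes, consider a grouped query with a selective HAVING clause (table and column names invented). Before the patch, the aggregate node's cost and row estimate ignore the HAVING qual entirely; with it, the qual's eval cost is charged to the node and its selectivity scales the node's output rows:

EXPLAIN
SELECT customer_id, sum(amount) AS total
FROM orders
GROUP BY customer_id
HAVING sum(amount) > 1000000;
-- the HashAggregate/GroupAggregate row estimate should now be roughly
-- clamp_row_est(numGroups * selectivity of the HAVING qual)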
Re: [HACKERS] proposal: schema variables
Pavel, There is no DECLARE TEMP CURSOR or DECLARE TEMP variable in PLpgSQL and CREATE TEMP TABLE has a different meaning from what I understand you envision for variables. But maybe I'm mistaken. Your original post did not describe the entire syntax: CREATE [TEMP] VARIABLE [IF NOT EXISTS] name AS type [ DEFAULT expression ] [[NOT] NULL] [ ON TRANSACTION END { RESET | DROP } ] [ { VOLATILE | STABLE } ]; Especially the TEMP is not spelled out and how its presence affects or doesn't ON TRANSACTION END. So may be if you elaborate I understand where you are coming from. -- Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: schema variables
2017-10-31 22:08 GMT+01:00 Serge Rielau: > Pavel, > > I can imagine that a DECLARE command will be introduced as a shortcut for > CREATE TEMP VARIABLE, but at this moment I would not open this topic. I am > afraid of bikeshedding, and I hope that CREATE TEMP VARIABLE is enough. > > Language is important because language stays. > Your choice of syntax will outlive your code and possibly yourself. > Sure. But at this moment I don't see any difference between DECLARE VARIABLE and CREATE TEMP VARIABLE other than the "TEMP" keyword. Regards Pavel > My 2 cents > Serge >
Re: [HACKERS] GSoC 2017 : Patch for predicate locking in Gist index
On 9 October 2017 at 18:57, Alexander Korotkov wrote: > On Thu, Oct 5, 2017 at 9:48 PM, Shubham Barai > wrote: > >> On 3 October 2017 at 00:32, Alexander Korotkov > > wrote: >> >>> On Mon, Oct 2, 2017 at 9:11 PM, Andrew Borodin >>> wrote: On Mon, Oct 2, 2017 at 8:00 PM, Alexander Korotkov wrote: > What happens if exactly this "continue" fires? > >> if (GistFollowRight(stack->page)) >> { >> if (!xlocked) >> { >> LockBuffer(stack->buffer, GIST_UNLOCK); >> LockBuffer(stack->buffer, GIST_EXCLUSIVE); >> xlocked = true; >> /* someone might've completed the split when we unlocked */ >> if (!GistFollowRight(stack->page)) >> continue; > > > In this case we might get xlocked == true without calling > CheckForSerializableConflictIn(). Indeed! I had overlooked it. I remember this issue; we were considering not fixing splits when in Serializable isolation, but dropped the idea. >>> >>> Yeah, the current insert algorithm assumes that a split must be fixed before >>> we can correctly traverse the tree downwards. >>> >>> CheckForSerializableConflictIn() must be after every exclusive lock. >>> >>> I'm not sure that fixing a split is a case that necessarily requires calling >>> CheckForSerializableConflictIn(). This lock on the leaf page is not taken >>> to modify the page. It's just taken to ensure that nobody else >>> is fixing this split at the same time. After fixing the split, it might >>> appear that the insert would go to another page. >>> >>> > I think it would be rather safe and easy to understand to move > CheckForSerializableConflictIn() directly before gistinserttuple(). The difference is that after the lock we have conditions to change the page, and before gistinserttuple() we have actual intent to change the page. From the point of view of future development the first version is better (if some new calls occasionally appear), but it has 3 calls while your proposal has 2 calls. It seems to me that CheckForSerializableConflictIn() before gistinserttuple() is better for now. >>> >>> Agree. >>> >> >> I have updated the location of CheckForSerializableConflictIn() and >> changed the status of the patch to "needs review". >> > > Now, ITSM that predicate locks and conflict checks are placed right for > now. > However, it would be good to add a couple of comments to gistdoinsert() > stating why we call CheckForSerializableConflictIn() in these > particular places. > > I also took a look at the isolation tests. You made two separate test specs: > one to verify that serialization failures do fire, and another to check > there are no false positives. > I wonder if we could merge these two test specs into one, but use more > variety of statements with different keys for both inserts and selects. > Please find the updated version of the patch here. I have made the suggested changes. Regards, Shubham Predicate-locking-in-gist-index_4.patch Description: Binary data
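For readers following along, the kind of scenario these isolation tests exercise looks roughly like this (schema and predicates invented for illustration; the patch's actual test specs differ):

CREATE TABLE pts (p point);
CREATE INDEX pts_idx ON pts USING gist (p);

-- session 1
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT count(*) FROM pts WHERE p <@ box '((0,0),(5,5))';
-- session 2
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT count(*) FROM pts WHERE p <@ box '((6,6),(9,9))';
-- session 1
INSERT INTO pts VALUES (point '(7,7)');
COMMIT;
-- session 2
INSERT INTO pts VALUES (point '(1,1)');
COMMIT;  -- serialization failure: each session wrote into the range the
         -- other read, and predicate locks on the GiST index let this
         -- rw-conflict cycle be detected precisely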
[HACKERS] Dynamic result sets from procedures
This patch is more of a demo of what could be done, not my primary focus, but if there is interest and some assistance, maybe we can make something out of it. This patch also goes on top of "SQL procedures" version 1. The purpose is to return multiple result sets from a procedure. This is, I think, a common request when coming from MS SQL and DB2. MS SQL has a completely different procedure syntax, but this proposal is compatible with DB2, which as usual was the model for the SQL standard. So this is what it can do:

CREATE PROCEDURE pdrstest1()
LANGUAGE SQL
AS $$
DECLARE c1 CURSOR WITH RETURN FOR SELECT * FROM cp_test2;
DECLARE c2 CURSOR WITH RETURN FOR SELECT * FROM cp_test3;
$$;

CALL pdrstest1();

and that returns those two result sets to the client. That's all it does for now. Things get more complex when you consider nested calls. The SQL standard describes additional facilities for how an outer procedure can accept a called procedure's result sets, or not. In the thread on transaction control, I mentioned that we might need some kind of procedure call stack. Something like that would be needed here as well. There are also probably some namespacing issues around the cursors that need more investigation. A more mundane issue is how we get psql to print multiple result sets. I have included here a patch that does that, and you can see that new result sets start popping up in the regression tests already. There is also one new error that needs further investigation. We need to think about how the \timing option should work in such scenarios. Right now it does

    start timer
    run query
    fetch result
    stop timer
    print result

If we had multiple result sets, the most natural flow would be

    start timer
    run query
    while result sets
        fetch result
        print result
    stop timer
    print time

but that would include the printing time in the total time, which the current code explicitly does not. We could also temporarily save the result sets, like

    start timer
    run query
    while result sets
        fetch result
    stop timer
    foreach result set
        print result

but that would have a lot more overhead, potentially. Thoughts? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services From 2e5d50cb39b926b29a6081f2387b95621357a4a0 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Thu, 19 Oct 2017 08:18:47 -0400 Subject: [PATCH v1 1/2] psql: Display multiple result sets If a query returns multiple result sets, display all of them instead of only the one that PQexec() returns. Adjust various regression tests to handle the new additional output. 
--- src/bin/psql/common.c | 25 +++-- src/test/regress/expected/copyselect.out | 5 +++ src/test/regress/expected/psql.out | 6 +--- src/test/regress/expected/transactions.out | 56 ++ 4 files changed, 76 insertions(+), 16 deletions(-) diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c index 9b59ee840b..2b6bd56e12 100644 --- a/src/bin/psql/common.c +++ b/src/bin/psql/common.c @@ -1390,22 +1390,25 @@ SendQuery(const char *query) if (pset.timing) INSTR_TIME_SET_CURRENT(before); - results = PQexec(pset.db, query); + PQsendQuery(pset.db, query); /* these operations are included in the timing result: */ ResetCancelConn(); - OK = ProcessResult(&results); - - if (pset.timing) + while ((results = PQgetResult(pset.db))) { - INSTR_TIME_SET_CURRENT(after); - INSTR_TIME_SUBTRACT(after, before); - elapsed_msec = INSTR_TIME_GET_MILLISEC(after); - } + OK = ProcessResult(&results); + + if (pset.timing) + { + INSTR_TIME_SET_CURRENT(after); + INSTR_TIME_SUBTRACT(after, before); + elapsed_msec = INSTR_TIME_GET_MILLISEC(after); + } - /* but printing results isn't: */ - if (OK && results) - OK = PrintQueryResults(results); + /* but printing results isn't: */ + if (OK && results) + OK = PrintQueryResults(results); + } } else { diff --git a/src/test/regress/expected/copyselect.out b/src/test/regress/expected/copyselect.out index 72865fe1eb..a13e1b411b 100644 --- a/src/test/regress/expected/copyselect.out +++ b/src/test/regress/expected/copyselect.out @@ -136,6 +136,11 @@ copy (select 1) to stdout\; copy (select 2) to stdout\; select 0\; select 3; -- create table test3 (c int); select 0\; copy test3 from stdin\; copy test3 from stdin\; select 1; -- 1 + ?column?
Re: [HACKERS] proposal: schema variables
Pavel, I can imagine that a DECLARE command will be introduced as a shortcut for CREATE TEMP VARIABLE, but at this moment I would not open this topic. I am afraid of bikeshedding, and I hope that CREATE TEMP VARIABLE is enough. Language is important because language stays. Your choice of syntax will outlive your code and possibly yourself. My 2 cents Serge
Re: [HACKERS] SQL procedures
2017-10-31 18:23 GMT+01:00 Peter Eisentraut < peter.eisentr...@2ndquadrant.com>: > I've been working on SQL procedures. (Some might call them "stored > procedures", but I'm not aware of any procedures that are not stored, so > that's not a term that I'm using here.) > > Everything that follows is intended to align with the SQL standard, at > least in spirit. > > This first patch does a bunch of preparation work. It adds the > CREATE/ALTER/DROP PROCEDURE commands and the CALL statement to call a > procedure. It also adds ROUTINE syntax which can refer to a function or > procedure. I have extended that to include aggregates. And then there > is a bunch of leg work, such as psql and pg_dump support. The > documentation is a lot of copy-and-paste right now; that can be > revisited sometime. The provided procedural languages (an ever more > confusing term) each needed a small touch-up to handle pg_proc entries > with prorettype == 0. > > Right now, there is no support for returning values from procedures via > OUT parameters. That will need some definitional pondering; and see > also below for a possible alternative. > > With this, you can write procedures that are somewhat compatible with > DB2, MySQL, and to a lesser extent Oracle. > > Separately, I will send patches that implement (the beginnings of) two > separate features on top of this: > > - Transaction control in procedure bodies > > - Returning multiple result sets > > (In various previous discussions on "real stored procedures" or > something like that, most people seemed to have one of these two > features in mind. I think that depends on what other SQL systems one > has worked with previously.) > I'm not sure that disabling RETURN is a good idea. I can imagine that optionally returning something like an int status could be a good idea; cheaper than raising an exception. Regards Pavel > -- > Peter Eisentraut http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
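A minimal sketch of what the first patch enables, as far as one can tell from the description above (object names invented; returning values via OUT parameters is explicitly not supported yet):

CREATE TABLE test_tbl (a int);

CREATE PROCEDURE insert_two(x int, y int)
LANGUAGE SQL
AS $$
INSERT INTO test_tbl VALUES (x);
INSERT INTO test_tbl VALUES (y);
$$;

CALL insert_two(1, 2);

DROP PROCEDURE insert_two(int, int);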
Re: [HACKERS] proposal: schema variables
Hi 2017-10-30 22:42 GMT+01:00 srielau: > Pavel, > > I wouldn't put in the DROP option. > Or at least not in that form of syntax. > > By convention CREATE persists DDL and makes object definitions visible > across sessions. > DECLARE defines session private objects which cannot collide with other > sessions. > > If you want variables with a short lifetime that get dropped at the end of > the transaction that by definition would imply a session private object. So > it ought to be DECLARE'd. > > As far as I can see PG has been following this practice so far. > I think there is a little bit of overlap between the DECLARE and CREATE TEMP VARIABLE commands. With a DECLARE command, you usually have no control over when the variable will be destroyed. For CREATE TEMP there is DROP IF EXISTS, but it should not be needed. It should be very similar to our current temporary tables, which are created in a session-related temp schema. I can imagine that a DECLARE command will be introduced as a shortcut for CREATE TEMP VARIABLE, but at this moment I would not open this topic. I am afraid of bikeshedding, and I hope that CREATE TEMP VARIABLE is enough. Regards Pavel > Cheers > Serge Rielau > Salesforce.com
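To spell out the overlap being discussed, with everything below still proposed syntax (and the DECLARE form explicitly not part of the current proposal):

CREATE TEMP VARIABLE tv AS int DEFAULT 0 ON TRANSACTION END DROP;
    -- explicit lifetime control, analogous to temporary tables

DECLARE tv AS int DEFAULT 0;
    -- hypothetical shortcut mentioned above; under the convention Serge
    -- describes, this would create a session-private object whose
    -- definition is never visible to other sessions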
Re: [HACKERS] SQL procedures
2017-10-31 18:23 GMT+01:00 Peter Eisentraut < peter.eisentr...@2ndquadrant.com>: > I've been working on SQL procedures. (Some might call them "stored > procedures", but I'm not aware of any procedures that are not stored, so > that's not a term that I'm using here.) > > Everything that follows is intended to align with the SQL standard, at > least in spirit. > > This first patch does a bunch of preparation work. It adds the > CREATE/ALTER/DROP PROCEDURE commands and the CALL statement to call a > procedure. It also adds ROUTINE syntax which can refer to a function or > procedure. I have extended that to include aggregates. And then there > is a bunch of leg work, such as psql and pg_dump support. The > documentation is a lot of copy-and-paste right now; that can be > revisited sometime. The provided procedural languages (an ever more > confusing term) each needed a small touch-up to handle pg_proc entries > with prorettype == 0. > > Right now, there is no support for returning values from procedures via > OUT parameters. That will need some definitional pondering; and see > also below for a possible alternative. > > With this, you can write procedures that are somewhat compatible with > DB2, MySQL, and to a lesser extent Oracle. > > Separately, I will send patches that implement (the beginnings of) two > separate features on top of this: > > - Transaction control in procedure bodies > > - Returning multiple result sets > > (In various previous discussions on "real stored procedures" or > something like that, most people seemed to have one of these two > features in mind. I think that depends on what other SQL systems one > has worked with previously.) > Great. I hope I can help with testing. Regards Pavel > > -- > Peter Eisentraut http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
I wrote: > maybe > we just have some run-of-the-mill bugs to find, like the off-the-end > bug I spotted in brin_doupdate. There's apparently at least one > more, but given the error message it must be something like not > checking for a page to have turned into a revmap page. Shouldn't > be too hard to find... Actually, I think it might be as simple as the attached. brin_getinsertbuffer checks for the old page having turned into revmap, but the "samepage" path in brin_doupdate does not :-( With this applied, Alvaro's version of the test case has survived without error for quite a bit longer than its former MTBF. There might still be some issues though in other code paths. regards, tom lane diff --git a/src/backend/access/brin/brin_pageops.c b/src/backend/access/brin/brin_pageops.c index 80f803e..b0f86f3 100644 *** a/src/backend/access/brin/brin_pageops.c --- b/src/backend/access/brin/brin_pageops.c *** brin_doupdate(Relation idxrel, BlockNumb *** 113,121 /* * Check that the old tuple wasn't updated concurrently: it might have ! * moved someplace else entirely ... */ ! if (!ItemIdIsNormal(oldlp)) { LockBuffer(oldbuf, BUFFER_LOCK_UNLOCK); --- 113,127 /* * Check that the old tuple wasn't updated concurrently: it might have ! * moved someplace else entirely, and for that matter the whole page ! * might've become a revmap page. Note that in the first two cases ! * checked here, the "oldlp" we just calculated is garbage; but ! * PageGetItemId() is simple enough that it was safe to do that ! * calculation anyway. */ ! if (!BRIN_IS_REGULAR_PAGE(oldpage) || ! oldoff > PageGetMaxOffsetNumber(oldpage) || ! !ItemIdIsNormal(oldlp)) { LockBuffer(oldbuf, BUFFER_LOCK_UNLOCK); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
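For reference, a rough reconstruction of the kind of test workload involved, pieced together from the statements visible in the log excerpts later in this thread (the actual test script may differ):

CREATE TABLE brin_test (a text);
CREATE INDEX brin_test_a_idx ON brin_test USING brin (a) WITH (pages_per_range = 1);

-- several sessions run this in a loop:
INSERT INTO brin_test
VALUES (repeat(chr(ascii('A') + int4(random() * 61)), int4(random() * 200)));

-- while another session repeatedly runs:
SELECT brin_summarize_new_values('brin_test_a_idx');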
[HACKERS] Transaction control in procedures
Here is a patch that implements transaction control in PL/Python procedures. (This patch goes on top of "SQL procedures" patch v1.) So you can do this: CREATE PROCEDURE transaction_test1() LANGUAGE plpythonu AS $$ for i in range(0, 10): plpy.execute("INSERT INTO test1 (a) VALUES (%d)" % i) if i % 2 == 0: plpy.commit() else: plpy.rollback() $$; CALL transaction_test1(); I started with PL/Python because the internal structures there are more manageable. Obviously, people will want this for PL/pgSQL as well, and I have that in the works. It's not in a usable state, but I have found that the work needed is essentially the same as in PL/Python for example. I have discovered three groups of obstacles that needed addressing to make this work. At this point, the patch is more of a demo of what these issues are, and if we come to satisfactory solutions for each of them, things should fall into place more easily afterwards. 1) While calling CommitTransactionCommand() in the middle of a utility command works just fine (several utility commands do this, of course), calling AbortCurrentTransaction() in a similar way does not. There are a few pieces of code that think that a transaction abort will always result in a return to the main control loop, and so they will just clean away everything. This is what the changes in portalmem.c are about. Some comments there already hint about the issue. No doubt this will need further refinement. I think it would be desirable in general to separate the control flow concerns from the transaction management concerns more cleanly. 2) SPI needs some work. It thinks that it can clean everything away at transaction end. I have found that instead of TopTransactionContext one can use PortalContext and get a more suitable life cycle for the memory. I have played with some variants to make this configurable (e.g., argument to SPI_connect()), but that didn't seem very useful. There are some comments indicating that there might not always be a PortalContext, but the existing tests don't seem to mind. (There was a thread recently about making a fake PortalContext for autovacuum, so maybe the current idea is that we make sure there always is a PortalContext.) Maybe we need another context like StatementContext or ProcedureContext. There also needs to be a way via SPI to end transactions and allowing *some* cleanup to happen but leaving the memory around. I have done that via additional SPI API functions like SPI_commit(), which are then available to PL implementations. I also tried making it possible calling transaction statements directly via SPI_exec() or similar, but that ended up a total disaster. So from the API perspective, I like the current implementation, but the details will no doubt need refinement. 3) The PL implementations themselves allocate memory in transaction-bound contexts for convenience as well. This is usually easy to fix by switching to PortalContext as well. As you see, the PL/Python code part of the patch is actually very small. Changes in other PLs would be similar. Two issues have not been addressed yet: A) It would be desirable to be able to run commands such as VACUUM and CREATE INDEX CONCURRENTLY in a procedure. This needs a bit of thinking and standards-lawyering about the semantics, like where exactly do transactions begin and end in various combinations. It will probably also need a different entry point into SPI, because SPI_exec cannot handle statements ending transactions. 
But so far my assessment is that this can be added in a mostly independent way later on. B) There needs to be some kind of call stack for procedure and function invocations, so that we can track when we are allowed to make transaction controlling calls. The key term in the SQL standard is "non-atomic execution context", which seems specifically devised to cover this scenario. So for example, if you have CALL -> CALL -> CALL, the third call can issue a transaction statement. But if you have CALL -> SELECT -> CALL, then the last call cannot, because the SELECT introduces an atomic execution context. I don't know if we have such a thing yet or something that we could easily latch on to. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services From 1b3318154540d0fe71480ca58938433ecfccabbd Mon Sep 17 00:00:00 2001 From: Peter EisentrautDate: Tue, 31 Oct 2017 14:48:51 -0400 Subject: [PATCH v1] Transaction control in PL/Python procedures Add .commit, .rollback, and .start_transaction functions to plpy module to control transactions in a PL/Python function. Add similar underlying functions to SPI. Some additional cleanup so that transaction commit or abort doesn't blow away data structures still used by the procedure call. --- src/backend/commands/functioncmds.c | 3 + src/backend/executor/spi.c
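To illustrate the execution-context rule with invented names (none of this is implemented; it is just the CALL -> CALL -> CALL versus CALL -> SELECT -> CALL distinction above written out):

CALL p1();       -- non-atomic context: p1 may COMMIT/ROLLBACK, and so may
                 -- any procedure that p1 itself CALLs

SELECT f1();     -- atomic context: if f1() internally executes CALL p2(),
                 -- then p2 must not issue COMMIT or ROLLBACK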
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
Alvaro Herrera writes: > Tom Lane wrote: >> I really don't understand how any of this "let's release the buffer >> lock and then take it back later" logic is supposed to work reliably. > So summarize_range first inserts the placeholder tuple, which is there > purposefully for other processes to update concurrently; then, without > blocking any other process, scan the page range and update the > placeholder tuple (this could take a long time, so it'd be a bad idea > to hold buffer lock for that long). Yeah, agreed, and your upthread point about avoiding deadlock when we need to take two buffer locks at the same time is also something that it's hard to see any other way around. Maybe we just have to find a way to make the existing structure work. The sticking point is that not only might the tuple we expected have been deleted, but someone could have put an unrelated tuple in its place. Are you confident that you can detect that and recover from it? If you're sure that brin_tuples_equal() is trustworthy for this purpose, then maybe we just have some run-of-the-mill bugs to find, like the off-the-end bug I spotted in brin_doupdate. There's apparently at least one more, but given the error message it must be something like not checking for a page to have turned into a revmap page. Shouldn't be too hard to find... regards, tom lane
Re: [HACKERS] SQL procedures
Peter Eisentraut writes: > I've been working on SQL procedures. No comment yet on the big picture here, but ... > The provided procedural languages (an ever more > confusing term) each needed a small touch-up to handle pg_proc entries > with prorettype == 0. Putting 0 in prorettype seems like a pretty bad idea. Why not use VOIDOID for the prorettype value? Or if there is some reason why "void" isn't the right pseudotype, maybe you should invent a new one, analogous to the "trigger" and "event_trigger" pseudotypes. regards, tom lane
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
Tom Lane wrote: > I really don't understand how any of this "let's release the buffer > lock and then take it back later" logic is supposed to work reliably. So summarize_range first inserts the placeholder tuple, which is there purposefully for other processes to update concurrently; then, without blocking any other process, scan the page range and update the placeholder tuple (this could take a long time, so it'd be a bad idea to hold buffer lock for that long). I think what we should do is rethink the locking considerations in brin_doupdate vs. brinGetTupleForHeapBlock, and how they are used in summarize_range and brininsert. In summarize_range, instead of hoping that in some cases we will not need to re-obtain the placeholder tuple, just do that in all cases keeping the buffer locked until the tuple is updated. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Remove inbound links to sql-createuser
David, * Stephen Frost (sfr...@snowman.net) wrote: > * David G. Johnston (david.g.johns...@gmail.com) wrote: > > Since CREATE USER is officially an alias for CREATE ROLE other parts of the > > documentation should point to CREATE ROLE, not CREATE USER. Most do but I > > noticed when looking at CREATE DATABASE that it did not. Further searching > > turned up the usage in client-auth.sgml. That one is questionable since we > > are indeed talking about LOGIN roles there but we are already pointing to > > sql-alterrole instead of sql-alteruser and so changing the create variation > > to point to sql-createrole seems warranted. > > +1. > > Barring objections, I'll commit this in a bit. Pushed to master, with a small bit of word-smithing in the CREATE ROLE docs also. Thanks! Stephen
Re: [HACKERS] Fix dumping pre-10 DBs by pg_dump10 if table "name" exists
On 10/31/17, Tom Lane wrote: > Yeah, there are quite a few unqualified casts in pg_dump, but AFAICS > all the rest are OK because the search_path is just pg_catalog. > > But I did find psql's describe.c making a similar mistake :-(. > Pushed that along with your fix. > > regards, tom lane > Oops. I missed it in "describe.c" because I grepped for the exact "::name" string. Thank you very much! -- Best regards, Vitaly Burovoy
Re: [HACKERS] MERGE SQL Statement for PG11
On Tue, Oct 31, 2017 at 2:25 AM, Simon Riggswrote: > If there are challenges ahead, its reasonable to ask for test cases > for that now especially if you think you know what they already are. > Imagine we go forwards 2 months - if you dislike my patch when it > exists you will submit a test case showing the fault. Why not save us > all the trouble and describe that now? Test Driven Development. I already have, on several occasions now. But if you're absolutely insistent on my constructing the test case in terms of a real SQL statement, then that's what I'll do. Consider this MERGE statement, from your mock documentation: MERGE INTO wines w USING wine_stock_changes s ON s.winename = w.winename WHEN NOT MATCHED AND s.stock_delta > 0 THEN INSERT VALUES(s.winename, s.stock_delta) WHEN MATCHED AND w.stock + s.stock_delta > 0 THEN UPDATE SET stock = w.stock + s.stock_delta ELSE DELETE; Suppose we remove the WHEN NOT MATCHED case, leaving us with: MERGE INTO wines w USING wine_stock_changes s ON s.winename = w.winename WHEN MATCHED AND w.stock + s.stock_delta > 0 THEN UPDATE SET stock = w.stock + s.stock_delta ELSE DELETE; We now have a MERGE that will not INSERT, but will continue to UPDATE and DELETE. (It's implied that your syntax cannot do this at all, because you propose use the ON CONFLICT infrastructure, but I think we have to imagine a world in which that restriction either never existed or has subsequently been lifted.) The problem here is: Iff the first statement uses ON CONFLICT infrastructure, doesn't the absence of WHEN NOT MATCHED imply different semantics for the remaining updates and deletes in the second version of the query? You've removed what seems like a neat adjunct to the MERGE, but it actually changes everything else too when using READ COMMITTED. Isn't that pretty surprising? If you're not clear on what I mean, see my previous remarks on EPQ, live lock, and what a CONFLICT could be in READ COMMITTED mode. Concurrent activity at READ COMMITTED mode can be expected to significantly alter the outcome here. Why not just always use the behavior that the second query requires, which is very much like an UPDATE FROM with an outer join that can sometimes do deletes (and inserts)? We should always use an MVCC snapshot, and never play ON CONFLICT style games with visibility/dirty snapshots. > It's difficult to discuss anything with someone that refuses to > believe that there are acceptable ways around things. I believe there > are. Isn't that blind faith? Again, it seems like you haven't really tried to convince me. > If you can calm down the rhetoric we can work together, but if you > continue to grandstand it makes it more difficult. I'm trying to break the deadlock, by getting you to actually consider what I'm saying. I don't enjoy confrontation. Currently, it seems like you're just ignoring my objections, which you actually could fairly easily work through. That is rather frustrating. > You've said its possible another way. Show that assertion is actually > true. We're all listening, me especially, for the technical details. My proposal, if you want to call it that, has the merit of actually being how MERGE works in every other system. Both Robert and Stephen seem to be almost sold on what I suggest, so I think that I've probably already explained my position quite well. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
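To make the "UPDATE FROM with an outer join" comparison concrete, the second MERGE above is loosely equivalent to the following pair of statements. They are not atomic and do not handle each source row exactly once, which is precisely the gap MERGE is meant to close; the point is only that both would run under ordinary MVCC snapshots:

DELETE FROM wines w
 USING wine_stock_changes s
 WHERE s.winename = w.winename
   AND w.stock + s.stock_delta <= 0;

UPDATE wines w
   SET stock = w.stock + s.stock_delta
  FROM wine_stock_changes s
 WHERE s.winename = w.winename;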
Re: [HACKERS] Re: Anyone have experience benchmarking very high effective_io_concurrency on NVME's?
Hi, On 10/31/2017 04:48 PM, Greg Stark wrote: > On 31 October 2017 at 07:05, Chris Traverswrote: >> Hi; >> >> After Andres's excellent talk at PGConf we tried benchmarking >> effective_io_concurrency on some of our servers and found that those which >> have a number of NVME storage volumes could not fill the I/O queue even at >> the maximum setting (1000). > > And was the system still i/o bound? If the cpu was 100% busy then > perhaps Postgres just can't keep up with the I/O system. It would > depend on workload though, if you start many very large sequential > scans you may be able to push the i/o system harder. > > Keep in mind effective_io_concurrency only really affects bitmap > index scans (and to a small degree index scans). It works by issuing > posix_fadvise() calls for upcoming buffers one by one. That gets > multiple spindles active but it's not really going to scale to many > thousands of prefetches (and effective_io_concurrency of 1000 > actually means 7485 prefetches). At some point those i/o are going > to start completing before Postgres even has a chance to start > processing the data. > Yeah, initiating the prefetches is not expensive, but it's not free either. So there's a trade-off between time spent on prefetching and processing the data. I believe this may be actually illustrated using Amdahl's law - the I/O is the parallel part, and processing the data is the serial part. And no matter what you do, the device only has so much bandwidth, which defines the maximum possible speedup (compared to "no prefetch" case). Furthermore, the device does not wait for all the I/O requests to be submitted - it won't wait for 1000 requests and then go "OMG! There's a lot of work to do!" It starts processing the requests as they arrive, and some of them will complete before you're done with submitting the rest, so you'll never see all the requests in the queue at once. And of course, iostat and other tools only give you "average queue length", which is mostly determined by the average throughput. In my experience (on all types of storage, including SSDs and NVMe), the performance quickly and significantly improves once you start increasing the value (say, to 8 or 16, maybe 64). And then the gains become much more modest - not because the device could not handle more, but because of the prefetch/processing ratio reached the optimal value. But all this is actually per-process, if you can run multiple backends (particularly when doing bitmap index scans), I'm sure you'll see the queues being more full. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
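For anyone who wants to experiment, the setting only really kicks in for bitmap heap scans, so a test query has to actually produce one (table and predicate invented):

SET effective_io_concurrency = 64;   -- can also be set per tablespace

EXPLAIN (ANALYZE, BUFFERS)
SELECT sum(val)
FROM big_table
WHERE indexed_col BETWEEN 10000 AND 2000000;
-- look for "Bitmap Heap Scan" in the plan; plain sequential scans
-- do not use this kind of prefetching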
Re: [HACKERS] Statement-level rollback
From: Simon Riggs On 14 August 2017 at 23:58, Peter Eisentraut wrote: > On 2/28/17 02:39, Tsunakawa, Takayuki wrote: >> The code for stored functions is not written yet, but I'd like your feedback for the specification and design based on the current patch. I'll add this patch to CommitFest 2017-3. > > This patch needs to be rebased for the upcoming commit fest. I'm willing to review this if the patch is going to be actively worked on. I'm very sorry I couldn't reply to your kind offer. I rebased the patch and will add it to CF 2017/11. I hope I will complete the patch in this CF. Regards Takayuki Tsunakawa stmt_rollback_v2.patch Description: Binary data
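For readers new to the thread, the behavior being targeted can be sketched with savepoints, which is the semantics statement-level rollback is generally understood to emulate (table invented; the exact knob the patch adds is described in the patch itself):

BEGIN;
INSERT INTO t VALUES (1);
INSERT INTO t VALUES (1/0);      -- error; by default the whole transaction
                                 -- is now aborted and COMMIT ends as ROLLBACK
COMMIT;

-- statement-level rollback behaves as if each statement were wrapped
-- in its own savepoint, so only the failed statement is undone:
BEGIN;
SAVEPOINT s; INSERT INTO t VALUES (1);   RELEASE s;
SAVEPOINT s; INSERT INTO t VALUES (1/0); ROLLBACK TO s;
SAVEPOINT s; INSERT INTO t VALUES (2);   RELEASE s;
COMMIT;                          -- rows 1 and 2 are committed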
Re: [HACKERS] SQL procedures
On 31 October 2017 at 18:23, Peter Eisentraut wrote: > I've been working on SQL procedures. (Some might call them "stored > procedures", but I'm not aware of any procedures that are not stored, so > that's not a term that I'm using here.) I guess that the DO command might have a variant to allow you to execute a procedure that isn't stored? Not suggesting you implement that, just thinking about why/when the "stored" word would be appropriate. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Fix dumping pre-10 DBs by pg_dump10 if table "name" exists
Vitaly Burovoy writes: > I left another "NULL::name AS rolname" at > src/bin/pg_dump/pg_dump.c:2978 because I can't check it (remoteVersion < > 9) and it is under a strict "selectSourceSchema(fout, > "pg_catalog");" schema set. Yeah, there are quite a few unqualified casts in pg_dump, but AFAICS all the rest are OK because the search_path is just pg_catalog. But I did find psql's describe.c making a similar mistake :-(. Pushed that along with your fix. regards, tom lane
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
Tom Lane wrote: > So in a few more runs this morning using Alvaro's simplified test case, > I have seen the following behaviors not previously reported: > 1. Crashes in PageIndexTupleOverwrite, which has the same "invalid index > offnum: %u" error report as PageIndexTupleDeleteNoCompact. I note the > same message appears in plain PageIndexTupleDelete as well. > I think we've been too hasty to assume all instances of this came out of > PageIndexTupleDeleteNoCompact. Ah, I wasn't paying close attention to the originator routine of the message, but you're right, I see this one too. > 2. Crashes in the data-insertion process, not only the process running > summarize_range: Yeah, I saw these. I was expecting it, since the two routines (brininsert and summarize_range) pretty much share the insertion protocol. > I really don't understand how any of this "let's release the buffer > lock and then take it back later" logic is supposed to work reliably. Yeah, evidently that was way too optimistic and I'll need to figure out a better mechanism to handle this. The intention was to avoid deadlocks while locking the target page for the insertion: by having both pages start unlocked we can simply lock them in block number order. If we keep the page containing the tuple locked, I don't see how to reliably avoid a deadlock while acquiring a buffer to insert the new tuple. > BTW, while I'm bitching, it seems fairly insane from a concurrency > standpoint that brin_getinsertbuffer is calling RecordPageWithFreeSpace > while holding at least one and possibly two buffer locks. Shouldn't > that be done someplace else? Hmm. I spent a lot of effort (commit ccc4c074994d) to avoid leaving pages uninitialized / unrecorded in FSM. I left this on purpose on the rationale that trying to fix it would make the callsites more convoluted (the retry logic doesn't help). But as I recall this was supposed to be done only in the rare case where the buffer could not be returned to caller ... but that's not what the current code does, so there is something wrong there. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Remove secondary checkpoint
On Tue, Oct 31, 2017 at 2:00 PM, Simon Riggswrote: > On 30 October 2017 at 18:58, Simon Riggs wrote: >> On 30 October 2017 at 15:22, Simon Riggs wrote: >> You forgot to update this formula in xlog.c: distance = (2.0 + CheckPointCompletionTarget) * CheckPointDistanceEstimate; /* add 10% for good measure. */ distance *= 1.10; You can guess easily where the mistake is. >>> >>> Doh - fixed that before posting, so I must have sent the wrong patch. >>> >>> It's the 10%, right? ;-) >> >> So, there are 2 locations that mention 2.0 in xlog.c >> >> I had already fixed one, which is why I remembered editing it. >> >> The other one you mention has a detailed comment above it explaining >> why it should be 2.0 rather than 1.0, so I left it. >> >> Let me know if you still think it should be removed? I can see the >> argument both ways. >> Or maybe we need another patch to account for manual checkpoints. > > Revised patch to implement this Here is the comment and the formula: * The reason this calculation is done from the prior checkpoint, not the * one that just finished, is that this behaves better if some checkpoint * cycles are abnormally short, like if you perform a manual checkpoint * right after a timed one. The manual checkpoint will make almost a full * cycle's worth of WAL segments available for recycling, because the * segments from the prior's prior, fully-sized checkpoint cycle are no * longer needed. However, the next checkpoint will make only few segments * available for recycling, the ones generated between the timed * checkpoint and the manual one right after that. If at the manual * checkpoint we only retained enough segments to get us to the next timed * one, and removed the rest, then at the next checkpoint we would not * have enough segments around for recycling, to get us to the checkpoint * after that. Basing the calculations on the distance from the prior redo * pointer largely fixes that problem. */ distance = (2.0 + CheckPointCompletionTarget) * CheckPointDistanceEstimate; While the mention about a manual checkpoint happening after a timed one will cause a full range of WAL segments to be recycled, it is not actually true that segments of the prior's prior checkpoint are not needed, because with your patch the segments of the prior checkpoint are getting recycled. So it seems to me that based on that the formula ought to use 1.0 instead of 2.0... -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
So in a few more runs this morning using Alvaro's simplified test case, I have seen the following behaviors not previously reported: 1. Crashes in PageIndexTupleOverwrite, which has the same "invalid index offnum: %u" error report as PageIndexTupleDeleteNoCompact. I note the same message appears in plain PageIndexTupleDelete as well. I think we've been too hasty to assume all instances of this came out of PageIndexTupleDeleteNoCompact. 2. Crashes in the data-insertion process, not only the process running summarize_range: #1 0x003b78a33c75 in abort () at abort.c:92 #2 0x0086e527 in errfinish (dummy=) at elog.c:557 #3 0x0086f334 in elog_finish (elevel=, fmt=) at elog.c:1378 #4 0x0075d48f in PageIndexTupleDeleteNoCompact ( rel=, buf=2772, page=0x7f0188438080 "\002", offnum=39) at bufpage.c:995 #5 0x00476fd2 in brin_doupdate (idxrel=0x7f0186633e90, pagesPerRange=1, revmap=0xba, heapBlk=2542, oldbuf=2772, oldoff=39, origtup=0x2784010, origsz=8, newtup=0x2784098, newsz=400, samepage=0 '\000') at brin_pageops.c:270 #6 0x00475430 in brininsert (idxRel=0x7f0186633e90, values=0x7ffce7b827f0, nulls=0x7ffce7b828f0 "", heaptid=0x279fb3c, heapRel=, checkUnique=, indexInfo=0x279e370) at brin.c:292 #7 0x0061da18 in ExecInsertIndexTuples (slot=0x279e638, tupleid=0x279fb3c, estate=0x279dd10, noDupErr=0 '\000', specConflict=0x0, arbiterIndexes=0x0) at execIndexing.c:387 The postmaster log for this, with even more debug logging thrown in: 2017-10-31 11:58:03.466 EDT [23506] LOG: summarize_range(brin_test_a_idx,2542,2543) created placeholder at 68,39 2017-10-31 11:58:03.466 EDT [23506] STATEMENT: select brin_summarize_new_values('brin_test_a_idx') 2017-10-31 11:58:03.466 EDT [23506] LOG: brin_doupdate(brin_test_a_idx) at 68,39: getinsertbuffer returned page 70 2017-10-31 11:58:03.466 EDT [23506] STATEMENT: select brin_summarize_new_values('brin_test_a_idx') 2017-10-31 11:58:03.466 EDT [23506] LOG: deleting tuple 39 (of 39) in rel brin_test_a_idx page 68 2017-10-31 11:58:03.466 EDT [23506] STATEMENT: select brin_summarize_new_values('brin_test_a_idx') 2017-10-31 11:58:03.466 EDT [23506] LOG: brin_doupdate(brin_test_a_idx) at 68,39: placed at 70,1 2017-10-31 11:58:03.466 EDT [23506] STATEMENT: select brin_summarize_new_values('brin_test_a_idx') 2017-10-31 11:58:03.466 EDT [23509] LOG: brin_doupdate(brin_test_a_idx) at 68,39: getinsertbuffer returned page 70 2017-10-31 11:58:03.466 EDT [23509] STATEMENT: insert into brin_test values (repeat(chr(ascii('A') + int4(random() * 61)), int4(random() * 200))); 2017-10-31 11:58:03.466 EDT [23509] LOG: deleting tuple 39 (of 38) in rel brin_test_a_idx page 68 2017-10-31 11:58:03.466 EDT [23509] STATEMENT: insert into brin_test values (repeat(chr(ascii('A') + int4(random() * 61)), int4(random() * 200))); 2017-10-31 11:58:03.466 EDT [23509] PANIC: invalid index offnum: 39 2017-10-31 11:58:03.466 EDT [23509] STATEMENT: insert into brin_test values (repeat(chr(ascii('A') + int4(random() * 61)), int4(random() * 200))); 2017-10-31 11:58:03.466 EDT [23506] LOG: summarize_range(brin_test_a_idx,2542,2543): update at 68,39 succeeded 2017-10-31 11:58:03.466 EDT [23506] STATEMENT: select brin_summarize_new_values('brin_test_a_idx') 2017-10-31 11:58:04.517 EDT [23493] LOG: server process (PID 23509) was terminated by signal 6: Aborted So what evidently happened here is that brininsert decided it needed to insert into the placeholder tuple that summarize_range had just created, but by the time it got lock on the buffer, summarize_range had deleted the placeholder and instead created a 
new tuple on another page. HAH: I see how the connection to PageIndexTupleDeleteNoCompact's change applies. After re-locking the buffer, brin_doupdate tries to check whether the tuple's been deleted, but it needs to do it more like this, else it might be accessing off the end of the lp array: /* * Check that the old tuple wasn't updated concurrently: it might have * moved someplace else entirely ... */ - if (!ItemIdIsNormal(oldlp)) + if (oldoff > PageGetMaxOffsetNumber(oldpage) || + !ItemIdIsNormal(oldlp)) { LockBuffer(oldbuf, BUFFER_LOCK_UNLOCK); This is a pre-existing bug, but before 24992c6d, it would only have manifested in the (very?) unusual corner case that PageIndexDeleteNoCompact was able to reset the page to empty. However, when I fix that and then run Alvaro's test case, I get ERROR: corrupted BRIN index: inconsistent range map so there's more creepy-crawlies around here somewhere. I really don't understand how any of this "let's release the buffer lock and then take it back later" logic is supposed to work reliably. BTW, while I'm bitching, it seems fairly insane from a concurrency standpoint that brin_getinsertbuffer is calling RecordPageWithFreeSpace while holding at
Re: [HACKERS] Fix dumping pre-10 DBs by pg_dump10 if table "name" exists
On 10/31/17, Tom Lane wrote:
> Vitaly Burovoy writes:
>> Recently my colleagues found a bug.
>
>> - "SELECT 'bigint'::name AS sequence_type, "
>> + "SELECT 'bigint'::pg_catalog.name AS sequence_type,
>
> Good catch, but I think we could simplify this by just omitting the cast
> altogether:
>
> - "SELECT 'bigint'::name AS sequence_type, "
> + "SELECT 'bigint' AS sequence_type,
>
> pg_dump doesn't particularly care whether the column comes back marked
> as 'name' or 'text' or 'unknown'.
>
> regards, tom lane

OK, just for convenience I'm attaching your version of the fix. I left another "NULL::name AS rolname" at src/bin/pg_dump/pg_dump.c:2978 alone because I can't check it (remoteVersion < 9), and it runs under a strict "selectSourceSchema(fout, "pg_catalog");" schema setting.

-- 
Best regards, Vitaly Burovoy

0001-Fix-dumping-schema-if-a-table-named-name-exists.ver2.patch Description: Binary data

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: Anyone have experience benchmarking very high effective_io_concurrency on NVME's?
On 31 October 2017 at 07:05, Chris Travers wrote:
> Hi;
>
> After Andres's excellent talk at PGConf we tried benchmarking
> effective_io_concurrency on some of our servers and found that those which
> have a number of NVME storage volumes could not fill the I/O queue even at
> the maximum setting (1000).

And was the system still i/o bound? If the cpu was 100% busy then perhaps Postgres just can't keep up with the I/O system. It would depend on workload though, if you start many very large sequential scans you may be able to push the i/o system harder.

Keep in mind effective_io_concurrency only really affects bitmap index scans (and to a small degree index scans). It works by issuing posix_fadvise() calls for upcoming buffers one by one. That gets multiple spindles active but it's not really going to scale to many thousands of prefetches (and effective_io_concurrency of 1000 actually means 7485 prefetches). At some point those i/o are going to start completing before Postgres even has a chance to start processing the data.

-- 
greg

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
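For reference, the arithmetic behind that 7485 figure can be sketched like this, modeled on ComputeIoConcurrency() in src/backend/storage/buffer/bufmgr.c (a sketch, not the actual backend code): with n assumed spindles, the target prefetch distance is n times the nth harmonic number.

	#include <stdio.h>

	/*
	 * Target prefetch distance for a given effective_io_concurrency:
	 * the sum n/1 + n/2 + ... + n/n, i.e. n * H(n).
	 */
	static double
	prefetch_pages_for(int effective_io_concurrency)
	{
		double		pages = 0.0;
		int			i;

		for (i = 1; i <= effective_io_concurrency; i++)
			pages += (double) effective_io_concurrency / (double) i;
		return pages;
	}

	int
	main(void)
	{
		printf("%.0f\n", prefetch_pages_for(1000));	/* prints 7485 */
		return 0;
	}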
[HACKERS] postgres_fdw: Add support for INSERT OVERRIDING clause
It has been pointed out to me that the command deparsing in postgres_fdw does not support the INSERT OVERRIDING clause that was added in PG10. Here is a patch that seems to fix that. I don't know much about this, whether anything else needs to be added or whether there should be tests. Perhaps someone more familiar with postgres_fdw details can check it.

-- 
Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From 097ecf61a9c2740b02a21be19a92aed92388346a Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Tue, 31 Oct 2017 11:44:14 -0400
Subject: [PATCH] postgres_fdw: Add support for INSERT OVERRIDING clause

---
 contrib/postgres_fdw/deparse.c | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 2af8364010..8eb227605f 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -1573,7 +1573,21 @@ deparseInsertSql(StringInfo buf, PlannerInfo *root,
 		deparseColumnRef(buf, rtindex, attnum, root, false);
 	}

-	appendStringInfoString(buf, ") VALUES (");
+	appendStringInfoString(buf, ") ");
+
+	switch (root->parse->override)
+	{
+		case OVERRIDING_USER_VALUE:
+			appendStringInfoString(buf, "OVERRIDING USER VALUE ");
+			break;
+		case OVERRIDING_SYSTEM_VALUE:
+			appendStringInfoString(buf, "OVERRIDING SYSTEM VALUE ");
+			break;
+		default:
+			break;
+	}
+
+	appendStringInfoString(buf, "VALUES (");

 	pindex = 1;
 	first = true;
-- 
2.14.3

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
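To illustrate, these are the kinds of remote statements the patch can now deparse (the foreign table and columns here are hypothetical, not taken from the patch or its tests):

	-- INSERTs into a foreign table whose remote counterpart has an
	-- identity column can now carry the clause through to the remote side:
	INSERT INTO remote_items (id, label) OVERRIDING SYSTEM VALUE
	    VALUES (42, 'explicit id');

	INSERT INTO remote_items (id, label) OVERRIDING USER VALUE
	    VALUES (99, 'this id is ignored in favor of the generated one');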
[HACKERS] Consistently catch errors from Python _New() functions
While reviewing some unrelated code, I noticed that we are handling error conditions from Python API functions such as PyList_New() and PyDict_New() in pretty random ways or not at all. Here is a patch to fix that. Arguably, this is a bug fix, but I'm not sure whether it's worth meddling with this in the back branches. Maybe only the places where the errors are not caught at all should be fixed there. Comments welcome.

-- 
Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From eca2709cd1654e2fc37b75fa3bdfe5e199c86cb9 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Tue, 31 Oct 2017 10:49:36 -0400
Subject: [PATCH] Consistently catch errors from Python _New() functions

Python Py*_New() functions can fail and return NULL in out-of-memory conditions. The previous code handled that inconsistently or not at all. This change organizes that better. If we are in a function that is called from Python, we just check for failure and return NULL ourselves, which will cause any exception information to be passed up. If we are called from PostgreSQL, we consistently create an "out of memory" error.
---
 contrib/hstore_plpython/hstore_plpython.c | 4
 contrib/ltree_plpython/ltree_plpython.c   | 4
 src/pl/plpython/plpy_cursorobject.c       | 19 ---
 src/pl/plpython/plpy_exec.c               | 10 +-
 src/pl/plpython/plpy_main.c               | 2 +-
 src/pl/plpython/plpy_plpymodule.c         | 6 --
 src/pl/plpython/plpy_procedure.c          | 2 ++
 src/pl/plpython/plpy_resultobject.c       | 11 +++
 src/pl/plpython/plpy_spi.c                | 25 +
 src/pl/plpython/plpy_typeio.c             | 4 +++-
 10 files changed, 67 insertions(+), 20 deletions(-)

diff --git a/contrib/hstore_plpython/hstore_plpython.c b/contrib/hstore_plpython/hstore_plpython.c
index 22366bd40f..218e6612b1 100644
--- a/contrib/hstore_plpython/hstore_plpython.c
+++ b/contrib/hstore_plpython/hstore_plpython.c
@@ -93,6 +93,10 @@ hstore_to_plpython(PG_FUNCTION_ARGS)
 	PyObject   *dict;

 	dict = PyDict_New();
+	if (!dict)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory")));

 	for (i = 0; i < count; i++)
 	{

diff --git a/contrib/ltree_plpython/ltree_plpython.c b/contrib/ltree_plpython/ltree_plpython.c
index ae9b90dd10..e88636a0a9 100644
--- a/contrib/ltree_plpython/ltree_plpython.c
+++ b/contrib/ltree_plpython/ltree_plpython.c
@@ -46,6 +46,10 @@ ltree_to_plpython(PG_FUNCTION_ARGS)
 	ltree_level *curlevel;

 	list = PyList_New(in->numlevel);
+	if (!list)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory")));

 	curlevel = LTREE_FIRST(in);
 	for (i = 0; i < in->numlevel; i++)

diff --git a/src/pl/plpython/plpy_cursorobject.c b/src/pl/plpython/plpy_cursorobject.c
index 0108471bfe..4af7c6621f 100644
--- a/src/pl/plpython/plpy_cursorobject.c
+++ b/src/pl/plpython/plpy_cursorobject.c
@@ -464,15 +464,20 @@ PLy_cursor_fetch(PyObject *self, PyObject *args)
 			Py_DECREF(ret->rows);

 			ret->rows = PyList_New(SPI_processed);
-
-			for (i = 0; i < SPI_processed; i++)
+			if (!ret->rows)
 			{
-				PyObject   *row = PLyDict_FromTuple(&ret->result,
-													SPI_tuptable->vals[i],
-													SPI_tuptable->tupdesc);
-
-				PyList_SetItem(ret->rows, i, row);
+				Py_DECREF(ret);
+				ret = NULL;
 			}
+			else
+				for (i = 0; i < SPI_processed; i++)
+				{
+					PyObject   *row = PLyDict_FromTuple(&ret->result,
+														SPI_tuptable->vals[i],
+														SPI_tuptable->tupdesc);
+
+					PyList_SetItem(ret->rows, i, row);
+				}
 		}

 		SPI_freetuptable(SPI_tuptable);

diff --git a/src/pl/plpython/plpy_exec.c b/src/pl/plpython/plpy_exec.c
index 26f61dd0f3..7cfa146c82 100644
--- a/src/pl/plpython/plpy_exec.c
+++ b/src/pl/plpython/plpy_exec.c
@@ -434,6 +434,9 @@
PLy_function_build_args(FunctionCallInfo fcinfo, PLyProcedure *proc)
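The patch excerpt above is truncated; in summary, the two conventions it establishes look like this (a condensed sketch of the diff, not the literal committed code):

	/* Called from PostgreSQL: turn a NULL result into a PostgreSQL error. */
	PyObject   *dict = PyDict_New();

	if (!dict)
		ereport(ERROR,
				(errcode(ERRCODE_OUT_OF_MEMORY),
				 errmsg("out of memory")));

	/*
	 * Called from Python: return NULL so the pending Python exception
	 * propagates up to the interpreter.
	 */
	PyObject   *list = PyList_New(length);

	if (!list)
		return NULL;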
Re: [HACKERS] [PATCH] Generic type subscripting
On Sun, Oct 29, 2017 at 10:56:19PM +0100, Dmitry Dolgov wrote:
>
> So, here is the new version of patch that contains modifications we've
> discussed, namely:
>
> * store oids of `parse`, `fetch` and `assign` functions
>
> * introduce dependencies from a data type
>
> * as a side effect of previous two I also eliminated some unnecessary
> arguments in `parse` function.

Thank you for the new version of the patch. I have some notes.

Documentation

The documentation compiles, but there are warnings about end-tags; it is now necessary to use fully named end-tags:

=# make -C doc/src/sgml check
/usr/sbin/onsgmls:json.sgml:574:20:W: empty end-tag
...

The documentation is also out of date:
- catalogs.sgml needs information about the additional pg_type fields
- create_type.sgml needs information about the subscripting_parse, subscripting_assign and subscripting_fetch options
- xsubscripting.sgml is out of date

Code

I think it is necessary to check the Oids of subscripting_parse, subscripting_assign and subscripting_fetch, maybe within TypeCreate(). Otherwise the following cases are possible:

=# CREATE TYPE custom (
   internallength = 8,
   input = custom_in,
   output = custom_out,
   subscripting_parse = custom_subscripting_parse);
=# CREATE TYPE custom (
   internallength = 8,
   input = custom_in,
   output = custom_out,
   subscripting_fetch = custom_subscripting_fetch);

Are all subscripting_* fields mandatory? If so, then if the user provides at least one of them, all of them should be provided. Should all types support assignment via subscript? If not, then the subscripting_assign parameter is optional.

> +Datum
> +jsonb_subscript_parse(PG_FUNCTION_ARGS)
> +{
> +	bool	isAssignment = PG_GETARG_BOOL(0);

and

> +Datum
> +custom_subscripting_parse(PG_FUNCTION_ARGS)
> +{
> +	bool	isAssignment = PG_GETARG_BOOL(0);

Here isAssignment is an unused variable, so it could be removed.

> +
> +	scratch->d.sbsref.eval_finfo = eval_finfo;
> +	scratch->d.sbsref.nested_finfo = nested_finfo;
> +

As I mentioned earlier, we need to assign eval_finfo and nested_finfo only for the EEOP_SBSREF_OLD, EEOP_SBSREF_ASSIGN and EEOP_SBSREF_FETCH steps. Also they should be assigned before calling ExprEvalPushStep(), not after. Otherwise some bugs may appear in the future.

> -	ArrayRef   *aref = makeNode(ArrayRef);
> +	NodeTag		sbstag = nodeTag(src_expr);
> +	Size		nodeSize = sizeof(SubscriptingRef);
> +	SubscriptingRef *sbsref = (SubscriptingRef *) newNode(nodeSize, sbstag);

Is there any necessity to use newNode() instead of using makeNode()? The previous code was shorter.

There are no changes in execnodes.h except a removed line, so I think execnodes.h could be removed from the patch.

>
> I'm going to make few more improvements, but in the meantime I hope we can
> continue to review the patch.

I will wait.

-- 
Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fix dumping pre-10 DBs by pg_dump10 if table "name" exists
Vitaly Burovoy writes:
> Recently my colleagues found a bug.

> - "SELECT 'bigint'::name AS sequence_type, "
> + "SELECT 'bigint'::pg_catalog.name AS sequence_type,

Good catch, but I think we could simplify this by just omitting the cast altogether:

- "SELECT 'bigint'::name AS sequence_type, "
+ "SELECT 'bigint' AS sequence_type,

pg_dump doesn't particularly care whether the column comes back marked as 'name' or 'text' or 'unknown'.

regards, tom lane

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add some const decorations to prototypes
Peter Eisentraut writes:
> Here is a patch that adds const decorations to many char * arguments in
> functions. It should have no impact otherwise; there are very few code
> changes caused by it.

+1 in general ...

> Some functions have a strtol()-like behavior
> where they take in a const char * and return a pointer into that as
> another argument. In those cases, I added a cast or two.

... but I'm not sure that it's an improvement in cases where you have to cast away the const somewhere else. I realize that strtol has an ancient pedigree, but I do not think it's very good design.

regards, tom lane

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
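The complaint can be illustrated with a minimal sketch of the strtol()-style interface (an illustration, not code from the patch): the input is const-qualified, yet endptr is a plain char **, so the implementation has to cast away const to hand back a pointer into the caller's string.

	/* A toy strtol(): the (char *) cast below is the wart under discussion. */
	static long
	toy_strtol(const char *nptr, char **endptr)
	{
		long		val = 0;

		while (*nptr >= '0' && *nptr <= '9')
			val = val * 10 + (*nptr++ - '0');
		if (endptr)
			*endptr = (char *) nptr;	/* cast away const */
		return val;
	}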
[HACKERS] Add some const decorations to prototypes
Here is a patch that adds const decorations to many char * arguments in functions. It should have no impact otherwise; there are very few code changes caused by it. Some functions have a strtol()-like behavior where they take in a const char * and return a pointer into that as another argument. In those cases, I added a cast or two. Generally, I find these const decorations useful as easy function documentation.

-- 
Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From a8163e84c83887e4a3b81642c137995932701bb5 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Tue, 31 Oct 2017 10:34:31 -0400
Subject: [PATCH] Add some const decorations to prototypes

---
 contrib/cube/cubeparse.y | 12
 contrib/dict_xsyn/dict_xsyn.c | 2 +-
 contrib/fuzzystrmatch/dmetaphone.c | 4 +--
 contrib/pgcrypto/pgcrypto.c | 4 +--
 contrib/seg/seg.c | 4 +--
 contrib/seg/segdata.h | 2 +-
 contrib/seg/segparse.y | 4 +--
 contrib/unaccent/unaccent.c | 2 +-
 contrib/uuid-ossp/uuid-ossp.c | 2 +-
 src/backend/access/common/reloptions.c | 12
 src/backend/access/gist/gistbuild.c | 2 +-
 src/backend/access/transam/xact.c | 6 ++--
 src/backend/access/transam/xlogarchive.c | 4 +--
 src/backend/catalog/heap.c | 10 +++
 src/backend/commands/comment.c | 4 +--
 src/backend/commands/event_trigger.c | 4 +--
 src/backend/commands/extension.c | 4 +--
 src/backend/commands/indexcmds.c | 8 +++---
 src/backend/commands/opclasscmds.c | 2 +-
 src/backend/commands/tablecmds.c | 16 +--
 src/backend/commands/typecmds.c | 6 ++--
 src/backend/commands/view.c | 2 +-
 src/backend/libpq/auth.c | 24
 src/backend/libpq/hba.c | 6 ++--
 src/backend/parser/parse_expr.c | 2 +-
 src/backend/parser/parse_func.c | 4 +--
 src/backend/parser/parse_relation.c | 8 +++---
 src/backend/parser/parse_target.c | 2 +-
 src/backend/port/dynloader/darwin.c | 8 +++---
 src/backend/port/dynloader/darwin.h | 4 +--
 src/backend/port/dynloader/hpux.c | 4 +--
 src/backend/port/dynloader/hpux.h | 4 +--
 src/backend/port/dynloader/linux.c | 4 +--
 src/backend/postmaster/postmaster.c | 2 +-
 src/backend/replication/basebackup.c | 8 +++---
 src/backend/rewrite/rewriteDefine.c | 4 +--
 src/backend/snowball/dict_snowball.c | 2 +-
 src/backend/storage/lmgr/lwlock.c | 8 +++---
 src/backend/tsearch/dict_thesaurus.c | 2 +-
 src/backend/tsearch/spell.c | 4 +--
 src/backend/utils/adt/float.c | 16 +--
 src/backend/utils/adt/genfile.c | 2 +-
 src/backend/utils/adt/ruleutils.c | 4 +--
 src/backend/utils/adt/varlena.c | 2 +-
 src/backend/utils/adt/xml.c | 32 +++---
 src/bin/initdb/initdb.c | 12
 src/bin/pg_dump/pg_backup_db.c | 2 +-
 src/bin/pg_dump/pg_backup_db.h | 2 +-
 src/bin/pg_rewind/fetch.c | 2 +-
 src/bin/pg_rewind/fetch.h | 2 +-
 src/bin/pg_upgrade/option.c | 6 ++--
 src/bin/pg_upgrade/pg_upgrade.c | 4 +--
 src/bin/pg_waldump/pg_waldump.c | 2 +-
 src/bin/pgbench/pgbench.c | 4 +--
 src/include/access/gist_private.h | 2 +-
 src/include/access/reloptions.h | 14 +-
 src/include/access/xact.h | 6 ++--
 src/include/access/xlog_internal.h | 4 +--
 src/include/catalog/heap.h | 2 +-
 src/include/commands/comment.h | 4 +--
 src/include/commands/defrem.h | 4 +--
 src/include/commands/typecmds.h | 2 +-
 src/include/commands/view.h | 2 +-
 src/include/executor/tablefunc.h | 8 +++---
 src/include/parser/parse_relation.h | 6 ++--
 src/include/parser/parse_target.h | 2 +-
 src/include/postmaster/bgworker.h | 2 +-
 src/include/rewrite/rewriteDefine.h | 2 +-
Re: [HACKERS] Query regarding permission on table_column%type access
Stephen Frost writes:
> * Neha Sharma (neha.sha...@enterprisedb.com) wrote:
>> I have observed that even if the user does not have permission on a
>> table (created by some other user), the function parameter still can have
>> a parameter of that table_column%type.

> This is because the creation of the table also creates a type of the
> same name and the type's permissions are independent of the table's. I
> imagine that you could REVOKE USAGE ON TYPE from the type and deny
> access to that type if you wanted to.

Right. (I checked, seems to work as expected.)

> I'm not sure that we should change the REVOKE on the table-level to also
> mean to REVOKE access to the type automatically (and what happens if you
> GRANT the access back for the table..?

It seems pretty silly for privileges on table rowtypes to behave differently from those on other rowtypes.

regards, tom lane

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
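A sketch of the workaround being confirmed here, using the roles from the original report (an assumed session, not copied from the thread; note that USAGE on types is granted to PUBLIC by default, so the revoke must target PUBLIC):

	-- As u1, deny access to the rowtype that CREATE TABLE implicitly created:
	REVOKE USAGE ON TYPE t1 FROM PUBLIC;

	-- Now, as u2, referencing t1.a%type in a function should fail with
	-- something like: ERROR:  permission denied for type t1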
Re: [HACKERS] Remove secondary checkpoint
On 30 October 2017 at 18:58, Simon Riggs wrote:
> On 30 October 2017 at 15:22, Simon Riggs wrote:
>
>>> You forgot to update this formula in xlog.c:
>>> distance = (2.0 + CheckPointCompletionTarget) * CheckPointDistanceEstimate;
>>> /* add 10% for good measure. */
>>> distance *= 1.10;
>>> You can guess easily where the mistake is.
>>
>> Doh - fixed that before posting, so I must have sent the wrong patch.
>>
>> It's the 10%, right? ;-)
>
> So, there are 2 locations that mention 2.0 in xlog.c
>
> I had already fixed one, which is why I remembered editing it.
>
> The other one you mention has a detailed comment above it explaining
> why it should be 2.0 rather than 1.0, so I left it.
>
> Let me know if you still think it should be removed? I can see the
> argument both ways.
>
> Or maybe we need another patch to account for manual checkpoints.

Revised patch to implement this

-- 
Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

remove_secondary_checkpoint.v5.patch Description: Binary data

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
Michael Paquier writes:
> On Tue, Oct 31, 2017 at 4:56 AM, Tom Lane wrote:
>> Yeah, we're still missing an understanding of why we didn't see it
>> before; the inadequate locking was surely there before.

> Because 24992c6d has added a check on the offset number by using
> PageIndexTupleDeleteNoCompact() in brin_doupdate() making checks
> tighter, no?

No, I don't see how it's tighter. The old code matched supplied offnum(s) against the indexes of not-unused items, and then after that loop it complained if they weren't all matched. So it should also have failed, albeit with a different error message, if it were passed an offnum corresponding to a no-longer-live tuple.

regards, tom lane

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
On 31 October 2017 at 12:56, Stephen Frost wrote:
> Simon,
>
> * Simon Riggs (si...@2ndquadrant.com) wrote:
>> On 30 October 2017 at 19:55, Stephen Frost wrote:
>> > I don't think MERGE should be radically different from other database
>> > systems and just syntax sugar over a capability we have.
>>
>> I've proposed a SQL Standard compliant implementation that would do
>> much more than be new syntax over what we already have.
>>
>> So these two claims aren't accurate: "radical difference" and "syntax
>> sugar over a capability we have".
>
> Based on the discussion so far, those are the conclusions I've come to.
> Saying they aren't accurate without providing anything further isn't
> likely to be helpful.

I'm trying to be clear, accurate and non-confrontational. I appreciate those are your conclusions, which is why I need to be clear that the proposal has been misunderstood on those stated points.

What else would you like me to add to be helpful? I will try...

I've spent weeks looking at other implementations and combing the SQL Standard with a fine tooth comb to see what is possible and what is not. My proposal shows a new way forwards and yet follows the standard in horrible detail. It has taken me nearly 2 months to write the doc page submitted here. It is not simply new syntax stretched over existing capability. The full syntax of MERGE is quite powerful and will take some months to make work, even with my proposal.

I'm not sure what else to say, but I am happy to answer specific technical questions that have full context to allow a rational discussion.

>> > Time changes
>> > many things, but I don't think anything's changed in this from the prior
>> > discussions about it.
>>
>> My proposal is new, that is what has changed.
>
> I'm happy to admit that I've missed something in the discussion, but
> from what I read the difference appeared to be primarily that you're
> proposing it, and doing so a couple years later.
>
>> At this stage, general opinions can be misleading.
>
> I'd certainly love to see a MERGE capability that meets the standard and
> works in much the same way from a user's perspective as the other RDBMS'
> which already implement it.

The standard explains how it should work, with the proviso that the standard allows us to define concurrent behavior. Serge has explained that he sees that my proposal covers the main use cases. I don't yet see how to cover all, but I'm looking to do the most important use cases first and other things later.

> From prior discussions with Peter on
> exactly that subject, I'm also convinced that having that would be a
> largely independent piece of work from the INSERT .. ON CONFLICT
> capability that we have (just as MERGE is distinct from similar UPSERT
> capabilities in other RDBMSs).
>
> The goal here is really just to avoid time wasted developing MERGE based
> on top of the INSERT .. ON CONFLICT system only to have it be rejected
> later because multiple other committers and major contributors have said
> that they don't agree with that approach. If, given all of this
> discussion, you don't feel that's a high risk with your approach then by
> all means continue on with what you're thinking and we can review the
> patch once it's posted.

It is certainly a risk and I don't want to waste time either.

I am happy to consider any complete proposal for how to proceed, including details and/or code, but I will be prioritising things to ensure that we have a candidate feature for PG11 from at least one author, rather than a research project for PG12+.

The key point here is concurrency. I have said that there is an intermediate step that is useful and achievable, but does not cover every case. Serge has also stated this - who, it should be noted, is the only person in this discussion that has already written the MERGE statement in SQL, for DB2, so I give more weight to his technical opinion than I do to others, at this time.

-- 
Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] make async slave to wait for lsn to be replayed
On Mon, Oct 30, 2017 at 7:25 PM, Ivan Kartyshov wrote:
> It sounds reasonable. I can offer the following version.
>
> WAIT LSN lsn_number;
> WAIT LSN lsn_number TIMEOUT delay;
> WAIT LSN lsn_number INFINITE;
> WAIT LSN lsn_number NOWAIT;
>
> WAIT [token] wait_value [option];
>
> token - [LSN, TIME | TIMESTAMP | CSN | XID]
> option - [TIMEOUT | INFINITE | NOWAIT]
>
> Ready to listen to your suggestions.

Making the interface more specific about the mechanism is not what I had in mind, quite the opposite. I would like to see the interface describe the desired effect of the wait.

Stepping back for a while, from what I understand the reason we want to wait is to prevent observation of database state going backwards. To limit the amount of waiting needed, we tell the database what we have observed. For example "I just observed my transaction commit", or "the last time I observed state was X", and then have the database provide us with a snapshot that is causally dependent on those states. This does not give us linearizability; for that we still need at the very least serializable transactions on the standby. But it seems to provide a form of sequential consistency, which (if it can be proved to hold) makes reasoning about concurrency a lot nicer.

For lack of a better proposal I would like something along the lines of:

WAIT FOR state_id[, state_id] [ OPTIONS (..)]

And to get the tokens maybe a function pg_last_commit_state(). Optionally, to provide read-to-read causality, pg_snapshot_state() could return for example replay_lsn at the start of the current transaction. This makes sure that things don't just appear and disappear when load balancing across many standby servers.

The WAIT FOR semantics is to ensure that the next snapshot is causally dependent on (happens after) each of the specified observed states. The state_id could simply be an LSN, or to allow for future expansion something like 'lsn:/1234'.

An example extension would be to allow for waiting on xids. On the master that would be just a ShareLock on the transactionid. On the standby it would wait for the commit or rollback record for that transaction to be replayed.

Robert made a good point that people will still rely on the token being an LSN, but perhaps they will be slightly less angry when we explicitly tell them that this might change in the future.

Regards, Ants Aasma

[1] https://www.postgresql.org/docs/devel/static/functions-admin.html#functions-snapshot-synchronization

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
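To make the proposal concrete, usage might look like this (entirely hypothetical: neither WAIT FOR nor pg_last_commit_state() exists; the token format is illustrative only):

	-- On the primary, after committing a write:
	SELECT pg_last_commit_state();   -- returns, say, 'lsn:0/15D6A90'

	-- On a standby, before reading, block until that state is replayed:
	WAIT FOR 'lsn:0/15D6A90';
	SELECT ... ;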
Re: [HACKERS] Patch: restrict pg_rewind to whitelisted directories
On Mon, Oct 30, 2017 at 6:44 PM, Chris Travers wrote:
> The attached patch is cleaned up and filed for the commit fest this next
> month:

It's generally better to post the patch on the same message as the discussion thread, or at least link back to the discussion thread from the new one. Otherwise people may have trouble understanding the history.

-- 
Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Query regarding permission on table_column%type access
Greetings, * Neha Sharma (neha.sha...@enterprisedb.com) wrote: > I have observed that even if the user does not have permission on a > table(created in by some other user),the function parameter still can have > a parameter of that table_column%type. This is because the creation of the table also creates a type of the same name and the type's permissions are independent of the table's. I imagine that you could REVOKE USAGE ON TYPE from the type and deny access to that type if you wanted to. I'm not sure that we should change the REVOKE on the table-level to also mean to REVOKE access to the type automatically (and what happens if you GRANT the access back for the table..? Would we need to track that dependency?) considering that's been the behavior for a very long time. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] Adding column_constraint description in ALTER TABLE synopsis
Greetings,

* Lætitia Avrot (laetitia.av...@gmail.com) wrote:
> As Amit Langote pointed out, the column_constraint definition is missing
> whereas it is used in the ALTER TABLE synopsis. It can be easily found in the
> CREATE TABLE synopsis, but it's not very user friendly.

Agreed.

> You will find enclosed my patch.

Thanks, this looks pretty reasonable, but did you happen to look for any other keywords used in ALTER TABLE that are only defined in the CREATE TABLE synopsis and should really be in ALTER TABLE also? I'm specifically looking at, at least, partition_bound_spec. Maybe you could propose an updated patch which addresses that also, and any other cases you find?

Thanks again!

Stephen

signature.asc Description: Digital signature
Re: [HACKERS] Remove inbound links to sql-createuser
David,

* David G. Johnston (david.g.johns...@gmail.com) wrote:
> Since CREATE USER is officially an alias for CREATE ROLE, other parts of the
> documentation should point to CREATE ROLE, not CREATE USER. Most do, but I
> noticed when looking at CREATE DATABASE that it did not. Further searching
> turned up the usage in client-auth.sgml. That one is questionable since we
> are indeed talking about LOGIN roles there, but we are already pointing to
> sql-alterrole instead of sql-alteruser, and so changing the create variation
> to point to sql-createrole seems warranted.

+1. Barring objections, I'll commit this in a bit.

Thanks!

Stephen

signature.asc Description: Digital signature
Re: [HACKERS] Deadlock in ALTER SUBSCRIPTION REFRESH PUBLICATION
On Tue, Oct 24, 2017 at 7:13 PM, Konstantin Knizhnik wrote:
> Parallel execution of ALTER SUBSCRIPTION REFRESH PUBLICATION at several
> nodes may cause deadlock:
>
> knizhnik 1480 0.0 0.1 1417532 16496 ? Ss 20:01 0:00 postgres: bgworker: logical replication worker for subscription 16589 sync 16720 waiting
> knizhnik 1481 0.0 0.1 1417668 17668 ? Ss 20:01 0:00 postgres: knizhnik postgres 127.0.0.1(36378) idle
> knizhnik 1484 0.0 0.1 1406952 16676 ? Ss 20:01 0:00 postgres: knizhnik postgres 127.0.0.1(56194) ALTER SUBSCRIPTION waiting
> knizhnik 1486 0.0 0.1 1410040 21268 ? Ss 20:01 0:00 postgres: wal sender process knizhnik 127.0.0.1(36386) idle
> knizhnik 1487 0.0 0.1 1417532 16736 ? Ss 20:01 0:00 postgres: bgworker: logical replication worker for subscription 16589 sync 16774
> knizhnik 1489 0.0 0.1 1410040 21336 ? Ss 20:01 0:00 postgres: wal sender process knizhnik 127.0.0.1(36388) idle
> knizhnik 1510 0.0 0.1 1406952 16820 ? Ss 20:01 0:00 postgres: knizhnik postgres 127.0.0.1(56228) ALTER SUBSCRIPTION waiting
> knizhnik 1530 0.0 0.1 1406952 16736 ? Ss 20:01 0:00 postgres: knizhnik postgres 127.0.0.1(56260) ALTER SUBSCRIPTION waiting
> knizhnik 1534 0.0 0.0 1417504 14360 ? Ss 20:01 0:00 postgres: bgworker: logical replication worker for subscription 16590
> knizhnik 1537 0.0 0.1 1409164 16920 ? Ss 20:01 0:00 postgres: wal sender process knizhnik 127.0.0.1(45054) idle
> knizhnik 1545 0.0 0.0 1417344 14576 ? Ss 20:01 0:00 postgres: bgworker: logical replication worker for subscription 16592
> knizhnik 1546 0.0 0.1 1409168 16588 ? Ss 20:01 0:00 postgres: wal sender process knizhnik 127.0.0.1(56274) idle
> knizhnik 1547 0.0 0.0 1417348 14332 ? Ss 20:01 0:00 postgres: bgworker: logical replication worker for subscription 16592 sync 16606
> knizhnik 1549 0.0 0.0 1409004 14512 ? Ss 20:01 0:00 postgres: wal sender process knizhnik 127.0.0.1(56276) idle in transaction waiting
> knizhnik 1553 0.0 0.0 1417348 14540 ? Ss 20:01 0:00 postgres: bgworker: logical replication worker for subscription 16592
> knizhnik 1554 0.0 0.1 1409168 16596 ? Ss 20:01 0:00 postgres: wal sender process knizhnik 127.0.0.1(56280) idle
> knizhnik 1555 0.0 0.0 1417348 14332 ? Ss 20:01 0:00 postgres: bgworker: logical replication worker for subscription 16592 sync 16660
> knizhnik 1556 0.0 0.0 1409004 14520 ? Ss 20:01 0:00 postgres: wal sender process knizhnik 127.0.0.1(56282) idle in transaction waiting
> knizhnik 1558 0.0 0.0 1417348 14460 ? Ss 20:01 0:00 postgres: bgworker: logical replication worker for subscription 16592 sync 16696
> knizhnik 1560 0.0 0.0 1409004 14516 ? Ss 20:01 0:00 postgres: wal sender process knizhnik 127.0.0.1(56286) idle in transaction waiting
> knizhnik 1562 0.0 0.0 1417348 14084 ? Ss 20:01 0:00 postgres: bgworker: logical replication worker for subscription 16592 sync 16732
> knizhnik 1567 0.0 0.0 1409004 14516 ? Ss 20:01 0:00 postgres: wal sender process knizhnik 127.0.0.1(56288) idle in transaction waiting
>
> knizhnik@knizhnik:~$ gdb -p 1556
> (gdb) bt
> #0 0x7f7b991569b3 in __epoll_wait_nocancel () at ../sysdeps/unix/syscall-template.S:84
> #1 0x0087185a in WaitEventSetWaitBlock (set=0x246d1e8, cur_timeout=-1, occurred_events=0x73a43580, nevents=1) at latch.c:1048
> #2 0x00871716 in WaitEventSetWait (set=0x246d1e8, timeout=-1, occurred_events=0x73a43580, nevents=1, wait_event_info=50331652) at latch.c:1000
> #3 0x00870e6c in WaitLatchOrSocket (latch=0x7f7b975daba4, wakeEvents=1, sock=-1, timeout=-1, wait_event_info=50331652) at latch.c:385
> #4 0x00870d3e in WaitLatch (latch=0x7f7b975daba4, wakeEvents=1, timeout=0, wait_event_info=50331652) at latch.c:339
> #5 0x0088995c in ProcSleep (locallock=0x24764d0, lockMethodTable=0xc17420 ) at proc.c:1255
> #6 0x008828cb in WaitOnLock (locallock=0x24764d0, owner=0x2517c78) at lock.c:1702
> #7 0x0088157e in LockAcquireExtended (locktag=0x73a439a0, lockmode=5, sessionLock=0 '\000', dontWait=0 '\000', reportMemoryError=1 '\001') at lock.c:998
> #8 0x00880ba8 in LockAcquire (locktag=0x73a439a0, lockmode=5, sessionLock=0 '\000', dontWait=0 '\000') at lock.c:688
> #9 0x0087f8dc in XactLockTableWait (xid=624, rel=0x0, ctid=0x0, oper=XLTW_None) at lmgr.c:587
> #10 0x0082f524 in SnapBuildWaitSnapshot (running=0x246d1b0, cutoff=632) at snapbuild.c:1400
> #11 0x0082f2f6 in SnapBuildFindSnapshot (builder=0x254b550, lsn=24410008, running=0x246d1b0) at snapbuild.c:1311
> #12 0x0082ee67 in SnapBuildProcessRunningXacts (builder=0x254b550, lsn=24410008, running=0x246d1b0) at
[HACKERS] Fix dumping pre-10 DBs by pg_dump10 if table "name" exists
Hello, hackers!

Recently my colleagues found a bug. They could not migrate from PG9.5 to PG10 due to an error during pg_upgrade (the same as in the "reproduce" part below). An investigation showed there is a table "name" in the same schema where the dumped sequence is located, and PG tries to parse the literal "bigint" as a composite type and fails.

The bug was introduced by two commits; one of them changes search_path, the second one introduces the "'bigint'::name" without specifying a schema:
da4d1c0c15ab9afdfeee8bad9a1a9989b6bd59b5 pg_dump: Fix some schema issues when dumping sequences
2ea5b06c7a7056dca0af1610aadebe608fbcca08 Add CREATE SEQUENCE AS clause

Steps to reproduce:

$ psql -h pre-10 postgres
Password for user postgres:
psql (11devel, server 9.5.8)
Type "help" for help.

postgres=# create table name(id serial);
CREATE TABLE
postgres=# \q

$ pg_dump -h pre-10 postgres
pg_dump: [archiver (db)] query failed: ERROR: malformed record literal: "bigint"
LINE 1: SELECT 'bigint'::name AS sequence_type, start_value, increme...
               ^
DETAIL: Missing left parenthesis.
pg_dump: [archiver (db)] query was: SELECT 'bigint'::name AS sequence_type, start_value, increment_by, max_value, min_value, cache_value, is_cycled FROM name_id_seq

I've implemented a little fix (attached); I don't think there is anything to be written for docs or tests.

-- 
Best regards, Vitaly Burovoy

0001-Fix-dumping-schema-if-a-table-named-name-exists.patch Description: Binary data

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
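The ambiguity can also be reproduced in a plain session by mimicking pg_dump's search_path, which puts the dumped schema ahead of pg_catalog (a hypothetical session, not from the report above):

	CREATE TABLE public.name (id serial);

	SET search_path = public, pg_catalog;
	SELECT 'bigint'::name;
	-- ERROR:  malformed record literal: "bigint"
	-- ("name" now resolves to the table's composite rowtype)

	SELECT 'bigint'::pg_catalog.name;  -- unambiguous: the built-in name type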
Re: [HACKERS] MERGE SQL Statement for PG11
Simon,

* Simon Riggs (si...@2ndquadrant.com) wrote:
> On 30 October 2017 at 19:55, Stephen Frost wrote:
> > I don't think MERGE should be radically different from other database
> > systems and just syntax sugar over a capability we have.
>
> I've proposed a SQL Standard compliant implementation that would do
> much more than be new syntax over what we already have.
>
> So these two claims aren't accurate: "radical difference" and "syntax
> sugar over a capability we have".

Based on the discussion so far, those are the conclusions I've come to. Saying they aren't accurate without providing anything further isn't likely to be helpful.

> > Time changes
> > many things, but I don't think anything's changed in this from the prior
> > discussions about it.
>
> My proposal is new, that is what has changed.

I'm happy to admit that I've missed something in the discussion, but from what I read the difference appeared to be primarily that you're proposing it, and doing so a couple years later.

> At this stage, general opinions can be misleading.

I'd certainly love to see a MERGE capability that meets the standard and works in much the same way from a user's perspective as the other RDBMS' which already implement it. From prior discussions with Peter on exactly that subject, I'm also convinced that having that would be a largely independent piece of work from the INSERT .. ON CONFLICT capability that we have (just as MERGE is distinct from similar UPSERT capabilities in other RDBMSs).

The goal here is really just to avoid time wasted developing MERGE based on top of the INSERT .. ON CONFLICT system only to have it be rejected later because multiple other committers and major contributors have said that they don't agree with that approach. If, given all of this discussion, you don't feel that's a high risk with your approach then by all means continue on with what you're thinking and we can review the patch once it's posted.

Thanks!

Stephen

signature.asc Description: Digital signature
Re: [HACKERS] MERGE SQL Statement for PG11
On 30 October 2017 at 19:55, Stephen Frost wrote:
> * Robert Haas (robertmh...@gmail.com) wrote:
>> On Sun, Oct 29, 2017 at 1:19 AM, Simon Riggs wrote:
>> > Nothing I am proposing blocks later work.
>>
>> That's not really true. Nobody's going to be happy if MERGE has one
>> behavior in one set of cases and an astonishingly different behavior
>> in another set of cases. If you adopt a behavior for certain cases
>> that can't be extended to other cases, then you're blocking a
>> general-purpose MERGE.
>>
>> And, indeed, it seems that you're proposing an implementation that
>> adds no new functionality, just syntax compatibility. Do we really
>> want or need two syntaxes for the same thing in core? I kinda think
>> Peter might have the right idea here. Under his proposal, we'd be
>> getting something that is, in a way, new.
>
> +1.
>
> I don't think MERGE should be radically different from other database
> systems and just syntax sugar over a capability we have.

I've proposed a SQL Standard compliant implementation that would do much more than be new syntax over what we already have.

So these two claims aren't accurate: "radical difference" and "syntax sugar over a capability we have".

> Time changes
> many things, but I don't think anything's changed in this from the prior
> discussions about it.

My proposal is new, that is what has changed.

At this stage, general opinions can be misleading.

Hi ho, hi ho.

-- 
Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Comment typo
On 2017/10/30 22:39, Magnus Hagander wrote:
> On Mon, Oct 30, 2017 at 3:49 AM, Etsuro Fujita wrote:
>> Here is a patch to fix a typo in a comment in partition.c:
>> s/specificiation/specification/.
>
> Applied, thanks.

Thank you!

Best regards, Etsuro Fujita

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
Can I add my 2c worth, as someone without a horse in the race, as it were, in the hope that telling me how I've got this wrong might clarify the argument a bit (or at least you can all start shouting at me rather than each other :) ) The point of merge is to allow you to choose to either INSERT or UPDATE (or indeed DELETE) records based on existing state, yes? That state is whatever the state of the system at the start of the transaction? If I understand correctly, the only time when this would be problematic is if you try to insert a record into a table which would not allow that INSERT because another transaction has performed an INSERT by the time the COMMIT happens, and where that new record would have changed the state of the MERGE clause, yes? Isn't the only reason this would fail if there is a unique constraint on that table? Yes, you could end up INSERTing values from the merge when another transaction has INSERTed another, but (again, unless I've misunderstood) there's nothing in the spec that says that shouldn't happen; meanwhile for those tables that do require unique values you can use the UPSERT mechanism, no? Geoff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
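For readers following along, the SQL-standard statement under discussion has this general shape (an illustration of the standard syntax, not PostgreSQL's eventual implementation; table and column names are hypothetical):

	MERGE INTO target t
	USING source s
	   ON t.id = s.id
	WHEN MATCHED AND s.deleted THEN
	    DELETE
	WHEN MATCHED THEN
	    UPDATE SET val = s.val
	WHEN NOT MATCHED THEN
	    INSERT (id, val) VALUES (s.id, s.val);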
[HACKERS] Query regarding permission on table_column%type access
Hi,

I have observed that even if the user does not have permission on a table (created by some other user), the function parameter still can have a parameter of that table_column%type.

Scenario:

postgres=# create user u1 with login;
CREATE ROLE
postgres=# create user u2 with login;
CREATE ROLE
postgres=# \c - u1
You are now connected to database "postgres" as user "u1".
postgres=> create table t1(a int);
CREATE TABLE
postgres=> revoke ALL on t1 from u2;
REVOKE
postgres=> \c - u2
You are now connected to database "postgres" as user "u2".
postgres=> create table t2(a int);
CREATE TABLE
postgres=> create or replace function foo(x t1.a%type) returns int as $$
BEGIN
return x + 1;
END; $$ LANGUAGE plpgsql;
NOTICE: type reference t1.a%TYPE converted to integer
CREATE FUNCTION
postgres=> select foo(1);
 foo
-----
   2
(1 row)
postgres=> select * from t1;
ERROR: permission denied for relation t1

Is this expected behaviour? What if the user does not want the object's type to be accessed across users?

Thanks.

-- 
Regards, Neha Sharma
Re: [HACKERS] parallelize queries containing initplans
On 10/30/2017 01:36 PM, tushar wrote:
> On 10/30/2017 09:02 AM, Amit Kapila wrote:
>> Thanks a lot Tushar for testing this patch. In the latest patch, I have
>> just rebased some comments, there is no code change, so I don't expect
>> any change in behavior, but feel free to test it once again.

Thanks Amit. Sure. I have done re-verification; everything looks good.

-- 
regards, tushar EnterpriseDB https://www.enterprisedb.com/ The Enterprise PostgreSQL Company

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] postgres.exe crashes with access violation on Windows while starting up
On Tue, Oct 31, 2017 at 6:59 AM, Tsunakawa, Takayuki wrote:
> From: pgsql-hackers-ow...@postgresql.org
>> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Michael Paquier
>> So you are basically ready to lose any message that could be pushed
>> here if there is no memory context? That does not sound like a good
>> trade-off to me. A static buffer does not look like the best idea
>> either to not truncate message, so couldn't we envisage to just use
>> malloc? pgwin32_message_to_UTF16() is called in two places in elog.c,
>> and there is a full control on the error code paths.
>
> Thank you for reviewing a rare bug fix on Windows that most people wouldn't
> be interested in.

Yeah, it may take a while until a committer gets interested, I am afraid. See my bug about pg_basebackup on Windows with path names..

> When CurrentMemoryContext is NULL, the message is logged with
> ReportEventA(). This is similar to write_console().

My point is that as Postgres is running as a service, isn't it wrong to write a message to stderr as a fallback if the memory context is not set? You would lose a message. It seems to me that for an operation that can happen at a low level like postmaster startup, we should really use a low-level operation as well.

-- 
Michael

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] More stats about skipped vacuums
This is just a repost as a (true) new thread.

At Mon, 30 Oct 2017 20:57:50 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI wrote in <20171030.205750.246076862.horiguchi.kyot...@lab.ntt.co.jp>
> At Fri, 20 Oct 2017 19:15:16 +0900, Masahiko Sawada wrote in
Re: [HACKERS] show "aggressive" or not in autovacuum logs
At Thu, 26 Oct 2017 12:42:23 +0200, Robert Haas wrote in
> On Thu, Oct 26, 2017 at 10:18 AM, Kyotaro HORIGUCHI wrote:
> > Thank you. I forgot that point. Changed them so that the messages
> > are detected as msgids.
>
> Committed, changing "aggressive" to "aggressively" in one place for
> correct English.

Thank you for committing!

regards,

-- 
Kyotaro Horiguchi NTT Open Source Software Center

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Protect syscache from bloating with negative cache entries
This is a rebased version of the patch.

At Fri, 17 Mar 2017 14:23:13 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI wrote in <20170317.142313.232290068.horiguchi.kyot...@lab.ntt.co.jp>
> At Tue, 7 Mar 2017 19:23:14 -0800, David Steele wrote in <3b7b7f90-db46-8c37-c4f7-443330c3a...@pgmasters.net>
> > On 3/3/17 4:54 PM, David Steele wrote:
> > > On 2/1/17 1:25 AM, Kyotaro HORIGUCHI wrote:
> > >> Hello, thank you for moving this to the next CF.
> > >>
> > >> At Wed, 1 Feb 2017 13:09:51 +0900, Michael Paquier wrote in
> > >>> On Tue, Jan 24, 2017 at 4:58 PM, Kyotaro HORIGUCHI wrote:
> > >>>> Six new syscaches in 665d1fa was conflicted and 3-way merge
> > >>>> worked correctly. The new syscaches don't seem to be targets of
> > >>>> this patch.
> > >>> To be honest, I am not completely sure what to think about this patch.
> > >>> Moved to next CF as there is a new version, and no new reviews to make
> > >>> the discussion perhaps move on.
> > >> I'm thinking the following is the status of this topic.
> > >>
> > >> - The patch still is not getting conflicted.
> > >>
> > >> - This is not a holistic measure for memory leak but surely
> > >>   saves some existing cases.
> > >>
> > >> - Shared catcache is another discussion (and won't really be
> > >>   proposed in a short time due to the issue on locking.)
> > >>
> > >> - As I mentioned, a patch that caps the number of negative
> > >>   entries is available (in first-created - first-deleted manner)
> > >>   but it is having a loose end of how to determine the
> > >>   limitation.
> > > While preventing bloat in the syscache is a worthwhile goal, it appears
> > > there are a number of loose ends here and a new patch has not been
> > > provided.

regards,

-- 
Kyotaro Horiguchi NTT Open Source Software Center

>From 9f2c81dbc9bc344cafd6995dfc5969d55a8457d9 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Mon, 28 Aug 2017 11:36:21 +0900
Subject: [PATCH 1/2] Cleanup negative cache of pg_statistic when dropping a relation.

Accessing columns that don't have statistics leaves negative entries in catcache for pg_statistic, but there's no chance to remove them. Especially when repeatedly creating then dropping temporary tables, this bloats catcache so much that memory pressure becomes significant. This patch removes negative entries in STATRELATTINH, ATTNAME and ATTNUM when the corresponding relation is dropped.
---
 src/backend/utils/cache/catcache.c |  58 ++-
 src/backend/utils/cache/syscache.c | 302 +++--
 src/include/utils/catcache.h       |   3 +
 src/include/utils/syscache.h       |   2 +
 4 files changed, 282 insertions(+), 83 deletions(-)

diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index 95a0742..bd303f3 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -423,10 +423,11 @@ CatCachePrintStats(int code, Datum arg)
 		if (cache->cc_ntup == 0 && cache->cc_searches == 0)
 			continue;			/* don't print unused caches */
-		elog(DEBUG2, "catcache %s/%u: %d tup, %ld srch, %ld+%ld=%ld hits, %ld+%ld=%ld loads, %ld invals, %ld lsrch, %ld lhits",
+		elog(DEBUG2, "catcache %s/%u: %d tup, %d negtup, %ld srch, %ld+%ld=%ld hits, %ld+%ld=%ld loads, %ld invals, %ld lsrch, %ld lhits",
 			 cache->cc_relname,
 			 cache->cc_indexoid,
 			 cache->cc_ntup,
+			 cache->cc_nnegtup,
 			 cache->cc_searches,
 			 cache->cc_hits,
 			 cache->cc_neg_hits,
@@ -495,8 +496,11 @@ CatCacheRemoveCTup(CatCache *cache, CatCTup *ct)
 	 * point into tuple, allocated together with the CatCTup.
 	 */
 	if (ct->negative)
+	{
 		CatCacheFreeKeys(cache->cc_tupdesc, cache->cc_nkeys,
 						 cache->cc_keyno, ct->keys);
+		--cache->cc_nnegtup;
+	}

 	pfree(ct);

@@ -697,6 +701,51 @@ ResetCatalogCache(CatCache *cache)
 }

 /*
+ * CleanupCatCacheNegEntries
+ *
+ *	Remove negative cache tuples matching a partial key.
+ *
+ */
+void
+CleanupCatCacheNegEntries(CatCache *cache, ScanKeyData *skey)
+{
+	int			i;
+
+	/* If this cache has no negative entries, nothing to do */
+	if (cache->cc_nnegtup == 0)
+		return;
+
+	/* searching with a partial key means scanning the whole cache */
+	for (i = 0; i < cache->cc_nbuckets; i++)
+	{
+		dlist_head *bucket = &cache->cc_bucket[i];
+		dlist_mutable_iter iter;
+
+		dlist_foreach_modify(iter, bucket)
+		{
+			const CCFastEqualFN *cc_fastequal = cache->cc_fastequal;
+			CatCTup    *ct = dlist_container(CatCTup, cache_elem, iter.cur);
+			int			oid_attnum = skey->sk_attno - 1;
+
+			if (!ct->negative)
+				continue;
+
+			/* Compare the OIDs */
+			if (!(cc_fastequal[oid_attnum])(ct->keys[oid_attnum],
+											skey[0].sk_argument))
+				continue;
+
+			/*
+			 * the negative cache entries can no longer be
Re: [HACKERS] Restricting maximum keep segments by repslots
Hello, this is a rebased version.

Along with the rebase, the meaning of the monitoring values has changed. In the previous version, the "live" column mysteriously predicted whether the necessary segments would be kept or lost by the next checkpoint, and the "distance" column offered a still more mysterious value. In this version the meaning of the two columns is clear and informative.

pg_replication_slots
- live: true if the slot has not lost necessary segments.
- distance: how many bytes the LSN can advance before the margin defined by max_slot_wal_keep_size (and wal_keep_segments) is exhausted, or how many bytes of xlog this slot has lost from restart_lsn.

There is a case where live = t and distance = 0. The slot currently has all the necessary segments but will start to lose them within at most two checkpoint passes.

regards,

-- 
Kyotaro Horiguchi NTT Open Source Software Center

>From 57eaa2b878d30bfcebb093cca0e772fe7a9bff0e Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Tue, 28 Feb 2017 11:39:48 +0900
Subject: [PATCH 1/2] Add WAL relief vent for replication slots

Adds a capability to limit the number of segments kept by replication slots by a GUC variable.
---
 src/backend/access/transam/xlog.c             | 24
 src/backend/utils/misc/guc.c                  | 11 +++
 src/backend/utils/misc/postgresql.conf.sample |  1 +
 src/include/access/xlog.h                     |  1 +
 4 files changed, 37 insertions(+)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index dd028a1..f79cefb 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -105,6 +105,7 @@ int wal_level = WAL_LEVEL_MINIMAL;
 int			CommitDelay = 0;	/* precommit delay in microseconds */
 int			CommitSiblings = 5; /* # concurrent xacts needed to sleep */
 int			wal_retrieve_retry_interval = 5000;
+int			max_slot_wal_keep_size_mb = 0;

 #ifdef WAL_DEBUG
 bool		XLOG_DEBUG = false;
@@ -9432,9 +9433,32 @@ KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo)
 	if (max_replication_slots > 0 && keep != InvalidXLogRecPtr)
 	{
 		XLogSegNo	slotSegNo;
+		int			slotlimitsegs = ConvertToXSegs(max_slot_wal_keep_size_mb);

 		XLByteToSeg(keep, slotSegNo, wal_segment_size);

+		/*
+		 * ignore slots if too many wal segments are kept.
+		 * max_slot_wal_keep_size is just accumulated on wal_keep_segments.
+		 */
+		if (max_slot_wal_keep_size_mb > 0 && slotSegNo + slotlimitsegs < segno)
+		{
+			segno = segno - slotlimitsegs; /* must be positive */
+
+			/*
+			 * warn only if the checkpoint flushes the required segment.
+			 * we assume here that *logSegNo is calculated keep location.
+			 */
+			if (slotSegNo < *logSegNo)
+				ereport(WARNING,
+					(errmsg ("restart LSN of replication slots is ignored by checkpoint"),
+					 errdetail("Some replication slots have lost required WAL segments to continue by up to %ld segments.",
+							   (segno < *logSegNo ? segno : *logSegNo) - slotSegNo)));
+
+			/* emergency vent */
+			slotSegNo = segno;
+		}
+
 		if (slotSegNo <= 0)
 			segno = 1;
 		else if (slotSegNo < segno)

diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 65372d7..511023a 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -2368,6 +2368,17 @@ static struct config_int ConfigureNamesInt[] =
 	},

 	{
+		{"max_slot_wal_keep_size", PGC_SIGHUP, REPLICATION_SENDING,
+			gettext_noop("Sets the maximum size of extra WALs kept by replication slots."),
+			NULL,
+			GUC_UNIT_MB
+		},
+		&max_slot_wal_keep_size_mb,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
 		{"wal_sender_timeout", PGC_SIGHUP, REPLICATION_SENDING,
 			gettext_noop("Sets the maximum time to wait for WAL replication."),
 			NULL,

diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 368b280..e76c73a 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -234,6 +234,7 @@
 #max_wal_senders = 10		# max number of walsender processes
 				# (change requires restart)
 #wal_keep_segments = 0		# in logfile segments; 0 disables
+#max_slot_wal_keep_size = 0	# measured in bytes; 0 disables
 #wal_sender_timeout = 60s	# in milliseconds; 0 disables

 #max_replication_slots = 10	# max number of replication slots

diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 0f2b8bd..f0c0255 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -98,6 +98,7 @@ extern int wal_segment_size;
 extern int	min_wal_size_mb;
 extern int	max_wal_size_mb;
 extern int	wal_keep_segments;
+extern int	max_slot_wal_keep_size_mb;
 extern int	XLOGbuffers;
 extern int	XLogArchiveTimeout;
 extern int	wal_retrieve_retry_interval;
-- 
2.9.2

>From 37749d46ba97de38e4593f141bb8a82c67fc0af5 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Thu,
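If the patch were applied, usage might look like this (hypothetical: both the GUC and the new view columns come from this proposal, and in this version the GUC is an integer measured in MB):

	-- Cap the extra WAL kept for slots at 8192 MB:
	ALTER SYSTEM SET max_slot_wal_keep_size = 8192;
	SELECT pg_reload_conf();

	-- Monitor whether any slot is close to losing required segments:
	SELECT slot_name, live, distance FROM pg_replication_slots;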
Re: [HACKERS] EXPLAIN (ANALYZE, BUFFERS) reports bogus temporary buffer reads
On Tue, Oct 17, 2017 at 2:29 AM, Thomas Munro wrote:
> Vik Fearing asked off-list why hash joins appear to read slightly more
> temporary data than they write. The reason is that we notch up a
> phantom block read when we hit the end of each file. Harmless but it
> looks a bit weird and it's easily fixed.
>
> Unpatched, a 16 batch hash join reports that we read 30 more blocks
> than we wrote (2 per batch after the first, as expected):
>
>    Buffers: shared hit=434 read=16234, temp read=5532 written=5502
>
> With the attached patch:
>
>    Buffers: shared hit=547 read=16121, temp read=5502 written=5502

Committed. Arguably we ought to back-patch this, but it's minor so I didn't.

-- 
Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
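The fix amounts to counting a temp block read only when bytes actually come back from the file, roughly (a sketch of the idea, not the committed buffile.c change):

	int		nbytes;

	/* Charge a temp block read only when data actually arrived, so
	 * hitting end-of-file no longer counts as a phantom read. */
	nbytes = FileRead(thisfile, buffer, BLCKSZ, WAIT_EVENT_BUFFILE_READ);
	if (nbytes > 0)
		pgBufferUsage.temp_blks_read++;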
Re: [HACKERS] MERGE SQL Statement for PG11
On 30 October 2017 at 19:17, Peter Geoghegan wrote:
> On Mon, Oct 30, 2017 at 11:07 AM, Simon Riggs wrote:
>> Please explain in detail the MERGE SQL statements that you think will
>> be problematic and why.
>
> Your proposal is totally incomplete, so I can only surmise its
> behavior in certain cases, to make a guess at what the problems might
> be (see my remarks on EPQ, live lock, etc). This makes it impossible
> to do what you ask right now.

Impossible, huh. Henry Ford was right.

If there are challenges ahead, it's reasonable to ask for test cases for that now, especially if you think you know what they already are. Imagine we go forwards 2 months - if you dislike my patch when it exists you will submit a test case showing the fault. Why not save us all the trouble and describe that now? Test Driven Development.

> Besides, you haven't answered the question from my last e-mail
> ("What's wrong with that [set of MERGE semantics]?"), so why should I
> go to further trouble? You're just not constructively engaging with me
> at this point.

It's difficult to discuss anything with someone that refuses to believe that there are acceptable ways around things. I believe there are. If you can calm down the rhetoric we can work together, but if you continue to grandstand it makes it more difficult.

> We're going around in circles.

Not really. You've said some things and I'm waiting for further details about the problems you've raised.

You've said it's possible another way. Show that assertion is actually true. We're all listening, me especially, for the technical details. I've got a fair amount of work to do to get the rest of the patch in shape, so take your time and make it a complete explanation.

My only goal is the MERGE feature in PG11. For me this is a collaborative engineering challenge, not a debate and definitely not an argument. If you describe your plan of how to do this, I may be able to follow it and include that design. If you don't, then it will be difficult for me to include your thoughts. If you or others wish to write something as well, I have already said that is OK too.

If anybody's goal is to block development or to wait for perfection to exist at some unstated time in the future, then I disagree with those thoughts.

-- 
Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GUC for cleanup indexes threshold.
On Mon, Oct 16, 2017 at 3:02 PM, Masahiko Sawada wrote:
> I guess that is the patch I proposed. However, I think that there is
> still room for discussion because the patch cannot skip the cleanup
> vacuum during an aggressive vacuum, which is one of the situations I
> really wanted to skip.

Well, I think there are outstanding concerns that the patch in question is not safe, and I don't see that anything has been done to resolve them.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
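For context, the knob under discussion lets VACUUM skip the index-cleanup scan when little has changed since the last one. A sketch of the intended usage, assuming the parameter name and threshold semantics from the patch (not a released server setting at the time of this thread):

    -- proposed: skip the btree cleanup scan unless enough heap tuples
    -- were inserted since the last cleanup
    SET vacuum_cleanup_index_scale_factor = 0.1;
    VACUUM VERBOSE some_table;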
Re: [HACKERS] ERROR: MultiXactId 3268957 has not been created yet -- apparent wraparound after missused pg_resetxlogs
On Mon, Oct 16, 2017 at 5:41 PM, alain radix wrote:
> I’m facing a problem with a PostgreSQL 9.6.2 reporting this error when
> selecting a table:
>
> ERROR: MultiXactId 3268957 has not been created yet -- apparent wraparound
>
> The root cause is not a Postgres bug but a buggy person that used
> pg_resetxlogs obviously without reading or understanding the documentation.

Hmm, I hope you patched that person...

> I’m trying to extract data from the db to create a new one ;-)
>
> But pg_dump logically ends with the same error.
>
> I’ve seen the row with t_xmax = 3268957 using pageinspect; I might use it
> to extract the row from the page, but this will not be quite easy.

Not sure if this would work, but maybe try BEGIN ... UPDATE the row, perhaps via TID qual ... ROLLBACK?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
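Spelled out, that suggestion looks like the sketch below (table name and TID are placeholders for the damaged row). The UPDATE stamps the old tuple version with the updating transaction's XID in place of the bogus MultiXactId xmax; after the ROLLBACK the original row version remains visible, now without the corrupt reference.

    BEGIN;
    -- no-op update targeting the damaged tuple by its TID
    UPDATE broken_table SET id = id WHERE ctid = '(123,45)';
    ROLLBACK;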
Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept
On Tue, Oct 31, 2017 at 5:16 AM, Masahiko Sawada wrote:
> On Mon, Oct 30, 2017 at 10:16 PM, Robert Haas wrote:
>> On Tue, Oct 24, 2017 at 1:26 PM, Ivan Kartyshov wrote:
>>> Hello. I made some bugfixes and rewrote the patch.
>>
>> I don't think it's a good idea to deliberately leave the state of the
>> standby different from the state of the master on the theory that it
>> won't matter. I feel like that's something that's likely to come back
>> to bite us.
>
> I agree with Robert. What happens if we intentionally don't apply the
> truncation WAL and then switch over? If we insert a tuple on the new
> master server into a block that was truncated on the old master, will
> the WAL apply on the new standby fail? I guess there are such corner
> cases causing failures of WAL replay after switch-over.

Yes, that looks dangerous. One approach to cope with that could be teaching the heap redo function to handle such situations. But I expect this approach to be criticized for bad design, and I understand the fairness of that criticism.

However, from the user's point of view, the current behavior of hot_standby_feedback is just broken, because it both increases bloat and doesn't guarantee that a read-only query on the standby won't be cancelled because of vacuum. Therefore, we should be looking for a solution: if one approach isn't good enough, then we should look for another approach.

I can propose the following alternative approach: teach read-only queries on the hot standby to tolerate concurrent relation truncation. Then, when a non-existent heap page is accessed on the hot standby, we can know that it was deleted by a concurrent truncation and should be assumed to be empty. Any thoughts?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
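The brokenness described above is easy to hit; a sketch, with object names invented: even with feedback enabled, replaying the relation truncation done by VACUUM on the primary conflicts with a standby query that still has the relation open, because the truncation takes an AccessExclusiveLock that feedback does not prevent.

    -- on the standby
    SHOW hot_standby_feedback;                      -- on
    SELECT count(*), pg_sleep(600) FROM big_table;
    -- if the primary vacuums and truncates big_table meanwhile, this can
    -- still fail with:
    --   ERROR:  canceling statement due to conflict with recovery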
Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables
On Mon, Oct 16, 2017 at 5:03 PM, Ashutosh Bapat wrote:
> set_append_rel_size() crashes when it encounters a partitioned table
> with a dropped column. Dropped columns do not have any translations
> saved in AppendRelInfo::translated_vars; the corresponding entry is NULL
> per make_inh_translation_list():
>
> 1802         att = TupleDescAttr(old_tupdesc, old_attno);
> 1803         if (att->attisdropped)
> 1804         {
> 1805             /* Just put NULL into this list entry */
> 1806             vars = lappend(vars, NULL);
> 1807             continue;
> 1808         }
>
> In set_append_rel_size() we try to translate attr_needed for child
> tables. While doing so we try to translate a user attribute number of
> the parent to that of a child and crash since the translated Var is
> NULL. Here's a patch to fix the crash. The patch also contains a
> testcase to test dropped columns in partitioned tables.
>
> Sorry for not noticing this problem earlier.

OK, committed. This is a good example of how having good code coverage doesn't necessarily mean you've found all the bugs. :-)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
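A minimal shape of the failing case, as a sketch (names invented; the committed patch carries the real regression test): a partitioned parent with a dropped column leaves a NULL slot in translated_vars, which the attr_needed translation then has to skip over instead of dereferencing.

    CREATE TABLE pt (a int, junk int, b text) PARTITION BY RANGE (a);
    CREATE TABLE pt1 PARTITION OF pt FOR VALUES FROM (0) TO (100);
    ALTER TABLE pt DROP COLUMN junk;
    INSERT INTO pt VALUES (1, 'one');
    -- planning this query translates attr_needed across the dropped column
    SELECT a, b FROM pt WHERE b = 'one';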
Re: [HACKERS] Flexible configuration for full-text search
> I'm mostly happy with the mentioned modifications, but I have a few
> questions to clarify some points. I will send a new patch in a week or two.

I am glad you liked it. Though, I think we should get approval from more senior community members or committers about the syntax before we put more effort into the code.

> But the configuration:
>
> CASE english_noun WHEN MATCH THEN english_hunspell ELSE simple END
>
> is not (as I understand it, ELSE can be used only with KEEP).
>
> I think we should decide to allow or disallow usage of different
> dictionaries for match checking (between CASE and WHEN) and for the
> result (after THEN). If the answer is 'allow', maybe we should allow
> the third example too, for consistency in configurations.

I think you are right. We had better allow this too. Then the CASE syntax becomes:

CASE config WHEN [ NO ] MATCH THEN { KEEP | config } [ ELSE config ] END

> Based on the formal definition it is possible to describe this example
> in the following manner:
>
> CASE english_noun WHEN MATCH THEN english_hunspell END
>
> The question is the same as in the previous example.

I couldn't understand the question.

> Currently, stopwords increment position, for example:
>
> SELECT to_tsvector('english', 'a test message');
> ------------------------
>  'messag':3 'test':2
>
> The stopword 'a' has position 1, but it is not in the vector.

Does this problem apply only to stopwords, or to the whole thing we are inventing? Shouldn't we preserve the positions through the pipeline?

> If we want to keep this behavior, we should somehow pass a stopword to
> the tsvector composition function (parsetext in ts_parse.c) to increment
> the counter, or increment it in another way. Currently, an empty lexemes
> array is passed as the result of LexizeExec.
>
> One possible way to do so is something like:
>
> CASE polish_stopword
>     WHEN MATCH THEN KEEP    -- stopword counting
>     ELSE polish_isspell
> END

This would mean keeping the stopwords. What we want is:

CASE polish_stopword            -- stopword counting
    WHEN NO MATCH THEN polish_isspell
END

Do you think it is possible?
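To make the discussion concrete, here is how such a CASE expression would plug into a text search configuration under the proposed syntax; this is a sketch of the unimplemented grammar from the patch, not valid in any released PostgreSQL, and the configuration and dictionary names are only examples:

    ALTER TEXT SEARCH CONFIGURATION my_english
        ALTER MAPPING FOR asciiword WITH
            CASE english_noun WHEN MATCH THEN english_hunspell
                 ELSE simple
            END;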
Re: [HACKERS] Re: proposal - psql: possibility to specify sort for describe commands, when size is printed
On Sun, Oct 29, 2017 at 12:47 AM, Pavel Stehule wrote:
> 2017-10-28 23:35 GMT+02:00 Alexander Korotkov:
>> On Sat, Oct 28, 2017 at 3:46 PM, Pavel Stehule wrote:
>>> 2017-09-22 21:31 GMT+02:00 Pavel Stehule:
>>>> 2017-09-22 21:12 GMT+02:00 Peter Eisentraut <peter.eisentr...@2ndquadrant.com>:
>>>>> On 9/22/17 09:16, Pavel Stehule wrote:
>>>>>> Example: somebody sets SORT_COLUMNS to the schema_name value. This
>>>>>> is nonsense for the \l command.
>>>>>>
>>>>>> Now, I am thinking a more correct and practical design is based on
>>>>>> a special mode, activated by a variable:
>>>>>>
>>>>>> PREFER_SIZE_SORT .. (off, asc, desc)
>>>>>>
>>>>>> This makes sense for the wide group of commands that can show size.
>>>>>> And when size is not visible, this option is not active.
>>>>>
>>>>> Maybe this shouldn't be a variable at all. It's not like you'll set
>>>>> this as a global preference. You probably want it for one command
>>>>> only. So a per-command option might make more sense.
>>>>
>>>> Sure, I cannot know what users will do. But when I need to see the
>>>> size of objects, I prefer the sort by size desc every time. If I need
>>>> to find some object, then I can search in the pager. So in my case,
>>>> this setting will be in psqlrc. In GoodData we used our own
>>>> customization for years - the order by size was hardcoded and nobody
>>>> reported any issue to me.
>>>>
>>>> Alexander proposed some per-command option, but the current syntax of
>>>> psql commands doesn't allow simple parametrization. If it is to be
>>>> user friendly, it should be short. From an implementation perspective,
>>>> it should be simply parsed. It should be intuitive too - too many
>>>> symbols together is not a good idea. Maybe some prefix design - but
>>>> that is not a design for common people (although these people don't
>>>> usually use psql):
>>>>
>>>> \sort size \dt ?
>>>> \dt:sort_by_size
>>>> \dt+:sort_by_size ?
>>>>
>>>> I don't see any good design in this direction.
>>>
>>> I thought about Alexander's proposal, and I think it is probably best
>>> if we respect psql design. I implemented two command suffixes
>>> (supported only where they make sense): "s" sorted by size and "d" as
>>> descending.
>>>
>>> So a list of tables can be sorted with the commands:
>>>
>>> \dt+sd (in this case, the order is not strict), so the command
>>> \dtsd+ works too (same for \disd+ or \di+sd).
>>>
>>> These two chars are acceptable. The same principle is used for the \l
>>> command:
>>>
>>> \lsd+ or \l+sd
>>>
>>> What do you think about it?
>>
>> I think the \lsd+ command would be another postgres meme :)
>> BTW, are you going to provide an ability to sort by name, schema?
>
> It makes sense only for tables - probably only \dtn, "n" like name.

In general, this approach looks good to me. Regarding the current state of the patch, I'd like to see the new options documented. Also, it would be better to replace "bool sort_size" with an enum, assuming there could be other sorting orders in the future.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
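For readers skimming the thread, the proposed usage looks like this (from the patch under review, not available in any released psql; suffix letters as Pavel describes above):

    \dt+sd    -- list tables with size, sorted by size, descending
    \dtsd+    -- the same; the suffix order is not strict
    \lsd+     -- list databases with size, sorted by size, descending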
Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM
On Tue, Oct 31, 2017 at 4:56 AM, Tom Lane wrote:
> Alvaro Herrera writes:
>> Tom Lane wrote:
>>> So: I put the blame on the fact that summarize_range() thinks that
>>> the tuple offset it has for the placeholder tuple is guaranteed to
>>> hold good, even across possibly-long intervals where it's holding
>>> no lock on the containing buffer.
>
>> Yeah, I think this is a pretty reasonable explanation for the problem.
>> I don't understand why it doesn't fail in 9.6.
>
> Yeah, we're still missing an understanding of why we didn't see it
> before; the inadequate locking was surely there before. I'm guessing
> that somehow the previous behavior of PageIndexDeleteNoCompact managed
> to mask the problem (perhaps only by not throwing an error, which doesn't
> imply that the index state was good afterwards). But I don't see quite
> how it did that.

Isn't it because 24992c6d added a check on the offset number by using PageIndexTupleDeleteNoCompact() in brin_doupdate(), making the checks tighter? I have not tested, and I lack knowledge of the brin code, but it seems to me that if 9.6 had a similar check then things would likely have blown up there too.

--
Michael
Re: [HACKERS] path toward faster partition pruning
Thanks for the test case.

On 2017/10/30 17:09, Rajkumar Raghuwanshi wrote:
> I am getting wrong output when the default partition is sub-partitioned
> further; below is a test case.
>
> CREATE TABLE lpd (a int, b varchar, c float) PARTITION BY LIST (a);
> CREATE TABLE lpd_p1 PARTITION OF lpd FOR VALUES IN (1,2,3);
> CREATE TABLE lpd_p2 PARTITION OF lpd FOR VALUES IN (4,5);
> CREATE TABLE lpd_d PARTITION OF lpd DEFAULT PARTITION BY LIST (a);
> CREATE TABLE lpd_d1 PARTITION OF lpd_d FOR VALUES IN (7,8,9);
> CREATE TABLE lpd_d2 PARTITION OF lpd_d FOR VALUES IN (10,11,12);
> CREATE TABLE lpd_d3 PARTITION OF lpd_d FOR VALUES IN (6,null);
> INSERT INTO lpd SELECT i,i,i FROM generate_series(1,12) i;
> INSERT INTO lpd VALUES (null,null,null);
>
> --on HEAD
> postgres=# EXPLAIN (COSTS OFF) SELECT tableoid::regclass, * FROM lpd WHERE
> a IS NOT NULL ORDER BY 1;
>                 QUERY PLAN
> ------------------------------------------
>  Sort
>    Sort Key: ((lpd_p1.tableoid)::regclass)
>    ->  Result
>          ->  Append
>                ->  Seq Scan on lpd_p1
>                      Filter: (a IS NOT NULL)
>                ->  Seq Scan on lpd_p2
>                      Filter: (a IS NOT NULL)
>                ->  Seq Scan on lpd_d3
>                      Filter: (a IS NOT NULL)
>                ->  Seq Scan on lpd_d1
>                      Filter: (a IS NOT NULL)
>                ->  Seq Scan on lpd_d2
>                      Filter: (a IS NOT NULL)
> (14 rows)
>
> postgres=# SELECT tableoid::regclass, * FROM lpd WHERE a IS NOT NULL ORDER
> BY 1;
>  tableoid | a  | b  | c
> ----------+----+----+----
>  lpd_p1   |  1 | 1  |  1
>  lpd_p1   |  2 | 2  |  2
>  lpd_p1   |  3 | 3  |  3
>  lpd_p2   |  4 | 4  |  4
>  lpd_p2   |  5 | 5  |  5
>  lpd_d1   |  7 | 7  |  7
>  lpd_d1   |  8 | 8  |  8
>  lpd_d1   |  9 | 9  |  9
>  lpd_d2   | 12 | 12 | 12
>  lpd_d2   | 10 | 10 | 10
>  lpd_d2   | 11 | 11 | 11
>  lpd_d3   |  6 | 6  |  6
> (12 rows)
>
> --on HEAD + v8 patches
>
> postgres=# EXPLAIN (COSTS OFF) SELECT tableoid::regclass, * FROM lpd WHERE
> a IS NOT NULL ORDER BY 1;
>                 QUERY PLAN
> ------------------------------------------
>  Sort
>    Sort Key: ((lpd_p1.tableoid)::regclass)
>    ->  Result
>          ->  Append
>                ->  Seq Scan on lpd_p1
>                      Filter: (a IS NOT NULL)
>                ->  Seq Scan on lpd_p2
>                      Filter: (a IS NOT NULL)
> (8 rows)
>
> postgres=# SELECT tableoid::regclass, * FROM lpd WHERE a IS NOT NULL ORDER
> BY 1;
>  tableoid | a | b | c
> ----------+---+---+---
>  lpd_p1   | 1 | 1 | 1
>  lpd_p1   | 2 | 2 | 2
>  lpd_p1   | 3 | 3 | 3
>  lpd_p2   | 4 | 4 | 4
>  lpd_p2   | 5 | 5 | 5
> (5 rows)

I found bugs in 0003 and 0005 that caused this. I will post the patches containing the fix in reply to Dilip's email, which contains some code review comments [1].

Also, I noticed that the new pruning code was having a hard time dealing with the fact that the default "range" partition doesn't explicitly say in its partition constraint that it might contain null values. More precisely, the default range partition's constraint appears to imply that it can only contain non-null values, which confuses the new pruning code.

Thanks,
Amit

[1] https://www.postgresql.org/message-id/CAFiTN-thYsobXxPS6bwOA_9erpax_S=iztsn3rtuxkkmkg4...@mail.gmail.com
[HACKERS] Anyone have experience benchmarking very high effective_io_concurrency on NVME's?
Hi;

After Andres's excellent talk at PGConf we tried benchmarking effective_io_concurrency on some of our servers and found that those which have a number of NVME storage volumes could not fill the I/O queue even at the maximum setting (1000).

Before we start benchmarking patched versions of PostgreSQL to bump this setting up to ever higher values, I am wondering if anyone has done this yet, and if so, whether you would be willing to share results.

--
Best Regards,
Chris Travers
Database Administrator
Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin
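For anyone reproducing the test, the setting can be raised to its current ceiling per session or per tablespace (the tablespace name below is an example); values above 1000 are rejected without patching the server:

    -- session-wide
    SET effective_io_concurrency = 1000;
    -- or per tablespace holding the NVME volumes
    ALTER TABLESPACE nvme_ts SET (effective_io_concurrency = 1000);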
Re: [HACKERS] [bug fix] postgres.exe crashes with access violation on Windows while starting up
From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Michael Paquier
> So you are basically ready to lose any message that could be pushed
> here if there is no memory context? That does not sound like a good
> trade-off to me. A static buffer does not look like the best idea
> either, to avoid truncating messages, so couldn't we envisage just
> using malloc? pgwin32_message_to_UTF16() is called in two places in
> elog.c, and there is full control of the error code paths.

Thank you for reviewing a rare bug fix on Windows that most people wouldn't be interested in.

When CurrentMemoryContext is NULL, the message is logged with ReportEventA(). This is similar to write_console().

Regards
Takayuki Tsunakawa