Re: [HACKERS] proposal: schema variables

2017-10-31 Thread Pavel Stehule
2017-11-01 6:07 GMT+01:00 Serge Rielau :

> "Although the syntax of CREATE TEMPORARY TABLE resembles that of the SQL
> standard, the effect is not the same. In the standard, temporary tables are
> defined just once and automatically exist (starting with empty contents) in
> every session that needs them. PostgreSQL instead requires each session
> to issue its own CREATE TEMPORARY TABLE command for each temporary table
> to be used. This allows different sessions to use the same temporary table
> name for different purposes, whereas the standard's approach constrains all
> instances of a given temporary table name to have the same table structure.”
> Yeah, that’s a DECLAREd table in my book. No wonder we didn’t link up.
>

This is the known discussion about local / global temp tables in PostgreSQL,
and the ToDo item: implementation of global temp tables in Postgres.

This temporary behaviour is a marginal part of the proposal - so I can remove it
from the proposal - and later open a discussion about CREATE TEMPORARY VARIABLE
versus DECLARE VARIABLE.

Regards

Pavel

Serge

>


Re: [HACKERS] proposal: schema variables

2017-10-31 Thread Serge Rielau
" Although the syntax of CREATE TEMPORARY TABLE resembles that of the SQL 
standard, the effect is not the same. In the standard, temporary tables are 
defined just once and automatically exist (starting with empty contents) in 
every session that needs them. PostgreSQL instead requires each session to 
issue its own CREATE TEMPORARY TABLE command for each temporary table to be 
used. This allows different sessions to use the same temporary table name for 
different purposes, whereas the standard's approach constrains all instances of 
a given temporary table name to have the same table structure.”
Yeah, that’s a DECLAREd table in my book. No wonder we didn’t link up.
Cheers Serge

Re: [HACKERS] REINDEX CONCURRENTLY 2.0

2017-10-31 Thread Andreas Karlsson

Here is a rebased version of the patch.

Andreas
diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml
index a0ca2851e5..f8c59ea127 100644
--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
@@ -926,6 +926,7 @@ ERROR:  could not serialize access due to read/write dependencies among transact
 
  Acquired by VACUUM (without FULL),
  ANALYZE, CREATE INDEX CONCURRENTLY,
+ REINDEX CONCURRENTLY,
  CREATE STATISTICS and
  ALTER TABLE VALIDATE and other
  ALTER TABLE variants (for full details see
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 2e053c4c24..4019bad4c2 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
  
 
-REINDEX [ ( VERBOSE ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } name
+REINDEX [ ( VERBOSE ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } [ CONCURRENTLY ] name
 
  
 
@@ -67,10 +67,7 @@ REINDEX [ ( VERBOSE ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } 
   An index build with the CONCURRENTLY option failed, leaving
   an invalid index. Such indexes are useless but it can be
-  convenient to use REINDEX to rebuild them. Note that
-  REINDEX will not perform a concurrent build. To build the
-  index without interfering with production you should drop the index and
-  reissue the CREATE INDEX CONCURRENTLY command.
+  convenient to use REINDEX to rebuild them.
  
 
 
@@ -151,6 +148,21 @@ REINDEX [ ( VERBOSE ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } 

 
+   
+CONCURRENTLY
+
+ 
+  When this option is used, PostgreSQL will rebuild the
+  index without taking any locks that prevent concurrent inserts,
+  updates, or deletes on the table; whereas a standard reindex build
+  locks out writes (but not reads) on the table until it's done.
+  There are several caveats to be aware of when using this option
+   see .
+ 
+
+   
+

 VERBOSE
 
@@ -231,6 +243,173 @@ REINDEX [ ( VERBOSE ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } 
 
+  
+   Rebuilding Indexes Concurrently
+
+   
+   index
+   rebuilding concurrently
+   
+
+   
+Rebuilding an index can interfere with regular operation of a database.
+Normally PostgreSQL locks the table whose index is rebuilt
+against writes and performs the entire index build with a single scan of the
+table. Other transactions can still read the table, but if they try to
+insert, update, or delete rows in the table they will block until the
+index rebuild is finished. This could have a severe effect if the system is
+a live production database. Very large tables can take many hours to be
+indexed, and even for smaller tables, an index rebuild can lock out writers
+for periods that are unacceptably long for a production system.
+   
+
+   
+PostgreSQL supports rebuilding indexes with minimum locking
+of writes.  This method is invoked by specifying the
+CONCURRENTLY option of REINDEX. When this option
+is used, PostgreSQL must perform two scans of the table
+for each index that needs to be rebuilt and in addition it must wait for
+all existing transactions that could potentially use the index to
+terminate. This method requires more total work than a standard index
+rebuild and takes significantly longer to complete as it needs to wait
+for unfinished transactions that might modify the index. However, since
+it allows normal operations to continue while the index is rebuilt, this
+method is useful for rebuilding indexes in a production environment. Of
+course, the extra CPU, memory and I/O load imposed by the index rebuild
+may slow down other operations.
+   
+
+   
+The following steps occur in a concurrent index build, each in a separate
+transaction except when the new index definitions are created, where all
+the concurrent entries are created using only one transaction. Note that
+if there are multiple indexes to be rebuilt then each step loops through
+all the indexes we're rebuilding, using a separate transaction for each one.
+REINDEX CONCURRENTLY proceeds as follows when rebuilding
+indexes:
+
+
+ 
+  
+   A new temporary index definition is added into the catalog
+   pg_index. This definition will be used to replace the
+   old index. This step is done as a single transaction for all the indexes
+   involved in this process, meaning that if
+   REINDEX CONCURRENTLY is run on a table with multiple
+   indexes, all the catalog entries of the new indexes are created within a
+   single transaction. A SHARE UPDATE EXCLUSIVE lock at
+   session level is taken on the indexes being reindexed as well as their
+   parent table to prevent any schema modification while processing.
+  
+ 
+ 
+  
+   A first 
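As a usage illustration of the syntax added by the patch above (the table and
index names are placeholders):

REINDEX TABLE CONCURRENTLY my_table;
REINDEX (VERBOSE) INDEX CONCURRENTLY my_index;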

Re: [HACKERS] proposal: schema variables

2017-10-31 Thread Pavel Stehule
2017-10-31 22:28 GMT+01:00 srielau :

> Pavel,
>
> There is no
> DECLARE TEMP CURSOR
> or
> DECLARE TEMP variable in PLpgSQL
> and
>

sure .. DECLARE TEMP makes no sense; I was talking about the similarity between
DECLARE and CREATE TEMP


> CREATE TEMP TABLE has a different meaning from what I understand you
> envision for variables.
>
> But maybe I'm mistaken. Your original post did not describe the entire
> syntax:
> CREATE [TEMP] VARIABLE [IF NOT EXISTS] name AS type
>   [ DEFAULT expression ] [[NOT] NULL]
>   [ ON TRANSACTION END { RESET | DROP } ]
>   [ { VOLATILE | STABLE } ];
>
> Especially the TEMP is not spelled out and how its presence affects or
> doesn't ON TRANSACTION END.
> So may be if you elaborate I understand where you are coming from.
>

TEMP has the same functionality (and implementation) as our temp tables - so
at session end the temp variables are destroyed, but they can also be tied to
the transaction.
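
For illustration, a sketch of how the proposed syntax from the original post
could be used (nothing here is implemented yet; the names are only examples):

CREATE VARIABLE public.counter AS int DEFAULT 0;   -- session scope, survives transactions
CREATE TEMP VARIABLE work_state AS text
  ON TRANSACTION END RESET;                        -- dropped at session end, reset when a transaction ends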





Re: [HACKERS] Re: Anyone have experience benchmarking very high effective_io_concurrency on NVME's?

2017-10-31 Thread Craig Ringer
On 1 November 2017 at 11:49, Andres Freund  wrote:

> Right. It'd probably be good to be a bit more adaptive here. But it's
> hard to do with posix_fadvise - we'd need an operation that actually
> notifies us of IO completion.  If we were using, say, asynchronous
> direct IO, we could initiate the request and regularly check how many
> blocks ahead of the current window are already completed and adjust the
> queue based on that, rather than just firing off fadvises and hoping for
> the best.

In case it's of interest, I did some looking into using Linux's AIO
support in Pg a while ago, when chasing some issues around fsync
retries and handling of I/O errors.

It was a pretty serious dead end; it was clear that fsync support in
AIO is not only incomplete but inconsistent across kernel versions,
let alone other platforms.

But I see your name in the relevant threads, so you know that. To save
others the time, see:

* https://lwn.net/Articles/724198/
* https://lwn.net/Articles/671649/

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-31 Thread Alvaro Herrera
Tomas Vondra wrote:

> FWIW I can reproduce this on 9.5, and I don't even need to run the
> UPDATE part. That is, INSERT + VACUUM running concurrently is enough to
> produce broken BRIN indexes :-(

Hmm, I'm pretty sure we stress-tested brin in pretty much the same way.
But I see this misbehavior too.  Looking ...

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Re: Anyone have experience benchmarking very high effective_io_concurrency on NVME's?

2017-10-31 Thread Andres Freund
Hi,

On 2017-10-31 18:47:06 +0100, Tomas Vondra wrote:
> On 10/31/2017 04:48 PM, Greg Stark wrote:
> > On 31 October 2017 at 07:05, Chris Travers 
> wrote:
> >> Hi;
> >>
> >> After Andres's excellent talk at PGConf we tried benchmarking
> >> effective_io_concurrency on some of our servers and found that those
> which
> >> have a number of NVME storage volumes could not fill the I/O queue
> even at
> >> the maximum setting (1000).
> >
> > And was the system still i/o bound? If the cpu was 100% busy then
> > perhaps Postgres just can't keep up with the I/O system. It would
> > depend on workload though, if you start many very large sequential
> > scans you may be able to push the i/o system harder.
> >
> > Keep in mind effective_io_concurrency only really affects bitmap
> > index scans (and to a small degree index scans). It works by issuing
> > posix_fadvise() calls for upcoming buffers one by one. That gets
> > multiple spindles active but it's not really going to scale to many
> > thousands of prefetches (and effective_io_concurrency of 1000
> > actually means 7485 prefetches). At some point those i/o are going
> > to start completing before Postgres even has a chance to start
> > processing the data.

Note that even if they finish well before postgres gets around to
looking at the block, you might still be seeing benefits. SSDs benefit
from larger reads, and a deeper queue gives more chances for reordering
/ coalescing of requests. Won't benefit the individual reader, but
might help the overall capacity of the system.


> Yeah, initiating the prefetches is not expensive, but it's not free
> either. So there's a trade-off between time spent on prefetching and
> processing the data.

Right. It'd probably be good to be a bit more adaptive here. But it's
hard to do with posix_fadvise - we'd need an operation that actually
notifies us of IO completion.  If we were using, say, asynchronous
direct IO, we could initiate the request and regularly check how many
blocks ahead of the current window are already completed and adjust the
queue based on that, rather than just firing off fadvises and hoping for
the best.


> I believe this may be actually illustrated using Amdahl's law - the I/O
> is the parallel part, and processing the data is the serial part. And no
> matter what you do, the device only has so much bandwidth, which defines
> the maximum possible speedup (compared to "no prefetch" case).

Right.


> Furthermore, the device does not wait for all the I/O requests to be
> submitted - it won't wait for 1000 requests and then go "OMG! There's a
> lot of work to do!" It starts processing the requests as they arrive,
> and some of them will complete before you're done with submitting the
> rest, so you'll never see all the requests in the queue at once.

It'd be interesting to see how much it helps to scale the size of
readahead requests with the distance from the current read
iterator. E.g. if you're less than 16 blocks away from the current head,
issue size 1, up to 32 issue a 2 block request for consecutive blocks.
I suspect it won't help because at least linux's block layer / io
elevator seems quite successful at merging. E.g. for the query:
EXPLAIN ANALYZE SELECT sum(l_quantity) FROM lineitem where l_receiptdate 
between '1993-05-03' and '1993-08-03';
on a tpc-h scale dataset on my laptop, I see:
Device        r/s     w/s   rMB/s  wMB/s    rrqm/s  wrqm/s  %rrqm  %wrqm  r_await  w_await  aqu-sz  rareq-sz  wareq-sz  svctm  %util
sda      25702.00    0.00  495.27   0.00  37687.00    0.00  59.45   0.00     5.13     0.00  132.09     19.73      0.00   0.04 100.00

but it'd be worthwhile to see whether doing the merging ourselves allows
for deeper queues.


I think we really should start incorporating explicit prefetching in
more places. Ordered indexscans might actually be one case that's not
too hard to do in a simple manner - whenever at an inner node, prefetch
the leaf nodes below it. We obviously could do better, but that might be
a decent starting point to get some numbers.

Greetings,

Andres Freund




Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept

2017-10-31 Thread Masahiko Sawada
On Tue, Oct 31, 2017 at 6:17 PM, Alexander Korotkov
 wrote:
> On Tue, Oct 31, 2017 at 5:16 AM, Masahiko Sawada 
> wrote:
>>
>> On Mon, Oct 30, 2017 at 10:16 PM, Robert Haas 
>> wrote:
>> > On Tue, Oct 24, 2017 at 1:26 PM, Ivan Kartyshov
>> >  wrote:
>> >> Hello. I made some bugfixes and rewrote the patch.
>> >
>> > I don't think it's a good idea to deliberately leave the state of the
>> > standby different from the state of the  master on the theory that it
>> > won't matter.  I feel like that's something that's likely to come back
>> > to bite us.
>>
>> I agree with Robert. What happens if we intentionally don't apply the
>> truncation WAL and switched over? If we insert a tuple on the new
>> master server to a block that has been truncated on the old master,
>> the WAL apply on the new standby will fail? I guess there are such
>> corner cases causing failures of WAL replay after switch-over.
>
>
> Yes, that looks dangerous.  One approach to cope with that could be teaching the
> heap redo function to handle such situations.  But I expect this approach
> to be criticized for bad design.  And I understand the fairness of this
> criticism.
>
> However, from the user's perspective, the current behavior of
> hot_standby_feedback is just broken, because it both increases bloat and
> doesn't guarantee that a read-only query on the standby won't be cancelled
> because of vacuum.  Therefore, we should be looking for a solution: if one
> approach isn't good enough, then we should look for another approach.
>
> I can propose following alternative approach: teach read-only queries on hot
> standby to tolerate concurrent relation truncation.  Therefore, when
> non-existent heap page is accessed on hot standby, we can know that it was
> deleted by concurrent truncation and should be assumed to be empty.  Any
> thoughts?
>

Do you also mean that applying the WAL for AccessExclusiveLock is always
skipped on standby servers, to allow scans to access the relation?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Dynamic result sets from procedures

2017-10-31 Thread Craig Ringer
On 1 November 2017 at 05:08, Peter Eisentraut
 wrote:

> CREATE PROCEDURE pdrstest1()
> LANGUAGE SQL
> AS $$
> DECLARE c1 CURSOR WITH RETURN FOR SELECT * FROM cp_test2;
> DECLARE c2 CURSOR WITH RETURN FOR SELECT * FROM cp_test3;
> $$;
>
> CALL pdrstest1();

FWIW, this is similar to the model already used by PgJDBC to emulate
multiple result sets, though the current support in the driver is
rather crude. It detects a REFCURSOR in an output parameter / result
set and transparently FETCHes the result set, making it look to the
client app like it's a nested result set.

This shouldn't conflict with what you're doing because the driver does
not follow the JDBC standard behaviour of using
Statement.getMoreResults() and Statement.getResultSet() for multiple
result sets. That's currently only used by PgJDBC when fetching result
sets from batch query executions. Instead, the multiple result set
emulation requires the caller to 'getObject' the 'refcursor' field's
result-object, then cast it to ResultSet, and treat it as a new
(nested) result set.
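
For readers who haven't seen that emulation, a rough SQL-level sketch of what
the driver effectively does (the function, table and cursor names here are
invented):

CREATE FUNCTION get_results() RETURNS refcursor
LANGUAGE plpgsql AS $$
DECLARE
    c refcursor := 'result_cur';
BEGIN
    OPEN c FOR SELECT * FROM some_table;   -- some_table is a placeholder
    RETURN c;
END;
$$;

BEGIN;
SELECT get_results();          -- returns the cursor name, 'result_cur'
FETCH ALL FROM result_cur;     -- the driver issues this behind the scenes
COMMIT;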

True multiple result sets would be exposed in PgJDBC via getMoreResults().

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] PATCH: enabling parallel execution for cursors explicitly (experimental)

2017-10-31 Thread Craig Ringer
> Now, I agree this is somewhat more limited than I hoped for, but OTOH it
> still solves the issue I initially aimed for (processing large query
> results in efficient way).

I don't quite understand this part.

We already send results to the client in a stream unless it's
something we have to materialize, in which case a cursor won't help
anyway.

If the client wants to fetch in chunks it can use a portal and limited
size fetches. That shouldn't (?) be parallel-unsafe, since nothing
else can happen in the middle anyway.
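
For concreteness, the SQL-level equivalent of such chunked fetching through a
portal might look like this (the table name and fetch size are arbitrary):

BEGIN;
DECLARE big_cur CURSOR FOR SELECT * FROM some_large_table;
FETCH FORWARD 10000 FROM big_cur;   -- repeat until no rows come back
CLOSE big_cur;
COMMIT;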

But in most cases the client can just fetch all, and let the socket
buffering take care of things, reading results only when it wants
them, and letting the server block when the windows are full.

That's not to say that SQL-level cursor support wouldn't be nice. I'm
just trying to better understand what it's solving.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Another oddity in handling of WCO constraints in postgres_fdw

2017-10-31 Thread Robert Haas
On Wed, Oct 4, 2017 at 5:58 PM, Ashutosh Bapat
 wrote:
> The view with WCO is local but the modification which violates WCO is
> being made on remote server by a trigger on remote table. Trying to
> control that doesn't seem to be a good idea, just like we can't
> control what rows get inserted on the foreign server when they violate
> local constraints.

I think that's a fair point.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-10-31 Thread Peter Geoghegan
On Tue, Oct 31, 2017 at 5:07 PM, Thomas Munro
 wrote:
> So that's this bit:
>
> + pg_itoa(worker, filename);
> + lts->pfile = BufFileCreateShared(fileset, filename);
>
> ... and:
>
> + pg_itoa(i, filename);
> + file = BufFileOpenShared(fileset, filename);

Right.

> What's wrong with using a worker number like this?

I guess nothing, though there is the question of discoverability for
DBAs, etc. You do address this separately, by having (potentially)
descriptive filenames, as you go into.

> It's not random choice: buffile.c creates a uniquely named directory
> (or directories, if you have more than one location configured in the
> temp_tablespaces GUC) to hold all the backing files involved in each
> BufFileSet.  Naming of BufFiles within the BufFileSet is the caller's
> problem, and a worker number seems like a reasonable choice to me.  It
> won't collide with a concurrent parallel CREATE INDEX because that'll
> be using its own BufFileSet.

Oh, I see. I may have jumped the gun on that one.

> One complaint about the current coding that someone might object to:
> MakeSharedSegmentPath() just dumps the caller's BufFile name into a
> path without sanitisation: I should fix that so that we only accept
> fairly limited strings here.  Another complaint is that perhaps fd.c
> knows too much about buffile.c's business.  For example,
> RemovePgTempFilesInDir() knows about the ".set" directories created by
> buffile.c, which might be called a layering violation.  Perhaps the
> set/directory logic should move entirely into fd.c, so you'd call
> FileSetInit(FileSet *), not BufFileSetInit(BufFileSet *), and then
> BufFileOpenShared() would take a FileSet *, not a BufFileSet *.
> Thoughts?

I'm going to make an item on my personal TODO list for that. No useful
insights on that right now, though.

> 3.  sharedtuplestore.c takes a caller-supplied BufFileSet and creates
> its shared BufFiles in there.  Earlier versions created and owned a
> BufFileSet, but in the current Parallel Hash patch I create loads of
> separate SharedTuplestore objects but I didn't want to create loads of
> directories to back them, so you can give them all the same
> BufFileSet.  That works because SharedTuplestores are also given a
> name, and it's the caller's job (in my case nodeHash.c) to make sure
> the SharedTuplestores are given unique names within the same
> BufFileSet.  For Parallel Hash you'll see names like 'i3of8' (inner
> batch 3 of 8).  There is no need for there to be any sort of
> central registry for that though, because it rides on top of the
> guarantees from 2 above: buffile.c will put those files into a
> uniquely named directory, and that works as long as no one else is
> allowed to create files or directories in the temp directory that
> collide with its reserved pattern /^pgsql_tmp.+\.set$/.  For the same
> reason, parallel CREATE INDEX is free to use worker numbers as BufFile
> names, since it has its own BufFileSet to work within.

If the new standard is that you have temp file names that suggest the
purpose of each temp file, then that may be something that parallel
CREATE INDEX should buy into.

> In an earlier version, BufFileSet was one of those annoying data
> structures with a FLEXIBLE_ARRAY_MEMBER that you'd use as an
> incomplete type (declared but not defined in the includable header),
> and here it was being used "inside" (or rather after) SharedSort,
> which *itself* had a FLEXIBLE_ARRAY_MEMBER.  The reason for the
> variable sized object was that I needed all backends to agree on the
> set of temporary tablespace OIDs, of which there could be any number,
> but I also needed a 'flat' (pointer-free) object I could stick in
> relocatable shared memory.  In the newest version I changed that
> flexible array to tablespaces[8], because 8 should be enough
> tablespaces for anyone (TM).

I guess that that's something that you'll need to take up with Andres,
if you haven't already. I have a hard time imagining a single query
needing to use more than that many tablespaces at once, so maybe this
is fine.

> I don't really believe anyone uses
> temp_tablespaces for IO load balancing anymore and I hate code like
> the above.  So I think Rushabh should now remove the above-quoted code
> and just use a BufFileSet directly as a member of SharedSort.

FWIW, I agree with you that nobody uses temp_tablespaces this way
these days. This seems like a discussion for your hash join patch,
though. I'm happy to buy into that.

-- 
Peter Geoghegan




Re: [HACKERS] Adding column_constraint description in ALTER TABLE synopsis

2017-10-31 Thread Amit Langote
On 2017/10/31 21:31, Stephen Frost wrote:
> * Lætitia Avrot (laetitia.av...@gmail.com) wrote:
>> As Amit Langote pointed out, the column_constraint definition is missing
>> whereas it is used in ALTER TABLE synopsis. It can be easily found in the
>> CREATE TABLE synopsis, but it's not very user friendly.
>
> Thanks, this looks pretty reasonable, but did you happen to look for any
> other keywords in the ALTER TABLE that should really be in ALTER TABLE
> also?
> 
> I'm specifically looking at, at least, partition_bound_spec.  Maybe you
> could propose an updated patch which addresses that also, and any other
> cases you find?

Ah, yes.  I remember having left out partition_bound_spec simply because I
thought it was kind of how it was supposed to be done, seeing that neither
column_constraint nor table_constraint was expanded in the ALTER TABLE's
synopsis.

It seems that there are indeed a couple of other things that need to be
brought over to ALTER TABLE synopsis including partition_bound_spec.
9f295c08f877 [1] added table_constraint, but missed adding the description
of index_parameters and exclude_element which are referenced therein.

Attached find an updated version of Lætitia's patch.

Thanks,
Amit

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=9f295c
diff --git a/doc/src/sgml/ref/alter_table.sgml 
b/doc/src/sgml/ref/alter_table.sgml
index 41acda003f..e059f87875 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -85,6 +85,20 @@ ALTER TABLE [ IF EXISTS ] name
 OWNER TO { new_owner | 
CURRENT_USER | SESSION_USER }
 REPLICA IDENTITY { DEFAULT | USING INDEX index_name | FULL | NOTHING }
 
+and column_constraint 
is:
+
+[ CONSTRAINT constraint_name ]
+{ NOT NULL |
+  NULL |
+  CHECK ( expression ) [ NO 
INHERIT ] |
+  DEFAULT default_expr |
+  GENERATED { ALWAYS | BY DEFAULT } AS IDENTITY [ ( 
sequence_options ) ] |
+  UNIQUE index_parameters |
+  PRIMARY KEY index_parameters |
+  REFERENCES reftable [ ( 
refcolumn ) ] [ MATCH FULL | MATCH 
PARTIAL | MATCH SIMPLE ]
+[ ON DELETE action ] [ ON 
UPDATE action ] }
+[ DEFERRABLE | NOT DEFERRABLE ] [ INITIALLY DEFERRED | INITIALLY IMMEDIATE ]
+
 and table_constraint 
is:
 
 [ CONSTRAINT constraint_name ]
@@ -96,6 +110,15 @@ ALTER TABLE [ IF EXISTS ] name
 [ MATCH FULL | MATCH PARTIAL | MATCH SIMPLE ] [ ON DELETE action ] [ ON UPDATE action ] }
 [ DEFERRABLE | NOT DEFERRABLE ] [ INITIALLY DEFERRED | INITIALLY IMMEDIATE ]
 
+index_parameters in 
UNIQUE, PRIMARY KEY, and 
EXCLUDE constraints are:
+
+[ WITH ( storage_parameter [= 
value] [, ... ] ) ]
+[ USING INDEX TABLESPACE tablespace_name ]
+
+exclude_element in an 
EXCLUDE constraint is:
+
+{ column_name | ( expression ) } [ opclass ] [ ASC | DESC ] [ NULLS { FIRST | LAST 
} ]
+
 and table_constraint_using_index is:
 
 [ CONSTRAINT constraint_name ]
@@ -104,6 +127,13 @@ ALTER TABLE [ IF EXISTS ] name
 
  
 
+and partition_bound_spec 
is:
+
+IN ( { numeric_literal | 
string_literal | NULL } [, ...] ) |
+FROM ( { numeric_literal | 
string_literal | MINVALUE | 
MAXVALUE } [, ...] )
+  TO ( { numeric_literal | 
string_literal | MINVALUE | 
MAXVALUE } [, ...] )
+
+
  
   Description
 



[HACKERS] Removing LEFT JOINs in more cases

2017-10-31 Thread David Rowley
Hackers,

Normally we'll only ever remove a LEFT JOIN relation if it's unused
and there's no possibility that the join would cause row duplication.
To check that the join wouldn't cause row duplication we make use of
proofs, such as unique indexes, or for sub-queries, we make use of
DISTINCT and GROUP BY clauses.

There's another case that we don't handle, and it's VERY simple to test for.

Quite simply, it seems we could remove the join in cases such as:

create table t1 (id int primary key);
create table t2 (id int primary key, b int not null);

insert into t2 values(1,1),(2,1);
insert into t1 values(1);

select distinct t1.* from t1 left join t2 on t1.id=t2.b;

and

select t1.id from t1 left join t2 on t1.id=t2.b GROUP BY t1.id;

but not:

select t1.id,count(*) from t1 left join t2 on t1.id=t2.b GROUP BY t1.id;

In the first two cases, the join *can* cause row duplicates, but the DISTINCT
or GROUP BY would filter these out again anyway, so in those cases, we'd
not only get the benefit of not joining but also of not having to remove
the duplicate rows caused by the join.

Given how simple the code is to support this, it seems to me to be
worth handling.

A patch to do this is attached.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


0001-Support-removing-LEFT-JOINs-with-DISTINCT-GROUP-BY.patch
Description: Binary data



Re: [HACKERS] [bug fix] postgres.exe crashes with access violation on Windows while starting up

2017-10-31 Thread Tsunakawa, Takayuki
From: Michael Paquier [mailto:michael.paqu...@gmail.com]
> On Tue, Oct 31, 2017 at 6:59 AM, Tsunakawa, Takayuki
>  wrote:
> > When CurrentMemoryContext is NULL, the message  is logged with
> ReportEventA().  This is similar to write_console().
> 
> My point is that as Postgres is running as a service, isn't it wrong to
> write a message to stderr as a fallback if the memory context is not set?
> You would lose a message. It seems to me that for an operation that can
> happen at a low-level like the postmaster startup, we should really use
> a low-level operation as well.

I'm sorry I may not have been clear.  With this patch, write_eventlog() outputs 
the message to event log with ReportEventA() when CurrentMemoryContext is NULL. 
 It doesn't write to stderr.  So the message won't be lost.

Regards
Takayuki Tsunakawa





Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-10-31 Thread Thomas Munro
On Wed, Nov 1, 2017 at 11:29 AM, Peter Geoghegan  wrote:
> On Thu, Oct 26, 2017 at 4:22 AM, Rushabh Lathia
>  wrote:
>> Attaching the re based patch according to the v22 parallel-hash patch sets
>
> I took a quick look at this today, and noticed a few issues:
>
> * make_name() is used to name files in sharedtuplestore.c, which is
> what is passed to BufFileOpenShared() for parallel hash join. Your
> using your own logic for that within the equivalent logtape.c call to
> BufFileOpenShared(), presumably because make_name() wants to identify
> participants by PID rather than by an ordinal identifier number.

So that's this bit:

+ pg_itoa(worker, filename);
+ lts->pfile = BufFileCreateShared(fileset, filename);

... and:

+ pg_itoa(i, filename);
+ file = BufFileOpenShared(fileset, filename);

What's wrong with using a worker number like this?

> I think that we need some kind of central registry for things that use
> shared buffiles. It could be that sharedtuplestore.c is further
> generalized to support this, or it could be that they both call
> something else that takes care of naming. It's not okay to have this
> left to random chance.

It's not random choice: buffile.c creates a uniquely named directory
(or directories, if you have more than one location configured in the
temp_tablespaces GUC) to hold all the backing files involved in each
BufFileSet.  Naming of BufFiles within the BufFileSet is the caller's
problem, and a worker number seems like a reasonable choice to me.  It
won't collide with a concurrent parallel CREATE INDEX because that'll
be using its own BufFileSet.

> You're going to have to ask Thomas about this.  You should also use
> MAXPGPATH for the char buffer on the stack.

Here's a summary of namespace management scheme I currently have at
the three layers fd.c, buffile.c, sharedtuplestore.c:

1.  fd.c provides new lower-level functions
PathNameCreateTemporaryFile(const char *path) and
PathNameOpenTemporaryFile(const char *path).  It also provides
PathNameCreateTemporaryDir().  Clearly callers of these interfaces
will need to be very careful about managing the names they use.
Callers also own the problem of cleaning up files, since there is no
automatic cleanup of files created this way.  My intention was that
these facilities would *only* be used by BufFileSet, since it has
machinery to manage those things.

2.  buffile.c introduces BufFileSet, which is conceptually a set of
BufFiles that can be shared by multiple backends with DSM
segment-scoped cleanup.  It is implemented as a set of directories:
one for each tablespace in temp_tablespaces.  It controls the naming
of those directories.  The BufFileSet directories are named similarly
to fd.c's traditional temporary file names using the usual recipe
"pgsql_tmp" + PID + per-process counter but have an additional ".set"
suffix.  RemovePgTempFilesInDir() recognises directories with that
prefix and suffix as junk left over from a crash when cleaning up.  I
suppose it's that knowledge about reserved name patterns and cleanup
that you are thinking of as a central registry?  As for the BufFiles
that are in a BufFileSet, buffile.c has no opinion on that: the
calling code (parallel CREATE INDEX, sharedtuplestore.c, ...) is
responsible for coming up with its own scheme.  If parallel CREATE
INDEX wants to name shared BufFiles "walrus" and "banana", that's OK
by me, and those files won't collide with anything in another
BufFileSet because each BufFileSet has its own directory (-ies).

One complaint about the current coding that someone might object to:
MakeSharedSegmentPath() just dumps the caller's BufFile name into a
path without sanitisation: I should fix that so that we only accept
fairly limited strings here.  Another complaint is that perhaps fd.c
knows too much about buffile.c's business.  For example,
RemovePgTempFilesInDir() knows about the ".set" directories created by
buffile.c, which might be called a layering violation.  Perhaps the
set/directory logic should move entirely into fd.c, so you'd call
FileSetInit(FileSet *), not BufFileSetInit(BufFileSet *), and then
BufFileOpenShared() would take a FileSet *, not a BufFileSet *.
Thoughts?

3.  sharedtuplestore.c takes a caller-supplied BufFileSet and creates
its shared BufFiles in there.  Earlier versions created and owned a
BufFileSet, but in the current Parallel Hash patch I create loads of
separate SharedTuplestore objects but I didn't want to create loads of
directories to back them, so you can give them all the same
BufFileSet.  That works because SharedTuplestores are also given a
name, and it's the caller's job (in my case nodeHash.c) to make sure
the SharedTuplestores are given unique names within the same
BufFileSet.  For Parallel Hash you'll see names like 'i3of8' (inner
batch 3 of 8).  There is no need for there to be any sort of
central registry for that though, because it rides on top of the
guarantees from 2 above: buffile.c will put those files into a
uniquely named directory, and that works as long as no one else is
allowed to create files or directories in the temp directory that
collide with its reserved pattern /^pgsql_tmp.+\.set$/.  For the same
reason, parallel CREATE INDEX is free to use worker numbers as BufFile
names, since it has its own BufFileSet to work within.

Re: [HACKERS] Account for cost and selectivity of HAVING quals

2017-10-31 Thread Tom Lane
"David G. Johnston"  writes:
> On Tue, Oct 31, 2017 at 4:31 PM, Tels  wrote:
>> That looks odd to me, it first uses output_tuples in a formula, then
>> overwrites the value with a new value. Should these lines be swapped?

> ​IIUC it is correct: the additional total_cost comes from processing every
> output group to check whether it is qualified - since every group is
> checked the incoming output_tuples from the prior grouping is used.

Right --- we'll expend the effort to compute the HAVING expression once
per group row, whether the row passes the qual or not.

regards, tom lane




Re: [HACKERS] Account for cost and selectivity of HAVING quals

2017-10-31 Thread David G. Johnston
On Tue, Oct 31, 2017 at 4:31 PM, Tels  wrote:

>
> ​​
> That looks odd to me, it first uses output_tuples in a formula, then
> overwrites the value with a new value. Should these lines be swapped?
>

​IIUC it is correct: the additional total_cost comes from processing every
output group to check whether it is qualified - since every group is
checked the incoming output_tuples from the prior grouping is used.  The
side-effect of the effort is that the number of output_tuples has now been
reduced to only those matching the qual - and so it now must take on a new
value to represent this.
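
As a made-up illustration of where this matters: in a query like the one
below, the HAVING qual is evaluated once per group produced by the GROUP BY,
so its evaluation cost is charged for every group, and only afterwards is the
row estimate scaled down by its selectivity (the table and function are
hypothetical):

SELECT a, count(*)
  FROM t
 GROUP BY a
HAVING count(*) > 10
   AND expensive_check(a);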

David J.​


[HACKERS] Proposal: generic WAL compression

2017-10-31 Thread Oleg Ivanov

Hackers,

a few years ago generic WAL was proposed by Alexander Korotkov 
(https://www.postgresql.org/message-id/flat/CAPpHfdsXwZmojm6Dx%2BTJnpYk27kT4o7Ri6X_4OSWcByu1Rm%2BVA%40mail.gmail.com#capphfdsxwzmojm6dx+tjnpyk27kt4o7ri6x_4oswcbyu1rm...@mail.gmail.com). 
and was committed into PostgreSQL 9.6.
One of the advantages of generic WAL is the common interface for safe
interaction with WAL for modules and extensions. The interface allows a
module to register a page and then change it; generic WAL then
generates and writes into the xlog the algorithm for changing the old version
of the page into the new one. In the case of a crash, this
algorithm can be applied during recovery.


Bloom and RUM indexes use generic WAL. The issue is that the generic
algorithm for turning the old page into the new one is not optimal in
terms of record size in the xlog. Currently the main idea of the approach is to
find all positions in the page where the new version differs from the
original one and to write these changes into the xlog. It works well if the
page was rewritten completely or if only a few bytes have been changed.
Nevertheless, this approach produces a WAL record that is too large in the case
of inserting or deleting a few bytes into the page. In that case there are
almost no positions where the old version and the new one are equal, so
the record size is near the page size, but the actual amount of change in
the page is small. This is exactly what often happens in RUM indexes.


In order to overcome that issue, I would like to propose a patch
which provides the possibility to use another approach to WAL record
construction. If the new approach fails to make a short enough record,
it falls back to the original approach. The main idea of the new
approach is not to perform a bytewise comparison of pages, but to find the
minimal editing distance between the two pages and the corresponding editing
algorithm. In the general case, finding the editing distance takes O(N^2)
time and memory. But there is also an algorithm which requires only
O(ND) time and O(D^2) memory, where D is the editing distance between the
two sequences. Also, for a given D, the algorithm can show that the minimal
editing distance between two sequences is more than D, in the same amount
of time and memory.


The special case of this algorithm which does not consider replace 
operations is described in the paper 
(http://www.xmailserver.org/diff2.pdf). The version of this algorithm 
which consumes O(ND) time and O(N) memory is used in diff console 
command, but for our purposes we don't need to increase the constant for 
the time in order to decrease memory complexity. For RUM indexes we 
usually have small enough editing distance (less than 64), so D^2 is not 
too much to store.


The results of experiments:

+--------------------------+----------------+----------------+----------------+----------------+
| Name                     |  WAL_diff (MB) |  WAL_orig (MB) |  Time_diff (s) |  Time_orig (s) |
+--------------------------+----------------+----------------+----------------+----------------+
| rum: make installcheck   |           38.9 |           82.5 |           4.37 |           4.16 |
| bloom: make installcheck |            1.0 |            1.0 |           0.66 |           0.53 |
| test00.sql               |           20.5 |           51.0 |           1.86 |           1.41 |
| test01.sql               |          207.1 |          219.7 |           8.06 |           6.89 |
+--------------------------+----------------+----------------+----------------+----------------+

We can see that the patch causes a small slowdown, but compresses
generic WAL efficiently for RUM indexes. Also I'm going to do a few more
experiments on this patch with other data.


The patch was tested on Lubuntu 14.04, but should not contain any 
platform-specific items. The patch, the files and scripts for doing the 
experiments and performance tests are attached.


Oleg Ivanov
Postgres Professional
The Russian PostgreSQL Company
diff --git a/src/backend/access/transam/generic_xlog.c b/src/backend/access/transam/generic_xlog.c
index 3adbf7b..cd0ed2a 100644
--- a/src/backend/access/transam/generic_xlog.c
+++ b/src/backend/access/transam/generic_xlog.c
@@ -43,9 +43,18 @@
  * a full page's worth of data.
  *-
  */
-#define FRAGMENT_HEADER_SIZE	(2 * sizeof(OffsetNumber))
+#define FRAGMENT_HEADER_SIZE	(2 * (sizeof(OffsetNumber) + \
+	  sizeof(char) + sizeof(int)))
 #define MATCH_THRESHOLD			FRAGMENT_HEADER_SIZE
-#define MAX_DELTA_SIZE			(BLCKSZ + 2 * FRAGMENT_HEADER_SIZE)
+#define MAX_DELTA_SIZE			(BLCKSZ + 2 * FRAGMENT_HEADER_SIZE) + sizeof(bool)
+

Re: [HACKERS] Account for cost and selectivity of HAVING quals

2017-10-31 Thread Tels
Moin,

On Tue, October 31, 2017 5:45 pm, Tom Lane wrote:
> Pursuant to the discussion at
> https://www.postgresql.org/message-id/20171029112420.8920b5f...@mx.zeyos.com
> here's a patch to fix the planner so that eval costs and selectivity of
> HAVING quals are factored into the appropriate plan node numbers.
> Perhaps unsurprisingly, this doesn't change the results of any
> existing regression tests.
>
> It's slightly annoying that this approach will result in calculating
> the eval costs and selectivity several times, once for each aggregation
> plan type we consider.  I thought about inserting RestrictInfo nodes
> into the havingQual so that those numbers could be cached, but that turned
> out to break various code that doesn't expect to see such nodes there.
> I'm not sure it's worth going to the trouble of fixing that; in the big
> scheme of things, the redundant calculations likely don't cost much, since
> we aren't going to have relevant statistics.
>
> Comments?  If anyone wants to do a real review of this, I'm happy to stick
> it into the upcoming CF; but without an expression of interest, I'll just
> push it.  I don't think there's anything terribly controversial here.

Not a review, but the patch has this:


+ total_cost += qual_cost.startup + output_tuples * qual_cost.per_tuple;
+
+ output_tuples = clamp_row_est(output_tuples *

...

That looks odd to me, it first uses output_tuples in a formula, then
overwrites the value with a new value. Should these lines be swapped?

Best regards,

Tels




Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-31 Thread Tomas Vondra


On 10/31/2017 11:44 PM, Tomas Vondra wrote:
> ...
> Unfortunately, I think we still have a problem ... I've been wondering
> if we end up producing correct indexes, so I've done a simple test.
> 
> 1) create the table as before
> 
> 2) let the insert + vacuum run for some time, to see if there are
> crashes (result: no crashes after one hour, inserting ~92M rows)
> 
> 3) do a bunch of random updates on the data (while still doing the
> concurrent vacuum in another session)
> 
> 4) run a bunch of simple queries to compare the results, essentially
> 
>-- BRIN index
>SET enable_bitmapscan = on;
>SELECT COUNT(*) FROM brin_test WHERE a = $1;
> 
> 
>-- seq scan
>SET enable_bitmapscan = off;
>SELECT COUNT(*) FROM brin_test WHERE a = $1;
> 
> and unfortunately what I get is not particularly pleasant:
> 
> test=# set enable_bitmapscan = on;
> SET
> test=# select count(*) from brin_test where a = 0;
>  count
> ---
>   9062
> (1 row)
> 
> test=# set enable_bitmapscan = off;
> SET
> test=# select count(*) from brin_test where a = 0;
>  count
> ---
>   9175
> (1 row)
> 
> Attached is a SQL script with commands I used. You'll need to copy the
> commands into multiple psql sessions, though, to simulate concurrent
> activity.
> 

FWIW I can reproduce this on 9.5, and I don't even need to run the
UPDATE part. That is, INSERT + VACUUM running concurrently is enough to
produce broken BRIN indexes :-(


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] PostgreSQL 10 parenthesized single-column updates can produce errors

2017-10-31 Thread Tom Lane
"David G. Johnston"  writes:
> Definitely moderates my opinion in my concurrent email...though
> postponement is not strictly bad given the seeming frequency of the
> existing problematic syntax in the wild already.

Yeah, I'd hoped to get some capability extensions done here before v10
shipped, in line with the theory I've expressed in the past that it's
better if you can point to actual new features justifying a compatibility
break.  However, that didn't happen in time.

I'm disinclined to revert the change though; if there are people making
use of this oddity now, then the longer we leave it in place the more
people are going to be hurt when we do break it.

If I had a time machine, I'd go back and fix the original multi-column
SET patch so that it required the word ROW in all cases --- then at
least it'd have accepted a conformant subset of the standard syntax.
Oh well.

regards, tom lane




Re: [HACKERS] proposal: schema variables

2017-10-31 Thread Gilles Darold
Le 31/10/2017 à 23:36, Gilles Darold a écrit :
> Le 31/10/2017 à 22:28, srielau a écrit :
>> Pavel,
>>
>> There is no
>> DECLARE TEMP CURSOR
>> or
>> DECLARE TEMP variable in PLpgSQL
>> and
>> CREATE TEMP TABLE has a different meaning from what I understand you
>> envision for variables.
>>
>> But maybe I'm mistaken. Your original post did not describe the entire
>> syntax:
>> CREATE [TEMP] VARIABLE [IF NOT EXISTS] name AS type 
>>   [ DEFAULT expression ] [[NOT] NULL]
>>   [ ON TRANSACTION END { RESET | DROP } ]
>>   [ { VOLATILE | STABLE } ];
>>
>> Especially the TEMP is not spelled out and how its presence affects or
>> doesn't ON TRANSACTION END.
>> So may be if you elaborate I understand where you are coming from.
> I think that the TEMP keyword can be removed. If I understand correctly, the
> default scope for a variable is the session; every transaction in a
> session will see the same value. For the transaction level - probably the
> reason for the TEMP keyword - I think the [ ON TRANSACTION END { RESET |
> DROP } ] clause will allow restricting the scope to the transaction level
> without needing the TEMP keyword. When a variable is created in a
> transaction, it is temporary if "ON TRANSACTION END DROP" is used,
> otherwise it will persist after the transaction ends. I guess that this
> is the same as using the TEMP keyword?

I forgot to say that in the last case the DECLARE statement can be used,
so I don't see the need for this kind of "temporary" variable.

Maybe a variable object like the one used in DB2 and defined in this document:
https://www.ibm.com/support/knowledgecenter/en/SSEPEK_11.0.0/sqlref/src/tpc/db2z_sql_createvariable.html
would be enough to cover our needs.


-- 
Gilles Darold
Consultant PostgreSQL
http://dalibo.com - http://dalibo.org





Re: [HACKERS] PostgreSQL 10 parenthesized single-column updates can produce errors

2017-10-31 Thread David G. Johnston
On Tue, Oct 31, 2017 at 3:43 PM, Tom Lane  wrote:

> According to the spec, the elements of a parenthesized
> SET list should be assigned from the fields of a composite RHS.  If
> there's just one element of the SET list, the RHS should be a single-field
> composite value, and this syntax should result in extracting its field.
> This patch doesn't do that; it preserves our previous broken behavior,
> and thus just puts off the day of reckoning that inevitably will come.
>

Definitely moderates my opinion in my concurrent email...though
postponement is not strictly bad given the seeming frequency of the
existing problematic syntax in the wild already.

David J.


Re: [HACKERS] PostgreSQL 10 parenthesized single-column updates can produce errors

2017-10-31 Thread David G. Johnston
On Tue, Oct 31, 2017 at 3:14 PM, Rob McColl  wrote:

>
>> I believe that this is not an intended change or behavior, but is instead
>> an unintentional side effect of 906bfcad7ba7cb3863fe0e2a7810be8e3cd84fbd
>> Improve handling of "UPDATE ... SET (column_list) = row_constructor". (
>> https://github.com/postgres/postgres/commit/906bfcad7ba7cb3
>> 863fe0e2a7810be8e3cd84fbd).
>>
>
​At this point the intent of 906bfcad doesn't really matter - and given the
number of complaints in the month since ​v10 went live I'm tending to lean
toward bringing the pre-10 behavior back. There is no ambiguity involved
here and the breakage of existing applications seems considerably worse
than the technical oddity of allowing (val) to be interpreted as a
row_constructor in this situation.  From a standards perspective we are
strictly more permissive so no new problem there.

On a related note, the 10.0 syntax guide is wrong, it needs to break out
the parenthesized single-column and multi-column variants separately:

Presently: ( column_name [, ...] ) = [ ROW ] ( { expression | DEFAULT } [,
...] )

Basically one cannot specify only a single column_name AND omit ROW

It would have been nice if the syntax for that variant would have been:

( column_name, column_name [, ...] ) = [ ROW ] ( { expression | DEFAULT },
{ expression | DEFAULT } [, ...] )
​
If the v10 behavior remains the above change should be made as well as
adding:

( column_name ) = ROW ( expression | DEFAULT )

If we revert 10 to the pre-10 behavior the existing syntax will work.
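
To make the problematic case concrete, a minimal example of the single-column
form this thread is about (table and values are arbitrary):

CREATE TABLE t (c int);
INSERT INTO t VALUES (1);

UPDATE t SET (c) = (2);       -- accepted before v10, rejected by 10.0
UPDATE t SET (c) = ROW(2);    -- works in v10, matches the standard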

David J.


Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-31 Thread Tomas Vondra
Hi,

On 10/31/2017 08:46 PM, Tom Lane wrote:
> I wrote:
>> maybe
>> we just have some run-of-the-mill bugs to find, like the off-the-end
>> bug I spotted in brin_doupdate.  There's apparently at least one
>> more, but given the error message it must be something like not
>> checking for a page to have turned into a revmap page.  Shouldn't
>> be too hard to find...
> 
> Actually, I think it might be as simple as the attached.
> brin_getinsertbuffer checks for the old page having turned into revmap,
> but the "samepage" path in brin_doupdate does not :-(
> 
> With this applied, Alvaro's version of the test case has survived
> without error for quite a bit longer than its former MTBF.  There
> might still be some issues though in other code paths.
> 

That does fix the crashes for me - I've been unable to reproduce any
even after one hour (it took a couple of minutes to crash before).

Unfortunately, I think we still have a problem ... I've been wondering
if we end up producing correct indexes, so I've done a simple test.

1) create the table as before

2) let the insert + vacuum run for some time, to see if there are
crashes (result: no crashes after one hour, inserting ~92M rows)

3) do a bunch of random updates on the data (while still doing the
concurrent vacuum in another session)

4) run a bunch of simple queries to compare the results, essentially

   -- BRIN index
   SET enable_bitmapscan = on;
   SELECT COUNT(*) FROM brin_test WHERE a = $1;


   -- seq scan
   SET enable_bitmapscan = off;
   SELECT COUNT(*) FROM brin_test WHERE a = $1;

and unfortunately what I get is not particularly pleasant:

test=# set enable_bitmapscan = on;
SET
test=# select count(*) from brin_test where a = 0;
 count
---
  9062
(1 row)

test=# set enable_bitmapscan = off;
SET
test=# select count(*) from brin_test where a = 0;
 count
---
  9175
(1 row)

Attached is a SQL script with commands I used. You'll need to copy the
commands into multiple psql sessions, though, to simulate concurrent
activity.
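
A rough sketch of the setup described in the steps above (the exact table
definition and data volumes here are guesses, not the contents of the
attached script):

-- session 1: table with a BRIN index, then keep inserting
CREATE TABLE brin_test (a int);
CREATE INDEX brin_test_a_idx ON brin_test USING brin (a);
INSERT INTO brin_test SELECT (random() * 1000)::int
  FROM generate_series(1, 100000);          -- run repeatedly

-- session 2: vacuum in a loop while the inserts run
VACUUM brin_test;                           -- run repeatedly

-- afterwards: compare the BRIN bitmap scan against a seq scan
SET enable_bitmapscan = on;
SELECT count(*) FROM brin_test WHERE a = 0;
SET enable_bitmapscan = off;
SELECT count(*) FROM brin_test WHERE a = 0;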

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


brin-test.sql
Description: application/sql



Re: [HACKERS] PostgreSQL 10 parenthesized single-column updates can produce errors

2017-10-31 Thread Tom Lane
Rob McColl  writes:
> Attaching patch... :-/

The reason why hacking your way to a backwards-compatible solution is
a bad idea is that it breaks the SQL standard compliance we're trying to
achieve here.  According to the spec, the elements of a parenthesized
SET list should be assigned from the fields of a composite RHS.  If
there's just one element of the SET list, the RHS should be a single-field
composite value, and this syntax should result in extracting its field.
This patch doesn't do that; it preserves our previous broken behavior,
and thus just puts off the day of reckoning that inevitably will come.

As a concrete example, the spec says this should work:

create table t (f1 int);
update t set (f1) = row(4);

and it does in v10, but (without having tested) your patch will break it.
This is not so exciting for simple row constructors, where you could just
leave off the word ROW; but it is a critical distinction if we're ever to
get to the point of allowing other composite-returning constructs here,
for example composite-returning functions.

regards, tom lane




Re: [HACKERS] proposal: schema variables

2017-10-31 Thread Gilles Darold
Le 31/10/2017 à 22:28, srielau a écrit :
> Pavel,
>
> There is no
> DECLARE TEMP CURSOR
> or
> DECLARE TEMP variable in PLpgSQL
> and
> CREATE TEMP TABLE has a different meaning from what I understand you
> envision for variables.
>
> But maybe I'm mistaken. Your original post did not describe the entire
> syntax:
> CREATE [TEMP] VARIABLE [IF NOT EXISTS] name AS type 
>   [ DEFAULT expression ] [[NOT] NULL]
>   [ ON TRANSACTION END { RESET | DROP } ]
>   [ { VOLATILE | STABLE } ];
>
> Especially the TEMP is not spelled out and how its presence affects or
> doesn't ON TRANSACTION END.
> So may be if you elaborate I understand where you are coming from.

I think that the TEMP keyword can be removed. If I understand correctly, the
default scope for a variable is the session; every transaction in a
session will see the same value. For the transaction level - probably the
reason for the TEMP keyword - I think the [ ON TRANSACTION END { RESET |
DROP } ] clause will allow restricting the scope to the transaction level
without needing the TEMP keyword. When a variable is created in a
transaction, it is temporary if "ON TRANSACTION END DROP" is used,
otherwise it will persist after the transaction ends. I guess that this
is the same as using the TEMP keyword?


-- 
Gilles Darold
Consultant PostgreSQL
http://dalibo.com - http://dalibo.org





Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-10-31 Thread Peter Geoghegan
On Thu, Oct 26, 2017 at 4:22 AM, Rushabh Lathia
 wrote:
> Attaching the re based patch according to the v22 parallel-hash patch sets

I took a quick look at this today, and noticed a few issues:

* make_name() is used to name files in sharedtuplestore.c, which is
what is passed to BufFileOpenShared() for parallel hash join. You're
using your own logic for that within the equivalent logtape.c call to
BufFileOpenShared(), presumably because make_name() wants to identify
participants by PID rather than by an ordinal identifier number.

I think that we need some kind of central registry for things that use
shared buffiles. It could be that sharedtuplestore.c is further
generalized to support this, or it could be that they both call
something else that takes care of naming. It's not okay to have this
left to random chance.

You're going to have to ask Thomas about this. You should also use
MAXPGPATH for the char buffer on the stack.

* This logtape.c comment needs to be updated, as it's no longer true:

 * successfully.  In general, workers can take it that the leader will
 * reclaim space in files under their ownership, and so should not
 * reread from tape.

* Robert hated the comment changes in the header of nbtsort.c. You
might want to change it back, because he is likely to be the one that
commits this.

* You should look for similar comments in tuplesort.c (IIRC a couple
of places will need to be revised).

* tuplesort_begin_common() should actively reject a randomAccess
parallel case using elog(ERROR).

* tuplesort.h should note that randomAccess isn't supported, too.

* What's this all about?:

+ /* Accessor for the SharedBufFileSet that is at the end of Sharedsort. */
+ #define GetSharedBufFileSet(shared)\
+   ((BufFileSet *) (&(shared)->tapes[(shared)->nTapes]))

You can't just cast from one type to the other without regard for the
underlying size of the shared memory buffer, which is what this looks
like to me. This only fails to crash because you're only abusing the
last member in the tapes array for this purpose, and there happens to
be enough shared memory slop that you get away with it. I'm pretty
sure that ltsUnify() ends up clobbering the last/leader tape, which is
a place where BufFileSet is also used, so this is just wrong. You
should rethink the shmem structure a little bit.

* There is still that FIXME comment within leader_takeover_tapes(). I
believe that you should still have a leader tape (at least in local
memory in the leader), even though you'll never be able to do anything
with it, since randomAccess is no longer supported. You can remove the
FIXME, and just note that you have a leader tape to be consistent with
the serial case, though recognize that it's not useful. Note that even
with randomAccess, we always had the leader tape, so it's not that
different, really.

I suppose it might make sense to make shared->tapes not have a leader
tape. It hardly matters -- perhaps you should leave it there in order
to keep the code simple, as you'll be keeping the leader tape in local
memory, too. (But it still won't fly to continue to clobber it, of
course -- you still need to find a dedicated place for BufFileSet in
shared memory.)

That's all I have right now.
-- 
Peter Geoghegan




Re: [HACKERS] WIP: Restricting pg_rewind to data/wal dirs

2017-10-31 Thread David Steele
On 10/30/17 6:36 AM, Michael Paquier wrote:
> On Mon, Oct 30, 2017 at 10:15 AM, Chris Travers
>>
>> How does rep mgr or other programs using pg_rewind know what to exclude?
> 
> Good question. Answers could come from folks such as David Steele
> (pgBackRest) or Marco (barman) whom I am attaching in CC.

pgBackRest does not use pg_rewind.  However, pgBackRest does use the
same exclusion list for backups as pg_basebackup.

-- 
-David
da...@pgmasters.net




Re: [HACKERS] PATCH: enabling parallel execution for cursors explicitly (experimental)

2017-10-31 Thread Tomas Vondra
Hi,

On 10/20/2017 03:23 PM, Robert Haas wrote:
>
> ...
> 
> The main points I want to make clearly understood is the current
> design relies on (1) functions being labeled correctly and (2) other
> dangerous code paths being unreachable because there's nothing that
> runs between EnterParallelMode and ExitParallelMode which could invoke
> them, except by calling a mislabeled function.  Your patch expands the
> vulnerability surface from "executor code that can be reached without
> calling a mislabeled function" to "any code that can be reached by
> typing an SQL command".  Just rejecting any queries that are
> parallel-unsafe probably closes a good chunk of the holes, but that
> still leaves a lot of code that's never been run in parallel mode
> before potentially now running in parallel mode - e.g. any DDL command
> you happen to type, transaction control commands, code that only runs
> when the server is idle like idle_in_transaction_timeout, cursor
> operations.  A lot of that stuff is probably fine, but it's got to be
> thought through.  Error handling might be a problem, too: what happens
> if a parallel worker is killed while the query is suspended?  I
> suspect that doesn't work very nicely at all.
> 

OK, understood and thanks for explaining what may be the possible
issues. I do appreciate that.

I still think it'd be valuable to support this, though, so I'm going to
spend more time on investigating what needs to be handled.

But maybe there's a simpler option - what if we only allow fetches from
the PARALLEL cursor while the cursor is open? That is, this would work:

BEGIN;
...
DECLARE x PARALLEL CURSOR FOR SELECT * FROM t2 WHERE ...;
FETCH 1000 FROM x;
FETCH 1000 FROM x;
FETCH 1000 FROM x;
CLOSE x;
...
COMMIT;

but adding any other command between the OPEN/CLOSE commands would fail.
That should close all the holes with parallel-unsafe stuff, right?
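
As a concrete sketch of that restriction (proposed behavior only; the
PARALLEL keyword and the table names here are just placeholders):

BEGIN;
DECLARE x PARALLEL CURSOR FOR SELECT * FROM t2;
FETCH 1000 FROM x;
INSERT INTO t3 VALUES (1);   -- would be rejected: only FETCH/CLOSE allowed
FETCH 1000 FROM x;
CLOSE x;
COMMIT;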

Of course, this won't solve the issue with error handling / killing
suspended workers (which didn't occur to me before as a possible issue
at all, so thanks for pointing that out). But that's a significantly
more limited issue to fix than all the parallel-unsafe bits.

Now, I agree this is somewhat more limited than I hoped for, but OTOH it
still solves the issue I initially aimed for (processing large query
results in an efficient way).


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] PostgreSQL 10 parenthesized single-column updates can produce errors

2017-10-31 Thread Rob McColl
Attaching patch... :-/

On Tue, Oct 31, 2017 at 4:27 PM, Rob McColl  wrote:

> Between 9.6.5 and 10, the handling of parenthesized single-column UPDATE
> statements changed. In 9.6.5, they were treated identically to
> unparenthesized single-column UPDATES. In 10, they are treated as
> multiple-column updates.  This results in this being valid in Postgres
> 9.6.5, but an error in Postgres 10:
>
> CREATE TABLE test (id INT PRIMARY KEY, data INT);
> INSERT INTO test VALUES (1, 1);
> UPDATE test SET (data) = (2) WHERE id = 1;
>
> In 10 and the current master, this produces the error:
>
> errmsg("source for a multiple-column UPDATE item must be a sub-SELECT or
> ROW() expression")
>
> I believe that this is not an intended change or behavior, but is instead
> an unintentional side effect of 906bfcad7ba7cb3863fe0e2a7810be8e3cd84fbd
> Improve handling of "UPDATE ... SET (column_list) = row_constructor". (
> https://github.com/postgres/postgres/commit/906bfcad7ba7cb3
> 863fe0e2a7810be8e3cd84fbd).
>
> This is a small patch to the grammar that restores the previous behavior
> by adding a rule to the set_clause rule and modifying the final rule of the
> set_clause rule to only match lists of more then one element.  I'm not sure
> if there are more elegant or preferred ways to address this.
>
> Compiled and tested on Ubuntu 17.04 Linux 4.10.0-33-generic x86_64.
>
> Regression test added under the update test to cover the parenthesized
> single-column case.
>
> I see no reason this would affect performance.
>
> Thanks,
> -rob
>
> --
> Rob McColl
> @robmccoll
> r...@robmccoll.com
> 205.422.0909
>
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
new file mode 100644
index 4c83a63..b5f4ccf
*** a/src/backend/parser/gram.y
--- b/src/backend/parser/gram.y
*** set_clause:
*** 10694,10712 
  	$1->val = (Node *) $3;
  	$$ = list_make1($1);
  }
! 			| '(' set_target_list ')' '=' a_expr
  {
! 	int ncolumns = list_length($2);
  	int i = 1;
  	ListCell *col_cell;
  
  	/* Create a MultiAssignRef source for each target */
  	foreach(col_cell, $2)
  	{
  		ResTarget *res_col = (ResTarget *) lfirst(col_cell);
  		MultiAssignRef *r = makeNode(MultiAssignRef);
  
! 		r->source = (Node *) $5;
  		r->colno = i;
  		r->ncolumns = ncolumns;
  		res_col->val = (Node *) r;
--- 10694,10720 
  	$1->val = (Node *) $3;
  	$$ = list_make1($1);
  }
! 			| '(' set_target ')' '=' a_expr
  {
! 	$2->val = (Node *) $5;
! 	$$ = list_make1($2);
! }
! 			| '(' set_target_list ',' set_target ')' '=' a_expr
! {
! 	int ncolumns;
  	int i = 1;
  	ListCell *col_cell;
  
+ 	$2 = lappend($2,$4);
+ 	ncolumns = list_length($2);
+ 
  	/* Create a MultiAssignRef source for each target */
  	foreach(col_cell, $2)
  	{
  		ResTarget *res_col = (ResTarget *) lfirst(col_cell);
  		MultiAssignRef *r = makeNode(MultiAssignRef);
  
! 		r->source = (Node *) $7;
  		r->colno = i;
  		r->ncolumns = ncolumns;
  		res_col->val = (Node *) r;
diff --git a/src/test/regress/expected/update.out b/src/test/regress/expected/update.out
new file mode 100644
index cef70b1..90d33ad
*** a/src/test/regress/expected/update.out
--- b/src/test/regress/expected/update.out
*** SELECT * FROM update_test;
*** 76,82 
   100 | 21 | 
  (4 rows)
  
! UPDATE update_test SET (c,b,a) = ('bugle', b+11, DEFAULT) WHERE c = 'foo';
  SELECT * FROM update_test;
a  | b  |   c   
  -++---
--- 76,93 
   100 | 21 | 
  (4 rows)
  
! -- parenthesized single column should be valid
! UPDATE update_test SET (c) = ('bungle') WHERE c = 'foo';
! SELECT * FROM update_test;
!   a  | b  |   c
! -++
!  100 | 20 | 
!  100 | 21 | 
!  100 | 20 | bungle
!  100 | 21 | bungle
! (4 rows)
! 
! UPDATE update_test SET (c,b,a) = ('bugle', b+11, DEFAULT) WHERE c = 'bungle';
  SELECT * FROM update_test;
a  | b  |   c   
  -++---
diff --git a/src/test/regress/sql/update.sql b/src/test/regress/sql/update.sql
new file mode 100644
index 66d1fec..0ed5f6a
*** a/src/test/regress/sql/update.sql
--- b/src/test/regress/sql/update.sql
*** UPDATE update_test SET a = v.* FROM (VAL
*** 51,57 
  INSERT INTO update_test SELECT a,b+1,c FROM update_test;
  SELECT * FROM update_test;
  
! UPDATE update_test SET (c,b,a) = ('bugle', b+11, DEFAULT) WHERE c = 'foo';
  SELECT * FROM update_test;
  UPDATE update_test SET (c,b) = ('car', a+b), a = a + 1 WHERE a = 10;
  SELECT * FROM update_test;
--- 51,60 
  INSERT INTO update_test SELECT a,b+1,c FROM update_test;
  SELECT * FROM update_test;
  
! -- parenthesized single column should be valid
! UPDATE update_test SET (c) = ('bungle') WHERE c = 'foo';
! SELECT * FROM update_test;
! UPDATE update_test SET (c,b,a) = ('bugle', b+11, DEFAULT) WHERE c = 'bungle';
  SELECT * FROM update_test;
  UPDATE 

[HACKERS] Account for cost and selectivity of HAVING quals

2017-10-31 Thread Tom Lane
Pursuant to the discussion at
https://www.postgresql.org/message-id/20171029112420.8920b5f...@mx.zeyos.com
here's a patch to fix the planner so that eval costs and selectivity of
HAVING quals are factored into the appropriate plan node numbers.
Perhaps unsurprisingly, this doesn't change the results of any
existing regression tests.

It's slightly annoying that this approach will result in calculating
the eval costs and selectivity several times, once for each aggregation
plan type we consider.  I thought about inserting RestrictInfo nodes
into the havingQual so that those numbers could be cached, but that turned
out to break various code that doesn't expect to see such nodes there.
I'm not sure it's worth going to the trouble of fixing that; in the big
scheme of things, the redundant calculations likely don't cost much, since
we aren't going to have relevant statistics.

Comments?  If anyone wants to do a real review of this, I'm happy to stick
it into the upcoming CF; but without an expression of interest, I'll just
push it.  I don't think there's anything terribly controversial here.

regards, tom lane

diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index ce32b8a..98fb16e 100644
*** a/src/backend/optimizer/path/costsize.c
--- b/src/backend/optimizer/path/costsize.c
*** void
*** 1874,1879 
--- 1874,1880 
  cost_agg(Path *path, PlannerInfo *root,
  		 AggStrategy aggstrategy, const AggClauseCosts *aggcosts,
  		 int numGroupCols, double numGroups,
+ 		 List *quals,
  		 Cost input_startup_cost, Cost input_total_cost,
  		 double input_tuples)
  {
*** cost_agg(Path *path, PlannerInfo *root,
*** 1955,1960 
--- 1956,1981 
  		output_tuples = numGroups;
  	}
  
+ 	/*
+ 	 * If there are quals (HAVING quals), account for their cost and
+ 	 * selectivity.
+ 	 */
+ 	if (quals)
+ 	{
+ 		QualCost	qual_cost;
+ 
+ 		cost_qual_eval(&qual_cost, quals, root);
+ 		startup_cost += qual_cost.startup;
+ 		total_cost += qual_cost.startup + output_tuples * qual_cost.per_tuple;
+ 
+ 		output_tuples = clamp_row_est(output_tuples *
+ 	  clauselist_selectivity(root,
+ 			 quals,
+ 			 0,
+ 			 JOIN_INNER,
+ 			 NULL));
+ 	}
+ 
  	path->rows = output_tuples;
  	path->startup_cost = startup_cost;
  	path->total_cost = total_cost;
*** cost_windowagg(Path *path, PlannerInfo *
*** 2040,2051 
--- 2061,2075 
  void
  cost_group(Path *path, PlannerInfo *root,
  		   int numGroupCols, double numGroups,
+ 		   List *quals,
  		   Cost input_startup_cost, Cost input_total_cost,
  		   double input_tuples)
  {
+ 	double		output_tuples;
  	Cost		startup_cost;
  	Cost		total_cost;
  
+ 	output_tuples = numGroups;
  	startup_cost = input_startup_cost;
  	total_cost = input_total_cost;
  
*** cost_group(Path *path, PlannerInfo *root
*** 2055,2061 
  	 */
  	total_cost += cpu_operator_cost * input_tuples * numGroupCols;
  
! 	path->rows = numGroups;
  	path->startup_cost = startup_cost;
  	path->total_cost = total_cost;
  }
--- 2079,2105 
  	 */
  	total_cost += cpu_operator_cost * input_tuples * numGroupCols;
  
! 	/*
! 	 * If there are quals (HAVING quals), account for their cost and
! 	 * selectivity.
! 	 */
! 	if (quals)
! 	{
! 		QualCost	qual_cost;
! 
! 		cost_qual_eval(&qual_cost, quals, root);
! 		startup_cost += qual_cost.startup;
! 		total_cost += qual_cost.startup + output_tuples * qual_cost.per_tuple;
! 
! 		output_tuples = clamp_row_est(output_tuples *
! 	  clauselist_selectivity(root,
! 			 quals,
! 			 0,
! 			 JOIN_INNER,
! 			 NULL));
! 	}
! 
! 	path->rows = output_tuples;
  	path->startup_cost = startup_cost;
  	path->total_cost = total_cost;
  }
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 1c84a2c..f620243 100644
*** a/src/backend/optimizer/prep/prepunion.c
--- b/src/backend/optimizer/prep/prepunion.c
*** choose_hashed_setop(PlannerInfo *root, L
*** 977,982 
--- 977,983 
  	 */
  	cost_agg(&hashed_p, root, AGG_HASHED, NULL,
  			 numGroupCols, dNumGroups,
+ 			 NIL,
  			 input_path->startup_cost, input_path->total_cost,
  			 input_path->rows);
  
*** choose_hashed_setop(PlannerInfo *root, L
*** 991,996 
--- 992,998 
  			  input_path->rows, input_path->pathtarget->width,
  			  0.0, work_mem, -1.0);
  	cost_group(&sorted_p, root, numGroupCols, dNumGroups,
+ 			   NIL,
  			   sorted_p.startup_cost, sorted_p.total_cost,
  			   input_path->rows);
  
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 2d491eb..eafec2a 100644
*** a/src/backend/optimizer/util/pathnode.c
--- b/src/backend/optimizer/util/pathnode.c
*** create_result_path(PlannerInfo *root, Re
*** 1374,1379 
--- 1374,1380 
  	pathnode->path.startup_cost = target->cost.startup;
  

Re: [HACKERS] proposal: schema variables

2017-10-31 Thread srielau
Pavel,

There is no
DECLARE TEMP CURSOR
or
DECLARE TEMP variable in PLpgSQL
and
CREATE TEMP TABLE has a different meaning from what I understand you
envision for variables.

But maybe I'm mistaken. Your original post did not describe the entire
syntax:
CREATE [TEMP] VARIABLE [IF NOT EXISTS] name AS type 
  [ DEFAULT expression ] [[NOT] NULL]
  [ ON TRANSACTION END { RESET | DROP } ]
  [ { VOLATILE | STABLE } ];

In particular, TEMP is not spelled out, nor how its presence affects (or
doesn't affect) ON TRANSACTION END.
So maybe if you elaborate I will understand where you are coming from.

 



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html




Re: [HACKERS] proposal: schema variables

2017-10-31 Thread Pavel Stehule
2017-10-31 22:08 GMT+01:00 Serge Rielau :

> Pavel,
>
> I can imagine a DECLARE command being introduced as a shortcut for
> CREATE TEMP VARIABLE, but at this moment I would rather not open this
> topic. I am afraid of bikeshedding, and I hope CREATE TEMP VARIABLE is
> enough.
>
> Language is important because language stays.
> You choice of syntax will outlive your code and possibly yourself.
>

Sure. But at this moment I don't see any difference between DECLARE
VARIABLE and CREATE TEMP VARIABLE other than the "TEMP" keyword.

Regards

Pavel


> My 2 cents
> Serge
>


Re: [HACKERS] GSoC 2017 : Patch for predicate locking in Gist index

2017-10-31 Thread Shubham Barai
On 9 October 2017 at 18:57, Alexander Korotkov 
wrote:

> On Thu, Oct 5, 2017 at 9:48 PM, Shubham Barai 
> wrote:
>
>> On 3 October 2017 at 00:32, Alexander Korotkov > > wrote:
>>
>>> On Mon, Oct 2, 2017 at 9:11 PM, Andrew Borodin 
>>> wrote:
>>>
 On Mon, Oct 2, 2017 at 8:00 PM, Alexander Korotkov
  wrote:
 > What happen if exactly this "continue" fires?
 >
 >> if (GistFollowRight(stack->page))
 >> {
 >> if (!xlocked)
 >> {
 >> LockBuffer(stack->buffer, GIST_UNLOCK);
 >> LockBuffer(stack->buffer, GIST_EXCLUSIVE);
 >> xlocked = true;
 >> /* someone might've completed the split when we unlocked */
 >> if (!GistFollowRight(stack->page))
 >> continue;
 >
 >
 > In this case we might get xlocked == true without calling
 > CheckForSerializableConflictIn().
 Indeed! I've overlooked it. I'm remembering this issue, we were
 considering not fixing splits if in Serializable isolation, but
 dropped the idea.

>>>
>>> Yeah, current insert algorithm assumes that split must be fixed before
>>> we can correctly traverse the tree downwards.
>>>
>>>
 CheckForSerializableConflictIn() must be after every exclusive lock.

>>>
>>> I'm not sure that fixing a split is a case that necessarily requires
>>> calling CheckForSerializableConflictIn().  This lock on the leaf page is
>>> not taken to modify the page.  It's just taken to ensure that nobody else
>>> is fixing this split at the same time.  After fixing the split, it might
>>> turn out that the insert goes to another page.
>>>
>>> > I think it would be rather safe and easier to understand to move
 > CheckForSerializableConflictIn() directly before gistinserttuple().
 The difference is that after lock we have conditions to change page,
 and before gistinserttuple() we have actual intent to change page.

 From the point of future development first version is better (if some
 new calls occasionally spawn in), but it has 3 calls while your
 proposal have 2 calls.
 It seems to me that CheckForSerializableConflictIn() before
 gistinserttuple() is better as for now.

>>>
>>> Agree.
>>>
>>
>> I have updated the location of  CheckForSerializableConflictIn()  and
>> changed the status of the patch to "needs review".
>>
>
> Now, ITSM that predicate locks and conflict checks are placed correctly
> for now.
> However, it would be good to add a couple of comments to gistdoinsert()
> stating why we call CheckForSerializableConflictIn() in these
> particular places.
>
> I also take a look at isolation tests.  You made two separate test specs:
> one to verify that serialization failures do fire, and another to check
> there are no false positives.
> I wonder if we could merge these two test specs into one, using a wider
> variety of statements with different keys for both inserts and selects.
>

Please find the updated version of the patch attached. I have made the
suggested changes.

Regards,
Shubham


Predicate-locking-in-gist-index_4.patch
Description: Binary data



[HACKERS] Dynamic result sets from procedures

2017-10-31 Thread Peter Eisentraut
This patch is more of a demo of what could be done, not my primary
focus, but if there is interest and some assistance, maybe we can make
something out of it.  This patch also goes on top of "SQL procedures"
version 1.

The purpose is to return multiple result sets from a procedure.  This
is, I think, a common request when coming from MS SQL and DB2.  MS SQL
has a completely different procedure syntax, but this proposal is
compatible with DB2, which as usual was the model for the SQL standard.
So this is what it can do:

CREATE PROCEDURE pdrstest1()
LANGUAGE SQL
AS $$
DECLARE c1 CURSOR WITH RETURN FOR SELECT * FROM cp_test2;
DECLARE c2 CURSOR WITH RETURN FOR SELECT * FROM cp_test3;
$$;

CALL pdrstest1();

and that returns those two result sets to the client.

That's all it does for now.  Things get more complex when you consider
nested calls.  The SQL standard describes additional facilities for how an
outer procedure can accept a called procedure's result sets, or not.  In
the thread on transaction control, I mentioned that we might need some
kind of procedure call stack.  Something like that would be needed here
as well.  There are also probably some namespacing issues around the
cursors that need more investigation.

A more mundane issue is how we get psql to print multiple result sets.
I have included here a patch that does that, and you can see that new
result sets start popping up in the regression tests already.  There is
also one error that needs further investigation.

We need to think about how the \timing option should work in such
scenarios.  Right now it does

start timer
run query
fetch result
stop timer
print result

If we had multiple result sets, the most natural flow would be

start timer
run query
while result sets
fetch result
print result
stop timer
print time

but that would include the printing time in the total time, which the
current code explicitly does not.  We could also temporarily save the
result sets, like

start timer
run query
while result sets
fetch result
stop timer
foreach result set
print result

but that would have a lot more overhead, potentially.

Thoughts?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 2e5d50cb39b926b29a6081f2387b95621357a4a0 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 19 Oct 2017 08:18:47 -0400
Subject: [PATCH v1 1/2] psql: Display multiple result sets

If a query returns multiple result sets, display all of them instead of
only the one that PQexec() returns.

Adjust various regression tests to handle the new additional output.
---
 src/bin/psql/common.c  | 25 +++--
 src/test/regress/expected/copyselect.out   |  5 +++
 src/test/regress/expected/psql.out |  6 +---
 src/test/regress/expected/transactions.out | 56 ++
 4 files changed, 76 insertions(+), 16 deletions(-)

diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index 9b59ee840b..2b6bd56e12 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -1390,22 +1390,25 @@ SendQuery(const char *query)
if (pset.timing)
INSTR_TIME_SET_CURRENT(before);
 
-   results = PQexec(pset.db, query);
+   PQsendQuery(pset.db, query);
 
/* these operations are included in the timing result: */
ResetCancelConn();
-   OK = ProcessResult();
-
-   if (pset.timing)
+   while ((results = PQgetResult(pset.db)))
{
-   INSTR_TIME_SET_CURRENT(after);
-   INSTR_TIME_SUBTRACT(after, before);
-   elapsed_msec = INSTR_TIME_GET_MILLISEC(after);
-   }
+   OK = ProcessResult();
+
+   if (pset.timing)
+   {
+   INSTR_TIME_SET_CURRENT(after);
+   INSTR_TIME_SUBTRACT(after, before);
+   elapsed_msec = INSTR_TIME_GET_MILLISEC(after);
+   }
 
-   /* but printing results isn't: */
-   if (OK && results)
-   OK = PrintQueryResults(results);
+   /* but printing results isn't: */
+   if (OK && results)
+   OK = PrintQueryResults(results);
+   }
}
else
{
diff --git a/src/test/regress/expected/copyselect.out 
b/src/test/regress/expected/copyselect.out
index 72865fe1eb..a13e1b411b 100644
--- a/src/test/regress/expected/copyselect.out
+++ b/src/test/regress/expected/copyselect.out
@@ -136,6 +136,11 @@ copy (select 1) to stdout\; copy (select 2) to stdout\; 
select 0\; select 3; --
 
 create table test3 (c int);
 select 0\; copy test3 from stdin\; copy test3 from stdin\; select 1; -- 1
+ ?column? 

Re: [HACKERS] proposal: schema variables

2017-10-31 Thread Serge Rielau
Pavel,

> I can imagine a DECLARE command being introduced as a shortcut for
> CREATE TEMP VARIABLE, but at this moment I would rather not open this
> topic. I am afraid of bikeshedding, and I hope CREATE TEMP VARIABLE is
> enough.

Language is important because language stays. Your choice of syntax will
outlive your code and possibly yourself.

My 2 cents
Serge

Re: [HACKERS] SQL procedures

2017-10-31 Thread Pavel Stehule
2017-10-31 18:23 GMT+01:00 Peter Eisentraut <
peter.eisentr...@2ndquadrant.com>:

> I've been working on SQL procedures.  (Some might call them "stored
> procedures", but I'm not aware of any procedures that are not stored, so
> that's not a term that I'm using here.)
>
> Everything that follows is intended to align with the SQL standard, at
> least in spirit.
>
> This first patch does a bunch of preparation work.  It adds the
> CREATE/ALTER/DROP PROCEDURE commands and the CALL statement to call a
> procedure.  It also adds ROUTINE syntax which can refer to a function or
> procedure.  I have extended that to include aggregates.  And then there
> is a bunch of leg work, such as psql and pg_dump support.  The
> documentation is a lot of copy-and-paste right now; that can be
> revisited sometime.  The provided procedural languages (an ever more
> confusing term) each needed a small touch-up to handle pg_proc entries
> with prorettype == 0.
>
> Right now, there is no support for returning values from procedures via
> OUT parameters.  That will need some definitional pondering; and see
> also below for a possible alternative.
>
> With this, you can write procedures that are somewhat compatible with
> DB2, MySQL, and to a lesser extent Oracle.
>
> Separately, I will send patches that implement (the beginnings of) two
> separate features on top of this:
>
> - Transaction control in procedure bodies
>
> - Returning multiple result sets
>
> (In various previous discussions on "real stored procedures" or
> something like that, most people seemed to have one of these two
> features in mind.  I think that depends on what other SQL systems one
> has worked with previously.)
>

Not sure if disabling RETURN is a good idea. I can imagine that optionally
returning something like an int status could be a good idea; it would be
cheaper than raising an exception.

Regards

Pavel


> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>


Re: [HACKERS] proposal: schema variables

2017-10-31 Thread Pavel Stehule
Hi

2017-10-30 22:42 GMT+01:00 srielau :

> Pavel,
>
> I wouldn't put in the DROP option.
> Or at least not in that form of syntax.
>
> By convention CREATE persists DDL and makes object definitions visible
> across sessions.
> DECLARE defines session private objects which cannot collide with other
> sessions.
>
> If you want variables with a short lifetime that get dropped at the end of
> the transaction that by definition would imply a session private object. So
> it ought to be DECLARE'd.
>
> As far as I can see PG has been following this practice so far.
>

I think there is a little bit of overlap between the DECLARE and CREATE
TEMP VARIABLE commands. With a DECLARE command you usually have no control
over when the variable is destroyed. For CREATE TEMP there is DROP IF
EXISTS, but it should not be used.

It should be very similar to our current temporary tables, which are
created in a session-related temp schema.

I can imagine a DECLARE command being introduced as a shortcut for
CREATE TEMP VARIABLE, but at this moment I would rather not open this
topic. I am afraid of bikeshedding, and I hope CREATE TEMP VARIABLE is
enough.

Regards

Pavel


> Cheers
> Serge Rielau
> Salesforce.com
>
>
>
> --
> Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-
> f1928748.html
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


Re: [HACKERS] SQL procedures

2017-10-31 Thread Pavel Stehule
2017-10-31 18:23 GMT+01:00 Peter Eisentraut <
peter.eisentr...@2ndquadrant.com>:

> I've been working on SQL procedures.  (Some might call them "stored
> procedures", but I'm not aware of any procedures that are not stored, so
> that's not a term that I'm using here.)
>
> Everything that follows is intended to align with the SQL standard, at
> least in spirit.
>
> This first patch does a bunch of preparation work.  It adds the
> CREATE/ALTER/DROP PROCEDURE commands and the CALL statement to call a
> procedure.  It also adds ROUTINE syntax which can refer to a function or
> procedure.  I have extended that to include aggregates.  And then there
> is a bunch of leg work, such as psql and pg_dump support.  The
> documentation is a lot of copy-and-paste right now; that can be
> revisited sometime.  The provided procedural languages (an ever more
> confusing term) each needed a small touch-up to handle pg_proc entries
> with prorettype == 0.
>
> Right now, there is no support for returning values from procedures via
> OUT parameters.  That will need some definitional pondering; and see
> also below for a possible alternative.
>
> With this, you can write procedures that are somewhat compatible with
> DB2, MySQL, and to a lesser extent Oracle.
>
> Separately, I will send patches that implement (the beginnings of) two
> separate features on top of this:
>
> - Transaction control in procedure bodies
>
> - Returning multiple result sets
>
> (In various previous discussions on "real stored procedures" or
> something like that, most people seemed to have one of these two
> features in mind.  I think that depends on what other SQL systems one
> has worked with previously.)
>

Great. I hope I can help with testing.

Regards

Pavel

>
> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>


Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-31 Thread Tom Lane
I wrote:
> maybe
> we just have some run-of-the-mill bugs to find, like the off-the-end
> bug I spotted in brin_doupdate.  There's apparently at least one
> more, but given the error message it must be something like not
> checking for a page to have turned into a revmap page.  Shouldn't
> be too hard to find...

Actually, I think it might be as simple as the attached.
brin_getinsertbuffer checks for the old page having turned into revmap,
but the "samepage" path in brin_doupdate does not :-(

With this applied, Alvaro's version of the test case has survived
without error for quite a bit longer than its former MTBF.  There
might still be some issues though in other code paths.

regards, tom lane

diff --git a/src/backend/access/brin/brin_pageops.c b/src/backend/access/brin/brin_pageops.c
index 80f803e..b0f86f3 100644
*** a/src/backend/access/brin/brin_pageops.c
--- b/src/backend/access/brin/brin_pageops.c
*** brin_doupdate(Relation idxrel, BlockNumb
*** 113,121 
  
  	/*
  	 * Check that the old tuple wasn't updated concurrently: it might have
! 	 * moved someplace else entirely ...
  	 */
! 	if (!ItemIdIsNormal(oldlp))
  	{
  		LockBuffer(oldbuf, BUFFER_LOCK_UNLOCK);
  
--- 113,127 
  
  	/*
  	 * Check that the old tuple wasn't updated concurrently: it might have
! 	 * moved someplace else entirely, and for that matter the whole page
! 	 * might've become a revmap page.  Note that in the first two cases
! 	 * checked here, the "oldlp" we just calculated is garbage; but
! 	 * PageGetItemId() is simple enough that it was safe to do that
! 	 * calculation anyway.
  	 */
! 	if (!BRIN_IS_REGULAR_PAGE(oldpage) ||
! 		oldoff > PageGetMaxOffsetNumber(oldpage) ||
! 		!ItemIdIsNormal(oldlp))
  	{
  		LockBuffer(oldbuf, BUFFER_LOCK_UNLOCK);
  



[HACKERS] Transaction control in procedures

2017-10-31 Thread Peter Eisentraut
Here is a patch that implements transaction control in PL/Python
procedures.  (This patch goes on top of "SQL procedures" patch v1.)

So you can do this:

CREATE PROCEDURE transaction_test1()
LANGUAGE plpythonu
AS $$
for i in range(0, 10):
plpy.execute("INSERT INTO test1 (a) VALUES (%d)" % i)
if i % 2 == 0:
 plpy.commit()
else:
 plpy.rollback()
$$;

CALL transaction_test1();

I started with PL/Python because the internal structures there are more
manageable.  Obviously, people will want this for PL/pgSQL as well, and
I have that in the works.  It's not in a usable state, but I have found
that the work needed is essentially the same as in PL/Python for example.

I have discovered three groups of obstacles that needed addressing to
make this work.  At this point, the patch is more of a demo of what
these issues are, and if we come to satisfactory solutions for each of
them, things should fall into place more easily afterwards.

1) While calling CommitTransactionCommand() in the middle of a utility
command works just fine (several utility commands do this, of course),
calling AbortCurrentTransaction() in a similar way does not.  There are
a few pieces of code that think that a transaction abort will always
result in a return to the main control loop, and so they will just clean
away everything.  This is what the changes in portalmem.c are about.
Some comments there already hint about the issue.  No doubt this will
need further refinement.  I think it would be desirable in general to
separate the control flow concerns from the transaction management
concerns more cleanly.

2) SPI needs some work.  It thinks that it can clean everything away at
transaction end.  I have found that instead of TopTransactionContext one
can use PortalContext and get a more suitable life cycle for the memory.
 I have played with some variants to make this configurable (e.g.,
argument to SPI_connect()), but that didn't seem very useful.  There are
some comments indicating that there might not always be a PortalContext,
but the existing tests don't seem to mind.  (There was a thread recently
about making a fake PortalContext for autovacuum, so maybe the current
idea is that we make sure there always is a PortalContext.)  Maybe we
need another context like StatementContext or ProcedureContext.

There also needs to be a way via SPI to end transactions and allowing
*some* cleanup to happen but leaving the memory around.  I have done
that via additional SPI API functions like SPI_commit(), which are then
available to PL implementations.  I also tried making it possible
calling transaction statements directly via SPI_exec() or similar, but
that ended up a total disaster.  So from the API perspective, I like the
current implementation, but the details will no doubt need refinement.

3) The PL implementations themselves allocate memory in
transaction-bound contexts for convenience as well.  This is usually
easy to fix by switching to PortalContext as well.  As you see, the
PL/Python code part of the patch is actually very small.  Changes in
other PLs would be similar.

Two issues have not been addressed yet:

A) It would be desirable to be able to run commands such as VACUUM and
CREATE INDEX CONCURRENTLY in a procedure.  This needs a bit of thinking
and standards-lawyering about the semantics, like where exactly do
transactions begin and end in various combinations.  It will probably
also need a different entry point into SPI, because SPI_exec cannot
handle statements ending transactions.  But so far my assessment is that
this can be added in a mostly independent way later on.
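
To illustrate the intent (a sketch only; this does not work with the patch
as posted, and the procedure name is made up):

CREATE PROCEDURE run_maintenance()
LANGUAGE SQL
AS $$
VACUUM test1;
$$;

CALL run_maintenance();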

B) There needs to be some kind of call stack for procedure and function
invocations, so that we can track when we are allowed to make
transaction controlling calls.  The key term in the SQL standard is
"non-atomic execution context", which seems specifically devised to
cover this scenario.  So for example, if you have CALL -> CALL -> CALL,
the third call can issue a transaction statement.  But if you have CALL
-> SELECT -> CALL, then the last call cannot, because the SELECT
introduces an atomic execution context.  I don't know if we have such a
thing yet or something that we could easily latch on to.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 1b3318154540d0fe71480ca58938433ecfccabbd Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 31 Oct 2017 14:48:51 -0400
Subject: [PATCH v1] Transaction control in PL/Python procedures

Add .commit, .rollback, and .start_transaction functions to plpy module
to control transactions in a PL/Python function.  Add similar underlying
functions to SPI.  Some additional cleanup so that transaction commit or
abort doesn't blow away data structures still used by the procedure
call.
---
 src/backend/commands/functioncmds.c   |  3 +
 src/backend/executor/spi.c

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-31 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> I really don't understand how any of this "let's release the buffer
>> lock and then take it back later" logic is supposed to work reliably.

> So summarize_range first inserts the placeholder tuple, which is there
> purposefully for other processes to update concurrently; then, without
> blocking any other process, scan the page range and update the
> placeholder tuple (this could take a long time, so it'd be a bad idea
> to hold buffer lock for that long).

Yeah, agreed, and your upthread point about avoiding deadlock when we
need to take two buffer locks at the same time is also something that
it's hard to see any other way around.  Maybe we just have to find a
way to make the existing structure work.  The sticking point is that
not only might the tuple we expected have been deleted, but someone
could have put an unrelated tuple in its place.  Are you confident
that you can detect that and recover from it?  If you're sure that
brin_tuples_equal() is trustworthy for this purpose, then maybe
we just have some run-of-the-mill bugs to find, like the off-the-end
bug I spotted in brin_doupdate.  There's apparently at least one
more, but given the error message it must be something like not
checking for a page to have turned into a revmap page.  Shouldn't
be too hard to find...

regards, tom lane




Re: [HACKERS] SQL procedures

2017-10-31 Thread Tom Lane
Peter Eisentraut  writes:
> I've been working on SQL procedures.

No comment yet on the big picture here, but ...

> The provided procedural languages (an ever more
> confusing term) each needed a small touch-up to handle pg_proc entries
> with prorettype == 0.

Putting 0 in prorettype seems like a pretty bad idea.  Why not use VOIDOID
for the prorettype value?  Or if there is some reason why "void" isn't the
right pseudotype, maybe you should invent a new one, analogous to the
"trigger" and "event_trigger" pseudotypes.

regards, tom lane




Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-31 Thread Alvaro Herrera
Tom Lane wrote:

> I really don't understand how any of this "let's release the buffer
> lock and then take it back later" logic is supposed to work reliably.

So summarize_range first inserts the placeholder tuple, which is there
purposefully for other processes to update concurrently; then, without
blocking any other process, scan the page range and update the
placeholder tuple (this could take a long time, so it'd be a bad idea
to hold buffer lock for that long).

I think what we should do is rethink the locking considerations in
brin_doupdate vs. brinGetTupleForHeapBlock, and how they are used in
summarize_range and brininsert.  In summarize_range, instead of hoping
that in some cases we will not need to re-obtain the placeholder tuple,
just do that in all cases keeping the buffer locked until the tuple is
updated.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Remove inbound links to sql-createuser

2017-10-31 Thread Stephen Frost
David,

* Stephen Frost (sfr...@snowman.net) wrote:
> * David G. Johnston (david.g.johns...@gmail.com) wrote:
> > Since CREATE USER is officially an alias for CREATE ROLE other parts of the
> > documentation should point to CREATE ROLE, not CREATE USER.  Most do but I
> > noticed when looking at CREATE DATABASE that it did not.  Further searching
> > turned up the usage in client-auth.sgml.  That one is questionable since we
> > are indeed talking about LOGIN roles there but we are already pointing to
> > sql-alterrole instead of sql-alteruser and so changing the create variation
> > to point to sql-createrole seems warranted.
> 
> +1.
> 
> Barring objections, I'll commit this in a bit.

Pushed to master, with a small bit of word-smithing in the CREATE ROLE
docs also.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Fix dumping pre-10 DBs by pg_dump10 if table "name" exists

2017-10-31 Thread Vitaly Burovoy
On 10/31/17, Tom Lane  wrote:
> Yeah, there are quite a few unqualified casts in pg_dump, but AFAICS
> all the rest are OK because the search_path is just pg_catalog.
>
> But I did find psql's describe.c making a similar mistake :-(.
> Pushed that along with your fix.
>
>   regards, tom lane
>

Oops. I missed it in "describe.c" because I grepped for the exact "::name" string.

Thank you very much!

--
Best regards,
Vitaly Burovoy




Re: [HACKERS] MERGE SQL Statement for PG11

2017-10-31 Thread Peter Geoghegan
On Tue, Oct 31, 2017 at 2:25 AM, Simon Riggs  wrote:
> If there are challenges ahead, its reasonable to ask for test cases
> for that now especially if you think you know what they already are.
> Imagine we go forwards 2 months - if you dislike my patch when it
> exists you will submit a test case showing the fault. Why not save us
> all the trouble and describe that now? Test Driven Development.

I already have, on several occasions now. But if you're absolutely
insistent on my constructing the test case in terms of a real SQL
statement, then that's what I'll do.

Consider this MERGE statement, from your mock documentation:

MERGE INTO wines w
USING wine_stock_changes s
ON s.winename = w.winename
WHEN NOT MATCHED AND s.stock_delta > 0 THEN
  INSERT VALUES(s.winename, s.stock_delta)
WHEN MATCHED AND w.stock + s.stock_delta > 0 THEN
  UPDATE SET stock = w.stock + s.stock_delta
ELSE
  DELETE;

Suppose we remove the WHEN NOT MATCHED case, leaving us with:

MERGE INTO wines w
USING wine_stock_changes s
ON s.winename = w.winename
WHEN MATCHED AND w.stock + s.stock_delta > 0 THEN
  UPDATE SET stock = w.stock + s.stock_delta
ELSE
  DELETE;

We now have a MERGE that will not INSERT, but will continue to UPDATE
and DELETE. (It's implied that your syntax cannot do this at all,
because you propose use the ON CONFLICT infrastructure, but I think we
have to imagine a world in which that restriction either never existed
or has subsequently been lifted.)

The problem here is: Iff the first statement uses ON CONFLICT
infrastructure, doesn't the absence of WHEN NOT MATCHED imply
different semantics for the remaining updates and deletes in the
second version of the query? You've removed what seems like a neat
adjunct to the MERGE, but it actually changes everything else too when
using READ COMMITTED. Isn't that pretty surprising? If you're not
clear on what I mean, see my previous remarks on EPQ, live lock, and
what a CONFLICT could be in READ COMMITTED mode. Concurrent activity
at READ COMMITTED mode can be expected to significantly alter the
outcome here.

Why not just always use the behavior that the second query requires,
which is very much like an UPDATE FROM with an outer join that can
sometimes do deletes (and inserts)? We should always use an MVCC
snapshot, and never play ON CONFLICT style games with visibility/dirty
snapshots.
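
To make that concrete, under those semantics the MATCHED ... UPDATE arm of
the statement above behaves roughly like this ordinary statement (a sketch
only; it ignores the DELETE arm, which plain UPDATE ... FROM cannot
express):

UPDATE wines w
   SET stock = w.stock + s.stock_delta
  FROM wine_stock_changes s
 WHERE s.winename = w.winename
   AND w.stock + s.stock_delta > 0;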

> It's difficult to discuss anything with someone that refuses to
> believe that there are acceptable ways around things. I believe there
> are.

Isn't that blind faith? Again, it seems like you haven't really tried
to convince me.

> If you can calm down the rhetoric we can work together, but if you
> continue to grandstand it makes it more difficult.

I'm trying to break the deadlock, by getting you to actually consider
what I'm saying. I don't enjoy confrontation. Currently, it seems like
you're just ignoring my objections, which you actually could fairly
easily work through. That is rather frustrating.

> You've said its possible another way. Show that assertion is actually
> true. We're all listening, me especially, for the technical details.

My proposal, if you want to call it that, has the merit of actually
being how MERGE works in every other system. Both Robert and Stephen
seem to be almost sold on what I suggest, so I think that I've
probably already explained my position quite well.

-- 
Peter Geoghegan




Re: [HACKERS] Re: Anyone have experience benchmarking very high effective_io_concurrency on NVME's?

2017-10-31 Thread Tomas Vondra
Hi,

On 10/31/2017 04:48 PM, Greg Stark wrote:
> On 31 October 2017 at 07:05, Chris Travers  wrote:
>> Hi;
>>
>> After Andres's excellent talk at PGConf we tried benchmarking
>> effective_io_concurrency on some of our servers and found that those which
>> have a number of NVME storage volumes could not fill the I/O queue even at
>> the maximum setting (1000).
>
> And was the system still i/o bound? If the cpu was 100% busy then
> perhaps Postgres just can't keep up with the I/O system. It would
> depend on workload though, if you start many very large sequential
> scans you may be able to push the i/o system harder.
>
> Keep in mind effective_io_concurrency only really affects bitmap
> index scans (and to a small degree index scans). It works by issuing
> posix_fadvise() calls for upcoming buffers one by one. That gets
> multiple spindles active but it's not really going to scale to many
> thousands of prefetches (and effective_io_concurrency of 1000
> actually means 7485 prefetches). At some point those i/o are going
> to start completing before Postgres even has a chance to start
> processing the data.
>
Yeah, initiating the prefetches is not expensive, but it's not free
either. So there's a trade-off between time spent on prefetching and
processing the data.

I believe this may be actually illustrated using Amdahl's law - the I/O
is the parallel part, and processing the data is the serial part. And no
matter what you do, the device only has so much bandwidth, which defines
the maximum possible speedup (compared to "no prefetch" case).

Furthermore, the device does not wait for all the I/O requests to be
submitted - it won't wait for 1000 requests and then go "OMG! There's a
lot of work to do!" It starts processing the requests as they arrive,
and some of them will complete before you're done with submitting the
rest, so you'll never see all the requests in the queue at once.

And of course, iostat and other tools only give you "average queue
length", which is mostly determined by the average throughput.

In my experience (on all types of storage, including SSDs and NVMe), the
performance quickly and significantly improves once you start increasing
the value (say, to 8 or 16, maybe 64). And then the gains become much
more modest - not because the device could not handle more, but because
the prefetch/processing ratio reached its optimal value.

But all this is actually per-process, if you can run multiple backends
(particularly when doing bitmap index scans), I'm sure you'll see the
queues being more full.
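
As a side note on the 7485 figure quoted above: if I read the
ComputeIoConcurrency() logic right, the prefetch target is the sum of n/i
for i = 1..n (the drive-spindle heuristic), which you can reproduce
directly:

SELECT round(sum(1000.0 / i)) AS prefetch_pages
  FROM generate_series(1, 1000) AS i;   -- roughly 7485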

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Statement-level rollback

2017-10-31 Thread MauMau
From: Simon Riggs
On 14 August 2017 at 23:58, Peter Eisentraut
 wrote:
> On 2/28/17 02:39, Tsunakawa, Takayuki wrote:
>> The code for stored functions is not written yet, but I'd like your
feedback for the specification and design based on the current patch.
I'll add this patch to CommitFest 2017-3.
>
> This patch needs to be rebased for the upcoming commit fest.

I'm willing to review this if the patch is going to be actively worked
on.


I'm very sorry I couldn't reply to your kind offer.  I rebased the
patch and will add it to CF 2017/11.  I hope I will complete the patch
in this CF.

Regards
Takayuki Tsunakawa




stmt_rollback_v2.patch
Description: Binary data



Re: [HACKERS] SQL procedures

2017-10-31 Thread Simon Riggs
On 31 October 2017 at 18:23, Peter Eisentraut
 wrote:

> I've been working on SQL procedures.  (Some might call them "stored
> procedures", but I'm not aware of any procedures that are not stored, so
> that's not a term that I'm using here.)

I guess that the DO command might have a variant to allow you to
execute a procedure that isn't stored?

Not suggesting you implement that, just thinking about why/when the
"stored" word would be appropriate.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Fix dumping pre-10 DBs by pg_dump10 if table "name" exists

2017-10-31 Thread Tom Lane
Vitaly Burovoy  writes:
> I left another "NULL::name AS rolname" at
> src/bin/pg_dump/pg_dump.c:2978 because can't check (remoteVersion <
> 9) it and it is under strict "selectSourceSchema(fout,
> "pg_catalog");" schema set.

Yeah, there are quite a few unqualified casts in pg_dump, but AFAICS
all the rest are OK because the search_path is just pg_catalog.

But I did find psql's describe.c making a similar mistake :-(.
Pushed that along with your fix.

regards, tom lane




Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-31 Thread Alvaro Herrera
Tom Lane wrote:
> So in a few more runs this morning using Alvaro's simplified test case,
> I have seen the following behaviors not previously reported:

> 1. Crashes in PageIndexTupleOverwrite, which has the same "invalid index
> offnum: %u" error report as PageIndexTupleDeleteNoCompact.  I note the
> same message appears in plain PageIndexTupleDelete as well.
> I think we've been too hasty to assume all instances of this came out of
> PageIndexTupleDeleteNoCompact.

Ah, I wasn't paying close attention to the originator routine of the
message, but you're right, I see this one too.

> 2. Crashes in the data-insertion process, not only the process running
> summarize_range:

Yeah, I saw these.  I was expecting it, since the two routines
(brininsert and summarize_range) pretty much share the insertion
protocol.

> I really don't understand how any of this "let's release the buffer
> lock and then take it back later" logic is supposed to work reliably.

Yeah, evidently that was way too optimistic and I'll need to figure out
a better mechanism to handle this.

The intention was to avoid deadlocks while locking the target page for
the insertion: by having both pages start unlocked we can simply lock
them in block number order.  If we keep the page containing the tuple
locked, I don't see how to reliably avoid a deadlock while acquiring a
buffer to insert the new tuple.

> BTW, while I'm bitching, it seems fairly insane from a concurrency
> standpoint that brin_getinsertbuffer is calling RecordPageWithFreeSpace
> while holding at least one and possibly two buffer locks.  Shouldn't
> that be done someplace else?

Hmm.  I spent a lot of effort (commit ccc4c074994d) to avoid leaving
pages uninitialized / unrecorded in FSM.  I left this on purpose on the
rationale that trying to fix it would make the callsites more convoluted
(the retry logic doesn't help).  But as I recall this was supposed to be
done only in the rare case where the buffer could not be returned to
caller ... but that's not what the current code does, so there is
something wrong there.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Remove secondary checkpoint

2017-10-31 Thread Michael Paquier
On Tue, Oct 31, 2017 at 2:00 PM, Simon Riggs  wrote:
> On 30 October 2017 at 18:58, Simon Riggs  wrote:
>> On 30 October 2017 at 15:22, Simon Riggs  wrote:
>>
 You forgot to update this formula in xlog.c:
 distance = (2.0 + CheckPointCompletionTarget) * 
 CheckPointDistanceEstimate;
 /* add 10% for good measure. */
 distance *= 1.10;
 You can guess easily where the mistake is.
>>>
>>> Doh - fixed that before posting, so I must have sent the wrong patch.
>>>
>>> It's the 10%, right? ;-)
>>
>> So, there are 2 locations that mention 2.0 in xlog.c
>>
>> I had already fixed one, which is why I remembered editing it.
>>
>> The other one you mention has a detailed comment above it explaining
>> why it should be 2.0 rather than 1.0, so I left it.
>>
>> Let me know if you still think it should be removed? I can see the
>> argument both ways.
>> Or maybe we need another patch to account for manual checkpoints.
>
> Revised patch to implement this

Here is the comment and the formula:
 * The reason this calculation is done from the prior checkpoint, not the
 * one that just finished, is that this behaves better if some checkpoint
 * cycles are abnormally short, like if you perform a manual checkpoint
 * right after a timed one. The manual checkpoint will make almost a full
 * cycle's worth of WAL segments available for recycling, because the
 * segments from the prior's prior, fully-sized checkpoint cycle are no
 * longer needed. However, the next checkpoint will make only few segments
 * available for recycling, the ones generated between the timed
 * checkpoint and the manual one right after that. If at the manual
 * checkpoint we only retained enough segments to get us to the next timed
 * one, and removed the rest, then at the next checkpoint we would not
 * have enough segments around for recycling, to get us to the checkpoint
 * after that. Basing the calculations on the distance from the prior redo
 * pointer largely fixes that problem.
 */
distance = (2.0 + CheckPointCompletionTarget) * CheckPointDistanceEstimate;

While the comment's mention that a manual checkpoint happening after a
timed one will cause a full range of WAL segments to be recycled still
holds, it is no longer the segments of the prior's prior checkpoint
that become unneeded, because with your patch the segments of the
prior checkpoint are the ones getting recycled. So it seems to me that
based on that the formula ought to use 1.0 instead of 2.0...
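
To put rough numbers on it (the values below are picked purely for
illustration, they are not measured ones):

    /* What the "keep" distance works out to under each coefficient. */
    static double
    retained_wal_mb(double coefficient)
    {
        double  estimate = 160.0;   /* CheckPointDistanceEstimate, in MB */
        double  target = 0.5;       /* CheckPointCompletionTarget */

        return (coefficient + target) * estimate * 1.10;
    }

    /* retained_wal_mb(2.0) = 440 MB today, retained_wal_mb(1.0) = 264 MB */
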
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-31 Thread Tom Lane
So in a few more runs this morning using Alvaro's simplified test case,
I have seen the following behaviors not previously reported:

1. Crashes in PageIndexTupleOverwrite, which has the same "invalid index
offnum: %u" error report as PageIndexTupleDeleteNoCompact.  I note the
same message appears in plain PageIndexTupleDelete as well.
I think we've been too hasty to assume all instances of this came out of
PageIndexTupleDeleteNoCompact.

2. Crashes in the data-insertion process, not only the process running
summarize_range:

#1  0x003b78a33c75 in abort () at abort.c:92
#2  0x0086e527 in errfinish (dummy=) at elog.c:557
#3  0x0086f334 in elog_finish (elevel=, 
fmt=) at elog.c:1378
#4  0x0075d48f in PageIndexTupleDeleteNoCompact (
rel=, buf=2772, page=0x7f0188438080 "\002", offnum=39)
at bufpage.c:995
#5  0x00476fd2 in brin_doupdate (idxrel=0x7f0186633e90, 
pagesPerRange=1, revmap=0xba, heapBlk=2542, oldbuf=2772, oldoff=39, 
origtup=0x2784010, origsz=8, newtup=0x2784098, newsz=400, 
samepage=0 '\000') at brin_pageops.c:270
#6  0x00475430 in brininsert (idxRel=0x7f0186633e90, 
values=0x7ffce7b827f0, nulls=0x7ffce7b828f0 "", heaptid=0x279fb3c, 
heapRel=, checkUnique=, 
indexInfo=0x279e370) at brin.c:292
#7  0x0061da18 in ExecInsertIndexTuples (slot=0x279e638, 
tupleid=0x279fb3c, estate=0x279dd10, noDupErr=0 '\000', specConflict=0x0, 
arbiterIndexes=0x0) at execIndexing.c:387

The postmaster log for this, with even more debug logging thrown in:

2017-10-31 11:58:03.466 EDT [23506] LOG:  
summarize_range(brin_test_a_idx,2542,2543) created placeholder at 68,39
2017-10-31 11:58:03.466 EDT [23506] STATEMENT:  select 
brin_summarize_new_values('brin_test_a_idx')
2017-10-31 11:58:03.466 EDT [23506] LOG:  brin_doupdate(brin_test_a_idx) at 
68,39: getinsertbuffer returned page 70
2017-10-31 11:58:03.466 EDT [23506] STATEMENT:  select 
brin_summarize_new_values('brin_test_a_idx')
2017-10-31 11:58:03.466 EDT [23506] LOG:  deleting tuple 39 (of 39) in rel 
brin_test_a_idx page 68
2017-10-31 11:58:03.466 EDT [23506] STATEMENT:  select 
brin_summarize_new_values('brin_test_a_idx')
2017-10-31 11:58:03.466 EDT [23506] LOG:  brin_doupdate(brin_test_a_idx) at 
68,39: placed at 70,1
2017-10-31 11:58:03.466 EDT [23506] STATEMENT:  select 
brin_summarize_new_values('brin_test_a_idx')
2017-10-31 11:58:03.466 EDT [23509] LOG:  brin_doupdate(brin_test_a_idx) at 
68,39: getinsertbuffer returned page 70
2017-10-31 11:58:03.466 EDT [23509] STATEMENT:  insert into brin_test values 
(repeat(chr(ascii('A') + int4(random() * 61)), int4(random() * 200)));
2017-10-31 11:58:03.466 EDT [23509] LOG:  deleting tuple 39 (of 38) in rel 
brin_test_a_idx page 68
2017-10-31 11:58:03.466 EDT [23509] STATEMENT:  insert into brin_test values 
(repeat(chr(ascii('A') + int4(random() * 61)), int4(random() * 200)));
2017-10-31 11:58:03.466 EDT [23509] PANIC:  invalid index offnum: 39
2017-10-31 11:58:03.466 EDT [23509] STATEMENT:  insert into brin_test values 
(repeat(chr(ascii('A') + int4(random() * 61)), int4(random() * 200)));
2017-10-31 11:58:03.466 EDT [23506] LOG:  
summarize_range(brin_test_a_idx,2542,2543): update at 68,39 succeeded
2017-10-31 11:58:03.466 EDT [23506] STATEMENT:  select 
brin_summarize_new_values('brin_test_a_idx')
2017-10-31 11:58:04.517 EDT [23493] LOG:  server process (PID 23509) was 
terminated by signal 6: Aborted

So what evidently happened here is that brininsert decided it needed to
insert into the placeholder tuple that summarize_range had just created,
but by the time it got lock on the buffer, summarize_range had deleted
the placeholder and instead created a new tuple on another page.

HAH: I see how the connection to PageIndexTupleDeleteNoCompact's
change applies.  After re-locking the buffer, brin_doupdate tries
to check whether the tuple's been deleted, but it needs to do it
more like this, else it might be accessing off the end of the
lp array:

/*
 * Check that the old tuple wasn't updated concurrently: it might have
 * moved someplace else entirely ...
 */
-   if (!ItemIdIsNormal(oldlp))
+   if (oldoff > PageGetMaxOffsetNumber(oldpage) ||
+   !ItemIdIsNormal(oldlp))
{
LockBuffer(oldbuf, BUFFER_LOCK_UNLOCK);

This is a pre-existing bug, but before 24992c6d, it would only
have manifested in the (very?) unusual corner case that
PageIndexDeleteNoCompact was able to reset the page to empty.

However, when I fix that and then run Alvaro's test case, I get
ERROR:  corrupted BRIN index: inconsistent range map
so there's more creepy-crawlies around here somewhere.

I really don't understand how any of this "let's release the buffer
lock and then take it back later" logic is supposed to work reliably.

BTW, while I'm bitching, it seems fairly insane from a concurrency
standpoint that brin_getinsertbuffer is calling RecordPageWithFreeSpace
while holding at least one and possibly two buffer locks.  Shouldn't
that be done someplace else?

Re: [HACKERS] Fix dumping pre-10 DBs by pg_dump10 if table "name" exists

2017-10-31 Thread Vitaly Burovoy
On 10/31/17, Tom Lane  wrote:
> Vitaly Burovoy  writes:
>> Recently my colleagues found a bug.
>
>> -  "SELECT 'bigint'::name AS 
>> sequence_type, "
>> +  "SELECT 
>> 'bigint'::pg_catalog.name AS sequence_type,
>
> Good catch, but I think we could simplify this by just omitting the cast
> altogether:
>
> -   "SELECT 'bigint'::name AS 
> sequence_type, "
> +   "SELECT 'bigint' AS 
> sequence_type,
>
> pg_dump doesn't particularly care whether the column comes back marked
> as 'name' or 'text' or 'unknown'.
>
>   regards, tom lane

OK, just for convenience I'm attaching your version of the fix.
I left the other "NULL::name AS rolname" at
src/bin/pg_dump/pg_dump.c:2978 alone because I can't check it
(remoteVersion < 9) and it is under a strict "selectSourceSchema(fout,
"pg_catalog");" schema setting.

--
Best regards,
Vitaly Burovoy


0001-Fix-dumping-schema-if-a-table-named-name-exists.ver2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: Anyone have experience benchmarking very high effective_io_concurrency on NVME's?

2017-10-31 Thread Greg Stark
On 31 October 2017 at 07:05, Chris Travers  wrote:
> Hi;
>
> After Andres's excellent talk at PGConf we tried benchmarking
> effective_io_concurrency on some of our servers and found that those which
> have a number of NVME storage volumes could not fill the I/O queue even at
> the maximum setting (1000).

And was the system still I/O bound? If the CPU was 100% busy then
perhaps Postgres just can't keep up with the I/O system. It would
depend on the workload, though; if you start many very large sequential
scans you may be able to push the I/O system harder.

Keep in mind effective_io_concurrency only really affects bitmap index
scans (and to a small degree index scans). It works by issuing
posix_fadvise() calls for upcoming buffers one by one. That gets
multiple spindles active, but it's not really going to scale to many
thousands of prefetches (an effective_io_concurrency of 1000 actually
means 7485 prefetches). At some point those I/Os are going to start
completing before Postgres even has a chance to start processing the
data.
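
For reference, if memory serves the translation is done in
ComputeIoConcurrency() in bufmgr.c and boils down to roughly this
(simplified sketch, not the exact code): the GUC value n becomes a
prefetch-distance target of about n * H(n), the n-th harmonic number,
which is where the 7485 figure for n = 1000 comes from.

    static double
    prefetch_pages_for(int io_concurrency)
    {
        double  pages = 0.0;
        int     i;

        /* n * H(n): requests needed to keep n spindles busy on average */
        for (i = 1; i <= io_concurrency; i++)
            pages += (double) io_concurrency / (double) i;

        return pages;       /* prefetch_pages_for(1000) ~= 7485.5 */
    }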

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] postgres_fdw: Add support for INSERT OVERRIDING clause

2017-10-31 Thread Peter Eisentraut
It has been pointed out to me that the command deparsing in postgres_fdw
does not support the INSERT OVERRIDING clause that was added in PG10.
Here is a patch that seems to fix that.  I don't know much about this,
whether anything else needs to be added or whether there should be
tests.  Perhaps someone more familiar with postgres_fdw details can
check it.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 097ecf61a9c2740b02a21be19a92aed92388346a Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 31 Oct 2017 11:44:14 -0400
Subject: [PATCH] postgres_fdw: Add support for INSERT OVERRIDING clause

---
 contrib/postgres_fdw/deparse.c | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 2af8364010..8eb227605f 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -1573,7 +1573,21 @@ deparseInsertSql(StringInfo buf, PlannerInfo *root,
deparseColumnRef(buf, rtindex, attnum, root, false);
}
 
-   appendStringInfoString(buf, ") VALUES (");
+   appendStringInfoString(buf, ") ");
+
+   switch (root->parse->override)
+   {
+   case OVERRIDING_USER_VALUE:
+   appendStringInfoString(buf, "OVERRIDING USER VALUE ");
+   break;
+   case OVERRIDING_SYSTEM_VALUE:
+   appendStringInfoString(buf, "OVERRIDING SYSTEM VALUE ");
+   break;
+   default:
+   break;
+   }
+
+   appendStringInfoString(buf, "VALUES (");
 
pindex = 1;
first = true;
-- 
2.14.3


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Consistently catch errors from Python _New() functions

2017-10-31 Thread Peter Eisentraut
While reviewing some unrelated code, I noticed that we are handling
error conditions from Python API functions such as PyList_New() and
PyDict_New() in pretty random ways or not at all.  Here is a patch to
fix that.

Arguably, this is a bug fix, but I'm not sure whether it's worth
meddling with this in the back branches.  Maybe only the places where
the errors are not caught at all should be fixed there.  Comments welcome.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From eca2709cd1654e2fc37b75fa3bdfe5e199c86cb9 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 31 Oct 2017 10:49:36 -0400
Subject: [PATCH] Consistently catch errors from Python _New() functions

Python Py*_New() functions can fail and return NULL in out-of-memory
conditions.  The previous code handled that inconsistently or not at
all.  This change organizes that better.  If we are in a function that
is called from Python, we just check for failure and return NULL
ourselves, which will cause any exception information to be passed up.
If we are called from PostgreSQL, we consistently create an "out of
memory" error.
---
 contrib/hstore_plpython/hstore_plpython.c |  4 
 contrib/ltree_plpython/ltree_plpython.c   |  4 
 src/pl/plpython/plpy_cursorobject.c   | 19 ---
 src/pl/plpython/plpy_exec.c   | 10 +-
 src/pl/plpython/plpy_main.c   |  2 +-
 src/pl/plpython/plpy_plpymodule.c |  6 --
 src/pl/plpython/plpy_procedure.c  |  2 ++
 src/pl/plpython/plpy_resultobject.c   | 11 +++
 src/pl/plpython/plpy_spi.c| 25 +
 src/pl/plpython/plpy_typeio.c |  4 +++-
 10 files changed, 67 insertions(+), 20 deletions(-)

diff --git a/contrib/hstore_plpython/hstore_plpython.c 
b/contrib/hstore_plpython/hstore_plpython.c
index 22366bd40f..218e6612b1 100644
--- a/contrib/hstore_plpython/hstore_plpython.c
+++ b/contrib/hstore_plpython/hstore_plpython.c
@@ -93,6 +93,10 @@ hstore_to_plpython(PG_FUNCTION_ARGS)
PyObject   *dict;
 
dict = PyDict_New();
+   if (!dict)
+   ereport(ERROR,
+   (errcode(ERRCODE_OUT_OF_MEMORY),
+errmsg("out of memory")));
 
for (i = 0; i < count; i++)
{
diff --git a/contrib/ltree_plpython/ltree_plpython.c 
b/contrib/ltree_plpython/ltree_plpython.c
index ae9b90dd10..e88636a0a9 100644
--- a/contrib/ltree_plpython/ltree_plpython.c
+++ b/contrib/ltree_plpython/ltree_plpython.c
@@ -46,6 +46,10 @@ ltree_to_plpython(PG_FUNCTION_ARGS)
ltree_level *curlevel;
 
list = PyList_New(in->numlevel);
+   if (!list)
+   ereport(ERROR,
+   (errcode(ERRCODE_OUT_OF_MEMORY),
+errmsg("out of memory")));
 
curlevel = LTREE_FIRST(in);
for (i = 0; i < in->numlevel; i++)
diff --git a/src/pl/plpython/plpy_cursorobject.c 
b/src/pl/plpython/plpy_cursorobject.c
index 0108471bfe..4af7c6621f 100644
--- a/src/pl/plpython/plpy_cursorobject.c
+++ b/src/pl/plpython/plpy_cursorobject.c
@@ -464,15 +464,20 @@ PLy_cursor_fetch(PyObject *self, PyObject *args)
 
Py_DECREF(ret->rows);
ret->rows = PyList_New(SPI_processed);
-
-   for (i = 0; i < SPI_processed; i++)
+   if (!ret->rows)
{
-   PyObject   *row = PLyDict_FromTuple(&cursor->result,
-   SPI_tuptable->vals[i],
-   SPI_tuptable->tupdesc);
-
-   PyList_SetItem(ret->rows, i, row);
+   Py_DECREF(ret);
+   ret = NULL;
}
+   else
+   for (i = 0; i < SPI_processed; i++)
+   {
+   PyObject   *row = PLyDict_FromTuple(&cursor->result,
+   SPI_tuptable->vals[i],
+   SPI_tuptable->tupdesc);
+
+   PyList_SetItem(ret->rows, i, row);
+   }
}
 
SPI_freetuptable(SPI_tuptable);
diff --git a/src/pl/plpython/plpy_exec.c b/src/pl/plpython/plpy_exec.c
index 26f61dd0f3..7cfa146c82 100644
--- a/src/pl/plpython/plpy_exec.c
+++ b/src/pl/plpython/plpy_exec.c
@@ -434,6 +434,9 @@ PLy_function_build_args(FunctionCallInfo fcinfo, 
PLyProcedure *proc)
  

Re: [HACKERS] [PATCH] Generic type subscripting

2017-10-31 Thread Arthur Zakirov
On Sun, Oct 29, 2017 at 10:56:19PM +0100, Dmitry Dolgov wrote:
> 
> So, here is the new version of patch that contains modifications we've
> discussed, namely:
> 
> * store oids of `parse`, `fetch` and `assign` functions
> 
> * introduce dependencies from a data type
> 
> * as a side effect of previous two I also eliminated some unnecessary
> arguments
>   in `parse` function.

Thank you for new version of the patch.

There are some notes.

Documentation
-

Documentation compiles. But there are warnings about end-tags. It is now
necessary to use fully named end-tags:

=# make -C doc/src/sgml check
/usr/sbin/onsgmls:json.sgml:574:20:W: empty end-tag
...

Documentation is out of date:
- catalogs.sgml needs to add information about additional pg_type fields
- create_type.sgml needs information about subscripting_parse, 
subscripting_assign and subscripting_fetch options
- xsubscripting.sgml is out of date

Code


I think it is necessary to check Oids of subscripting_parse, 
subscripting_assign, subscripting_fetch. Maybe within TypeCreate().

Otherwise next cases possible:

=# CREATE TYPE custom (
   internallength = 8,
   input = custom_in,
   output = custom_out,
   subscripting_parse = custom_subscripting_parse);
=# CREATE TYPE custom (
   internallength = 8,
   input = custom_in,
   output = custom_out,
   subscripting_fetch = custom_subscripting_fetch);

Are all subscripting_* fields mandatory? If so, then if the user provides at
least one of them, all of the fields should be provided.

Should all types support assignment via subscript? If not, then the
subscripting_assign parameter should be optional.

> +Datum
> +jsonb_subscript_parse(PG_FUNCTION_ARGS)
> +{
> + boolisAssignment = PG_GETARG_BOOL(0);

and

> +Datum
> +custom_subscripting_parse(PG_FUNCTION_ARGS)
> +{
> + boolisAssignment = PG_GETARG_BOOL(0);

Here isAssignment is an unused variable, so it could be removed.

> +
> + scratch->d.sbsref.eval_finfo = eval_finfo;
> + scratch->d.sbsref.nested_finfo = nested_finfo;
> +

As I mentioned earlier, we need to assign eval_finfo and nested_finfo only for
the EEOP_SBSREF_OLD, EEOP_SBSREF_ASSIGN and EEOP_SBSREF_FETCH steps.
Also, they should be assigned before calling ExprEvalPushStep(), not after.
Otherwise some bugs may appear in the future.
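
In other words, the ordering I would expect is the following (just a
sketch, reusing the names from the patch):

    /* set the step's private fields first ... */
    scratch->d.sbsref.eval_finfo = eval_finfo;
    scratch->d.sbsref.nested_finfo = nested_finfo;

    /*
     * ... and only then push the step: ExprEvalPushStep() copies *scratch
     * into the steps array, so anything assigned afterwards is lost.
     */
    ExprEvalPushStep(state, scratch);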

> - ArrayRef   *aref = makeNode(ArrayRef);
> + NodeTag sbstag = nodeTag(src_expr);
> + Size nodeSize = sizeof(SubscriptingRef);
> + SubscriptingRef *sbsref = (SubscriptingRef *) newNode(nodeSize, 
> sbstag);

Is there any necessity to use newNode() instead of makeNode()? The previous
code was shorter.

There are no changes in execnodes.h except a removed line. So I think
execnodes.h could be removed from the patch.

> 
> I'm going to make few more improvements, but in the meantime I hope we can
> continue to review the patch.

I will wait.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix dumping pre-10 DBs by pg_dump10 if table "name" exists

2017-10-31 Thread Tom Lane
Vitaly Burovoy  writes:
> Recently my colleagues found a bug.

> -   "SELECT 'bigint'::name AS 
> sequence_type, "
> +   "SELECT 
> 'bigint'::pg_catalog.name AS sequence_type, 

Good catch, but I think we could simplify this by just omitting the cast
altogether:

- "SELECT 'bigint'::name AS 
sequence_type, "
+ "SELECT 'bigint' AS 
sequence_type, 

pg_dump doesn't particularly care whether the column comes back marked
as 'name' or 'text' or 'unknown'.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add some const decorations to prototypes

2017-10-31 Thread Tom Lane
Peter Eisentraut  writes:
> Here is a patch that adds const decorations to many char * arguments in
> functions.  It should have no impact otherwise; there are very few code
> changes caused by it.

+1 in general ...

> Some functions have a strtol()-like behavior
> where they take in a const char * and return a pointer into that as
> another argument.  In those cases, I added a cast or two.

... but I'm not sure that it's an improvement in cases where you have to
cast away the const somewhere else.  I realize that strtol has an ancient
pedigree, but I do not think it's very good design.
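
To illustrate the wart I mean with a made-up minimal example (not code
from the patch):

    #include <stdbool.h>

    /*
     * strtol()-style interface: the input is const-qualified, but the
     * "where we stopped" output parameter is a plain char **, so somewhere
     * the const has to be cast away.
     */
    static bool
    scan_digits(const char *str, char **endptr)
    {
        const char *p = str;

        while (*p >= '0' && *p <= '9')
            p++;
        if (p == str)
            return false;       /* no digits at all */

        *endptr = (char *) p;   /* here the const quietly goes away */
        return true;
    }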

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Add some const decorations to prototypes

2017-10-31 Thread Peter Eisentraut
Here is a patch that adds const decorations to many char * arguments in
functions.  It should have no impact otherwise; there are very few code
changes caused by it.  Some functions have a strtol()-like behavior
where they take in a const char * and return a pointer into that as
another argument.  In those cases, I added a cast or two.

Generally, I find these const decorations useful as easy function
documentation.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From a8163e84c83887e4a3b81642c137995932701bb5 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 31 Oct 2017 10:34:31 -0400
Subject: [PATCH] Add some const decorations to prototypes

---
 contrib/cube/cubeparse.y   | 12 
 contrib/dict_xsyn/dict_xsyn.c  |  2 +-
 contrib/fuzzystrmatch/dmetaphone.c |  4 +--
 contrib/pgcrypto/pgcrypto.c|  4 +--
 contrib/seg/seg.c  |  4 +--
 contrib/seg/segdata.h  |  2 +-
 contrib/seg/segparse.y |  4 +--
 contrib/unaccent/unaccent.c|  2 +-
 contrib/uuid-ossp/uuid-ossp.c  |  2 +-
 src/backend/access/common/reloptions.c | 12 
 src/backend/access/gist/gistbuild.c|  2 +-
 src/backend/access/transam/xact.c  |  6 ++--
 src/backend/access/transam/xlogarchive.c   |  4 +--
 src/backend/catalog/heap.c | 10 +++
 src/backend/commands/comment.c |  4 +--
 src/backend/commands/event_trigger.c   |  4 +--
 src/backend/commands/extension.c   |  4 +--
 src/backend/commands/indexcmds.c   |  8 +++---
 src/backend/commands/opclasscmds.c |  2 +-
 src/backend/commands/tablecmds.c   | 16 +--
 src/backend/commands/typecmds.c|  6 ++--
 src/backend/commands/view.c|  2 +-
 src/backend/libpq/auth.c   | 24 
 src/backend/libpq/hba.c|  6 ++--
 src/backend/parser/parse_expr.c|  2 +-
 src/backend/parser/parse_func.c|  4 +--
 src/backend/parser/parse_relation.c|  8 +++---
 src/backend/parser/parse_target.c  |  2 +-
 src/backend/port/dynloader/darwin.c|  8 +++---
 src/backend/port/dynloader/darwin.h|  4 +--
 src/backend/port/dynloader/hpux.c  |  4 +--
 src/backend/port/dynloader/hpux.h  |  4 +--
 src/backend/port/dynloader/linux.c |  4 +--
 src/backend/postmaster/postmaster.c|  2 +-
 src/backend/replication/basebackup.c   |  8 +++---
 src/backend/rewrite/rewriteDefine.c|  4 +--
 src/backend/snowball/dict_snowball.c   |  2 +-
 src/backend/storage/lmgr/lwlock.c  |  8 +++---
 src/backend/tsearch/dict_thesaurus.c   |  2 +-
 src/backend/tsearch/spell.c|  4 +--
 src/backend/utils/adt/float.c  | 16 +--
 src/backend/utils/adt/genfile.c|  2 +-
 src/backend/utils/adt/ruleutils.c  |  4 +--
 src/backend/utils/adt/varlena.c|  2 +-
 src/backend/utils/adt/xml.c| 32 +++---
 src/bin/initdb/initdb.c| 12 
 src/bin/pg_dump/pg_backup_db.c |  2 +-
 src/bin/pg_dump/pg_backup_db.h |  2 +-
 src/bin/pg_rewind/fetch.c  |  2 +-
 src/bin/pg_rewind/fetch.h  |  2 +-
 src/bin/pg_upgrade/option.c|  6 ++--
 src/bin/pg_upgrade/pg_upgrade.c|  4 +--
 src/bin/pg_waldump/pg_waldump.c|  2 +-
 src/bin/pgbench/pgbench.c  |  4 +--
 src/include/access/gist_private.h  |  2 +-
 src/include/access/reloptions.h| 14 +-
 src/include/access/xact.h  |  6 ++--
 src/include/access/xlog_internal.h |  4 +--
 src/include/catalog/heap.h |  2 +-
 src/include/commands/comment.h |  4 +--
 src/include/commands/defrem.h  |  4 +--
 src/include/commands/typecmds.h|  2 +-
 src/include/commands/view.h|  2 +-
 src/include/executor/tablefunc.h   |  8 +++---
 src/include/parser/parse_relation.h|  6 ++--
 src/include/parser/parse_target.h  |  2 +-
 src/include/postmaster/bgworker.h  |  2 +-
 src/include/rewrite/rewriteDefine.h|  2 +-
 

Re: [HACKERS] Query regarding permission on table_column%type access

2017-10-31 Thread Tom Lane
Stephen Frost  writes:
> * Neha Sharma (neha.sha...@enterprisedb.com) wrote:
>> I have observed that even if the user does not have permission on a
>> table(created in by some other user),the function parameter still can have
>> a parameter of that table_column%type.

> This is because the creation of the table also creates a type of the
> same name and the type's permissions are independent of the table's.  I
> imagine that you could REVOKE USAGE ON TYPE from the type and deny
> access to that type if you wanted to.

Right.  (I checked, seems to work as expected.)

> I'm not sure that we should change the REVOKE on the table-level to also
> mean to REVOKE access to the type automatically (and what happens if you
> GRANT the access back for the table..?

It seems pretty silly for privileges on table rowtypes to behave
differently from those on other rowtypes.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Remove secondary checkpoint

2017-10-31 Thread Simon Riggs
On 30 October 2017 at 18:58, Simon Riggs  wrote:
> On 30 October 2017 at 15:22, Simon Riggs  wrote:
>
>>> You forgot to update this formula in xlog.c:
>>> distance = (2.0 + CheckPointCompletionTarget) * 
>>> CheckPointDistanceEstimate;
>>> /* add 10% for good measure. */
>>> distance *= 1.10;
>>> You can guess easily where the mistake is.
>>
>> Doh - fixed that before posting, so I must have sent the wrong patch.
>>
>> It's the 10%, right? ;-)
>
> So, there are 2 locations that mention 2.0 in xlog.c
>
> I had already fixed one, which is why I remembered editing it.
>
> The other one you mention has a detailed comment above it explaining
> why it should be 2.0 rather than 1.0, so I left it.
>
> Let me know if you still think it should be removed? I can see the
> argument both ways.
>
> Or maybe we need another patch to account for manual checkpoints.

Revised patch to implement this

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


remove_secondary_checkpoint.v5.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-31 Thread Tom Lane
Michael Paquier  writes:
> On Tue, Oct 31, 2017 at 4:56 AM, Tom Lane  wrote:
>> Yeah, we're still missing an understanding of why we didn't see it
>> before; the inadequate locking was surely there before.

> Because 24992c6d has added a check on the offset number by using
> PageIndexTupleDeleteNoCompact() in brin_doupdate() making checks
> tighter, no?

No, I don't see how it's tighter.  The old code matched supplied
offnum(s) against the indexes of not-unused items, and then after
that loop it complained if they weren't all matched.  So it should
also have failed, albeit with a different error message, if it were
passed an offnum corresponding to a no-longer-live tuple.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-10-31 Thread Simon Riggs
On 31 October 2017 at 12:56, Stephen Frost  wrote:
> Simon,
>
> * Simon Riggs (si...@2ndquadrant.com) wrote:
>> On 30 October 2017 at 19:55, Stephen Frost  wrote:
>> > I don't think MERGE should be radically different from other database
>> > systems and just syntax sugar over a capability we have.
>>
>> I've proposed a SQL Standard compliant implementation that would do
>> much more than be new syntax over what we already have.
>>
>> So these two claims aren't accurate: "radical difference" and "syntax
>> sugar over a capability we have".
>
> Based on the discussion so far, those are the conclusions I've come to.
> Saying they're isn't accurate without providing anything further isn't
> likely to be helpful.

I'm trying to be clear, accurate and non-confrontational.

I appreciate those are your conclusions, which is why I need to be
clear that the proposal has been misunderstood on those stated points.

What else would you like me to add to be helpful? I will try...

I've spent weeks looking at other implementations and combing the
SQL Standard with a fine-tooth comb to see what is possible and what is
not. My proposal shows a new way forward and yet follows the standard
in horrible detail. It has taken me nearly 2 months to write the doc
page submitted here. It is not simply new syntax stretched over
existing capability. The full syntax of MERGE is quite powerful and
will take some months to make work, even with my proposal.

I'm not sure what else to say, but I am happy to answer specific
technical questions that have full context to allow a rational
discussion.

>> > Time changes
>> > many things, but I don't think anything's changed in this from the prior
>> > discussions about it.
>>
>> My proposal is new, that is what has changed.
>
> I'm happy to admit that I've missed something in the discussion, but
> from what I read the difference appeared to be primairly that you're
> proposing it, and doing so a couple years later.
>
>> At this stage, general opinions can be misleading.
>
> I'd certainly love to see a MERGE capability that meets the standard and
> works in much the same way from a user's perspective as the other RDBMS'
> which already implement it.

The standard explains how it should work, with the proviso that the
standard allows us to define concurrent behavior. Serge has explained
that he sees that my proposal covers the main use cases. I don't yet
see how to cover all, but I'm looking to do the most important use
cases first and other things later.

> From prior discussions with Peter on
> exactly that subject, I'm also convinced that having that would be a
> largely independent piece of work from the INSERT .. ON CONFLICT
> capability that we have (just as MERGE is distinct from similar UPSERT
> capabilities in other RDBMSs).
>
> The goal here is really just to avoid time wasted developing MERGE based
> on top of the INSERT .. ON CONFLICT system only to have it be rejected
> later because multiple other committers and major contributors have said
> that they don't agree with that approach.  If, given all of this
> discussion, you don't feel that's a high risk with your approach then by
> all means continue on with what you're thinking and we can review the
> patch once it's posted.

It is certainly a risk and I don't want to waste time either. I am
happy to consider any complete proposal for how to proceed, including
details and/or code, but I will be prioritising things to ensure that
we have a candidate feature for PG11 from at least one author, rather
than a research project for PG12+.

The key point here is concurrency. I have said that there is an
intermediate step that is useful and achievable, but does not cover
every case. Serge - who, it should be noted, is the only person in this
discussion who has already implemented the MERGE statement, for DB2 -
has also stated this, so I give more weight to his technical opinion
than I do to others', at this time.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] make async slave to wait for lsn to be replayed

2017-10-31 Thread Ants Aasma
On Mon, Oct 30, 2017 at 7:25 PM, Ivan Kartyshov
 wrote:
> It sounds reasonable. I can offer the following version.
>
> WAIT  LSN  lsn_number;
> WAIT  LSN  lsn_number  TIMEOUT delay;
> WAIT  LSN  lsn_number  INFINITE;
> WAIT  LSN  lsn_number  NOWAIT;
>
>
> WAIT  [token]  wait_value [option];
>
> token - [LSN, TIME | TIMESTAMP | CSN | XID]
> option - [TIMEOUT | INFINITE | NOWAIT]
>
> Ready to listen to your suggestions.

Making the interface more specific about the mechanism is not what I
had in mind, quite the opposite. I would like to see the interface
describe the desired effect of the wait.

Stepping back for a while, from what I understand the reason we want
to wait is to prevent observations of database state from going
backwards. To limit the amount of waiting needed, we tell the database
what we have observed. For example "I just observed my transaction
commit", or "the last time I observed the state was X", and then have
the database provide us with a snapshot that is causally dependent on
those states. This does not give us linearizability; for that we still
need at the very least serializable transactions on the standby. But it
seems to provide a form of sequential consistency, which (if it can be
proved to hold) makes reasoning about concurrency a lot nicer.

For lack of a better proposal I would like something along the lines of:

WAIT FOR state_id[, state_id] [ OPTIONS (..)]

And to get the tokens maybe a function pg_last_commit_state().

Optionally, to provide read-to-read causality, pg_snapshot_state()
could return for example replay_lsn at the start of the current
transaction. This makes sure that things don't just appear and
disappear when load balancing across many standby servers.

WAIT FOR semantics is to ensure that next snapshot is causally
dependent (happens after) each of the specified observed states.

The state_id could simply be a LSN, or to allow for future expansion
something like 'lsn:/1234'. Example extension would be to allow
for waiting on xids. On master that would be just a ShareLock on the
transactionid. On standby it would wait for the commit or rollback
record for that transaction to be replayed.

Robert made a good point that people will still rely on the token
being an LSN, but perhaps they will be slightly less angry when we
explicitly tell them that this might change in the future.

Regards,
Ants Aasma

[1] 
https://www.postgresql.org/docs/devel/static/functions-admin.html#functions-snapshot-synchronization


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: restrict pg_rewind to whitelisted directories

2017-10-31 Thread Robert Haas
On Mon, Oct 30, 2017 at 6:44 PM, Chris Travers  wrote:
> The attached patch is cleaned up and filed for the commit fest this next
> month:

It's generally better to post the patch to the same thread as the
discussion, or at least link back to the discussion thread from
the new one.  Otherwise people may have trouble understanding the
history.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Query regarding permission on table_column%type access

2017-10-31 Thread Stephen Frost
Greetings,

* Neha Sharma (neha.sha...@enterprisedb.com) wrote:
> I have observed that even if the user does not have permission on a
> table(created in by some other user),the function parameter still can have
> a parameter of that table_column%type.

This is because the creation of the table also creates a type of the
same name and the type's permissions are independent of the table's.  I
imagine that you could REVOKE USAGE ON TYPE from the type and deny
access to that type if you wanted to.

I'm not sure that we should change the REVOKE on the table-level to also
mean to REVOKE access to the type automatically (and what happens if you
GRANT the access back for the table..?  Would we need to track that
dependency?) considering that's been the behavior for a very long time.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Adding column_constraint description in ALTER TABLE synopsis

2017-10-31 Thread Stephen Frost
Greetings,

* Lætitia Avrot (laetitia.av...@gmail.com) wrote:
> As Amit Langot pointed out, the column_constraint definition is missing
> whereas it is used in ALTER TABLE synopsis. It can be easily found in the
> CREATE TABLE synopsis, but it's not very user friendly.

Agreed.

> You will find enclosed my patch.

Thanks, this looks pretty reasonable, but did you happen to look for any
other keywords used in the ALTER TABLE synopsis whose definitions should
really be in ALTER TABLE also?

I'm specifically looking at, at least, partition_bound_spec.  Maybe you
could propose an updated patch which addresses that also, and any other
cases you find?

Thanks again!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Remove inbound links to sql-createuser

2017-10-31 Thread Stephen Frost
David,

* David G. Johnston (david.g.johns...@gmail.com) wrote:
> Since CREATE USER is officially an alias for CREATE ROLE other parts of the
> documentation should point to CREATE ROLE, not CREATE USER.  Most do but I
> noticed when looking at CREATE DATABASE that it did not.  Further searching
> turned up the usage in client-auth.sgml.  That one is questionable since we
> are indeed talking about LOGIN roles there but we are already pointing to
> sql-alterrole instead of sql-alteruser and so changing the create variation
> to point to sql-createrole seems warranted.

+1.

Barring objections, I'll commit this in a bit.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Deadlock in ALTER SUBSCRIPTION REFRESH PUBLICATION

2017-10-31 Thread Masahiko Sawada
On Tue, Oct 24, 2017 at 7:13 PM, Konstantin Knizhnik
 wrote:
> Parallel execution of ALTER SUBSCRIPTION REFRESH PUBLICATION at several
> nodes may cause deadlock:
>
> knizhnik  1480  0.0  0.1 1417532 16496 ?   Ss   20:01   0:00 postgres:
> bgworker: logical replication worker for subscription 16589 sync 16720
> waiting
> knizhnik  1481  0.0  0.1 1417668 17668 ?   Ss   20:01   0:00 postgres:
> knizhnik postgres 127.0.0.1(36378) idle
> knizhnik  1484  0.0  0.1 1406952 16676 ?   Ss   20:01   0:00 postgres:
> knizhnik postgres 127.0.0.1(56194) ALTER SUBSCRIPTION waiting
> knizhnik  1486  0.0  0.1 1410040 21268 ?   Ss   20:01   0:00 postgres:
> wal sender process knizhnik 127.0.0.1(36386) idle
> knizhnik  1487  0.0  0.1 1417532 16736 ?   Ss   20:01   0:00 postgres:
> bgworker: logical replication worker for subscription 16589 sync 16774
> knizhnik  1489  0.0  0.1 1410040 21336 ?   Ss   20:01   0:00 postgres:
> wal sender process knizhnik 127.0.0.1(36388) idle
> knizhnik  1510  0.0  0.1 1406952 16820 ?   Ss   20:01   0:00 postgres:
> knizhnik postgres 127.0.0.1(56228) ALTER SUBSCRIPTION waiting
> knizhnik  1530  0.0  0.1 1406952 16736 ?   Ss   20:01   0:00 postgres:
> knizhnik postgres 127.0.0.1(56260) ALTER SUBSCRIPTION waiting
> knizhnik  1534  0.0  0.0 1417504 14360 ?   Ss   20:01   0:00 postgres:
> bgworker: logical replication worker for subscription 16590
> knizhnik  1537  0.0  0.1 1409164 16920 ?   Ss   20:01   0:00 postgres:
> wal sender process knizhnik 127.0.0.1(45054) idle
> knizhnik  1545  0.0  0.0 1417344 14576 ?   Ss   20:01   0:00 postgres:
> bgworker: logical replication worker for subscription 16592
> knizhnik  1546  0.0  0.1 1409168 16588 ?   Ss   20:01   0:00 postgres:
> wal sender process knizhnik 127.0.0.1(56274) idle
> knizhnik  1547  0.0  0.0 1417348 14332 ?   Ss   20:01   0:00 postgres:
> bgworker: logical replication worker for subscription 16592 sync 16606
> knizhnik  1549  0.0  0.0 1409004 14512 ?   Ss   20:01   0:00 postgres:
> wal sender process knizhnik 127.0.0.1(56276) idle in transaction waiting
> knizhnik  1553  0.0  0.0 1417348 14540 ?   Ss   20:01   0:00 postgres:
> bgworker: logical replication worker for subscription 16592
> knizhnik  1554  0.0  0.1 1409168 16596 ?   Ss   20:01   0:00 postgres:
> wal sender process knizhnik 127.0.0.1(56280) idle
> knizhnik  1555  0.0  0.0 1417348 14332 ?   Ss   20:01   0:00 postgres:
> bgworker: logical replication worker for subscription 16592 sync 16660
> knizhnik  1556  0.0  0.0 1409004 14520 ?   Ss   20:01   0:00 postgres:
> wal sender process knizhnik 127.0.0.1(56282) idle in transaction waiting
> knizhnik  1558  0.0  0.0 1417348 14460 ?   Ss   20:01   0:00 postgres:
> bgworker: logical replication worker for subscription 16592 sync 16696
> knizhnik  1560  0.0  0.0 1409004 14516 ?   Ss   20:01   0:00 postgres:
> wal sender process knizhnik 127.0.0.1(56286) idle in transaction waiting
> knizhnik  1562  0.0  0.0 1417348 14084 ?   Ss   20:01   0:00 postgres:
> bgworker: logical replication worker for subscription 16592 sync 16732
> knizhnik  1567  0.0  0.0 1409004 14516 ?   Ss   20:01   0:00 postgres:
> wal sender process knizhnik 127.0.0.1(56288) idle in transaction waiting
>
> knizhnik@knizhnik:~$ gdb -p 1556
> (gdb) bt
> #0  0x7f7b991569b3 in __epoll_wait_nocancel () at
> ../sysdeps/unix/syscall-template.S:84
> #1  0x0087185a in WaitEventSetWaitBlock (set=0x246d1e8,
> cur_timeout=-1, occurred_events=0x73a43580, nevents=1) at latch.c:1048
> #2  0x00871716 in WaitEventSetWait (set=0x246d1e8, timeout=-1,
> occurred_events=0x73a43580, nevents=1, wait_event_info=50331652) at
> latch.c:1000
> #3  0x00870e6c in WaitLatchOrSocket (latch=0x7f7b975daba4,
> wakeEvents=1, sock=-1, timeout=-1, wait_event_info=50331652) at latch.c:385
> #4  0x00870d3e in WaitLatch (latch=0x7f7b975daba4, wakeEvents=1,
> timeout=0, wait_event_info=50331652) at latch.c:339
> #5  0x0088995c in ProcSleep (locallock=0x24764d0,
> lockMethodTable=0xc17420 ) at proc.c:1255
> #6  0x008828cb in WaitOnLock (locallock=0x24764d0, owner=0x2517c78)
> at lock.c:1702
> #7  0x0088157e in LockAcquireExtended (locktag=0x73a439a0,
> lockmode=5, sessionLock=0 '\000', dontWait=0 '\000', reportMemoryError=1
> '\001') at lock.c:998
> #8  0x00880ba8 in LockAcquire (locktag=0x73a439a0, lockmode=5,
> sessionLock=0 '\000', dontWait=0 '\000') at lock.c:688
> #9  0x0087f8dc in XactLockTableWait (xid=624, rel=0x0, ctid=0x0,
> oper=XLTW_None) at lmgr.c:587
> #10 0x0082f524 in SnapBuildWaitSnapshot (running=0x246d1b0,
> cutoff=632) at snapbuild.c:1400
> #11 0x0082f2f6 in SnapBuildFindSnapshot (builder=0x254b550,
> lsn=24410008, running=0x246d1b0) at snapbuild.c:1311
> #12 0x0082ee67 in SnapBuildProcessRunningXacts (builder=0x254b550,
> lsn=24410008, running=0x246d1b0) at 

[HACKERS] Fix dumping pre-10 DBs by pg_dump10 if table "name" exists

2017-10-31 Thread Vitaly Burovoy
Hello, hackers!

Recently my colleagues found a bug.
They could not migrate from PG9.5 to PG10 due to an error during
pg_upgrade (the same as in the "reproduce" part below).
An investigation showed there is a table "name" in the same schema
where the dumped sequence is located, and PG tries to unpack the
literal "bigint" as a composite type and fails.

The bug was introduced by two commits: one of them changes
search_path, the second one introduces the "'bigint'::name"
without specifying a schema:
da4d1c0c15ab9afdfeee8bad9a1a9989b6bd59b5 pg_dump: Fix some schema
issues when dumping sequences
2ea5b06c7a7056dca0af1610aadebe608fbcca08 Add CREATE SEQUENCE AS  clause

Steps to reproduce:
$ psql -h pre-10 postgres
Password for user postgres:
psql (11devel, server 9.5.8)
Type "help" for help.

postgres=# create table name(id serial);
CREATE TABLE
postgres=# \q

$ pg_dump -h pre-10 postgres
pg_dump: [archiver (db)] query failed: ERROR:  malformed record
literal: "bigint"
LINE 1: SELECT 'bigint'::name AS sequence_type, start_value, increme...
   ^
DETAIL:  Missing left parenthesis.
pg_dump: [archiver (db)] query was: SELECT 'bigint'::name AS
sequence_type, start_value, increment_by, max_value, min_value,
cache_value, is_cycled FROM name_id_seq


I've implemented a little fix (attached); I don't think anything needs
to be added to the docs or tests.

-- 
Best regards,
Vitaly Burovoy


0001-Fix-dumping-schema-if-a-table-named-name-exists.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-10-31 Thread Stephen Frost
Simon,

* Simon Riggs (si...@2ndquadrant.com) wrote:
> On 30 October 2017 at 19:55, Stephen Frost  wrote:
> > I don't think MERGE should be radically different from other database
> > systems and just syntax sugar over a capability we have.
> 
> I've proposed a SQL Standard compliant implementation that would do
> much more than be new syntax over what we already have.
> 
> So these two claims aren't accurate: "radical difference" and "syntax
> sugar over a capability we have".

Based on the discussion so far, those are the conclusions I've come to.
Saying they aren't accurate without providing anything further isn't
likely to be helpful.

> > Time changes
> > many things, but I don't think anything's changed in this from the prior
> > discussions about it.
> 
> My proposal is new, that is what has changed.

I'm happy to admit that I've missed something in the discussion, but
from what I read the difference appeared to be primairly that you're
proposing it, and doing so a couple years later.

> At this stage, general opinions can be misleading.

I'd certainly love to see a MERGE capability that meets the standard and
works in much the same way from a user's perspective as the other RDBMS'
which already implement it.  From prior discussions with Peter on
exactly that subject, I'm also convinced that having that would be a
largely independent piece of work from the INSERT .. ON CONFLICT
capability that we have (just as MERGE is distinct from similar UPSERT
capabilities in other RDBMSs).

The goal here is really just to avoid time wasted developing MERGE based
on top of the INSERT .. ON CONFLICT system only to have it be rejected
later because multiple other committers and major contributors have said
that they don't agree with that approach.  If, given all of this
discussion, you don't feel that's a high risk with your approach then by
all means continue on with what you're thinking and we can review the
patch once it's posted.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] MERGE SQL Statement for PG11

2017-10-31 Thread Simon Riggs
On 30 October 2017 at 19:55, Stephen Frost  wrote:
> * Robert Haas (robertmh...@gmail.com) wrote:
>> On Sun, Oct 29, 2017 at 1:19 AM, Simon Riggs  wrote:
>> > Nothing I am proposing blocks later work.
>>
>> That's not really true.  Nobody's going to be happy if MERGE has one
>> behavior in one set of cases and an astonishingly different behavior
>> in another set of cases.  If you adopt a behavior for certain cases
>> that can't be extended to other cases, then you're blocking a
>> general-purpose MERGE.
>>
>> And, indeed, it seems that you're proposing an implementation that
>> adds no new functionality, just syntax compatibility.  Do we really
>> want or need two syntaxes  for the same thing in core?  I kinda think
>> Peter might have the right idea here.  Under his proposal, we'd be
>> getting something that is, in a way, new.
>
> +1.
>
> I don't think MERGE should be radically different from other database
> systems and just syntax sugar over a capability we have.

I've proposed a SQL Standard compliant implementation that would do
much more than be new syntax over what we already have.

So these two claims aren't accurate: "radical difference" and "syntax
sugar over a capability we have".

> Time changes
> many things, but I don't think anything's changed in this from the prior
> discussions about it.

My proposal is new, that is what has changed.

At this stage, general opinions can be misleading.

Hi ho, hi ho.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Comment typo

2017-10-31 Thread Etsuro Fujita

On 2017/10/30 22:39, Magnus Hagander wrote:
On Mon, Oct 30, 2017 at 3:49 AM, Etsuro Fujita 
> wrote:


Here is a patch to fix a typo in a comment in partition.c:
s/specificiation/specification/.


Applied, thanks.


Thank you!

Best regards,
Etsuro Fujita



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-10-31 Thread Geoff Winkless
Can I add my 2c worth, as someone without a horse in the race, as it
were, in the hope that telling me how I've got this wrong might
clarify the argument a bit (or at least you can all start shouting at
me rather than each other :) )

The point of merge is to allow you to choose to either INSERT or
UPDATE (or indeed DELETE) records based on existing state, yes? That
state is whatever the state of the system is at the start of the
transaction?

If I understand correctly, the only time when this would be
problematic is if you try to insert a record into a table which would
not allow that INSERT because another transaction has performed an
INSERT by the time the COMMIT happens, and where that new record would
have changed the state of the MERGE clause, yes?

Isn't the only reason this would fail if there is a unique constraint
on that table?

Yes, you could end up INSERTing values from the merge when another
transaction has INSERTed another, but (again, unless I've
misunderstood) there's nothing in the spec that says that shouldn't
happen; meanwhile for those tables that do require unique values you
can use the UPSERT mechanism, no?

Geoff


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Query regarding permission on table_column%type access

2017-10-31 Thread Neha Sharma
Hi,

I have observed that even if the user does not have permission on a
table (created by some other user), a function can still have a
parameter of that table_column%type.

Scenario:
postgres=# create user u1 with login ;
CREATE ROLE
postgres=# create user u2 with login ;
CREATE ROLE
postgres=# \c - u1
You are now connected to database "postgres" as user "u1".
postgres=> create table t1(a int);
CREATE TABLE
postgres=> revoke ALL on t1 from u2;
REVOKE
postgres=> \c  - u2
You are now connected to database "postgres" as user "u2".
postgres=> create table t2(a int);
CREATE TABLE

postgres=> create or replace function foo(x t1.a%type) returns int as $$
BEGIN
return x + 1;
END;
$$ LANGUAGE plpgsql;
NOTICE:  type reference t1.a%TYPE converted to integer
CREATE FUNCTION
postgres=> select foo(1);
 foo
-
   2
(1 row)
postgres=> select * from t1;
ERROR:  permission denied for relation t1


Is this expected behaviour? What if the user does not want the object
type to be accessed by others?

Thanks.
--
Regards,
Neha Sharma


Re: [HACKERS] parallelize queries containing initplans

2017-10-31 Thread tushar

On 10/30/2017 01:36 PM, tushar wrote:

On 10/30/2017 09:02 AM, Amit Kapila wrote:

Thanks a lot Tushar for testing this patch.  In the latest patch, I
have just rebased some comments, there is no code change, so I don't
expect any change in behavior, but feel free to test it once again.


Thanks Amit. Sure.


I have done re-verification. Everything looks good.

--
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [bug fix] postgres.exe crashes with access violation on Windows while starting up

2017-10-31 Thread Michael Paquier
On Tue, Oct 31, 2017 at 6:59 AM, Tsunakawa, Takayuki
 wrote:
> From: pgsql-hackers-ow...@postgresql.org
>> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Michael Paquier
>> So you are basically ready to lose any message that could be pushed
>> here if there is no memory context? That does not sound like a good
>> trade-off to me. A static buffer does not look like the best idea
>> either to not truncate message, so couldn't we envisage to just use
>> malloc? pgwin32_message_to_UTF16() is called in two places in elog.c,
>> and there is a full control on the error code paths.
>
> Thank you for reviewing a rare bug fix on Windows that most people wouldn't 
> be interested in.

Yeah, it may take a while until a committer gets interested, I am
afraid. See my bug about pg_basebackup on Windows with path names...

> When CurrentMemoryContext is NULL, the message  is logged with 
> ReportEventA().  This is similar to write_console().

My point is that as Postgres is running as a service, isn't it wrong
to write a message to stderr as a fallback if the memory context is
not set? You would lose a message. It seems to me that for an
operation that can happen at a low-level like the postmaster startup,
we should really use a low-level operation as well.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] More stats about skipped vacuums

2017-10-31 Thread Kyotaro HORIGUCHI
This is just a repost as a (true) new thread.

At Mon, 30 Oct 2017 20:57:50 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20171030.205750.246076862.horiguchi.kyot...@lab.ntt.co.jp>
> At Fri, 20 Oct 2017 19:15:16 +0900, Masahiko Sawada  
> wrote in 

Re: [HACKERS] show "aggressive" or not in autovacuum logs

2017-10-31 Thread Kyotaro HORIGUCHI
At Thu, 26 Oct 2017 12:42:23 +0200, Robert Haas  wrote 
in 
> On Thu, Oct 26, 2017 at 10:18 AM, Kyotaro HORIGUCHI
>  wrote:
> > Thank you. I forgot that point. Changed them so that the messages
> > are detected as msgids.
> 
> Committed, changing "aggressive" to "aggressively" in one place for
> correct English.

Thank you for committing!

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Protect syscache from bloating with negative cache entries

2017-10-31 Thread Kyotaro HORIGUCHI
This is a rebased version of the patch.

At Fri, 17 Mar 2017 14:23:13 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20170317.142313.232290068.horiguchi.kyot...@lab.ntt.co.jp>
> At Tue, 7 Mar 2017 19:23:14 -0800, David Steele  wrote 
> in <3b7b7f90-db46-8c37-c4f7-443330c3a...@pgmasters.net>
> > On 3/3/17 4:54 PM, David Steele wrote:
> > 
> > > On 2/1/17 1:25 AM, Kyotaro HORIGUCHI wrote:
> > >> Hello, thank you for moving this to the next CF.
> > >>
> > >> At Wed, 1 Feb 2017 13:09:51 +0900, Michael Paquier
> > >>  wrote in
> > >> 
> > >>> On Tue, Jan 24, 2017 at 4:58 PM, Kyotaro HORIGUCHI
> > >>>  wrote:
> >  Six new syscaches in 665d1fa was conflicted and 3-way merge
> >  worked correctly. The new syscaches don't seem to be targets of
> >  this patch.
> > >>> To be honest, I am not completely sure what to think about this patch.
> > >>> Moved to next CF as there is a new version, and no new reviews to make
> > >>> the discussion perhaps move on.
> > >> I'm thinking the following is the status of this topic.
> > >>
> > >> - The patch stll is not getting conflicted.
> > >>
> > >> - This is not a hollistic measure for memory leak but surely
> > >>saves some existing cases.
> > >>
> > >> - Shared catcache is another discussion (and won't really
> > >>proposed in a short time due to the issue on locking.)
> > >>
> > >> - As I mentioned, a patch that caps the number of negative
> > >>entries is avaiable (in first-created - first-delete manner)
> > >>but it is having a loose end of how to determine the
> > >>limitation.
> > > While preventing bloat in the syscache is a worthwhile goal, it
> > > appears
> > > there are a number of loose ends here and a new patch has not been
> > > provided.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 9f2c81dbc9bc344cafd6995dfc5969d55a8457d9 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Mon, 28 Aug 2017 11:36:21 +0900
Subject: [PATCH 1/2] Cleanup negative cache of pg_statistic when dropping a
 relation.

Accessing columns that don't have statistics leaves negative entries
in the catcache for pg_statistic, but there's no chance to remove
them. In particular, repeatedly creating and then dropping temporary
tables bloats the catcache so much that memory pressure becomes
significant. This patch removes negative entries in STATRELATTINH,
ATTNAME and ATTNUM when the corresponding relation is dropped.
---
 src/backend/utils/cache/catcache.c |  58 ++-
 src/backend/utils/cache/syscache.c | 302 +++--
 src/include/utils/catcache.h   |   3 +
 src/include/utils/syscache.h   |   2 +
 4 files changed, 282 insertions(+), 83 deletions(-)

diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index 95a0742..bd303f3 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -423,10 +423,11 @@ CatCachePrintStats(int code, Datum arg)
 
 		if (cache->cc_ntup == 0 && cache->cc_searches == 0)
 			continue;			/* don't print unused caches */
-		elog(DEBUG2, "catcache %s/%u: %d tup, %ld srch, %ld+%ld=%ld hits, %ld+%ld=%ld loads, %ld invals, %ld lsrch, %ld lhits",
+		elog(DEBUG2, "catcache %s/%u: %d tup, %d negtup, %ld srch, %ld+%ld=%ld hits, %ld+%ld=%ld loads, %ld invals, %ld lsrch, %ld lhits",
 			 cache->cc_relname,
 			 cache->cc_indexoid,
 			 cache->cc_ntup,
+			 cache->cc_nnegtup,
 			 cache->cc_searches,
 			 cache->cc_hits,
 			 cache->cc_neg_hits,
@@ -495,8 +496,11 @@ CatCacheRemoveCTup(CatCache *cache, CatCTup *ct)
 	 * point into tuple, allocated together with the CatCTup.
 	 */
 	if (ct->negative)
+	{
 		CatCacheFreeKeys(cache->cc_tupdesc, cache->cc_nkeys,
 		 cache->cc_keyno, ct->keys);
+		--cache->cc_nnegtup;
+	}
 
 	pfree(ct);
 
@@ -697,6 +701,51 @@ ResetCatalogCache(CatCache *cache)
 }
 
 /*
+ *		CleanupCatCacheNegEntries
+ *
+ *	Remove negative cache tuples matching a partial key.
+ *
+ */
+void
+CleanupCatCacheNegEntries(CatCache *cache, ScanKeyData *skey)
+{
+	int i;
+
+	/* If this cache has no negative entries, nothing to do */
+	if (cache->cc_nnegtup == 0)
+		return;
+
+	/* searching with a partial key means scanning the whole cache */
+	for (i = 0; i < cache->cc_nbuckets; i++)
+	{
+		dlist_head *bucket = &cache->cc_bucket[i];
+		dlist_mutable_iter iter;
+
+		dlist_foreach_modify(iter, bucket)
+		{
+			const CCFastEqualFN *cc_fastequal = cache->cc_fastequal;
+			CatCTup*ct = dlist_container(CatCTup, cache_elem, iter.cur);
+			int			oid_attnum = skey->sk_attno - 1;
+
+			if (!ct->negative)
+continue;
+
+			/* Compare the OIDs */
+			if (!(cc_fastequal[oid_attnum])(ct->keys[oid_attnum],
+			skey[0].sk_argument))
+continue;
+
+			/*
+			 * the negative cache entries can no longer be 

Re: [HACKERS] Restricting maximum keep segments by repslots

2017-10-31 Thread Kyotaro HORIGUCHI
Hello, this is a rebased version.

Along with the rebase, the meaning of the monitoring values has
changed.

In the previous version, the "live" column mysteriously predicted
whether the necessary segments would be kept or lost by the next
checkpoint, and the "distance" column offered a still more mysterious
value.

In this version the meaning of the two columns is clear and
informative.

pg_replication_slots
  - live:
    true if the slot has not lost any necessary segments.

  - distance:
    how many bytes the LSN can advance before the margin defined by
    max_slot_wal_keep_size (and wal_keep_segments) is exhausted,
    or how many bytes of xlog this slot has already lost from
    restart_lsn.

There is a case where live = t and distance = 0: the slot currently
has all the necessary segments but will start to lose them within at
most two checkpoint passes.
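
For illustration, a monitoring query against the patched view might look
like this (sketch only: the live and distance columns are what this patch
adds and do not exist in stock PostgreSQL, and distance is assumed to be a
byte count):

SELECT slot_name,
       restart_lsn,
       live,                               -- false once required WAL is lost
       pg_size_pretty(distance) AS margin  -- bytes of advance left before loss
FROM pg_replication_slots
ORDER BY distance;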

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 57eaa2b878d30bfcebb093cca0e772fe7a9bff0e Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Tue, 28 Feb 2017 11:39:48 +0900
Subject: [PATCH 1/2] Add WAL relief vent for replication slots

Adds a capability to limit the number of segments kept by replication
slots by a GUC variable.
---
 src/backend/access/transam/xlog.c | 24 
 src/backend/utils/misc/guc.c  | 11 +++
 src/backend/utils/misc/postgresql.conf.sample |  1 +
 src/include/access/xlog.h |  1 +
 4 files changed, 37 insertions(+)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index dd028a1..f79cefb 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -105,6 +105,7 @@ int			wal_level = WAL_LEVEL_MINIMAL;
 int			CommitDelay = 0;	/* precommit delay in microseconds */
 int			CommitSiblings = 5; /* # concurrent xacts needed to sleep */
 int			wal_retrieve_retry_interval = 5000;
+int			max_slot_wal_keep_size_mb = 0;
 
 #ifdef WAL_DEBUG
 bool		XLOG_DEBUG = false;
@@ -9432,9 +9433,32 @@ KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo)
 	if (max_replication_slots > 0 && keep != InvalidXLogRecPtr)
 	{
 		XLogSegNo	slotSegNo;
+		int			slotlimitsegs = ConvertToXSegs(max_slot_wal_keep_size_mb);
 
 		XLByteToSeg(keep, slotSegNo, wal_segment_size);
 
+		/*
+		 * ignore slots if too many wal segments are kept.
+		 * max_slot_wal_keep_size is just accumulated on wal_keep_segments.
+		 */
+		if (max_slot_wal_keep_size_mb > 0 && slotSegNo + slotlimitsegs < segno)
+		{
+			segno = segno - slotlimitsegs; /* must be positive */
+
+			/*
+			 * warn only if the checkpoint flushes the required segment.
+			 * we assume here that *logSegNo is the calculated keep location.
+			 */
+			if (slotSegNo < *logSegNo)
+ereport(WARNING,
+	(errmsg ("restart LSN of replication slots is ignored by checkpoint"),
+	 errdetail("Some replication slots have lost required WAL segments to continue by up to %ld segments.",
+	   (segno < *logSegNo ? segno : *logSegNo) - slotSegNo)));
+
+			/* emergency vent */
+			slotSegNo = segno;
+		}
+
 		if (slotSegNo <= 0)
 			segno = 1;
 		else if (slotSegNo < segno)
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 65372d7..511023a 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -2368,6 +2368,17 @@ static struct config_int ConfigureNamesInt[] =
 	},
 
 	{
+		{"max_slot_wal_keep_size", PGC_SIGHUP, REPLICATION_SENDING,
+			gettext_noop("Sets the maximum size of extra WALs kept by replication slots."),
+		 NULL,
+		 GUC_UNIT_MB
+		},
+		&max_slot_wal_keep_size_mb,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
 		{"wal_sender_timeout", PGC_SIGHUP, REPLICATION_SENDING,
 			gettext_noop("Sets the maximum time to wait for WAL replication."),
 			NULL,
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 368b280..e76c73a 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -234,6 +234,7 @@
 #max_wal_senders = 10		# max number of walsender processes
 # (change requires restart)
 #wal_keep_segments = 0		# in logfile segments; 0 disables
+#max_slot_wal_keep_size = 0	# measured in bytes; 0 disables
 #wal_sender_timeout = 60s	# in milliseconds; 0 disables
 
 #max_replication_slots = 10	# max number of replication slots
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 0f2b8bd..f0c0255 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -98,6 +98,7 @@ extern int	wal_segment_size;
 extern int	min_wal_size_mb;
 extern int	max_wal_size_mb;
 extern int	wal_keep_segments;
+extern int	max_slot_wal_keep_size_mb;
 extern int	XLOGbuffers;
 extern int	XLogArchiveTimeout;
 extern int	wal_retrieve_retry_interval;
-- 
2.9.2

>From 37749d46ba97de38e4593f141bb8a82c67fc0af5 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Thu, 

Re: [HACKERS] EXPLAIN (ANALYZE, BUFFERS) reports bogus temporary buffer reads

2017-10-31 Thread Robert Haas
On Tue, Oct 17, 2017 at 2:29 AM, Thomas Munro
 wrote:
> Vik Fearing asked off-list why hash joins appear to read slightly more
> temporary data than they write.  The reason is that we notch up a
> phantom block read when we hit the end of each file.  Harmless but it
> looks a bit weird and it's easily fixed.
>
> Unpatched, a 16 batch hash join reports that we read 30 more blocks
> than we wrote (2 per batch after the first, as expected):
>
>Buffers: shared hit=434 read=16234, temp read=5532 written=5502
>
> With the attached patch:
>
>Buffers: shared hit=547 read=16121, temp read=5502 written=5502
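
For anyone who wants to see these counters locally, a rough reproduction
sketch (the table name and sizes are made up; lowering work_mem simply
forces a multi-batch hash join so that temp buffers are used at all):

SET work_mem = '1MB';            -- force the hash join to spill into batches
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM big t1
JOIN big t2 USING (id);          -- "temp read/written" show up in the output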

Committed.  Arguably we ought to back-patch this, but it's minor so I didn't.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] MERGE SQL Statement for PG11

2017-10-31 Thread Simon Riggs
On 30 October 2017 at 19:17, Peter Geoghegan  wrote:
> On Mon, Oct 30, 2017 at 11:07 AM, Simon Riggs  wrote:
>> Please explain in detail the MERGE SQL statements that you think will
>> be problematic and why.
>
> Your proposal is totally incomplete, so I can only surmise its
> behavior in certain cases, to make a guess at what the problems might
> be (see my remarks on EPQ, live lock, etc). This makes it impossible
> to do what you ask right now.

Impossible, huh. Henry Ford was right.

If there are challenges ahead, it's reasonable to ask for test cases
now, especially if you think you already know what they are.
Imagine we go forwards 2 months - if you dislike my patch when it
exists you will submit a test case showing the fault. Why not save us
all the trouble and describe that now? Test Driven Development.
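
For what it's worth, the kind of test case being asked for could be as
simple as two sessions concurrently running a standard-SQL MERGE over
overlapping keys (sketch only: the table names are invented and PostgreSQL
does not accept MERGE yet, so this is merely the shape of such a test):

MERGE INTO target t
USING source s ON t.id = s.id
WHEN MATCHED THEN
    UPDATE SET val = s.val
WHEN NOT MATCHED THEN
    INSERT (id, val) VALUES (s.id, s.val);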

> Besides, you haven't answered the question from my last e-mail
> ("What's wrong with that [set of MERGE semantics]?"), so why should I
> go to further trouble? You're just not constructively engaging with me
> at this point.

It's difficult to discuss anything with someone that refuses to
believe that there are acceptable ways around things. I believe there
are.

If you can calm down the rhetoric we can work together, but if you
continue to grandstand it makes it more difficult.


> We're going around in circles.

Not really. You've said some things and I'm waiting for further
details about the problems you've raised.

You've said it's possible another way. Show that assertion is actually
true. We're all listening, me especially, for the technical details.

I've got a fair amount of work to do to get the rest of the patch in
shape, so take your time and make it a complete explanation.

My only goal is the MERGE feature in PG11. For me this is a
collaborative engineering challenge, not a debate and definitely not an
argument. If you describe your plan for how to do this, I may be able
to follow it and include that design. If you don't, then it will be
difficult for me to include your thoughts. If you or others wish to
write something as well, I have already said that is OK too.

If anybody's goal is to block development or to wait for perfection to
exist at some unstated time in the future, then I disagree with those
thoughts.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-10-31 Thread Robert Haas
On Mon, Oct 16, 2017 at 3:02 PM, Masahiko Sawada  wrote:
> I guess that is the patch I proposed. However, I think that there is still
> room for discussion because the patch cannot skip the cleanup vacuum
> during an aggressive vacuum, which is one of the situations that I really
> wanted to skip.

Well, I think there are outstanding concerns that the patch in
question is not safe, and I don't see that anything has been done to
resolve them.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] ERROR: MultiXactId 3268957 has not been created yet -- apparent wraparound after missused pg_resetxlogs

2017-10-31 Thread Robert Haas
On Mon, Oct 16, 2017 at 5:41 PM, alain radix  wrote:
> I’m facing a problem with a PostgreSQL 9.6.2 reporting this error when
> selecting a table
>
> ERROR:  MultiXactId 3268957 has not been created yet -- apparent wraparound
>
> The root cause is not a Postgres bug but a buggy person that used
> pg_resetxlogs obviously without reading or understanding the documentation.

Hmm, I hope you patched that person...

> I’m trying to extract data from the db to create a new one ;-)
>
> But pg_dump logically ends with the same error.
>
> I’ve seen the row with t_xmax = 3268957 using pageinspect; I might use it to
> extract the row from the page, but this will not be quite easy.

Not sure if this would work, but maybe try BEGIN ... UPDATE the row,
perhaps via TID qual ... ROLLBACK?
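
A bare-bones sketch of that idea (the table name, column and ctid are
invented; whether this actually clears the bogus MultiXactId is unverified,
so it should be tried on a copy of the cluster first):

BEGIN;
UPDATE damaged_table
   SET some_col = some_col          -- no-op update of the damaged row
 WHERE ctid = '(123,45)';           -- TID of the row carrying the bad xmax
ROLLBACK;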

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept

2017-10-31 Thread Alexander Korotkov
On Tue, Oct 31, 2017 at 5:16 AM, Masahiko Sawada 
wrote:

> On Mon, Oct 30, 2017 at 10:16 PM, Robert Haas 
> wrote:
> > On Tue, Oct 24, 2017 at 1:26 PM, Ivan Kartyshov
> >  wrote:
> >> Hello. I made some bugfixes and rewrite the patch.
> >
> > I don't think it's a good idea to deliberately leave the state of the
> > standby different from the state of the  master on the theory that it
> > won't matter.  I feel like that's something that's likely to come back
> > to bite us.
>
> I agree with Robert. What happen if we intentionally don't apply the
> truncation WAL and switched over? If we insert a tuple on the new
> master server to a block that has been truncated on the old master,
> the WAL apply on the new standby will fail? I guess there are such
> corner cases causing failures of WAL replay after switch-over.
>

Yes, that looks dangerous.  One approach to cope with that could be teaching
the heap redo function to handle such situations.  But I expect this
approach to be criticized as bad design, and I understand the fairness of
that criticism.

However, from the user's perspective, the current behavior of
hot_standby_feedback is just broken, because it both increases bloat and
doesn't guarantee that read-only queries on the standby won't be cancelled
because of vacuum.  Therefore, we should be looking for a solution: if one
approach isn't good enough, then we should look for another.

I can propose the following alternative approach: teach read-only queries on
hot standby to tolerate concurrent relation truncation.  Then, when a
non-existent heap page is accessed on hot standby, we know that it was
deleted by a concurrent truncation and should be assumed to be empty.  Any
thoughts?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-10-31 Thread Robert Haas
On Mon, Oct 16, 2017 at 5:03 PM, Ashutosh Bapat
 wrote:
> set_append_rel_size() crashes when it encounters a partitioned table
> with a dropped column. Dropped columns do not have any translations
> saved in AppendRelInfo::translated_vars; the corresponding entry is NULL
> per make_inh_translation_list().
> 1802 att = TupleDescAttr(old_tupdesc, old_attno);
> 1803 if (att->attisdropped)
> 1804 {
> 1805 /* Just put NULL into this list entry */
> 1806 vars = lappend(vars, NULL);
> 1807 continue;
> 1808 }
>
> In set_append_rel_size() we try to compute attr_needed for child tables.
> While doing so we try to translate a user attribute number of the parent to
> that of a child, and crash since the translated Var is NULL. Here's a patch
> to fix the crash. The patch also contains a testcase for dropped columns in
> a partitioned table.
>
> Sorry for not noticing this problem earlier.
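
For context, the general shape of the scenario is roughly the following
(names are invented; this is not the committed regression test and is not
claimed to reproduce the crash on its own):

CREATE TABLE parted (a int, dropme int, b text) PARTITION BY LIST (a);
CREATE TABLE parted_p1 PARTITION OF parted FOR VALUES IN (1);
ALTER TABLE parted DROP COLUMN dropme;  -- leaves a NULL slot in translated_vars
SELECT a, b FROM parted WHERE a = 1;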

OK, committed.  This is a good example of how having good code
coverage doesn't necessarily mean you've found all the bugs.  :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Flexible configuration for full-text search

2017-10-31 Thread Emre Hasegeli
> I'm mostly happy with the mentioned modifications, but I have a few questions
> to clarify some points. I will send a new patch in a week or two.

I am glad you liked it.  Though, I think we should get approval from
more senior community members or committers about the syntax before
we put more effort into the code.

> But configuration:
>
> CASE english_noun WHEN MATCH THEN english_hunspell ELSE simple END
>
> is not (as I understand ELSE can be used only with KEEP).
>
> I think we should decide whether to allow or disallow the use of different
> dictionaries for match checking (between CASE and WHEN) and for the result
> (after THEN). If the answer is 'allow', maybe we should allow the
> third example too, for consistency in configurations.

I think you are right.  We better allow this too.  Then the CASE syntax becomes:

CASE config
    WHEN [ NO ] MATCH THEN { KEEP | config }
    [ ELSE config ]
END
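
To make that concrete, a mapping written with this syntax might look
roughly like the following (this is only the syntax proposed in this
thread, nothing a released PostgreSQL accepts, and the configuration and
dictionary names are assumptions):

ALTER TEXT SEARCH CONFIGURATION my_english
    ALTER MAPPING FOR asciiword WITH
        CASE english_hunspell
            WHEN MATCH THEN KEEP
            ELSE simple
        END;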

> Based on formal definition it is possible to describe this example in
> following manner:
> CASE english_noun WHEN MATCH THEN english_hunspell END
>
> The question is same as in the previous example.

I couldn't understand the question.

> Currently, stopwords increment position, for example:
> SELECT to_tsvector('english','a test message');
> -
>  'messag':3 'test':2
>
> A stopword 'a' has a position 1 but it is not in the vector.

Does this problem only apply to stopwords and the whole thing we are
inventing?  Shouldn't we preserve the positions through the pipeline?

> If we want to keep this behavior, we should somehow pass a stopword to
> the tsvector composition function (parsetext in ts_parse.c) to increment
> the counter, or increment it in another way. Currently, an empty lexemes
> array is passed as a result of LexizeExec.
>
> One possible way to do so is something like:
> CASE polish_stopword
> WHEN MATCH THEN KEEP -- stopword counting
> ELSE polish_isspell
> END

This would mean keeping the stopwords.  What we want is

CASE polish_stopword        -- stopword counting
    WHEN NO MATCH THEN polish_isspell
END

Do you think it is possible?




Re: [HACKERS] Re: proposal - psql: possibility to specify sort for describe commands, when size is printed

2017-10-31 Thread Alexander Korotkov
On Sun, Oct 29, 2017 at 12:47 AM, Pavel Stehule 
wrote:

> 2017-10-28 23:35 GMT+02:00 Alexander Korotkov :
>
>> On Sat, Oct 28, 2017 at 3:46 PM, Pavel Stehule 
>> wrote:
>>
>>> 2017-09-22 21:31 GMT+02:00 Pavel Stehule :
>>>


 2017-09-22 21:12 GMT+02:00 Peter Eisentraut <
 peter.eisentr...@2ndquadrant.com>:

> On 9/22/17 09:16, Pavel Stehule wrote:
> > Example: somebody set SORT_COLUMNS to schema_name value. This is
> > nonsense for \l command
> >
> > Now, I am thinking that a more correct and practical design is based on
> > a special mode, activated by a variable:
> >
> > PREFER_SIZE_SORT .. (off, asc, desc)
> >
> > This makes sense for a wide group of commands that can show size. And
> > when size is not visible, this option is not active.
>
> Maybe this shouldn't be a variable at all.  It's not like you'll set
> this as a global preference.  You probably want it for one command
> only.
>  So a per-command option might make more sense.
>

 Sure, I cannot know what users will do. But when I need to see the
 size of objects, I prefer sorting by size descending every time. If I need
 to find some object, I can use search in the pager. So in my case,
 this setting will be in psqlrc. In GoodData we used our own customization
 for years - the order by size was hardcoded and nobody reported any
 issue to me.

 Alexander proposed a per-command option, but the current syntax of psql
 commands doesn't allow simple parametrization. If it is to be user
 friendly, it should be short. From the implementation perspective, it
 should be simple to parse. It should be intuitive too - too many symbols
 together is not a good idea.

 Maybe some prefix design - but that is not a design for common people
 (although those people don't usually use psql)

 '\sort size \dt ?

 \dt:sort_by_size
 \dt+:sort_by_size ?

 I don't see any good design in this direction


>>> I thought about Alexander's proposal, and I think it is probably best
>>> if we respect the psql design. I implemented two command suffixes
>>> (supported only where it makes sense): "s" sorted by size and "d" for descending
>>>
>>> so list of tables can be sorted with commands:
>>>
>>> \dt+sd (in this case, the order is not strict), so command
>>> \dtsd+ is working too (same \disd+ or \di+sd)
>>>
>>> These two chars are acceptable. Same principle is used for \l command
>>>
>>> \lsd+ or \l+sd
>>>
>>> What do you think about it?
>>>
>>
>> I think \lsd+ command would be another postgres meme :)
>> BTW, are you going to provide an ability to sort by name, schema?
>>
>
> It makes sense only for tables - probably only \dtn, "n" as in name
>

In general, this approach looks good to me.
Regarding the current state of the patch, I'd like to see the new options
documented.  Also, it would be better to replace "bool sort_size" with an
enum, assuming there could be other sorting orders in the future.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-31 Thread Michael Paquier
On Tue, Oct 31, 2017 at 4:56 AM, Tom Lane  wrote:
> Alvaro Herrera  writes:
>> Tom Lane wrote:
>>> So: I put the blame on the fact that summarize_range() thinks that
>>> the tuple offset it has for the placeholder tuple is guaranteed to
>>> hold good, even across possibly-long intervals where it's holding
>>> no lock on the containing buffer.
>
>> Yeah, I think this is a pretty reasonable explanation for the problem.
>> I don't understand why it doesn't fail in 9.6.
>
> Yeah, we're still missing an understanding of why we didn't see it
> before; the inadequate locking was surely there before.  I'm guessing
> that somehow the previous behavior of PageIndexDeleteNoCompact managed
> to mask the problem (perhaps only by not throwing an error, which doesn't
> imply that the index state was good afterwards).  But I don't see quite
> how it did that.

Because 24992c6d has added a check on the offset number by using
PageIndexTupleDeleteNoCompact() in brin_doupdate() making checks
tighter, no? I have not tested, and I lack knowledge about the brin
code, but it seems to me that if we had a similar check then things
could likely blow up.
-- 
Michael




Re: [HACKERS] path toward faster partition pruning

2017-10-31 Thread Amit Langote
Thanks for the test case.

On 2017/10/30 17:09, Rajkumar Raghuwanshi wrote:
> I am getting wrong output when the default partition is sub-partitioned
> further; below is a test case.
> 
> CREATE TABLE lpd(a int, b varchar, c float) PARTITION BY LIST (a);
> CREATE TABLE lpd_p1 PARTITION OF lpd FOR VALUES IN (1,2,3);
> CREATE TABLE lpd_p2 PARTITION OF lpd FOR VALUES IN (4,5);
> CREATE TABLE lpd_d PARTITION OF lpd DEFAULT PARTITION BY LIST(a);
> CREATE TABLE lpd_d1 PARTITION OF lpd_d FOR VALUES IN (7,8,9);
> CREATE TABLE lpd_d2 PARTITION OF lpd_d FOR VALUES IN (10,11,12);
> CREATE TABLE lpd_d3 PARTITION OF lpd_d FOR VALUES IN (6,null);
> INSERT INTO lpd SELECT i,i,i FROM generate_Series (1,12)i;
> INSERT INTO lpd VALUES (null,null,null);
> 
> --on HEAD
> postgres=# EXPLAIN (COSTS OFF) SELECT tableoid::regclass, * FROM lpd WHERE
> a IS NOT NULL ORDER BY 1;
>  QUERY PLAN
> -
>  Sort
>Sort Key: ((lpd_p1.tableoid)::regclass)
>->  Result
>  ->  Append
>->  Seq Scan on lpd_p1
>  Filter: (a IS NOT NULL)
>->  Seq Scan on lpd_p2
>  Filter: (a IS NOT NULL)
>->  Seq Scan on lpd_d3
>  Filter: (a IS NOT NULL)
>->  Seq Scan on lpd_d1
>  Filter: (a IS NOT NULL)
>->  Seq Scan on lpd_d2
>  Filter: (a IS NOT NULL)
> (14 rows)
> 
> postgres=#
> postgres=# SELECT tableoid::regclass, * FROM lpd WHERE a IS NOT NULL ORDER
> BY 1;
>  tableoid | a  | b  | c
> --+++
>  lpd_p1   |  1 | 1  |  1
>  lpd_p1   |  2 | 2  |  2
>  lpd_p1   |  3 | 3  |  3
>  lpd_p2   |  4 | 4  |  4
>  lpd_p2   |  5 | 5  |  5
>  lpd_d1   |  7 | 7  |  7
>  lpd_d1   |  8 | 8  |  8
>  lpd_d1   |  9 | 9  |  9
>  lpd_d2   | 12 | 12 | 12
>  lpd_d2   | 10 | 10 | 10
>  lpd_d2   | 11 | 11 | 11
>  lpd_d3   |  6 | 6  |  6
> (12 rows)
> 
> 
> --on HEAD + v8 patches
> 
> postgres=# EXPLAIN (COSTS OFF) SELECT tableoid::regclass, * FROM lpd WHERE
> a IS NOT NULL ORDER BY 1;
>  QUERY PLAN
> -
>  Sort
>Sort Key: ((lpd_p1.tableoid)::regclass)
>->  Result
>  ->  Append
>->  Seq Scan on lpd_p1
>  Filter: (a IS NOT NULL)
>->  Seq Scan on lpd_p2
>  Filter: (a IS NOT NULL)
> (8 rows)
> 
> postgres=# SELECT tableoid::regclass, * FROM lpd WHERE a IS NOT NULL ORDER
> BY 1;
>  tableoid | a | b | c
> --+---+---+---
>  lpd_p1   | 1 | 1 | 1
>  lpd_p1   | 2 | 2 | 2
>  lpd_p1   | 3 | 3 | 3
>  lpd_p2   | 4 | 4 | 4
>  lpd_p2   | 5 | 5 | 5
> (5 rows)

I found bugs in 0003 and 0005 that caused this.  I will post the patches
containing the fix in reply to Dilip's email, which contains some code
review comments [1].

Also, I noticed that the new pruning code was having a hard time dealing
with the fact that the default "range" partition doesn't explicitly say in
its partition constraint that it might contain null values.  More
precisely perhaps, the default range partition's constraint appears to
imply that it can only contain non-null values, which confuses the new
pruning code.
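
For reference, one way to look at the constraint in question is to create a
default range partition on a build with this feature and check what psql
reports (names are invented; the exact constraint text depends on the
server build):

CREATE TABLE rp (a int) PARTITION BY RANGE (a);
CREATE TABLE rp_1   PARTITION OF rp FOR VALUES FROM (1) TO (10);
CREATE TABLE rp_def PARTITION OF rp DEFAULT;
\d+ rp_def   -- psql shows the implicit "Partition constraint:" line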

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/CAFiTN-thYsobXxPS6bwOA_9erpax_S=iztsn3rtuxkkmkg4...@mail.gmail.com





[HACKERS] Anyone have experience benchmarking very high effective_io_concurrency on NVME's?

2017-10-31 Thread Chris Travers
Hi;

After Andres's excellent talk at PGConf we tried benchmarking
effective_io_concurrency on some of our servers and found that those which
have a number of NVME storage volumes could not fill the I/O queue even at
the maximum setting (1000).

Before we start benchmarking patched versions of PostgreSQL to bump this
setting up to ever higher values, I am wondering if anyone has done this
yet and if so if you would be willing to share results.
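
For anyone repeating the exercise, the knobs involved are roughly these
(sketch only; the tablespace name is invented, and 1000 is the current hard
upper limit of the GUC, which is exactly the limitation in question):

-- per session, affects prefetching for bitmap heap scans
SET effective_io_concurrency = 1000;

-- or persistently for a tablespace that lives on the NVMe volumes
ALTER TABLESPACE fast_nvme SET (effective_io_concurrency = 1000);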

-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: [HACKERS] [bug fix] postgres.exe crashes with access violation on Windows while starting up

2017-10-31 Thread Tsunakawa, Takayuki
From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Michael Paquier
> So you are basically ready to lose any message that could be pushed
> here if there is no memory context? That does not sound like a good
> trade-off to me. A static buffer does not look like the best idea
> either to not truncate message, so couldn't we envisage to just use
> malloc? pgwin32_message_to_UTF16() is called in two places in elog.c,
> and there is a full control on the error code paths.

Thank you for reviewing a rare bug fix on Windows that most people wouldn't be 
interested in. When CurrentMemoryContext is NULL, the message  is logged with 
ReportEventA().  This is similar to write_console().

Regards
Takayuki Tsunakawa


