Re: [HACKERS] more on comments of snapbuild.c

2017-03-17 Thread Erik Rijkers

On 2017-03-18 06:37, Erik Rijkers wrote:
Studying logrep yielded some more improvements to the comments in 
snapbuild.c


(to be applied to master)


Attached is the actual file.


thanks,

Erik Rijkers

--- src/backend/replication/logical/snapbuild.c.orig	2017-03-18 05:02:28.627077888 +0100
+++ src/backend/replication/logical/snapbuild.c	2017-03-18 06:04:48.091686815 +0100
@@ -27,7 +27,7 @@
  * removed. This is achieved by using the replication slot mechanism.
  *
  * As the percentage of transactions modifying the catalog normally is fairly
- * small in comparisons to ones only manipulating user data, we keep track of
+ * small in comparison to ones only manipulating user data, we keep track of
  * the committed catalog modifying ones inside [xmin, xmax) instead of keeping
  * track of all running transactions like it's done in a normal snapshot. Note
  * that we're generally only looking at transactions that have acquired an
@@ -42,7 +42,7 @@
  * catalog in a transaction. During normal operation this is achieved by using
  * CommandIds/cmin/cmax. The problem with that however is that for space
  * efficiency reasons only one value of that is stored
- * (c.f. combocid.c). Since ComboCids are only available in memory we log
+ * (cf. combocid.c). Since ComboCids are only available in memory we log
  * additional information which allows us to get the original (cmin, cmax)
  * pair during visibility checks. Check the reorderbuffer.c's comment above
  * ResolveCminCmaxDuringDecoding() for details.
@@ -92,7 +92,7 @@
  * Only transactions that commit after CONSISTENT state has been reached will
  * be replayed, even though they might have started while still in
  * FULL_SNAPSHOT. That ensures that we'll reach a point where no previous
- * changes has been exported, but all the following ones will be. That point
+ * changes have been exported, but all the following ones will be. That point
  * is a convenient point to initialize replication from, which is why we
  * export a snapshot at that point, which *can* be used to read normal data.
  *
@@ -134,7 +134,7 @@
 
 /*
  * This struct contains the current state of the snapshot building
- * machinery. Besides a forward declaration in the header, it is not exposed
+ * machinery. Except for a forward declaration in the header, it is not exposed
  * to the public, so we can easily change its contents.
  */
 struct SnapBuild
@@ -442,7 +442,7 @@
 
 	/*
 	 * We misuse the original meaning of SnapshotData's xip and subxip fields
-	 * to make the more fitting for our needs.
+	 * to make them more fitting for our needs.
 	 *
 	 * In the 'xip' array we store transactions that have to be treated as
 	 * committed. Since we will only ever look at tuples from transactions
@@ -645,7 +645,7 @@
 
 /*
  * Handle the effects of a single heap change, appropriate to the current state
- * of the snapshot builder and returns whether changes made at (xid, lsn) can
+ * of the snapshot builder and return whether changes made at (xid, lsn) can
  * be decoded.
  */
 bool
@@ -1143,7 +1143,7 @@
 	 */
 	builder->xmin = running->oldestRunningXid;
 
-	/* Remove transactions we don't need to keep track off anymore */
+	/* Remove transactions we don't need to keep track of anymore */
 	SnapBuildPurgeCommittedTxn(builder);
 
 	elog(DEBUG3, "xmin: %u, xmax: %u, oldestrunning: %u",
@@ -1250,7 +1250,7 @@
 	}
 
 	/*
-	 * a) No transaction were running, we can jump to consistent.
+	 * a) No transactions were running, we can jump to consistent.
 	 *
 	 * NB: We might have already started to incrementally assemble a snapshot,
 	 * so we need to be careful to deal with that.
@@ -1521,8 +1521,8 @@
 			(uint32) (lsn >> 32), (uint32) lsn, MyProcPid);
 
 	/*
-	 * Unlink temporary file if it already exists, needs to have been before a
-	 * crash/error since we won't enter this function twice from within a
+	 * Unlink temporary file if it already exists, must have been from before
+	 * a crash/error since we won't enter this function twice from within a
 	 * single decoding slot/backend and the temporary file contains the pid of
 	 * the current process.
 	 */
@@ -1624,8 +1624,8 @@
 	fsync_fname("pg_logical/snapshots", true);
 
 	/*
-	 * Now there's no way we can loose the dumped state anymore, remember this
-	 * as a serialization point.
+	 * Now that there's no way we can lose the dumped state anymore, remember
+	 * this as a serialization point.
 	 */
 	builder->last_serialized_snapshot = lsn;
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] more on comments of snapbuild.c

2017-03-17 Thread Erik Rijkers
Studying logrep yielded some more improvements to the comments in 
snapbuild.c


(to be applied to master)

thanks,

Erik Rijkers




Re: [HACKERS] WIP: [[Parallel] Shared] Hash

2017-03-17 Thread Thomas Munro
On Tue, Mar 14, 2017 at 8:03 AM, Thomas Munro
 wrote:
> On Mon, Mar 13, 2017 at 8:40 PM, Rafia Sabih
>  wrote:
>> In an attempt to test v7 of this patch on TPC-H 20 scale factor I found a
>> few regressions,
>> Q21: 52 secs on HEAD and  400 secs with this patch
>
> Thanks Rafia.  Robert just pointed out off-list that there is a bogus
> 0 row estimate in here:
>
> ->  Parallel Hash Semi Join  (cost=1006599.34..1719227.30 rows=0
> width=24) (actual time=38716.488..100933.250 rows=7315896 loops=5)
>
> Will investigate, thanks.

There are two problems here.

1.  There is a pre-existing cardinality estimate problem for
semi-joins with <> filters.  The big Q21 regression reported by Rafia
is caused by that phenomenon, probably exacerbated by another bug that
allowed 0 cardinality estimates to percolate inside the planner.
Since her report, commit 1ea60ad6 has clamped estimates at or
above 1.0.

I started a new thread to discuss that because it's unrelated to this
patch, except insofar as it confuses the planner about Q21 (with or
without parallelism).  Using one possible selectivity tweak suggested
by Tom Lane, I was able to measure significant speedups on otherwise
unpatched master:

https://www.postgresql.org/message-id/CAEepm%3D11BiYUkgXZNzMtYhXh4S3a9DwUP8O%2BF2_ZPeGzzJFPbw%40mail.gmail.com

2.  If you compare master tweaked as above against the latest version
of my patch series with the tweak, then the patched version always
runs faster with 4 or more workers, but with only 1 or 2 workers Q21
is a bit slower... but not always.  I realised that there was a
bi-modal distribution of execution times.  It looks like my 'early
exit' protocol, designed to make tuple-queue deadlock impossible, is
often causing us to lose a worker.  I am working on that now.

I have code changes for Peter G's and Andres's feedback queued up and
will send a v8 series shortly, hopefully with a fix for problem 2
above.

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-03-17 Thread Robert Haas
On Fri, Mar 17, 2017 at 9:15 AM, Ashutosh Bapat
 wrote:
> This set of patches fixes both of those things.

0001 changes the purpose of a function and then 0007 renames it.  It
would be better to include the renaming in 0001 so that you're not
taking multiple whacks at the same function in the same patch series.
I believe it would also be best to include 0011's changes to
adjust_appendrel_attrs_multilevel in 0001.

0002 should either add find_param_path_info() to the relevant header
file as extern from the beginning, or it should declare and define it
as static and then 0007 can remove those markings.  It makes no sense
to declare it as extern but put the prototype in the .c file.

0004 still needs to be pared down.  If you want to get something
committed this release cycle, you have to get these details taken care
of, uh, more or less immediately.  Actually, preferably, several weeks
ago.  You're welcome to maintain your own test suite locally but what
you submit should be what you are proposing for commit -- or if not,
then you should separate the part proposed for commit and the part
included for dev testing into two different patches.

In 0005's README, the part about planning partition-wise joins in two
phases needs to be removed.  This patch also contains a small change
to partition_join.sql that belongs in 0004.

0008 removes direct tests against RELOPT_JOINREL almost everywhere,
but it overlooks the new ones added to postgres_fdw.c by
b30fb56b07a885f3476fe05920249f4832ca8da5.  It should be updated to
cover those as well, I suspect.  The commit message claims that it
will "Similarly replace RELOPT_OTHER_MEMBER_REL test with
IS_OTHER_REL() where we want to test for child relations of all kinds",
but in fact it makes exactly zero such substitutions.

While I was studying what you did with reparameterize_path_by_child(),
I started to wonder whether reparameterize_path() doesn't need to
start handling join paths.  I think it only handles scan paths right
now because that's the only thing that can appear under an appendrel
created by inheritance expansion, but you're changing that.  Maybe
it's not critical -- I think the worst consequences of missing some
handling there is that we won't consider a parameterized path in some
case where it would be advantageous to do so.  Still, you might want
to investigate a bit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] increasing the default WAL segment size

2017-03-17 Thread Robert Haas
On Fri, Mar 17, 2017 at 6:11 PM, Peter Eisentraut
 wrote:
> On 3/17/17 16:56, Tom Lane wrote:
>> Tools could get the segment size out of
>> XLogLongPageHeaderData.xlp_seg_size in the first page of the segment.
>
> OK, then pg_standby would have to wait until the file is at least
> XLOG_BLCKSZ, then look inside and get the expected final size.  A bit
> more complicated than now, but seems doable.

Yeah, that doesn't sound too bad.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] <> join selectivity estimate question

2017-03-17 Thread Thomas Munro
On Sat, Mar 18, 2017 at 11:49 AM, Thomas Munro
 wrote:
> On Sat, Mar 18, 2017 at 6:14 AM, Tom Lane  wrote:
>> After a bit more thought, it seems like the bug here is that "the
>> fraction of the LHS that has a non-matching row" is not one minus
>> "the fraction of the LHS that has a matching row".  In fact, in
>> this example, *all* LHS rows have both matching and non-matching
>> RHS rows.  So the problem is that neqjoinsel is doing something
>> that's entirely insane for semijoin cases.
>>
>> It would not be too hard to convince me that neqjoinsel should
>> simply return 1.0 for any semijoin/antijoin case, perhaps with
>> some kind of discount for nullfrac.  Whether or not there's an
>> equal row, there's almost always going to be non-equal row(s).
>> Maybe we can think of a better implementation but that seems
>> like the zero-order approximation.
>
> Right.  If I temporarily hack neqjoinsel() thus:
>
> result = 1.0 - result;
> +
> +   if (jointype == JOIN_SEMI)
> +   result = 1.0;
> +
> PG_RETURN_FLOAT8(result);
>  }
>
> ... then I obtain sensible row estimates and the following speedups
> for TPCH Q21:
>
>   8 workers = 8.3s -> 7.8s
>   7 workers = 8.2s -> 7.9s
>   6 workers = 8.5s -> 8.2s
>   5 workers = 8.9s -> 8.5s
>   4 workers = 9.5s -> 9.1s
>   3 workers = 39.7s -> 9.9s
>   2 workers = 36.9s -> 11.7s
>   1 worker  = 38.2s -> 15.0s
>   0 workers = 47.9s -> 24.7s
>
> The plan is similar to the good plan from before even at lower worker
> counts, but slightly better because the aggregation has been pushed
> under the Gather node.  See attached.

... and so has the anti-join, probably more importantly.

Thanks for looking at this!

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] <> join selectivity estimate question

2017-03-17 Thread Thomas Munro
On Sat, Mar 18, 2017 at 6:14 AM, Tom Lane  wrote:
> After a bit more thought, it seems like the bug here is that "the
> fraction of the LHS that has a non-matching row" is not one minus
> "the fraction of the LHS that has a matching row".  In fact, in
> this example, *all* LHS rows have both matching and non-matching
> RHS rows.  So the problem is that neqjoinsel is doing something
> that's entirely insane for semijoin cases.
>
> It would not be too hard to convince me that neqjoinsel should
> simply return 1.0 for any semijoin/antijoin case, perhaps with
> some kind of discount for nullfrac.  Whether or not there's an
> equal row, there's almost always going to be non-equal row(s).
> Maybe we can think of a better implementation but that seems
> like the zero-order approximation.

Right.  If I temporarily hack neqjoinsel() thus:

result = 1.0 - result;
+
+   if (jointype == JOIN_SEMI)
+   result = 1.0;
+
PG_RETURN_FLOAT8(result);
 }

... then I obtain sensible row estimates and the following speedups
for TPCH Q21:

  8 workers = 8.3s -> 7.8s
  7 workers = 8.2s -> 7.9s
  6 workers = 8.5s -> 8.2s
  5 workers = 8.9s -> 8.5s
  4 workers = 9.5s -> 9.1s
  3 workers = 39.7s -> 9.9s
  2 workers = 36.9s -> 11.7s
  1 worker  = 38.2s -> 15.0s
  0 workers = 47.9s -> 24.7s

The plan is similar to the good plan from before even at lower worker
counts, but slightly better because the aggregation has been pushed
under the Gather node.  See attached.
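For readers following along, Tom's zero-order approximation (return 1.0
for semi/anti joins, with the nullfrac discount he suggests) can be
modelled in a standalone sketch.  This is illustrative only: the
function below does not use the real neqjoinsel() signature, and the
names are made up for the example.

```c
#include <assert.h>

typedef enum { JOIN_INNER, JOIN_SEMI, JOIN_ANTI } JoinType;

/*
 * Simplified model of "<>" join selectivity: for inner joins, one minus
 * the "=" selectivity; for semi/anti joins, nearly every non-null LHS
 * row has at least one non-equal RHS row, so return 1.0 discounted by
 * the NULL fraction (NULLs match nothing either way).
 */
static double
neq_join_selectivity(double eq_sel, JoinType jointype, double nullfrac)
{
	double		result;

	if (jointype == JOIN_SEMI || jointype == JOIN_ANTI)
		result = 1.0 - nullfrac;
	else
		result = 1.0 - eq_sel;

	/* clamp into the valid selectivity range */
	if (result < 0.0)
		result = 0.0;
	if (result > 1.0)
		result = 1.0;
	return result;
}
```

With a tiny "=" selectivity and no NULLs, the semi-join case comes out
at 1.0, matching the hack above.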

-- 
Thomas Munro
http://www.enterprisedb.com

                                   QUERY PLAN
--------------------------------------------------------------------------------
 Limit  (cost=2201513.85..2201514.10 rows=100 width=34) (actual time=9063.236..9063.250 rows=100 loops=1)
   ->  Sort  (cost=2201513.85..2201538.56 rows=9882 width=34) (actual time=9063.234..9063.242 rows=100 loops=1)
         Sort Key: (count(*)) DESC, supplier.s_name
         Sort Method: top-N heapsort  Memory: 38kB
         ->  Finalize GroupAggregate  (cost=2199767.92..2201136.17 rows=9882 width=34) (actual time=9041.788..9061.662 rows=3945 loops=1)
               Group Key: supplier.s_name
               ->  Gather Merge  (cost=2199767.92..2200987.95 rows=9880 width=34) (actual time=9041.743..9059.098 rows=17026 loops=1)
                     Workers Planned: 4
                     Workers Launched: 4
                     ->  Partial GroupAggregate  (cost=2198767.86..2198811.09 rows=2470 width=34) (actual time=9026.843..9029.020 rows=3405 loops=5)
                           Group Key: supplier.s_name
                           ->  Sort  (cost=2198767.86..2198774.04 rows=2470 width=26) (actual time=9026.835..9027.507 rows=7781 loops=5)
                                 Sort Key: supplier.s_name
                                 Sort Method: quicksort  Memory: 1067kB
                                 ->  Nested Loop Anti Join  (cost=558157.24..2198628.67 rows=2470 width=26) (actual time=4200.001..8975.908 rows=7781 loops=5)
                                       ->  Hash Join  (cost=558156.67..2143996.29 rows=2470 width=42) (actual time=4198.642..8624.528 rows=139326 loops=5)
                                             Hash Cond: (l1.l_orderkey = orders.o_orderkey)
                                             ->  Nested Loop Semi Join  (cost=2586.15..1585196.80 rows=199953 width=50) (actual time=14.685..4251.092 rows=288319 loops=5)
                                                   ->  Hash Join  (cost=2585.58..1425767.03 rows=199953 width=42) (actual time=14.635..3160.867 rows=298981 loops=5)
                                                         Hash Cond: (l1.l_suppkey = supplier.s_suppkey)
                                                         ->  Parallel Seq Scan on lineitem l1  (cost=0.00..1402436.29 rows=4998834 width=16) (actual time=0.056..2355.120 rows=7585870 loops=5)
                                                               Filter: (l_receiptdate > l_commitdate)
                                                               Rows Removed by Filter: 4411341
                                                         ->  Hash  (cost=2535.58..2535.58 rows=4000 width=30) (actual time=14.470..14.470 rows=3945 loops=5)
                                                               Buckets: 4096  Batches: 1  Memory Usage: 279kB
                                                               ->  Nested Loop  (cost=79.29..2535.58 rows=4000 width=30) (actual time=1.807..13.024 rows=3945 loops=5)
                                                                     ->  Seq Scan on nation  (cost=0.00..1.31 rows=1 width=4) (actual

[HACKERS] Introduce expression initialization hook?

2017-03-17 Thread Andres Freund
Hi,

It looks like we're creeping towards agreement in
http://archives.postgresql.org/message-id/20161206034955.bh33paeralxbtluv%40alap3.anarazel.de

It'd be fairly easy to add a hook in its ExecInstantiateExpr(), which'd
allow extensions to provide newer (presumably faster) ways to evaluate
expressions.

I myself have no use for this, as I'm determined to get JITed expression
evaluation into core.  But I could see others finding it useful.

Even if the consensus were that we want a hook there, I do not plan to
add it before this is committed, and a few other items are addressed.
But the discussion might take some time, so ...
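The hook being proposed would presumably follow the usual PostgreSQL
hook pattern: a global function pointer that an extension installs at
load time, with core falling back to its built-in path when the pointer
is NULL.  A generic standalone sketch with hypothetical names (this is
not the actual proposed API):

```c
#include <assert.h>
#include <stddef.h>

/* hypothetical hook type: takes an "expression", returns an evaluator id */
typedef int (*expr_instantiate_hook_type) (const char *expr);

/* core-owned hook variable, NULL by default */
static expr_instantiate_hook_type expr_instantiate_hook = NULL;

static int
builtin_instantiate(const char *expr)
{
	(void) expr;
	return 0;					/* 0 = built-in interpreted evaluation */
}

/* core entry point: prefer the extension's implementation if installed */
static int
instantiate_expr(const char *expr)
{
	if (expr_instantiate_hook != NULL)
		return expr_instantiate_hook(expr);
	return builtin_instantiate(expr);
}

/* an extension would install its hook at load time, e.g. in _PG_init() */
static int
jit_instantiate(const char *expr)
{
	(void) expr;
	return 1;					/* 1 = extension-provided (e.g. JITed) evaluation */
}
```

Before the hook is set, instantiate_expr() uses the built-in path; after
an extension assigns expr_instantiate_hook, its implementation wins.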

- Andres




Re: [HACKERS] PATCH: pageinspect / add page_checksum and bt_page_items(bytea)

2017-03-17 Thread Tomas Vondra

Hi,

On 03/17/2017 05:23 PM, Peter Eisentraut wrote:

I'm struggling to find a good way to share code between
bt_page_items(text, int4) and bt_page_items(bytea).

If we do it via the SQL route, as I had suggested, it makes the
extension non-relocatable, and it will also create a bit of a mess
during upgrades.

If doing it in C, it will be a bit tricky to pass the SRF context
around.  There is no "DirectFunctionCall within SRF context", AFAICT.



Not sure what it has to do with DirectFunctionCall? You want to call the 
bytea variant from the existing one? Wouldn't it be easier to simply 
define a static function with the shared parts, and pass around the 
fctx/fcinfo? Not quite pretty, but should work.



I'm half tempted to just rip out the (text, int4) variants.



Perhaps. I see pageinspect as a tool for ad-hoc investigations, and I 
can't really imagine it being hard-wired into something.



In any case, I think we should add bytea variants to all the btree
functions, not just the bt_page_items one.



I agree, but I think we need to find a way to share the code between the 
text/bytea variants. Unless we rip the text ones out, obviously.


Thanks for the work on the patch, BTW.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Microvacuum support for Hash Index

2017-03-17 Thread Bruce Momjian
On Wed, Mar 15, 2017 at 07:26:45PM -0700, Andres Freund wrote:
> On 2017-03-15 16:31:11 -0400, Robert Haas wrote:
> > On Wed, Mar 15, 2017 at 3:54 PM, Ashutosh Sharma  
> > wrote:
> > > Changed as per suggestions. Attached v9 patch. Thanks.
> > 
> > Wow, when do you sleep?
> 
> I think that applies to a bunch of people, including yourself ;)

Gee, no one asks when I sleep.  I wonder why.  ;-)

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] Parallel Append implementation

2017-03-17 Thread Peter Geoghegan
On Fri, Mar 17, 2017 at 10:12 AM, Amit Khandekar  wrote:
> Yeah, I was in double minds as to whether to do the
> copy-to-array-and-qsort thing, or should just write the same number of
> lines of code to manually do an insertion sort. Actually I was
> searching if we already have a linked list sort, but it seems we don't
> have. Will do the qsort now since it would be faster.

relcache.c does an insertion sort with a list of OIDs. See insert_ordered_oid().
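That approach boils down to a simple ordered insert: walk the sorted
sequence and splice the new element in before the first larger entry.
A standalone, array-based sketch (the real insert_ordered_oid() operates
on a PostgreSQL List; this is illustration only):

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Oid;

/*
 * Insert 'val' into the sorted array oids[0..n-1], keeping it sorted.
 * Returns the new element count; the caller guarantees capacity for
 * n + 1 entries.  O(n) per insert, which is fine for short lists.
 */
static size_t
insert_ordered_oid(Oid *oids, size_t n, Oid val)
{
	size_t		i = n;

	/* shift larger entries right until the insertion point is found */
	while (i > 0 && oids[i - 1] > val)
	{
		oids[i] = oids[i - 1];
		i--;
	}
	oids[i] = val;
	return n + 1;
}
```

For a handful of elements this beats a qsort() on constant factors,
which is why it shows up for small OID lists; for larger inputs the
copy-to-array-and-qsort approach discussed above wins.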


-- 
Peter Geoghegan




Re: [HACKERS] increasing the default WAL segment size

2017-03-17 Thread Peter Eisentraut
On 3/17/17 16:56, Tom Lane wrote:
> Tools could get the segment size out of
> XLogLongPageHeaderData.xlp_seg_size in the first page of the segment.

OK, then pg_standby would have to wait until the file is at least
XLOG_BLCKSZ, then look inside and get the expected final size.  A bit
more complicated than now, but seems doable.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [PATCH] Move all am-related reloption code into src/backend/access/[am-name] and get rid of relopt_kind for custom AM

2017-03-17 Thread Alvaro Herrera
By the way, it would be very good if you could review some patches, too.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [PATCH] Suppress Clang 3.9 warnings

2017-03-17 Thread Peter Eisentraut
On 3/13/17 11:35, Aleksander Alekseev wrote:
> Here is a new patch. I tried to make as little changes as possible. This
> is no doubt not the most beautiful patch on Earth but it removes all
> warnings. I anyone could suggest an approach that would be significantly
> better please don't hesitate to share your ideas.

I'm also seeing the -Wconstant-conversion warnings with clang-4.0.  The
warnings about strlcpy don't appear here.  That might be something
specific to the operating system.

To address the -Wconstant-conversion warnings, I suggest changing the
variables to unsigned char * as appropriate.

However, this would require a large number of changes to all call sites
of XLogRegisterData(), because they all have a cast like this:

XLogRegisterData((char *) &xlrec, SizeOfHashMovePageContents);

Perhaps the first argument could be changed to void *.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Two phase commit in ECPG

2017-03-17 Thread Masahiko Sawada
On Fri, Mar 17, 2017 at 5:50 PM, Michael Meskes  wrote:
>> Thank you for pointing out.
>> Yeah, I agree that the twophase regression test should not be
>> performed by default as long as the default value of
>> max_prepared_transactions is 0. Similar to this, the isolation check
>> regression test does same thing. Attached patch removes sql/twophase
>> from schedule file and add new type of regression test.
>
> Would it be possible to include it in "make check"? Any check that
> needs manual interaction will not be executed nearly is often as the
> others and become much less useful imo.
>

Yes.  I added the two-phase commit test to the "make check" schedule
while adding the two new types of tests.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


ecpg_prepared_txns_test_v2.patch
Description: Binary data



Re: [HACKERS] Index usage for elem-contained-by-const-range clauses

2017-03-17 Thread Pritam Baral

On Friday 10 March 2017 07:59 PM, Alexander Korotkov wrote:
> Hi, Pritam!
>
> I've assigned to review this patch.
>
> On Thu, Feb 23, 2017 at 2:17 AM, Pritam Baral  wrote:
>> The topic has been previously discussed[0] on the -performance mailing
>> list, about four years ago.
>>
>> In that thread, Tom suggested[0] the planner could be made to "expand
>> "intcol <@ 'x,y'::int4range" into "intcol between x and y", using
>> something similar to the index LIKE optimization (ie, the "special
>> operator" stuff in indxpath.c)".
>
> That's cool idea.  But I would say more.  Sometimes it's useful to
> transform "intcol between x and y" into "intcol <@ 'x,y'::int4range".
> btree_gin supports "intcol between x and y" as overlap of "intcol >= x"
> and "intcol <= y".  That is very inefficient.  But it this clause would
> be transformed into "intcol <@ 'x,y'::int4range", btree_gin could
> handle this very efficient.
>
>> This patch tries to do exactly that. It's not tied to any specific
>> datatype, and has been tested with both builtin types and custom range
>> types. Most of the checking for proper datatypes, operators, and btree
>> index happens before this code, so I haven't run into any issues yet
>> in my testing. But I'm not familiar enough with the internals to be
>> able to confidently say it can handle all cases just yet.
>
> I've tried this patch.  It applies cleanly, but doesn't compile.
>
> indxpath.c:4252:1: error: conflicting types for 'range_elem_contained_quals'
> range_elem_contained_quals(Node *leftop, Datum rightop)
> ^
> indxpath.c:192:14: note: previous declaration is here
> static List *range_elem_contained_quals(Node *leftop, Oid expr_op, Oid opfamily,
>              ^
>
> Could you please recheck that you published right version of patch?

So sorry. I'm attaching the correct version of the original with this,
in case you want to test the limited implementation, because I still
have to go through Tom's list of suggestions.

BTW, the patch is for applying on top of REL9_6_2, and while I
suspect it may work on master too, I haven't tested it since the
original submission (Feb 23).


> Also, I noticed that patch haven't regression tests.  Some mention of
> this optimization in docs is also nice to have.
>
> --
> Alexander Korotkov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company


--
#!/usr/bin/env regards
Chhatoi Pritam Baral

diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 2952bfb7c2..40a3c2c9f4 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -30,21 +30,23 @@
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/predtest.h"
 #include "optimizer/prep.h"
 #include "optimizer/restrictinfo.h"
 #include "optimizer/var.h"
 #include "utils/builtins.h"
 #include "utils/bytea.h"
 #include "utils/lsyscache.h"
 #include "utils/pg_locale.h"
+#include "utils/rangetypes.h"
 #include "utils/selfuncs.h"
+#include "utils/typcache.h"
 
 
 #define IsBooleanOpfamily(opfamily) \
 	((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID)
 
 #define IndexCollMatchesExprColl(idxcollation, exprcollation) \
 	((idxcollation) == InvalidOid || (idxcollation) == (exprcollation))
 
 /* Whether we are looking for plain indexscan, bitmap scan, or either */
 typedef enum
@@ -180,20 +182,21 @@ static Expr *expand_boolean_index_clause(Node *clause, int indexcol,
 			IndexOptInfo *index);
 static List *expand_indexqual_opclause(RestrictInfo *rinfo,
 		  Oid opfamily, Oid idxcollation);
 static RestrictInfo *expand_indexqual_rowcompare(RestrictInfo *rinfo,
 			IndexOptInfo *index,
 			int indexcol);
 static List *prefix_quals(Node *leftop, Oid opfamily, Oid collation,
 			 Const *prefix, Pattern_Prefix_Status pstatus);
 static List *network_prefix_quals(Node *leftop, Oid expr_op, Oid opfamily,
 	 Datum rightop);
+static List *range_elem_contained_quals(Node *leftop, Datum rightop);
 static Datum string_to_datum(const char *str, Oid datatype);
 static Const *string_to_const(const char *str, Oid datatype);
 
 
 /*
  * create_index_paths()
  *	  Generate all interesting index paths for the given relation.
  *	  Candidate paths are added to the rel's pathlist (using add_path).
  *
  * To be considered for an index scan, an index must match one or more
@@ -3286,20 +3289,23 @@ match_special_index_operator(Expr *clause, Oid opfamily, Oid idxcollation,
 			/* the right-hand const is type text for all of these */
 			pstatus = pattern_fixed_prefix(patt, Pattern_Type_Regex_IC, expr_coll,
		   &prefix, NULL);
 			isIndexable = (pstatus != Pattern_Prefix_None);
 			break;
 
 		case OID_INET_SUB_OP:
 		case OID_INET_SUBEQ_OP:
 			isIndexable = true;
 			break;
+		case OID_RANGE_ELEM_CONTAINED_OP:
+			isIndexable = true;
+			break;
 	}
 
 	if (prefix)
 	

Re: [HACKERS] Index usage for elem-contained-by-const-range clauses

2017-03-17 Thread Pritam Baral

On Sunday 12 March 2017 01:58 AM, Jim Nasby wrote:

> On 3/10/17 8:29 AM, Alexander Korotkov wrote:
>> That's cool idea.  But I would say more.  Sometimes it's useful to
>> transform "intcol between x and y" into "intcol <@ 'x,y'::int4range".
>> btree_gin supports "intcol between x and y" as overlap of "intcol >= x"
>> and "intcol <= y".  That is very inefficient.  But it this clause would
>> be transformed into "intcol <@ 'x,y'::int4range", btree_gin could
>> handle this very efficient.
>
> That's certainly be nice as well, but IMHO it's outside the scope of
> this patch to accomplish that.


Also, I think btree indexes are more common than btree_gin. The motivation for
this originally came from trying to use the primary key of a large table in a
range search, and the primary key index was the default btree.

Also, this is my first deep dive into Postgres's source code, so I took a few
easy ways out, just to get started. If it's not too complex to get btree_gin to
handle between queries as contained-in-range, I can give it a try.


> BTW, while we're wishing for things... Something else that would be
> nice is if there was a way to do these kind of transforms without
> hacking the backend...


Indeed. And this was one of the things Tom said back when a similar discussion
had happened (on the -performance mailing list). But seeing as how it's been
almost four years since then, I decided to go ahead with the backend hacking
anyway.


>> Also, I noticed that patch haven't regression tests.
>
> BTW, those tests need to pay special attention to inclusive vs
> exclusive bounds.


I will add regression tests, though I do have to get through all of Tom's
suggestions elsewhere in this thread first.

--
#!/usr/bin/env regards
Chhatoi Pritam Baral





Re: [HACKERS] Index usage for elem-contained-by-const-range clauses

2017-03-17 Thread Tom Lane
I wrote:
> * You're not bothering to insert any inputcollid into the generated
> comparison operator nodes.  I'm not sure why that fails to fall over
> for text comparisons (if indeed it does fail ...) but it's wrong.
> Use the range type's collation there.

Oh ... looking at this again, I realize that there's an additional
validity check missing: if the range type's collation doesn't match
the index column's collation, we can't do this optimization at all.
That check probably belongs in match_special_index_operator.

regards, tom lane




Re: [HACKERS] increasing the default WAL segment size

2017-03-17 Thread Tom Lane
Peter Eisentraut  writes:
> On 3/17/17 16:20, Peter Eisentraut wrote:
>> I think we would have to extend restore_command with an additional
>> placeholder that communicates the segment size, and add a new pg_standby
>> option to accept that size somehow.  And specifying the size would have
>> to be mandatory, for complete robustness.  Urgh.

> Another way would be to name the WAL files in a more self-describing
> way.  For example, instead of

Actually, if you're content with having tools obtain this info by
examining the WAL files, we shouldn't need to muck with the WAL naming
convention (which seems like it would be a horrid mess, anyway --- too
much outside code knows that).  Tools could get the segment size out of
XLogLongPageHeaderData.xlp_seg_size in the first page of the segment.
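A standalone tool could pull the size out of the segment's first page along
these lines. This is a hedged sketch: the struct below is a hand-copied
approximation of XLogLongPageHeaderData (with explicit padding assumed for
8-byte alignment of xlp_sysid), and a real tool should include the server's
xlog_internal.h and also verify xlp_magic and the XLP_LONG_HEADER flag before
trusting the field.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Rough mirror of XLogLongPageHeaderData; a real tool should use the
 * server headers instead of redeclaring this layout. */
typedef struct WalLongPageHeader
{
    uint16_t    xlp_magic;          /* magic value for correctness checks */
    uint16_t    xlp_info;           /* flag bits */
    uint32_t    xlp_tli;            /* timeline ID of first record on page */
    uint64_t    xlp_pageaddr;       /* XLOG address of this page */
    uint32_t    xlp_rem_len;        /* bytes of record continued from prev page */
    uint32_t    xlp_pad;            /* alignment padding (assumed) */
    uint64_t    xlp_sysid;          /* system identifier from pg_control */
    uint32_t    xlp_seg_size;       /* segment size, for cross-checks */
    uint32_t    xlp_xlog_blcksz;    /* WAL block size, for cross-checks */
} WalLongPageHeader;

/* Return the segment size recorded in a buffer holding the start of a
 * WAL segment's first page, or 0 if the buffer is too short. */
static uint32_t
wal_segment_size(const char *page, size_t len)
{
    WalLongPageHeader hdr;

    if (len < sizeof(hdr))
        return 0;
    memcpy(&hdr, page, sizeof(hdr));
    return hdr.xlp_seg_size;
}
```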

regards, tom lane




Re: [HACKERS] increasing the default WAL segment size

2017-03-17 Thread Peter Eisentraut
On 3/17/17 16:20, Peter Eisentraut wrote:
> On 3/16/17 21:10, Robert Haas wrote:
>> The changes to pg_standby seem to completely break the logic to wait
>> until the file has attained the correct size.  I don't know how to
>> salvage that logic off-hand, but just breaking it isn't acceptable.
> 
> I think we would have to extend restore_command with an additional
> placeholder that communicates the segment size, and add a new pg_standby
> option to accept that size somehow.  And specifying the size would have
> to be mandatory, for complete robustness.  Urgh.

Another way would be to name the WAL files in a more self-describing
way.  For example, instead of

000000010000000000000001
000000010000000000000002
000000010000000000000003

name them (for 16 MB)

000000010000000001
000000010000000002
000000010000000003

Then, pg_standby and similar tools can compute the expected file size
from the file name length: 16 ^ (24 - fnamelen)

However, that way you can't actually support 64 MB segments.  The next
jump up would have to be 256 MB (unless you want to go to a base other
than 16).
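The size-from-name-length rule above could be sketched like this. Purely
illustrative, since this naming scheme was only a proposal and never part of
any released tool:

```c
#include <assert.h>
#include <string.h>

/* Expected segment size in bytes under the proposed naming scheme:
 * 16 ^ (24 - length of the file name).  Returns 0 for names longer
 * than the standard 24 hex digits. */
static unsigned long
expected_segment_size(const char *fname)
{
    size_t        len = strlen(fname);
    unsigned long size = 1;
    size_t        i;

    if (len > 24)
        return 0;
    for (i = 0; i < 24 - len; i++)
        size *= 16;
    return size;
}
```

An 18-character name maps to 16^6 = 16 MB, a 17-character name to 16^7 =
256 MB, which is exactly the jump the paragraph above complains about.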

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[HACKERS] Re: new set of psql patches for loading (saving) data from (to) text, binary files

2017-03-17 Thread Pavel Stehule
2017-03-16 22:01 GMT+01:00 Stephen Frost :

> Pavel,
>
> * Pavel Stehule (pavel.steh...@gmail.com) wrote:
> > 2017-03-15 17:21 GMT+01:00 Stephen Frost :
> > > I started looking through this to see if it might be ready to commit
> and
> > > I don't believe it is.  Below are my comments about the first patch, I
> > > didn't get to the point of looking at the others yet since this one had
> > > issues.
> > >
> > > * Pavel Stehule (pavel.steh...@gmail.com) wrote:
> > > > 2017-01-09 17:24 GMT+01:00 Jason O'Donnell :
> > > > > gstore/gbstore:
> > >
> > > I don't see the point to 'gstore'- how is that usefully different from
> > > just using '\g'?  Also, the comments around these are inconsistent,
> some
> > > say they can only be used with a filename, others say it could be a
> > > filename or a pipe+command.
> >
> > \gstore ensures a dump of raw row data. It can be replaced by \g with
> > some other settings, but if the query is not unique, then the result
> > can be messy. That is not possible with \gbstore.
>
> I don't understand what you mean by "the result can be messy."  We have
> lots of options for controlling the output of the query and those can be
> used with \g just fine.  This seems like what you're doing is inventing
> something entirely new which is exactly the same as setting the right
> options which already exist and that seems odd to me.
>
> Is it any different from setting \a and \t and then calling \g?  If not,
> then I don't see why it would be useful to add.
>

I am searching for a comfortable way - I agree, it can be redundant with
already-available functionality.


>
> > More interesting is \gbstore, which uses the binary API - it can be
> > used for bytea fields or for XML fields with an implicitly correct
> > encoding change. \gbstore cannot be replaced by \g.
>
> Yes, having a way to get binary data out using psql and into a file is
> interesting and I agree that we should have that capability.
>
> Further, what I think we will definitely need is a way to get binary
> data out using psql at the command-line too.  We have the -A and -t
> switches which correspond to \a and \t, we should have something for
> this too.  Perhaps what that really calls for is a '\b' and a '-B'
> option to go with it which will flip psql into binary mode, similar to
> the other Formatting options.  I realize it might seem a bit
> counter-intuitive, but I can actually see use-cases for having binary
> data spit out to $PAGER (when you have a $PAGER that handles it
> cleanly, as less does, for example).
>

It is an interesting idea. I am not sure whether it is more a formatting
option or a general psql option, but it can be interesting for other
purposes too.

One idea for importing files into Postgres via psql:

we can introduce \gloadfrom, which replaces parameters with file contents -
this statement could work in text or in binary mode, controlled by the
proposed option.

something like

insert into foo values('Pavel','Stehule', $1) \gloadfrom ~/avatar.jpg
insert into doc(id, doc) values(default, $1) \gloadfrom ~/mydoc.xml

Regards

Pavel


>
> > > There's a whitespace-only hunk that shouldn't be included.
> > >
> > > I don't agree with the single-column/single-row restriction on these.
> I
> > > can certainly see a case where someone might want to, say, dump out a
> > > bunch of binary integers into a file for later processing.
> > >
> > > The tab-completion for 'gstore' wasn't correct (you didn't include the
> > > double-backslash).  The patch also has conflicts against current master
> > > now.
> > >
> > > I guess my thinking about moving this forward would be to simplify it
> to
> > > just '\gb' which will pull the data from the server side in binary
> > > format and dump it out to the filename or command given.  If there's a
> > > new patch with those changes, I'll try to find time to look at it.
> >
> > ok I'll prepare patch
>
> Great, thanks!
>
> Stephen
>


Re: [HACKERS] Candidate for local inline function?

2017-03-17 Thread Tom Lane
Kevin Grittner  writes:
> Why do we warn of a hazard here instead of eliminating said hazard
> with a static inline function declaration in executor.h?

> /*
>  * ExecEvalExpr was formerly a function containing a switch statement;
>  * now it's just a macro invoking the function pointed to by an ExprState
>  * node.  Beware of double evaluation of the ExprState argument!
>  */
> #define ExecEvalExpr(expr, econtext, isNull) \
> ((*(expr)->evalfunc) (expr, econtext, isNull))

> Should I change that to a static inline function doing exactly what
> the macro does?

No, because that code has only days to live anyway.  You'd just
create a merge hazard for Andres' execQual rewrite.

In practice, I seriously doubt that there are or ever will be any
callers passing volatile expressions to this macro, so that there's
not really much advantage to be gained by assuming that the compiler
is smart about inline functions.

regards, tom lane




Re: [HACKERS] Candidate for local inline function?

2017-03-17 Thread Kevin Grittner
On Fri, Mar 17, 2017 at 3:23 PM, Andres Freund  wrote:
> On 2017-03-17 15:17:33 -0500, Kevin Grittner wrote:
>> Why do we warn of a hazard here instead of eliminating said hazard
>> with a static inline function declaration in executor.h?
>
> Presumably because it was written long before we started relying on
> inline functions :/

Right.  git blame says it was changed in 2004.

>> /*
>>  * ExecEvalExpr was formerly a function containing a switch statement;
>>  * now it's just a macro invoking the function pointed to by an ExprState
>>  * node.  Beware of double evaluation of the ExprState argument!
>>  */
>> #define ExecEvalExpr(expr, econtext, isNull) \
>> ((*(expr)->evalfunc) (expr, econtext, isNull))
>>
>> Should I change that to a static inline function doing exactly what
>> the macro does?  In the absence of multiple evaluations of a
>> parameter with side effects, modern versions of gcc have generated
>> the same code for a macro versus a static inline function, at least
>> in the cases I checked.
>
> I'm absolutely not against changing this to an inline function, but I'd
> prefer if that code weren't touched quite right now, there's a large
> pending patch of mine in the area.  If you don't mind, I'll just include
> the change there, rather than have a conflict?

Fine with me.

-- 
Kevin Grittner




Re: [HACKERS] Candidate for local inline function?

2017-03-17 Thread Andres Freund
Hi Kevin,


On 2017-03-17 15:17:33 -0500, Kevin Grittner wrote:
> Why do we warn of a hazard here instead of eliminating said hazard
> with a static inline function declaration in executor.h?

Presumably because it was written long before we started relying on
inline functions :/


> /*
>  * ExecEvalExpr was formerly a function containing a switch statement;
>  * now it's just a macro invoking the function pointed to by an ExprState
>  * node.  Beware of double evaluation of the ExprState argument!
>  */
> #define ExecEvalExpr(expr, econtext, isNull) \
> ((*(expr)->evalfunc) (expr, econtext, isNull))
> 
> Should I change that to a static inline function doing exactly what
> the macro does?  In the absence of multiple evaluations of a
> parameter with side effects, modern versions of gcc have generated
> the same code for a macro versus a static inline function, at least
> in the cases I checked.

I'm absolutely not against changing this to an inline function, but I'd
prefer if that code weren't touched quite right now, there's a large
pending patch of mine in the area.  If you don't mind, I'll just include
the change there, rather than have a conflict?

Greetings,

Andres Freund




Re: [HACKERS] increasing the default WAL segment size

2017-03-17 Thread Peter Eisentraut
On 3/16/17 21:10, Robert Haas wrote:
> The changes to pg_standby seem to completely break the logic to wait
> until the file has attained the correct size.  I don't know how to
> salvage that logic off-hand, but just breaking it isn't acceptable.

I think we would have to extend restore_command with an additional
placeholder that communicates the segment size, and add a new pg_standby
option to accept that size somehow.  And specifying the size would have
to be mandatory, for complete robustness.  Urgh.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[HACKERS] Candidate for local inline function?

2017-03-17 Thread Kevin Grittner
Why do we warn of a hazard here instead of eliminating said hazard
with a static inline function declaration in executor.h?

/*
 * ExecEvalExpr was formerly a function containing a switch statement;
 * now it's just a macro invoking the function pointed to by an ExprState
 * node.  Beware of double evaluation of the ExprState argument!
 */
#define ExecEvalExpr(expr, econtext, isNull) \
((*(expr)->evalfunc) (expr, econtext, isNull))

Should I change that to a static inline function doing exactly what
the macro does?  In the absence of multiple evaluations of a
parameter with side effects, modern versions of gcc have generated
the same code for a macro versus a static inline function, at least
in the cases I checked.
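The double-evaluation hazard the comment warns about can be shown with a
self-contained sketch. The Node type and helpers below are invented for the
demonstration and only mimic the shape of ExecEvalExpr:

```c
#include <assert.h>

typedef struct Node
{
    int         (*evalfunc) (struct Node *);
    int         value;
} Node;

static int  fetch_count = 0;

static int
eval_value(Node *n)
{
    return n->value;
}

/* Argument-fetching helper with a visible side effect. */
static Node *
fetch_node(Node *n)
{
    fetch_count++;
    return n;
}

/* Macro form: "expr" appears twice, so its side effects run twice. */
#define EVAL_MACRO(expr)  ((*(expr)->evalfunc) (expr))

/* Inline form: the argument is evaluated exactly once. */
static inline int
eval_inline(Node *expr)
{
    return (*expr->evalfunc) (expr);
}
```

Both forms return the same result, but the macro calls fetch_node() twice
per invocation, which is exactly why the comment tells callers to beware.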

--
Kevin Grittner


Re: [HACKERS] pageinspect and hash indexes

2017-03-17 Thread Tom Lane
Ashutosh Sharma  writes:
> Basically, when we started working on WAL for hash index, we found that
> the WAL routine 'XLogReadBufferExtended' does not expect a page to be a
> completely zero page, else it returns an invalid buffer. To fix this, we
> started initializing freed overflow pages or new bucket pages using
> _hash_pageinit(), which initialises the page header portion but not its
> special area, where the page type information is present. That's why you
> are seeing an ERROR saying 'page is not a hash page'. Actually, the
> pageinspect module needs to handle this type of page. Currently it
> handles zero pages but not empty pages. I will submit a patch for this.

That seems like entirely the wrong approach.  You should make the special
space valid, instead, so that tools like pg_filedump can make sense of
the page.

regards, tom lane




Re: [HACKERS] pageinspect and hash indexes

2017-03-17 Thread Peter Eisentraut
On 3/17/17 13:24, Jeff Janes wrote:
> And isn't it unhelpful to have the pageinspect module throw errors,
> rather than returning a dummy value to indicate there was an error?

Even if one agreed with that premise, it would be hard to satisfy
reliably, because the user-facing pageinspect functions might reach into
lower-level code to do some of the decoding, which are free to error out.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] <> join selectivity estimate question

2017-03-17 Thread Tom Lane
Robert Haas  writes:
> On Fri, Mar 17, 2017 at 1:14 PM, Tom Lane  wrote:
>> It would not be too hard to convince me that neqjoinsel should
>> simply return 1.0 for any semijoin/antijoin case, perhaps with
>> some kind of discount for nullfrac.  Whether or not there's an
>> equal row, there's almost always going to be non-equal row(s).
>> Maybe we can think of a better implementation but that seems
>> like the zero-order approximation.

> Yeah, it's not obvious how to do better than that considering only one
> clause at a time.  Of course, what we really want to know is
> P(x<>y|z=t), but don't ask me how to compute that.

Yeah.  Another hole in this solution is that it means that the
estimate for x <> y will be quite different from the estimate
for NOT(x = y).  You wouldn't notice it in the field unless
somebody forgot to put a negator link on their equality operator,
but it seems like ideally we'd think of a solution that made sense
for generic NOT in this context.

No, I have no idea how to do that.
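The zero-order approximation floated above might look like the following.
This is an illustrative sketch only, not the actual neqjoinsel code; the
discount reflects that NULL <> anything is never true, so NULL outer rows
can never pass the qual:

```c
#include <assert.h>

/* Selectivity of "outer.x <> inner.y" treated as a semijoin/antijoin
 * qual: assume essentially every non-NULL outer row has some non-equal
 * inner row, discounted only by the outer side's NULL fraction. */
static double
neq_semijoin_sel(double outer_nullfrac)
{
    return 1.0 - outer_nullfrac;
}
```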

regards, tom lane




Re: [HACKERS] [COMMITTERS] pgsql: Use asynchronous connect API in libpqwalreceiver

2017-03-17 Thread Tom Lane
I wrote:
> Petr Jelinek  writes:
>> Thanks, now that I look at it again I think we might need to do cycle
>> with the occurred_events and returned_events and not return on first
>> find since theoretically there can be multiple sockets if this gets
>> called directly and not via WaitLatchOrSocket(). I don't have mental
>> power to finish it up right now though, sorry for that.

> I think WaitEventSet is only required to identify at least one reason
> for ending the wait; it is not required to identify all of them.

Also, I see the header comment for the Windows version of
WaitEventSetWaitBlock says it really only ever reports one event anyway.
So there seems no need to work harder than this.  Pushed with cosmetic
adjustments.

regards, tom lane




Re: [HACKERS] [COMMITTERS] pgsql: Use asynchronous connect API in libpqwalreceiver

2017-03-17 Thread Tom Lane
Andrew Dunstan  writes:
> On 03/17/2017 12:28 PM, Tom Lane wrote:
>> I'll add some comments and push this.  Does everyone agree that
>> this had better get back-patched, too?

> Confirmed that this fixes the problem on jacana.
> +1 for backpatching.

I've pushed this into 9.6, but the WaitEventSet code doesn't exist before
that.  It might be that we should apply a similar fix to the predecessor
code, but I'm not really excited about it given that we have no indication
that the bug can be hit in the older branches.  Anyway, lacking a Windows
machine to test on, I have no interest in trying to make such a patch
blindly.

> Does that mean we could remove the special logic in pgstat.c?

If you mean the bit near line 3930 about

 * Windows, at least in its Windows Server 2003 R2 incarnation,
 * sometimes loses FD_READ events.  Waking up and retrying the recv()
 * fixes that, so don't sleep indefinitely.  This is a crock of the
 * first water, but until somebody wants to debug exactly what's
 * happening there, this is the best we can do.

that doesn't seem related --- that's about missing FD_READ events not
FD_WRITE.

regards, tom lane




Re: [HACKERS] [COMMITTERS] pgsql: Use asynchronous connect API in libpqwalreceiver

2017-03-17 Thread Andrew Dunstan


On 03/17/2017 12:28 PM, Tom Lane wrote:
> Petr Jelinek  writes:
>> Now the documentation for WSAEventSelect says "The FD_WRITE network
>> event is handled slightly differently. An FD_WRITE network event is
>> recorded when a socket is first connected with a call to the connect,
>> ConnectEx, WSAConnect, WSAConnectByList, or WSAConnectByName function or
>> when a socket is accepted with accept, AcceptEx, or WSAAccept function
>> and then after a send fails with WSAEWOULDBLOCK and buffer space becomes
>> available. Therefore, an application can assume that sends are possible
>> starting from the first FD_WRITE network event setting and lasting until
>> a send returns WSAEWOULDBLOCK. After such a failure the application will
>> find out that sends are again possible when an FD_WRITE network event is
>> recorded and the associated event object is set."
>> But while PQconnectPoll does connect() before setting connection status
>> to CONNECTION_STARTED and returns  PGRES_POLLING_WRITING so the FD_WRITE
>> eventually happens, it does not do any writes in the code block that
>> switches to  CONNECTION_MADE and PGRES_POLLING_WRITING. That means
>> FD_WRITE event is never recorded as per quoted documentation.
> Ah-hah!  Great detective work.
>
>> Then what remains is why it works in libpq. If you look at
>> pgwin32_select which is eventually called by libpq code, it actually
>> handles the situation by trying empty WSASend on any socket that is
>> supposed to wait for FD_WRITE and only calling the
>> WaitForMultipleObjectsEx when WSASend failed with WSAEWOULDBLOCK, when
>> the WSASend succeeds it immediately returns ok.
>> So I did something similar in attached for WaitEventSetWaitBlock() and
>> it indeed solves the issue my windows test machine. Now the
>> coding/placement probably isn't the best (and there are no comments),
>> but maybe somebody will find proper place for this now that we know the
>> cause.
> Yeah, I'm afraid we had better do something more or less like this.
> It's interesting to speculate about whether WaitEventSet could keep
> additional state that would avoid the need for a dummy send() every
> time, but that seems like material for new development not a bug fix.
>
> In any case, a quick review of the code suggests that there are no
> performance-critical places where we wait for WL_SOCKET_WRITEABLE
> unless we've already detected that we have to wait for the network;
> so the dummy send() isn't going to do anything except consume a
> few more CPU cycles before we get to the point of blocking.  It may
> not be worth worrying about.
>
> I'll add some comments and push this.  Does everyone agree that
> this had better get back-patched, too?
>
>   



Confirmed that this fixes the problem on jacana.

+1 for backpatching.

Does that mean we could remove the special logic in pgstat.c?

cheers

andrew

-- 
Andrew Dunstan   https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





[HACKERS] Re: [COMMITTERS] pgsql: Remove objname/objargs split for referring to objects

2017-03-17 Thread Peter Eisentraut
On 3/17/17 13:39, Alvaro Herrera wrote:
> Oh yeah, buildfarm member sitella shows that.  Does the attached patch
> fix it? (pretty much the same thing Michael attached, but we also need
> to handle typeoids)

Yes, that makes the warning go away.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] pageinspect and hash indexes

2017-03-17 Thread Ashutosh Sharma
On Fri, Mar 17, 2017 at 10:54 PM, Jeff Janes  wrote:
> While trying to figure out some bloating in the newly logged hash indexes,
> I'm looking into the type of each page in the index.  But I get an error:
>
> psql -p 9876 -c "select hash_page_type(get_raw_page('foo_index_idx',x)) from
> generate_series(1650,1650) f(x)"
>
> ERROR:  page is not a hash page
> DETAIL:  Expected ff80, got 0000.
>
> The contents of the page are:
>
> \xa400d8f203bf65c91800f01ff01f0420...
>
> (where the elided characters at the end are all zero)
>
> What kind of page is that actually?

It is basically either a newly allocated bucket page or a freed overflow page.

You are seeing this error because of the following change in the WAL patch
for hash index.

Basically, when we started working on WAL for hash index, we found that
the WAL routine 'XLogReadBufferExtended' does not expect a page to be a
completely zero page, else it returns an invalid buffer. To fix this, we
started initializing freed overflow pages or new bucket pages using
_hash_pageinit(), which initialises the page header portion but not its
special area, where the page type information is present. That's why you
are seeing an ERROR saying 'page is not a hash page'. Actually, the
pageinspect module needs to handle this type of page. Currently it
handles zero pages but not empty pages. I will submit a patch for this.

> And isn't it unhelpful to have the pageinspect module throw errors,
> rather than returning a dummy value to indicate there was an error?

Well, this is something not specific to hash index. So, I have no answer :)

--
With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com









[HACKERS] free space map and visibility map

2017-03-17 Thread Jeff Janes
With some intensive crash-recovery testing, I've run into a situation where
I get some bad table bloat.  There will be large swaths of the table which
are empty (all results from heap_page_items other than lp are either zero
or NULL), but have zero available space in the fsm, and are marked as
all-visible and all-frozen in the vm.

I guess it is a result of a crash causing updates to the fsm to be lost.
Then due to the (crash-recovered) visibility map showing them as all
visible and all frozen, vacuum never touches the pages again, so the fsm
never gets corrected.

'VACUUM (DISABLE_PAGE_SKIPPING) foo;'   does fix it, but that seems to be
the only thing that will.

Is there a way to improve this, short of making updates to the fsm be a
wal-logged operation?

It is probably not a very pressing issue, as crashes are normally pretty
rare, I would hope.  But it seems worth improving if there is a good way to
do so.

Cheers,

Jeff


Re: [HACKERS] <> join selectivity estimate question

2017-03-17 Thread Robert Haas
On Fri, Mar 17, 2017 at 1:14 PM, Tom Lane  wrote:
> After a bit more thought, it seems like the bug here is that "the
> fraction of the LHS that has a non-matching row" is not one minus
> "the fraction of the LHS that has a matching row".  In fact, in
> this example, *all* LHS rows have both matching and non-matching
> RHS rows.  So the problem is that neqjoinsel is doing something
> that's entirely insane for semijoin cases.

Thanks for the analysis.  I had a niggling feeling that there might be
something of this sort going on, but I was not sure.

> It would not be too hard to convince me that neqjoinsel should
> simply return 1.0 for any semijoin/antijoin case, perhaps with
> some kind of discount for nullfrac.  Whether or not there's an
> equal row, there's almost always going to be non-equal row(s).
> Maybe we can think of a better implementation but that seems
> like the zero-order approximation.

Yeah, it's not obvious how to do better than that considering only one
clause at a time.  Of course, what we really want to know is
P(x<>y|z=t), but don't ask me how to compute that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-03-17 Thread Corey Huinker
>
>
> > \set x 'arg1 arg2'
>
> > \if false
> > \cmd_that_takes_exactly_two_args :x
> > \endif
>
> Yeah, throwing errors for bad arguments would also need to be suppressed.
>
> regards, tom lane
>

Ok, barring other feedback, I'm going to take my marching orders as "make
each slash command active-aware". To keep that sane, I'm probably going to
break out each slash command family into its own static function.


Re: [HACKERS] merging duplicate definitions of adjust_relid_set

2017-03-17 Thread Robert Haas
On Fri, Mar 17, 2017 at 1:50 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> While reviewing Ashutosh Bapat's partitionwise join code, I noticed
>> he'd run up against the problem that adjust_relid_set() is defined as
>> static in two different source files, and he wanted to call it from a
>> third file.  I didn't much like his solution to that problem, which
>> was to rename one of them and make that definition non-static; I think
>> it would be better to keep the existing name and stop defining it in
>> multiple places.  However, I discovered that there wasn't really an
>> obviously-good place to put the function; neither prepunion.c nor
>> rewriteManip.c, the two files that contain static versions as of now,
>> seem like an appropriate place from which to expose it, and I didn't
>> find anything else that I was wildly in love with, either.  The
>> attached patch puts it in var.c, because it didn't look horrible and I
>> thought it wasn't worth creating a new file just for this.
>
>> Objections, better ideas?
>
> I think it might be better to define it as a fundamental Bitmapset
> operation in nodes/bitmapset.c, along the lines of
>
> Bitmapset *bms_replace_member(const Bitmapset *bms, int member, int repl);
>
> This API would probably require giving up the property of not copying the
> set unless it changes, but I doubt that that's performance critical.

I thought of that, but I wasn't sure it was good in terms of clarity
to replace a function that works in terms of Relids with one that
works in terms of Bitmapset *.  I know they're the same, so it's just
a style thing.

But actually, I think my email was premature.  His patch series first
changes adjust_relid_set to take different arguments so that it can do
multiple translations at once, and then a later patch in the series
renames it.  So there's no need to merge the definitions, because one of
them ends up going away anyway.  So, uh, never mind.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [COMMITTERS] pgsql: Fix and simplify check for whether we're running as Windows serv

2017-03-17 Thread Tom Lane
Heikki Linnakangas  writes:
> Fix and simplify check for whether we're running as Windows service.

This seems to have broken narwhal:

Creating library file: libpostgres.a
../../src/port/libpgport_srv.a(win32security_srv.o)(.text+0x109): In function 
`pgwin32_is_admin':
C:/msys/1.0/local/pgbuildfarm/buildroot/HEAD/pgsql.build/src/port/win32security.c:77:
 undefined reference to `CheckTokenMembership'
../../src/port/libpgport_srv.a(win32security_srv.o)(.text+0x127):C:/msys/1.0/local/pgbuildfarm/buildroot/HEAD/pgsql.build/src/port/win32security.c:77:
 undefined reference to `CheckTokenMembership'
../../src/port/libpgport_srv.a(win32security_srv.o)(.text+0x264): In function 
`pgwin32_is_service':
C:/msys/1.0/local/pgbuildfarm/buildroot/HEAD/pgsql.build/src/port/win32security.c:138:
 undefined reference to `CheckTokenMembership'
../../src/port/libpgport_srv.a(win32security_srv.o)(.text+0x302):C:/msys/1.0/local/pgbuildfarm/buildroot/HEAD/pgsql.build/src/port/win32security.c:163:
 undefined reference to `CheckTokenMembership'
collect2: ld returned 1 exit status
make[2]: *** [postgres] Error 1

regards, tom lane




[HACKERS] [PATCH v1] Add and report the new "in_hot_standby" GUC pseudo-variable.

2017-03-17 Thread Elvis Pranskevichus
Hi,

Currently, clients wishing to know when the server exits hot standby
have to resort to polling, which is often suboptimal.

This adds the new "in_hot_standby" GUC variable that is reported via a
ParameterStatus message.  This allows the clients to:

   (a) know right away that they are connected to a server in hot
   standby; and

   (b) know immediately when a server exits hot standby.

This change will be most beneficial to various connection pooling
systems (pgpool etc.)


  Elvis
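For illustration, here is how a pooler might consume the proposed report to
route writes without polling. This is a hedged sketch: the real client-side
entry point would be libpq's PQparameterStatus(conn, "in_hot_standby") after
each ParameterStatus message, and the types and helpers below are invented
for the example:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef struct BackendState
{
    bool        in_hot_standby;
} BackendState;

/* Called for every ParameterStatus update received from the server. */
static void
handle_parameter_status(BackendState *st, const char *name, const char *value)
{
    if (strcmp(name, "in_hot_standby") == 0)
        st->in_hot_standby = (strcmp(value, "on") == 0);
}

/* A pooler would route write transactions only to backends for which
 * this returns true. */
static bool
accepts_writes(const BackendState *st)
{
    return !st->in_hot_standby;
}
```

When the server exits hot standby, the reported change flips the flag
immediately, which is the whole point of pushing it via ParameterStatus.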

From bdf9a409005f0ea209c640935d9827369725e241 Mon Sep 17 00:00:00 2001
From: Elvis Pranskevichus 
Date: Fri, 17 Mar 2017 13:25:08 -0400
Subject: [PATCH v1] Add and report the new "in_hot_standby" GUC
 pseudo-variable.

Currently, clients wishing to know when the server exits hot standby
have to resort to polling, which is suboptimal.

This adds the new "in_hot_standby" GUC variable that is reported via a
ParameterStatus message.  This allows the clients to:

   (a) know right away that they are connected to a server in hot
   standby; and

   (b) know immediately when a server exits hot standby.

This change will be most beneficial to various connection pooling
systems.
---
 doc/src/sgml/high-availability.sgml  |  4 +--
 doc/src/sgml/libpq.sgml  |  1 +
 doc/src/sgml/protocol.sgml   |  1 +
 doc/src/sgml/ref/show.sgml   |  9 +++
 src/backend/access/transam/xlog.c|  4 ++-
 src/backend/commands/async.c | 48 
 src/backend/storage/ipc/procarray.c  | 30 ++
 src/backend/storage/ipc/procsignal.c |  3 +++
 src/backend/storage/ipc/standby.c|  9 +++
 src/backend/tcop/postgres.c  |  6 -
 src/backend/utils/init/postinit.c|  6 +
 src/backend/utils/misc/check_guc | 10 
 src/backend/utils/misc/guc.c | 15 +++
 src/include/commands/async.h |  7 ++
 src/include/storage/procarray.h  |  2 ++
 src/include/storage/procsignal.h |  2 ++
 src/include/storage/standby.h|  1 +
 17 files changed, 149 insertions(+), 9 deletions(-)

diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index cc84b911b0..44795c5bcc 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -1831,8 +1831,8 @@ if (!triggered)

 

-Users will be able to tell whether their session is read-only by
-issuing SHOW transaction_read_only.  In addition, a set of
+Users will be able to tell whether their session is in hot standby mode by
+issuing SHOW in_hot_standby.  In addition, a set of
 functions () allow users to
 access information about the standby server. These allow you to write
 programs that are aware of the current state of the database. These
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 4bc5bf3192..367ec4460d 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -1706,6 +1706,7 @@ const char *PQparameterStatus(const PGconn *conn, const char *paramName);
server_encoding,
client_encoding,
application_name,
+   in_hot_standby,
is_superuser,
session_authorization,
DateStyle,
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 7c82b48845..0fafaf6788 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -1123,6 +1123,7 @@
 server_encoding,
 client_encoding,
 application_name,
+in_hot_standby,
 is_superuser,
 session_authorization,
 DateStyle,
diff --git a/doc/src/sgml/ref/show.sgml b/doc/src/sgml/ref/show.sgml
index 46bb239baf..cf21bd961a 100644
--- a/doc/src/sgml/ref/show.sgml
+++ b/doc/src/sgml/ref/show.sgml
@@ -78,6 +78,15 @@ SHOW ALL

 

+IN_HOT_STANDBY
+
+ 
+  True if the server is in Hot Standby mode.
+ 
+
+   
+
+   
 LC_COLLATE
 
  
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index cdb3a8ac1d..acca53b12f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7654,8 +7654,10 @@ StartupXLOG(void)
 	 * Shutdown the recovery environment. This must occur after
 	 * RecoverPreparedTransactions(), see notes for lock_twophase_recover()
 	 */
-	if (standbyState != STANDBY_DISABLED)
+	if (standbyState != STANDBY_DISABLED) {
 		ShutdownRecoveryTransactionEnvironment();
+		SendHotStandbyExitSignal();
+	}
 
 	/* Shut down xlogreader */
 	if (readFile >= 0)
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index e32d7a1d4e..8bc365489a 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -355,6 +355,8 @@ static List *upperPendingNotifies = NIL;		/* list of upper-xact lists */
  */
 volatile sig_atomic_t notifyInterruptPending = false;
 
+volatile sig_atomic_t 

Re: [HACKERS] merging duplicate definitions of adjust_relid_set

2017-03-17 Thread Tom Lane
Robert Haas  writes:
> While reviewing Ashutosh Bapat's partitionwise join code, I noticed
> he'd run up against the problem that adjust_relid_set() is defined as
> static in two different source files, and he wanted to call it from a
> third file.  I didn't much like his solution to that problem, which
> was to rename one of them and make that definition non-static; I think
> it would be better to keep the existing name and stop defining it in
> multiple places.  However, I discovered that there wasn't really an
> obviously-good place to put the function; neither prepunion.c nor
> rewriteManip.c, the two files that contain static versions as of now,
> seem like an appropriate place from which to expose it, and I didn't
> find anything else that I was wildly in love with, either.  The
> attached patch puts it in var.c, because it didn't look horrible and I
> thought it wasn't worth creating a new file just for this.

> Objections, better ideas?

I think it might be better to define it as a fundamental Bitmapset
operation in nodes/bitmapset.c, along the lines of

Bitmapset *bms_replace_member(const Bitmapset *bms, int member, int repl);

This API would probably require giving up the property of not copying the
set unless it changes, but I doubt that that's performance critical.

regards, tom lane




[HACKERS] Re: [COMMITTERS] pgsql: Remove objname/objargs split for referring to objects

2017-03-17 Thread Alvaro Herrera
Peter Eisentraut wrote:

> cc -Wall -Wmissing-prototypes -Wpointer-arith
> -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute
> -Wformat-security -fno-strict-aliasing -fwrapv
> -fexcess-precision=standard -g -O2 -I../../../src/include
> -D_FORTIFY_SOURCE=2 -DLINUX_OOM_ADJ -D_GNU_SOURCE -I/usr/include/libxml2
> -c -o objectaddress.o objectaddress.c
> objectaddress.c: In function ‘get_object_address’:
> objectaddress.c:1650:24: warning: ‘typeoids[1]’ may be used
> uninitialized in this function [-Wmaybe-uninitialized]
> objectaddress.c:1582:6: note: ‘typeoids[1]’ was declared here
> objectaddress.c:1627:43: warning: ‘typenames[1]’ may be used
> uninitialized in this function [-Wmaybe-uninitialized]
> objectaddress.c:1581:12: note: ‘typenames[1]’ was declared here
> objectaddress.c:1650:24: warning: ‘typeoids[0]’ may be used
> uninitialized in this function [-Wmaybe-uninitialized]
> objectaddress.c:1582:6: note: ‘typeoids[0]’ was declared here
> objectaddress.c:1627:43: warning: ‘typenames[0]’ may be used
> uninitialized in this function [-Wmaybe-uninitialized]
> objectaddress.c:1581:12: note: ‘typenames[0]’ was declared here
> 
> cc (Debian 4.7.2-5) 4.7.2

Oh yeah, buildfarm member sitella shows that.  Does the attached patch
fix it? (pretty much the same thing Michael attached, but we also need
to handle typeoids)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/catalog/objectaddress.c 
b/src/backend/catalog/objectaddress.c
index 8ccc171..7f73fc0 100644
--- a/src/backend/catalog/objectaddress.c
+++ b/src/backend/catalog/objectaddress.c
@@ -1578,8 +1578,8 @@ get_object_address_opf_member(ObjectType objtype,
ObjectAddress address;
ListCell   *cell;
List   *copy;
-   TypeName   *typenames[2];
-   Oid typeoids[2];
+   TypeName   *typenames[2] = { NULL, NULL };
+   Oid typeoids[2] = { InvalidOid, InvalidOid };
int membernum;
int i;
 
@@ -1607,6 +1607,9 @@ get_object_address_opf_member(ObjectType objtype,
break;
}
 
+   Assert(typenames[0] != NULL && typenames[1] != NULL);
+   Assert(typeoids[0] != InvalidOid && typeoids[1] != InvalidOid);
+
switch (objtype)
{
case OBJECT_AMOP:



[HACKERS] merging duplicate definitions of adjust_relid_set

2017-03-17 Thread Robert Haas
While reviewing Ashutosh Bapat's partitionwise join code, I noticed
he'd run up against the problem that adjust_relid_set() is defined as
static in two different source files, and he wanted to call it from a
third file.  I didn't much like his solution to that problem, which
was to rename one of them and make that definition non-static; I think
it would be better to keep the existing name and stop defining it in
multiple places.  However, I discovered that there wasn't really an
obviously-good place to put the function; neither prepunion.c nor
rewriteManip.c, the two files that contain static versions as of now,
seem like an appropriate place from which to expose it, and I didn't
find anything else that I was wildly in love with, either.  The
attached patch puts it in var.c, because it didn't look horrible and I
thought it wasn't worth creating a new file just for this.

Objections, better ideas?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


merge-adjust-relid-sets.patch
Description: Binary data



Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-03-17 Thread Tom Lane
Corey Huinker  writes:
>> In the end, I suspect that teaching all the backslash commands to do
>> nothing after absorbing their arguments is likely to be the least messy
>> way to tackle this, even if it makes for a rather bulky patch.

> Perhaps, but just glancing at \connect makes me think that for some
> commands (present or future) the number of args might depend on the value
> of the first arg, and variable expansion-or-not, backtick execution-or-not
> could alter the number of apparent args on the line, like this:

> \set x 'arg1 arg2'

> \if false
> \cmd_that_takes_exactly_two_args :x
> \endif

Yeah, throwing errors for bad arguments would also need to be suppressed.

regards, tom lane




[HACKERS] pageinspect and hash indexes

2017-03-17 Thread Jeff Janes
While trying to figure out some bloating in the newly logged hash indexes,
I'm looking into the type of each page in the index.  But I get an error:

psql -p 9876 -c "select hash_page_type(get_raw_page('foo_index_idx',x))
from generate_series(1650,1650) f(x)"

ERROR:  page is not a hash page
DETAIL:  Expected ff80, got .

The contents of the page are:

\xa400d8f203bf65c91800f01ff01f0420...

(where the elided characters at the end are all zero)

What kind of page is that actually?  And isn't it unhelpful to have the
pageinspect module throw errors, rather than returning a dummy value to
indicate there was an error?

Thanks,

Jeff


Re: [HACKERS] [COMMITTERS] pgsql: Use asynchronous connect API in libpqwalreceiver

2017-03-17 Thread Tom Lane
Petr Jelinek  writes:
> On 17/03/17 17:28, Tom Lane wrote:
>> Yeah, I'm afraid we had better do something more or less like this.
>> It's interesting to speculate about whether WaitEventSet could keep
>> additional state that would avoid the need for a dummy send() every
>> time, but that seems like material for new development not a bug fix.

> Thanks, now that I look at it again I think we might need to do cycle
> with the occurred_events and returned_events and not return on first
> find since theoretically there can be multiple sockets if this gets
> called directly and not via WaitLatchOrSocket(). I don't have mental
> power to finish it up right now though, sorry for that.

I think WaitEventSet is only required to identify at least one reason
for ending the wait; it is not required to identify all of them.
(Even if it tried to do so, the information could be out of date by
the time it returns.)  So I'm not particularly fussed about that,
even if we had code that waited on write-ready for multiple sockets
which I don't believe we do.

regards, tom lane




Re: [HACKERS] [PATCH] Move all am-related reloption code into src/backend/access/[am-name] and get rid of relopt_kind for custom AM

2017-03-17 Thread Alvaro Herrera
I gave this patch a quick skim.  At first I was confused by the term
"catalog"; I thought it meant we stored options in a system table.  But
that's not what is meant at all; instead, what we do is build these
"catalogs" in memory.  Maybe a different term could have been used, but
I'm not sure it's strictly necessary.  I wouldn't really bother.

I'm confused by the "no ALTER, no lock" rule.  Does it mean that
"ALTER..SET" is forbidden?  Because I noticed that brin's
pages_per_range is marked as such, but we do allow that option to change
with ALTER..SET, so there's at least one bug there, and I would be
surprised if there aren't any others.

Please make sure to mark functions as static (e.g. bringetreloptcatalog).

Why not pass ->struct_size and ->postprocess_fun (which I'd rename as
->postprocess_fn, more in line with our naming style) as parameters to
allocateOptionsCatalog?  Also, to avoid repalloc() in most cases (and to
avoid pallocing more options than you're going to need in a bunch of
cases), perhaps that function should take the number of times you expect
to call AddItems for that catalog (since you do it immediately afterwards
in all cases I can see), and allocate that number.  If later another
item arrives, then repalloc using the same code you already have in
AddItems().

Something is wrong with leading whitespace in many places; either you
added too many tabs, or the wrong number of spaces; not sure which, but
visually it's clearly wrong.  ... Actually there are whitespace-style
violations in several places; please fix using pgindent (after adding
any new typedefs you're defining to typedefs.list).

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-17 Thread Tom Lane
Andres Freund  writes:
> That said, it seems this is something that has to wait for a later
> release, I'm putting back in similar logic as there was before (not a
> branch, but change the opcode to a non-checking variant).

Yeah, I was wondering if changing the opcode would be preferable to
a first-time flag.

>> So my recommendation is to drop 0001 and include the same one-time
>> checks that execQual.c currently has as out-of-line one-time checks
>> in the new code.  We can revisit that later, but time grows short for
>> v10.  I would much rather have a solid version of 0004 and not 0001,
>> than not have anything for v10 because we spent too much time dealing
>> with adding new dependencies.

> Doing that (+README).

OK.  I believe that we can get this committed after the documentation
problems are sorted.  I noticed a lot of small things that bugged me,
mostly sloppy comments, but I think that the most efficient way to
handle those is for me to make an editorial pass on your next version.

regards, tom lane




Re: [HACKERS] Parallel Append implementation

2017-03-17 Thread Amit Khandekar
On 16 March 2017 at 18:18, Ashutosh Bapat
 wrote:
> + * Check if we are already finished plans from parallel append. This
> + * can happen if all the subplans are finished when this worker
> + * has not even started returning tuples.
> + */
> +if (node->as_padesc && node->as_whichplan == PA_INVALID_PLAN)
> +return ExecClearTuple(node->ps.ps_ResultTupleSlot);
> From the comment, it looks like this condition will be encountered before the
> backend returns any tuple. But this code is part of the loop which returns the
> tuples. Shouldn't this be outside the loop? Why do we want to check a 
> condition
> for every row returned when the condition can happen only once and that too
> before returning any tuple?

The way ExecProcNode() gets called, there is no different special code
that gets called instead of ExecProcNode() when a tuple is fetched for
the first time. I mean, we cannot prevent ExecProcNode() from getting
called when as_whichplan is invalid right from the beginning.

One thing we can do is : have a special slot in AppenState->as_plan[]
which has some dummy execution node that just returns NULL tuple, and
initially make as_whichplan point to this slot. But I think it is not
worth doing this.

We can instead reduce the if condition to:
if (node->as_whichplan == PA_INVALID_PLAN)
{
    Assert(node->as_padesc != NULL);
    return ExecClearTuple(node->ps.ps_ResultTupleSlot);
}

BTW, regarding the loop which you mentioned returns tuples: the loop is
not for returning tuples, it is for iterating to the next subplan. Even
if we take the condition out and keep it at the beginning of ExecAppend,
the issue will remain.

>
> Why do we need following code in both ExecAppendInitializeWorker() and
> ExecAppendInitializeDSM()? Both of those things happen before starting the
> actual execution, so one of those should suffice?
> +/* Choose the optimal subplan to be executed. */
> +(void) parallel_append_next(node);

ExecAppendInitializeWorker() is for the worker to attach (and then
initialize its own local data) to the dsm area created and shared by
ExecAppendInitializeDSM() in backend. But both worker and backend
need to initialize their own as_whichplan to the next subplan.

>
> There is no pa_num_worker now, so probably this should get updated. Per 
> comment
> we should also get rid of SpinLockAcquire() and SpinLockRelease()?
> + *purpose. The spinlock is used so that it does not change the
> + *pa_num_workers field while workers are choosing the next node.
Will do this.

>
> BTW, sa_finished seems to be a misnomer. The plan is not finished yet, but it
> wants no more workers. So, should it be renamed as sa_no_new_workers or
> something like that?

Actually in this context, "finished" means "we are done with this subplan".

>
> In parallel_append_next() we shouldn't need to call goto_next_plan() twice. If
> the plan indicated by pa_next_plan is finished, all the plans must have
> finished. This should be true if we set pa_next_plan to 0 at the time of
> initialization. Any worker picking up pa_next_plan will set it to the next
> valid plan. So the next worker asking for plan should pick pa_next_plan and
> set it to the next one and so on.

The current patch does not call it twice, but I might have overlooked
something. Let me know if I have.

>
> I am wondering whether goto_next_plan() can be simplified as some modulo
> arithmetic e.g. (whichplan - first_plan)++ % (last_plan - first_plan)
> + first_plan.

Hmm. IMHO it seems like too much calculation for just shifting to the next array element.




Re: [HACKERS] Parallel Append implementation

2017-03-17 Thread Amit Khandekar
On 17 March 2017 at 01:37, Robert Haas  wrote:
> - You've added a GUC (which is good) but not documented it (which is
> bad) or added it to postgresql.conf.sample (also bad).
>
> - You've used a loop inside a spinlock-protected critical section,
> which is against project policy.  Use an LWLock; define and document a
> new builtin tranche ID.
>
> - The comment for pa_finished claims that it is the number of workers
> executing the subplan, but it's a bool, not a count; I think this
> comment is just out of date.

Yes, agreed. Will fix the above.

>
> - paths_insert_sorted_by_cost() is a hand-coded insertion sort.  Can't
> we find a way to use qsort() for this instead of hand-coding a slower
> algorithm?  I think we could just create an array of the right length,
> stick each path into it from add_paths_to_append_rel, and then qsort()
> the array based on .  Then the result can be
> turned into a list.

Yeah, I was in two minds as to whether to do the
copy-to-array-and-qsort thing, or just write the same number of
lines of code to manually do an insertion sort. Actually I was
checking whether we already have a linked-list sort, but it seems we
don't. Will do the qsort now since it would be faster.

>
> - Maybe the new helper functions in nodeAppend.c could get names
> starting with exec_append_, to match the style of
> exec_append_initialize_next().
>
> - There's a superfluous whitespace change in add_paths_to_append_rel.

Will fix this.

>
> - The substantive changes in add_paths_to_append_rel don't look right
> either.  It's not clear why accumulate_partialappend_subpath is
> getting called even in the non-enable_parallelappend case.  I don't
> think the logic for the case where we're not generating a parallel
> append path needs to change at all.

When accumulate_partialappend_subpath() is called for a childrel with
a partial path, it works just like accumulate_append_subpath() when
enable_parallelappend is false. That's why, for partial child path,
the same function is called irrespective of parallel-append or
non-parallel-append case. Maybe mentioning this in the comments should
suffice here?

>
> - When parallel append is enabled, I think add_paths_to_append_rel
> should still consider all the same paths that it does today, plus one
> extra.  The new path is a parallel append path where each subpath is
> the cheapest subpath for that childrel, whether partial or
> non-partial.  If !enable_parallelappend, or if all of the cheapest
> subpaths are partial, then skip this.  (If all the cheapest subpaths
> are non-partial, it's still potentially useful.)

In case of all-partial childrels, the paths are *exactly* same as
those that would have been created for enable_parallelappend=off. The
extra path is there for enable_parallelappend=on only when one or more
of the child rels do not have partial paths. Does this make sense?

> In other words,
> don't skip consideration of parallel append just because you have a
> partial path available for every child rel; it could be

I didn't get this. Are you saying that in the patch it is getting
skipped if enable_parallelappend = off ? I don't think so. For
all-partial child rels, partial append is always created. Only thing
is, in case of enable_parallelappend=off, the Append path is not
parallel_aware, so it executes just like it executes today under
Gather without being parallel-aware.

>
> - I think the way cost_append() works is not right.  What you've got
> assumes that you can just multiply the cost of a partial plan by the
> parallel divisor to recover the total cost, which is not true because
> we don't divide all elements of the plan cost by the parallel divisor
> -- only the ones that seem like they should be divided.

Yes, that was an approximation. For those subpaths for which
there is no parallel_divisor, we cannot calculate the total cost
considering the number of workers for the subpath. I feel we should
consider the per-subpath parallel_workers somehow. The
Path->total_cost for a partial path is *always* per-worker cost,
right?  Just want to confirm this assumption of mine.

> Also, it
> could be smarter about what happens with the costs of non-partial
> paths. I suggest the following algorithm instead.
>
> 1. Add up all the costs of the partial paths.  Those contribute
> directly to the final cost of the Append.  This ignores the fact that
> the Append may escalate the parallel degree, but I think we should
> just ignore that problem for now, because we have no real way of
> knowing what the impact of that is going to be.

I wanted to take into account per-subpath parallel_workers for total
cost of Append. Suppose the partial subpaths have per worker total
costs (3, 3, 3) and their parallel_workers are (2, 8, 4), with 2
Append workers available. So according to what you say, the total cost
is 9. With per-subplan parallel_workers taken into account, total cost
= (3*2 + 3*8 + 3*4)/2 = 21.


Re: [HACKERS] <> join selectivity estimate question

2017-03-17 Thread Tom Lane
I wrote:
> The problem here appears to be that we don't have any MCV list for
> the "twothousand" column (because it has a perfectly flat distribution),
> and the heuristic that eqjoinsel_semi is using for the no-MCVs case
> is falling down badly.

Oh ... wait.  eqjoinsel_semi's charter is to "estimate the fraction of the
LHS relation that has a match".  Well, at least in the given regression
test case, it's satisfying that exactly: they all do.  For instance,
this estimate is dead on:

regression=# explain analyze select * from tenk1 a where exists(select * from 
tenk1 b where a.twothousand = b.twothousand);
                                          QUERY PLAN
----------------------------------------------------------------------------------------------
 Hash Join  (cost=528.00..1123.50 rows=1 width=244) (actual time=9.902..15.102 rows=1 loops=1)
   Hash Cond: (a.twothousand = b.twothousand)

So eqjoinsel_semi is doing exactly what it thinks it's supposed to.

After a bit more thought, it seems like the bug here is that "the
fraction of the LHS that has a non-matching row" is not one minus
"the fraction of the LHS that has a matching row".  In fact, in
this example, *all* LHS rows have both matching and non-matching
RHS rows.  So the problem is that neqjoinsel is doing something
that's entirely insane for semijoin cases.

It would not be too hard to convince me that neqjoinsel should
simply return 1.0 for any semijoin/antijoin case, perhaps with
some kind of discount for nullfrac.  Whether or not there's an
equal row, there's almost always going to be non-equal row(s).
Maybe we can think of a better implementation but that seems
like the zero-order approximation.

regards, tom lane




Re: [HACKERS] [patch] reorder tablespaces in basebackup tar stream for backup_label

2017-03-17 Thread Michael Banck
Hi,

Am Freitag, den 17.03.2017, 20:46 +0900 schrieb Michael Paquier:
> On Fri, Mar 17, 2017 at 7:18 PM, Michael Banck
>  wrote:
> > Am Freitag, den 17.03.2017, 10:50 +0900 schrieb Michael Paquier:
> >> The comment block format is incorrect. I would think as well that this
> >> comment should say it is important to have the main tablespace listed
>> last as it includes the WAL segments, and those need to contain all the
> >> latest WAL segments for a consistent backup.
> >
> > How about the attached? The comment now reads as follows:
> >
> > |Add a node for the base directory. If WAL is included, the base
> > |directory has to be last as the WAL files get appended to it. If WAL
> > |is not included, send the base directory first, so that the
> > |backup_label file is the first file to be sent.
> 
> Close enough, still not that:
> https://www.postgresql.org/docs/current/static/source-format.html

Ouch, third time's a charm then I guess. I hope the attached is finally
correct, thanks for bearing with me.

> >> FWIW, I have no issue with changing the ordering of backups the way
> >> you are proposing here as long as the comment of this code path is
> >> clear.
> >
> > OK, great, let's see what the committers think then.
> 
> Still that's a minor point, so I am making that ready for committer.

Thanks!


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

From c982aa133ab4d982c1a6a267c67d7ca89811d024 Mon Sep 17 00:00:00 2001
From: Michael Banck 
Date: Fri, 17 Mar 2017 11:10:53 +0100
Subject: [PATCH] Reorder tablespaces for non-WAL streaming basebackups.

Currently, the main tablespace is the last to be sent, in order for the WAL
files to be appended to it. This makes the backup_label file (which gets
prepended to the main tablespace) inconveniently end up in the middle of the
basebackup stream if other tablespaces are present.

Change this so that the main tablespace is the first to be sent in case
no WAL files are requested, ensuring that backup_label is the first file
in the stream in this case.
---
 src/backend/replication/basebackup.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index e3a7ad5..520628f 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -233,10 +233,18 @@ perform_base_backup(basebackup_options *opt, DIR *tblspcdir)
 		else
 			statrelpath = pgstat_stat_directory;
 
-		/* Add a node for the base directory at the end */
+		/*
+		 * Add a node for the base directory. If WAL is included, the base
+		 * directory has to be last as the WAL files get appended to it. If WAL
+		 * is not included, send the base directory first, so that the
+		 * backup_label file is the first file to be sent.
+		 */
 		ti = palloc0(sizeof(tablespaceinfo));
 		ti->size = opt->progress ? sendDir(".", 1, true, tablespaces, true) : -1;
-		tablespaces = lappend(tablespaces, ti);
+		if (opt->includewal)
+			tablespaces = lappend(tablespaces, ti);
+		else
+			tablespaces = lcons(ti, tablespaces);
 
 		/* Send tablespace header */
 		SendBackupHeader(tablespaces);
-- 
2.1.4




Re: [HACKERS] <> join selectivity estimate question

2017-03-17 Thread Tom Lane
Robert Haas  writes:
> The relevant code is in neqsel().  It estimates the fraction of rows
> that will be equal, and then does 1 - that number.  Evidently, the
> query planner thinks that l1.l_suppkey = l2.l_suppkey would almost
> always be true, and therefore l1.l_suppkey <> l2.l_suppkey will almost
> always be false.  I think the presumed selectivity of l1.l_suppkey =
> l2.l_suppkey is being computed by var_eq_non_const(), but I'm a little
> puzzled by that function is managing to produce a selectivity estimate
> of, essentially, 1.

No, I believe it's going through neqjoinsel and thence to eqjoinsel_semi.
This query will have been flattened into a semijoin.

I can reproduce a similarly bad estimate in the regression database:

regression=# explain select * from tenk1 a where exists(select * from tenk1 b 
where a.thousand = b.thousand and a.twothousand <> b.twothousand);
                          QUERY PLAN
---------------------------------------------------------------
 Hash Semi Join  (cost=583.00..1067.25 rows=1 width=244)
   Hash Cond: (a.thousand = b.thousand)
   Join Filter: (a.twothousand <> b.twothousand)
   ->  Seq Scan on tenk1 a  (cost=0.00..458.00 rows=1 width=244)
   ->  Hash  (cost=458.00..458.00 rows=1 width=8)
 ->  Seq Scan on tenk1 b  (cost=0.00..458.00 rows=1 width=8)
(6 rows)

The problem here appears to be that we don't have any MCV list for
the "twothousand" column (because it has a perfectly flat distribution),
and the heuristic that eqjoinsel_semi is using for the no-MCVs case
is falling down badly.

regards, tom lane




Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-03-17 Thread Corey Huinker
>
>
> In the end, I suspect that teaching all the backslash commands to do
> nothing after absorbing their arguments is likely to be the least messy
> way to tackle this, even if it makes for a rather bulky patch.
>
>
Perhaps, but just glancing at \connect makes me think that for some
commands (present or future) the number of args might depend on the value
of the first arg, and variable expansion-or-not, backtick execution-or-not
could alter the number of apparent args on the line, like this:

\set x 'arg1 arg2'

\if false
\cmd_that_takes_exactly_two_args :x
\endif


Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-03-17 Thread Fabien COELHO


Hello Tom,


I also fear that there are corner cases where the behavior would still
be inconsistent.  Consider

\if ...
\set foo `echo \endif should not appear here`



In this instance, ISTM that there is no problem. On "\if true", set is
executed, all is well. On "\if false", the whole line would be skipped
because the if-related commands are only expected on their own line, all
is well again. No problem.


AFAICS, you misunderstood the example completely, or else you're proposing
syntax restrictions that are even more bizarre and unintelligible than
I thought before.


Hmmm. The example you put forward does work as expected with the rule I 
suggested. It does not prove that the rules are good or sane, I'm just 
stating that the example would work consistently.


We cannot have a situation where the syntax rules for backslash commands 
inside an \if are fundamentally different from what they are elsewhere;


Indeed, I do not see an issue with requiring some new backslash commands 
to be on their own line: Any average programmer would put them like that 
anyway for readability. What is the point of trying to write code to 
handle strange unmaintainable oneliners?



that's just going to lead to confusion and bug reports.


Whatever is done, there will be some confusion and bug reports:-)

If someone writes a strange one-liner and sees that it generates errors, 
then the error messages should be clear enough. Maybe they will complain 
and file bug reports because they like backslash-command one-liners. That 
is life.


Now you are the committer and Corey is the developer. I'm just a reviewer 
trying to help. I can still review a larger patch which tries to be subtly 
compatible with a lack of a previous clear design by adding code complexity, 
even if I think that this particular effort is a bad idea (i.e. misspent 
resources on a useless sub-feature which makes future maintenance harder). 
With some luck, Corey may find a way of doing it which is not too bad.


--
Fabien.




Re: [HACKERS] [COMMITTERS] pgsql: Use asynchronous connect API in libpqwalreceiver

2017-03-17 Thread Petr Jelinek
On 17/03/17 17:28, Tom Lane wrote:
> Petr Jelinek  writes:
>> Now the documentation for WSAEventSelect says "The FD_WRITE network
>> event is handled slightly differently. An FD_WRITE network event is
>> recorded when a socket is first connected with a call to the connect,
>> ConnectEx, WSAConnect, WSAConnectByList, or WSAConnectByName function or
>> when a socket is accepted with accept, AcceptEx, or WSAAccept function
>> and then after a send fails with WSAEWOULDBLOCK and buffer space becomes
>> available. Therefore, an application can assume that sends are possible
>> starting from the first FD_WRITE network event setting and lasting until
>> a send returns WSAEWOULDBLOCK. After such a failure the application will
>> find out that sends are again possible when an FD_WRITE network event is
>> recorded and the associated event object is set."
> 
>> But while PQconnectPoll does connect() before setting connection status
>> to CONNECTION_STARTED and returns PGRES_POLLING_WRITING so the FD_WRITE
>> eventually happens, it does not do any writes in the code block that
>> switches to CONNECTION_MADE and PGRES_POLLING_WRITING. That means the
>> FD_WRITE event is never recorded, per the quoted documentation.
> 
> Ah-hah!  Great detective work.
> 
>> Then what remains is why it works in libpq. If you look at
>> pgwin32_select which is eventually called by libpq code, it actually
>> handles the situation by trying empty WSASend on any socket that is
>> supposed to wait for FD_WRITE and only calling the
>> WaitForMultipleObjectsEx when WSASend failed with WSAEWOULDBLOCK, when
>> the WSASend succeeds it immediately returns ok.
> 
>> So I did something similar in attached for WaitEventSetWaitBlock() and
>> it indeed solves the issue on my Windows test machine. Now the
>> coding/placement probably isn't the best (and there are no comments),
>> but maybe somebody will find proper place for this now that we know the
>> cause.
> 
> Yeah, I'm afraid we had better do something more or less like this.
> It's interesting to speculate about whether WaitEventSet could keep
> additional state that would avoid the need for a dummy send() every
> time, but that seems like material for new development not a bug fix.
> 
> In any case, a quick review of the code suggests that there are no
> performance-critical places where we wait for WL_SOCKET_WRITEABLE
> unless we've already detected that we have to wait for the network;
> so the dummy send() isn't going to do anything except consume a
> few more CPU cycles before we get to the point of blocking.  It may
> not be worth worrying about.
> 

Thanks, now that I look at it again I think we might need to loop over
the occurred_events and returned_events rather than returning on the
first find, since theoretically there can be multiple sockets if this
gets called directly and not via WaitLatchOrSocket(). I don't have the
mental power to finish it up right now though, sorry for that.

> I'll add some comments and push this.  Does everyone agree that
> this had better get back-patched, too?
> 

+1

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-03-17 Thread Tom Lane
Corey Huinker  writes:
> I think Fabien was arguing that inside a false block there would be no
> syntax rules beyond "is the first non-space character on this line a '\'
> and if so is it followed with a if/elif/else/endif?". If the answer is no,
> skip the line. To me that seems somewhat similar to Tom's suggestion that a
> false branch just keeps consuming text until it encounters a \conditional
> or EOF.

Hmm.  If we can keep the syntax requirements down to "\if and friends
must be the first backslash command on the line", and not change the
apparent behavior for any other command type, it probably would be okay
from the user's standpoint.  I'm not really convinced that this approach
will accomplish that, though, and especially not that it will do so
without injecting some ugliness into the core lexer.

In the end, I suspect that teaching all the backslash commands to do
nothing after absorbing their arguments is likely to be the least messy
way to tackle this, even if it makes for a rather bulky patch.

regards, tom lane




Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-17 Thread Andres Freund
Hi,

On 2017-03-17 11:36:30 -0400, Tom Lane wrote:
> Having said all that, I think that 0001 is contributing very little to the
> goals of this patch set.  Andres stated that he wanted it so as to drop
> some of the one-time checks that execQual.c currently does for Vars, but
> I'm not really convinced that we could do that safely even with these
> additional dependencies in place.  Moreover, I really doubt that there's
> a horrible performance cost from including something like
> 
>   if (unlikely(op->first_execution))
>   out_of_line_checking_subroutine(...);

> in the execution code for Vars.

But it actually does (well, for some relatively small value of horrible).
The issue is that op->first_execution is actually badly predictable,
because it will be set back/forth between executions of different
expressions (in the same plantree).  Obviously you'll not notice if you
have a Var and then some expensive stuff, but it's noticeable for
cheap-ish expressions (say e.g. a single Var).  So the branch prediction
often doesn't handle this gracefully - it also just expands the set of
to-be-tracked jumps.

If we had a decent way to actually check this during ExecInitExpr() (et
al), the whole discussion would be different - I'd be all for expanding the
set of such checks even.  But I couldn't find a decent way to get there
- when expressions are initialized we don't even get an ExprContext (not
to speak of valid slots), nor is parent->plan very helpful.


That said, it seems this is something that has to wait for a later
release, I'm putting back in similar logic as there was before (not a
branch, but change the opcode to a non-checking variant).
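A minimal sketch of that opcode-swap idea (the names are placeholders, not the actual ExprEvalStep machinery): the first-execution variant does its one-time work and then rewrites its own dispatch slot, so steady-state execution carries no first_execution branch at all.

```c
#include <assert.h>

typedef struct ExprStep ExprStep;
struct ExprStep
{
	int			(*fn) (ExprStep *step, int input);
};

/* steady-state variant: no one-time checks */
static int
eval_fast(ExprStep *step, int input)
{
	return input * 2;
}

/* first-execution variant: check once, then replace itself */
static int
eval_first(ExprStep *step, int input)
{
	/* one-time validation would go here */
	step->fn = eval_fast;
	return eval_fast(step, input);
}
```

After the first call the dispatch slot points at eval_fast, so later executions never pay for (or mispredict) a first-run branch.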


> And that certainly isn't adding any
> complexity for JIT compilation that we don't face anyway for other
> execution step types.

Obviously an if (op->first_execution) check isn't an issue; it's actually
doing something only the first time through that's not easily possible.



> So my recommendation is to drop 0001 and include the same one-time
> checks that execQual.c currently has as out-of-line one-time checks
> in the new code.  We can revisit that later, but time grows short for
> v10.  I would much rather have a solid version of 0004 and not 0001,
> than not have anything for v10 because we spent too much time dealing
> with adding new dependencies.

Doing that (+README).

Greetings,

Andres Freund




Re: [HACKERS] [COMMITTERS] pgsql: Use asynchronous connect API in libpqwalreceiver

2017-03-17 Thread Tom Lane
Petr Jelinek  writes:
> Now the documentation for WSAEventSelect says "The FD_WRITE network
> event is handled slightly differently. An FD_WRITE network event is
> recorded when a socket is first connected with a call to the connect,
> ConnectEx, WSAConnect, WSAConnectByList, or WSAConnectByName function or
> when a socket is accepted with accept, AcceptEx, or WSAAccept function
> and then after a send fails with WSAEWOULDBLOCK and buffer space becomes
> available. Therefore, an application can assume that sends are possible
> starting from the first FD_WRITE network event setting and lasting until
> a send returns WSAEWOULDBLOCK. After such a failure the application will
> find out that sends are again possible when an FD_WRITE network event is
> recorded and the associated event object is set."

> But while PQconnectPoll does connect() before setting connection status
> to CONNECTION_STARTED and returns PGRES_POLLING_WRITING so the FD_WRITE
> eventually happens, it does not do any writes in the code block that
> switches to CONNECTION_MADE and PGRES_POLLING_WRITING. That means the
> FD_WRITE event is never recorded, per the quoted documentation.

Ah-hah!  Great detective work.

> Then what remains is why it works in libpq. If you look at
> pgwin32_select which is eventually called by libpq code, it actually
> handles the situation by trying empty WSASend on any socket that is
> supposed to wait for FD_WRITE and only calling the
> WaitForMultipleObjectsEx when WSASend failed with WSAEWOULDBLOCK, when
> the WSASend succeeds it immediately returns ok.

> So I did something similar in attached for WaitEventSetWaitBlock() and
> it indeed solves the issue on my Windows test machine. Now the
> coding/placement probably isn't the best (and there are no comments),
> but maybe somebody will find proper place for this now that we know the
> cause.

Yeah, I'm afraid we had better do something more or less like this.
It's interesting to speculate about whether WaitEventSet could keep
additional state that would avoid the need for a dummy send() every
time, but that seems like material for new development not a bug fix.

In any case, a quick review of the code suggests that there are no
performance-critical places where we wait for WL_SOCKET_WRITEABLE
unless we've already detected that we have to wait for the network;
so the dummy send() isn't going to do anything except consume a
few more CPU cycles before we get to the point of blocking.  It may
not be worth worrying about.

I'll add some comments and push this.  Does everyone agree that
this had better get back-patched, too?

regards, tom lane




[HACKERS] Re: [COMMITTERS] pgsql: Remove objname/objargs split for referring to objects

2017-03-17 Thread Peter Eisentraut
On 3/17/17 11:06, Alvaro Herrera wrote:
> I don't get anything with my compiler (gcc (Debian 4.9.2-10) 4.9.2) and
> I don't see anything in the few logs I looked at (clang, 9.6) from
> buildfarm either.  What are you seeing specifically?

cc -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute
-Wformat-security -fno-strict-aliasing -fwrapv
-fexcess-precision=standard -g -O2 -I../../../src/include
-D_FORTIFY_SOURCE=2 -DLINUX_OOM_ADJ -D_GNU_SOURCE -I/usr/include/libxml2
-c -o objectaddress.o objectaddress.c
objectaddress.c: In function ‘get_object_address’:
objectaddress.c:1650:24: warning: ‘typeoids[1]’ may be used
uninitialized in this function [-Wmaybe-uninitialized]
objectaddress.c:1582:6: note: ‘typeoids[1]’ was declared here
objectaddress.c:1627:43: warning: ‘typenames[1]’ may be used
uninitialized in this function [-Wmaybe-uninitialized]
objectaddress.c:1581:12: note: ‘typenames[1]’ was declared here
objectaddress.c:1650:24: warning: ‘typeoids[0]’ may be used
uninitialized in this function [-Wmaybe-uninitialized]
objectaddress.c:1582:6: note: ‘typeoids[0]’ was declared here
objectaddress.c:1627:43: warning: ‘typenames[0]’ may be used
uninitialized in this function [-Wmaybe-uninitialized]
objectaddress.c:1581:12: note: ‘typenames[0]’ was declared here

cc (Debian 4.7.2-5) 4.7.2
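For anyone reproducing this with a different compiler: -Wmaybe-uninitialized fires whenever gcc cannot prove a variable is assigned on every path before use. A minimal self-contained example of the same warning class (unrelated to objectaddress.c itself):

```c
#include <assert.h>

/*
 * gcc with -O2 -Wall (older versions in particular) may warn that 'v'
 * "may be used uninitialized" here, because it cannot prove the
 * flag == 0 path never reaches the return.  This is the same class of
 * warning quoted above for typeoids[] and typenames[].
 */
static int
pick(int flag)
{
	int			v;

	if (flag)
		v = 42;
	return v;			/* 'v' may be used uninitialized */
}
```

Whether the warning appears depends heavily on compiler version and optimization level, which is why one buildfarm compiler complains and another stays silent.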

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] PATCH: pageinspect / add page_checksum and bt_page_items(bytea)

2017-03-17 Thread Peter Eisentraut
I'm struggling to find a good way to share code between
bt_page_items(text, int4) and bt_page_items(bytea).

If we do it via the SQL route, as I had suggested, it makes the
extension non-relocatable, and it will also create a bit of a mess
during upgrades.

If doing it in C, it will be a bit tricky to pass the SRF context
around.  There is no "DirectFunctionCall within SRF context", AFAICT.

I'm half tempted to just rip out the (text, int4) variants.

In any case, I think we should add bytea variants to all the btree
functions, not just the bt_page_items one.

Attached is the remaining patch after the previous commits.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 203fa8aeb22d56e2d24ba4a5cf0e48ab5253a212 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 17 Mar 2017 12:17:29 -0400
Subject: [PATCH v3] pageinspect: Add bt_page_items function with bytea
 argument

XXX WIP

Author: Tomas Vondra 
Reviewed-by: Ashutosh Sharma 
---
 contrib/pageinspect/btreefuncs.c  | 126 ++
 contrib/pageinspect/expected/btree.out|  13 +++
 contrib/pageinspect/pageinspect--1.5--1.6.sql |  14 +++
 contrib/pageinspect/sql/btree.sql |   4 +
 doc/src/sgml/pageinspect.sgml |  31 +++
 src/include/access/nbtree.h   |   1 +
 6 files changed, 189 insertions(+)

diff --git a/contrib/pageinspect/btreefuncs.c b/contrib/pageinspect/btreefuncs.c
index 6f35e288fd..36185db19a 100644
--- a/contrib/pageinspect/btreefuncs.c
+++ b/contrib/pageinspect/btreefuncs.c
@@ -41,6 +41,7 @@
 
 PG_FUNCTION_INFO_V1(bt_metap);
 PG_FUNCTION_INFO_V1(bt_page_items);
+PG_FUNCTION_INFO_V1(bt_page_items_bytea);
 PG_FUNCTION_INFO_V1(bt_page_stats);
 
 #define IS_INDEX(r) ((r)->rd_rel->relkind == RELKIND_INDEX)
@@ -396,6 +397,131 @@ bt_page_items(PG_FUNCTION_ARGS)
 	}
 }
 
+/*---
+ * bt_page_items_bytea()
+ *
+ * Get IndexTupleData set in a btree page
+ *
+ * Usage: SELECT * FROM bt_page_items(get_raw_page('t1_pkey', 1));
+ *---
+ */
+
+Datum
+bt_page_items_bytea(PG_FUNCTION_ARGS)
+{
+	bytea	   *raw_page = PG_GETARG_BYTEA_P(0);
+	Datum		result;
+	char	   *values[6];
+	HeapTuple	tuple;
+	FuncCallContext *fctx;
+	struct user_args *uargs;
+	int			raw_page_size;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 errmsg("must be superuser to use pageinspect functions")));
+
+	if (SRF_IS_FIRSTCALL())
+	{
+		BTPageOpaque opaque;
+		MemoryContext mctx;
+		TupleDesc	tupleDesc;
+
+		raw_page_size = VARSIZE(raw_page) - VARHDRSZ;
+
+		if (raw_page_size < SizeOfPageHeaderData)
+			ereport(ERROR,
+	(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+  errmsg("input page too small (%d bytes)", raw_page_size)));
+
+		fctx = SRF_FIRSTCALL_INIT();
+		mctx = MemoryContextSwitchTo(fctx->multi_call_memory_ctx);
+
+		uargs = palloc(sizeof(struct user_args));
+
+		uargs->page = VARDATA(raw_page);
+
+		uargs->offset = FirstOffsetNumber;
+
+		opaque = (BTPageOpaque) PageGetSpecialPointer(uargs->page);
+
+		if (P_ISMETA(opaque))
+			ereport(ERROR,
+	(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+  errmsg("block is a meta page")));
+
+		if (P_ISDELETED(opaque))
+			elog(NOTICE, "page is deleted");
+
+		fctx->max_calls = PageGetMaxOffsetNumber(uargs->page);
+
+		/* Build a tuple descriptor for our result type */
+		if (get_call_result_type(fcinfo, NULL, &tupleDesc) != TYPEFUNC_COMPOSITE)
+			elog(ERROR, "return type must be a row type");
+
+		fctx->attinmeta = TupleDescGetAttInMetadata(tupleDesc);
+
+		fctx->user_fctx = uargs;
+
+		MemoryContextSwitchTo(mctx);
+	}
+
+	fctx = SRF_PERCALL_SETUP();
+	uargs = fctx->user_fctx;
+
+	if (fctx->call_cntr < fctx->max_calls)
+	{
+		ItemId		id;
+		IndexTuple	itup;
+		int			j;
+		int			off;
+		int			dlen;
+		char	   *dump;
+		char	   *ptr;
+
+		id = PageGetItemId(uargs->page, uargs->offset);
+
+		if (!ItemIdIsValid(id))
+			elog(ERROR, "invalid ItemId");
+
+		itup = (IndexTuple) PageGetItem(uargs->page, id);
+
+		j = 0;
+		values[j++] = psprintf("%d", uargs->offset);
+		values[j++] = psprintf("(%u,%u)",
+			   BlockIdGetBlockNumber(&(itup->t_tid.ip_blkid)),
+			   itup->t_tid.ip_posid);
+		values[j++] = psprintf("%d", (int) IndexTupleSize(itup));
+		values[j++] = psprintf("%c", IndexTupleHasNulls(itup) ? 't' : 'f');
+		values[j++] = psprintf("%c", IndexTupleHasVarwidths(itup) ? 't' : 'f');
+
+		ptr = (char *) itup + IndexInfoFindDataOffset(itup->t_info);
+		dlen = IndexTupleSize(itup) - IndexInfoFindDataOffset(itup->t_info);
+		dump = palloc0(dlen * 3 + 1);
+		values[j] = dump;
+		for (off = 0; off < dlen; off++)
+		{
+			if (off > 0)
+*dump++ = ' ';
+			sprintf(dump, "%02x", *(ptr + off) & 0xff);
+			dump += 2;
+		}
+
+		tuple = 

Re: [HACKERS] QGIS Seem To Bypass PostgreSQL/PostGIS User Privileges/Permissions

2017-03-17 Thread Osahon Oduware
Hi John,

Thanks for your response. From my experience as a Software Developer, I
don't think QGIS can logon to my database/schema/table with a dedicated
user as they need authentication to do so. What you described about using a
dedicated user in applications is only possible because that user was
created in the database server and granted necessary privileges/permissions.

On Fri, Mar 17, 2017 at 5:09 PM, John Scalia  wrote:

> While I do not know QGIS, I'm wondering if it's similar to some of our
> applications where they always use the same system login for the database
> while each user provides a unique login to the application. Have you ever
> set log_connections in your postgresql.conf file? That would show you which
> user is connecting during your attempts, and they might very well be
> something you're not expecting. As far as I know, there is no way for any
> application to bypass PostgreSQL's internal security model.
>
> On Fri, Mar 17, 2017 at 11:43 AM, Osahon Oduware 
> wrote:
>
>> Hi Giuseppe,
>>
>> Thanks for the response. I have provided the GRANTS and other PostgreSQL
>> setup scripts below as it includes what you have suggested:
>>
>> ROLE
>> ----
>> CREATE ROLE [role_name] WITH NOLOGIN NOSUPERUSER INHERIT NOCREATEDB
>> NOCREATEROLE NOREPLICATION;
>>
>> USER
>> ----
>> CREATE USER [user_name] WITH PASSWORD '[password]'
>>
>> REVOKES
>> -------
>> REVOKE ALL PRIVILEGES ON DATABASE [database_name] FROM PUBLIC;
>> REVOKE ALL PRIVILEGES ON DATABASE postgres FROM PUBLIC;
>>
>> GRANTS
>> ------
>> GRANT CONNECT ON DATABASE [database_name] TO [user_name];
>> GRANT USAGE ON SCHEMA [schema_name] TO [role_name];
>> GRANT SELECT ON ALL TABLES IN SCHEMA [schema_name] TO [role_name];
>> GRANT [role_name] TO [user_name];
>>
>> I cannot but wonder why these privileges are working when tested in
>> pgAdmin/pgsql, but not in QGIS with the same user/schema/table/database.
>>
>> On Fri, Mar 17, 2017 at 4:16 PM, Giuseppe Broccolo <
>> giuseppe.brocc...@2ndquadrant.it> wrote:
>>
>>> Hi Osahon,
>>>
>>> 2017-03-17 15:54 GMT+01:00 Osahon Oduware :
>>>
 Hi All,

 I created a "Read-only" User in PostgreSQL via a Role with "SELECT"
 ONLY privilege on all tables in a schema as shown below:

 GRANT SELECT ON ALL TABLES IN SCHEMA [schema_name] TO [role_name]
 GRANT [role_name] TO [user_name]

>>>
>>> I'd have done this as follows:
>>>
>>> REVOKE ALL ON SCHEMA [schema_name] FROM PUBLIC;
>>> GRANT USAGE ON SCHEMA [schema_name] TO [role_name];
>>> GRANT SELECT ON ALL TABLES IN SCHEMA [schema_name] TO [role_name];
>>> GRANT [role_name] TO [user_name];
>>>
>>>

 Next, I test this by trying to UPDATE a column in a table (same schema
 as above) with pgAdmin/psql and this works fine by giving a response that
 the user has no permission - 'ERROR: permission denied for relation
 .'

 Next, I connect with the same user in QGIS and add a layer from the
 same table (same schema as above). I open the attribute table for the
 layer, turn on editing mode (by clicking on the pencil-like icon), and edit
 the same field/column above. To my surprise, the edit was saved
 successfully without any permission error prompt.

 Next, I check the value of the field/column (same table/schema as
 above) in pgAdmin/psql and it is having the new (edited) value from QGIS.
 This is rather strange as it seems QGIS is bypassing the permissions set
 for the same user in the PostgreSQL/PostGIS database.

 I will be glad if someone can help me unravel this mystery.

>>>
>>> Check which user is used the first time you connect to the database
>>> through QGIS, and if you switch the user to [user_name] in a second moment.
>>> I'm wondering if you are keeping some privileges from a previous session.
>>>
>>> All the best,
>>> Giuseppe.
>>>
>>> --
>>> Giuseppe Broccolo - 2ndQuadrant Italy
>>> PostgreSQL & PostGIS Training, Services and Support
>>> giuseppe.brocc...@2ndquadrant.it | www.2ndQuadrant.it
>>>
>>
>>
>


Re: [HACKERS] <> join selectivity estimate question

2017-03-17 Thread Robert Haas
On Fri, Mar 17, 2017 at 1:54 AM, Thomas Munro
 wrote:
>  SELECT *
>FROM lineitem l1
>   WHERE EXISTS (SELECT *
>   FROM lineitem l2
>  WHERE l1.l_orderkey = l2.l_orderkey);
>
>  -> estimates 59986012 rows, actual rows 59,986,052 (scale 10 TPCH)
>
>  SELECT *
>FROM lineitem l1
>   WHERE EXISTS (SELECT *
>   FROM lineitem l2
>  WHERE l1.l_orderkey = l2.l_orderkey
>AND l1.l_suppkey <> l2.l_suppkey);
>
>  -> estimates 1 row, actual rows 57,842,090 (scale 10 TPCH)

The relevant code is in neqsel().  It estimates the fraction of rows
that will be equal, and then does 1 - that number.  Evidently, the
query planner thinks that l1.l_suppkey = l2.l_suppkey would almost
always be true, and therefore l1.l_suppkey <> l2.l_suppkey will almost
always be false.  I think the presumed selectivity of l1.l_suppkey =
l2.l_suppkey is being computed by var_eq_non_const(), but I'm a little
puzzled by how that function is managing to produce a selectivity estimate
of, essentially, 1.
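The arithmetic is easy to see in isolation: if the <> selectivity is computed as one minus the estimated = selectivity, an equality estimate near 1.0 collapses the <> row estimate to almost nothing, and the planner's usual clamp turns that into the single-row estimate seen in the plan. A toy sketch (numbers illustrative, not neqsel()'s actual inputs):

```c
#include <assert.h>

/* <> selectivity as the complement of the = selectivity */
static double
neq_selectivity(double eq_sel)
{
	return 1.0 - eq_sel;
}

/* clamp a row estimate to at least one row, as the planner does */
static double
clamp_rows(double rows)
{
	return rows < 1.0 ? 1.0 : rows;
}
```

With eq_sel effectively 1.0, ~59.9 million rows times a near-zero <> selectivity clamps straight to 1 — versus the 57.8 million rows actually returned.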

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] QGIS Seem To Bypass PostgreSQL/PostGIS User Privileges/Permissions

2017-03-17 Thread Giuseppe Broccolo
Hi all,

2017-03-17 17:09 GMT+01:00 John Scalia :

> While I do not know QGIS, I'm wondering if it's similar to some of our
> applications where they always use the same system login for the database
> while each user provides a unique login to the application. Have you ever
> set log_connections in your postgresql.conf file? That would show you which
> user is connecting during your attempts, and they might very well be
> something you're not expecting. As far as I know, there is no way for any
> application to bypass PostgreSQL's internal security model.
>

Exactly, so I repost here my initial question:

On Fri, Mar 17, 2017 at 4:16 PM, Giuseppe Broccolo <
> giuseppe.brocc...@2ndquadrant.it> wrote:
>
>>
>>
>> Check which user is used the first time you connect to the database
>> through QGIS, and if you switch the user to [user_name] in a second moment.
>> I'm wondering if you are keeping some privileges from a previous session.
>>
>
Regards,
Giuseppe.

-- 
Giuseppe Broccolo - 2ndQuadrant Italy
PostgreSQL & PostGIS Training, Services and Support
giuseppe.brocc...@2ndquadrant.it | www.2ndQuadrant.it


Re: [HACKERS] QGIS Seem To Bypass PostgreSQL/PostGIS User Privileges/Permissions

2017-03-17 Thread John Scalia
While I do not know QGIS, I'm wondering if it's similar to some of our
applications where they always use the same system login for the database
while each user provides a unique login to the application. Have you ever
set log_connections in your postgresql.conf file? That would show you which
user is connecting during your attempts, and they might very well be
something you're not expecting. As far as I know, there is no way for any
application to bypass PostgreSQL's internal security model.

On Fri, Mar 17, 2017 at 11:43 AM, Osahon Oduware 
wrote:

> Hi Giuseppe,
>
> Thanks for the response. I have provided the GRANTS and other PostgreSQL
> setup scripts below as it includes what you have suggested:
>
> ROLE
> ----
> CREATE ROLE [role_name] WITH NOLOGIN NOSUPERUSER INHERIT NOCREATEDB
> NOCREATEROLE NOREPLICATION;
>
> USER
> ----
> CREATE USER [user_name] WITH PASSWORD '[password]'
>
> REVOKES
> -------
> REVOKE ALL PRIVILEGES ON DATABASE [database_name] FROM PUBLIC;
> REVOKE ALL PRIVILEGES ON DATABASE postgres FROM PUBLIC;
>
> GRANTS
> ------
> GRANT CONNECT ON DATABASE [database_name] TO [user_name];
> GRANT USAGE ON SCHEMA [schema_name] TO [role_name];
> GRANT SELECT ON ALL TABLES IN SCHEMA [schema_name] TO [role_name];
> GRANT [role_name] TO [user_name];
>
> I cannot but wonder why these privileges are working when tested in
> pgAdmin/pgsql, but not in QGIS with the same user/schema/table/database.
>
> On Fri, Mar 17, 2017 at 4:16 PM, Giuseppe Broccolo <
> giuseppe.brocc...@2ndquadrant.it> wrote:
>
>> Hi Osahon,
>>
>> 2017-03-17 15:54 GMT+01:00 Osahon Oduware :
>>
>>> Hi All,
>>>
>>> I created a "Read-only" User in PostgreSQL via a Role with "SELECT" ONLY
>>> privilege on all tables in a schema as shown below:
>>>
>>> GRANT SELECT ON ALL TABLES IN SCHEMA [schema_name] TO [role_name]
>>> GRANT [role_name] TO [user_name]
>>>
>>
>> I'd have done this as follows:
>>
>> REVOKE ALL ON SCHEMA [schema_name] FROM PUBLIC;
>> GRANT USAGE ON SCHEMA [schema_name] TO [role_name];
>> GRANT SELECT ON ALL TABLES IN SCHEMA [schema_name] TO [role_name];
>> GRANT [role_name] TO [user_name];
>>
>>
>>>
>>> Next, I test this by trying to UPDATE a column in a table (same schema
>>> as above) with pgAdmin/psql and this works fine by giving a response that
>>> the user has no permission - 'ERROR: permission denied for relation
>>> .'
>>>
>>> Next, I connect with the same user in QGIS and add a layer from the same
>>> table (same schema as above). I open the attribute table for the layer,
>>> turn on editing mode (by clicking on the pencil-like icon), and edit the
>>> same field/column above. To my surprise, the edit was saved successfully
>>> without any permission error prompt.
>>>
>>> Next, I check the value of the field/column (same table/schema as above)
>>> in pgAdmin/psql and it is having the new (edited) value from QGIS. This is
>>> rather strange as it seems QGIS is bypassing the permissions set for the
>>> same user in the PostgreSQL/PostGIS database.
>>>
>>> I will be glad if someone can help me unravel this mystery.
>>>
>>
>> Check which user is used the first time you connect to the database
>> through QGIS, and if you switch the user to [user_name] in a second moment.
>> I'm wondering if you are keeping some privileges from a previous session.
>>
>> All the best,
>> Giuseppe.
>>
>> --
>> Giuseppe Broccolo - 2ndQuadrant Italy
>> PostgreSQL & PostGIS Training, Services and Support
>> giuseppe.brocc...@2ndquadrant.it | www.2ndQuadrant.it
>>
>
>


Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-03-17 Thread Corey Huinker
On Fri, Mar 17, 2017 at 11:42 AM, Tom Lane  wrote:

> Fabien COELHO  writes:
> >> I also fear that there are corner cases where the behavior would still
> >> be inconsistent.  Consider
> >>
> >> \if ...
> >> \set foo `echo \endif should not appear here`
>
> > In this instance, ISTM that there is no problem. On "\if true", set is
> > executed, all is well. On "\if false", the whole line would be skipped
> > because the if-related commands are only expected on their own line, all
> > is well again. No problem.
>
> AFAICS, you misunderstood the example completely, or else you're proposing
> syntax restrictions that are even more bizarre and unintelligible than
> I thought before.  We cannot have a situation where the syntax rules for
> backslash commands inside an \if are fundamentally different from what
> they are elsewhere; that's just going to lead to confusion and bug
> reports.
>
> regards, tom lane
>

I think Fabien was arguing that inside a false block there would be no
syntax rules beyond "is the first non-space character on this line a '\'
and if so is it followed with a if/elif/else/endif?". If the answer is no,
skip the line. To me that seems somewhat similar to Tom's suggestion that a
false branch just keeps consuming text until it encounters a \conditional
or EOF.
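That rule, read literally, is simple to state in code. A hedged sketch of the line-skipping test (not psql's actual lexer; word boundaries and quoting are deliberately ignored for brevity):

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Under the proposed false-branch rule, a line needs to be examined at
 * all only if its first non-space character starts a conditional
 * backslash command; every other line is skipped outright.
 */
static int
is_conditional_line(const char *line)
{
	while (isspace((unsigned char) *line))
		line++;
	if (*line++ != '\\')
		return 0;
	return strncmp(line, "if", 2) == 0 ||
		strncmp(line, "elif", 4) == 0 ||
		strncmp(line, "else", 4) == 0 ||
		strncmp(line, "endif", 5) == 0;
}
```

Note how a line like `\set foo \`echo \endif\`` is skipped wholesale under this rule — the embedded \endif is never seen, which is precisely the behavior being debated upthread.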


Re: [HACKERS] logical decoding of two-phase transactions

2017-03-17 Thread Robert Haas
On Thu, Mar 16, 2017 at 10:34 PM, Craig Ringer  wrote:
> On 17 March 2017 at 08:10, Stas Kelvich  wrote:
>> While working on this I've spotted quite a nasty corner case with an
>> aborted prepared transaction. I have some not-that-great ideas for how
>> to fix it, but maybe I blurred my view and missed something. So I want
>> to ask here first.
>>
>> Suppose we created a table, then in 2pc tx we are altering it and after that 
>> aborting tx.
>> So pg_class will have something like this:
>>
>> xmin | xmax | relname
>> 100  | 200  | mytable
>> 200  | 0    | mytable
>>
>> After previous abort, tuple (100,200,mytable) becomes visible and if we will 
>> alter table
>> again then xmax of first tuple will be set current xid, resulting in 
>> following table:
>>
>> xmin | xmax | relname
>> 100  | 300  | mytable
>> 200  | 0    | mytable
>> 300  | 0    | mytable
>>
>> In that moment we’ve lost information that first tuple was deleted by our 
>> prepared tx.
>
> Right. And while the prepared xact has aborted, we don't control when
> it aborts and when those overwrites can start happening. We can and
> should check if a 2pc xact is aborted before we start decoding it so
> we can skip decoding it if it's already aborted, but it could be
> aborted *while* we're decoding it, then have data needed for its
> snapshot clobbered.
>
> This hasn't mattered in the past because prepared xacts (and
> especially aborted 2pc xacts) have never needed snapshots, we've never
> needed to do something from the perspective of a prepared xact.
>
> I think we'll probably need to lock the 2PC xact so it cannot be
> aborted or committed while we're decoding it, until we finish decoding
> it. So we lock it, then check if it's already aborted/already
> committed/in progress. If it's aborted, treat it like any normal
> aborted xact. If it's committed, treat it like any normal committed
> xact. If it's in progress, keep the lock and decode it.

But that lock could need to be held for an unbounded period of time -
as long as decoding takes to complete - which seems pretty
undesirable.  Worse still, the same problem will arise if you
eventually want to start decoding ordinary, non-2PC transactions that
haven't committed yet, which I think is something we definitely want
to do eventually; the current handling of bulk loads or bulk updates
leads to significant latency.  You're not going to be able to tell an
active transaction that it isn't allowed to abort until you get done
with it, and I don't really think you should be allowed to lock out
2PC aborts for long periods of time either.  That's going to stink for
users.
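The overwrite quoted at the top of the thread can be sketched as a toy model. This is plain Python, not PostgreSQL internals; row versions are just (xmin, xmax) pairs, and the point is only that the aborted 2PC deleter xid is unrecoverable once a later ALTER overwrites it:

```python
# Toy model of the pg_class tuple-overwrite problem described above.
# NOT PostgreSQL code; an illustration of why the aborted 2PC tx's
# deleter xid gets lost.
def alter_table(tuples, xid, aborted_xids):
    """Mark the visible row version as deleted by xid, insert a new one."""
    for i, (xmin, xmax) in enumerate(tuples):
        # a version is visible if it was never deleted, or its deleter aborted
        if xmax == 0 or xmax in aborted_xids:
            tuples[i] = (xmin, xid)  # old xmax is overwritten and lost
            break
    tuples.append((xid, 0))

versions = [(100, 0)]                # CREATE TABLE by xid 100
aborted = set()
alter_table(versions, 200, aborted)  # ALTER inside the 2PC transaction 200
aborted.add(200)                     # ROLLBACK PREPARED
alter_table(versions, 300, aborted)  # a later ALTER by xid 300

print(versions)  # [(100, 300), (200, 0), (300, 0)]
# xid 200 no longer appears as the deleter of the first version, so a
# decoder building a snapshot for the 2PC tx can no longer see that it did.
```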

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Adding the optional clause 'AS' in CREATE TRIGGER

2017-03-17 Thread Surafel Temesgen
>
>
> I am sending the review of this patch

I found the following

-  Use  tag in documentation

-  Don't modify existing test cases; add new ones instead

-  The comment in pg_constraint.c is too long; make it shorter

-  The error message could be more helpful if it mentioned the general rule

-  Wrong result in a concurrency case

   Steps to reproduce the result:

1.   Build with --enable-cassert and with
-DCATCACHE_FORCE_RELEASE=1

2.   Run these commands as setup:

  CREATE TABLE my_table1 (id integer, name text);

  CREATE TABLE my_table2 (id integer);

3.   CREATE FUNCTION my_deleteproc1() RETURNS trigger AS $$
  begin
  DELETE FROM my_table2 WHERE id=OLD.id;
  RETURN NULL;
  end;$$ LANGUAGE plpgsql;

4.   INSERT INTO my_table1 VALUES(323, 'Alex');

  INSERT INTO my_table1 VALUES(23, 'Teddy');

  INSERT INTO my_table1 VALUES(38, 'Bob');

  INSERT INTO my_table2 VALUES(323);

  INSERT INTO my_table2 VALUES(23);

  INSERT INTO my_table2 VALUES(38);

5. CREATE OR REPLACE TRIGGER my_regular_trigger AFTER DELETE ON my_table1
 FOR EACH ROW
 EXECUTE PROCEDURE my_deleteproc1();

6. Attach a debugger to your session set a breakpoint at
ExecARDeleteTriggers

7. Run this in your session

 DELETE FROM my_table1 WHERE id=323;

8. start another session and run:

 CREATE OR REPLACE TRIGGER my_regular_trigger AFTER INSERT ON my_table1
 FOR EACH ROW
 EXECUTE PROCEDURE my_deleteproc1();

9. Exit the debugger to release the first session

 and observe the result:

 postgres=# SELECT * FROM my_table1;

  id | name
 ----+-------
  23 | Teddy
  38 | Bob
 (2 rows)

 postgres=# SELECT * FROM my_table2;

  id
 -----
  323
   23
   38
 (3 rows)


Id number 323 should not be there in my_table2;



Regards

Surafel


Re: [HACKERS] [COMMITTERS] pgsql: Use asynchronous connect API in libpqwalreceiver

2017-03-17 Thread Petr Jelinek
On 17/03/17 16:07, Tom Lane wrote:
> Andrew Dunstan  writes:
>> I have confirmed on jacana Petr's observation that adding a timeout to
>> the WaitLatchOrSocket cures the problem.
> 
> Does anyone have a theory as to why that cures the problem?
> 

I now have a theory and even a PoC patch for it.

The long wait in WaitLatchOrSocket happens after the connection state
switches from CONNECTION_STARTED to CONNECTION_MADE; in both cases the
return value of PQconnectPoll is PGRES_POLLING_WRITING, so we wait for
FD_WRITE.

Now the documentation for WSAEventSelect says "The FD_WRITE network
event is handled slightly differently. An FD_WRITE network event is
recorded when a socket is first connected with a call to the connect,
ConnectEx, WSAConnect, WSAConnectByList, or WSAConnectByName function or
when a socket is accepted with accept, AcceptEx, or WSAAccept function
and then after a send fails with WSAEWOULDBLOCK and buffer space becomes
available. Therefore, an application can assume that sends are possible
starting from the first FD_WRITE network event setting and lasting until
a send returns WSAEWOULDBLOCK. After such a failure the application will
find out that sends are again possible when an FD_WRITE network event is
recorded and the associated event object is set."

But while PQconnectPoll does connect() before setting connection status
to CONNECTION_STARTED and returns PGRES_POLLING_WRITING so the FD_WRITE
eventually happens, it does not do any writes in the code block that
switches to CONNECTION_MADE and PGRES_POLLING_WRITING. That means the
FD_WRITE event is never recorded, as per the quoted documentation.

Then what remains is why it works in libpq. If you look at
pgwin32_select, which is eventually called by libpq code, it actually
handles the situation by trying an empty WSASend on any socket that is
supposed to wait for FD_WRITE, and only calling
WaitForMultipleObjectsEx when the WSASend fails with WSAEWOULDBLOCK;
when the WSASend succeeds it immediately returns ok.

So I did something similar in attached for WaitEventSetWaitBlock() and
it indeed solves the issue on my Windows test machine. Now the
coding/placement probably isn't the best (and there are no comments),
but maybe somebody will find a proper place for this now that we know
the cause.
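The edge-triggered FD_WRITE semantics described above can be sketched as a toy state machine. This is plain Python, not the Windows API; it only illustrates why a second wait hangs when nothing re-records the event, and how a pgwin32_select-style zero-byte-send probe sidesteps the wait:

```python
# Toy model of WSAEventSelect's edge-triggered FD_WRITE: the event is
# recorded once at connect time, and again only after a send fails with
# WSAEWOULDBLOCK.  Waiting a second time without such a failed send
# never wakes up -- the hang described above.
class Socket:
    def __init__(self):
        self.fd_write_recorded = True   # recorded by connect()
        self.blocked = False            # send buffer full?

    def send_would_block(self):
        # WSAEWOULDBLOCK would cause FD_WRITE to be recorded later,
        # when buffer space frees up
        return self.blocked

def wait_for_fd_write(sock):
    if sock.fd_write_recorded:
        sock.fd_write_recorded = False  # waiting consumes the event
        return "woke"
    return "hang"                       # no event recorded: infinite wait

def wait_with_probe(sock):
    # pgwin32_select-style workaround: probe with an empty send first,
    # and only fall back to waiting when it reports WSAEWOULDBLOCK
    if not sock.send_would_block():
        return "writable"
    return wait_for_fd_write(sock)

s = Socket()
print(wait_for_fd_write(s))  # "woke": consumes the connect-time event
print(wait_for_fd_write(s))  # "hang": nothing ever re-records FD_WRITE
print(wait_with_probe(s))    # "writable": the probe avoids waiting at all
```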

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index ea7f930..300b866 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -1401,6 +1401,37 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
 			WaitEventAdjustWin32(set, cur_event);
 			cur_event->reset = false;
 		}
+
+		if (cur_event->events & WL_SOCKET_WRITEABLE)
+		{
+			char		c;
+			WSABUF		buf;
+			DWORD		sent;
+			int			r;
+
+			buf.buf = &c;
+			buf.len = 0;
+
+			r = WSASend(cur_event->fd, &buf, 1, &sent, 0, NULL, NULL);
+			if (r == 0)
+			{
+				occurred_events->pos = cur_event->pos;
+				occurred_events->user_data = cur_event->user_data;
+				occurred_events->events = WL_SOCKET_WRITEABLE;
+				occurred_events->fd = cur_event->fd;
+				return 1;
+			}
+			else
+			{
+				if (WSAGetLastError() != WSAEWOULDBLOCK)
+					/*
+					 * Not completed, and not just "would block", so an error
+					 * occurred
+					 */
+					elog(ERROR, "failed writability check on socket: error code %u",
+						 WSAGetLastError());
+			}
+		}
 	}
 
 	/*



Re: [HACKERS] QGIS Seem To Bypass PostgreSQL/PostGIS User Privileges/Permissions

2017-03-17 Thread Osahon Oduware
Hi Giuseppe,

Thanks for the response. I have provided the GRANTS and other PostgreSQL
setup scripts below as it includes what you have suggested:

ROLE
----
CREATE ROLE [role_name] WITH NOLOGIN NOSUPERUSER INHERIT NOCREATEDB
NOCREATEROLE NOREPLICATION;

USER
----
CREATE USER [user_name] WITH PASSWORD '[password]'

REVOKES
-------
REVOKE ALL PRIVILEGES ON DATABASE [db_name] FROM PUBLIC;
REVOKE ALL PRIVILEGES ON DATABASE postgres FROM PUBLIC;

GRANTS
------
GRANT CONNECT ON DATABASE [db_name] TO [role_name];
GRANT USAGE ON SCHEMA [schema_name] TO [role_name];
GRANT SELECT ON ALL TABLES IN SCHEMA [schema_name] TO [role_name];
GRANT [role_name] TO [user_name];

I cannot but wonder why these privileges are working when tested in
pgAdmin/pgsql, but not in QGIS with the same user/schema/table/database.

On Fri, Mar 17, 2017 at 4:16 PM, Giuseppe Broccolo <
giuseppe.brocc...@2ndquadrant.it> wrote:

> Hi Osahon,
>
> 2017-03-17 15:54 GMT+01:00 Osahon Oduware :
>
>> Hi All,
>>
>> I created a "Read-only" User in PostgreSQL via a Role with "SELECT" ONLY
>> privilege on all tables in a schema as shown below:
>>
>> GRANT SELECT ON ALL TABLES IN SCHEMA [schema_name] TO [role_name]
>> GRANT [role_name] TO [user_name]
>>
>
> I'd have done this as follows:
>
> REVOKE ALL ON SCHEMA [schema_name] FROM PUBLIC;
> GRANT USAGE ON SCHEMA [schema_name] TO [role_name];
> GRANT SELECT ON ALL TABLES IN SCHEMA [schema_name] TO [role_name];
> GRANT [role_name] TO [user_name];
>
>
>>
>> Next, I test this by trying to UPDATE a column in a table (same schema as
>> above) with pgAdmin/psql and this works fine by giving a response that the
>> user has no permission - 'ERROR: permission denied for relation
>> .'
>>
>> Next, I connect with the same user in QGIS and add a layer from the same
>> table (same schema as above). I open the attribute table for the layer,
>> turn on editing mode (by clicking on the pencil-like icon), and edit the
>> same field/column above. To my surprise, the edit was saved successfully
>> without any permission error prompt.
>>
>> Next, I check the value of the field/column (same table/schema as above)
>> in pgAdmin/psql and it is having the new (edited) value from QGIS. This is
>> rather strange as it seems QGIS is bypassing the permissions set for the
>> same user in the PostgreSQL/PostGIS database.
>>
>> I will be glad if someone can help me unravel this mystery.
>>
>
> Check which user is used the first time you connect to the database
> through QGIS, and whether you switch the user to [user_name] at a later
> point. I'm wondering if you are keeping some privileges from a previous session.
>
> All the best,
> Giuseppe.
>
> --
> Giuseppe Broccolo - 2ndQuadrant Italy
> PostgreSQL & PostGIS Training, Services and Support
> giuseppe.brocc...@2ndquadrant.it | www.2ndQuadrant.it
>


Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-03-17 Thread Tom Lane
Fabien COELHO  writes:
>> I also fear that there are corner cases where the behavior would still
>> be inconsistent.  Consider
>> 
>> \if ...
>> \set foo `echo \endif should not appear here`

> In this instance, ISTM that there is no problem. On "\if true", set is 
> executed, all is well. On "\if false", the whole line would be skipped 
> because the if-related commands are only expected on their own line, all 
> is well again. No problem.

AFAICS, you misunderstood the example completely, or else you're proposing
syntax restrictions that are even more bizarre and unintelligible than
I thought before.  We cannot have a situation where the syntax rules for
backslash commands inside an \if are fundamentally different from what
they are elsewhere; that's just going to lead to confusion and bug
reports.

regards, tom lane




Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-03-17 Thread Corey Huinker
>
>
> command.c:38:25: fatal error: conditional.h: No such file or directory
>  #include "conditional.h"
>

Odd, it's listed as a new file in git status. Anyway, my point of posting
the WIP patch was to give people a reference point and spark discussion
about the next step, and it succeeded at that.


Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-17 Thread Tom Lane
Andres Freund  writes:
> [ latest patches ]

I looked through 0001 (the composite-type-dependencies one).  Although
I agree that it'd be good to tighten things up in that area, I do not
think we want this specific patch: it tightens things too much.  Consider
this variant of the existing test case in create_view.sql:

create table tt (f1 int, f2 int, f3 int);
create function ttf() returns setof tt as 'select * from tt' language sql;
create view vv as select f3 from ttf();
alter table tt drop column f3;

Current code allows the above (and causes the view to return nulls
afterwards).  The 0001 patch would forbid the DROP, which is fine,
but it would also forbid dropping either of the other two table
columns, which I think is insupportable.

Also, given the above table and function, consider

create view vvv as select ttf();

This view returns a 3-column composite type.  Now do

alter table tt add column f4 int;

Now the view returns a 4-column composite type.  But at this point the
patch will let you drop the f4 column, but not any of the earlier three.
That's just weird.

So I'm unhappy with the specific decisions made in 0001.  I think what we
really want there, probably, is for find_expr_references_walker to do more
than nothing with Vars referencing non-RELATION RTEs.

Having said all that, I think that 0001 is contributing very little to the
goals of this patch set.  Andres stated that he wanted it so as to drop
some of the one-time checks that execQual.c currently does for Vars, but
I'm not really convinced that we could do that safely even with these
additional dependencies in place.  Moreover, I really doubt that there's
a horrible performance cost from including something like

if (unlikely(op->first_execution))
out_of_line_checking_subroutine(...);

in the execution code for Vars.  And that certainly isn't adding any
complexity for JIT compilation that we don't face anyway for other
execution step types.

So my recommendation is to drop 0001 and include the same one-time
checks that execQual.c currently has as out-of-line one-time checks
in the new code.  We can revisit that later, but time grows short for
v10.  I would much rather have a solid version of 0004 and not 0001,
than not have anything for v10 because we spent too much time dealing
with adding new dependencies.

regards, tom lane




Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-03-17 Thread Fabien COELHO


Hello Tom,


>> ISTM that I've tried to suggest to work around that complexity by:
>>   - document that \if-related commands should only occur at line start
>> (and extend to eol).
>>   - detect and complain when this is not the case.

> I think this is a lousy definition, and would never be considered if we
> were working in a green field.


Yes, sure. As you pointed out, the field is not green: there is no clean 
lexical convention, too bad. I'm trying to deal with that without too much 
fuss in the code.



> Moreover, preventing such cases would be pretty darn ugly/messy as well.
>
> I also fear that there are corner cases where the behavior would still
> be inconsistent.  Consider
>
> \if ...
> \set foo `echo \endif should not appear here`


In this instance, ISTM that there is no problem. On "\if true", set is 
executed, all is well. On "\if false", the whole line would be skipped 
because the if-related commands are only expected on their own line, all 
is well again. No problem.


Another more interesting one would be:

  \if ...
\unset foo \endif

On true, unset gets its argument, then endif is detected as a backslash 
command, but it would see that it is not on its own line, so it would 
error out *and* be ignored. On false, the whole line would be ignored, it 
would just not complain, but it would be the same, i.e. it is *not* an 
\endif again. The drawback is only that the wrong \endif is not detected 
when under a false branch. That is why I added a third bullet "call border 
cases a feature".


ISTM that the proposed simple rules make it possible to deal with the 
situation without having to dive into each command's lexing rules, or 
changing the existing code significantly. The drawback is that misplaced 
\endifs are not detected in a false branch, but they are ignored anyway, 
which is fine.



> I'm imagining that instead of
>
> [...] char   *envvar = psql_scan_slash_option(scan_state,
>
> we'd write
>
> [...] char   *envvar = args[0];
>
> where the args array had been filled at the top of the function.
> The top-of-function code would have to know all the cases where
> commands didn't use basic OT_NORMAL processing, but there aren't
> that many of those, I think.


Yep, I understood the idea. There are a few of those, about 49 OT_* in 
"command.c", including 34 OT_NORMAL, 1 OT_NO_EVAL, 3 OT_FILEPIPE, 9 
OT_WHOLELINE, some OT_SQLHACKID & OT_SQLID. I'm not sure of the 
combinations.


It still means splitting command lexing knowledge in several places. I'm 
not convinced by the impact on the resulting code with regard to 
readability and maintainability, so if there could be a way to get 
something without taking that path that would be nice, hence my 
suggestions.


--
Fabien.




[HACKERS] Misleading bgworker log message

2017-03-17 Thread Mike Blackwell
A handful of rather surprising errors showed up in our log extract this
morning, along the lines of:

2017-03-17 05:01:55 CDT [5400]: [1-1] @ FATAL:  57P01: terminating
connection due to administrator command

After a moment of more than a little astonishment, a look at the full log
revealed that, while the message implies some type of intervention from a
human, what's actually happening is the termination of parallel query
worker processes due to a statement timeout.

I gather this is happening because the same mechanism is used to cancel
the process in both cases.  If that's indeed the case, I wonder if
it's possible to rephrase the termination message so it does not
suggest an admin was involved.

__
*Mike Blackwell | Technical Analyst, Distribution Services/Rollout
Management | RR Donnelley*
1750 Wallace Ave | St Charles, IL 60174-3401
Office: 630.313.7818
mike.blackw...@rrd.com
http://www.rrdonnelley.com





Re: [HACKERS] QGIS Seem To Bypass PostgreSQL/PostGIS User Privileges/Permissions

2017-03-17 Thread Giuseppe Broccolo
Hi Osahon,

2017-03-17 15:54 GMT+01:00 Osahon Oduware :

> Hi All,
>
> I created a "Read-only" User in PostgreSQL via a Role with "SELECT" ONLY
> privilege on all tables in a schema as shown below:
>
> GRANT SELECT ON ALL TABLES IN SCHEMA [schema_name] TO [role_name]
> GRANT [role_name] TO [user_name]
>

I'd have done this as follows:

REVOKE ALL ON SCHEMA [schema_name] FROM PUBLIC;
GRANT USAGE ON SCHEMA [schema_name] TO [role_name];
GRANT SELECT ON ALL TABLES IN SCHEMA [schema_name] TO [role_name];
GRANT [role_name] TO [user_name];


>
> Next, I test this by trying to UPDATE a column in a table (same schema as
> above) with pgAdmin/psql and this works fine by giving a response that the
> user has no permission - 'ERROR: permission denied for relation
> .'
>
> Next, I connect with the same user in QGIS and add a layer from the same
> table (same schema as above). I open the attribute table for the layer,
> turn on editing mode (by clicking on the pencil-like icon), and edit the
> same field/column above. To my surprise, the edit was saved successfully
> without any permission error prompt.
>
> Next, I check the value of the field/column (same table/schema as above)
> in pgAdmin/psql and it is having the new (edited) value from QGIS. This is
> rather strange as it seems QGIS is bypassing the permissions set for the
> same user in the PostgreSQL/PostGIS database.
>
> I will be glad if someone can help me unravel this mystery.
>

Check which user is used the first time you connect to the database through
QGIS, and whether you switch the user to [user_name] at a later point. I'm
wondering if you are keeping some privileges from a previous session.
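For what it's worth, a couple of queries run from inside the suspect QGIS session can confirm which role is actually in use and what it is allowed to do (the bracketed names are placeholders, as above):

```sql
-- Run these in the same session QGIS opened:
SELECT current_user, session_user;

-- Should return false for a genuinely read-only role:
SELECT has_table_privilege('[user_name]',
                           '[schema_name].[table_name]', 'UPDATE');
```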

All the best,
Giuseppe.

-- 
Giuseppe Broccolo - 2ndQuadrant Italy
PostgreSQL & PostGIS Training, Services and Support
giuseppe.brocc...@2ndquadrant.it | www.2ndQuadrant.it


Re: [HACKERS] [COMMITTERS] pgsql: Use asynchronous connect API in libpqwalreceiver

2017-03-17 Thread Tom Lane
Andrew Dunstan  writes:
> I have confirmed on jacana Petr's observation that adding a timeout to
> the WaitLatchOrSocket cures the problem.

Does anyone have a theory as to why that cures the problem?

What length of timeout is being suggested here?  Would a long timeout
perhaps create a performance issue?  That is, I'm wondering if the
wait actually sleeps to the end of the timeout in the problem case(s).

regards, tom lane




[HACKERS] Re: [COMMITTERS] pgsql: Remove objname/objargs split for referring to objects

2017-03-17 Thread Alvaro Herrera
Peter Eisentraut wrote:
> On 3/16/17 11:56, Alvaro Herrera wrote:
> > Michael Paquier wrote:
> > 
> >> What are you using as CFLAGS? As both typenames should be normally
> >> set, what about initializing those fields with NULL and add an
> >> assertion like the attached?
> > 
> > Actually, my compiler was right -- this was an ancient bug I introduced
> > in 9.5 (commit a61fd533), and this new warning was my compiler being a
> > bit smarter now for some reason.  The problem is we were trying to
> > extract String value from a TypeName node, which obviously doesn't work
> > very well.
> > 
> > I pushed a real fix, not just a compiler-silencer, along with a few
> > lines in object_address.sql to make sure it works properly.  Maybe we
> > need a few more tests cases for other parts of pg_get_object_address.
> > 
> > Pushed fix, backpatched to 9.5.
> 
> I am now seeing the warnings that Michael was reporting *after* your
> commit in 9.5 and 9.6.

Do you mean the ones I reported?

I don't get anything with my compiler (gcc (Debian 4.9.2-10) 4.9.2) and
I don't see anything in the few logs I looked at (clang, 9.6) from
buildfarm either.  What are you seeing specifically?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Microvacuum support for Hash Index

2017-03-17 Thread Ashutosh Sharma
On Fri, Mar 17, 2017 at 6:13 PM, Amit Kapila  wrote:
> On Fri, Mar 17, 2017 at 12:27 PM, Ashutosh Sharma  
> wrote:
>> On Fri, Mar 17, 2017 at 8:20 AM, Amit Kapila  wrote:
>>
>> As I said in my previous e-mail, I think you need
>>> to record clearing of this flag in WAL record XLOG_HASH_DELETE as you
>>> are not doing this unconditionally and then during replay clear it
>>> only when the WAL record indicates the same.
>>
>> Thank you so much for putting that point. I too think that we should
>> record the flag status in the WAL record and clear it only when
>> required during replay.
>>
>
> I think hashdesc.c needs an update (refer case XLOG_HASH_DELETE:).

Done. Thanks!

>
> - * flag. Clearing this flag is just a hint; replay won't redo this.
> + * flag. Clearing this flag is just a hint; replay will check the
> + * status of clear_dead_marking flag before redo it.
>   */
>   if (tuples_removed && *tuples_removed > 0 &&
>   opaque->hasho_flag & LH_PAGE_HAS_DEAD_TUPLES)
> + {
>   opaque->hasho_flag &= ~LH_PAGE_HAS_DEAD_TUPLES;
> + clear_dead_marking = true;
> + }
>
>
> I feel the above comment is not required as you are logging this
> action explicitly.

That's right. I have removed it in the attached v4 patch.

>
> + bool clear_dead_marking; /* TRUE if VACUUM clears
>
> No need to write VACUUM explicitly, you can simply say "TRUE if this
> operation clears ...".

Corrected. Please find the attached v4 patch.

--
With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com


0001-Reset-LH_PAGE_HAS_DEAD_TUPLES-flag-during-replay-v4.patch
Description: binary/octet-stream



Re: [HACKERS] PATCH: pageinspect / add page_checksum and bt_page_items(bytea)

2017-03-17 Thread Peter Eisentraut
I have committed the page_checksum function, will work on the bt stuff next.

I left in the superuser check, because I was not confident how well
pg_checksum_page() would handle messed up data.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[HACKERS] QGIS Seem To Bypass PostgreSQL/PostGIS User Privileges/Permissions

2017-03-17 Thread Osahon Oduware
Hi All,

I created a "Read-only" User in PostgreSQL via a Role with "SELECT" ONLY
privilege on all tables in a schema as shown below:

GRANT SELECT ON ALL TABLES IN SCHEMA [schema_name] TO [role_name]
GRANT [role_name] TO [user_name]

Next, I test this by trying to UPDATE a column in a table (same schema as
above) with pgAdmin/psql and this works fine by giving a response that the
user has no permission - 'ERROR: permission denied for relation
.'

Next, I connect with the same user in QGIS and add a layer from the same
table (same schema as above). I open the attribute table for the layer,
turn on editing mode (by clicking on the pencil-like icon), and edit the
same field/column above. To my surprise, the edit was saved successfully
without any permission error prompt.

Next, I check the value of the field/column (same table/schema as above) in
pgAdmin/psql and it is having the new (edited) value from QGIS. This is
rather strange as it seems QGIS is bypassing the permissions set for the
same user in the PostgreSQL/PostGIS database.

I will be glad if someone can help me unravel this mystery.


Re: [HACKERS] [COMMITTERS] pgsql: Use asynchronous connect API in libpqwalreceiver

2017-03-17 Thread Andrew Dunstan


On 03/08/2017 08:29 PM, Andrew Dunstan wrote:
>
> On 03/08/2017 08:33 AM, Peter Eisentraut wrote:
>> On 3/6/17 09:38, Peter Eisentraut wrote:
>>> On 3/4/17 01:45, Petr Jelinek wrote:
 If that's the case, the attached should fix it, but I have no way of
 testing it on windows, I can only say that it still works on my machine
 so at least it hopefully does not make things worse.
>>> Committed that.  Let's see how it goes.
>> So that didn't work.  Now we probably need someone to dig into that host
>> directly.
>>
>> Andrew, could you help?
>>
>
> I'll take a look when I get a chance. Might be a few days.
>



I have confirmed on jacana Petr's observation that adding a timeout to
the WaitLatchOrSocket cures the problem.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] Two phase commit in ECPG

2017-03-17 Thread Michael Meskes
> Thank you for pointing out.
> Yeah, I agree that the twophase regression test should not be
> performed by default as long as the default value of
> max_prepared_transactions is 0. Similar to this, the isolation check
> regression test does same thing. Attached patch removes sql/twophase
> from schedule file and add new type of regression test.

Would it be possible to include it in "make check"? Any check that
needs manual interaction will not be executed nearly as often as the
others and becomes much less useful imo.

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Meskes at (Debian|Postgresql) dot Org
Jabber: michael at xmpp dot meskes dot org
VfL Borussia! Força Barça! SF 49ers! Use Debian GNU/Linux, PostgreSQL




Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-03-17 Thread Tom Lane
Fabien COELHO  writes:
> ISTM that I've tried to suggest to work around that complexity by:
>   - document that \if-related commands should only occur at line start
> (and extend to eol).
>   - detect and complain when this is not the case.

I think this is a lousy definition, and would never be considered if we
were working in a green field.  Moreover, preventing such cases would be
pretty darn ugly/messy as well.

I also fear that there are corner cases where the behavior would still
be inconsistent.  Consider

\if ...
\set foo `echo \endif should not appear here`

If the \if succeeds, the result of the second line would be to set foo
to "endif should not appear here" (and we'd remain in the \if block).
But if the \if fails and we need to skip the \set command, any approach
that involves changing the argument parsing rules will fail to recognize
the backtick construct, and then will see the \endif as a command.
Similar examples can be constructed using \copy.
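A toy sketch of that corner case (Python, nothing to do with psql's actual lexer) shows why a skipper that does not understand backticks misfires on the quoted line:

```python
import re

# The problematic line from the example above:
line = r"\set foo `echo \endif should not appear here`"

# A naive skipper that just scans skipped lines for \endif "finds" one
# inside the backtick argument:
naive_sees_endif = r"\endif" in line

# A scanner that treats backquoted spans as opaque (as the real lexer
# does when the branch is live) does not:
lexer_aware = r"\endif" in re.sub(r"`[^`]*`", "``", line)

print(naive_sees_endif, lexer_aware)  # True False
```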

It's possible that we could keep the implementation that uses an early exit
from exec_command() if we were to move argument collection for all
backslash commands up to the start of the function.  It would still be
a bit invasive, but perhaps not too awful: I'm imagining that instead of

else if (strcmp(cmd, "setenv") == 0)
{
char   *envvar = psql_scan_slash_option(scan_state,
OT_NORMAL, NULL, false);
char   *envval = psql_scan_slash_option(scan_state,
OT_NORMAL, NULL, false);

we'd write

else if (strcmp(cmd, "setenv") == 0)
{
char   *envvar = args[0];
char   *envval = args[1];

where the args array had been filled at the top of the function.
The top-of-function code would have to know all the cases where
commands didn't use basic OT_NORMAL processing, but there aren't
that many of those, I think.

regards, tom lane




Re: [HACKERS] PATCH: Make pg_stop_backup() archive wait optional

2017-03-17 Thread David Steele
On 3/17/17 4:18 AM, Tsunakawa, Takayuki wrote:
> From: pgsql-hackers-ow...@postgresql.org
>> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Tsunakawa,
>> Takayuki
>> I made this ready for committer.  The patch applied except for catversion.h,
>> the patch content looks good, and the target test passed as follows:
> 
> Sorry, I reverted this to waiting for author, because make check failed.  
> src/test/regress/expected/opr_sanity.out seems to need revision because 
> pg_stop_backup() was added.

Well, that's embarrassing.  When I recreated the function to add
defaults I messed up the AS clause and did not pay attention to the
results of the regression tests, apparently.

Attached is a new version rebased on 88e66d1.  Catalog version bump has
also been omitted.

Thanks,
-- 
-David
da...@pgmasters.net
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 9408a25..4dc30ca 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -18355,7 +18355,7 @@ SELECT set_config('log_statement_stats', 'off', false);
   
   

-pg_stop_backup(exclusive boolean)
+pg_stop_backup(exclusive boolean , wait_for_archive boolean )
 
setof record
Finish performing exclusive or non-exclusive on-line backup 
(restricted to superusers by default, but other users can be granted EXECUTE to 
run the function)
@@ -18439,7 +18439,13 @@ postgres=# select pg_start_backup('label_goes_here');
 pg_start_backup. In a non-exclusive backup, the contents of
 the backup_label and tablespace_map are returned
 in the result of the function, and should be written to files in the
-backup (and not in the data directory).
+backup (and not in the data directory).  There is an optional second
+parameter of type boolean.  If false, the pg_stop_backup
+will return immediately after the backup is completed without waiting for
+WAL to be archived.  This behavior is only useful for backup
+software which independently monitors WAL archiving. Otherwise, WAL
+required to make the backup consistent might be missing and make the backup
+useless.

 

diff --git a/src/backend/access/transam/xlogfuncs.c 
b/src/backend/access/transam/xlogfuncs.c
index 96aa15e..1ef9f2b 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -190,6 +190,7 @@ pg_stop_backup_v2(PG_FUNCTION_ARGS)
boolnulls[3];
 
boolexclusive = PG_GETARG_BOOL(0);
+   boolwait_for_archive = PG_GETARG_BOOL(1);
XLogRecPtr  stoppoint;
 
/* check to see if caller supports us returning a tuplestore */
@@ -232,7 +233,7 @@ pg_stop_backup_v2(PG_FUNCTION_ARGS)
 	 * Stop the exclusive backup, and since we're in an exclusive backup
 	 * return NULL for both backup_label and tablespace_map.
 */
-   stoppoint = do_pg_stop_backup(NULL, true, NULL);
+   stoppoint = do_pg_stop_backup(NULL, wait_for_archive, NULL);
exclusive_backup_running = false;
 
nulls[1] = true;
@@ -250,7 +251,7 @@ pg_stop_backup_v2(PG_FUNCTION_ARGS)
 * Stop the non-exclusive backup. Return a copy of the backup 
label
 * and tablespace map so they can be written to disk by the 
caller.
 */
-   stoppoint = do_pg_stop_backup(label_file->data, true, NULL);
+   stoppoint = do_pg_stop_backup(label_file->data, 
wait_for_archive, NULL);
nonexclusive_backup_running = false;
cancel_before_shmem_exit(nonexclusive_base_backup_cleanup, 
(Datum) 0);
 
diff --git a/src/backend/catalog/system_views.sql 
b/src/backend/catalog/system_views.sql
index b6552da..c2b0bed 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -988,6 +988,12 @@ CREATE OR REPLACE FUNCTION
   RETURNS pg_lsn STRICT VOLATILE LANGUAGE internal AS 'pg_start_backup'
   PARALLEL RESTRICTED;
 
+CREATE OR REPLACE FUNCTION pg_stop_backup (
+exclusive boolean, wait_for_archive boolean DEFAULT true,
+OUT lsn pg_lsn, OUT labelfile text, OUT spcmapfile text)
+  RETURNS SETOF record STRICT VOLATILE LANGUAGE internal as 'pg_stop_backup_v2'
+  PARALLEL RESTRICTED;
+
 -- legacy definition for compatibility with 9.3
 CREATE OR REPLACE FUNCTION
   json_populate_record(base anyelement, from_json json, use_json_as_text 
boolean DEFAULT false)
@@ -1088,7 +1094,7 @@ AS 'jsonb_insert';
 -- available to superuser / cluster owner, if they choose.
 REVOKE EXECUTE ON FUNCTION pg_start_backup(text, boolean, boolean) FROM public;
 REVOKE EXECUTE ON FUNCTION pg_stop_backup() FROM public;
-REVOKE EXECUTE ON FUNCTION pg_stop_backup(boolean) FROM public;
+REVOKE EXECUTE ON FUNCTION pg_stop_backup(boolean, boolean) FROM public;
 REVOKE EXECUTE ON FUNCTION pg_create_restore_point(text) FROM public;
 REVOKE EXECUTE 

Re: [HACKERS] wait events for disk I/O

2017-03-17 Thread Rushabh Lathia
Thanks Robert for the review.

On Thu, Mar 16, 2017 at 8:05 PM, Robert Haas  wrote:

> On Thu, Mar 16, 2017 at 8:28 AM, Rahila Syed 
> wrote:
> > Thank you for the updated patch.
> >
> > I have applied and tested it on latest sources and the patch looks good
> to
> > me.
>
> The documentation puts the new wait events in a pretty random order.
> I think they should be alphabetized, like we do with the IPC events.
>

Done.


> I also suggest we change the naming scheme so that the kind of thing
> being operated on is first and this is followed by the operation name.
> This will let us keep related entries next to each other after
> alphabetizing.  So with that principle in mind:
>
>
Yes, the above naming scheme is clearer than the one I chose.

- instead of ReadDataBlock etc. I propose DataFileRead, DataFileWrite,
> DataFileSync, DataFileExtend, DataFileFlush, DataFilePrefetch,
> DataFileTruncate.  using file instead of block avoids singular/plural
> confusion.
> - instead of RelationSync and RelationImmedSync I proposed
> DataFileSync and DataFileImmediateSync; these are md.c operations like
> the previous set, so why name it differently?
>

Yes, you are right, DataFileSync and DataFileImmediateSync make more
sense.


> - instead of WriteRewriteDataBlock and SyncRewriteDataBlock and
> TruncateLogicalMappingRewrite, which aren't consistent with each other
> even though they are related, I propose LogicalRewriteWrite,
> LogicalRewriteSync, and LogicalRewriteTruncate, which are also closer
> to the names of the functions that contain those wait points
> - for ReadBuffile and WriteBuffile seem OK, I propose BufFileRead and
> BufFileWrite, again reversing the order and also tweaking the
> capitalization
> - in keeping with our new policy of referring to xlog as wal in user
> visible interfaces, I propose WALRead, WALCopyRead, WALWrite,
> WALInitWrite, WALCopyWrite, WALBootstrapWrite, WALInitSync,
> WALBootstrapSync, WALSyncMethodAssign
> - for control file ops, ControlFileRead, ControlFileWrite,
> ControlFileWriteUpdate, ControlFileSync, ControlFileSyncUpdate
>

- ReadApplyLogicalMapping and friends seem to have to do with the
> reorderbuffer code, so maybe ReorderBufferRead etc.
> - there seems to be some discrepancy between the documentation and
> pgstat_get_wait_io for the snapbuild stuff.  maybe SnapBuildWrite,
> SnapBuildSync, SnapBuildRead.
> - SLRURead, SLRUWrite, etc.
> - TimelineHistoryRead, etc.
> - the syslogger changes should be dropped, since the syslogger is not
> and should not be connected to shared memory
>

Ok removed.


> - the replslot terminology seems like a case of odd capitalization and
> overeager abbreviation.  why not ReplicationSlotRead,
> ReplicationSlotWrite, etc?  similarly RelationMapRead,
> RelationMapWrite, etc?
> - CopyFileRead, CopyFileWrite
> - LockFileCreateRead, etc.
> - AddToDataDirLockFileRead is a little long and incomprehensible;
> maybe LockFileUpdateRead etc.
>

How about LockFileAddToDataDirRead? Even though it's a little long, it
gives a clear understanding of what's going on.


> - DSMWriteZeroBytes, maybe?
>
>
DSMFillZeroWrite? Basically I want to keep the file IO operation at the end of
the event name.


> Of course the constants should be renamed to match.
>

I tried to cover all the suggestions in the attached latest patch.
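The grouping benefit of the object-first scheme is easy to see with a short sketch (illustrative only; the names are the ones proposed in this thread):

```python
# Sorting demo: with "object first, operation second" naming, events on
# the same kind of file end up adjacent once the table is alphabetized.
operation_first = ["ReadDataBlock", "WriteDataBlock",
                   "ReadBuffile", "WriteBuffile"]
object_first = ["DataFileRead", "DataFileWrite",
                "BufFileRead", "BufFileWrite"]

print(sorted(operation_first))  # Read*/Write* interleave the file kinds
print(sorted(object_first))     # BufFile* and DataFile* cluster together
```

After alphabetizing, related entries sit next to each other in the documentation table, which is the point Robert made above.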


Thanks,
Rushabh Lathia
www.EnterpriseDB.com
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 9eaf43a..acf310e 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -716,6 +716,12 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
   point.
  
 
+
+ 
+  IO: The server process is waiting for an IO to complete.
+  wait_event will identify the specific wait point.
+ 
+

   
  
@@ -1272,6 +1278,271 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
  RecoveryApplyDelay
  Waiting to apply WAL at recovery because it is delayed.
 
+
+ IO
+ BufFileRead
+ Waiting during buffer read operation.
+
+
+ BufFileWrite
+ Waiting during buffer write operation.
+
+
+ CheckpointLogicalRewriteSync
+ Waiting to sync logical mapping during a checkpoint for logical rewrite mappings.
+
+
+ ControlFileRead
+ Waiting to read the control file.
+
+
+ ControlFileSync
+ Waiting to sync the control file.
+
+
+ ControlFileSyncUpdate
+ Waiting to sync the control file during update control file.
+
+
+ ControlFileWrite
+ Waiting to write the control file.
+
+
+ ControlFileWriteUpdate
+ Waiting to write the control file during update control file.
+

Re: [HACKERS] Renaming of pg_xlog and pg_clog

2017-03-17 Thread Robert Haas
On Thu, Mar 16, 2017 at 10:21 PM, Michael Paquier
 wrote:
> On Fri, Mar 17, 2017 at 11:17 AM, Robert Haas  wrote:
>> I understand that the point of renaming pg_clog to pg_xact is that
>> pg_clog contains the dreaded letters l-o-g, which we hypothesize
>> causes DBAs to remove it.  (Alternate hypothesis: "So, that's what's
>> clogging my database!")
>>
>> Renaming pg_subtrans to pg_subxact has no such redeeming properties.
>>
>> More, with each of these renamings, we're further separating what
>> things are called in the code (xlog, clog, subtrans) with what they're
>> called in the filesystem (wal, xact, subxact).
>>
>> So if we must rename pg_clog, OK, but can't we leave pg_subtrans
>> alone?  It's not hurting anybody.
>
> The only argument behind the renaming of pg_subtrans is really
> consistency with pg_xact, because both deal with transactions. I don't
> personally mind if this portion of the renaming is left off, as you
> say anything labelled with "log" is at the origin of this thread.

Fine!  I've committed the pg_clog renaming, but I'd really like to
draw the line here.  I'm not going to commit the pg_subtrans ->
pg_subxact naming and am -1 on anyone else doing so.  I think that
having the names of things in the code match what shows up in the file
system is an awfully desirable property which we should preserve
insofar as we can do so without compromising other goals.  Not
invalidating what users and DBAs think they know about how PostgreSQL
works is a good thing, too; there are a lot more people who know
something about the directory layout than will read this thread.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Renaming of pg_xlog and pg_clog

2017-03-17 Thread Michael Paquier
On Fri, Mar 17, 2017 at 10:58 PM, Robert Haas  wrote:
> Fine!  I've committed the pg_clog renaming, but I'd really like to
> draw the line here.  I'm not going to commit the pg_subtrans ->
> pg_subxact naming and am -1 on anyone else doing so.  I think that
> having the names of things in the code match what shows up in the file
> system is an awfully desirable property which we should preserve
> insofar as we can do so without compromising other goals.  Not
> invalidating what users and DBAs think they know about how PostgreSQL
> works is a good thing, too; there are a lot more people who know
> something about the directory layout than will read this thread.

I'm cool with that, thanks for the commit.
-- 
Michael




[HACKERS] Guidelines for GSoC student proposals

2017-03-17 Thread Kevin Grittner
I've found various sources that give hints about what a student
proposal should look like, but nothing I could just give as a link,
so I pulled together what I could find, tempered by my own ideas and
opinions.  I suggest that we send the below, or something like it to
each student who expresses interest in making a proposal, or who
submits a proposal that doesn't meet the below guidelines.  Thoughts
or suggestions for changes before we do?  Remember, time is short,
so this cannot be a 200 message bike-shedding debate -- we just need
to provide some sort of guidance to students in a timely way, with
the timeline being:

  February 27 - March 20
Potential student participants discuss application ideas with
mentoring organizations
  March 20 16:00 UTC
Student application period opens
  April 3 16:00 UTC
Student application deadline

Each GSoC student proposal should be a PDF file of 6 to 8 pages.  In
the end, Google will publish these documents on a web page, so the
student should make each proposal something which they will be happy
to have future potential employers review.

Some ideas for desirable content:

  - A resume or CV of the student, including any prior GSoC work
  - Their reasons for wanting to participate
  - What else they have planned for the summer, and what their time
commitment to the GSoC work will be
  - A clear statement that there will be no intellectual property
problems with the work they will be doing -- that the PostgreSQL
community will be able to use their work without encumbrances
(e.g., there should be no agreements related to prior or
ongoing work which might assign the rights to the work they do
to someone else)
  - A description of what they will do, and how
  - Milestones with dates
  - What they consider to be the test that they have successfully
completed the project

Note that a student proposal is supposed to be far more detailed
than the ideas for projects provided by the organization -- those
are intended to be ideas for what the student might write up as a
proposal, not ready-to-go proposal documents.




Re: [HACKERS] PATCH: Configurable file mode mask

2017-03-17 Thread David Steele
On 3/15/17 3:00 AM, Tsunakawa, Takayuki wrote:
> From: pgsql-hackers-ow...@postgresql.org
>> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of David Steele
>>> But it might be worth thinking about whether we want to encourage
>>> people to do manual chmod's at all; that's fairly easy to get wrong,
>>> particularly given the difference in X bits that should be applied to
>>> files and directories.  Another approach that could be worth
>>> considering is a PGC_POSTMASTER GUC with just two states (group access
>>> or not) and make it the postmaster's responsibility to do the
>>> equivalent of chmod -R to make the file tree match the GUC.  I think
>>> we do a tree scan anyway for other purposes, so correcting any wrong
>>> file permissions might not be much added work in the normal case.
>>
>> The majority of scanning is done in recovery (to find and remove unlogged
>> tables) and I'm not sure we would want to add that overhead to normal 
>> startup.
> 
> I'm on David's side, too.  I don't postmaster to always scan all files at 
> startup.
> 
> On the other hand, just doing "chmod -R $PGDATA" is not enough, because chmod 
> doesn't follow the symbolic links.  Symbolic links are used for pg_tblspc/* 
> and pg_wal at least.  FYI, MySQL's manual describes the pitfall like this:

Good point - I think we'll need to add that to the docs as well.
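Tom's earlier suggestion quoted above (have the server apply the group-access mode across the tree itself), combined with the symlink caveat, can be sketched roughly as below. The function name and the 0750/0640 group-access modes are illustrative assumptions, not from the patch; the key behavior is that, like chmod -R, symbolic links (pg_tblspc/*, a relocated pg_wal) are skipped and left for separate handling:

```python
# Minimal sketch: walk a data directory and apply group-access modes,
# skipping symbolic links the way chmod -R does.
import os

def set_group_access(root, dir_mode=0o750, file_mode=0o640):
    os.chmod(root, dir_mode)
    # os.walk() with the default followlinks=False never descends into
    # symlinked directories.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):   # os.chmod would follow the link
                os.chmod(path, dir_mode)
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                os.chmod(path, file_mode)
```

A real implementation would also have to handle the link targets explicitly, which is exactly the documentation point being discussed.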

> I think we also need to describe the procedure carefully.  That said, it 
> would be best to make users aware of a configuration alternative (group 
> access) with enough documentation when they first build the database or 
> upgrade the database cluster.  Just describing the alternative only in initdb 
> reference page would result in being unaware of the better configuration, 
> like --data-checksum.

I'm working on a new approach incorporating everybody's suggestions and
enhanced documentation.  It should be ready on Monday.

-- 
-David
da...@pgmasters.net




Re: [HACKERS] PATCH: Configurable file mode mask

2017-03-17 Thread David Steele
On 3/15/17 1:56 AM, Tsunakawa, Takayuki wrote:
> From: pgsql-hackers-ow...@postgresql.org
>> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of David Steele
>> Sure, but having the private key may allow them to get new data from the
>> server as well as the data from the backup.
> 
> You are right.  My rough intent was that the data is stolen anyway.  So, I 
> thought it might not be so bad to expect to be able to back up the SSL key 
> file in $PGDATA together with the database.  If it's bad, then the default 
> value of ssl_key_file (=$PGDATA/ssl.key) should be disallowed.

I think it really depends on the situation and the user needs to make an
evaluation of the security risks for themselves.

>> If the backup user is in the same group as the postgres user and in the
>> ssl_cert group then backups of the certs would be possible using group reads.
>> Restores will be a little tricky, though, because of the need to set
>> ownership to root.  The restore would need to be run as root or the
>> permissions fixed up after the restore completes.
> 
> Yes, but I thought, from the following message,  that you do not recommend 
> that the backup user be able to read the SSL key file.  So, I proposed to 
> describe the example configuration to achieve that -- postgres user in dba 
> and ssl_cert group, and a separate backup user in only dba group.

That would work as long as the key was not in $PGDATA.  Most database
backup software does not have a --ignore option because of the obvious
dangers.

 It seems to me that it would be best to advise in the docs that these
 files should be relocated if they won't be readable by the backup user.
 In any event, I'm not convinced that backing up server private keys
 is a good idea.
> 
>> To be clear, the default for this patch is to leave permissions exactly
>> as they are now.  It also provides alternatives that may or not be useful
>> in all cases.
> 
> So you think there are configurations that may be useful or not, don't you?  
> Adding a new parameter could possibly complicate what users have to consider. 
>  Maximal flexibility could lead to insecure misuse.  So I think it would be 
> better to describe secure and practical configuration examples in the SSL 
> section and/or the backup chapter.  The configuration factor includes whether 
> the backup user is different from the postgres user, where the SSL key file 
> is placed, the owner of the SSL key file, whether the backup user can read 
> the SSL key file.

I think we are more or less on the same page, and you'd like to see
documentation that shows examples of secure configurations when group
access is allow.  I agree and I'm working on documentation for all the
sections you recommended.

Thanks,
-- 
-David
da...@pgmasters.net




Re: [HACKERS] scram and \password

2017-03-17 Thread Tom Lane
Robert Haas  writes:
> On Fri, Mar 17, 2017 at 8:32 AM, Heikki Linnakangas  wrote:
>> It would make sense to have \password obey password_encryption GUC. Then
>> \password and ALTER USER would do the same thing, which would be less
>> surprising. Although it's also a bit weird for a GUC to affect client-side
>> behavior, so perhaps better to just document that \password will create a
>> SCRAM verifier, unless you explicitly tell it to create an MD5 hash, and add
>> a 'method' parameter to it.

> Either of those would be fine with me, but I think we should do one of them.

I vote for the second one; seems much less surprising and action-at-a-
distance-y.  And I think the entire point of \password is to *not* do
exactly what a bare ALTER USER would do, but to superimpose a layer of
best practice on it.  We certainly want to define use of SCRAM as being
best practice.

regards, tom lane




Re: [HACKERS] [PATCH] ALTER DEFAULT PRIVILEGES with GRANT/REVOKE ON SCHEMAS

2017-03-17 Thread David Steele
On 3/13/17 11:15 AM, David Steele wrote:
> Hi Matheus,
> 
> On 3/2/17 8:27 AM, David Steele wrote:
>> On 1/18/17 7:18 PM, Petr Jelinek wrote:
>>>
>>> The patch looks good, the only thing I am missing is tab completion
>>> support for psql.
>>
>> It looks like this patch is still waiting on an update for tab
>> completion in psql.
>>
>> Do you know when will have that patch ready?
> 
> It's been a while since there was a new patch or any activity on this
> thread.
> 
> If you need more time to produce a patch, please post an explanation for
> the delay and a schedule for the new patch.  If no patch or explanation
> is posted by 2017-03-16 AoE I will mark this submission
> "Returned with Feedback".

I have marked this submission "Returned with Feedback".  Please feel
free to resubmit when you have a new version.

-- 
-David
da...@pgmasters.net




Re: [HACKERS] IF (NOT) EXISTS in psql-completion

2017-03-17 Thread David Steele
On 3/17/17 3:58 AM, Kyotaro HORIGUCHI wrote:
> At Mon, 13 Mar 2017 10:42:05 -0400, David Steele  wrote 
> in <1e8297fd-f7f2-feab-848d-5121e45c8...@pgmasters.net>
>> It has been a while since this thread has received any comments or a new
>> patch.  The general consensus seems to be that this feature is too large
>> a rewrite of tab completion considering a major rewrite was done for 9.6.
>>
>> Are you considering writing a localized patch for this feature as Tom
>> suggested?  If so, please post that by 2017-03-16 AoE.
>>
>> If no new patch is submitted by that date I will mark this submission
>> "Returned with Feedback".
> 
> It's a pity. I had to take a week's leave..
> 
> I understood that the 'localized fashion' means "without removing
> else's", which is the core of this patch. So, if that is not
> acceptable this should be abandoned. I'll try to put the individual
> enhancements other than refactoring in another shape, in the next
> commitfest.

I have marked this submission "Returned with Feedback".  Please feel
free to resubmit when you have a new version.

-- 
-David
da...@pgmasters.net




Re: [HACKERS] [PATCH] Remove defunct and unnecessary link

2017-03-17 Thread Robert Haas
On Thu, Mar 16, 2017 at 6:30 PM, David Christensen  wrote:
> The HA docs reference a “glossary” link which is no longer accessible, nor is 
> it likely to be useful in general to link off-site IMHO.  This simple patch 
> removes this link.

Committed and back-patched.

Thanks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-03-17 Thread Tom Lane
"Daniel Verite"  writes:
>   Tom Lane wrote:
>> OT_WHOLE_LINE is not what you want because that results in verbatim
>> copying, without variable expansion or anything

> But if we want to implement "\if defined :foo" in the future
> isn't it just what we need?

I don't think that should mean what you think.  I believe an appropriate
spelling of what you mean is "\if defined foo".  What you wrote should
result in foo being expanded and then a defined-ness test being performed
on whatever variable name results.
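A toy model of that distinction (the helper names are made up for illustration; psql implements none of this in Python):

```python
# "\if defined :foo" expands :foo first, then tests whether the
# *resulting* name is defined; "\if defined foo" tests foo itself.
variables = {"foo": "bar"}          # foo is defined; bar is not

def expand(token):
    """Expand a leading :var reference, like psql variable expansion."""
    return variables.get(token[1:], "") if token.startswith(":") else token

def if_defined(token):
    return expand(token) in variables

print(if_defined("foo"))    # True: foo itself is defined
print(if_defined(":foo"))   # False: :foo expands to "bar", which is not
```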

> Also we could leave open the option to accept an SQL expression
> here. I expect people will need SQL as the evaluator in a lot of cases.

Right, and they'll also want to insert variable references into that
SQL.  In the short term though, `expr ...` is going to be the solution,
and that means we'd better not throw away the behavior of expanding
back-ticks.

> There's a precedent with \copy accepting a query inside parentheses,
> using OT_WHOLE_LINE.

IMV, \copy is just about completely broken in this regard, precisely
because it fails to expand variable references.  I don't want to
emulate that brain-damage for \if.  (I believe, btw, that part
of the reason for \copy behaving this way is that we wanted to
preserve an ancient behavior whereby Windows users were not forced
to double backslashes in \windows\style\path\names.  Fortunately,
that bit of silliness need not be considered for \if.)

regards, tom lane




Re: [HACKERS] scram and \password

2017-03-17 Thread Robert Haas
On Fri, Mar 17, 2017 at 8:32 AM, Heikki Linnakangas  wrote:
> It would make sense to have \password obey password_encryption GUC. Then
> \password and ALTER USER would do the same thing, which would be less
> surprising. Although it's also a bit weird for a GUC to affect client-side
> behavior, so perhaps better to just document that \password will create a
> SCRAM verifier, unless you explicitly tell it to create an MD5 hash, and add
> a 'method' parameter to it.

Either of those would be fine with me, but I think we should do one of them.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Microvacuum support for Hash Index

2017-03-17 Thread Amit Kapila
On Fri, Mar 17, 2017 at 12:27 PM, Ashutosh Sharma  wrote:
> On Fri, Mar 17, 2017 at 8:20 AM, Amit Kapila  wrote:
>
>> As I said in my previous e-mail, I think you need
>> to record clearing of this flag in WAL record XLOG_HASH_DELETE as you
>> are not doing this unconditionally and then during replay clear it
>> only when the WAL record indicates the same.
>
> Thank you so much for putting that point. I too think that we should
> record the flag status in the WAL record and clear it only when
> required during replay.
>

I think hashdesc.c needs an update (refer case XLOG_HASH_DELETE:).

- * flag. Clearing this flag is just a hint; replay won't redo this.
+ * flag. Clearing this flag is just a hint; replay will check the
+ * status of the clear_dead_marking flag before redoing it.
  */
  if (tuples_removed && *tuples_removed > 0 &&
  opaque->hasho_flag & LH_PAGE_HAS_DEAD_TUPLES)
+ {
  opaque->hasho_flag &= ~LH_PAGE_HAS_DEAD_TUPLES;
+ clear_dead_marking = true;
+ }


I feel the above comment is not required as you are logging this
action explicitly.

+ bool clear_dead_marking; /* TRUE if VACUUM clears

No need to write VACUUM explicitly, you can simply say "TRUE if this
operation clears ...".
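The fix being discussed can be modeled with a toy sketch: the delete record carries a clear_dead_marking flag, and replay clears the page's hint bit only when the record says the original operation did. The dict-based "page" and "record" are illustrative; only the flag logic mirrors the quoted diff:

```python
# Toy model of conditional replay of LH_PAGE_HAS_DEAD_TUPLES clearing.
LH_PAGE_HAS_DEAD_TUPLES = 0x0080

def hash_delete(page, tuples_removed):
    """Primary-side delete: clear the hint and record that we did."""
    record = {"clear_dead_marking": False}
    if tuples_removed > 0 and page["flags"] & LH_PAGE_HAS_DEAD_TUPLES:
        page["flags"] &= ~LH_PAGE_HAS_DEAD_TUPLES
        record["clear_dead_marking"] = True
    return record

def redo_hash_delete(page, record):
    """Replay: check the flag instead of clearing unconditionally."""
    if record["clear_dead_marking"]:
        page["flags"] &= ~LH_PAGE_HAS_DEAD_TUPLES
```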


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




[HACKERS] Re: [BUGS] Problem in using pgbench's --connect(-C) and --rate=rate(-R rate) options together.

2017-03-17 Thread David Steele
On 3/17/17 2:08 AM, Fabien COELHO wrote:
> 
> Hello David,
> 
>>> Repost from bugs.
>>
>> This patch does not apply at cccbdde:
> 
> Indeed. It should not. The fix is for the 9.6 branch. The issue has been
> fixed by some heavy but very welcome restructuring in master.

Whoops, sorry about that!

>> Marked as "Waiting for Author".
> 
> I put it back to "Needs review".

Beena, do you know when you'll have time to review?

Thanks,
-- 
-David
da...@pgmasters.net




Re: [HACKERS] scram and \password

2017-03-17 Thread Heikki Linnakangas

On 03/17/2017 02:01 PM, Robert Haas wrote:
> On Thu, Mar 16, 2017 at 11:38 PM, Michael Paquier
>  wrote:
>> At least this has the merit of making \password simpler from psql
>> without a kind of --method option: if the backend is 9.6 or older,
>> just generate a MD5-hash, and SCRAM-hash for newer versions.
>> PQencryptPassword still needs to be extended so as it accepts a hash
>> method though.
>
> What if the user doesn't want to switch to SCRAM because they also use
> some connector that hasn't been updated to support it?
>
> I bet there will be a lot of people in that situation.

You could use the less secure server-side ALTER USER to set the password
in that case. Or use an older client. But I agree, it's a bit weird if
we don't allow the user to generate an MD5 hash, if he insists. I think
we still need a 'method' option to \password.

It would make sense to have \password obey password_encryption GUC. Then
\password and ALTER USER would do the same thing, which would be less
surprising. Although it's also a bit weird for a GUC to affect
client-side behavior, so perhaps better to just document that \password
will create a SCRAM verifier, unless you explicitly tell it to create an
MD5 hash, and add a 'method' parameter to it.

- Heikki




