Re: [HACKERS] No-op case in ExecEvalConvertRowtype

2017-04-06 Thread Ashutosh Bapat
On Fri, Apr 7, 2017 at 9:02 AM, Tom Lane  wrote:
> Ashutosh Bapat  writes:
>> While I agree that we can remove indesc->tdtypeid ==
>> outdesc->tdtypeid, I am not sure whether it should be replaced by
>> !indesc->tdhasoid && !outdesc->tdhasoid.
>
> No, that was overly conservative; the correct test is that the tdhasoid
> settings must be equal.  Reading Robert's commit message for 3838074f8
> and mine for 3f902354b might clarify this.

Thanks for the commit and testcases.

>
>> If that's required, it seems
>> to be a bug that needs to be fixed in earlier branches as well.
>
> It's not a bug in older branches, because the tdtypeid comparison
> was enough to guarantee the same tdhasoid values.

Ok.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Making clausesel.c Smarter

2017-04-06 Thread David Rowley
On 4 April 2017 at 13:35, Claudio Freire  wrote:
> On Mon, Apr 3, 2017 at 9:19 PM, Claudio Freire  wrote:
> If you ask me, I'd just leave:
>
> + if (rqlist->hibound == DEFAULT_INEQ_SEL || rqlist->lobound ==
> DEFAULT_INEQ_SEL)
> + {
> + /* No point in checking null selectivity, DEFAULT_INEQ_SEL
> implies we have no stats */
> + s2 = DEFAULT_RANGE_INEQ_SEL;
> + }
> + else
> + {
> ...
> +   s2 = rqlist->hibound + rqlist->lobound - 1.0;
> +   nullsel = nulltestsel(root, IS_NULL, rqlist->var, varRelid);
> +   notnullsel = 1.0 - nullsel;
> +
> +   /* Adjust for double-exclusion of NULLs */
> +   s2 += nullsel;
> +   if (s2 <= 0.0) {
> +  if (s2 <= (-1.0e-4 * notnullsel - 1.0e-6))
> +  {
> +   /* Most likely contradictory clauses found */
> +   s2 = (1.0e-10 < notnullsel) ? 1.0e-10 : notnullsel;
> +  }
> +  else
> +  {
> +   /* Could be a rounding error */
> +   s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel;
> +  }
> +   }
> + }
>
> Where (-1.0e-4 * notnullsel - 1.0e-6) is just a very rough (and overly
> cautious) estimation of the amount of rounding error that could be
> there with 32-bit floats.
>
> The above assumes a non-DEFAULT_INEQ_SEL value in lobound/hibound is
> an estimation based on stats, maybe approximate, but one that makes
> sense (ie: preserves the monotonicity of the CPF), and as such
> negative values are either sign of a contradiction or rounding error.
> The previous code left the possibility of negatives coming out of
> default selectivities creeping up on non-DEFAULT_INEQ_SEL estimates,
> but that doesn't seem possible coming out of scalarineqsel.

I notice this does change the estimates for contradictory clauses such as:

create table a (value int);
insert into a select x/100 from generate_Series(1,10000) x;
analyze a;
explain analyze select * from a where value between 101 and -1;

We now get:

 QUERY PLAN
------------------------------------------------------------------
 Seq Scan on a  (cost=0.00..195.00 rows=1 width=4) (actual
time=1.904..1.904 rows=0 loops=1)
   Filter: ((value >= 101) AND (value <= '-1'::integer))
   Rows Removed by Filter: 10000
 Planning time: 0.671 ms
 Execution time: 1.950 ms
(5 rows)

where before we'd get:

  QUERY PLAN
------------------------------------------------------------------
 Seq Scan on a  (cost=0.00..195.00 rows=50 width=4) (actual
time=0.903..0.903 rows=0 loops=1)
   Filter: ((value >= 101) AND (value <= '-1'::integer))
   Rows Removed by Filter: 10000
 Planning time: 0.162 ms
 Execution time: 0.925 ms
(5 rows)

Which comes from the 10000 * 0.005 selectivity estimate from tuples *
DEFAULT_RANGE_INEQ_SEL.
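
As a rough sketch of the arithmetic (assuming the 10000-row table from the
example above, and clamp_row_est() as the usual final clamp on row
estimates):

  /* before: a contradictory range clause falls back to the default */
  rows = clamp_row_est(10000 * DEFAULT_RANGE_INEQ_SEL);  /* 10000 * 0.005 = 50 */

  /* after: the contradiction is detected and s2 is clamped to ~1.0e-10,
   * per the snippet quoted above */
  rows = clamp_row_est(10000 * 1.0e-10);                 /* clamped up to 1 */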

I've attached a patch to this effect.


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


smarter_clausesel_2017-04-07a.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] identity columns

2017-04-06 Thread Vitaly Burovoy
On 4/6/17, Peter Eisentraut  wrote:
> On 4/4/17 22:53, Vitaly Burovoy wrote:
>> Here are my next nitpicks on the latest patch. I tried to catch places
>> lacking variable initialization.
>> All other things seem good to me now. I'll continue to review the
>> patch while you're fixing the current notes.
>
> Committed with your changes (see below) as well as executor fix.

Thank you very much!


> As I tried to mention earlier, it is very difficult to implement the IF
> NOT EXISTS behavior here, because we need to run the commands that create
> the sequence before we know whether we will need it.

In fact, with the function "get_attidentity" it is not so hard. Please
find the patch attached.
I've implemented SET GENERATED ... IF NOT EXISTS. It must be placed
before the other SET options, but fortunately that conforms with the
standard.
Since that form always changes the sequence behind the column, I
decided to explicitly write "[NO] CACHE" in pg_dump.

As a bonus, it is now possible to rename the sequence behind the column
by specifying SEQUENCE NAME in SET GENERATED.

I hope it is still possible to get rid of the "ADD GENERATED" syntax.

>
> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Best regards,
Vitaly Burovoy


v7-0001-Implement-SET-IDENTITY-.-IF-NOT-EXISTS.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] POC: Sharing record typmods between backends

2017-04-06 Thread Thomas Munro
Hi hackers,

Tuples can have type RECORDOID and a typmod number that identifies a
"blessed" TupleDesc in a backend-private cache.  To support the
sharing of such tuples through shared memory and temporary files, I
think we need a typmod registry in shared memory.  Here's a
proof-of-concept patch for discussion.  I'd be grateful for any
feedback and/or flames.

This is a problem I ran into in my parallel hash join project.  Robert
pointed it out to me and told me to go read tqueue.c for details, and
my first reaction was: I'll code around this by teaching the planner
to avoid sharing tuples from paths that produce transient record types
based on tlist analysis[1].  Aside from being a cop-out, that approach
doesn't work because the planner doesn't actually know what types the
executor might come up with since some amount of substitution for
structurally-similar records seems to be allowed[2] (though I'm not
sure I can explain that).  So... we're gonna need a bigger boat.

The patch still uses typcache.c's backend-private cache, but if the
backend is currently "attached" to a shared registry then it functions
as a write-through cache.   There is no cache-invalidation problem
because registered typmods are never unregistered.  parallel.c exports
the leader's existing record typmods into a shared registry, and
attaches to it in workers.  A DSM detach hook returns backends to
private cache mode when parallelism ends.
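
To make the write-through behaviour concrete, here is a rough sketch of the
lookup order I have in mind (all function and variable names below are made
up for illustration; they are not the patch's actual identifiers):

  static int32
  assign_record_typmod(TupleDesc tupdesc)    /* illustrative only */
  {
      int32   typmod;

      /* 1. Backend-private cache first (cheap, no locking). */
      typmod = private_cache_lookup(tupdesc);
      if (typmod >= 0)
          return typmod;

      if (current_shared_registry != NULL)
      {
          /*
           * 2. Attached to a shared registry: find or assign the typmod
           * there, then remember it locally.  Entries are never
           * unregistered, so no invalidation is needed.
           */
          typmod = shared_registry_assign(current_shared_registry, tupdesc);
          private_cache_insert(tupdesc, typmod);
          return typmod;
      }

      /* 3. Not attached: plain backend-local blessing, as today. */
      return private_assign_typmod(tupdesc);
  }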

Some thoughts:

* Maybe it would be better to have just one DSA area, rather than the
one controlled by execParallel.c (for executor nodes to use) and this
new one controlled by parallel.c (for the ParallelContext).  Those
scopes are approximately the same at least in the parallel query case,
but...

* It would be nice for the SharedRecordTypeRegistry to be able to
survive longer than a single parallel query, perhaps in a per-session
DSM segment.  Perhaps eventually we will want to consider a
query-scoped area, a transaction-scoped area and a session-scoped
area?  I didn't investigate that for this POC.

* It seemed to be a reasonable goal to avoid allocating an extra DSM
segment for every parallel query, so the new DSA area is created
in-place.  192KB turns out to be enough to hold an empty
SharedRecordTypmodRegistry due to dsa.c's superblock allocation scheme
(that's two 64KB size class superblocks + some DSA control
information).  It'll create a new DSM segment as soon as you start
using blessed records, and will do so for every parallel query you
start from then on with the same backend.  Erm, maybe adding 192KB to
every parallel query DSM segment won't be popular...

* Perhaps simplehash + an LWLock would be better than dht, but I
haven't looked into that.  Can it be convinced to work in DSA memory
and to grow on demand?

Here's one way to hit the new code path, so that record types blessed
in a worker are accessed from the leader:

  CREATE TABLE foo AS SELECT generate_series(1, 10) AS x;
  CREATE OR REPLACE FUNCTION make_record(n int)
RETURNS RECORD LANGUAGE plpgsql PARALLEL SAFE AS
  $$
  BEGIN
RETURN CASE n
 WHEN 1 THEN ROW(1)
 WHEN 2 THEN ROW(1, 2)
 WHEN 3 THEN ROW(1, 2, 3)
 WHEN 4 THEN ROW(1, 2, 3, 4)
 ELSE ROW(1, 2, 3, 4, 5)
   END;
  END;
  $$;
  SET force_parallel_mode = 1;
  SELECT make_record(x) FROM foo;

PATCH

1.  Apply dht-v3.patch[3].
2.  Apply shared-record-typmod-registry-v1.patch.
3.  Apply rip-out-tqueue-remapping-v1.patch.

[1] 
https://www.postgresql.org/message-id/CAEepm%3D2%2Bzf7L_-eZ5hPW5%3DUS%2Butdo%3D9tMVD4wt7ZSM-uOoSxWg%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/CA+TgmoZMH6mJyXX=ylsovj8julfqggxwzcr_rbkc1nj+177...@mail.gmail.com
[3] 
https://www.postgresql.org/message-id/flat/CAEepm%3D3d8o8XdVwYT6O%3DbHKsKAM2pu2D6sV1S_%3D4d%2BjStVCE7w%40mail.gmail.com

-- 
Thomas Munro
http://www.enterprisedb.com


rip-out-tqueue-remapping-v1.patch
Description: Binary data


shared-record-typmod-registry-v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SCRAM authentication, take three

2017-04-06 Thread Noah Misch
On Thu, Apr 06, 2017 at 09:46:29PM +0300, Heikki Linnakangas wrote:
> On 04/06/2017 08:36 AM, Noah Misch wrote:
> >On Tue, Mar 07, 2017 at 02:36:13PM +0200, Heikki Linnakangas wrote:
> >>I didn't include the last-minute changes to the way you specify this in
> >>pg_hba.conf. So it's still just "scram". I agree in general that we should
> >>think about how to extend that too, but I think the proposed syntax was
> >>overly verbose for what we actually support right now. Let's discuss that as
> >>a separate thread, as well.
> >
> >[Action required within three days.  This is a generic notification.]
> >
> >The above-described topic is currently a PostgreSQL 10 open item.
> 
> I don't think we will come up with anything better than what we have now, so
> I have removed this from the open items list.

Michael shared[1] better pg_hba.conf syntax on 2016-11-05.  I agreed[2] with
his framing of the problem and provided two syntax alternatives, on
2017-01-18.  Michael implemented[3] a variation of one of those on 2017-02-20,
which you declined in your 2017-03-07 commit with just the explanation quoted
above.  I say Michael came up with something better five months ago.
Reserving, as HEAD does today, keyword "scram" to mean "type of SCRAM we
introduced first" will look ugly in 2027.  Cryptographic hash functions have a
short shelf life compared to PostgreSQL.

nm

[1] 
https://www.postgresql.org/message-id/CAB7nPqS99Z31f7jhoYYMoBDbuZSQRpn+HQzByA=ewfmdywc...@mail.gmail.com
[2] https://www.postgresql.org/message-id/20170118052356.GA5952@gust
[3] 
https://www.postgresql.org/message-id/cab7npqsalxkoohbk3ugbf+kfq4pqgtgjk_os68f3nkxghdo...@mail.gmail.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New SQL counter statistics view (pg_stat_sql)

2017-04-06 Thread Haribabu Kommi
On Thu, Apr 6, 2017 at 5:17 AM, Andres Freund  wrote:

> Hi,
>
>
> I'm somewhat inclined to think that this really would be better done in
> an extension like pg_stat_statements.
>

Thanks for the review.



>
> > +}
> > +
> > +/*
> > + * Count SQL statement for pg_stat_sql view
> > + */
> > +void
> > +pgstat_count_sqlstmt(const char *commandTag)
> > +{
> > + PgStat_SqlstmtEntry *htabent;
> > + bool    found;
> > +
> > + if (!pgstat_backend_sql)
> > + {
> > + /* First time through - initialize SQL statement stat
> table */
> > + HASHCTL hash_ctl;
> > +
> > + memset(&hash_ctl, 0, sizeof(hash_ctl));
> > + hash_ctl.keysize = NAMEDATALEN;
> > + hash_ctl.entrysize = sizeof(PgStat_SqlstmtEntry);
> > + pgstat_backend_sql = hash_create("SQL statement stat entries",
> > +  PGSTAT_SQLSTMT_HASH_SIZE,
> > +  &hash_ctl,
> > +  HASH_ELEM | HASH_BLOBS);
> > + }
> > +
> > + /* Get the stats entry for this SQL statement, create if necessary */
> > + htabent = hash_search(pgstat_backend_sql, commandTag,
> > +   HASH_ENTER, &found);
> > + if (!found)
> > + htabent->count = 1;
> > + else
> > + htabent->count++;
> > +}
>
>
> That's not really something in this patch, but a lot of this would be
> better if we didn't internally have command tags as strings, but as an
> enum.  We should really only convert to a string with needed.  That
> we're doing string comparisons on Portal->commandTag is just plain bad.
>
> If so, we could also make this whole patch a lot cheaper - have a fixed
> size array that has an entry for every possible tag (possibly even in
> shared memory, and then use atomics there).
>

During development I did consider using an array of all command tags and
updating the counters by tag name rather than by enum value. To switch to
enum values I now have to create a mapping array of enums and tag names so
that the counters can be updated cheaply. The problem with that approach is
that whenever new commands are added, the mapping array must be updated with
both the new enum and the new tag name, whereas the hash table approach
handles future command additions automatically, at some performance overhead
compared to the array.

I will modify the patch to use the array approach and provide it to the next
commitfest.
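
For illustration, the array approach could look roughly like the sketch
below. CommandTagType and the COMMAND_TAG_* values are hypothetical here,
since the tree currently only has command tags as strings:

  typedef enum CommandTagType
  {
      COMMAND_TAG_SELECT,
      COMMAND_TAG_INSERT,
      COMMAND_TAG_UPDATE,
      COMMAND_TAG_DELETE,
      /* ... one entry per supported command tag ... */
      COMMAND_TAG_COUNT           /* must be last */
  } CommandTagType;

  /* one counter per tag; could live in shared memory and use atomics */
  static uint64 sqlstmt_counts[COMMAND_TAG_COUNT];

  static void
  pgstat_count_sqlstmt_enum(CommandTagType tag)
  {
      /* O(1) update: no hashing and no string comparison */
      sqlstmt_counts[tag]++;
  }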



> > +++ b/src/backend/tcop/postgres.c
> > @@ -1109,6 +1109,12 @@ exec_simple_query(const char *query_string)
> >
> >   PortalDrop(portal, false);
> >
> > + /*
> > +  * Count SQL statement for pg_stat_sql view
> > +  */
> > + if (pgstat_track_sql)
> > + pgstat_count_sqlstmt(commandTag);
> > +
>
> Isn't doing this at the message level quite wrong?  What about
> statements executed from functions and such?  Shouldn't this integrate
> at the executor level instead?
>

Currently the patch is designed to count only user-executed statements
that finish successfully (not internal statements that are executed from
functions and the like).

Dilip asked the same question in earlier mails; maybe now is the time to
decide which approach to counting statements we want.


Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] Interval for launching the table sync worker

2017-04-06 Thread Masahiko Sawada
On Fri, Apr 7, 2017 at 1:23 PM, Kyotaro HORIGUCHI
 wrote:
> At Thu, 6 Apr 2017 18:42:37 +0200, Petr Jelinek 
>  wrote in 
> <8c7afb37-be73-c6bd-80bc-e87522f04...@2ndquadrant.com>
>> On 06/04/17 16:44, Masahiko Sawada wrote:
>> > On Thu, Apr 6, 2017 at 9:06 PM, Kyotaro HORIGUCHI
>> >  wrote:
>> >>> I prefer a subscription option to a GUC. Something like the following.
>> >>>
>> >>> CREATE SUBSCRIPTION s1 CONNECTION 'blah'
>> >>>PUBLICATION p1 WITH (noreconnect = true);
>> >>>
>> >>> Stored in pg_subscription?
>>
>> I don't think that's a very good solution, you'd lose replication on
>> every network glitch, upstream server restart, etc.
>
> Yes, you're right. This would work if the apply worker distinguished
> permanent errors, but that is overkill so far.
>
>> > I've added this as an open item, and sent a patch for this.
>> >
>>
>> I am not exactly sure what the open item from this thread is. Is it to
>> use wal_retrieve_interval to limit table sync restarts?
>
> It's not me. I also don't think this is critical.
>

Thank you for the comment.

It's not critical, but it could be a problem, so I thought we should fix
it before the PostgreSQL 10 release. If it's not appropriate as an open
item, I'll remove it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Foreign tables don't enforce the partition constraint

2017-04-06 Thread Amit Langote
On 2017/04/03 16:44, Amit Langote wrote:
> Hi Ashutosh,
> 
> On 2017/04/03 15:49, Ashutosh Bapat wrote:
 Similarly, a partition constraint
 should also be enforced at the foreign server. Probably we should
 update documentation of create foreign table to mention this.
>>>
>>> That is a good idea.
>>>
>>> Here's the patch.
> 
> Thanks for creating the patch.
> 
> +Constraints and partition bounds on foreign tables (such as
> 
> We use "partition constraint" instead of "partition bounds" to mean the
> implicit constraint of a partition (there are a few instances of that in
> the documentation).  So, perhaps this could be written as: Constraints
> (both the user-defined constraints such as CHECK
> or NOT NULL clauses and the partition constraint) are not
> enforced by the core PostgreSQL system, ...
> 
> And once we've mentioned that a constraint means one of these things, we
> need not repeat "partition bounds/constraints" in the subsequent
> paragraphs.  If you agree, attached is the updated patch.

Since it seems that we agree that this documentation tweak is a good idea,
I will add it to the open items list to avoid it being missed.

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] No-op case in ExecEvalConvertRowtype

2017-04-06 Thread Amit Langote
On 2017/04/07 12:16, Ashutosh Bapat wrote:
> On Fri, Apr 7, 2017 at 7:28 AM, Amit Langote
>  wrote:
>>
>> And I see that commit 3f902354b08 lets the partition tuple-routing code
>> keep utilizing that optimization.
> 
> I am not able to find this commit
> fatal: ambiguous argument '3f902354b08': unknown revision or path not
> in the working tree.
> Use '--' to separate paths from revisions

Sorry I probably wasn't clear.  3f902354b08 is what Tom committed earlier
today.

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hash tables in dynamic shared memory

2017-04-06 Thread Thomas Munro
On Mon, Dec 19, 2016 at 12:33 PM, Thomas Munro
 wrote:
> Since Dilip Kumar's Parallel Bitmap Heap Scan project is no longer
> using this, I think I should park it here unless/until another
> potential use case pops up.  Some interesting candidates have been
> mentioned already, and I'm fairly sure there are other uses too, but I
> don't expect anyone to be interested in committing this patch until
> something concrete shows up, so I'll go work on other things until
> then.

Here is a rebased version.  Changes since v2:

1.  Out-of-memory conditions now raise errors (following
dsa_allocate's change in behaviour to match palloc).
2.  Moved into 'lib' where other reusable data structures live.

There are still a couple of things I'd like to adjust in this code
(including feedback from John and improvements to the iterator code
which is overcomplicated) but I'm posting it today as infrastructure
for another patch.

-- 
Thomas Munro
http://www.enterprisedb.com


dht-v3.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] src/interfaces/libpq shipping nmake-related Makefiles

2017-04-06 Thread Tom Lane
Andres Freund  writes:
> On 2017-04-07 13:07:59 +0900, Michael Paquier wrote:
>> On Fri, Apr 7, 2017 at 1:01 PM, Tom Lane  wrote:
>>> Still, it's not very clear why we need to cater for building just libpq
>>> rather than the whole distribution, and a user of win32.mak presumably
>>> has the option to do the latter.

>> Indeed. Those recent reports indicate that removing win32.mak would be a
>> bad move.

> For me they indicate the contrary, that we're currently not properly
> maintaining it so that longstanding errors crop up.

Yeah.  For win32.mak, the key question is whether there is still anybody
who'd have an insurmountable problem with building the whole distro via
src/tools/msvc/ rather than just building libpq with win32.mak.  Given our
lack of infrastructure for testing win32.mak, continuing support for it
seems like quite an expensive proposition from the developer-time
standpoint.  I don't really want to do that if it's only going to save
somebody an occasional few minutes of build time.

bcc32.mak is in a different category because it's basically the only
solution if you want to build libpq in Borland C.  But the lack of
user input suggests that maybe nobody cares about that anymore.

Borland C, per se, has been dead since the 90s according to wikipedia.
There are successor products with different names, none of which I can
recall anybody ever mentioning on the PG lists.  I speculate that
people are taking libpq.dll built with MSVC and using it in those
products, if they're using them with PG at all.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-04-06 Thread Pavan Deolasee
On Thu, Apr 6, 2017 at 12:20 AM, Peter Geoghegan  wrote:

> On Wed, Apr 5, 2017 at 11:27 AM, Andres Freund  wrote:
> > I propose we move this patch to the next CF.
>
> I agree. I think it's too late to be working out fine details around
> TOAST like this. This is a patch that touches the storage format in a
> fairly fundamental way.
>
> The idea of turning WARM on or off reminds me a little bit of the way
> it was at one time suggested that HOT not be used against catalog
> tables, a position that Tom pushed against.


I agree. I am grateful that Tom put his foot down and helped me find answers
to all the hard problems, including catalog tables and create index
concurrently. So I was very clear in my mind from the very beginning that
WARM must support all these things too. Obviously it still doesn't support
everything, such as other index methods and expression indexes, but IMHO
that's a much smaller problem. Also, making sure that WARM works on system
tables helped me find corner bugs which would otherwise have slipped past
regular regression testing.



> I'm not saying that it's
> necessarily a bad idea, but we should exhaust alternatives, and have a
> clear rationale for it.
>

One reason why it's probably a good idea is that we know WARM will not be
effective for all use cases and might actually cause performance
regressions for some of them. Even worse, as Robert fears, it might cause
data loss issues. Though TBH I haven't yet seen any concrete example where
it breaks so badly that it causes data loss, that may be because the patch
still hasn't received enough eyeballs or outside tests. Having a
table-level option would allow us to incrementally improve things instead
of making the initial patch so large that reviewing it is a complete
nightmare. Maybe it's already a nightmare.

It's not as if HOT didn't cause regressions for some specific use cases.
But I think the general benefit was so strong that we never invested time
in finding and tuning for those specific cases, and thus avoided adding
more complexity to the code. WARM's benefits are probably not the same as
HOT's, or our standards may have changed, or we now have resources to do
much more elaborate tests than were possible 10 years back. But now that
we are aware of some regressions, the choice is between spending a
considerable amount of time trying to handle every case versus doing it
incrementally and starting to deliver to the majority of users, while
keeping the patch at a manageable size.

Even if we were to provide a table-level option, my preference would be to
keep it ON by default.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Faster methods for getting SPI results (460% improvement)

2017-04-06 Thread Jim Nasby

On 4/6/17 9:21 PM, Andres Freund wrote:

>> Personally I'm way more excited about what a SPI feature like this
>> could do for plpgsql than about what it can do for plpython.  If the
>> latter is what floats your boat, that's fine; but I want a feature
>> that we can build on for other uses, not a hack that we know we need
>> to redesign next month.


Yeah, I thought about plpgsql and I can't see any way to make that work 
through an SPI callback (perhaps just due to my ignorance on things C). 
I suspect what plpgsql actually wants is a way to tell SPI to start the 
executor up, a function that pulls individual tuples out of the 
executor, and then a function to shut the executor down.
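
Something along these lines, say (a purely hypothetical sketch -- none of
these functions or types exist in SPI today):

  typedef struct SPIExecState SPIExecState;   /* hypothetical */

  extern SPIExecState *SPI_execute_start(const char *src, bool read_only,
                                         long tcount);
  extern bool SPI_execute_next(SPIExecState *estate, TupleTableSlot *slot);
  extern void SPI_execute_finish(SPIExecState *estate);

  /* A plpgsql FOR loop could then drive the executor one tuple at a time: */
  static void
  consume_query(const char *src, TupleTableSlot *slot)
  {
      SPIExecState *estate = SPI_execute_start(src, true, 0);

      while (SPI_execute_next(estate, slot))
      {
          /* process one tuple; nothing is materialized in a SPITupleTable */
      }
      SPI_execute_finish(estate);
  }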



> Dislike of the proposed implementation, alternative proposals, and the
> refutation of the "absolutely no way to do more without breaking plpy"
> argument leads me to conclude that this should be returned with
> feedback.


Agreed.
--
Jim Nasby, Chief Data Architect, Austin TX
OpenSCG http://OpenSCG.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Interval for launching the table sync worker

2017-04-06 Thread Kyotaro HORIGUCHI
At Thu, 6 Apr 2017 18:42:37 +0200, Petr Jelinek  
wrote in <8c7afb37-be73-c6bd-80bc-e87522f04...@2ndquadrant.com>
> On 06/04/17 16:44, Masahiko Sawada wrote:
> > On Thu, Apr 6, 2017 at 9:06 PM, Kyotaro HORIGUCHI
> >  wrote:
> >>> I prefer a subscription option to a GUC. Something like the following.
> >>>
> >>> CREATE SUBSCRIPTION s1 CONNECTION 'blah'
> >>>PUBLICATION p1 WITH (noreconnect = true);
> >>>
> >>> Stored in pg_subscription?
> 
> I don't think that's a very good solution, you'd lose replication on
> every network glitch, upstream server restart, etc.

Yes, you're right. This would work if the apply worker distinguished
permanent errors, but that is overkill so far.

> > I've added this as an open item, and sent a patch for this.
> > 
> 
> I am not exactly sure what the open item from this thread is. Is it to
> use wal_retrieve_interval to limit table sync restarts?

It's not me. I also don't think this is critical.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Faster methods for getting SPI results (460% improvement)

2017-04-06 Thread Andres Freund
On 2017-04-07 00:11:59 -0400, Tom Lane wrote:
> Jim Nasby  writes:
> > On 4/6/17 8:13 PM, Tom Lane wrote:
> >> Given Peter's objections, I don't think this is getting into v10 anyway,
> >> so we might as well take a bit more time and do it right.
> 
> > Well, Peter's objection is that we're not going far enough in plpython, 
> > but there's absolutely no way to do more without breaking plpy, which 
> > seems a non-starter. We should certainly be able to expand the existing 
> > API to provide even more benefit, but I see no reason to leave the 
> > performance gain this patch provides on the floor just because there's 
> > more to be had with a different API.
> 
> Personally I'm way more excited about what a SPI feature like this
> could do for plpgsql than about what it can do for plpython.  If the
> latter is what floats your boat, that's fine; but I want a feature
> that we can build on for other uses, not a hack that we know we need
> to redesign next month.

Dislike of the proposed implementation, alternative proposals, and the
refutation of the "absolutely no way to do more without breaking plpy"
argument leads me to conclude that this should be returned with
feedback.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Minor code improvement to postgresGetForeignPlan

2017-04-06 Thread Tatsuro Yamada
Hi,

The declaration of postgresGetForeignPlan uses baserel, but
the actual definition uses foreignrel. It would be better to keep them in sync.

Please find attached a patch.

Tatsuro Yamada
NTT Open Source Software Center
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 2851869..54edf4d 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -280,7 +280,7 @@ static void postgresGetForeignPaths(PlannerInfo *root,
 		RelOptInfo *baserel,
 		Oid foreigntableid);
 static ForeignScan *postgresGetForeignPlan(PlannerInfo *root,
-	   RelOptInfo *baserel,
+	   RelOptInfo *foreignrel,
 	   Oid foreigntableid,
 	   ForeignPath *best_path,
 	   List *tlist,

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Faster methods for getting SPI results (460% improvement)

2017-04-06 Thread Jim Nasby
On Apr 6, 2017, at 9:10 PM, Andres Freund  wrote:
> 
>>> Why?  We could very well return a somewhat "smarter" object. Returning
>>> rows row-by-row if accessed via iterator, materializes when accessed via
>>> row offset.
>> 
>> I completely agree with that. What I don't understand is the objection to
>> speeding up the old access method. Or are you thinking we'd just abandon the
>> old method?
> 
> What I'm saying is that we can do that transparently, with the current
> API.  And there's no need to materialize anything in plpython, we can
> transparently use the SPI materialized version.

Oh, just switching from a list to an iterator. Ok, I finally get it.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] src/interfaces/libpq shipping nmake-related Makefiles

2017-04-06 Thread Andres Freund
On 2017-04-07 13:07:59 +0900, Michael Paquier wrote:
> On Fri, Apr 7, 2017 at 1:01 PM, Tom Lane  wrote:
> > Still, it's not very clear why we need to cater for building just libpq
> > rather than the whole distribution, and a user of win32.mak presumably
> > has the option to do the latter.  The core argument for bcc32.mak,
> > I think, is that we never did support building the server with Borland C
> > ... but there's no evidence that people are still building libpq with it
> > either.
> 
> Indeed. Those recent reports indicate that removing win32.mak would be a
> bad move.

For me they indicate the contrary, that we're currently not properly
maintaining it so that longstanding errors crop up.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Faster methods for getting SPI results (460% improvement)

2017-04-06 Thread Tom Lane
Jim Nasby  writes:
> On 4/6/17 8:13 PM, Tom Lane wrote:
>> Given Peter's objections, I don't think this is getting into v10 anyway,
>> so we might as well take a bit more time and do it right.

> Well, Peter's objection is that we're not going far enough in plpython, 
> but there's absolutely no way to do more without breaking plpy, which 
> seems a non-starter. We should certainly be able to expand the existing 
> API to provide even more benefit, but I see no reason to leave the 
> performance gain this patch provides on the floor just because there's 
> more to be had with a different API.

Personally I'm way more excited about what a SPI feature like this
could do for plpgsql than about what it can do for plpython.  If the
latter is what floats your boat, that's fine; but I want a feature
that we can build on for other uses, not a hack that we know we need
to redesign next month.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Faster methods for getting SPI results (460% improvement)

2017-04-06 Thread Andres Freund
On 2017-04-06 21:06:59 -0700, Jim Nasby wrote:
> On 4/6/17 9:04 PM, Andres Freund wrote:
> > On 2017-04-06 09:14:43 -0700, Jim Nasby wrote:
> > > On 4/6/17 9:04 AM, Peter Eisentraut wrote:
> > > > On 4/6/17 03:50, Craig Ringer wrote:
> > > > > But otherwise, pending docs changes, I think it's ready for committer.
> > > > 
> > > > My opinion is still that this is ultimately the wrong approach.  The
> > > > right fix for performance issues in PL/Python is to change PL/Python not
> > > > to materialize the list of tuples.  Now with this change we would be
> > > > moving from two result materializations to one, but I think we are
> > > > keeping the wrong one.
> > > 
> > > That's an option for future improvement, but I see no way to accomplish 
> > > that
> > > without completely breaking plpy.
> > 
> > Why?  We could very well return a somewhat "smarter" object. Returning
> > rows row-by-row if accessed via iterator, materializes when accessed via
> > row offset.
> 
> I completely agree with that. What I don't understand is the objection to
> speeding up the old access method. Or are you thinking we'd just abandon the
> old method?

What I'm saying is that we can do that transparently, with the current
API.  And there's no need to materialize anything in plpython, we can
transparently use the SPI materialized version.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] src/interfaces/libpq shipping nmake-related Makefiles

2017-04-06 Thread Michael Paquier
On Fri, Apr 7, 2017 at 1:01 PM, Tom Lane  wrote:
> Still, it's not very clear why we need to cater for building just libpq
> rather than the whole distribution, and a user of win32.mak presumably
> has the option to do the latter.  The core argument for bcc32.mak,
> I think, is that we never did support building the server with Borland C
> ... but there's no evidence that people are still building libpq with it
> either.

Indeed. Those recent reports indicate that removing win32.mak would be a
bad move. So what about nuking bcc32.mak but updating win32.mak?
-- 
Michael
VMware vCenter Server
www.vmware.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Faster methods for getting SPI results (460% improvement)

2017-04-06 Thread Jim Nasby

On 4/6/17 9:04 PM, Andres Freund wrote:

> On 2017-04-06 09:14:43 -0700, Jim Nasby wrote:
>> On 4/6/17 9:04 AM, Peter Eisentraut wrote:
>>> On 4/6/17 03:50, Craig Ringer wrote:
>>>> But otherwise, pending docs changes, I think it's ready for committer.
>>>
>>> My opinion is still that this is ultimately the wrong approach.  The
>>> right fix for performance issues in PL/Python is to change PL/Python not
>>> to materialize the list of tuples.  Now with this change we would be
>>> moving from two result materializations to one, but I think we are
>>> keeping the wrong one.
>>
>> That's an option for future improvement, but I see no way to accomplish that
>> without completely breaking plpy.
>
> Why?  We could very well return a somewhat "smarter" object. Returning
> rows row-by-row if accessed via iterator, materializes when accessed via
> row offset.


I completely agree with that. What I don't understand is the objection 
to speeding up the old access method. Or are you thinking we'd just 
abandon the old method?

--
Jim Nasby, Chief Data Architect, Austin TX
OpenSCG http://OpenSCG.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] src/interfaces/libpq shipping nmake-related Makefiles

2017-04-06 Thread Andres Freund
On 2017-04-07 00:01:01 -0400, Tom Lane wrote:
> Still, it's not very clear why we need to cater for building just libpq
> rather than the whole distribution, and a user of win32.mak presumably
> has the option to do the latter.

I think the raison d'etre for that infrastructure primarily comes from
the times when Windows wasn't supported for server builds?

+1 for yanking this out.


- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Faster methods for getting SPI results (460% improvement)

2017-04-06 Thread Andres Freund
On 2017-04-06 09:14:43 -0700, Jim Nasby wrote:
> On 4/6/17 9:04 AM, Peter Eisentraut wrote:
> > On 4/6/17 03:50, Craig Ringer wrote:
> > > But otherwise, pending docs changes, I think it's ready for committer.
> > 
> > My opinion is still that this is ultimately the wrong approach.  The
> > right fix for performance issues in PL/Python is to change PL/Python not
> > to materialize the list of tuples.  Now with this change we would be
> > moving from two result materializations to one, but I think we are
> > keeping the wrong one.
> 
> That's an option for future improvement, but I see no way to accomplish that
> without completely breaking plpy.

Why?  We could very well return a somewhat "smarter" object. Returning
rows row-by-row if accessed via iterator, materializes when accessed via
row offset.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] src/interfaces/libpq shipping nmake-related Makefiles

2017-04-06 Thread Tom Lane
I wrote:
> Michael Paquier  writes:
>> While looking at some SCRAM stuff, I have bumped into bcc32.mak and
>> win32.mak in src/interfaces/libpq. To put it short: those files are
>> not up to date. The code of SCRAM is in the tree for a bit of time
>> now, and should have updated those files to list and clean up objects,
>> but nobody has reported failures in using them.

>> At the minimum, they should be updated. Or perhaps they could just be
>> ripped out? Who uses that on Windows now?

> I'm quite sure no developers use them, or have done so for years.

A bit of digging in the git logs says that the last patch that clearly
resulted from user interest in Borland C was ce53791b2 in April 2009.
bcc32.mak has been touched in passing for various other changes since
then, but I'd say the odds that it actually still works are pretty small,
even before this issue.

win32.mak has considerably more recent interest, eg cd9b4f24c.
So we should consider the two cases separately.

Still, it's not very clear why we need to cater for building just libpq
rather than the whole distribution, and a user of win32.mak presumably
has the option to do the latter.  The core argument for bcc32.mak,
I think, is that we never did support building the server with Borland C
... but there's no evidence that people are still building libpq with it
either.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Faster methods for getting SPI results (460% improvement)

2017-04-06 Thread Jim Nasby

On 4/6/17 8:13 PM, Tom Lane wrote:

>> It's on the pointy end for Pg10, and I thought we'd be fine to include
>> this in pg10 then aim to clean up DestReceiver in early pg11, or even
>> as a post-feature-freeze refactoring fixup in pg10. Should the
>> callback approach be blocked because the API it has to use is a bit
>> ugly?

> Given Peter's objections, I don't think this is getting into v10 anyway,
> so we might as well take a bit more time and do it right.


Well, Peter's objection is that we're not going far enough in plpython, 
but there's absolutely no way to do more without breaking plpy, which 
seems a non-starter. We should certainly be able to expand the existing 
API to provide even more benefit, but I see no reason to leave the 
performance gain this patch provides on the floor just because there's 
more to be had with a different API.



> Also, I'm entirely -1 on "post-feature-freeze refactoring fixups".
> We're going to have more than enough to do trying to stabilize the
> existing committed code, I fear (cf Robert's pessimistic summary of
> the open-items list, a couple days ago).  We don't need to be
> planning on doing new design post-freeze, whether it's painted as
> mere refactoring or not.


Agreed, and I agree that the current patch is a bit of a hack when it 
comes to DestReceiver (or really, DestReceiver has become an ugly wart 
over the years, as you pointed out).


I'll plan to pick this up again once the dust settles on this commitfest.
--
Jim Nasby, Chief Data Architect, Austin TX
OpenSCG http://OpenSCG.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] No-op case in ExecEvalConvertRowtype

2017-04-06 Thread Tom Lane
Ashutosh Bapat  writes:
> While I agree that we can remove indesc->tdtypeid ==
> outdesc->tdtypeid, I am not sure whether it should be replaced by
> !indesc->tdhasoid && !outdesc->tdhasoid.

No, that was overly conservative; the correct test is that the tdhasoid
settings must be equal.  Reading Robert's commit message for 3838074f8
and mine for 3f902354b might clarify this.
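
Roughly speaking (this is only a sketch, not the committed code verbatim),
the no-op test in the tuple-conversion code now has to look like:

    if (indesc->natts == outdesc->natts &&
        indesc->tdhasoid == outdesc->tdhasoid)
    {
        /*
         * ... plus the existing column-by-column compatibility checks;
         * the old "indesc->tdtypeid == outdesc->tdtypeid" condition is
         * no longer part of the test.
         */
        return NULL;            /* no conversion needed */
    }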

> If that's required, it seems
> to be a bug that needs to be fixed in earlier branches as well.

It's not a bug in older branches, because the tdtypeid comparison
was enough to guarantee the same tdhasoid values.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] src/interfaces/libpq shipping nmake-related Makefiles

2017-04-06 Thread Tom Lane
Michael Paquier  writes:
> While looking at some SCRAM stuff, I have bumped into bcc32.mak and
> win32.mak in src/interfaces/libpq. To put it short: those files are
> not up to date. The SCRAM code has been in the tree for a while
> now and should have updated those files to list and clean up its objects,
> but nobody has reported failures from using them.

> At the minimum, they should be updated. Or perhaps they could just be
> ripped out? Who uses that on Windows now?

I'm quite sure no developers use them, or have done so for years.
The ostensible point is to support people who are trying to build
just libpq on Windows, for use in applications talking to PG servers
elsewhere.  It was reasonable that some users would want to
do that back when we didn't have a credible native port of the server,
but it's been quite some time since that was true.

In short, I think we could rip them out without much push-back.
But maybe I'm missing some remaining use-case.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] No-op case in ExecEvalConvertRowtype

2017-04-06 Thread Ashutosh Bapat
On Fri, Apr 7, 2017 at 5:06 AM, Tom Lane  wrote:
> So I now think it's okay to remove consideration of matching the target
> rowtype OID from the tupconvert.c functions, although we have to realize
> that that is effectively an API change for them, one which has a definite
> potential for biting third-party callers.

While I agree that we can remove indesc->tdtypeid ==
outdesc->tdtypeid, I am not sure whether it should be replaced by
!indesc->tdhasoid && !outdesc->tdhasoid. If that's required, it seems
to be a bug that needs to be fixed in earlier branches as well.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] No-op case in ExecEvalConvertRowtype

2017-04-06 Thread Ashutosh Bapat
On Fri, Apr 7, 2017 at 7:35 AM, Tom Lane  wrote:
> Amit Langote  writes:
>> ... One of
>> the earlier versions of that patch introduced a consider_typeid parameter
>> for which only ExecEvalConvertRowtype() passed true.
>
> Yeah, I was thinking of adding a flag along that line to fix this, but
> desisted after figuring out that ExecEvalConvertRowtype was the only
> candidate for using it.  Ashutosh's patch had already shown that it'd
> be better to pass "false" there too, so we'd end up with no use cases
> at all.

Probably we should also add an assertion there to make sure
ExecEvalConvertRowtype never gets the same input and output types. If
that's guaranteed, we don't need the copy either.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] No-op case in ExecEvalConvertRowtype

2017-04-06 Thread Ashutosh Bapat
On Fri, Apr 7, 2017 at 7:28 AM, Amit Langote
 wrote:
>
> And I see that commit 3f902354b08 lets the partition tuple-routing code
> keep utilizing that optimization.

I am not able to find this commit
fatal: ambiguous argument '3f902354b08': unknown revision or path not
in the working tree.
Use '--' to separate paths from revisions


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Faster methods for getting SPI results (460% improvement)

2017-04-06 Thread Tom Lane
Craig Ringer  writes:
> On 7 April 2017 at 00:54, Tom Lane  wrote:
>> ... External callers will only be
>> interested in the result of the canSetTag subquery.

> I wasn't aware that such queries could ever return a result set, though.

Possibly not, but the point is that they should be invisible to the
caller.  As the patch is set up, I think they'd look like empty result
sets instead, because the passed-in DestReceiver is used for them.
What we want is to use DestNone for non-canSetTag queries, and either
DestSPI or the caller's receiver for the canSetTag query.

> Personally I think this patch should be allowed to bypass
> CreateDestReceiver and create its own struct. I don't really see that
> it should be required to clean up the whole API first.

Well, if you bypass CreateDestReceiver then the question is what you're
putting in mydest and whether anything will get confused by that.  The
core problem with the existing API is that there's no provision for
adding new kinds of DestReceivers without a corresponding addition to
the CommandDest enum.  I think we really need some non-kluge solution
to that before moving forward.
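
For context, a bare-bones caller-defined receiver built without going
through CreateDestReceiver() looks roughly like the sketch below (the
struct layout and callback signatures follow dest.h, but the rest is
illustrative, and which CommandDest value to put in mydest is exactly the
open question):

  #include "postgres.h"
  #include "tcop/dest.h"

  typedef struct MyReceiver
  {
      DestReceiver pub;           /* must be first so we can cast back */
      void       *callback_arg;   /* whatever state the caller needs */
  } MyReceiver;

  static bool
  my_receive(TupleTableSlot *slot, DestReceiver *self)
  {
      MyReceiver *my = (MyReceiver *) self;

      /* hand the tuple to caller code here instead of materializing it */
      (void) my;
      return true;                /* false would stop execution early */
  }

  static void my_startup(DestReceiver *self, int operation, TupleDesc typeinfo) {}
  static void my_shutdown(DestReceiver *self) {}
  static void my_destroy(DestReceiver *self) { pfree(self); }

  static DestReceiver *
  make_my_receiver(void *arg)
  {
      MyReceiver *my = (MyReceiver *) palloc0(sizeof(MyReceiver));

      my->pub.receiveSlot = my_receive;
      my->pub.rStartup = my_startup;
      my->pub.rShutdown = my_shutdown;
      my->pub.rDestroy = my_destroy;
      my->pub.mydest = DestNone;  /* no dedicated CommandDest value exists */
      my->callback_arg = arg;

      return &my->pub;
  }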

> It's on the pointy end for Pg10, and I thought we'd be fine to include
> this in pg10 then aim to clean up DestReceiver in early pg11, or even
> as a post-feature-freeze refactoring fixup in pg10. Should the
> callback approach be blocked because the API it has to use is a bit
> ugly?

Given Peter's objections, I don't think this is getting into v10 anyway,
so we might as well take a bit more time and do it right.

Also, I'm entirely -1 on "post-feature-freeze refactoring fixups".
We're going to have more than enough to do trying to stabilize the
existing committed code, I fear (cf Robert's pessimistic summary of
the open-items list, a couple days ago).  We don't need to be
planning on doing new design post-freeze, whether it's painted as
mere refactoring or not.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] No-op case in ExecEvalConvertRowtype

2017-04-06 Thread Ashutosh Bapat
On Thu, Apr 6, 2017 at 8:51 PM, Tom Lane  wrote:
> Ashutosh Bapat  writes:
>> In ExecEvalConvertRowtype(), if the input row doesn't require any
>> conversion, we simply return that row as is.
>
> Huh.  That's been like that for a very long time.
>
>> I tried to create a testcase where this assertion would fail without
>> multi-level partitioned table, but could not construct one.
>
> You just need nested no-op ConvertRowtypeExprs, which is easily done with
> multiple levels of inheritance:
>
> regression=# create table pp (f1 int, f2 text);
> CREATE TABLE
> regression=# create table cc() inherits (pp);
> CREATE TABLE
> regression=# create table gc() inherits (cc);
> CREATE TABLE
> regression=# insert into gc values(11,'foo');
> INSERT 0 1
> regression=# select (gc.*)::cc from gc;
> gc
> --
>  (11,foo)
> (1 row)
>
> regression=# select (gc.*)::cc::pp from gc;
> server closed the connection unexpectedly

Oh, I tried multi-level inheritance, but always tried to select on the
topmost parent. Obviously that didn't work since we flatten
inheritance in the planner. I tried to cast rows of one table to the type
of another table with the same definition. We don't allow such
coercion. I missed
if (typeInheritsFrom(inputTypeId, targetTypeId)
|| typeIsOfTypedTable(inputTypeId, targetTypeId))
in coerce_type().

>
> and in the log I've got
>
> TRAP: FailedAssertion("!(( (tuple)->t_choice.t_datum.datum_typeid ) == 
> indesc->tdtypeid || ( (tuple)->t_choice.t_datum.datum_typeid ) == 2249)", 
> File: "execExprInterp.c", Line: 2824)
>
> Now the question is whether we should go to the trouble of making a tuple
> copy just to inject the parent's rowtype.  If the only reason to do so is
> to satisfy ExecEvalConvertRowtype's own assertion, it seems like we might
> be better advised just to drop the assertion.  On the other hand it seems
> like a good general principle that a tuple datum ought to be advertising
> a rowtype OID that matches what the expression tree says it should be.

Yes, I too came to the same conclusion.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Compilation warning with MSVC in pg_depend.c

2017-04-06 Thread Michael Paquier
Hi all,

I am getting the following warning with MSVC 2010 as a result
of 3217327 (identity columns):
  c:\users\michael\git\postgres\src\backend\catalog\pg_depend.c(615):
warning C4715: 'getOwnedSequence' : not all control paths return a
value [C:\Users\michael\git\postgres\postgres.vcxproj]

After looking at the code, I agree that the compiler is being dumb here,
but could we silence this warning with something like the
attached?
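
For reference, the usual idiom is an unreachable return after the error
call, along these lines (an illustration of the pattern only, not
necessarily what the attached patch does):

  static Oid
  example_lookup(bool found, Oid value)
  {
      if (found)
          return value;

      elog(ERROR, "entry not found");
      return InvalidOid;          /* keep compiler quiet */
  }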

Thanks,
-- 
Michael
VMware vCenter Server
www.vmware.com


seq-msvc-warning.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Time to change pg_regress diffs to unified by default?

2017-04-06 Thread Tom Lane
Peter Eisentraut  writes:
> I think one problem is that diff -u is not as portable as diff -c.  For
> example, the HP-UX 11 man page of diff doesn't list it.

FWIW, I can confirm that HPUX 10.20's diff hasn't got it.  That would
not affect gaur/pademelon, if we make this change, because I installed
GNU diffutils on that machine a decade or two ago.  It might be a bigger
issue for the other HPUX critters though.

Some other data points:

* POSIX 2008 requires diff -u.
* SUS v2 (POSIX 1997) does not.
* My other pet dinosaur, prairiedog (macOS 10.4 something), has it.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Undefined psql variables

2017-04-06 Thread Kyotaro HORIGUCHI
At Thu, 6 Apr 2017 19:21:21 -0400, Corey Huinker  
wrote in 

[HACKERS] src/interfaces/libpq shipping nmake-related Makefiles

2017-04-06 Thread Michael Paquier
Hi all,

While looking at some SCRAM stuff, I have bumped into bcc32.mak and
win32.mak in src/interfaces/libpq. To put it short: those files are
not up to date. The code of SCRAM is in the tree for a bit of time
now, and should have updated those files to list and clean up objects,
but nobody has reported failures in using them.

At the minimum, they should be updated. Or perhaps they could just be
ripped out? Who uses that on Windows now?
Thanks,
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: recursive json_populate_record()

2017-04-06 Thread Andrew Dunstan


On 04/05/2017 07:24 PM, Andrew Dunstan wrote:
>
> OK, I have been through this and I think it's basically committable. I
> am currently doing some cleanup, such as putting all the typedefs in one
> place, fixing redundant comments and adding more comments, declaring all
> the static functions, and minor code changes, but nothing major. I
> should be ready to commit tomorrow some time.
>


I have committed this. I will continue to work on adding comments.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical replication apply to run with sync commit off by default

2017-04-06 Thread Craig Ringer
On 7 April 2017 at 01:38, Peter Eisentraut
 wrote:

> I think this change could use some more documentation.  Under what
> circumstances would a user want to turn this back on?

The main reason is if you want to use synchronous_standby_names and
synchronous commit on the upstream. Turning sync apply back on for
logical replicas will cause them to send more timely standby status
updates, so commits on the upstream can return faster.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Implementation of SASLprep for SCRAM-SHA-256

2017-04-06 Thread Michael Paquier
On Fri, Apr 7, 2017 at 2:47 AM, Heikki Linnakangas  wrote:
> On 04/06/2017 08:42 PM, Heikki Linnakangas wrote:
>>> There is for example this portion in the new tables:
>>> +static const Codepoint prohibited_output_chars[] =
>>> +{
>>> +   0xD800, 0xF8FF, /* C.3, C.5 */
>>>
>>>    ----- Start Table C.5 -----
>>>    D800-DFFF; [SURROGATE CODES]
>>>    ----- End Table C.5 -----
>>> This indicates a range of values. Wouldn't it be better to split this
>>> table in two, one for the range of codepoints and another one with the
>>> single entries?
>>
>> I considered that, but there are relatively few singular codepoints in
>> the tables, so it wouldn't save much space. In this patch, singular
>> codepoints are represented by a range like "0x3000, 0x3000".

I am really wondering if this should not reflect the exact ranges
reported by the RFC. I understand that you have grouped things to save
a couple of bytes, but keeping the RFC's ranges would protect against any
future updates of the codepoints within those ranges (unlikely to happen,
I agree).
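
To spell out the grouping in question (an illustrative fragment only, using
the entries quoted above):

  /* as RFC 3454 lists them, tables C.5 and C.3 kept separate: */
  0xD800, 0xDFFF,     /* C.5  [SURROGATE CODES] */
  0xE000, 0xF8FF,     /* C.3  [PRIVATE USE, PLANE 0] */

  /* as the patch groups them, since the two ranges are adjacent: */
  0xD800, 0xF8FF,     /* C.3, C.5 */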

>>> +   0x1D173, 0x1D17A,   /* C.2.2 */
>>> This is for musical symbols. It seems to me that checking for a range
>>> is what is intended.
>>
>> Can you elaborate?
>
> Oh, I think I understand the confusion now. All the arrays represent
> codepoint ranges, not singular codepoints. I renamed them to "*_ranges", to
> make that more clear.

Thanks! Yes I got confused by the name.

+/* Is the given Unicode codepoint in the given table? */
+#define IS_CODE_IN_TABLE(code, map) is_code_in_table(code, map, lengthof(map))
[...]
+static bool
+is_code_in_table(pg_wchar code, const pg_wchar *map, int mapsize)
+{
+   Assert(mapsize % 2 == 0);
+
+   if (code < map[0] || code > map[mapsize - 1])
+   return false;
+
+   if (bsearch(&code, map, mapsize / 2, sizeof(pg_wchar) * 2,
+   codepoint_range_cmp))
+   return true;
+   else
+   return false;
+}
Those could be renamed to range_table as well to avoid more confusion.

> The SASLprep implementation depends on the UTF-8 functions from
> src/backend/utils/mb/wchar.c. So to use it, you must also compile and link
> that. That doesn't change anything for the current users of these
> functions, the backend and libpq, as they both already link with wchar.o.
> It would be good to move those functions into a separate file in
> src/common, but I'll leave that for another day.

Fine for me.

You may want to add a .gitignore in src/common/unicode for norm_test
and norm_test_table.h.
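
For reference, a minimal sketch of such a .gitignore, using only the
file names mentioned above:

    # src/common/unicode/.gitignore
    /norm_test
    /norm_test_table.h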
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers



Re: [HACKERS] Faster methods for getting SPI results (460% improvement)

2017-04-06 Thread Craig Ringer
On 7 April 2017 at 00:54, Tom Lane  wrote:

> I can certainly get on board with the idea of letting a SPI caller provide
> a DestReceiver instead of accumulating the query results into a
> SPITupleTable, but the way it was done here seems quite bizarre.  I think
> for instance you've mishandled non-canSetTag queries; those should have
> any results discarded, full stop.  External callers will only be
> interested in the result of the canSetTag subquery.

That's something I didn't even know was a thing to look for. Thanks
for spotting it.

For other readers who may also be confused, this refers to internally
generated queries that should not be visible to the caller. We use
these in things like CREATE SCHEMA, ALTER TABLE, in FKs, etc.

I wasn't aware that such queries could ever return a result set, though.

> I don't much like DestSPICallback either.  We may well need some better
> way for extension code to create a custom DestReceiver that does something
> out of the ordinary with query result tuples.  But if so, it should not be
> tied to SPI, not even just by name.
>
> I think that 0001/0002 need to be refactored as (A) a change to make
> DestReceiver creation more flexible, and then (B) a change to SPI to allow
> a caller to pass in the receiver to use.

That's exactly what I tried to avoid suggesting upthread, since it'd
be quite a bit more invasive than the current patch, though definitely
a desirable cleanup.

Personally I think this patch should be allowed to bypass
CreateDestReceiver and create its own struct. I don't really see that
it should be required to clean up the whole API first.

It's on the pointy end for Pg10, and I thought we'd be fine to include
this in pg10 then aim to clean up DestReceiver in early pg11, or even
as a post-feature-freeze refactoring fixup in pg10. Should the
callback approach be blocked because the API it has to use is a bit
ugly?

> After poking around a bit, it seems like we've allowed the original
> notion of CommandDest to get way out of hand.  The only places where
> CreateDestReceiver gets called with a non-constant argument (that is,
> where the caller doesn't know perfectly well which kind of DestReceiver
> it wants) are two calls in postgres.c, and both of those are ultimately
> passing whereToSendOutput, which has got only a really limited set of
> possible values.  I am thinking that we should cut down enum CommandDest
> to be just
>
> DestNone,/* results are discarded */
> DestDebug,   /* results go to debugging output */
> DestRemote,  /* results sent to frontend process */
> DestOther/* something else */
>
> and change CreateDestReceiver so it throws an error for DestOther; the
> way to create any type of receiver other than these few is to call the
> underlying creation function directly, rather than going through
> CreateDestReceiver.
>
> Having done that, the means for creating a custom receiver is just to
> set up an appropriate struct that embeds struct _DestReceiver and
> always has mydest = DestOther (or maybe we should just lose the mydest
> field altogether).  We could document that a bit better, but it's really
> not complicated.

I strongly agree that this is the way DestReceiver creation should be
refactored.
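
For illustration, a custom receiver built the way Tom describes might
look roughly like the sketch below. All names are made up; the callback
signatures should be taken from tcop/dest.h of the tree at hand (older
branches have a void-returning receiveSlot), and DestOther does not
exist yet, so mydest is set to DestNone here:

    #include "postgres.h"
    #include "tcop/dest.h"

    typedef struct MyReceiver
    {
        DestReceiver pub;       /* must be the first field */
        uint64      rows_seen;  /* example private state */
    } MyReceiver;

    static bool
    my_receiveSlot(TupleTableSlot *slot, DestReceiver *self)
    {
        MyReceiver *myself = (MyReceiver *) self;

        myself->rows_seen++;    /* do something with the tuple here */
        return true;            /* false would mean "stop sending rows" */
    }

    static void
    my_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
    {
        /* per-run initialization, if any */
    }

    static void
    my_shutdown(DestReceiver *self)
    {
    }

    static void
    my_destroy(DestReceiver *self)
    {
        pfree(self);
    }

    static DestReceiver *
    CreateMyReceiver(void)
    {
        MyReceiver *self = (MyReceiver *) palloc0(sizeof(MyReceiver));

        self->pub.receiveSlot = my_receiveSlot;
        self->pub.rStartup = my_startup;
        self->pub.rShutdown = my_shutdown;
        self->pub.rDestroy = my_destroy;
        self->pub.mydest = DestNone;  /* would become DestOther under the proposal */
        return &self->pub;
    }

A SPI caller (or anything else that runs a query) would then pass the
result of CreateMyReceiver() wherever a DestReceiver is accepted.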

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Time to change pg_regress diffs to unified by default?

2017-04-06 Thread Noah Misch
On Thu, Apr 06, 2017 at 07:01:32PM -0400, Tom Lane wrote:
> Andres Freund  writes:
> > I personally, and I know of a bunch of other regular contributors, find
> > context diffs very hard to read.  Besides general dislike, for things
> > like regression test output context diffs are just not well suited.
> 
> Personally, I disagree completely.  Unified diffs are utterly unreadable
> for anything beyond trivial cases of small well-separated changes.
> 
> It's possible that regression failure diffs will usually fall into that
> category, but I'm not convinced.

For reading patches, I frequently use both formats.  Overall, I perhaps read
unified 3/4 of the time and context 1/4 of the time.

For regression diffs, I use PG_REGRESS_DIFF_OPTS=-u and have never converted a
regression diff to context form.  Hence, +1 for the proposed change.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Time to change pg_regress diffs to unified by default?

2017-04-06 Thread David Rowley
On 7 April 2017 at 10:31, Andres Freund  wrote:
> Hi,
>
> I personally, and I know of a bunch of other regular contributors, find
> context diffs very hard to read.  Besides general dislike, for things
> like regression test output context diffs are just not well suited.
> E.g. in
> https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=prairiedog=2017-04-06%2021%3A10%3A56=check
> the salient point (ERROR:  50 is outside the valid range for parameter 
> "effective_io_concurrency" (0 .. 0))
> is 130 lines into the diff, whereas it's right at the start in a unified diff.
> Issues with one error that causes a lot of followup output changes are
> really common in our regression suite.
>
> I personally have PG_REGRESS_DIFF_OPTS set to -dU10, but that doesn't
> help much analyzing buildfarm output.
>
> Therefore I propose changing the defaults in pg_regress.c.

You mean people actually use those diffs? I've never done anything
apart from using . That way I can
reject and accept the changes as I wish, just by kicking the results
over to the expected results, or not, if there's a genuine mistake.


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] No-op case in ExecEvalConvertRowtype

2017-04-06 Thread Tom Lane
Amit Langote  writes:
> ... One of
> the earlier versions of that patch introduced a consider_typeid parameter
> for which only ExecEvalConvertRowtype() passed true.

Yeah, I was thinking of adding a flag along that line to fix this, but
desisted after figuring out that ExecEvalConvertRowtype was the only
candidate for using it.  Ashutosh's patch had already shown that it'd
be better to pass "false" there too, so we'd end up with no use cases
at all.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pg_basebackup: Allow use of arbitrary compression program

2017-04-06 Thread Michael Harris
Hello,

Back in pg 9.2, we hacked a copy of pg_basebackup to add a command
line option which would allow the user to specify an arbitrary
external program (potentially including arguments) to be used to
compress the tar backup.

Our motivation was to be able to use pigz (parallel gzip
implementation) to speed up the compression. It also allows using
tools like bzip2, xz, etc instead of the inbuilt zlib.

I never ended up submitting that upstream, but now it looks like I
will have to repeat the exercise for 9.6, so I was wondering if such a
feature would be welcomed.

I found one or two references to people asking for this, eg:
https://www.commandprompt.com/blog/a_pg_basebackup_wish_list/

To do it properly would require:

1) Adding command line option as follows:

  -C, --compressprog=PROG
 Use supplied program for compression

2) The current logic either uses zlib if compiled in, or offers no
compression at all, controlled by a series of #ifdef/#endif. I would
prefer that the user can either use zlib or an external program
without having to recompile, so I would remove the #ifdefs and replace
them with run time branching.

3) When opening the output file, if the -C option was used, use popen
to open a child process and write to that.

My questions are:
- Has anything like this already been discussed?
- Would this be a welcome contribution?
- Can anyone see any problems with the above approach?

Thanks!

Regards
Mike Harris


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Performance improvement for joins where outer side is unique

2017-04-06 Thread David Rowley
On 7 April 2017 at 13:41, Tom Lane  wrote:
> David Rowley  writes:
>> On 7 April 2017 at 11:47, Tom Lane  wrote:
>>> What I'm on about is that you can't do the early advance to the
>>> next outer tuple unless you're sure that all the quals that were
>>> relevant to the uniqueness proof have been checked for the current
>>> inner tuple.  That affects all three join types not only merge.
>
>> Well, look at the join code and you'll see this only happens after the
>> joinqual is evaulated. I didn't make a special effort here. I just
>> borrowed the location that JOIN_SEMI was already using.
>
> Right, and that's exactly the point: some of the conditions you're
> depending on might have ended up in the otherqual not the joinqual.
>
> We'd discussed rearranging the executor logic enough to deal with
> such situations and agreed that it seemed too messy; but that means
> that the optimization needs to take care not to use otherqual
> (ie pushed-down) conditions in the uniqueness proofs.
>
>> Oh yeah. I get it, but that's why we ignore !can_join clauses
>
> can_join seems to me to be not particularly relevant ... there's
> nothing that prevents that from getting set for pushed-down clauses.
>
> It's possible that the case I'm worried about is unreachable in
> practice because all the conditions that could be of interest would?
> be strict and therefore would have forced join strength reduction.
> But I'm not comfortable with assuming that.

Okay, well how about we protect against that by not using such quals
as unique proofs? We'd just need to ignore anything that's
outerjoin_delayed?

If we're struggling to think of a case that this will affect, then we
shouldn't be too worried about any missed optimisations.

A patch which does this is attached.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


unique_joins_2017-04-07b.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] No-op case in ExecEvalConvertRowtype

2017-04-06 Thread Amit Langote
On 2017/04/07 8:36, Tom Lane wrote:
> I wrote:
>> I propose to deal with this by reverting 3838074f8 in toto, and then
>> trying to clarify that comment, and maybe adding a regression test case
>> based on the example I showed earlier so that it will be a little more
>> obvious if someone breaks this again.
>> However, I see that 3838074f8 touches some partitioning code, which
>> makes me wonder if there's anything in the partitioning logic that
>> really depends on this erroneous "optimization".

Definitely misread the comment there, but was mystified why the tests
didn't break.  The partitioning tuple-routing code optionally avoids
converting tuples by using this optimization.  Since TupleDesc.tdtypeid of
the parent and the partition to which a tuple is routed are never the
same, tuples would always have to be converted before 3838074f8.  One of
the earlier versions of that patch introduced a consider_typeid parameter
for which only ExecEvalConvertRowtype() passed true.

> After further poking around, I've concluded that that approach is probably
> an overreaction.  Of the dozen or so callers of convert_tuples_by_position
> and convert_tuples_by_name, it seems that ExecEvalConvertRowtype is the
> only one that really needs the correct composite-datum headers in the
> converted tuple; and even for it, forcing use of do_convert_tuple is
> a pretty expensive, brute-force way to get that result.  Ashutosh's
> proposal to use heap_copy_tuple_as_datum when no column rearrangement
> is required should be substantially more efficient.
> 
> So I now think it's okay to remove consideration of matching the target
> rowtype OID from the tupconvert.c functions, although we have to realize
> that that is effectively an API change for them, one which has a definite
> potential for biting third-party callers.

And I see that 3f902354b08 lets the partition tuple-routing code
keep utilizing that optimization.

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Time to change pg_regress diffs to unified by default?

2017-04-06 Thread Andrew Dunstan


On 04/06/2017 09:17 PM, Peter Eisentraut wrote:
> On 4/6/17 18:31, Andres Freund wrote:
>> I personally have PG_REGRESS_DIFF_OPTS set to -dU10, but that doesn't
>> help much analyzing buildfarm output.
>>
>> Therefore I propose changing the defaults in pg_regress.c.
> I think one problem is that diff -u is not as portable as diff -c.  For
> example, the HP-UX 11 man page of diff doesn't list it.
>


Ugh. I suppose we could run a test to see if it was available. If it
comes to that, I guess I could do a similar test in the buildfarm
client. Seems a bit like overkill, though.
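
For what it's worth, such a probe could be as simple as the following
sketch (not existing pg_regress or buildfarm code):

    # does this diff support unified format?
    if diff -u /dev/null /dev/null >/dev/null 2>&1; then
        diff_opts="-U3"
    else
        diff_opts="-C3"
    fi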

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] scram and \password

2017-04-06 Thread Michael Paquier
On Thu, Apr 6, 2017 at 8:04 AM, Michael Paquier
 wrote:
> On Thu, Apr 6, 2017 at 2:11 AM, Heikki Linnakangas  wrote:
>> At a quick glance, moving pg_frontend_random() to src/common looks like a
>> non-starter. It uses pglock_thread() which is internal to libpq, so it won't
>> compile as it is. I think I'm going to change scram_build_verifier() to take
>> a pre-generated salt as argument, to avoid the need for a random number
>> generator in src/common.
>
> Oops. Need an updated set of patches?

Attached is an updated set of patches anyway. This is similar to the
last set, except that I removed the part where pg_frontend_random() is
refactored, extending scram_build_verifier() to use a pre-generated
salt.

Hope that helps.
-- 
Michael


0001-Use-base64-based-encoding-for-stored-and-server-keys.patch
Description: Binary data


0002-Move-routine-to-build-SCRAM-verifier-into-src-common.patch
Description: Binary data


0003-Refactor-frontend-side-random-number-generation.patch
Description: Binary data


0004-Extend-PQencryptPassword-with-a-hashing-method.patch
Description: Binary data


0005-Extend-psql-s-password-and-createuser-to-handle-SCRA.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Performance improvement for joins where outer side is unique

2017-04-06 Thread Tom Lane
David Rowley  writes:
> On 7 April 2017 at 11:47, Tom Lane  wrote:
>> What I'm on about is that you can't do the early advance to the
>> next outer tuple unless you're sure that all the quals that were
>> relevant to the uniqueness proof have been checked for the current
>> inner tuple.  That affects all three join types not only merge.

> Well, look at the join code and you'll see this only happens after the
> joinqual is evaluated. I didn't make a special effort here. I just
> borrowed the location that JOIN_SEMI was already using.

Right, and that's exactly the point: some of the conditions you're
depending on might have ended up in the otherqual not the joinqual.

We'd discussed rearranging the executor logic enough to deal with
such situations and agreed that it seemed too messy; but that means
that the optimization needs to take care not to use otherqual
(ie pushed-down) conditions in the uniqueness proofs.

> Oh yeah. I get it, but that's why we ignore !can_join clauses

can_join seems to me to be not particularly relevant ... there's
nothing that prevents that from getting set for pushed-down clauses.

It's possible that the case I'm worried about is unreachable in
practice because all the conditions that could be of interest would
be strict and therefore would have forced join strength reduction.
But I'm not comfortable with assuming that.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Question about one of the old Autonomous Transaction approach

2017-04-06 Thread Vaishnavi Prabakaran
Hi All,

Regarding the discussion about autonomous transactions in the message ID
below: a long time ago it was stated that having a new structure "Struct
PGAutonomousXACT" was rejected at the PGCon hackers meeting. Does anyone
know why it was rejected? What disadvantage/problem was identified with
that approach?

ca+u5nmkeum4abrqbndlyt5ledektae8rbiyw3977yhmeowq...@mail.gmail.com



I tried to look for answers by going through various mails related to
autonomous transactions, with no luck. Any answer, or a hint about where
to look, would be helpful.


Thanks & Regards,
Vaishnavi,
Fujitsu Australia.


Re: [HACKERS] Performance improvement for joins where outer side is unique

2017-04-06 Thread David Rowley
On 7 April 2017 at 11:47, Tom Lane  wrote:
> David Rowley  writes:
>> On 7 April 2017 at 07:26, Tom Lane  wrote:
>>> I'm looking through this, and I'm failing to see where it deals with
>>> the problem we discussed last time, namely that you can't apply the
>>> optimization unless all clauses that were used in the uniqueness
>>> proof are included in the join's merge/hash conditions + joinquals.
>
>> The code in question is:
>> mergestate->mj_SkipMarkRestore = !mergestate->js.joinqual &&
>> mergestate->js.first_inner_tuple_only;
>
> Uh, AFAICS that only protects the skip-mark-and-restore logic.
> What I'm on about is that you can't do the early advance to the
> next outer tuple unless you're sure that all the quals that were
> relevant to the uniqueness proof have been checked for the current
> inner tuple.  That affects all three join types not only merge.

Well, look at the join code and you'll see this only happens after the
joinqual is evaluated. I didn't make a special effort here. I just
borrowed the location that JOIN_SEMI was already using.

For example, from hash join:

if (joinqual == NULL || ExecQual(joinqual, econtext))
{
    node->hj_MatchedOuter = true;
    HeapTupleHeaderSetMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple));

    /* In an antijoin, we never return a matched tuple */
    if (node->js.jointype == JOIN_ANTI)
    {
        node->hj_JoinState = HJ_NEED_NEW_OUTER;
        continue;
    }

    /*
     * Skip to the next outer tuple if we only need to join to
     * the first inner matching tuple.
     */
    if (node->js.first_inner_tuple_only)
        node->hj_JoinState = HJ_NEED_NEW_OUTER;

Note the first line and the final two lines.

Here's the one from nested loop:

if (ExecQual(joinqual, econtext))
{
    node->nl_MatchedOuter = true;

    /* In an antijoin, we never return a matched tuple */
    if (node->js.jointype == JOIN_ANTI)
    {
        node->nl_NeedNewOuter = true;
        continue;    /* return to top of loop */
    }

    /*
     * Skip to the next outer tuple if we only need to join to the
     * first inner matching tuple.
     */
    if (node->js.first_inner_tuple_only)
        node->nl_NeedNewOuter = true;

Again, note the first line and final 2 lines.

> The case that would be relevant to this is, eg,
>
> create table t1 (f1 int, f2 int, primary key(f1,f2));
>
> select * from t_outer left join t1 on (t_outer.f1 = t1.f1)
> where t_outer.f2 = t2.f2;

hmm, that query is not valid, unless you have created some table named
t_outer. I don't know how you've defined that. So I guess you must
have meant:

postgres=# explain verbose select * from t1 t_outer left join t1 on
(t_outer.f1 = t1.f1) where t_outer.f2 = t1.f2;
QUERY PLAN
---
 Hash Join  (cost=66.50..133.57 rows=128 width=16)
   Output: t_outer.f1, t_outer.f2, t1.f1, t1.f2
   Inner Unique: Yes
   Hash Cond: ((t_outer.f1 = t1.f1) AND (t_outer.f2 = t1.f2))
   ->  Seq Scan on public.t1 t_outer  (cost=0.00..32.60 rows=2260 width=8)
 Output: t_outer.f1, t_outer.f2
   ->  Hash  (cost=32.60..32.60 rows=2260 width=8)
 Output: t1.f1, t1.f2
 ->  Seq Scan on public.t1  (cost=0.00..32.60 rows=2260 width=8)
   Output: t1.f1, t1.f2
(10 rows)

Which did become an INNER JOIN due to the strict WHERE clause.

If you'd had done:

postgres=# explain verbose select * from t1 t_outer left join t1 on
(t_outer.f1 = t1.f1) where t_outer.f2 = t1.f2 or t1.f1 is null;
   QUERY PLAN

 Merge Left Join  (cost=0.31..608.67 rows=255 width=16)
   Output: t_outer.f1, t_outer.f2, t1.f1, t1.f2
   Inner Unique: No
   Merge Cond: (t_outer.f1 = t1.f1)
   Filter: ((t_outer.f2 = t1.f2) OR (t1.f1 IS NULL))
   ->  Index Only Scan using t1_pkey on public.t1 t_outer
(cost=0.16..78.06 rows=2260 width=8)
 Output: t_outer.f1, t_outer.f2
   ->  Materialize  (cost=0.16..83.71 rows=2260 width=8)
 Output: t1.f1, t1.f2
 ->  Index Only Scan using t1_pkey on public.t1
(cost=0.16..78.06 rows=2260 width=8)
   Output: t1.f1, t1.f2
(11 rows)

You'll notice that "Inner Unique: No"

> Your existing patch would think t1 is unique-inner, but the qual pushed
> down from WHERE would not be a joinqual so the wrong thing would happen
> at runtime.
>
> (Hm ... actually, this example wouldn't fail as written because
> the WHERE qual is probably strict, so the left join would get
> reduced to an inner join and then pushed-down-ness no longer
> matters.  But hopefully you get my drift.)

Oh yeah. I get it, but that's why we ignore !can_join clauses

/* Ignore if it's not a mergejoinable clause */
if (!restrictinfo->can_join ||
restrictinfo->mergeopfamilies == NIL)
continue; /* not mergejoinable */

no?

>> I don't really think the List idea would be nearly as efficient.
>
> No, what I'm saying is that each RelOptInfo would contain a single List of
> Relids of proven-unique-for outer rels (and another one for the negative
> cache).  No array, no more searching than you have now, just removal of an
> uncertainly-safe array fetch.

Re: [HACKERS] Time to change pg_regress diffs to unified by default?

2017-04-06 Thread Peter Eisentraut
On 4/6/17 18:31, Andres Freund wrote:
> I personally have PG_REGRESS_DIFF_OPTS set to -dU10, but that doesn't
> help much analyzing buildfarm output.
> 
> Therefore I propose changing the defaults in pg_regress.c.

I think one problem is that diff -u is not as portable as diff -c.  For
example, the HP-UX 11 man page of diff doesn't list it.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug with pg_basebackup and 'shared' tablespace

2017-04-06 Thread Kyotaro HORIGUCHI
Hi, Pierre.

Maybe you're the winner:p

At Thu, 06 Apr 2017 12:34:09 +0200, Pierre Ducroquet  
wrote in <1714428.BHRm6e8A2D@peanuts2>
> On Thursday, April 6, 2017 2:00:55 PM CEST Kyotaro HORIGUCHI wrote:
> > https://www.postgresql.org/docs/9.6/static/manage-ag-tablespaces.html
> > 
> > | The location must be an existing, empty directory that is owned
> > | by the PostgreSQL operating system user.
> > 
> > This explicitly prohibits to share one tablespace directory among
> > multiple servers. The code is just missing the case of multiple
> > servers with different versions. I think the bug is rather that
> > Pg9.6 in the case allowed to create the tablespace.
> > 
> > The current naming rule of tablespace directory was introduced as
> > of 9.0 so that pg_upgrade (or pg_migrator at the time) can
> > perform in-place migration. It is not intended to share a
> > directory among multiple instances with different versions.
> > 
> > That being said, an additional trick in the attached file will
> > work for you.
> 
> Thanks for your answer.
> Indeed, either PostgreSQL should enforce that empty folder restriction, or 
> pg_basebackup should lift it and the documentation should reflect this.

That being said, whether that behavior is preferable is a different
matter. The discussion on the behavior continues here:

https://www.postgresql.org/message-id/20170406.160844.120459562.horiguchi.kyot...@lab.ntt.co.jp

> Right now, there is a conflict between pg_basebackup and the server since 
> they 
> do not allow the same behaviour. I can submit a patch either way, but I won't 
> decide what is the right way to do it.
> I know tricks will allow to work around that issue, I found them hopefully 
> and 
> I guess most people affected by this issue would be able to find and use 
> them, 
> but nevertheless being able to build a server that can no longer be base-
> backuped does not seem right.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical Replication and Character encoding

2017-04-06 Thread Kyotaro HORIGUCHI
At Thu, 6 Apr 2017 14:43:56 -0400, Peter Eisentraut 
 wrote in 
<330a093a-d155-c866-cbf2-8f36fdf51...@2ndquadrant.com>
> On 4/6/17 11:47, Peter Eisentraut wrote:
> > On 4/5/17 21:32, Kyotaro HORIGUCHI wrote:
> >> At Wed, 5 Apr 2017 11:33:51 -0400, Peter Eisentraut 
> >>  wrote in 
> >> <5401fef6-c0c0-7e8a-d8b1-169e30cbd...@2ndquadrant.com>
> >>> After further thinking, I prefer the alternative approach of using
> >>> pq_sendcountedtext() as is and sticking the trailing zero byte on on the
> >>> receiving side.  This is a more localized change, and keeps the logical
> >>> replication protocol consistent with the main FE/BE protocol.  (Also, we
> >>> don't need to send a useless byte around.)
> >>
> >> I'm not sure about the significance of the trailing zero in the
> >> the logical replication protocol.
> > 
> > It doesn't matter.  It could be "nice" to have it because then the
> > receiving side doesn't need to add it if it wants to assemble a C
> > string.  But for the reasons above, it doesn't seem worth doing that.

Ok, it's fine for me.

> committed

Thanks!

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Duplicate usage of tablespace location?

2017-04-06 Thread Kyotaro HORIGUCHI
I don't mean that this is the only or best way to go.

I apologize for the possible lack of explanation.

At Thu, 06 Apr 2017 12:03:51 -0400, Tom Lane  wrote in 
<21084.1491494...@sss.pgh.pa.us>
> Kyotaro HORIGUCHI  writes:
> > I noticed by the following report, PostgreSQL can share the same
> > directory as tablespaces of two servers with different
> > pg-versions.
> 
> > https://www.postgresql.org/message-id/2008148.rxBNyNRHPZ@peanuts2
> 
> > 8.4 checked that the tablespace location is empty, but from 9.0,
> > the check is replaced with creating a PG_PGVER_CATVER
> > subdirectory. This works for multiple servers with the same
> > version, but don't for servers with different versions.
> 
> Please explain why you think it doesn't work.  This patch seems to
> be reverting an intentional behavioral change, and you haven't

I understand that the change is for in-place upgrade, not for
sharing a tablespace directory between two versions of PostgreSQL
servers. It actually rejects a second server of the same version
that comes along. If this is correct, it doesn't seem right to
accept a second server of a different version.

If we allow sharing of the directory, theoretically we could also allow
it between servers of the same version by adding the system identifier
to the subdirectory name.
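
For illustration, this is the kind of layout the current naming rule
produces when two different major versions point at the same location
(the path is made up, and the CATVER placeholders stand for the actual
catalog version numbers):

    /srv/tblspc/
        PG_9.6_<CATVER>/    <- created by the 9.6 server
        PG_10_<CATVER>/     <- created by the 10 server

Adding the system identifier to the subdirectory name would, in the same
way, let two servers of the same version coexist.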


> really explained why we'd want to.  It certainly doesn't look like
> it addresses the referenced complaint about pg_basebackup behavior.

My point is: does the directory for a newly created tablespace really
need to be literally empty, or not?

Practically it doesn't need to be empty; successful creation of the
PG_VER_CATVER subdirectory is enough, as the current implementation
shows. If we take this route, the documentation and pg_basebackup
should be changed, and the problem will be resolved as a result.

https://www.postgresql.org/docs/9.6/static/manage-ag-tablespaces.html

- The location must be an existing, empty directory that is owned
- by the PostgreSQL operating system user. All objects subsequently
- created within the tablespace will be stored in files underneath
- this directory.
+ CREATE TABLESPACE creates a subdirectory named after server
+ version in the location. The location must not contain a file
+ or directory of that name for the subdirectory. All objects
+ subsequently created within the tablespace will be stored in
+ files underneath the subdirectory.

Then, modify pg_basebackup to follow the description above.


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Remove pg_stat_progress_vacuum from Table 28.2

2017-04-06 Thread Amit Langote
On 2017/04/07 0:56, Fujii Masao wrote:
> On Tue, Apr 4, 2017 at 10:19 AM, Amit Langote
>  wrote:
>> It seems pg_stat_progress_vacuum is not supposed to appear in the table
>> titled "Collected Statistics Views".  It was added by c16dc1aca.  Attached
>> patch fixes that.
> 
> Instead, it should appear in the table of "Dynamic Statistics Views"
> because it reports dynamic info, i.e., progress, about VACUUM activity?

I thought the same at first, but then realized we have an entirely separate
section, 28.4 Progress Reporting.  So it may not be necessary to keep it
anywhere in 28.2.

In fact I also proposed [1] to move the descriptions of "Dynamic
Statistics Views" to a new section within the same chapter.

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/70cd6b98-7c3f-e368-04ed-e053d18b7d81%40lab.ntt.co.jp




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] BRIN desummarization writes junk WAL records

2017-04-06 Thread Tom Lane
I am seeing the database fail to restart after a crash during the
regression tests, due to a divide-by-zero fault in BRIN wal replay.

Core was generated by `postgres: startup'.
Program terminated with signal 8, Arithmetic exception.
#0  brinSetHeapBlockItemptr (buf=, pagesPerRange=0, 
heapBlk=0, tid=...) at brin_revmap.c:169
169 iptr += HEAPBLK_TO_REVMAP_INDEX(pagesPerRange, heapBlk);
(gdb) bt
#0  brinSetHeapBlockItemptr (buf=, pagesPerRange=0, 
heapBlk=0, tid=...) at brin_revmap.c:169
#1  0x00478cdc in brin_xlog_desummarize_page (record=0x2403ac8)
at brin_xlog.c:274
#2  brin_redo (record=0x2403ac8) at brin_xlog.c:320
#3  0x00513174 in StartupXLOG () at xlog.c:7171
#4  0x006dea91 in StartupProcessMain () at startup.c:217
#5  0x0052214a in AuxiliaryProcessMain (argc=2, argv=0x7fff4bb8d1f0)
at bootstrap.c:425
#6  0x006d98b7 in StartChildProcess (type=StartupProcess)
at postmaster.c:5256
#7  0x006ddae6 in PostmasterMain (argc=3, argv=)
at postmaster.c:1329
#8  0x00658038 in main (argc=3, argv=0x2402b20) at main.c:228

The proximate cause of the exception seems to be that
brinSetHeapBlockItemptr is being passed pagesPerRange = 0,
which is problematic since HEAPBLK_TO_REVMAP_INDEX tries to
divide by that.  Looking one level down, the bogus value
seems to be coming out of an xl_brin_desummarize WAL record:

(gdb) f 1
#1  0x00478cdc in brin_xlog_desummarize_page (record=0x2403ac8)
at brin_xlog.c:274
274 brinSetHeapBlockItemptr(buffer, xlrec->pagesPerRange, 
xlrec->heapBlk, iptr);
(gdb) p *xlrec
$1 = {pagesPerRange = 0, heapBlk = 0, regOffset = 1}

This is, perhaps, not unrelated to the fact that
brinRevmapDesummarizeRange doesn't seem to be bothering to fill
that field of the record.

BTW, is it actually sensible that xl_brin_desummarize's heapBlk
is declared OffsetNumber and not BlockNumber?  If there's a reason
why that's correct, the field name seems damn misleading.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench - allow to store select results into variables

2017-04-06 Thread Tatsuo Ishii
>> Having said all that, I think we're at the point in the commitfest
>> where if there's any design question at all about a patch, it should
>> get booted to the next cycle.  We are going to have more than enough
>> to do to stabilize what's already committed, we don't need to be
>> adding more uncertainty.
> 
> Ok, I will move the patch to the next cf.

Done.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Making clausesel.c Smarter

2017-04-06 Thread David Rowley
On 7 April 2017 at 09:05, Claudio Freire  wrote:
> On Tue, Apr 4, 2017 at 1:00 PM, Claudio Freire  wrote:
 If you ask me, I'd just leave:

 + if (rqlist->hibound == DEFAULT_INEQ_SEL || rqlist->lobound ==
 DEFAULT_INEQ_SEL)
 + {
 + /* No point in checking null selectivity, DEFAULT_INEQ_SEL
 implies we have no stats */
 + s2 = DEFAULT_RANGE_INEQ_SEL;
 + }
 + else
 + {
 ...
 +   s2 = rqlist->hibound + rqlist->lobound - 1.0;
 +   nullsel = nulltestsel(root, IS_NULL, rqlist->var, varRelid);
 +   notnullsel = 1.0 - nullsel;
 +
 +   /* Adjust for double-exclusion of NULLs */
 +   s2 += nullsel;
 +   if (s2 <= 0.0) {
 +  if (s2 <= (-1.0e-4 * notnullsel - 1.0e-6))
 +  {
 +   /* Most likely contradictory clauses found */
 +   s2 = (1.0e-10 < notnullsel) ? 1.0e-10 : notnullsel;
 +  }
 +  else
 +  {
 +   /* Could be a rounding error */
 +   s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel;
 +  }
 +   }
 + }

 Where (-1.0e-4 * notnullsel - 1.0e-6) is just a very rough (and overly
 cautious) estimation of the amount of rounding error that could be
 there with 32-bit floats.

 The above assumes a non-DEFAULT_INEQ_SEL value in lobound/hibound is
 an estimation based on stats, maybe approximate, but one that makes
 sense (ie: preserves the monotonicity of the CPF), and as such
 negative values are either sign of a contradiction or rounding error.
 The previous code left the possibility of negatives coming out of
 default selectivities creeping up on non-DEFAULT_INEQ_SEL estimates,
 but that doesn't seem possible coming out of scalarineqsel.

 But I'd like it if Tom could comment on that, since he's the one that
 did that in commit 547bb4a7f2bdccad9253a99211ce84b3f9de485a (way back
 in 2004).

 Barring that, I'd just change the

 s2 = DEFAULT_RANGE_INEQ_SEL;

 To

 s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel;

 Which makes a lot more sense.
>>>
>>> I think to fix this properly the selfuncs would have to return the
>>> estimate, and nullfrac separately, so the nullfrac could just be
>>> applied once per expression. There's already hacks in there to get rid
>>> of the double null filtering. I'm not proposing that as something for
>>> this patch. It would be pretty invasive to change this.
>>
>> There's no need, you can compute the nullfrac with nulltestsel. While
>> there's some rework involved, it's not a big deal (it just reads the
>> stats tuple), and it's a clean solution.
>
> I'm marking this Waiting on Author until we figure out what to do with
> those issues (the null issue quoted above, and the quadratic behavior)

I've attached a rebased patch as the existing one no longer applies.

I've not yet had time to wrap my head around the null handling stuff.
I'll try to get to that today.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


smarter_clausesel_2017-04-07.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Performance improvement for joins where outer side is unique

2017-04-06 Thread Tom Lane
David Rowley  writes:
> On 7 April 2017 at 07:26, Tom Lane  wrote:
>> I'm looking through this, and I'm failing to see where it deals with
>> the problem we discussed last time, namely that you can't apply the
>> optimization unless all clauses that were used in the uniqueness
>> proof are included in the join's merge/hash conditions + joinquals.

> The code in question is:
> mergestate->mj_SkipMarkRestore = !mergestate->js.joinqual &&
> mergestate->js.first_inner_tuple_only;

Uh, AFAICS that only protects the skip-mark-and-restore logic.
What I'm on about is that you can't do the early advance to the
next outer tuple unless you're sure that all the quals that were
relevant to the uniqueness proof have been checked for the current
inner tuple.  That affects all three join types not only merge.

The case that would be relevant to this is, eg,

create table t1 (f1 int, f2 int, primary key(f1,f2));

select * from t_outer left join t1 on (t_outer.f1 = t1.f1)
where t_outer.f2 = t2.f2;

Your existing patch would think t1 is unique-inner, but the qual pushed
down from WHERE would not be a joinqual so the wrong thing would happen
at runtime.

(Hm ... actually, this example wouldn't fail as written because
the WHERE qual is probably strict, so the left join would get
reduced to an inner join and then pushed-down-ness no longer
matters.  But hopefully you get my drift.)

> I don't really think the List idea would be nearly as efficient.

No, what I'm saying is that each RelOptInfo would contain a single List of
Relids of proven-unique-for outer rels (and another one for the negative
cache).  No array, no more searching than you have now, just removal of an
uncertainly-safe array fetch.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] increasing the default WAL segment size

2017-04-06 Thread Beena Emerson
On Fri, Apr 7, 2017 at 2:35 AM, Tomas Vondra 
wrote:

> On 04/06/2017 08:33 PM, David Steele wrote:
>>
>>
>> I'm in favor of 16,64,256,1024.
>>
>>
> I don't see a particular reason for this, TBH. The sweet spots will be
> likely dependent hardware / OS configuration etc. Assuming there actually
> are sweet spots - no one demonstrated that yet.
>
> Also, I don't see how supporting additional WAL sizes increases chance of
> incompatibility. We already allow that, so either the tools (e.g. backup
> solutions) assume WAL segments are always 16MB (in which case are
> essentially broken) or support valid file sizes (in which case they should
> have no issues with the new ones).
>
> If we're going to do this, I'm in favor of deciding some reasonable upper
> limit (say, 1GB or 2GB sounds good), and allowing all 2^n values up to that
> limit.


I think the majority consensus is to use all valid values. Since 1GB is
what we have finalized as the upper limit, let's continue with that for now.


-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] No-op case in ExecEvalConvertRowtype

2017-04-06 Thread Tom Lane
I wrote:
> I propose to deal with this by reverting 3838074f8 in toto, and then
> trying to clarify that comment, and maybe adding a regression test case
> based on the example I showed earlier so that it will be a little more
> obvious if someone breaks this again.
> However, I see that 3838074f8 touches some partitioning code, which
> makes me wonder if there's anything in the partitioning logic that
> really depends on this erroneous "optimization".

After further poking around, I've concluded that that approach is probably
an overreaction.  Of the dozen or so callers of convert_tuples_by_position
and convert_tuples_by_name, it seems that ExecEvalConvertRowtype is the
only one that really needs the correct composite-datum headers in the
converted tuple; and even for it, forcing use of do_convert_tuple is
a pretty expensive, brute-force way to get that result.  Ashutosh's
proposal to use heap_copy_tuple_as_datum when no column rearrangement
is required should be substantially more efficient.

So I now think it's okay to remove consideration of matching the target
rowtype OID from the tupconvert.c functions, although we have to realize
that that is effectively an API change for them, one which has a definite
potential for biting third-party callers.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] increasing the default WAL segment size

2017-04-06 Thread Beena Emerson
Hello,

On Wed, Apr 5, 2017 at 6:06 PM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

>
> (Roughly speaking, to get started, this would mean compiling with
> --with-wal-segsize 16, 32, 64, 128, 256, run make check-world both
> sequentially and in parallel, and take note of a) passing, b) run time,
> c) disk space.)
>
>
The attached patch updates a pg_upgrade test which fails for larger segment
sizes: the output of SELECT restart_lsn FROM pg_replication_slots WHERE
slot_name = 'slot1'.

The following are the results of the installcheck-world execution.

wal_size   time        cluster_size   pg_wal   files
16         5m32.761s   539M           417M     26
32         5m32.618s   545M           417M     13
64         5m39.685s   571M           449M      7
128        5m52.961s   641M           513M      4
256        6m13.402s   635M           512M      2
512        7m3.252s    1.2G           1G        2
1024       9m0.205s    1.2G           1G        1


wal_size   time        cluster_size   pg_wal   files
16         5m31.137s   542M           417M     26
32         5m29.532s   539M           417M     13
64         5m36.189s   571M           449M      7
128        5m50.027s   635M           513M      4
256        6m13.603s   635M           512M      2
512        7m4.154s    1.2G           1G        2
1024       8m55.357s   1.2G           1G        1

All tests passed except for connect/test5 in src/interfaces/ecpg.

We can see that smaller segments take less time for the same amount of WAL
(compare 128 with 256, and 512 with 1024). But these tests are not large
enough to draw conclusions.
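
For anyone wanting to repeat this kind of run, a rough sketch (note that
an unpatched configure only accepts segment sizes up to 64MB, so the
larger values assume the patch under discussion; paths and flags are
illustrative):

    for size in 16 32 64 128 256 512 1024; do
        ./configure --with-wal-segsize=$size >/dev/null
        make -s -j4
        time make -s check-world
        make -s distclean
    done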


Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


initdb_update_regress.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Performance improvement for joins where outer side is unique

2017-04-06 Thread David Rowley
On 7 April 2017 at 07:26, Tom Lane  wrote:
> I'm looking through this, and I'm failing to see where it deals with
> the problem we discussed last time, namely that you can't apply the
> optimization unless all clauses that were used in the uniqueness
> proof are included in the join's merge/hash conditions + joinquals.

Many thanks for looking at this again.

The test in question is in nodeMergeJoin.c. I believe the join is
still unique no matter where the clauses are evaluated. It should be
up to the executor code to make use of the knowledge as it sees fit.
The planner should not make assumptions on how the executor will make
use of this knowledge. I've carefully crafted a comment in
nodeMergejoin.c which explains all of this, as well as the limitations
and where things might be improved later.

The code in question is:

mergestate->mj_SkipMarkRestore = !mergestate->js.joinqual &&
mergestate->js.first_inner_tuple_only;


> I don't especially like the centralized unique_rels cache structure.
> It's not terribly clear what it's for, and you're making uncomfortably
> large assumptions about never indexing off the end of the array, despite
> not having any range checks for the subscripts.  Wouldn't it be better to
> add simple List fields into RelOptInfo, representing the outer rels this
> rel has been proven unique or not-unique for?  That would dodge the array
> size question and would be more readily extensible to someday applying
> this to join rels .

hmm, perhaps bounds checking could be done, but it's no worse than
planner_rt_fetch().
I don't really think the List idea would be nearly as efficient. The
array provides a direct lookup for the List of proof relations. A List
of Lists would require a linear lookup just to find the correct
List, then the existing linear lookup to find the proofs. The cache is
designed to be fast. Slowing it down seems like a bad idea. Perhaps
throwing it away would be better, since it's not required and was only
added as an optimisation.

The non_unique_rels will most often have a NULL bitmap set due to the
incremental join search by the standard planner. So access to this as
an array should be very fast, as we'll quickly realise there are no
proofs to be found.

> I also think some more thought is needed about handling JOIN_UNIQUE_OUTER
> and JOIN_UNIQUE_INNER cases.  In the first place, the patch cavalierly
> violates the statement in joinpath.c that those jointype values never
> propagate outside that module.  In the second place, shouldn't
> JOIN_UNIQUE_INNER automatically result in the path getting marked
> inner_unique?  I suspect the logic in add_paths_to_joinrel ought to
> look something like
>
> if (jointype == JOIN_UNIQUE_INNER)
> extra.inner_unique = true;
> else
> extra.inner_unique = innerrel_is_unique(root, outerrel, innerrel,
> (jointype == JOIN_UNIQUE_OUTER ? JOIN_INNER : 
> jointype),
> restrictlist);

hmm, innerrel_is_unique() is without prejudice as to the jointypes it
supports, so much so that the argument is completely ignored. It was
probably left over from some previous incarnation of the patch.
If we treat JOIN_UNIQUE_INNER specially, then we'd better be sure that
it's made unique on the RHS join quals. It looks like
create_unique_path() uses sjinfo->semi_rhs_exprs to uniquify the
relation, and compute_semijoin_info() seems to take all of the join
conditions there or nothing at all, so I think it's safe to
automatically mark JOIN_UNIQUE_INNERs this way.

Updated patch is attached.




-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


unique_joins_2017-04-07.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Undefined psql variables

2017-04-06 Thread Corey Huinker
On Sun, Apr 2, 2017 at 4:57 PM, Fabien COELHO  wrote:

>
> I'm inclined to suggest that we should require all extensions beyond the
>> boolean-literal case to be set up as a keyword followed by appropriate
>> argument(s); that seems like it's enough to prevent syntax conflicts from
>> future additions.  So you could imagine
>>
>> \if defined varname
>> \if sql boolean expression to send to server
>> \if compare value operator value
>>
>
> I'm still thinking:-)
>
> Independently of my aesthetic complaint against having a pretty
> unusual keyword prefix syntax, how would you envision a \set assignment
> variant? Would \if have a different expression syntax somehow?


Any further thoughts?


Re: [HACKERS] [COMMITTERS] pgsql: Collect and use multi-column dependency stats

2017-04-06 Thread Simon Riggs
On 6 April 2017 at 17:41, David Rowley  wrote:
> On 7 April 2017 at 00:47, Simon Riggs  wrote:
>> On 5 April 2017 at 18:48, Tom Lane  wrote:
>>> Simon Riggs  writes:
 Collect and use multi-column dependency stats
>>>
>>> The buildfarm is unhappy about the fact that this changed the API
>>> for clauselist_selectivity().  I am not convinced that that change
>>> was a good idea, so before telling FDW authors that they need to
>>> change their code, I'd like to hear a defense of the API change.
>>> Why not just use the existing varRelid parameter for that?  Why
>>> is there an assumption that only one rel's extended stats will
>>> ever be of interest?  This function does get used for join clauses.
>>
>> Point noted. Reading thread and hope to fix today.
>
> I've attached a rebased patch which fixes up the conflict with the
> BRIN cost estimate patch which went in a short while ago.

Looks enough to me, for now at least. Minor comment added.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] increasing the default WAL segment size

2017-04-06 Thread David Steele
On 4/6/17 6:52 PM, Tomas Vondra wrote:
> On 04/06/2017 11:45 PM, David Steele wrote:
>>
>> How many people in the field are running custom builds of
>> Postgres?  And of those, how many have changed the WAL segment size?
>> I've never encountered a non-standard segment size or talked to anyone
>> who has.  I'm not saying it has *never* happened but I would venture to
>> say it's rare.
> 
> I agree it's rare, but I don't think that means we can just consider the
> option as 'unsupported'. We're even mentioning it in the docs as a valid
> way to customize granularity of the WAL archival.
> 
> I certainly know people who run custom builds, and some of them run with
> custom WAL segment size. Some of them are our customers, some are not.
> And yes, some of them actually patched the code to allow 256MB WAL
> segments.

I feel a little better knowing that *somebody* is doing it in the field.

>> Just because we don't change the default doesn't mean that others won't.
>>  I still think testing for sizes other than 16MB is severely lacking and
>> I don't believe caveat emptor is the way to go.
> 
> Aren't you mixing regression and performance testing? I agree we need to
> be sure all segment sizes are handled correctly, no argument here.

Not intentionally.  Our standard test suite is only regression as far as
I can see.  All the performance testing I've seen has been done ad hoc.

>> I don't have plans to add animals.  I think we'd need a way to tell
>> 'make check' to use a different segment size for tests and then
>> hopefully reconfigure some of the existing animals.
> 
> OK. My point was that we don't have that capability now, and the latest
> patch is not adding it either.

Agreed.

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Time to change pg_regress diffs to unified by default?

2017-04-06 Thread Tom Lane
Andres Freund  writes:
> I personally, and I know of a bunch of other regular contributors, find
> context diffs very hard to read.  Besides general dislike, for things
> like regression test output context diffs are just not well suited.

Personally, I disagree completely.  Unified diffs are utterly unreadable
for anything beyond trivial cases of small well-separated changes.

It's possible that regression failure diffs will usually fall into that
category, but I'm not convinced.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] increasing the default WAL segment size

2017-04-06 Thread Tomas Vondra

On 04/06/2017 11:45 PM, David Steele wrote:

On 4/6/17 5:05 PM, Tomas Vondra wrote:

On 04/06/2017 08:33 PM, David Steele wrote:

On 4/5/17 7:29 AM, Simon Riggs wrote:


2. It's not clear to me the advantage of being able to pick varying
filesizes. I see great disadvantage in having too many options, which
greatly increases the chance of incompatibility, annoyance and
breakage. I favour a small number of values that have been shown by
testing to be sweet spots in performance and usability. (1GB has been
suggested)


I'm in favor of 16,64,256,1024.



I don't see a particular reason for this, TBH. The sweet spots will be
likely dependent hardware / OS configuration etc. Assuming there
actually are sweet spots - no one demonstrated that yet.


Fair enough, but my feeling is that this patch has never been about
server performance, per se.  Rather, it is about archive management and
trying to stem the tide of WAL as servers get bigger and busier.
Generally, archive commands have to make a remote connection to offload
WAL and that has a cost per segment.



Perhaps, although Robert also mentioned that the fsync at the end of 
each WAL segment is noticeable. But the thread is a bit difficult to 
follow, different people have different ideas about the motivation of 
the patch, etc.



Also, I don't see how supporting additional WAL sizes increases chance
of incompatibility. We already allow that, so either the tools (e.g.
backup solutions) assume WAL segments are always 16MB (in which case are
essentially broken) or support valid file sizes (in which case they
should have no issues with the new ones).


I don't see how a compile-time option counts as "supporting that" in
practice.  How many people in the field are running custom builds of
Postgres?  And of those, how many have changed the WAL segment size?
I've never encountered a non-standard segment size or talked to anyone
who has.  I'm not saying it has *never* happened but I would venture to
say it's rare.



I agree it's rare, but I don't think that means we can just consider the 
option as 'unsupported'. We're even mentioning it in the docs as a valid 
way to customize granularity of the WAL archival.


I certainly know people who run custom builds, and some of them run with 
custom WAL segment size. Some of them are our customers, some are not. 
And yes, some of them actually patched the code to allow 256MB WAL segments.



If we're going to do this, I'm in favor of deciding some reasonable
upper limit (say, 1GB or 2GB sounds good), and allowing all 2^n values
up to that limit.


I'm OK with that.  I'm also OK with providing a few reasonable choices.
I guess that means I'll just go with the majority opinion.


3. New file allocation has been a problem raised with this patch for
some months now.


I've been playing around with this and I don't think short tests show
larger sizes off to advantage.  Larger segments will definitely perform
more poorly until Postgres starts recycling WAL.  Once that happens I
think performance differences should be negligible, though of course
this needs to be verified with longer-running tests.


I'm willing to do some extensive performance testing on the patch. I
don't see how that could happen in the next few day (before the feature
freeze), particularly considering we're interested in long tests.


Cool.  I've been thinking about how to do some meaningful tests for this
(mostly pgbench related).  I'd like to hear what you are thinking.



My plan was to do some pgbench tests with different workloads, scales 
(in shared buffers, in RAM, exceeds RAM), and different storage 
configurations (SSD vs. HDD, WAL/datadir on the same/different 
device/fs, possibly also ext4/xfs).



The question however is whether we need to do this testing when we don't
actually change the default (at least the patch submitted on 3/27 does
seem to keep the 16MB). I assume people specifying a custom value when
calling initdb are expected to know what they are doing (and I don't see
how we can prevent distros from choosing a bad value in their packages -
they could already do that with configure-time option).


Just because we don't change the default doesn't mean that others won't.
 I still think testing for sizes other than 16MB is severely lacking and
I don't believe caveat emptor is the way to go.



Aren't you mixing regression and performance testing? I agree we need to 
be sure all segment sizes are handled correctly, no argument here.



Do we actually have any infrastructure for that? Or do you plan to add
some new animals with different WAL segment sizes?


I don't have plans to add animals.  I think we'd need a way to tell
'make check' to use a different segment size for tests and then
hopefully reconfigure some of the existing animals.



OK. My point was that we don't have that capability now, and the latest 
patch is not adding it either.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] Time to change pg_regress diffs to unified by default?

2017-04-06 Thread Andrew Dunstan


On 04/06/2017 06:31 PM, Andres Freund wrote:
> Hi,
>
> I personally, and I know of a bunch of other regular contributors, find
> context diffs very hard to read.  Besides general dislike, for things
> like regression test output context diffs are just not well suited.
> E.g. in
> https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=prairiedog=2017-04-06%2021%3A10%3A56=check
> the salient point (ERROR:  50 is outside the valid range for parameter 
> "effective_io_concurrency" (0 .. 0))
> is 130 lines into the diff, whereas it's right at the start in a unified diff.
> Issues with one error that causes a lot of followup output changes are
> really common in our regression suite.
>
> I personally have PG_REGRESS_DIFF_OPTS set to -dU10, but that doesn't
> help much analyzing buildfarm output.
>
> Therefore I propose changing the defaults in pg_regress.c.
>

+1

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Time to change pg_regress diffs to unified by default?

2017-04-06 Thread Thomas Munro
On Fri, Apr 7, 2017 at 10:31 AM, Andres Freund  wrote:
> Hi,
>
> I personally, and I know of a bunch of other regular contributors, find
> context diffs very hard to read.  Besides general dislike, for things
> like regression test output context diffs are just not well suited.
> E.g. in
> https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=prairiedog=2017-04-06%2021%3A10%3A56=check
> the salient point (ERROR:  50 is outside the valid range for parameter 
> "effective_io_concurrency" (0 .. 0))
> is 130 lines into the diff, whereas it's right at the start in a unified diff.
> Issues with one error that causes a lot of followup output changes are
> really common in our regression suite.
>
> I personally have PG_REGRESS_DIFF_OPTS set to -dU10, but that doesn't
> help much analyzing buildfarm output.
>
> Therefore I propose changing the defaults in pg_regress.c.

+1

So much easier to read.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Letting the client choose the protocol to use during a SASL exchange

2017-04-06 Thread Michael Paquier
On Fri, Apr 7, 2017 at 7:29 AM, Simon Riggs  wrote:
> On 6 April 2017 at 16:15, Álvaro Hernández Tortosa  wrote:
>
>> Per the SCRAM RFC, it is the server who advertises and the client who 
>> picks.
>
> Yes, but what does the RFC say about how to handle servers with a
> pg_hba.conf?
>
> How and what will we advertise?

Did you read the first email of this thread? The RFC says that the
protocol implementers are free to do what they want as this is
protocol-specific. At least that's what I understand.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Time to change pg_regress diffs to unified by default?

2017-04-06 Thread Andres Freund
Hi,

I personally, and I know of a bunch of other regular contributors, find
context diffs very hard to read.  Besides general dislike, for things
like regression test output context diffs are just not well suited.
E.g. in
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=prairiedog=2017-04-06%2021%3A10%3A56=check
the salient point (ERROR:  50 is outside the valid range for parameter 
"effective_io_concurrency" (0 .. 0))
is 130 lines into the diff, whereas it's right at the start in a unified diff.
Issues with one error that causes a lot of followup output changes are
really common in our regression suite.

I personally have PG_REGRESS_DIFF_OPTS set to -dU10, but that doesn't
help much analyzing buildfarm output.

Therefore I propose changing the defaults in pg_regress.c.
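
Concretely, the switch could look something like the following in
src/test/regress/pg_regress.c.  The variable names are assumed from the
existing source (treat them as an approximation), and the exact context
width is of course debatable:

#ifndef WIN32
const char *basic_diff_opts = "";
const char *pretty_diff_opts = "-U3";		/* was "-C3" */
#else
const char *basic_diff_opts = "-w";
const char *pretty_diff_opts = "-w -U3";	/* was "-w -C3" */
#endif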

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Letting the client choose the protocol to use during a SASL exchange

2017-04-06 Thread Simon Riggs
On 6 April 2017 at 16:15, Álvaro Hernández Tortosa  wrote:

> Per the SCRAM RFC, it is the server who advertises and the client who 
> picks.

Yes, but what does the RFC say about how to handle servers with a pg_hba.conf?

How and what will we advertise?

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Letting the client choose the protocol to use during a SASL exchange

2017-04-06 Thread Michael Paquier
On Fri, Apr 7, 2017 at 5:15 AM, Álvaro Hernández Tortosa  
wrote:
> I don't see it. The message AuthenticationSASL.String could contain a
> CSV of the SCRAM protocols supported. This is especially important to support
> channel binding (which is just another protocol name for this matter), which
> is the really enhanced security mechanism of SCRAM. Since this message is
> sent regardless, and the client replies with PasswordMessage, no extra round
> trip is required. However, PasswordMessage needs to also include a field
> with the name of the selected protocol (it is the client who picks). Or a
> different message would need to be created, but no extra round-trips more
> than those required by SCRAM itself (4 messages for SCRAM + 1 extra for the
> server to tell the client it needs to use SCRAM).

Yes, it seems to me that the list of protocols to send should be done
by sendAuthRequest(). Then the client parses the received string, and
sends an extra 'p' message with its choice before sending the first
SCRAM message. So there is no need for any extra round trips.
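
To illustrate the client-side step (purely a sketch -- the wire format is
still under discussion here): assuming the server sends a comma-separated
mechanism list as proposed above, picking one could look like the code
below.  choose_sasl_mechanism() and everything around it are made-up
illustration names, not libpq code.

#include <stdio.h>
#include <string.h>

/* Return the first server-offered mechanism the client also supports. */
static const char *
choose_sasl_mechanism(const char *offered, const char *const *supported,
					  int nsupported)
{
	int			i;

	for (i = 0; i < nsupported; i++)
	{
		const char *p = offered;

		while (p && *p)
		{
			const char *end = strchr(p, ',');
			size_t		len = end ? (size_t) (end - p) : strlen(p);

			if (len == strlen(supported[i]) &&
				strncmp(p, supported[i], len) == 0)
				return supported[i];
			p = end ? end + 1 : NULL;
		}
	}
	return NULL;
}

int
main(void)
{
	const char *const client_mechs[] = {"SCRAM-SHA-256-PLUS", "SCRAM-SHA-256"};
	const char *pick = choose_sasl_mechanism("SCRAM-SHA-256", client_mechs, 2);

	printf("selected mechanism: %s\n", pick ? pick : "(none)");
	return 0;
}
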
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tuplesort_gettuple_common() and *should_free argument

2017-04-06 Thread Peter Geoghegan
On Thu, Apr 6, 2017 at 2:50 PM, Andres Freund  wrote:
> Pushed with very minor wording changes.

This had a typo:

+ * If copy is true, the slot receives a copied tuple that'll that will stay


-- 
Peter Geoghegan

VMware vCenter Server
https://www.vmware.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tuplesort_gettuple_common() and *should_free argument

2017-04-06 Thread Peter Geoghegan
On Thu, Apr 6, 2017 at 2:50 PM, Andres Freund  wrote:
> Pushed with very minor wording changes.

Thanks Andres.


-- 
Peter Geoghegan

VMware vCenter Server
https://www.vmware.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] partitioned tables and contrib/sepgsql

2017-04-06 Thread Joe Conway
On 04/06/2017 12:35 PM, Tom Lane wrote:
> Joe Conway  writes:
>> Any thoughts on whether 0001a and 0001b ought to be backpatched? I'm
>> thinking not given the lack of past complaints but it might make sense
>> to do.
> 
> I think 0001a absolutely needs to be, because it is fixing what is really
> an ABI violation: sepgsql_needs_fmgr_hook is supposed to return our notion
> of bool, but as things stand it's returning _Bool (which is why the
> compiler is complaining).  Now we might get away with that on most
> hardware, but on platforms where those are different widths, it's possible
> to imagine function-return conventions that would make it fail.
> 
> 0001b seems to only be needed for compilers that aren't smart enough
> to see that tclass won't be referenced for RELKIND_INDEX, so it's
> just cosmetic.

Ok, committed/pushed that way.

I found some missing bits in the 0002 patch -- new version attached.
Will wait on new regression tests before committing, but I expect we'll
have those by end of today and be able to commit the rest tomorrow.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
Add partitioned table support to sepgsql

The new partitioned table capability added a new relkind, namely
RELKIND_PARTITIONED_TABLE. Update sepgsql to treat this new relkind
exactly the same way it does RELKIND_RELATION.

Issue raised by Stephen Frost and initial patch by Mike Palmiotto.
Review by Tom Lane and Robert Haas, and editorializing by me.

Discussion: https://postgr.es/m/flat/623bcaae-112e-ced0-8c22-a84f75ae0c53%40joeconway.com

diff --git a/contrib/sepgsql/dml.c b/contrib/sepgsql/dml.c
index bc17089..b643720 100644
*** a/contrib/sepgsql/dml.c
--- b/contrib/sepgsql/dml.c
*** check_relation_privileges(Oid relOid,
*** 190,195 
--- 190,196 
  	switch (relkind)
  	{
  		case RELKIND_RELATION:
+ 		case RELKIND_PARTITIONED_TABLE:
  			result = sepgsql_avc_check_perms(,
  			 SEPG_CLASS_DB_TABLE,
  			 required,
*** check_relation_privileges(Oid relOid,
*** 225,231 
  	/*
  	 * Only columns owned by relations shall be checked
  	 */
! 	if (relkind != RELKIND_RELATION)
  		return true;
  
  	/*
--- 226,232 
  	/*
  	 * Only columns owned by relations shall be checked
  	 */
! 	if (relkind != RELKIND_RELATION && relkind != RELKIND_PARTITIONED_TABLE)
  		return true;
  
  	/*
diff --git a/contrib/sepgsql/label.c b/contrib/sepgsql/label.c
index 1a8f884..6239800 100644
*** a/contrib/sepgsql/label.c
--- b/contrib/sepgsql/label.c
*** exec_object_restorecon(struct selabel_ha
*** 779,785 
  			case RelationRelationId:
  relForm = (Form_pg_class) GETSTRUCT(tuple);
  
! if (relForm->relkind == RELKIND_RELATION)
  	objtype = SELABEL_DB_TABLE;
  else if (relForm->relkind == RELKIND_SEQUENCE)
  	objtype = SELABEL_DB_SEQUENCE;
--- 787,794 
  			case RelationRelationId:
  relForm = (Form_pg_class) GETSTRUCT(tuple);
  
! if (relForm->relkind == RELKIND_RELATION ||
! 	relForm->relkind == RELKIND_PARTITIONED_TABLE)
  	objtype = SELABEL_DB_TABLE;
  else if (relForm->relkind == RELKIND_SEQUENCE)
  	objtype = SELABEL_DB_SEQUENCE;
*** exec_object_restorecon(struct selabel_ha
*** 803,809 
  			case AttributeRelationId:
  attForm = (Form_pg_attribute) GETSTRUCT(tuple);
  
! if (get_rel_relkind(attForm->attrelid) != RELKIND_RELATION)
  	continue;	/* no need to assign security label */
  
  objtype = SELABEL_DB_COLUMN;
--- 812,819 
  			case AttributeRelationId:
  attForm = (Form_pg_attribute) GETSTRUCT(tuple);
  
! if (get_rel_relkind(attForm->attrelid) != RELKIND_RELATION &&
! 	get_rel_relkind(attForm->attrelid) != RELKIND_PARTITIONED_TABLE)
  	continue;	/* no need to assign security label */
  
  objtype = SELABEL_DB_COLUMN;
diff --git a/contrib/sepgsql/relation.c b/contrib/sepgsql/relation.c
index ab98a9b..59a6d9b 100644
*** a/contrib/sepgsql/relation.c
--- b/contrib/sepgsql/relation.c
*** sepgsql_attribute_post_create(Oid relOid
*** 54,65 
  	ObjectAddress object;
  	Form_pg_attribute attForm;
  	StringInfoData audit_name;
  
  	/*
! 	 * Only attributes within regular relation have individual security
! 	 * labels.
  	 */
! 	if (get_rel_relkind(relOid) != RELKIND_RELATION)
  		return;
  
  	/*
--- 54,66 
  	ObjectAddress object;
  	Form_pg_attribute attForm;
  	StringInfoData audit_name;
+ 	char		relkind = get_rel_relkind(relOid);
  
  	/*
! 	 * Only attributes within regular relations or partition relations have
! 	 * individual security labels.
  	 */
! 	if (relkind != RELKIND_RELATION && relkind != RELKIND_PARTITIONED_TABLE)
  		return;
  
  	/*
*** sepgsql_attribute_drop(Oid relOid, AttrN
*** 135,142 
  {
  	ObjectAddress object;
  	char	   *audit_name;
  
! 	if (get_rel_relkind(relOid) != RELKIND_RELATION)
  		return;
 

Re: [HACKERS] tuplesort_gettuple_common() and *should_free argument

2017-04-06 Thread Andres Freund
On 2017-03-13 18:14:07 -0700, Peter Geoghegan wrote:
> On Wed, Jan 25, 2017 at 3:11 PM, Peter Geoghegan  wrote:
> > On Wed, Jan 25, 2017 at 3:11 PM, Tom Lane  wrote:
> >> Please.  You might want to hit the existing ones with a separate patch,
> >> but it doesn't much matter; I'd be just as happy with a patch that did
> >> both things.
> >
> > Got it.
> 
> Attached is a patch that does both things at once.

Pushed with very minor wording changes.

Thanks,

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BRIN cost estimate

2017-04-06 Thread Alvaro Herrera
Tom Lane wrote:

> TBH, I think that code is in the noise.  It doesn't involve any disk
> access, or catalog access, or user-defined function calls.  I wouldn't
> bother trying to account for it.

I removed it in the pushed version.

> What you should be accounting for is the ensuing heap page accesses,
> but I assume that's done somewhere else.

It's supposed to be accounted for, yeah.

One thing we do not account for is the number of extra heap accesses we
do for unsummarized ranges (mostly, heap has grown but the index doesn't
cover the new pages yet).

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] increasing the default WAL segment size

2017-04-06 Thread David Steele
On 4/6/17 5:05 PM, Tomas Vondra wrote:
> On 04/06/2017 08:33 PM, David Steele wrote:
>> On 4/5/17 7:29 AM, Simon Riggs wrote:
>>
>>> 2. It's not clear to me the advantage of being able to pick varying
>>> filesizes. I see great disadvantage in having too many options, which
>>> greatly increases the chance of incompatibility, annoyance and
>>> breakage. I favour a small number of values that have been shown by
>>> testing to be sweet spots in performance and usability. (1GB has been
>>> suggested)
>>
>> I'm in favor of 16,64,256,1024.
>>
> 
> I don't see a particular reason for this, TBH. The sweet spots will be
> likely dependent on hardware / OS configuration, etc. Assuming there
> actually are sweet spots - no one has demonstrated that yet.

Fair enough, but my feeling is that this patch has never been about
server performance, per se.  Rather, it is about archive management and
trying to stem the tide of WAL as servers get bigger and busier.
Generally, archive commands have to make a remote connection to offload
WAL and that has a cost per segment.

> Also, I don't see how supporting additional WAL sizes increases chance
> of incompatibility. We already allow that, so either the tools (e.g.
> backup solutions) assume WAL segments are always 16MB (in which case they are
> essentially broken) or support valid file sizes (in which case they
> should have no issues with the new ones).

I don't see how a compile-time option counts as "supporting that" in
practice.  How many people in the field are running custom builds of
Postgres?  And of those, how many have changed the WAL segment size?
I've never encountered a non-standard segment size or talked to anyone
who has.  I'm not saying it has *never* happened but I would venture to
say it's rare.

> If we're going to do this, I'm in favor of deciding some reasonable
> upper limit (say, 1GB or 2GB sounds good), and allowing all 2^n values
> up to that limit.

I'm OK with that.  I'm also OK with providing a few reasonable choices.
I guess that means I'll just go with the majority opinion.

>>> 3. New file allocation has been a problem raised with this patch for
>>> some months now.
>>
>> I've been playing around with this and I don't think short tests show
>> larger sizes off to advantage.  Larger segments will definitely perform
>> more poorly until Postgres starts recycling WAL.  Once that happens I
>> think performance differences should be negligible, though of course
>> this needs to be verified with longer-running tests.
>>
> I'm willing to do some extensive performance testing on the patch. I
> don't see how that could happen in the next few days (before the feature
> freeze), particularly considering we're interested in long tests.

Cool.  I've been thinking about how to do some meaningful tests for this
(mostly pgbench related).  I'd like to hear what you are thinking.

> The question however is whether we need to do this testing when we don't
> actually change the default (at least the patch submitted on 3/27 does
> seem to keep the 16MB). I assume people specifying a custom value when
> calling initdb are expected to know what they are doing (and I don't see
> how we can prevent distros from choosing a bad value in their packages -
> they could already do that with configure-time option).

Just because we don't change the default doesn't mean that others won't.
 I still think testing for sizes other than 16MB is severely lacking and
I don't believe caveat emptor is the way to go.

> Do we actually have any infrastructure for that? Or do you plan to add
> some new animals with different WAL segment sizes?

I don't have plans to add animals.  I think we'd need a way to tell
'make check' to use a different segment size for tests and then
hopefully reconfigure some of the existing animals.

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Collect and use multi-column dependency stats

2017-04-06 Thread David Rowley
On 7 April 2017 at 00:47, Simon Riggs  wrote:
> On 5 April 2017 at 18:48, Tom Lane  wrote:
>> Simon Riggs  writes:
>>> Collect and use multi-column dependency stats
>>
>> The buildfarm is unhappy about the fact that this changed the API
>> for clauselist_selectivity().  I am not convinced that that change
>> was a good idea, so before telling FDW authors that they need to
>> change their code, I'd like to hear a defense of the API change.
>> Why not just use the existing varRelid parameter for that?  Why
>> is there an assumption that only one rel's extended stats will
>> ever be of interest?  This function does get used for join clauses.
>
> Point noted. Reading thread and hope to fix today.

I've attached a rebased patch which fixes up the conflict with the
BRIN cost estimate patch which went in a short while ago.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


find_relation_from_clauses_v3.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallel explain analyze support not exercised

2017-04-06 Thread Andres Freund
On 2017-04-06 17:33:49 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > I guess we'll have to see. My personal conclusion is that greater
> > coverage of parallelism is worth some very minor config trouble for
> > people doing installcheck against clusters with non-default config.
> 
> The buildfarm seems entirely unwilling to play along.

That was the parallel bitmap scan test, not this (I think).  I've
already pushed a fix which should address the issue - it at least does
so locally.  Both Dilip and I had apparently forgotten that we disallow
setting effective_io_concurrency on platforms without USE_PREFETCH
support.
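
To make the failure mode concrete, here is a toy model of that guard -- an
illustration of the behaviour only, not the actual guc.c check hook: on a
build without USE_PREFETCH (no posix_fadvise), any non-zero value is
rejected, which is what the affected buildfarm members tripped over.

#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for the real GUC check; "have_prefetch" models USE_PREFETCH. */
static bool
check_effective_io_concurrency(int newval, bool have_prefetch)
{
	if (!have_prefetch && newval != 0)
	{
		fprintf(stderr,
				"effective_io_concurrency must be 0 without posix_fadvise()\n");
		return false;
	}
	return true;
}

int
main(void)
{
	printf("with prefetch support:    %s\n",
		   check_effective_io_concurrency(50, true) ? "accepted" : "rejected");
	printf("without prefetch support: %s\n",
		   check_effective_io_concurrency(50, false) ? "accepted" : "rejected");
	return 0;
}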

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallel explain analyze support not exercised

2017-04-06 Thread Tom Lane
Andres Freund  writes:
> I guess we'll have to see. My personal conclusion is that greater
> coverage of parallelism is worth some very minor config trouble for
> people doing installcheck against clusters with non-default config.

The buildfarm seems entirely unwilling to play along.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tuplesort_gettuple_common() and *should_free argument

2017-04-06 Thread Peter Geoghegan
On Thu, Apr 6, 2017 at 2:05 PM, Andres Freund  wrote:
> I'm not sure you entirely got my point here.  If tuplesort_gettupleslot
> is called with copy = true, it'll store that tuple w/
> ExecStoreMinimalTuple passing shouldFree = copy = true.  If the caller
> is in a short-lived context, e.g. some expression context, and resets
> that context before the slot is cleared (either by storing another tuple
> or ExecClearTuple()) you'll get a double free, because the context has
> freed the tuple in bulk, and then
> if (slot->tts_shouldFree)
> heap_freetuple(slot->tts_tuple);
> will do its work.

Calling ExecClearTuple() will mark the slot "tts_shouldFree = false"
-- you only have a problem when a memory context is reset, which
obviously cannot be accounted for by ExecClearTuple(). ISTM that
ResetExprContext() is kind of called to "make sure" that memory is
freed in a short-lived expression context, at a level up from any
ExecStoreMinimalTuple()/ExecClearTuple() style memory management. The
conventions are not centrally documented, AFAIK, but this still seems
natural enough to me. Per-tuple contexts tend to be associated with
expression contexts.

In any case, I'm not sure where you'd centrally document the
conventions. Although, it seems clear that it wouldn't be anywhere
this patch currently touches. The executor README, perhaps?

-- 
Peter Geoghegan

VMware vCenter Server
https://www.vmware.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] No-op case in ExecEvalConvertRowtype

2017-04-06 Thread Tom Lane
I wrote:
> Ashutosh Bapat  writes:
>> In ExecEvalConvertRowtype(), if the input row doesn't require any
>> conversion, we simply return that row as is.

> Huh.  That's been like that for a very long time.

So I imagined that this was an ancient bug, and was proceeding on that
basis, until I noticed that the test case I showed doesn't crash in 9.6
or before.  Which is pretty interesting because the assertion in
ExecEvalConvertRowtype is certainly there, back to 9.4 (and in an even
stronger fashion in older branches).

Digging further, the reason that the back branches don't crash is that
they don't believe that these tuple conversions are no-ops.  And that
traces to this test in convert_tuples_by_name:

/*
 * Check to see if the map is one-to-one and the tuple types are the same.
 * (We check the latter because if they're not, we want to do conversion
 * to inject the right OID into the tuple datum.)
 */
if (indesc->natts == outdesc->natts &&
indesc->tdtypeid == outdesc->tdtypeid)
{
...

Because a ConvertRowtypeExpr would only be inserted to change a tuple's
rowtype from some composite type to some other composite type, it's
basically impossible for convert_tuples_by_name to decide that the mapping
is a no-op, which explains the comment in ExecEvalConvertRowtype doubting
that a no-op case is possible.  Moreover, if a no-op case did happen, the
tuple being returned would in fact contain the right type OID, so there's
no bug.

Or at least that's how it is in 9.6.  In HEAD, it's been broken by
commit 3838074f8, which as near as I can tell was completely misguided.
Apparently, Robert and Amit misread the comment about "injecting the right
OID" to refer to a possible value in the tuple's OID system column, rather
than the rowtype OID that must be placed in the composite datum's header.

I propose to deal with this by reverting 3838074f8 in toto, and then
trying to clarify that comment, and maybe adding a regression test case
based on the example I showed earlier so that it will be a little more
obvious if someone breaks this again.

However, I see that 3838074f8 touches some partitioning code, which
makes me wonder if there's anything in the partitioning logic that
really depends on this erroneous "optimization".

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Increase parallel bitmap scan test coverage.

2017-04-06 Thread Andres Freund
On 2017-04-06 20:43:48 +, Andres Freund wrote:
> Increase parallel bitmap scan test coverage.
> 
> Author: Dilip Kumar
> Discussion: 
> https://postgr.es/m/20170331184603.qcp7t4md5bzxb...@alap3.anarazel.de

This turned the !linux buildfarm red, because it relies on setting
effective_io_concurrency (to increase coverage to the prefetching code).
I plan to wrap the SET in a DO block with an exception handler.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Making clausesel.c Smarter

2017-04-06 Thread Claudio Freire
On Tue, Apr 4, 2017 at 1:00 PM, Claudio Freire  wrote:
>>> If you ask me, I'd just leave:
>>>
>>> + if (rqlist->hibound == DEFAULT_INEQ_SEL || rqlist->lobound ==
>>> DEFAULT_INEQ_SEL)
>>> + {
>>> + /* No point in checking null selectivity, DEFAULT_INEQ_SEL
>>> implies we have no stats */
>>> + s2 = DEFAULT_RANGE_INEQ_SEL;
>>> + }
>>> + else
>>> + {
>>> ...
>>> +   s2 = rqlist->hibound + rqlist->lobound - 1.0;
>>> +   nullsel = nulltestsel(root, IS_NULL, rqlist->var, varRelid);
>>> +   notnullsel = 1.0 - nullsel;
>>> +
>>> +   /* Adjust for double-exclusion of NULLs */
>>> +   s2 += nullsel;
>>> +   if (s2 <= 0.0) {
>>> +  if (s2 <= (-1.0e-4 * notnullsel - 1.0e-6))
>>> +  {
>>> +   /* Most likely contradictory clauses found */
>>> +   s2 = (1.0e-10 < notnullsel) ? 1.0e-10 : notnullsel;
>>> +  }
>>> +  else
>>> +  {
>>> +   /* Could be a rounding error */
>>> +   s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel;
>>> +  }
>>> +   }
>>> + }
>>>
>>> Where (-1.0e-4 * notnullsel - 1.0e-6) is just a very rough (and overly
>>> cautious) estimation of the amount of rounding error that could be
>>> there with 32-bit floats.
>>>
>>> The above assumes a non-DEFAULT_INEQ_SEL value in lobound/hibound is
>>> an estimation based on stats, maybe approximate, but one that makes
>>> sense (ie: preserves the monotonicity of the CPF), and as such
>>> negative values are either sign of a contradiction or rounding error.
>>> The previous code left the possibility of negatives coming out of
>>> default selectivities creeping up on non-DEFAULT_INEQ_SEL estimates,
>>> but that doesn't seem possible coming out of scalarineqsel.
>>>
>>> But I'd like it if Tom could comment on that, since he's the one that
>>> did that in commit 547bb4a7f2bdccad9253a99211ce84b3f9de485a (way back
>>> in 2004).
>>>
>>> Barring that, I'd just change the
>>>
>>> s2 = DEFAULT_RANGE_INEQ_SEL;
>>>
>>> To
>>>
>>> s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel;
>>>
>>> Which makes a lot more sense.
>>
>> I think to fix this properly the selfuncs would have to return the
>> estimate, and nullfrac separately, so the nullfrac could just be
>> applied once per expression. There's already hacks in there to get rid
>> of the double null filtering. I'm not proposing that as something for
>> this patch. It would be pretty invasive to change this.
>
> There's no need, you can compute the nullfrac with nulltestsel. While
> there's some rework involved, it's not a big deal (it just reads the
> stats tuple), and it's a clean solution.

I'm marking this Waiting on Author until we figure out what to do with
those issues (the null issue quoted above, and the quadratic behavior)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tuplesort_gettuple_common() and *should_free argument

2017-04-06 Thread Andres Freund
On 2017-04-04 13:49:11 -0700, Peter Geoghegan wrote:
> On Tue, Apr 4, 2017 at 1:32 PM, Andres Freund  wrote:
> >>  static bool
> >>  tuplesort_gettuple_common(Tuplesortstate *state, bool forward,
> >> @@ -2091,12 +2092,15 @@ tuplesort_gettuple_common(Tuplesortstate *state, 
> >> bool forward,
> >>   * NULL value in leading attribute will set abbreviated value to zeroed
> >>   * representation, which caller may rely on in abbreviated inequality 
> >> check.
> >>   *
> >> - * The slot receives a copied tuple (sometimes allocated in caller memory
> >> - * context) that will stay valid regardless of future manipulations of the
> >> - * tuplesort's state.
> >> + * If copy is TRUE, the slot receives a copied tuple that will stay valid
> >> + * regardless of future manipulations of the tuplesort's state.  Memory is
> >> + * owned by the caller.  If copy is FALSE, the slot will just receive a
> >> + * pointer to a tuple held within the tuplesort, which is more efficient,
> >> + * but only safe for callers that are prepared to have any subsequent
> >> + * manipulation of the tuplesort's state invalidate slot contents.
> >>   */
> >
> > Hm. Do we need to note anything about how long caller memory needs to
> > live? I think we'd get into trouble if the caller does a memory context
> > reset, without also clearing the slot?
> 
> Since we arrange to have caller explicitly own memory (it's always in
> their own memory context (current context), which is not the case with
> other similar routines), it's 100% the caller's problem. As I assume
> you know, convention in executor around resource management of memory
> like this is pretty centralized within ExecStoreTuple(), and nearby
> routines. In general, the executor is more or less used to having this
> be its problem alone, and is pessimistic about memory lifetime unless
> otherwise noted.

I'm not sure you entirely got my point here.  If tuplesort_gettupleslot
is called with copy = true, it'll store that tuple w/
ExecStoreMinimalTuple passing shouldFree = copy = true.  If the caller
is in a short-lived context, e.g. some expression context, and resets
that context before the slot is cleared (either by storing another tuple
or ExecClearTuple()) you'll get a double free, because the context has
freed the tuple in bulk, and then
if (slot->tts_shouldFree)
heap_freetuple(slot->tts_tuple);
will do its work.

Now, that's obviously not a problem for existing code, because it worked
that way for a long time - I just was wondering whether that needs to be
called out.
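
For concreteness, a sketch of the ordering hazard being described.  The
names per_tuple_context, copied_tuple and hazard_sketch() are purely
illustrative, not taken from any real caller; the only point is that the
short-lived context must not be reset while the slot still believes it
owns the copied tuple.

#include "postgres.h"
#include "executor/tuptable.h"
#include "utils/memutils.h"

static void
hazard_sketch(TupleTableSlot *slot, MemoryContext per_tuple_context,
			  MinimalTuple copied_tuple)
{
	MemoryContext oldcxt = MemoryContextSwitchTo(per_tuple_context);

	/* copy = true case: the slot takes ownership of a tuple in this context */
	ExecStoreMinimalTuple(copied_tuple, slot, true);
	MemoryContextSwitchTo(oldcxt);

	/*
	 * Wrong order: the reset frees the copied tuple in bulk, then clearing
	 * the slot frees it a second time because tts_shouldFree is still set.
	 * The safe order is ExecClearTuple(slot) first, then the reset.
	 */
	MemoryContextReset(per_tuple_context);
	ExecClearTuple(slot);
}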

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] increasing the default WAL segment size

2017-04-06 Thread Tomas Vondra

On 04/06/2017 08:33 PM, David Steele wrote:

On 4/5/17 7:29 AM, Simon Riggs wrote:

On 5 April 2017 at 06:04, Beena Emerson  wrote:


The WAL filename - LSN mapping disruption for higher values, you mean? Is
there anything else I have missed?


I see various issues raised but not properly addressed

1. we would need to drop support for segment sizes < 16MB unless we
adopt a new incompatible filename format.
I think at 16MB the naming should be the same as now and that
WAL filename -> LSN is very important.
For this release, I think we should just allow >= 16MB and avoid the
issue thru lack of time.


+1.
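
To make the WAL filename <-> LSN mapping above concrete, here is a toy
standalone program (not PostgreSQL source) showing how a segment name
follows from timeline, LSN and segment size.  With 16MB segments the last
eight hex digits advance by one per 16MB of WAL, so a name can be
translated back to an LSN by eye; with a different segment size that
simple correspondence is exactly what gets disrupted.

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

static void
wal_file_name(char *buf, size_t len, uint32_t tli, uint64_t lsn,
			  uint64_t seg_size)
{
	uint64_t	segno = lsn / seg_size;
	uint64_t	segs_per_xlogid = UINT64_C(0x100000000) / seg_size;

	snprintf(buf, len, "%08" PRIX32 "%08" PRIX32 "%08" PRIX32, tli,
			 (uint32_t) (segno / segs_per_xlogid),
			 (uint32_t) (segno % segs_per_xlogid));
}

int
main(void)
{
	char		name[64];
	uint64_t	lsn = UINT64_C(42) * 1024 * 1024;	/* 42MB into the WAL stream */

	wal_file_name(name, sizeof(name), 1, lsn, UINT64_C(16) * 1024 * 1024);
	printf("16MB segments: %s\n", name);	/* ...00000002 */

	wal_file_name(name, sizeof(name), 1, lsn, UINT64_C(64) * 1024 * 1024);
	printf("64MB segments: %s\n", name);	/* ...00000000 */

	return 0;
}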


2. It's not clear to me the advantage of being able to pick varying
filesizes. I see great disadvantage in having too many options, which
greatly increases the chance of incompatibility, annoyance and
breakage. I favour a small number of values that have been shown by
testing to be sweet spots in performance and usability. (1GB has been
suggested)


I'm in favor of 16,64,256,1024.



I don't see a particular reason for this, TBH. The sweet spots will be 
likely dependent on hardware / OS configuration, etc. Assuming there 
actually are sweet spots - no one has demonstrated that yet.


Also, I don't see how supporting additional WAL sizes increases chance 
of incompatibility. We already allow that, so either the tools (e.g. 
backup solutions) assume WAL segments are always 16MB (in which case they are 
essentially broken) or support valid file sizes (in which case they 
should have no issues with the new ones).


If we're going to do this, I'm in favor of deciding some reasonable 
upper limit (say, 1GB or 2GB sounds good), and allowing all 2^n values 
up to that limit.



3. New file allocation has been a problem raised with this patch for
some months now.


I've been playing around with this and I don't think short tests show
larger sizes off to advantage.  Larger segments will definitely perform
more poorly until Postgres starts recycling WAL.  Once that happens I
think performance differences should be negligible, though of course
this needs to be verified with longer-running tests.



I'm willing to do some extensive performance testing on the patch. I 
don't see how that could happen in the next few days (before the feature 
freeze), particularly considering we're interested in long tests.


The question however is whether we need to do this testing when we don't 
actually change the default (at least the patch submitted on 3/27 does 
seem to keep the 16MB). I assume people specifying a custom value when 
calling initdb are expected to know what they are doing (and I don't see 
how we can prevent distros from choosing a bad value in their packages - 
they could already do that with configure-time option).



If archive_timeout is set then there will also be additional time taken
to zero out potentially larger unused parts of the segment.  I don't see
this as an issue, however, because if archive_timeout is being triggered
then the system is very likely under lower than usual load.


Lack of clarity and/or movement on these issues is very, very close to
getting the patch rejected now. Let's not approach this with the
viewpoint that I or others want it to be rejected, lets work forwards
and get some solid changes that will improve the world without causing
problems.


I would definitely like to see this go in, though I agree with Peter
that we need a lot more testing.  I'd like to see some build farm
animals testing with different sizes at the very least, even if there's
no time to add new tests.



Do we actually have any infrastructure for that? Or do you plan to add 
some new animals with different WAL segment sizes?


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallel explain analyze support not exercised

2017-04-06 Thread Andres Freund
Hi,

On 2017-04-03 17:11:33 -0400, Robert Haas wrote:
> On Mon, Apr 3, 2017 at 3:31 PM, Andres Freund  wrote:
> >> If this is 'make check', then we should have 8 parallel workers
> >> allowed, so if we only do one of these at a time, then I think we're
> >> OK.  But if somebody changes that configuration setting or if it's
> >> 'make installcheck', then the configuration could be anything.
> >
> > Hm - we already rely on max_parallel_workers_per_gather being set with
> > some of the explains in the test.  So I guess we're ok also relying on
> > actual workers being present?
> 
> I'm not really sure about that one way or the other.  Our policy on
> which configurations are supported vis-a-vis 'make installcheck' seems
> to be, essentially, that if a sufficiently-prominent community member
> cares about it, then it ends up getting made to work, unless an
> even-more-prominent community member objects.  That's why, for
> example, our regression tests pass in Czech.  I can't begin to guess
> whether breaking installcheck against configurations with low values
> of max_parallel_workers or max_worker_processes will bother anybody.

I guess we'll have to see. My personal conclusion is that greater
coverage of parallelism is worth some very minor config trouble for
people doing installcheck against clusters with non-default config.

Thanks Rafia!

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallel bitmapscan isn't exercised in regression tests

2017-04-06 Thread Andres Freund
On 2017-04-06 10:00:32 +0530, Dilip Kumar wrote:
> On Tue, Apr 4, 2017 at 5:51 AM, Dilip Kumar  wrote:
> > Sure I can do that, In attached patch, I only fixed the problem of not
> > executing the bitmap test.  Now, I will add few cases to cover other
> > parts especially rescan and prefetching logic.
> 
> I have added two test cases to cover rescan, prefetch and lossy pages
> logic for parallel bitmap.  I have removed the existing case because
> these two new cases will be enough to cover that part as well.
> 
> Now, nodeBitmapHeapScan.c has 95.5% of line coverage.

Great!  Pushed.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TPC-H Q20 from 1 hour to 19 hours!

2017-04-06 Thread Tomas Vondra

Hi,

I've been looking at this issue today, and so far I don't think it's a 
bug in the foreign key estimation. It seems mostly that the 9.5 
estimates were hopelessly bad, and the join estimation changes simply 
pushed it a tiny bit the wrong direction.


Although maybe there is a bug (or at least a change of behavior) in one 
case, but I'll get to that.


I've managed to extract a small part of Q20 that demonstrates the 
differences between versions quite nicely, I think. The part causing the 
trouble looks like this:


  explain select
  ps_suppkey
  from
  partsupp,
  (
  select
  l_partkey agg_partkey,
  l_suppkey agg_suppkey
  from
  lineitem
  where
  l_shipdate >= date '1997-01-01'
  and l_shipdate < date '1997-01-01' + interval '1' year
  group by
  l_partkey,
  l_suppkey
  ) agg_lineitem
  where
  agg_partkey = ps_partkey
  and agg_suppkey = ps_suppkey
  and ps_partkey in (
  select
  p_partkey
  from
  part
  where
  p_name like 'hot%'
  );

i.e. it aggregates the "lineitem" table, and then joins "partsupp" and 
"part" tables to it.


"aggregated lineitem" <-> partsupp <-> part

I've collected estimates from four different variants of the query (see 
the attached exlain.sql):


1) SIMPLE
  - join directly to lineitem (without the aggregation)
  - remove the p_name LIKE pattern matching

2) SIMPLE+LIKE
  - like SIMPLE, but keep the LIKE condition

3) GROUPING
  - join to the aggregated lineitem table
  - remove the p_name LIKE pattern matching

4) GROUPING+LIKE
  - like GROUPING, but keep the LIKE condition

I've collected estimates on a 20GB data set, both from 9.5 (so without 
any of the FK estimation changes) and on master with different foreign 
keys between the tables.


no-keys  - no foreign keys between the three tables
lineitem - lineitem references partsupp
partsupp - partsupp references part
both - both foreign keys

And the results look like this (actual row counts were collected on 9.5, 
but that should not matter - the results should be the same on all 
versions):


      branch      SIMPLE  SIMPLE+LIKE   GROUPING  GROUPING+LIKE
   -------------------------------------------------------------
      actual   119994608      1311974   10897186         119238
         9.5        2863           35        160            160
     no-keys        2340           24        868            868
    lineitem   119994848      1229750        868            868
    partsupp        2340           24       1737             18
   both-keys   119994848      1212065       1737             18

This seems mostly sane, I guess, but let's look at various cases.

In the SIMPLE cases, the foreign key "lineitem->partsupp" makes a huge 
difference - the estimates are pretty exact, both with and without the 
LIKE condition. The "partsupp->part" key makes almost no difference, 
though - the minor differences (35/24 and 1229750/1212065) seem to be 
mostly due to minor differences in stats built by ANALYZE, particularly 
in histograms used by patternsel().


In the GROUPING cases, the situation is obviously much worse. The 
grouping makes it impossible to use the "lineitem->partsupp" foreign 
key, resulting in severe underestimates. The "partsupp->part" is used, 
but the difference is pretty negligible as it's a simple (one column) 
foreign key.


The change from 160 -> 868 is merely due to 84f9a35e3 changing how we 
estimate number of groups in a GROUP BY clause. In 9.5 we get this:


->  HashAggregate  (rows=1836028) (actual rows=10897186)

while since 9.6 we get this

->  GroupAggregate  (rows=9674242)

Not only is that much closer to the actual value than the 9.5 estimate, 
but it's almost exactly the factor between 160 and 868:


9674242 / 1836028 = 5.27
160 * 5.27 = 843

So I'd say the 160 vs. 868 is expected, although the result is still way 
off, of course.


Which brings me to the slightly suspicious bit. On 9.5, there's no 
difference between GROUPING and GROUPING+LIKE cases - the estimates are 
exactly the same in both cases. This is true too, but only without the 
foreign key between "partsupp" and "part", i.e. the two non-grouped 
relations in the join. And what's more, the difference (1737 vs. 16) is 
pretty much exactly 100x, which is the estimate for the LIKE condition.


So it kinda seems 9.5 does not apply this condition for semi-joins, 
while >=9.6 does that.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] Letting the client choose the protocol to use during a SASL exchange

2017-04-06 Thread Simon Riggs
On 6 April 2017 at 16:05, Tom Lane  wrote:

> Perhaps we could turn this around: have the client send (in the connection
> request packet) a list of auth protocols it thinks it is able to handle.
> (I'm envisioning this as being more or less fixed for any one version of
> any one client, since it would basically mean "I have code to do X, Y, or
> Z".)  Then the server can pick one that is allowed by pg_hba.conf,

+1

Much better plan.

> or it
> can just ignore the list and send what it wants anyway, probably leading
> to client disconnect.

It would need to follow one of the requested protocols, but mark the
request as doomed. Otherwise we'd be revealing information. That's
what SCRAM does now.

Since the list is currently length one, we can add more later when we
get a list potentially > 1.

> We could avoid this being a protocol break by having the server's default
> assumption being that the client can handle all pre-SCRAM auth protocols.

+1

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Letting the client choose the protocol to use during a SASL exchange

2017-04-06 Thread Álvaro Hernández Tortosa



On 06/04/17 22:05, Tom Lane wrote:

Simon Riggs  writes:

How would we provide the list of protocols? Surely the protocol is
defined by pg_hba.conf, which makes it dependent upon username,
database and ip range. If the list were accurate, it would allow an
attacker to discover how best to attack. If the list were inaccurate
it would just be an annoyance.
At minimum, providing the list of protocols means an extra round trip
to the server.

Yeah, that's a problem.


I don't see it. The message AuthenticationSASL.String could contain 
a CSV of the SCRAM protocols supported. This is especially important to 
support channel binding (which is just another protocol name for this 
matter), which is the really enhanced security mechanism of SCRAM. Since 
this message is sent regardless, and the client replies with 
PasswordMessage, no extra round trip is required. However, 
PasswordMessage needs to also include a field with the name of the 
selected protocol (it is the client who picks). Or a different message 
would need to be created, but no extra round-trips more than those 
required by SCRAM itself (4 messages for SCRAM + 1 extra for the server 
to tell the client it needs to use SCRAM).





ISTM that if you have a valid role to connect to then you'll also know
what authentication mechanism to use so you should be able to specify
the mechanism in your connection message and save the extra trip.

I do not buy that in the least.  It has never been the case before now
that clients know in advance what the auth challenge method will be.
If we put that requirement on them for SCRAM, we're just going to be
exporting a lot of pain and end-user-visible inconsistency.

Perhaps we could turn this around: have the client send (in the connection
request packet) a list of auth protocols it thinks it is able to handle.


Per the SCRAM RFC, it is the server who advertises and the client 
who picks.



Regards,

Álvaro


--

Álvaro Hernández Tortosa





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Letting the client choose the protocol to use during a SASL exchange

2017-04-06 Thread Tom Lane
Simon Riggs  writes:
> How would we provide the list of protocols? Surely the protocol is
> defined by pg_hba.conf, which makes it dependent upon username,
> database and ip range. If the list were accurate, it would allow an
> attacker to discover how best to attack. If the list were inaccurate
> it would just be an annoyance.

> At minimum, providing the list of protocols means an extra round trip
> to the server.

Yeah, that's a problem.

> ISTM that if you have a valid role to connect to then you'll also know
> what authentication mechanism to use so you should be able to specify
> the mechanism in your connection message and save the extra trip.

I do not buy that in the least.  It has never been the case before now
that clients know in advance what the auth challenge method will be.
If we put that requirement on them for SCRAM, we're just going to be
exporting a lot of pain and end-user-visible inconsistency.

Perhaps we could turn this around: have the client send (in the connection
request packet) a list of auth protocols it thinks it is able to handle.
(I'm envisioning this as being more or less fixed for any one version of
any one client, since it would basically mean "I have code to do X, Y, or
Z".)  Then the server can pick one that is allowed by pg_hba.conf, or it
can just ignore the list and send what it wants anyway, probably leading
to client disconnect.

We could avoid this being a protocol break by having the server's default
assumption being that the client can handle all pre-SCRAM auth protocols.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Row Level Security UPDATE Confusion

2017-04-06 Thread Rod Taylor
I'm a little confused on why a SELECT policy fires against the NEW record
for an UPDATE when using multiple FOR policies. The ALL policy doesn't seem
to have that restriction.


DROP TABLE IF EXISTS t;

CREATE USER simple;
CREATE USER split;
CREATE TABLE t(value int);
grant select, update on table t to simple, split;

INSERT INTO t values (1), (2);


ALTER TABLE t ENABLE ROW LEVEL SECURITY;
CREATE POLICY simple_all ON t TO simple USING (value > 0) WITH CHECK (true);

CREATE POLICY split_select ON t FOR SELECT TO split USING (value > 0);
CREATE POLICY split_update ON t FOR UPDATE TO split USING (true) WITH CHECK
(true);


SET session authorization simple;
SELECT * FROM t;
UPDATE t SET value = value * -1 WHERE value = 1;

SET session authorization split;
SELECT * FROM t;
UPDATE t SET value = value * -1 WHERE value = 2;
/* UPDATE fails with below error:
psql:/tmp/t.sql:24: ERROR:  42501: new row violates row-level security
policy for table "t"
LOCATION:  ExecWithCheckOptions, execMain.c:2045
*/

SET session authorization default;
SELECT * FROM t;

This seems consistent in both Pg 9.5 and upcoming Pg 10.


-- 
Rod Taylor


Re: [HACKERS] Letting the client choose the protocol to use during a SASL exchange

2017-04-06 Thread Simon Riggs
On 4 April 2017 at 02:02, Michael Paquier  wrote:
> Hi all,
>
> There is still one open item pending for SCRAM that has not been
> treated which is mentioned here:
> https://www.postgresql.org/message-id/b081887e-1712-3aa4-7dbe-e012333d5...@iki.fi
>
> When doing an authentication with SASL, the server decides what is the
> mechanism that the client has to use. As SCRAM-SHA-256 is only one of
> such mechanisms, it would be nice to have something more generic and
> have the server return to the client a list of protocols that the
> client can choose from. And also the server achnowledge which protocol
> is going to be used.
>
> Note that RFC4422 has some content on the matter
> https://tools.ietf.org/html/rfc4422#section-3.1:
>Mechanism negotiation is protocol specific.
>
>Commonly, a protocol will specify that the server advertises
>supported and available mechanisms to the client via some facility
>provided by the protocol, and the client will then select the "best"
>mechanism from this list that it supports and finds suitable.
>
> So once the server sends back the list of mechanisms that are
> supported, the client is free to use what it wants.
>
> On HEAD, a 'R' message with AUTH_REQ_SASL followed by
> SCRAM_SHA256_NAME is sent to let the client know what is the mechanism
> to use for the SASL exchange. In the future, this should be extended
> so as a list of names is sent, for example a comma-separated list, but
> we are free to choose the format we want here. With this list at hand,
> the client can then choose the protocol it thinks is the best. Still,
> there is a gap with our current implementation because the server
> expects the first message from the client to have a SCRAM format, but
> that's true only if SCRAM-SHA-256 is used as mechanism.

How would we provide the list of protocols? Surely the protocol is
defined by pg_hba.conf, which makes it dependent upon username,
database and ip range. If the list were accurate, it would allow an
attacker to discover how best to attack. If the list were inaccurate
it would just be an annoyance.

At minimum, providing the list of protocols means an extra round trip
to the server.

So while RFC4422 might say what "commonly" happens, I don't think it
applies sensibly to PostgreSQL, especially when we only have one
method.

ISTM that if you have a valid role to connect to then you'll also know
what authentication mechanism to use so you should be able to specify
the mechanism in your connection message and save the extra trip.

> In order to cover this gap, it seems to me that we need to have an
> intermediate state before the server is switched to FE_SCRAM_INIT so
> as the mechanism used is negotiated between the two parties. Once the
> protocol negotiation is done, the server can then move on with the
> mechanism to use. This would be important in the future to allow more
> SASL mechanisms to work. I am adding an open item for that.

So I don't see a gap or an open item on that point. I see a potential
future feature.

> For extensibility, we may also want to revisit the choice of defining
> 'scram' in pg_hba.conf instead of 'sasl'...

I'd like to see us replace "scram" with "sasl=scram-sha-256".

So when we extend it in future, we already have the syntax in place to
support "sasl=newmethod", rather than introduce syntax that we already
know will become outdated in future.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] partitioned tables and contrib/sepgsql

2017-04-06 Thread Tom Lane
Joe Conway  writes:
> Any thoughts on whether 0001a and 0001b ought to be backpatched? I'm
> thinking not given the lack of past complaints but it might make sense
> to do.

I think 0001a absolutely needs to be, because it is fixing what is really
an ABI violation: sepgsql_needs_fmgr_hook is supposed to return our notion
of bool, but as things stand it's returning _Bool (which is why the
compiler is complaining).  Now we might get away with that on most
hardware, but on platforms where those are different widths, it's possible
to imagine function-return conventions that would make it fail.

0001b seems to only be needed for compilers that aren't smart enough
to see that tclass won't be referenced for RELKIND_INDEX, so it's
just cosmetic.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

