Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree

2017-01-30 Thread Vladimir Borodin

> On 31 Jan 2017, at 9:50, Michael Paquier wrote:
> 
> On Mon, Jan 30, 2017 at 4:28 PM, Andrew Borodin  wrote:
>> I'll summarize here my state of studying concurrent methods of page 
>> unlinking.
>> 
>> GIN B-tree does not have a "high key". That means that the rightmost
>> key on a non-leaf page is the maximum key for that page.
>> But I do not see any theoretical obstacle to implementing Lanin and
>> Shasha's methods of page merging, with slight modifications.
>> Their paper does not even mention a high key (the "high fence key" in
>> papers by Goetz Graefe).
>> 
>> But it's not a simple task due to the large portions of shared code
>> between the entry tree and the posting tree.
>> 
>> Also, I do not see a reason why this method would be practically
>> superior to the proposed patch.
>> 
>> Currently, I do not have the resources to implement a proof of concept
>> for fully concurrent page unlinking for benchmarking.
> 
> I am marking this patch as returned with feedback.

Michael, sorry, but why? If I understood everything right, the main question 
from Jeff was why it is implemented this way, and Jeff wanted to see other 
ways of solving the problem. Andrew wrote about them above, and it seems that 
implementing them would be quite expensive and without any obvious win. I would 
rather expect some reaction from Jeff or maybe someone else, so shouldn't it 
be marked as "Ready for committer" or at least "Moved to next CF"?

> -- 
> Michael


--
May the force be with you…
https://simply.name



Re: [HACKERS] Multi-tenancy with RLS

2017-01-30 Thread Haribabu Kommi
On Mon, Dec 5, 2016 at 4:31 PM, Haribabu Kommi 
wrote:

>
>
> On Mon, Oct 3, 2016 at 3:11 PM, Michael Paquier wrote:
>
>> On Tue, Jul 19, 2016 at 3:42 PM, Haribabu Kommi
>>  wrote:
>> > The above changes are based on my understanding of the discussion
>> > that occurred in this mail. In case I missed anything, please let me
>> > know and I will correct it.
>>
>> The patch series still applies.
>>
>> +   " ((classid = (select oid from pg_class where
>> relname = 'pg_aggregate'))"
>> +   " OR (classid = (select oid from pg_class where
>> relname = 'pg_cast') AND has_cast_privilege(objid, 'any'))"
>> +   " OR (classid = (select oid from pg_class where
>> relname = 'pg_collation'))"
>> [... long list ...]
>> That's quite hard to digest...
>>
>> +static bool
>> +get_catalog_policy_string(Oid relationid, Form_pg_class
>> pg_class_tuple, char *buf)
>> This is an exceptionally weak interface at a quick glance. This is using
>> SQL strings, and nothing is actually done regarding potentially
>> conflicting name types...
>>
>> The number of new files included in policy.c is impressive as well.
>>
>> This does not count as a full review though, so I am moving it to next
>> CF. Perhaps it will find its audience.
>>
>
> As the patch didn't receive a full review, it was just kept in the
> commitfest to see if there was any interest from others in this patch.
>
> Moved to next CF with "needs review" status.
>

This patch is not generating much interest from the community, maybe
because of the design chosen to implement multi-tenancy.

This patch is now marked as rejected.
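
As an aside on the relname-based lookups quoted above: catalog
references in SQL are usually written more robustly with a
schema-qualified regclass cast, which cannot accidentally match a user
table that happens to be named pg_cast. A minimal sketch of the idea
(has_cast_privilege() here is a function introduced by the patch, not
an existing one):

    -- instead of: classid = (select oid from pg_class where relname = 'pg_cast')
    ... WHERE classid = 'pg_catalog.pg_cast'::regclass
          AND has_cast_privilege(objid, 'any')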

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] Commit fest 2017-01 will begin soon!

2017-01-30 Thread Michael Paquier
On Mon, Jan 23, 2017 at 1:17 PM, Michael Paquier
 wrote:
> As of the current score of the CF, for a total of 155 patches, 52 have
> been committed, 3 rejected and 7 marked as returned with feedback. It
> may look low, but from experience those numbers are actually not that
> bad.

With the CF finishing soon, I have done a first pass on the patches
remaining in it, moving patches to the next CF or updating them. There
are still 48 patches on the tracker as I write this email:
Needs review: 32.
Ready for Committer: 16.
This requires a certain amount of energy to classify, so if you would
like to spare a bit of HP/MP for the CFM, feel free to clean up your
patches or the ones you are reviewing.

@all: If I have updated your patch in a state that you think is not
appropriate, feel free to complain on the thread of the patch or here.
That's fine.
Thanks,
-- 
Michael




Re: [HACKERS] multivariate statistics (v19)

2017-01-30 Thread Amit Langote
On 2017/01/31 6:57, Tomas Vondra wrote:
> On 01/30/2017 09:37 PM, Alvaro Herrera wrote:
>> Looks good to me.  I don't think we need to keep the names very short --
>> I would propose "standistinct", "stahistogram", "stadependencies".
>>
> 
> Yeah, I got annoyed by the short names too.
> 
> This however reminds me that perhaps pg_mv_statistic is not the best name.
> I know others proposed pg_statistic_ext (and pg_stats_ext), and while I
> wasn't a big fan initially, I think it's a better name. People generally
> don't know what 'multivariate' means, while 'extended' is better known
> (e.g. because Oracle uses it for similar stuff).
> 
> So I think I'll switch to that name too.

+1 to pg_statistics_ext.  Maybe even pg_statistics_extended, though
being that verbose may not be warranted.

Thanks,
Amit







Re: [HACKERS] Parallel bitmap heap scan

2017-01-30 Thread Michael Paquier
On Tue, Jan 31, 2017 at 11:17 AM, Haribabu Kommi
 wrote:
> On Fri, Jan 27, 2017 at 5:32 PM, Dilip Kumar  wrote:
>> On Tue, Jan 24, 2017 at 10:18 AM, Dilip Kumar 
>> wrote:
>> > I have changed as per the comments. 0002 and 0003 are changed, 0001 is
>> > still the same.
>>
>> There is just one line change in 0003 compared to older version, all
>> other patches are the same.
>
> Thanks for the update. I have some comments

This review is too fresh to be addressed, so I have moved this patch
to the next CF.
-- 
Michael




Re: [HACKERS] [PATCH] ALTER DEFAULT PRIVILEGES with GRANT/REVOKE ON SCHEMAS

2017-01-30 Thread Michael Paquier
On Thu, Jan 19, 2017 at 9:35 AM, Stephen Frost  wrote:
> Awesome, glad to hear it.  This is also on my list of patches that I'm
> planning to look at, just so folks know.

There is a patch, no new reviews, so moved to CF 2017-03.
-- 
Michael




Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree

2017-01-30 Thread Michael Paquier
On Mon, Jan 30, 2017 at 4:28 PM, Andrew Borodin  wrote:
> I'll summarize here my state of studying concurrent methods of page unlinking.
>
> GIN B-tree does not have a "high key". That means that the rightmost
> key on a non-leaf page is the maximum key for that page.
> But I do not see any theoretical obstacle to implementing Lanin and
> Shasha's methods of page merging, with slight modifications.
> Their paper does not even mention a high key (the "high fence key" in
> papers by Goetz Graefe).
>
> But it's not a simple task due to the large portions of shared code
> between the entry tree and the posting tree.
>
> Also, I do not see a reason why this method would be practically
> superior to the proposed patch.
>
> Currently, I do not have the resources to implement a proof of concept
> for fully concurrent page unlinking for benchmarking.

I am marking this patch as returned with feedback.
-- 
Michael




Re: [HACKERS] Proposal : For Auto-Prewarm.

2017-01-30 Thread Michael Paquier
On Tue, Jan 31, 2017 at 3:02 AM, Robert Haas  wrote:
> On Fri, Jan 27, 2017 at 5:34 PM, Jim Nasby  wrote:
>> On 1/26/17 10:11 PM, Beena Emerson wrote:
>>> In that case, we could add a file location parameter.  By default it
>>> would store the file in the cluster directory, otherwise in the
>>> location provided. We can update this parameter on a standby for it to
>>> access the file.
>>
>> I don't see file location being as useful in this case. What would be more
>> useful is being able to forcibly load blocks into shared buffers so that you
>> didn't need to restart.
>
> Of course, you can already do that with the existing pg_prewarm and
> pg_buffercache functionality.  Any time you want, you can use
> pg_buffercache to dump out a list of everything in shared_buffers, and
> pg_prewarm to suck that same stuff in on the same node or a separate
> node.  All this patch is trying to do is provide a convenient,
> automated way to make that work.
>
> (An argument could be made that this ought to be in core and the
> default behavior, because who really wants to start with an ice-cold
> buffer cache on a production system?  It's possible that
> repopulating shared_buffers in the background after a restart could
> cause enough I/O to interfere with foreground activity that
> regrettably ends up needing none of the prewarmed buffers, but I think
> prewarming a few GB of data should be quite fast under normal
> circumstances, and any well-intentioned system can go wrong under some
> set of obscure circumstances.  Still, the patch takes the conservative
> course of making this an opt-in behavior, and that's probably for the
> best.)

I partially agree with this paragraph; at least there are advantages
to doing so for cases where the data fits in shared buffers. Even for
data sets fitting in RAM that can be an advantage, as the buffers would
get evicted from Postgres' cache but not from the OS cache. For cases
where there are many load patterns on a given database (I have some
here), though, it's hard to make this behavior on by default.
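
For readers who want the manual workflow Robert describes, here is a
rough sketch assuming the pg_buffercache and pg_prewarm extensions are
installed (the snapshot table name is made up for illustration; mapped
catalogs, whose relfilenode is not tracked in pg_class, are simply
skipped by the join):

    -- Remember which relations of the current database are cached now.
    CREATE TABLE buffercache_snapshot AS
      SELECT c.oid::regclass AS rel
      FROM pg_buffercache b
      JOIN pg_class c ON c.relfilenode = b.relfilenode
      WHERE b.reldatabase =
            (SELECT oid FROM pg_database WHERE datname = current_database())
      GROUP BY c.oid;

    -- After a restart (or on another node with the same schema), read
    -- those relations back into shared_buffers.
    SELECT rel, pg_prewarm(rel) AS blocks_loaded FROM buffercache_snapshot;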

This patch needs to be visibly reshaped anyway, so I am marking it as
returned with feedback.
-- 
Michael




Re: [HACKERS] macaddr 64 bit (EUI-64) datatype support

2017-01-30 Thread Michael Paquier
On Tue, Jan 31, 2017 at 2:13 PM, Vitaly Burovoy
 wrote:
> I'm almost ready to mark it as Ready for committer.
> The final round.

Moved to next CF with the same status, waiting on author, as the last
patch and the last review are very fresh.
-- 
Michael




Re: [HACKERS] WIP: [[Parallel] Shared] Hash

2017-01-30 Thread Michael Paquier
On Sat, Jan 28, 2017 at 10:03 AM, Thomas Munro
 wrote:
> I have broken this up into a patch series, harmonised the private vs
> shared hash table code paths better and fixed many things including
> the problems with rescans and regression tests mentioned upthread.
> You'll see that one of the patches is that throwaway BufFile
> import/export facility, which I'll replace with your code as
> discussed.

Patch moved to CF 2017-03.
-- 
Michael




Re: [HACKERS] Add support to COMMENT ON CURRENT DATABASE

2017-01-30 Thread Michael Paquier
On Sun, Jan 29, 2017 at 3:33 AM, Fabrízio de Royes Mello
 wrote:
> On Sat, Jan 28, 2017 at 4:26 PM, Fabrízio de Royes Mello
>  wrote:
>> On Fri, Jan 27, 2017 at 8:53 PM, Peter Eisentraut
>>  wrote:
>> >
>> > On 1/26/17 1:20 PM, Fabrízio de Royes Mello wrote:
>> > > Ok, but doing in that way the syntax would be:
>> > >
>> > > COMMENT ON DATABASE CURRENT_DATABASE IS 'comment';
>> >
>> > Yes, that's right.  We also have ALTER USER CURRENT_USER already.
>> >
>>
>> Ok, but if we make CURRENT_DATABASE reserved, wouldn't it lead us to a
>> conflict with the current_database() function?

Patch marked as returned with feedback, as no new version has been
provided for the last two and a half weeks. Please feel free to update
it if that's not appropriate. The patch had been waiting on the author's
input for some time, by the way.
-- 
Michael




Re: [HACKERS] [PATCH] Generic type subscription

2017-01-30 Thread Michael Paquier
On Sat, Jan 28, 2017 at 2:31 AM, Dmitry Dolgov <9erthali...@gmail.com> wrote:
>> On 24 January 2017 at 02:07, Tom Lane  wrote:
>> I took an extremely quick look over the patch
>
> Thank you for the feedback. It took some time for me to think about all
> suggestions and notes.

Okay, I am marking the patch as returned with feedback given those
reviews. Please don't hesitate to submit a new version.
-- 
Michael




Re: [HACKERS] patch: function xmltable

2017-01-30 Thread Michael Paquier
On Tue, Jan 31, 2017 at 5:18 AM, Pavel Stehule  wrote:
> true, I am sorry

The last status is a new patch and no reviews; on top of that, this
thread is quite active, so I moved it to the next CF. Pavel, please be
careful about the status of the patch in the CF app, it was set to
"waiting on author"...
-- 
Michael




Re: [HACKERS] logical decoding of two-phase transactions

2017-01-30 Thread Michael Paquier
On Fri, Jan 27, 2017 at 8:52 AM, Craig Ringer  wrote:
> Now, if it's simpler to just xlog the gid at COMMIT PREPARED time when
> wal_level >= logical I don't think that's the end of the world. But
> since we already have almost everything we need in memory, why not
> just stash the gid on ReorderBufferTXN?

I have been through this thread... and to be honest, I have a hard
time understanding for what purpose the information about a 2PC
transaction is useful in the case of logical decoding. The PREPARE and
COMMIT PREPARED have been received by a node at the root of the
cluster tree, a node at an upper level of the cluster, or a client,
which is in charge of issuing all the PREPARE queries and then issuing
the COMMIT PREPARED to finish the transaction across the cluster. In
short, even if you do logical decoding from the root node, or one at a
higher level, you would care only about the fact that it has been
committed.
-- 
Michael




Re: [HACKERS] logical decoding of two-phase transactions

2017-01-30 Thread Michael Paquier
On Tue, Jan 31, 2017 at 3:29 PM, Michael Paquier
 wrote:
> On Fri, Jan 27, 2017 at 8:52 AM, Craig Ringer  wrote:
>> Now, if it's simpler to just xlog the gid at COMMIT PREPARED time when
>> wal_level >= logical I don't think that's the end of the world. But
>> since we already have almost everything we need in memory, why not
>> just stash the gid on ReorderBufferTXN?
>
> I have been through this thread... and to be honest, I have a hard
> time understanding for what purpose the information about a 2PC
> transaction is useful in the case of logical decoding. The PREPARE and
> COMMIT PREPARED have been received by a node at the root of the
> cluster tree, a node at an upper level of the cluster, or a client,
> which is in charge of issuing all the PREPARE queries and then issuing
> the COMMIT PREPARED to finish the transaction across the cluster. In
> short, even if you do logical decoding from the root node, or one at a
> higher level, you would care only about the fact that it has been
> committed.

By the way, I have moved this patch to the next CF; you guys seem to
be keeping the discussion moving.
-- 
Michael




Re: [HACKERS] pgbench - allow to store select results into variables

2017-01-30 Thread Fabien COELHO


Bonjour Michaël,


>> Attached are the patch, a test script for the feature, and various test
>> scripts to trigger error cases.

> I have moved this patch to next CF

Ok.

> as the last status is a new patch set with no further reviews.

Indeed.

> I did not check if the comments have been applied though, this is a bit
> too much for me now...

Sure.

--
Fabien.


Re: [HACKERS] identity columns

2017-01-30 Thread Michael Paquier
On Thu, Jan 5, 2017 at 9:34 AM, Vitaly Burovoy  wrote:
> I apologize for a silence since the last CF.
> I've tested your last patch and have several nitpickings:

The last update is a review from 3 weeks ago, so this patch is marked
as returned with feedback.
-- 
Michael




Re: [HACKERS] Re: new set of psql patches for loading (saving) data from (to) text, binary files

2017-01-30 Thread Michael Paquier
On Wed, Jan 11, 2017 at 12:32 AM, Pavel Stehule  wrote:
> Thank you for review

Moved to next CF 2017-03.
-- 
Michael




Re: [HACKERS] pgbench - allow to store select results into variables

2017-01-30 Thread Michael Paquier
On Sat, Jan 7, 2017 at 6:25 PM, Fabien COELHO  wrote:
> I think that I have done what you required.
>
> I have documented the fact that now the feature does not work if compound
> commands contain empty queries, which is a very minor drawback for a pgbench
> script anyway.
>
> Attached are the patch, a test script for the feature, and various test
> scripts to trigger error cases.

I have moved this patch to next CF as the last status is a new patch
set with no further reviews. I did not check if the comments have been
applied though, this is a bit too much for me now...
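
For context, the feature under discussion lets a pgbench script capture
the result of a SELECT into script variables, roughly along these lines
(the syntax shown is only illustrative of the goal, not necessarily the
patch's exact grammar):

    \set aid random(1, 100000 * :scale)
    SELECT abalance FROM pgbench_accounts WHERE aid = :aid \gset
    -- :abalance now holds the value returned above and can be reused:
    UPDATE pgbench_accounts SET abalance = :abalance + 1 WHERE aid = :aid;
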
-- 
Michael




Re: [HACKERS] DROP FUNCTION of multiple functions

2017-01-30 Thread Michael Paquier
On Wed, Jan 18, 2017 at 2:00 PM, Michael Paquier
 wrote:
> On Wed, Jan 18, 2017 at 5:26 AM, Peter Eisentraut
>  wrote:
>> On 1/10/17 1:52 AM, Michael Paquier wrote:
>>> I don't see any problems with 0001.
>>
>> I was wondering, should we rename funcname -> name, and funcargs ->
>> args, or perhaps the whole FuncWithArgs struct, so there is no confusion
>> when used with operators?
>
> FuncWithArgs implies that this is related to a function, so removing
> func as prefix may make things cleaner.
>
>>> One comment though: there are still many list_make2() or even
>>> list_make3 calls for some object types. Would it make sense to replace
>>> those lists with a decided number of items by a Node and simplify the
>>> interface?
>>
>> (I don't see any list_make3.)
>
> Indeed, I am watching too much code.
>
>> It would be nice to refine this further,
>> but the remaining uses are quite marginal.  The main problem was that
>> before you had to create singleton lists and then unpack them, because
>> there was no other way.  The remaining uses are more genuine lists or lcons.
>
> OK. Of course, I am not saying that this patch in particular should
> shake the world more. I have just been trying to point out potential
> future improvements and keep a trace of them in the archives while
> thinking about it.
>
>>> In 0005, a nit:
>>> +DROP FUNCTION functest_IS_1(int, int, text), functest_IS_2(int),
>>> functest_IS_3(int);
>>>  -- Cleanups
>>> The DROP query could be moved below the cleanup comment.
>>
>> I can do that, but the idea was that the commands below the cleanups
>> line weren't really tests.
>
> That's a nit, you can ignore that.
>
>>> While looking at 0006... DROP POLICY and DROP RULE could be unified. I
>>> just noticed that while reading the code.
>>
>> DROP TRIGGER also looks similar.  drop_type3 then. ;-)
>
> Or drop_type_on, drop_type_on_table, etc.

I am marking the patch as returned with feedback, as this thread died
2 weeks ago.
-- 
Michael




Re: [HACKERS] IF (NOT) EXISTS in psql-completion

2017-01-30 Thread Michael Paquier
On Tue, Jan 31, 2017 at 2:58 PM, Pavel Stehule  wrote:
> I found an error - I sent the mail only to the author on 2016-12-31 :( -
> it is my mistake. I am sorry

Ah... Thanks for the update. No problem.
-- 
Michael




Re: [HACKERS] Speedup twophase transactions

2017-01-30 Thread Michael Paquier
On Tue, Jan 31, 2017 at 2:58 PM, Nikhil Sontakke
 wrote:
>>> CheckPointTwoPhase() in (5) does not sync this prepared transaction
>>> because the checkpointer's KnownPreparedList is empty.
>>
>> And that's why this needs to be stored in shared memory with a number
>> of elements made of max_prepared_xacts...
>
> Yeah. Was thinking about this yesterday. How about adding entries in
> TwoPhaseState itself (which become valid later)? Only if it does not
> cause a lot of code churn.

That's possible as well, yes.
-- 
Michael




Re: \if, \elseif, \else, \endif (was Re: [HACKERS] PSQL commands: \quit_if, \quit_unless)

2017-01-30 Thread Fabien COELHO



> This is code lifted from variable.c's ParseVariableBool().  When the other
> patch for "psql hooks" is committed (the one that detects when the string
> wasn't a valid boolean), this code will go away and we'll just use
> ParseVariableBool() again.


The ParseVariableBool function has been updated, and the new version is 
much cleaner, including all fixes that I suggested in your copy, so you 
can use it in your patch.


--
Fabien.




Re: [HACKERS] IF (NOT) EXISTS in psql-completion

2017-01-30 Thread Pavel Stehule
2017-01-31 6:56 GMT+01:00 Michael Paquier:

> On Tue, Jan 31, 2017 at 2:53 PM, Pavel Stehule 
> wrote:
> > I tested the new set of these patches and I found some regressions
> > there - mentioned in my last mail.
> >
> > Maybe I missed a new update, but I don't know about it.
>
> The last update I am aware of is the one saying: "lots of patches. I
> hope to look at these patches this week." Here is the message:
> https://www.postgresql.org/message-id/CAFj8pRD2qq6v0jm6kqmWMwo-yNSvn8Vvf+m=DBy3ps=y0_3...@mail.gmail.com
>
> And there is no sign of reviews on the new versions.
>

I found an error - I sent the mail only to the author on 2016-12-31 :( -
it is my mistake. I am sorry



> --
> Michael
>


Re: [HACKERS] Speedup twophase transactions

2017-01-30 Thread Nikhil Sontakke
>> CheckPointTwoPhase() in (5) does not sync this prepared transaction
>> because the checkpointer's KnownPreparedList is empty.
>
> And that's why this needs to be stored in shared memory with a number
> of elements made of max_prepared_xacts...

Yeah. Was thinking about this yesterday. How about adding entries in
TwoPhaseState itself (which become valid later)? Only if it does not
cause a lot of code churn.

The current non-shmem list patch only needs to handle standby
shutdowns correctly. Other aspects like standby promotion/recovery are
handled ok AFAICS.

Regards,
Nikhils
-- 
 Nikhil Sontakke   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] IF (NOT) EXISTS in psql-completion

2017-01-30 Thread Pavel Stehule
> 2017-01-31 6:51 GMT+01:00 Michael Paquier:
>
>> The current patch status was "waiting on author", but that's incorrect
>> as a new series of this patch has been sent. Please be careful with
>> the status of the CF app! I am moving it to next CF with "needs
>> review". Gerdan Santos has marked himself as a reviewer of the patch
>> but I saw no activity, so I have removed his name to not confuse
>> people looking for patches to review (that happens!).
>>
>
> I tested the new set of these patches and I found some regressions
> there - mentioned in my last mail.
>
> Maybe I missed a new update, but I don't know about it.
>

Some infrastructure-related patches (maybe 50%) from this set can be
committed now - moving the complete set of patches to the next
commitfest generates a lot of repeated useless work.

Regards

Pavel


> Regards
>
> Pavel
>
>> --
>> Michael
>>
>
>


Re: [HACKERS] IF (NOT) EXISTS in psql-completion

2017-01-30 Thread Michael Paquier
On Tue, Jan 31, 2017 at 2:53 PM, Pavel Stehule  wrote:
> I tested the new set of these patches and I found some regressions
> there - mentioned in my last mail.
>
> Maybe I missed a new update, but I don't know about it.

The last update I am aware of is the one saying: "lots of patches. I
hope to look at these patches this week." Here is the message:
https://www.postgresql.org/message-id/CAFj8pRD2qq6v0jm6kqmWMwo-yNSvn8Vvf+m=DBy3ps=y0_3...@mail.gmail.com

And there is no sign of reviews on the new versions.
-- 
Michael




Re: [HACKERS] IF (NOT) EXISTS in psql-completion

2017-01-30 Thread Pavel Stehule
2017-01-31 6:51 GMT+01:00 Michael Paquier:

> On Tue, Dec 27, 2016 at 12:23 PM, Kyotaro HORIGUCHI
>  wrote:
> > Hello,
> >
> > At Mon, 26 Dec 2016 14:24:33 +0100, Pavel Stehule wrote:
> >> 2016-12-26 9:40 GMT+01:00 Kyotaro HORIGUCHI:
> >>
> >> > > Thanks for reviewing but I ran out of time for this CF..
> >> > >
> >> > > I'm going to move this to the next CF.
> >> >
> >> > I split the patch into small pieces. f3fd531 conflicted with
> >> > this, so I rebased onto the current master HEAD.
> >> >
> >> > 0001 is psql_completion refactoring.
> >> > 0002-0003 are patches providing new infrastructure.
> >> > 0004 is a README for the infrastructure.
> >> > 0005 is letter-case correction of SET/RESET/SHOW using 0002.
> >> > 0006-0008 are improvements of recursive syntaxes using 0001 and 0004.
> >> > 0009-0016 are simplifying (maybe) the completion code per syntax.
> >> >
> >> > The last one (0017) is the IF (NOT) EXISTS modifications. It
> >> > suggests IF (NOT) EXISTS for syntaxes that already get object
> >> > suggestions. So some kinds of objects, like operators and casts,
> >> > don't get an IF... suggestion. Likewise, I intentionally didn't
> >> > modify suggestions for "TEXT SEARCH *".
> >> >
> >> >
> >> lots of patches. I hope to look at these patches this week.
> >
> > Thank you for looking at this, and sorry for the many files. But I
> > hope they are far easier to read.
>
> The current patch status was "waiting on author", but that's incorrect
> as a new series of this patch has been sent. Please be careful with
> the status of the CF app! I am moving it to next CF with "needs
> review". Gerdan Santos has marked himself as a reviewer of the patch
> but I saw no activity, so I have removed his name to not confuse
> people looking for patches to review (that happens!).
>

I tested the new set of these patches and I found some regressions
there - mentioned in my last mail.

Maybe I missed a new update, but I don't know about it.

Regards

Pavel

> --
> Michael
>


Re: [HACKERS] PoC: Partial sort

2017-01-30 Thread Michael Paquier
On Mon, Dec 5, 2016 at 2:04 PM, Haribabu Kommi  wrote:
>
>
> On Fri, Dec 2, 2016 at 4:05 AM, Robert Haas  wrote:
>>
>> On Tue, Sep 13, 2016 at 4:32 AM, Alexander Korotkov
>>  wrote:
>> > On Fri, Apr 8, 2016 at 10:09 PM, Peter Geoghegan  wrote:
>> >> On Wed, Mar 30, 2016 at 8:02 AM, Alexander Korotkov
>> >>  wrote:
>> >> > Hmm... I don't completely agree with that. In typical usage, partial
>> >> > sort should definitely use quicksort.  However, falling back to
>> >> > other sort methods is very useful.  The decision to use partial sort
>> >> > is made by the planner.  But the planner makes mistakes.  For
>> >> > example, our HashAggregate is purely in-memory.  In the case of a
>> >> > planner mistake it causes OOM.  I have met such a situation in
>> >> > production, and not once.  This is why I'd like partial sort to
>> >> > degrade gracefully in such cases.
>> >>
>> >> I think that this should be moved to the next CF, unless a committer
>> >> wants to pick it up today.
>> >
>> > Patch was rebased to current master.
>>
>> Just a few quick observations on this...
>>
>> It strikes me that the API contract change in ExecMaterializesOutput
>> is pretty undesirable.  I think it would be better to have a new
>> executor node for this node rather than overloading the existing
>> "Sort" node, sharing code where possible of course.  The fact that
>> this would distinguish them more clearly in an EXPLAIN plan seems
>> good, too.  "Partial Sort" is the obvious thing, but there might be
>> even better alternatives -- maybe "Incremental Sort" or something like
>> that?  Because it's not partially sorting the data, it's making data
>> that already has some sort order have a more rigorous sort order.
>>
>> I think that it will probably be pretty common to have queries where
>> the data is sorted by (mostly_unique_col) and we want to get it sorted
>> by (mostly_unique_col, disambiguation_col).  In such cases I wonder if
>> we'll incur a lot of overhead by feeding single tuples to the
>> tuplesort stuff and performing lots of 1-item sorts.  Not sure if that
>> case needs any special optimization.
>>
>> I also think that the "HeapTuple prev" bit in SortState is probably
>> not the right way of doing things.  I think that should use an
>> additional TupleTableSlot rather than a HeapTuple.
>>
>
> The feedback from the reviewer was received at the end of the
> commitfest, so the patch was moved to the next CF with "waiting on
> author" status.

This patch is on its 6th commitfest now. As the thread has died and
feedback has been provided but not answered, I am marking it as
returned with feedback.
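
To make the renaming suggestion concrete: for data already ordered by a
prefix of the desired sort key, the plan shape being discussed would
look roughly like this (illustrative EXPLAIN output, not taken from the
patch; t and its index are hypothetical):

    -- t has an index on (mostly_unique_col); we want a stricter order.
    EXPLAIN SELECT * FROM t ORDER BY mostly_unique_col, disambiguation_col;

    --  Incremental Sort
    --    Sort Key: mostly_unique_col, disambiguation_col
    --    Presorted Key: mostly_unique_col
    --    ->  Index Scan using t_mostly_unique_idx on t
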
-- 
Michael




Re: [HACKERS] IF (NOT) EXISTS in psql-completion

2017-01-30 Thread Michael Paquier
On Tue, Dec 27, 2016 at 12:23 PM, Kyotaro HORIGUCHI
 wrote:
> Hello,
>
> At Mon, 26 Dec 2016 14:24:33 +0100, Pavel Stehule wrote:
>> 2016-12-26 9:40 GMT+01:00 Kyotaro HORIGUCHI:
>>
>> > > Thanks for reviewing but I ran out of time for this CF..
>> > >
>> > > I'm going to move this to the next CF.
>> >
>> > I split the patch into small pieces. f3fd531 conflicted with
>> > this, so I rebased onto the current master HEAD.
>> >
>> > 0001 is psql_completion refactoring.
>> > 0002-0003 are patches providing new infrastructure.
>> > 0004 is a README for the infrastructure.
>> > 0005 is letter-case correction of SET/RESET/SHOW using 0002.
>> > 0006-0008 are improvements of recursive syntaxes using 0001 and 0004.
>> > 0009-0016 are simplifying (maybe) the completion code per syntax.
>> >
>> > The last one (0017) is the IF (NOT) EXISTS modifications. It
>> > suggests IF (NOT) EXISTS for syntaxes that already get object
>> > suggestions. So some kinds of objects, like operators and casts,
>> > don't get an IF... suggestion. Likewise, I intentionally didn't
>> > modify suggestions for "TEXT SEARCH *".
>> >
>> >
>> lots of patches. I hope to look at these patches this week.
>
> Thank you for looking at this, and sorry for the many files. But I
> hope they are far easier to read.

The current patch status was "waiting on author", but that's incorrect
as a new series of this patch has been sent. Please be careful with
the status of the CF app! I am moving it to next CF with "needs
review". Gerdan Santos has marked himself as a reviewer of the patch
but I saw no activity, so I have removed his name to not confuse
people looking for patches to review (that happens!).
-- 
Michael




Re: [HACKERS] emergency outage requiring database restart

2017-01-30 Thread Michael Paquier
On Wed, Jan 4, 2017 at 4:17 AM, Peter Eisentraut
 wrote:
> It seems like everyone was generally in favor of this.  I looked around
> the internet for caveats but everyone was basically saying, you should
> definitely do this.
>
> Why not for EXEC_BACKEND?
>
> O_CLOEXEC is a newer interface.  There are older systems that don't have
> it but have FD_CLOEXEC for fcntl().  We should use that as a fallback.
>
> Have you gone through the code and checked for other ways file
> descriptors might get opened?  Here is a blog posts that lists some
> candidates: http://udrepper.livejournal.com/20407.html
>
> Ideally, we would have a test case that exec's something that lists the
> open file descriptors, and we check that there are only those we expect.
>
> The comment "We don't expect execve() calls inside the postgres code" is
> not quite correct, as we do things like archive_command and COPY to
> program (see OpenPipeStream()).

Oskari, are you planning to answer this review? As the thread died 3
weeks ago, I am marking this as returned with feedback. Don't hesitate
to change the status of the patch if you have a new version.
-- 
Michael




Re: [HACKERS] wait events for disk I/O

2017-01-30 Thread Rushabh Lathia
On Tue, Jan 31, 2017 at 8:54 AM, Michael Paquier 
wrote:

> On Mon, Jan 30, 2017 at 10:01 PM, Rushabh Lathia
>  wrote:
> > Attached is the patch, which extends the existing wait event
> > infrastructure to implement wait events for disk I/O. Basically,
> > pg_stat_activity's wait event information shows data about disk I/O
> > as well as IPC primitives.
> >
> > Implementation details:
> >
> > - Added PG_WAIT_IO to pgstat.h and a new enum WaitEventIO.
> > - Added a wait_event_info argument to FileRead, FileWrite, FilePrefetch,
> > FileWriteback, FileSync, and FileTruncate. Set this wait event just
> > before performing the file system operation and clear it just after.
> > - Pass down an appropriate wait event from the caller of any of those
> > functions.
> > - Also set and clear a wait event around standalone calls to read(),
> > write(), and fsync() in other parts of the system.
> > - Added documentation for all newly added wait events.
>
> Looks neat, those are unlikely to overlap with other wait events.
>

Thanks.


>
> > Open issue:
> > - Might have missed a few standalone calls to read(), write(), etc.
> > which need to pass the wait_event_info.
>
> It may be an idea to use LD_PRELOAD with custom versions of read(),
> write() and fsync(), and look at the paths where no flags are set in
> MyProc->wait_event_info, and log information when that happens.
>
>
Yes, maybe I will try this.


> > Thanks to my colleague Robert Haas for his help in design.
> > Please let me know your thought, and thanks for reading.
>
> Did you consider a wrapper of the type pg_read_event() or
> pg_write_event(), etc.?
>

I thought about that, but eventually stuck with this approach, as it
looks neater and more uniform with the other wait event implementations.
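
As a rough illustration of the pattern described upthread -- set the
wait event immediately before the kernel call and clear it right after
-- here is a minimal C sketch (the wrapper function and the
WAIT_EVENT_DATA_FILE_READ value are illustrative, not taken from the
patch):

    /* Sketch only: report a disk I/O wait event around a read(). */
    static int
    read_with_wait_event(int fd, char *buffer, int amount,
                         uint32 wait_event_info)
    {
        int     returnCode;

        /* advertised in pg_stat_activity for the duration of the call */
        pgstat_report_wait_start(wait_event_info);
        returnCode = read(fd, buffer, amount);
        pgstat_report_wait_end();

        return returnCode;
    }

    /* A caller passes the event describing what is being read, e.g.
     * read_with_wait_event(fd, buf, BLCKSZ, WAIT_EVENT_DATA_FILE_READ). */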



> --
> Michael
>



Thanks,
Rushabh Lathia
www.EnterpriseDB.com


Re: [HACKERS] Speedup twophase transactions

2017-01-30 Thread Michael Paquier
On Tue, Jan 31, 2017 at 2:34 PM, Nikhil Sontakke
 wrote:
>>> I wonder what's the best location for this in the common case when we
>>> shut down a standby.  We could add code in the XLOG_CHECKPOINT_SHUTDOWN
>>> and XLOG_CHECKPOINT_ONLINE xlog_redo code paths.
>>
>> ShutdownXLOG() calls CreateRestartPoint() when a standby shuts down,
>> so doing all the durability work in CheckPointTwoPhase() would take
>> care of any problems.
>>
>
> ShutdownXLOG() gets called from the checkpointer process. See comments
> above about the checkpointer not having access to the proper
> KnownPreparedList.
>
> The following test sequence will trigger the issue:
>
> 1) start master
> 2) start replica
> 3) prepare a transaction on master
> 4) shutdown master
> 5) shutdown replica
>
> CheckPointTwoPhase() in (5) does not sync this prepared transaction
> because the checkpointer's KnownPreparedList is empty.

And that's why this needs to be stored in shared memory with a number
of elements made of max_prepared_xacts...
-- 
Michael




Re: [HACKERS] WAL consistency check facility

2017-01-30 Thread Michael Paquier
On Thu, Jan 5, 2017 at 2:54 PM, Kuntal Ghosh  wrote:
> On Wed, Dec 21, 2016 at 10:53 PM, Robert Haas  wrote:
>
>> On a first read-through of this patch -- I have not studied it in
>> detail yet -- this looks pretty good to me.  One concern is that this
>> patch adds a bit of code to XLogInsert(), which is a very hot piece of
>> code.  Conceivably, that might produce a regression even when this is
>> disabled; if so, we'd probably need to make it a build-time option.  I
>> hope that's not necessary, because I think it would be great to
>> compile this into the server by default, but we better make sure it's
>> not a problem.  A bulk load into an existing table might be a good
>> test case.
>>
> I've done some bulk load testing with 16, 32, and 64 clients. I didn't
> notice any regression in the results.
>
>> Aside from that, I think the biggest issue here is that the masking
>> functions are virtually free of comments, whereas I think they should
>> have extensive and detailed comments.  For example, in heap_mask, you
>> have this:
>>
>> +page_htup->t_infomask =
>> +HEAP_XMIN_COMMITTED | HEAP_XMIN_INVALID |
>> +HEAP_XMAX_COMMITTED | HEAP_XMAX_INVALID;
>>
>> For something like this, you could write "We want to ignore
>> differences in hint bits, since they can be set by SetHintBits without
>> emitting WAL.  Force them all to be set so that we don't notice
>> discrepancies."  Actually, though, I think that you could be a bit
>> more nuanced here: HEAP_XMIN_COMMITTED + HEAP_XMIN_INVALID =
>> HEAP_XMIN_FROZEN, so maybe what you should do is clear
>> HEAP_XMAX_COMMITTED and HEAP_XMAX_INVALID but only clear the others if
>> one is set but not both.
>>
> I've modified it as follows:
> +
> +   if (!HeapTupleHeaderXminFrozen(page_htup))
> +   page_htup->t_infomask |= HEAP_XACT_MASK;
> +   else
> +   page_htup->t_infomask |=
> HEAP_XMAX_COMMITTED | HEAP_XMAX_INVALID;
>
>> Anyway, leaving that aside, I think every single change that gets
>> masked in every single masking routine needs a similar comment,
>> explaining why that change can happen on the master without also
>> happening on the standby and hopefully referring to the code that
>> makes that unlogged change.
>>
> I've added comments for all the masking routines.
>
>> I think wal_consistency_checking, as proposed by Peter, is better than
>> wal_consistency_check, as implemented.
>>
> Modified to wal_consistency_checking.
>
>> Having StartupXLOG() call pfree() on the masking buffers is a waste of
>> code.  The process is going to exit anyway.
>>
>> + "Inconsistent page found, rel %u/%u/%u, forknum %u, blkno 
>> %u",
>>
> Done.
>
>> Primary error messages aren't capitalized.
>>
>> +if (!XLogRecGetBlockTag(record, block_id, &rnode, &forknum, &blkno))
>> +{
>> +/* Caller specified a bogus block_id. Do nothing. */
>> +continue;
>> +}
>>
>> Why would the caller do something so dastardly?
>>
> Modified to following comment:
> +   /*
> +* WAL record doesn't contain a block reference
> +* with the given id. Do nothing.
> +*/
>
> I've attached the patch with the modified changes. PFA.
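
For anyone following along, the xmin nuance above comes from the fact
that "frozen" is encoded as the combination of both xmin hint bits, per
src/include/access/htup_details.h:

    #define HEAP_XMIN_FROZEN (HEAP_XMIN_COMMITTED | HEAP_XMIN_INVALID)

so forcing all four hint bits on is only safe for tuples that are not
already frozen; a frozen tuple's xmin bits must be left intact.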

Moved to CF 2017-03 with the same status, "ready for committer".
-- 
Michael




Re: [HACKERS] Speedup twophase transactions

2017-01-30 Thread Nikhil Sontakke
>> Having CheckPointTwoPhase() do the flush would mean shifting the data
>> from KnownPreparedList into TwoPhaseState shmem.
>
> Er, no. For CheckPointTwoPhase(), at recovery what needs to be done is
> to have all the entries in KnownPreparedList() flushed to disk and
> have those entries removed while holding a shared memory lock.

The KnownPreparedList is constructed by the recovery process.
CheckPointTwoPhase() gets called by the checkpointer process. The
checkpointer does not have access to this valid KnownPreparedList.

> And for
> the rest we need to be careful to have PrescanPreparedTransactions,
> RecoverPreparedTransactions and StandbyRecoverPreparedTransactions
> aware of entries that are in KnownPreparedList().


Yeah, that part is straightforward. It does involve duplicating the
earlier while loops to work against KnownPreparedList. A smart single
while loop that iterates over the 2PC files and then over the list
would help here :-)
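
A very rough sketch of that single-loop shape, assuming a common helper
(ProcessOneTwoPhaseEntry() and the variable names here are hypothetical,
not from the patch):

    ListCell   *cell;
    int         i;

    /* Pass 1: prepared transactions with state files already on disk. */
    for (i = 0; i < n_ondisk_entries; i++)
        ProcessOneTwoPhaseEntry(ondisk_entries[i], true);    /* ondisk */

    /* Pass 2: prepared transactions known only from WAL during recovery. */
    foreach(cell, KnownPreparedList)
        ProcessOneTwoPhaseEntry(lfirst(cell), false);        /* in memory */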

> Let's leave the
> business of putting the information from KnownPreparedList to
> TwoPhaseState in RecoverPreparedTransactions, which need to be aware
> of entries in KnownPreparedList() anyway. The only thing that differs
> is how the 2PC information is fetched: from the segments or from the
> files in pg_twophase.
>

Yeah.  This part is also OK. We also have to be careful to mark the
shmem gxact entry with "ondisk = false" and to set
prepare_start_lsn/prepare_end_lsn properly as well.

>> I wonder what's the best location for this in the common case when we
>> shut down a standby.  We could add code in the XLOG_CHECKPOINT_SHUTDOWN
>> and XLOG_CHECKPOINT_ONLINE xlog_redo code paths.
>
> ShutdownXLOG() calls CreateRestartPoint() when a standby shuts down,
> so doing all the durability work in CheckPointTwoPhase() would take
> care of any problems.
>

ShutdownXLOG() gets called from the checkpointer process. See comments
above about the checkpointer not having access to the proper
KnownPreparedList.

The following test sequence will trigger the issue:

1) start master
2) start replica
3) prepare a transaction on master
4) shutdown master
5) shutdown replica

CheckPointTwoPhase() in (5) does not sync this prepared transaction
because the checkpointer's KnownPreparedList is empty.

Regards,
Nikhils
-- 
 Nikhil Sontakke   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] amcheck (B-Tree integrity checking tool)

2017-01-30 Thread Andres Freund
Hi Heikki,

On 2016-11-22 10:56:07 -0800, Peter Geoghegan wrote:
> I attach V4 of amcheck.

Is there any chance you could look at the concurrency-related parts of
amcheck?  I'd like to push this to be mergeable, but that area is a bit
outside my expertise...

Andres




[HACKERS] Refactoring of replication commands using printsimple

2017-01-30 Thread Michael Paquier
Hi all,

This is a follow-up to the refactoring that was discussed in the
thread about increasing the default size of WAL segments:
https://www.postgresql.org/message-id/cab7npqq4hynrlq+w1jrryvsysoxuqa40pyb2uw5uqkkag4h...@mail.gmail.com

The discussion resulted in commit a84069d9, which introduced a new
DestReceiver called printsimple that does not need any catalog access.
After some investigation, I have noticed that a couple of messages used
in the replication protocol could be refactored as well:
- IDENTIFY_SYSTEM
- TIMELINE_HISTORY
- CREATE_REPLICATION_SLOT
This results in the following code reduction:
 3 files changed, 115 insertions(+), 162 deletions(-)
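
For reference, the pattern this enables looks roughly like the
following, simplified from the IDENTIFY_SYSTEM case (only one column is
shown, error handling is omitted, and sysid_str is assumed to have been
formatted earlier):

    TupOutputState *tstate;
    TupleDesc   tupdesc = CreateTemplateTupleDesc(1, false);
    DestReceiver *dest = CreateDestReceiver(DestRemoteSimple);
    Datum       values[1];
    bool        nulls[1] = {false};

    /* printsimple needs no catalog access thanks to this initializer */
    TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 1, "systemid",
                              TEXTOID, -1, 0);

    values[0] = CStringGetTextDatum(sysid_str);

    tstate = begin_tup_output_tupdesc(dest, tupdesc);
    do_tup_output(tstate, values, nulls);
    end_tup_output(tstate);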

A commit fest entry has been created:
https://commitfest.postgresql.org/13/978/

Thanks,
-- 
Michael


refactor-repl-cmd-output-v3.patch
Description: Binary data



Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-01-30 Thread Peter Geoghegan
On Mon, Jan 30, 2017 at 8:46 PM, Thomas Munro
 wrote:
> On Wed, Jan 4, 2017 at 12:53 PM, Peter Geoghegan  wrote:
>> Attached is V7 of the patch.
>
> I am doing some testing.  First, some superficial things from first pass:
>
> [Various minor cosmetic issues]

Oops.

> Just an observation:  if you ask for a large number of workers, but
> only one can be launched, it will be constrained to a small fraction
> of maintenance_work_mem, but use only one worker.  That's probably OK,
> and I don't see how to do anything about it unless you are prepared to
> make workers wait for an initial message from the leader to inform
> them how many were launched.

Actually, the leader-owned worker Tuplesort state will have the
appropriate amount, so you'd still need to have 2 participants (1
worker + leader-as-worker). And sorting is much less sensitive to
having a bit less memory than hashing (at least when there aren't
dozens of runs to merge at the end, or multiple passes). So, I agree
that this isn't worth worrying about for a DDL statement.

> Should this 64KB minimum be mentioned in the documentation?

You mean user-visible documentation, and not just tuplesort.h? I don't
think that that's necessary. That's a ludicrously low amount of memory
for a worker to be limited to anyway. It will never come up with
remotely sensible use of the feature.

> +   if (!btspool->isunique)
> +   {
> +   shm_toc_estimate_keys(&pcxt->estimator, 2);
> +   }
>
> Project style: people always tell me to drop the curlies in cases like
> that.  There are a few more examples in the patch.

I only do this when there is an "else" that must have curly braces,
too. There are plenty of examples of this in existing code, so I
think it's fine.

> + /* Wait for workers */
> + ConditionVariableSleep(&shared->workersFinishedCv,
> +   WAIT_EVENT_PARALLEL_FINISH);
>
> I don't think we should reuse WAIT_EVENT_PARALLEL_FINISH in
> tuplesort_leader_wait and worker_wait.  That belongs to
>> WaitForParallelWorkersToFinish, so someone who sees that in
> pg_stat_activity won't know which it is.

Noted.

> IIUC worker_wait() is only being used to keep the worker around so its
> files aren't deleted.  Once buffile cleanup is changed to be
> ref-counted (in an on_dsm_detach hook?) then workers might as well
> exit sooner, freeing up a worker slot... do I have that right?

Yes. Or at least I think it's very likely that that will end up happening.

> Incidentally, barrier.c could probably be used for this
> synchronisation instead of these functions.  I think
> _bt_begin_parallel would call BarrierInit(&shared->barrier,
> scantuplesortstates) and then after LaunchParallelWorkers() it'd call
> a new interface BarrierDetachN(&shared->barrier, scantuplesortstates -
> pcxt->nworkers_launched) to forget about workers that failed to
> launch.  Then you could use BarrierWait where the leader waits for the
> workers to finish, and BarrierDetach where the workers are finished
> and want to exit.

I thought about doing that, actually, but I don't like creating
dependencies on some other uncommitted patch, which is a moving target
(the barrier stuff isn't committed yet). It makes life difficult for
reviewers. I put off adopting condition variables until they were
committed for the same reason -- it was easy to do without them for
a time. I'll probably get around to it before too long, but feel no
urgency about it. Barriers will only allow me to make a modest net
removal of code, AFAIK.

Thanks
-- 
Peter Geoghegan




Re: [HACKERS] macaddr 64 bit (EUI-64) datatype support

2017-01-30 Thread Vitaly Burovoy
On 1/27/17, Haribabu Kommi  wrote:
> Updated patches are attached.


Hello,

I'm almost ready to mark it as Ready for committer.
The final round.

1.
>+DATA(insert OID = 774 ( macaddr8 ...
>+#define MACADDR8OID 972
What does this number (972) mean? I think it should be 774, the same
number as was used in the type definition.


Since editing is required anyway, please fix the whitespace:
2.
>+DATA(insert OID = 3371 (  403 macaddr8_ops
>PGNSP PGUID ));
>+DATA(insert OID = 3372 (  405 macaddr8_ops
>PGNSP PGUID ));

only one (not three) tab before "PGNSP" should be used (see other rows
around yours: "OID = 1983", "OID = 1991" etc).

3.
>diff --git a/contrib/btree_gist/Makefile b/contrib/btree_gist/Makefile
some new rows have two tabs instead of eight spaces.

4.
Some other files need to be fixed by pgindent (it helps with future
maintenance):
contrib/btree_gist/btree_macaddr8.c (a lot of rows)
src/include/utils/inet.h  (I have no idea why it changes indentation
to tabs for macaddr8 whereas it leaves spaces for macaddr)

5.
src/backend/utils/adt/network.c
pgindent makes it uglier. I've just found how to write the line for it:
res *= ((double) 256) * 256 * 256 * 256;

-- 
Best regards,
Vitaly Burovoy




Re: [HACKERS] increasing the default WAL segment size

2017-01-30 Thread Michael Paquier
On Sat, Jan 28, 2017 at 8:04 AM, Michael Paquier
 wrote:
> On Sat, Jan 28, 2017 at 7:29 AM, Robert Haas  wrote:
>> On Thu, Jan 26, 2017 at 8:53 PM, Michael Paquier
>>  wrote:
 I've not done like the most careful review ever, but I'm in favor of the
 general change (provided the byval thing is fixed obviously).
>>>
>>> Thanks for the review.
>>
>> Why not use pg_ltoa and pg_lltoa like the output functions for the datatype 
>> do?
>
> No particular reason.
>
>> Might use CStringGetTextDatum(blah) instead of
>> PointerGetDatum(cstring_to_text(blah)).
>
> Yes, thanks.

I am going to create a new thread for this refactoring patch, as
that's different from the goal of this thread.

Now, regarding the main patch: per the status of the last couple of
days, the patch has received a review but no new version, so I am
marking it as returned with feedback for now. Feel free to update the
status of the patch to something else if you think that's more
appropriate.
-- 
Michael




Re: [HACKERS] Patch to implement pg_current_logfile() function

2017-01-30 Thread Michael Paquier
On Fri, Jan 20, 2017 at 10:14 PM, Michael Paquier
 wrote:
> On Fri, Jan 20, 2017 at 10:11 PM, Alvaro Herrera
>  wrote:
>>> diff --git a/src/backend/replication/basebackup.c 
>>> b/src/backend/replication/basebackup.c
>>
>>> @@ -148,6 +149,9 @@ static const char *excludeFiles[] =
>>>   /* Skip auto conf temporary file. */
>>>   PG_AUTOCONF_FILENAME ".tmp",
>>>
>>> + /* Skip current log file temporary file */
>>> + LOG_METAINFO_DATAFILE_TMP,
>>> +
>>
>> Sorry if this has already been answered, but why are we not also
>> skipping LOG_METAINFO_DATAFILE itself?  I can't see a scenario where
>> it's useful to have that file in a base backup, since it's very likely
>> that the log file used when the backup is restored will be different.
>
> I made the same remark upthread, and Gilles pointed out that it would
> be useful to get the last log file that was used by a backup.
> Including this file does no harm either.

Moved to CF 2017-03.
-- 
Michael




Re: [HACKERS] amcheck (B-Tree integrity checking tool)

2017-01-30 Thread Michael Paquier
On Mon, Dec 5, 2016 at 2:15 PM, Haribabu Kommi  wrote:
> Moved to next CF with "needs review" status.

Same, to CF 2017-03 this time.
-- 
Michael




Re: [HACKERS] An isolation test for SERIALIZABLE READ ONLY DEFERRABLE

2017-01-30 Thread Michael Paquier
On Thu, Jan 12, 2017 at 10:21 AM, Michael Paquier
 wrote:
> On Thu, Jan 12, 2017 at 4:09 AM, Robert Haas  wrote:
>> On Tue, Jan 10, 2017 at 11:41 PM, Michael Paquier
>>  wrote:
>>> Do you think that expanding the wait query by default could be
>>> intrusive for the other tests? I am wondering whether such a whitelist
>>> could generate false positives for the existing tests, including
>>> out-of-core extensions, as all the tests now rely only on
>>> pg_blocking_pids().
>>
>> It won't affect anything unless running at transaction isolation level
>> serializable with the "read only deferrable" option.
>
> Indeed as monitoring.sgml says. And that's now very likely close to
> zero. It would be nice to add a comment in the patch to just mention
> that. In short, I withdraw my concerns about this patch, the addition
> of a test for repeatable read outweighs the tweaks done in the
> isolation tester. I am marking this as ready for committer, I have not
> spotted issues with it.

Moved to CF 2017-03.
-- 
Michael




Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-01-30 Thread Thomas Munro
On Wed, Jan 4, 2017 at 12:53 PM, Peter Geoghegan  wrote:
> Attached is V7 of the patch.

I am doing some testing.  First, some superficial things from first pass:

Still applies with some offsets and one easy-to-fix rejected hunk in
nbtree.c (removing some #include directives and a struct definition).

+/* Sort parallel code from state for sort__start probes */
+#define PARALLEL_SORT(state)   ((state)->shared == NULL ? 0 : \
+(state)->workerNum >= 0 : 1 : 2)

Typo: ':' instead of '?', --enable-dtrace build fails.

+ the entire utlity command, regardless of the number of

Typo: s/utlity/utility/

+   /* Perform sorting of spool, and possibly a spool2 */
+   sortmem = Max(maintenance_work_mem / btshared->scantuplesortstates, 64);

Just an observation:  if you ask for a large number of workers, but
only one can be launched, it will be constrained to a small fraction
of maintenance_work_mem, but use only one worker.  That's probably OK,
and I don't see how to do anything about it unless you are prepared to
make workers wait for an initial message from the leader to inform
them how many were launched.

Should this 64KB minimum be mentioned in the documentation?

+   if (!btspool->isunique)
+   {
+   shm_toc_estimate_keys(&pcxt->estimator, 2);
+   }

Project style: people always tell me to drop the curlies in cases like
that.  There are a few more examples in the patch.

+ /* Wait for workers */
+ ConditionVariableSleep(&shared->workersFinishedCv,
+   WAIT_EVENT_PARALLEL_FINISH);

I don't think we should reuse WAIT_EVENT_PARALLEL_FINISH in
tuplesort_leader_wait and worker_wait.  That belongs to
WaitForParallelWorkersToFinish, so someone who sees that in
pg_stat_activity won't know which it is.

IIUC worker_wait() is only being used to keep the worker around so its
files aren't deleted.  Once buffile cleanup is changed to be
ref-counted (in an on_dsm_detach hook?) then workers might as well
exit sooner, freeing up a worker slot... do I have that right?

Incidentally, barrier.c could probably be used for this
synchronisation instead of these functions.  I think
_bt_begin_parallel would call BarrierInit(&btshared->barrier,
scantuplesortstates) and then after LaunchParallelWorkers() it'd call
a new interface BarrierDetachN(&btshared->barrier, scantuplesortstates -
pcxt->nworkers_launched) to forget about workers that failed to
launch.  Then you could use BarrierWait where the leader waits for the
workers to finish, and BarrierDetach where the workers are finished
and want to exit.
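
Spelled out, that flow might look like this (hypothetical only:
barrier.c is itself still a proposed patch, and BarrierDetachN() is the
new interface suggested above, which does not exist yet):

/* leader, in _bt_begin_parallel(): */
BarrierInit(&btshared->barrier, scantuplesortstates);
LaunchParallelWorkers(pcxt);
/* forget about workers that failed to launch: */
BarrierDetachN(&btshared->barrier,
               scantuplesortstates - pcxt->nworkers_launched);

/* leader, where it currently waits for the workers to finish: */
BarrierWait(&btshared->barrier);

/* worker, once its run is built and its files are handed over: */
BarrierDetach(&btshared->barrier);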

+ /* Prepare state to create unified tapeset */
+ leaderTapes = palloc(sizeof(TapeShare) * state->maxTapes);

Missing cast (TapeShare *) here?  Project style judging by code I've
seen, and avoids gratuitous C++ incompatibility.

+_bt_parallel_shared_estimate(Snapshot snapshot)
...
+tuplesort_estimate_shared(int nWorkers)

Inconsistent naming?

More soon.

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] Renaming of pg_xlog and pg_clog

2017-01-30 Thread Michael Paquier
On Tue, Jan 17, 2017 at 4:31 PM, Michael Paquier
 wrote:
> On Tue, Nov 29, 2016 at 1:35 PM, Michael Paquier
>  wrote:
>> On Tue, Nov 22, 2016 at 8:35 PM, Haribabu Kommi
>>  wrote:
>>> Hi Craig,
>>>
>>> This is a gentle reminder.
>>>
>>> you assigned as reviewer to the current patch in the 11-2016 commitfest.
>>> But you haven't shared your review yet. Please share your review about
>>> the patch. This will help us in smoother operation of commitfest.
>>>
>>> Please Ignore if you already shared your review.
>>
>> I have moved this CF entry to 2017-01, the remaining, still unreviewed
>> patch are for renaming pg_subxact and pg_clog.
>
> The second patch has rotten a little because of the backup
> documentation. By the way, is something going to happen in the CF?
> Craig, you are a reviewer of this patch.

Moved to CF 2017-03.
-- 
Michael




Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal

2017-01-30 Thread Michael Paquier
On Mon, Jan 30, 2017 at 8:01 PM, Vladimir Rusinov  wrote:
>
> On Fri, Jan 27, 2017 at 11:03 PM, Peter Eisentraut
>  wrote:
>>
>> 1. Rename nothing
>> 2. Rename directory only
>> 3. Rename everything
>
>
> 3 or 1 (with a slight preference for 3).
>
> Not sure if my vote counts, but for me as ex-DBA consistency mattered a lot.

Any voice counts :)

> This is one of the reasons PostgreSQL is actually nicer to work with than
> many other databases. I remember 'wal' vs 'xlog' was actually one of a very
> few confusing things. I think it is important to stick to just one term,
> whether it be wal or xlog. I'd prefer wal since it's a term used in many
> other systems. 'xlog' may be confused with likes of binary log in MySQL
> which is not what it is.

The first time I worked on Postgres, I was not able to map the two
terms either.

> Now, I've rebased my patches to rename functions (with and without aliases)
> on top of current master, but there's probably no point of posting them now
> given there are patches above that rename other things as well.

OK, I have moved this patch to CF 2017-03, let's make the discussion
continue with more votes coming in.
-- 
Michael




Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2017-01-30 Thread Michael Paquier
On Thu, Jan 26, 2017 at 9:11 PM, Kyotaro HORIGUCHI
 wrote:
>> Though, I know the community is against behaviour changing GUCs.  I
>> will not spend more time on this, before I get positive feedback from
>> others.
>
> That's too bad. I'm sorry that I wasn't very helpful..

You kept the discussion constructive by providing feedback in a way
that you thought made sense; there's nothing bad in that IMO, quite the
contrary to be honest.

(I am just showing up here to tell that I have moved this patch to CF 2017-03).
-- 
Michael




Re: [HACKERS] WAL logging problem in 9.4.3?

2017-01-30 Thread Michael Paquier
On Fri, Dec 2, 2016 at 1:39 PM, Haribabu Kommi  wrote:
> The latest proposed patch still has problems.
> Closed in the 2016-11 commitfest with "moved to next CF" status because of
> a bug fix patch.
> Please feel free to update the status once you submit the updated patch.

And moved to CF 2017-03...
-- 
Michael




Re: [HACKERS] Crash on promotion when recovery.conf is renamed

2017-01-30 Thread Michael Paquier
On Tue, Dec 20, 2016 at 4:54 PM, Michael Paquier
 wrote:
> I am attaching that to next CF.

Moved to CF 2017-03. Both patches still apply.
-- 
Michael




Re: [HACKERS] multivariate statistics (v19)

2017-01-30 Thread Michael Paquier
On Tue, Jan 31, 2017 at 6:57 AM, Tomas Vondra
 wrote:
> This however reminds me that perhaps pg_mv_statistic is not the best name. I
> know others proposed pg_statistic_ext (and pg_stats_ext), and while I wasn't
> a big fan initially, I think it's a better name. People generally don't know
> what 'multivariate' means, while 'extended' is better known (e.g. because
> Oracle uses it for similar stuff).
>
> So I think I'll switch to that name too.

I have moved this patch to the next CF, with Álvaro as reviewer.
-- 
Michael




Re: [HACKERS] pg_hba_file_settings view patch

2017-01-30 Thread Michael Paquier
On Tue, Jan 31, 2017 at 12:55 PM, Haribabu Kommi
 wrote:
>
>
> On Tue, Jan 31, 2017 at 10:04 AM, Tom Lane  wrote:
>>
>> Haribabu Kommi  writes:
>> > On Mon, Jan 30, 2017 at 5:18 PM, Michael Paquier
>> > 
>> > wrote:
>> >> #define USER_AUTH_LAST uaPeer
>> >> StaticAssertStmt(lengthof(UserAuthName) == USER_AUTH_LAST + 1,
>> >> "UserAuthName must include all user authentication names");
>>
>> > Thanks for the review. Added the static assert statement.
>>
>> This isn't exactly bulletproof, since somebody could add another enum
>> value and forget to update the macro.  Still, it's better than nothing.
>> I tried to make it a shade more idiot-proof by putting the #define
>> physically inside the enum list --- you'd have to really have blinders
>> on to not notice it there.  (Not that people haven't made equally silly
>> mistakes :-(.)
>>
>> Pushed with that adjustment.  Thanks for working on this!
>
>
> Thanks for your support.

The modifications look fine to me, thanks for adding the assertion.
-- 
Michael




Re: [HACKERS] COPY as a set returning function

2017-01-30 Thread Michael Paquier
On Tue, Jan 31, 2017 at 3:05 AM, Corey Huinker  wrote:
> On Fri, Jan 27, 2017 at 9:42 PM, Peter Eisentraut
>  wrote:
>>
>> On 1/25/17 8:51 PM, Corey Huinker wrote:
>> > # select * from copy_srf('echo "x\ty"',true) as t(x text, y text);
>>
>> I find these parameters weird.  Just looking at this, one has no idea
>> what the "true" means.  Why not have a "filename" and a "program"
>> parameter and make them mutually exclusive?
>
>
> It was done that way to match the parameters of BeginCopyFrom() and it could
> easily be changed.
>
> I suppose I could have written it as:
>
> select * from copy_srf(filename => 'echo "x\ty"', is_program => true) as t(x
> text, y text);
>
>
> But this goes to the core of this patch/poc: I don't know where we want to
> take it next.
>
> Options at this point are:
> 1. Continue with a SRF, in which case discussions about parameters are
> completely valid.
> 2. Add a RETURNING clause to COPY. This would dispense with the parameters
> problem, but potentially create others.
> 3. Add the TupDesc parameter to BeginCopyFrom, make sure all data structures
> are visible to an extension, and call it a day. If someone wants to write an
> extension that makes their own copy_srf(), they can.
> 4. Someone else's better idea.

Here is a 4: refactor BeginCopyFrom() so that instead of a Relation it
takes a TupleDesc, a default expression list, and a relation name. You
could as well make NextCopyFrom() smarter so that it does nothing if no
expression contexts are given by the caller, which is the case for your
function here. To be honest, I find it a bit weird to use
NextCopyFromRawFields() when there is already a bunch of sanity checks
happening in NextCopyFrom(), where you surely want to have them
checked even for your function.
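
A sketch of what such a refactored signature could look like
(hypothetical, for illustration only -- the existing BeginCopyFrom()
takes a Relation, and these parameter names are assumptions):

/* Hypothetical refactored prototype, not the current API: */
CopyState
BeginCopyFrom(TupleDesc tupDesc,      /* instead of Relation rel */
              List *defexprs,         /* per-attribute default expressions */
              const char *relname,    /* used only in error messages */
              const char *filename,
              bool is_program,
              List *attnamelist,
              List *options);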

Looking at the last patch proposed, I find it immature, as rsTupDesc
basically overlaps with the relation a caller can define in
BeginCopyFrom(), so I am marking this patch as
"returned with feedback".
-- 
Michael




Re: [HACKERS] pg_hba_file_settings view patch

2017-01-30 Thread Haribabu Kommi
On Tue, Jan 31, 2017 at 10:04 AM, Tom Lane  wrote:

> Haribabu Kommi  writes:
> > On Mon, Jan 30, 2017 at 5:18 PM, Michael Paquier <
> michael.paqu...@gmail.com>
> > wrote:
> >> #define USER_AUTH_LAST uaPeer
> >> StaticAssertStmt(lengthof(UserAuthName) == USER_AUTH_LAST + 1,
> >> "UserAuthName must include all user authentication names");
>
> > Thanks for the review. Added the static assert statement.
>
> This isn't exactly bulletproof, since somebody could add another enum
> value and forget to update the macro.  Still, it's better than nothing.
> I tried to make it a shade more idiot-proof by putting the #define
> physically inside the enum list --- you'd have to really have blinders
> on to not notice it there.  (Not that people haven't made equally silly
> mistakes :-(.)
>
> Pushed with that adjustment.  Thanks for working on this!
>

Thanks for your support.

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] delta relations in AFTER triggers

2017-01-30 Thread Michael Paquier
On Sat, Jan 21, 2017 at 6:37 AM, Kevin Grittner  wrote:
> On Fri, Jan 20, 2017 at 2:08 PM, Nico Williams  wrote:
>> On Fri, Jan 20, 2017 at 01:37:33PM -0600, Kevin Grittner wrote:
>
>>> There is currently plenty of room for pseudo-MV implementations,
>>> and may be for a while.  It's a good indication of the need for the
>>> feature in core.  An implementation in the guts of core can have
>>> advantages that nothing else can, of course.  For example, for
>>> eager application of the deltas, nothing will be able to beat
>>> capturing tuples already in RAM and being looked at for possible
>>> trigger firing into a RAM-with-spill-to-disk tuplestore.
>>
>> BTW, automatic updates of certain types of MVs should be easy: add
>> constraints based on NEW/OLD rows from synthetic triggers to the
>> underlying query.
>
> Convincing me that this is a good idea for actual MVs, versus
> pseudo-MVs using tables, would be an uphill battle.  I recognize
> the need to distinguish between MVs which contain recursive CTEs in
> their definitions and MVs that don't, so that the DRed algorithm
> can be used for the former and the counting algorithm for the
> latter; but firing triggers for row-at-a-time maintenance is not
> going to be efficient for very many cases, and the cost of
> identifying those cases to handle them differently is probably
> going to exceed any gains.  Comparative benchmarks, once there is
> an implementation using set-based techniques, could potentially
> convince me; but there's not much point arguing about it before
> that exists.  :-)

I have moved this patch to the next CF. It would be nice to progress
in this topic in PG10.
-- 
Michael




Re: [HACKERS] Radix tree for character conversion

2017-01-30 Thread Michael Paquier
On Mon, Jan 30, 2017 at 3:37 PM, Kyotaro HORIGUCHI
 wrote:
> Hello, this is the revised version of character conversion using radix tree.

Thanks for the new version, I'll look at it once I am done with the
cleanup of the current CF. For now I have moved it to the CF 2017-03.
-- 
Michael




Re: [HACKERS] wait events for disk I/O

2017-01-30 Thread Michael Paquier
On Mon, Jan 30, 2017 at 10:01 PM, Rushabh Lathia
 wrote:
> Attached is the patch, which extend the existing wait event infrastructure
> to implement the wait events for the disk I/O. Basically pg_stat_activity's
> wait event information to show data about disk I/O as well as IPC primitives.
>
> Implementation details:
>
> - Added PG_WAIT_IO to pgstat.h and a new enum WaitEventIO
> - Added a wait_event_info argument to FileRead, FileWrite, FilePrefetch,
> FileWriteback, FileSync, and FileTruncate. Set this wait event just before
> performing the file system operation and clear it just after.
> - Pass down an appropriate wait event from the caller of any of those
> functions.
> - Also set and clear a wait event around standalone calls to read(),
> write(), fsync() in other parts of the system.
> - Added documentation for all newly added wait event.

Looks neat, those are unlikely to overlap with other wait events.

> Open issue:
> - Might have missed a few standalone calls to read(), write(), etc. which need
> to pass the wait_event_info.

It may be an idea to use LD_PRELOAD with custom versions of read(),
write() and fsync(), and look at the paths where no flags are set in
MyProc->wait_event_info, and log information when that happens.
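
For illustration, a minimal standalone interposer of that kind; the
actual check against MyProc->wait_event_info is left as a comment since
it needs backend internals, so this is a sketch only:

/* Build:  gcc -shared -fPIC -o trace_io.so trace_io.c -ldl
 * Use:    LD_PRELOAD=./trace_io.so postgres ... */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <unistd.h>

static ssize_t (*real_write) (int, const void *, size_t);

ssize_t
write(int fd, const void *buf, size_t count)
{
    if (real_write == NULL)
        real_write = (ssize_t (*) (int, const void *, size_t))
            dlsym(RTLD_NEXT, "write");

    /* A debug build could inspect MyProc->wait_event_info here and
     * record call sites where no wait event flag is currently set. */

    return real_write(fd, buf, count);
}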

> Thanks to my colleague Robert Haas for his help in design.
> Please let me know your thought, and thanks for reading.

Did you consider a wrapper of the type pg_read_event() or
pg_write_event(), etc.?
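
Such a wrapper might look like this (pg_read_event is the hypothetical
name from the question above; pgstat_report_wait_start/end are the
existing reporting calls -- a sketch, not a proposal of final code):

static ssize_t
pg_read_event(int fd, void *buf, size_t len, uint32 wait_event_info)
{
    ssize_t     ret;

    /* set the wait event just before the system call, clear it after */
    pgstat_report_wait_start(wait_event_info);
    ret = read(fd, buf, len);
    pgstat_report_wait_end();

    return ret;
}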
-- 
Michael




Re: [HACKERS] Speedup twophase transactions

2017-01-30 Thread Michael Paquier
On Tue, Jan 31, 2017 at 3:45 AM, Nikhil Sontakke
 wrote:
>> --- a/src/backend/access/transam/xlog.c
>> +++ b/src/backend/access/transam/xlog.c
>> @@ -9573,6 +9573,7 @@ xlog_redo(XLogReaderState *record)
>>  (errmsg("unexpected timeline ID %u (should be %u)
>> in checkpoint record",
>>  checkPoint.ThisTimeLineID, ThisTimeLineID)));
>>
>> +KnownPreparedRecreateFiles(checkPoint.redo);
>>  RecoveryRestartPoint();
>>  }
>> And actually, when a XLOG_CHECKPOINT_SHUTDOWN record is taken, 2PC
>> files are not flushed to disk with this patch. This is a problem as a
>> new restart point is created... Having the flush in CheckpointTwoPhase
>> really makes the most sense.
>
> Having CheckPointTwoPhase() do the flush would mean shifting the data
> from KnownPreparedList into TwoPhaseState shmem.

Er, no. For CheckPointTwoPhase(), at recovery what needs to be done is
to have all the entries in KnownPreparedList() flushed to disk and
have those entries removed while holding a shared memory lock. And for
the rest we need to be careful to have PrescanPreparedTransactions,
RecoverPreparedTransactions and StandbyRecoverPreparedTransactions
aware of entries that are in KnownPreparedList(). Let's leave the
business of putting the information from KnownPreparedList to
TwoPhaseState in RecoverPreparedTransactions, which need to be aware
of entries in KnownPreparedList() anyway. The only thing that differs
is how the 2PC information is fetched: from the segments or from the
files in pg_twophase.
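
A rough sketch of that checkpoint-time flush, with KnownPreparedList and
its entry layout assumed from this thread (RecreateTwoPhaseFile() is the
existing function that writes a pg_twophase file; this is illustrative,
not committed code):

static void
FlushKnownPreparedAtCheckpoint(XLogRecPtr redo_horizon)
{
    ListCell   *cell;

    LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE);
    foreach(cell, KnownPreparedList)
    {
        KnownPreparedXact *xact = (KnownPreparedXact *) lfirst(cell);

        /* only flush entries whose PREPARE precedes the redo horizon */
        if (xact->prepare_end_lsn <= redo_horizon)
            RecreateTwoPhaseFile(xact->xid, xact->data, xact->len);
    }
    /* matching entries would also be removed from the list here */
    LWLockRelease(TwoPhaseStateLock);
}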

> I wonder what's the best location for this in the common case when we
> do shutdown of standby.  We could add code in XLOG_CHECKPOINT_SHUTDOWN
> and XLOG_CHECKPOINT_ONLINE xlog_redo code path.

ShutdownXLOG() calls CreateRestartPoint() when a standby shuts down,
so doing all the durability work in CheckPointTwoPhase() would take
care of any problems.

(moving patch to CF 2017-03)
-- 
Michael




Re: [HACKERS] Parallel bitmap heap scan

2017-01-30 Thread Haribabu Kommi
On Fri, Jan 27, 2017 at 5:32 PM, Dilip Kumar  wrote:

> On Tue, Jan 24, 2017 at 10:18 AM, Dilip Kumar 
> wrote:
> > I have changed as per the comments. 0002 and 0003 are changed, 0001 is
> > still the same.
>
> 2 days back my colleague Rafia, reported one issue (offlist) that
> parallel bitmap node is not scaling as good as other nodes e.g
> parallel sequence scan and parallel index scan.
>
> After some perf analysis, I found that there was one unconditional
> spin lock in parallel bitmap patch which we were taking for checking
> the prefetch target. Basically, we were always taking the lock and
> checking the prefetch_target is reached to prefetch_maximum. So even
> after it will reach to prefetch_maximum we will acquire the lock and
> check after every tuple. I have changed that logic, now I will check
> the condition first if we need to increase the target then will take
> the lock and recheck the condition.
>
> There is just one line change in 0003 compared to older version, all
> other patches are the same.
>
> Some performance data to show that new parallel bitmap patch performs
> way better than the previous version.
> TPCH scale 20, work_mem 64MB, shared buffers 8GB (4 workers)
> machine intel 8 socket machine
>
>        v13 (time in ms)   v14 (time in ms)
> Q6     37065.841          18202.903
> Q14    15229.569          11341.121


Thanks for the update. I have some comments


0002-hash-support-alloc-free-v14.patch:


+ if (tb->alloc)
+ {

The memory for tb->alloc is always allocated, so is the if check still
required?



0003-parallel-bitmap-heap-scan-v14.patch:


+ * and set parallel flag in lower level bitmap index scan. Later
+ * bitmap index node will use this flag to indicate tidbitmap that
+ * it needs to create an shared page table.
+ */

It would be better to mention where this flag is used, so that it is
clearer.


+ * Mark tidbitmap as shared and also store DSA area in it.
+ * marking tidbitmap as shared is indication that this tidbitmap
+ * should be created in shared memory. DSA area will be used for

The shared flag is set in the tidbitmap structure itself, but the
comment says that the tidbitmap should be created in shared memory.
I think it is the page table that needs to be created in shared memory.
Providing some more information here would be helpful.


- node->tbmres = tbmres = tbm_iterate(tbmiterator);
+ node->tbmres = tbmres = pbms_parallel_iterate(tbmiterator,
+ pbminfo ? >tbmiterator : NULL);

Instead of passing both normal and parallel iterators to the function
and checking again for NULL inside, how about just adding the check
in the caller itself? Or, if you prefer the current method, then try to
explain the details of the inputs in the function header in more detail.


+ /* Increase prefetch target if it's not yet at the max. */
+ if (node->prefetch_target < node->prefetch_maximum)

I didn't evaluate all scenarios, but the above code may have a problem:
in parallel mode the prefetch_target is fetched from the node and not
from pbminfo, but later it is fetched from pbminfo and updated there.
Maybe this code needs a rewrite.


+ TBMIterator *iterator = node->prefetch_iterator;

Why another variable? Why can't we use prefetch_iterator itself?
Currently node->tbmiterator and node->prefetch_iterator are initialized
irrespective of whether it is a parallel scan or not, but at the point of
use there is a check for the parallel one in the parallel case. If that
is so, why can't we avoid the initialization itself?


+ else
+ needWait = false;

By default needWait is false, so just set it to true only for the
PBM_INPROGRESS case.

+ * so that during free we can directly get the dsa_pointe and free it.

dsa_pointe -> dsa_pointer


+typedef struct
+{
+ TIDBitmap  *tbm; /* TIDBitmap we're iterating over */
+ int spageptr; /* next spages index */
+ int schunkptr; /* next schunks index */
+ int schunkbit; /* next bit to check in current schunk */
+ TBMIterateResult output; /* MUST BE LAST (because variable-size) */
+} TBMIterator;

I didn't see the need to move this structure. Can you point me to where
it is used?

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] Potential data loss of 2PC files

2017-01-30 Thread Michael Paquier
On Fri, Jan 6, 2017 at 9:26 PM, Ashutosh Bapat
 wrote:
> On Wed, Jan 4, 2017 at 12:16 PM, Michael Paquier
>  wrote:
>> On Wed, Jan 4, 2017 at 1:23 PM, Ashutosh Bapat
>>  wrote:
>>> I don't have anything more to review in this patch. I will leave that
>>> commitfest entry in "needs review" status for few days in case anyone
>>> else wants to review it. If none is going to review it, we can mark it
>>> as "ready for committer".
>>
>> Thanks for the input!
>
> Marking this as ready for committer.

Patch is moved to CF 2017-03.
-- 
Michael




Re: [HACKERS] Potential data loss of 2PC files

2017-01-30 Thread Michael Paquier
On Mon, Jan 30, 2017 at 10:52 PM, Heikki Linnakangas  wrote:
> So, if I understood correctly, the problem scenario is:
>
> 1. Create and write to a file.
> 2. fsync() the file.
> 3. Crash.
> 4. After restart, the file is gone.

Yes, that's a problem with fsync's durability, and we need to achieve
that at checkpoint time. I find [1] a good read on the matter; it's
easier to decipher than [2] or [3] in the POSIX spec.

[1]: 
http://blog.httrack.com/blog/2013/11/15/everything-you-always-wanted-to-know-about-fsync/
[2]: http://pubs.opengroup.org/onlinepubs/009695399/functions/fsync.html
[3]: http://pubs.opengroup.org/onlinepubs/009695399/functions/rename.html

> If that can happen, don't we have the same problem in many other places?
> Like, all the SLRUs? They don't fsync the directory either.

Right, pg_commit_ts and pg_clog enter in this category.

> Is unlink() guaranteed to be durable, without fsyncing the directory? If
> not, then we need to fsync() the directory even if there are no files in it
> at the moment, because some might've been removed earlier in the checkpoint
> cycle.

Hm... I am not an expert in file systems. At least on ext4 I can see
that unlink() is atomic, but not durable. So if an unlink() is
followed by a power failure, the previously unlinked file could still
be there if the parent directory is not fsync'd.
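
To make that concrete, a standalone POSIX sketch of a durable unlink
(illustrative names only; there is no such helper in the tree at this
point):

#include <fcntl.h>
#include <unistd.h>

/* Returns 0 on success, -1 on error (errno set). */
static int
durable_unlink(const char *path, const char *parent_dir)
{
    int         dirfd;

    if (unlink(path) < 0)
        return -1;

    /* fsync the parent directory to make the entry removal durable */
    dirfd = open(parent_dir, O_RDONLY);
    if (dirfd < 0)
        return -1;
    if (fsync(dirfd) < 0)
    {
        close(dirfd);
        return -1;
    }
    return close(dirfd);
}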
-- 
Michael




Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2017-01-30 Thread Claudio Freire
On Mon, Jan 30, 2017 at 5:51 AM, Masahiko Sawada  wrote:
> 
>  * We are willing to use at most maintenance_work_mem (or perhaps
>  * autovacuum_work_mem) memory space to keep track of dead tuples.  We
>  * initially allocate an array of TIDs of that size, with an upper limit that
>  * depends on table size (this limit ensures we don't allocate a huge area
>  * uselessly for vacuuming small tables).  If the array threatens to overflow,
>
> I think that we need to update the above paragraph comment at the top of
> the vacuumlazy.c file.

Indeed, I missed that one. Fixing.

>
> 
> +   numtuples = Max(numtuples, MaxHeapTuplesPerPage);
> +   numtuples = Min(numtuples, INT_MAX / 2);
> +   numtuples = Min(numtuples, 2 * pseg->max_dead_tuples);
> +   numtuples = Min(numtuples, MaxAllocSize / sizeof(ItemPointerData));
> +   seg->dt_tids = (ItemPointer) palloc(sizeof(ItemPointerData) * numtuples);
>
> Why numtuples is limited to "INT_MAX / 2" but not INT_MAX?

I forgot to mention this one in the OP.

Googling around, I found that some implementations of bsearch break with
array sizes beyond INT_MAX/2 [1] (they'd overflow when computing the
midpoint).

Before this patch, this bsearch call had no way of reaching that size.
An initial version of the patch (the one that allocated a big array
with huge allocation) could reach that point, though, so I reduced the
limit to play it safe. This latest version is back to the starting
point, since it cannot allocate segments bigger than 1GB, but I opted
to keep playing it safe and leave the reduced limit just in case.
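
The failure mode from [1], sketched standalone (illustrative;
PostgreSQL calls the platform's bsearch(), whose internals vary):

#include <limits.h>

/* A midpoint computed this way overflows once lo + hi exceeds INT_MAX,
 * which becomes reachable when the array has more than INT_MAX/2
 * elements: */
static int
midpoint_naive(int lo, int hi)
{
    return (lo + hi) / 2;           /* undefined behaviour on overflow */
}

/* The standard fix, safe for any 0 <= lo <= hi <= INT_MAX: */
static int
midpoint_safe(int lo, int hi)
{
    return lo + (hi - lo) / 2;
}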

> 
> @@ -1376,35 +1411,43 @@ lazy_vacuum_heap(Relation onerel, LVRelStats
> *vacrelstats)
> pg_rusage_init();
> npages = 0;
>
> -   tupindex = 0;
> -   while (tupindex < vacrelstats->num_dead_tuples)
> +   segindex = 0;
> +   tottuples = 0;
> +   for (segindex = tupindex = 0; segindex <= vacrelstats->dead_tuples.last_seg; tupindex = 0, segindex++)
> {
> -   BlockNumber tblk;
> -   Buffer  buf;
> -   Page    page;
> -   Size    freespace;
>
> This is a minute thing but tupindex can be define inside of for loop.

Right, changing.

>
> 
> @@ -1129,10 +1159,13 @@ lazy_scan_heap(Relation onerel, int options,
> LVRelStats *vacrelstats,
>   * instead of doing a second scan.
>   */
>  if (nindexes == 0 &&
> -vacrelstats->num_dead_tuples > 0)
> +vacrelstats->dead_tuples.num_entries > 0)
>  {
>  /* Remove tuples from heap */
> -lazy_vacuum_page(onerel, blkno, buf, 0, vacrelstats, &vmbuffer);
> +Assert(vacrelstats->dead_tuples.last_seg == 0);  /* Should not need more
> +  * than one segment per
> +  * page */
>
> I'm not sure we need to add Assert() here but it seems to me that the
> comment and code is not properly correspond and the comment for
> Assert() should be wrote above of Assert() line.

Well, that assert is the one that found the second bug in
lazy_clear_dead_tuples, so clearly it's not without merit.

I'll rearrange the comments as you ask though.


Updated and rebased v7 attached.


[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=776671
From d32610b0ad6b9413aa4b2d808019d3c67d578f3c Mon Sep 17 00:00:00 2001
From: Claudio Freire 
Date: Mon, 12 Sep 2016 23:36:42 -0300
Subject: [PATCH] Vacuum: allow using more than 1GB work mem

Turn the dead_tuples array into a structure composed of several
exponentially bigger arrays, to enable usage of more than 1GB
of work mem during vacuum and thus reduce the number of full
index scans necessary to remove all dead tids when the memory is
available.
---
 src/backend/commands/vacuumlazy.c| 314 ---
 src/test/regress/expected/vacuum.out |   8 +
 src/test/regress/sql/vacuum.sql  |  10 ++
 3 files changed, 268 insertions(+), 64 deletions(-)

diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 005440e..8cb614f 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -12,11 +12,24 @@
  *
  * We are willing to use at most maintenance_work_mem (or perhaps
  * autovacuum_work_mem) memory space to keep track of dead tuples.  We
- * initially allocate an array of TIDs of that size, with an upper limit that
+ * initially allocate an array of TIDs of 128MB, or an upper limit that
  * depends on table size (this limit ensures we don't allocate a huge area
- * uselessly for vacuuming small tables).  If the array threatens to overflow,
- * we suspend the heap scan phase and perform a pass of index 

[HACKERS] Fix handling of ALTER DEFAULT PRIVILEGES

2017-01-30 Thread Stephen Frost
Greetings,

While working through some pg_dump regression tests, I came across a bug
in 9.6+ due to the changes I made to how GRANT/REVOKEs were handled.

The attached describes and fixes the bug, which only impacts dumping
from 9.6+ clusters, so hopefully not too many people have been impacted
by it (it's also a bit of an odd case anyway, as noted).

I'm still reviewing and playing with it, but anticipate pushing it
sometime tomorrow, along with a bunch of additional pg_dump regression
tests, built on the prior framework (as an independent patch, and just
for PG10, whereas the attached will be back-patched to 9.6).

Thoughts and comments welcome, of course, but it's a relatively
mechanical set of changes to match how other the ACLs for everything
else are handled (I checked all other callers of buildACLCommands()).

Thanks!

Stephen
From 77956955561711cf8e96b23cdbbee400586f5440 Mon Sep 17 00:00:00 2001
From: Stephen Frost 
Date: Mon, 30 Jan 2017 13:22:17 -0500
Subject: [PATCH] pg_dump: Fix handling of ALTER DEFAULT PRIVILEGES

In commit 23f34fa, we changed how ACLs were handled to use the new
pg_init_privs catalog and to dump out the ACL commands as REVOKE+GRANT
combinations instead of trying to REVOKE all rights always and then
GRANT back just the ones which were in place.

Unfortunately, the DEFAULT PRIVILEGES system didn't quite get the
correct treatment with this change and ended up (incorrectly) only
including positive GRANTs instead of both the REVOKEs and GRANTs
necessary to preserve the correct privileges.

There are only a couple cases where such REVOKEs are possible because,
generally speaking, there's few rights which exist on objects by
default to be revoked.

Examples of REVOKEs which weren't being correctly preserved are when
privileges are REVOKE'd from the creator/owner, like so:

ALTER DEFAULT PRIVILEGES
  FOR ROLE myrole
  REVOKE SELECT ON TABLES FROM myrole;

or when other default privileges are being revoked, such as EXECUTE
rights granted to public for functions:

ALTER DEFAULT PRIVILEGES
  FOR ROLE myrole
  REVOKE EXECUTE ON FUNCTIONS FROM PUBLIC;

Fix this by correctly working out what the correct REVOKE statements are
(if any) and dump them out, just as we do for everything else.

Noticed while developing additional regression tests for pg_dump, which
will be landing shortly.

Back-patch to 9.6 where the bug was introduced.
---
 src/bin/pg_dump/dumputils.c | 23 -
 src/bin/pg_dump/dumputils.h |  4 ++-
 src/bin/pg_dump/pg_dump.c   | 63 -
 src/bin/pg_dump/pg_dump.h   |  3 +++
 4 files changed, 78 insertions(+), 15 deletions(-)

diff --git a/src/bin/pg_dump/dumputils.c b/src/bin/pg_dump/dumputils.c
index 81ec650..b41f2b9 100644
--- a/src/bin/pg_dump/dumputils.c
+++ b/src/bin/pg_dump/dumputils.c
@@ -364,11 +364,12 @@ buildACLCommands(const char *name, const char *subname,
  */
 bool
 buildDefaultACLCommands(const char *type, const char *nspname,
-		const char *acls, const char *owner,
+		const char *acls, const char *racls,
+		const char *initacls, const char *initracls,
+		const char *owner,
 		int remoteVersion,
 		PQExpBuffer sql)
 {
-	bool		result;
 	PQExpBuffer prefix;
 
 	prefix = createPQExpBuffer();
@@ -384,14 +385,22 @@ buildDefaultACLCommands(const char *type, const char *nspname,
 	if (nspname)
 		appendPQExpBuffer(prefix, "IN SCHEMA %s ", fmtId(nspname));
 
-	result = buildACLCommands("", NULL,
-			  type, acls, "", owner,
-			  prefix->data, remoteVersion,
-			  sql);
+	if (strlen(initacls) != 0 || strlen(initracls) != 0)
+	{
+		appendPQExpBuffer(sql, "SELECT pg_catalog.binary_upgrade_set_record_init_privs(true);\n");
+		if (!buildACLCommands("", NULL, type, initacls, initracls, owner,
+			  prefix->data, remoteVersion, sql))
+			return false;
+		appendPQExpBuffer(sql, "SELECT pg_catalog.binary_upgrade_set_record_init_privs(false);\n");
+	}
+
+	if (!buildACLCommands("", NULL, type, acls, racls, owner,
+		  prefix->data, remoteVersion, sql))
+		return false;
 
 	destroyPQExpBuffer(prefix);
 
-	return result;
+	return true;
 }
 
 /*
diff --git a/src/bin/pg_dump/dumputils.h b/src/bin/pg_dump/dumputils.h
index ba61fa9..d7eecdf 100644
--- a/src/bin/pg_dump/dumputils.h
+++ b/src/bin/pg_dump/dumputils.h
@@ -41,7 +41,9 @@ extern bool buildACLCommands(const char *name, const char *subname,
  const char *owner, const char *prefix, int remoteVersion,
  PQExpBuffer sql);
 extern bool buildDefaultACLCommands(const char *type, const char *nspname,
-		const char *acls, const char *owner,
+		const char *acls, const char *racls,
+		const char *initacls, const char *initracls,
+		const char *owner,
 		int remoteVersion,
 		PQExpBuffer sql);
 extern void buildShSecLabelQuery(PGconn *conn, const char *catalog_name,
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index a6de9d7..35ac05e 100644
--- 

Re: [HACKERS] Allow interrupts on waiting standby

2017-01-30 Thread Michael Paquier
On Fri, Jan 27, 2017 at 10:17 PM, Michael Paquier
 wrote:
> Two things I forgot in this patch:
> - documentation for the new wait event
> - the string for the wait event or this would show up as "???" in
> pg_stat_activity.
> There are no default clauses in the pgstat_get_wait_* routines so my
> compiler is actually complaining...

Both things are fixed in the new version attached. I have also added
this patch to the next commit fest:
https://commitfest.postgresql.org/13/977/
-- 
Michael


Attachment: standby-delay-latch-v2.patch



Re: [HACKERS] Improvements in psql hooks for variables

2017-01-30 Thread Tom Lane
"Daniel Verite"  writes:
>   Tom Lane wrote:
>> Evidently, this test script is relying on the preceding behavior that
>> setting a bool variable to an empty string was equivalent to setting
>> it to "true".  If it's just that script I would be okay with saying
>> "well, it's a bug in that script" ... but I'm a bit worried that this
>> may be the tip of the iceberg, ie maybe a lot of people have done
>> things like this.  Should we reconsider the decision to reject empty
>> strings in ParseVariableBool?
> ...
> So it does cause regressions. I don't know if we should reaccept
> empty strings immediately to fix that, but in the long run, I think
> that the above situation with the empty :AUTOCOMMIT is not really
> sustainable: when we extend what we do with variables
> (\if \endif and so on), it will become even more of a problem.

Yeah, in a green field we'd certainly not allow this, but the problem
I'm having is that in all previous versions you could do, eg,

\set ON_ERROR_STOP
... stuff expecting ON_ERROR_STOP to be on
\unset ON_ERROR_STOP
... stuff expecting ON_ERROR_STOP to be off

and it works, and looks perfectly natural, and people may well be relying
on that.  Especially since the docs aren't very clear that this shouldn't
work --- psql-ref.sgml repeatedly uses phrases like "FOO is set" to
indicate that the boolean variable FOO is considered to be "on".

Moreover, the committed patch is inconsistent in that it forbids
only one of the above.  Why is it okay to treat unset as "off",
but not okay to treat the default empty-string value as "on"?

Maybe it's worth breaking backwards compatibility on this point, but
I'm feeling unconvinced.  This seems rather different from rejecting
clearly-wrongly-spelled values.

One possible compromise that would address your concern about display
is to modify the hook API some more so that variable hooks could actually
substitute new values.  Then for example the bool-variable hooks could
effectively replace "\set AUTOCOMMIT" by "\set AUTOCOMMIT on" and
"\unset AUTOCOMMIT" by "\set AUTOCOMMIT off", ensuring that interrogation
of their values always produces sane results.
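
A sketch of what such a substituting hook could look like (hypothetical
signature -- the hook API at this point does not return a replacement
value; pg_strdup() is psql's existing strdup wrapper):

static char *
autocommit_substitute_hook(char *newval)
{
    if (newval == NULL)
        return pg_strdup("off");    /* \unset AUTOCOMMIT  ->  off */
    if (newval[0] == '\0')
        return pg_strdup("on");     /* \set AUTOCOMMIT    ->  on  */
    return newval;
}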

regards, tom lane




Re: [HACKERS] Improvements in psql hooks for variables

2017-01-30 Thread Daniel Verite
Tom Lane wrote:

> Evidently, this test script is relying on the preceding behavior that
> setting a bool variable to an empty string was equivalent to setting
> it to "true".  If it's just that script I would be okay with saying
> "well, it's a bug in that script" ... but I'm a bit worried that this
> may be the tip of the iceberg, ie maybe a lot of people have done
> things like this.  Should we reconsider the decision to reject empty
> strings in ParseVariableBool?

Sigh. It was considered upthread, I'm requoting the relevant bit:


 if (pg_strncasecmp(value, "true", len) == 0)
  return true;
  It happens that "" as a value yields the same result as "true" for this
  test so it passes, but it seems rather unintentional.

  The resulting problem from the POV of the user is that we can do that,
  for instance:

test=> \set AUTOCOMMIT

  without error message or feedback, and that leaves us without much
  clue about autocommit being enabled:

test=> \echo :AUTOCOMMIT

test=> 

  So I've changed ParseVariableBool() in v4 to reject empty string as an
  invalid boolean (but not NULL since the startup logic requires NULL
  meaning false in the early initialization of these variables).

  "make check" seems OK with that, I hope it doesn't cause any regression
  elsewhere.


So it does cause regressions. I don't know if we should reaccept
empty strings immediately to fix that, but in the long run, I think
that the above situation with the empty :AUTOCOMMIT is not really
sustainable: when we extend what we do with variables
(\if \endif and so on), it will become even more of a problem.
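
For the record, the mechanics behind the requoted bit above: with
len == 0 the prefix comparison matches trivially, so "" parsed as
"true". A standalone illustration (strncasecmp standing in for
pg_strncasecmp):

#include <stdio.h>
#include <string.h>
#include <strings.h>

int
main(void)
{
    const char *value = "";
    size_t      len = strlen(value);    /* 0 */

    /* comparing zero bytes always "matches" */
    printf("%d\n", strncasecmp(value, "true", len) == 0);  /* prints 1 */
    return 0;
}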


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: [HACKERS] Improvements in psql hooks for variables

2017-01-30 Thread Daniel Verite
Tom Lane wrote:

> Also, if you want to argue that allowing it to change intra-
> transaction is too confusing, why would we only forbid this direction
> of change and not both directions?

The thread "Surprising behaviour of \set AUTOCOMMIT ON" at:
https://www.postgresql.org/message-id/CAH2L28sTP-9dio3X1AaZRyWb0-ANAx6BDBi37TGmvw1hBiu0oA%40mail.gmail.com
seemed to converge towards the conclusion implemented in the autocommit_hook
proposed in the patch.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: [HACKERS] Performance improvement for joins where outer side is unique

2017-01-30 Thread David Rowley
On 31 January 2017 at 13:10, David Rowley  wrote:
> I've attached a patch which implements this.

Please disregard the previous patch. (I forgot to git commit before
git diff when making the patch.)

I've attached the correct patch.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Attachment: unique_joins_2017-01-31a.patch



Re: [HACKERS] Improvements in psql hooks for variables

2017-01-30 Thread Tom Lane
So I pushed this, and the buildfarm members that are testing RedisFDW
immediately fell over:

*** /home/andrew/bf/root/HEAD/redis_fdw.build/test/expected/redis_fdw.out   
2017-01-30 18:20:27.440677318 -0500
--- /home/andrew/bf/root/HEAD/redis_fdw.build/test/results/redis_fdw.out
2017-01-30 18:32:33.404677320 -0500
***
*** 26,31 
--- 26,32 
 options (database '15', tabletype 'zset');
  -- make sure they are all empty - if any are not stop the script right now
  \set ON_ERROR_STOP
+ unrecognized value "" for "ON_ERROR_STOP": boolean expected
  do $$
declare
  rows bigint;

==

Evidently, this test script is relying on the preceding behavior that
setting a bool variable to an empty string was equivalent to setting
it to "true".  If it's just that script I would be okay with saying
"well, it's a bug in that script" ... but I'm a bit worried that this
may be the tip of the iceberg, ie maybe a lot of people have done
things like this.  Should we reconsider the decision to reject empty
strings in ParseVariableBool?

regards, tom lane




Re: [HACKERS] Performance improvement for joins where outer side is unique

2017-01-30 Thread David Rowley
On 28 January 2017 at 05:44, Tom Lane  wrote:
> I wrote:
>> David Rowley  writes:
>>> hmm. I'm having trouble understanding why this is a problem for Unique
>>> joins, but not for join removal?
>
>> Ah, you know what, that's just mistaken.  I was thinking that we
>> short-circuited the join on the strength of the hash (or merge) quals
>> only, but actually we check all the joinquals first.  As long as the
>> uniqueness proof uses only joinquals and not conditions that will end up
>> as otherquals, it's fine.
>
> Actually, after thinking about that some more, it seems to me that there
> is a performance (not correctness) issue here: suppose that we have
> something like
>
> select ... from t1 left join t2 on t1.x = t2.x and t1.y < t2.y
>
> If there's a unique index on t2.x, we'll be able to mark the join
> inner-unique.  However, short-circuiting would only occur after
> finding a row that passes both joinquals.  If the y condition is
> true for only a few rows, this would pretty nearly disable the
> optimization.  Ideally we would short-circuit after testing the x
> condition only, but there's no provision for that.

I've attached a patch which implements this, though only for
MergeJoin; otherwise I'd imagine we'd also need to ensure that all
proofs used for testing the uniqueness were hash-able too. I added some
XXX comments in analyzejoins.c around the mergeopfamilies == NIL tests to
mention that Merge Join depends on all the unique proof quals having
mergeopfamilies.  This also assumes we'll never use some subset of
mergejoin-able quals for a merge join, which could be an interesting
area in the future, as we might have some btree index on a subset of
those columns to provide pre-sorted input. In short, it's a great
optimisation, but I'm a little scared we might break it one day.

Implementing this meant removing the match_first_row_only being set
for JOIN_SEMI, as this optimisation only applies to unique_inner and
not JOIN_SEMI alone. This also means we should be checking for unique
properties on JOIN_SEMI now too, which I've enabled.

Also, I've removed all jointype checks from innerrel_is_unique(). I
took a bit of time to think about JOIN_UNIQUE_OUTER and
JOIN_UNIQUE_INNER. I think these are fine too, in fact one of the
regression test plans moved away from using a Semi Join due to proving
that the inner side was unique. That could perhaps allow a better
plan, since the join order can be swapped. I'd really like someone
else to have a think about that too, just to make sure I've not
blundered that.

I've put the join removal code back to the way it was before. As I
mentioned, we can't use the caches here since we're also using
additional quals for proofs in this case.

I wasn't sure if I should add some regression tests which exercises
MergeJoin a bit to test the new optimisation. Any thoughts on that?

David

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Attachment: unique_joins_2017-01-31.patch



Re: [HACKERS] Query fails when SRFs are part of FROM clause (Commit id: 69f4b9c85f)

2017-01-30 Thread Tom Lane
Andres Freund  writes:
> Wonder if we there's an argument to be made for implementing this
> roughly similarly to split_pathtarget_at_srf - instead of injecting a
> ProjectSet node we'd add a FunctionScan node below a Result node.

Yeah, possibly.  That would have the advantage of avoiding an ExecProject
step when the SRFs aren't buried, which would certainly be the expected
case.

If you don't want to make ExecInitExpr responsible, then the planner would
have to do something like split_pathtarget_at_srf anyway to decompose the
expressions, no matter which executor representation we use.

regards, tom lane




Re: [HACKERS] Subtle bug in "Simplify tape block format" commit

2017-01-30 Thread Peter Geoghegan
On Mon, Jan 30, 2017 at 10:01 AM, Peter Geoghegan  wrote:
> Let me take another look at this later today before proceeding. I want
> to run it against a custom test suite I've developed.

I've done so. Some more thoughts:

* I don't think that this is really any less efficient than my patch.
It's just that the write of a block happens within ltsWriteBlock()
automatically, rather than being an explicit thing that comes at the
tail-end of a run. This is more general, but otherwise very similar.
The downside is that it will keep working even when something happens
that we might rather treat as breakage, but I do concede that that's
worth it.

* If you're writing out any old bit pattern, I suggest using 0x7F
bytes rather than nul bytes (I do agree that it does seem worth making
this some particular bit pattern, so as to not have to bother with
Valgrind suppressions and so on). That pattern immediately gives you a
hint on where to look for more information when there is a problem. To
me, it suggests "this is something weird". We don't want this to
happen very often.

* I think that the name lts->nBlocksAllocated works better than
lts->nBlocksReserved. After all, you are still writing stuff out in a
way that respects BufFile limitations around holes, and still
reporting that to the user via trace_sort as the number of blocks
actually written.

* I think that you should put the new code into a new function, called
ltsAllocBlocksUntil(), or similar -- this can do the BufFileWrite()
stuff itself, with a suitable custom defensive elog(ERROR) message.
That way, the only new branch needed in ltsWriteBlock() is "if
(blocknum > lts->nBlocksWritten)" (maybe use unlikely() too?), and you
can make it clear that ltsAllocBlocksUntil() is a rarely needed thing,
which seems appropriate.

We really don't want ltsAllocBlocksUntil() logic to be called very
often, and certainly not to write more than 1 or 2 blocks at a time
(no more than 1 with current usage). After all, that would mean
writing to the same position twice or more, for no reason at all.
Maybe note in comments that it's only called at the end of a run in
practice, or something to that effect. Keeping writes sequential is
very important, to keep logtape block reclamation effective.
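
Putting those pieces together, a sketch of what ltsAllocBlocksUntil()
could look like, under the names used in this message (illustrative
only, not committed code):

static void
ltsAllocBlocksUntil(LogicalTapeSet *lts, long blocknum)
{
    char        filler[BLCKSZ];

    /* 0x7F filler: recognizably "something weird" if ever read back */
    memset(filler, 0x7F, sizeof(filler));
    while (lts->nBlocksWritten < blocknum)
    {
        if (BufFileWrite(lts->pfile, filler, sizeof(filler)) != sizeof(filler))
            elog(ERROR, "could not extend temporary file for logical tape set");
        lts->nBlocksWritten++;
    }
}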

I was actually planning on introducing a distinction between blocks
allocated and blocks written anyway, for the benefit of logtape.c
parallel sort, where holes can be left behind post-unification (the
start of each worker's space needs to be aligned to
MAX_PHYSICAL_FILESIZE, so multi-block ranges of holes will be common).
So, your approach makes sense to me, and I now slightly prefer your
patch to my original.

Thanks
-- 
Peter Geoghegan




Re: [HACKERS] Query fails when SRFs are part of FROM clause (Commit id: 69f4b9c85f)

2017-01-30 Thread Andres Freund
On 2017-01-30 17:24:31 -0500, Tom Lane wrote:
> Make it work like Agg and WindowFunc.  To wit, dump the actually special
> function calls (the set-returning functions) into a list that's internal
> to the FunctionScan node, and then anything above those goes into scalar
> expressions in the node's tlist, which refer to the SRF outputs using
> Vars or things morally equivalent to Vars.

Hm. That should be fairly doable.  (I'd advocate very strongly against
building that list via ExecInitExpr, but that's an implementation
detail).  We'd evaluate SRFs early, but that's just consistent with
targetlist SRFs.

Wonder if we there's an argument to be made for implementing this
roughly similarly to split_pathtarget_at_srf - instead of injecting a
ProjectSet node we'd add a FunctionScan node below a Result node.

- Andres




Re: [HACKERS] patch proposal

2017-01-30 Thread Michael Paquier
On Tue, Jan 31, 2017 at 4:49 AM, David Steele  wrote:
> On 1/27/17 3:19 AM, Venkata B Nagothi wrote:
>> I have split the patch into two different
>> pieces. One is to determine if the recovery can start at all and other
>> patch deals with the incomplete recovery situation.
>
> I think the first patch looks promising and I would rather work through
> that before considering the second patch.  Once there are tests for the
> first patch I will complete my review.

Based on that, I am moving the patch to next CF with "Needs Review".
Venkata, please be careful in updating correctly the patch status, it
was still on "Waiting on Author".
-- 
Michael




[HACKERS] [PATCH] kNN for SP-GiST

2017-01-30 Thread Nikita Glukhov

Hello hackers,

I'd like to present a series of patches which is a continuation of
Vlad Sterzhanov's work on kNN for SP-GiST that was done for GSOC'14.

Original thread: "KNN searches support for SP-GiST [GSOC'14]"
https://www.postgresql.org/message-id/cak2ajc0-jhrb3ujozl+arbchotvwbejvprn-jofenn0va6+...@mail.gmail.com


I've done the following:
  * rebased onto HEAD
  * fixed several bugs
  * fixed indentation and unnecessary whitespace changes
  * implemented logic for order-by in spgvalidate() and spgproperty()
  * used pairing_heap instead of RBTree for the priority queue
(as it was done earlier in GiST)
  * used existing traversalValue* fields instead of the newly added suppValue* 
fields
  * added caching of support functions infos
  * added recheck support for distances
  * refactored code
 - simplified some places
 - some functions were moved from spgproc.c to spgscan.c
   (now spgproc.c contains only one public function spg_point_distance()
   which could be moved elsewhere so that this file can then be removed)
 - extracted functions spgTestLeafTuple(), spgMakeInnerItem()
  * added new opclasses for circles and polygons with order-by-distance support
   (it was originally intended for kNN recheck testing)
  * improved tests for point_ops


Below is a very brief description of the patches:
1. Extracted two subroutines from GiST code (patch is the same as in "kNN for 
btree").
2. Exported two box functions that are used in patch 5.
3. Extracted subroutine from GiST code that is used in patch 4.
4. Main patch: added kNN support to SP-GiST.
5. Added ordering operators to kd_point_ops and quad_point_ops.
6. Added new SP-GiST support function spg_compress (gist_compress analogue)
for compressed (lossy) representation of types in SP-GiST, used in patch 7.
7. Added new SP-GiST quad-tree opclasses for circles and polygons
(reused existing quad-tree box_ops).

If you want to see the step-by-step sequence of changes applied to the original 
patch,
you can see it on my github: 
https://github.com/glukhovn/postgres/commits/knn_spgist

Please note that the tests for circle_ops and poly_ops will not pass until
a bug found in SP-GiST box_ops is fixed
(see 
https://www.postgresql.org/message-id/9ea5b157-478c-8874-bc9b-050076b7d042%40postgrespro.ru).

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


diff --git a/src/backend/access/gist/gistutil.c b/src/backend/access/gist/gistutil.c
index f92baed..1f6b671 100644
--- a/src/backend/access/gist/gistutil.c
+++ b/src/backend/access/gist/gistutil.c
@@ -23,6 +23,7 @@
 #include "storage/lmgr.h"
 #include "utils/builtins.h"
 #include "utils/syscache.h"
+#include "utils/lsyscache.h"
 
 
 /*
@@ -851,12 +852,6 @@ gistproperty(Oid index_oid, int attno,
 			 IndexAMProperty prop, const char *propname,
 			 bool *res, bool *isnull)
 {
-	HeapTuple	tuple;
-	Form_pg_index rd_index PG_USED_FOR_ASSERTS_ONLY;
-	Form_pg_opclass rd_opclass;
-	Datum		datum;
-	bool		disnull;
-	oidvector  *indclass;
 	Oid			opclass,
 opfamily,
 opcintype;
@@ -890,49 +885,28 @@ gistproperty(Oid index_oid, int attno,
 	}
 
 	/* First we need to know the column's opclass. */
-
-	tuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(index_oid));
-	if (!HeapTupleIsValid(tuple))
+	opclass = get_index_column_opclass(index_oid, attno);
+	if (!OidIsValid(opclass))
 	{
 		*isnull = true;
 		return true;
 	}
-	rd_index = (Form_pg_index) GETSTRUCT(tuple);
-
-	/* caller is supposed to guarantee this */
-	Assert(attno > 0 && attno <= rd_index->indnatts);
-
-	datum = SysCacheGetAttr(INDEXRELID, tuple,
-			Anum_pg_index_indclass, );
-	Assert(!disnull);
-
-	indclass = ((oidvector *) DatumGetPointer(datum));
-	opclass = indclass->values[attno - 1];
-
-	ReleaseSysCache(tuple);
 
 	/* Now look up the opclass family and input datatype. */
-
-	tuple = SearchSysCache1(CLAOID, ObjectIdGetDatum(opclass));
-	if (!HeapTupleIsValid(tuple))
+	if (!get_opclass_opfamily_and_input_type(opclass, , ))
 	{
 		*isnull = true;
 		return true;
 	}
-	rd_opclass = (Form_pg_opclass) GETSTRUCT(tuple);
-
-	opfamily = rd_opclass->opcfamily;
-	opcintype = rd_opclass->opcintype;
-
-	ReleaseSysCache(tuple);
 
 	/* And now we can check whether the function is provided. */
-
 	*res = SearchSysCacheExists4(AMPROCNUM,
  ObjectIdGetDatum(opfamily),
  ObjectIdGetDatum(opcintype),
  ObjectIdGetDatum(opcintype),
  Int16GetDatum(procno));
	*isnull = false;
+
 	return true;
 }
 
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index aff88a5..26c81c9 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -1050,6 +1050,32 @@ get_opclass_input_type(Oid opclass)
 	return result;
 }
 
+/*
+ * get_opclass_opfamily_and_input_type
+ *
+ *		Returns the OID of the operator family the opclass belongs to,
+ *		and the OID of the datatype the opclass indexes
+ */
+bool

Re: [HACKERS] pg_hba_file_settings view patch

2017-01-30 Thread Tom Lane
Haribabu Kommi  writes:
> On Mon, Jan 30, 2017 at 5:18 PM, Michael Paquier 
> wrote:
>> #define USER_AUTH_LAST uaPeer
>> StaticAssertStmt(lengthof(UserAuthName) == USER_AUTH_LAST + 1,
>> "UserAuthName must include all user authentication names");

> Thanks for the review. Added the static assert statement.

This isn't exactly bulletproof, since somebody could add another enum
value and forget to update the macro.  Still, it's better than nothing.
I tried to make it a shade more idiot-proof by putting the #define
physically inside the enum list --- you'd have to really have blinders
on to not notice it there.  (Not that people haven't made equally silly
mistakes :-(.)
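
For reference, the same pattern on a toy enum, using PostgreSQL's
existing StaticAssertStmt/lengthof macros (all other names here are
illustrative):

typedef enum ToyAuth
{
    taReject,
    taTrust,
    taPeer
    /* KEEP THIS IN SYNC: must name the last value in this enum */
#define TOY_AUTH_LAST taPeer
} ToyAuth;

static const char *const ToyAuthName[] =
{
    "reject",
    "trust",
    "peer"
};

static const char *
toy_auth_name(ToyAuth a)
{
    /* fails to compile if a value is added without updating the array */
    StaticAssertStmt(lengthof(ToyAuthName) == TOY_AUTH_LAST + 1,
                     "ToyAuthName must include all authentication names");
    return ToyAuthName[a];
}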

Pushed with that adjustment.  Thanks for working on this!

regards, tom lane




Re: [HACKERS] Query fails when SRFs are part of FROM clause (Commit id: 69f4b9c85f)

2017-01-30 Thread Tom Lane
Andres Freund  writes:
> On 2017-01-30 16:55:56 -0500, Tom Lane wrote:
>> No, but it allows whatever looks syntactically like a function, including
>> casts.  IIRC, we made func_expr work that way ages ago to deflect
>> complaints that it wasn't very clear why some things-that-look-like-
>> functions were allowed in CREATE INDEX and others not.

> But given e.g. the above example that's just about no limitation at all,
> because you can nest nearly arbitrarily complex things within the
> expression.

Yeah, exactly.  But that's true anyway because even if it was
syntactically a plain function, it might've been a SQL function that the
planner chooses to inline.

>> But are we prepared to break working queries?

> Within some limits, we imo should be.

In this case I think the argument for rejecting is pretty darn weak;
it's an arbitrary implementation restriction, not anything with much
principle to it.

>> We could probably fix this with the modification that was discussed
>> previously, to allow FunctionScan nodes to project a scalar tlist
>> from the outputs of their SRFs.

> Hm. I'm not quite following. Could you expand?

Make it work like Agg and WindowFunc.  To wit, dump the actually special
function calls (the set-returning functions) into a list that's internal
to the FunctionScan node, and then anything above those goes into scalar
expressions in the node's tlist, which refer to the SRF outputs using
Vars or things morally equivalent to Vars.

This would not support nested SRFs in FROM, but that case has always
failed and I've heard no field requests to make it work, so I don't feel
bad about keeping that restriction.

regards, tom lane




Re: [HACKERS] Query fails when SRFs are part of FROM clause (Commit id: 69f4b9c85f)

2017-01-30 Thread Andres Freund
Hi,

On 2017-01-30 16:55:56 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2017-01-27 17:58:04 +0530, Rushabh Lathia wrote:
> >> SELECT  *
> >> FROM pg_constraint pc,
> >> CAST(CASE WHEN pc.contype IN ('f','u','p') THEN generate_series(1,
> >> array_upper(pc.conkey, 1)) ELSE NULL END AS int) AS position;
> >> 
> >> Above query is failing with "set-valued function called in context that
> >> cannot accept a set".
> 
> > I think that's correct. Functions in FROM are essentially a shorthand
> > for ROWS FROM(). And ROWS FROM doesn't allow arbitrary expressions.
> 
> No, but it allows whatever looks syntactically like a function, including
> casts.  IIRC, we made func_expr work that way ages ago to deflect
> complaints that it wasn't very clear why some things-that-look-like-
> functions were allowed in CREATE INDEX and others not.

But given e.g. the above example that's just about no limitation at all,
because you can nest nearly arbitrarily complex things within the
expression.


> > If, I didn't check, that worked previously, I think that was more
> > accident than intent.
> 
> Yeah, probably.

Really looks that way. I think it only works that way because we hit the
recovery branch for:
/*
 * Normally the passed expression tree will be a FuncExprState, since the
 * grammar only allows a function call at the top level of a table
 * function reference.  However, if the function doesn't return set then
 * the planner might have replaced the function call via constant-folding
 * or inlining.  So if we see any other kind of expression node, execute
 * it via the general ExecEvalExpr() code; the only difference is that we
 * don't get a chance to pass a special ReturnSetInfo to any functions
 * buried in the expression.
 */
which does a normal ExecEvalExpr() whenever the expression to be
evaluated isn't a FuncExpr.
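
Concretely, the guard above that comment is approximately this (paraphrased
from memory of the pre-v10 code, so treat the details loosely):

	if (funcexpr && IsA(funcexpr, FuncExprState) &&
		IsA(funcexpr->expr, FuncExpr))
	{
		/* direct function call: a ReturnSetInfo can be passed down */
	}
	else
	{
		/* any other node, e.g. the CAST/CASE above: plain ExecEvalExpr(),
		 * with no ReturnSetInfo for anything buried in the expression */
	}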

At the very least I think we need to amend that paragraph to explain
that there are a bunch of other cases in which it can be hit. And add
tests for it.


> But are we prepared to break working queries?

Within some limits, we imo should be.


> As I understood it, the agreement on this whole tlist-SRF change
> was that we would not change any behavior that wasn't ill-defined.

I'd argue that behaviour that only worked through some edge case is
kinda ill defined ;) (and no, I'm not that serious)


> We could probably fix this with the modification that was discussed
> previously, to allow FunctionScan nodes to project a scalar tlist
> from the outputs of their SRFs.

Hm. I'm not quite following. Could you expand?


Greetings,

Andres Freund




Re: [HACKERS] multivariate statistics (v19)

2017-01-30 Thread Tomas Vondra

On 01/30/2017 09:37 PM, Alvaro Herrera wrote:

Tomas Vondra wrote:


The 'built' flags may be easily replaced with a check if the bytea-like
columns are NULL, and the 'enabled' columns may be replaced by the array of
char, just like you proposed.

That'd give us a single catalog looking like this:

pg_mv_statistics
  starelid
  staname
  stanamespace
  staowner  -- all the above as currently
  staenabled    array of "char" {d,f,s}
  stakeys
  stadeps  (dependencies)
  standist (ndistinct coefficients)
  stamcv   (MCV list)
  stahist  (histogram)

Which is probably a better / simpler structure than the current one.


Looks good to me.  I don't think we need to keep the names very short --
I would propose "standistinct", "stahistogram", "stadependencies".



Yeah, I got annoyed by the short names too.

This however reminds me that perhaps pg_mv_statistic is not the best 
name. I know others proposed pg_statistic_ext (and pg_stats_ext), and 
while I wasn't a big fan initially, I think it's a better name. People 
generally don't know what 'multivariate' means, while 'extended' is 
better known (e.g. because Oracle uses it for similar stuff).


So I think I'll switch to that name too.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Query fails when SRFs are part of FROM clause (Commit id: 69f4b9c85f)

2017-01-30 Thread Tom Lane
Andres Freund  writes:
> On 2017-01-27 17:58:04 +0530, Rushabh Lathia wrote:
>> SELECT  *
>> FROM pg_constraint pc,
>> CAST(CASE WHEN pc.contype IN ('f','u','p') THEN generate_series(1,
>> array_upper(pc.conkey, 1)) ELSE NULL END AS int) AS position;
>> 
>> Above query is failing with "set-valued function called in context that
>> cannot accept a set".

> I think that's correct. Functions in FROM are essentially a shorthand
> for ROWS FROM(). And ROWS FROM doesn't allow arbitrary expressions.

No, but it allows whatever looks syntactically like a function, including
casts.  IIRC, we made func_expr work that way ages ago to deflect
complaints that it wasn't very clear why some things-that-look-like-
functions were allowed in CREATE INDEX and others not.

> If, I didn't check, that worked previously, I think that was more
> accident than intent.

Yeah, probably.  But are we prepared to break working queries?
As I understood it, the agreement on this whole tlist-SRF change
was that we would not change any behavior that wasn't ill-defined.

We could probably fix this with the modification that was discussed
previously, to allow FunctionScan nodes to project a scalar tlist
from the outputs of their SRFs.

regards, tom lane




Re: [HACKERS] Query fails when SRFs are part of FROM clause (Commit id: 69f4b9c85f)

2017-01-30 Thread Andres Freund
Hi,

On 2017-01-27 17:58:04 +0530, Rushabh Lathia wrote:
> Consider the below test;
> 
> CREATE TABLE tab ( a int primary key);
> 
> SELECT  *
> FROM pg_constraint pc,
> CAST(CASE WHEN pc.contype IN ('f','u','p') THEN generate_series(1,
> array_upper(pc.conkey, 1)) ELSE NULL END AS int) AS position;
> 
> Above query is failing with "set-valued function called in context that
> cannot
> accept a set".

I think that's correct. Functions in FROM are essentially a shorthand
for ROWS FROM(). And ROWS FROM doesn't allow arbitrary expressions.  It
works if you remove the CASE because then it's a valid ROWS FROM
content.


If, I didn't check, that worked previously, I think that was more
accident than intent.

> But if I remove the CASE from the query then it working just
> good.
> 
> Like:
> 
> SELECT  *
> FROM pg_constraint pc,
> CAST(generate_series(1, array_upper(pc.conkey, 1)) AS int) AS position;

This IMO shouldn't work either due to the CAST. But indeed it does.


> This started failing with 69f4b9c85f168ae006929eec44fc44d569e846b9. It seems
> check_srf_call_placement() sets the hasTargetSRFs flag, but when the SRFs
> are in the rtable this flag of course doesn't get set. It seems like
> something is missing there, but I might be completely wrong as I'm not
> quite aware of this area.

That's right, because it's not in the targetlist.

Regards,

Andres




Re: [HACKERS] Performance improvement for joins where outer side is unique

2017-01-30 Thread Tom Lane
David Rowley  writes:
> On 31 January 2017 at 04:56, Tom Lane  wrote:
>> I'm not following.  If the join removal code had reached the stage of
>> making a uniqueness check, and that check had succeeded, the join would be
>> gone and there would be no repeat check later.  If it didn't reach the
>> stage of making a uniqueness check, then again there's no duplication.

> I had forgotten the unique check was performed last. In that case the
> check for unused columns is duplicated needlessly each time.

I think we do need to repeat that each time, as columns that were formerly
used in a join condition to a now-dropped relation might thereby have
become unused.

> But let's
> drop it, as putting the code back is not making things any worse.

Agreed, if there is something to be won there, we can address it
separately.

> I don't think that's possible. The whole reason the current join
> removal code retries joins which it already tried to remove, after a
> successful removal, is exactly that it is possible for a join to
> become provably unique on the removal of another join.

Not seeing that ... example please?

> If you remove that retry code, a regression test fails.

Probably because of the point about unused columns...

regards, tom lane




Re: [HACKERS] Declarative partitioning - another take

2017-01-30 Thread Peter Eisentraut
On 1/25/17 12:54 AM, Ashutosh Bapat wrote:
> The documentation available at
> https://www.postgresql.org/docs/devel/static/sql-createtable.html,
> does not make it clear that the lower bound of a range partition is
> always inclusive and the higher one is exclusive. I think a note in
> section " PARTITION OF parent_table FOR VALUES partition_bound_spec"
> would be helpful.

Hmm.  I see the practical use of that, but I think this is going to be a
source of endless confusion.  Can we make that a bit clearer in the
syntax, for example by using additional keywords (INCLUSIVE/EXCLUSIVE)?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Performance improvement for joins where outer side is unique

2017-01-30 Thread David Rowley
On 31 January 2017 at 04:56, Tom Lane  wrote:
> David Rowley  writes:
>> I can make this change, but before I do I just want to point that I
>> don't think what you've said here is entirely accurate.
>
>> Let's assume unique joins are very common place, and join removals are
>> not so common. If a query has 5 left joins, and only one of which can
>> be removed, then the new code will most likely perform 5 unique join
>> checks, whereas the old code would perform 9, as those unique checks
>> are performed again once the 1 relation is removed for the remaining
>> 4.
>
> I'm not following.  If the join removal code had reached the stage of
> making a uniqueness check, and that check had succeeded, the join would be
> gone and there would be no repeat check later.  If it didn't reach the
> stage of making a uniqueness check, then again there's no duplication.

I had forgotten the unique check was performed last. In that case the
check for unused columns is duplicated needlessly each time. But let's
drop it, as putting the code back is not making things any worse.

> There will be some advantage in making a negative cache entry if join
> removal performs a uniqueness check that fails, but I don't really see
> why that's hard.  It does not seem like removal of a relation could
> cause another rel to become unique that wasn't before, so keeping
> negative cache entries across join removals ought to be safe.

I don't think that's possible. The whole reason the current join
removal code retries joins which it already tried to remove, after a
successful removal, is exactly that it is possible for a join to
become provably unique on the removal of another join. If
you remove that retry code, a regression test fails. I believe this is
because there initially would have been more than one RHS rel, and the
bitmap singleton check would have failed. After all a unique index is
on a single relation, so proofs don't exist when >1 rel is on the RHS.

In any case, it's not possible to use the cache with join removals, as
we use the otherquals for uniqueness tests in join removals, but we
can't for unique joins: I'm adding those optimizations to the executor
which rely on the uniqueness being on the join condition alone, so that
we can skip to the next outer tuple when the join condition matched but
the other quals did not. If I change how join removals work in that
regard it will disallow join removals where they were previously
possible, so that's a no-go area.





-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Improvements in psql hooks for variables

2017-01-30 Thread Tom Lane
I wrote:
> Rahila Syed  writes:
>>> +* Switching AUTOCOMMIT from OFF to ON is not allowed when inside a
>>> +* transaction, because it cannot be effective until the current
>>> +* transaction is ended.

>> The above change in autocommit behaviour needs to be documented.

> Yeah, definitely.

Actually ... while trying to write some documentation for that, I found
myself wondering why we need such a prohibition at all.  If you are inside
a transaction, then autocommit has no effect until after you get out of
the transaction, and the documentation about it seems clear enough on the
point.  Also, if you want to argue that allowing it to change intra-
transaction is too confusing, why would we only forbid this direction
of change and not both directions?

I'm afraid we might be breaking some peoples' scripts to no particularly
good end, so I'm going to leave this out of the committed patch.  If you
think this really is a valid change to make, we can commit it separately,
but let's discuss it on its own merits.

regards, tom lane




Re: [HACKERS] multivariate statistics (v19)

2017-01-30 Thread Alvaro Herrera
Tomas Vondra wrote:

> The 'built' flags may be easily replaced with a check if the bytea-like
> columns are NULL, and the 'enabled' columns may be replaced by the array of
> char, just like you proposed.
> 
> That'd give us a single catalog looking like this:
> 
> pg_mv_statistics
>   starelid
>   staname
>   stanamespace
>   staowner  -- all the above as currently
>   staenabled  array of "char" {d,f,s}
>   stakeys
>   stadeps  (dependencies)
>   standist (ndistinct coefficients)
>   stamcv   (MCV list)
>   stahist  (histogram)
> 
> Which is probably a better / simpler structure than the current one.

Looks good to me.  I don't think we need to keep the names very short --
I would propose "standistinct", "stahistogram", "stadependencies".

Thanks,

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [COMMITTERS] pgsql: test_pg_dump TAP test whitespace cleanup

2017-01-30 Thread Alvaro Herrera
Stephen Frost wrote:

> Uh, after finding our perltidyrc and running with that, I have to ask,
> why are we setting vertical-tightness to 2 (never break before a closing
> token)?  The default of 0 (always break) ends up being a lot nicer when
> working on lists.  Stacking isolated opening/closing tokens does make
> sense, to me, at least.  Then again, I guess those might screw up other
> bits of code. :/

I think it's mostly a matter of personal preference among the (small
group of) people who first set up the perltidy args -- see the thread
around here:
https://www.postgresql.org/message-id/20120616024516.gc26...@momjian.us

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] patch: function xmltable

2017-01-30 Thread Pavel Stehule
Hi

2017-01-30 20:38 GMT+01:00 Alvaro Herrera :

> Pavel Stehule wrote:
>
> > I am sending new version - it is based on own executor scan node and
> > tuplestore.
> >
> > Some now obsolete regress tests removed, some new added.
> >
> > The executor code (memory context usage) should be cleaned little bit -
> but
> > other code should be good.
>
> I think you forgot nodeTableFuncscan.c.
>

true, I am sorry

attached

Regards

Pavel


>
> --
> Álvaro Herrera                https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


xmltable-41.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] [COMMITTERS] pgsql: test_pg_dump TAP test whitespace cleanup

2017-01-30 Thread Stephen Frost
* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
> Stephen Frost wrote:
> > * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
> 
> > > This will be undone by the next perltidy run.
> > 
> > Ugh.
> > 
> > I certainly hope what was there before wasn't the result of a perltidy
> > run as it was quite ugly and inconsistent..
> 
> Maybe it was.  I checked the diff after running perltidy after your
> commit and there are some changes that look like a reversion of your
> changes, but I don't know if there are other changes.

Uh, after finding our perltidyrc and running with that, I have to ask,
why are we setting vertical-tightness to 2 (never break before a closing
token)?  The default of 0 (always break) ends up being a lot nicer when
working on lists.  Stacking isolated opening/closing tokens does make
sense, to me, at least.  Then again, I guess those might screw up other
bits of code. :/

Sigh.

Thanks!

Stephen




Re: [HACKERS] multivariate statistics (v19)

2017-01-30 Thread Tomas Vondra

On 01/30/2017 05:12 PM, Alvaro Herrera wrote:
>

Hmm.  So we have a catalog pg_mv_statistics which stores two things:
1. the configuration regarding mvstats that have been requested by user
   via CREATE/ALTER STATISTICS
2. the actual values captured from the above, via ANALYZE

I think this conflates two things that really are separate, given their
different timings and usage patterns.  This decision is causing the
catalog to have enabled/built flag columns for each set of stats
requested, which looks a bit odd.  In particular, the fact that you have
to heap_update the catalog in order to add more stuff as it's built
looks inconvenient.

Have you thought about having the "requested" bits be separate from the
actual computed values?  Something like

pg_mv_statistics
  starelid
  staname
  stanamespace
  staowner   -- all the above as currently
  staenabled    array of "char" {d,f,s}
  stakeys
// no CATALOG_VARLEN here

where each char in the staenabled array has a #define and indicates one
type, "ndistinct", "functional dep", "selectivity" etc.

The actual values computed by ANALYZE would live in a catalog like:

pg_mv_statistics_values
  stvstaid  -- OID of the corresponding pg_mv_statistics row.  Needed?


Definitely needed. How else would you know which MCV list and histogram 
belong together? This works just like in pg_statistic - when both MCV
and histogram are enabled for the statistic, we first build the MCV
list, then the histogram on the remaining rows. So we need to pair them.



  stvrelid  -- same as starelid
  stvkeys   -- same as stakeys
#ifdef CATALOG_VARLEN
  stvkind   'd' or 'f' or 's', etc
  stvvalue  the bytea blob
#endif

I think that would be simpler, both conceptually and in terms of code.


I think the main issue here is that it throws away the special data
types (pg_histogram, pg_mcv, pg_ndistinct, pg_dependencies), which I
think are a neat idea and which I'd like to keep. Your proposal would
make everything bytea again, and I don't like that.




The other angle to consider is planner-side: how does the planner get
to the values?  I think as far as the planner goes, the first catalog
doesn't matter at all, because a statistics type that has been enabled
but not computed is not interesting at all; planner only cares about the
values in the second catalog (this is why I added stvkeys).  Currently
you're just caching a single pg_mv_statistics row in get_relation_info
(and only if any of the "built" flags is set), which is simple.  With my
proposed change, you'd need to keep multiple pg_mv_statistics_values
rows.

But maybe you already tried something like what I propose and there's a
reason not to do it?



Honestly, I don't see how this improves the situation. We still need to 
cache data for exactly one catalog, so how is that simpler?


The way I see it, it actually makes things more complicated, because now 
we have two catalogs to manage instead of one (e.g. when doing DROP 
STATISTICS, or after ALTER TABLE ... DROP COLUMN).


The 'built' flags may be easily replaced with a check if the bytea-like 
columns are NULL, and the 'enabled' columns may be replaced by the array 
of char, just like you proposed.


That'd give us a single catalog looking like this:

pg_mv_statistics
  starelid
  staname
  stanamespace
  staowner  -- all the above as currently
  staenabled    array of "char" {d,f,s}
  stakeys
  stadeps  (dependencies)
  standist (ndistinct coefficients)
  stamcv   (MCV list)
  stahist  (histogram)

Which is probably a better / simpler structure than the current one.
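
As a header-style sketch of that single catalog (the OID is arbitrary,
and the serialized type names are the special types discussed above, so
treat everything here as illustrative):

CATALOG(pg_mv_statistics,3381)
{
	Oid			starelid;		/* relation containing the attributes */
	NameData	staname;		/* statistics name */
	Oid			stanamespace;	/* namespace containing the statistics */
	Oid			staowner;		/* statistics owner */
	int2vector	stakeys;		/* column numbers covered */

#ifdef CATALOG_VARLEN
	char		staenabled[1];	/* requested statistic types: {d,f,s} */
	pg_dependencies stadeps;	/* functional dependencies */
	pg_ndistinct standist;		/* ndistinct coefficients */
	pg_mcv		stamcv;			/* MCV list */
	pg_histogram stahist;		/* histogram */
#endif
} FormData_pg_mv_statistics;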

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] patch proposal

2017-01-30 Thread David Steele
On 1/27/17 3:19 AM, Venkata B Nagothi wrote:

> I will be adding the tests in
> src/test/recovery/t/003_recovery_targets.pl. My tests would look more
> or less similar to those for recovery_target_action. I do not see any
> tests added for the parameter recovery_target_action? Anyway, I will
> work on adding tests for recovery_target_incomplete.

Do you know when those will be ready?

>  
> 
> 4) I'm not convinced that fatal errors by default are the way to go.
> It's a pretty big departure from what we have now.
> 
> 
> I have changed the code to generate ERRORs instead of FATALs. Which is
> in the patch recoveryStartPoint.patch

I think at this point in recovery any ERROR at all will probably be
fatal.  Once you have some tests in place we'll know for sure.

Either way, the goal would be to keep recovery from proceeding and let
the user adjust their targets.  Since this is a big behavioral change,
it will need buy-in from the community.

> Also, I don't think it's very intuitive how recovery_target_incomplete
> works.  For instance, if I set recovery_target_incomplete=promote and
> set recovery_target_time, but pick a backup after the specified time,
> then there will be an error "recovery_target_time %s is older..." rather
> than a promotion.
> 
> 
> Yes, that is correct and that is the expected behaviour. Let me explain -

My point was that this combination of options could lead to confusion on
the part of users.  However, I'd rather leave that alone for the time
being and focus on the first patch.

> I have split the patch into two different
> pieces. One is to determine if the recovery can start at all, and the
> other patch deals with the incomplete recovery situation.

I think the first patch looks promising and I would rather work through
that before considering the second patch.  Once there are tests for the
first patch I will complete my review.

-- 
-David
da...@pgmasters.net




Re: [HACKERS] multivariate statistics (v19)

2017-01-30 Thread Tomas Vondra

On 01/30/2017 05:55 PM, Alvaro Herrera wrote:

Minor nitpicks:

Let me suggest to use get_attnum() in CreateStatistics instead of
SearchSysCacheAttName for each column.  Also, we use type AttrNumber for
attribute numbers rather than int16.  Finally in the same function you
have an erroneous ERRCODE_UNDEFINED_COLUMN which should be
ERRCODE_DUPLICATE_COLUMN in the loop that searches for duplicates.

May I suggest that compare_int16 be named attnum_cmp (just to be
consistent with other qsort comparators) and look like
return *((const AttrNumber *) a) - *((const AttrNumber *) b);
instead of memcmp?



Yes, I think this is pretty much what Kyotaro-san pointed out in his 
review. I'll go through the patch and make sure the correct data types 
are used.
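
Spelled out, the comparator Alvaro suggests would be simply:

/* qsort comparator for arrays of AttrNumber */
static int
attnum_cmp(const void *a, const void *b)
{
	return *((const AttrNumber *) a) - *((const AttrNumber *) b);
}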


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] patch: function xmltable

2017-01-30 Thread Alvaro Herrera
Pavel Stehule wrote:

> I am sending new version - it is based on own executor scan node and
> tuplestore.
> 
> Some now obsolete regress tests removed, some new added.
> 
> The executor code (memory context usage) should be cleaned little bit - but
> other code should be good.

I think you forgot nodeTableFuncscan.c.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] multivariate statistics (v19)

2017-01-30 Thread Tomas Vondra

Hello,

On 01/26/2017 10:03 AM, Ideriha, Takeshi wrote:


Though I pointed out these typos and so on,
I believe this feedback is lower priority compared to the source code itself.

So please work on my feedback if you have time.



I think getting the comments (and docs in general) right is just as 
important as the code. So thank you for your review!


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] multivariate statistics (v19)

2017-01-30 Thread Tomas Vondra

On 01/26/2017 10:43 AM, Dilip Kumar wrote:


histograms
--
+ if (matches[i] == MVSTATS_MATCH_FULL)
+ s += mvhist->buckets[i]->ntuples;
+ else if (matches[i] == MVSTATS_MATCH_PARTIAL)
+ s += 0.5 * mvhist->buckets[i]->ntuples;

Wouldn't it be better to take some percentage of the bucket based
on the number of distinct elements for partially matching buckets?



I don't think so, for the same reason why ineq_histogram_selectivity() 
in selfuncs.c uses


binfrac = 0.5;

for partial bucket matches - it provides minimum average error. Even if 
we knew the number of distinct items in the bucket, we have no idea what 
the distribution within the bucket looks like. Maybe 99% of the bucket 
are covered by a single distinct value, maybe all the items are squashed 
on one side of the bucket, etc.


Moreover we don't really know the number of distinct values in the 
bucket - we only know the number of distinct items in the sample, and 
only while building the histogram. I don't think it makes much sense to 
estimate the number of distinct items in a bucket, because the buckets 
contain only very few rows so the estimates would be wildly inaccurate.




+static int
+update_match_bitmap_histogram(PlannerInfo *root, List *clauses,
+  int2vector *stakeys,
+  MVSerializedHistogram mvhist,
+  int nmatches, char *matches,
+  bool is_or)
+{
+ int i;

For each clause we are processing all the buckets; can't we use some
data structure which can make searching multi-dimensional information
faster?

>

No, we're not processing all buckets for each clause. We're only
processing buckets that were not "ruled out" by preceding clauses. 
That's the whole point of the bitmap.


For example, for the condition (a=1) AND (b=2), the code will first
evaluate (a=1) on all buckets, and then (b=2) but only on buckets where
(a=1) evaluated as true. Similarly for OR clauses.
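
Schematically, each per-clause pass looks like this (only the FULL and
PARTIAL constants appear in the quoted patch; MVSTATS_MATCH_NONE and
bucket_matches_clause() are stand-ins for illustration):

	for (i = 0; i < mvhist->nbuckets; i++)
	{
		/* skip buckets already ruled out by preceding clauses */
		if (!is_or && matches[i] == MVSTATS_MATCH_NONE)
			continue;

		if (!bucket_matches_clause(mvhist->buckets[i], clause))
			matches[i] = MVSTATS_MATCH_NONE;
	}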


>

Something like an HTree or RTree? Maybe storing the histogram in these
formats will be difficult?



Maybe, but I don't want to do that in the first version. I'm not opposed 
to doing that in the future, if we find out the v1 histograms are not 
efficient (I don't think we will, based on tests I did while working on 
the patch). Support for other histogram implementations is pretty much 
why there is 'type' field in the struct.


For now I think we should stick with the simple implementation.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [COMMITTERS] pgsql: test_pg_dump TAP test whitespace cleanup

2017-01-30 Thread Stephen Frost
* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
> Stephen Frost wrote:
> > * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
> 
> > > This will be undone by the next perltidy run.
> > 
> > Ugh.
> > 
> > I certainly hope what was there before wasn't the result of a perltidy
> > run as it was quite ugly and inconsistent..
> 
> Maybe it was.  I checked the diff after running perltidy after your
> commit and there are some changes that look like a reversion of your
> changes, but I don't know if there are other changes.

Yeah, I took a look at what perltidy is doing and at least some of that
was stuff I went through and fixed.

The changes to the %tests structure are pretty reasonable and it looks
like it's going to preserve at least some of what I did there because it
doesn't (always, at least) re-combine lines that have been split.
Unfortunately, what it's doing to the %pgdump_runs structure is
rather annoying because it wants to turn something like:

-   column_inserts => {
-   dump_cmd => [
-   'pg_dump',
-   '-f', "$tempdir/column_inserts.sql",
-   '-a',
-   '--column-inserts',
-   'postgres',
-   ], },

into:

+column_inserts => {
+dump_cmd => [
+'pg_dump', '-f',
+"$tempdir/column_inserts.sql", '-a',
+'--column-inserts','postgres',
+],
+},

I don't mind the change in indentation, of course, but obviously it's a
lot nicer when the '-f $tempdir/column_inserts.sql' is all on the same
line since that's clearly a flag with an argument.

I could change all of those to be long versions, eg "--file=blah", to
keep them together.  Not really ideal, but I'm not really worried that
getopt() is going to break either.

Unless anyone has a better idea, I'll see about doing that, and then
run perltidy over it to clean it up.

Thanks!

Stephen




Re: [HACKERS] patch: function xmltable

2017-01-30 Thread Pavel Stehule
Hi

2017-01-25 15:07 GMT+01:00 Alvaro Herrera :

> Tom Lane wrote:
> > Andres Freund  writes:
> > > On 2017-01-24 21:32:56 -0300, Alvaro Herrera wrote:
> > >> XMLTABLE is specified by the standard to return multiple rows ... but
> > >> then as far as my reading goes, it is only supposed to be supported in
> > >> the range table (FROM clause) not in the target list.  I wonder if
> > >> this would end up better if we only tried to support it in RT.  I
> asked
> > >> Pavel to implement it like that a few weeks ago, but ...
> >
> > > Right - it makes sense in the FROM list - but then it should be an
> > > executor node, instead of some expression thingy.
> >
> > +1 --- we're out of the business of having simple expressions that
> > return rowsets.
>
> Well, that's it.  I'm not committing this patch against two other
> committers' opinion, plus I was already on the fence about the
> implementation anyway.  I think you should just go with the flow and
> implement this by creating nodeTableexprscan.c.  It's not even
> difficult.
>

I am sending new version - it is based on own executor scan node and
tuplestore.

Some now obsolete regress tests removed, some new added.

The executor code (memory context usage) should be cleaned little bit - but
other code should be good.

Regards

Pavel


> --
> Álvaro Herrera                https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


xmltable-40.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] VACUUM's ancillary tasks

2017-01-30 Thread Alvaro Herrera
Thomas Munro wrote:

> About BRIN indexes:  I couldn't find an explanation of why BRIN
> indexes don't automatically create new summary tuples when you insert
> a new tuple in an unsummarised page range.  Is it deferred until
> VACUUM time in order to untangle some otherwise unresolvable
> interlocking or crash safety problem, or could that one day be done?

The reason is performance for the bulk insert case, which we don't want
to slow down; the range summarization is done at a later time by a
background process so that the inserting process is not slowed down by
having to repeatedly re-compute the summary tuple for each heap
insertion.  I think the ideal mechanism would be that a summarization is
signalled somehow (to another process) as soon as an insertion occupies
a block just past the previous unsummarized range.  (If there are many
readers, perhaps it's better to summarize when the range is half full or
something like that.)

We could have a reloption that switches from this behavior to the other
obvious possibility which is to insert a new summary tuple upon the
first heap insertion to an unsummarized range, but ISTM that that
behavior is pessimal.

> Counting inserts seems slightly bogus because you can't tell whether
> those were inserts into an existing summarised block which is
> self-maintaining or not.  At first glance it looks a bit like
> unsummarised ranges can only appear at the end of the table, is that
> right?  If so, couldn't you detect the number of unsummarised BRIN
> blocks just by comparing the highest summarised BRIN block and the
> current heap size?

We don't have a mechanism to invalidate the summary of a range thus far,
so yeah we could try to detect it directly as you suggest.

I would like to be able to invalidate these tuples though, for the case
where many tuples are removed from a range (a fresh summarization could
produce tighter limits).  I think this problem does not necessarily
invalidate the idea above.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Speedup twophase transactions

2017-01-30 Thread Nikhil Sontakke
> --- a/src/backend/access/transam/xlog.c
> +++ b/src/backend/access/transam/xlog.c
> @@ -9573,6 +9573,7 @@ xlog_redo(XLogReaderState *record)
>  (errmsg("unexpected timeline ID %u (should be %u)
> in checkpoint record",
>  checkPoint.ThisTimeLineID, ThisTimeLineID)));
>
> +KnownPreparedRecreateFiles(checkPoint.redo);
>  RecoveryRestartPoint(&checkPoint);
>  }
> And actually, when a XLOG_CHECKPOINT_SHUTDOWN record is taken, 2PC
> files are not flushed to disk with this patch. This is a problem as a
> new restart point is created... Having the flush in CheckpointTwoPhase
> really makes the most sense.

Having CheckPointTwoPhase() do the flush would mean shifting the data
from KnownPreparedList into TwoPhaseState shmem.

I wonder what's the best location for this in the common case when we
shut down a standby.  We could add code in the XLOG_CHECKPOINT_SHUTDOWN
and XLOG_CHECKPOINT_ONLINE xlog_redo code paths.
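
A rough sketch of the ONLINE variant, mirroring the SHUTDOWN hunk quoted
above (existing replay steps elided):

	else if (info == XLOG_CHECKPOINT_ONLINE)
	{
		CheckPoint	checkPoint;

		memcpy(&checkPoint, XLogRecGetData(record), sizeof(CheckPoint));
		/* ... existing online-checkpoint replay steps ... */

		KnownPreparedRecreateFiles(checkPoint.redo);
		RecoveryRestartPoint(&checkPoint);
	}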

Regards,
Nikhils




-- 
 Nikhil Sontakke   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-01-30 Thread Alvaro Herrera
Pavan Deolasee wrote:
> On Wed, Jan 25, 2017 at 10:06 PM, Alvaro Herrera 
> wrote:

> > > +( \
> > > + ((tup)->t_infomask2 & HEAP_LATEST_TUPLE) != 0 \
> > > +)
> > >
> > > +#define HeapTupleHeaderGetRootOffset(tup) \
> > > +( \
> > > + AssertMacro(((tup)->t_infomask2 & HEAP_LATEST_TUPLE) != 0), \
> > > + ItemPointerGetOffsetNumber(&(tup)->t_ctid) \
> > > +)
> >
> > Interesting stuff; it took me a bit to see why these macros are this
> > way.  I propose the following wording which I think is clearer:
> >
> >   Return whether the tuple has a cached root offset.  We don't use
> >   HeapTupleHeaderIsHeapLatest because that one also considers the slow
> >   case of scanning the whole block.
> 
> Umm, not scanning the whole block, but HeapTupleHeaderIsHeapLatest compares
> t_ctid with the passed-in TID and returns true if those match. To know if
> root lp is cached, we only rely on the HEAP_LATEST_TUPLE flag. Though if
> the flag is set, then it implies latest tuple too.

Well, I'm just trying to fix the problem that when I saw that macro, I
thought "why is this checking the bitmask directly instead of using the
existing IsHeapLatest macro?"  It turned out that
IsHeapLatest is not just simply comparing the bitmask, but it also does
more expensive processing which is unwanted in this case.  I think the
comment to this macro should explain why the other macro cannot be used.

> > Please flag the macros that have multiple evaluation hazards -- there
> > are a few of them.
> 
> Can you please tell me an example? I must be missing something.

Any macro that uses an argument more than once is subject to multiple
evaluations of that argument; for example, if you pass a function call to
the macro as one of the parameters, the function is called multiple
times.  In many cases this is not a problem because the argument is
always a constant, but sometimes it does become a problem.
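
A minimal illustration:

#define SQUARE(x)	((x) * (x))

	/* next_value() is called twice here -- a multiple-evaluation hazard */
	int			y = SQUARE(next_value());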

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Improvements in psql hooks for variables

2017-01-30 Thread Tom Lane
BTW, while testing this patch I noticed that the error reports are
a tad redundant:

regression=# \set AUTOCOMMIT foo
unrecognized value "foo" for "AUTOCOMMIT": boolean expected
\set: error while setting variable
regression=# 

The "error while setting variable" message seems entirely content-free.
I think we should drop that and instead establish a rule that SetVariable
should print a message for itself about any failure.  (There are a lot
of call sites that don't check for or print a message, but that's only
because they aren't expecting failure.  If one were to happen, printing
a message doesn't seem unreasonable.)  That would in turn propagate into
an API requirement that var hooks that return FALSE are responsible for
printing a message about the reason, which is why it would be appropriate
to make that change as part of this patch.

Barring objections PDQ, I'll make this change.

regards, tom lane



