Re: [HACKERS] [Proposal] Improvement of GiST page layout

2016-02-08 Thread Heikki Linnakangas

On 09/02/16 07:15, Andrew Borodin wrote:

Hi, hackers!

I want to propose an improvement of the GiST page layout.

GiST is optimized to reduce disk operations on search queries, for
example, window search queries in the case of an R-tree.

I expect that a more complicated page layout will help to trade off
some CPU efficiency for disk efficiency.

A GiST tree is a balanced tree structure with two kinds of pages:
internal pages and leaf pages. Each tree page contains a bunch of tuples
with the same structure. Tuples of an internal page reference other
pages of the same tree, while leaf page tuples hold heap TIDs.

During execution of a search query, GiST invokes the key comparison
algorithm for each tuple on a page, with two possible outcomes: 'no' and
'maybe'. A 'maybe' answer makes the search algorithm recursively descend
to the referenced page (in case of an internal page) or yield the tuple
(in case of a leaf page).

The expected tuple count on a page is around tens to hundreds. This
count is big enough to make it worth saving some cache lines from being
loaded into the CPU during enumeration.

For B-trees we effectively apply a binary search while inspecting a
page, which is not possible in a GiST tree.

Let's consider an R-tree with arbitrary fan-out f. If a given query will
find exactly one data tuple, it is easy to show that the key comparison
count is minimal if f->e /*round_to_optimal(2.72) == 3*/ (the tree has to
review f*h keys, h=logf(N), and f*logf(N) is minimal when f->e). A smaller
key comparison count means fewer cache lines are touched. So fan-out
reduction means cache pressure reduction (except an average fan-out of 2,
which seems to be too small) and less time waiting for RAM. I suppose all
that reasoning holds true in cases where more than one tuple will be
found.


GiST comparison operators tend to be quite expensive, so I suspect much 
of the gain is simply from reducing the number of comparisons, rather 
than cache efficiency. The conclusion is the same though, so it doesn't 
matter.



How do we reduce tree fan-out? Obviously, we can't fill a page with just
3 tuples. But we can install a small tree-like structure inside one
page. A general GiST index has a root page; a page tree instead has a
"root" layer of tuples. Private (or internal, intermediate, auxiliary;
I can't pick a precise word) tuples have only keys and a fixed-size (F)
array of offsets of the underlying records. Each layer is a linked list.
Right after a page has been allocated there is only a "ground" level of
regular tuples. Eventually the record count reaches F-1 and we create a
new root layer with two tuples. Each new tuple references half of the
preexisting records. Placing new "ground" tuples on the page will
eventually cause an internal tuple to split. If there is not enough
space to split an internal tuple, we mark the page for a whole page-split
during the next iteration of the owning tree's insertion algorithm.
That is why the tuple split happens at F-1 tuples, not at F: if we have
no space for splitting, we just add a reference to the last slot. In
this algorithm, a page split will cause major page defragmentation: we
take the root layer, halve it and place the halves on different pages.
When half of the data has gone to the other page, the restructuring
should tend to place records in such a fashion that tuples accessed
together lie together. I think placing a whole level together is a good
strategy.


Yeah, something like that would make a lot of sense. I've actually been 
thinking of this same idea myself over the years, but never got around 
to implementing it.


I'm not wedded to that particular layout; it seems a bit complicated. A 
two-level setup within each page might be good enough in practice: there 
would be regular tuples, like today, and also "container" tuples, which 
represent a group of regular tuples. Also note that you can freely move 
tuples around in a page, so instead of storing an actual array of 
offsets with the container tuples, you could store just a range of line 
pointers, and keep all the child tuples packed together.
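
For illustration only, here is a rough C sketch of what such a "container"
tuple might carry; the struct and field names are invented for the example,
not actual PostgreSQL structures:

#include <stdint.h>

/*
 * Hypothetical sketch of a "container" tuple as described above: instead
 * of an explicit offset array, it covers a contiguous range of line
 * pointers, relying on the child tuples being kept packed together on the
 * page.  Names and layout are illustrative only.
 */
typedef struct ContainerTupleSketch
{
	uint16_t	first_child;	/* first covered line pointer on the page */
	uint16_t	nchildren;		/* children are packed contiguously from there */
	/* the union key covering all children would follow as variable-size data */
} ContainerTupleSketch;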


I wonder if it's really a good idea to perform page split by simply 
using the "container" tuples. It certainly saves CPU time when 
splitting, but I'm worried that the split might not be as good as today. 
The mini-groups within the page are grown "organically" as tuples are 
inserted, and the page split algorithm might come up with a better 
grouping if it sees all the tuples at once.



I expect that implementation of this proposal could speed up
insertions into the index by 20% and the performance of queries by 30% when
the whole index fits in shared buffers. In case of buffer starvation,
when the index is accessed through disk, this feature will cause a 15%
performance degradation since the effective page capacity will be smaller.


Not sure how you got those numbers, but the general direction sounds right.


Should this feature be configurable? Maybe this should be a different
access method?


I think this should be included in GiST, and always enabled. Or at least 
enabled by default. While the index will be slightly larger because of 
th

Re: [HACKERS] Existence check for suitable index in advance when concurrently refreshing.

2016-02-08 Thread Fujii Masao
On Thu, Jan 28, 2016 at 1:01 AM, Masahiko Sawada  wrote:
> On Wed, Jan 27, 2016 at 4:42 PM, Fujii Masao  wrote:
>> On Tue, Jan 26, 2016 at 9:33 PM, Masahiko Sawada  
>> wrote:
>>> Hi all,
>>>
>>> In concurrently refreshing a materialized view, we check whether that
>>> materialized view has a suitable index (unique and without a WHERE
>>> condition) after filling data into the new snapshot
>>> (refresh_matview_datafill()).
>>> This logic leads to taking a lot of time until postgres returns an ERROR
>>> if the table doesn't have a suitable index and the table is large. It
>>> wastes time.
>>> I think we should check in advance whether that materialized view can
>>> be refreshed concurrently.
>>
>> +1
>>
>>> The patch is attached.
>>>
>>> Please give me feedbacks.
>
> Thank you for having a look at this patch.
>
>> +indexRel = index_open(indexoid, RowExclusiveLock);
>>
>> Can we use AccessShareLock here, instead?
>
> Yeah, I think we can use it. Fixed.
>
>> +if (indexStruct->indisunique &&
>> +IndexIsValid(indexStruct) &&
>> +RelationGetIndexExpressions(indexRel) == NIL &&
>> +RelationGetIndexPredicate(indexRel) == NIL)
>> +hasUniqueIndex = true;
>> +
>> +index_close(indexRel, RowExclusiveLock);
>>
>> In the case where hasUniqueIndex = true, ISTM that we can get out of
>> the loop immediately just after calling index_close(). No?
>
> Fixed.
>
>> +/* Must have at least one unique index */
>> +Assert(foundUniqueIndex);
>>
>> Can we guarantee that there is at least one valid unique index here?
>> If yes, it's better to write the comment about that.
>>
>
> Added.
>
> Attached latest patch. Please review it.

Thanks for updating the patch!
Attached is the updated version of the patch.
I removed the unnecessary assertion check and the source code change
that you added, and improved the source comment.
Barring objections, I'll commit this patch.

Regards,

-- 
Fujii Masao
*** a/src/backend/commands/matview.c
--- b/src/backend/commands/matview.c
***
*** 217,222  ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString,
--- 217,267 
  			 RelationGetRelationName(matviewRel));
  
  	/*
+ 	 * Check that there is a unique index with no WHERE clause on
+ 	 * one or more columns of the materialized view if CONCURRENTLY
+ 	 * is specified.
+ 	 */
+ 	if (concurrent)
+ 	{
+ 		List		*indexoidlist = RelationGetIndexList(matviewRel);
+ 		ListCell 	*indexoidscan;
+ 		bool		hasUniqueIndex = false;
+ 
+ 		foreach(indexoidscan, indexoidlist)
+ 		{
+ 			Oid			indexoid = lfirst_oid(indexoidscan);
+ 			Relation	indexRel;
+ 			Form_pg_index	indexStruct;
+ 
+ 			indexRel = index_open(indexoid, AccessShareLock);
+ 			indexStruct = indexRel->rd_index;
+ 
+ 			if (indexStruct->indisunique &&
+ 				IndexIsValid(indexStruct) &&
+ 				RelationGetIndexExpressions(indexRel) == NIL &&
+ 				RelationGetIndexPredicate(indexRel) == NIL &&
+ 				indexStruct->indnatts > 0)
+ 			{
+ 				hasUniqueIndex = true;
+ 				index_close(indexRel, AccessShareLock);
+ 				break;
+ 			}
+ 
+ 			index_close(indexRel, AccessShareLock);
+ 		}
+ 
+ 		list_free(indexoidlist);
+ 
+ 		if (!hasUniqueIndex)
+ 			ereport(ERROR,
+ 					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ 					 errmsg("cannot refresh materialized view \"%s\" concurrently",
+ 							quote_qualified_identifier(get_namespace_name(RelationGetNamespace(matviewRel)),
+ 													   RelationGetRelationName(matviewRel))),
+ 					 errhint("Create a unique index with no WHERE clause on one or more columns of the materialized view.")));
+ 	}
+ 
+ 	/*
  	 * The stored query was rewritten at the time of the MV definition, but
  	 * has not been scribbled on by the planner.
  	 */



Re: [HACKERS] pgbench stats per script & other stuff

2016-02-08 Thread Fabien COELHO


Hi Michaël,


Attached 19-d and 19-e.


+/* return builtin script "name", or fail if not found */
builtin does not sound like correct English to me, but built-in is.


I notice that "man bash" uses "builtin" extensively, so I think it is okay 
like that, but I would be fine as well with "built-in".


I suggest leaving it as is unless some native speaker really requires 
"built-in", in which case there would be many places to update, so that 
would be another orthography-oriented patch :-)


--
Fabien.


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-02-08 Thread Amit Kapila
On Tue, Feb 9, 2016 at 5:37 AM, Michael Paquier 
wrote:
>
> On Mon, Feb 8, 2016 at 11:24 PM, Amit Kapila wrote:
> > On Mon, Feb 8, 2016 at 12:28 PM, Michael Paquier <michael.paqu...@gmail.com> wrote:
> >>
> >>
> >> >>   /*
> >> >> +  * Fetch the progress position before taking any WAL insert lock. This
> >> >> +  * is normally an operation that does not take long, but leaving this
> >> >> +  * lookup out of the section taken an exclusive lock saves a couple
> >> >> +  * of instructions.
> >> >> +  */
> >> >> + progress_lsn = GetProgressRecPtr();
> >> >
> >> > too long for my taste. How about:
> >> > /* get progress, before acuiring insert locks to shorten locked section */
> >>
> >> Check.
> >>
> >
> > What is the need of holding locks one-by-one during checkpoint when
> > we anyway are going to take lock on all the insertion slots.
>
> A couple of records can slip in while scanning the progress LSN
> through all the locks.
>

Do you see any benefit in allowing checkpoints for such cases, considering
that bgwriter will anyway take care of logging the standby snapshot for
them?
I don't think there is any reasonable benefit in improving the idle-system
check for checkpoints beyond just including the standby-snapshot WAL
record.  OTOH, as checkpoint is not so seldom an operation, allowing such a
change should be okay, but I don't see the need for it.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] [patch] Proposal for \crosstabview in psql

2016-02-08 Thread Pavel Stehule
Hi


> I just rechecked the thread.  In my reading, lots of people argued
> whether it should be called \rotate or \pivot or \crosstab; it seems the
> \crosstabview proposal was determined to be best.  I can support that
> decision.  But once the details were discussed, it was only you and
> Daniel left in the thread; nobody else participated.  While I understand
> that you may think that "silence is consent", what I am afraid of is
> that some committer will look at this two months from now and say "I
> hate this Hcol+ stuff, -1 from me" and send the patch back for syntax
> rework.  IMO it's better to have more people chime in here so that the
> patch that we discuss during the next commitfest is really the best one
> we can think of.
>

I don't have the feeling that Daniel and I did any of this privately. All
the work was public (I checked my mailbox) - though unfortunately spread
across several mailing list threads (I am not sure how that is possible,
because the subjects look the same). The discussion about the design was
public, I am sure. It was a relatively long process, with good progress
(from my perspective), because Daniel accepted and fixed all my objections.
The proposed syntax is fully consistent with other psql commands - it is
hard to create something new there, because the psql parser is pretty
limited. Although I think the syntax is good, clean and useful, I am open
to discussing it. Please try the last design, the last patch - I spent a
lot of hours (and I am sure Daniel spent many more) thinking about how this
could be designed better.


> Also, what about the business of putting "x" if there's no third column?
> Three months from now some Czech psql hacker will say "we should use
> Unicode chars for this" and we will be forever stuck with \pset
> unicode_crosstab_marker to change the character to a ☑ BALLOT BOX WITH
> CZECH.  Maybe we should think that a bit harder -- for example, what
> about just rejecting the case with no third column and forcing the user
> to add a third column with the character they choose?  That way you
> avoid that mess.
>

These features are in the "nice to have" category. There is no problem
doing them in the last commitfest or in the next release cycle. It is not
a reason to block committing this feature, and I am sure a lot of users can
be pretty happy with the "basic" version of this patch. All the necessary
functionality is there and working. We can continue development in other
cycles, but for this cycle, I am sure, all the necessary work is done.


>
> > This feature has only a small relation to the SQL PIVOTING feature - it is
> > just a form of view, and I agree with Daniel about the sense of this feature.
>
> Yes, I don't disagree there.  Robert Haas and David Fetter also
> expressed their support for psql-side processing, so I think we're good
> there.
>
>
Great. Personally, I don't have any objection to the current state. If
anybody has, please raise it early. We can move forward. This is a nice
feature and a good patch, and there is no reason to stop here.

Regards

Pavel


> --
> Álvaro Herrera                http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


[HACKERS] [Proposal] Improvement of GiST page layout

2016-02-08 Thread Andrew Borodin
Hi, hackers!

I want to propose an improvement of the GiST page layout.

GiST is optimized to reduce disk operations on search queries, for
example, window search queries in the case of an R-tree.

I expect that a more complicated page layout will help to trade off
some CPU efficiency for disk efficiency.

A GiST tree is a balanced tree structure with two kinds of pages:
internal pages and leaf pages. Each tree page contains a bunch of tuples
with the same structure. Tuples of an internal page reference other
pages of the same tree, while leaf page tuples hold heap TIDs.

During execution of a search query, GiST invokes the key comparison
algorithm for each tuple on a page, with two possible outcomes: 'no' and
'maybe'. A 'maybe' answer makes the search algorithm recursively descend
to the referenced page (in case of an internal page) or yield the tuple
(in case of a leaf page).

The expected tuple count on a page is around tens to hundreds. This
count is big enough to make it worth saving some cache lines from being
loaded into the CPU during enumeration.

For B-trees we effectively apply a binary search while inspecting a
page, which is not possible in a GiST tree.

Let's consider an R-tree with arbitrary fan-out f. If a given query will
find exactly one data tuple, it is easy to show that the key comparison
count is minimal if f->e /*round_to_optimal(2.72) == 3*/ (the tree has to
review f*h keys, h=logf(N), and f*logf(N) is minimal when f->e). A smaller
key comparison count means fewer cache lines are touched. So fan-out
reduction means cache pressure reduction (except an average fan-out of 2,
which seems to be too small) and less time waiting for RAM. I suppose all
that reasoning holds true in cases where more than one tuple will be
found.
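
For reference, the minimization claimed above can be spelled out in a
couple of lines (this is just the same argument written out, nothing new):

\text{comparisons} \approx f \cdot h
  = f \log_f N
  = \frac{f}{\ln f}\,\ln N,
\qquad
\frac{d}{df}\!\left(\frac{f}{\ln f}\right)
  = \frac{\ln f - 1}{(\ln f)^2} = 0
  \;\Longrightarrow\; f = e \approx 2.72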

How do we reduce tree fan-out? Obviously, we can't fill a page with just
3 tuples. But we can install a small tree-like structure inside one
page. A general GiST index has a root page; a page tree instead has a
"root" layer of tuples. Private (or internal, intermediate, auxiliary;
I can't pick a precise word) tuples have only keys and a fixed-size (F)
array of offsets of the underlying records. Each layer is a linked list.
Right after a page has been allocated there is only a "ground" level of
regular tuples. Eventually the record count reaches F-1 and we create a
new root layer with two tuples. Each new tuple references half of the
preexisting records. Placing new "ground" tuples on the page will
eventually cause an internal tuple to split. If there is not enough
space to split an internal tuple, we mark the page for a whole page-split
during the next iteration of the owning tree's insertion algorithm.
That is why the tuple split happens at F-1 tuples, not at F: if we have
no space for splitting, we just add a reference to the last slot. In
this algorithm, a page split will cause major page defragmentation: we
take the root layer, halve it and place the halves on different pages.
When half of the data has gone to the other page, the restructuring
should tend to place records in such a fashion that tuples accessed
together lie together. I think placing a whole level together is a good
strategy.

Let's look at how a page grows with fan-out factor F=5. RLS = root layer
start, G = ground tuple, Ix = internal tuple of level x.

After we have added 3 ground tuples it's just a ground layer
RLS=0|G G G
Then we place one more tuple and the layer splits:
RLS=4|G G G G I0 I0
Each I0 tuple now references two G tuples. We keep placing G tuples.
RLS=4|G G G G I0 I0 G G
And then one of the I0 tuples is split
RLS=4|G G G G I0 I0 G G G I0
And right after that, one more I0 split causes a new layer
RLS=12|G G G G I0 I0 G G G I0 G I0 I1 I1

And so on, until we run out of space on the page. In a regular GiST we run
out of space on a page before we insert a tuple onto it. Now we can run out
of space during insertion. But this will not be fatal; we will still be
able to place the ground-level tuple. The inner index structure will use
the one extra slot for reference allocation, and the next insertion on the
page will deal with it. On pages marked for split we have to find exactly
which index tuple has run out of its extra slot during the split and fix it.
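
To make the layout above a bit more concrete, here is a rough C sketch of
such an intra-page "internal" tuple; the names, types and layout are
hypothetical, purely for illustration, not a proposed on-disk format:

#include <stdint.h>

#define F 5							/* intra-page fan-out from the example above */

typedef uint16_t PageOffsetSketch;	/* stand-in for an on-page item offset */

/*
 * Hypothetical intra-page "internal" (auxiliary) tuple.  It keeps a key
 * and up to F offsets of the tuples it covers; one slot is normally left
 * free so that a split which cannot allocate space can still spill a
 * reference into the last slot, as described above.
 */
typedef struct InnerTupleSketch
{
	uint16_t			level;			/* 0 = directly references ground tuples */
	uint16_t			nchildren;		/* used slots, at most F */
	PageOffsetSketch	children[F];	/* offsets of covered tuples on this page */
	/* the variable-length union key would follow here */
} InnerTupleSketch;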

Several years ago I made an unsuccessful attempt to implement a similar
algorithm in the database engine of a proprietary system. I got stuck
debugging the deletion algorithm and tree condensation, which are in use in
that system. I suppose it was a mistake to defragment the page with a
creeping heuristic; eventually I dropped the idea and just moved on to
actually important tasks - there is always a deadline rush in business. I'm
a newbie in Postgres internals, but as far as I can see there is no deletion
on a GiST page. So I feel the itch to try this feature one more time.

I expect that implementation of this proposal could speed up
insertions into the index by 20% and the performance of queries by 30% when
the whole index fits in shared buffers. In case of buffer starvation,
when the index is accessed through disk, this feature will cause a 15%
performance degradation since the effective page capacity will be smaller.
Should this feature be configurable? Maybe this should be a different
access method?

I need help to 

Re: [HACKERS] Use %u to print user mapping's umid and userid

2016-02-08 Thread Ashutosh Bapat
Sorry, I was wrong. For a public user mapping the userid is 0 (InvalidOid),
which is returned as is in the UserMapping object. I confused InvalidOid
with -1. Sorry for the confusion.

On Tue, Feb 9, 2016 at 10:21 AM, Tom Lane  wrote:

> Ashutosh Bapat  writes:
> > Sorry to come to this late.
> > The userid being printed is from UserMapping. The new API
> > GetUserMappingById() allows an FDW to get user mapping by its OID. This API
> > is intended to be used by FDWs to fetch the user mapping inferred by the
> > core for given join between foreign relations. In such user mapping object,
> > userid may be -1 for a public user mapping.
>
> If that is actually how it works, it's broken and I'm going to insist
> on a redesign.  There is nothing anywhere that says that 0xFFFFFFFF
> is not a valid OID.
>
> regards, tom lane
>



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] pgbench stats per script & other stuff

2016-02-08 Thread Michael Paquier
On Tue, Feb 9, 2016 at 4:22 AM, Fabien COELHO  wrote:
>> +   /* compute total_weight */
>> +   for (i = 0; i < num_scripts; i++)
>> +   {
>> +   total_weight += sql_script[i].weight;
>> +
>> +   /* detect overflow... */
>> If left as int64, you may want to remove this overflow check, or keep
>> it as int32.
>
>
> I'd rather keep int64, and remove the check.

OK, and you did so. Thanks.

>>> [JSON/YAML]
>>> If you want something else, it would help to provide a sample of what you
>>> expect.
>>
>> You could do that with an additional option here as well:
>> --output-format=normal|yamljson. The tastes of each user is different.
>
> I think that json/yaml-ifying pgbench output is beyond the scope of this
> patch, so it should be kept out?

Yeah, that's just a free idea that this set of patches does not need
to address. If someone thinks that's worth it, feel free to submit a
patch, perhaps we could add a TODO item on the wiki. Regarding the
output generated by your patch, I think that's fine. Perhaps Alvaro
has other thoughts on the matter. I don't know this part.

>>> Find attached a 18-d which addresses these concerns, and a actualized
>>> 18-e
>>> for the prefix.
>>
>>
>> (I could live without that personally)
>
> Hmmm, I type them and I'm not so good with a keyboard, so "se" is better
> than:
>
> "selct-onlyect-only".

I can understand that feeling.

>> -/* return builtin script "name", or fail if not found */
>> +/* return commands for selected builtin script, if unambiguous */
>> static script_t *
>> findBuiltin(const char *name)
>> This comment needs a refresh. This does not return a set of commands,
>> but the script itself.
>
> Indeed.
>
> Attached 19-d and 19-e.

+/* return builtin script "name", or fail if not found */
builtin does not sound like correct English to me, but built-in is.
-- 
Michael




Re: [HACKERS] Use %u to print user mapping's umid and userid

2016-02-08 Thread Tom Lane
Ashutosh Bapat  writes:
> Sorry to come to this late.
> The userid being printed is from UserMapping. The new API
> GetUserMappingById() allows an FDW to get user mapping by its OID. This API
> is intended to be used by FDWs to fetch the user mapping inferred by the
> core for given join between foreign relations. In such user mapping object,
> userid may be -1 for a public user mapping.

If that is actually how it works, it's broken and I'm going to insist
on a redesign.  There is nothing anywhere that says that 0xFFFFFFFF
is not a valid OID.

regards, tom lane




Re: [HACKERS] Use %u to print user mapping's umid and userid

2016-02-08 Thread Michael Paquier
On Tue, Feb 9, 2016 at 1:22 PM, Ashutosh Bapat
 wrote:
> The userid being printed is from UserMapping. The new API
> GetUserMappingById() allows an FDW to get user mapping by its OID. This API
> is intended to be used by FDWs to fetch the user mapping inferred by the
> core for given join between foreign relations. In such user mapping object,
> userid may be -1 for a public user mapping.

I am a bit surprised by this sentence, UserMapping->userid is an Oid,
and those are unsigned. Could you clarify?
-- 
Michael




Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-08 Thread Michael Paquier
On Tue, Feb 9, 2016 at 1:16 PM, Kyotaro HORIGUCHI
 wrote:
> Hello,
>
> At Tue, 9 Feb 2016 00:48:57 +0900, Fujii Masao  wrote 
> in 
>> On Fri, Feb 5, 2016 at 5:36 PM, Michael Paquier
>>  wrote:
>> > On Thu, Feb 4, 2016 at 11:06 PM, Michael Paquier
>> >  wrote:
>> >> On Thu, Feb 4, 2016 at 10:49 PM, Michael Paquier
>> >>  wrote:
>> >>> On Thu, Feb 4, 2016 at 10:40 PM, Robert Haas  
>> >>> wrote:
>>  On Thu, Feb 4, 2016 at 2:21 PM, Michael Paquier
>>   wrote:
>> > Yes, please let's use the custom language, and let's not care of not
>> > more than 1 level of nesting so as it is possible to represent
>> > pg_stat_replication in a simple way for the user.
>> 
>>  "not" is used twice in this sentence in a way that renders me not able
>>  to be sure that I'm not understanding it not properly.
>> >>>
>> >>> 4 times here. Score beaten.
>> >>>
>> >>> Sorry. Perhaps I am tired... I was just wondering if it would be fine
>> >>> to only support configurations up to one level of nested objects, like
>> >>> that:
>> >>> 2[node1, node2, node3]
>> >>> node1, 2[node2, node3], node3
>> >>> In short, we could restrict things so as we cannot define a group of
>> >>> nodes within an existing group.
>> >>
>> >> No, actually, that's stupid. Having up to two nested levels makes more
>> >> sense, a quite common case for this feature being something like that:
>> >> 2{node1,[node2,node3]}
>> >> In short, sync confirmation is waited from node1 and (node2 or node3).
>> >>
>> >> Flattening groups of nodes with a new catalog will be necessary to
>> >> ease the view of this data to users:
>> >> - group name?
>> >> - array of members with nodes/groups
>> >> - group type: quorum or priority
>> >> - number of items to wait for in this group
>> >
>> > So, here are some thoughts to make that more user-friendly. I think
>> > that the critical issue here is to properly flatten the meta data in
>> > the custom language and represent it properly in a new catalog,
>> > without messing up too much with the existing pg_stat_replication that
>> > people are now used to for 5 releases since 9.0. So, I would think
>> > that we will need to have a new catalog, say
>> > pg_stat_replication_groups with the following things:
>> > - One line of this catalog represents the status of a group or of a single 
>> > node.
>> > - The status of a node/group is either sync or potential, if a
>> > node/group is specified more than once, it may be possible that it
>> > would be sync and potential depending on where it is defined, in which
>> > case setting its status to 'sync' has the most sense. If it is in sync
>> > state I guess.
>> > - Move sync_priority and sync_state, actually an equivalent from
>> > pg_stat_replication into this new catalog, because those represent the
>> > status of a node or group of nodes.
>> > - group name, and by that I think that we had perhaps better make
>> > mandatory the need to append a name with a quorum or priority group.
>> > The group at the highest level is forcibly named as 'top', 'main', or
>> > whatever if not directly specified by the user. If the entry is
>> > directly a node, use the application_name.
>> > - Type of group, quorum or priority
>> > - Elements in this group, an element can be a group name or a node
>> > name, aka application_name. If group is of type priority, the elements
>> > are listed in increasing order. So the elements with lower priority
>> > get first, etc. We could have one column listing explicitly a list of
>> > integers that map with the elements of a group but it does not seem
>> > worth it, what users would like to know is what are the nodes that are
>> > prioritized. This covers the former 'priority' field of
>> > pg_stat_replication.
>> >
>> > We may have a good idea of how to define a custom language, still we
>> > are going to need to design a clean interface at catalog level more or
>> > less close to what is written here. If we can get a clean interface,
>> > the custom language implemented, and TAP tests that take advantage of
>> > this user interface to check the node/group statuses, I guess that we
>> > would be in good shape for this patch.
>> >
>> > Anyway that's not a small project, and perhaps I am over-complicating
>> > the whole thing.
>> >
>> > Thoughts?
>>
>> I agree that we would need something like such new view in the future,
>> however it seems too late to work on that for 9.6 unfortunately.
>> There is only one CommitFest left. Let's focus on very simple case, i.e.,
>> 1-level priority list, now, then we can extend it to cover other cases.
>>
>> If we can commit the simple version too early and there is enough
>> time before the date of feature freeze, of course I'm happy to review
>> the extended version like you proposed, for 9.6.
>
> I agree with Fujii-san. There would be many convenient gadgets
> around this and they are completely welcome, but having the
> fundamental functionality in 9.6 seems to be far more beneficial for
> most of us.

Hm. R

Re: [HACKERS] Use %u to print user mapping's umid and userid

2016-02-08 Thread Ashutosh Bapat
Sorry to come to this late.

The userid being printed is from UserMapping. The new API
GetUserMappingById() allows an FDW to get a user mapping by its OID. This API
is intended to be used by FDWs to fetch the user mapping inferred by the
core for a given join between foreign relations. In such a user mapping
object, userid may be -1 for a public user mapping. I think using %u for -1
will print it as the largest unsigned integer. Would that create confusion
for users?
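
As a toy standalone C illustration of that concern (not PostgreSQL code,
just a stand-in typedef): InvalidOid prints as 0 with %u, while a -1 forced
into an unsigned 32-bit Oid prints as the largest unsigned value:

#include <stdio.h>
#include <stdint.h>

typedef uint32_t Oid;			/* PostgreSQL Oids are unsigned 32-bit values */
#define InvalidOid ((Oid) 0)

int
main(void)
{
	Oid		userid = InvalidOid;	/* public user mapping: userid is 0 */

	printf("%u\n", userid);			/* prints 0 */
	printf("%u\n", (Oid) -1);		/* prints 4294967295, not -1 */
	return 0;
}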

On Mon, Feb 8, 2016 at 9:37 PM, Tom Lane  wrote:

> Etsuro Fujita  writes:
> > Here is a patch to use %u not %d to print umid and userid.
>
> Pushed, thanks.
>
> regards, tom lane
>
>
>



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-08 Thread Amit Langote

Hi Suraj,

On 2016/02/09 12:16, kharagesuraj wrote:
> Hello,
> 
> 
>>> I agree with first version, and attached the updated patch which are
>>> modified so that it supports simple multiple sync replication you
>>> suggested.
>>> (but test cases are not included yet.)
> 
> I have tried for some basic in-built test cases for multisync rep.
> I have created one patch over Michael's patch
> (http://www.postgresql.org/message-id/CAB7nPqTEqou=xryrgsga13qw1xxssd6tfhz-sm_j3egdvso...@mail.gmail.com).
> Still it is in progress.
> Please have a look, correct me if I am wrong, and suggest remaining test
> cases.
> 
> recovery_test_suite_with_multisync.patch (36K) 
> 

Thanks for creating the patch. Sorry to nitpick, but as has been brought up
before, it's better to send patches as email attachments (that is, not as
links to external sites).

Also, it would be helpful if your patch were submitted as a diff on top of
Michael's patch. That is, include only the stuff specific to testing the
multiple sync feature and let the rest be taken care of by Michael's base
patch.

Thanks,
Amit






Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-08 Thread Kyotaro HORIGUCHI
Hello,

At Tue, 9 Feb 2016 00:48:57 +0900, Fujii Masao  wrote in 

> On Fri, Feb 5, 2016 at 5:36 PM, Michael Paquier
>  wrote:
> > On Thu, Feb 4, 2016 at 11:06 PM, Michael Paquier
> >  wrote:
> >> On Thu, Feb 4, 2016 at 10:49 PM, Michael Paquier
> >>  wrote:
> >>> On Thu, Feb 4, 2016 at 10:40 PM, Robert Haas  
> >>> wrote:
>  On Thu, Feb 4, 2016 at 2:21 PM, Michael Paquier
>   wrote:
> > Yes, please let's use the custom language, and let's not care of not
> > more than 1 level of nesting so as it is possible to represent
> > pg_stat_replication in a simple way for the user.
> 
>  "not" is used twice in this sentence in a way that renders me not able
>  to be sure that I'm not understanding it not properly.
> >>>
> >>> 4 times here. Score beaten.
> >>>
> >>> Sorry. Perhaps I am tired... I was just wondering if it would be fine
> >>> to only support configurations up to one level of nested objects, like
> >>> that:
> >>> 2[node1, node2, node3]
> >>> node1, 2[node2, node3], node3
> >>> In short, we could restrict things so as we cannot define a group of
> >>> nodes within an existing group.
> >>
> >> No, actually, that's stupid. Having up to two nested levels makes more
> >> sense, a quite common case for this feature being something like that:
> >> 2{node1,[node2,node3]}
> >> In short, sync confirmation is waited from node1 and (node2 or node3).
> >>
> >> Flattening groups of nodes with a new catalog will be necessary to
> >> ease the view of this data to users:
> >> - group name?
> >> - array of members with nodes/groups
> >> - group type: quorum or priority
> >> - number of items to wait for in this group
> >
> > So, here are some thoughts to make that more user-friendly. I think
> > that the critical issue here is to properly flatten the meta data in
> > the custom language and represent it properly in a new catalog,
> > without messing up too much with the existing pg_stat_replication that
> > people are now used to for 5 releases since 9.0. So, I would think
> > that we will need to have a new catalog, say
> > pg_stat_replication_groups with the following things:
> > - One line of this catalog represents the status of a group or of a single 
> > node.
> > - The status of a node/group is either sync or potential, if a
> > node/group is specified more than once, it may be possible that it
> > would be sync and potential depending on where it is defined, in which
> > case setting its status to 'sync' has the most sense. If it is in sync
> > state I guess.
> > - Move sync_priority and sync_state, actually an equivalent from
> > pg_stat_replication into this new catalog, because those represent the
> > status of a node or group of nodes.
> > - group name, and by that I think that we had perhaps better make
> > mandatory the need to append a name with a quorum or priority group.
> > The group at the highest level is forcibly named as 'top', 'main', or
> > whatever if not directly specified by the user. If the entry is
> > directly a node, use the application_name.
> > - Type of group, quorum or priority
> > - Elements in this group, an element can be a group name or a node
> > name, aka application_name. If group is of type priority, the elements
> > are listed in increasing order. So the elements with lower priority
> > get first, etc. We could have one column listing explicitly a list of
> > integers that map with the elements of a group but it does not seem
> > worth it, what users would like to know is what are the nodes that are
> > prioritized. This covers the former 'priority' field of
> > pg_stat_replication.
> >
> > We may have a good idea of how to define a custom language, still we
> > are going to need to design a clean interface at catalog level more or
> > less close to what is written here. If we can get a clean interface,
> > the custom language implemented, and TAP tests that take advantage of
> > this user interface to check the node/group statuses, I guess that we
> > would be in good shape for this patch.
> >
> > Anyway that's not a small project, and perhaps I am over-complicating
> > the whole thing.
> >
> > Thoughts?
> 
> I agree that we would need something like such new view in the future,
> however it seems too late to work on that for 9.6 unfortunately.
> There is only one CommitFest left. Let's focus on very simple case, i.e.,
> 1-level priority list, now, then we can extend it to cover other cases.
> 
> If we can commit the simple version too early and there is enough
> time before the date of feature freeze, of course I'm happy to review
> the extended version like you proposed, for 9.6.

I agree with Fujii-san. There would be many convenient gadgets
around this and they are completely welcome, but having the
fundamental functionality in 9.6 seems to be far more beneficial for
most of us.

At least once the extensible syntax is fixed, internal structures can
be gradually extended along with syntactical enhancements. Over
three levels of definition or group name

Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-08 Thread Michael Paquier
On Tue, Feb 9, 2016 at 12:16 PM, kharagesuraj 
wrote:

> Hello,
>
>
>
>
>
> >> I agree with first version, and attached the updated patch which are
> >> modified so that it supports simple multiple sync replication you
> >>suggested.
> >> (but test cases are not included yet.)
>
>
>
> I have tried for some basic in-built test cases for multisync rep.
>
> I have created one patch over Michael's patch
> (http://www.postgresql.org/message-id/CAB7nPqTEqou=[hidden email]).
>
> Still it is in progress.
>
> Please have look and correct me if i am wrong and suggest remaining test
> cases.
>

So the interesting part of this patch is 006_sync_rep.pl. I think that you
had better build something on top of my patch as a separate patch. This
would make things clearer.

+my $result = $node_master->psql('postgres', "select application_name,
sync_state from pg_stat_replication;");
+print "$result \n";
+is($result, "standby_1|sync\nstandby_2|sync\nstandby_3|potential",
'checked for sync standbys state initially');
Now regarding the test, you visibly got the idea, though I think that we'd
want to update the parameters of postgresql.conf a bit and re-run those
queries a couple of times; that's cheaper than having to re-create new
cluster nodes all the time. So just create a base, then switch s_s_names a
bit, and query pg_stat_replication - you are already doing the latter.

Also, please attach patches directly to your emails. When something is
uploaded to Nabble it is located only there and not within postgresql.org,
which would be annoying if Nabble disappears at some point. You would also
want to use an email client directly and interact with the community mailing
lists that way instead of going through Nabble's forum-like interface (I
have never used it and am not really willing to, but I guess that is what
it amounts to).

I am attaching what you posted on this email for the archive's sake.
-- 
Michael


recovery_test_suite_with_multisync.patch
Description: application/download



Re: [HACKERS] WAL Re-Writes

2016-02-08 Thread Amit Kapila
On Mon, Feb 8, 2016 at 8:16 PM, Andres Freund  wrote:
>
> On 2016-02-08 10:38:55 +0530, Amit Kapila wrote:
> > I think deciding it automatically, without requiring the user to configure
> > it, certainly has merits, but what about some cases where users can get
> > benefits by configuring it themselves, like the cases where we use the
> > PG_O_DIRECT flag for WAL (with O_DIRECT, it will bypass OS
> > buffers and won't cause misaligned writes even for smaller chunk sizes
> > like 512 bytes or so)?  Some googling [1] reveals that other databases
> > also provide users with an option to configure the WAL block/chunk size (as
> > BLOCKSIZE), although they seem to decide the chunk size based on
> > the disk-sector size.
>
> FWIW, you usually can't do that small writes with O_DIRECT. Usually it
> has to be 4KB (pagesize) sized, aligned (4kb again) writes. And on
> filesystems that do support doing such writes, they essentially fall
> back to doing buffered IO.
>

I have not observed this during the tests (the observation is based on the
fact that whenever the OS buffer cache is used, writing in smaller
chunks (less than 4K) leads to reads and in turn decreases
performance). I don't see such an implication even in the documentation.
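
Just to illustrate the alignment constraint being discussed, here is a
minimal Linux-flavoured sketch (not PostgreSQL code; the file name and
sizes are made up) of an O_DIRECT | O_SYNC write with a 4KB-aligned buffer.
On many filesystems a 512-byte write through the same descriptor would fail
or fall back to buffered I/O:

#define _GNU_SOURCE				/* for O_DIRECT on Linux */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
	const size_t blksz = 4096;	/* typical filesystem block size */
	void	   *buf;
	int			fd;

	/* O_DIRECT generally wants buffer address, offset and length aligned */
	if (posix_memalign(&buf, blksz, blksz) != 0)
		return 1;
	memset(buf, 0, blksz);

	fd = open("walfile.tmp", O_WRONLY | O_CREAT | O_DIRECT | O_SYNC, 0600);
	if (fd < 0)
		return 1;

	if (write(fd, buf, blksz) != (ssize_t) blksz)
		return 1;

	close(fd);
	free(buf);
	return 0;
}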

> > An additional thought, which is not necessarily related to this patch, is:
> > if the user chooses, and/or we decide, to write in 512-byte sized chunks,
> > which is usually a disk sector size, then can't we think of avoiding the
> > CRC for each record in such cases, because each WAL write will in
> > itself be atomic?  While reading, if we process in wal-chunk-sized
> > units, then I think it should be possible to detect end-of-wal based
> > on the data read.
>
> O_DIRECT doesn't give any useful guarantees to do something like the
> above. It doesn't have any ordering or durability implications. You
> still need to do fdatasyncs and such.
>

It doesn't need to, if we use the O_SYNC flag, which we always use whenever
we use O_DIRECT mode during WAL writes.


> Besides, with the new CRC implications, that doesn't really seem like
> such a large win anyway.
>

I haven't checked yet how big a win we can get if we
can avoid CRCs and still provide the same reliability, but I think it can
certainly save CPU instructions both during writes and replay, and
performance should be better than it currently is.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Tracing down buildfarm "postmaster does not shut down" failures

2016-02-08 Thread Tom Lane
Noah Misch  writes:
> On Mon, Feb 08, 2016 at 02:15:48PM -0500, Tom Lane wrote:
>> We've seen variants
>> on this theme on half a dozen machines just in the past week --- and it
>> seems to mostly happen in 9.5 and HEAD, which is fishy.

> It has been affecting only the four AIX animals, which do share hardware.
> (Back in 2015 and once in 2016-01, it did affect axolotl and shearwater.)

Certainly your AIX critters have shown this a bunch, but here's another
current example:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=axolotl&dt=2016-02-08%2014%3A49%3A23

> That's reasonable.  If you would like higher-fidelity data, I can run loops of
> "pg_ctl -w start; make installcheck; pg_ctl -t900 -w stop", and I could run
> that for HEAD and 9.2 simultaneously.  A day of logs from that should show
> clearly if HEAD is systematically worse than 9.2.

That sounds like a fine plan, please do it.

> So, I wish to raise the timeout for those animals.  Using an environment
> variable was a good idea; it's one less thing for test authors to remember.
> Since the variable affects a performance-related fudge factor rather than
> change behavior per se, I'm less skittish than usual about unintended
> consequences of dynamic scope.  (With said unintended consequences in mind, I
> made "pg_ctl register" ignore PGCTLTIMEOUT rather than embed its value into
> the service created.)

While this isn't necessarily a bad idea in isolation, the current
buildfarm scripts explicitly specify a -t value to pg_ctl stop, which
I would not expect an environment variable to override.  So we need
to fix the buildfarm script to allow the timeout to be configurable.
I'm not sure if there would be any value-add in having pg_ctl answer
to an environment variable once we've done that.

regards, tom lane




Re: [HACKERS] WAL Re-Writes

2016-02-08 Thread Amit Kapila
On Mon, Feb 8, 2016 at 8:11 PM, Robert Haas  wrote:
>
> On Mon, Feb 8, 2016 at 12:08 AM, Amit Kapila 
wrote:
> > I think deciding it automatically, without requiring the user to configure
> > it, certainly has merits, but what about some cases where users can get
> > benefits by configuring it themselves, like the cases where we use the
> > PG_O_DIRECT flag for WAL (with O_DIRECT, it will bypass OS
> > buffers and won't cause misaligned writes even for smaller chunk sizes
> > like 512 bytes or so)?  Some googling [1] reveals that other databases
> > also provide users with an option to configure the WAL block/chunk size (as
> > BLOCKSIZE), although they seem to decide the chunk size based on
> > the disk-sector size.
>
> Well, if you can prove that we need that flexibility, then we should
> have a GUC.  Where's the benchmarking data to support that conclusion?
>

It is not posted yet, as some more work is needed to complete the
benchmark results when PG_O_DIRECT is used (mainly with
open_sync and open_datasync).  I will do so.  But I think the main thing
which needs to be taken care of is that, as smaller-chunk-sized writes are
useful only in some cases, we need to ensure that users do not
get baffled by them.  There are multiple ways to provide this:

a) At startup, we ensure that if the user has set a smaller chunk size
(other than 4KB, which will be the default, decided at configure time in
the way you described upthread) and it can use PG_O_DIRECT
as we decide in get_sync_bit(), then allow it; otherwise either return an
error or just set it to the default, which is 4KB.

b) Mention in the docs that it is better not to tinker with the
wal_chunk_size GUC unless you have the other relevant settings (like
wal_sync_method = open_sync or open_datasync) and wal_level at its default.

c) Yet another option is to just go with 4KB-sized chunks for now, as the
benefit of not doing so applies only to a subset of the cases we can
support.

The reason why I think it is beneficial to provide an option of writing in
smaller chunks is that it could reduce the amount of re-writes
by a higher percentage where it can be used.  For example, at 4KB
there is a ~35% reduction; similarly, at smaller chunks it could give us
savings of up to 50% or 70% depending on the chunk size.


>
> > An additional thought, which is not necessarily related to this patch, is:
> > if the user chooses, and/or we decide, to write in 512-byte sized chunks,
> > which is usually a disk sector size, then can't we think of avoiding the
> > CRC for each record in such cases, because each WAL write will in
> > itself be atomic?  While reading, if we process in wal-chunk-sized
> > units, then I think it should be possible to detect end-of-wal based
> > on the data read.
>
> Gosh, taking CRCs off of WAL records sounds like a terrible idea.  I'm
> not sure why you think that writing in sector-sized chunks would make
> that any more safe, because to me it seems like it wouldn't.  But even
> if it does, it's hard to believe that we don't derive some reliability
> from CRCs that we would lose without them.
>

I think the point here is not about more safety; rather it is about whether
writing in disk-sector sizes gives the same reliability as CRCs, because
if it does, then not doing the CRC calculation for each record, both while
writing and during replay, can save CPU and should in turn lead to better
performance.  Now, the reason why I thought it could give equal reliability
is that disk-sector writes are atomic, so that should buy us the
reliability.
I admit that much more analysis/research is required before doing that,
and we can do it later if it proves to be valuable in terms of
performance and reliability.  I mentioned it here to say that writing in
smaller chunks has other potential benefits.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Tracing down buildfarm "postmaster does not shut down" failures

2016-02-08 Thread Noah Misch
On Mon, Feb 08, 2016 at 02:15:48PM -0500, Tom Lane wrote:
> Of late, by far the majority of the random-noise failures we see in the
> buildfarm have come from failure to shut down the postmaster in a
> reasonable timeframe.

> We've seen variants
> on this theme on half a dozen machines just in the past week --- and it
> seems to mostly happen in 9.5 and HEAD, which is fishy.

It has been affecting only the four AIX animals, which do share hardware.
(Back in 2015 and once in 2016-01, it did affect axolotl and shearwater.)  I
agree the concentration on 9.5 and HEAD is suspicious; while those branches
get the most buildfarm runs, that factor by itself doesn't explain the
distribution of failures among versions.

> What I'd like to do to investigate this is put in a temporary HEAD-only
> patch that makes ShutdownXLOG() and its subroutines much chattier about
> how far they've gotten and what time it is, and also makes pg_ctl print
> out the current time if it gives up waiting.  A few failed runs with
> that in place will at least allow us to confirm or deny whether it's
> just that the shutdown checkpoint is sometimes really slow, or whether
> there's a bug lurking.
> 
> Any objections?  Anybody have another idea for data to collect?

That's reasonable.  If you would like higher-fidelity data, I can run loops of
"pg_ctl -w start; make installcheck; pg_ctl -t900 -w stop", and I could run
that for HEAD and 9.2 simultaneously.  A day of logs from that should show
clearly if HEAD is systematically worse than 9.2.  By the way, you would
almost surely qualify for an account on this machine.


I had drafted the following message and patch last week, and I suppose it
belongs in this thread:

On Mon, Oct 12, 2015 at 06:41:06PM -0400, Tom Lane wrote:
> I'm not sure if this will completely fix our problems with "pg_ctl start"
> related buildfarm failures on very slow critters.  It does get rid of the
> hard wired 5-second timeout, but the 60-second timeout could still be an
> issue.  I think Noah was considering a patch to allow that number to be
> raised.  I'd be in favor of letting pg_ctl accept a default timeout length
> from an environment variable, and then the slower critters could be fixed
> by adjusting their buildfarm configurations.

Your commit 6bcce25 made src/bin test suites stop failing due to pg_ctl
startup timeouts, but other suites have been failing on the AIX buildfarm zoo
due to slow shutdown.  Example taking 72s to even reach ShutdownXLOG():
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=sungazer&dt=2016-01-15%2012%3A43%3A00

So, I wish to raise the timeout for those animals.  Using an environment
variable was a good idea; it's one less thing for test authors to remember.
Since the variable affects a performance-related fudge factor rather than
change behavior per se, I'm less skittish than usual about unintended
consequences of dynamic scope.  (With said unintended consequences in mind, I
made "pg_ctl register" ignore PGCTLTIMEOUT rather than embed its value into
the service created.)

Thanks,
nm
diff --git a/doc/src/sgml/ref/pg_ctl-ref.sgml b/doc/src/sgml/ref/pg_ctl-ref.sgml
index eaa0cc8..6ceb781 100644
--- a/doc/src/sgml/ref/pg_ctl-ref.sgml
+++ b/doc/src/sgml/ref/pg_ctl-ref.sgml
@@ -362,7 +362,9 @@ PostgreSQL documentation
   

 The maximum number of seconds to wait when waiting for startup or
-shutdown to complete.  The default is 60 seconds.
+shutdown to complete.  Defaults to the value of the
+PGCTLTIMEOUT environment variable or, if not set, to 60
+seconds.

   
  
@@ -487,6 +489,17 @@ PostgreSQL documentation
 
   

+PGCTLTIMEOUT
+
+
+ 
+  Default limit on the number of seconds to wait when waiting for startup
+  or shutdown to complete.  If not set, the default is 60 seconds.
+ 
+
+   
+
+   
 PGDATA
 
 
diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index 9da38c4..bae6c22 100644
--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -72,6 +72,7 @@ typedef enum
 static bool do_wait = false;
 static bool wait_set = false;
 static int wait_seconds = DEFAULT_WAIT;
+static bool wait_seconds_arg = false;
 static bool silent_mode = false;
 static ShutdownMode shutdown_mode = FAST_MODE;
 static int sig = SIGINT;   /* default */
@@ -1431,7 +1432,8 @@ pgwin32_CommandLine(bool registration)
if (registration && do_wait)
appendPQExpBuffer(cmdLine, " -w");
 
-   if (registration && wait_seconds != DEFAULT_WAIT)
+   /* Don't propagate a value from an environment variable. */
+   if (registration && wait_seconds_arg && wait_seconds != DEFAULT_WAIT)
appendPQExpBuffer(cmdLine, " -t %d", wait_seconds);
 
if (registration && silent_mode)
@@ -2128,6 +2130,7 @@ main(int argc, char **argv)
{NULL, 0, NULL, 0}
};
 
+   char   *env_wait;
int option_

Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-08 Thread kharagesuraj
Hello,


>> I agree with first version, and attached the updated patch which are
>> modified so that it supports simple multiple sync replication you
>>suggested.
>> (but test cases are not included yet.)

I have tried for some basic in-built test cases for multisync rep.
I have created one patch over Michael's patch
(http://www.postgresql.org/message-id/CAB7nPqTEqou=xryrgsga13qw1xxssd6tfhz-sm_j3egdvso...@mail.gmail.com).
Still it is in progress.
Please have a look, correct me if I am wrong, and suggest remaining test cases.

Regards
Suraj Kharage




recovery_test_suite_with_multisync.patch (36K) 






Re: [HACKERS] 2016-01 Commitfest

2016-02-08 Thread Amit Langote
On 2016/02/09 6:46, Magnus Hagander wrote:
> On Mon, Feb 8, 2016 at 10:19 PM, Alvaro Herrera 
> wrote:
> 
>> Hi everybody,
>>
>> I just closed the last few remaining items in the commitfest.  This is
>> the final summary:
>>
>>  Committed: 32.
>>  Moved to next CF: 32.
>>  Rejected: 2.
>>  Returned with Feedback: 33.
>> Total: 99.
>>
>> I think we did a fairly decent job this time around: we only passed a
>> third of the patches to the upcoming commitfest, which seems pretty
>> decent.  The number of patches that we moved unreviewed was very small,
>> which is even better -- hopefully we're not letting too many people
>> down.
>>
>> I'm closing this commitfest now.  We even have three weeks before the
>> next one starts, so everybody can take a break for once!  Yay!
> 
> 
> Thanks for taking on the thankless task of pushing things along!

+1, thank you very much!

Regards,
Amit






Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-02-08 Thread Michael Paquier
On Mon, Feb 8, 2016 at 11:24 PM, Amit Kapila  wrote:
> On Mon, Feb 8, 2016 at 12:28 PM, Michael Paquier 
> wrote:
>>
>>
>> >>   /*
>> >> +  * Fetch the progress position before taking any WAL insert lock. This
>> >> +  * is normally an operation that does not take long, but leaving this
>> >> +  * lookup out of the section taken an exclusive lock saves a couple
>> >> +  * of instructions.
>> >> +  */
>> >> + progress_lsn = GetProgressRecPtr();
>> >
>> > too long for my taste. How about:
>> > /* get progress, before acuiring insert locks to shorten locked section */
>>
>> Check.
>>
>
> What is the need of holding locks one-by-one during checkpoint when
> we anyway are going to take lock on all the insertion slots.

A couple of records can slip in while scanning the progress LSN
through all the locks.

> + * to not rely on taking an exclusive lock an all the WAL insertion locks,
>
> /an all/on all

Nice catch.
-- 
Michael




Re: [HACKERS] a raft of parallelism-related bug fixes

2016-02-08 Thread Andres Freund
Hi!

Thanks for the answer. Sounds good.

On 2016-02-08 18:47:18 -0500, Robert Haas wrote:
> and if I'd gone out of my way to say "hey, everybody, here's a patch
> that you might want to object to" I'm sure I could have found some
> volunteers to do just that.  But, you know, that's not really what I
> want.

Sometimes I wonder if three shooting-from-the-hip answers shouldn't cost
a jog around the block or such (of which I'm sometimes guilty as
well!). Wouldn't just help the on-list volume, but also our collective
health ;)

> Unless you or Noah want to take a hand, I don't expect to get that
> sort of review.  Now, that having been said, I think your frustration
> with the way I handled it is somewhat justified, and since you are not
> arguing for a revert I'm not sure what I can do except try not to let
> my frustration get in the way next time.  Which I will try to do.

FWIW, I do hope to put more time into reviewing parallelism stuff in the
coming weeks. It's hard to balance all that one likes to do.

- Andres




Re: [HACKERS] a raft of parallelism-related bug fixes

2016-02-08 Thread Robert Haas
On Mon, Feb 8, 2016 at 5:27 PM, Andres Freund  wrote:
>> > contentious issue, a few days after the initial post. Hm. Not sure how
>> > you'd react if you weren't the author.
>>
>> Probably not very well.  Do you want me to revert it?
>
> No. I want(ed) to express that I am not comfortable with how this got
> in. My aim wasn't to generate a flurry of responses with everybody
> piling on, or anything like that. But it's unfortunately hard to
> avoid. I wish I knew a way, besides only sending private mails. Which I
> don't think is a great approach either.
>
> I do agree that we need something to tackle this problem, and that this
> quite possibly is the least bad way to do this. And certainly the only
> one that's been implemented and posted with any degree of completeness.
>
> But even given the last paragraph, posting a complex new patch in a
> somewhat related thread, and then pushing it 5 days later is pretty darn
> quick.

Sorry.  I understand your discomfort, and you're probably right.  I'll
try to handle it better next time.  I think my frustration with the
process got the better of me a little bit here.  This patch may very
well not be perfect, but it's sure as heck better than doing nothing,
and if I'd gone out of my way to say "hey, everybody, here's a patch
that you might want to object to" I'm sure I could have found some
volunteers to do just that.  But, you know, that's not really what I
want.  What I want is somebody to do a detailed review and help me fix
whatever the problems the patch may have.  And ideally, I'd like that
person to understand that you can't have parallel query without doing
something in this area - which I think you do, but certainly not
everybody probably did - and that a lot of simplistic, non-invasive
ideas for how to handle this are going to be utterly inadequate in
complex cases.  Unless you or Noah want to take a hand, I don't expect
to get that sort of review.  Now, that having been said, I think your
frustration with the way I handled it is somewhat justified, and since
you are not arguing for a revert I'm not sure what I can do except try
not to let my frustration get in the way next time.  Which I will try
to do.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] proposal: make NOTIFY list de-duplication optional

2016-02-08 Thread Merlin Moncure
On Mon, Feb 8, 2016 at 2:33 PM, Filip Rembiałkowski
 wrote:
> Here is my next try, after suggestions from -perf and -hackers list:
>
> * no GUC
>
> * small addition to NOTIFY grammar: NOTIFY ALL/DISTINCT
>
> * corresponding, 3-argument version of pg_notify(text,text,bool)

This is all sounding pretty good.  I wonder whether the third argument
should be a boolean, however.  If we made it a text 'send mode' instead,
we could leave some room for more specialization of the queuing
behavior.

For example, we've had a couple of requests over the years to have an
'immediate' mode which dumps the notification immediately to the
client without waiting for tx commit. This may or may not be a good
idea, but if it was ultimately proved to be, it could be introduced as
an alternate mode without adding an extra function.
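
For illustration, a minimal sketch of the two shapes being compared here.
The boolean form is the one in Filip's patch; the text "send mode" calls are
purely hypothetical and only show where a mode argument would leave room to
grow:

-- Proposed in the patch: boolean third argument (true behaves like ALL,
-- false like DISTINCT, the current default)
SELECT pg_notify('jobs', 'job 42 done', true);

-- Hypothetical alternative sketched above: a text "send mode" argument,
-- leaving room for future modes such as 'immediate'
-- SELECT pg_notify('jobs', 'job 42 done', 'all');
-- SELECT pg_notify('jobs', 'job 42 done', 'immediate');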

merlin




Re: [HACKERS] enable parallel query by default?

2016-02-08 Thread Corey Huinker
> I think that's an argument to enable it till at least beta1. Let's
> change the default, and add an item to the open items list to reconsider
> then.
+1 during the beta, +0.95 for default thereafter.

I think that most databases in the past have defaulted to single-core
unless otherwise stated, because machines with multiple cores were
uncommon, and queries that could intelligently use parallelism were even
less common. So for them, the name of the game was "plan stability".

Machines are architected to be multicore now, and that will be the norm
going forward, as will larger workloads that can easily overwhelm a single
CPU. So I think Postgres should just enable parallelism out of the box.

Having said that, it seems like the sort of thing I'd want settable on a
per-user basis.

ALTER ROLE overly_needy_web_client SET max_parallel_degree = 1;
ALTER ROLE moar_powarrr SET max_parallel_degree = 32;

And of course this is my chance to re-ask that we not block the possibility
of one day being able to set this value relative to the number of cores
available on the machine, i.e. this user can have a parallel degree of 2x
the number of CPUs, while this one can only have 0.25x as many. It would be
nice to have our configurations adapt to the hardware.


Re: [HACKERS] 2016-01 Commitfest

2016-02-08 Thread David Steele
On 2/8/16 4:19 PM, Alvaro Herrera wrote:
> 
> I'm closing this commitfest now.  We even have three weeks before the
> next one starts, so everybody can take a break for once!  Yay!

Many thanks, Álvaro!

By the way, I would be happy to manage the next commitfest.  I've been
watching the process for a while and I think I have a good handle on
what to do. I'm nothing if not organized.

I know this is the last commitfest of the 9.6 development cycle, so I'll
understand if the community does not think I have enough experience, but
to my mind it doesn't look all that hard, just time-consuming and
tedious.

-- 
-David
da...@pgmasters.net





Re: [HACKERS] a raft of parallelism-related bug fixes

2016-02-08 Thread Peter Geoghegan
On Mon, Feb 8, 2016 at 2:35 PM, Andres Freund  wrote:
> I think having a public git tree, that contains the current state, is
> greatly helpful for that. Just announce that you're going to screw
> wildly with history, and that you're not going to be terribly careful
> about commit messages.  That means observers can just do a fetch and a
> reset --hard to see the absolutely latest and greatest.  By all means
> post a series to the list every now and then, but I think for minor
> changes it's perfectly sane to say 'pull to see the fixups for the
> issues you noticed'.

I would really like for there to be a way to do that more often. It
would be a significant time saver, because it removes problems with
minor bitrot.

-- 
Peter Geoghegan




Re: [HACKERS] a raft of parallelism-related bug fixes

2016-02-08 Thread Andres Freund
On 2016-02-08 15:18:13 -0500, Robert Haas wrote:
> I agree that you had to be pretty deeply involved in that thread to
> follow everything that was going on.  But it's not entirely fair to
> say that it was impossible for anyone else to get involved.   Both
> Amit and I, mostly Amit, posted directions at various times saying:
> here is the sequence of patches that you currently need to apply as of
> this time.  There was not a heck of a lot of evidence that anyone was
> doing that, though, though I think a few people did, and towards the
> end things changed very quickly as I committed patches in the series.
> We certainly knew what each other were doing and not because of some
> hidden off-list collaboration that we kept secret from the community -
> we do talk every week, but almost all of our correspondence on those
> patches was on-list.

I think having a public git tree, that contains the current state, is
greatly helpful for that. Just announce that you're going to screw
wildly with history, and that you're not going to be terribly careful
about commit messages.  That means observers can just do a fetch and a
reset --hard to see the absolutely latest and greatest.  By all means
post a series to the list every now and then, but I think for minor
changes it's perfectly sane to say 'pull to see the fixups for the
issues you noticed'.


> I think it's an inherent peril of complicated patch sets that people
> who are not intimately involved in what is going on will have trouble
> following just because it takes a lot of work.

True. But it becomes doubly hard if there's no up-to-date high level
design overview somewhere outside $sizeable_brain. I know it sucks to
write these, believe me. Especially because one definitely feels that
nobody is reading those.

Greetings,

Andres Freund




Re: [HACKERS] a raft of parallelism-related bug fixes

2016-02-08 Thread Andres Freund
Hi Robert,

On 2016-02-08 13:45:37 -0500, Robert Haas wrote:
> > I realize that this stuff has all been brewing long, and that there's
> > still a lot to do. So you gotta keep moving. And I'm not sure that
> > there's anything wrong or if there's any actually better approach. But
> > pushing an unreviewed, complex patch, that originated in a thread
> > orginally about different relatively small/mundane items, for a
> > contentious issue, a few days after the initial post. Hm. Not sure how
> > you'd react if you weren't the author.
> 
> Probably not very well.  Do you want me to revert it?

No. I want(ed) to express that I am not comfortable with how this got
in. My aim wasn't to generate a flurry of responses with everybody
piling on, or anything like that. But it's unfortunately hard to
avoid. I wish I knew a way, besides only sending private mails. Which I
don't think is a great approach either.

I do agree that we need something to tackle this problem, and that this
quite possibly is the least bad way to do this. And certainly the only
one that's been implemented and posted with any degree of completeness.

But even given the last paragraph, posting a complex new patch in a
somewhat related thread, and then pushing it 5 days later is pretty darn
quick.


> I mean, look.  [explanation why we need the infrastructure].  Do I really
> think anybody was going to spend the time to understand deadlock.c
> well enough to verify my changes?  No, I don't.  What I think would
> have happened is that the patch would have sat around like an
> albatross around my neck - totally ignored by everyone - until the end
> of the last CF, and then the discussion would have gone one of three
> ways:

Yes, believe me, I really get that. It's awfully hard to get substantial
review for pieces of code that require a lot of context.

But I think posting this patch in a new thread, posting a message that
you're intending to commit unless somebody protests with substantial
arguments and/or a timeline for review, and then waiting a few days, is
something that should be done for a major piece of new infrastructure,
especially when it's somewhat controversial.

This doesn't just affect parallel execution; it affects one of the least
understood parts of the postgres code, where hard-to-find bugs, likely
to trigger only in production, are to be expected.


> And, by the way, the patch, aside from the deadlock.c portion, was
> posted back in October, admittedly without much fanfare, but nobody
> reviewed that or any other patch on this thread.

I think it's unrealistic to expect random patches without a commitfest
entry, posted somewhere deep in a thread, to get a review when there are
so many open commitfest entries that haven't gotten feedback, and which
we are supposed to look at.


> If I'd waited for those reviews to come in, parallel query would not
> be committed now, nor probably in 9.6, nor probably in 9.7 or 9.8
> either.  The whole project would just be impossible on its face.

Yes, that's a problem. But you're not the only one facing it, and you've
argued hard against such an approach in some other cases.


> I think it's myopic to say "well, but this patch might have bugs".
> Very true.  But also, all the other parallelism patches that are
> already committed or that are still under review but which can't be
> properly tested without this patch might have bugs, too, so you've got
> to weigh the risk that this patch might get better if I wait longer to
> commit it against the possibility that not having committed it reduces
> the chances of finding bugs elsewhere.  I don't want it to seem like
> I'm forcing this down the community's throat - I don't have a right to
> do that, and I will certainly revert this patch if that is the
> consensus.  But that is not what I think best myself.  What I think
> would be better is to (1) make an effort to get the buildfarm testing
> which this patch enables up and running as soon as possible and (2)
> for somebody to read over the committed code and raise any issues that
> they find.  Or, for that matter, to read over the committed code for
> any of the *other* parallelism patches and raise any issues that they
> find with *that* code.  There's certainly scads of code here and this
> is far from the only bit that might have bugs.

I think you are walking, and *have to* walk, a very thin line here. I agree
that realistically there's just nobody with the bandwidth to keep up
with a fully loaded Robert. Not without ignoring their own stuff at
least. And I think the importance of what you're building means we need
to be flexible.  But I think that thin line in turn means that you have
to be *doubly* careful about communication. I.e. post new infrastructure
to new threads, "warn" that you're intending to commit something
potentially needing debate/review, etc.


> Oh: another thing that I would like to do is commit the isolation
> tests I wrote for the deadlock detector a while back, which nobody has
> 

Re: [HACKERS] a raft of parallelism-related bug fixes

2016-02-08 Thread Robert Haas
On Mon, Feb 8, 2016 at 4:54 PM, Peter Geoghegan  wrote:
> On Mon, Feb 8, 2016 at 1:45 PM, Robert Haas  wrote:
>> Far from the negligence that you seem to be implying, I think Amit was
>> remarkably diligent about providing these kinds of updates.
>
> I don't think I remotely implied negligence. That word has very severe
> connotations (think "criminal negligence") that are far from what I
> intended.

OK, sorry, I think I misread your tone.

> I don't want to get stuck on that one example, which I acknowledged
> might not be representative when I raised it. I'm not really talking
> about parallel query in particular anyway. I'm mostly arguing for a
> consistent way to get instructions on how to at least build the patch,
> where that might be warranted.
>
> The CF app is one way. Another good way is: As long as we're using a
> patch series, be explicit about what goes where in the commit message.
> Have message-id references. That sort of thing. I already try to do
> that. That's all.

Yeah, me too.  Generally, although with some exceptions, my practice
is to keep reposting the whole patch stack, so that everything is in
one email.  In this particular case, though, there were patches from
me and patches from Amit, so that was harder to do.  I wasn't using
his patches to test my patches; I had other test code for that.  He
was using my patches as a base for his patches, but linked to them
instead of reposting them.  That's an unusually complicated scenario,
though: it's pretty rare around here to have two developers working
together on something as closely as Amit and I did on those patches.

> Thank you (and Amit) for working really hard on parallelism.

Thanks.

By the way, it bears saying, or if I've said it before repeating, that
although most of the parallelism code that has been committed was
written by me, Amit has made an absolutely invaluable contribution to
parallel query, and it wouldn't be committed today or maybe ever
without that contribution.  In addition to those parts of the code
that were committed as he wrote them, he prototyped quite a number of
things that I ended up rewriting, reviewed a ton of code that I wrote
and found bugs in it, wrote numerous bits and pieces of test code, and
generally put up with an absolutely insane level of me nitpicking his
work, breaking it by committing pieces of it or committing different
pieces that replaced pieces he had, demanding repeated rebases on
short time scales, and generally beating him up in just about every
conceivable way.  I am deeply appreciative of him being willing to
jump into this project, do a ton of work, and put up with me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] 2016-01 Commitfest

2016-02-08 Thread Tom Lane
Alvaro Herrera  writes:
> I'm closing this commitfest now.  We even have three weeks before the
> next one starts, so everybody can take a break for once!  Yay!

Yay!  And thanks for running this one --- I know it's a mighty
tedious and thankless task.

regards, tom lane




Re: [HACKERS] a raft of parallelism-related bug fixes

2016-02-08 Thread Peter Geoghegan
On Mon, Feb 8, 2016 at 1:45 PM, Robert Haas  wrote:
> Far from the negligence that you seem to be implying, I think Amit was
> remarkably diligent about providing these kinds of updates.

I don't think I remotely implied negligence. That word has very severe
connotations (think "criminal negligence") that are far from what I
intended.

> I admittedly didn't duplicate those same updates on the parallel
> mode/contexts thread to which you replied, but that's partly because I
> would often whack around that patch first and then Amit would adjust
> his patch to cope with my changes after the fact.  That doesn't seem
> to have been the case in this particular example, but if this is the
> closest thing you can come up with to a process failure during the
> development of parallel query, I'm not going to be sad about it: I'm
> going to have a beer.  Seriously: we worked really hard at this.

I don't want to get stuck on that one example, which I acknowledged
might not be representative when I raised it. I'm not really talking
about parallel query in particular anyway. I'm mostly arguing for a
consistent way to get instructions on how to at least build the patch,
where that might be warranted.

The CF app is one way. Another good way is: As long as we're using a
patch series, be explicit about what goes where in the commit message.
Have message-id references. That sort of thing. I already try to do
that. That's all.

Thank you (and Amit) for working really hard on parallelism.

-- 
Peter Geoghegan




Re: [HACKERS] enable parallel query by default?

2016-02-08 Thread David G. Johnston
On Monday, February 8, 2016, Andres Freund  wrote:

> Hi,
>
> On 2016-02-08 16:07:05 -0500, Robert Haas wrote:
> > One of the questions I have about parallel query is whether it should
> > be enabled by default.  That is, should we make the default value of
> > max_parallel_degree to a value higher than 0?  Perhaps 1, say?
> >
> > There are some good reasons why this might be a bad idea, such as:
> >
> > - As discussed on a nearby thread, there is every possibility of nasty
> > bugs.
>
> I think that's an argument to enable it till at least beta1. Let's
> change the default, and add an item to the open items list to reconsider
> then.
>
>
I'd rather phrase that as: I cannot think of any reason to not make it on
by default so let's do so until experience tells us differently.  If
experience says it is too buggy then I'd be concerned about allowing it to
be enabled at all, let alone by default.  If there are usage concerns then
ideally the postmaster could detect them and configure itself rather than
provide yet another knob for people to research.  I don't know enough about
the specifics on that end myself.  IOW I could not convince myself that the
end of beta was somehow important to this decision - but maybe the hash
join bug has me soured...

As a user I'd like it to just work within the confines of the system
resources I've told PostgreSQL as a whole it can use, and of what it detects
is available by asking the OS.

So, for me the quality of the feature is an on/off decision and not a
"defaults" one.  The argument that our default configuration is geared
toward low-resource setups, and that since this is resource-intensive it
should be disabled, resonates, though I'd rather it work reasonably (or
toggle itself off) in both low- and large-resource environments.  Is that
possible?

David J.


Re: [HACKERS] 2016-01 Commitfest

2016-02-08 Thread Magnus Hagander
On Mon, Feb 8, 2016 at 10:19 PM, Alvaro Herrera 
wrote:

> Hi everybody,
>
> I just closed the last few remaining items in the commitfest.  This is
> the final summary:
>
>  Committed: 32.
>  Moved to next CF: 32.
>  Rejected: 2.
>  Returned with Feedback: 33.
> Total: 99.
>
> I think we did a fairly decent job this time around: we only passed a
> third of the patches to the upcoming commitfest, which seems pretty
> decent.  The number of patches that we moved unreviewed was very small,
> which is even better -- hopefully we're not letting too many people
> down.
>
> I'm closing this commitfest now.  We even have three weeks before the
> next one starts, so everybody can take a break for once!  Yay!


Thanks for taking on the thankless task of pushing things along!

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] a raft of parallelism-related bug fixes

2016-02-08 Thread Robert Haas
On Mon, Feb 8, 2016 at 4:11 PM, Peter Geoghegan  wrote:
> All that I wanted to do was look at EXPLAIN ANALYZE output that showed
> a parallel seq scan on my laptop, simply because I wanted to see a
> cool thing happen. I had to complain about it [1] to get clarification
> from you [2].
>
> I accept that this might have been a somewhat isolated incident (that
> I couldn't easily get *at least* a little instant gratification), but
> it still should be avoided. You've accused me of burying the lead
> plenty of times. Don't tell me that it was too hard to prominently
> place those details somewhere where I or any other contributor could
> reasonably expect to find them, like the CF app page, or a wiki page
> that is maintained on an ongoing basis (and linked to at the start of
> each thread). If I said that that was too much to you, you'd probably
> shout at me. If I persisted, you wouldn't commit my patch, and for me
> that probably means it's DOA.
>
> I don't think I'm asking for much here.

I don't think you are asking for too much; what I think is that Amit
and I were trying to do exactly the thing you asked for, and mostly
did.  On March 20th, Amit posted version 11 of the sequential scan
patch, and included directions about the order in which to apply the
patches:

http://www.postgresql.org/message-id/CAA4eK1JSSonzKSN=l-dwucewdlqkbmujvfpe3fgw2tn2zpo...@mail.gmail.com

On March 25th, Amit posted version 12 of the sequential scan patch,
and again included directions about which patches to apply:

http://www.postgresql.org/message-id/caa4ek1l50y0y1ogt_dh2eouyq-rqcnpvjboon2pcgjq+1by...@mail.gmail.com

On March 27th, Amit posted version 13 of the sequential scan patch,
which did not include those directions:

http://www.postgresql.org/message-id/caa4ek1lfr8sr9viuplpmkrquvcrhefdjsz1019rpwgjyftr...@mail.gmail.com

While perhaps Amit might have included directions again, I think it's
pretty reasonable that he felt that it might not be entirely necessary
to do so given that he had already done it twice in the last week.
This was still the state of affairs when you asked your question on
April 20th.  Two days after you asked that question, Amit posted
version 14 of the patch, and again included directions about what
patches to apply:

http://www.postgresql.org/message-id/caa4ek1jlv+2y1awjhsqpfiskhbf7jwf_nzirmzyno9upbrc...@mail.gmail.com

Far from the negligence that you seem to be implying, I think Amit was
remarkably diligent about providing these kinds of updates.  I
admittedly didn't duplicate those same updates on the parallel
mode/contexts thread to which you replied, but that's partly because I
would often whack around that patch first and then Amit would adjust
his patch to cope with my changes after the fact.  That doesn't seem
to have been the case in this particular example, but if this is the
closest thing you can come up with to a process failure during the
development of parallel query, I'm not going to be sad about it: I'm
going to have a beer.  Seriously: we worked really hard at this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [patch] Proposal for \crosstabview in psql

2016-02-08 Thread Alvaro Herrera
Pavel Stehule wrote:

> > FWIW I think the general idea of this feature (client-side resultset
> > "pivoting") is a good one, but I don't really have an opinion regarding
> > your specific proposal.  I think you should first seek some more
> > consensus about the proposed design; in your original thread [1] several
> > guys defended the idea of having this be a psql feature, and the idea of
> > this being a parallel to \x seems a very sensible one,

Sorry, I meant \q here, not \x.

> > but there's really been no discussion on whether your proposed "+/-"
> > syntax to change sort order makes sense, for one thing.
> 
> I am sorry, but I disagree - the discussion about the implementation went
> on for more than two months, and I believe anybody who wanted to discuss
> it had enough time to do so. The feature and design were changed
> significantly, and nobody sent an objection to the feature design.

I just rechecked the thread.  In my reading, lots of people argued
whether it should be called \rotate or \pivot or \crosstab; it seems the
\crosstabview proposal was determined to be best.  I can support that
decision.  But once the details were discussed, it was only you and
Daniel left in the thread; nobody else participated.  While I understand
that you may think that "silence is consent", what I am afraid of is
that some committer will look at this two months from now and say "I
hate this Hcol+ stuff, -1 from me" and send the patch back for syntax
rework.  IMO it's better to have more people chime in here so that the
patch that we discuss during the next commitfest is really the best one
we can think of.

Also, what about the business of putting "x" if there's no third column?
Three months from now some Czech psql hacker will say "we should use
Unicode chars for this" and we will be forever stuck with \pset
unicode_crosstab_marker to change the character to a ☑ BALLOT BOX WITH
CZECH.  Maybe we should think that a bit harder -- for example, what
about just rejecting the case with no third column and forcing the user
to add a third column with the character they choose?  That way you
avoid that mess.
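
To make that concrete, a rough illustration (the table and column names are
invented, and the \crosstabview details were still being hammered out at
this point): with an explicit third column the user decides what lands in
each cell, so no placeholder character is needed at all:

-- first column becomes the vertical header, second the horizontal header,
-- third the cell value chosen by the user
SELECT dept, quarter, sum(amount) AS total
FROM sales
GROUP BY dept, quarter
\crosstabview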

> This feature has only a small relation to the SQL PIVOT feature - it is
> just a form of view, and I agree with Daniel about the usefulness of this
> feature.

Yes, I don't disagree there.  Robert Haas and David Fetter also
expressed their support for psql-side processing, so I think we're good
there.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] enable parallel query by default?

2016-02-08 Thread Peter Geoghegan
On Mon, Feb 8, 2016 at 1:24 PM, Andres Freund  wrote:
> I think that's an argument to enable it till at least beta1. Let's
> change the default, and add an item to the open items list to reconsider
> then.

+1.

Reminds me of what happened with the num_xloginsert_locks GUC (it was
eventually replaced with a #define before release, though).



-- 
Peter Geoghegan




Re: [HACKERS] enable parallel query by default?

2016-02-08 Thread Tom Lane
Robert Haas  writes:
> One of the questions I have about parallel query is whether it should
> be enabled by default.  That is, should we make the default value of
> max_parallel_degree to a value higher than 0?  Perhaps 1, say?

I'm not sure I'm on board with that as a releaseable default, but there
certainly would be an argument for turning it on from now to say mid
beta, so as to improve test coverage.  I think we've done similar things
in the past.

I don't understand however how that doesn't break the regression tests?
Surely we've got lots of EXPLAIN queries that would change.

regards, tom lane




Re: [HACKERS] a raft of parallelism-related bug fixes

2016-02-08 Thread Joshua D. Drake

On 02/08/2016 01:11 PM, Peter Geoghegan wrote:

On Mon, Feb 8, 2016 at 12:18 PM, Robert Haas  wrote:



I accept that this might have been a somewhat isolated incident (that
I couldn't easily get *at least* a little instant gratification), but
it still should be avoided. You've accused me of burying the lead
plenty of times. Don't tell me that it was too hard to prominently
place those details somewhere where I or any other contributor could
reasonably expect to find them, like the CF app page, or a wiki page
that is maintained on an ongoing basis (and linked to at the start of
each thread). If I said that that was too much to you, you'd probably
shout at me. If I persisted, you wouldn't commit my patch, and for me
that probably means it's DOA.

I don't think I'm asking for much here.

[1] 
http://www.postgresql.org/message-id/CAM3SWZSefE4uQk3r_3gwpfDWWtT3P51SceVsL4=g8v_me2a...@mail.gmail.com
[2] 
http://www.postgresql.org/message-id/ca+tgmoarttf8eptbhinwxukfkctsfc7wtzfhgegqywe8e2v...@mail.gmail.com


This part of the thread seems like something that should be a new thread
about how to write patches. I agree that it would be awesome for patches
that are large features to be discussed in depth on a maintained wiki page.
Creating that knowledge base without having to troll through code would be
priceless.


Sincerely,

JD



--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.




Re: [HACKERS] count_nulls(VARIADIC "any")

2016-02-08 Thread Thomas Munro
On Tue, Feb 9, 2016 at 9:26 AM, Tom Lane  wrote:
> Thomas Munro  writes:
>> Would num_values be a better name than num_nonnulls?
>
> If "value" is a term that excludes null values, it's news to me.

Ah, right, I was thinking of null as the absence of a value.  But in
fact it is a special value that indicates the absence of a "data
value".  And num_data_values doesn't sound great.

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] enable parallel query by default?

2016-02-08 Thread Andres Freund
Hi,

On 2016-02-08 16:07:05 -0500, Robert Haas wrote:
> One of the questions I have about parallel query is whether it should
> be enabled by default.  That is, should we make the default value of
> max_parallel_degree to a value higher than 0?  Perhaps 1, say?
> 
> There are some good reasons why this might be a bad idea, such as:
> 
> - As discussed on a nearby thread, there is every possibility of nasty
> bugs.

I think that's an argument to enable it till at least beta1. Let's
change the default, and add an item to the open items list to reconsider
then.

Andres




Re: [HACKERS] [patch] Proposal for \crosstabview in psql

2016-02-08 Thread Pavel Stehule
Hi



> FWIW I think the general idea of this feature (client-side resultset
> "pivoting") is a good one, but I don't really have an opinion regarding
> your specific proposal.  I think you should first seek some more
> consensus about the proposed design; in your original thread [1] several
> guys defended the idea of having this be a psql feature, and the idea of
> this being a parallel to \x seems a very sensible one, but there's
> really been no discussion on whether your proposed "+/-" syntax to
> change sort order makes sense, for one thing.
>

I am sorry, but I disagree - the discussion about the implementation went on
for more than two months, and I believe anybody who wanted to discuss it had
enough time to do so. The feature and design were changed significantly, and
nobody sent an objection to the feature design.

This feature has only a small relation to the SQL PIVOT feature - it is just
a form of view, and I agree with Daniel about the usefulness of this feature.

Regards

Pavel


>
> So please can we have that wiki page so that the syntax can be hammered
> out a bit more.
>
> I'm closing this as returned-with-feedback for now.
>
> [1] It's a good idea to add links to previous threads where things were
> discussed.  I had to search for
> www.postgresql.org/message-id/flat/78543039-c708-4f5d-a66f-0c0fbcda1f76@mm
> because you didn't provide a link to it when you started the second
> thread.
>
> --
> Álvaro Herrerahttp://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: [HACKERS] 2016-01 Commitfest

2016-02-08 Thread Alvaro Herrera
Hi everybody,

I just closed the last few remaining items in the commitfest.  This is
the final summary:

 Committed: 32.
 Moved to next CF: 32.
 Rejected: 2.
 Returned with Feedback: 33.
Total: 99. 

I think we did a fairly decent job this time around: we only passed a
third of the patches to the upcoming commitfest, which seems pretty
decent.  The number of patches that we moved unreviewed was very small,
which is even better -- hopefully we're not letting too many people
down.

I'm closing this commitfest now.  We even have three weeks before the
next one starts, so everybody can take a break for once!  Yay!

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] enable parallel query by default?

2016-02-08 Thread Joshua D. Drake

On 02/08/2016 01:07 PM, Robert Haas wrote:

Hi,

One of the questions I have about parallel query is whether it should
be enabled by default.  That is, should we make the default value of
max_parallel_degree to a value higher than 0?  Perhaps 1, say?


O.k., after some googling, where I found your fantastic blog on the
subject, max_parallel_degree looks like it should be
max_parallel_workers. Am I correct in assuming that, given the
opportunity to use parallel workers, postgres will launch up to
max_parallel_degree workers?


If so, then I think 1 or 2 would be reasonable. By far the majority of 
servers are going to have at least two cores.
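
As a rough illustration of the question above (the table, filter, and plan
text are invented, and the setting was still called max_parallel_degree in
this cycle), enabling it per session would look something like:

SET max_parallel_degree = 2;
EXPLAIN (COSTS OFF) SELECT * FROM big_table WHERE status = 'open';
-- on a large enough table the planner may then choose a plan roughly like:
--   Gather
--     Workers Planned: 2
--     ->  Parallel Seq Scan on big_table
--           Filter: (status = 'open')

The planner still decides whether to use workers at all; the setting only
caps how many it may plan for.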




There are some good reasons why this might be a bad idea, such as:

- As discussed on a nearby thread, there is every possibility of nasty bugs.


Which we won't find if it doesn't get turned on.



- Parallel query uses substantially more resources than a regular
query, which might overload your system.


How much of a reality is that? Isn't this something we could just cover 
in the release notes?




On the other hand:

- Features that are on by default get more testing and thus might get
less buggy more quickly.


Correct.


- A lot of people don't change the default configuration and thus
wouldn't get any benefit from the feature if it's not on by default.



Correct.


Thoughts?



+1 on enabling by default.

Sincerely,

JD


--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.




Re: [HACKERS] a raft of parallelism-related bug fixes

2016-02-08 Thread Peter Geoghegan
On Mon, Feb 8, 2016 at 12:18 PM, Robert Haas  wrote:
> So, there may be a person who knows how to do all of that
> work and get it done in a reasonable time frame and also knows how to
> make sure that everybody has the opportunity to be as involved in the
> process as they want to be and that there are no bugs or controversial
> design decisions, but I am not that person.  I am doing my best.
>
>> To be more specific, I thought it was really hard to test parallel
>> sequential scan a few months ago, because there was so many threads
>> and so many dependencies. I appreciate that we now use git
>> format-patch patch series for complicated stuff these days, but it's
>> important to make it clear how everything fits together. That's
>> actually what I was thinking about when I said we need to be clear on
>> how things fit together from the CF app patch page, because there
>> doesn't seem to be a culture of being particular about that, having
>> good "annotations", etc.
>
> I agree that you had to be pretty deeply involved in that thread to
> follow everything that was going on.  But it's not entirely fair to
> say that it was impossible for anyone else to get involved.

All that I wanted to do was look at EXPLAIN ANALYZE output that showed
a parallel seq scan on my laptop, simply because I wanted to see a
cool thing happen. I had to complain about it [1] to get clarification
from you [2].

I accept that this might have been a somewhat isolated incident (that
I couldn't easily get *at least* a little instant gratification), but
it still should be avoided. You've accused me of burying the lead
plenty of times. Don't tell me that it was too hard to prominently
place those details somewhere where I or any other contributor could
reasonably expect to find them, like the CF app page, or a wiki page
that is maintained on an ongoing basis (and linked to at the start of
each thread). If I said that that was too much to you, you'd probably
shout at me. If I persisted, you wouldn't commit my patch, and for me
that probably means it's DOA.

I don't think I'm asking for much here.

[1] 
http://www.postgresql.org/message-id/CAM3SWZSefE4uQk3r_3gwpfDWWtT3P51SceVsL4=g8v_me2a...@mail.gmail.com
[2] 
http://www.postgresql.org/message-id/ca+tgmoarttf8eptbhinwxukfkctsfc7wtzfhgegqywe8e2v...@mail.gmail.com
-- 
Peter Geoghegan




[HACKERS] enable parallel query by default?

2016-02-08 Thread Robert Haas
Hi,

One of the questions I have about parallel query is whether it should
be enabled by default.  That is, should we make the default value of
max_parallel_degree to a value higher than 0?  Perhaps 1, say?

There are some good reasons why this might be a bad idea, such as:

- As discussed on a nearby thread, there is every possibility of nasty bugs.
- Parallel query uses substantially more resources than a regular
query, which might overload your system.

On the other hand:

- Features that are on by default get more testing and thus might get
less buggy more quickly.
- A lot of people don't change the default configuration and thus
wouldn't get any benefit from the feature if it's not on by default.

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] proposal: function parse_ident

2016-02-08 Thread Alvaro Herrera
Pavel Stehule wrote:

> I am looking on it

Thanks, closing as returned-with-feedback.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [patch] Proposal for \crosstabview in psql

2016-02-08 Thread Alvaro Herrera
Daniel Verite wrote:
>   Teodor Sigaev wrote:
> 
> > Interesting feature, but it's not very obvious how to use it. I'd like to
> > see  some example(s) in documentation.
> 
> I'm thinking of making a wiki page, because examples pretty much
> require showing resultsets, and I'm not sure this would fly
> with our current psql documentation, which is quite compact.

Yeah, we need to keep in mind that the psql doc is processed as a
manpage also, so it may not be a great idea to add too many things
there.  But I also agree that some good examples would be useful.

FWIW I think the general idea of this feature (client-side resultset
"pivoting") is a good one, but I don't really have an opinion regarding
your specific proposal.  I think you should first seek some more
consensus about the proposed design; in your original thread [1] several
guys defended the idea of having this be a psql feature, and the idea of
this being a parallel to \x seems a very sensible one, but there's
really been no discussion on whether your proposed "+/-" syntax to
change sort order makes sense, for one thing.

So please can we have that wiki page so that the syntax can be hammered
out a bit more.

I'm closing this as returned-with-feedback for now.

[1] It's a good idea to add links to previous threads where things were
discussed.  I had to search for
www.postgresql.org/message-id/flat/78543039-c708-4f5d-a66f-0c0fbcda1f76@mm
because you didn't provide a link to it when you started the second
thread.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Tracing down buildfarm "postmaster does not shut down" failures

2016-02-08 Thread Andrew Dunstan



On 02/08/2016 02:15 PM, Tom Lane wrote:

Of late, by far the majority of the random-noise failures we see in the
buildfarm have come from failure to shut down the postmaster in a
reasonable timeframe.  An example is this current failure on hornet:

http://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=hornet&dt=2016-02-08%2013%3A41%3A55

waiting for server to shut down... failed
pg_ctl: server does not shut down
=== db log file ==
2016-02-08 15:14:35.783 UTC [56b8b00d.9e00e2:2] LOG:  received fast shutdown 
request
2016-02-08 15:14:35.783 UTC [56b8b00d.9e00e2:3] LOG:  aborting any active 
transactions
2016-02-08 15:14:35.783 UTC [56b8b00d.7e0028:2] LOG:  autovacuum launcher 
shutting down
2016-02-08 15:14:35.865 UTC [56b8b00d.2e5000a:1] LOG:  shutting down

The buildfarm script runs pg_ctl stop with -t 120, and counting the dots
shows that pg_ctl was honoring that, so apparently the postmaster took
more than 2 minutes to shut down.

Looking at other recent successful runs, stopdb-C-1 usually takes 30 to
40 or so seconds on this machine.  So it's possible that it was just so
overloaded that it took three times that much time on this run, but I'm
starting to think there may be more to it than that.  We've seen variants
on this theme on half a dozen machines just in the past week --- and it
seems to mostly happen in 9.5 and HEAD, which is fishy.

The fact that we got "shutting down" but not "database system is shut
down" indicates that the server was in ShutdownXLOG().  Normally the
only component of that that takes much time is buffer flushing, but
could something else be happening?  Or perhaps the checkpoint code
has somehow not gotten the word to do its flushing at full speed?

What I'd like to do to investigate this is put in a temporary HEAD-only
patch that makes ShutdownXLOG() and its subroutines much chattier about
how far they've gotten and what time it is, and also makes pg_ctl print
out the current time if it gives up waiting.  A few failed runs with
that in place will at least allow us to confirm or deny whether it's
just that the shutdown checkpoint is sometimes really slow, or whether
there's a bug lurking.

Any objections?  Anybody have another idea for data to collect?





I think that's an excellent idea. This has been a minor running sore for 
ages on slow machines.


cheers

andrew





Re: [HACKERS] proposal: make NOTIFY list de-duplication optional

2016-02-08 Thread Vik Fearing
On 02/08/2016 09:33 PM, Filip Rembiałkowski wrote:
> Here is my next try, after suggestions from -perf and -hackers list:
> 
> * no GUC
> 
> * small addition to NOTIFY grammar: NOTIFY ALL/DISTINCT
> 
> * corresponding, 3-argument version of pg_notify(text,text,bool)
> 
> * updated the docs to include new syntax and clarify behavior
> 
> * no hashtable in AsyncExistsPendingNotify
>  (I don't see much sense in that part; and it can be well done
> separately from this)

Please add this to the next commitfest:
https://commitfest.postgresql.org/9/new/
-- 
Vik Fearing  +33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




Re: [HACKERS] Proposal: SET ROLE hook

2016-02-08 Thread Alvaro Herrera
Joe Conway wrote:
> On 01/06/2016 10:36 AM, Tom Lane wrote:
> > I think a design that was actually somewhat robust would require two
> > hooks, one at check_role and one at assign_role, wherein the first one
> > would do any potentially-failing work and package all required info into
> > a blob that could be passed through to the assign hook.
> 
> Fair enough -- will rework the patch and example code.

Returned with feedback.  Thanks.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Tracing down buildfarm "postmaster does not shut down" failures

2016-02-08 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> What I'd like to do to investigate this is put in a temporary HEAD-only
>> patch that makes ShutdownXLOG() and its subroutines much chattier about
>> how far they've gotten and what time it is, and also makes pg_ctl print
>> out the current time if it gives up waiting.

> Seems like a reasonable place to start.  I assume you'll be installing
> some debug-elog calls, enabled by default, and then once the problem is
> fixed, we'll change the default to disabled but keep the actual calls?

I had in mind to just "git revert" the patch when we're done with it.
There might be some parts we want to keep (for instance, I'm thinking of
logging "postmaster is done" after we've unlinked the pidfile, which might
be useful permanently).  But I plan to err on the side of noisiness for
the moment, rather than make something I think has long-term value.

regards, tom lane




Re: [HACKERS] [ADMIN] 9.5 new setting "cluster name" and logging

2016-02-08 Thread Stephen Frost
Thomas,

* Thomas Munro (thomas.mu...@enterprisedb.com) wrote:
> On Tue, Feb 9, 2016 at 5:30 AM, Joe Conway  wrote:
> > On 02/08/2016 06:24 AM, Andres Freund wrote:
> >> On 2016-01-29 22:19:45 -0800, Evan Rempel wrote:
> >>> Now that there is a setting to give a cluster a "name", it would be nice 
> >>> to
> >>> have an escape sequence in the log_line_prefix setting that could 
> >>> reference
> >>> the cluster_name.
> >>
> >> I've argued[1][2] for this when cluster_name was introduced, but back
> >> then I seemed to have been the only one arguing for it. Josh later
> >> jumped that train.
> >>
> >> Given that we now had a number of people wishing for this, can we maybe
> >> reconsider?
> >
> > Seems like a reasonable idea to me. But if we add a log_line_prefix
> > setting, shouldn't we also add it to csvlog output too?
> 
> Here's a tiny patch adding support for %C to log_line_prefix (this was
> part of the cluster_name patch that didn't go in).
> 
> Given that csvlog's output format is hardcoded in write_csvlog, how is
> it supposed to evolve without upsetting consumers of this data?
> Wouldn't we first need to add a GUC that lets you control the columns
> it outputs?

Not sure if you really want to go there, but I do agree with you and
there's a thread from a few years back about something similar:

http://www.postgresql.org/message-id/flat/20110112142345.ga4...@tamriel.snowman.net

Included in that thread is a patch, which likely requires some dusting
off, to add exactly that ability.

Thanks!

Stephen




Re: [HACKERS] Tracing down buildfarm "postmaster does not shut down" failures

2016-02-08 Thread Alvaro Herrera
Tom Lane wrote:
> Of late, by far the majority of the random-noise failures we see in the
> buildfarm have come from failure to shut down the postmaster in a
> reasonable timeframe.

I noticed that.

> An example is this current failure on hornet:
> 
> http://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=hornet&dt=2016-02-08%2013%3A41%3A55

FWIW you seem to have edited this URL -- it returns a blank page.
The right one is
http://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=hornet&dt=2016-02-08%2013%3A41%3A55&stg=stopdb-C-1

> What I'd like to do to investigate this is put in a temporary HEAD-only
> patch that makes ShutdownXLOG() and its subroutines much chattier about
> how far they've gotten and what time it is, and also makes pg_ctl print
> out the current time if it gives up waiting.  A few failed runs with
> that in place will at least allow us to confirm or deny whether it's
> just that the shutdown checkpoint is sometimes really slow, or whether
> there's a bug lurking.
> 
> Any objections?  Anybody have another idea for data to collect?

Seems like a reasonable place to start.  I assume you'll be installing
some debug-elog calls, enabled by default, and then once the problem is
fixed, we'll change the default to disabled but keep the actual calls?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [ADMIN] 9.5 new setting "cluster name" and logging

2016-02-08 Thread Thomas Munro
On Tue, Feb 9, 2016 at 5:30 AM, Joe Conway  wrote:
> On 02/08/2016 06:24 AM, Andres Freund wrote:
>> On 2016-01-29 22:19:45 -0800, Evan Rempel wrote:
>>> Now that there is a setting to give a cluster a "name", it would be nice to
>>> have an escape sequence in the log_line_prefix setting that could reference
>>> the cluster_name.
>>
>> I've argued[1][2] for this when cluster_name was introduced, but back
>> then I seemed to have been the only one arguing for it. Josh later
>> jumped that train.
>>
>> Given that we now had a number of people wishing for this, can we maybe
>> reconsider?
>
> Seems like a reasonable idea to me. But if we add a log_line_prefix
> setting, shouldn't we also add it to csvlog output too?

Here's a tiny patch adding support for %C to log_line_prefix (this was
part of the cluster_name patch that didn't go in).

Given that csvlog's output format is hardcoded in write_csvlog, how is
it supposed to evolve without upsetting consumers of this data?
Wouldn't we first need to add a GUC that lets you control the columns
it outputs?
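
For context, a minimal sketch of how the proposed escape would be used
(assuming the attached patch is applied; the example log line is invented,
and cluster_name itself only takes effect at server start):

ALTER SYSTEM SET cluster_name = 'prod_primary';
ALTER SYSTEM SET log_line_prefix = '%m [%p] %C: ';
-- after a restart, log lines would then look something like:
--   2016-02-09 10:00:00.123 UTC [12345] prod_primary: LOG:  ...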

-- 
Thomas Munro
http://www.enterprisedb.com


cluster_name_log_line_prefix.patch
Description: Binary data



Re: [HACKERS] proposal: make NOTIFY list de-duplication optional

2016-02-08 Thread Filip Rembiałkowski
On Mon, Feb 8, 2016 at 1:52 PM, Craig Ringer  wrote:

> Would it be correct to say that if ALL is specified then a message is queued
> no matter what. If DISTINCT is specified then it is only queued if no
> message with the same channel and argument is already queued for delivery.
Yes, exactly.

> Using DISTINCT can never decrease the total number of messages to be sent.
This sentence does not sound true. DISTINCT is the default, old
behaviour. It *can* decrease the total number of messages (by
deduplication).
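
A tiny sketch of the counting under the proposed grammar (DISTINCT being the
default, pre-existing behaviour):

LISTEN jobs;
BEGIN;
NOTIFY jobs, 'ping';        -- queued
NOTIFY jobs, 'ping';        -- dropped: DISTINCT (the default) deduplicates it
NOTIFY ALL jobs, 'ping';    -- queued even though it is a duplicate
COMMIT;
-- listeners receive two notifications here rather than three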

> I've found the deduplication functionality of NOTIFY very frustrating in the 
> past
> and I see this as a significant improvement. Sometimes the *number of times*
> something happened is significant too...
yep, same idea here.




Here is my next try, after suggestions from -perf and -hackers list:

* no GUC

* small addition to NOTIFY grammar: NOTIFY ALL/DISTINCT

* corresponding, 3-argument version of pg_notify(text,text,bool)

* updated the docs to include new syntax and clarify behavior

* no hashtable in AsyncExistsPendingNotify
 (I don't see much sense in that part; and it can be well done
separately from this)
diff --git a/doc/src/sgml/ref/notify.sgml b/doc/src/sgml/ref/notify.sgml
index 4dd5608..933c76c 100644
--- a/doc/src/sgml/ref/notify.sgml
+++ b/doc/src/sgml/ref/notify.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
  
 
-NOTIFY channel [ , payload ]
+NOTIFY [ ALL | DISTINCT ] channel [ , payload ]
 
  
 
@@ -105,6 +105,10 @@ NOTIFY channel [ , ALL is specified (contrary to DISTINCT, the
+   default), the server will deliver all notifications, including duplicates.
+   Removal of duplicate notifications takes place within transaction block,
+   finished with COMMIT, END or SAVEPOINT.
   
 
   
@@ -190,6 +194,12 @@ NOTIFY channel [ , NOTIFY command if you need to work with
 non-constant channel names and payloads.

+   
+There is a three-argument version, pg_notify(text,
+text, boolean). The third argument acts like
+the ALL keyword when set to true, and
+DISTINCT when set to false. 
+   
   
  
 
@@ -210,6 +220,21 @@ Asynchronous notification "virtual" with payload "This is the payload" received
 LISTEN foo;
 SELECT pg_notify('fo' || 'o', 'pay' || 'load');
 Asynchronous notification "foo" with payload "payload" received from server process with PID 14728.
+
+/* Identical messages from same (sub-) transaction can be eliminated - unless you use the ALL keyword */
+LISTEN bar;
+BEGIN;
+NOTIFY bar, 'Coffee please';
+NOTIFY bar, 'Coffee please';
+NOTIFY bar, 'Milk please';
+NOTIFY ALL bar, 'Milk please';
+SAVEPOINT s;
+NOTIFY bar, 'Coffee please';
+COMMIT;
+Asynchronous notification "bar" with payload "Coffee please" received from server process with PID 31517.
+Asynchronous notification "bar" with payload "Milk please" received from server process with PID 31517.
+Asynchronous notification "bar" with payload "Milk please" received from server process with PID 31517.
+Asynchronous notification "bar" with payload "Coffee please" received from server process with PID 31517.
 
  
 
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index c39ac3a..38a8246 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -510,6 +510,7 @@ pg_notify(PG_FUNCTION_ARGS)
 {
 	const char *channel;
 	const char *payload;
+	bool use_all;
 
 	if (PG_ARGISNULL(0))
 		channel = "";
@@ -521,10 +522,15 @@ pg_notify(PG_FUNCTION_ARGS)
 	else
 		payload = text_to_cstring(PG_GETARG_TEXT_PP(1));
 
+	if (PG_NARGS() > 2 && ! PG_ARGISNULL(2))
+		use_all = PG_GETARG_BOOL(2);
+	else
+		use_all = false;
+
 	/* For NOTIFY as a statement, this is checked in ProcessUtility */
 	PreventCommandDuringRecovery("NOTIFY");
 
-	Async_Notify(channel, payload);
+	Async_Notify(channel, payload, use_all);
 
 	PG_RETURN_VOID();
 }
@@ -540,7 +546,7 @@ pg_notify(PG_FUNCTION_ARGS)
  *		^^
  */
 void
-Async_Notify(const char *channel, const char *payload)
+Async_Notify(const char *channel, const char *payload, bool use_all)
 {
 	Notification *n;
 	MemoryContext oldcontext;
@@ -570,9 +576,10 @@ Async_Notify(const char *channel, const char *payload)
 	 errmsg("payload string too long")));
 	}
 
-	/* no point in making duplicate entries in the list ... */
-	if (AsyncExistsPendingNotify(channel, payload))
-		return;
+	if (!use_all)
+		/* remove duplicate entries in the list */
+		if (AsyncExistsPendingNotify(channel, payload))
+			return;
 
 	/*
 	 * The notification list needs to live until end of transaction, so store
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index b307b48..7203f4a 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -8528,11 +8528,12 @@ DropRuleStmt:
  *
  */
 
-NotifyStmt: NOTIFY ColId notify_payload
+NotifyStmt: NOTIFY all_or_distinct ColId notify_payload
 {
 	NotifyStmt *n = make

Re: [HACKERS] count_nulls(VARIADIC "any")

2016-02-08 Thread Tom Lane
Thomas Munro  writes:
> Would num_values be a better name than num_nonnulls?

If "value" is a term that excludes null values, it's news to me.

regards, tom lane




Re: [HACKERS] Recently added typedef "string" is a horrid idea

2016-02-08 Thread Tom Lane
Peter Geoghegan  writes:
> On Sun, Feb 7, 2016 at 7:47 AM, Tom Lane  wrote:
>> Works for me.

> Attached patch is what I came up with. It required only minimal
> additional changes for consistency.

I'd already run into some trouble with pgindent messing up on "string",
so I went ahead and pushed this.  Thanks!

regards, tom lane




Re: [HACKERS] a raft of parallelism-related bug fixes

2016-02-08 Thread Robert Haas
On Mon, Feb 8, 2016 at 2:48 PM, Peter Geoghegan  wrote:
> FWIW, I appreciate your candor. However, I think that you could have
> done a better job of making things easier for reviewers, even if that
> might not have made an enormous difference. I suspect I would have not
> been able to get UPSERT done as a non-committer if it wasn't for the
> epic wiki page, that made it at least possible for someone to jump in.

I'm not going to argue with the proposition that it could have been
done better.  Equally, I'm going to disclaim having the ability to
have done it better.  I've been working on this for three years, and
most of the work that I've put into it has gone into tinkering with C
code that was not in any way user-testable.  I've modified essentially
every major component of the system.  We had a shared memory facility;
I built another one.  We had background workers; I overhauled them.  I
invented a message queueing system, and then layered a modified
version of the FE/BE protocol on top of that message queue, and then
later layered tuple-passing on top of that same message queue and then
invented a bespoke protocol that is used to handle typemod mapping.
We had a transaction system; I made substantial, invasive
modifications to it.  I tinkered with the GUC subsystem, the combocid
system, and the system for loading loadable modules.  Amit added read
functions to a whole class of nodes that never had them before and
together we overhauled core pieces of the executor machinery.  Then I
hit the planner with a hammer.  Finally there's this patch, which
affects heavyweight locking and deadlock detection.  I don't believe
that during the time I've been involved with this project anyone else
has ever attempted a project that required changing as many subsystems
as this one did - in some cases rather lightly, but in a number of
cases in pretty significant, invasive ways.  No other project in
recent memory has been this invasive to my knowledge.  Hot Standby
probably comes closest, but I think (admittedly being much closer to
this work than I was to that work) that this has its fingers in more
places.  So, there may be a person who knows how to do all of that
work and get it done in a reasonable time frame and also knows how to
make sure that everybody has the opportunity to be as involved in the
process as they want to be and that there are no bugs or controversial
design decisions, but I am not that person.  I am doing my best.

> To be more specific, I thought it was really hard to test parallel
> sequential scan a few months ago, because there were so many threads
> and so many dependencies. I appreciate that we now use git
> format-patch patch series for complicated stuff these days, but it's
> important to make it clear how everything fits together. That's
> actually what I was thinking about when I said we need to be clear on
> how things fit together from the CF app patch page, because there
> doesn't seem to be a culture of being particular about that, having
> good "annotations", etc.

I agree that you had to be pretty deeply involved in that thread to
follow everything that was going on.  But it's not entirely fair to
say that it was impossible for anyone else to get involved.   Both
Amit and I, mostly Amit, posted directions at various times saying:
here is the sequence of patches that you currently need to apply as of
this time.  There was not a heck of a lot of evidence that anyone was
doing that, though I think a few people did, and towards the
end things changed very quickly as I committed patches in the series.
We certainly knew what each other were doing and not because of some
hidden off-list collaboration that we kept secret from the community -
we do talk every week, but almost all of our correspondence on those
patches was on-list.

I think it's an inherent peril of complicated patch sets that people
who are not intimately involved in what is going on will have trouble
following just because it takes a lot of work.  Is anybody here
following what is going on on the postgres_fdw join pushdown thread?
There's only one patch to apply there right now (though there have
been as many as four at times in the past) and the people who are
actually working on it can follow along, but I'm not a bit surprised
if other people feel lost.  It's hard to think that the cause of that
is anything other than "it's hard to find the time to get invested in
a patch that other people are already working hard and apparently
diligently on, especially if you're not personally interested in
seeing that patch get committed, but sometimes even if you are".  For
example, I really want the work Fabien and Andres are doing on the
checkpointer to get committed this release.  I am reading the emails,
but I haven't tried the patches and I probably won't.  I don't have
time to be that involved in every patch.  I'm trusting that whatever
Andres commits - which will probably be a whole lot more complex than
what Fabien initi

Re: [HACKERS] count_nulls(VARIADIC "any")

2016-02-08 Thread Thomas Munro
On Fri, Feb 5, 2016 at 5:06 PM, Tom Lane  wrote:
> I wrote:
>> Pavel Stehule  writes:
>>> [ num_nulls_v6.patch ]
>
>> I started looking through this.  It seems generally okay, but I'm not
>> very pleased with the function name "num_notnulls".  I think it would
>> be better as "num_nonnulls", as I see Oleksandr suggested already.
>
> Not hearing any complaints, I pushed it with that change and some other
> cosmetic adjustments.

Would num_values be a better name than num_nonnulls?
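
For reference, a quick illustration of the committed names (expected
behaviour, not copied from a real session):

  SELECT num_nonnulls(1, NULL, 'a');  -- 2
  SELECT num_nulls(1, NULL, 'a');     -- 1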

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] a raft of parallelism-related bug fixes

2016-02-08 Thread Alvaro Herrera
Robert Haas wrote:

> Oh: another thing that I would like to do is commit the isolation
> tests I wrote for the deadlock detector a while back, which nobody has
> reviewed either, though Tom and Alvaro seemed reasonably positive
> about the concept.  Right now, the deadlock.c part of this patch isn't
> tested at all by any of our regression test suites, because NOTHING in
> deadlock.c is tested by any of our regression test suites.  You can
> blow it up with dynamite and the regression tests are perfectly happy,
> and that's pretty scary.

FWIW a couple of months back I thought you had already pushed that one
and was surprised to find that you hadn't.  +1 from me on pushing it.
(I don't mean specifically the deadlock tests, but rather the
isolationtester changes that allowed you to have multiple blocked
backends.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] extend pgbench expressions with functions

2016-02-08 Thread Alvaro Herrera
Fabien COELHO wrote:
> 
> Hello,
> 
> >>v23 attached, which does not change the message but does the other fixes.
> >
> >This doesn't apply anymore
> 
> Indeed, but the latest version was really v25.
> 
> >-- please rebase and submit to the next CF.
> 
> I already provided it as v25 on Feb 1st.
> 
> >I closed it here as returned with feedback.
> 
> I turned it to "moved to next CF" as the patch is already on the queue.

Ah, I didn't see v25 anywhere.  What you did should be fine, thanks.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] checkpointer continuous flushing - V16

2016-02-08 Thread Andres Freund
On 2016-02-08 19:52:30 +0100, Fabien COELHO wrote:
> I think I would appreciate comments to understand why/how the ringbuffer is
> used, and more comments in general, so it is fine if you improve this part.

I'd suggest to leave out the ringbuffer/new bgwriter parts. I think
they'd be committed separately, and probably not in 9.6.

Thanks,

Andres




Re: [HACKERS] WIP: bloom filter in Hash Joins with batches

2016-02-08 Thread Alvaro Herrera
I'm closing this as returned-with-feedback; AFAICS even the last version
submitted is still in the research stage.  Please resubmit once you make
further progress.

Thanks,

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Recently added typedef "string" is a horrid idea

2016-02-08 Thread Peter Geoghegan
On Sun, Feb 7, 2016 at 7:47 AM, Tom Lane  wrote:
> Works for me.

Attached patch is what I came up with. It required only minimal
additional changes for consistency.

-- 
Peter Geoghegan
From 01d8cb278cecb995ecc30cda0125d10c98f4d05c Mon Sep 17 00:00:00 2001
From: Peter Geoghegan 
Date: Sun, 7 Feb 2016 19:18:14 -0800
Subject: [PATCH] Rename struct string to VarString

The previous struct name was likely to cause pgindent to do the wrong
thing, since "string" is a fairly common variable name in general.
---
 src/backend/utils/adt/varlena.c | 32 
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 5e7536a..ed0a20a 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -40,7 +40,7 @@
 int			bytea_output = BYTEA_OUTPUT_HEX;
 
 typedef struct varlena unknown;
-typedef struct varlena string;
+typedef struct varlena VarString;
 
 typedef struct
 {
@@ -75,7 +75,7 @@ typedef struct
 #ifdef HAVE_LOCALE_T
 	pg_locale_t locale;
 #endif
-} StringSortSupport;
+} VarStringSortSupport;
 
 /*
  * This should be large enough that most strings will fit, but small enough
@@ -89,8 +89,8 @@ typedef struct
 #define PG_GETARG_UNKNOWN_P_COPY(n) DatumGetUnknownPCopy(PG_GETARG_DATUM(n))
 #define PG_RETURN_UNKNOWN_P(x)		PG_RETURN_POINTER(x)
 
-#define DatumGetStringP(X)			((string *) PG_DETOAST_DATUM(X))
-#define DatumGetStringPP(X)			((string *) PG_DETOAST_DATUM_PACKED(X))
+#define DatumGetVarStringP(X)		((VarString *) PG_DETOAST_DATUM(X))
+#define DatumGetVarStringPP(X)		((VarString *) PG_DETOAST_DATUM_PACKED(X))
 
 static int	varstrfastcmp_c(Datum x, Datum y, SortSupport ssup);
 static int bpcharfastcmp_c(Datum x, Datum y, SortSupport ssup);
@@ -1766,7 +1766,7 @@ varstr_sortsupport(SortSupport ssup, Oid collid, bool bpchar)
 {
 	bool		abbreviate = ssup->abbreviate;
 	bool		collate_c = false;
-	StringSortSupport *sss;
+	VarStringSortSupport *sss;
 
 #ifdef HAVE_LOCALE_T
 	pg_locale_t locale = 0;
@@ -1853,7 +1853,7 @@ varstr_sortsupport(SortSupport ssup, Oid collid, bool bpchar)
 	 */
 	if (abbreviate || !collate_c)
 	{
-		sss = palloc(sizeof(StringSortSupport));
+		sss = palloc(sizeof(VarStringSortSupport));
 		sss->buf1 = palloc(TEXTBUFLEN);
 		sss->buflen1 = TEXTBUFLEN;
 		sss->buf2 = palloc(TEXTBUFLEN);
@@ -1909,8 +1909,8 @@ varstr_sortsupport(SortSupport ssup, Oid collid, bool bpchar)
 static int
 varstrfastcmp_c(Datum x, Datum y, SortSupport ssup)
 {
-	string	   *arg1 = DatumGetStringPP(x);
-	string	   *arg2 = DatumGetStringPP(y);
+	VarString  *arg1 = DatumGetVarStringPP(x);
+	VarString  *arg2 = DatumGetVarStringPP(y);
 	char	   *a1p,
 			   *a2p;
 	int			len1,
@@ -1979,10 +1979,10 @@ bpcharfastcmp_c(Datum x, Datum y, SortSupport ssup)
 static int
 varstrfastcmp_locale(Datum x, Datum y, SortSupport ssup)
 {
-	string	   *arg1 = DatumGetStringPP(x);
-	string	   *arg2 = DatumGetStringPP(y);
-	bool		arg1_match;
-	StringSortSupport *sss = (StringSortSupport *) ssup->ssup_extra;
+	VarString	   *arg1 = DatumGetVarStringPP(x);
+	VarString	   *arg2 = DatumGetVarStringPP(y);
+	bool			arg1_match;
+	VarStringSortSupport *sss = (VarStringSortSupport *) ssup->ssup_extra;
 
 	/* working state */
 	char	   *a1p,
@@ -2134,9 +2134,9 @@ varstrcmp_abbrev(Datum x, Datum y, SortSupport ssup)
 static Datum
 varstr_abbrev_convert(Datum original, SortSupport ssup)
 {
-	StringSortSupport *sss = (StringSortSupport *) ssup->ssup_extra;
-	string	   *authoritative = DatumGetStringPP(original);
-	char	   *authoritative_data = VARDATA_ANY(authoritative);
+	VarStringSortSupport *sss = (VarStringSortSupport *) ssup->ssup_extra;
+	VarString	   *authoritative = DatumGetVarStringPP(original);
+	char		   *authoritative_data = VARDATA_ANY(authoritative);
 
 	/* working state */
 	Datum		res;
@@ -2311,7 +2311,7 @@ done:
 static bool
 varstr_abbrev_abort(int memtupcount, SortSupport ssup)
 {
-	StringSortSupport *sss = (StringSortSupport *) ssup->ssup_extra;
+	VarStringSortSupport *sss = (VarStringSortSupport *) ssup->ssup_extra;
 	double		abbrev_distinct,
 key_distinct;
 
-- 
1.9.1




Re: [HACKERS] a raft of parallelism-related bug fixes

2016-02-08 Thread Robert Haas
On Mon, Feb 8, 2016 at 2:36 PM, Joshua D. Drake  wrote:
> I have no problem running any test cases you wish on a branch in a loop for
> the next week and reporting back any errors.
>
> Where this gets tricky is the tooling itself. For me to be able to do so
> (and others really) I need to be able to do this:
>
> * Download (preferably a tarball but I can do a git pull)
> * Exact instructions on how to set up the tests
> * Exact instructions on how to run the tests
> * Exact instructions on how to report the tests
>
> If anyone takes the time to do that, I will take the time and resources to
> run them.

Well, what I've done is push into the buildfarm code that will allow
us to do *the most exhaustive* testing that I know how to do in an
automated fashion. Which is to create a file that says this:

force_parallel_mode=regress
max_parallel_degree=2

And then run this: make check-world TEMP_CONFIG=/path/to/aforementioned/file

Now, that is not going to find bugs in the deadlock.c portion of the
group locking patch, but it's been wildly successful in finding bugs
in other parts of the parallelism code, and there might well be a few
more that we haven't found yet, which is why I'm hoping that we'll get
this procedure running regularly either on all buildfarm machines, or
on some subset of them, or on new animals that just do this.

Testing the deadlock.c changes is harder.  I don't know of a good way
to do it in an automated fashion, which is why I also posted the test
code Amit devised which allows construction of manual test cases.
Constructing a manual test case is *hard* but doable.  I think it
would be good to automate this and if somebody's got a good idea about
how to fuzz test it I think that would be *great*.  But that's not
easy to do.  We haven't had any testing at all of the deadlock
detector up until now, but somehow the deadlock detector itself has
been in the tree for  a very long time...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] a raft of parallelism-related bug fixes

2016-02-08 Thread Peter Geoghegan
On Mon, Feb 8, 2016 at 10:45 AM, Robert Haas  wrote:
> And, by the way, the patch, aside from the deadlock.c portion, was
> posted back in October, admittedly without much fanfare, but nobody
> reviewed that or any other patch on this thread.  If I'd waited for
> those reviews to come in, parallel query would not be committed now,
> nor probably in 9.6, nor probably in 9.7 or 9.8 either.  The whole
> project would just be impossible on its face.  It would be impossible
> in the first instance if I did not have a commit bit, because there is
> just not enough committer bandwidth - even reviewer bandwidth more
> generally - to review the number of patches that I've submitted
> related to parallelism, so in the end some, perhaps many, of those are
> going to be committed mostly on the strength of my personal opinion
> that committing them is better than not committing them.  I am going
> to have a heck of a lot of egg on my face if it turns out that I've
> been too aggressive in pushing this stuff into the tree.  But,
> basically, the alternative is that we don't get the feature, and I
> think the feature is important enough to justify taking some risk.

FWIW, I appreciate your candor. However, I think that you could have
done a better job of making things easier for reviewers, even if that
might not have made an enormous difference. I suspect I would have not
been able to get UPSERT done as a non-committer if it wasn't for the
epic wiki page, that made it at least possible for someone to jump in.

To be more specific, I thought it was really hard to test parallel
sequential scan a few months ago, because there were so many threads
and so many dependencies. I appreciate that we now use git
format-patch patch series for complicated stuff these days, but it's
important to make it clear how everything fits together. That's
actually what I was thinking about when I said we need to be clear on
how things fit together from the CF app patch page, because there
doesn't seem to be a culture of being particular about that, having
good "annotations", etc.

-- 
Peter Geoghegan




Re: [HACKERS] extend pgbench expressions with functions

2016-02-08 Thread Fabien COELHO


Hello,


>>v23 attached, which does not change the message but does the other fixes.
>
>This doesn't apply anymore

Indeed, but the latest version was really v25.

>-- please rebase and submit to the next CF.

I already provided it as v25 on Feb 1st.

>I closed it here as returned with feedback.

I turned it to "moved to next CF" as the patch is already on the queue.

--
Fabien.




Re: [HACKERS] Multi-tenancy with RLS

2016-02-08 Thread Alvaro Herrera
I've closed this as returned-with-feedback.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] a raft of parallelism-related bug fixes

2016-02-08 Thread Joshua D. Drake

On 02/08/2016 11:24 AM, Robert Haas wrote:

> On Mon, Feb 8, 2016 at 2:00 PM, Joshua D. Drake  wrote:
>> If I am off base, please feel free to yell Latin at me again but isn't this
>> exactly what different trees are for in Git? Would it be possible to say:
>>
>> Robert says, "Hey pull XYZ, run ABC tests. They are what the parallelism
>> fixes do"?
>>
>> I can't review this patch but I can run a test suite on a number of
>> platforms and see if it behaves as expected.
>
> Sure, I'd love to have the ability to push a branch into the buildfarm
> and have the tests get run on all the buildfarm machines and let that
> bake for a while before putting it into the main tree.  The problem
> here is that the complicated part of this patch is something that's
> only going to be tested in very rare cases.  The simple part of the



I have no problem running any test cases you wish on a branch in a loop 
for the next week and reporting back any errors.


Where this gets tricky is the tooling itself. For me to be able to do so 
(and others really) I need to be able to do this:


* Download (preferably a tarball but I can do a git pull)
* Exact instructions on how to set up the tests
* Exact instructions on how to run the tests
* Exact instructions on how to report the tests

If anyone takes the time to do that, I will take the time and resources 
to run them.


What I can't do, is fiddle around trying to figure out how to set this 
stuff up. I don't have the time and it isn't productive for me. I don't 
think I am the only one in this boat.


Let's be honest, a lot of people won't even bother to play with this
until we release 9.6.0, even though it is easily one of the best features
we have coming for 9.6. That is a bad time to start testing.


The easier we make it for people like me, practitioners to test, the 
better it is for the whole project.


Sincerely,

JD




--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.




Re: [HACKERS] remove wal_level archive

2016-02-08 Thread Alvaro Herrera
Peter Eisentraut wrote:
> On 1/26/16 10:56 AM, Simon Riggs wrote:
> > Removing one of "archive" or "hot standby" will just cause confusion and
> > breakage, so neither is a good choice for removal.
> > 
> > What we should do is 
> > 1. Map "archive" and "hot_standby" to one level with a new name that
> > indicates that it can be used for both/either backup or replication.
> >   (My suggested name for the new level is "replica"...)
> > 2. Deprecate "archive" and "hot_standby" so that those will be removed
> > in a later release.
> 
> Updated patch to reflect these suggestions.

I wonder if the "keep one / keep both" argument is running in circles as
new reviewers arrive at the thread.  Perhaps somebody could read the
whole thread(s) and figure out a way to find consensus so that we move
forward on this.

I've closed it as returned-with-feedback for now.  Please resubmit to
next CF.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] a raft of parallelism-related bug fixes

2016-02-08 Thread Tom Lane
Robert Haas  writes:
> Oh: another thing that I would like to do is commit the isolation
> tests I wrote for the deadlock detector a while back, which nobody has
> reviewed either, though Tom and Alvaro seemed reasonably positive
> about the concept.

Possibly the reason that wasn't reviewed is that it's not in the
commitfest list (or at least if it is, I sure don't see it).

Having said that, I don't have much of a problem with you pushing it
anyway, unless it will add 15 minutes to make check-world or some such.

regards, tom lane




Re: [HACKERS] Patch: fix lock contention for HASHHDR.mutex

2016-02-08 Thread Alvaro Herrera
I've closed this as returned-with-feedback.  Please resubmit once you
have found answers to the questions posed; from the submitted benchmark
numbers this looks very exciting, but it needs a bit more work.  You
don't necessarily have to agree with everything Robert says, but you
need to have well reasoned explanations for the points where you
disagree.

Thanks,

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] a raft of parallelism-related bug fixes

2016-02-08 Thread Robert Haas
On Mon, Feb 8, 2016 at 2:00 PM, Joshua D. Drake  wrote:
> If I am off base, please feel free to yell Latin at me again but isn't this
> exactly what different trees are for in Git? Would it be possible to say:
>
> Robert says, "Hey pull XYZ, run ABC tests. They are what the parallelism
> fixes do"?
>
> I can't review this patch but I can run a test suite on a number of
> platforms and see if it behaves as expected.

Sure, I'd love to have the ability to push a branch into the buildfarm
and have the tests get run on all the buildfarm machines and let that
bake for a while before putting it into the main tree.  The problem
here is that the complicated part of this patch is something that's
only going to be tested in very rare cases.  The simple part of the
patch, which handles the simple-deadlock case, is easy to hit,
although apparently zero people other than Amit and I have found it in
the few months since parallel sequential scan was committed, which
makes me thing people haven't tried very hard to break any part of
parallel query, which is a shame.  The really hairy part is in
deadlock.c, and it's actually very hard to hit that case.  It won't be
hit in real life except in pretty rare circumstances.  So testing is
good, but you not only need to know what you are testing but probably
also need an automated tool that can run the test a gazillion times in a
loop, or be really clever and find a test case that Amit and I didn't
foresee.  And the reality is that getting anybody independent of the
parallel query effort to take an interest in deep testing has not
gone anywhere at all up until now.  I'd be happy for that to change,
whether because of this commit or for any other reason.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pgbench stats per script & other stuff

2016-02-08 Thread Fabien COELHO


Hello Michaël,


+   /* compute total_weight */
+   for (i = 0; i < num_scripts; i++)
+   {
+   total_weight += sql_script[i].weight;
+
+   /* detect overflow... */
If let as int64, you may want to remove this overflow check, or keep
it as int32.


I'd rather keep int64, and remove the check.


[JSON/YAML]
If you want something else, it would help to provide a sample of what you
expect.


You could do that with an additional option here as well:
--output-format=normal|yamljson. The tastes of each user is different.


I think that json/yaml-ifying pgbench output is beyond the object of this
patch, so should be kept out?


   const char *name;
+   int weight;
   Command   **commands;
-   StatsData stats;
+   StatsData   stats;
Noise here?


Indeed.


Find attached a 18-d which addresses these concerns, and a actualized 18-e
for the prefix.


(I could live without that personally)


Hmmm, I type them and I'm not so good with a keyboard, so "se" is better 
than:


"selct-onlyect-only".


-/* return builtin script "name", or fail if not found */
+/* return commands for selected builtin script, if unambiguous */
static script_t *
findBuiltin(const char *name)
This comment needs a refresh. This does not return a set of commands,
but the script itself.


Indeed.

Attached 19-d and 19-e.

--
Fabien.diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index ade1b53..780a520 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -262,11 +262,13 @@ pgbench  options  dbname
 
 
  
-  -b scriptname
-  --builtin scriptname
+  -b scriptname[@weight]
+  --builtin=scriptname[@weight]
   

 Add the specified builtin script to the list of executed scripts.
+An optional integer weight after @ allows to adjust the
+probability of drawing the script.  If not specified, it is set to 1.
 Available builtin scripts are: tpcb-like,
 simple-update and select-only.
 With special name list, show the list of builtin scripts
@@ -321,12 +323,14 @@ pgbench  options  dbname
  
 
  
-  -f filename
-  --file=filename
+  -f filename[@weight]
+  --file=filename[@weight]
   

 Add a transaction script read from filename to
 the list of executed scripts.
+An optional integer weight after @ allows to adjust the
+probability of drawing the test.
 See below for details.

   
@@ -686,9 +690,13 @@ pgbench  options  dbname
   What is the Transaction Actually Performed in pgbench?
 
   
-   Pgbench executes test scripts chosen randomly from a specified list.
+   pgbench executes test scripts chosen randomly
+   from a specified list.
They include built-in scripts with -b and
user-provided custom scripts with -f.
+   Each script may be given a relative weight specified after a
+   @ so as to change its drawing probability.
+   The default weight is 1.
  
 
   
@@ -1135,12 +1143,11 @@ number of clients: 10
 number of threads: 1
 number of transactions per client: 1000
 number of transactions actually processed: 1/1
+latency average = 15.844 ms
+latency stddev = 2.715 ms
 tps = 618.764555 (including connections establishing)
 tps = 622.977698 (excluding connections establishing)
-SQL script 1: 
- - 1 transactions (100.0% of total, tps = 618.764555)
- - latency average = 15.844 ms
- - latency stddev = 2.715 ms
+script statistics:
  - statement latencies in milliseconds:
 0.004386\set nbranches 1 * :scale
 0.001343\set ntellers 10 * :scale
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 7eb6a2d..a73e289 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -38,6 +38,7 @@
 #include "portability/instr_time.h"
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -179,6 +180,8 @@ char	   *login = NULL;
 char	   *dbName;
 const char *progname;
 
+#define WSEP '@'/* weight separator */
+
 volatile bool timer_exceeded = false;	/* flag from signal handler */
 
 /* variable definitions */
@@ -299,23 +302,26 @@ typedef struct
 static struct
 {
 	const char *name;
+	int			weight;
 	Command   **commands;
 	StatsData stats;
 }	sql_script[MAX_SCRIPTS];	/* SQL script files */
 static int	num_scripts;		/* number of scripts in sql_script[] */
 static int	num_commands = 0;	/* total number of Command structs */
+static int64 total_weight = 0;
+
 static int	debug = 0;			/* debug flag */
 
 /* Define builtin test scripts */
-#define N_BUILTIN 3
-static struct
+typedef struct script_t
 {
 	char	   *name;			/* very short name for -b ... */
 	char	   *desc;			/* short description */
 	char	   *commands;		/* actual pgbench script */
-}
+} script_t;
 
-			builtin_script[] =
+#define N_BUILTIN 3
+static script_t builtin_script[] =
 {
 	{
 		"tpcb-like",
@@ 

[HACKERS] Tracing down buildfarm "postmaster does not shut down" failures

2016-02-08 Thread Tom Lane
Of late, by far the majority of the random-noise failures we see in the
buildfarm have come from failure to shut down the postmaster in a
reasonable timeframe.  An example is this current failure on hornet:

http://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=hornet&dt=2016-02-08%2013%3A41%3A55

waiting for server to shut down... failed
pg_ctl: server does not shut down
=== db log file ==
2016-02-08 15:14:35.783 UTC [56b8b00d.9e00e2:2] LOG:  received fast shutdown request
2016-02-08 15:14:35.783 UTC [56b8b00d.9e00e2:3] LOG:  aborting any active transactions
2016-02-08 15:14:35.783 UTC [56b8b00d.7e0028:2] LOG:  autovacuum launcher shutting down
2016-02-08 15:14:35.865 UTC [56b8b00d.2e5000a:1] LOG:  shutting down

The buildfarm script runs pg_ctl stop with -t 120, and counting the dots
shows that pg_ctl was honoring that, so apparently the postmaster took
more than 2 minutes to shut down.

Looking at other recent successful runs, stopdb-C-1 usually takes 30 to
40 or so seconds on this machine.  So it's possible that it was just so
overloaded that it took three times that much time on this run, but I'm
starting to think there may be more to it than that.  We've seen variants
on this theme on half a dozen machines just in the past week --- and it
seems to mostly happen in 9.5 and HEAD, which is fishy.

The fact that we got "shutting down" but not "database system is shut
down" indicates that the server was in ShutdownXLOG().  Normally the
only component of that that takes much time is buffer flushing, but
could something else be happening?  Or perhaps the checkpoint code
has somehow not gotten the word to do its flushing at full speed?

What I'd like to do to investigate this is put in a temporary HEAD-only
patch that makes ShutdownXLOG() and its subroutines much chattier about
how far they've gotten and what time it is, and also makes pg_ctl print
out the current time if it gives up waiting.  A few failed runs with
that in place will at least allow us to confirm or deny whether it's
just that the shutdown checkpoint is sometimes really slow, or whether
there's a bug lurking.

Any objections?  Anybody have another idea for data to collect?

regards, tom lane




Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-02-08 Thread Alvaro Herrera
Since things are clearly still moving here, I closed it as
returned-with-feedback.  Please submit to the next CF so that we don't
lose it.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] pgbench stats per script & other stuff

2016-02-08 Thread Alvaro Herrera
I closed this one as "committed", since we pushed a bunch of parts.
Please submit the two remaining ones to the next commitfest.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] extend pgbench expressions with functions

2016-02-08 Thread Alvaro Herrera
Fabien COELHO wrote:
> 
> Hello Michaël,
> 
> v23 attached, which does not change the message but does the other fixes.

This doesn't apply anymore -- please rebase and submit to the next CF.
I closed it here as returned with feedback.

Thanks!

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] proposal: PL/Pythonu - function ereport

2016-02-08 Thread Alvaro Herrera
I don't think we need to preserve absolutely all the existing behavior
to the letter.  We do need to preserve the behavior for sensible cases
that people might reasonably be using currently; and if there's
anything somewhat obscure but really useful that you currently get by
using some clever trick, then we should provide some reasonable way to
get that functionality with the new stuff too.  But this doesn't mean
all existing code must continue to behave in exactly the same way.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] a raft of parallelism-related bug fixes

2016-02-08 Thread Joshua D. Drake

On 02/08/2016 10:45 AM, Robert Haas wrote:

> On Mon, Feb 8, 2016 at 10:17 AM, Andres Freund  wrote:
>> On 2016-02-02 15:41:45 -0500, Robert Haas wrote:
>>
>> I realize that this stuff has all been brewing long, and that there's
>> still a lot to do. So you gotta keep moving. And I'm not sure that
>> there's anything wrong or if there's any actually better approach. But
>> pushing an unreviewed, complex patch, that originated in a thread
>> originally about different relatively small/mundane items, for a
>> contentious issue, a few days after the initial post. Hm. Not sure how
>> you'd react if you weren't the author.
>
> Probably not very well.  Do you want me to revert it?


If I am off base, please feel free to yell Latin at me again but isn't 
this exactly what different trees are for in Git? Would it be possible 
to say:


Robert says, "Hey pull XYZ, run ABC tests. They are what the parallelism 
fixes do"?


I can't review this patch but I can run a test suite on a number of 
platforms and see if it behaves as expected.




> albatross around my neck - totally ignored by everyone - until the end
> of the last CF, and then the discussion would have gone one of three
> ways:
>
> 1. Boy, this patch is complicated and I don't understand it.  Let's
> reject it, even though without it parallel query is trivially broken!
> Uh, we'll just let parallel query be broken.
> 2. Like #1, but we rip out parallel query in its entirety on the eve of beta.
> 3. Oh well, Robert says we need this, I guess we'd better let him commit it.


4. We need to push the release so we can test this.



> I don't find any of those options to be better than the status quo.
> If the patch is broken, another two months of having in the tree give
> us a better chance of finding the bugs, especially because, combined


I think this further points to the need for more reviewers and less 
feature pushes. There are fundamental features that we could use, this 
is one of them. It is certainly more important than say pgLogical or BDR 
(not that those aren't useful but that we do have external solutions for 
that problem).




> Oh: another thing that I would like to do is commit the isolation
> tests I wrote for the deadlock detector a while back, which nobody has
> reviewed either, though Tom and Alvaro seemed reasonably positive
> about the concept.  Right now, the deadlock.c part of this patch isn't
> tested at all by any of our regression test suites, because NOTHING in
> deadlock.c is tested by any of our regression test suites.  You can
> blow it up with dynamite and the regression tests are perfectly happy,
> and that's pretty scary.


Test test test. Please commit.

Sincerely,

JD



--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.




Re: [HACKERS] checkpointer continuous flushing - V16

2016-02-08 Thread Fabien COELHO


Hello Andres,


> Any comments before I spend more time polishing this?


I'm running tests on various settings, I'll send a report when it is done.
Up to now the performance seems as good as with the previous version.

> I'm currently updating docs and comments to actually describe the
> current state...


I did notice the mismatched documentation.

I think I would appreciate comments to understand why/how the ringbuffer 
is used, and more comments in general, so it is fine if you improve this 
part.


Minor details:

"typedefs.list" should be updated to WritebackContext.

"WritebackContext" is a typedef, "struct" is not needed.


I'll look at the code more deeply probably over next weekend.

--
Fabien.




Re: [HACKERS] a raft of parallelism-related bug fixes

2016-02-08 Thread Robert Haas
On Mon, Feb 8, 2016 at 10:17 AM, Andres Freund  wrote:
> On 2016-02-02 15:41:45 -0500, Robert Haas wrote:
>> group-locking-v1.patch is a vastly improved version of the group
>> locking patch that we discussed, uh, extensively last year.  I realize
>> that there was a lot of doubt about this approach, but I still believe
>> it's the right approach, I have put a lot of work into making it work
>> correctly, I don't think anyone has come up with a really plausible
>> alternative approach (except one other approach I tried which turned
>> out to work but with significantly more restrictions), and I'm
>> committed to fixing it in whatever way is necessary if it turns out to
>> be broken, even if that amounts to a full rewrite.  Review is welcome,
>> but I honestly believe it's a good idea to get this into the tree
>> sooner rather than later at this point, because automated regression
>> testing falls to pieces without these changes, and I believe that
>> automated regression testing is a really good idea to shake out
>> whatever bugs we may have in the parallel query stuff.  The code in
>> this patch is all mine, but Amit Kapila deserves credit as co-author
>> for doing a lot of prototyping (that ended up getting tossed) and
>> testing.  This patch includes comments and an addition to
>> src/backend/storage/lmgr/README which explain in more detail what this
>> patch does, how it does it, and why that's OK.
>
> I see you pushed group locking support. I do wonder if somebody has
> actually reviewed this? On a quick scrollthrough it seems fairly
> invasive, touching some parts where bugs are really hard to find.
>
> I realize that this stuff has all been brewing long, and that there's
> still a lot to do. So you gotta keep moving. And I'm not sure that
> there's anything wrong or if there's any actually better approach. But
> pushing an unreviewed, complex patch, that originated in a thread
> originally about different relatively small/mundane items, for a
> contentious issue, a few days after the initial post. Hm. Not sure how
> you'd react if you weren't the author.

Probably not very well.  Do you want me to revert it?

I mean, look.  Without that patch, parallel query is definitely
broken.  Just revert the patch and try running the regression tests
with force_parallel_mode=regress and max_parallel_degree>0.  It hangs
all over the place.  With the patch, every regression test suite we
have runs cleanly with those settings.  Without the patch, it's
trivial to construct a test case where parallel query experiences an
undetected deadlock.  With the patch, it appears to work reliably.
Could there bugs someplace?  Yes, there absolutely could.  Do I really
think anybody was going to spend the time to understand deadlock.c
well enough to verify my changes?  No, I don't.  What I think would
have happened is that the patch would have sat around like an
albatross around my neck - totally ignored by everyone - until the end
of the last CF, and then the discussion would have gone one of three
ways:

1. Boy, this patch is complicated and I don't understand it.  Let's
reject it, even though without it parallel query is trivially broken!
Uh, we'll just let parallel query be broken.
2. Like #1, but we rip out parallel query in its entirety on the eve of beta.
3. Oh well, Robert says we need this, I guess we'd better let him commit it.

I don't find any of those options to be better than the status quo.
If the patch is broken, another two months of having in the tree give
us a better chance of finding the bugs, especially because, combined
with the other patch which I also pushed, it enables *actual automated
regression testing* of the parallelism code, which I personally think
is a really good thing - and I'd like to see the buildfarm doing that
as soon as possible, so that we can find some of those bugs before
we're deep in beta.  Not just bugs in group locking but all sorts of
parallelism bugs that might be revealed by end-to-end testing.  The
*entire stack of patches* that began this thread was a response to
problems that were found by the automated testing that you can't do
without this patch.  Those bug fixes resulted in a huge increase in
the robustness of parallel query, and that would not have happened
without this code.  Every single one of those problems, some of them
in commits dating back years, was detected by the same method: run the
regression tests with parallel mode and parallel workers used for
every query for which that seems to be safe.

And, by the way, the patch, aside from the deadlock.c portion, was
posted back in October, admittedly without much fanfare, but nobody
reviewed that or any other patch on this thread.  If I'd waited for
those reviews to come in, parallel query would not be committed now,
nor probably in 9.6, nor probably in 9.7 or 9.8 either.  The whole
project would just be impossible on its face.  It would be impossible
in the first instance if I did not have a commit bit, because there i

Re: [HACKERS] [patch] Proposal for \crosstabview in psql

2016-02-08 Thread Daniel Verite
Teodor Sigaev wrote:

> Interesting feature, but it's not very obvious how to use it. I'd like to
> see  some example(s) in documentation.

I'm thinking of making a wiki page, because examples pretty much
require showing resultsets, and I'm not sure this would fly
with our current psql documentation, which is quite compact.

The current bit of doc I've produced is 53 lines long in manpage
format already. The text has not been proofread by a native
English speaker yet, so part of the problem might be that
it's just me struggling with english :)

> And I see an implementation of AVL tree in psql source code
> (src/bin/psql/crosstabview.c). Postgres already has a Red-Black tree
> implementation in src/include/lib/rbtree.h and
> src/backend/lib/rbtree.c. Is any advantage of using AVL tree here? I
> have some doubt about that and this implementation, obviously, will
> not be well tested. But I see in comments that implementation is
> reduced to insert only and it doesn't use the fact of ordering tree,
> so, even hash could be used.

Yes. I expect too that a RB tree or a hash-based algorithm would do
the job and perform well.

The AVL implementation in crosstabview is purposely simplified
and specialized for this job, resulting in ~185 lines of code
versus ~850 lines for rb-tree.c
But I understand the argument that the existing rb-tree has been
battle-tested, whereas this code hasn't.

I'm looking at rb-tree.c and thinking what it would take to
incorporate it:
1. duplicating or linking from backend/lib/rb-tree.c into psql/
2. replacing the elog() calls with something else in the case of psql
3. updating the crosstabview data structures and call sites.

While I'm OK with #3, #1 and #2 seem wrong.
I could adapt rb-tree.c so that the same code can be used
both client-side and server-side, but touching server-side
code for this feature and creating links in the source tree
feels invasive and overkill.

Another approach is to replace AVL with an hash-based algorithm,
but that raises the same sort of question. If crosstabview comes
with its specific implementation, why use that rather than existing
server-side code? But at a glance, utils/hash/dynahash.c seems quite
hard to convert for client-side use.

I'm open to ideas on this. In particular, if we have a hash table
implementation that is already blessed by the project and small enough
to make sense in psql, I'd be happy to consider it.

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: [HACKERS] [ADMIN] 9.5 new setting "cluster name" and logging

2016-02-08 Thread Joe Conway
On 02/08/2016 06:24 AM, Andres Freund wrote:
> On 2016-01-29 22:19:45 -0800, Evan Rempel wrote:
>> Now that there is a setting to give a cluster a "name", it would be nice to
>> have an escape sequence in the log_line_prefix setting that could reference
>> the cluster_name.
> 
> I've argued[1][2] for this when cluster_name was introduced, but back
> then I seemed to have been the only one arguing for it. Josh later
> jumped that train.
> 
> Given that we now had a number of people wishing for this, can we maybe
> reconsider?

Seems like a reasonable idea to me. But if we add a log_line_prefix
setting, shouldn't we also add it to csvlog output too?

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: [HACKERS] Add schema-qualified relnames in constraint error messages.

2016-02-08 Thread Daniel Verite
Shulgin, Oleksandr wrote:

> Added to the Open commitfest: https://commitfest.postgresql.org/9/475/

Here's a review. Note that the patch tested and submitted
is not the initial one in the thread, so it doesn't exactly
match  $subject now.
What's tested here is a client-side approach, suggested by Tom
upthread as an alternative, and implemented by Oleksandr in 
0001-POC-errverbose-in-psql.patch

The patch has two parts: one part in libpq exposing a new function
named PQresultBuildErrorMessage, and a second part implementing an
\errverbose command in psql, essentially displaying the result of the
function.
The separation makes sense if we consider that other clients might benefit
from the function, or that libpq is a better place than psql to maintain
this in the future, as the list of error fields available in a PGresult
might grow.
OTOH maybe adding a new libpq function just for that is overkill, but this
is subjective.

psql part
=
Compiles and works as intended.
For instance with \set VERBOSITY default, an FK violation produces:

  # insert into table2 values(10);
  ERROR:  insert or update on table "table2" violates foreign key constraint
"table2_id_tbl1_fkey"
  DETAIL:  Key (id_tbl1)=(10) is not present in table "table1".

Then \errverbose just displays the verbose form of the error:
  # \errverbose
ERROR:  23503: insert or update on table "table2" violates foreign
  key constraint "table2_id_tbl1_fkey"
DETAIL:  Key (id_tbl1)=(10) is not present in table "table1".
SCHEMA NAME:  public
TABLE NAME:  table2
CONSTRAINT NAME:  table2_id_tbl1_fkey
LOCATION:  ri_ReportViolation, ri_triggers.c:3326

- make check passes
- valgrind test is OK (no obvious leak after using the command).
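
For comparison, today the only way to get at these extra fields is to raise
the verbosity before the failing statement is run, e.g.:

  \set VERBOSITY verbose
  insert into table2 values(10);
  -- emits the same verbose block as above, but only because the setting
  -- was changed in advance; \errverbose lets you recover it after the fact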

Missing bits:
- the command is not mentioned in the help (\? command, see help.c)
- it should be added to tab completions (see tab-complete.c)
- it needs to be described in the SGML documentation

libpq part
==
The patch implements and exports a new PQresultBuildErrorMessage()
function.

AFAICS its purpose is to produce a result similar to what
PQresultErrorMessage() would have produced, if PQERRORS_VERBOSE
was the verbosity in effect at execution time.

My comments:

- the name of the function does not really hint at what it does.
Maybe something like PQresultVerboseErrorMessage() would be more
explicit?

I would expect the new function to have the same interface than
PQresultErrorMessage(), but it's not the case.

- it takes a PGconn* and PGresult* as input parameters, but
PQresultErrorMessage takes only a PGresult* as input.
It's not clear why PGconn* is necessary.

- it returns a pointer to a strdup'ed() buffer, which
has to be freed separately by the caller. That differs from
PQresultErrorMessage() which keeps this managed inside the
PGresult struct.

- if protocol version < 3, an error message is returned. It's not
clear to the caller that this error is emitted by the function itself
rather than the query. Besides, are we sure it's necessary?
Maybe the function could just produce an output with whatever
error fields it has, even minimally with older protocol versions,
rather than failing?

- if the fixed error message is kept, it should pass through
libpq_gettext() for translation.

- a description of the function should be added to the SGML doc.
There's a sentence in PQsetErrorVerbosity() that says, currently:

  "Changing the verbosity does not affect the messages available from
   already-existing PGresult objects, only subsequently-created ones."

As it's precisely the point of that new function, that bit could
be altered to refer to it.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: [HACKERS] proposal: schema PL session variables

2016-02-08 Thread Pavel Stehule
Hi


>> I propose really basic functionality, that can be enhanced in future -
>> step by step. This proposal doesn't contain any controversial feature or
>> syntax, I hope. It is related to PLpgSQL only, but described feature can be
>> used from any PL languages with implemented interface.
>>
>
>
> I think it would make sense to implement the interface in at least one of
> our other supported PLs. I'm not entirely clear how well this will match up
> with, say, plperl, but I'd be interested to see.
>

The minimalistic interface can be based on get/set functions. We can do
necessary transformations there.
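
Purely as an illustration (these function names are invented for the example,
they are not part of the proposal), such a get/set interface might look like:

  SELECT set_schema_variable('myschema.counter', '42');
  SELECT get_schema_variable('myschema.counter');

and each PL binding would only have to map its native values onto these two
calls.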

Regards

Pavel


Re: [HACKERS] Use %u to print user mapping's umid and userid

2016-02-08 Thread Tom Lane
Etsuro Fujita  writes:
> Here is a patch to use %u not %d to print umid and userid.

Pushed, thanks.

regards, tom lane




Re: [HACKERS] proposal: function parse_ident

2016-02-08 Thread Pavel Stehule
2016-02-08 16:55 GMT+01:00 Teodor Sigaev :

> rebased, messages changes per Tom's proposal
>>
> Cool feature and sometimes I needed it a lot.
>
> But, seems, there are some bugs in error processing.
>

I am looking at it

Regards

Pavel


Re: [HACKERS] proposal: schema PL session variables

2016-02-08 Thread Pavel Stehule
2016-02-08 16:45 GMT+01:00 jflack :

> On 02/08/2016 03:16 AM, Pavel Stehule wrote:
>
> > Only a owner of schema can edit functions inside schema
>
> Can't anyone granted CREATE on the schema do that? Would
> that be changed by this proposal?
>

Yes, anybody with the necessary rights can do it.

regards

Pavel


>
> -Chap
>
>


Re: [HACKERS] proposal: function parse_ident

2016-02-08 Thread Teodor Sigaev

rebased, messages changes per Tom's proposal

Cool feature, and one I have sometimes needed a lot.

But it seems there are some bugs in error processing.

1
Following query is okay:
# select * from parse_ident(E'"Some \r Schema".someTable');
 parse_ident
------------------------------
 {"Some \r Schema",sometable}
but:
% select * from parse_ident(E'"Some \r Schema".9someTable');
 Schema".9someTable"tifier after "." symbol: ""Some

The carriage return is not escaped in the error message. I suppose any other
special characters will not be escaped either (see the sketch after this list).

2
# select * from parse_ident('.someTable');
ERROR:  missing identifier after "." symbol: ".someTable"
Why AFTER instead of BEFORE?

3
The function successfully truncates a long identifier, but not in the case of a
quoted identifier.
select length(a[1]), length(a[2]) from 
parse_ident('"xx".y') 
as a ;

 length | length
--------+--------
    414 |     63
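
Not the patch's code, just a standalone illustration of the escaping I mean in
point 1: control characters are made visible before the user string is embedded
in the error message, so a carriage return cannot mangle the output.

    #include <stdio.h>
    #include <string.h>

    /* append src to dst, rendering control characters visibly */
    static void append_escaped(char *dst, size_t dstsize, const char *src)
    {
        size_t used = strlen(dst);

        for (const unsigned char *p = (const unsigned char *) src; *p; p++)
        {
            char piece[5];

            if (*p == '\r')
                snprintf(piece, sizeof(piece), "\\r");
            else if (*p == '\n')
                snprintf(piece, sizeof(piece), "\\n");
            else if (*p < 0x20)
                snprintf(piece, sizeof(piece), "\\x%02X", *p);
            else
                snprintf(piece, sizeof(piece), "%c", *p);

            if (used + strlen(piece) + 1 > dstsize)
                break;          /* toy example: just stop when the buffer is full */
            strcat(dst, piece);
            used += strlen(piece);
        }
    }

    int main(void)
    {
        char msg[256] = "string is not a valid identifier: \"";

        append_escaped(msg, sizeof(msg), "Some \r Schema.9someTable");
        strcat(msg, "\"");
        puts(msg);      /* the \r is printed as \r instead of moving the cursor */
        return 0;
    }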








--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




Re: [HACKERS] proposal: schema PL session variables

2016-02-08 Thread Chapman Flack
[resending because thunderbird helpfully defaulted my sender
address to the one that -isn't- subscribed to -hackers, sorry]


On 02/08/2016 03:16 AM, Pavel Stehule wrote:

> Only a owner of schema can edit functions inside schema

Can't anyone granted CREATE on the schema do that? Would
that be changed by this proposal?

-Chap





Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-08 Thread Fujii Masao
On Fri, Feb 5, 2016 at 5:36 PM, Michael Paquier
 wrote:
> On Thu, Feb 4, 2016 at 11:06 PM, Michael Paquier
>  wrote:
>> On Thu, Feb 4, 2016 at 10:49 PM, Michael Paquier
>>  wrote:
>>> On Thu, Feb 4, 2016 at 10:40 PM, Robert Haas  wrote:
 On Thu, Feb 4, 2016 at 2:21 PM, Michael Paquier
  wrote:
> Yes, please let's use the custom language, and let's not care of not
> more than 1 level of nesting so as it is possible to represent
> pg_stat_replication in a simple way for the user.

 "not" is used twice in this sentence in a way that renders me not able
 to be sure that I'm not understanding it not properly.
>>>
>>> 4 times here. Score beaten.
>>>
>>> Sorry. Perhaps I am tired... I was just wondering if it would be fine
>>> to only support configurations up to one level of nested objects, like
>>> that:
>>> 2[node1, node2, node3]
>>> node1, 2[node2, node3], node3
>>> In short, we could restrict things so as we cannot define a group of
>>> nodes within an existing group.
>>
>> No, actually, that's stupid. Having up to two nested levels makes more
>> sense, a quite common case for this feature being something like that:
>> 2{node1,[node2,node3]}
>> In short, sync confirmation is waited from node1 and (node2 or node3).
>>
>> Flattening groups of nodes with a new catalog will be necessary to
>> ease the view of this data to users:
>> - group name?
>> - array of members with nodes/groups
>> - group type: quorum or priority
>> - number of items to wait for in this group
>
> So, here are some thoughts to make that more user-friendly. I think
> that the critical issue here is to properly flatten the meta data in
> the custom language and represent it properly in a new catalog,
> without messing up too much with the existing pg_stat_replication that
> people are now used to for 5 releases since 9.0. So, I would think
> that we will need to have a new catalog, say
> pg_stat_replication_groups with the following things:
> - One line of this catalog represents the status of a group or of a single 
> node.
> - The status of a node/group is either sync or potential, if a
> node/group is specified more than once, it may be possible that it
> would be sync and potential depending on where it is defined, in which
> case setting its status to 'sync' has the most sense. If it is in sync
> state I guess.
> - Move sync_priority and sync_state, actually an equivalent from
> pg_stat_replication into this new catalog, because those represent the
> status of a node or group of nodes.
> - group name, and by that I think that we had perhaps better make
> mandatory the need to append a name with a quorum or priority group.
> The group at the highest level is forcibly named as 'top', 'main', or
> whatever if not directly specified by the user. If the entry is
> directly a node, use the application_name.
> - Type of group, quorum or priority
> - Elements in this group, an element can be a group name or a node
> name, aka application_name. If group is of type priority, the elements
> are listed in increasing order. So the elements with lower priority
> get first, etc. We could have one column listing explicitly a list of
> integers that map with the elements of a group but it does not seem
> worth it, what users would like to know is what are the nodes that are
> prioritized. This covers the former 'priority' field of
> pg_stat_replication.
>
> We may have a good idea of how to define a custom language, still we
> are going to need to design a clean interface at catalog level more or
> less close to what is written here. If we can get a clean interface,
> the custom language implemented, and TAP tests that take advantage of
> this user interface to check the node/group statuses, I guess that we
> would be in good shape for this patch.
>
> Anyway that's not a small project, and perhaps I am over-complicating
> the whole thing.
>
> Thoughts?

I agree that we will need something like such a new view in the future;
however, it seems too late to work on that for 9.6, unfortunately.
There is only one CommitFest left. Let's focus on the very simple case, i.e.,
a 1-level priority list, for now; then we can extend it to cover other cases.

If we can commit the simple version early enough and there is still
time before the date of feature freeze, of course I'm happy to review
the extended version like the one you proposed, for 9.6.
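
For reference, a minimal C sketch of the flattened per-group metadata described
in the quoted mail above; all names here are assumptions for illustration, not
part of any patch:

    /* one flat record per group, or per standalone node */
    typedef enum SyncGroupKind
    {
        SYNC_GROUP_PRIORITY,
        SYNC_GROUP_QUORUM
    } SyncGroupKind;

    typedef struct SyncGroup
    {
        char           *name;       /* group name, or application_name for a node */
        SyncGroupKind   kind;       /* priority or quorum semantics */
        int             num_sync;   /* how many members to wait for */
        int             nmembers;   /* length of members[] */
        char          **members;    /* member node or group names; for a priority
                                     * group, listed in increasing priority order */
    } SyncGroup;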

Regards,

-- 
Fujii Masao




Re: [HACKERS] [patch] Proposal for \crosstabview in psql

2016-02-08 Thread Teodor Sigaev

Hi!

Interesting feature, but it's not very obvious how to use it. I'd like to see 
some example(s) in documentation.


And I see an implementation of an AVL tree in the psql source code
(src/bin/psql/crosstabview.c). Postgres already has a red-black tree
implementation in src/include/lib/rbtree.h and src/backend/lib/rbtree.c. Is
there any advantage to using an AVL tree here? I have some doubts about that,
and this new implementation obviously will not be as well tested. But I see in
the comments that the implementation is reduced to insert-only and does not
rely on the ordering of the tree, so even a hash table could be used.


Daniel Verite wrote:

Pavel Stehule wrote:


1. maybe we can decrease name to shorter "crossview" ?? I am happy with
crosstabview too, just crossview is correct too, and shorter


I'm in favor of keeping crosstabview. It's more explicit, only
3 characters longer and we have tab completion anyway.


2. Columns used for ordering should not be displayed by default. I can live
with current behave, but hiding ordering columns is much more practical for
me


I can see why, but I'm concerned about a consequence:
say we have 4 columns A,B,C,D and the user does \crosstabview +A:B +C:D.
If B and D are excluded by default, then there's nothing
left to display inside the grid.
It doesn't feel quite right. There's something counter-intuitive
in the fact that values in the grid would disappear depending on
whether and how headers are sorted.
With the 3rd argument, we let the user decide what they want
to see.


3. This code is longer, so some regress tests are recommended - attached
simple test case


I've added a few regression tests to the psql testsuite
based on your sample data. New patch with these tests
included is attached, make check passes.

Best regards,






--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




Re: [HACKERS] a raft of parallelism-related bug fixes

2016-02-08 Thread Andres Freund
On 2016-02-02 15:41:45 -0500, Robert Haas wrote:
> group-locking-v1.patch is a vastly improved version of the group
> locking patch that we discussed, uh, extensively last year.  I realize
> that there was a lot of doubt about this approach, but I still believe
> it's the right approach, I have put a lot of work into making it work
> correctly, I don't think anyone has come up with a really plausible
> alternative approach (except one other approach I tried which turned
> out to work but with significantly more restrictions), and I'm
> committed to fixing it in whatever way is necessary if it turns out to
> be broken, even if that amounts to a full rewrite.  Review is welcome,
> but I honestly believe it's a good idea to get this into the tree
> sooner rather than later at this point, because automated regression
> testing falls to pieces without these changes, and I believe that
> automated regression testing is a really good idea to shake out
> whatever bugs we may have in the parallel query stuff.  The code in
> this patch is all mine, but Amit Kapila deserves credit as co-author
> for doing a lot of prototyping (that ended up getting tossed) and
> testing.  This patch includes comments and an addition to
> src/backend/storage/lmgr/README which explain in more detail what this
> patch does, how it does it, and why that's OK.

I see you pushed group locking support. I do wonder if somebody has
actually reviewed this? On a quick scroll-through it seems fairly
invasive, touching some parts where bugs are really hard to find.

I realize that this stuff has all been brewing for a long time, and that
there's still a lot to do, so you gotta keep moving. And I'm not sure that
there's anything wrong, or whether there's any actually better approach. But
pushing an unreviewed, complex patch for a contentious issue, in a thread
originally about different, relatively small/mundane items, just a few days
after the initial post... Hm. Not sure how you'd react if you weren't the
author.

Greetings,

Andres Freund




Re: CustomScan in a larger structure (RE: [HACKERS] CustomScan support on readfuncs.c)

2016-02-08 Thread Robert Haas
On Sun, Feb 7, 2016 at 7:28 PM, Kouhei Kaigai  wrote:
> The new callbacks of T_ExtensibleNode will replace the necessity to
> form and deform process of private values, like as:
>   https://github.com/pg-strom/devel/blob/master/src/gpujoin.c#L114

Yeah.

> It transforms a bunch of internal data of CustomScan (similar to the
> extended fields in T_ExtensibleNode) to/from the node functions
> understandable forms for copy, input and output support.
> I think it implies you proposition is workable.
>
> I'd like to follow this proposition basically.
> On the other hands, I have two points I want to pay attention.
>
> 1. At this moment, it is allowed to define a larger structure that
> embeds CustomPath and CustomScanState by extension. How do we treat
> this coding manner in this case? Especially, CustomScanState has no
> private pointer dereference because it assumes an extension define
> a larger structure. Of course, we don't need to care about node
> operations on Path and PlanState nodes, but right now.

I see no real advantage in letting a CustomPath be larger.  If
custom_private can include extension-defined node types, that seems
good enough.  On the other hand, if CustomScanState can be larger,
that seems fine.   We don't really need any special support for that,
do we?
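
To make the "larger structure" point concrete, this is the pattern being
discussed; a sketch only, with made-up names loosely modeled on the gpujoin
code referenced above:

    #include "postgres.h"
    #include "nodes/execnodes.h"

    typedef struct GpuJoinState
    {
        CustomScanState css;        /* must be the first member */

        /* extension-private execution state follows */
        int         num_rels;
        void       *private_state;
    } GpuJoinState;

    /*
     * The executor only ever hands back a CustomScanState pointer; the
     * provider can cast it back because it allocated the larger struct:
     *     GpuJoinState *gjs = (GpuJoinState *) node;
     */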

> 2. I intended to replace LibraryName and SymbolName fields from the
> CustomScanMethods structure by integration of extensible node type.
> We had to give a pair of these identifiers because custom scan provider
> has no registration points at this moment. A little concern is extension
> has to assume a particular filename of itself.
> But, probably, it shall be a separated discussion. My preference is
> preliminary registration of custom scan provider by its name, as well
> as extensible node.

Seems like we could just leave the CustomScan stuff alone and worry
about this as a separate facility.

> Towards the last question; whether *_private shall be void * or List *,
> I want to keep fdw_private and custom_private as List * pointer, because
> a new node type definition is a bit overdone job if this FDW or CSP will
> take only a few private fields with primitive data types.
> It is a preferable features when extension defines ten or more private
> fields.

Well, I suggested Node *, not void *.  A Node can be a List, but not
every Node is a List.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] checkpointer continuous flushing - V16

2016-02-08 Thread Andres Freund
Hi Fabien,

On 2016-02-04 16:54:58 +0100, Andres Freund wrote:
> I don't want to post a full series right now, but my working state is
> available on
> http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=shortlog;h=refs/heads/checkpoint-flush
> git://git.postgresql.org/git/users/andresfreund/postgres.git checkpoint-flush
> 
> The main changes are that:
> 1) the significant performance regressions I saw are addressed by
>changing the wal writer flushing logic
> 2) The flushing API moved up a couple layers, and now deals with buffer
>tags, rather than the physical files
> 3) Writes from checkpoints, bgwriter and files are flushed, configurable
>by individual GUCs. Without that I still saw the spiked in a lot of 
> circumstances.
> 
> There's also a more experimental reimplementation of bgwriter, but I'm
> not sure it's realistic to polish that up within the constraints of 9.6.

Any comments before I spend more time polishing this? I'm currently
updating docs and comments to actually describe the current state...

Andres




Re: [HACKERS] WAL Re-Writes

2016-02-08 Thread Andres Freund
On 2016-02-08 10:38:55 +0530, Amit Kapila wrote:
> I think deciding it automatically without user require to configure it,
> certainly has merits, but what about some cases where user can get
> benefits by configuring themselves like the cases where we use
> PG_O_DIRECT flag for WAL (with o_direct, it will by bypass OS
> buffers and won't cause misaligned writes even for smaller chunk sizes
> like 512 bytes or so).  Some googling [1] reveals that other databases
> also provides user with option to configure wal block/chunk size (as
> BLOCKSIZE), although they seem to decide chunk size based on
> disk-sector size.

FWIW, you usually can't do such small writes with O_DIRECT. Writes usually
have to be 4KB (page size) in length and 4KB-aligned as well. And on
filesystems that do support smaller direct writes, they essentially fall
back to doing buffered I/O.
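
A minimal Linux-flavoured sketch of what I mean (error handling trimmed; the
4KB figure is the usual requirement, not a guarantee):

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        const size_t blksz = 4096;      /* typical required size and alignment */
        void   *buf;
        int     fd;

        if (posix_memalign(&buf, blksz, blksz) != 0)
            return 1;
        memset(buf, 'x', blksz);

        fd = open("wal.test", O_CREAT | O_WRONLY | O_DIRECT, 0600);
        if (fd < 0)
            return 1;

        /* a full aligned 4KB write is fine; write(fd, buf, 512) would
         * typically fail with EINVAL on an O_DIRECT descriptor */
        if (write(fd, buf, blksz) != (ssize_t) blksz)
            return 1;

        /* O_DIRECT by itself gives no durability guarantee */
        if (fdatasync(fd) != 0)
            return 1;

        close(fd);
        free(buf);
        unlink("wal.test");
        return 0;
    }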

> An additional thought, which is not necessarily related to this patch is,
> if user chooses and or we decide to write in 512 bytes sized chunks,
> which is usually a disk sector size, then can't we think of avoiding
> CRC for each record for such cases, because each WAL write in
> it-self will be atomic.  While reading, if we process in wal-chunk-sized
> units, then I think it should be possible to detect end-of-wal based
> on data read.

O_DIRECT doesn't give any useful guarantees to do something like the
above. It doesn't have any ordering or durability implications. You
still need to do fdatasyncs and such.

Besides, with the new CRC implementation, that doesn't really seem like
such a large win anyway.

Greetings,

Andres Freund




Re: [HACKERS] WIP: Make timestamptz_out less slow.

2016-02-08 Thread David Rowley
On 7/02/2016 4:14 am, "Tom Lane"  wrote:
>
> David Rowley  writes:
> [ timestamp_out_speedup_2015-11-05.patch ]
>
> Pushed with a bunch of cosmetic tweaks.

Great. Thanks for pushing this.

David

