Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-04-19 Thread Robert Haas
On Tue, Apr 18, 2017 at 6:55 AM, Ashutosh Bapat wrote:
> When we merge partition bounds from two relations with different
> partition key types, the merged partition bounds need to have some
> information about what those constants look like, e.g. their
> length, structure, etc. That's the reason we need to store the partition
> key types of the merged partitioning scheme. Consider a three-way join (i4
> JOIN i8 ON i4.x = i8.x) JOIN i2 ON (i2.x = i4.x). When we compare the
> partition bounds of i4 and i8, we use operators for int4 and int8. The
> join i4 JOIN i8 will get partition bounds by merging those of i4 and
> i8. When we come to the join with i2, we need to know which operators to
> use for comparing the partition bounds of the join with those of i2.
>
> So, if the partition key types of the joining relations differ (but
> they have matching partitioning schemes per strategy, natts and
> operator family), the partition bounds of the join are converted to the
> wider type among the partition key types of the joining tree.
> Actually, as I explained earlier, we could choose the wider type
> for an OUTER join and the narrower type for an INNER join. This type is
> used as the partition key type of the join. In the above case the join
> between i4 and i8 has its partition bounds converted to i8 (or i4), and
> then when it is joined with i2 the partition bounds of the join are
> converted to i8 (or i2).

I don't understand why you think that partition-wise join needs any
new logic here; if this were a non-partitionwise join, we'd similarly
need to use the correct operator, but the existing code handles that
just fine.  If the join is performed partition-wise, it should use the
same operators that would have been used by a non-partitionwise join
between the same tables.

I think the choice of operator depends only on the column types, and
that the "width" of those types has nothing to do with it.  For
example, if the user writes ... WHERE A.x = B.x AND B.x = C.x, the
operator for an A/B join or a B/C join will be the one that appears in
the query; parse analysis will have identified which specific operator
is meant based on the types of the columns.  If the optimizer
subsequently decides to reorder the joins and perform the A/C join
first, it will go hunt down the operator with the same strategy number
in the same operator family that takes the type of A.x on one side and
the type of C.x on the other side.  No problem.  A partition-wise join
between A and C will use that same operator; again, no problem.
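
For concreteness, the "hunt down the operator" step is just
lsyscache.c's get_opfamily_member().  A minimal sketch, assuming an
int4/int2 pairing for A.x/C.x and that 'opfamily' holds the OID of
the btree operator family shared by the columns:

#include "postgres.h"
#include "access/stratnum.h"
#include "catalog/pg_type.h"
#include "utils/lsyscache.h"

/*
 * Sketch: fetch the cross-type equality operator that a reordered
 * A/C join would use, given the operator family shared by A.x (int4)
 * and C.x (int2).
 */
static Oid
lookup_join_equality_op(Oid opfamily)
{
    return get_opfamily_member(opfamily,
                               INT4OID,     /* type of A.x */
                               INT2OID,     /* type of C.x */
                               BTEqualStrategyNumber);
}

If the family has no member for that type pair, InvalidOid comes back
and the family's semantics cannot be used for the join.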

Your example involves joining the output of a join between i4 and i8
against i2, so it seems there is some ambiguity about what the input
type should be.  But, again, the planner already copes with this
problem.  In fact, the join is performed either using i4.x or i8.x --
I don't know what happens, or whether it depends on other details of
the query or the plan -- and the operator which can accept that value
on one side and i2.x on the other side is the one that gets used.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Unportable implementation of background worker start

2017-04-19 Thread Tom Lane
I wrote:
> Alvaro Herrera  writes:
>> Tom Lane wrote:
>>> Hm.  Do you have a more-portable alternative?

>> I was thinking of a WaitEventSet from latch.c.

My first reaction was that that sounded like a lot more work than removing
two lines from maybe_start_bgworker and adjusting some comments.  But on
closer inspection, the slow-bgworker-start issue isn't the only problem
here.  On a machine that restarts select()'s timeout after an interrupt,
as (at least) HPUX does, the postmaster will actually never iterate
ServerLoop's loop except immediately after receiving a new connection
request.  The stream of interrupts from the autovac launcher is alone
sufficient to prevent the initial 60-second timeout from ever elapsing.
So if there are no new connection requests for a while, none of the
housekeeping actions in ServerLoop get done.

Most of those actions are concerned with restarting failed background
tasks, which is something we could get by without --- it's unlikely
that those tasks would fail without causing a database-wide restart,
and then there really isn't going to be much need for them until at least
one new connection request has arrived.  But the last step in that loop is
concerned with touching the sockets and lock files to prevent aggressive
/tmp cleaners from removing them, and that's something that can't be let
slide, or we might as well not have the logic at all.

I've confirmed experimentally that on my HPUX box, a postmaster not
receiving new connections for an hour or more in fact fails to update
the mod times on the sockets and lock files.  So that problem isn't
hypothetical.
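
To distill the hazard (an illustration only, not actual postmaster
code; listen_sock and do_housekeeping() are placeholders):

#include <errno.h>
#include <sys/select.h>

extern int  listen_sock;            /* placeholder listen socket */
extern void do_housekeeping(void);  /* placeholder: touch files etc. */

static void
server_loop(void)
{
    for (;;)
    {
        fd_set          rmask;
        struct timeval  timeout;

        FD_ZERO(&rmask);
        FD_SET(listen_sock, &rmask);
        timeout.tv_sec = 60;
        timeout.tv_usec = 0;

        if (select(listen_sock + 1, &rmask, NULL, NULL, &timeout) < 0 &&
            errno != EINTR)
            break;              /* real failure */

        /*
         * On kernels that restart select() after a signal handler
         * (SA_RESTART semantics, as on HPUX), a steady stream of
         * interrupts keeps the call from ever returning, so this
         * housekeeping code is never reached.
         */
        do_housekeeping();
    }
}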

So it's looking to me like we do need to do something about this, and
ideally back-patch it all the way.  But WaitEventSet doesn't exist
before 9.6.  Do we have the stomach for back-patching that into
stable branches?

regards, tom lane




Re: [HACKERS] Interval for launching the table sync worker

2017-04-19 Thread Masahiko Sawada
On Thu, Apr 20, 2017 at 12:30 AM, Petr Jelinek wrote:
> On 19/04/17 15:57, Masahiko Sawada wrote:
>> On Wed, Apr 19, 2017 at 10:07 PM, Petr Jelinek wrote:
>>> On 19/04/17 14:42, Masahiko Sawada wrote:
 On Wed, Apr 19, 2017 at 5:12 PM, Kyotaro HORIGUCHI wrote:
> At Tue, 18 Apr 2017 18:40:56 +0200, Petr Jelinek wrote:
>> On 18/04/17 18:14, Peter Eisentraut wrote:
>>> On 4/18/17 11:59, Petr Jelinek wrote:
 Hmm, if we create a hashtable for this, I'd say create a hashtable for
 the whole table_states then. The reason why it's a list now was that it
 seemed unnecessary to have a hashtable when it will be empty almost
 always, but there is no need to have both a hashtable + list IMHO.
>
> I understand that, but like Peter I also don't like the frequent
> palloc/pfree in a long-lasting context and the double loop.
>
>>> The difference is that we blow away the list of states when the catalog
>>> changes, but we keep the hash table with the start times around.  We
>>> need two things with different lifetimes.
>
> On the other hand, a hash seems overdone. In addition, the
> hash version leaks stale entries while subscriptions are
> modified, and vacuuming them is costly.
>
>> Why can't we just update the hashtable based on the catalog? I mean once
>> the record is not needed in the list, the table has been synced so there
>> is no need for the timestamp either since we'll not try to start the
>> worker again.

 I guess the table sync worker for the same table could need to be
 started again. For example, please imagine a case where the table
 belonging to the publication is removed from it and the corresponding
 subscription is refreshed, and then the table is added to it again. We
 have a record of the table with a timestamp in the hash table from when
 the table was synced the first time, but the table sync after the
 refresh could have to wait for the interval.

>>>
>>> But why do we want to wait in such a case where the user has explicitly
>>> requested a refresh?
>>>
>>
>> Yeah, sorry, I meant that we don't want to wait but cannot launch the
>> tablesync worker in such a case.
>>
>> But after more thought, the latest patch Peter proposed has the same
>> problem. Perhaps we need to always scan the whole pg_subscription_rel
>> and remove the entry once the corresponding table gets synced.
>>
>
> Yes, that's what I mean by "Why can't we just update the hashtable based
> on the catalog". And if we do that, then I don't understand why we need
> both a hashtable and a linked list if we have to update both based on
> catalog reads anyway.

Thanks, now I understand correctly. Yes, I think you're right. If
we update the hash table based on the catalog whenever the table state
is invalidated, we don't need both the hash table and the list.
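
Something like the following, perhaps -- a sketch only, with the struct
and function names invented for illustration:

#include "postgres.h"
#include "utils/hsearch.h"
#include "utils/timestamp.h"

/* hypothetical per-relation bookkeeping entry, keyed by relid */
typedef struct SyncStartEntry
{
    Oid         relid;          /* hash key */
    TimestampTz last_start_time;
} SyncStartEntry;

/*
 * Sketch: return true (and record the attempt) if enough time has
 * passed since the last launch of a sync worker for this relation.
 */
static bool
sync_worker_can_start(HTAB *start_times, Oid relid, int interval_ms)
{
    SyncStartEntry *entry;
    bool        found;
    TimestampTz now = GetCurrentTimestamp();

    entry = hash_search(start_times, &relid, HASH_ENTER, &found);
    if (found &&
        !TimestampDifferenceExceeds(entry->last_start_time, now,
                                    interval_ms))
        return false;           /* too soon, retry on a later cycle */

    entry->last_start_time = now;
    return true;
}

Entries would then be dropped by the same catalog scan that rebuilds
the state, so nothing stale is left behind.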

BTW, in current HEAD the SUBREL_STATE_SYNCWAIT state is never stored in
the pg_subscription_rel catalog, so the following condition seems
incorrect. Should we use "syncworker->relstate == SUBREL_STATE_SYNCWAIT"
instead?

/*
 * There is a worker synchronizing the relation and waiting for
 * apply to do something.
 */
if (syncworker && rstate->state == SUBREL_STATE_SYNCWAIT)
{
    /*
     * There are three possible synchronization situations here.
     *
     * a) Apply is in front of the table sync: We tell the table
     *    sync to CATCHUP.
     *
     * b) Apply is behind the table sync: We tell the table sync
     *    to mark the table as SYNCDONE and finish.
     *
     * c) Apply and table sync are at the same position: We tell
     *    table sync to mark the table as READY and finish.
     *
     * In any case we'll need to wait for table sync to change
     * the state in catalog and only then continue ourselves.
     */

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] identity columns

2017-04-19 Thread Vitaly Burovoy
On 4/18/17, Peter Eisentraut  wrote:
> On 4/7/17 01:26, Vitaly Burovoy wrote:
>> I've implement SET GENERATED ... IF NOT EXISTS. It must be placed
>> before other SET options but fortunately it conforms with the
>> standard.
>> Since that form always changes the sequence behind the column, I
>> decided to explicitly write "[NO] CACHE" in pg_dump.
>>
>> As a plus now it is possible to rename the sequence behind the column
>> by specifying SEQUENCE NAME in SET GENERATED.
>>
>> I hope it is still possible to get rid of the "ADD GENERATED" syntax.
>
> I am still not fond of this change.  There is precedent all over the
> place for having separate commands for creating a structure, changing a
> structure, and removing a structure.  I don't understand what the
> problem with that is.
>
> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


OK. Let's go through it again.
IDENTITY is a property of a column. There is no syntax for changing any
property of any DB object via "ADD".
Yes, a structure (a sequence) is created. But in fact it cannot be
independent from the column at all (I remind you that according to the
standard it should be an unnamed sequence, and there is really no way to
do anything with it except via the column's DDL).
It is even hard to detect which sequence (since they do have names) is
owned by the column:

postgres=# CREATE TABLE xxx(i int generated always as identity, j serial);
CREATE TABLE
postgres=# \d xxx*
                              Table "public.xxx"
 Column |  Type   | Collation | Nullable |            Default
--------+---------+-----------+----------+--------------------------------
 i      | integer |           | not null | generated always as identity
 j      | integer |           | not null | nextval('xxx_j_seq'::regclass)

        Sequence "public.xxx_i_seq"
   Column   |  Type   | Value
------------+---------+-------
 last_value | bigint  | 1
 log_cnt    | bigint  | 0
 is_called  | boolean | f

        Sequence "public.xxx_j_seq"
   Column   |  Type   | Value
------------+---------+-------
 last_value | bigint  | 1
 log_cnt    | bigint  | 0
 is_called  | boolean | f
Owned by: public.xxx.j


I can only guess that "public.xxx_i_seq" is owned by "public.xxx.i";
nothing proves that.
Whereas for a regular sequence there are two pieces of evidence
("Default" and "Owned by").

Also, the created sequence cannot be deleted separately (only together
with the column), nor left behind after the column is deleted.


Everywhere else the "ADD" syntax is used where you can add more than
one object to the altered one:
ALTER TABLE ... ADD COLUMN /* there can be many added columns
differing by names in the table */
ALTER TABLE ... ADD CONSTRAINT /* there can be many added constraints
differing by names in the table */
ALTER TYPE ... ADD VALUE /* many values in an enum differing by names
in the enum */
ALTER TYPE ... ADD ATTRIBUTE /* many attributes in a composite
differing by names in the composite */
etc.

But what about "ALTER TABLE ... ALTER COLUMN ... ADD GENERATED"?
Is a property name used to distinguish between several of them in a
column? Is it possible to have more than one such property on the
altered column?


The "SET GENERATED" (without "IF NOT EXISTS") syntax conforms to the
standard for those who want it.
The "SET GENERATED ... IF NOT EXISTS" syntax allows users to have the
column be in a required state (IDENTITY with set options) without
paying attention whether it is already set as IDENTITY or not.


The "[ NOT ] EXISTS" is a common Postgres' syntax extension for
creating/updating objects in many places. That's why I think it should
be used instead of introducing the new "ADD" syntax which contradicts
the users' current experience.


-- 
Best regards,
Vitaly Burovoy




Re: [HACKERS] logical replication and PANIC during shutdown checkpoint in publisher

2017-04-19 Thread Michael Paquier
On Thu, Apr 20, 2017 at 12:40 PM, Michael Paquier wrote:
> On Thu, Apr 20, 2017 at 4:57 AM, Peter Eisentraut wrote:
>> I think the problem with a signal-based solution is that there is no
>> feedback.  Ideally, you would wait for all walsenders to acknowledge the
>> receipt of SIGUSR2 (or similar) and only then proceed with the shutdown
>> checkpoint.
>
> Are you sure that it is necessary to go to such an extent? Why wouldn't
> it be enough to prevent replication commands that generate WAL from
> running once the WAL sender knows that the postmaster is in shutdown
> mode?

Second thoughts here... Ah, now I see your point. True that there is no
way to ensure that an unwanted command is not running when SIGUSR2 is
received, as the shutdown checkpoint may have already begun. Here is an
idea: add a new state in WalSndState, say WALSNDSTATE_STOPPING, and have
the shutdown checkpoint not run until every WAL sender still running has
reached that state.
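
To make that concrete, a rough sketch (not a patch; the existing
WalSndState members and the WalSnd fields are the ones from
walsender_private.h):

typedef enum WalSndState
{
    WALSNDSTATE_STARTUP = 0,
    WALSNDSTATE_BACKUP,
    WALSNDSTATE_CATCHUP,
    WALSNDSTATE_STREAMING,
    WALSNDSTATE_STOPPING        /* proposed: shutdown acknowledged */
} WalSndState;

/*
 * Sketch: called before the shutdown checkpoint; returns only once
 * every live WAL sender has acknowledged the stop request by moving
 * to WALSNDSTATE_STOPPING.
 */
static void
WalSndWaitStopping(void)
{
    for (;;)
    {
        int         i;
        bool        all_stopped = true;

        for (i = 0; i < max_wal_senders; i++)
        {
            WalSnd     *walsnd = &WalSndCtl->walsnds[i];

            SpinLockAcquire(&walsnd->mutex);
            if (walsnd->pid != 0 &&
                walsnd->state != WALSNDSTATE_STOPPING)
                all_stopped = false;
            SpinLockRelease(&walsnd->mutex);
        }

        if (all_stopped)
            return;

        pg_usleep(10000L);      /* 10ms between rechecks */
    }
}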
-- 
Michael




Re: [HACKERS] logical replication and PANIC during shutdown checkpoint in publisher

2017-04-19 Thread Michael Paquier
On Thu, Apr 20, 2017 at 4:57 AM, Peter Eisentraut wrote:
> On 4/19/17 01:45, Michael Paquier wrote:
>> On Tue, Apr 18, 2017 at 3:27 AM, Peter Eisentraut wrote:
>>> I'd imagine the postmaster would tell the walsender that it has started
>>> shutdown, and then the walsender would reject $certain_things.  But I
>>> don't see an existing way for the walsender to know that shutdown has
>>> been initiated.  SIGINT is still free ...
>>
>> The WAL sender receives SIGUSR2 from the postmaster when shutdown is
>> initiated, so why not just rely on that and issue an ERROR when a
>> client attempts to create or drop a new slot, setting up
>> walsender_ready_to_stop unconditionally? It seems to me that the issue
>> here is the delay between the moment SIGTERM is acknowledged by the
>> WAL sender and the moment a CREATE_SLOT command is processed. An idea
>> is in the attached...
>
> I think the problem with a signal-based solution is that there is no
> feedback.  Ideally, you would wait for all walsenders to acknowledge the
> receipt of SIGUSR2 (or similar) and only then proceed with the shutdown
> checkpoint.

Are you sure that it is necessary to go to such an extent? Why wouldn't
it be enough to prevent replication commands that generate WAL from
running once the WAL sender knows that the postmaster is in shutdown
mode?
-- 
Michael




Re: [HACKERS] Highly Variable Planning Times

2017-04-19 Thread Michael Malis
Thanks for the help Andres.




Re: [HACKERS] [PATCH] Push limit to sort through a subquery

2017-04-19 Thread Ashutosh Bapat
On Wed, Apr 19, 2017 at 10:51 PM, Douglas Doole wrote:
> Thanks for the feedback Ashutosh.
>
> I disagree that we need to call pass_down_bound() recursively. The whole
> point of the recursive call would be to tunnel through a plan node. Since
> SubqueryScanState only has one child (unlike MergeAppendState) this can be
> more efficiently done inline rather than via a recursive call.
>
> In the case of ResultState, this is classic tail-end recursion and the C
> compiler should optimize it out. As we add more recursive calls though, it
> becomes harder for the compiler to do the right thing. (I'll admit that I'm
> not up on the state of the art in recursion removal, though, so maybe my
> concern is moot. That said, I prefer not to depend on compiler optimizations
> if there's a clean alternative way to express the code.)
>
> I do agree that it would make the code cleaner to handle ResultState and
> SubqueryScanState in a similar fashion. My preference, however, would be to
> remove the recursive call from ResultState handling. That would give us code
> like:
>
> if child == SubqueryScanState
>     child = SubqueryScanState->child
> else if child == ResultState
>     child = ResultState->child
>
> if child == SortState
>     limit pushdown
> else if child == MergeAppendState
>     recursive call on all children
>
> If it is possible to get more than one SubqueryScanState and/or ResultState
> between the limit and sort, then the first block of code could be placed in
> a while loop.
>
> If you'd prefer that I rework the patch as I discussed, just let me know and
> I'll resubmit it.

Thinking more about it, the pass_down_bound() optimization ultimately
targets sort nodes. It doesn't, for example, target ForeignScan nodes,
which could also benefit from such an optimization. But I think any
generic LIMIT optimization will need to be handled in the planner rather
than the executor.

That said, I don't think the community at large would accept serializing
pass_down_bound() as you are proposing. You may want to wait for others'
opinions before working on it. If you add this to the commitfest app,
more people might look at it when the next commitfest opens. Also, it
might help if you can provide a query or two with numbers where this
optimization shows an improvement.
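
For reference, the reshaped function Douglas describes might look
roughly like this -- a sketch against the 9.6-era executor structs,
with the SubqueryScanState unwrapping being the new part:

static void
pass_down_bound(LimitState *node, PlanState *child_node)
{
    /*
     * Peel off nodes that merely pass tuples through.  A loop, in
     * case several SubqueryScan/Result nodes are stacked up.
     */
    for (;;)
    {
        if (IsA(child_node, SubqueryScanState))
            child_node = ((SubqueryScanState *) child_node)->subplan;
        else if (IsA(child_node, ResultState) &&
                 outerPlanState(child_node) != NULL)
            child_node = outerPlanState(child_node);
        else
            break;
    }

    if (IsA(child_node, SortState))
    {
        SortState  *sortState = (SortState *) child_node;
        int64       tuples_needed = node->count + node->offset;

        /* negative test checks for LIMIT ALL, OFFSET NULL */
        if (node->noCount || tuples_needed < 0)
            sortState->bounded = false; /* reset upon rescan if needed */
        else
        {
            sortState->bounded = true;
            sortState->bound = tuples_needed;
        }
    }
    else if (IsA(child_node, MergeAppendState))
    {
        MergeAppendState *maState = (MergeAppendState *) child_node;
        int         i;

        for (i = 0; i < maState->ms_nplans; i++)
            pass_down_bound(node, maState->mergeplans[i]);
    }
}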

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] some review comments on logical rep code

2017-04-19 Thread Noah Misch
On Sun, Apr 16, 2017 at 06:14:49AM +0000, Noah Misch wrote:
> On Fri, Apr 14, 2017 at 04:47:12AM +0900, Fujii Masao wrote:
> > Though I've read only a part of the logical rep code yet, I'd like to
> > share some (relatively minor) review comments that I got so far.
> > 
> > In ApplyWorkerMain(), DatumGetInt32() should be used to get integer
> > value from the argument, instead of DatumGetObjectId().
> > 
> > No one resets on_commit_launcher_wakeup flag to false.
> > 
> > ApplyLauncherWakeup() should be static function.
> > 
> > Seems not only CREATE SUBSCRIPTION but also ALTER SUBSCRIPTION's
> > subcommands (e.g., ENABLED one) should wake the launcher up on commit.
> > 
> > Why is IsBackendPid() check necessary in ApplyLauncherWakeup()?
> > 
> > Normal statements that the walsender for logical rep runs are logged
> > by log_replication_commands. So if log_statement is also set,
> > those commands are logged twice.
> > 
> > LogicalRepCtx->launcher_pid isn't reset to 0 when the launcher emits
> > an error. The callback function to reset it needs to be registered
> > via on_shmem_exit().
> > 
> > Typo: "an subscriber" should be "a subscriber" in some places.
> > 
> > /* Get conninfo */
> > datum = SysCacheGetAttr(SUBSCRIPTIONOID,
> >                         tup,
> >                         Anum_pg_subscription_subconninfo,
> >                         &isnull);
> > Assert(!isnull);
> > 
> > This assertion makes me think that pg_subscription.subconnifo should
> > have NOT NULL constraint. Also subslotname and subpublications
> > should have NOT NULL constraint because of the same reason.
> > 
> > SpinLockAcquire(&MyLogicalRepWorker->relmutex);
> > MyLogicalRepWorker->relstate =
> >     GetSubscriptionRelState(MyLogicalRepWorker->subid,
> >                             MyLogicalRepWorker->relid,
> >                             &MyLogicalRepWorker->relstate_lsn,
> >                             false);
> > SpinLockRelease(&MyLogicalRepWorker->relmutex);
> > 
> > Non-"short-term" function like GetSubscriptionRelState() should not
> > be called while spinlock is being held.
> 
> [Action required within three days.  This is a generic notification.]
> 
> The above-described topic is currently a PostgreSQL 10 open item.  Peter,
> since you committed the patch believed to have created it, you own this open
> item.  If some other commit is more relevant or if this does not belong as a
> v10 open item, please let us know.  Otherwise, please observe the policy on
> open item ownership[1] and send a status update within three calendar days of
> this message.  Include a date for your subsequent status update.  Testers may
> discover new open items at any time, and I want to plan to get them all fixed
> well in advance of shipping v10.  Consequently, I will appreciate your efforts
> toward speedy resolution.  Thanks.
> 
> [1] 
> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

This PostgreSQL 10 open item is past due for your status update.  Kindly send
a status update within 24 hours, and include a date for your subsequent status
update.  Refer to the policy on open item ownership:
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
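
For the record, the fix implied by that last review item is presumably
along these lines (a sketch: call the catalog-reading function before
taking the spinlock, then copy the results under it):

char        relstate;
XLogRecPtr  relstate_lsn;

/* read the catalog without holding the spinlock */
relstate = GetSubscriptionRelState(MyLogicalRepWorker->subid,
                                   MyLogicalRepWorker->relid,
                                   &relstate_lsn, false);

/* then update shared state under the lock */
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = relstate;
MyLogicalRepWorker->relstate_lsn = relstate_lsn;
SpinLockRelease(&MyLogicalRepWorker->relmutex);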




[HACKERS] Re: logical replication and PANIC during shutdown checkpoint in publisher

2017-04-19 Thread Noah Misch
On Sun, Apr 16, 2017 at 06:12:58AM +0000, Noah Misch wrote:
> On Wed, Apr 12, 2017 at 10:55:08PM +0900, Fujii Masao wrote:
> > When I shut down the publisher while I repeated creating and dropping
> > the subscription in the subscriber, the publisher emitted the following
> > PANIC error during shutdown checkpoint.
> > 
> > PANIC:  concurrent transaction log activity while database system is
> > shutting down
> > 
> > The cause of this problem is that walsender for logical replication can
> > generate WAL records even during shutdown checkpoint.
> > 
> > Firstly walsender keeps running until shutdown checkpoint finishes
> > so that all the WAL including shutdown checkpoint record can be
> > replicated to the standby. This was safe because previously walsender
> > could not generate WAL records. However this assumption became
> > invalid because of logical replication. That is, currently walsender for
> > logical replication can generate WAL records, for example, by executing
> > CREATE_REPLICATION_SLOT command. This is an oversight in
> > logical replication patch, I think.
> > 
> > To fix this issue, we should terminate walsender for logical replication
> > before shutdown checkpoint starts. Of course walsender for physical
> > replication still needs to keep running until shutdown checkpoint ends,
> > though.
> 
> [Action required within three days.  This is a generic notification.]
> 
> The above-described topic is currently a PostgreSQL 10 open item.  Peter,
> since you committed the patch believed to have created it, you own this open
> item.  If some other commit is more relevant or if this does not belong as a
> v10 open item, please let us know.  Otherwise, please observe the policy on
> open item ownership[1] and send a status update within three calendar days of
> this message.  Include a date for your subsequent status update.  Testers may
> discover new open items at any time, and I want to plan to get them all fixed
> well in advance of shipping v10.  Consequently, I will appreciate your efforts
> toward speedy resolution.  Thanks.
> 
> [1] 
> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

This PostgreSQL 10 open item is past due for your status update.  Kindly send
a status update within 24 hours, and include a date for your subsequent status
update.  Refer to the policy on open item ownership:
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com




Re: [HACKERS] Interval for launching the table sync worker

2017-04-19 Thread Noah Misch
On Sun, Apr 16, 2017 at 06:08:41AM +0000, Noah Misch wrote:
> On Thu, Apr 06, 2017 at 11:33:22AM +0900, Masahiko Sawada wrote:
> > While testing table sync worker for logical replication I noticed that
> > if the table sync worker of logical replication failed to insert the
> > data for whatever reason, the table sync worker process exits with
> > error. And then the main apply worker launches the table sync worker
> > again soon without interval. This routine is executed at very high
> > frequency without interval.
> > 
> > Should we do put a interval (wal_retrieve_interval or make a new GUC
> > parameter?) for launching the table sync worker?
> 
> [Action required within three days.  This is a generic notification.]
> 
> The above-described topic is currently a PostgreSQL 10 open item.  Peter,
> since you committed the patch believed to have created it, you own this open
> item.  If some other commit is more relevant or if this does not belong as a
> v10 open item, please let us know.  Otherwise, please observe the policy on
> open item ownership[1] and send a status update within three calendar days of
> this message.  Include a date for your subsequent status update.  Testers may
> discover new open items at any time, and I want to plan to get them all fixed
> well in advance of shipping v10.  Consequently, I will appreciate your efforts
> toward speedy resolution.  Thanks.
> 
> [1] 
> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

This PostgreSQL 10 open item is past due for your status update.  Kindly send
a status update within 24 hours, and include a date for your subsequent status
update.  Refer to the policy on open item ownership:
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com




Re: [HACKERS] pg_dump emits ALTER TABLE ONLY partitioned_table

2017-04-19 Thread Noah Misch
On Mon, Apr 17, 2017 at 03:41:25PM -0400, Stephen Frost wrote:
> * Noah Misch (n...@leadboat.com) wrote:
> > On Thu, Apr 13, 2017 at 11:38:08AM -0400, Robert Haas wrote:
> > > On Thu, Apr 13, 2017 at 11:05 AM, Stephen Frost wrote:
> > > > Sure, though I won't be able to today and I've got some doubts about the
> > > > other patches.  I'll have more time tomorrow though.
> > > 
> > > OK, cool.  I'll mark you down as the owner on the open items list.
> > 
> > [Action required within three days.]
> 
> I've put up a new patch for review on the thread and plan to commit
> that tomorrow, assuming there isn't anything further.  That should
> resolve the immediate issue, but I do think there's some merit to Amit's
> follow-on patches and will continue to discuss those and may commit one
> or both of those later this week.

This PostgreSQL 10 open item is past due for your status update.  Kindly send
a status update within 24 hours, and include a date for your subsequent status
update.  Refer to the policy on open item ownership:
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com




[HACKERS] AGG_HASHED cost estimate

2017-04-19 Thread Jeff Janes
In costsize.c, there is this comment block:

 * Note: in this cost model, AGG_SORTED and AGG_HASHED have exactly the
 * same total CPU cost, but AGG_SORTED has lower startup cost.  If the
 * input path is already sorted appropriately, AGG_SORTED should be
 * preferred (since it has no risk of memory overflow).  This will happen
 * as long as the computed total costs are indeed exactly equal --- but
 * if there's roundoff error we might do the wrong thing.  So be sure
 * that the computations below form the same intermediate values in the
 * same order.

But, why should they have the same cost in the first place?  A sorted
aggregation just has to do an equality comparison on each adjacent pair,
which is costed as (cpu_operator_cost * numGroupCols) * input_tuples.   A
hash aggregation has to do a hashcode computation for each input, which
apparently is also costed at (cpu_operator_cost * numGroupCols) *
input_tuples.  But it also has to do the equality comparison between the
input tuple and any aggregation it is about to be aggregated into, to make
sure the hashcode is not a false collision.  That should be
another (cpu_operator_cost * numGroupCols) * (input_tuples - numGroups),
shouldn't it?  Currently, that is apparently assumed to be free.

Is there something here I am overlooking?
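
Spelling the arithmetic out as a standalone sketch (variable names
follow cost_agg(); the 'hashed' branch adds the term that currently
appears to be assumed free):

/*
 * Sketch of the per-input-tuple CPU cost terms discussed above.
 */
static double
agg_group_cpu_cost(double cpu_operator_cost, int numGroupCols,
                   double input_tuples, double numGroups, bool hashed)
{
    /*
     * Both strategies: one comparison (sorted) or hashcode
     * computation (hashed) per grouping column per input tuple.
     */
    double      cost = cpu_operator_cost * numGroupCols * input_tuples;

    if (hashed)
    {
        /*
         * The allegedly missing term: an equality recheck against the
         * existing hash entry for every tuple that joins a group it
         * did not create.
         */
        cost += cpu_operator_cost * numGroupCols *
            (input_tuples - numGroups);
    }

    return cost;
}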

Cheers,

Jeff


Re: [HACKERS] Continuous buildfarm failures on hamster with bin-check

2017-04-19 Thread Michael Paquier
On Wed, Apr 19, 2017 at 7:03 AM, Michael Paquier wrote:
> On Wed, Apr 19, 2017 at 12:48 AM, Tom Lane  wrote:
>> Alvaro Herrera  writes:
>>> Tom Lane wrote:
 FWIW, I'm a bit suspicious of relocating the temp stats directory as
 being a reliable fix for this.
>>
>>> It's an SD card (the kind typically used in cameras and phones), not SSD.
>>> Saying it's slow is an understatement.  It's *excruciatingly* slow.
>>
>> Oh, I misread it ... but still, the modern definition of "excruciatingly
>> slow" doesn't seem all that far off what 90s-era hard drives could do.
>> It is clear from googling though that there's an enormous performance
>> range in SD cards' random write performance, eg wikipedia's entry has
>> a link to
>>
>> http://goughlui.com/2014/01/16/testing-sd-card-performance-round-up/
>>
>> Seems like it's hard to judge this without knowing exactly which
>> SD card Michael has got in that thing.
>
> A Professional microSDHC 8GB card, class 10, which is utterly slow in
> the things I have tested up to now, and of course I cannot find its
> read/write specs. I had a SanDisk class 10 in it that died some
> months ago, and that was way faster. RIP.

Until something happens, I have removed --enable-tap-tests from the
script of this buildfarm machine. There is no point in leaving it red
without a clear resolution plan on the table first.
-- 
Michael




[HACKERS] Removing select(2) based latch (was Unportable implementation of background worker start)

2017-04-19 Thread Andres Freund
On 2017-04-19 20:06:05 -0400, Tom Lane wrote:
> > BTW, we IIRC had discussed removing the select() backed latch
> > implementation in this release.  I'll try to dig up that discussion.
>
> Might be sensible.  Even my pet dinosaurs have poll(2).

I can't find the discussion anymore, but I'm fairly sure we did discuss
that.  So here's a new one.

> We should check the buildfarm to see if the select() implementation is
> being tested at all.

I verified it's currently not (unless I made a mistake):
pgbfprod=> select bs.name, snapshot
FROM buildsystems bs, lateral (
    SELECT * FROM build_status_log bsl
    WHERE bs.name = bsl.sysname AND log_stage = 'configure.log'
    ORDER BY bsl.sysname, bsl.snapshot desc, bsl.log_stage limit 1) bsl2
WHERE log_text NOT LIKE '%no configure step for MSCV%'
  AND log_text NOT LIKE '%checking for poll... yes%'
ORDER BY snapshot desc;

   name    |      snapshot
-----------+---------------------
 frogmouth | 2017-04-19 00:32:32
 jacana    | 2017-04-19 00:00:58
 narwhal   | 2017-04-18 07:00:01
(3 rows)

and those three animals are Windows machines, and thus use the Windows
implementation.

This actually seems to suggest that every animal, including dead ones,
had poll(2) support at the time of its last run...

- Andres




Re: [HACKERS] Quorum commit for multiple synchronous replication.

2017-04-19 Thread Kyotaro HORIGUCHI
Ok, I got the point.

At Wed, 19 Apr 2017 17:39:01 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI
wrote in <20170419.173901.16598616.horiguchi.kyot...@lab.ntt.co.jp>:
> > >> |
> > >> | Quorum-based synchronous replication is basically more
> > >> | efficient than priority-based one when you specify multiple
> > >> | standbys in synchronous_standby_names and want
> > >> | to synchronously replicate transactions to two or more of
> > >> | them.

"Some" means "not all".

> > >> | In the priority-based case, the replication master
> > >> | must wait for a reply from the slowest standby in the
> > >> | required number of standbys in priority order, which may be
> > >> | slower than the rest.


Quorum-based synchronous replication is expected to be more
efficient than priority-based replication when your master doesn't
need to be in sync with all of the standbys nominated by
synchronous_standby_names.  While a quorum-based
replication master waits only for a specified number of the fastest
standbys, a priority-based replication master must wait for the
standbys at the top of the list, which may be slower than the
rest.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [HACKERS] snapbuild woes

2017-04-19 Thread Andres Freund
On 2017-04-17 21:16:57 -0700, Andres Freund wrote:
> I think we might need some more tests for this to be committable, so
> it might not become committable tomorrow.

I'm working on some infrastructure around this.  Not sure if it needs to
be committed, but it's certainly useful for evaluation.  Basically it's
a small UDF that:
1) creates a slot via walsender protocol (to some dsn)
2) imports that snapshot into yet another connection to that dsn
3) runs some query over that new connection

That makes it reasonably easy to run e.g. pgbench and continually create
slots, and use the snapshot to run queries "verifying" that things look
good.  It's a bit shoestring-ed together, but everything else seems to
require more code. And it's just a test.

Unless somebody has a better idea?

- Andres




Re: [HACKERS] Unportable implementation of background worker start

2017-04-19 Thread Tom Lane
Andres Freund  writes:
> FWIW, I'd wished before that we used something a bit more modern than
> select() if available... It's nice to be able to listen to a larger
> number of sockets without repeated O(sockets) overhead.

[ raised eyebrow... ]  Is anyone really running postmasters with enough
listen sockets for that to be meaningful?

> BTW, we IIRC had discussed removing the select() backed latch
> implementation in this release.  I'll try to dig up that discussion.

Might be sensible.  Even my pet dinosaurs have poll(2).  We should
check the buildfarm to see if the select() implementation is being
tested at all.

regards, tom lane




Re: [HACKERS] Unportable implementation of background worker start

2017-04-19 Thread Andres Freund
On 2017-04-19 18:56:26 -0400, Tom Lane wrote:
> Alvaro Herrera  writes:
> > Tom Lane wrote:
> >> Hm.  Do you have a more-portable alternative?
> 
> > I was thinking of a WaitEventSet from latch.c.
> 
> Yeah, some googling turns up the suggestion that a self-pipe is a portable
> way to get consistent semantics from select(); latch.c has already done
> that.  I suppose that using latch.c would be convenient in that we'd have
> to write little new code, but it's a tad annoying to add the overhead of a
> self-pipe on platforms where we don't need it (which seems to be most).

FWIW, I'd wished before that we used something a bit more modern than
select() if available... It's nice to be able to listen to a larger
number of sockets without repeated O(sockets) overhead.

BTW, we IIRC had discussed removing the select() backed latch
implementation in this release.  I'll try to dig up that discussion.

- Andres




Re: [HACKERS] Highly Variable Planning Times

2017-04-19 Thread Andres Freund
On 2017-04-19 16:43:07 -0700, Michael Malis wrote:
> > The profile seems to confirm that this is largely due to cache misses.
> 
> Can you elaborate? Cache misses of what? Why is the planning time so
> variable?

Postgres uses caches for a lot of metadata of tables, indexes, etc.  Some
actions, like vacuum, DDL, or table size changes, trigger cache
invalidations for other backends so the caches stay coherent.  What
you're likely seeing is that the cached cases are quick enough, and that
the slow plan cycles are the ones where the cache has been invalidated.

- Andres




Re: [HACKERS] Highly Variable Planning Times

2017-04-19 Thread Michael Malis
> The profile seems to confirm that this is largely due to cache misses.

Can you elaborate? Cache misses of what? Why is the planning time so variable?




Re: [HACKERS] Highly Variable Planning Times

2017-04-19 Thread Andres Freund
On 2017-04-19 14:14:54 -0700, Michael Malis wrote:
> > Could you also get a profile using perf record -g?  The strace isn't
> > telling us all that much.
> 
> I've attached the perf.data file from `perf record -g -p
> `. I'm not that familiar with perf so if there is more
> information you need let me know. This was on Ubuntu 16.04

The profile seems to confirm that this is largely due to cache misses.

Greetings,

Andres Freund




Re: [HACKERS] Unportable implementation of background worker start

2017-04-19 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> Hm.  Do you have a more-portable alternative?

> I was thinking of a WaitEventSet from latch.c.

Yeah, some googling turns up the suggestion that a self-pipe is a portable
way to get consistent semantics from select(); latch.c has already done
that.  I suppose that using latch.c would be convenient in that we'd have
to write little new code, but it's a tad annoying to add the overhead of a
self-pipe on platforms where we don't need it (which seems to be most).

regards, tom lane




Re: [HACKERS] Unportable implementation of background worker start

2017-04-19 Thread Alvaro Herrera
Tom Lane wrote:
> Alvaro Herrera  writes:
> > Tom Lane wrote:
> >> So I'm wondering what the design rationale was for only starting one
> >> bgworker per invocation.
> 
> > The rationale was that there may be other tasks waiting for postmaster
> > attention, and if there are many bgworkers needing to be started, the
> > other work may be delayed for a long time.  This is not the first time
> > that this rationale has been challenged, but so far there hasn't been
> > any good reason to change it.  One option is to just remove it as you
> > propose, but a different one is to stop using select(2) in ServerLoop,
> > because those behavior differences seem to make it rather unusable.
> 
> Hm.  Do you have a more-portable alternative?

I was thinking of a WaitEventSet from latch.c.
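
For concreteness, a rough sketch of what that replacement could look
like against the 9.6 latch.h API (MAXLISTEN and ListenSocket[] as in
postmaster.c; connection handling elided):

static void
server_loop_sketch(void)
{
    WaitEventSet *set;
    WaitEvent   event;
    int         i;

    set = CreateWaitEventSet(PostmasterContext, MAXLISTEN);
    for (i = 0; i < MAXLISTEN && ListenSocket[i] != PGINVALID_SOCKET; i++)
        AddWaitEventToSet(set, WL_SOCKET_READABLE, ListenSocket[i],
                          NULL, NULL);

    for (;;)
    {
        /* one-minute timeout, matching the current select() call */
        if (WaitEventSetWait(set, 60 * 1000L, &event, 1) == 1)
        {
            /* a listen socket is readable: accept on event.fd */
        }

        /*
         * Unlike a restarted select(), this reliably returns at the
         * timeout, so the housekeeping actions can run on schedule.
         */
    }
}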

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Unportable implementation of background worker start

2017-04-19 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> So I'm wondering what the design rationale was for only starting one
>> bgworker per invocation.

> The rationale was that there may be other tasks waiting for postmaster
> attention, and if there are many bgworkers needing to be started, the
> other work may be delayed for a long time.  This is not the first time
> that this rationale has been challenged, but so far there hasn't been
> any good reason to change it.  One option is to just remove it as you
> propose, but a different one is to stop using select(2) in ServerLoop,
> because those behavior differences seem to make it rather unusable.

Hm.  Do you have a more-portable alternative?

regards, tom lane




Re: [HACKERS] Unportable implementation of background worker start

2017-04-19 Thread Alvaro Herrera
Tom Lane wrote:

> While I haven't yet tested it, it seems like a fix might be as simple
> as deleting these lines in maybe_start_bgworker:
> 
> /*
>  * Have ServerLoop call us again.  Note that there might not
>  * actually *be* another runnable worker, but we don't care all
>  * that much; we will find out the next time we run.
>  */
> StartWorkerNeeded = true;
> return;
> 
> So I'm wondering what the design rationale was for only starting one
> bgworker per invocation.

The rationale was that there may be other tasks waiting for postmaster
attention, and if there are many bgworkers needing to be started, the
other work may be delayed for a long time.  This is not the first time
that this rationale has been challenged, but so far there hasn't been
any good reason to change it.  One option is to just remove it as you
propose, but a different one is to stop using select(2) in ServerLoop,
because those behavior differences seem to make it rather unusable.

> It also appears to me that do_start_bgworker's treatment of fork
> failure is completely brain dead.  Did anyone really think about
> that case?

Hmm, I probably modelled it on autovacuum without giving that case much
additional consideration.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Highly Variable Planning Times

2017-04-19 Thread Andres Freund
On 2017-04-19 14:14:54 -0700, Michael Malis wrote:
> > Could you also get a profile using perf record -g?  The strace isn't
> > telling us all that much.
> 
> I've attached the perf.data file from `perf record -g -p
> `. I'm not that familiar with perf so if there is more
> information you need let me know. This was on Ubuntu 16.04

You'd have to do a perf report > outfile.txt too, without your system we
can't analyze the perf.data.

https://wiki.postgresql.org/wiki/Profiling_with_perf

- Andres




[HACKERS] Unportable implementation of background worker start

2017-04-19 Thread Tom Lane
I chanced to notice that on gaur/pademelon, the "select_parallel"
regression test sometimes takes a great deal longer than normal,
for no obvious reason.  It does eventually terminate, but sometimes
only after 10 to 20 minutes rather than the couple dozen seconds
that are typical on that slow machine.

After a fair amount of hair-pulling, I traced the problem to the
fact that maybe_start_bgworker() will only start at most one worker
per call; after that, it sets StartWorkerNeeded = true and returns,
opining in a comment that ServerLoop will make another call shortly.

Unfortunately, that's hogwash.  It happens to work reliably on Linux,
because according to signal(7)

   The following interfaces are never restarted after being interrupted by
   a signal handler, regardless of the use of SA_RESTART; they always fail
   with the error EINTR when interrupted by a signal handler:

   ... select(2) ...

However, that's a Linux-ism.  What POSIX 2008 says about it, in the
select(2) reference page, is that

If SA_RESTART has been set for the interrupting signal, it is
implementation-defined whether the function restarts or returns
with [EINTR].

HPUX apparently adopts the "restart" definition, and as we've previously
found out, not only does select(2) not return immediately but it actually
seems to reset the timeout timer to its original value.  (Some googling
suggests that restarting occurs on SVR4-derived kernels but not
BSD-derived ones.)

So what happens, if several RegisterDynamicBackgroundWorker requests
arrive in a single iteration of the postmaster's sigusr1_handler,
is that the first one is serviced thanks to the maybe_start_bgworker
call appearing in sigusr1_handler, and then we return to the select()
call and sleep.  The next start request is serviced only when the
typically-60-second select timeout expires, or more usually when some
other interrupt arrives at the postmaster.  In the regression tests,
what seems to happen is that we get a PMSIGNAL_START_AUTOVAC_WORKER
from the autovac launcher every thirty seconds, allowing one more bgworker
to get launched each time we go through sigusr1_handler.

Here's an actual trace of one select_parallel.sql query trying to
launch four parallel workers; I added a bunch of elog(LOG) calls
to help diagnose what was going on:

2017-04-19 17:25:33.448 EDT [8827] LOG:  signaling postmaster for startup of slot 1
2017-04-19 17:25:33.448 EDT [8827] STATEMENT:  select count(*) from tenk1, tenk2 where tenk1.hundred > 1 and tenk2.thousand=0;
2017-04-19 17:25:33.448 EDT [8827] LOG:  signaling postmaster for startup of slot 2
2017-04-19 17:25:33.448 EDT [8827] STATEMENT:  select count(*) from tenk1, tenk2 where tenk1.hundred > 1 and tenk2.thousand=0;
2017-04-19 17:25:33.448 EDT [8827] LOG:  signaling postmaster for startup of slot 3
2017-04-19 17:25:33.448 EDT [8827] STATEMENT:  select count(*) from tenk1, tenk2 where tenk1.hundred > 1 and tenk2.thousand=0;
2017-04-19 17:25:33.448 EDT [8827] LOG:  signaling postmaster for startup of slot 4
2017-04-19 17:25:33.448 EDT [8827] STATEMENT:  select count(*) from tenk1, tenk2 where tenk1.hundred > 1 and tenk2.thousand=0;
2017-04-19 17:25:33.563 EDT [6456] LOG:  entering sigusr1_handler
2017-04-19 17:25:33.563 EDT [6456] LOG:  registering background worker "parallel worker for PID 8827"
2017-04-19 17:25:33.563 EDT [6456] LOG:  registering background worker "parallel worker for PID 8827"
2017-04-19 17:25:33.563 EDT [6456] LOG:  registering background worker "parallel worker for PID 8827"
2017-04-19 17:25:33.563 EDT [6456] LOG:  registering background worker "parallel worker for PID 8827"
2017-04-19 17:25:33.563 EDT [6456] LOG:  entered maybe_start_bgworker, StartWorkerNeeded=1, HaveCrashedWorker=1
2017-04-19 17:25:33.563 EDT [6456] LOG:  starting background worker process "parallel worker for PID 8827"
2017-04-19 17:25:33.566 EDT [8871] LOG:  starting parallel worker 3
2017-04-19 17:25:33.642 EDT [8871] LOG:  exiting parallel worker 3
2017-04-19 17:25:33.642 EDT [8871] STATEMENT:  select count(*) from tenk1, tenk2 where tenk1.hundred > 1 and tenk2.thousand=0;
2017-04-19 17:25:33.647 EDT [6456] LOG:  leaving sigusr1_handler
2017-04-19 17:25:33.647 EDT [6456] LOG:  entering reaper
2017-04-19 17:25:33.647 EDT [6456] LOG:  unregistering background worker "parallel worker for PID 8827"
2017-04-19 17:25:33.647 EDT [6456] LOG:  reaped bgworker pid 8871 status 0
2017-04-19 17:25:33.647 EDT [6456] LOG:  leaving reaper
2017-04-19 17:26:03.114 EDT [6456] LOG:  entering sigusr1_handler
2017-04-19 17:26:03.115 EDT [6456] LOG:  entered maybe_start_bgworker, StartWorkerNeeded=1, HaveCrashedWorker=1
2017-04-19 17:26:03.115 EDT [6456] LOG:  starting background worker process "parallel worker for PID 8827"
2017-04-19 17:26:03.118 EDT [8874] LOG:  starting parallel worker 2
2017-04-19 17:26:03.164 EDT [8874] LOG:  exiting parallel worker 2
2017-04-19 17:26:03.164 EDT [8874] STATEMENT:  select 

Re: [HACKERS] Highly Variable Planning Times

2017-04-19 Thread Jeff Janes
On Wed, Apr 19, 2017 at 2:39 PM, Michael Malis wrote:

> > TBH, I see no convincing explanation in that article of why 1300 partial
> > indexes are a good idea.
>
> I don't like it much either. I've been trying to eliminate most of the
> need for the partial indexes, but this is the current state of things.
>
> > *At best*, you're doing substantial work in the
> > planner to avoid the first tree descent step or two in a single
> > non-partial index.
>
> While the specific example I gave in the post could be replaced with a
> non-partial index, most of the partial indexes contain predicates that
> are not straightforward to index with non-partial indexes. About 85%
> of the partial indexes contain a regular expression in them for CSS
> selector matching. I've tried using a trigram GIN index, but it wound
> up not working too well due to the data being highly redundant (almost
> every CSS hierarchy contains 'div' in it).


pg_trgm 1.2 or higher has code to help with very common trigrams.  But to
try it, you would need to upgrade PostgreSQL to 9.6.

Cheers,

Jeff


Re: [HACKERS] Highly Variable Planning Times

2017-04-19 Thread Michael Malis
> TBH, I see no convincing explanation in that article of why 1300 partial
> indexes are a good idea.

I don't like it much either. I've been trying to eliminate most of the
need for the partial indexes, but this is the current state of things.

> *At best*, you're doing substantial work in the
> planner to avoid the first tree descent step or two in a single
> non-partial index.

While the specific example I gave in the post could be replaced with a
non-partial index, most of the partial indexes contain predicates that
are not straightforward to index with non-partial indexes. About 85%
of the partial indexes contain a regular expression in them for CSS
selector matching. I've tried using a trigram GIN index, but it wound
up not working too well due to the data being highly redundant (almost
every CSS hierarchy contains 'div' in it). Additionally, most of the
predicates for partial indexes are extremely specific making the
partial indexes very small. The sum total size of all of the partial
indexes is still much smaller than if we were to index every necessary
field with regular indexes.

Thanks,
Michael




Re: [HACKERS] Relation cache invalidation on replica

2017-04-19 Thread Victor Yegorov
2017-04-19 23:08 GMT+03:00 Andres Freund:

> I was thinking of c6ff84b06a68b71719aa1aaa5f6704d8db1b51f8
>

Thanks a lot!

I found, via the Release Notes, that this already got into 9.6:
https://www.postgresql.org/docs/current/static/release-9-6.html#AEN131397
Will certainly check this out soon!

-- 
Victor Yegorov


Re: [HACKERS] Highly Variable Planning Times

2017-04-19 Thread Tom Lane
Michael Malis  writes:
> I've been encountering highly variable planning time on PG 9.5.4.
> Running `EXPLAIN SELECT * FROM events_1171738` will take anywhere from
> 200ms to 4s. This likely has to do with the table having 1300 partial
> indexes on it (for reasons elaborated on in
> https://blog.heapanalytics.com/running-10-million-postgresql-indexes-in-production/).

TBH, I see no convincing explanation in that article of why 1300 partial
indexes are a good idea.  *At best*, you're doing substantial work in the
planner to avoid the first tree descent step or two in a single
non-partial index.  At worst, all your savings go out the window during a
relcache flush, which is likely to happen whenever a vacuum or analyze
gets done on the underlying table.

regards, tom lane




Re: [HACKERS] Highly Variable Planning Times

2017-04-19 Thread Alvaro Herrera
Michael Malis wrote:
> Hi,
> 
> I've been encountering highly variable planning time on PG 9.5.4.
> Running `EXPLAIN SELECT * FROM events_1171738` will take anywhere from
> 200ms to 4s. This likely has to do with the table having 1300 partial
> indexes on it (for reasons elaborated on in
> https://blog.heapanalytics.com/running-10-million-postgresql-indexes-in-production/).
> I'm trying to figure out if there is something I can do to eliminate
> the slow planning from happening.

I wonder if some BRIN index could help this use case, reducing the
number of indexes.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Highly Variable Planning Times

2017-04-19 Thread Andres Freund
Hi,

On 2017-04-19 13:39:40 -0700, Michael Malis wrote:
> I've been encountering highly variable planning time on PG 9.5.4.
> Running `EXPLAIN SELECT * FROM events_1171738` will take anywhere from
> 200ms to 4s. This likely has to do with the table having 1300 partial
> indexes on it (for reasons elaborated on in
> https://blog.heapanalytics.com/running-10-million-postgresql-indexes-in-production/).
> I'm trying to figure out if there is something I can do to eliminate
> the slow planning from happening.

I'd suspect that that's triggered by cache rebuilds.  If there's any
concurrent relcache invalidation (e.g. by a concurrent vacuum, create
index, alter table, relation extension, ...), a lot of metadata for all
those indexes will have to be rebuilt.

TBH, I don't think we're particularly likely to optimize hugely for
workloads with 1300 indexes on individual tables - such an effort seems
not unlikely to hurt more common cases.


> When I used `strace -c` on the backend process, I found that the
> number of `semop` system calls is much higher for the slower queries
> than the faster ones. After digging in a bit more I found that when
> the query was slow, there would be a single semaphore which would make
> up most of the `semop` calls. Here is a snippet of the output when I
> ran `strace -r` on a query with high planning time:

Could you also get a profile using perf record -g?  The strace isn't
telling us all that much.


Greetings,

Andres Freund




[HACKERS] Highly Variable Planning Times

2017-04-19 Thread Michael Malis
Hi,

I've been encountering highly variable planning time on PG 9.5.4.
Running `EXPLAIN SELECT * FROM events_1171738` will take anywhere from
200ms to 4s. This likely has to do with the table having 1300 partial
indexes on it (for reasons elaborated on in
https://blog.heapanalytics.com/running-10-million-postgresql-indexes-in-production/).
I'm trying to figure out if there is something I can do to eliminate
the slow planning from happening.

When I used `strace -c` on the backend process, I found that the
number of `semop` system calls is much higher for the slower queries
than the faster ones. After digging in a bit more I found that when
the query was slow, there would be a single semaphore which would make
up most of the `semop` calls. Here is a snippet of the output when I
ran `strace -r` on a query with high planning time:

0.000225 open("base/16384/13236647", O_RDWR) = 219
0.000110 lseek(219, 0, SEEK_END)   = 16384
0.000122 semop(1736711, [{9, -1, 0}], 1) = 0
0.004110 semop(1769480, [{10, 1, 0}], 1) = 0
0.000197 semop(1736711, [{9, -1, 0}], 1) = 0
0.010096 semop(1540097, [{1, 1, 0}], 1) = 0
0.000176 semop(1736711, [{9, -1, 0}], 1) = 0
0.004069 semop(1638404, [{15, 1, 0}], 1) = 0
0.000244 open("base/16384/13236648", O_RDWR) = 220
0.000131 lseek(220, 0, SEEK_END)   = 16384
0.000212 semop(1736711, [{9, -1, 0}], 1) = 0
0.007851 semop(1736711, [{8, 1, 0}], 1) = 0
0.000130 semop(1736711, [{9, -1, 0}], 1) = 0
0.000101 semop(1769480, [{5, 1, 0}], 1) = 0
0.000450 open("base/16384/13236653", O_RDWR) = 221
0.000151 lseek(221, 0, SEEK_END)   = 180224
0.000141 semop(1736711, [{9, -1, 0}], 1) = 0
0.000239 semop(1736711, [{9, -1, 0}], 1) = 0
0.000419 semop(1736711, [{10, 1, 0}], 1) = 0
0.000127 semop(1736711, [{9, -1, 0}], 1) = 0
0.001493 semop(1703942, [{3, 1, 0}], 1) = 0

You can see that most of the semop calls are waits on semaphore #9 in
semaphore set 1736711. I looked up the relation for the files being
opened and they all seem to be indexes on the table. Based on some
flame graphs I generated, I believe most of the time is spent inside
of the function get_relation_info. I think the open/lseek calls are
coming from the estimate_rel_size call in get_relation_info, but I
haven't been able to figure out where the semop calls are coming from.
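As a standalone illustration (not PostgreSQL source; the file name is just
the one from the trace above), this is the open/lseek pattern that
RelationGetNumberOfBlocks() ultimately performs once per index considered
during planning:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define BLCKSZ 8192             /* PostgreSQL's default block size */

/* Measure a relation file the way mdnblocks() does: seek to the end. */
static long
blocks_in_relation_file(const char *path)
{
    int     fd = open(path, O_RDONLY);
    off_t   end;

    if (fd < 0)
        return -1;
    end = lseek(fd, 0, SEEK_END);
    close(fd);
    return end / BLCKSZ;
}

int
main(void)
{
    printf("%ld blocks\n", blocks_in_relation_file("base/16384/13236647"));
    return 0;
}

With 1300 partial indexes, planning repeats this probe (plus the associated
locking, which may be where the semop calls enter the picture) for every
index on the table.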

It seems that the semop calls are at least a symptom of the high
planning time. I'm looking for help in finding the root cause of the
high planning time and in figuring out a way to eliminate it.

Thanks,
Michael




Re: [HACKERS] Ongoing issues with representation of empty arrays

2017-04-19 Thread Merlin Moncure
On Mon, Apr 10, 2017 at 11:17 PM, Andrew Gierth
 wrote:
>> "Tom" == Tom Lane  writes:
>
>  >> First is contrib/intarray, _AGAIN_ (see past bugs such as #7730):
>  >> ...
>  >> I plan to fix this one properly, unless anyone has any objections.
>
>  Tom> Just to clarify, what do you think is "properly"?
>
> I would say, that any time an intarray function returns an empty result
> it should be the standard 0-dimensional representation that every other
> array operation uses.  The intarray functions all seem already able to
> take such values as inputs.  Also there should be regression tests for
> this (none of intarray's existing tests have any empty arrays at all).

Are there any impacts outside of intarray?
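(For concreteness, the standard 0-dimensional representation Andrew
describes is what the backend builds with construct_empty_array(); the
wrapper below is purely illustrative, but the call itself is the existing
API from utils/array.h:)

#include "postgres.h"
#include "catalog/pg_type.h"
#include "utils/array.h"

/*
 * The canonical empty array: zero dimensions, no data. Normalizing
 * intarray's empty outputs to this form is what fixing it "properly"
 * would amount to.
 */
static ArrayType *
canonical_empty_int4_array(void)
{
    return construct_empty_array(INT4OID);
}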

merlin




Re: [HACKERS] Ongoing issues with representation of empty arrays

2017-04-19 Thread David G. Johnston
On Mon, Apr 10, 2017 at 4:57 PM, Andrew Gierth 
wrote:

> The distinction between the standard representation of '{}' as an array
> with zero dimensions and nonstandard representations as a 1-dimensional
> array with zero elements has come up in a couple of contexts on the IRC
> channel recently.
>

Just to add to the list of annoyances.

The following makes me want some way, in SQL, to create an empty
1-dimensional array ...

SELECT array_agg(e_arr)
FROM ( VALUES (ARRAY['1']::text[]), (ARRAY[]::text[]) ) v (e_arr);
--ERROR:  cannot accumulate arrays of different dimensionality

We moved the goal posts nicely with bac27394a1c, but not being able to mix
empty and non-empty arrays is problematic.  Ideally an empty array could
become an array of any dimension on the fly, so that if the explicitly
dimensioned input to array_agg is 1-dimensional, then all empty arrays would
be promoted to 1 dimension and the resultant output would become
two-dimensional.  unnest'ing such a structure would pose its own challenges,
though...

David J.


Re: [HACKERS] Relation cache invalidation on replica

2017-04-19 Thread Andres Freund
On 2017-04-19 17:07:49 +0300, Victor Yegorov wrote:
> 2017-03-13 9:22 GMT+02:00 Andres Freund :
> 
> > >I think we're hitting this particular issue quite frequently when
> > >rebuilding indexes on master after big-volume data manipulations.
> > >
> > >We have `pgbouncer` in transaction mode for both, master and slave,
> > >therefore it's quite typical to have sessions on the slave that
> > >were using indexes before those were re-created. Sad, but right now
> > >maintenance is a big performance hit for our applications;
> > >we have to re-connect them to start using indexes again.
> > >
> > >Are there any chances to get fix for this issue released in 10.0 and,
> > >perhaps, backpatched also?
> >
> > I'm not at my computer right now, but I recall committing something like
> > my approach.
> 
> 
> Andres, can you point me on the commit you're mentioning, please?

I was thinking of c6ff84b06a68b71719aa1aaa5f6704d8db1b51f8

- Andres




Re: [HACKERS] logical replication and PANIC during shutdown checkpoint in publisher

2017-04-19 Thread Peter Eisentraut
On 4/19/17 01:45, Michael Paquier wrote:
> On Tue, Apr 18, 2017 at 3:27 AM, Peter Eisentraut
>  wrote:
>> I'd imagine the postmaster would tell the walsender that it has started
>> shutdown, and then the walsender would reject $certain_things.  But I
>> don't see an existing way for the walsender to know that shutdown has
>> been initiated.  SIGINT is still free ...
> 
> The WAL sender receives SIGUSR2 from the postmaster when shutdown is
> initiated, so why not just rely on that and issue an ERROR when a
> client attempts to create or drop a new slot, setting up
> walsender_ready_to_stop unconditionally? It seems to me that the issue
> here is the delay between the moment SIGTERM is acknowledged by the
> WAL sender and the moment CREATE_SLOT is treated. An idea with the
> attached...

I think the problem with a signal-based solution is that there is no
feedback.  Ideally, you would wait for all walsenders to acknowledge the
receipt of SIGUSR2 (or similar) and only then proceed with the shutdown
checkpoint.
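A minimal sketch of the check Michael describes, assuming the SIGUSR2
handler sets the existing walsender_ready_to_stop flag; the function name
and error wording here are illustrative, not committed code:

/* Hypothetical guard, called from the walsender's command dispatch. */
static void
reject_new_slot_commands_if_stopping(void)
{
    if (walsender_ready_to_stop)
        ereport(ERROR,
                (errcode(ERRCODE_ADMIN_SHUTDOWN),
                 errmsg("cannot create or drop replication slots while WAL sender is stopping")));
}

As Peter notes, though, this alone gives the postmaster no confirmation that
every walsender has actually seen the signal before the shutdown checkpoint
is written.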

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Should pg_current_wal_location() become pg_current_wal_lsn()

2017-04-19 Thread Euler Taveira
2017-04-19 1:32 GMT-03:00 Michael Paquier :

> I vote for "location" -> "lsn". I would expect complaints about the
> current inconsistency at some point, and the function names have
> already been changed for this release.
>

+1.


-- 
Euler Taveira   Timbira - http://www.timbira.com.br/
PostgreSQL: Consulting, Development, 24x7 Support and Training



Fwd: [HACKERS] GSoC 2017 Proposal

2017-04-19 Thread Mark Rofail
Dear Mr Alexander,

I was checking the archives today and, to my shock, I did not find my
reply to your previous question from almost two weeks ago.
I apologise for the inconvenience; I had in fact replied within an
hour, but apparently it did not go through.

Best Regards,
Mark Moheb

On Thu, Apr 6, 2017 at 4:18 PM, Mark Rofail  wrote:
> Hello Mr Alexander,
>
> From my understanding, the main issue occurs whenever any UPDATE or
> DELETE statement is executed on the PK table,
> this triggers a referential integrity check on the FK table. In the
> previous patches, this was done by performing a sequential scan.
>
> To improve performance I propose that we index the FK column, and from
> my point of view the most suitable index would be the GIN index, since
> it is targeted at composite items.
> However, to move forward with this approach, we have to be sure that
> the comparison semantics offered by GIN indexes satisfy our needs for
> the referential integrity check.
>
> This approach was proposed by Tom Lane:
> https://www.postgresql.org/message-id/28389.1351094795%40sss.pgh.pa.us
>
> I believe this can be accomplished by better understanding the GIN
> index implementation in postgreSQL, including its operators.
>
> This is the best of my knowledge gained during the application
> period. I would like to investigate it further and would be delighted
> to hear your input regarding the matter,
>
> Best Regards,
> Mark Moheb




Re: [HACKERS] SASL minor docs typo

2017-04-19 Thread Heikki Linnakangas

On 04/18/2017 08:50 PM, Jaime Casanova wrote:

Hi,

reading the SASL docs, I found this typo:

in protocol.sgml:1356
"""
To begin a SASL authentication exchange, the server an AuthenticationSASL
  message.
"""

I guess it should say "the server sends an AuthenticationSASL message"


Yep, fixed. Thanks!

- Heikki





Re: [HACKERS] [buildfarm-members] BuildFarm client release 4.19

2017-04-19 Thread Andrew Dunstan


On 04/19/2017 01:38 PM, Tom Lane wrote:
> Andrew Dunstan  writes:
>> I have released version 4.19 of the PostgreSQL Buildfarm client. It can
>> be downloaded from
>> 
> Nice work!
>


Thank you.


>>   * Improvements to TAP tests logging and coverage. Each test set (i.e.
>> each /t directory) is now logged separately. Coverage is extended to
>> remaining test sets in src/test except SSL tests.
> I was going to plead with buildfarm owners who use --enable-tap-tests to
> update sooner rather than later, because this separate-logging behavior
> is going to make it a lot less painful to decipher failures during the
> TAP tests.  However, in view of my results in
> https://www.postgresql.org/message-id/21358.1492622881%40sss.pgh.pa.us
> I can't really recommend that owners of slower machines update just yet.
> Hopefully we can do something to trim down the added runtime.


As I mentioned in reply to that thread, you can disable the extra TAP
tests with this addition to the command line:

--skip-steps=misc-check


>
>> These changes mean that the client is more useful for testing
>> development code, and also that testing config settings is much simpler.
>> An initial test run on a fresh installation is now as simple as:
>> ...
>> To test development code, the from-source option is now much more
>> flexible and friendly. For example, one might do something like:
> These things are pretty cool, but maybe they need to be documented at
> https://wiki.postgresql.org/wiki/PostgreSQL_Buildfarm_Howto
> ?
>
>   



Indeed. I am planning to.

cheers

andrew


-- 
Andrew Dunstan    https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] Cost of src/test/recovery and .../subscription tests

2017-04-19 Thread Andrew Dunstan


On 04/19/2017 01:28 PM, Tom Lane wrote:
> So I updated longfin to the new release of the buildfarm client,
> and was quite dismayed by the fact that its cycle time went
> from 16 minutes to 24.  Some of that might be random effects like
> the state of the kernel disk caches, but a large chunk of it
> --- over 5 minutes --- evidently is from src/test/recovery/,
> which the buildfarm script didn't run before and now does.
>
> I am going to say flat out that that's unacceptably long for
> a test script that will be run dozens of times a day by the
> buildfarm.  There isn't any other test script that takes more
> than circa 90 seconds on that machine, and I don't think this
> one should either.
>
> I'm a bit dubious about whether src/test/subscription is being
> reasonably thrifty of test time either.  Do we really need twice
> the runtime of the entire core regression suite to test that?
>
> There was already a good argument not to enable the TAP tests
> on slower buildfarm animals, and if nothing is done about these
> additions, I'm afraid that such testing will be put permanently
> out of reach.  I don't even want to think about how long it
> might take a CLOBBER_CACHE_ALWAYS animal to run these tests.


You can disable the extra tests by adding this to the buildfarm command
line:

--skip-steps=misc-check


cheers

andrew




>
>   regards, tom lane
>
>

-- 
Andrew Dunstan    https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] [buildfarm-members] BuildFarm client release 4.19

2017-04-19 Thread Tom Lane
Andrew Dunstan  writes:
> I have released version 4.19 of the PostgreSQL Buildfarm client. It can
> be downloaded from
> 

Nice work!

>   * Improvements to TAP tests logging and coverage. Each test set (i.e.
> each /t directory) is now logged separately. Coverage is extended to
> remaining test sets in src/test except SSL tests.

I was going to plead with buildfarm owners who use --enable-tap-tests to
update sooner rather than later, because this separate-logging behavior
is going to make it a lot less painful to decipher failures during the
TAP tests.  However, in view of my results in
https://www.postgresql.org/message-id/21358.1492622881%40sss.pgh.pa.us
I can't really recommend that owners of slower machines update just yet.
Hopefully we can do something to trim down the added runtime.

> These changes mean that the client is more useful for testing
> development code, and also that testing config settings is much simpler.
> An initial test run on a fresh installation is now as simple as:
> ...
> To test development code, the from-source option is now much more
> flexible and friendly. For example, one might do something like:

These things are pretty cool, but maybe they need to be documented at
https://wiki.postgresql.org/wiki/PostgreSQL_Buildfarm_Howto
?

regards, tom lane




[HACKERS] Cost of src/test/recovery and .../subscription tests

2017-04-19 Thread Tom Lane
So I updated longfin to the new release of the buildfarm client,
and was quite dismayed by the fact that its cycle time went
from 16 minutes to 24.  Some of that might be random effects like
the state of the kernel disk caches, but a large chunk of it
--- over 5 minutes --- evidently is from src/test/recovery/,
which the buildfarm script didn't run before and now does.

I am going to say flat out that that's unacceptably long for
a test script that will be run dozens of times a day by the
buildfarm.  There isn't any other test script that takes more
than circa 90 seconds on that machine, and I don't think this
one should either.

I'm a bit dubious about whether src/test/subscription is being
reasonably thrifty of test time either.  Do we really need twice
the runtime of the entire core regression suite to test that?

There was already a good argument not to enable the TAP tests
on slower buildfarm animals, and if nothing is done about these
additions, I'm afraid that such testing will be put permanently
out of reach.  I don't even want to think about how long it
might take a CLOBBER_CACHE_ALWAYS animal to run these tests.

regards, tom lane




Re: [HACKERS] [PATCH] Push limit to sort through a subquery

2017-04-19 Thread Douglas Doole
Thanks for the feedback Ashutosh.

I disagree that we need to call pass_down_bound() recursively. The whole
point of the recursive call would be to tunnel through a plan node. Since
SubqueryScanState only has one child (unlike MergeAppendState) this can be
more efficiently done inline rather than via a recursive call.

In the case of ResultState, this is classic tail-end recursion and the C
compiler should optimize it out. As we add more recursive calls though, it
becomes harder for the compiler to do the right thing. (I'll admit that I'm
not up on the state of the art in recursion removal, though, so maybe my
concern is moot. That said, I prefer not to depend on compiler
optimizations if there's a clean alternative way to express the code.)

I do agree that it would make the code cleaner to handle ResultState and
SubqueryScanState in a similar fashion. My preference, however, would be to
remove the recursive call from ResultState handling. That would give us
code like:

if child == SubqueryScanState
    child = SubqueryScanState->child
else if child == ResultState
    child = ResultState->child

if child == SortState
    limit pushdown
else if child == MergeAppendState
    recursive call on all children

If it is possible to get more than one SubqueryScanState and/or ResultState
between the limit and sort, then the first block of code could be placed in
a while loop.
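To make that shape concrete, here is a hedged sketch of the loop-based
variant, in the style of pass_down_bound() in
src/backend/executor/nodeLimit.c. The SubqueryScanState handling is the
proposal under discussion, not committed code, and a real patch would also
verify that the subquery adds no quals or SRFs before tunneling through it:

static void
pass_down_bound(LimitState *node, PlanState *child_node)
{
    /* Tunnel through nodes that only project, without recursing. */
    for (;;)
    {
        if (IsA(child_node, SubqueryScanState))
            child_node = ((SubqueryScanState *) child_node)->subplan;
        else if (IsA(child_node, ResultState) &&
                 outerPlanState(child_node) != NULL)
            child_node = outerPlanState(child_node);
        else
            break;
    }

    if (IsA(child_node, SortState))
    {
        SortState  *sortState = (SortState *) child_node;
        int64       tuples_needed = node->count + node->offset;

        if (tuples_needed >= 0)     /* negative means LIMIT ALL */
        {
            sortState->bounded = true;
            sortState->bound = tuples_needed;
        }
    }
    else if (IsA(child_node, MergeAppendState))
    {
        MergeAppendState *maState = (MergeAppendState *) child_node;
        int         i;

        for (i = 0; i < maState->ms_nplans; i++)
            pass_down_bound(node, maState->mergeplans[i]);
    }
}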

If you'd prefer that I rework the patch as I discussed, just let me know
and I'll resubmit it.

- Doug
Salesforce

On Wed, Apr 19, 2017 at 1:57 AM Ashutosh Bapat <
ashutosh.ba...@enterprisedb.com> wrote:

> The function pass_down_bound() is a recursive function. For
> SubqueryScanState we have to do something similar to ResultState i.e.
> call pass_down_bound() recursively on subqueryScanState->subplan.
>
> Please add this to the next commitfest.
>
> On Wed, Apr 19, 2017 at 6:09 AM, Douglas Doole 
> wrote:
> > We've hit a case where pass_down_bound() isn't pushing the row count
> > limit from limit into sort. The issue is that we're getting a subquery
> > scan node between the limit and the sort. The subquery is only doing
> > column projection and has no quals or SRFs, so it should be safe to
> > push the limit into the sort.
> >
> > The query that hit the problem can be simplified to:
> >
> >SELECT * FROM (SELECT A,B FROM T ORDER BY C) LIMIT 5
> >
> > (Yeah, the query's a little screwy in that the ORDER BY should really be
> > outside the subselect, but it came from a query generator, so that's a
> > different conversation.)
> >
> > Proposed patch is attached.
> >
> > - Doug
> > Salesforce
> >
> >
> > --
> > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> > To make changes to your subscription:
> > http://www.postgresql.org/mailpref/pgsql-hackers
> >
>
>
>
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company
>


[HACKERS] BRIN autosummarize tests

2017-04-19 Thread Nikolay Shaplov

Hi!

Alvaro, you've recently committed the patch with BRIN autosummarize.

Tell me, can this feature really not be tested via regression tests?

Because right now I am rebasing my reloptions patch (again!), and once it
was partly rebased, I ran make check.
At that point I had not completely moved the implementation of this option,
and thus autosummarize was effectively always false.

And in this obviously broken situation all tests were successful when I ran
them. Maybe there is a way to make sure it really works, and one
should add it to the tests?



-- 
Nikolay Shaplov, independent Perl & C/C++ developer. Available for hire.




Re: [HACKERS] [PATCH] New command to monitor progression of long running queries

2017-04-19 Thread Maksim Milyutin

On 19.04.2017 17:13, Remi Colinet wrote:

Maksim,


2017-04-18 20:31 GMT+02:00 Maksim Milyutin:

On 18.04.2017 17:39, Remi Colinet wrote:


Regarding the queryDesc state of the SQL query upon receiving a
request to report its execution progress, it does not bring any issue.
The request is noted when the signal is received by the monitored
backend. Then, the backend continues its execution code path. When
interrupts are checked in the executor code, the request will be dealt
with.


Yes, interrupts are checked in the CHECK_FOR_INTERRUPTS entries.

When the request is being dealt with, the monitored backend will stop
its execution and report the progress of the SQL query. Whatever the
status of the SQL query, the progress.c code checks the status and
reports either that the SQL query does not have a valid status, or
otherwise the current execution state of the SQL query.

SQL query status checking is about:
- idle transaction
- out of transaction status
- null planned statement
- utility command
- self monitoring

Other tests can be added if needed to exclude some SQL query state.
Such checking is done in void HandleProgressRequest(void).
I do not see why a SQL query progression would not be possible in this
context. Even when the queryDesc is NULL, we can just report an
<out of transaction> output. This is currently the case with the patch
suggested.


It's an interesting question how usable the active query's state is
at each stage of execution. Tom Lane noticed that
CHECK_FOR_INTERRUPTS doesn't give us a 100% guarantee about full
consistency [1].


I wonder what you mean by usable state.



A usable query state is one suitable for analysis, IOW we have a consistent
QueryDesc object. This term was introduced by Tom Lane in [1]. I suppose
he meant the case where a query fails with an error and, before the
transaction aborts, we bump into *CHECK_FOR_INTERRUPTS* at a place where the
QueryDesc may be inconsistent and further reading from it would give us an
invalid result.




Currently, the suggested code tests the queryDesc pointer and all the
sub-node pointers in order to detect NULL pointers. When the progress
report is collected by the backend, this backend does the collection and
consequently is not running the query. So the query tree is not being
modified. At this moment, whatever the query state, we can manage to
deal with its static state. It is only a tree, which could also be just a
NULL pointer in the most extreme case. Such a case is handled in the
current code.



Perhaps deep checking of the QueryDesc would allow us to consider it
consistent.



1. https://www.postgresql.org/message-id/24182.1472745492%40sss.pgh.pa.us

--
Maksim Milyutin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company




Re: [HACKERS] OK, so culicidae is *still* broken

2017-04-19 Thread Andres Freund
On 2017-04-19 10:15:31 -0400, Tom Lane wrote:
> Amit Kapila  writes:
> > On Sun, Apr 16, 2017 at 3:04 AM, Tom Lane  wrote:
> >> Obviously, any such fix would be a lot more likely to be reliable in
> >> 64-bit machines.  There's probably not enough daylight to be sure of
> >> making it work in 32-bit Windows, so I suspect we'd need some retry
> >> logic anyway for that case.
> 
> > Yeah, that kind of thing can work assuming we don't get conflicts too
> > often, but it could be possible that conflicts are not reported from
> > ASLR enabled environments because of commit 7f3e17b4.
> 
> Right, but Andres' point is that we should make an effort to undo that
> hack and instead allow ASLR to happen.  Not just because it's allegedly
> more secure, but because we may have no choice in future Windows versions.

FWIW, I think it *also* might make us more secure, because addresses in
the postgres binary won't be predictable anymore.  Since most of the
Windows binaries are built by one source, that's some advantage on its
own.

- Andres




Re: [HACKERS] Interval for launching the table sync worker

2017-04-19 Thread Petr Jelinek
On 19/04/17 15:57, Masahiko Sawada wrote:
> On Wed, Apr 19, 2017 at 10:07 PM, Petr Jelinek
>  wrote:
>> On 19/04/17 14:42, Masahiko Sawada wrote:
>>> On Wed, Apr 19, 2017 at 5:12 PM, Kyotaro HORIGUCHI
>>>  wrote:
 At Tue, 18 Apr 2017 18:40:56 +0200, Petr Jelinek 
  wrote in 
 
> On 18/04/17 18:14, Peter Eisentraut wrote:
>> On 4/18/17 11:59, Petr Jelinek wrote:
>>> Hmm if we create hashtable for this, I'd say create hashtable for the
>>> whole table_states then. The reason why it's list now was that it seemed
>>> unnecessary to have hashtable when it will be empty almost always but
>>> there is no need to have both hashtable + list IMHO.

 I understand that, but I also don't like the frequent palloc/pfree
 in a long-lasting context and the double loop, like Peter.

>> The difference is that we blow away the list of states when the catalog
>> changes, but we keep the hash table with the start times around.  We
>> need two things with different life times.

 On the other hand, the hash seems overdone. In addition to that, the
 hash version leaks stale entries while subscriptions are
 modified, and vacuuming them is costly.

> Why can't we just update the hashtable based on the catalog? I mean once
> the record is not needed in the list, the table has been synced so there
> is no need for the timestamp either since we'll not try to start the
> worker again.
>>>
>>> I guess the table sync worker for the same table could need to be
>>> started again. For example, please imagine a case where the table
>>> belonging to the publication is removed from it and the corresponding
>>> subscription is refreshed, and then the table is added to it again. We
>>> have the record of the table with a timestamp in the hash table from
>>> the first table sync, but the table sync after the refresh could
>>> have to wait for the interval.
>>>
>>
>> But why do we want to wait in such a case, where the user has explicitly
>> requested a refresh?
>>
> 
> Yeah, sorry, I meant that we don't want to wait but cannot launch the
> tablesync worker in such a case.
> 
> But after more thought, the latest patch Peter proposed has the same
> problem. Perhaps we need to always scan the whole pg_subscription_rel and
> remove the entry if the corresponding table gets synced.
> 

Yes, that's what I mean by "Why can't we just update the hashtable based
on the catalog". And if we do that, then I don't understand why we
need both a hashtable and a linked list if we need to update both based on
catalog reads anyway.
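For reference, a minimal sketch of that arrangement, assuming a dynahash
keyed by relation OID that is pruned against the catalog contents whenever
the list is rebuilt; all names here are illustrative, not committed code:

#include "postgres.h"
#include "nodes/pg_list.h"
#include "utils/hsearch.h"
#include "utils/timestamp.h"

typedef struct SyncStartEntry
{
    Oid         relid;          /* hash key: table awaiting sync */
    TimestampTz last_start;     /* when we last launched its worker */
} SyncStartEntry;

static HTAB *last_start_times = NULL;

static void
init_last_start_times(void)
{
    HASHCTL     ctl;

    MemSet(&ctl, 0, sizeof(ctl));
    ctl.keysize = sizeof(Oid);
    ctl.entrysize = sizeof(SyncStartEntry);
    last_start_times = hash_create("table sync worker start times",
                                   64, &ctl, HASH_ELEM | HASH_BLOBS);
}

/* Keep only entries whose tables are still not synced per the catalog. */
static void
prune_last_start_times(List *not_ready_relids)
{
    HASH_SEQ_STATUS status;
    SyncStartEntry *entry;

    hash_seq_init(&status, last_start_times);
    while ((entry = hash_seq_search(&status)) != NULL)
    {
        if (!list_member_oid(not_ready_relids, entry->relid))
            hash_search(last_start_times, &entry->relid,
                        HASH_REMOVE, NULL);
    }
}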

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [PATCH] New command to monitor progression of long running queries

2017-04-19 Thread Remi Colinet
Following on from my previous email:

I have added below some use cases which I find very relevant when we need
to know the progress of a SQL query.
The command can be used with any SQL query (select, update, delete, insert).


The tables used have been created with :

CREATE TABLE T_1M (id integer, md5 text);
INSERT INTO T_1M SELECT generate_series(1,1000000) AS id,
md5(random()::text) AS md5;

CREATE TABLE T_10M ( id integer, md5 text);
INSERT INTO T_10M SELECT generate_series(1,10000000) AS id,
md5(random()::text) AS md5;


All the different leaf node types are implemented.



1/ Parallel select with sort (no index)
===

=> Terminal running the long SQL query:
test=# select * from t_10M order by md5;

=> Terminal monitoring SQL query progression:
test=# select pid,query from pg_stat_activity ;
  pid  |  query
---+--
  8062 |
  8065 |
 19605 | select pid,query from pg_stat_activity ;
 20830 | select * from t_10M order by md5;
 20832 | select * from t_10M order by md5;
 20833 | select * from t_10M order by md5;
  8060 |
  8059 |
  8061 |
(9 rows)

test=# PROGRESS 20830
test-# ;
 PLAN PROGRESS

 Gather Merge
   ->  Sort=> dumping tuples to tapes / merging tapes
 rows r/w merge 2167923/2167908 rows r/w effective 0/3514320 0%
 Sort Key: md5
 ->  Parallel Seq Scan on t_10m => rows 3514321/4166700 84%
(5 rows)

test=#
test=#
test=# PROGRESS 20830;
   PLAN PROGRESS

 Gather Merge
   ->  Sort=> final merge sort on tapes
 rows r/w merge 4707198/4691167 rows r/w effective 16016/3514320 0%
 Sort Key: md5
 ->  Parallel Seq Scan on t_10m => rows 3514321/4166700 84%
(5 rows)

test=# PROGRESS 20830;
PLAN
PROGRESS
-
 Gather Merge
   ->  Sort=> final merge sort on tapes
 rows r/w merge 4809857/4691167 rows r/w effective 118675/3514320 3%
 Sort Key: md5
 ->  Parallel Seq Scan on t_10m => rows 3514321/4166700 84%
(5 rows)

test=# PROGRESS 20830;
PLAN
PROGRESS
-
 Gather Merge
   ->  Sort=> final merge sort on tapes
 rows r/w merge 4883715/4691167 rows r/w effective 192533/3514320 5%
 Sort Key: md5
 ->  Parallel Seq Scan on t_10m => rows 3514321/4166700 84%
(5 rows)

test=# PROGRESS 20830;
PLAN
PROGRESS
-
 Gather Merge
   ->  Sort=> final merge sort on tapes
 rows r/w merge 4948381/4691167 rows r/w effective 257199/3514320 7%
 Sort Key: md5
 ->  Parallel Seq Scan on t_10m => rows 3514321/4166700 84%
(5 rows)

test=# PROGRESS 20830;
PLAN
PROGRESS
-
 Gather Merge
   ->  Sort=> final merge sort on tapes
 rows r/w merge 5022137/4691167 rows r/w effective 330955/3514320 9%
 Sort Key: md5
 ->  Parallel Seq Scan on t_10m => rows 3514321/4166700 84%
(5 rows)

test=# PROGRESS 20830;
PLAN
PROGRESS
--
 Gather Merge
   ->  Sort=> final merge sort on tapes
 rows r/w merge 5079083/4691167 rows r/w effective 387901/3514320
11%
 Sort Key: md5
 ->  Parallel Seq Scan on t_10m => rows 3514321/4166700 84%
(5 rows)

test=# PROGRESS 20830;
PLAN
PROGRESS
--
 Gather Merge
   ->  Sort=> final merge sort on tapes
 rows r/w merge 5144499/4691167 rows r/w effective 453317/3514320
12%
 Sort Key: md5
 ->  Parallel Seq Scan on t_10m => rows 3514321/4166700 84%
(5 rows)

test=# PROGRESS 20830;
PLAN PROGRESS
--
 
(1 row)

test=#

SQL query was interrupted before completion


2/ Insert into table
=

=> Terminal running the long SQL query:
test=# INSERT INTO T_10M SELECT generate_series(10000001, 12000000) AS id,
md5(random()::text) AS md5;

=> Terminal monitoring SQL query progression:
test=# PROGRESS 20830;
PLAN PROGRESS
--
 
(1 row)

test=#
test=# PROGRESS 20830;
 PLAN PROGRESS
---
 Insert => rows 718161
   ->  ProjectSet
 ->  Result
(3 rows)

test=# PROGRESS 20830;
 PLAN PROGRESS

 Insert => rows 1370255
   ->  ProjectSet
 ->  Result
(3 rows)

test=# PROGRESS 20830;
 PLAN PROGRESS

 Insert => rows 

Re: [HACKERS] OK, so culicidae is *still* broken

2017-04-19 Thread Tom Lane
Amit Kapila  writes:
> On Sun, Apr 16, 2017 at 3:04 AM, Tom Lane  wrote:
>> Obviously, any such fix would be a lot more likely to be reliable in
>> 64-bit machines.  There's probably not enough daylight to be sure of
>> making it work in 32-bit Windows, so I suspect we'd need some retry
>> logic anyway for that case.

> Yeah, that kind of thing can work assuming we don't get conflicts too
> often, but it could be possible that conflicts are not reported from
> ASLR enabled environments because of commit 7f3e17b4.

Right, but Andres' point is that we should make an effort to undo that
hack and instead allow ASLR to happen.  Not just because it's allegedly
more secure, but because we may have no choice in future Windows versions.

regards, tom lane




Re: [HACKERS] [PATCH] New command to monitor progression of long running queries

2017-04-19 Thread Remi Colinet
Maksim,


2017-04-18 20:31 GMT+02:00 Maksim Milyutin :

> On 18.04.2017 17:39, Remi Colinet wrote:
>
>> Hello Maksim,
>>
>> The core implementation I suggested for the new PROGRESS command uses
>> different functions from the one used by EXPLAIN for its core
>> implementation.
>> Some source code is shared with EXPLAIN command. But this shared code is
>> only related to quals, properties, children, subPlans and few other nodes.
>>
>> All other code for PROGRESS is new code.
>>
>> I don't believe explain.c code can be fully shared with the one of the
>> new PROGRESS command. These 2 commands have different purposes.
>> The core code used for the new PROGRESS command is very different from
>> the core code used for EXPLAIN.
>>
>>
> Perhaps you will be forced to duplicate significant snippets of
> functionality from explain.c into your progress.c.
>

Currently, little code is duplicated between the EXPLAIN and PROGRESS
commands. The duplicated code could be moved to the file
src/backend/commands/report.c, which is used to gather shared code between
the 2 commands. I will try to complete this code sharing as much as possible.

The main point is that PROGRESS uses the same design pattern as EXPLAIN by
parsing the query tree. The workhorse of the PROGRESS command is
ProgressNode(), which recursively calls sub-nodes until we reach leaf nodes
such as SeqScan, IndexScan, TupleStore, Sort, Material, ... . The EXPLAIN
command uses a similar workhorse with the function ExplainNode(), which
eventually calls different leaf nodes.

Some of the leaf nodes which are common to the 2 commands have been put in
the file src/backend/commands/report.c. Maybe some further code sharing is
also possible for the workhorse by using a template function which would
call EXPLAIN-specific leaf node functions or PROGRESS-specific leaf node
functions.


>
>
>> Regarding the queryDesc state of SQL query upon receiving a request to
>> report its execution progress, it does not bring any issue. The request
>> is noted when the signal is received by the monitored backend. Then, the
>> backend continues its execution code path. When interrupts are checked
>> in the executor code, the request will be dealt with.
>>
>>
> Yes, interrupts are checked in the CHECK_FOR_INTERRUPTS entries.
>
>> When the request is being dealt with, the monitored backend will stop its
>> execution and report the progress of the SQL query. Whatever the
>> status of the SQL query, the progress.c code checks the status and reports
>> either that the SQL query does not have a valid status, or otherwise the
>> current execution state of the SQL query.
>>
>> SQL query status checking is about:
>> - idle transaction
>> - out of transaction status
>> - null planned statement
>> - utility command
>> - self monitoring
>>
>> Other tests can be added if needed to exclude some SQL query state. Such
>> checking is done in void HandleProgressRequest(void).
>> I do not see why a SQL query progression would not be possible in this
>> context. Even when the queryDesc is NULL, we can just report an <out of
>> transaction> output. This is currently the case with the patch suggested.
>>
>>
> It's an interesting question how usable the active query's state is at
> each stage of execution. Tom Lane noticed that CHECK_FOR_INTERRUPTS
> doesn't give us a 100% guarantee about full consistency [1].
>

I wonder what you mean by usable state.

Currently, the suggested code tests the queryDesc pointer and all the
sub-node pointers in order to detect NULL pointers. When the progress report
is collected by the backend, this backend does the collection and
consequently is not running the query. So the query tree is not being
modified. At this moment, whatever the query state, we can manage to deal
with its static state. It is only a tree, which could also be just a NULL
pointer in the most extreme case. Such a case is handled in the current code.


>
>> So far, I've found this new command very handy. It allows one to evaluate
>> the time needed to complete a SQL query.
>>
>>
> Could you explain how you get the percent of execution for nodes of the
> plan tree and overall for the query?
>

The progress of execution of the query is computed as follows in 2
different places for each leaf node type (Scan, IndexScan, Sort, Material,
TupleStore, ...); see the sketch after this list:

- one place in the executor code, or in access method code, or in sort
utility code, used during the execution of the SQL query, in which the
following values are counted, for instance: rows R/W, blocks R/W, tapes R/W
used for sorts, tuple store R/W, ... . Some of these values are already
computed in the current official PostgreSQL source code. Some other values
had to be added and collected.

- one place in the leaf function of each node type (ProgressScan(),
ProgressSort(), ...) in which percents are computed and are then dumped
together with raw values collected during execution, in the report. The
dump details can be selected with the VERBOSE option of the PROGRESS
command (For 
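For illustration, the split described in the list above can be reduced to
something like the following; all names are illustrative, not the patch's
actual code:

/* Counters maintained by the executor while the query runs. */
typedef struct ScanProgress
{
    uint64      rows_done;      /* advanced as tuples are processed */
    uint64      rows_total;     /* estimated from relation size at start */
} ScanProgress;

/* Percentage derived only when a PROGRESS report is requested. */
static int
scan_progress_percent(const ScanProgress *sp)
{
    if (sp->rows_total == 0)
        return 0;
    return (int) (sp->rows_done * 100 / sp->rows_total);
}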

Re: [HACKERS] Relation cache invalidation on replica

2017-04-19 Thread Victor Yegorov
2017-03-13 9:22 GMT+02:00 Andres Freund :

> >I think we're hitting this particular issue quite frequently when
> >rebuilding indexes on master after big-volume data manipulations.
> >
> >We have `pgbouncer` in transaction mode for both, master and slave,
> >therefore it's quite typical to have sessions on the slave that
> >were using indexes before those were re-created. Sad, but right now
> >maintenance is a big performance hit for our applications;
> >we have to re-connect them to start using indexes again.
> >
> >Are there any chances to get fix for this issue released in 10.0 and,
> >perhaps, backpatched also?
>
> I'm not at my computer right now, but I recall committing something like
> my approach.


Andres, can you point me on the commit you're mentioning, please?


-- 
Victor Yegorov


[HACKERS] BuildFarm client release 4.19

2017-04-19 Thread Andrew Dunstan

I have released version 4.19 of the PostgreSQL Buildfarm client. It can
be downloaded from



Apart from some minor bug fixes, the following changes are made:

  * Include the script's path in @INC. That means you can usually run
the script from anywhere rather than just its own directory.
  * Set TZ after "make check" is run. This makes for slightly faster
initdb runs in later stages.
  * Make TAP tests run with --timer
  * Change the default log_line_prefix in the config file to use %p instead of
%c. That makes it easier to tie log messages to other messages that
mention a pid
  * Add a module to log running commands to a file as it runs and
replace critical uses of `` with the new procedure. That means we
have better traces if the buildfarm process crashes.
  * Improvements to TAP tests logging and coverage. Each test set (i.e.
each /t directory) is now logged separately. Coverage is extended to
remaining test sets in src/test except SSL tests.
  * Add a module for testing ICU
  * Change the target URL in the config file sample to use the canonical name
matching the SSL certificate
  * Default build_root to a directory in the same place that the script
is found, and create it if it doesn't exist.
  * Add a command-line flag to allow setting config settings
  * from-source path can now be relative
  * from-source doesn't need to specify a branch, if not clear it
defaults to HEAD
  * from-source can now do vpath builds


These changes mean that the client is more useful for testing
development code, and also that testing config settings is much simpler.
An initial test run on a fresh installation is now as simple as:

cp buildfarm.conf.sample build-farm.conf

./run_build.pl --test --verbose

To test development code, the from-source option is now much more
flexible and friendly. For example, one might do something like:

./run_build.pl --from-source=/path/to/postgresql \
   --config-set use_vpath=1 \
   --config-set config_opts+=--enable-tap-tests \
   --config-set locales+=en_US.utf8


If you run something like this against your development code, and it
succeeds, you can be reasonably confident that if committed it won't
send the buildfarm all red.

cheers

andrew

-- 
Andrew Dunstan    https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] Interval for launching the table sync worker

2017-04-19 Thread Masahiko Sawada
On Wed, Apr 19, 2017 at 10:07 PM, Petr Jelinek
 wrote:
> On 19/04/17 14:42, Masahiko Sawada wrote:
>> On Wed, Apr 19, 2017 at 5:12 PM, Kyotaro HORIGUCHI
>>  wrote:
>>> At Tue, 18 Apr 2017 18:40:56 +0200, Petr Jelinek 
>>>  wrote in 
>>> 
 On 18/04/17 18:14, Peter Eisentraut wrote:
> On 4/18/17 11:59, Petr Jelinek wrote:
>> Hmm if we create hashtable for this, I'd say create hashtable for the
>> whole table_states then. The reason why it's list now was that it seemed
>> unnecessary to have hashtable when it will be empty almost always but
>> there is no need to have both hashtable + list IMHO.
>>>
>>> I understand that, but I also don't like the frequent palloc/pfree
>>> in a long-lasting context and the double loop, like Peter.
>>>
> The difference is that we blow away the list of states when the catalog
> changes, but we keep the hash table with the start times around.  We
> need two things with different life times.
>>>
>>> On the other hand, the hash seems overdone. In addition to that, the
>>> hash version leaks stale entries while subscriptions are
>>> modified, and vacuuming them is costly.
>>>
 Why can't we just update the hashtable based on the catalog? I mean once
 the record is not needed in the list, the table has been synced so there
 is no need for the timestamp either since we'll not try to start the
 worker again.
>>
>> I guess the table sync worker for the same table could need to be
>> started again. For example, please imagine a case where the table
>> belonging to the publication is removed from it and the corresponding
>> subscription is refreshed, and then the table is added to it again. We
>> have the record of the table with a timestamp in the hash table from
>> the first table sync, but the table sync after the refresh could
>> have to wait for the interval.
>>
>
> But why do we want to wait in such a case, where the user has explicitly
> requested a refresh?
>

Yeah, sorry, I meant that we don't want to wait but cannot launch the
tablesync worker in such a case.

But after more thought, the latest patch Peter proposed has the same
problem. Perhaps we need to always scan the whole pg_subscription_rel and
remove the entry if the corresponding table gets synced.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Logical replication ApplyContext bloat

2017-04-19 Thread Alvaro Herrera
Stas Kelvich wrote:

> With the patch, MemoryContextStats() shows the following hierarchy during
> slot operations in the apply worker:
> 
> TopMemoryContext: 83824 total in 5 blocks; 9224 free (8 chunks); 74600 used
>   ApplyContext: 8192 total in 1 blocks; 6520 free (4 chunks); 1672 used
> ApplyMessageContext: 8192 total in 1 blocks; 6632 free (11 chunks); 1560 
> used
>   ExecutorState: 8192 total in 1 blocks; 7624 free (0 chunks); 568 used
> ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used

Please update src/backend/utils/mmgr/README to list the new contexts.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Interval for launching the table sync worker

2017-04-19 Thread Petr Jelinek
On 19/04/17 14:42, Masahiko Sawada wrote:
> On Wed, Apr 19, 2017 at 5:12 PM, Kyotaro HORIGUCHI
>  wrote:
>> At Tue, 18 Apr 2017 18:40:56 +0200, Petr Jelinek 
>>  wrote in 
>> 
>>> On 18/04/17 18:14, Peter Eisentraut wrote:
 On 4/18/17 11:59, Petr Jelinek wrote:
> Hmm if we create hashtable for this, I'd say create hashtable for the
> whole table_states then. The reason why it's list now was that it seemed
> unnecessary to have hashtable when it will be empty almost always but
> there is no need to have both hashtable + list IMHO.
>>
>> I understand that, but I also don't like the frequent palloc/pfree
>> in a long-lasting context and the double loop, like Peter.
>>
 The difference is that we blow away the list of states when the catalog
 changes, but we keep the hash table with the start times around.  We
 need two things with different life times.
>>
>> On the other hand, the hash seems overdone. In addition to that, the
>> hash version leaks stale entries while subscriptions are
>> modified, and vacuuming them is costly.
>>
>>> Why can't we just update the hashtable based on the catalog? I mean once
>>> the record is not needed in the list, the table has been synced so there
>>> is no need for the timestamp either since we'll not try to start the
>>> worker again.
> 
> I guess the table sync worker for the same table could need to be
> started again. For example, please imagine a case where the table
> belonging to the publication is removed from it and the corresponding
> subscription is refreshed, and then the table is added to it again. We
> have the record of the table with a timestamp in the hash table from
> the first table sync, but the table sync after the refresh could
> have to wait for the interval.
> 

But why do we want to wait in such a case, where the user has explicitly
requested a refresh?

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Logical replication ApplyContext bloat

2017-04-19 Thread Stas Kelvich

> On 19 Apr 2017, at 14:30, Petr Jelinek  wrote:
> 
> On 19/04/17 12:46, Stas Kelvich wrote:
>> 
>> Right now ApplyContext is cleaned after each transaction, and by this
>> patch I basically suggest cleaning it after each command counter
>> increment.
>> 
>> So in your definitions that is ApplyMessageContext, the most short-lived
>> one. We can now rename ApplyContext -> ApplyMessageContext to make that
>> clear and avoid a possible name conflict when somebody decides to keep
>> something during a whole transaction or the worker lifespan.
> 
> Yeah, we can rename, and then rename the ApplyCacheContext to
> ApplyContext. That would leave the ApplyTransactionContext from Simon's
> proposal, which is not really needed anywhere right now, and I am not
> sure it will be needed given there is still TopTransactionContext
> present. If we really want ApplyTransactionContext we can probably just
> assign it to TopTransactionContext in ensure_transaction.
> 
>> 
>> P.S. It also seems to me now that resetting the context in
>> ensure_transaction() isn't that good an idea; it's probably better to
>> explicitly call reset at the end of each function involved.
>> 
> 
> Well it would definitely improve clarity.
> 

Okay, done.

With the patch, MemoryContextStats() shows the following hierarchy during
slot operations in the apply worker:

TopMemoryContext: 83824 total in 5 blocks; 9224 free (8 chunks); 74600 used
  ApplyContext: 8192 total in 1 blocks; 6520 free (4 chunks); 1672 used
ApplyMessageContext: 8192 total in 1 blocks; 6632 free (11 chunks); 1560 
used
  ExecutorState: 8192 total in 1 blocks; 7624 free (0 chunks); 568 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used




apply_contexts.patch
Description: Binary data

Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Interval for launching the table sync worker

2017-04-19 Thread Masahiko Sawada
On Wed, Apr 19, 2017 at 5:12 PM, Kyotaro HORIGUCHI
 wrote:
> At Tue, 18 Apr 2017 18:40:56 +0200, Petr Jelinek 
>  wrote in 
> 
>> On 18/04/17 18:14, Peter Eisentraut wrote:
>> > On 4/18/17 11:59, Petr Jelinek wrote:
>> >> Hmm if we create hashtable for this, I'd say create hashtable for the
>> >> whole table_states then. The reason why it's list now was that it seemed
>> >> unnecessary to have hashtable when it will be empty almost always but
>> >> there is no need to have both hashtable + list IMHO.
>
> I understand that, but I also don't like the frequent palloc/pfree
> in a long-lasting context and the double loop, like Peter.
>
>> > The difference is that we blow away the list of states when the catalog
>> > changes, but we keep the hash table with the start times around.  We
>> > need two things with different life times.
>
> On the other hand, the hash seems overdone. In addition to that, the
> hash version leaks stale entries while subscriptions are
> modified, and vacuuming them is costly.
>
>> Why can't we just update the hashtable based on the catalog? I mean once
>> the record is not needed in the list, the table has been synced so there
>> is no need for the timestamp either since we'll not try to start the
>> worker again.

I guess the table sync worker for the same table could need to be
started again. For example, please imagine a case where the table
belonging to the publication is removed from it and the corresponding
subscription is refreshed, and then the table is added to it again. We
have the record of the table with a timestamp in the hash table from
the first table sync, but the table sync after the refresh could
have to wait for the interval.

>
> Considering the anticipated complexity when many subscriptions
> exist in the list, and the complexity of removing stale entries from the
> hash, I'm inclined to propose just adding 'worker_start'
> to pg_subscription_rel. We already have similar columns in
> pg_stat_activity and pg_stat_replication.
>

I was thinking the same. But I'm concerned that the last start time of the
table sync worker is not particularly useful information for the user, and
we already have a similar value in pg_stat_activity
(pg_stat_replication.backend_start is actually taken from
pg_stat_activity.backend_start). I'm not sure whether it's a good idea
to show slightly shifted timestamps with the same meaning in two
system views.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Fixup some misusage of appendStringInfo and friends

2017-04-19 Thread Ashutosh Bapat
I reviewed the patch. It compiles cleanly, make check-world passes. I do
not see any issue with it.
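For readers unfamiliar with this class of cleanup, the typical patterns
look like the following; these are illustrative, not necessarily the exact
call sites in the patch:

#include "lib/stringinfo.h"

static void
examples(StringInfo buf, const char *name)
{
    /* Wasteful: routes a plain string through the format-string parser. */
    appendStringInfo(buf, "%s", name);
    /* Preferred: */
    appendStringInfoString(buf, name);

    /* Wasteful: a constant single character as a format string. */
    appendStringInfo(buf, "(");
    /* Preferred: */
    appendStringInfoChar(buf, '(');
}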

On Wed, Apr 19, 2017 at 9:13 AM, David Rowley
 wrote:
> The attached cleans up a few small misusages of appendStringInfo and
> related functions.
>
> Some similar work was done in,
>
> f92d6a540ac443f85f0929b284edff67da14687a
> d02f16470f117db3038dbfd87662d5f0eb5a2a9b
> cacbdd78106526d7c4f11f90b538f96ba8696fb0
>
> so the majority of these should all be new misuseages.
>
> --
>  David Rowley   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] Logical replication ApplyContext bloat

2017-04-19 Thread Petr Jelinek
On 19/04/17 12:46, Stas Kelvich wrote:
> 
>> On 19 Apr 2017, at 13:34, Simon Riggs  wrote:
>>
>> On 19 April 2017 at 11:24, Petr Jelinek  wrote:
>>
>>> I'd still like you to add a comment to the ApplyContext saying that it's
>>> cleaned after handling each message, so nothing that needs to survive,
>>> for example, a whole transaction can be allocated in it, as that's not
>>> very well visible IMHO (and I know I will forget about it when writing
>>> the next patch in that area).
>>
>> It would be better to invent the contexts we want now, so we get the
>> architecture right for future work. Otherwise we have problems with
>> "can't backpatch this fix because that infrastructure doesn't exist in
>> PG10" in a couple of years.
>>
>> So I suggest we have
>>
>> ApplyMessageContext - cleaned after every message
>> ApplyTransactionContext - cleaned at EOXact
>> ApplyContext - persistent
> 
> Right now ApplyContext is cleaned after each transaction, and by this
> patch I basically suggest cleaning it after each command counter
> increment.
> 
> So in your definitions that is ApplyMessageContext, the most short-lived
> one. We can now rename ApplyContext -> ApplyMessageContext to make that
> clear and avoid a possible name conflict when somebody decides to keep
> something during a whole transaction or the worker lifespan.

Yeah, we can rename, and then rename the ApplyCacheContext to
ApplyContext. That would leave the ApplyTransactionContext from Simon's
proposal, which is not really needed anywhere right now, and I am not
sure it will be needed given there is still TopTransactionContext
present. If we really want ApplyTransactionContext we can probably just
assign it to TopTransactionContext in ensure_transaction.

> 
> P.S. It also seems to me now that resetting the context in
> ensure_transaction() isn't that good an idea; it's probably better to
> explicitly call reset at the end of each function involved.
> 

Well it would definitely improve clarity.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Logical replication ApplyContext bloat

2017-04-19 Thread Stas Kelvich

> On 19 Apr 2017, at 13:34, Simon Riggs  wrote:
> 
> On 19 April 2017 at 11:24, Petr Jelinek  wrote:
> 
>> I'd still like you to add a comment to the ApplyContext saying that it's
>> cleaned after handling each message, so nothing that needs to survive,
>> for example, a whole transaction can be allocated in it, as that's not
>> very well visible IMHO (and I know I will forget about it when writing
>> the next patch in that area).
> 
> It would be better to invent the contexts we want now, so we get the
> architecture right for future work. Otherwise we have problems with
> "can't backpatch this fix because that infrastructure doesn't exist in
> PG10" in a couple of years.
> 
> So I suggest we have
> 
> ApplyMessageContext - cleaned after every message
> ApplyTransactionContext - cleaned at EOXact
> ApplyContext - persistent

Right now ApplyContext is cleaned after each transaction, and by this patch
I basically suggest cleaning it after each command counter increment.

So in your definitions that is ApplyMessageContext, the most short-lived one.
We can now rename ApplyContext -> ApplyMessageContext to make that clear and
avoid a possible name conflict when somebody decides to keep something during
a whole transaction or the worker lifespan.

P.S. It also seems to me now that resetting the context in
ensure_transaction() isn't that good an idea; it's probably better to
explicitly call reset at the end of each function involved.

> 
> -- 
> Simon Riggs    http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company






Re: [HACKERS] Logical replication ApplyContext bloat

2017-04-19 Thread Simon Riggs
On 19 April 2017 at 11:24, Petr Jelinek  wrote:

> I'd still like you to add a comment to the ApplyContext saying that it's
> cleaned after handling each message, so nothing that needs to survive,
> for example, a whole transaction can be allocated in it, as that's not
> very well visible IMHO (and I know I will forget about it when writing
> the next patch in that area).

It would be better to invent the contexts we want now, so we get the
architecture right for future work. Otherwise we have problems with
"can't backpatch this fix because that infrastructure doesn't exist in
PG10" in a couple of years.

So I suggest we have

ApplyMessageContext - cleaned after every message
ApplyTransactionContext - cleaned at EOXact
ApplyContext - persistent
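
The setup could look roughly like this (a sketch only; parent contexts
and exact reset points are to be decided):

ApplyContext = AllocSetContextCreate(TopMemoryContext,
                                     "ApplyContext",
                                     ALLOCSET_DEFAULT_SIZES);
ApplyMessageContext = AllocSetContextCreate(ApplyContext,
                                            "ApplyMessageContext",
                                            ALLOCSET_DEFAULT_SIZES);
/* ApplyTransactionContext could simply alias TopTransactionContext
 * while a transaction is open, rather than being a separate context. */

/* ... and after each handled replication message: */
MemoryContextReset(ApplyMessageContext);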

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Logical replication ApplyContext bloat

2017-04-19 Thread Petr Jelinek
On 19/04/17 11:55, Stas Kelvich wrote:
> 
>> On 19 Apr 2017, at 12:37, Petr Jelinek  wrote:
>>
>> On 18/04/17 13:45, Stas Kelvich wrote:
>>> Hi,
>>>
>>> currently the logical replication worker uses ApplyContext to decode
>>> received data, and that context is never freed during transaction
>>> processing. Hence if the publication side performs something like 10M
>>> row inserts in a single transaction, the subscription worker will
>>> occupy more than 10GB of RAM (and can be killed by the OOM killer).
>>>
>>> The attached patch resets ApplyContext after each
>>> insert/update/delete/commit. I've tried to reset the context only on
>>> each 100/1000/10000 value of the CommandCounter, but didn't spot any
>>> measurable difference in speed.
>>>
>>> Problem spotted by Mikhail Shurutov.
>>>
>>
>> Hmm, we also do replication protocol handling inside ApplyContext
>> (LogicalRepApplyLoop); are you sure this change is safe in that regard?
> 
> Memory is bloated by the logicalrep_read_* functions when palloc
> happens inside them. I've inserted the context reset in
> ensure_transaction(), which is only called at the beginning of
> handle_insert/update/delete/commit, when data from the previous call of
> that function isn't needed. So it is probably safe to clear memory
> there. At least I don't see any state that can be accessed later in
> these functions. Do you have any specific concerns?
> 

I wanted to make sure you checked the things that happen outside of the
apply_handle_* stuff. I checked myself now; the allocations that happen
outside don't use postgres memory contexts (libpqrcv_receive), so that
should not be a problem. Other allocations that I see in neighboring
code switch to permanent contexts anyway. So it looks safe at first
glance indeed. Eventually we'll want to cache some of the executor
stuff, so this will become a problem, but not in v10.

I'd still like you to add a comment to ApplyContext saying that it's
cleaned after handling each message, so nothing that needs to survive
e.g. a whole transaction can be allocated in it, as that's not very well
visible IMHO (and I know I will forget about it when writing the next
patch in that area).

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Logical replication ApplyContext bloat

2017-04-19 Thread Stas Kelvich

> On 19 Apr 2017, at 12:37, Petr Jelinek  wrote:
> 
> On 18/04/17 13:45, Stas Kelvich wrote:
>> Hi,
>> 
>> currently the logical replication worker uses ApplyContext to decode
>> received data, and that context is never freed during transaction
>> processing. Hence if the publication side performs something like 10M
>> row inserts in a single transaction, the subscription worker will
>> occupy more than 10GB of RAM (and can be killed by the OOM killer).
>> 
>> The attached patch resets ApplyContext after each
>> insert/update/delete/commit. I've tried to reset the context only on
>> each 100/1000/10000 value of the CommandCounter, but didn't spot any
>> measurable difference in speed.
>> 
>> Problem spotted by Mikhail Shurutov.
>> 
> 
> Hmm, we also do replication protocol handling inside ApplyContext
> (LogicalRepApplyLoop); are you sure this change is safe in that regard?

Memory is bloated by the logicalrep_read_* functions when palloc happens
inside them. I've inserted the context reset in ensure_transaction(),
which is only called at the beginning of
handle_insert/update/delete/commit, when data from the previous call of
that function isn't needed. So it is probably safe to clear memory
there. At least I don't see any state that can be accessed later in
these functions. Do you have any specific concerns?
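
The change itself is essentially a one-liner; the idea (a sketch, not
the exact attached diff) is:

static bool
ensure_transaction(void)
{
    /* free everything the previous command left in ApplyContext */
    MemoryContextResetAndDeleteChildren(ApplyContext);

    if (IsTransactionState())
        return false;

    StartTransactionCommand();
    return true;
}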


Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company






Re: [HACKERS] Logical replication ApplyContext bloat

2017-04-19 Thread Petr Jelinek
On 18/04/17 13:45, Stas Kelvich wrote:
> Hi,
> 
> currently the logical replication worker uses ApplyContext to decode
> received data, and that context is never freed during transaction
> processing. Hence if the publication side performs something like 10M
> row inserts in a single transaction, the subscription worker will
> occupy more than 10GB of RAM (and can be killed by the OOM killer).
> 
> The attached patch resets ApplyContext after each
> insert/update/delete/commit. I've tried to reset the context only on
> each 100/1000/10000 value of the CommandCounter, but didn't spot any
> measurable difference in speed.
> 
> Problem spotted by Mikhail Shurutov.
> 

Hmm, we also do replication protocol handling inside ApplyContext
(LogicalRepApplyLoop); are you sure this change is safe in that regard?

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-04-19 Thread Maksim Milyutin

On 19.04.2017 11:42, Ashutosh Bapat wrote:

> On Tue, Apr 18, 2017 at 4:43 PM, Maksim Milyutin wrote:
>
>> Local partitioned indexes can be recognized through a check on the
>> relkind of the table to which the index refers. Something like this:
>>
>> heap = relation_open(IndexGetRelation(indexid, false), heapLockmode);
>> if (heap->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
>>     /* indexid is a local index on a partitioned table */
>
> An index on a partitioned table can be a global index (yet to be
> implemented) or a local index. We cannot differentiate between those
> just by looking at the relation on which they are built.

We could refine the criteria for a local partitioned index later,
encapsulating it in a macro, e.g., adding a new flag to pg_index that
differentiates the type of index on a partitioned table.

>> These cases must be caught. But as long as partitioned tables don't
>> participate in query plans, their indexes are inaccessible to the
>> executor. The reindex operation is overridden in my patch.
>
> A global index would have storage for a partitioned table, whereas a
> local index wouldn't have any storage for it.
>
> I agree with Amit that we need new relkinds for local as well as global
> indexes.

Ok, thanks for the feedback. Then I'll use a new relkind for the local
partitioned index in further development.




--
Maksim Milyutin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company




Re: [HACKERS] some review comments on logical rep code

2017-04-19 Thread Petr Jelinek
On 19/04/17 10:45, Kyotaro HORIGUCHI wrote:
> At Wed, 19 Apr 2017 17:43:17 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
>  wrote in 
> <20170419.174317.114509231.horiguchi.kyot...@lab.ntt.co.jp>
>> At Wed, 19 Apr 2017 10:33:29 +0200, Petr Jelinek 
>>  wrote in 
>> 
>>>> Commit has been moved from after to before the lock section.
>>>> This causes a potential race condition. (As with the potential
>>>> deadlock, I'm not sure it can cause a realistic problem, though.)
>>>> Isn't it better for it to be after the lock section?

>>>
>>> We just read catalogs so there should not be locking issues.
>>
>> Some other process may modify it and then go there. With a kind of
>> priority inversion, the process may modify the data in memory
>> *before* we modify it. Of course this is rather unrealistic,
>> though.
> 
> That was a bit short; restating:
> 
> Some other process may modify it *after* we read it and then go
> there. With a kind of priority inversion, the process may modify
> the data in memory *before* we modify it. Of course this is
> rather unrealistic, though.
> 

Yeah, but I think that's a risk anyway in an MVCC read-committed
database; the only way to protect against it would be to hold a lock
over the catalog access, which is what we wanted to get rid of :)

BTW, the whole sync coordination is explicitly written in such a way
that, unless you mess with the catalogs manually, only the tablesync
process should do UPDATEs to that catalog (with the exception of the
s->r state switch, which can happen in apply and has no effect on sync
because both states mean that synchronization is done; it only makes
apply stop tracking the table individually).

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [PATCH] Push limit to sort through a subquery

2017-04-19 Thread Ashutosh Bapat
The function pass_down_bound() is recursive. For SubqueryScanState we
have to do something similar to what is done for ResultState, i.e. call
pass_down_bound() recursively on subqueryScanState->subplan.
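
Something like this (an untested sketch of the SubqueryScanState case in
pass_down_bound() in src/backend/executor/nodeLimit.c; the attached
patch may differ):

else if (IsA(child_node, SubqueryScanState))
{
    SubqueryScanState *subqueryScanState = (SubqueryScanState *) child_node;

    /* Only safe when the scan node itself filters nothing. */
    if (subqueryScanState->ss.ps.qual == NULL)
        pass_down_bound(node, subqueryScanState->subplan);
}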

Please add this to the next commitfest.

On Wed, Apr 19, 2017 at 6:09 AM, Douglas Doole  wrote:
> We've hit a case where pass_down_bound() isn't pushing the row count limit
> from limit into sort. The issue is that we're getting a subquery scan node
> between the limit and the sort. The subquery is only doing column projection
> and has no quals or SRFs so it should be safe to push the limit into the
> sort.
>
> The query that hit the problem can be simplified to:
>
>    SELECT * FROM (SELECT A, B FROM T ORDER BY C) AS sub LIMIT 5
>
> (Yeah, the query's a little screwy in that the ORDER BY should really be
> outside the subselect, but it came from a query generator, so that's a
> different conversation.)
>
> Proposed patch is attached.
>
> - Doug
> Salesforce
>



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] some review comments on logical rep code

2017-04-19 Thread Kyotaro HORIGUCHI
At Wed, 19 Apr 2017 17:43:17 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20170419.174317.114509231.horiguchi.kyot...@lab.ntt.co.jp>
> At Wed, 19 Apr 2017 10:33:29 +0200, Petr Jelinek 
>  wrote in 
> 
> > > Commit has been moved from after to before the lock section.
> > > This causes a potential race condition. (As with the potential
> > > deadlock, I'm not sure it can cause a realistic problem, though.)
> > > Isn't it better for it to be after the lock section?
> > > 
> > 
> > We just read catalogs so there should not be locking issues.
> 
> Some other process may modify it and then go there. With a kind of
> priority inversion, the process may modify the data in memory
> *before* we modify it. Of course this is rather unrealistic,
> though.

That was a bit short; restating:

Some other process may modify it *after* we read it and then go
there. With a kind of priority inversion, the process may modify
the data in memory *before* we modify it. Of course this is
rather unrealistic, though.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [HACKERS] some review comments on logical rep code

2017-04-19 Thread Kyotaro HORIGUCHI
At Wed, 19 Apr 2017 10:33:29 +0200, Petr Jelinek  
wrote in 
> > Commit has been moved from after to before the lock section.
> > This causes a potential race condition. (As with the potential
> > deadlock, I'm not sure it can cause a realistic problem, though.)
> > Isn't it better for it to be after the lock section?
> > 
> 
> We just read catalogs so there should not be locking issues.

Some other process may modify it and then go there. With a kind of
priority inversion, the process may modify the data in memory
*before* we modify it. Of course this is rather unrealistic,
though.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-04-19 Thread Ashutosh Bapat
On Tue, Apr 18, 2017 at 4:43 PM, Maksim Milyutin
 wrote:
> On 18.04.2017 13:08, Amit Langote wrote:
>>
>> Hi,
>>
>
> Hi, Amit!
>
>> On 2017/04/17 23:00, Maksim Milyutin wrote:
>>>
>>>
>>> Ok, thanks for the note.
>>>
>>> But I want to discuss the relevancy of introducing a new relkind for
>>> partitioned indexes. I could change the control flow in partitioned
>>> index creation (by adding a conditional statement in the
>>> 'index_create' routine in the attached patch) and not enter the
>>> 'heap_create' routine. This releases us from integrating a new
>>> relkind into different places of the Postgres code. But we have to
>>> copy-paste some specific code from the 'heap_create' function, e.g.,
>>> the definition of relfilenode and tablespaceid for the new index, and
>>> perhaps more when the 'heap_create' routine is extended.
>>
>>
>> I may be missing something, but isn't it that a new relkind will be needed
>> anyway?  How does the rest of the code distinguish such index objects once
>> they are created?
>
>
> Local partitioned indexes can be recognized through a check on the
> relkind of the table to which the index refers. Something like this:
>
> heap = relation_open(IndexGetRelation(indexid, false), heapLockmode);
> if (heap->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
>     /* indexid is a local index on a partitioned table */

An index on a partitioned table can be a global index (yet to be
implemented) or a local index. We cannot differentiate between those
just by looking at the relation on which they are built.

>
>> Is it possible that some other code may try to access
>> the storage for an index whose indrelid is a partitioned table?
>>
>
> These cases must be caught. But as long as partitioned tables don't
> participate in query plans, their indexes are inaccessible to the
> executor. The reindex operation is overridden in my patch.
>

A global index would have storage for a partitioned table, whereas a
local index wouldn't have any storage for it.

I agree with Amit that we need new relkinds for local as well as global
indexes.
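
Purely illustratively (the actual relkind letters would be picked by
whoever writes the patch), in src/include/catalog/pg_class.h that would
add something like:

#define RELKIND_PARTITIONED_INDEX 'I'  /* local index on partitioned table */
#define RELKIND_GLOBAL_INDEX      'g'  /* global index (not implemented yet) */
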
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] Quorum commit for multiple synchronous replication.

2017-04-19 Thread Kyotaro HORIGUCHI
At Wed, 19 Apr 2017 03:03:38 +0900, Fujii Masao  wrote 
in 

Re: [HACKERS] some review comments on logical rep code

2017-04-19 Thread Petr Jelinek
On 19/04/17 10:25, Kyotaro HORIGUCHI wrote:
> At Wed, 19 Apr 2017 04:18:18 +0200, Petr Jelinek 
>  wrote in 
> 
>> On 18/04/17 19:27, Fujii Masao wrote:
>>> On Wed, Apr 19, 2017 at 1:35 AM, Petr Jelinek
>>>  wrote:
>>>> Thank you for working on this!
>>>>
>>>> On 18/04/17 10:16, Masahiko Sawada wrote:
>>>>> On Tue, Apr 18, 2017 at 12:24 PM, Kyotaro HORIGUCHI wrote:

>> 10.
>>>
>>> SpinLockAcquire(&MyLogicalRepWorker->relmutex);
>>> MyLogicalRepWorker->relstate =
>>>     GetSubscriptionRelState(MyLogicalRepWorker->subid,
>>>                             MyLogicalRepWorker->relid,
>>>                             &MyLogicalRepWorker->relstate_lsn,
>>>                             false);
>>> SpinLockRelease(&MyLogicalRepWorker->relmutex);
>>>
>>> A non-"short-term" function like GetSubscriptionRelState() should not
>>> be called while a spinlock is being held.
>>>
>>
>> One option is adding a new LWLock for the relation state change, but
>> this lock is used at many locations where the operation actually
>> doesn't take time. I think discussion would be needed to get
>> consensus, so a patch for it is not attached.
>
> From the point of view of time, I agree that it doesn't seem to
> do harm. But I think it is a more significant problem that a
> potentially-throwing function is called within the mutex
> region. It potentially causes a kind of deadlock. (That said, it
> theoretically doesn't occur, and I'm not sure what would happen
> in such a deadlock.)
>

>>>> Hmm, I think doing what I attached should be fine here.
>>>
>>> Thanks for the patch!
>>>
>>>> We don't need to hold the spinlock for the table read, just for
>>>> changing MyLogicalRepWorker->relstate.
>>>
>>> Yes, but the update of MyLogicalRepWorker->relstate_lsn also should
>>> be protected with the spinlock.
>>>
>>
>> Yes, sorry tired/blind, fixed.
> 
> Commit has been moved from after to before the lock section.
> This causes a potential race condition. (As with the potential
> deadlock, I'm not sure it can cause a realistic problem, though.)
> Isn't it better for it to be after the lock section?
> 

We just read catalogs so there should not be locking issues.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Should pg_current_wal_location() become pg_current_wal_lsn()

2017-04-19 Thread Kyotaro HORIGUCHI
At Wed, 19 Apr 2017 13:32:48 +0900, Michael Paquier  
wrote in 

Re: [HACKERS] some review comments on logical rep code

2017-04-19 Thread Kyotaro HORIGUCHI
At Wed, 19 Apr 2017 04:18:18 +0200, Petr Jelinek  
wrote in 
> On 18/04/17 19:27, Fujii Masao wrote:
> > On Wed, Apr 19, 2017 at 1:35 AM, Petr Jelinek
> >  wrote:
> >> Thank you for working on this!
> >>
> >> On 18/04/17 10:16, Masahiko Sawada wrote:
> >>> On Tue, Apr 18, 2017 at 12:24 PM, Kyotaro HORIGUCHI
> >>>  wrote:
> >>
>  10.
> >
> > SpinLockAcquire(&MyLogicalRepWorker->relmutex);
> > MyLogicalRepWorker->relstate =
> >     GetSubscriptionRelState(MyLogicalRepWorker->subid,
> >                             MyLogicalRepWorker->relid,
> >                             &MyLogicalRepWorker->relstate_lsn,
> >                             false);
> > SpinLockRelease(&MyLogicalRepWorker->relmutex);
> >
> > A non-"short-term" function like GetSubscriptionRelState() should not
> > be called while a spinlock is being held.
> >
> 
> One option is adding a new LWLock for the relation state change, but
> this lock is used at many locations where the operation actually
> doesn't take time. I think discussion would be needed to get
> consensus, so a patch for it is not attached.
> >>>
> >>> From the point of view of time, I agree that it doesn't seem to
> >>> do harm. But I think it is a more significant problem that a
> >>> potentially-throwing function is called within the mutex
> >>> region. It potentially causes a kind of deadlock. (That said, it
> >>> theoretically doesn't occur, and I'm not sure what would happen
> >>> in such a deadlock.)
> >>>
> >>
> >> Hmm, I think doing what I attached should be fine here.
> > 
> > Thanks for the patch!
> > 
> >> We don't need to hold the spinlock for the table read, just for
> >> changing MyLogicalRepWorker->relstate.
> > 
> > Yes, but the update of MyLogicalRepWorker->relstate_lsn also should
> > be protected with the spinlock.
> > 
> 
> Yes, sorry tired/blind, fixed.

Commit has been moved from after to before the lock section.
This causes a potential race condition. (As with the potential
deadlock, I'm not sure it can cause a realistic problem, though.)
Isn't it better for it to be after the lock section?
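
For reference, the patched sequence as I read it is roughly this (a
reconstruction; see the attached patch for the real thing):

StartTransactionCommand();
relstate = GetSubscriptionRelState(MyLogicalRepWorker->subid,
                                   MyLogicalRepWorker->relid,
                                   &relstate_lsn, false);
CommitTransactionCommand();     /* now before the lock section */

SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = relstate;
MyLogicalRepWorker->relstate_lsn = relstate_lsn;
SpinLockRelease(&MyLogicalRepWorker->relmutex);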

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [HACKERS] Interval for launching the table sync worker

2017-04-19 Thread Kyotaro HORIGUCHI
At Tue, 18 Apr 2017 18:40:56 +0200, Petr Jelinek  
wrote in 
> On 18/04/17 18:14, Peter Eisentraut wrote:
> > On 4/18/17 11:59, Petr Jelinek wrote:
> >> Hmm, if we create a hashtable for this, I'd say create a hashtable
> >> for the whole table_states then. The reason it's a list now is that
> >> it seemed unnecessary to have a hashtable when it will be empty
> >> almost always, but there is no need to have both a hashtable and a
> >> list IMHO.

I understand that, but like Peter, I also don't like the frequent
palloc/pfree in a long-lasting context and the double loop.

> > The difference is that we blow away the list of states when the catalog
> > changes, but we keep the hash table with the start times around.  We
> > need two things with different lifetimes.

On the other hand, a hash seems overdone. In addition, the hash version
leaks stale entries while subscriptions are modified, and vacuuming
them is costly.

> Why can't we just update the hashtable based on the catalog? I mean once
> the record is not needed in the list, the table has been synced so there
> is no need for the timestamp either since we'll not try to start the
> worker again.

Considering the anticipated complexity when many subscriptions exist in
the list, and the complexity of removing stale entries from the hash,
I'm inclined to propose just adding a 'worker_start' column to
pg_subscription_rel. We already have similar columns in
pg_stat_activity and pg_stat_replication.
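
Illustratively (the field name and type are tentative, this is not a
patch), in src/include/catalog/pg_subscription_rel.h:

CATALOG(pg_subscription_rel,6102) BKI_WITHOUT_OIDS
{
    Oid         srsubid;        /* Oid of subscription */
    Oid         srrelid;        /* Oid of relation */
    char        srsubstate;     /* state of the relation in subscription */
    XLogRecPtr  srsublsn;       /* remote LSN of the state change */
    timestamptz srworkerstart;  /* proposed: last sync worker start time */
} FormData_pg_subscription_rel;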

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [HACKERS] Other formats in pset like markdown, rst, mediawiki

2017-04-19 Thread Jan Michálek
2017-04-19 9:18 GMT+02:00 Fabien COELHO :

>>> I still do not understand "why" this variant vs CommonMark or
>>> whatever other version.
>>
>> Because of the simple implementation and readability (it looks similar
>> to the aligned format), and it is comfortable to edit the generated
>> table (changing values, aligning columns, etc.).
>
> Hmmm. Why not.
>
>> Sorry, maybe I'm not understanding; there are problems with characters
>> like pipe in cells, and the pipe should be escaped. What other special
>> markdown characters are there? Escaping HTML code in cells?
>
> Markdown includes characters/sequences which are interpreted as
> markers: _Italic_, **Bold**, *** => horizontal rules, > block quote...
> `inline code`... If they are interpreted within a table cell then
> probably they should be escaped somehow.
>

I'm able to sanitize single characters, but complex sequences will be a
problem. I will look into this, but I don't know if I'm able to do it.

My main interest in this was rst. I'm using markdown only in github
issues, and my knowledge of md is poor.


>>>> Main of the functionality is reused from the aligned format. I
>>>> tested the returned tables in retext and it works. If I had a
>>>> character other than the standard pipe, it shouldn't work.
>>>
>>> Sure. ISTM that you are currently using U+2502 instead of pipe, hence
>>> my point.
>>
>> Could you send me an example of where?
>
> I already did in the first mail, with the example output copy-pasted
> from psql. Some characters are pipes and others are BOX DRAWINGS LIGHT
> VERTICAL characters.
>
> Maybe this is because I have in my ~/.psqlrc:
>
>  \pset linestyle unicode
>  \pset border 2
>
> in which case your reuse of the aligned stuff should take care of the
> border setting to avoid using special UTF8 characters.
>

Yes, it looks like it is done by the linestyle.


jelen=# SELECT 1;

| ?column? |
|--|
|1 |


(1 row)

jelen=# \pset linestyle unicode
Line style is unicode.
jelen=# SELECT 1;

│ ?column? │
|--|
│1 │


(1 row)

jelen=#

I have prepared linestyles for rst and md, but I can't expose them as
linestyles, because then \pset linestyle would report "markdown" or
"rst". I see the problem is only in the cell borders; I will correct
this.
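
If they end up being exposed as output formats rather than linestyles
(an assumption on my side), usage would presumably look like:

\pset format markdown
SELECT 1;

| ?column? |
|----------|
|        1 |

(1 row)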

Jan


>
> --
> Fabien.
>



-- 
Jelen
Starší čeledín datovýho chlíva


Re: [HACKERS] Other formats in pset like markdown, rst, mediawiki

2017-04-19 Thread Fabien COELHO



>> I still do not understand "why" this variant vs CommonMark or whatever
>> other version.
>
> Because of the simple implementation and readability (it looks similar
> to the aligned format), and it is comfortable to edit the generated
> table (changing values, aligning columns, etc.).

Hmmm. Why not.

> Sorry, maybe I'm not understanding; there are problems with characters
> like pipe in cells, and the pipe should be escaped. What other special
> markdown characters are there? Escaping HTML code in cells?

Markdown includes characters/sequences which are interpreted as markers:
_Italic_, **Bold**, *** => horizontal rules, > block quote... `inline
code`... If they are interpreted within a table cell then probably they
should be escaped somehow.
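
For instance, something like this (an untested sketch, and certainly not
the full set of sequences to handle):

static void
markdown_escape(FILE *fout, const char *cell)
{
    const char *p;

    for (p = cell; *p; p++)
    {
        /* backslash-escape markdown-significant characters... */
        if (strchr("|*_`\\#>", *p) != NULL)
            fputc('\\', fout);

        /* ...and keep multi-line values inside one table cell */
        if (*p == '\n')
            fputs("<br/>", fout);
        else
            fputc(*p, fout);
    }
}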



>>> Main of the functionality is reused from the aligned format. I tested
>>> the returned tables in retext and it works. If I had a character other
>>> than the standard pipe, it shouldn't work.
>>
>> Sure. ISTM that you are currently using U+2502 instead of pipe, hence
>> my point.
>
> Could you send me an example of where?

I already did in the first mail, with the example output copy-pasted
from psql. Some characters are pipes and others are BOX DRAWINGS LIGHT
VERTICAL characters.

Maybe this is because I have in my ~/.psqlrc:

 \pset linestyle unicode
 \pset border 2

in which case your reuse of the aligned stuff should take care of the
border setting to avoid using special UTF8 characters.


--
Fabien.




Re: [HACKERS] OK, so culicidae is *still* broken

2017-04-19 Thread Amit Kapila
On Sun, Apr 16, 2017 at 3:04 AM, Tom Lane  wrote:
> Andres Freund  writes:
>> On 2017-04-15 17:24:54 -0400, Tom Lane wrote:
>>> I wonder whether we could work around that by just destroying the created
>>> process and trying again if we get a collision.  It'd be a tad
>>> inefficient, but hopefully collisions wouldn't happen often enough to be a
>>> big problem.
>
>> That might work, although it's obviously not pretty.  We could also just
>> default to some out-of-the-way address for MapViewOfFileEx, that might
>> also work.
>
> Could be.  Does Microsoft publish any documentation about the range of
> addresses their ASLR uses?
>

I have looked around for information on whether there is any such
address range which could be used for our purpose, and I am not able to
find any predictable range. You might want to read the article [1],
especially the text around "What is the memory address space range in
virtual memory map where system DLLs and user DLLs could load?" It
seems to indicate that there is no such address range, unless I have
misunderstood it. I don't deny the possibility of having such an
address range, but I could not find any information on it.

> Obviously, any such fix would be a lot more likely to be reliable in
> 64-bit machines.  There's probably not enough daylight to be sure of
> making it work in 32-bit Windows, so I suspect we'd need some retry
> logic anyway for that case.
>

Yeah, that kind of thing can work assuming we don't get conflicts too
often, but it is possible that conflicts simply aren't reported from
ASLR-enabled environments because of commit 7f3e17b4.
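
Still, the retry loop itself would be simple, something like this
(a rough sketch; reserve_shmem_in_child() stands in for whatever call
reserves our shared memory range in the new process):

for (retries = 0; retries < 100; retries++)
{
    if (!CreateProcess(NULL, cmdLine, NULL, NULL, TRUE, CREATE_SUSPENDED,
                       NULL, NULL, &si, &pi))
        return -1;              /* can't even create the process */

    if (reserve_shmem_in_child(pi.hProcess))
    {
        ResumeThread(pi.hThread);       /* got our address range */
        break;
    }

    /* ASLR placed something in our way: destroy the child and retry */
    TerminateProcess(pi.hProcess, 255);
    CloseHandle(pi.hProcess);
    CloseHandle(pi.hThread);
}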

[1] - 
https://blogs.msdn.microsoft.com/winsdk/2009/11/30/how-to-disable-address-space-layout-randomization-aslr/

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Other formats in pset like markdown, rst, mediawiki

2017-04-19 Thread Jan Michálek
2017-04-18 23:06 GMT+02:00 Fabien COELHO :

>
> Hello,
>
>>> There are different flavours of markdown; maybe you should document
>>> which one is targetted. Should it be CommonMark? Another variant? Why?
>>
>> This should be a pandoc pipe table. It's because it is similar to the
>> aligned format. I need to add this to the documentation (I have it in
>> my recent TODO).
>>
>
> I still do not understand "why" this variant vs CommonMark or whatever
> other version.
>

Because of the simple implementation and readability (it looks similar
to the aligned format), and it is comfortable to edit the generated
table (changing values, aligning columns, etc.).


>
>>> ISTM that the md format lacks escaping for special md characters
>>> [...] I'd say that you need to do escaping more or less similar to
>>> html?
>>
>> There is a problem with markdown and newlines. Replacing the newline
>> with a br tag was the only solution that I was able to find.
>>
>
> Fine. That does not answer the question about escaping other special md
> characters.
>

Sorry, maybe I'm not understanding; there are problems with characters
like pipe in cells, and the pipe should be escaped. What other special
markdown characters are there? Escaping HTML code in cells?


>
>>> Also, it seems that you use distinct vertical bar characters in the
>>> format? Or is this a trick of my terminal?? It seems that your patch
>>> introduces U+2502 (BOX DRAWINGS LIGHT VERTICAL) instead of the usual
>>> pipe in some places. Maybe you copy-pasted things from the unicode
>>> linestyle.
>>
>> Main of the functionality is reused from the aligned format. I tested
>> the returned tables in retext and it works. If I had a character other
>> than the standard pipe, it shouldn't work.
>>
>
> Sure. ISTM that you are currently using U+2502 instead of pipe, hence my
> point.
>

Could you send me an example of where?


>
>>> Why are *_newline variants added for length and formatting? Would it
>>> be possible to do without, say by relying on the line count computed
>>> by the standard function for instance?
>>
>> It's because of newlines in markdown. If I were to do it without
>> copying this function, I would have to add a markdown parameter to
>> these functions.
>>
>
> Then maybe it is an option to consider to avoid duplicating code.
>
> --
> Fabien.
>



-- 
Jelen
Starší čeledín datovýho chlíva