Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-12-23 Thread Pavel Stehule
2015-12-24 3:23 GMT+01:00 Michael Paquier :

> On Thu, Dec 17, 2015 at 6:45 PM, Shulgin, Oleksandr
>  wrote:
> > On Wed, Dec 16, 2015 at 8:39 PM, Tomas Vondra <
> tomas.von...@2ndquadrant.com>
> > wrote:
> >> On 12/01/2015 10:34 AM, Shulgin, Oleksandr wrote:
> >>>
> >>>
> >>> I have the plans to make something from this on top of
> >>> pg_stat_statements and auto_explain, as I've mentioned last time.  The
> >>> next iteration will be based on the two latest patches above, so it
> >>> still makes sense to review them.
> >>>
> >>> As for EXPLAIN ANALYZE support, it will require changes to core, but
> >>> this can be done separately.
> >>
> >> I'm re-reading the thread, and I have to admit I'm utterly confused about
> >> what the current plan is, what features it's supposed to provide and
> >> whether it will solve the use case I'm most interested in. Oleksandr,
> >> could you please post a summary explaining that?
> >>
> >> My use case for this functionality is debugging of long-running queries,
> >> particularly getting EXPLAIN ANALYZE for them. In such cases I can't run
> >> EXPLAIN ANALYZE manually because it will either run forever (just like
> >> the query) or may not be the same (e.g. due to recent ANALYZE). So we
> >> need to extract the data from the process executing the query.
> >>
> >> I'm not fundamentally opposed to doing this in an extension (better an
> >> extension than nothing), but I don't quite see how you could do that from
> >> pg_stat_statements or auto_explain. AFAIK both extensions merely use
> >> hooks before/after the executor, and therefore can't do anything in
> >> between (while the query is actually running).
> >>
> >> Perhaps you don't intend to solve this particular use case? Which use
> >> cases are you aiming to solve, then? Could you explain?
> >
> > Thanks for your interest in this patch!
> >
> > My motivation is the same as your use case: having a long-running query,
> > be able to look inside this exact query run by this exact backend.
> >
> > I admit the evolution of ideas in this patch can be very confusing: we
> > were trying a number of different approaches, w/o thinking deeply on the
> > implications, just to have a proof of concept.
>
> It's great to see ideas and concepts that help address those
> issues; at least we agree that there is a hole in the
> instrumentation and that it would be nice to fill it with something.
> Still, it is not completely clear which approach would be taken. Is it
> fair to mark the current patch as returned with feedback, then?
>

+1

Pavel


> --
> Michael
>


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-12-23 Thread Michael Paquier
On Thu, Dec 17, 2015 at 6:45 PM, Shulgin, Oleksandr
 wrote:
> On Wed, Dec 16, 2015 at 8:39 PM, Tomas Vondra 
> wrote:
>> On 12/01/2015 10:34 AM, Shulgin, Oleksandr wrote:
>>>
>>>
>>> I have the plans to make something from this on top of
>>> pg_stat_statements and auto_explain, as I've mentioned last time.  The
>>> next iteration will be based on the two latest patches above, so it
>>> still makes sense to review them.
>>>
>>> As for EXPLAIN ANALYZE support, it will require changes to core, but
>>> this can be done separately.
>>
>> I'm re-reading the thread, and I have to admit I'm utterly confused about
>> what the current plan is, what features it's supposed to provide and whether
>> it will solve the use case I'm most interested in. Oleksandr, could you
>> please post a summary explaining that?
>>
>> My use case for this functionality is debugging of long-running queries,
>> particularly getting EXPLAIN ANALYZE for them. In such cases I can't run
>> EXPLAIN ANALYZE manually because it will either run forever (just like the
>> query) or may not be the same (e.g. due to recent ANALYZE). So we need to
>> extract the data from the process executing the query.
>>
>> I'm not fundamentally opposed to doing this in an extension (better an
>> extension than nothing), but I don't quite see how you could do that from
>> pg_stat_statements or auto_explain. AFAIK both extensions merely use hooks
>> before/after the executor, and therefore can't do anything in between (while
>> the query is actually running).
>>
>> Perhaps you don't intend to solve this particular use case? Which use
>> cases are you aiming to solve, then? Could you explain?
>
> Thanks for your interest in this patch!
>
> My motivation is the same as your use case: having a long-running query, be
> able to look inside this exact query run by this exact backend.
>
> I admit the evolution of ideas in this patch can be very confusing: we were
> trying a number of different approaches, w/o thinking deeply on the
> implications, just to have a proof of concept.

It's great to see ideas and concepts that help address those
issues; at least we agree that there is a hole in the
instrumentation and that it would be nice to fill it with something.
Still, it is not completely clear which approach would be taken. Is it
fair to mark the current patch as returned with feedback, then?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-12-23 Thread Michael Paquier
On Thu, Dec 24, 2015 at 1:57 PM, Pavel Stehule  wrote:
>
>
> 2015-12-24 3:23 GMT+01:00 Michael Paquier :
>>
>> On Thu, Dec 17, 2015 at 6:45 PM, Shulgin, Oleksandr
>>  wrote:
>> > On Wed, Dec 16, 2015 at 8:39 PM, Tomas Vondra
>> > 
>> > wrote:
>> >> On 12/01/2015 10:34 AM, Shulgin, Oleksandr wrote:
>> >>>
>> >>>
>> >>> I have the plans to make something from this on top of
>> >>> pg_stat_statements and auto_explain, as I've mentioned last time.  The
>> >>> next iteration will be based on the two latest patches above, so it
>> >>> still makes sense to review them.
>> >>>
>> >>> As for EXPLAIN ANALYZE support, it will require changes to core, but
>> >>> this can be done separately.
>> >>
>> >> I'm re-reading the thread, and I have to admit I'm utterly confused about
>> >> what the current plan is, what features it's supposed to provide and
>> >> whether it will solve the use case I'm most interested in. Oleksandr,
>> >> could you please post a summary explaining that?
>> >>
>> >> My use case for this functionality is debugging of long-running queries,
>> >> particularly getting EXPLAIN ANALYZE for them. In such cases I can't run
>> >> EXPLAIN ANALYZE manually because it will either run forever (just like
>> >> the query) or may not be the same (e.g. due to recent ANALYZE). So we
>> >> need to extract the data from the process executing the query.
>> >>
>> >> I'm not fundamentally opposed to doing this in an extension (better an
>> >> extension than nothing), but I don't quite see how you could do that
>> >> from pg_stat_statements or auto_explain. AFAIK both extensions merely
>> >> use hooks before/after the executor, and therefore can't do anything in
>> >> between (while the query is actually running).
>> >>
>> >> Perhaps you don't intend to solve this particular use case? Which use
>> >> cases are you aiming to solve, then? Could you explain?
>> >
>> > Thanks for your interest in this patch!
>> >
>> > My motivation is the same as your use case: having a long-running query,
>> > be able to look inside this exact query run by this exact backend.
>> >
>> > I admit the evolution of ideas in this patch can be very confusing: we
>> > were trying a number of different approaches, w/o thinking deeply on the
>> > implications, just to have a proof of concept.
>>
>> It's great to see ideas and concepts that help address those
>> issues; at least we agree that there is a hole in the
>> instrumentation and that it would be nice to fill it with something.
>> Still, it is not completely clear which approach would be taken. Is it
>> fair to mark the current patch as returned with feedback, then?
>
>
> +1

So, done this way. We could always move it to the next CF if someone
wishes to move on, but at this point that would be a different
approach than what was proposed first.
-- 
Michael




Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-12-17 Thread Shulgin, Oleksandr
On Wed, Dec 16, 2015 at 8:39 PM, Tomas Vondra 
wrote:

> Hi,
>
> On 12/01/2015 10:34 AM, Shulgin, Oleksandr wrote:
>
>>
>> I have the plans to make something from this on top of
>> pg_stat_statements and auto_explain, as I've mentioned last time.  The
>> next iteration will be based on the two latest patches above, so it
>> still makes sense to review them.
>>
>> As for EXPLAIN ANALYZE support, it will require changes to core, but
>> this can be done separately.
>>
>
> I'm re-reading the thread, and I have to admit I'm utterly confused about
> what the current plan is, what features it's supposed to provide and whether
> it will solve the use case I'm most interested in. Oleksandr, could you
> please post a summary explaining that?
>
> My use case for this functionality is debugging of long-running queries,
> particularly getting EXPLAIN ANALYZE for them. In such cases I can't run
> EXPLAIN ANALYZE manually because it will either run forever (just like the
> query) or may not be the same (e.g. due to recent ANALYZE). So we need to
> extract the data from the process executing the query.
>
> I'm not fundamentally opposed to doing this in an extension (better an
> extension than nothing), but I don't quite see how you could do that from
> pg_stat_statements or auto_explain. AFAIK both extensions merely use hooks
> before/after the executor, and therefore can't do anything in between (while
> the query is actually running).
>
> Perhaps you don't intend to solve this particular use case? Which use
> cases are you aiming to solve, then? Could you explain?
>

Hi Tomas.

Thanks for your interest in this patch!

My motivation is the same as your use case: having a long-running query, be
able to look inside this exact query run by this exact backend.

I admit the evolution of ideas in this patch can be very confusing: we were
trying a number of different approaches, w/o thinking deeply on the
implications, just to have a proof of concept.

Maybe all we need to do is add another hook somewhere in the executor, and
> push the explain analyze into pg_stat_statements once in a while, entirely
> eliminating the need for inter-process communication (signals, queues, ...).
>
> I'm pretty sure we don't need this for "short" queries, because in those
> cases we have other means to get the explain analyze (e.g. running the
> query again or auto_explain). So I can imagine having a rather high
> threshold (say, 60 seconds or even more), and we'd only push the explain
> analyze after crossing it. And then we'd only update it once in a while -
> say, again every 60 seconds.
>
> Of course, this might be configurable by two GUCs:
>
>pg_stat_statements.explain_analyze_threshold = 60  # -1 is "off"
>pg_stat_statements.explain_analyze_refresh = 60
>
> FWIW I'd still prefer having "EXPLAIN ANALYZE" in core, but better this
> than nothing.
>

Yes, this is how the pg_stat_statements idea comes into play: even if we
implement support for online EXPLAIN ANALYZE, enabling it for every query
is overkill, but if you don't enable it from the query start, there is
no way to enable it later on once the query has already progressed.  So in
order to know for which queries it makes sense to enable this feature, we
need to know the query's expected run time, and pg_stat_statements
seems like a natural place to obtain this information and/or trigger the
behavior.
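
To make that gating decision concrete, here is a rough sketch in plain
Python (purely illustrative pseudocode; the real thing would be C inside
the extension, and the names `stats` and `threshold_ms` are invented):

```python
# Hypothetical sketch: at query start, decide whether to enable live
# EXPLAIN ANALYZE instrumentation based on the mean runtime that
# pg_stat_statements has recorded for this queryid.

def should_instrument(queryid, stats, threshold_ms=60_000):
    """Enable instrumentation only for queries whose recorded mean
    execution time exceeds the threshold; unknown queries stay cheap."""
    entry = stats.get(queryid)
    if entry is None:
        return False  # never seen before: assume it is short
    mean_ms = entry["total_time_ms"] / entry["calls"]
    return mean_ms >= threshold_ms

# Toy stand-in for the pg_stat_statements hash table:
stats = {
    101: {"total_time_ms": 500.0, "calls": 10},       # fast OLTP query
    202: {"total_time_ms": 7_200_000.0, "calls": 2},  # long report query
}

print(should_instrument(101, stats))  # False
print(should_instrument(202, stats))  # True
```

The point is only that the decision needs nothing beyond what
pg_stat_statements already tracks per queryid.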

I'm also all for simplification of the underlying communication mechanism:
shared memory queues are neat, but not necessarily the best way to handle
this.  As for the use of signals: I believe they were a red herring; the
code will be much less fragile if the progressing backend itself publishes
intermediary EXPLAIN ANALYZE reports for other backends to read.

Plain EXPLAIN (without ANALYZE) we can do completely as an extension: no
core support required.  Enabling ANALYZE will require a little hacking
around the Instrumentation methods; otherwise the explain functions just
crash when run in the middle of the query.

Hope that makes it clear.

--
Alex


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-12-16 Thread Tomas Vondra

Hi,

On 12/01/2015 10:34 AM, Shulgin, Oleksandr wrote:


I have the plans to make something from this on top of
pg_stat_statements and auto_explain, as I've mentioned last time.  The
next iteration will be based on the two latest patches above, so it
still makes sense to review them.

As for EXPLAIN ANALYZE support, it will require changes to core, but
this can be done separately.


I'm re-reading the thread, and I have to admit I'm utterly confused about
what the current plan is, what features it's supposed to provide and
whether it will solve the use case I'm most interested in. Oleksandr,
could you please post a summary explaining that?


My use case for this functionality is debugging of long-running queries,
particularly getting EXPLAIN ANALYZE for them. In such cases I can't run
EXPLAIN ANALYZE manually because it will either run forever (just like
the query) or may not be the same (e.g. due to recent ANALYZE). So we
need to extract the data from the process executing the query.


I'm not fundamentally opposed to doing this in an extension (better an
extension than nothing), but I don't quite see how you could do that
from pg_stat_statements or auto_explain. AFAIK both extensions merely
use hooks before/after the executor, and therefore can't do anything in
between (while the query is actually running).


Perhaps you don't intend to solve this particular use case? Which use 
cases are you aiming to solve, then? Could you explain?



Maybe all we need to do is add another hook somewhere in the executor, 
and push the explain analyze into pg_stat_statements once in a while, 
entirely eliminating the need for inter-process communication (signals, 
queues, ...).


I'm pretty sure we don't need this for "short" queries, because in those 
cases we have other means to get the explain analyze (e.g. running the 
query again or auto_explain). So I can imagine having a rather high 
threshold (say, 60 seconds or even more), and we'd only push the explain 
analyze after crossing it. And then we'd only update it once in a while 
- say, again every 60 seconds.


Of course, this might be configurable by two GUCs:

   pg_stat_statements.explain_analyze_threshold = 60  # -1 is "off"
   pg_stat_statements.explain_analyze_refresh = 60

FWIW I'd still prefer having "EXPLAIN ANALYZE" in core, but better this 
than nothing.
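
The threshold-plus-refresh behaviour of those two GUCs can be sketched in
plain Python (illustrative pseudocode only; in reality this would live in
an executor hook in C, and the function name is invented):

```python
# Sketch of the proposed two-GUC behaviour: only once a query has run past
# explain_analyze_threshold does the executor start pushing its EXPLAIN
# ANALYZE snapshot, and it then refreshes it at most once per
# explain_analyze_refresh interval.

explain_analyze_threshold = 60  # seconds; -1 would mean "off"
explain_analyze_refresh = 60    # seconds between snapshot refreshes

def maybe_push_snapshot(elapsed, last_push):
    """Return the new last_push time if a snapshot should be pushed now,
    otherwise the unchanged last_push.  `elapsed` is seconds since query
    start; last_push is None until the first snapshot."""
    if explain_analyze_threshold < 0 or elapsed < explain_analyze_threshold:
        return last_push
    if last_push is None or elapsed - last_push >= explain_analyze_refresh:
        return elapsed  # the push would happen here; record when
    return last_push

# Simulated executor "ticks" over a 200-second query:
last = None
pushes = []
for t in (10, 59, 60, 90, 120, 180, 200):
    new = maybe_push_snapshot(t, last)
    if new != last:
        pushes.append(t)
    last = new

print(pushes)  # [60, 120, 180]
```

Short queries never pay the cost, and a long query exposes a snapshot
roughly once per refresh interval.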



regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-12-01 Thread Shulgin, Oleksandr
On Tue, Dec 1, 2015 at 12:04 AM, Simon Riggs  wrote:

> On 30 November 2015 at 22:27, Julien Rouhaud 
> wrote:
>
>
>> I registered as a reviewer on this, but after reading the whole thread for
>> the second time, it's still not clear to me if the last two submitted
>> patches (0001-Add-auto_explain.publish_plans.patch and
>> 0002-Add-SHM-table-of-contents-to-the-explain-DSM.patch) still need
>> review, since multiple refactoring ideas and objections have been
>> raised since.
>>
>
> I looked into it more deeply and decided partial EXPLAINs aren't very
> useful yet.
>
> I've got some thoughts around that which I'll publish when I have more
> time. I suggest this is a good idea, just needs some serious
> committer-love, so we should bounce this for now.
>

I have the plans to make something from this on top of pg_stat_statements
and auto_explain, as I've mentioned last time.  The next iteration will be
based on the two latest patches above, so it still makes sense to review
them.

As for EXPLAIN ANALYZE support, it will require changes to core, but this
can be done separately.

--
Alex


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-11-30 Thread Julien Rouhaud
Hello,

On 15/10/2015 16:04, Pavel Stehule wrote:
> 
> 
> 2015-10-15 15:42 GMT+02:00 Tom Lane  >:
> 
> "Shulgin, Oleksandr"  > writes:
> > I was thinking about this and what seems to be the biggest problem is
> > when to actually turn the feature on.  It seems unlikely that someone
> > will want to enable it unconditionally.  Enabling per-backend also
> > doesn't seem to be a good approach because you don't know if the next
> > query you'd like to look at is going to run in this exact backend.
> 
> Check.
> 
> > What might be actually usable is poking pg_stat_statements for queryid
> > to decide if we need to do explain (and possibly analyze).
> 
> Hm, interesting thought.
> 
> > Does this make sense to you?  Does this make a good argument for merging
> > pg_stat_statements and auto_explain into core?
> 
> I'd say more that it's a good argument for moving this feature out to
> one of those extensions, or perhaps building a third extension that
> depends on both of those.  TBH, none of this stuff smells to me like
> something that ought to be in core.
> 
> 
> There are a few features that I would like to see in core:
> 
> 1. the possibility to get the full SQL string
> 2. the possibility to get the state string
> 
> We can discuss how to do it well.
> 

I registered as a reviewer on this, but after reading the whole thread for
the second time, it's still not clear to me if the last two submitted
patches (0001-Add-auto_explain.publish_plans.patch and
0002-Add-SHM-table-of-contents-to-the-explain-DSM.patch) still need
review, since multiple refactoring ideas and objections have been raised
since.


Regards.

-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org




Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-11-30 Thread Simon Riggs
On 30 November 2015 at 22:27, Julien Rouhaud 
wrote:


> I registered as a reviewer on this, but after reading the whole thread for
> the second time, it's still not clear to me if the last two submitted
> patches (0001-Add-auto_explain.publish_plans.patch and
> 0002-Add-SHM-table-of-contents-to-the-explain-DSM.patch) still need
> review, since multiple refactoring ideas and objections have been
> raised since.
>

I looked into it more deeply and decided partial EXPLAINs aren't very
useful yet.

I've got some thoughts around that which I'll publish when I have more
time. I suggest this is a good idea, just needs some serious
committer-love, so we should bounce this for now.

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-10-15 Thread Shulgin, Oleksandr
On Tue, Sep 29, 2015 at 7:52 PM, Shulgin, Oleksandr <
oleksandr.shul...@zalando.de> wrote:

>
> This is not a change of the direction, but rather of the approach.
> Hitting a process with a signal and hoping it will produce a meaningful
> response in all circumstances without disrupting its current task was way
> too naive.  I'd really want to make this work with ANALYZE, just not as the
> first step.  I believe this approach can be extended to include
> instrumentation support (obviously we will not be able to contain this in
> the auto_explain module).
>

I was thinking about this and what seems to be the biggest problem is when
to actually turn the feature on.  It seems unlikely that someone will want
to enable it unconditionally.  Enabling per-backend also doesn't seem to be
a good approach because you don't know if the next query you'd like to look
at is going to run in this exact backend.

What might be actually usable is poking pg_stat_statements for queryid to
decide if we need to do explain (and possibly analyze).  We could make it
so that the explain is produced for any query that is known to run longer
than a certain configurable threshold on average (or we can provide for
enabling this explicitly per entry in pg_stat_statements).  Then the
interested client can go and do pg_explain_backend() on the pid.

If we also tracked the plan total costs, we could do some prediction, so
that if the same query comes along and is planned with a total cost
exceeding the recorded average by a configurable percentage, then we
enable explain.
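
A rough sketch of that cost-deviation trigger, in plain Python pseudocode
(all names invented for illustration; the real check would compare the
planner's total cost against averages tracked per queryid):

```python
# Hypothetical: enable explain when this plan's total cost exceeds the
# recorded average total cost for the same queryid by more than a
# configured percentage.

def explain_on_cost_spike(queryid, plan_total_cost, avg_costs,
                          deviation_pct=50):
    avg = avg_costs.get(queryid)
    if avg is None or avg == 0:
        return False  # no history yet: nothing to compare against
    return plan_total_cost > avg * (1 + deviation_pct / 100.0)

avg_costs = {42: 1000.0}  # toy stand-in for the tracked averages

print(explain_on_cost_spike(42, 1200.0, avg_costs))  # within 50%: False
print(explain_on_cost_spike(42, 1600.0, avg_costs))  # 60% over: True
```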

Does this make sense to you?  Does this make a good argument for merging
pg_stat_statements and auto_explain into core?

--
Alex


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-10-15 Thread Tom Lane
"Shulgin, Oleksandr"  writes:
> I was thinking about this and what seems to be the biggest problem is when
> to actually turn the feature on.  It seems unlikely that someone will want
> to enable it unconditionally.  Enabling per-backend also doesn't seem to be
> a good approach because you don't know if the next query you'd like to look
> at is going to run in this exact backend.

Check.

> What might be actually usable is poking pg_stat_statements for queryid to
> decide if we need to do explain (and possibly analyze).

Hm, interesting thought.

> Does this make sense to you?  Does this make a good argument for merging
> pg_stat_statements and auto_explain into core?

I'd say more that it's a good argument for moving this feature out to
one of those extensions, or perhaps building a third extension that
depends on both of those.  TBH, none of this stuff smells to me like
something that ought to be in core.

regards, tom lane




Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-10-15 Thread Pavel Stehule
2015-10-15 15:42 GMT+02:00 Tom Lane :

> "Shulgin, Oleksandr"  writes:
> > I was thinking about this and what seems to be the biggest problem is
> > when to actually turn the feature on.  It seems unlikely that someone
> > will want to enable it unconditionally.  Enabling per-backend also
> > doesn't seem to be a good approach because you don't know if the next
> > query you'd like to look at is going to run in this exact backend.
>
> Check.
>
> > What might be actually usable is poking pg_stat_statements for queryid to
> > decide if we need to do explain (and possibly analyze).
>
> Hm, interesting thought.
>
> > Does this make sense to you?  Does this make a good argument for merging
> > pg_stat_statements and auto_explain into core?
>
> I'd say more that it's a good argument for moving this feature out to
> one of those extensions, or perhaps building a third extension that
> depends on both of those.  TBH, none of this stuff smells to me like
> something that ought to be in core.
>

There are a few features that I would like to see in core:

1. the possibility to get the full SQL string
2. the possibility to get the state string

We can discuss how to do it well.

Regards

Pavel



>
> regards, tom lane
>


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-10-02 Thread Robert Haas
On Tue, Sep 29, 2015 at 1:52 PM, Shulgin, Oleksandr
 wrote:
> This is not a change of the direction, but rather of the approach.  Hitting
> a process with a signal and hoping it will produce a meaningful response in
> all circumstances without disrupting its current task was way too naive.

I think that's exactly right.  It's not safe to do much anything from
a signal handler, and while ProcessInterrupts() is a substantially
safer place, the set of things that can be safely done there
is still awfully restricted.  You have to cope with the fact that the
function you just interrupted may be doing anything at all, and if you
change anything before returning, you may knock over the apple cart.
Just as bad, the state you inherit may not be very sane: we call
ProcessInterrupts() from LOTS of places and there is absolutely no
guarantee that every one of those places has the data structures that
you need to run EXPLAIN ANALYZE in a sane state.

I haven't looked at the new patch specifically so I don't have an
opinion on that at this time.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-29 Thread Shulgin, Oleksandr
On Tue, Sep 29, 2015 at 12:28 AM, Jim Nasby 
wrote:

> On 9/18/15 5:05 AM, Shulgin, Oleksandr wrote:
>
>>
>> So this has to be the responsibility of the reply sending backend in the
>> end: to create and release the DSM *at some point*.
>>
>
> What's wrong with just releasing it at the end of the statement? When the
> statement is done there's no point to reading it asynchronously anymore.


That was only one of the problems with the signals-based design, and it
has other, more significant ones.  The current approach does exactly that:
allocate the segment before running the plan, release it at the end of the
statement.

--
Alex


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-29 Thread Shulgin, Oleksandr
On Tue, Sep 29, 2015 at 12:57 PM, Andres Freund  wrote:

> On 2015-09-25 19:13:13 +0200, Shulgin, Oleksandr wrote:
> > the auto_explain contrib module.  I now propose the most simple thing
> > possible in my opinion: a new GUC option for auto_explain.  It will make
> > every backend in which the module is loaded via the *_preload_libraries
> > mechanism or the CREATE EXTENSION/LOAD commands actively publish the
> > plans of queries in dynamic shared memory as long as
> > auto_explain.publish_plans is turned on.
>
> Wait. You're proposing that every query does this unconditionally? For
> every single query? That sounds prohibitively expensive to me.
>

Only if the extension is loaded AND the option is turned on.  Likely you
don't want to do this for an OLTP database, but for other types of load it
might make sense.  Also, I think it should be possible to toggle this on a
per-process basis.

Finally, we can put in a cost threshold, so the plans only get published if
they have total_cost >= publish_plan_cost_threshold.

Also, do you mean expensive in terms of CPU or the dynamic shared memory?

--
Alex


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-29 Thread Andres Freund
On 2015-09-25 19:13:13 +0200, Shulgin, Oleksandr wrote:
> the auto_explain contrib module.  I now propose the most simple thing
> possible in my opinion: a new GUC option for auto_explain.  It will make
> every backend in which the module is loaded via the *_preload_libraries
> mechanism or the CREATE EXTENSION/LOAD commands actively publish the
> plans of queries in dynamic shared memory as long as
> auto_explain.publish_plans is turned on.

Wait. You're proposing that every query does this unconditionally? For
every single query? That sounds prohibitively expensive to me.

Greetings,

Andres Freund




Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-29 Thread Simon Riggs
On 25 September 2015 at 12:13, Shulgin, Oleksandr <
oleksandr.shul...@zalando.de> wrote:


> I now believe that use of ProcessInterrupts() in the recently proposed
> design as well as manipulation of proc latch due to use of shared memory
> queue are major blockers.
>
> In order to simplify things up, I've taken a step back and looked again at
> the auto_explain contrib module.  I now propose the most simple thing
> possible in my opinion: a new GUC option for auto_explain.  It will make
> every backend in which the module is loaded via the *_preload_libraries
> mechanism or the CREATE EXTENSION/LOAD commands actively publish the
> plans of queries in dynamic shared memory as long as
> auto_explain.publish_plans is turned on.
>
> The greatest part for me, is that this approach doesn't require handling
> of signals and is isolated in an external module, so it can be readily used
> with the current server versions, no need to wait for >= 9.6.
>

This is a major change of direction, so the thread title no longer applies
at all.

The requirement is to be able to see the output of EXPLAIN ANALYZE for a
running process. Ideally, we would dump the diagnostic output for any
process that runs longer than a specific timeout, so we can decide whether
to kill it, or just save that for later diagnostics.

I'm interested in helping the original goal happen. Dumping an EXPLAIN,
without ANALYZE info, is not at all interesting since it has no value for
immediate action or post-facto diagnosis, sorry to say - making it happen
for every backend just makes it even less useful.

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-29 Thread Simon Riggs
On 29 September 2015 at 20:52, Shulgin, Oleksandr <
oleksandr.shul...@zalando.de> wrote:

> On Tue, Sep 29, 2015 at 8:34 PM, Simon Riggs 
> wrote:
>
>> On 29 September 2015 at 12:52, Shulgin, Oleksandr <
>> oleksandr.shul...@zalando.de> wrote:
>>
>>
>>> Hitting a process with a signal and hoping it will produce a meaningful
>>> response in all circumstances without disrupting its current task was way
>>> too naive.
>>>
>>
>> Hmm, I would have to disagree, sorry. For me the problem was dynamically
>> allocating everything at the time the signal is received and getting into
>> problems when that caused errors.
>>
>
> What I mean is that we need to move the actual EXPLAIN run out of
> ProcessInterrupts().  It can be still fine to trigger the communication
> with a signal.
>

Good


> * INIT - Allocate N areas of memory for use by queries, which can be
>> expanded/contracted as needed. Keep a freelist of structures.
>> * OBSERVER - When requested, gain exclusive access to a diagnostic area,
>> then allocate the designated process to that area, then send a signal
>> * QUERY - When signal received dump an EXPLAIN ANALYZE to the allocated
>> diagnostic area, (set flag to show complete, set latch on observer)
>> * OBSERVER - process data in diagnostic area and then release area for
>> use by next observation
>>
>> If the EXPLAIN ANALYZE doesn't fit into the diagnostic chunk, LOG it as a
>> problem and copy data only up to the size defined. Any other ERRORs that
>> are caused by this process cause it to fail normally.
>>
>
> Do you envision problems if we do this with a newly allocated DSM every
> time instead of a pre-allocated area?  This will have to invert the workflow,
> because only the QUERY knows the required segment size:
>

That's too fiddly; we need to keep it simple by using just fixed sizes.


> OBSERVER - sends a signal and waits for its proc latch to be set
> QUERY - when signal is received allocates a DSM just big enough to fit the
> EXPLAIN plan, then locates the OBSERVER(s) and sets its latch (or their
> latches)
>
> The EXPLAIN plan should already be produced somewhere in the executor, to
> avoid calling into explain.c from ProcessInterrupts().
>
> That allows the observer to be another backend, or it allows the query
>> process to perform self-observation based upon a timeout (e.g. >1 hour) or
>> a row limit (e.g. when an optimizer estimate is seen to be badly wrong).
>>
>
> Do you think there is one single best place in the executor code where
> such a check could be added?  I have very little idea about that.
>

 Fairly simple.

Main problem is knowing how to handle nested calls to the executor.

I'll look at the patch.

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-29 Thread Shulgin, Oleksandr
On Tue, Sep 29, 2015 at 7:01 PM, Simon Riggs  wrote:

> On 25 September 2015 at 12:13, Shulgin, Oleksandr <
> oleksandr.shul...@zalando.de> wrote:
>
>
>> I now believe that use of ProcessInterrupts() in the recently proposed
>> design as well as manipulation of proc latch due to use of shared memory
>> queue are major blockers.
>>
>> In order to simplify things up, I've taken a step back and looked again
>> at the auto_explain contrib module.  I now propose the most simple thing
>> possible in my opinion: a new GUC option for auto_explain.  It will make
>> every backend, in which the module is loaded via *_preload_libraries
>> mechanism or using CREATE EXTENSION/LOAD commands, to actively publish the
>> plans of queries in dynamic shared memory as long as
>> auto_explain.publish_plans is turned on.
>>
>> The greatest part for me, is that this approach doesn't require handling
>> of signals and is isolated in an external module, so it can be readily used
>> with the current server versions, no need to wait for >= 9.6.
>>
>
> This is a major change of direction, so the thread title no longer applies
> at all.
>
> The requirement is to be able to see the output of EXPLAIN ANALYZE for a
> running process. Ideally, we would dump the diagnostic output for any
> process that runs longer than a specific timeout, so we can decide whether
> to kill it, or just save that for later diagnostics.
>
> I'm interested in helping the original goal happen. Dumping an EXPLAIN,
> without ANALYZE info, is not at all interesting since it has no value for
> immediate action or post-facto diagnosis, sorry to say - making it happen
> for every backend just makes it even less useful.
>

This is not a change of direction, but rather of approach.  Hitting
a process with a signal and hoping it will produce a meaningful response in
all circumstances without disrupting its current task was way too naive.
I'd really like to make this work with ANALYZE, just not as the first
step.  I believe this approach can be extended to include instrumentation
support (obviously we will not be able to contain this in the auto_explain
module).

--
Alex


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-29 Thread Simon Riggs
On 29 September 2015 at 12:52, Shulgin, Oleksandr <
oleksandr.shul...@zalando.de> wrote:


> Hitting a process with a signal and hoping it will produce a meaningful
> response in all circumstances without disrupting its current task was way
> too naive.
>

Hmm, I would have to disagree, sorry. For me the problem was dynamically
allocating everything at the time the signal is received and getting into
problems when that caused errors.

* INIT - Allocate N areas of memory for use by queries, which can be
expanded/contracted as needed. Keep a freelist of structures.
* OBSERVER - When requested, gain exclusive access to a diagnostic area,
then allocate the designated process to that area, then send a signal
* QUERY - When signal received dump an EXPLAIN ANALYZE to the allocated
diagnostic area, (set flag to show complete, set latch on observer)
* OBSERVER - process data in diagnostic area and then release area for use
by next observation

If the EXPLAIN ANALYZE doesn't fit into the diagnostic chunk, LOG it as a
problem and copy data only up to the size defined. Any other ERRORs that
are caused by this process cause it to fail normally.

That allows the observer to be another backend, or it allows the query
process to perform self-observation based upon a timeout (e.g. >1 hour) or
a row limit (e.g. when an optimizer estimate is seen to be badly wrong).

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-29 Thread Shulgin, Oleksandr
On Tue, Sep 29, 2015 at 8:34 PM, Simon Riggs  wrote:

> On 29 September 2015 at 12:52, Shulgin, Oleksandr <
> oleksandr.shul...@zalando.de> wrote:
>
>
>> Hitting a process with a signal and hoping it will produce a meaningful
>> response in all circumstances without disrupting its current task was way
>> too naive.
>>
>
> Hmm, I would have to disagree, sorry. For me the problem was dynamically
> allocating everything at the time the signal is received and getting into
> problems when that caused errors.
>

What I mean is that we need to move the actual EXPLAIN run out of
ProcessInterrupts().  It can be still fine to trigger the communication
with a signal.

* INIT - Allocate N areas of memory for use by queries, which can be
> expanded/contracted as needed. Keep a freelist of structures.
> * OBSERVER - When requested, gain exclusive access to a diagnostic area,
> then allocate the designated process to that area, then send a signal
> * QUERY - When signal received dump an EXPLAIN ANALYZE to the allocated
> diagnostic area, (set flag to show complete, set latch on observer)
> * OBSERVER - process data in diagnostic area and then release area for use
> by next observation
>
> If the EXPLAIN ANALYZE doesn't fit into the diagnostic chunk, LOG it as a
> problem and copy data only up to the size defined. Any other ERRORs that
> are caused by this process cause it to fail normally.
>

Do you envision problems if we do this with a newly allocated DSM every
time instead of a pre-allocated area?  This would reverse the workflow,
because only the QUERY knows the required segment size:

OBSERVER - sends a signal and waits for its proc latch to be set
QUERY - when signal is received allocates a DSM just big enough to fit the
EXPLAIN plan, then locates the OBSERVER(s) and sets its latch (or their
latches)

The EXPLAIN plan should already be produced somewhere in the executor, to
avoid calling into explain.c from ProcessInterrupts().

That allows the observer to be another backend, or it allows the query
> process to perform self-observation based upon a timeout (e.g. >1 hour) or
> a row limit (e.g. when an optimizer estimate is seen to be badly wrong).
>

Do you think there is one single best place in the executor code where such
a check could be added?  I have very little idea about that.

--
Alex


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-28 Thread Shulgin, Oleksandr
On Mon, Sep 28, 2015 at 7:09 PM, Pavel Stehule 
wrote:

>
> 2015-09-28 12:37 GMT+02:00 Shulgin, Oleksandr <
> oleksandr.shul...@zalando.de>:
>
>>
>>> I didn't propose too different solution. There is only one difference -
>>> sharing DSM for smaller data. It is similar to using usual shared memory.
>>>
>>
>> Does this mean implementing some sort of allocator on top of the shared
>> memory segment?  If so, how are you going to prevent fragmentation?
>>
>
> yes, simple memory allocator is necessary in this case. But it should be
> really simple - you can allocate only fixed size blocks - 10KB, 100KB and
> 1MB from separate buffers. So the fragmentation is not possible.
>

Maybe we're talking about completely different ideas, but I don't see how
fixing the block size helps to fight fragmentation, in particular.  Can you
sketch a simple example?  E.g. 400 backends share the common segment and
all of them want to publish a plan of ~1kb, for example.  Now what?

--
Alex


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-28 Thread Pavel Stehule
2015-09-28 12:37 GMT+02:00 Shulgin, Oleksandr 
:

> On Mon, Sep 28, 2015 at 12:05 PM, Pavel Stehule 
> wrote:
>
>>
>> 2015-09-28 12:01 GMT+02:00 Shulgin, Oleksandr <
>> oleksandr.shul...@zalando.de>:
>>
>>> On Sun, Sep 27, 2015 at 8:05 AM, Pavel Stehule 
>>> wrote:
>>>
>>> the preparing of content before execution is interesting idea, that can
 be used more. The almost queries and plans are not too big, so when the
 size of content is not too big - less than 1MB, then can be used one DSM
 for all backends.

>>>
>>>
 When size of content is bigger than limit, then DSM will be allocated
 specially for this content. The pointer to DSM and offset can be stored in
 requested process slot. The reading and writing to requested slot should be
 protected by spinlock, but it should block only two related processes for
 short time (copy memory).

>>>
>>> Sorry, I don't think this will fly.
>>>
>>> The whole idea is that a backend publishes the plan of a query just
>>> before running it and it doesn't care which other backend(s) might be
>>> reading it, how many times and in which order.  The only required locking
>>> (implicit) is contained in the code for dsm_attach/detach().
>>>
>>
>> I didn't propose too different solution. There is only one difference -
>> sharing DSM for smaller data. It is similar to using usual shared memory.
>>
>
> Does this mean implementing some sort of allocator on top of the shared
> memory segment?  If so, how are you going to prevent fragmentation?
>

Yes, a simple memory allocator is necessary in this case. But it can be
really simple: you allocate only fixed-size blocks (10KB, 100KB and 1MB)
from separate buffers, so fragmentation is not possible.

Regards

Pavel


>
> --
> Alex
>
>


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-28 Thread Pavel Stehule
2015-09-28 19:13 GMT+02:00 Shulgin, Oleksandr 
:

> On Mon, Sep 28, 2015 at 7:09 PM, Pavel Stehule 
> wrote:
>
>>
>> 2015-09-28 12:37 GMT+02:00 Shulgin, Oleksandr <
>> oleksandr.shul...@zalando.de>:
>>
>>>
 I didn't propose too different solution. There is only one difference -
 sharing DSM for smaller data. It is similar to using usual shared memory.

>>>
>>> Does this mean implementing some sort of allocator on top of the shared
>>> memory segment?  If so, how are you going to prevent fragmentation?
>>>
>>
>> yes, simple memory allocator is necessary in this case. But it should be
>> really simple - you can allocate only fixed size blocks - 10KB, 100KB and
>> 1MB from separate buffers. So the fragmentation is not possible.
>>
>
> Maybe we're talking about completely different ideas, but I don't see how
> fixing the block helps to fight fragmentation, in particular.  Can you
> sketch a simple example?  E.g. 400 backends share the common segment and
> all of them want to publish a plan of ~1kb, for example.  Now what?
>


Probably only a few backends will be data requesters, and a few backends will
be asked for data. The size of the shared data will typically be less than
10KB, with a few exceptions around 100KB and a few around 1MB. So for 400
backends you need 400*10KB + 100*100KB + 20*1MB = 34MB. You can have three
independent buffers for these sizes. If some process requires 5KB, you
return a 10KB block (it is good enough) from the first buffer. This simple
mechanism can cover 90% of usage; for the other 10% you can create a new
DSM. Because you are working with fixed-size blocks and never split a
block, fragmentation isn't possible. This model allows very fast allocation
and very fast deallocation.

It is not terribly space-efficient, but 34MB is better than 400 DSMs, or
than 400MB (the maximum size for every backend).

Our super-long queries are terribly long, > 1MB, but we use only a few
clients (fewer than 2x CPU cores).

Regards

Pavel


>
> --
> Alex
>
>


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-28 Thread Shulgin, Oleksandr
On Fri, Sep 25, 2015 at 7:13 PM, Shulgin, Oleksandr <
oleksandr.shul...@zalando.de> wrote:

>
> Some implementation details:
>
> For every backend that might be running (MaxBackends) we reserve a
> dsm_handle slot in the addins shared memory.  When the new option is turned
> on, the ExecutorRun hook will produce a plan in whatever format is
> specified by the auto_explain.log_format, allocate a DSM segment, copy the
> plan into the segment and finally store the DSM handle into its own slot.
> No locking is required around this because every backend writes to its slot
> exclusively, no other backend can be writing into it concurrently.  In the
> ExecutorFinish hook we invalidate the backend's slot by setting the handle
> to 0 and deallocate the DSM segment.
>

And this patch adds back the use of table of contents on the DSM.
Benefits: magic number validation; communication of the actual plan size.

--
Alex
From 9294995e200fca2abdccf0825ef852f26df1f4a2 Mon Sep 17 00:00:00 2001
From: Oleksandr Shulgin 
Date: Mon, 28 Sep 2015 16:20:42 +0200
Subject: [PATCH 2/2] Add SHM table of contents to the explain DSM

By doing so we achieve two things:

1) verify the actual contents of the segment we've attached to, by
enforcing the magic number check;

2) transmit the actual size of the payload, so the receiving side can
allocate and parse in process-local memory, detaching from the DSM
early.
---
 contrib/auto_explain/auto_explain.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 51 insertions(+), 8 deletions(-)

diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index 0fa123f..ac6b750 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -23,6 +23,7 @@
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "storage/shmem.h"
+#include "storage/shm_toc.h"
 #include "utils/guc.h"
 #include "utils/builtins.h"
 #include "utils/memutils.h"
@@ -48,6 +49,7 @@ static const struct config_enum_entry format_options[] = {
 	{NULL, 0, false}
 };
 
+#define PG_EXPLAIN_BACKEND_MAGIC	0x79fb2449
 
 /* Shared memory structs */
 static dsm_handle *published_plan_handles = NULL;
@@ -455,18 +457,40 @@ explain_query(QueryDesc *queryDesc)
 static void
 explain_publish_plan(QueryDesc *queryDesc)
 {
-	StringInfo		str = explain_query(queryDesc);
+	StringInfo		str;
+	Size			plan_size;
+	shm_toc_estimator	toc_estimator;
+	Size			seg_size;
 	dsm_segment	   *dsm;
+	shm_toc		   *toc = NULL;
 
 	attach_published_plan_handles();
 
-	dsm = dsm_create(str->len + 1, DSM_CREATE_NULL_IF_MAXSEGMENTS);
+	str = explain_query(queryDesc);
+	plan_size = str->len + 1;
+
+	shm_toc_initialize_estimator(&toc_estimator);
+	shm_toc_estimate_keys(&toc_estimator, 2);	/* size, plan */
+	shm_toc_estimate_chunk(&toc_estimator, sizeof(Size));
+	shm_toc_estimate_chunk(&toc_estimator, plan_size);
+	seg_size = shm_toc_estimate(&toc_estimator);
+
+	dsm = dsm_create(seg_size, DSM_CREATE_NULL_IF_MAXSEGMENTS);
 	if (dsm)
 	{
-		char	   *buffer = (char *) dsm_segment_address(dsm);
+		Size	   *size_ptr;
+		char	   *buffer;
 		dsm_handle	handle;
 
-		memcpy(buffer, str->data, str->len + 1);
+		toc = shm_toc_create(PG_EXPLAIN_BACKEND_MAGIC, dsm_segment_address(dsm), seg_size);
+
+		size_ptr = (Size *) shm_toc_allocate(toc, sizeof(Size));
+		*size_ptr = plan_size;
+		shm_toc_insert(toc, 0, size_ptr);
+
+		buffer = (char *) shm_toc_allocate(toc, plan_size);
+		memcpy(buffer, str->data, plan_size);
+		shm_toc_insert(toc, 1, buffer);
 
 		MyPublishedPlanSegment = dsm;
 
@@ -570,9 +594,30 @@ pg_explain_backend(PG_FUNCTION_ARGS)
 	{
 		PG_TRY();
 		{
-			const char	   *p = (const char *) dsm_segment_address(seg);
+			shm_toc		   *toc;
+			Size			plan_size;
+			char		   *buffer;
+			const char	   *p;
+
+			toc = shm_toc_attach(PG_EXPLAIN_BACKEND_MAGIC, dsm_segment_address(seg));
+			if (toc == NULL)
+				ereport(ERROR,
+						(errmsg("bad magic in dynamic memory segment")));
+
+			plan_size = *((Size *) shm_toc_lookup(toc, 0));
+
+			buffer = palloc(plan_size);
+			memcpy(buffer, shm_toc_lookup(toc, 1), plan_size);
+
+			/* we can detach DSM already, but only if it's not our own one */
+			if (handle)
+			{
+				handle = 0;		/* avoid possible extra detach in PG_CATCH */
+				dsm_detach(seg);
+			}
 
 			/* break into tuples on newline */
+			p = buffer;
 			while (*p)
 			{
 const char *q = strchr(p, '\n');
@@ -592,9 +637,7 @@ pg_explain_backend(PG_FUNCTION_ARGS)
 p += len + 1;
 			}
 
-			/* detach DSM, but only if it's not our own one */
-			if (handle)
-				dsm_detach(seg);
+			pfree(buffer);
 		}
 		PG_CATCH();
 		{
-- 
2.1.4


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-28 Thread Shulgin, Oleksandr
On Mon, Sep 28, 2015 at 12:05 PM, Pavel Stehule 
wrote:

>
> 2015-09-28 12:01 GMT+02:00 Shulgin, Oleksandr <
> oleksandr.shul...@zalando.de>:
>
>> On Sun, Sep 27, 2015 at 8:05 AM, Pavel Stehule 
>> wrote:
>>
>> the preparing of content before execution is interesting idea, that can
>>> be used more. The almost queries and plans are not too big, so when the
>>> size of content is not too big - less than 1MB, then can be used one DSM
>>> for all backends.
>>>
>>
>>
>>> When size of content is bigger than limit, then DSM will be allocated
>>> specially for this content. The pointer to DSM and offset can be stored in
>>> requested process slot. The reading and writing to requested slot should be
>>> protected by spinlock, but it should block only two related processes for
>>> short time (copy memory).
>>>
>>
>> Sorry, I don't think this will fly.
>>
>> The whole idea is that a backend publishes the plan of a query just
>> before running it and it doesn't care which other backend(s) might be
>> reading it, how many times and in which order.  The only required locking
>> (implicit) is contained in the code for dsm_attach/detach().
>>
>
> I didn't propose too different solution. There is only one difference -
> sharing DSM for smaller data. It is similar to using usual shared memory.
>

Does this mean implementing some sort of allocator on top of the shared
memory segment?  If so, how are you going to prevent fragmentation?

--
Alex


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-28 Thread Shulgin, Oleksandr
On Sun, Sep 27, 2015 at 8:05 AM, Pavel Stehule 
wrote:

the preparing of content before execution is interesting idea, that can be
> used more. The almost queries and plans are not too big, so when the size
> of content is not too big - less than 1MB, then can be used one DSM for all
> backends.
>


> When size of content is bigger than limit, then DSM will be allocated
> specially for this content. The pointer to DSM and offset can be stored in
> requested process slot. The reading and writing to requested slot should be
> protected by spinlock, but it should block only two related processes for
> short time (copy memory).
>

Sorry, I don't think this will fly.

The whole idea is that a backend publishes the plan of a query just before
running it and it doesn't care which other backend(s) might be reading it,
how many times and in which order.  The only required locking (implicit) is
contained in the code for dsm_attach/detach().


> Other possibility is showing the size of content in requested process
> slot. Then the requester can preallocate enough size of shared memory. But
> this doesn't solve a issues related to waiting requester for content. So
> first variant is pretty simple, and should be preferred. The disadvantage
> is clear - it can enforce maybe significant slowdown of fast queries.
>

Both of these approaches have just too many synchronization problems, IMO.

--
Alex


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-28 Thread Pavel Stehule
2015-09-28 12:01 GMT+02:00 Shulgin, Oleksandr 
:

> On Sun, Sep 27, 2015 at 8:05 AM, Pavel Stehule 
> wrote:
>
> the preparing of content before execution is interesting idea, that can be
>> used more. The almost queries and plans are not too big, so when the size
>> of content is not too big - less than 1MB, then can be used one DSM for all
>> backends.
>>
>
>
>> When size of content is bigger than limit, then DSM will be allocated
>> specially for this content. The pointer to DSM and offset can be stored in
>> requested process slot. The reading and writing to requested slot should be
>> protected by spinlock, but it should block only two related processes for
>> short time (copy memory).
>>
>
> Sorry, I don't think this will fly.
>
> The whole idea is that a backend publishes the plan of a query just before
> running it and it doesn't care which other backend(s) might be reading it,
> how many times and in which order.  The only required locking (implicit) is
> contained in the code for dsm_attach/detach().
>

I didn't propose a very different solution. There is only one difference:
sharing a DSM for smaller data. It is similar to using the usual shared memory.

Regards

Pavel

>
>
>> Other possibility is showing the size of content in requested process
>> slot. Then the requester can preallocate enough size of shared memory. But
>> this doesn't solve a issues related to waiting requester for content. So
>> first variant is pretty simple, and should be preferred. The disadvantage
>> is clear - it can enforce maybe significant slowdown of fast queries.
>>
>
> Both of these approaches have just too many synchronization problems, IMO.
>
> --
> Alex
>
>


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-28 Thread Jim Nasby

On 9/18/15 5:05 AM, Shulgin, Oleksandr wrote:


It should not be true - the data sender creates the DSM and fills it.
Then it sets the caller's slot and sends a signal to the caller. The caller
can free the DSM any time, because the data sender will never touch it again.


But the requester can timeout on waiting for reply and exit before it
sees the reply DSM.  Actually, I now don't even think a backend can free
the DSM it has not created.  First it will need to attach it,
effectively increasing the refcount, and upon detach it will only
decrease the refcount, but not actually release the segment...

So this has to be the responsibility of the reply sending backend in the
end: to create and release the DSM *at some point*.


What's wrong with just releasing it at the end of the statement? When 
the statement is done there's no point to reading it asynchronously anymore.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-27 Thread Pavel Stehule
2015-09-25 19:13 GMT+02:00 Shulgin, Oleksandr 
:
>
>
> Some problems:
>
> There is a potential problem with the limited total number of DSM
> segments: it is reserved in a way to only allow 2 per backend (on average)
> and 64 additional per server, so if you run with the new option enabled at
> all times, you're down to only 1 additional DSM per backend (again, on
> average).  Not sure how critical this can be, but no one is forced to run
> with this option enabled for all backends.
>
> If you don't want to run it enabled at all times, then enabling the GUC
> per-backend can be problematic.  It's still possible to update the conf
> file and send SIGHUP to a single backend, but it's harder to accomplish
> over psql, for example.  I think here we might still have some luck with
> the signals: use another array of per-backend slots with flags, set the
> target backend's flag and send it SIGUSR1.  The backend wakes on the signal
> and examines its slot, then toggles the GUC if needed.  Sounds pretty safe,
> eh?
>
> No documentation changes yet, waiting for your comments. :-)
>

Preparing the content before execution is an interesting idea that can be
used more widely. Most queries and plans are not too big, so when the size
of the content is below some limit, say 1MB, one DSM can be used for all
backends. When the content is bigger than the limit, a DSM will be
allocated specifically for that content. The pointer to the DSM and the
offset can be stored in the requesting process' slot. Reading and writing
the requested slot should be protected by a spinlock, but it should block
only the two related processes for a short time (a memory copy). Another
possibility is publishing the size of the content in the requesting
process' slot; then the requester can preallocate a large enough piece of
shared memory. But this doesn't solve the issues related to the requester
waiting for the content, so the first variant is pretty simple and should
be preferred. The clear disadvantage is that it can cause a possibly
significant slowdown of fast queries.

Regards

Pavel





> Happy hacking!
> --
> Alex
>
>


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-25 Thread Shulgin, Oleksandr
On Fri, Sep 18, 2015 at 7:04 PM, Robert Haas  wrote:

>
> Frankly, I think you guys are making this out to be way more
> complicated than it really is.  Basically, I think the process being
> queried should publish a DSM via a slot it owns.  The recipient is
> responsible for reading it and then notifying the sender.  If a second
> process requests data before the first process reads its data, the
> second process can either (1) become an additional reader of the
> already-published data or (2) wait for the first process to finish,
> and then send its own inquiry.
>
> There are some problems to solve here, but they hardly seem impossible.


Thank you Robert, for your invaluable input on this patch!

I now believe that use of ProcessInterrupts() in the recently proposed
design as well as manipulation of proc latch due to use of shared memory
queue are major blockers.

In order to simplify things up, I've taken a step back and looked again at
the auto_explain contrib module.  I now propose the most simple thing
possible in my opinion: a new GUC option for auto_explain.  It will make
every backend, in which the module is loaded via *_preload_libraries
mechanism or using CREATE EXTENSION/LOAD commands, to actively publish the
plans of queries in dynamic shared memory as long as
auto_explain.publish_plans is turned on.

The greatest part for me, is that this approach doesn't require handling of
signals and is isolated in an external module, so it can be readily used
with the current server versions, no need to wait for >= 9.6.

Some implementation details:

For every backend that might be running (MaxBackends) we reserve a
dsm_handle slot in the addins shared memory.  When the new option is turned
on, the ExecutorRun hook will produce a plan in whatever format is
specified by the auto_explain.log_format, allocate a DSM segment, copy the
plan into the segment and finally store the DSM handle into its own slot.
No locking is required around this because every backend writes to its slot
exclusively, no other backend can be writing into it concurrently.  In the
ExecutorFinish hook we invalidate the backend's slot by setting the handle
to 0 and deallocate the DSM segment.

Reading of the plan is performed by a newly added function
pg_explain_backend(PID).  Since it can determine the target process'
backendId, it simply reads the DSM handle from that backend's slot and
tries to attach it (there's not much point in checking the handle for being
non-0, because the other backend could release the segment the moment after
we've checked it, so we rely on dsm_attach returning non-NULL).  If
attached successfully, we parse the contents and detach.  At this point the
backend to detach the last is actually releasing the segment, due to
reference counting.

Handling of the nested statements plans is an open question.  It can be
really useful when the top-level plan is simply displaying a "SELECT
my_stored_procedure()" and all the interesting stuff is happening behind
the scenes, but I didn't start to think about how this could be implemented
yet.

Pavel was really interested in retrieving the complete query text/plans
which could be over a megabyte in his case (and pg_stat_activity.query is
capped by 10240 bytes I believe).  This is now possible with the patch, but
some others might still want to put a threshold on the allocation,
especially given this is shared memory.  I can envision another GUC, but in
our experience the extremely long queries (and plans) are most of the time
due to use of VALUES() or IN() clauses with a huge list of literals.

I think we could fold the VALUES/IN into "?" if the query/plan text exceeds
the specified threshold, or unconditionally (yes, pg_stat_statements, I'm
looking at you).  This should help in the cases when the most interesting
part is in the plan nodes near the end, but there's such a huge list of
literals before it.

Future plans:

I believe this approach can be extended to enable instrumentation once
again.  The backend executing the query could update the published plan
every once in a while (for example, every N ms or 1% of rows processed in a
node), and any other process interested in this data, can simply read it
without the need for signals and complex and fragile communication.  This
obviously requires a core patch.

Some problems:

There is a potential problem with the limited total number of DSM segments:
it is reserved in a way to only allow 2 per backend (on average) and 64
additional per server, so if you run with the new option enabled at all
times, you're down to only 1 additional DSM per backend (again, on
average).  Not sure how critical this can be, but no one is forced to run
with this option enabled for all backends.

If you don't want to run it enabled at all times, then enabling the GUC
per-backend can be problematic.  It's still possible to update the conf
file and send SIGHUP to a single backend, but it's harder to accomplish
over psql, 

Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-18 Thread Pavel Stehule
2015-09-18 10:59 GMT+02:00 Shulgin, Oleksandr 
:

> On Thu, Sep 17, 2015 at 10:13 PM, Robert Haas 
> wrote:
>
>> On Thu, Sep 17, 2015 at 11:16 AM, Pavel Stehule 
>> wrote:
>>
>> >> Second, using a shm_mq manipulates the state of the process latch.  I
>> >> don't think you can make the assumption that it's safe to reset the
>> >> process latch at any and every place where we check for interrupts.
>> >> For example, suppose the process is already using a shm_mq and the
>> >> CHECK_FOR_INTERRUPTS() call inside that code then discovers that
>> >> somebody has activated this mechanism and you now go try to send and
>> >> receive from a new shm_mq.  But even if that and every other
>> >> CHECK_FOR_INTERRUPTS() in the code can tolerate a process latch reset
>> >> today, it's a new coding rule that could easily trip people up in the
>> >> future.
>> >
>> > It is valid, and probably most important. But if we introduce own
>> mechanism,
>> > we will play with process latch too (although we can use LWlocks)
>>
>> With the design I proposed, there is zero need to touch the process
>> latch, which is good, because I'm pretty sure that is going to be a
>> problem.  I don't think there is any need to use LWLocks here either.
>> When you get a request for data, you can just publish a DSM segment
>> with the data and that's it.  Why do you need anything more?  You
>> could set the requestor's latch if it's convenient; that wouldn't be a
>> problem.  But the process supplying the data can't end up in a
>> different state than it was before supplying that data, or stuff WILL
>> break.
>>
>
> There is still the whole problem of where exactly the backend being
> queried for the status should publish that DSM segment and when to free it?
>
> If it's a location shared between all backends, there should be locking
> around it.  Probably this is not a big problem, if you don't expect all the
> backends to start querying each other rapidly.  That is how it was implemented
> in the first versions of this patch actually.
>
> If we take the per-backend slot approach the locking seems unnecessary and
> there are principally two options:
>
> 1) The backend puts the DSM handle in its own slot and notifies the
> requester to read it.
> 2) The backend puts the DSM handle in the slot of the requester (and
> notifies it).
>
> If we go with the first option, the backend that has created the DSM will
> not know when it's OK to free it, so this has to be responsibility of the
> requester.  If the latter exits before reading and freeing the DSM, we have
> a leak.  Even bigger is the problem that the sender backend can no longer
> send responses to a number of concurrent requestors: if its slot is
> occupied by a DSM handle, it can not send a reply to another backend until
> the slot is freed.
>
> With the second option we have all the same problems with not knowing when
> to free the DSM and potentially leaking it, but we can handle concurrent
> requests.
>

That should not be true - the data sender creates the DSM and fills it. Then
it sets the caller's slot and sends a signal to the caller. The caller can
free the DSM at any time, because the data sender will never touch it again.


>
> The current approach where the requester creates and frees the DSM doesn't
> suffer from these problems, so if we pre-allocate the segment just big
> enough we can avoid the use of shm_mq.  That will take another GUC for the
> segment size.  Certainly no one expects a query plan to weigh a bloody
> megabyte, but this is what happens to Pavel apparently.
>

It is plan C - the last variant from my point of view. Yet another GUC :(

Pavel


>
> --
> Alex
>
>


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-18 Thread Shulgin, Oleksandr
On Thu, Sep 17, 2015 at 10:13 PM, Robert Haas  wrote:

> On Thu, Sep 17, 2015 at 11:16 AM, Pavel Stehule 
> wrote:
>
> >> Second, using a shm_mq manipulates the state of the process latch.  I
> >> don't think you can make the assumption that it's safe to reset the
> >> process latch at any and every place where we check for interrupts.
> >> For example, suppose the process is already using a shm_mq and the
> >> CHECK_FOR_INTERRUPTS() call inside that code then discovers that
> >> somebody has activated this mechanism and you now go try to send and
> >> receive from a new shm_mq.  But even if that and every other
> >> CHECK_FOR_INTERRUPTS() in the code can tolerate a process latch reset
> >> today, it's a new coding rule that could easily trip people up in the
> >> future.
> >
> > It is valid, and probably the most important point. But if we introduce
> > our own mechanism, we will play with the process latch too (although we
> > could use LWLocks).
>
> With the design I proposed, there is zero need to touch the process
> latch, which is good, because I'm pretty sure that is going to be a
> problem.  I don't think there is any need to use LWLocks here either.
> When you get a request for data, you can just publish a DSM segment
> with the data and that's it.  Why do you need anything more?  You
> could set the requestor's latch if it's convenient; that wouldn't be a
> problem.  But the process supplying the data can't end up in a
> different state than it was before supplying that data, or stuff WILL
> break.
>

There is still the whole problem of where exactly the backend being queried
for the status should publish that DSM segment and when to free it?

If it's a location shared between all backends, there should be locking
around it.  Probably this is not a big problem, if you don't expect all the
backends to start querying each other rapidly.  That is how it was implemented
in the first versions of this patch actually.

If we take the per-backend slot approach the locking seems unnecessary and
there are principally two options:

1) The backend puts the DSM handle in its own slot and notifies the
requester to read it.
2) The backend puts the DSM handle in the slot of the requester (and
notifies it).

If we go with the first option, the backend that has created the DSM will
not know when it's OK to free it, so this has to be responsibility of the
requester.  If the latter exits before reading and freeing the DSM, we have
a leak.  Even bigger is the problem that the sender backend can no longer
send responses to a number of concurrent requestors: if its slot is
occupied by a DSM handle, it can not send a reply to another backend until
the slot is freed.

With the second option we have all the same problems with not knowing when
to free the DSM and potentially leaking it, but we can handle concurrent
requests.

The current approach where the requester creates and frees the DSM doesn't
suffer from these problems, so if we pre-allocate the segment just big
enough we can avoid the use of shm_mq.  That will take another GUC for the
segment size.  Certainly no one expects a query plan to weigh a bloody
megabyte, but this is what happens to Pavel apparently.

--
Alex


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-18 Thread Shulgin, Oleksandr
On Fri, Sep 18, 2015 at 11:25 AM, Pavel Stehule 
wrote:

> 2015-09-18 10:59 GMT+02:00 Shulgin, Oleksandr <
> oleksandr.shul...@zalando.de>:
>
>>
>> If we take the per-backend slot approach the locking seems unnecessary
>> and there are principally two options:
>>
>> 1) The backend puts the DSM handle in its own slot and notifies the
>> requester to read it.
>> 2) The backend puts the DSM handle in the slot of the requester (and
>> notifies it).
>>
>> If we go with the first option, the backend that has created the DSM will
>> not know when it's OK to free it, so this has to be responsibility of the
>> requester.  If the latter exits before reading and freeing the DSM, we have
>> a leak.  Even bigger is the problem that the sender backend can no longer
>> send responses to a number of concurrent requestors: if its slot is
>> occupied by a DSM handle, it can not send a reply to another backend until
>> the slot is freed.
>>
>> With the second option we have all the same problems with not knowing
>> when to free the DSM and potentially leaking it, but we can handle
>> concurrent requests.
>>
>
> That should not be true - the data sender creates the DSM and fills it. Then
> it sets the caller's slot and sends a signal to the caller. The caller can
> free the DSM at any time, because the data sender will never touch it again.
>

But the requester can timeout on waiting for reply and exit before it sees
the reply DSM.  Actually, I now don't even think a backend can free the DSM
it has not created.  First it will need to attach it, effectively
increasing the refcount, and upon detach it will only decrease the
refcount, but not actually release the segment...

So this has to be the responsibility of the reply sending backend in the
end: to create and release the DSM *at some point*.

--
Alex


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-18 Thread Shulgin, Oleksandr
On Fri, Sep 18, 2015 at 12:59 PM, Pavel Stehule 
wrote:

> 2015-09-18 12:05 GMT+02:00 Shulgin, Oleksandr <
> oleksandr.shul...@zalando.de>:
>
>> On Fri, Sep 18, 2015 at 11:25 AM, Pavel Stehule 
>> wrote:
>>
>>>
>>> That should not be true - the data sender creates the DSM and fills it.
>>> Then it sets the caller's slot and sends a signal to the caller. The caller
>>> can free the DSM at any time, because the data sender will never touch it
>>> again.
>>>
>>
>> But the requester can timeout on waiting for reply and exit before it
>> sees the reply DSM.  Actually, I now don't even think a backend can free
>> the DSM it has not created.  First it will need to attach it, effectively
>> increasing the refcount, and upon detach it will only decrease the
>> refcount, but not actually release the segment...
>>
>
> I am afraid there is no simple and nice solution - if the data sender has to
> wait until the data has been received, then we have the same complexity as
> if we used shm_mq.
>
> Isn't it better to introduce a new background worker responsible for
> cleaning up orphaned DSM segments?
>

I'm not thrilled by this idea.

We don't even need a worker to do that, as leaked segments are reported by
the backend itself upon transaction commit (see
ResourceOwnerReleaseInternal), e.g:

WARNING:  dynamic shared memory leak: segment 808539725 still referenced

Still I believe relying on some magic cleanup mechanism and not caring
about managing the shared memory properly is not the way to go.

--
Alex


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-18 Thread Pavel Stehule
2015-09-18 12:05 GMT+02:00 Shulgin, Oleksandr 
:

> On Fri, Sep 18, 2015 at 11:25 AM, Pavel Stehule 
> wrote:
>
>> 2015-09-18 10:59 GMT+02:00 Shulgin, Oleksandr <
>> oleksandr.shul...@zalando.de>:
>>
>>>
>>> If we take the per-backend slot approach the locking seems unnecessary
>>> and there are principally two options:
>>>
>>> 1) The backend puts the DSM handle in its own slot and notifies the
>>> requester to read it.
>>> 2) The backend puts the DSM handle in the slot of the requester (and
>>> notifies it).
>>>
>>> If we go with the first option, the backend that has created the DSM
>>> will not know when it's OK to free it, so this has to be responsibility of
>>> the requester.  If the latter exits before reading and freeing the DSM, we
>>> have a leak.  Even bigger is the problem that the sender backend can no
>>> longer send responses to a number of concurrent requestors: if its slot is
>>> occupied by a DSM handle, it can not send a reply to another backend until
>>> the slot is freed.
>>>
>>> With the second option we have all the same problems with not knowing
>>> when to free the DSM and potentially leaking it, but we can handle
>>> concurrent requests.
>>>
>>
>> That should not be true - the data sender creates the DSM and fills it. Then
>> it sets the caller's slot and sends a signal to the caller. The caller can
>> free the DSM at any time, because the data sender will never touch it again.
>>
>
> But the requester can timeout on waiting for reply and exit before it sees
> the reply DSM.  Actually, I now don't even think a backend can free the DSM
> it has not created.  First it will need to attach it, effectively
> increasing the refcount, and upon detach it will only decrease the
> refcount, but not actually release the segment...
>


I am afraid there is no simple and nice solution - if the data sender has to
wait until the data has been received, then we have the same complexity as if
we used shm_mq.

Isn't it better to introduce a new background worker responsible for cleaning
up orphaned DSM segments?

Regards

Pavel


>
> So this has to be the responsibility of the reply sending backend in the
> end: to create and release the DSM *at some point*.
>
> --
> Alex
>
>


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-18 Thread Pavel Stehule
2015-09-18 13:22 GMT+02:00 Shulgin, Oleksandr 
:

> On Fri, Sep 18, 2015 at 12:59 PM, Pavel Stehule 
> wrote:
>
>> 2015-09-18 12:05 GMT+02:00 Shulgin, Oleksandr <
>> oleksandr.shul...@zalando.de>:
>>
>>> On Fri, Sep 18, 2015 at 11:25 AM, Pavel Stehule wrote:
>>>

 That should not be true - the data sender creates the DSM and fills it. Then
 it sets the caller's slot and sends a signal to the caller. The caller can
 free the DSM at any time, because the data sender will never touch it again.

>>>
>>> But the requester can timeout on waiting for reply and exit before it
>>> sees the reply DSM.  Actually, I now don't even think a backend can free
>>> the DSM it has not created.  First it will need to attach it, effectively
>>> increasing the refcount, and upon detach it will only decrease the
>>> refcount, but not actually release the segment...
>>>
>>
>> I am afraid there is no simple and nice solution - if the data sender has
>> to wait until the data has been received, then we have the same complexity
>> as if we used shm_mq.
>>
>> Isn't it better to introduce a new background worker responsible for
>> cleaning up orphaned DSM segments?
>>
>
> I'm not thrilled by this idea.
>
> We don't even need a worker to do that, as leaked segments are reported by
> the backend itself upon transaction commit (see
> ResourceOwnerReleaseInternal), e.g:
>
> WARNING:  dynamic shared memory leak: segment 808539725 still referenced
>
> Still I believe relying on some magic cleanup mechanism and not caring
> about managing the shared memory properly is not the way to go.
>

That was one of my ideas too - to check shared memory on exit. The
disadvantage is clear: sometimes you can wait too long, although the practical
limit for a session is about 2h.




>
> --
> Alex
>
>


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-18 Thread Robert Haas
On Thu, Sep 17, 2015 at 12:47 PM, Shulgin, Oleksandr
 wrote:
> But if we make the sender backend create the DSM with the response payload,
> it suddenly becomes really unclear at which point and who should make the
> final detach of that DSM.  We're getting back to the problem of
> synchronization between these processes all over again.

Yes, some thought is needed here.  There's not really a problem if
somebody asks for information just once: you could just have the
publishing process keep it attached until process exit, and nothing
bad would happen.  The real problem comes when multiple people ask for
information in quick succession: if you detach the old segment too
quickly after publishing it, the guy who requested that data might not
have attached it yet, and then when he gets around to looking at it,
it won't be there.

This seems like a pretty solvable problem, though.  For example, when
somebody asks for information, they store in the main shared segment a
pointer to their PGPROC and then they signal the target process, which
responds by publishing a dsm_segment_id.  When the requesting process
has accessed it, or when it errors out or exits, it clears the PGPROC
and the dsm_segment_id.  The publisher avoids unmapping it until that
happens.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-18 Thread Robert Haas
On Fri, Sep 18, 2015 at 6:59 AM, Pavel Stehule  wrote:
> I am afraid there is no simple and nice solution - if the data sender has to
> wait until the data has been received, then we have the same complexity as
> if we used shm_mq.
>
> Isn't it better to introduce a new background worker responsible for
> cleaning up orphaned DSM segments?

That won't work, or at least not easily.  On Windows, the DSM is
cleaned up by the operating system as soon as nobody has it mapped.

Frankly, I think you guys are making this out to be way more
complicated than it really is.  Basically, I think the process being
queried should publish a DSM via a slot it owns.  The recipient is
responsible for reading it and then notifying the sender.  If a second
process requests data before the first process reads its data, the
second process can either (1) become an additional reader of the
already-published data or (2) wait for the first process to finish,
and then send its own inquiry.

There are some problems to solve here, but they hardly seem impossible.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-17 Thread Shulgin, Oleksandr
On Wed, Sep 16, 2015 at 8:07 PM, Pavel Stehule 
wrote:

>
> 2015-09-16 16:31 GMT+02:00 Shulgin, Oleksandr <
> oleksandr.shul...@zalando.de>:
>
>>
>> I've added the timeout parameter to the pg_cmdstatus call, and more
>> importantly to the sending side of the queue, so that one can limit the
>> potential effect of handling the interrupt in case something goes really
>> wrong.
>>
>
> I don't think introducing a new user-visible timer is a good idea. This
> timer should be invisible.
>
> My idea: send a signal and wait 1 sec, then check whether the target process
> is still alive; stop if not. Wait another 5 sec, then check the target
> process again; stop if it is no longer alive, or raise the warning
> "requested process doesn't communicate, waiting...". The final cancel should
> be done by statement_timeout.
>
> what do you think about it?
>

That won't work really well with something like what I usually do when testing
this patch, namely:

postgres=# select pid, array(select pg_cmdstatus(pid, 1, 10)) from
pg_stat_activity where pid<>pg_backend_pid() \watch 1

while also running pgbench with -C option (to create new connection for
every transaction).  When a targeted backend exits before it can handle the
signal, the receiving process keeps waiting forever.

The statement_timeout in this case will stop the whole select, not just
individual function call.  Unless you wrap it to set statement_timeout and
catch QUERY_CANCELED in plpgsql, but then you won't be able to ^C the whole
select.  The ability to set a (short) timeout for the function itself
proves to be a really useful feature to me.

We can still communicate some warnings to the client when no timeout is
specified (and make 0 the default value actually).

What I'm now thinking about is probably we can extend this backend
>> communication mechanism in order to query GUC values effective in a
>> different backend or even setting the values.  The obvious candidate to
>> check when you see some query misbehaving would be work_mem, for example.
>> Or we could provide a report of all settings that were overridden in the
>> backend's session, to the effect of running something like this:
>>
>> select * from pg_settings where context = 'user' and setting != reset_val;
>>
>
> This is a good idea. Many times I have been interested in the current
> settings of a query. We cannot use a simple query, because there is no
> simple way to pass a PID into pg_settings, but you could write a function
> pg_settings_pid() so usage could look like
>
> select * from pg_settings_pid(, possible other params) where ...
>

I would rather have a more general way to run *readonly* queries in the
other backend than invent a new function for every occurrence.

The obvious candidates to be set externally are the
>> cmdstatus_analyze/instrument_*: when you decided you want to turn them on,
>> you'd rather do that carefully for a single backend than globally
>> per-cluster.  One can still modify the postgresql.conf and then send SIGHUP
>> to just a single backend, but I think a more direct way to alter the
>> settings would be better.
>>
>
> I am 100% for the possibility of reading the settings. But reconfiguration
> from another process is too hot a coffee - it can be made available via an
> extension, but not via commonly used tools.
>

It can be reserved to superuser, and nobody is forcing one to use it
anyway. :-)

In this light should we rename the API to something like "backend control"
>> instead of "command status"?  The SHOW/SET syntax could be extended to
>> support the remote setting retrieval/update.
>>
>
> Prepare the API, and this functionality can be part of a reference
> implementation in contrib.
>
> This patch should not have too broad a scope if it is to be finished in this
> release cycle.
>

These are just the thoughts on what could be achieved using this
cross-backend communication mechanism and ideas for generalization of the
API.

--
Alex


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-17 Thread Pavel Stehule
2015-09-17 11:55 GMT+02:00 Shulgin, Oleksandr 
:

> On Wed, Sep 16, 2015 at 8:07 PM, Pavel Stehule 
> wrote:
>
>>
>> 2015-09-16 16:31 GMT+02:00 Shulgin, Oleksandr <
>> oleksandr.shul...@zalando.de>:
>>
>>>
>>> I've added the timeout parameter to the pg_cmdstatus call, and more
>>> importantly to the sending side of the queue, so that one can limit the
>>> potential effect of handling the interrupt in case something goes really
>>> wrong.
>>>
>>
>> I don't think introducing a new user-visible timer is a good idea. This
>> timer should be invisible.
>>
>> My idea: send a signal and wait 1 sec, then check whether the target process
>> is still alive; stop if not. Wait another 5 sec, then check the target
>> process again; stop if it is no longer alive, or raise the warning
>> "requested process doesn't communicate, waiting...". The final cancel should
>> be done by statement_timeout.
>>
>> what do you think about it?
>>
>
> That won't work really well with something like what I usually do when
> testing this patch, namely:
>
> postgres=# select pid, array(select pg_cmdstatus(pid, 1, 10)) from
> pg_stat_activity where pid<>pg_backend_pid() \watch 1
>
> while also running pgbench with -C option (to create new connection for
> every transaction).  When a targeted backend exits before it can handle the
> signal, the receiving process keeps waiting forever.
>

No - on every timeout you have to check whether the targeted backend is still
alive, and cancel if it is not. That is 100% safe.


>
> The statement_timeout in this case will stop the whole select, not just
> individual function call.  Unless you wrap it to set statement_timeout and
> catch QUERY_CANCELED in plpgsql, but then you won't be able to ^C the whole
> select.  The ability to set a (short) timeout for the function itself
> proves to be a really useful feature to me.
>

You cannot handle QUERY_CANCELED in plpgsql. Some internal timeout is needed,
but this timeout should not be visible - any new GUC increases PostgreSQL's
complexity, and there is no reason to do it here.


>
> We can still communicate some warnings to the client when no timeout is
> specified (and make 0 the default value actually).
>
> What I'm now thinking about is probably we can extend this backend
>>> communication mechanism in order to query GUC values effective in a
>>> different backend or even setting the values.  The obvious candidate to
>>> check when you see some query misbehaving would be work_mem, for example.
>>> Or we could provide a report of all settings that were overridden in the
>>> backend's session, to the effect of running something like this:
>>>
>>> select * from pg_settings where context = 'user' and setting !=
>>> reset_val;
>>>
>>
>> This is a good idea. Many times I have been interested in the current
>> settings of a query. We cannot use a simple query, because there is no
>> simple way to pass a PID into pg_settings, but you could write a function
>> pg_settings_pid() so usage could look like
>>
>> select * from pg_settings_pid(, possible other params) where ...
>>
>
> I would rather have a more general way to run *readonly* queries in the
> other backend than invent a new function for every occurrence.
>
> The obvious candidates to be set externally are the
>>> cmdstatus_analyze/instrument_*: when you decided you want to turn them on,
>>> you'd rather do that carefully for a single backend than globally
>>> per-cluster.  One can still modify the postgresql.conf and then send SIGHUP
>>> to just a single backend, but I think a more direct way to alter the
>>> settings would be better.
>>>
>>
>> I am 100% for the possibility of reading the settings. But reconfiguration
>> from another process is too hot a coffee - it can be made available via an
>> extension, but not via commonly used tools.
>>
>
> It can be reserved to superuser, and nobody is forcing one to use it
> anyway. :-)
>
> In this light should we rename the API to something like "backend control"
>>> instead of "command status"?  The SHOW/SET syntax could be extended to
>>> support the remote setting retrieval/update.
>>>
>>
>> Prepare the API, and this functionality can be part of a reference
>> implementation in contrib.
>>
>> This patch should not have too broad a scope if it is to be finished in
>> this release cycle.
>>
>
> These are just the thoughts on what could be achieved using this
> cross-backend communication mechanism and ideas for generalization of the
> API.
>

ok


>
> --
> Alex
>
>


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-17 Thread Robert Haas
On Tue, Sep 15, 2015 at 5:00 AM, Shulgin, Oleksandr
 wrote:
> Please see attached for implementation of this approach.  The most
> surprising thing is that it actually works :)

It's cool to see these interesting ideas for using some of the code
I've been working on for the last couple of years.  However, it seems
to me that a better design would avoid the use of shm_mq.  Instead of
having the guy who asks the question create a DSM, have the guy who
sends back the answer create it.  Then, he can just make the DSM big
enough to store the entire string.  I think that's going to be a lot
more robust than what you have here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-17 Thread Robert Haas
On Wed, Sep 16, 2015 at 10:31 AM, Shulgin, Oleksandr
 wrote:
> I've also decided we really ought to suppress any possible ERROR level
> messages generated during the course of processing the status request in
> order not to prevent the originally running transaction from completing.  The
> errors so caught are just logged using LOG level and ignored in this new
> version of the patch.

This plan has almost zero chance of surviving committer-level
scrutiny, and if it somehow does, some other committer will scream
bloody murder as soon as they notice it.

It's not safe to just catch an error and continue on executing.  You
have to abort the (sub)transaction once an error happens.

Of course, this gets at a pretty crucial design question for this
patch, which is whether it's OK for one process to interrogate another
process's state, and what the consequences of that might be.  What
permissions are needed?  Can you DOS the guy running a query by
pinging him for status at a very high rate?

The other question here is whether it's really safe to do this.
ProcessInterrupts() gets called at lots of places in the code, and we
may freely add more in the future.  How are you going to prove that
ProcessCmdStatusInfoRequest() is safe to call in all of those places?
How do you know that some data structure you find by chasing down from
ActivePortal or current_query_stack doesn't have a NULL pointer
someplace that you don't expect, because the code that initializes
that pointer hasn't run yet?

I'd like to see this made to work and be safe, but I think proving
that it's truly robust in all cases is going to be hard.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-17 Thread Robert Haas
On Thu, Sep 17, 2015 at 10:28 AM, Pavel Stehule  wrote:
> Please, can you explain what is wrong with using shm_mq? It works pretty
> well now.
>
> There can be a counter-argument for why we should use shm_mq: it already
> does data exchange between processes, and this patch would introduce a new
> pattern for the same thing.

Sure, I can explain that.

First, when you communicate through a shm_mq, the writer has to wait
when the queue is full, and the reader has to wait when the queue is
empty.  If the message is short enough to fit in the queue, then you
can send it and be done without blocking.  But if it is long, then you
will have each process repeatedly waiting for the other one until the
whole message has been sent.  That is going to make sending the
message potentially a lot slower than writing it all in one go,
especially if the system is under heavy load.

Also, if there are any bugs in the way the shm_mq is being used,
they're likely to be quite rare and hard to find, because the vast
majority of messages will probably be short enough to be sent in a
single chunk, so whatever bugs may exist when the processes play
ping-pong are unlikely to occur in practice except in unusual cases
where the message being returned is very long.

Second, using a shm_mq manipulates the state of the process latch.  I
don't think you can make the assumption that it's safe to reset the
process latch at any and every place where we check for interrupts.
For example, suppose the process is already using a shm_mq and the
CHECK_FOR_INTERRUPTS() call inside that code then discovers that
somebody has activated this mechanism and you now go try to send and
receive from a new shm_mq.  But even if that and every other
CHECK_FOR_INTERRUPTS() in the code can tolerate a process latch reset
today, it's a new coding rule that could easily trip people up in the
future.

Using a shm_mq is appropriate when the amount of data that needs to be
transmitted might be very large - too large to just allocate a buffer
for the whole thing - or when the amount of data can't be predicted
before memory is allocated.  But there is obviously no rule that a
shm_mq should be used any time we have "data exchange between
processes"; we have lots of shared-memory based IPC already and have
for many years, and shm_mq is newer than the vast majority of that
code.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-17 Thread Shulgin, Oleksandr
On Thu, Sep 17, 2015 at 2:14 PM, Robert Haas  wrote:

> On Wed, Sep 16, 2015 at 10:31 AM, Shulgin, Oleksandr
>  wrote:
> > I've also decided we really ought to suppress any possible ERROR level
> > messages generated during the course of processing the status request in
> > order not to prevent the originally running transaction from completing.  The
> > errors so caught are just logged using LOG level and ignored in this new
> > version of the patch.
>
> This plan has almost zero chance of surviving committer-level
> scrutiny, and if it somehow does, some other committer will scream
> bloody murder as soon as they notice it.
>
> It's not safe to just catch an error and continue on executing.  You
> have to abort the (sub)transaction once an error happens.
>

Hm... is this really different from the EXCEPTION WHEN clause in plpgsql?

Of course, this gets at a pretty crucial design question for this
> patch, which is whether it's OK for one process to interrogate another
> process's state, and what the consequences of that might be.  What
> permissions are needed?


I think superuser or the same user is a pretty reasonable requirement.  It can
be limited to superuser only if there are specific concerns.

Can you DOS the guy running a query by
> pinging him for status at a very high rate?
>

Probably, but if you've got enough permissions to do that
pg_cancel/terminate_backend() would be easier.

The other question here is whether it's really safe to do this.
> ProcessInterrupts() gets called at lots of places in the code, and we
> may freely add more in the future.  How are you going to prove that
> ProcessCmdStatusInfoRequest() is safe to call in all of those places?
> How do you know that some data structure you find by chasing down from
> ActivePortal or current_query_stack doesn't have a NULL pointer
> someplace that you don't expect, because the code that initializes
> that pointer hasn't run yet?
>
> I'd like to see this made to work and be safe, but I think proving
> that it's truly robust in all cases is going to be hard.
>

Yes, this is a perfectly good point.  For the purpose of obtaining the
explain plans, I believe anything we capture in ExecutorRun ought to be
safe to run Explain on: the plannedstmt is there and isn't going to change.

In a more general case, if we seek to extend this approach, then yes, one
should be very careful about what is allowed to run during ProcessInterrupts().

What we could do instead, to be extra safe, is have the running backend
prepare the information for other backends to obtain from shared memory.
Explain plans can be easily captured at the ExecutorRun hook.  The obvious
downside is that you can no longer choose the explain format, or obtain
partially collected instrumentation data.

--
Alex
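What such a prepared-in-advance snapshot could look like, stripped of all PostgreSQL internals: below is a toy single-writer slot in plain C11, where the running side publishes a pre-rendered plan string and a seqlock-style counter lets readers detect a torn copy without ever blocking the writer. The names (`snapshot_slot`, `snapshot_publish`, `snapshot_read`) and the fixed size limit are invented for this sketch; this is not the dsm/shm_mq API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

#define SNAPSHOT_MAX 1024

/* Single writer, many readers: seq is odd while a write is in
 * progress, even when the buffer is stable. */
typedef struct
{
	atomic_uint seq;
	char		data[SNAPSHOT_MAX];
} snapshot_slot;

/* Called by the running backend, e.g. from an ExecutorRun hook. */
void
snapshot_publish(snapshot_slot *slot, const char *plan_text)
{
	atomic_fetch_add(&slot->seq, 1);	/* seq becomes odd */
	strncpy(slot->data, plan_text, SNAPSHOT_MAX - 1);
	slot->data[SNAPSHOT_MAX - 1] = '\0';
	atomic_fetch_add(&slot->seq, 1);	/* seq even again */
}

/* Called by the interested backend; never blocks the writer. */
bool
snapshot_read(snapshot_slot *slot, char *out)
{
	for (int tries = 0; tries < 1000; tries++)
	{
		unsigned	before = atomic_load(&slot->seq);

		if (before & 1)
			continue;			/* write in progress, retry */
		memcpy(out, slot->data, SNAPSHOT_MAX);
		if (atomic_load(&slot->seq) == before)
			return true;		/* copy was consistent */
	}
	return false;				/* writer too busy */
}
```

The downside mentioned above shows up directly in this sketch: the reader gets exactly the pre-rendered text, in whatever format the writer chose to produce.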


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-17 Thread Pavel Stehule
2015-09-17 16:46 GMT+02:00 Robert Haas :

> On Thu, Sep 17, 2015 at 10:28 AM, Pavel Stehule 
> wrote:
> > Please, can you explain what is wrong on using of shm_mq? It works pretty
> > well now.
> >
> > There can be a contra argument, why don't use shm_mq, because it does
> data
> > exchange between processes and this patch introduce new pattern for same
> > thing.
>
> Sure, I can explain that.
>
> First, when you communicate through a shm_mq, the writer has to wait
> when the queue is full, and the reader has to wait when the queue is
> empty.  If the message is short enough to fit in the queue, then you
> can send it and be done without blocking.  But if it is long, then you
> will have each process repeatedly waiting for the other one until the
> whole message has been sent.  That is going to make sending the
> message potentially a lot slower than writing it all in one go,
> especially if the system is under heavy load.
>


Is there some risk if we take a too large DSM segment? And what is the max
size of a DSM segment? When we use shm_mq, we don't need to deal with limits.
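Robert's first point above — that a bounded queue forces the two processes to hand control back and forth once the message outgrows the queue, while a buffer sized to the whole message needs a single hand-off — can be illustrated with a toy calculation. The function and constants here are invented for illustration, not shm_mq internals:

```c
#include <assert.h>

/* With a bounded queue of queue_size bytes, a msg_size-byte message
 * needs roughly one writer/reader hand-off per queue-full of data;
 * a buffer sized to the whole message needs exactly one. */
static int
round_trips(int msg_size, int queue_size)
{
	if (msg_size <= 0)
		return 0;
	return (msg_size + queue_size - 1) / queue_size;	/* ceiling */
}
```

With a hypothetical 16kB queue, a 100-byte status message fits in one chunk, while a 1MB plan (sizes like that come up later in this thread) would need dozens of hand-offs between the two processes.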


>
> Also, if there are any bugs in the way the shm_mq is being used,
> they're likely to be quite rare and hard to find, because the vast
> majority of messages will probably be short enough to be sent in a
> single chunk, so whatever bugs may exist when the processes play
> ping-pong are unlikely to occur in practice except in unusual cases
> where the message being returned is very long.
>

This is true for any functionality based on shm_mq - parallel seq scan,


>
> Second, using a shm_mq manipulates the state of the process latch.  I
> don't think you can make the assumption that it's safe to reset the
> process latch at any and every place where we check for interrupts.
> For example, suppose the process is already using a shm_mq and the
> CHECK_FOR_INTERRUPTS() call inside that code then discovers that
> somebody has activated this mechanism and you now go try to send and
> receive from a new shm_mq.  But even if that and every other
> CHECK_FOR_INTERRUPTS() in the code can tolerate a process latch reset
> today, it's a new coding rule that could easily trip people up in the
> future.
>
>
It is valid, and probably most important. But if we introduce our own
mechanism, we will play with the process latch too (although we can use LWLocks).


> Using a shm_mq is appropriate when the amount of data that needs to be
> transmitted might be very large - too large to just allocate a buffer
> for the whole thing - or when the amount of data can't be predicted
> before memory is allocated.  But there is obviously no rule that a
> shm_mq should be used any time we have "data exchange between
> processes"; we have lots of shared-memory based IPC already and have
> for many years, and shm_mq is newer than the vast majority of that
> code.
>

I am a little bit disappointed - I hoped that shm_mq could be used as a generic
interprocess mechanism that would take care of all the corner cases of working
with shared memory. I understand that shm_mq is new and nobody has used it much,
but this risk is better than inventing the wheel again and again.

Regards

Pavel


> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-17 Thread Pavel Stehule
2015-09-17 16:06 GMT+02:00 Shulgin, Oleksandr 
:

> On Thu, Sep 17, 2015 at 12:06 PM, Pavel Stehule 
> wrote:
>
>>
>>> That won't work really well with something like what I usually do when testing
>>> this patch, namely:
>>>
>>> postgres=# select pid, array(select pg_cmdstatus(pid, 1, 10)) from
>>> pg_stat_activity where pid<>pg_backend_pid() \watch 1
>>>
>>> while also running pgbench with -C option (to create new connection for
>>> every transaction).  When a targeted backend exits before it can handle the
>>> signal, the receiving process keeps waiting forever.
>>>
>>
>> no - at every timeout you have to check if the targeted backend is still
>> alive; if not, you cancel. It is 100% safe.
>>
>
> But then you need to make this internal timeout rather short, not 1s as
> originally suggested.
>

can be - 1 sec is max, maybe 100ms is optimum.

>
> The statement_timeout in this case will stop the whole select, not just
>>> individual function call.  Unless you wrap it to set statement_timeout and
>>> catch QUERY_CANCELED in plpgsql, but then you won't be able to ^C the whole
>>> select.  The ability to set a (short) timeout for the function itself
>>> proves to be a really useful feature to me.
>>>
>>
> you cannot handle QUERY_CANCELED in plpgsql.
>>
>
> Well, you can, but it's not that useful of course:
>

hmm, something is wrong - I remember from some older plpgsql that the CANCEL
message is not catchable. Maybe my memory is bad. I have to recheck it.


>
> =# create or replace function test_query_cancel() returns void language
> plpgsql as $$ begin
>  perform pg_sleep(1);
>  exception when query_canceled then raise notice 'cancel';
> end; $$;
> CREATE FUNCTION
> =# set statement_timeout to '100ms';
> SET
> =# select test_query_cancel();
> NOTICE:  cancel
>  test_query_cancel
> ---
>
> (1 row)
> =# select test_query_cancel() from generate_series(1, 100) x;
> NOTICE:  cancel
> ^CCancel request sent
> NOTICE:  cancel
> ^CCancel request sent
>
> Now you cannot cancel this query unless you do pg_terminate_backend() or
> equivalent.
>
There needs to be some internal timeout - but this timeout should not be
>> visible - any new GUC increases PostgreSQL complexity - and there is no
>> reason to do it.
>>
>
> But the GUC was added for the timeout on the sending side, not the
> receiving one.  There is no "one value fits all" for this, but you would
> still want to make the effects of this as limited as possible.
>

I still believe that any new GUC is unnecessary. If you don't like
statement_timeout, we can copy the behavior of CREATE DATABASE - there are a
few 5-second retry cycles (when template1 is occupied) and then it gives up.

Regards

Pavel
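The poll-and-check-liveness loop discussed in this exchange might look roughly like the following plain-POSIX sketch. This is an illustration only — a real backend would wait on its process latch and use PostgreSQL's own liveness machinery — and `process_alive` and `wait_for_reply` are invented names:

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* kill() with signal 0 does existence/permission checking only;
 * EPERM still means the process exists. */
static bool
process_alive(pid_t pid)
{
	return kill(pid, 0) == 0 || errno == EPERM;
}

typedef enum
{
	WAIT_GOT_REPLY,
	WAIT_TARGET_DIED,
	WAIT_TIMED_OUT
} wait_result;

/* Poll for a reply every poll_ms (must be < 1000), up to total_ms,
 * bailing out as soon as the target process disappears. */
static wait_result
wait_for_reply(volatile int *reply_ready, pid_t target,
			   int poll_ms, int total_ms)
{
	for (int waited = 0; waited < total_ms; waited += poll_ms)
	{
		if (*reply_ready)
			return WAIT_GOT_REPLY;
		if (!process_alive(target))
			return WAIT_TARGET_DIED;

		struct timespec ts = {0, poll_ms * 1000000L};
		nanosleep(&ts, NULL);
	}
	return WAIT_TIMED_OUT;
}
```

With a 100ms poll as suggested above, a vanished target is noticed within one polling interval instead of hanging the requester forever.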


>
> --
> Alex
>
>


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-17 Thread Pavel Stehule
2015-09-17 14:06 GMT+02:00 Robert Haas :

> On Tue, Sep 15, 2015 at 5:00 AM, Shulgin, Oleksandr
>  wrote:
> > Please see attached for implementation of this approach.  The most
> > surprising thing is that it actually works :)
>
> It's cool to see these interesting ideas for using some of the code
> I've been working on for the last couple of years.  However, it seems
> to me that a better design would avoid the use of shm_mq.  Instead of
> having the guy who asks the question create a DSM, have the guy who
> sends back the answer create it.  Then, he can just make the DSM big
> enough to store the entire string.  I think that's going to be a lot
> more robust than what you have here.
>

Please, can you explain what is wrong with using shm_mq? It works pretty
well now.

There is also a counter-argument to not using shm_mq: it already does data
exchange between processes, and this patch would otherwise introduce a new
pattern for the same thing.

Regards

Pavel


>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-17 Thread Shulgin, Oleksandr
On Thu, Sep 17, 2015 at 12:06 PM, Pavel Stehule 
wrote:

>
>> That won't work really well with something like what I usually do when testing
>> this patch, namely:
>>
>> postgres=# select pid, array(select pg_cmdstatus(pid, 1, 10)) from
>> pg_stat_activity where pid<>pg_backend_pid() \watch 1
>>
>> while also running pgbench with -C option (to create new connection for
>> every transaction).  When a targeted backend exits before it can handle the
>> signal, the receiving process keeps waiting forever.
>>
>
> no - at every timeout you have to check if the targeted backend is still
> alive; if not, you cancel. It is 100% safe.
>

But then you need to make this internal timeout rather short, not 1s as
originally suggested.

The statement_timeout in this case will stop the whole select, not just
>> individual function call.  Unless you wrap it to set statement_timeout and
>> catch QUERY_CANCELED in plpgsql, but then you won't be able to ^C the whole
>> select.  The ability to set a (short) timeout for the function itself
>> proves to be a really useful feature to me.
>>
>
> you cannot handle QUERY_CANCELED in plpgsql.
>

Well, you can, but it's not that useful of course:

=# create or replace function test_query_cancel() returns void language
plpgsql as $$ begin
 perform pg_sleep(1);
 exception when query_canceled then raise notice 'cancel';
end; $$;
CREATE FUNCTION
=# set statement_timeout to '100ms';
SET
=# select test_query_cancel();
NOTICE:  cancel
 test_query_cancel
---

(1 row)
=# select test_query_cancel() from generate_series(1, 100) x;
NOTICE:  cancel
^CCancel request sent
NOTICE:  cancel
^CCancel request sent

Now you cannot cancel this query unless you do pg_terminate_backend() or
equivalent.

There is need some internal timeout - but this timeout should not be
> visible - any new GUC increase PostgreSQL complexity - and there is not a
> reason why do it.
>

But the GUC was added for the timeout on the sending side, not the
receiving one.  There is no "one value fits all" for this, but you would
still want to make the effects of this as limited as possible.

--
Alex


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-17 Thread Shulgin, Oleksandr
On Thu, Sep 17, 2015 at 5:16 PM, Pavel Stehule 
wrote:

> 2015-09-17 16:46 GMT+02:00 Robert Haas :
>
>>
>> Second, using a shm_mq manipulates the state of the process latch.  I
>> don't think you can make the assumption that it's safe to reset the
>> process latch at any and every place where we check for interrupts.
>> For example, suppose the process is already using a shm_mq and the
>> CHECK_FOR_INTERRUPTS() call inside that code then discovers that
>> somebody has activated this mechanism and you now go try to send and
>> receive from a new shm_mq.  But even if that and every other
>> CHECK_FOR_INTERRUPTS() in the code can tolerate a process latch reset
>> today, it's a new coding rule that could easily trip people up in the
>> future.
>>
>
> It is valid, and probably most important. But if we introduce own
> mechanism, we will play with process latch too (although we can use LWlocks)
>

Careful manipulation of the process latch is a valid point.  Probably we
were way too optimistic about the limits we can hit with this approach. :-(

But if we make the sender backend create the DSM with the response payload,
it suddenly becomes really unclear at which point and who should make the
final detach of that DSM.  We're getting back to the problem of
synchronization between these processes all over again.

--
Alex
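One conceivable way out of the "who makes the final detach" problem is to reference-count the response segment and let whichever side detaches last perform the cleanup. The toy below models that in-process with C11 atomics; it is not PostgreSQL's dsm.c API, and all names are invented for this sketch:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy model of a response segment shared by the two backends. */
typedef struct
{
	atomic_int	refcount;
	int			destroyed;	/* stand-in for "segment was unlinked" */
} response_seg;

void
seg_attach(response_seg *seg)
{
	atomic_fetch_add(&seg->refcount, 1);
}

/* Returns true when this caller was the last one out and therefore
 * performed the cleanup (the detach-and-destroy in a real system). */
bool
seg_detach(response_seg *seg)
{
	if (atomic_fetch_sub(&seg->refcount, 1) == 1)
	{
		seg->destroyed = 1;
		return true;
	}
	return false;
}
```

The catch, of course, is what the text above points at: if the requester dies without ever attaching, someone still has to notice and drop the creator's reference, so this does not remove the synchronization problem entirely.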


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-17 Thread Robert Haas
On Thu, Sep 17, 2015 at 11:16 AM, Pavel Stehule  wrote:
> Is there some risk if we take too much large DSM segment? And what is max
> size of DSM segment? When we use shm_mq, we don't need to solve limits.

I can't really think you are going to have a problem.  How much data
do you really intend to send back?  Surely it's going to be <100kB.
If you think it's not a problem to have a running query stop and send
a gigabyte of data someplace anytime somebody asks, well, I don't
think I agree.

>> Also, if there are any bugs in the way the shm_mq is being used,
>> they're likely to be quite rare and hard to find, because the vast
>> majority of messages will probably be short enough to be sent in a
>> single chunk, so whatever bugs may exist when the processes play
>> ping-pong are unlikely to occur in practice except in unusual cases
>> where the message being returned is very long.
>
> This is true for any functionality based on shm_mq - parallel seq scan,

Parallel sequential scan is likely to put a lot more data through a
shm_mq than you would for this.

>> Second, using a shm_mq manipulates the state of the process latch.  I
>> don't think you can make the assumption that it's safe to reset the
>> process latch at any and every place where we check for interrupts.
>> For example, suppose the process is already using a shm_mq and the
>> CHECK_FOR_INTERRUPTS() call inside that code then discovers that
>> somebody has activated this mechanism and you now go try to send and
>> receive from a new shm_mq.  But even if that and every other
>> CHECK_FOR_INTERRUPTS() in the code can tolerate a process latch reset
>> today, it's a new coding rule that could easily trip people up in the
>> future.
>
> It is valid, and probably most important. But if we introduce own mechanism,
> we will play with process latch too (although we can use LWlocks)

With the design I proposed, there is zero need to touch the process
latch, which is good, because I'm pretty sure that is going to be a
problem.  I don't think there is any need to use LWLocks here either.
When you get a request for data, you can just publish a DSM segment
with the data and that's it.  Why do you need anything more?  You
could set the requestor's latch if it's convenient; that wouldn't be a
problem.  But the process supplying the data can't end up in a
different state than it was before supplying that data, or stuff WILL
break.

>> Using a shm_mq is appropriate when the amount of data that needs to be
>> transmitted might be very large - too large to just allocate a buffer
>> for the whole thing - or when the amount of data can't be predicted
>> before memory is allocated.  But there is obviously no rule that a
>> shm_mq should be used any time we have "data exchange between
>> processes"; we have lots of shared-memory based IPC already and have
>> for many years, and shm_mq is newer than the vast majority of that
>> code.
>
> I am little bit disappointed - I hoped so shm_mq can be used as generic
> interprocess mechanism - that will ensure all corner cases for work with
> shared memory. I understand to shm_mq is new, and nobody used it, but this
> risk is better than invent wheels again and again.

shm_mq is useful, but if you insist on using a complicated tool when a
simple one is plenty sufficient, you may not get the results you're
hoping for.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-17 Thread Pavel Stehule
2015-09-17 16:37 GMT+02:00 Pavel Stehule :

>
>
> 2015-09-17 16:06 GMT+02:00 Shulgin, Oleksandr <
> oleksandr.shul...@zalando.de>:
>
>> On Thu, Sep 17, 2015 at 12:06 PM, Pavel Stehule 
>> wrote:
>>
>>>
 That won't work really well with something like what I usually do when
 testing this patch, namely:

 postgres=# select pid, array(select pg_cmdstatus(pid, 1, 10)) from
 pg_stat_activity where pid<>pg_backend_pid() \watch 1

 while also running pgbench with -C option (to create new connection for
 every transaction).  When a targeted backend exits before it can handle the
 signal, the receiving process keeps waiting forever.

>>>
>>> no - at every timeout you have to check if the targeted backend is still
>>> alive; if not, you cancel. It is 100% safe.
>>>
>>
>> But then you need to make this internal timeout rather short, not 1s as
>> originally suggested.
>>
>
> can be - 1 sec is max, maybe 100ms is optimum.
>
>>
>> The statement_timeout in this case will stop the whole select, not just
 individual function call.  Unless you wrap it to set statement_timeout and
 catch QUERY_CANCELED in plpgsql, but then you won't be able to ^C the whole
 select.  The ability to set a (short) timeout for the function itself
 proves to be a really useful feature to me.

>>>
>>> you cannot handle QUERY_CANCELED in plpgsql.
>>>
>>
>> Well, you can, but it's not that useful of course:
>>
>
> hmm, some is wrong - I remember from some older plpgsql, so CANCEL message
> is not catchable. Maybe I have bad memory. I have to recheck it.
>

it is ok. I didn't remember this behavior well. You cannot catch CANCEL
(and nowadays ASSERT) in the OTHERS handler. It can be trapped only if it is
explicitly mentioned.

Regards

Pavel
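For reference, the behavior Pavel describes matches the documented PL/pgSQL rule: a bare WHEN OTHERS handler does not trap QUERY_CANCELED (or ASSERT_FAILURE), but naming the condition explicitly does trap it. A minimal sketch, with an invented function name:

```sql
-- Sketch: QUERY_CANCELED is not caught by WHEN OTHERS, but an
-- explicitly named handler does catch it.
create or replace function catch_cancel_explicitly() returns void
language plpgsql as $$
begin
  perform pg_sleep(10);
exception
  when query_canceled then          -- trapped only because it is named
    raise notice 'cancel trapped';
end;
$$;
```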


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-17 Thread Pavel Stehule
2015-09-17 22:13 GMT+02:00 Robert Haas :

> On Thu, Sep 17, 2015 at 11:16 AM, Pavel Stehule 
> wrote:
> > Is there some risk if we take too much large DSM segment? And what is max
> > size of DSM segment? When we use shm_mq, we don't need to solve limits.
>
> I can't really think you are going to have a problem.  How much data
> do you really intend to send back?  Surely it's going to be <100kB.
> If you think it's not a problem to have a running query stop and send
> a gigabyte of data someplace anytime somebody asks, well, I don't
> think I agree.
>
>
I'm afraid <100kB is too optimistic. I know the GoodData environment is an
exception - it uses a query generator - but I found a few plans >1MB. That is
probably unusual due to long identifiers too - it used 63-byte-long hashes -
and some queries and models are strange. It does translation between an
analytic, multi-dimensional, business-oriented query language and SQL. Back
to the topic: with high probability we can count on <10MB.


> >> Also, if there are any bugs in the way the shm_mq is being used,
> >> they're likely to be quite rare and hard to find, because the vast
> >> majority of messages will probably be short enough to be sent in a
> >> single chunk, so whatever bugs may exist when the processes play
> >> ping-pong are unlikely to occur in practice except in unusual cases
> >> where the message being returned is very long.
> >
> > This is true for any functionality based on shm_mq - parallel seq scan,
>
> Parallel sequential scan is likely to put a lot more data through a
> shm_mq than you would for this.
>
> >> Second, using a shm_mq manipulates the state of the process latch.  I
> >> don't think you can make the assumption that it's safe to reset the
> >> process latch at any and every place where we check for interrupts.
> >> For example, suppose the process is already using a shm_mq and the
> >> CHECK_FOR_INTERRUPTS() call inside that code then discovers that
> >> somebody has activated this mechanism and you now go try to send and
> >> receive from a new shm_mq.  But even if that and every other
> >> CHECK_FOR_INTERRUPTS() in the code can tolerate a process latch reset
> >> today, it's a new coding rule that could easily trip people up in the
> >> future.
> >
> > It is valid, and probably most important. But if we introduce own
> mechanism,
> > we will play with process latch too (although we can use LWlocks)
>
> With the design I proposed, there is zero need to touch the process
> latch, which is good, because I'm pretty sure that is going to be a
> problem.  I don't think there is any need to use LWLocks here either.
> When you get a request for data, you can just publish a DSM segment
> with the data and that's it.  Why do you need anything more?  You
> could set the requestor's latch if it's convenient; that wouldn't be a
> problem.  But the process supplying the data can't end up in a
> different state than it was before supplying that data, or stuff WILL
> break.
>

sure - but shm_mq would have to have the same behavior; if not, then only one
of them can be used for communication between processes at a time - that is a
little bit limiting.

>
> >> Using a shm_mq is appropriate when the amount of data that needs to be
> >> transmitted might be very large - too large to just allocate a buffer
> >> for the whole thing - or when the amount of data can't be predicted
> >> before memory is allocated.  But there is obviously no rule that a
> >> shm_mq should be used any time we have "data exchange between
> >> processes"; we have lots of shared-memory based IPC already and have
> >> for many years, and shm_mq is newer than the vast majority of that
> >> code.
> >
> > I am little bit disappointed - I hoped so shm_mq can be used as generic
> > interprocess mechanism - that will ensure all corner cases for work with
> > shared memory. I understand to shm_mq is new, and nobody used it, but
> this
> > risk is better than invent wheels again and again.
>
> shm_mq is useful, but if you insist on using a complicated tool when a
> simple one is plenty sufficient, you may not get the results you're
> hoping for.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-16 Thread Shulgin, Oleksandr
On Tue, Sep 15, 2015 at 11:00 AM, Shulgin, Oleksandr <
oleksandr.shul...@zalando.de> wrote:

> On Mon, Sep 14, 2015 at 7:27 PM, Pavel Stehule 
> wrote:
>
>>
>> 2015-09-14 18:46 GMT+02:00 Shulgin, Oleksandr <
>> oleksandr.shul...@zalando.de>:
>>
>>>
>>> ... This way the receiver only writes to the slot and the sender only
>>> reads from it.
>>>
>>> By the way, is it safe to assume atomic read/writes of dsm_handle
>>> (uint32)?  I would be surprised if not.
>>>
>>
>> I don't see any reason why it should not work - only a few processes
>> will wait for data - so wasted attach/detach shm operations will not be
>> too frequent.
>>
>
> Please see attached for implementation of this approach.  The most
> surprising thing is that it actually works :)
>
> One problem still remains with the process requesting information when the
> target process exits before it can have a chance to handle the signal.  The
> requesting process then waits forever, because nobody attaches/detaches the
> queue.  We've discussed this before and looks like we need to introduce a
> timeout parameter to pg_cmdstatus()...
>

I've added the timeout parameter to the pg_cmdstatus call, and more
importantly to the sending side of the queue, so that one can limit the
potential effect of handling the interrupt in case something goes really
wrong.

I've tested a number of possible scenarios with artificial delays in reply
and sending a cancel request or SIGTERM to the receiving side, and this all
seems to work nicely due to proper message queue detach event
notification.  Still I think it helps to know that there is a timeout in
case the receiving side is really slow to read the message.

I've also decided we really ought to suppress any possible ERROR level
messages generated during the course of processing the status request in
order not to prevent the originally running transaction from completing.  The
errors so caught are just logged using LOG level and ignored in this new
version of the patch.

I'm also submitting the instrumentation support as a separate patch on top
of this.  I'm not really fond of adding a bool parameter to InstrEndLoop, but
for now I didn't find any better way.

What I'm now thinking about is probably we can extend this backend
communication mechanism in order to query GUC values effective in a
different backend or even setting the values.  The obvious candidate to
check when you see some query misbehaving would be work_mem, for example.
Or we could provide a report of all settings that were overridden in the
backend's session, to the effect of running something like this:

select * from pg_settings where context = 'user' and setting != reset_val;

The obvious candidates to be set externally are the
cmdstatus_analyze/instrument_*: when you decided you want to turn them on,
you'd rather do that carefully for a single backend than globally
per-cluster.  One can still modify the postgresql.conf and then send SIGHUP
to just a single backend, but I think a more direct way to alter the
settings would be better.

In this light should we rename the API to something like "backend control"
instead of "command status"?  The SHOW/SET syntax could be extended to
support the remote setting retrieval/update.

--
Alex
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index 2a184ed..d2f46bb 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -288,7 +288,7 @@ explain_ExecutorEnd(QueryDesc *queryDesc)
 		 * Make sure stats accumulation is done.  (Note: it's okay if several
 		 * levels of hook all do this.)
 		 */
-		InstrEndLoop(queryDesc->totaltime);
+		InstrEndLoop(queryDesc->totaltime, false);
 
 		/* Log plan if duration is exceeded. */
 		msec = queryDesc->totaltime->total * 1000.0;
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 59b8a2e..d90fc6d 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -926,7 +926,7 @@ pgss_ExecutorEnd(QueryDesc *queryDesc)
 		 * Make sure stats accumulation is done.  (Note: it's okay if several
 		 * levels of hook all do this.)
 		 */
-		InstrEndLoop(queryDesc->totaltime);
+		InstrEndLoop(queryDesc->totaltime, false);
 
 		pgss_store(queryDesc->sourceText,
    queryId,
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 5d06fa4..e5a28a9 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -654,7 +654,7 @@ report_triggers(ResultRelInfo *rInfo, bool show_relname, ExplainState *es)
 		char	   *conname = NULL;
 
 		/* Must clean up instrumentation state */
-		InstrEndLoop(instr);
+		InstrEndLoop(instr, false);
 
 		/*
 		 * We ignore triggers that were never invoked; they likely aren't
@@ -1233,62 +1233,75 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		}
 	}
 
-	/*
-	 * We have to forcibly clean up the 

Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-16 Thread Pavel Stehule
2015-09-16 16:31 GMT+02:00 Shulgin, Oleksandr 
:

> On Tue, Sep 15, 2015 at 11:00 AM, Shulgin, Oleksandr <
> oleksandr.shul...@zalando.de> wrote:
>
>> On Mon, Sep 14, 2015 at 7:27 PM, Pavel Stehule 
>> wrote:
>>
>>>
>>> 2015-09-14 18:46 GMT+02:00 Shulgin, Oleksandr <
>>> oleksandr.shul...@zalando.de>:
>>>

 ... This way the receiver only writes to the slot and the sender only
 reads from it.

 By the way, is it safe to assume atomic read/writes of dsm_handle
 (uint32)?  I would be surprised if not.

>>>
>>> I don't see any reason why it should not work - only a few processes
>>> will wait for data - so wasted attach/detach shm operations will not be
>>> too frequent.
>>>
>>
>> Please see attached for implementation of this approach.  The most
>> surprising thing is that it actually works :)
>>
>> One problem still remains with the process requesting information when
>> the target process exits before it can have a chance to handle the signal.
>> The requesting process then waits forever, because nobody attaches/detaches
>> the queue.  We've discussed this before and looks like we need to introduce
>> a timeout parameter to pg_cmdstatus()...
>>
>
> I've added the timeout parameter to the pg_cmdstatus call, and more
> importantly to the sending side of the queue, so that one can limit the
> potential effect of handling the interrupt in case something goes really
> wrong.
>

I don't think introducing a new user-visible timer is a good idea. This
timer should be invisible.

My idea: send a signal and wait 1 sec, then check if the target process is
still alive; stop if not. Wait another 5 sec, then check the target process
again. Stop if the process is no longer alive, or raise a warning "requested
process doesn't communicate, waiting..". The final cancel should be done by
statement_timeout.

what do you think about it?


>
> I've tested a number of possible scenarios with artificial delays in reply
> and sending cancel request or SIGTERM to the receiving side and this is all
> seems to work nicely due to proper message queue detach event
> notification.  Still I think it helps to know that there is a timeout in
> case the receiving side is really slow to read the message.
>
> I've also decided we really ought to suppress any possible ERROR level
> messages generated during the course of processing the status request in
> order not to prevent the originally running transaction to complete.  The
> errors so caught are just logged using LOG level and ignored in this new
> version of the patch.
>
> I'm also submitting the instrumentation support as a separate patch on top
> of this.  I'm not really fond of adding bool parameter to InstrEndLoop, but
> for now I didn't find any better way.
>
> What I'm now thinking about is probably we can extend this backend
> communication mechanism in order to query GUC values effective in a
> different backend or even setting the values.  The obvious candidate to
> check when you see some query misbehaving would be work_mem, for example.
> Or we could provide a report of all settings that were overridden in the
> backend's session, to the effect of running something like this:
>
> select * from pg_settings where context = 'user' and setting != reset_val;
>

this is a good idea. Many times I have wondered what the current settings of
a running query are. We cannot use a simple query - because it is not possible
to simply push a PID into the view - but you could write a function
pg_settings_pid() so usage can look like

select * from pg_settings_pid(, possible other params) where ...



>
> The obvious candidates to be set externally are the
> cmdstatus_analyze/instrument_*: when you decided you want to turn them on,
> you'd rather do that carefully for a single backend than globally
> per-cluster.  One can still modify the postgresql.conf and then send SIGHUP
> to just a single backend, but I think a more direct way to alter the
> settings would be better.
>

I am 100% for the possibility of reading the settings. But reconfiguration
from another process is too hot coffee - it can be made available via an
extension, but not via commonly used tools.


>
> In this light should we rename the API to something like "backend control"
> instead of "command status"?  The SHOW/SET syntax could be extended to
> support the remote setting retrieval/update.
>

Prepare the API, and this functionality can be part of a reference
implementation in contrib.

This patch should not have too wide a scope, so that it can be finished in
this release cycle.

Regards

Pavel





>
> --
> Alex
>
>


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-15 Thread Shulgin, Oleksandr
On Mon, Sep 14, 2015 at 7:27 PM, Pavel Stehule 
wrote:

>
> 2015-09-14 18:46 GMT+02:00 Shulgin, Oleksandr <
> oleksandr.shul...@zalando.de>:
>
>>
>> I have a radical proposal to remove the need for locking: make the
>> CmdStatusSlot struct consist of a mere dsm_handle and move all the required
>> metadata like sender_pid, request_type, etc. into the shared memory segment
>> itself.
>>
>> If we allow only the requesting process to update the slot (that is,
>> the handle value itself), this removes the need for locking between sender
>> and receiver.
>>
>> The sender will walk through the slots looking for a non-zero dsm handle
>> (according to dsm_create() implementation 0 is considered an invalid
>> handle), and if it finds a valid one, it will attach and look inside, to
>> check if it's destined for this process ID.  At first that might sound
>> strange, but I would expect 99% of the time that the only valid slot would
>> be for the process that has been just signaled.
>>
>> The sender process will then calculate the response message, update the
>> result_code in the shared memory segment and finally send the message
>> through the queue.  If the receiver has since detached we get a detached
>> result code and bail out.
>>
>> Clearing the slot after receiving the message should be the requesting
>> process' responsibility.  This way the receiver only writes to the slot and
>> the sender only reads from it.
>>
>> By the way, is it safe to assume atomic read/writes of dsm_handle
>> (uint32)?  I would be surprised if not.
>>
>
> I don't see any reason why it should not work - only a few processes will be
> waiting for data, so the wasted attach/detach shm operations will not amount
> to much.
>

Please see attached for implementation of this approach.  The most
surprising thing is that it actually works :)

One problem still remains with the process requesting information when the
target process exits before it has a chance to handle the signal.  The
requesting process then waits forever, because nobody ever attaches to or
detaches from the queue.  We've discussed this before, and it looks like we
need to introduce a timeout parameter to pg_cmdstatus()...

--
Alex
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 32ac58f..2e3beaf 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -43,6 +43,7 @@
 #include "storage/procsignal.h"
 #include "storage/sinvaladt.h"
 #include "storage/spin.h"
+#include "utils/cmdstatus.h"
 
 
 shmem_startup_hook_type shmem_startup_hook = NULL;
@@ -139,6 +140,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, CmdStatusShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -243,6 +245,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 	ReplicationOriginShmemInit();
 	WalSndShmemInit();
 	WalRcvShmemInit();
+	CmdStatusShmemInit();
 
 	/*
 	 * Set up other modules that need some shared memory space
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 0abde43..e637be1 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -26,6 +26,7 @@
 #include "storage/shmem.h"
 #include "storage/sinval.h"
 #include "tcop/tcopprot.h"
+#include "utils/cmdstatus.h"
 
 
 /*
@@ -296,6 +297,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN))
 		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
 
+	if (CheckProcSignal(PROCSIG_CMD_STATUS_INFO))
+		HandleCmdStatusInfoInterrupt();
+
 	if (set_latch_on_sigusr1)
 		SetLatch(MyLatch);
 
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index d917af3..1a5b03c 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -67,6 +67,7 @@
 #include "tcop/pquery.h"
 #include "tcop/tcopprot.h"
 #include "tcop/utility.h"
+#include "utils/cmdstatus.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
@@ -2991,6 +2992,9 @@ ProcessInterrupts(void)
 
 	if (ParallelMessagePending)
 		HandleParallelMessages();
+
+	if (CmdStatusInfoPending)
+		ProcessCmdStatusInfoRequest();
 }
 
 
diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index 3ed0b44..2c8687c 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -18,7 +18,7 @@ endif
 # keep this list arranged alphabetically or it gets to be a mess
 OBJS = acl.o arrayfuncs.o array_expanded.o array_selfuncs.o \
 	array_typanalyze.o array_userfuncs.o arrayutils.o ascii.o \
-	bool.o cash.o char.o date.o datetime.o datum.o dbsize.o domains.o \
+	bool.o cash.o char.o cmdstatus.o date.o datetime.o datum.o dbsize.o domains.o \
 	encode.o enum.o expandeddatum.o \
 	float.o 

Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-14 Thread Pavel Stehule
2015-09-14 14:11 GMT+02:00 Tomas Vondra :

>
>
> On 09/14/2015 01:15 PM, Shulgin, Oleksandr wrote:
>
>> On Mon, Sep 14, 2015 at 11:53 AM, Tomas Vondra
>> >
>> wrote:
>>
>>
>> On 09/14/2015 10:23 AM, Shulgin, Oleksandr wrote:
>>
>> On Sat, Sep 12, 2015 at 11:50 AM, Tomas Vondra
>> > 
>> >
>> >> wrote:
>>
>> ...
>>
>>  - Attempts to get plan for simple insert queries like this
>>
>> INSERT INTO x SELECT * FROM x;
>>
>> end with a segfault, because ActivePortal->queryDesc is
>> 0x0 for this
>> query. Needs investigation.
>>
>>
>> Yes, I've hit the same problem after submitting the latest
>> version of
>> the patch.  For now I've just added a check for queryDesc being
>> not
>> NULL, but I guess the top of the current_query_stack might
>> contain
>> something useful.  Something I need to try.
>>
>>
>> Well, the thing is we're able to do EXPLAIN on those queries, and
>> IIRC auto_explain can log them too. So perhaps look into the hooks
>> where they take the queryDesc in those cases - it has to be
>> available somewhere.
>>
>>
>> Yes, this seems to work.  I was able to successfully query "create table
>> as" and even "explain analyze" for the plans.  In both cases
>> ActivePortal doesn't have a valid queryDesc, but the executor hook
>> captures one.
>>
>>  - The lockless approach seems fine to me, although I think
>> the fear
>> of performance issues is a bit moot (I don't think we
>> expect large
>> number of processes calling pg_cmdstatus at the same
>> time). But
>> it's not significantly more complex, so why not.
>>
>>
>> I believe the main benefit of the less-locking approach is that if
>> something goes wrong when two backends tried to communicate it
>> doesn't
>> prevent the rest of them from operating, because there is no
>> shared (and
>> thus locked) communication channel.
>>
>>
>> Sure, but I think it really deserves a bit more detailed explanation
>> of the protocol, and discussion of the expected behavior for some
>> basic failure types.
>>
>> For example - what happens when a backend requests info, but dies
>> before receiving it, and the backend is reused for another
>> connection? Doesn't this introduce a race condition? Perhaps not,
>> I'm just asking.
>>
>>
>> Now explained better in code comments.
>>
>> The worst thing that could happen in this case I believe is that the
>> requesting backend will not receive any response from the second use of
>> its slot due to the slot being marked as processed by the backend that
>> was asked first:
>>
>> A:
>> slot->is_processed = false;
>> slot->is_valid = true;
>> /* signals backend B */;
>> shm_mq_read(); /* blocks */
>>
>> B: /* finds the slot and prepares the response */
>>
>> A: /* decides to bail out */
>> InvalidateCmdStatusSlot();
>> shm_mq_detach();
>> /* exits pg_cmdstatus() */
>>
>> /* pg_cmdstatus() is called again */
>> /* initializes the slot and signals some backend */
>> shm_mq_read(); /* blocks */
>>
>> B: /* completes preparing the response */
>> slot->is_processed = true;
>> shm_mq_send() => SHM_MQ_DETACHED
>> slot->is_valid = false;
>> /* gives up with this slot */
>>
>
Hmm, yes - there has to be a check whether the slot is still related to the
queue before changing the is_valid attribute.

Regards

Pavel


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-14 Thread Shulgin, Oleksandr
On Mon, Sep 14, 2015 at 11:53 AM, Tomas Vondra  wrote:

>
> On 09/14/2015 10:23 AM, Shulgin, Oleksandr wrote:
>
>> On Sat, Sep 12, 2015 at 11:50 AM, Tomas Vondra
>> >
>> wrote:
>>
> ...
>
>> - Attempts to get plan for simple insert queries like this
>>
>>INSERT INTO x SELECT * FROM x;
>>
>>end with a segfault, because ActivePortal->queryDesc is 0x0 for
>> this
>>query. Needs investigation.
>>
>>
>> Yes, I've hit the same problem after submitting the latest version of
>> the patch.  For now I've just added a check for queryDesc being not
>> NULL, but I guess the top of the current_query_stack might contain
>> something useful.  Something I need to try.
>>
>
> Well, the thing is we're able to do EXPLAIN on those queries, and IIRC
> auto_explain can log them too. So perhaps look into the hooks where they
> take the queryDesc in those cases - it has to be available somewhere.


Yes, this seems to work.  I was able to successfully query "create table
as" and even "explain analyze" for the plans.  In both cases ActivePortal
doesn't have a valid queryDesc, but the executor hook captures one.

- The lockless approach seems fine to me, although I think the fear
>>of performance issues is a bit moot (I don't think we expect large
>>number of processes calling pg_cmdstatus at the same time). But
>>it's not significantly more complex, so why not.
>>
>>
>> I believe the main benefit of the less-locking approach is that if
>> something goes wrong when two backends tried to communicate it doesn't
>> prevent the rest of them from operating, because there is no shared (and
>> thus locked) communication channel.
>>
>
> Sure, but I think it really deserves a bit more detailed explanation of
> the protocol, and discussion of the expected behavior for some basic
> failure types.
>
> For example - what happens when a backend requests info, but dies before
> receiving it, and the backend is reused for another connection? Doesn't this
> introduce a race condition? Perhaps not, I'm just asking.


Now explained better in code comments.

The worst thing that could happen in this case I believe is that the
requesting backend will not receive any response from the second use of its
slot due to the slot being marked as processed by the backend that was
asked first:

A:
slot->is_processed = false;
slot->is_valid = true;
/* signals backend B */;
shm_mq_read(); /* blocks */

B: /* finds the slot and prepares the response */

A: /* decides to bail out */
InvalidateCmdStatusSlot();
shm_mq_detach();
/* exits pg_cmdstatus() */

/* pg_cmdstatus() is called again */
/* initializes the slot and signals some backend */
shm_mq_read(); /* blocks */

B: /* completes preparing the response */
slot->is_processed = true;
shm_mq_send() => SHM_MQ_DETACHED
slot->is_valid = false;
/* gives up with this slot */

Now the backend that has been signaled on the second call to pg_cmdstatus
(it can be either some other backend, or the backend B again) will not find
an unprocessed slot, thus it will not try to attach/detach the queue and
the backend A will block forever.

This requires really bad timing, and the user should still be able to
interrupt the querying backend A.

In any case, the backends that are being asked to send the info will be
able to notice the problem (receiver detached early) and handle it
gracefully.

- The patch contains pretty much no documentation, both comments
>>at the code level and user docs. The lack of user docs is not that
>>a big deal at this point (although the patch seems to be mature
>>enough, although the user-level API will likely change).
>>
>>The lack of code comments is more serious, as it makes the review
>>somewhat more difficult. For example it'd be very nice to document
>>the contract for the lock-less interface.
>>
>>
>> I will add the code comments.  The user docs can wait until we decide
>> on the interface, I think.
>>
>
> Agreed, although I think having rudimentary user documentation would be
> useful for the reviewers - a summary of the goals that are a bit scattered
> over the e-mail thread.


OK

- I agree that pg_cmdstatus() is not the best API. Having something
>>like EXPLAIN PID would be nice, but it does not really work for
>>all the request types (just CMD_STATUS_REQUEST_EXPLAIN). Maybe
>>there's not a single API for all cases, i.e. we should use EXPLAIN
>>PID for one case and invent something different for the other?
>>
>>
>> I can think of something like:
>>
>> EXPLAIN [ ( option [, ...] ) ] PROCESS <pid>;
>>
>> where option is extended with:
>>
>>QUERY
>>PROGRESS
>>BACKTRACE
>>
>> in addition to the usual ANALYZE, VERBOSE, FORMAT, etc.
>>
>
> Seems OK. I'm not quite sure what PROGRESS is - I assume it's the same
> thing as ANALYZE? Why not to use the 

Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-14 Thread Shulgin, Oleksandr
On Mon, Sep 14, 2015 at 2:11 PM, Tomas Vondra 
wrote:

>
>> Now the backend that has been signaled on the second call to
>> pg_cmdstatus (it can be either some other backend, or the backend B
>> again) will not find an unprocessed slot, thus it will not try to
>> attach/detach the queue and the backend A will block forever.
>>
>> This requires really bad timing, and the user should still be able to
>> interrupt the querying backend A.
>>
>
> I think we can't rely on the low probability that this won't happen, and
> we should not rely on people interrupting the backend. Being able to detect
> the situation and fail gracefully should be possible.
>
> It may be possible to introduce some lock-less protocol preventing such
> situations, but it's not there at the moment. If you believe it's possible,
> you need to explain and "prove" that it's actually safe.
>
> Otherwise we may need to introduce some basic locking - for example we may
> introduce a LWLock for each slot, and lock it with dontWait=true (and skip
> it if we couldn't lock it). This should prevent most scenarios where one
> corrupted slot blocks many processes.


OK, I will revisit this part then.

In any case, the backends that are being asked to send the info will be
>> able to notice the problem (receiver detached early) and handle it
>> gracefully.
>>
>
> Ummm, how? Maybe I missed something?


Well, I didn't attach the updated patch (doing that now).  The basic idea
is that when the backend that has requested information bails out
prematurely it still detaches from the shared memory queue.  This makes it
possible for the backend being asked to detect the situation either before
attaching to the queue or when trying to send the data, so it won't be
blocked forever if the other backend failed to wait.

I don't think we should mix this with monitoring of auxiliary
>> processes. This interface is designed for monitoring SQL queries
>> running in other backends, effectively "remote" EXPLAIN. But those
>> auxiliary processes are not processing SQL queries at all, they're
>> not even using regular executor ...
>>
>> OTOH the ability to request this info (e.g. auxiliary process
>> looking at plans running in backends) seems useful, so I'm ok with
>> tuple slots for auxiliary processes.
>>
>>
>> Now that I think about it, reserving slots for aux processes doesn't
>> let us query their status, it's the other way round.  If we don't
>> reserve them, then an aux process would not be able to query any other
>> process for its status.  Likely this is not a problem at all, so we can
>> remove these extra slots.
>>
>
> I don't know. I can imagine using this from background workers, but I
> think those are counted as regular backends (not sure though).


MaxBackends includes the background workers, yes.

--
Alex


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-14 Thread Tomas Vondra



On 09/14/2015 01:15 PM, Shulgin, Oleksandr wrote:

On Mon, Sep 14, 2015 at 11:53 AM, Tomas Vondra
> wrote:


On 09/14/2015 10:23 AM, Shulgin, Oleksandr wrote:

On Sat, Sep 12, 2015 at 11:50 AM, Tomas Vondra

>> wrote:

...

 - Attempts to get plan for simple insert queries like this

INSERT INTO x SELECT * FROM x;

end with a segfault, because ActivePortal->queryDesc is
0x0 for this
query. Needs investigation.


Yes, I've hit the same problem after submitting the latest
version of
the patch.  For now I've just added a check for queryDesc being not
NULL, but I guess the top of the current_query_stack might contain
something useful.  Something I need to try.


Well, the thing is we're able to do EXPLAIN on those queries, and
IIRC auto_explain can log them too. So perhaps look into the hooks
where they take the queryDesc in those cases - it has to be
available somewhere.


Yes, this seems to work.  I was able to successfully query "create table
as" and even "explain analyze" for the plans.  In both cases
ActivePortal doesn't have a valid queryDesc, but the executor hook
captures one.

 - The lockless approach seems fine to me, although I think
the fear
of performance issues is a bit moot (I don't think we
expect large
number of processes calling pg_cmdstatus at the same
time). But
it's not significantly more complex, so why not.


I believe the main benefit of the less-locking approach is that if
something goes wrong when two backends tried to communicate it
doesn't
prevent the rest of them from operating, because there is no
shared (and
thus locked) communication channel.


Sure, but I think it really deserves a bit more detailed explanation
of the protocol, and discussion of the expected behavior for some
basic failure types.

For example - what happens when a backend requests info, but dies
before receiving it, and the backend is reused for another
connection? Doesn't this introduce a race condition? Perhaps not,
I'm just asking.


Now explained better in code comments.

The worst thing that could happen in this case I believe is that the
requesting backend will not receive any response from the second use of
its slot due to the slot being marked as processed by the backend that
was asked first:

A:
slot->is_processed = false;
slot->is_valid = true;
/* signals backend B */;
shm_mq_read(); /* blocks */

B: /* finds the slot and prepares the response */

A: /* decides to bail out */
InvalidateCmdStatusSlot();
shm_mq_detach();
/* exits pg_cmdstatus() */

/* pg_cmdstatus() is called again */
/* initializes the slot and signals some backend */
shm_mq_read(); /* blocks */

B: /* completes preparing the response */
slot->is_processed = true;
shm_mq_send() => SHM_MQ_DETACHED
slot->is_valid = false;
/* gives up with this slot */

Now the backend that has been signaled on the second call to
pg_cmdstatus (it can be either some other backend, or the backend B
again) will not find an unprocessed slot, thus it will not try to
attach/detach the queue and the backend A will block forever.

This requires really bad timing, and the user should still be able to
interrupt the querying backend A.


I think we can't rely on the low probability that this won't happen, and 
we should not rely on people interrupting the backend. Being able to 
detect the situation and fail gracefully should be possible.


It may be possible to introduce some lock-less protocol preventing such 
situations, but it's not there at the moment. If you believe it's 
possible, you need to explain and "prove" that it's actually safe.


Otherwise we may need to introduce some basic locking - for example we 
may introduce a LWLock for each slot, and lock it with dontWait=true 
(and skip it if we couldn't lock it). This should prevent most scenarios 
where one corrupted slot blocks many processes.




In any case, the backends that are being asked to send the info will be
able to notice the problem (receiver detached early) and handle it
gracefully.


Ummm, how? Maybe I missed something?



I don't think we should mix this with monitoring of auxiliary
processes. This interface is designed for monitoring SQL queries
running in other backends, effectively "remote" EXPLAIN. But those
auxiliary processes are not processing SQL queries at all, they're
not even using regular executor ...

OTOH the ability to request this info (e.g. auxiliary process
looking at plans running in backends) seems useful, 

Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-14 Thread Shulgin, Oleksandr
>
> Well, I didn't attach the updated patch (doing that now).
>

This time for real.  Sorry, it's Monday :-p
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 32ac58f..2e3beaf 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -43,6 +43,7 @@
 #include "storage/procsignal.h"
 #include "storage/sinvaladt.h"
 #include "storage/spin.h"
+#include "utils/cmdstatus.h"
 
 
 shmem_startup_hook_type shmem_startup_hook = NULL;
@@ -139,6 +140,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, CmdStatusShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -243,6 +245,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 	ReplicationOriginShmemInit();
 	WalSndShmemInit();
 	WalRcvShmemInit();
+	CmdStatusShmemInit();
 
 	/*
 	 * Set up other modules that need some shared memory space
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 0abde43..e637be1 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -26,6 +26,7 @@
 #include "storage/shmem.h"
 #include "storage/sinval.h"
 #include "tcop/tcopprot.h"
+#include "utils/cmdstatus.h"
 
 
 /*
@@ -296,6 +297,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN))
 		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
 
+	if (CheckProcSignal(PROCSIG_CMD_STATUS_INFO))
+		HandleCmdStatusInfoInterrupt();
+
 	if (set_latch_on_sigusr1)
 		SetLatch(MyLatch);
 
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index d917af3..1a5b03c 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -67,6 +67,7 @@
 #include "tcop/pquery.h"
 #include "tcop/tcopprot.h"
 #include "tcop/utility.h"
+#include "utils/cmdstatus.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
@@ -2991,6 +2992,9 @@ ProcessInterrupts(void)
 
 	if (ParallelMessagePending)
 		HandleParallelMessages();
+
+	if (CmdStatusInfoPending)
+		ProcessCmdStatusInfoRequest();
 }
 
 
diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index 3ed0b44..2c8687c 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -18,7 +18,7 @@ endif
 # keep this list arranged alphabetically or it gets to be a mess
 OBJS = acl.o arrayfuncs.o array_expanded.o array_selfuncs.o \
 	array_typanalyze.o array_userfuncs.o arrayutils.o ascii.o \
-	bool.o cash.o char.o date.o datetime.o datum.o dbsize.o domains.o \
+	bool.o cash.o char.o cmdstatus.o date.o datetime.o datum.o dbsize.o domains.o \
 	encode.o enum.o expandeddatum.o \
 	float.o format_type.o formatting.o genfile.o \
 	geo_ops.o geo_selfuncs.o inet_cidr_ntop.o inet_net_pton.o int.o \
diff --git a/src/backend/utils/adt/cmdstatus.c b/src/backend/utils/adt/cmdstatus.c
new file mode 100644
index 000..5cc6d51
--- /dev/null
+++ b/src/backend/utils/adt/cmdstatus.c
@@ -0,0 +1,691 @@
+/*-
+ *
+ * cmdstatus.c
+ *	  Definitions for pg_cmdstatus function.
+ *
+ * Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/utils/adt/cmdstatus.c
+ *
+ *-
+ */
+
+#include "postgres.h"
+
+#include "funcapi.h"
+#include "miscadmin.h"
+
+#include "access/htup_details.h"
+#include "commands/explain.h"
+#include "lib/stringinfo.h"
+#include "storage/dsm.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/proc.h"
+#include "storage/procarray.h"
+#include "storage/procsignal.h"
+#include "storage/shm_mq.h"
+#include "storage/shm_toc.h"
+#include "tcop/dest.h"
+#include "tcop/pquery.h"
+#include "utils/builtins.h"
+#include "utils/cmdstatus.h"
+
+
+typedef enum {
+	CMD_STATUS_REQUEST_EXPLAIN = 1,
+	CMD_STATUS_REQUEST_QUERY_TEXT = 2,
+	CMD_STATUS_REQUEST_PROGRESS_TAG = 3,
+	CMD_STATUS_REQUEST_EXPLAIN_BACKTRACE = 4
+} CmdStatusInfoRequestType;
+
+#define CMD_STATUS_MAX_REQUEST CMD_STATUS_REQUEST_EXPLAIN_BACKTRACE
+
+typedef enum {
+	CMD_STATUS_RESULT_FAILURE = -1,
+	CMD_STATUS_RESULT_SUCCESS = 0,
+	CMD_STATUS_RESULT_BACKEND_IDLE,
+	CMD_STATUS_RESULT_NO_DATA
+} CmdStatusInfoResultCode;
+
+/*
+ * Each process that wants to be able to query other processes for their
+ * status registers itself in a slot in the shared memory by setting the
+ * slot's process ID.  The slots array is indexed by backend ID, so any slot
+ * can be assigned at max to one backend at any given time.
+ *
+ * In order to actually query some backend for its status, the interested
+ * process will initialize its *own* slot, create a DSM segment and initialize
+ * a shared 

Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-14 Thread Pavel Stehule
2015-09-14 18:46 GMT+02:00 Shulgin, Oleksandr 
:

> On Mon, Sep 14, 2015 at 3:09 PM, Shulgin, Oleksandr <
> oleksandr.shul...@zalando.de> wrote:
>
>> On Mon, Sep 14, 2015 at 2:11 PM, Tomas Vondra <
>> tomas.von...@2ndquadrant.com> wrote:
>>
>>>
 Now the backend that has been signaled on the second call to
 pg_cmdstatus (it can be either some other backend, or the backend B
 again) will not find an unprocessed slot, thus it will not try to
 attach/detach the queue and the backend A will block forever.

 This requires a really bad timing and the user should be able to
 interrupt the querying backend A still.

>>>
>>> I think we can't rely on the low probability that this won't happen, and
>>> we should not rely on people interrupting the backend. Being able to detect
>>> the situation and fail gracefully should be possible.
>>>
>>> It may be possible to introduce some lock-less protocol preventing such
>>> situations, but it's not there at the moment. If you believe it's possible,
>>> you need to explain and "prove" that it's actually safe.
>>>
>>> Otherwise we may need to introduce some basic locking - for example we
>>> may introduce a LWLock for each slot, and lock it with dontWait=true (and
>>> skip it if we couldn't lock it). This should prevent most scenarios where
>>> one corrupted slot blocks many processes.
>>
>>
>> OK, I will revisit this part then.
>>
>
> I have a radical proposal to remove the need for locking: make the
> CmdStatusSlot struct consist of a mere dsm_handle and move all the required
> metadata like sender_pid, request_type, etc. into the shared memory segment
> itself.
>
> If we allow only the requesting process to update the slot (that is,
> the handle value itself), this removes the need for locking between sender
> and receiver.
>
> The sender will walk through the slots looking for a non-zero dsm handle
> (according to dsm_create() implementation 0 is considered an invalid
> handle), and if it finds a valid one, it will attach and look inside, to
> check if it's destined for this process ID.  At first that might sound
> strange, but I would expect 99% of the time that the only valid slot would
> be for the process that has been just signaled.
>
> The sender process will then calculate the response message, update the
> result_code in the shared memory segment and finally send the message
> through the queue.  If the receiver has since detached we get a detached
> result code and bail out.
>
> Clearing the slot after receiving the message should be the requesting
> process' responsibility.  This way the receiver only writes to the slot and
> the sender only reads from it.
>
> By the way, is it safe to assume atomic read/writes of dsm_handle
> (uint32)?  I would be surprised if not.
>

I don't see any reason why it should not work - only a few processes will be
waiting for data, so the wasted attach/detach shm operations will not amount
to much.

Pavel


>
> --
> Alex
>
>


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-14 Thread Shulgin, Oleksandr
On Mon, Sep 14, 2015 at 3:09 PM, Shulgin, Oleksandr <
oleksandr.shul...@zalando.de> wrote:

> On Mon, Sep 14, 2015 at 2:11 PM, Tomas Vondra <
> tomas.von...@2ndquadrant.com> wrote:
>
>>
>>> Now the backend that has been signaled on the second call to
>>> pg_cmdstatus (it can be either some other backend, or the backend B
>>> again) will not find an unprocessed slot, thus it will not try to
>>> attach/detach the queue and the backend A will block forever.
>>>
>>> This requires a really bad timing and the user should be able to
>>> interrupt the querying backend A still.
>>>
>>
>> I think we can't rely on the low probability that this won't happen, and
>> we should not rely on people interrupting the backend. Being able to detect
>> the situation and fail gracefully should be possible.
>>
>> It may be possible to introduce some lock-less protocol preventing such
>> situations, but it's not there at the moment. If you believe it's possible,
>> you need to explain and "prove" that it's actually safe.
>>
>> Otherwise we may need to introduce some basic locking - for example we
>> may introduce a LWLock for each slot, and lock it with dontWait=true (and
>> skip it if we couldn't lock it). This should prevent most scenarios where
>> one corrupted slot blocks many processes.
>
>
> OK, I will revisit this part then.
>

I have a radical proposal to remove the need for locking: make the
CmdStatusSlot struct consist of a mere dsm_handle and move all the required
metadata like sender_pid, request_type, etc. into the shared memory segment
itself.

If we allow only the requesting process to update the slot (that is, the
handle value itself), this removes the need for locking between sender and
receiver.

The sender will walk through the slots looking for a non-zero dsm handle
(according to dsm_create() implementation 0 is considered an invalid
handle), and if it finds a valid one, it will attach and look inside, to
check if it's destined for this process ID.  At first that might sound
strange, but I would expect 99% of the time that the only valid slot would
be for the process that has been just signaled.

The sender process will then calculate the response message, update the
result_code in the shared memory segment and finally send the message
through the queue.  If the receiver has since detached we get a detached
result code and bail out.

Clearing the slot after receiving the message should be the requesting
process' responsibility.  This way the receiver only writes to the slot and
the sender only reads from it.

By the way, is it safe to assume atomic read/writes of dsm_handle
(uint32)?  I would be surprised if not.

--
Alex


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-14 Thread Shulgin, Oleksandr
On Sat, Sep 12, 2015 at 11:50 AM, Tomas Vondra  wrote:

> Hi,
>
> I did a quick initial review of this patch today, so here are my comments
> so far:
>

Hi Tomas,

First of all, thanks for the review!

- ipcs.c should include utils/cmdstatus.h (the compiler complains
>   about implicit declaration of two functions)
>

Correct, though the file name is ipci.c.

- Attempts to get plan for simple insert queries like this
>
>   INSERT INTO x SELECT * FROM x;
>
>   end with a segfault, because ActivePortal->queryDesc is 0x0 for this
>   query. Needs investigation.
>

Yes, I've hit the same problem after submitting the latest version of the
patch.  For now I've just added a check for queryDesc being not NULL, but I
guess the top of the current_query_stack might contain something useful.
Something I need to try.

- The lockless approach seems fine to me, although I think the fear
>   of performance issues is a bit moot (I don't think we expect large
>   number of processes calling pg_cmdstatus at the same time). But
>   it's not significantly more complex, so why not.
>

I believe the main benefit of the less-locking approach is that if
something goes wrong when two backends try to communicate, it doesn't
prevent the rest of them from operating, because there is no shared (and
thus locked) communication channel.

- The patch contains pretty much no documentation, both comments
>   at the code level and user docs. The lack of user docs is not that
>   a big deal at this point (although the patch seems to be mature
>   enough, the user-level API will likely change).
>
>   The lack of code comments is more serious, as it makes the review
>   somewhat more difficult. For example it'd be very nice to document
>   the contract for the lock-less interface.
>

I will add the code comments.  The user docs can wait until we decide on
the interface, I think.

- I agree that pg_cmdstatus() is not the best API. Having something
>   like EXPLAIN PID would be nice, but it does not really work for
>   all the request types (just CMD_STATUS_REQUEST_EXPLAIN). Maybe
>   there's not a single API for all cases, i.e. we should use EXPLAIN
>   PID for one case and invent something different for the other?
>

I can think of something like:

EXPLAIN [ ( option [, ...] ) ] PROCESS <pid>;

where option is extended with:

  QUERY
  PROGRESS
  BACKTRACE

in addition to the usual ANALYZE, VERBOSE, FORMAT, etc.

- Is there a particular reason why we allocate slots for auxiliary
>   processes and not just for backends (NumBackends)? Do we expect those
>   auxiliary processes to ever use this API?
>

If we extend the interface to a more general one, there still might be some
space for querying the status of the checkpointer or bgwriter.

- CleanupCmdStatusSlot seems needlessly complicated. I don't quite see
>   the need for the second argument, or the internal slot variable. Why
>   not to simply use the MyCmdStatusSlot directly?
>

Good point.

- I also don't quite understand why we need to track css_pid for the
>   slot? In what scenario will this actually matter?
>

I think it's being only used for error reporting and could help in
debugging, but for now that's it.

- While being able to get EXPLAIN from the running process is nice,
>   I'm especially interested in getting EXPLAIN ANALYZE to get insight
>   into the progress of the execution. There are other ways to get the
>   EXPLAIN, e.g. by opening a different connection and actually running
>   it (sure, the plan might have changed since then), but currently
>   there's no way to get insight into the progress.
>
>   From the thread I get the impression that Oleksandr also finds this
>   useful - correct? What are the plans in this direction?
>
>   ISTM we need at least two things for that to work:
>
>   (a) Ability to enable instrumentation on all queries (effectively
>   what auto_explain allows), otherwise we can't get EXPLAIN ANALYZE
>   on the queries later. But auto_explain is an extension, so that
>   does not seem like a good match if this is supposed to be in core.
>   In that case a separate GUC seems appropriate.
>
>   (b) Being able to run the InstrEnd* methods repeatedly - the initial
>   message in this thread mentions issues with InstrEndLoop for
>   example. So perhaps this is non-trivial.
>

I was able to make this work with a simple change to InstrEndLoop and the
callers.  Basically, adding a bool parameter in_explain and passing an
appropriate value.  I guess that's not the best approach, but it appears to
work.

Adding a GUC to enable instrumentation sounds reasonable.

Do you believe it makes sense to add instrumentation support in this same
patch or better focus on making the simplest thing work first?

- And finally, I think we should really support all existing EXPLAIN
>   formats, not just text. We need to support the other formats (yaml,
>   json, xml) if we want to use the EXPLAIN PID approach, and it also

Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-14 Thread Pavel Stehule
2015-09-14 10:23 GMT+02:00 Shulgin, Oleksandr:

>
>
>
> I can think of something like:
>
> EXPLAIN [ ( option [, ...] ) ] PROCESS <pid>;
>
> where option is extended with:
>
>   QUERY
>   PROGRESS
>   BACKTRACE
>
> in addition to the usual ANALYZE, VERBOSE, FORMAT, etc.
>
>
It can work

Regards

Pavel


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-14 Thread Tomas Vondra



On 09/14/2015 10:23 AM, Shulgin, Oleksandr wrote:

On Sat, Sep 12, 2015 at 11:50 AM, Tomas Vondra wrote:

...

- Attempts to get plan for simple insert queries like this

   INSERT INTO x SELECT * FROM x;

   end with a segfault, because ActivePortal->queryDesc is 0x0 for this
   query. Needs investigation.


Yes, I've hit the same problem after submitting the latest version of
the patch.  For now I've just added a check for queryDesc being not
NULL, but I guess the top of the current_query_stack might contain
something useful.  Something I need to try.


Well, the thing is we're able to do EXPLAIN on those queries, and IIRC 
auto_explain can log them too. So perhaps look into the hooks where they 
take the queryDesc in those cases - it has to be available somewhere.



- The lockless approach seems fine to me, although I think the fear
   of performance issues is a bit moot (I don't think we expect large
   number of processes calling pg_cmdstatus at the same time). But
   it's not significantly more complex, so why not.


I believe the main benefit of the less-locking approach is that if
something goes wrong when two backends try to communicate, it doesn't
prevent the rest of them from operating, because there is no shared (and
thus locked) communication channel.


Sure, but I think it really deserves a bit more detailed explanation of 
the protocol, and discussion of the expected behavior for some basic 
failure types.


For example - what happens when a backend requests info, but dies before 
receiving it, and the backend is reused for another connection? Doesn't 
this introduce a race condition? Perhaps not, I'm just asking.




- The patch contains pretty much no documentation, both comments
   at the code level and user docs. The lack of user docs is not that
   a big deal at this point (although the patch seems to be mature
   enough, the user-level API will likely change).

   The lack of code comments is more serious, as it makes the review
   somewhat more difficult. For example it'd be very nice to document
   the contract for the lock-less interface.


I will add the code comments.  The user docs can wait until we decide
on the interface, I think.


Agreed, although I think having rudimentary user documentation would be 
useful for the reviewers - a summary of the goals that are a bit 
scattered over the e-mail thread.



- I agree that pg_cmdstatus() is not the best API. Having something
   like EXPLAIN PID would be nice, but it does not really work for
   all the request types (just CMD_STATUS_REQUEST_EXPLAIN). Maybe
   there's not a single API for all cases, i.e. we should use EXPLAIN
   PID for one case and invent something different for the other?


I can think of something like:

EXPLAIN [ ( option [, ...] ) ] PROCESS <pid>;

where option is extended with:

   QUERY
   PROGRESS
   BACKTRACE

in addition to the usual ANALYZE, VERBOSE, FORMAT, etc.


Seems OK. I'm not quite sure what PROGRESS is - I assume it's the same 
thing as ANALYZE? Why not use that keyword, then?




- Is there a particular reason why we allocate slots for auxiliary
   processes and not just for backends (NumBackends)? Do we expect those
   auxiliary processes to ever use this API?


If we extend the interface to a more general one, there still might be
some space for querying the status of the checkpointer or bgwriter.


I don't think we should mix this with monitoring of auxiliary processes. 
This interface is aimed at monitoring SQL queries running in other 
backends, effectively a "remote" EXPLAIN. But those auxiliary processes 
are not processing SQL queries at all; they're not even using the regular 
executor ...


OTOH the ability to request this info (e.g. auxiliary process looking at 
plans running in backends) seems useful, so I'm ok with tuple slots for 
auxiliary processes.




- CleanupCmdStatusSlot seems needlessly complicated. I don't quite see
   the need for the second argument, or the internal slot variable. Why
   not to simply use the MyCmdStatusSlot directly?


Good point.

- I also don't quite understand why we need to track css_pid for the
   slot? In what scenario will this actually matter?


I think it's being only used for error reporting and could help in
debugging, but for now that's it.


I recommend getting rid of it, unless we have a clear idea of how to use 
it. Otherwise I envision we won't really keep it updated properly 
(because it's "just for debugging"), and then one day someone actually 
starts using it and gets bitten by it.




- While being able to get EXPLAIN from the running process is nice,
   I'm especially interested in getting EXPLAIN ANALYZE to get insight
   into the progress of the execution. There are other ways to get the
   EXPLAIN, e.g. by opening a different 

Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-12 Thread Pavel Stehule
Hi

Thank you for precious check.

2015-09-12 11:50 GMT+02:00 Tomas Vondra :

> Hi,
>
> I did a quick initial review of this patch today, so here are my comments
> so far:
>
> - ipcs.c should include utils/cmdstatus.h (the compiler complains
>   about implicit declaration of two functions)
>
> - Attempts to get plan for simple insert queries like this
>
>   INSERT INTO x SELECT * FROM x;
>
>   end with a segfault, because ActivePortal->queryDesc is 0x0 for this
>   query. Needs investigation.
>
> - The lockless approach seems fine to me, although I think the fear
>   of performance issues is a bit moot (I don't think we expect large
>   number of processes calling pg_cmdstatus at the same time). But
>   it's not significantly more complex, so why not.
>
> - The patch contains pretty much no documentation, both comments
>   at the code level and user docs. The lack of user docs is not that
>   a big deal at this point (although the patch seems to be mature
>   enough, the user-level API will likely change).
>
>   The lack of code comments is more serious, as it makes the review
>   somewhat more difficult. For example it'd be very nice to document
>   the contract for the lock-less interface.
>
> - I agree that pg_cmdstatus() is not the best API. Having something
>   like EXPLAIN PID would be nice, but it does not really work for
>   all the request types (just CMD_STATUS_REQUEST_EXPLAIN). Maybe
>   there's not a single API for all cases, i.e. we should use EXPLAIN
>   PID for one case and invent something different for the other?
>

I used this simple interface because it was the fastest way to get a
prototype. Most of the complexity in this patch is in the interprocess
communication and in the executor hooking. The UI can be designed later
very quickly.

There is consensus about EXPLAIN PID; for the others (status, query) I
don't want to introduce a new keyword, so those will be accessed via
functions. But we can introduce psql commands:

\qpid - query pid, and \spid - status pid


> - Is there a particular reason why we allocate slots for auxiliary
>   processes and not just for backends (NumBackends)? Do we expect those
>   auxiliary processes to ever use this API?
>

The introduced interprocess communication (we talked about this possibility
a week ago) can be used by any diagnostic extension - so aux processes can
be interesting too.


>
> - CleanupCmdStatusSlot seems needlessly complicated. I don't quite see
>   the need for the second argument, or the internal slot variable. Why
>   not to simply use the MyCmdStatusSlot directly?
>
> - I also don't quite understand why we need to track css_pid for the
>   slot? In what scenario will this actually matter?
>
>
I have to recheck it - but this is a safeguard against a different process
reusing the same backendId.


> - While being able to get EXPLAIN from the running process is nice,
>   I'm especially interested in getting EXPLAIN ANALYZE to get insight
>   into the progress of the execution. There are other ways to get the
>   EXPLAIN, e.g. by opening a different connection and actually running
>   it (sure, the plan might have changed since then), but currently
>   there's no way to get insight into the progress.
>

It could be a pretty nice feature. I thought about it - as a next step of
this feature. But I have no idea how complex a task it is - maybe not too
much.


>
>   From the thread I get the impression that Oleksandr also finds this
>   useful - correct? What are the plans in this direction?
>
>   ISTM we need at least two things for that to work:
>
>   (a) Ability to enable instrumentation on all queries (effectively
>   what auto_explain allows), otherwise we can't get EXPLAIN ANALYZE
>   on the queries later. But auto_explain is an extension, so that
>   does not seem like a good match if this is supposed to be in core.
>   In that case a separate GUC seems appropriate.
>

If the described functionality is implemented, then the auto_explain
extension will be a very thin envelope - I don't see any reason why it
should not be integrated into core.


>
>   (b) Being able to run the InstrEnd* methods repeatedly - the initial
>   message in this thread mentions issues with InstrEndLoop for
>   example. So perhaps this is non-trivial.
>
> - And finally, I think we should really support all existing EXPLAIN
>   formats, not just text. We need to support the other formats (yaml,
>   json, xml) if we want to use the EXPLAIN PID approach, and it also
>   makes the plans easier to process by additional tools.
>

Surely - there is no reason why not.

Regards

Pavel


>
>
> regards
>
> --
> Tomas Vondra   http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-12 Thread Tomas Vondra

Hi,

I did a quick initial review of this patch today, so here are my 
comments so far:


- ipcs.c should include utils/cmdstatus.h (the compiler complains
  about implicit declaration of two functions)

- Attempts to get plan for simple insert queries like this

  INSERT INTO x SELECT * FROM x;

  end with a segfault, because ActivePortal->queryDesc is 0x0 for this
  query. Needs investigation.

- The lockless approach seems fine to me, although I think the fear
  of performance issues is a bit moot (I don't think we expect large
  number of processes calling pg_cmdstatus at the same time). But
  it's not significantly more complex, so why not.

- The patch contains pretty much no documentation, both comments
  at the code level and user docs. The lack of user docs is not that
  a big deal at this point (although the patch seems to be mature
  enough, the user-level API will likely change).

  The lack of code comments is more serious, as it makes the review
  somewhat more difficult. For example it'd be very nice to document
  the contract for the lock-less interface.

- I agree that pg_cmdstatus() is not the best API. Having something
  like EXPLAIN PID would be nice, but it does not really work for
  all the request types (just CMD_STATUS_REQUEST_EXPLAIN). Maybe
  there's not a single API for all cases, i.e. we should use EXPLAIN
  PID for one case and invent something different for the other?

- Is there a particular reason why we allocate slots for auxiliary
  processes and not just for backends (NumBackends)? Do we expect those
  auxiliary processes to ever use this API?

- CleanupCmdStatusSlot seems needlessly complicated. I don't quite see
  the need for the second argument, or the internal slot variable. Why
  not to simply use the MyCmdStatusSlot directly?

- I also don't quite understand why we need to track css_pid for the
  slot? In what scenario will this actually matter?

- While being able to get EXPLAIN from the running process is nice,
  I'm especially interested in getting EXPLAIN ANALYZE to get insight
  into the progress of the execution. There are other ways to get the
  EXPLAIN, e.g. by opening a different connection and actually running
  it (sure, the plan might have changed since then), but currently
  there's no way to get insight into the progress.

  From the thread I get the impression that Oleksandr also finds this
  useful - correct? What are the plans in this direction?

  ISTM we need at least two things for that to work:

  (a) Ability to enable instrumentation on all queries (effectively
  what auto_explain allows), otherwise we can't get EXPLAIN ANALYZE
  on the queries later. But auto_explain is an extension, so that
  does not seem like a good match if this is supposed to be in core.
  In that case a separate GUC seems appropriate.

  (b) Being able to run the InstrEnd* methods repeatedly - the initial
  message in this thread mentions issues with InstrEndLoop for
  example. So perhaps this is non-trivial.

- And finally, I think we should really support all existing EXPLAIN
  formats, not just text. We need to support the other formats (yaml,
  json, xml) if we want to use the EXPLAIN PID approach, and it also
  makes the plans easier to process by additional tools.


regards

--
Tomas Vondra   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-09 Thread Pavel Stehule
2015-09-08 18:53 GMT+02:00 Shulgin, Oleksandr:

> On Tue, Sep 8, 2015 at 11:49 AM, Shulgin, Oleksandr <
> oleksandr.shul...@zalando.de> wrote:
>
>>
>> >> The real problem could be if the process that was signaled to connect
>> to the message queue never handles the interrupt, and we keep waiting
>> forever in shm_mq_receive().  We could add a timeout parameter or just let
>> the user cancel the call: send a cancellation request, use
>> pg_cancel_backend() or set statement_timeout before running this.
>> >
>> > This is valid question - for begin we can use a statement_timeout and
>> we don't need to design some special (if you don't hold some important
>> lock).
>> > My example (the code has prototype quality) is little bit longer, but
>> it work without global lock - the requester doesn't block any other
>>
>> I'll update the commitfest patch to use this technique.
>>
>
> Please find attached v4.
>

It is better

Two notices:

1. The communication mechanism can be used more widely than only for this
purpose. We can introduce a SendInfoHook - and it can be used for any
custom probes - memory, cpu, ...

2. With your support for explain of nested queries we have all what we need
for integration auto_explain to core.

Regards

Pavel


>
> --
> Alex
>
>


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-09 Thread Shulgin, Oleksandr
On Wed, Sep 9, 2015 at 8:36 AM, Pavel Stehule 
wrote:

>
>> Please find attached v4.
>>
>
> It is better
>

One important thing to notice, which probably deserves a code comment, is
that any modification of the slot fields done by the info sender backend
must be done before actually sending the message via the shared memory
queue (or detaching it, if there's nothing to be sent).  Otherwise it is
possible that the is_processed flag will be set on a slot that has already
been reused by the receiving backend.

I could hit that problem rather easily while doing

select pg_cmdstatus(pid,$1) from pg_stat_activity where
pid<>pg_backend_pid() \watch 1

and running pgbench with >= 8 connections in the background.  The observed
effect is that after a few successful runs, the pg_cmdstatus() never
returns because the next signaled backend loops through the slots and
doesn't find an unprocessed one (because it was marked as processed by the
backend that was responding just before it).

Two notices:
>
> 1. The communication mechanism can be used more widely than only for this
> purpose. We can introduce a SendInfoHook - and it can be used for any
> custom probes - memory, cpu, ...
>

Not sure if for CPU you can get any more insight than an external tool like
top(1) will provide.  For the memory, it might be indeed useful in some way
(memory context inspection?)

So there's certainly space for generalization.  Rename it to
pg_backend_info()?

2. With your support for explain of nested queries we have all what we need
> for integration auto_explain to core.
>

Well, I don't quite see how that follows.  What I'm really after is
bringing instrumentation support to this, so that not only the plan could
be extracted, but also the rows/loops/times accumulated thus far during
the query run.

--
Alex


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-09 Thread Pavel Stehule
Two notices:
>
1. The communication mechanism can be used more widely than only for this
> purpose. We can introduce a SendInfoHook - and it can be used for any
> custom probes - memory, cpu, ...
>

Not sure if for CPU you can get any more insight than an external tool like
top(1) will provide.  For the memory, it might be indeed useful in some way
(memory context inspection?)

CPU is probably nonsense - sorry - but there are many other possibilities
for how to use it - simply checking some internal state of extensions
without log processing or without gdb, etc.


>
> So there's certainly a space for generalization.  Rename it as
> pg_backend_info()?
>

It is a good name, so why not?

>
> 2. With your support for explain of nested queries we have all what we
>> need for integration auto_explain to core.
>>
>
> Well, I don't quite see how that follows.  What I'm really after is
> bringing instrumentation support to this, so that not only plan could be
> extracted, but also the rows/loops/times accumulated thus far during the
> query run.
>

It is a possible, but not a necessary step - and it is a premature
question at this moment.

Regards

Pavel


>
> --
> Alex
>
>


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-09 Thread Shulgin, Oleksandr
On Wed, Sep 9, 2015 at 10:22 AM, Shulgin, Oleksandr <
oleksandr.shul...@zalando.de> wrote:

> On Wed, Sep 9, 2015 at 8:36 AM, Pavel Stehule 
> wrote:
>
>>
>>> Please find attached v4.
>>>
>>
>> It is better
>>
>
> One important thing to notice, which probably deserves a code comment, is
> that any modification of the slot fields done by the info sender backend
> must be done before actually sending the message via the shared memory
> queue (or detaching it, if there's nothing to be sent).  Otherwise it is
> possible that the is_processed flag will be set on a slot that has already
> been reused by the receiving backend.
>

Sorry, the latest version did contain some debugging messages I didn't
intend to send.  I've removed them and also added a comment about the above
synchronization problem.

--
Alex
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 32ac58f..a2b757c 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -139,6 +139,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, CmdStatusShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -243,6 +244,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 	ReplicationOriginShmemInit();
 	WalSndShmemInit();
 	WalRcvShmemInit();
+	CmdStatusShmemInit();
 
 	/*
 	 * Set up other modules that need some shared memory space
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 0abde43..e637be1 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -26,6 +26,7 @@
 #include "storage/shmem.h"
 #include "storage/sinval.h"
 #include "tcop/tcopprot.h"
+#include "utils/cmdstatus.h"
 
 
 /*
@@ -296,6 +297,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN))
 		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
 
+	if (CheckProcSignal(PROCSIG_CMD_STATUS_INFO))
+		HandleCmdStatusInfoInterrupt();
+
 	if (set_latch_on_sigusr1)
 		SetLatch(MyLatch);
 
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index d917af3..1a5b03c 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -67,6 +67,7 @@
 #include "tcop/pquery.h"
 #include "tcop/tcopprot.h"
 #include "tcop/utility.h"
+#include "utils/cmdstatus.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
@@ -2991,6 +2992,9 @@ ProcessInterrupts(void)
 
 	if (ParallelMessagePending)
 		HandleParallelMessages();
+
+	if (CmdStatusInfoPending)
+		ProcessCmdStatusInfoRequest();
 }
 
 
diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index 3ed0b44..2c8687c 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -18,7 +18,7 @@ endif
 # keep this list arranged alphabetically or it gets to be a mess
 OBJS = acl.o arrayfuncs.o array_expanded.o array_selfuncs.o \
 	array_typanalyze.o array_userfuncs.o arrayutils.o ascii.o \
-	bool.o cash.o char.o date.o datetime.o datum.o dbsize.o domains.o \
+	bool.o cash.o char.o cmdstatus.o date.o datetime.o datum.o dbsize.o domains.o \
 	encode.o enum.o expandeddatum.o \
 	float.o format_type.o formatting.o genfile.o \
 	geo_ops.o geo_selfuncs.o inet_cidr_ntop.o inet_net_pton.o int.o \
diff --git a/src/backend/utils/adt/cmdstatus.c b/src/backend/utils/adt/cmdstatus.c
new file mode 100644
index 000..40df6e1
--- /dev/null
+++ b/src/backend/utils/adt/cmdstatus.c
@@ -0,0 +1,640 @@
+/*-
+ *
+ * cmdstatus.c
+ *	  Definitions for pg_cmdstatus function.
+ *
+ * Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/utils/adt/cmdstatus.c
+ *
+ *-
+ */
+
+#include "postgres.h"
+
+#include "funcapi.h"
+#include "miscadmin.h"
+
+#include "access/htup_details.h"
+#include "commands/explain.h"
+#include "lib/stringinfo.h"
+#include "storage/dsm.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/proc.h"
+#include "storage/procarray.h"
+#include "storage/shm_mq.h"
+#include "storage/shm_toc.h"
+#include "tcop/dest.h"
+#include "tcop/pquery.h"
+#include "utils/builtins.h"
+#include "utils/cmdstatus.h"
+
+
+typedef enum {
+	CMD_STATUS_REQUEST_EXPLAIN = 1,
+	CMD_STATUS_REQUEST_QUERY_TEXT = 2,
+	CMD_STATUS_REQUEST_PROGRESS_TAG = 3,
+	CMD_STATUS_REQUEST_EXPLAIN_BACKTRACE = 4
+} CmdStatusInfoRequestType;
+
+#define CMD_STATUS_MAX_REQUEST CMD_STATUS_REQUEST_EXPLAIN_BACKTRACE
+
+typedef enum {
+	CMD_STATUS_RESULT_FAILURE = -1,
+	CMD_STATUS_RESULT_SUCCESS = 0,
+	CMD_STATUS_RESULT_BACKEND_IDLE,
+	

Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-08 Thread Shulgin, Oleksandr
On Tue, Sep 8, 2015 at 6:58 AM, Pavel Stehule 
wrote:
>>
>> But you will still lock on the slots list to find an unused one.  How is
>> that substantially different from what I'm doing?
>
> It is not necessary - you can use a technique similar to what PGPROC
> does. I am sending a "lock free" demo.
>
> I am not afraid of locks - short locks, where the scope and time are
> limited. But there have been lots of bugs, and fixes named "make some ...
> interruptible", and that is the reason why I prefer the typical design
> for working with shared memory.

Thanks, this is really helpful!  The key difference is that every backend
has a dedicated slot, so there's no need to search for a free one, which
would again incur locking.

>> Well, we are talking about hundreds to thousands of bytes per plan in
>> total.  And if my reading of the shm_mq implementation is correct, if the
>> message fits into the shared memory buffer, the receiver gets a direct
>> pointer to the shared memory, with no extra allocation/copy to
>> process-local memory.  So this can actually be a win.
>
> you have to account for signals and interprocess communication; the
> cost of memory allocation is not everything.

Sure.  Anyway, we're talking about only kilobytes being sent in this case,
so the whole performance discussion is rather moot.

>> The real problem could be if the process that was signaled to connect to
>> the message queue never handles the interrupt, and we keep waiting forever
>> in shm_mq_receive().  We could add a timeout parameter or just let the user
>> cancel the call: send a cancellation request, use pg_cancel_backend() or
>> set statement_timeout before running this.
>
> This is a valid question - to begin with, we can use statement_timeout and
> we don't need to design something special (if you don't hold some important
> lock).
> My example (the code has prototype quality) is a little bit longer, but it
> works without a global lock - the requester doesn't block any others.

I'll update the commitfest patch to use this technique.

--
Alex


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-08 Thread Shulgin, Oleksandr
On Tue, Sep 8, 2015 at 11:49 AM, Shulgin, Oleksandr <
oleksandr.shul...@zalando.de> wrote:

>
> >> The real problem could be if the process that was signaled to connect
> to the message queue never handles the interrupt, and we keep waiting
> forever in shm_mq_receive().  We could add a timeout parameter or just let
> the user cancel the call: send a cancellation request, use
> pg_cancel_backend() or set statement_timeout before running this.
> >
> > This is valid question - for begin we can use a statement_timeout and we
> don't need to design some special (if you don't hold some important lock).
> > My example (the code has prototype quality) is little bit longer, but it
> work without global lock - the requester doesn't block any other
>
> I'll update the commitfest patch to use this technique.
>

Please find attached v4.

--
Alex
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 32ac58f..a2b757c 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -139,6 +139,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, CmdStatusShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -243,6 +244,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 	ReplicationOriginShmemInit();
 	WalSndShmemInit();
 	WalRcvShmemInit();
+	CmdStatusShmemInit();
 
 	/*
 	 * Set up other modules that need some shared memory space
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 0abde43..e637be1 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -26,6 +26,7 @@
 #include "storage/shmem.h"
 #include "storage/sinval.h"
 #include "tcop/tcopprot.h"
+#include "utils/cmdstatus.h"
 
 
 /*
@@ -296,6 +297,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN))
 		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
 
+	if (CheckProcSignal(PROCSIG_CMD_STATUS_INFO))
+		HandleCmdStatusInfoInterrupt();
+
 	if (set_latch_on_sigusr1)
 		SetLatch(MyLatch);
 
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index d917af3..1a5b03c 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -67,6 +67,7 @@
 #include "tcop/pquery.h"
 #include "tcop/tcopprot.h"
 #include "tcop/utility.h"
+#include "utils/cmdstatus.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
@@ -2991,6 +2992,9 @@ ProcessInterrupts(void)
 
 	if (ParallelMessagePending)
 		HandleParallelMessages();
+
+	if (CmdStatusInfoPending)
+		ProcessCmdStatusInfoRequest();
 }
 
 
diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index 3ed0b44..2c8687c 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -18,7 +18,7 @@ endif
 # keep this list arranged alphabetically or it gets to be a mess
 OBJS = acl.o arrayfuncs.o array_expanded.o array_selfuncs.o \
 	array_typanalyze.o array_userfuncs.o arrayutils.o ascii.o \
-	bool.o cash.o char.o date.o datetime.o datum.o dbsize.o domains.o \
+	bool.o cash.o char.o cmdstatus.o date.o datetime.o datum.o dbsize.o domains.o \
 	encode.o enum.o expandeddatum.o \
 	float.o format_type.o formatting.o genfile.o \
 	geo_ops.o geo_selfuncs.o inet_cidr_ntop.o inet_net_pton.o int.o \
diff --git a/src/backend/utils/adt/cmdstatus.c b/src/backend/utils/adt/cmdstatus.c
new file mode 100644
index 0000000..c17de9d
--- /dev/null
+++ b/src/backend/utils/adt/cmdstatus.c
@@ -0,0 +1,647 @@
+/*-------------------------------------------------------------------------
+ *
+ * cmdstatus.c
+ *	  Definitions for pg_cmdstatus function.
+ *
+ * Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/utils/adt/cmdstatus.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "funcapi.h"
+#include "miscadmin.h"
+
+#include "access/htup_details.h"
+#include "commands/explain.h"
+#include "lib/stringinfo.h"
+#include "storage/dsm.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/proc.h"
+#include "storage/procarray.h"
+#include "storage/shm_mq.h"
+#include "storage/shm_toc.h"
+#include "tcop/dest.h"
+#include "tcop/pquery.h"
+#include "utils/builtins.h"
+#include "utils/cmdstatus.h"
+
+
+typedef enum {
+	CMD_STATUS_REQUEST_EXPLAIN = 1,
+	CMD_STATUS_REQUEST_QUERY_TEXT = 2,
+	CMD_STATUS_REQUEST_PROGRESS_TAG = 3,
+	CMD_STATUS_REQUEST_EXPLAIN_BACKTRACE = 4
+} CmdStatusInfoRequestType;
+
+#define CMD_STATUS_MAX_REQUEST CMD_STATUS_REQUEST_EXPLAIN_BACKTRACE
+
+typedef enum {
+	CMD_STATUS_RESULT_FAILURE = -1,
+	CMD_STATUS_RESULT_SUCCESS = 0,
+	CMD_STATUS_RESULT_BACKEND_IDLE,
+	

Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-07 Thread Shulgin, Oleksandr
On Fri, Sep 4, 2015 at 6:11 AM, Pavel Stehule 
wrote:

> Sorry, but I still don't see how the slots help this issue - could you
>> please elaborate?
>>
> with slots (or something similar) there is no globally locked resource. If
> I have time at the weekend I'll try to write some prototype.
>

But you will still lock on the slots list to find an unused one.  How is
that substantially different from what I'm doing?

> >> Other smaller issues:
>> >>
>> >> * probably sending line by line is useless - shm_mq_send can pass
>> bigger data when nowait = false
>>
>> I'm not sending it like that because of the message size - I just find it
>> more convenient. If you think it can be problematic, it's easy to do this as
>> before, by splitting lines on the receiving side.
>>
> Yes, the shm queue sends data immediately - so slicing on the sender side
> generates more interprocess communication
>

Well, we are talking about hundreds to thousands of bytes per plan in total.
And if my reading of the shm_mq implementation is correct, when the message
fits into the shared memory buffer, the receiver gets a direct pointer into
shared memory, with no extra allocation/copy to process-local memory.  So
this can actually be a win.
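For illustration, the no-copy case can be modeled with a toy single-producer/single-consumer buffer.  This is a deliberate simplification, not the real shm_mq layout: the point is only that when a message lies contiguously in the buffer, the receiver can be handed a pointer into the buffer itself rather than a copy.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define QBUF_SIZE 8192

/* Toy single-producer/single-consumer message buffer (not the real shm_mq). */
typedef struct
{
	size_t		write_off;
	size_t		read_off;
	char		data[QBUF_SIZE];
} ToyQueue;

static int
toy_send(ToyQueue *q, const void *msg, uint32_t len)
{
	if (q->write_off + sizeof(len) + len > QBUF_SIZE)
		return -1;				/* full: a real queue would wrap or block */
	memcpy(q->data + q->write_off, &len, sizeof(len));
	memcpy(q->data + q->write_off + sizeof(len), msg, len);
	q->write_off += sizeof(len) + len;
	return 0;
}

/*
 * Receive the next message.  Because the message is contiguous in the
 * buffer, *msgp points into the buffer itself: no copy into receiver-local
 * memory is needed.
 */
static int
toy_receive(ToyQueue *q, const char **msgp, uint32_t *lenp)
{
	if (q->read_off >= q->write_off)
		return -1;				/* empty */
	memcpy(lenp, q->data + q->read_off, sizeof(*lenp));
	*msgp = q->data + q->read_off + sizeof(*lenp);
	q->read_off += sizeof(*lenp) + *lenp;
	return 0;
}
```

The real shm_mq additionally handles ring wraparound, partial reads and concurrent access; a message that wraps around the end of the ring is exactly the case where a local copy becomes necessary.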

> >> * pg_usleep(1000L); - it is related to single point resource
>>
>> But not a highly concurrent one.
>>
> I believe it is not necessary - the waiting (sleeping) can happen deeper,
> in the reading from the queue - the code will be cleaner
>

The only way I expect this line to be reached is when a concurrent
pg_cmdstatus() call is in progress: the receiving backend has set the
target_pid and has created the queue, released the lock and now waits to
read something from shm_mq.  So a backend that also tries to use this
communication channel can obtain the LWLock, check whether the channel is
currently in use, and fail; it then needs to check again, but doing that in
a tight loop would put a load on the CPU, so there's a small sleep.
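This claim-and-retry protocol can be sketched in self-contained C.  A pthread mutex stands in for the LWLock and plain pid fields for the shared struct; this is an illustration of the control flow, not the patch code.

```c
#include <pthread.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Simplified stand-in for the shared CmdStatusInfo struct: a single
 * communication channel guarded by a lock.  target_pid == 0 means free.
 */
typedef struct
{
	pthread_mutex_t lock;
	pid_t		target_pid;
	pid_t		sender_pid;
} Channel;

/* One attempt: take the lock, claim the channel if it is free, release. */
static bool
try_claim(Channel *ch, pid_t sender, pid_t target)
{
	bool		ok = false;

	pthread_mutex_lock(&ch->lock);
	if (ch->target_pid == 0)
	{
		ch->target_pid = target;
		ch->sender_pid = sender;
		ok = true;
	}
	pthread_mutex_unlock(&ch->lock);
	return ok;
}

/* Retry with a short sleep between attempts, as in the patch. */
static bool
claim_with_retry(Channel *ch, pid_t sender, pid_t target, int max_tries)
{
	for (int i = 0; i < max_tries; i++)
	{
		if (try_claim(ch, sender, target))
			return true;
		usleep(1000);			/* the pg_usleep(1000L) being discussed */
	}
	return false;
}

static void
release_channel(Channel *ch)
{
	pthread_mutex_lock(&ch->lock);
	ch->target_pid = 0;
	ch->sender_pid = 0;
	pthread_mutex_unlock(&ch->lock);
}
```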

The real problem could be if the process that was signaled to connect to
the message queue never handles the interrupt, and we keep waiting forever
in shm_mq_receive().  We could add a timeout parameter or just let the user
cancel the call: send a cancellation request, use pg_cancel_backend() or
set statement_timeout before running this.

--
Alex


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-07 Thread Pavel Stehule
2015-09-07 11:55 GMT+02:00 Shulgin, Oleksandr 
:

> On Fri, Sep 4, 2015 at 6:11 AM, Pavel Stehule 
> wrote:
>
>> Sorry, but I still don't see how the slots help this issue - could you
>>> please elaborate?
>>>
>> with slot (or some similiar) there is not global locked resource. If I'll
>> have a time at weekend I'll try to write some prototype.
>>
>
> But you will still lock on the slots list to find an unused one.  How is
> that substantially different from what I'm doing?
>

It is not necessary - you can use a technique similar to what PGPROC does.
I am sending a "lock free" demo.

I am not afraid of locks - short locks, when their scope and duration are
limited. But there have been lots of bugs, and fixes named "make something
interruptible", and that is the reason why I prefer the typical design for
working with shared memory.


> >> Other smaller issues:
>>> >>
>>> >> * probably sending line by line is useless - shm_mq_send can pass
>>> bigger data when nowait = false
>>>
>>> I'm not sending it like that because of the message size - I just find
>>> it more convenient. If you think it can be problematic, it's easy to do this
>>> as before, by splitting lines on the receiving side.
>>>
>> Yes, the shm queue sends data immediately - so slicing on the sender side
>> generates more interprocess communication
>>
>
> Well, we are talking about hundreds to thousands of bytes per plan in total.
> And if my reading of the shm_mq implementation is correct, when the message
> fits into the shared memory buffer, the receiver gets a direct pointer into
> shared memory, with no extra allocation/copy to process-local memory.  So
> this can actually be a win.
>

you have to account for signals and interprocess communication as well; the
cost of memory allocation is not everything.

> >> * pg_usleep(1000L); - it is related to single point resource
>>>
>>> But not a highly concurrent one.
>>>
>> I believe it is not necessary - the waiting (sleeping) can happen deeper,
>> in the reading from the queue - the code will be cleaner
>>
>
> The only way I expect this line to be reached is when a concurrent
> pg_cmdstatus() call is in progress: the receiving backend has set the
> target_pid and has created the queue, released the lock and now waits to
> read something from shm_mq.  So a backend that also tries to use this
> communication channel can obtain the LWLock, check whether the channel is
> currently in use, and fail; it then needs to check again, but doing that in
> a tight loop would put a load on the CPU, so there's a small sleep.
>
> The real problem could be if the process that was signaled to connect to
> the message queue never handles the interrupt, and we keep waiting forever
> in shm_mq_receive().  We could add a timeout parameter or just let the user
> cancel the call: send a cancellation request, use pg_cancel_backend() or
> set statement_timeout before running this.
>

This is a valid question - to begin with, we can use statement_timeout and
we don't need to design anything special (as long as you don't hold some
important lock).

My example (the code is of prototype quality) is a little bit longer, but it
works without a global lock - the requester doesn't block anyone else.

Pavel


>
> --
> Alex
>
>
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
new file mode 100644
index 32ac58f..f5db55a
*** a/src/backend/storage/ipc/ipci.c
--- b/src/backend/storage/ipc/ipci.c
***
*** 43,48 
--- 43,49 
  #include "storage/procsignal.h"
  #include "storage/sinvaladt.h"
  #include "storage/spin.h"
+ #include "utils/signal_demo.h"
  
  
  shmem_startup_hook_type shmem_startup_hook = NULL;
*** CreateSharedMemoryAndSemaphores(bool mak
*** 139,144 
--- 140,146 
  		size = add_size(size, BTreeShmemSize());
  		size = add_size(size, SyncScanShmemSize());
  		size = add_size(size, AsyncShmemSize());
+ 		size = add_size(size, ProcPipeShmemSize());
  #ifdef EXEC_BACKEND
  		size = add_size(size, ShmemBackendArraySize());
  #endif
*** CreateSharedMemoryAndSemaphores(bool mak
*** 243,248 
--- 245,251 
  	ReplicationOriginShmemInit();
  	WalSndShmemInit();
  	WalRcvShmemInit();
+ 	ProcPipeShmemInit();
  
  	/*
  	 * Set up other modules that need some shared memory space
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
new file mode 100644
index 0abde43..c28b44f
*** a/src/backend/storage/ipc/procsignal.c
--- b/src/backend/storage/ipc/procsignal.c
***
*** 27,32 
--- 27,33 
  #include "storage/sinval.h"
  #include "tcop/tcopprot.h"
  
+ #include "utils/signal_demo.h"
  
  /*
   * The SIGUSR1 signal is multiplexed to support signalling multiple event
*** bool		set_latch_on_sigusr1;
*** 69,75 
  
  static ProcSignalSlot *ProcSignalSlots = NULL;
  static volatile ProcSignalSlot *MyProcSignalSlot = NULL;
! 
  static bool CheckProcSignal(ProcSignalReason reason);
  static void CleanupProcSignalState(int status, Datum arg);
  

Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-03 Thread Shulgin, Oleksandr
On Wed, Sep 2, 2015 at 3:07 PM, Shulgin, Oleksandr <
oleksandr.shul...@zalando.de> wrote:

> On Wed, Sep 2, 2015 at 3:04 PM, Pavel Stehule 
> wrote:
>>
>>
>>> Well, maybe I'm missing something, but shm_mq_create() will just
>>> overwrite the contents of the struct, so it doesn't care about
>>> sender/receiver: only shm_mq_set_sender()/shm_mq_set_receiver() do.
>>>
>>
>> if you create the shm_mq from scratch, then you can reuse the structure.
>>
>
Please find attached a v3.

It uses a shared memory queue and also has the ability to capture plans
nested deeply in the call stack.  Not sure about using the executor hook,
since this is not an extension...

The LWLock is used around initializing/cleaning the shared struct and the
message queue, the IO synchronization is handled by the message queue
itself.  After some testing with concurrent pgbench and intentionally deep
recursive plpgsql functions (up to 700 plpgsql stack frames) I think this
approach can work.  Unless there's some theoretical problem I'm just not
aware of. :-)
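The nested-plan capture mentioned above relies on chaining frames that live on the C stack of each executor invocation (the CmdInfoStack struct in the attached patch links QueryDescs this way).  A minimal sketch, with an int depth standing in for the QueryDesc payload:

```c
#include <stddef.h>

/*
 * Mimics CmdInfoStack from the patch: each executor invocation pushes a
 * node allocated on the C stack and links it to its parent, so at any
 * moment the whole chain of in-progress (nested) queries can be walked.
 */
typedef struct Frame
{
	int			depth;			/* stands in for a QueryDesc */
	struct Frame *parent;
} Frame;

static Frame *current_stack = NULL;

/* Count the chain, as an interrupt handler walking nested plans would. */
static int
stack_size(void)
{
	int			n = 0;

	for (Frame *f = current_stack; f != NULL; f = f->parent)
		n++;
	return n;
}

/* "Run" a query that recursively invokes nested queries. */
static int
run_query(int depth, int max_depth, int *observed_max)
{
	Frame		frame;			/* lives on the program stack, like in the hook */

	frame.depth = depth;
	frame.parent = current_stack;
	current_stack = &frame;

	if (stack_size() > *observed_max)
		*observed_max = stack_size();
	if (depth < max_depth)
		run_query(depth + 1, max_depth, observed_max);

	current_stack = frame.parent;	/* pop on the way out */
	return *observed_max;
}
```

Because each node is a local variable of the invocation it describes, popping happens for free when the function returns; the only global state is the head pointer.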

Comments welcome!
--
Alex
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 0abde43..40db40d 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -26,6 +26,7 @@
 #include "storage/shmem.h"
 #include "storage/sinval.h"
 #include "tcop/tcopprot.h"
+#include "utils/cmdstatus.h"
 
 
 /*
@@ -296,6 +297,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN))
 		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
 
+	if (CheckProcSignal(PROCSIG_CMDSTATUS_INFO))
+		HandleCmdStatusInfoInterrupt();
+
 	if (set_latch_on_sigusr1)
 		SetLatch(MyLatch);
 
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index ce4bdaf..5d5df58 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -67,6 +67,7 @@
 #include "tcop/pquery.h"
 #include "tcop/tcopprot.h"
 #include "tcop/utility.h"
+#include "utils/cmdstatus.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
@@ -2991,6 +2992,9 @@ ProcessInterrupts(void)
 
 	if (ParallelMessagePending)
 		HandleParallelMessages();
+
+	if (CmdStatusInfoRequested)
+		ProcessCmdStatusInfoRequest();
 }
 
 
diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index 3ed0b44..2c8687c 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -18,7 +18,7 @@ endif
 # keep this list arranged alphabetically or it gets to be a mess
 OBJS = acl.o arrayfuncs.o array_expanded.o array_selfuncs.o \
 	array_typanalyze.o array_userfuncs.o arrayutils.o ascii.o \
-	bool.o cash.o char.o date.o datetime.o datum.o dbsize.o domains.o \
+	bool.o cash.o char.o cmdstatus.o date.o datetime.o datum.o dbsize.o domains.o \
 	encode.o enum.o expandeddatum.o \
 	float.o format_type.o formatting.o genfile.o \
 	geo_ops.o geo_selfuncs.o inet_cidr_ntop.o inet_net_pton.o int.o \
diff --git a/src/backend/utils/adt/cmdstatus.c b/src/backend/utils/adt/cmdstatus.c
new file mode 100644
index 0000000..5f31a2d
--- /dev/null
+++ b/src/backend/utils/adt/cmdstatus.c
@@ -0,0 +1,567 @@
+/*-------------------------------------------------------------------------
+ *
+ * cmdstatus.c
+ *	  Definitions for pg_cmdstatus function.
+ *
+ * Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/utils/adt/cmdstatus.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "funcapi.h"
+#include "miscadmin.h"
+
+#include "access/htup_details.h"
+#include "commands/explain.h"
+#include "lib/stringinfo.h"
+#include "storage/latch.h"
+#include "storage/ipc.h"
+#include "storage/proc.h"
+#include "storage/procarray.h"
+#include "storage/shm_mq.h"
+#include "tcop/dest.h"
+#include "tcop/pquery.h"
+#include "utils/builtins.h"
+#include "utils/cmdstatus.h"
+
+
+typedef enum {
+	CMD_STATUS_REQUEST_EXPLAIN = 1,
+	CMD_STATUS_REQUEST_QUERY_TEXT = 2,
+	CMD_STATUS_REQUEST_PROGRESS_TAG = 3,
+	CMD_STATUS_REQUEST_EXPLAIN_BACKTRACE = 4
+} CmdStatusInfoRequestType;
+
+#define CMD_STATUS_MAX_REQUEST CMD_STATUS_REQUEST_EXPLAIN_BACKTRACE
+
+typedef enum {
+	CMD_STATUS_RESULT_FAILURE = -1,
+	CMD_STATUS_RESULT_SUCCESS = 0,
+	CMD_STATUS_RESULT_BACKEND_IDLE,
+	CMD_STATUS_RESULT_NO_COMMAND_TAG
+} CmdStatusInfoResultCode;
+
+typedef struct {
+	LWLock	   *lock;
+	pid_t		target_pid;
+	pid_t		sender_pid;
+	CmdStatusInfoRequestType	request_type;
+	CmdStatusInfoResultCode		result_code;
+	char		buffer[FLEXIBLE_ARRAY_MEMBER];
+} CmdStatusInfo;
+
+#define BUFFER_SIZE			8192
+
+/*
+ * These structs are allocated on the program stack as local variables in the
+ * ExecutorRun hook.  The top of stack is current_query_stack, see below.
+ */
+typedef struct CmdInfoStack {
+	QueryDesc			   *query_desc;
+	struct CmdInfoStack	   *parent;
+} CmdInfoStack;
+
+
+bool CmdStatusInfoEnabled 

Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-03 Thread Pavel Stehule
Hi




2015-09-03 18:30 GMT+02:00 Shulgin, Oleksandr 
:

> On Wed, Sep 2, 2015 at 3:07 PM, Shulgin, Oleksandr <
> oleksandr.shul...@zalando.de> wrote:
>
>> On Wed, Sep 2, 2015 at 3:04 PM, Pavel Stehule 
>> wrote:
>>>
>>>
 Well, maybe I'm missing something, but shm_mq_create() will just
 overwrite the contents of the struct, so it doesn't care about
 sender/receiver: only shm_mq_set_sender()/shm_mq_set_receiver() do.

>>>
>>> if you create the shm_mq from scratch, then you can reuse the structure.
>>>
>>
> Please find attached a v3.
>
> It uses a shared memory queue and also has the ability to capture plans
> nested deeply in the call stack.  Not sure about using the executor hook,
> since this is not an extension...
>
> The LWLock is used around initializing/cleaning the shared struct and the
> message queue, the IO synchronization is handled by the message queue
> itself.  After some testing with concurrent pgbench and intentionally deep
> recursive plpgsql functions (up to 700 plpgsql stack frames) I think this
> approach can work.  Unless there's some theoretical problem I'm just not
> aware of. :-)
>
>
Comments welcome!
>

I am not entirely happy with this design. Only one EXPLAIN PID/GET STATUS
can be executed per server at a time - I remember lots of queries that don't
handle CANCEL well ~ don't handle interrupts well, and this can be
unfriendly. I cannot say whether it is good enough for a first iteration.
This is functionality to be used for diagnostics when you have an overloaded
server, and this risk looks too high (to me). The idea of a receive slot
could address this risk well. The difference from this code should not be
too big - although it is not trivial - it needs work with PGPROC.



Other smaller issues:

* probably sending line by line is useless - shm_mq_send can pass bigger
data when nowait = false
* pg_usleep(1000L); - it is related to single point resource

Some ideas:

* this code shares some important parts with auto_explain (the query stack) -
and because it should be in core (due to signal handling, if I remember
correctly), it can be a first step towards integrating auto_explain into core.



> --
> Alex
>
>


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-03 Thread Pavel Stehule
2015-09-03 22:06 GMT+02:00 Pavel Stehule :

> Hi
>
>
>
>
> 2015-09-03 18:30 GMT+02:00 Shulgin, Oleksandr <
> oleksandr.shul...@zalando.de>:
>
>> On Wed, Sep 2, 2015 at 3:07 PM, Shulgin, Oleksandr <
>> oleksandr.shul...@zalando.de> wrote:
>>
>>> On Wed, Sep 2, 2015 at 3:04 PM, Pavel Stehule 
>>> wrote:


> Well, maybe I'm missing something, but shm_mq_create() will just
> overwrite the contents of the struct, so it doesn't care about
> sender/receiver: only shm_mq_set_sender()/shm_mq_set_receiver() do.
>

 if you create the shm_mq from scratch, then you can reuse the structure.

>>>
>> Please find attached a v3.
>>
>> It uses a shared memory queue and also has the ability to capture plans
>> nested deeply in the call stack.  Not sure about using the executor hook,
>> since this is not an extension...
>>
>> The LWLock is used around initializing/cleaning the shared struct and the
>> message queue, the IO synchronization is handled by the message queue
>> itself.  After some testing with concurrent pgbench and intentionally deep
>> recursive plpgsql functions (up to 700 plpgsql stack frames) I think this
>> approach can work.  Unless there's some theoretical problem I'm just not
>> aware of. :-)
>>
>>
> Comments welcome!
>>
>
> I am not entirely happy with this design. Only one EXPLAIN PID/GET STATUS
> can be executed per server at a time - I remember lots of queries that
> don't handle CANCEL well ~ don't handle interrupts well, and this can be
> unfriendly. I cannot say whether it is good enough for a first iteration.
> This is functionality to be used for diagnostics when you have an
> overloaded server, and this risk looks too high (to me). The idea of a
> receive slot could address this risk well (and can be used elsewhere). The
> difference from this code should not be too big - although it is not
> trivial - it needs work with PGPROC. The opinion of our multiprocess
> experts would be interesting. Maybe I am too careful.
>



>
>
>
> Other smaller issues:
>
> * probably sending line by line is useless - shm_mq_send can pass bigger
> data when nowait = false
> * pg_usleep(1000L); - it is related to single point resource
>
> Some ideas:
>
> * this code shares some important parts with auto_explain (the query stack) -
> and because it should be in core (due to signal handling, if I remember
> correctly), it can be a first step towards integrating auto_explain into core.
>
>
>
>> --
>> Alex
>>
>>
>


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-03 Thread Shulgin, Oleksandr
On Sep 3, 2015 10:14 PM, "Pavel Stehule"  wrote:
>>>
>>> Please find attached a v3.
>>>
>>> It uses a shared memory queue and also has the ability to capture plans
nested deeply in the call stack.  Not sure about using the executor hook,
since this is not an extension...
>>>
>>> The LWLock is used around initializing/cleaning the shared struct and
the message queue, the IO synchronization is handled by the message queue
itself.
>>
>> I am not entirely happy with this design. Only one EXPLAIN PID/GET STATUS
can be executed per server at a time - I remember lots of queries that don't
handle CANCEL well ~ don't handle interrupts well, and this can be
unfriendly. I cannot say whether it is good enough for a first iteration.
This is functionality to be used for diagnostics when you have an overloaded
server, and this risk looks too high (to me). The idea of a receive slot
could address this risk well (and can be used elsewhere). The difference from
this code should not be too big - although it is not trivial - it needs work
with PGPROC. The opinion of our multiprocess experts would be interesting.
Maybe I am too careful.

Sorry, but I still don't see how the slots help this issue - could you
please elaborate?

>> Other smaller issues:
>>
>> * probably sending line by line is useless - shm_mq_send can pass bigger
data when nowait = false

I'm not sending it like that because of the message size - I just find it
more convenient. If you think it can be problematic, it's easy to do this as
before, by splitting lines on the receiving side.

>> * pg_usleep(1000L); - it is related to single point resource

But not a highly concurrent one.

--
Alex


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-03 Thread Pavel Stehule
2015-09-04 5:50 GMT+02:00 Shulgin, Oleksandr :

> On Sep 3, 2015 10:14 PM, "Pavel Stehule"  wrote:
> >>>
> >>> Please find attached a v3.
> >>>
> >>> It uses a shared memory queue and also has the ability to capture
> plans nested deeply in the call stack.  Not sure about using the executor
> hook, since this is not an extension...
> >>>
> >>> The LWLock is used around initializing/cleaning the shared struct and
> the message queue, the IO synchronization is handled by the message queue
> itself.
> >>
> >> I am not entirely happy with this design. Only one EXPLAIN PID/GET STATUS
> can be executed per server at a time - I remember lots of queries that don't
> handle CANCEL well ~ don't handle interrupts well, and this can be
> unfriendly. I cannot say whether it is good enough for a first iteration.
> This is functionality to be used for diagnostics when you have an overloaded
> server, and this risk looks too high (to me). The idea of a receive slot
> could address this risk well (and can be used elsewhere). The difference from
> this code should not be too big - although it is not trivial - it needs work
> with PGPROC. The opinion of our multiprocess experts would be interesting.
> Maybe I am too careful.
>
> Sorry, but I still don't see how the slots help this issue - could you
> please elaborate?
>
with slots (or something similar) there is no globally locked resource. If
I have time at the weekend I'll try to write some prototype.


> >> Other smaller issues:
> >>
> >> * probably sending line by line is useless - shm_mq_send can pass
> bigger data when nowait = false
>
> I'm not sending it like that because of the message size - I just find it
> more convenient. If you think it can be problematic, it's easy to do this as
> before, by splitting lines on the receiving side.
>
Yes, the shm queue sends data immediately - so slicing on the sender side
generates more interprocess communication


> >> * pg_usleep(1000L); - it is related to single point resource
>
> But not a highly concurrent one.
>
I believe it is not necessary - the waiting (sleeping) can happen deeper, in
the reading from the queue - the code will be cleaner

Regards

Pavel


> -
> Alex
>


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-02 Thread Shulgin, Oleksandr
On Tue, Sep 1, 2015 at 7:02 PM, Pavel Stehule 
wrote:

>
>> But do we really need the slots mechanism?  Would it not be OK to just
>> let the LWLock do the sequencing of concurrent requests?  Given that we're
>> only going to use one message queue per cluster, there's not much
>> concurrency you can gain by introducing slots I believe.
>>
>
> I'm afraid of problems in production. When you have a queue tied to a
> process, then all problems go away when the process ends. One message
> queue per cluster requires a cluster restart when some pathological
> problem occurs - and you cannot restart a production cluster for a week,
> sometimes weeks. The slots are more robust.
>

Yes, but in your implementation the slots themselves don't have a
queue/buffer.  Did you intend to have a message queue per slot?

What sort of pathological problems are you concerned about?  The communicating
backends should just detach from the message queue properly and have some
timeout configured to prevent deadlocks.  Other than that, I don't see how
having N slots really helps the problem: in case of pathological problems
you will just deplete them all sooner or later.

--
Alex


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-02 Thread Pavel Stehule
2015-09-02 11:01 GMT+02:00 Shulgin, Oleksandr 
:

> On Tue, Sep 1, 2015 at 7:02 PM, Pavel Stehule 
> wrote:
>
>>
>>> But do we really need the slots mechanism?  Would it not be OK to just
>>> let the LWLock do the sequencing of concurrent requests?  Given that we're
>>> only going to use one message queue per cluster, there's not much
>>> concurrency you can gain by introducing slots I believe.
>>>
>>
>> I'm afraid of problems in production. When you have a queue tied to a
>> process, then all problems go away when the process ends. One message
>> queue per cluster requires a cluster restart when some pathological
>> problem occurs - and you cannot restart a production cluster for a week,
>> sometimes weeks. The slots are more robust.
>>
>
> Yes, but in your implementation the slots themselves don't have a
> queue/buffer.  Did you intend to have a message queue per slot?
>

The message queue cannot be reused, so I expect one slot per caller to be
used for passing parameters - the message queue will be created/released on
demand by the caller.


>
> What sort of pathological problems are you concerned about?  The
> communicating backends should just detach from the message queue properly
> and have some timeout configured to prevent deadlocks. Other than that, I
> don't see how having N slots really helps the problem: in case of
> pathological problems you will just deplete them all sooner or later.
>

I'm afraid of unexpected problems :) - any part of signal handling or
multiprocess communication is fragile. Slots are simple and can simply be
attached to any process without the need to alloc/free memory.

Pavel


>
> --
> Alex
>
>


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-02 Thread Shulgin, Oleksandr
On Wed, Sep 2, 2015 at 11:16 AM, Pavel Stehule 
wrote:

>
>
> 2015-09-02 11:01 GMT+02:00 Shulgin, Oleksandr <
> oleksandr.shul...@zalando.de>:
>
>> On Tue, Sep 1, 2015 at 7:02 PM, Pavel Stehule 
>> wrote:
>>
>>>
 But do we really need the slots mechanism?  Would it not be OK to just
 let the LWLock do the sequencing of concurrent requests?  Given that we're
 only going to use one message queue per cluster, there's not much
 concurrency you can gain by introducing slots I believe.

>>>
>>> I'm afraid of problems in production. When you have a queue tied to a
>>> process, then all problems go away when the process ends. One message
>>> queue per cluster requires a cluster restart when some pathological
>>> problem occurs - and you cannot restart a production cluster for a week,
>>> sometimes weeks. The slots are more robust.
>>>
>>
>> Yes, but in your implementation the slots themselves don't have a
>> queue/buffer.  Did you intend to have a message queue per slot?
>>
>
> The message queue cannot be reused, so I expect one slot per caller to be
> used for passing parameters - the message queue will be created/released on
> demand by the caller.
>

I don't believe that the message queue really cannot be reused.  What would stop
us from calling shm_mq_create() on the queue struct again?

To give you an idea, in my current prototype I have only the following
struct:

typedef struct {
    LWLock   *lock;
    /* CmdStatusInfoSlot slots[CMDINFO_SLOTS]; */
    pid_t target_pid;
    pid_t sender_pid;
    int request_type;
    int result_code;
    shm_mq buffer;
} CmdStatusInfo;

An instance of this is allocated in shared memory once, with a BUFFER_SIZE
of 8k.

In pg_cmdstatus() I take the LWLock and check whether target_pid is 0, which
means nobody else is using this communication channel at the moment.  If
that's the case, I set the pids and request_type and initialize the mq
buffer.  Otherwise I just sleep and retry acquiring the lock (a timeout
should probably be added here).

What sort of pathological problems are you concerned about?  The communicating
>> backends should just detach from the message queue properly and have some
>> timeout configured to prevent deadlocks.  Other than that, I don't see how
>> having N slots really helps the problem: in case of pathological problems
>> you will just deplete them all sooner or later.
>>
>
> I'm afraid of unexpected problems :) - any part of signal handling or
> multiprocess communication is fragile. Slots are simple and can simply be
> attached to any process without the need to alloc/free memory.
>

Yes, but do slots solve the actual problem?  If there is only one message
queue, you still have the same problem regardless of the number of slots
you decide to have.

--
Alex


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-02 Thread Shulgin, Oleksandr
On Wed, Sep 2, 2015 at 2:56 PM, Pavel Stehule 
wrote:

>
>
> 2015-09-02 12:36 GMT+02:00 Shulgin, Oleksandr <
> oleksandr.shul...@zalando.de>:
>
>> On Wed, Sep 2, 2015 at 11:16 AM, Pavel Stehule 
>> wrote:
>>
>>>
>>>
>>> 2015-09-02 11:01 GMT+02:00 Shulgin, Oleksandr <
>>> oleksandr.shul...@zalando.de>:
>>>
 On Tue, Sep 1, 2015 at 7:02 PM, Pavel Stehule 
 wrote:

>
>> But do we really need the slots mechanism?  Would it not be OK to
>> just let the LWLock do the sequencing of concurrent requests?  Given that
>> we're only going to use one message queue per cluster, there's not much
>> concurrency you can gain by introducing slots I believe.
>>
>
> I'm afraid of problems in production. When you have a queue tied to a
> process, then all problems go away when the process ends. One message
> queue per cluster requires a cluster restart when some pathological
> problem occurs - and you cannot restart a production cluster for a week,
> sometimes weeks. The slots are more robust.
>

 Yes, but in your implementation the slots themselves don't have a
 queue/buffer.  Did you intend to have a message queue per slot?

>>>
>>> The message queue cannot be reused, so I expect one slot per caller to
>>> be used for passing parameters - the message queue will be created/released
>>> on demand by the caller.
>>>
>>
>> I don't believe that the message queue really cannot be reused.  What would stop
>> us from calling shm_mq_create() on the queue struct again?
>>
>
> you cannot change the recipient later
>

Well, maybe I'm missing something, but shm_mq_create() will just overwrite
the contents of the struct, so it doesn't care about sender/receiver: only
shm_mq_set_sender()/shm_mq_set_receiver() do.
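A toy stand-in (not the real shm_mq API) shows why overwriting the struct makes reuse possible: re-initialization simply forgets the previous endpoints, so a new sender and receiver can be attached afterwards.

```c
#include <string.h>

/*
 * Toy stand-in for a shm_mq-like struct: "creating" the queue simply
 * overwrites the struct, so the same piece of (shared) memory can host a
 * new conversation with different endpoints.
 */
typedef struct
{
	int			sender_pid;		/* 0 = not yet set */
	int			receiver_pid;	/* 0 = not yet set */
	int			messages_sent;
} ToyMq;

static void
toy_mq_create(ToyMq *mq)
{
	/* forget any previous sender/receiver and counters */
	memset(mq, 0, sizeof(*mq));
}

static void
toy_mq_set_sender(ToyMq *mq, int pid)
{
	mq->sender_pid = pid;
}

static void
toy_mq_set_receiver(ToyMq *mq, int pid)
{
	mq->receiver_pid = pid;
}

static void
toy_mq_send(ToyMq *mq)
{
	mq->messages_sent++;
}
```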


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-02 Thread Pavel Stehule
2015-09-02 15:00 GMT+02:00 Shulgin, Oleksandr 
:

> On Wed, Sep 2, 2015 at 2:56 PM, Pavel Stehule 
> wrote:
>
>>
>>
>> 2015-09-02 12:36 GMT+02:00 Shulgin, Oleksandr <
>> oleksandr.shul...@zalando.de>:
>>
>>> On Wed, Sep 2, 2015 at 11:16 AM, Pavel Stehule 
>>> wrote:
>>>


 2015-09-02 11:01 GMT+02:00 Shulgin, Oleksandr <
 oleksandr.shul...@zalando.de>:

> On Tue, Sep 1, 2015 at 7:02 PM, Pavel Stehule  > wrote:
>
>>
>>> But do we really need the slots mechanism?  Would it not be OK to
>>> just let the LWLock do the sequencing of concurrent requests?  Given 
>>> that
>>> we're only going to use one message queue per cluster, there's not much
>>> concurrency you can gain by introducing slots I believe.
>>>
>>
>> I'm afraid of problems in production. When you have a queue tied to a
>> particular process, then any problems should clear up once that process
>> exits. With one message queue per cluster, some pathological problem would
>> require a cluster restart - and in production you sometimes cannot restart
>> the cluster for weeks. The slots are more robust.
>>
>
> Yes, but in your implementation the slots themselves don't have a
> queue/buffer.  Did you intend to have a message queue per slot?
>

 The message queue cannot be reused, so I expect one slot per caller to
 be used for passing parameters - the message queue will be created/released
 on demand by the caller.

>>>
>>> I'm not convinced a message queue cannot be reused.  What would
>>> stop us from calling shm_mq_create() on the queue struct again?
>>>
>>
>> you cannot change the recipient later
>>
>
> Well, maybe I'm missing something, but shm_mq_create() will just overwrite
> the contents of the struct, so it doesn't care about sender/receiver: only
> shm_mq_set_sender()/shm_mq_set_receiver() do.
>

if you create the shm_mq from scratch, then you can reuse the structure.

Pavel
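The point above - that recreating the queue from scratch makes the structure reusable - can be illustrated with a toy analogue of the shm_mq API (all `Toy*` names are hypothetical stand-ins; the real PostgreSQL functions are shm_mq_create() and shm_mq_set_sender()/shm_mq_set_receiver(), which this only sketches):

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>

#define TOY_QUEUE_SIZE 64

/* Toy analogue of shm_mq: a fixed ring buffer plus sender/receiver
 * endpoints.  Mirrors the discussed API shape: shm_mq_create() wipes
 * the struct in place, and shm_mq_set_sender()/shm_mq_set_receiver()
 * assign the endpoints separately. */
typedef struct
{
	pid_t	sender;
	pid_t	receiver;
	size_t	head;
	size_t	tail;
	char	data[TOY_QUEUE_SIZE];
} ToyQueue;

/* Analogue of shm_mq_create(): reinitializing in place discards any
 * previous endpoint assignment, which is what makes reuse possible. */
static void
toy_queue_create(ToyQueue *q)
{
	memset(q, 0, sizeof(*q));
}

static void
toy_queue_set_sender(ToyQueue *q, pid_t pid)
{
	q->sender = pid;
}

static void
toy_queue_set_receiver(ToyQueue *q, pid_t pid)
{
	q->receiver = pid;
}
```

Once the struct is wiped, a different recipient can be assigned - which is exactly the "create from scratch, then reuse" argument.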


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-02 Thread Shulgin, Oleksandr
On Wed, Sep 2, 2015 at 3:04 PM, Pavel Stehule 
wrote:
>
>
>> Well, maybe I'm missing something, but shm_mq_create() will just overwrite
>> the contents of the struct, so it doesn't care about sender/receiver: only
>> shm_mq_set_sender()/shm_mq_set_receiver() do.
>>
>
> if you create the shm_mq from scratch, then you can reuse the structure.
>

That was my plan.  I'm now close to actually compiling this stuff; we'll
see if it works in the simplest case at least. :-)


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-02 Thread Pavel Stehule
2015-09-02 12:36 GMT+02:00 Shulgin, Oleksandr 
:

> On Wed, Sep 2, 2015 at 11:16 AM, Pavel Stehule 
> wrote:
>
>>
>>
>> 2015-09-02 11:01 GMT+02:00 Shulgin, Oleksandr <
>> oleksandr.shul...@zalando.de>:
>>
>>> On Tue, Sep 1, 2015 at 7:02 PM, Pavel Stehule 
>>> wrote:
>>>

> But do we really need the slots mechanism?  Would it not be OK to just
> let the LWLock do the sequencing of concurrent requests?  Given that we're
> only going to use one message queue per cluster, there's not much
> concurrency you can gain by introducing slots, I believe.
>

 I'm afraid of problems in production. When you have a queue tied to a
 particular process, then any problems should clear up once that process
 exits. With one message queue per cluster, some pathological problem would
 require a cluster restart - and in production you sometimes cannot restart
 the cluster for weeks. The slots are more robust.

>>>
>>> Yes, but in your implementation the slots themselves don't have a
>>> queue/buffer.  Did you intend to have a message queue per slot?
>>>
>>
>> The message queue cannot be reused, so I expect one slot per caller to be
>> used for passing parameters - the message queue will be created/released on
>> demand by the caller.
>>
>
> I'm not convinced a message queue cannot be reused.  What would stop
> us from calling shm_mq_create() on the queue struct again?
>

you cannot change the recipient later


>
> To give you an idea, in my current prototype I have only the following
> struct:
>
> typedef struct {
> LWLock   *lock;
> /*CmdStatusInfoSlot slots[CMDINFO_SLOTS];*/
> pid_t target_pid;
> pid_t sender_pid;
> int request_type;
> int result_code;
> shm_mq buffer;
> } CmdStatusInfo;
>
> An instance of this is allocated on shared memory once, using BUFFER_SIZE
> of 8k.
>
> In pg_cmdstatus() I lock on the LWLock to check if target_pid is 0, then
> it means nobody else is using this communication channel at the moment.  If
> that's the case, I set the pids and request_type and initialize the mq
> buffer.  Otherwise I just sleep and retry acquiring the lock (a timeout
> should be added here probably).
>
> What sort of pathological problems are you concerned about?  The
>>> communicating backends should just detach from the message queue properly
>>> and have some timeout configured to prevent deadlocks.  Other than that, I
>>> don't see how having N slots really helps the problem: in case of
>>> pathological problems you will just deplete them all sooner or later.
>>>
>>
>> I'm afraid of unexpected problems :) - any part of signal handling or
>> multiprocess communication is fragile. Slots are simple and easily attached
>> to any process, with no need to alloc/free memory.
>>
>
> Yes, but do slots solve the actual problem?  If there is only one message
> queue, you still have the same problem regardless of the number of slots
> you decide to have.
>
> --
> Alex
>
>
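The claim-and-retry protocol described in the quoted prototype (take the lock, check whether target_pid is 0, claim the channel or retry) can be modeled in miniature; below is a toy sketch using a pthread mutex in place of the LWLock, with hypothetical `Toy*`/`try_claim_channel` names that are not from the actual patch:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <sys/types.h>

/* Toy stand-in for the shared CmdStatusInfo struct: target_pid == 0
 * means the single communication channel is currently free. */
typedef struct
{
	pthread_mutex_t lock;		/* stands in for the LWLock */
	pid_t	target_pid;
	pid_t	sender_pid;
	int		request_type;
} ToyCmdStatusInfo;

static ToyCmdStatusInfo channel = { PTHREAD_MUTEX_INITIALIZER, 0, 0, 0 };

/* Try to claim the channel; returns true on success.  A real caller
 * would sleep and retry on failure, ideally with a timeout. */
static bool
try_claim_channel(pid_t target, pid_t sender, int request_type)
{
	bool	claimed = false;

	pthread_mutex_lock(&channel.lock);
	if (channel.target_pid == 0)
	{
		channel.target_pid = target;
		channel.sender_pid = sender;
		channel.request_type = request_type;
		claimed = true;
	}
	pthread_mutex_unlock(&channel.lock);
	return claimed;
}

/* Release the channel once the response has been consumed. */
static void
release_channel(void)
{
	pthread_mutex_lock(&channel.lock);
	channel.target_pid = 0;
	pthread_mutex_unlock(&channel.lock);
}
```

This also makes the trade-off concrete: with a single channel, a second caller must wait (or time out) until the first releases it - which is the concurrency limitation both sides are debating.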


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-01 Thread Pavel Stehule
2015-09-01 17:20 GMT+02:00 Shulgin, Oleksandr 
:

> I'm not familiar with the shared memory handling, but could we not
>>> allocate just enough shared memory to fit the data we're going to write
>>> instead of the fixed 8k?  It's not that we cannot know the length of the
>>> resulting plan text in advance.
>>>
>>
>> the shared memory cannot be reused (released) :(, so allocating exactly
>> enough memory is not efficient. Moreover, at this point it makes no sense. A
>> shared memory queue can do almost all the work.
>>
>
> A-ha, I've discovered the shared memory message queue facility and I see
> how we can use it.
>
> But do we really need the slots mechanism?  Would it not be OK to just let
> the LWLock do the sequencing of concurrent requests?  Given that we're only
> going to use one message queue per cluster, there's not much concurrency
> you can gain by introducing slots, I believe.
>

I'm afraid of problems in production. When you have a queue tied to a
particular process, then any problems should clear up once that process
exits. With one message queue per cluster, some pathological problem would
require a cluster restart - and in production you sometimes cannot restart
the cluster for weeks. The slots are more robust.

Pavel


>
> --
> Alex
>
>


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-01 Thread Shulgin, Oleksandr
>
> I'd say we should hide the so-designed pg_cmdstatus() interface behind
>> more friendly calls like pg_explain_backend() and pg_backend_progress() to
>> give some naming examples, to remove the need for magic numbers in the
>> second arg.
>>
>
> I had a similar idea - this is good enough for a start, but the target
> interface is based on integration with the EXPLAIN statement
>
> something like EXPLAIN PROCESS or EXPLAIN PID or EXPLAIN VERBOSE PID ..
>

Yes, that's another way to do it.

the important functionality is retrieving the complete text of the query - it was my
> original motivation, because I had no way to get the complete query before
> it finished
>
> Probably the communication between processes should be more complex :( -
> the SHM queue should be used there, because some plans can be terribly long.
>
> Using a shared write buffer (one for all) is probably too simplistic a
> solution - good for a prototype, but not good for core.
>
> I have an idea about the communication:
>
> 1. the caller prepares a buffer and shm queue, and signals the target
> process - the parameter is the pid of the caller
> 2. the target process fills the write buffer and closes the queue
> 3. the caller shows the data, then closes the buffer and the queue
>
> Now almost all the code for the communication is upstream - the missing part
> is injecting one end of the queue into an arbitrary process dynamically.
>

I'm not familiar with the shared memory handling, but could we not allocate
just enough shared memory to fit the data we're going to write instead of
the fixed 8k?  It's not that we cannot know the length of the resulting
plan text in advance.

I think we can remove buffer_is_free/buffer_holds_data and just use the
buffer != NULL to check if there's some data to be read (and buffer == NULL
to check if we can write).

--
Alex


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-01 Thread Pavel Stehule
2015-09-01 15:00 GMT+02:00 Shulgin, Oleksandr 
:

> I'd say we should hide the so-designed pg_cmdstatus() interface behind
>>> more friendly calls like pg_explain_backend() and pg_backend_progress() to
>>> give some naming examples, to remove the need for magic numbers in the
>>> second arg.
>>>
>>
>> I had a similar idea - this is good enough for a start, but the target
>> interface is based on integration with the EXPLAIN statement
>>
>> something like EXPLAIN PROCESS or EXPLAIN PID or EXPLAIN VERBOSE PID ..
>>
>
> Yes, that's another way to do it.
>
> the important functionality is retrieving the complete text of the query - it was my
>> original motivation, because I had no way to get the complete query before
>> it finished
>>
>> Probably the communication between processes should be more complex :( -
>> the SHM queue should be used there, because some plans can be terribly long.
>>
>> Using a shared write buffer (one for all) is probably too simplistic a
>> solution - good for a prototype, but not good for core.
>>
>> I have an idea about the communication:
>>
>> 1. the caller prepares a buffer and shm queue, and signals the target
>> process - the parameter is the pid of the caller
>> 2. the target process fills the write buffer and closes the queue
>> 3. the caller shows the data, then closes the buffer and the queue
>>
>> Now almost all the code for the communication is upstream - the missing part
>> is injecting one end of the queue into an arbitrary process dynamically.
>>
>
> I'm not familiar with the shared memory handling, but could we not
> allocate just enough shared memory to fit the data we're going to write
> instead of the fixed 8k?  It's not that we cannot know the length of the
> resulting plan text in advance.
>

the shared memory cannot be reused (released) :(, so allocating exactly
enough memory is not efficient. Moreover, at this point it makes no sense. A
shared memory queue can do almost all the work.


>
> I think we can remove buffer_is_free/buffer_holds_data and just use the
> buffer != NULL to check if there's some data to be read (and buffer == NULL
> to check if we can write).
>
> --
> Alex
>
>


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-01 Thread Shulgin, Oleksandr
>
> I'm not familiar with the shared memory handling, but could we not
>> allocate just enough shared memory to fit the data we're going to write
>> instead of the fixed 8k?  It's not that we cannot know the length of the
>> resulting plan text in advance.
>>
>
> the shared memory cannot be reused (released) :(, so allocating exactly
> enough memory is not efficient. Moreover, at this point it makes no sense. A
> shared memory queue can do almost all the work.
>

A-ha, I've discovered the shared memory message queue facility and I see
how we can use it.

But do we really need the slots mechanism?  Would it not be OK to just let
the LWLock do the sequencing of concurrent requests?  Given that we're only
going to use one message queue per cluster, there's not much concurrency
you can gain by introducing slots, I believe.

--
Alex


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-08-31 Thread Shulgin, Oleksandr
On Mon, Aug 31, 2015 at 12:35 PM, Pavel Stehule 
wrote:

>
>>>
>>> http://www.postgresql.org/message-id/cafj8praxcs9b8abgim-zauvggqdhpzoarz5ysp1_nhv9hp8...@mail.gmail.com
>>>
>>
>> Ah, thanks!  Somehow I've missed this mail.  You didn't add the patch to
>> a commitfest back then I think?
>>
>
> I had no time to finish this patch - there are a few issues in signal
> handling and returning the result back - but I still want it :) - and as
> far as I know, almost all other SQL databases have similar functionality.
>

I've updated the patch for the current master and also added some
unexpected parameters handling, so attached is a v2.

I'd say we should hide the so-designed pg_cmdstatus() interface behind more
friendly calls like pg_explain_backend() and pg_backend_progress() to give
some naming examples, to remove the need for magic numbers in the second
arg.

What I've found missing in this approach is the insight into nested
executor runs, so that if you're running a "SELECT my_func()", you only see
this outer query in the pg_cmdstatus() output.  With the auto_explain
approach, by hooking into executor I was able to capture the nested queries
and their plans as well.

It's conceptually trivial to add some code to use the Executor hooks here,
but I don't see any precedent for this except for contrib modules
(auto_explain and pg_stat_statements), I'm just not sure if that would be
OK-ish.

And when we solve that, there is another problem of having a sane interface
to query the nested plans.  For a psql user, probably the most interesting
would be the topmost (level=1) and the innermost (e.g. level=-1) plans.  We
might also want to provide a full nesting of plans in a structured format
like JSON or... *cough* XML, for programs to consume and display nicely
with folding and stuff.

And the most interesting would be making instrumentation work with all of
the above.

I'm adding this to the next CF.

--
Alex
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 0abde43..40db40d 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -26,6 +26,7 @@
 #include "storage/shmem.h"
 #include "storage/sinval.h"
 #include "tcop/tcopprot.h"
+#include "utils/cmdstatus.h"
 
 
 /*
@@ -296,6 +297,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN))
 		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
 
+	if (CheckProcSignal(PROCSIG_CMDSTATUS_INFO))
+		HandleCmdStatusInfoInterrupt();
+
 	if (set_latch_on_sigusr1)
 		SetLatch(MyLatch);
 
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index ce4bdaf..5d5df58 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -67,6 +67,7 @@
 #include "tcop/pquery.h"
 #include "tcop/tcopprot.h"
 #include "tcop/utility.h"
+#include "utils/cmdstatus.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
@@ -2991,6 +2992,9 @@ ProcessInterrupts(void)
 
 	if (ParallelMessagePending)
 		HandleParallelMessages();
+
+	if (CmdStatusInfoRequested)
+		ProcessCmdStatusInfoRequest();
 }
 
 
diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index 3ed0b44..2c8687c 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -18,7 +18,7 @@ endif
 # keep this list arranged alphabetically or it gets to be a mess
 OBJS = acl.o arrayfuncs.o array_expanded.o array_selfuncs.o \
 	array_typanalyze.o array_userfuncs.o arrayutils.o ascii.o \
-	bool.o cash.o char.o date.o datetime.o datum.o dbsize.o domains.o \
+	bool.o cash.o char.o cmdstatus.o date.o datetime.o datum.o dbsize.o domains.o \
 	encode.o enum.o expandeddatum.o \
 	float.o format_type.o formatting.o genfile.o \
 	geo_ops.o geo_selfuncs.o inet_cidr_ntop.o inet_net_pton.o int.o \
diff --git a/src/backend/utils/adt/cmdstatus.c b/src/backend/utils/adt/cmdstatus.c
new file mode 100644
index 000..38c1947
--- /dev/null
+++ b/src/backend/utils/adt/cmdstatus.c
@@ -0,0 +1,508 @@
+/*-
+ *
+ * cmdstatus.c
+ *	  Definitions for pg_cmdstatus function.
+ *
+ * Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/utils/adt/cmdstatus.c
+ *
+ *-
+ */
+
+#include "postgres.h"
+
+#include "funcapi.h"
+#include "miscadmin.h"
+
+#include "access/htup_details.h"
+#include "commands/explain.h"
+#include "lib/stringinfo.h"
+#include "storage/latch.h"
+#include "storage/proc.h"
+#include "storage/procarray.h"
+#include "storage/shmem.h"
+#include "tcop/dest.h"
+#include "tcop/pquery.h"
+#include "utils/builtins.h"
+#include "utils/cmdstatus.h"
+
+
+#define CMDINFO_SLOTS		100
+#define BUFFER_SIZE			(8 * 1024)
+
+bool CmdStatusInfoRequested = false;
+
+typedef struct {
+	bool	is_valid;
+	bool	is_done;
+	int		

Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-08-31 Thread Pavel Stehule
2015-08-31 11:30 GMT+02:00 Shulgin, Oleksandr 
:

> Do you still have the code somewhere around? Did it see production use?

>>> I sent it to mailing list year ago
>>
>>
>> http://www.postgresql.org/message-id/cafj8praxcs9b8abgim-zauvggqdhpzoarz5ysp1_nhv9hp8...@mail.gmail.com
>>
>
> Ah, thanks!  Somehow I've missed this mail.  You didn't add the patch to a
> commitfest back then I think?
>

I had no time to finish this patch - there are a few issues in signal handling
and returning the result back - but I still want it :) - and as far as I know,
almost all other SQL databases have similar functionality.

>
> --
> Alex
>
>


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-08-31 Thread Shulgin, Oleksandr
>
> Do you still have the code somewhere around? Did it see production use?
>>>
>> I sent it to mailing list year ago
>
>
> http://www.postgresql.org/message-id/cafj8praxcs9b8abgim-zauvggqdhpzoarz5ysp1_nhv9hp8...@mail.gmail.com
>

Ah, thanks!  Somehow I've missed this mail.  You didn't add the patch to a
commitfest back then I think?

--
Alex


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-08-31 Thread Pavel Stehule
Hi

2015-08-31 19:09 GMT+02:00 Shulgin, Oleksandr 
:

> On Mon, Aug 31, 2015 at 12:35 PM, Pavel Stehule 
> wrote:
>
>>

 http://www.postgresql.org/message-id/cafj8praxcs9b8abgim-zauvggqdhpzoarz5ysp1_nhv9hp8...@mail.gmail.com

>>>
>>> Ah, thanks!  Somehow I've missed this mail.  You didn't add the patch to
>>> a commitfest back then I think?
>>>
>>
>> I had no time to finish this patch - there are a few issues in signal
>> handling and returning the result back - but I still want it :) - and as
>> far as I know, almost all other SQL databases have similar functionality.
>>
>
> I've updated the patch for the current master and also added some
> unexpected parameters handling, so attached is a v2.
>

Thank you very much


>
> I'd say we should hide the so-designed pg_cmdstatus() interface behind
> more friendly calls like pg_explain_backend() and pg_backend_progress() to
> give some naming examples, to remove the need for magic numbers in the
> second arg.
>

I had a similar idea - this is good enough for a start, but the target
interface is based on integration with the EXPLAIN statement

something like EXPLAIN PROCESS or EXPLAIN PID or EXPLAIN VERBOSE PID ..


>
> What I've found missing in this approach is the insight into nested
> executor runs, so that if you're running a "SELECT my_func()", you only see
> this outer query in the pg_cmdstatus() output.  With the auto_explain
> approach, by hooking into executor I was able to capture the nested queries
> and their plans as well.
>

I understand - originally I didn't think about nested queries, but it is a
good idea and probably not a problem:

Not a problem for XML and JSON, where we can describe nesting simply.

It is a little bit harder for plain text - but we can use a format similar to
the one used for subplans, something like:

top query:
  SELECT fx()

nested (1. level) query:
   SELECT 
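For the structured formats, the nesting might look like the following (a purely hypothetical shape for illustration, not an agreed-upon schema or actual EXPLAIN output):

```json
{
  "query": "SELECT fx()",
  "plan": { "Node Type": "Result" },
  "nested": [
    {
      "level": 1,
      "query": "SELECT ...",
      "plan": { "Node Type": "Seq Scan" }
    }
  ]
}
```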


>
> It's conceptually trivial to add some code to use the Executor hooks here,
> but I don't see any precedent for this except for contrib modules
> (auto_explain and pg_stat_statements), I'm just not sure if that would be
> OK-ish.
>
> And when we solve that, there is another problem of having a sane
> interface to query the nested plans.  For a psql user, probably the most
> interesting would be the topmost (level=1) and the innermost (e.g.
> level=-1) plans.  We might also want to provide a full nesting of plans in
> a structured format like JSON or... *cough* XML, for programs to consume
> and display nicely with folding and stuff.
>
> And the most interesting would be making instrumentation work with all of
> the above.
>

the important functionality is retrieving the complete text of the query - it
was my original motivation, because I had no way to get the complete query
before it finished

Probably the communication between processes should be more complex :( -
the SHM queue should be used there, because some plans can be terribly long.

Using a shared write buffer (one for all) is probably too simplistic a
solution - good for a prototype, but not good for core.

I have an idea about the communication:

1. the caller prepares a buffer and shm queue, and signals the target
process - the parameter is the pid of the caller
2. the target process fills the write buffer and closes the queue
3. the caller shows the data, then closes the buffer and the queue

Now almost all the code for the communication is upstream - the missing part is
injecting one end of the queue into an arbitrary process dynamically.
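The three steps can be modeled in a self-contained way; in this toy sketch a pipe stands in for the shared memory queue and a direct function call stands in for the signal sent to the target process (the `caller_request`/`target_answer` names and the example plan text are invented for illustration, not part of the proposal):

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Single-process model of the three-step handshake.  A pipe stands in
 * for the shared memory queue and a direct call stands in for the
 * signal sent to the target process. */

/* Step 2: the target fills the write end and closes the queue. */
static void
target_answer(int write_fd, const char *plan_text)
{
	(void) write(write_fd, plan_text, strlen(plan_text));
	close(write_fd);
}

/* Steps 1 and 3: the caller prepares the channel, "signals" the
 * target, then reads the response and closes its end.  Returns the
 * number of bytes read, or -1 on failure. */
static ssize_t
caller_request(char *buf, size_t buflen)
{
	int		fds[2];
	ssize_t	n;

	if (pipe(fds) != 0)						/* step 1: prepare the queue */
		return -1;
	target_answer(fds[1], "Seq Scan on t");	/* stands in for the signal */
	n = read(fds[0], buf, buflen);			/* step 3: read the answer */
	close(fds[0]);
	return n;
}
```

In the real design, of course, the two ends live in different backends, which is exactly why the "missing part" - attaching one end of the queue to an arbitrary process dynamically - matters.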

Regards

Pavel

>
> I'm adding this to the next CF.
>
> --
> Alex
>


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-08-30 Thread Shulgin, Oleksandr
On Aug 29, 2015 7:31 PM, Pavel Stehule pavel.steh...@gmail.com wrote:



 2015-08-29 18:36 GMT+02:00 Andres Freund and...@anarazel.de:

 On 2015-08-29 18:27:59 +0200, Pavel Stehule wrote:
  2015-08-29 18:25 GMT+02:00 Shulgin, Oleksandr 
oleksandr.shul...@zalando.de
   Good point.  There's still hope to set a flag and process it later
on.
   Will have to check if it's possible to stay in the scope of a loaded
module
   though.

  I had a workable prototype - and it was implemented very similarly to
  the handling of CANCEL

 Where did you put the handling of that kind of interrupt? Directly into
 ProcessInterrupts()?


 Probably. I don't remember it well, but it needs hacking core code - it
cannot be used from an extension.

Do you still have the code somewhere around? Did it see production use?

Thanks!
--
Alex


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-08-30 Thread Pavel Stehule
2015-08-30 10:30 GMT+02:00 Shulgin, Oleksandr oleksandr.shul...@zalando.de
:

 On Aug 29, 2015 7:31 PM, Pavel Stehule pavel.steh...@gmail.com wrote:
 
 
 
  2015-08-29 18:36 GMT+02:00 Andres Freund and...@anarazel.de:
 
  On 2015-08-29 18:27:59 +0200, Pavel Stehule wrote:
   2015-08-29 18:25 GMT+02:00 Shulgin, Oleksandr 
 oleksandr.shul...@zalando.de
Good point.  There's still hope to set a flag and process it later
 on.
Will have to check if it's possible to stay in the scope of a
 loaded module
though.
 
   I had a workable prototype - and it was implemented very similarly to
   the handling of CANCEL
 
  Where did you put the handling of that kind of interrupt? Directly into
  ProcessInterrupts()?
 
 
  Probably. I don't remember it well, but it needs hacking core code - it
 cannot be used from an extension.

 Do you still have the code somewhere around? Did it see production use?

I am not sure I'll be able to find it - I'll try. We didn't use it in
production.


 Thanks!
 --
 Alex



Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-08-30 Thread Pavel Stehule
2015-08-30 10:34 GMT+02:00 Pavel Stehule pavel.steh...@gmail.com:



 2015-08-30 10:30 GMT+02:00 Shulgin, Oleksandr 
 oleksandr.shul...@zalando.de:

 On Aug 29, 2015 7:31 PM, Pavel Stehule pavel.steh...@gmail.com wrote:
 
 
 
  2015-08-29 18:36 GMT+02:00 Andres Freund and...@anarazel.de:
 
  On 2015-08-29 18:27:59 +0200, Pavel Stehule wrote:
   2015-08-29 18:25 GMT+02:00 Shulgin, Oleksandr 
 oleksandr.shul...@zalando.de
Good point.  There's still hope to set a flag and process it later
 on.
Will have to check if it's possible to stay in the scope of a
 loaded module
though.
 
   I had a workable prototype - and it was implemented very similarly to
   the handling of CANCEL
 
  Where did you put the handling of that kind of interrupt? Directly into
  ProcessInterrupts()?
 
 
  Probably. I don't remember it well, but it needs hacking core code - it
 cannot be used from an extension.

 Do you still have the code somewhere around? Did it see production use?

 I sent it to mailing list year ago

http://www.postgresql.org/message-id/cafj8praxcs9b8abgim-zauvggqdhpzoarz5ysp1_nhv9hp8...@mail.gmail.com

Regards

Pavel



 I am not sure I'll be able to find it - I'll try. We didn't use it in
 production.


 Thanks!
 --
 Alex





Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-08-29 Thread Shulgin, Oleksandr
On Sat, Aug 29, 2015 at 5:44 PM, Andres Freund and...@anarazel.de wrote:

 On 2015-08-29 17:33:22 +0200, Shulgin, Oleksandr wrote:
  Probably using SIGUSR2 would be more appropriate, but I'm not sure if
 there
  are other extensions out there that might be already using it for some
  other reason (well, I do not know that for SIGUSR1 either).  Looking at
 the
  current state of affairs in procsignal_sigusr1_handler() makes me believe
  it should be pretty safe to catch the signal like I do.  Or is that not
 the
  case?

 You can catch signals, but you're not allowed to do a lot from
 them. Anything allocating memory, acquiring locks, etc. is out - these
 functions aren't reentrant.  If you can guarantee that you're not
 interrupting any relevant code you can bend those rules, but that's
 obviously not the case here.

 Check out the list of async-signal-safe functions at
 http://man7.org/linux/man-pages/man7/signal.7.html


Good point.  There's still hope to set a flag and process it later on.
Will have to check if it's possible to stay in the scope of a loaded module
though.


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-08-29 Thread Andres Freund
On 2015-08-29 18:27:59 +0200, Pavel Stehule wrote:
 2015-08-29 18:25 GMT+02:00 Shulgin, Oleksandr oleksandr.shul...@zalando.de
  Good point.  There's still hope to set a flag and process it later on.
  Will have to check if it's possible to stay in the scope of a loaded module
  though.

 I had a workable prototype - and it was implemented very similarly to
 the handling of CANCEL

Where did you put the handling of that kind of interrupt? Directly into
ProcessInterrupts()?

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


  1   2   >