Re: Adding column "mem_usage" to view pg_prepared_statements

2019-09-03 Thread Alvaro Herrera
Hello Daniel,

This patch no longer applies.  Please submit an updated version.  Also,
there's voluminous discussion that doesn't seem to have resulted in any
revision of the code.  Please fix that too.

Thanks,

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Adding column "mem_usage" to view pg_prepared_statements

2019-08-06 Thread Konstantin Knizhnik




On 05.08.2019 22:35, Daniel Migowski wrote:

I think that including in pg_prepared_statements information about the 
memory used by this statement is very useful.
The CachedPlanMemoryUsage function may be useful not only for this view; 
for example, it is also needed in my autoprepare patch.
I would love to use your work once it's done, and would also love to 
work together here. I am quite a novice in C though, so I might take my 
time to get things right.


Right now I reused your implementation of the CachedPlanMemoryUsage function. :)
Before, I took into account only the memory used by plan->context, but not 
plan->query_context and plan->gplan->context (although the query_context for 
the raw parse tree seems to be much smaller).



I wonder if you have considered going further and not only reporting but 
also controlling the memory used by prepared statements?
For example, implementing some LRU replacement discipline on top of the 
prepared statements cache which can evict rarely used prepared statements 
to avoid memory overflow.


THIS! Having some kind of safety net here would finally make sure that 
my precious processes will not grow endlessly until all memory is eaten 
up, even with prepared statement count limits.


While working on this I noticed there are three things stored in a 
CachedPlanSource: the raw query tree (a relatively small thing), the 
query list (the analyzed-and-rewritten query tree), which takes up the 
most memory (at least here, maybe different with your use cases), and 
(often after the 6th call) the CachedPlan, which is more optimized than 
the query list and often needs less memory (half of the query list here).


The query list seems to take the most time to create here, because I 
hit the GEQO engine, but it can be recreated easily (up to 500ms for 
some queries). Creating the CachedPlan afterwards takes 60ms in some 
use cases. If we could just invalidate them from time to time, that 
would be great.


Also, invalidating just the query list or the CachedPlan would not 
invalidate the whole prepared statement, which would break clients' 
expectations, but just make them a bit slower, adding much to the 
stability of the system. I would pay that price, because I don't use 
manually named prepared statements anyway and just autogenerate them 
as performance sugar without thinking about what really needs to be 
prepared. There is an option in the JDBC driver to use prepared 
statements automatically after you have used them a few times.


I have noticed that cached plans for implicitly prepared statements in 
stored procedures are not shown in the pg_prepared_statements view.
It may not be a problem in your case (if you are accessing Postgres 
through JDBC and not using prepared statements), but it can cause 
memory overflow in applications actively using stored procedures, 
because unlike explicitly created prepared statements, it is very 
difficult to estimate and control statements implicitly prepared by 
plpgsql.

I am not sure what would be the best solution in this case. Adding yet 
another view for implicitly prepared statements? Or including them in 
the pg_prepared_statements view?


We have such a patch for PgPro-EE, but it limits only the number of 
prepared statements, not taking into account the amount of memory used 
by them.
I think that a memory-based limit will be more accurate (although it 
adds more overhead).


Limiting them by number is already done automatically here and would 
really not be of much value, but having a memory limit would be great. We 
could have a combined memory limit for your autoprepared statements as 
well as the manually prepared ones, so clients can know for sure the 
server processes won't eat up more than e.g. 800MB for prepared 
statements. I would also like to have this value spread across all 
client processes, e.g. specifying max_prepared_statement_total_mem=5G 
for the server, and maybe max_prepared_statement_mem=1G for client 
processes. Of course we would have to implement cross-client-process 
invalidation here, and I don't know if communicating client processes 
are even intended.


Anyway, a memory limit won't really add that much more overhead. At 
least not more than having no prepared statements at all because of 
the fear of server OOMs, or having just a small count of those 
statements. I was even thinking about a prepared statement reaper that 
checks pg_prepared_statements every few minutes to clean things up 
manually, but having this in the server would be of great value to me.


Right now a memory context has no field containing the amount of 
currently used memory.
This is why the context->methods->stats function implementation has to 
traverse all blocks to calculate the size of memory used by the context.
It may not be so fast for large contexts. But I do not expect that the 
contexts of prepared statements will be very large, although I have 
dealt with customers who issued queries with a query text length larger 
than a few megabytes. I am afraid to estimate the size of the plan for 
such queries.
This is the reason for my concern 

Re: Adding column "mem_usage" to view pg_prepared_statements

2019-08-05 Thread Andres Freund
Hi,

On 2019-08-05 22:46:47 +0200, Daniel Migowski wrote:
> > Arguably the proposed owning_object field would be a bit redundant with
> > the already existing ident/MemoryContextSetIdentifier field, which
> > e.g. already associates the query string with the contexts used for a
> > prepared statement. But I'm not convinced that's going to be enough
> > context in a lot of cases, because e.g. for prepared statements it could
> > be interesting to have access to both the prepared statement name, and
> > the statement.
> The identifier seems to be more like a category at the moment, because it
> does not seem to hold any relevant information about the object in question.
> So a more specific name would be nice.

I think you might be thinking of the context's name, not ident? E.g. for
prepared statements the context name is:

source_context = AllocSetContextCreate(CurrentMemoryContext,
                                       "CachedPlanSource",
                                       ALLOCSET_START_SMALL_SIZES);

which is obviously the same for every statement. But then there's

MemoryContextSetIdentifier(source_context, plansource->query_string);

which obviously differs.


> > The reason I like something like this is that we wouldn't add new
> > columns to a number of views, and lack views to associate such
> > information to for some objects. And it'd be disproportional to add all
> > the information to numerous places anyway.
> I understand your argumentation, but things like Cursors and Portals are
> rather short-lived, while prepared statements seem to be the place where
> memory really builds up.

That's not necessarily true, especially given WITH HOLD cursors.  Nor
does one only run out of memory in the context of long-lived objects.


> > > While being interesting I still believe monitoring the mem usage of
> > > prepared statements is a bit more important than that of other objects
> > > because of how they change memory consumption of the server without
> > > using any DDL or configuration options and I am not aware of other
> > > objects with the same properties, or are there some? And for the other
> > > volatile objects like tables and indexes and their contents PostgreSQL
> > > already has its information functions.

> > Plenty other objects have that property. E.g. cursors. And for the
> > catalog/relation/... caches it's even more pernicious - the client might
> > have closed all its "handles", but we still use memory (and it's
> > absolutely crucial for performance).
> 
> Maybe we can do both? Add a single column to pg_prepared_statements, and add
> another table for the output of MemoryContextStatsDetail? This has the
> advantage that the single real memory indicator useful for end users (to the
> question: How much mem takes my sh*t up?) is in pg_prepared_statements and
> some more intrinsic information in a detail view.

I don't see why we'd want to do both. It just makes pg_prepared_statements
considerably more expensive. And that's used by some applications /
clients in an automated manner.


> Thinking about the latter I am against such a table, at least in the form
> where it gives information like context_total_freechunks, because it would
> just be useful for us developers.

Developers are also an audience for us. I mean, we certainly can use this
information during development. But even for bug reports such information
would be useful.


> Why should any end user care for how many
> chunks are still open in a MemoryContext, except when he is working on
> C-style extensions. Could just be a source of confusion for them.

Meh. As long as the crucial stuff is first, that's imo enough.


> Let's think about the goal this should have: The end user should be able to
> monitor the memory consumption of things he's in control of or could affect
> the system performance. Should such a table automatically aggregate some
> information? I think so. I would not add more than two memory columns to the
> view, just mem_used and mem_reserved. And even mem_used is questionable,
> because in his eyes only the memory he cannot use for other stuff because of
> object x is important for him (that was the reason I just added one column).
> He would even ask: WHY is there 50% more memory reserved than used, and how
> I can optimize it? (Would lead to more curious PostgreSQL developers maybe,
> so that's maybe a plus).

It's important because it influences how memory usage will grow.


> On the other hand: The Generic Plan had been created for the first
> invocation of the prepared statement, why not store it immediately. It is a
> named statement for a reason that it is intended to be reused, even when it
> is just twice, and since memory seems not to be seen as a scarce resource in
> this context why not store that immediately. Would drop the need for a
> hierarchy here also.

Well, we'll maybe never use 

Re: Adding column "mem_usage" to view pg_prepared_statements

2019-08-05 Thread Daniel Migowski

Am 05.08.2019 um 19:16 schrieb Andres Freund:

On 2019-07-28 06:20:40 +, Daniel Migowski wrote:

how do you want to generalize it? Are you thinking about a view solely
for the display of the memory usage of different objects?

I'm not quite sure. I'm just not sure that adding separate
infrastructure for various objects is a sustainable approach. We'd likely
want to have this for prepared statements, for cursors, for the current
statement, for various caches, ...

I think an approach would be to add an 'owning_object' field to memory
contexts, which has to point to a Node type if set. A table-returning reporting
function could recursively walk through the memory contexts, starting at
TopMemoryContext. Whenever it encounters a context with owning_object
set, it prints a string version of nodeTag(owning_object). For some node
types it knows about (e.g. PreparedStatement, Portal, perhaps some of
the caches), it prints additional metadata specific to the type (so for
prepared statement it'd be something like 'prepared statement', '[name
of prepared statement]'), and prints information about the associated
context and all its children.
I understand. So it would be something like the output of 
MemoryContextStatsInternal, but in table form with some extra columns. I 
would have loved to have this extra information in 
MemoryContextStatsInternal already, btw., so it might be a good idea to 
upgrade it first to find the information and wrap a table function over 
it afterwards.

The general context information probably should be something like:
context_name, context_ident,
context_total_bytes, context_total_blocks, context_total_freespace, 
context_total_freechunks, context_total_used, context_total_children
context_self_bytes, context_self_blocks, context_self_freespace, 
context_self_freechunks, context_self_used, context_self_children,

It might make sense to have said function return a row for the contexts
it encounters that do not have an owner set too (that way we'd e.g. get
CacheMemoryContext handled), but then still recurse.
A nice way to learn about the internals of the server and to analyze the 
effects of memory reducing enhancements.

Arguably the proposed owning_object field would be a bit redundant with
the already existing ident/MemoryContextSetIdentifier field, which
e.g. already associates the query string with the contexts used for a
prepared statement. But I'm not convinced that's going to be enough
context in a lot of cases, because e.g. for prepared statements it could
be interesting to have access to both the prepared statement name, and
the statement.
The identifier seems to be more like a category at the moment, because 
it does not seem to hold any relevant information about the object in 
question. So a more specific name would be nice.

The reason I like something like this is that we wouldn't add new
columns to a number of views, and lack views to associate such
information to for some objects. And it'd be disproportional to add all
the information to numerous places anyway.
I understand your argumentation, but things like Cursors and Portals are 
rather short-lived, while prepared statements seem to be the place where 
memory really builds up.

One counter-argument is that it'd be more expensive to get information
specific to prepared statements (or other object types) that way. I'm
not sure I buy that that's a problem - this isn't something that's
likely going to be used at a high frequency. But if it becomes a
problem, we can add a function that starts that process at a distinct
memory context (e.g. a function that does this just for a single
prepared statement, identified by name) - but I'd not start there.
I also see no problem here, and with Konstantin Knizhnik's autoprepare I 
wouldn't use this very often anyway, more just for monitoring purposes, 
where I don't care if my query is a bit more complex.

While being interesting I still believe monitoring the mem usage of
prepared statements is a bit more important than that of other objects
because of how they change memory consumption of the server without
using any DDL or configuration options and I am not aware of other
objects with the same properties, or are there some? And for the other
volatile objects like tables and indexes and their contents PostgreSQL
already has its information functions.

Plenty other objects have that property. E.g. cursors. And for the
catalog/relation/... caches it's even more pernicious - the client might
have closed all its "handles", but we still use memory (and it's
absolutely crucial for performance).


Maybe we can do both? Add a single column to pg_prepared_statements, and 
add another table for the output of MemoryContextStatsDetail? This has 
the advantage that the single real memory indicator useful for end users 
(to the question: How much mem takes my sh*t up?) is in 
pg_prepared_statements and some more intrinsic information in a detail 
view.


Thinking about the latter I am against such a 

Re: Adding column "mem_usage" to view pg_prepared_statements

2019-08-05 Thread Daniel Migowski

Am 05.08.2019 um 18:30 schrieb Konstantin Knizhnik:

On 31.07.2019 1:38, Daniel Migowski wrote:

Am 31.07.2019 um 00:29 schrieb Tom Lane:

Daniel Migowski  writes:
Ok, I have just read about the commitfest thing. Is the patch OK for 
that? Or is there generally no love for a mem_usage column here? If 
it was, I really would add some memory monitoring in our app 
regarding this.

You should certainly put it into the next commitfest.  We might or
might not accept it, but if it's not listed in the CF we might
not remember to even review it.  (The CF app is really a to-do
list for patches ...)

OK, added it there. Thanks for your patience :).
The patch no longer applies to the most recent source because an extra 
parameter was added to the CreateTemplateTupleDesc method.

Please rebase it - the fix is trivial.

OK, will do.
I think that including in pg_prepared_statements information about the 
memory used by this statement is very useful.
The CachedPlanMemoryUsage function may be useful not only for this view; 
for example, it is also needed in my autoprepare patch.
I would love to use your work once it's done, and would also love to work 
together here. I am quite a novice in C though, so I might take my time 
to get things right.
I wonder if you have considered going further and not only reporting but 
also controlling the memory used by prepared statements?
For example, implementing some LRU replacement discipline on top of the 
prepared statements cache which can evict rarely used prepared statements 
to avoid memory overflow.


THIS! Having some kind of safety net here would finally make sure that 
my precious processes will not grow endlessly until all memory is eaten 
up, even with prepared statement count limits.


While working on this I noticed there are three things stored in a 
CachedPlanSource: the raw query tree (a relatively small thing), the 
query list (the analyzed-and-rewritten query tree), which takes up the 
most memory (at least here, maybe different with your use cases), and 
(often after the 6th call) the CachedPlan, which is more optimized than 
the query list and often needs less memory (half of the query list here).


The query list seems to take the most time to create here, because I hit 
the GEQO engine, but it can be recreated easily (up to 500ms for some 
queries). Creating the CachedPlan afterwards takes 60ms in some use 
cases. If we could just invalidate them from time to time, that would 
be great.


Also, invalidating just the query list or the CachedPlan would not 
invalidate the whole prepared statement, which would break clients' 
expectations, but just make them a bit slower, adding much to the 
stability of the system. I would pay that price, because I don't use 
manually named prepared statements anyway and just autogenerate them as 
performance sugar without thinking about what really needs to be 
prepared. There is an option in the JDBC driver to use prepared 
statements automatically after you have used them a few times.


We have such a patch for PgPro-EE, but it limits only the number of 
prepared statements, not taking into account the amount of memory used 
by them.
I think that a memory-based limit will be more accurate (although it 
adds more overhead).


Limiting them by number is already done automatically here and would 
really not be of much value, but having a memory limit would be great. We 
could have a combined memory limit for your autoprepared statements as 
well as the manually prepared ones, so clients can know for sure the 
server processes won't eat up more than e.g. 800MB for prepared 
statements. I would also like to have this value spread across all 
client processes, e.g. specifying max_prepared_statement_total_mem=5G 
for the server, and maybe max_prepared_statement_mem=1G for client 
processes. Of course we would have to implement cross-client-process 
invalidation here, and I don't know if communicating client processes 
are even intended.


Anyway, a memory limit won't really add that much more overhead. At 
least not more than having no prepared statements at all because of the 
fear of server OOMs, or having just a small count of those statements. I 
was even thinking about a prepared statement reaper that checks 
pg_prepared_statements every few minutes to clean things up manually, 
but having this in the server would be of great value to me.



If you want, I can be a reviewer of your patch.


I'd love to have you as my reviewer.

Regards,
Daniel Migowski





Re: Adding column "mem_usage" to view pg_prepared_statements

2019-08-05 Thread Andres Freund
Hi,

On 2019-07-28 06:20:40 +, Daniel Migowski wrote:
> how do you want to generalize it? Are you thinking about a view solely
> for the display of the memory usage of different objects?

I'm not quite sure. I'm just not sure that adding separate
infrastructure for various objects is a sustainable approach. We'd likely
want to have this for prepared statements, for cursors, for the current
statement, for various caches, ...

I think an approach would be to add an 'owning_object' field to memory
contexts, which has to point to a Node type if set. A table-returning reporting
function could recursively walk through the memory contexts, starting at
TopMemoryContext. Whenever it encounters a context with owning_object
set, it prints a string version of nodeTag(owning_object). For some node
types it knows about (e.g. PreparedStatement, Portal, perhaps some of
the caches), it prints additional metadata specific to the type (so for
prepared statement it'd be something like 'prepared statement', '[name
of prepared statement]'), and prints information about the associated
context and all its children.

The general context information probably should be something like:
context_name, context_ident,
context_total_bytes, context_total_blocks, context_total_freespace, 
context_total_freechunks, context_total_used, context_total_children
context_self_bytes, context_self_blocks, context_self_freespace, 
context_self_freechunks, context_self_used, context_self_children,

It might make sense to have said function return a row for the contexts
it encounters that do not have an owner set too (that way we'd e.g. get
CacheMemoryContext handled), but then still recurse.


Arguably the proposed owning_object field would be a bit redundant with
the already existing ident/MemoryContextSetIdentifier field, which
e.g. already associates the query string with the contexts used for a
prepared statement. But I'm not convinced that's going to be enough
context in a lot of cases, because e.g. for prepared statements it could
be interesting to have access to both the prepared statement name, and
the statement.

The reason I like something like this is that we wouldn't add new
columns to a number of views, and lack views to associate such
information to for some objects. And it'd be disproportional to add all
the information to numerous places anyway.

One counter-argument is that it'd be more expensive to get information
specific to prepared statements (or other object types) that way. I'm
not sure I buy that that's a problem - this isn't something that's
likely going to be used at a high frequency. But if it becomes a
problem, we can add a function that starts that process at a distinct
memory context (e.g. a function that does this just for a single
prepared statement, identified by name) - but I'd not start there.


> While being interesting I still believe monitoring the mem usage of
> prepared statements is a bit more important than that of other objects
> because of how they change memory consumption of the server without
> using any DDL or configuration options and I am not aware of other
> objects with the same properties, or are there some? And for the other
> volatile objects like tables and indexes and their contents PostgreSQL
> already has its information functions.

Plenty other objects have that property. E.g. cursors. And for the
catalog/relation/... caches it's even more pernicious - the client might
have closed all its "handles", but we still use memory (and it's
absolutely crucial for performance).




Greetings,

Andres Freund




Re: Adding column "mem_usage" to view pg_prepared_statements

2019-08-05 Thread Konstantin Knizhnik




On 31.07.2019 1:38, Daniel Migowski wrote:


Am 31.07.2019 um 00:29 schrieb Tom Lane:

Daniel Migowski  writes:
Ok, I have just read about the commitfest thing. Is the patch OK for 
that? Or is there generally no love for a mem_usage column here? If 
it was, I really would add some memory monitoring in our app 
regarding this.

You should certainly put it into the next commitfest.  We might or
might not accept it, but if it's not listed in the CF we might
not remember to even review it.  (The CF app is really a to-do
list for patches ...)


OK, added it there. Thanks for your patience :).

Regards,
Daniel Migowski



The patch no longer applies to the most recent source because an extra 
parameter was added to the CreateTemplateTupleDesc method.

Please rebase it - the fix is trivial.

I think that including in pg_prepared_statements information about the 
memory used by this statement is very useful.
The CachedPlanMemoryUsage function may be useful not only for this view; 
for example, it is also needed in my autoprepare patch.


I wonder if you have considered going further and not only reporting but 
also controlling the memory used by prepared statements?
For example, implementing some LRU replacement discipline on top of the 
prepared statements cache which can evict rarely used prepared statements 
to avoid memory overflow.
We have such a patch for PgPro-EE, but it limits only the number of 
prepared statements, not taking into account the amount of memory used 
by them.
I think that a memory-based limit will be more accurate (although it adds 
more overhead).


If you want, I can be a reviewer of your patch.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: Adding column "mem_usage" to view pg_prepared_statements

2019-07-30 Thread David Fetter
On Tue, Jul 30, 2019 at 10:01:09PM +, Daniel Migowski wrote:
> Hello,
> 
> Will my patch be considered for 12.0? The calculation of the
> mem_usage value might be improved later on but because the system
> catalog is changed I would love to add it before 12.0 becomes
> stable. 

Feature freeze was April 8, so no.

https://www.mail-archive.com/pgsql-hackers@lists.postgresql.org/msg37059.html

You're in plenty of time for 13, though!

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: Adding column "mem_usage" to view pg_prepared_statements

2019-07-30 Thread Daniel Migowski



Am 31.07.2019 um 00:29 schrieb Tom Lane:

Daniel Migowski  writes:

Ok, I have just read about the commitfest thing. Is the patch OK for that? Or is 
there generally no love for a mem_usage column here? If it was, I really would 
add some memory monitoring in our app regarding this.

You should certainly put it into the next commitfest.  We might or
might not accept it, but if it's not listed in the CF we might
not remember to even review it.  (The CF app is really a to-do
list for patches ...)


OK, added it there. Thanks for your patience :).

Regards,
Daniel Migowski





Re: Adding column "mem_usage" to view pg_prepared_statements

2019-07-30 Thread Daniel Migowski

Am 31.07.2019 um 00:17 schrieb Tomas Vondra:

FWIW not sure what mail client you're using, but it seems to be breaking
the threads for some reason, splitting it into two - see [1].

Also, please stop top-posting. It makes it way harder to follow the
discussion.


I was using Outlook because it's my company's mail client, but I'm 
switching to Thunderbird now for communication with the list, trying to 
be a good citizen.


Regards,
Daniel Migowski





Re: AW: AW: Adding column "mem_usage" to view pg_prepared_statements

2019-07-30 Thread Tom Lane
Daniel Migowski  writes:
> Ok, I have just read about the commitfest thing. Is the patch OK for that? Or 
> is there generally no love for a mem_usage column here? If it was, I really 
> would add some memory monitoring in our app regarding this.

You should certainly put it into the next commitfest.  We might or
might not accept it, but if it's not listed in the CF we might
not remember to even review it.  (The CF app is really a to-do
list for patches ...)

regards, tom lane




Re: Adding column "mem_usage" to view pg_prepared_statements

2019-07-30 Thread Tomas Vondra

On Tue, Jul 30, 2019 at 10:01:09PM +, Daniel Migowski wrote:

Hello,

Will my patch be considered for 12.0? The calculation of the mem_usage
value might be improved later on but because the system catalog is
changed I would love to add it before 12.0 becomes stable.



Nope. The code freeze date for PG12 was April 7, 2019. You're a bit late for
that, unfortunately. We're in PG13 land now.

FWIW not sure what mail client you're using, but it seems to be breaking
the threads for some reason, splitting it into two - see [1].

Also, please stop top-posting. It makes it way harder to follow the
discussion.

[1] 
https://www.postgresql.org/message-id/flat/41ED3F5450C90F4D8381BC4D8DF6BBDCF02E10B4%40EXCHANGESERVER.ikoffice.de


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





AW: AW: Adding column "mem_usage" to view pg_prepared_statements

2019-07-30 Thread Daniel Migowski
Ok, I have just read about the commitfest thing. Is the patch OK for that? Or is 
there generally no love for a mem_usage column here? If it was, I really would 
add some memory monitoring in our app regarding this.

-Original Message-
From: Tom Lane 
Sent: Wednesday, July 31, 2019 00:09
To: Daniel Migowski 
Cc: Andres Freund ; pgsql-hackers@lists.postgresql.org
Subject: Re: AW: Adding column "mem_usage" to view pg_prepared_statements

Daniel Migowski  writes:
> Will my patch be considered for 12.0? The calculation of the mem_usage value 
> might be improved later on but because the system catalog is changed I would 
> love to add it before 12.0 becomes stable. 

v12 has been feature-frozen for months, and it's pretty hard to paint this as a 
bug fix, so I'd say the answer is "no".

regards, tom lane




Re: AW: Adding column "mem_usage" to view pg_prepared_statements

2019-07-30 Thread Tom Lane
Daniel Migowski  writes:
> Will my patch be considered for 12.0? The calculation of the mem_usage value 
> might be improved later on but because the system catalog is changed I would 
> love to add it before 12.0 becomes stable. 

v12 has been feature-frozen for months, and it's pretty hard to paint
this as a bug fix, so I'd say the answer is "no".

regards, tom lane




AW: Adding column "mem_usage" to view pg_prepared_statements

2019-07-30 Thread Daniel Migowski
Hello,

Will my patch be considered for 12.0? The calculation of the mem_usage value 
might be improved later on but because the system catalog is changed I would 
love to add it before 12.0 becomes stable. 

Regards,
Daniel Migowski

-Original Message-
From: Daniel Migowski 
Sent: Sunday, July 28, 2019 08:21
To: Andres Freund 
Cc: pgsql-hackers@lists.postgresql.org
Subject: Re: Adding column "mem_usage" to view pg_prepared_statements

Hello Andres,

how do you want to generalize it? Are you thinking about a view solely for the 
display of the memory usage of different objects? Like functions or views (which 
also have a plan associated with them, when I think about it)? While being 
interesting, I still believe monitoring the memory usage of prepared statements 
is a bit more important than that of other objects, because of how they change 
the memory consumption of the server without using any DDL or configuration 
options, and I am not aware of other objects with the same properties, or are 
there some? And for the other volatile objects like tables and indexes and 
their contents PostgreSQL already has its information functions. 

Regardless of that, here is the patch for now. I didn't want to fiddle too much 
with MemoryContexts yet, so it still doesn't recurse into child contexts, but I 
will change that when I try to build a more compact MemoryContext 
implementation and see how that works out.

Thanks for pointing out the relevant information in the statement column of the 
view.

Regards,
Daniel Migowski

-Original Message-
From: Andres Freund 
Sent: Saturday, 27 July 2019 21:12
To: Daniel Migowski 
Cc: pgsql-hackers@lists.postgresql.org
Subject: Re: Adding column "mem_usage" to view pg_prepared_statements

Hi,

On 2019-07-27 18:29:23 +, Daniel Migowski wrote:
> I just implemented a small change that adds another column "mem_usage"
> to the system view "pg_prepared_statements". It returns the memory 
> usage total of CachedPlanSource.context, 
> CachedPlanSource.query_context and if available 
> CachedPlanSource.gplan.context.

FWIW, it's generally easier to comment if you actually provide the patch, even 
if it's just POC, as that gives a better handle on how much additional 
complexity it introduces.

I think this could be a useful feature. I'm not so sure we want it tied to just 
cached statements however - perhaps we ought to generalize it a bit more.


Regarding the prepared statements specific considerations: I don't think we 
ought to explicitly reference CachedPlanSource.query_context and 
CachedPlanSource.gplan.context.

In the case of actual prepared statements (rather than oneshot plans) 
CachedPlanSource.query_context IIRC should live under CachedPlanSource.context. 
I think there are no relevant cases where gplan.context isn't a child of 
CachedPlanSource.context either, but I'm not quite sure.

Then we ought to just include child contexts in the memory computation (cf. 
logic in MemoryContextStatsInternal(), although you obviously wouldn't need all 
that). That way, if the cached statement has child contexts, we're going to 
stay accurate.


> Also I wonder why the "prepare test as" is part of the statement 
> column. It isn't even part of the real statement that is prepared as 
> far as I would assume. Would prefer to just have the "select *..." in 
> that column.

It's the statement that was executed. Note that you'll not see that in the case 
of protocol level prepared statements.  It will sometimes include relevant 
information, e.g. about the types specified as part of the prepare (as in 
PREPARE foo(int, float, ...) AS ...).

Greetings,

Andres Freund




Re: Adding column "mem_usage" to view pg_prepared_statements

2019-07-28 Thread Daniel Migowski
Hello Andres,

how do you want to generalize it? Are you thinking about a view solely for the 
display of the memory usage of different objects, like functions or views (which 
also have a plan associated with them, when I think about it)? While that would 
be interesting, I still believe monitoring the memory usage of prepared 
statements is a bit more important than that of other objects, because they 
change the memory consumption of the server without any DDL or configuration 
options, and I am not aware of other objects with the same properties, or are 
there some? For the other volatile objects, like tables and indexes and their 
contents, PostgreSQL already has its information functions. 

Regardless of that, here is the patch for now. I didn't want to fiddle too much 
with MemoryContexts yet, so it still doesn't recurse into child contexts, but I 
will change that when I try to build a more compact MemoryContext 
implementation and see how that works out.

Thanks for pointing out the relevant information in the statement column of the 
view.

Regards,
Daniel Migowski

-Original Message-
From: Andres Freund 
Sent: Saturday, 27 July 2019 21:12
To: Daniel Migowski 
Cc: pgsql-hackers@lists.postgresql.org
Subject: Re: Adding column "mem_usage" to view pg_prepared_statements

Hi,

On 2019-07-27 18:29:23 +, Daniel Migowski wrote:
> I just implemented a small change that adds another column "mem_usage"
> to the system view "pg_prepared_statements". It returns the memory 
> usage total of CachedPlanSource.context, 
> CachedPlanSource.query_context and if available 
> CachedPlanSource.gplan.context.

FWIW, it's generally easier to comment if you actually provide the patch, even 
if it's just POC, as that gives a better handle on how much additional 
complexity it introduces.

I think this could be a useful feature. I'm not so sure we want it tied to just 
cached statements however - perhaps we ought to generalize it a bit more.


Regarding the prepared statements specific considerations: I don't think we 
ought to explicitly reference CachedPlanSource.query_context and 
CachedPlanSource.gplan.context.

In the case of actual prepared statements (rather than oneshot plans) 
CachedPlanSource.query_context IIRC should live under CachedPlanSource.context. 
I think there are no relevant cases where gplan.context isn't a child of 
CachedPlanSource.context either, but I'm not quite sure.

Then we ought to just include child contexts in the memory computation (cf. 
logic in MemoryContextStatsInternal(), although you obviously wouldn't need all 
that). That way, if the cached statement has child contexts, we're going to 
stay accurate.


> Also I wonder why the "prepare test as" is part of the statement 
> column. It isn't even part of the real statement that is prepared as 
> far as I would assume. Would prefer to just have the "select *..." in 
> that column.

It's the statement that was executed. Note that you'll not see that in the case 
of protocol level prepared statements.  It will sometimes include relevant 
information, e.g. about the types specified as part of the prepare (as in 
PREPARE foo(int, float, ...) AS ...).

Greetings,

Andres Freund


prepared_statements_mem_usage.diff
Description: prepared_statements_mem_usage.diff


Re: Adding column "mem_usage" to view pg_prepared_statements

2019-07-27 Thread Andres Freund
Hi,

On 2019-07-27 18:29:23 +, Daniel Migowski wrote:
> I just implemented a small change that adds another column "mem_usage"
> to the system view "pg_prepared_statements". It returns the memory
> usage total of CachedPlanSource.context,
> CachedPlanSource.query_context and if available
> CachedPlanSource.gplan.context.

FWIW, it's generally easier to comment if you actually provide the
patch, even if it's just POC, as that gives a better handle on how much
additional complexity it introduces.

I think this could be a useful feature. I'm not so sure we want it tied
to just cached statements however - perhaps we ought to generalize it a
bit more.


Regarding the prepared statements specific considerations: I don't think
we ought to explicitly reference CachedPlanSource.query_context and
CachedPlanSource.gplan.context.

In the case of actual prepared statements (rather than oneshot plans)
CachedPlanSource.query_context IIRC should live under
CachedPlanSource.context.  I think there are no relevant cases where
gplan.context isn't a child of CachedPlanSource.context either, but I'm
not quite sure.

Then we ought to just include child contexts in the memory computation
(cf. logic in MemoryContextStatsInternal(), although you obviously
wouldn't need all that). That way, if the cached statement has child
contexts, we're going to stay accurate.


> Also I wonder why the "prepare test as" is part of the statement
> column. It isn't even part of the real statement that is prepared as
> far as I would assume. Would prefer to just have the "select *..." in
> that column.

It's the statement that was executed. Note that you'll not see that in
the case of protocol level prepared statements.  It will sometimes
include relevant information, e.g. about the types specified as part of
the prepare (as in PREPARE foo(int, float, ...) AS ...).

Greetings,

Andres Freund




Adding column "mem_usage" to view pg_prepared_statements

2019-07-27 Thread Daniel Migowski
Hello,

I just implemented a small change that adds another column "mem_usage" to the 
system view "pg_prepared_statements". It returns the total memory usage of 
CachedPlanSource.context, CachedPlanSource.query_context and, if available, 
CachedPlanSource.gplan.context.

Looks like this:

IKOffice_Daume=# prepare test as select * from vw_report_salesinvoice where 
salesinvoice_id = $1;
PREPARE
IKOffice_Daume=# select * from pg_prepared_statements;
 name |                                    statement                                      |         prepare_time         | parameter_types | from_sql | mem_usage
------+-----------------------------------------------------------------------------------+------------------------------+-----------------+----------+-----------
 test | prepare test as select * from vw_report_salesinvoice where salesinvoice_id = $1; | 2019-07-27 20:21:12.63093+02 | {integer}       | t        |  33580232
(1 row)

I did this in preparation for reducing the memory usage of prepared statements 
and believe that this gives client applications an option to investigate which 
prepared statements should be dropped. It also makes it possible to directly 
examine the results of further changes and their effectiveness in reducing the 
memory load of prepared statements.

Is a patch welcome or is this feature not of interest?

Also, I wonder why the "prepare test as" is part of the statement column. It 
isn't even part of the real statement that is prepared, as far as I would 
assume. I would prefer to just have the "select *..." in that column.

Kind regards,
Daniel Migowski