Hi,
Coverity complained about the fact that pgpa_planner_feedback_warning
does not do anything to free the memory used by flagbuf or detailbuf,
and Tom asked whether that failure could amount to a significant leak.
The best way to use a lot of memory here is to have a very long advice
string that doesn't do anything useful, so I created this test case:
load 'pg_plan_advice';
select set_config('pg_plan_advice.advice', v.v, false) from (select
string_agg('SEQ_SCAN(hogehogehoge' || g || ')', ' ') v from
generate_series(1,10000) g) v;
set pg_plan_advice.feedback_warnings = true;
select 1;
While this is not the worst case imaginable, it's pretty bad. There's
probably no use case for having 10000 advice items in your advice
string even if they were all valid, and here none of them are valid
for the final query (select 1). So this produces "WARNING: supplied
plan advice was not enforced" with a 10000 line detail message where
all the lines look like this, but with different numbers:
advice SEQ_SCAN(hogehogehoge7348) feedback is "not matched"
Failure to free the buffers costs us just over 1MB of memory, which is
not wildly disproportionate given that the message itself is over half
a MB long, so I'm not sure that freeing it is all that useful here. I
don't think the context we use for planning normally sticks around for
all that long. If I'm wrong about that and it does, then we should
probably wrap our own shorter-lived context around the stuff this
module is doing rather than trying to free allocations individually.
But this does raise two points which are perhaps worthy of some
further consideration:
1. Maybe we should limit the length of the detail message. In some
other cases, like reportDependentObjects, we limit the detail message
to the first 100 problems found, and then say at the end of that
detail message how many more problems there were afterwards. We could
do that here, too. I'm not 100% sure it's worth the code, but then
again it's not much code.
2. The way the current code works is to transform the advice feedback
into a Node-tree representation which is stashed in the PlannedStmt's
extension_state, and then also passes that Node-tree representation to
pgpa_planner_feedback_warning, which generates the actual warning.
That Node-tree representation is currently used by EXPLAIN to display
the advice feedback, which I think for most people will be a nicer
interface than enabling pg_plan_advice.feedback_warnings, and it can
also be used by other extensions. For instance, we could extend
pg_stash_advice so that it looks at the advice feedback and updates
the shared store, so that users can monitor whether their stashed
advice is doing what they hoped it would.
However, in a case like this, that Node tree is actually quite large:
about 16MB. I guess that's because pgpa_planner_append_feedback() has
to do multiple allocations for every item of feedback: a C string,
DefElem, an Integer, plus whatever lappend() charges us to add to a
List. We could save that memory by adding code here to optimize for
the case where we need to generate warnings but we don't need the Node
tree for anything else. I'm inclined not to do that, because (A) I
don't think temporarily using 16MB when the user specifies 10,000
items of bogus advice is really that bad and (B) complicating the code
has its own costs, such as maybe introducing more bugs. However, maybe
someone else sees it differently.
Another idea is to try to find a more economic Node representation.
For instance, we could jam the flags into the DefElem's location
field, instead of building a separate Integer node, or we could build
the advice feedback as a giant binary blob and wrap it in a varlena
and a Const node and leave consumers to make sense of that as best
they're able, or we could invent a new Node type that's just to the
perfect thing to hold a C string and an integer. I'm inclined to think
that the first two are too hacky and the third is too special-purpose,
but again someone else might see it otherwise.
Thoughts?
--
Robert Haas
EDB: http://www.enterprisedb.com