Proposal: Better generation of values in GENERATED columns.

2019-08-26 Thread Daniel Migowski

Hello,

one of the most frustrating things when I started with PostgreSQL was 
that IDENTITY columns are based on sequences that are completely 
disconnected from the table contents, so manually imported data will 
lead to errors like 'duplicate key value violates unique constraint 
"xyz_pkey"'.


Initially I had to fight this with an insert trigger that updates the 
sequence on every insert, or with client-side code that updates the 
sequence when such an error occurs and then retries the insert.


Even Microsoft Access did a better job with autogenerated primary keys, 
and while I love the elegant design of PostgreSQL in many ways, I 
believe we can do better here. I would like to implement a fallback 
solution that detects such errors and automatically advances the 
sequence's next value when the value it returned is already in use on 
insert.


I believe this can be implemented without a negative performance impact, 
because the extra work happens only in the error case; no table scans 
would be needed when the insert is initially executed.
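The fallback can be illustrated with a self-contained toy model (plain C; none of this is PostgreSQL code, and the table and sequence are stand-ins): on a duplicate key, the loop simply draws the next sequence value and retries, so the happy path pays nothing extra.

```c
#include <stdbool.h>

#define MAX_ID 1000

static bool used[MAX_ID];   /* stands in for the primary key index */
static int  seq_last;       /* stands in for the sequence state */

static int
nextval(void)
{
    return ++seq_last;
}

static int
insert_row(void)
{
    for (;;)
    {
        int id = nextval();

        if (!used[id])
        {
            used[id] = true;    /* normal case: no conflict, no overhead */
            return id;
        }
        /* duplicate key: the value was taken by imported rows; looping
         * calls nextval() again, i.e. the sequence is advanced and the
         * insert is retried - exactly the proposed fallback */
    }
}
```

Here the cost of synchronizing the sequence is paid only by inserts that actually hit a conflict.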


Any reasons why this isn't a good idea to try?

Regards,
Daniel Migowski





Re: Patch: New GUC prepared_statement_limit to limit memory used by prepared statements

2019-08-19 Thread Daniel Migowski

Am 19.08.2019 um 05:57 schrieb Kyotaro Horiguchi:

At Sun, 18 Aug 2019 09:43:09 +0200, Daniel Migowski  wrote in 
<6e25ca12-9484-8994-a1ee-40fdbe6af...@ikoffice.de>

Am 17.08.2019 um 19:10 schrieb Ibrar Ahmed:

On Sat, Aug 17, 2019 at 6:58 PM Daniel Migowski <dmigow...@ikoffice.de> wrote:


 attached you find a patch that adds a new GUC:


Quick questions before looking at the patch.


 prepared_statement_limit:

  - Do we have a consensus about the name of the GUC? I don't think it 
is the right name for that.

Almost the same thing was proposed [1] as part of the syscache-pruning 
patch [2], but removed to concentrate on defining how to do that for the 
first instance - the syscache. We have several mechanisms with the same 
characteristics: they can bloat, and there is no means to keep them 
within a certain size. It is better that they are treated the same way, 
or at least on the same principle.
Correct. However, I don't know the backend well enough to see how to 
unify this. Also, time-based eviction isn't that important for me; I'd 
rather work with the memory used. I agree that memory leaks are all bad, 
and time-based eviction might suffice for some small cache entries, but 
CachedPlanSources take up to 45MB EACH here, and looking at the memory 
directly seems desirable in that case.

[1] 
https://www.postgresql.org/message-id/20180315.141246.130742928.horiguchi.kyotaro%40lab.ntt.co.jp
[2] https://commitfest.postgresql.org/23/931/

Pruning plan caches by any means is valuable, but we haven't reached a 
consensus on how to do that. My old patch does it based on the number of 
entries, because precise memory accounting of memory contexts is too 
expensive. I didn't look at this patch closely, but it seems to use 
MemoryContext->methods->stats to count memory usage, which would be too 
expensive for the purpose. We currently use it only for debug output on 
critical errors like OOM.


Yes, this looks like a place to improve. I hadn't looked at the stats 
function before and wasn't aware it iterates over all chunks of the 
context. We really don't need all the memory stats, just the total 
usage, and that can be calculated solely from the size of the blocks. 
Maybe we should add such a counter (totalmemusage) to the MemoryContexts 
themselves, so that decisions based on memory usage can be made quickly. 
It shouldn't be too slow, because blocks are acquired much less often 
than chunks, and an additional add to a variable in memory when a block 
is acquired wouldn't hurt. On the other hand, memory contexts should be 
as fast as possible given how often they are used in the database, so 
that might be bad.
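A sketch of the proposed counter, under the assumption that only block-level (not chunk-level) allocations update it; MemCtx and the function names are illustrative, not PostgreSQL's MemoryContext API:

```c
#include <stddef.h>
#include <stdlib.h>

/* Illustrative context carrying a running total of acquired block
 * bytes.  Reading the total is O(1); the cost is one add or subtract
 * per block acquisition or release, which happens far less often than
 * chunk allocation. */
typedef struct MemCtx
{
    size_t total_bytes;         /* running sum of all block sizes */
} MemCtx;

static void *
ctx_alloc_block(MemCtx *ctx, size_t size)
{
    ctx->total_bytes += size;   /* the proposed extra bookkeeping */
    return malloc(size);
}

static void
ctx_free_block(MemCtx *ctx, void *block, size_t size)
{
    ctx->total_bytes -= size;
    free(block);
}
```

With such a counter, a limit check is a plain comparison instead of a walk over all chunks.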


One could also precompute the memory usage of a CachedPlanSource when it 
is saved, when the query_list gets calculated (for plans about to be 
saved), and when the generic plan is stored in it. In combination with a 
fast stats function that only calculates the total usage, this looks 
good to me. One could also store the sum in a session-global variable to 
make checking whether a prune run is needed faster.


While time-based eviction also makes sense in a context where the 
database is not alone on a server, constraining the memory used directly 
attacks the problem one tries to solve with time-based eviction.






Re: Patch: New GUC prepared_statement_limit to limit memory used by prepared statements

2019-08-18 Thread Daniel Migowski

Am 17.08.2019 um 19:10 schrieb Ibrar Ahmed:
On Sat, Aug 17, 2019 at 6:58 PM Daniel Migowski <dmigow...@ikoffice.de> wrote:



attached you find a patch that adds a new GUC:


Quick questions before looking at the patch.


prepared_statement_limit:

 - Do we have a consensus about the name of the GUC? I don't think it 
is the right name for that.
No, it is a proposal. It could also be named plancache_mem or 
cachedplansource_maxmem or anything else. It was intended to keep 
prepared statements from using up all my memory, but development has 
shown that it could also be used for other CachedPlans, as long as they 
are saved plans.
 - Is this a WIP patch or the final patch? Because I can see TODO and 
non-standard comments in the patch.


Definitely work in progress! The current implementation seems to work 
for me but might be improved; I wanted some input from the mailing list 
before I optimize things.


The most important question is whether such a feature would find some 
love here. Personally it is essential for me, because a single prepared 
statement uses up to 45MB in my application, and there were cases where 
ORM-generated prepared statements would crash my server after some time.


Then I would like to know whether the current implementation would at 
least not crash (even if it might be a bit slow), or whether I have to 
take more care with locking in some places. I believe a backend is a 
single thread of execution, but there were notes about invalidation 
messages that seem to run asynchronously to the main thread. Could 
someone explain the threading model of a single backend to me? Does it 
react to signals, and what would those signals change? Can I assume that 
only ResetPlanCache, PlanCacheRelCallback and PlanCacheObjectCallback 
will be called asynchronously?



     Specifies the maximum amount of memory used in each session to
     cache parsed-and-rewritten queries and execution plans. This
     affects the maximum memory a backend thread will reserve when many
     prepared statements are used. The default value of 0 disables this
     setting, but it is recommended to set this value a bit lower than
     the maximum memory a backend worker thread should reserve
     permanently.

If the GUC is configured, the function EnforcePreparedStatementLimit is
now called after each save of a CachedPlanSource and after creating a
CachedPlan from it. It checks the memory usage of the existing saved
CachedPlanSources and invalidates the query_list and the gplan, if
available, until the memory limit is met again.

CachedPlanSources are removed and re-added at the tail of the
saved_plan_list every time GetCachedPlan is called on them, so it can
be used as an LRU list.


This could be a single move-to-tail function, which I would add.
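Such a move-to-tail helper could look roughly like the following; the list here mimics the dlist idea with a sentinel head but is not PostgreSQL's lib/ilist.h:

```c
/* Minimal doubly linked list with a sentinel head, plus the single
 * move-to-tail function suggested above for maintaining LRU order. */
typedef struct node
{
    struct node *prev;
    struct node *next;
} node;

static void
list_init(node *head)
{
    head->prev = head->next = head;
}

static void
list_push_tail(node *head, node *n)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

static void
list_delete(node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

/* Unlink and re-append in one call: the entry becomes most recently used. */
static void
list_move_tail(node *head, node *n)
{
    list_delete(n);
    list_push_tail(head, n);
}
```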


I also reworked ResetPlanCache, PlanCacheRelCallback and
PlanCacheObjectCallback a bit so that when a CachedPlanSource is
invalidated, the query_list is not only marked as invalid but also
fully released to free its memory.

Because this seems to work, I was able to reuse the new ReleaseQueryList 
in my implementation.



Regards,
    Daniel Migowski

PS @Konstantin: This patch also includes the CachedPlanMemoryUsage
function you like; maybe you would like to review the patch for me?

--
Ibrar Ahmed

Daniel Migowski


Patch: New GUC prepared_statement_limit to limit memory used by prepared statements

2019-08-17 Thread Daniel Migowski

Hello,

attached you find a patch that adds a new GUC:

prepared_statement_limit:

        Specifies the maximum amount of memory used in each session to
        cache parsed-and-rewritten queries and execution plans. This
        affects the maximum memory a backend thread will reserve when
        many prepared statements are used. The default value of 0
        disables this setting, but it is recommended to set this value
        a bit lower than the maximum memory a backend worker thread
        should reserve permanently.

If the GUC is configured, the function EnforcePreparedStatementLimit is 
now called after each save of a CachedPlanSource and after creating a 
CachedPlan from it. It checks the memory usage of the existing saved 
CachedPlanSources and invalidates the query_list and the gplan, if 
available, until the memory limit is met again.


CachedPlanSources are removed and re-added at the tail of the 
saved_plan_list every time GetCachedPlan is called on them, so it can be 
used as an LRU list.
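The enforcement loop can be sketched like this; Entry stands in for CachedPlanSource and the released flag for the effect of ReleaseQueryList, so this is a simplified model rather than the patch's actual code:

```c
#include <stddef.h>

/* Stand-in for a saved CachedPlanSource on the LRU list. */
typedef struct Entry
{
    size_t        mem_used;     /* precomputed memory footprint */
    int           released;     /* set once the plan data was freed */
    struct Entry *next;         /* toward more recently used entries */
} Entry;

/* Walk from the least recently used end and release entries until the
 * total drops under the limit; the entry being saved right now is
 * skipped so it survives its own enforcement run. */
static size_t
enforce_limit(Entry *lru_head, size_t total, size_t limit, Entry *skip)
{
    for (Entry *e = lru_head; e != NULL && total > limit; e = e->next)
    {
        if (e == skip || e->released)
            continue;
        total -= e->mem_used;
        e->released = 1;
    }
    return total;
}
```

Because the walk starts at the LRU end, rarely used statements lose their cached plans first.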


I also reworked ResetPlanCache, PlanCacheRelCallback and 
PlanCacheObjectCallback a bit so that when a CachedPlanSource is 
invalidated, the query_list is not only marked as invalid but also fully 
released to free its memory.


Regards,
Daniel Migowski

PS @Konstantin: This patch also includes the CachedPlanMemoryUsage 
function you like; maybe you would like to review the patch for me?




diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index cdc30fa..2da4ba8 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1702,6 +1702,24 @@ include_dir 'conf.d'
   
  
 
+ 
+  prepared_statement_limit (integer)
+  
+   prepared_statement_limit configuration parameter
+  
+  
+  
+   
+Specifies the maximum amount of memory used in each session to cache
+parsed-and-rewritten queries and execution plans. This affects the maximum memory
+a backend thread will reserve when many prepared statements are used.
+The default value of 0 disables this setting, but it is recommended to set this
+value a bit lower than the maximum memory a backend worker thread should reserve
+permanently.
+   
+  
+ 
+
  
   max_stack_depth (integer)
   
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index abc3062..dbeb5a2 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -88,6 +88,9 @@
  * those that are in long-lived storage and are examined for sinval events).
  * We use a dlist instead of separate List cells so that we can guarantee
  * to save a CachedPlanSource without error.
+ *
+ * This list is used as a LRU list for prepared statements when a
+ * prepared_statement_limit is configured.
  */
 static dlist_head saved_plan_list = DLIST_STATIC_INIT(saved_plan_list);
 
@@ -97,6 +100,7 @@ static dlist_head saved_plan_list = DLIST_STATIC_INIT(saved_plan_list);
 static dlist_head cached_expression_list = DLIST_STATIC_INIT(cached_expression_list);
 
 static void ReleaseGenericPlan(CachedPlanSource *plansource);
+static void ReleaseQueryList(CachedPlanSource *plansource);
 static List *RevalidateCachedQuery(CachedPlanSource *plansource,
    QueryEnvironment *queryEnv);
 static bool CheckCachedPlan(CachedPlanSource *plansource);
@@ -114,6 +118,7 @@ static TupleDesc PlanCacheComputeResultDesc(List *stmt_list);
 static void PlanCacheRelCallback(Datum arg, Oid relid);
 static void PlanCacheObjectCallback(Datum arg, int cacheid, uint32 hashvalue);
 static void PlanCacheSysCallback(Datum arg, int cacheid, uint32 hashvalue);
+static void EnforcePreparedStatementLimit(CachedPlanSource* ignoreThis);
 
 /* GUC parameter */
 intplan_cache_mode;
@@ -482,6 +487,11 @@ SaveCachedPlan(CachedPlanSource *plansource)
	dlist_push_tail(&saved_plan_list, &plansource->node);
 
plansource->is_saved = true;
+
+   if( prep_statement_limit > 0 ) {
+   /* Clean up statements when mem limit is hit */
+   EnforcePreparedStatementLimit(plansource);
+   }
 }
 
 /*
@@ -536,6 +546,31 @@ ReleaseGenericPlan(CachedPlanSource *plansource)
 }
 
 /*
+ * ReleaseQueryList: release a CachedPlanSource's query list, relationOids,
+ * invalItems and search_path.
+ */
+static void
+ReleaseQueryList(CachedPlanSource *plansource)
+{
+   if (plansource->query_list)
+   {
+   plansource->is_valid = false;
+   plansource->query_list = NIL;
+   plansource->relationOids = NIL;
+   plansource->invalItems = NIL;
+   plansource->search_path = NULL;
+
+   if (plansource->query_context)
+   {
+   MemoryContext qcxt = plansource->query_context;
+
+   plansource->query_context = NULL;
+   

Re: Adding column "mem_usage" to view pg_prepared_statements

2019-08-05 Thread Daniel Migowski

Am 05.08.2019 um 19:16 schrieb Andres Freund:

On 2019-07-28 06:20:40 +, Daniel Migowski wrote:

how do you want to generalize it? Are you thinking about a view solely
for the display of the memory usage of different objects?

I'm not quite sure. I'm just not sure that adding separate
infrastructure for various objects is a sustainable approach. We'd likely
want to have this for prepared statements, for cursors, for the current
statement, for various caches, ...

I think an approach would be to add an 'owning_object' field to memory
contexts, which has to point to a Node type if set. A table-returning reporting
function could recursively walk through the memory contexts, starting at
TopMemoryContext. Whenever it encounters a context with owning_object
set, it prints a string version of nodeTag(owning_object). For some node
types it knows about (e.g. PreparedStatement, Portal, perhaps some of
the caches), it prints additional metadata specific to the type (so for
prepared statement it'd be something like 'prepared statement', '[name
of prepared statement]'), and prints information about the associated
context and all its children.
I understand. So it would be something like the output of 
MemoryContextStatsInternal, but in table form with some extra columns. I 
would have loved to have this extra information in 
MemoryContextStatsInternal already, btw., so it might be a good idea to 
upgrade it first to gather the information and wrap a table function 
around it afterwards.

The general context information probably should be something like:
context_name, context_ident,
context_total_bytes, context_total_blocks, context_total_freespace, 
context_total_freechunks, context_total_used, context_total_children
context_self_bytes, context_self_blocks, context_self_freespace, 
context_self_freechunks, context_self_used, context_self_children,

It might make sense to have said function return a row for the contexts
it encounters that do not have an owner set too (that way we'd e.g. get
CacheMemoryContext handled), but then still recurse.
A nice way to learn about the internals of the server and to analyze the 
effects of memory reducing enhancements.
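A toy model of the recursive walk described above; the structures and names are illustrative only (the real MemoryContext tree and node tagging differ):

```c
#include <stddef.h>
#include <string.h>

/* Illustrative context tree: some contexts carry an owner tag. */
typedef struct Ctx
{
    const char *name;
    const char *owner_tag;      /* e.g. "PreparedStatement", or NULL */
    size_t      total_bytes;
    struct Ctx *first_child;
    struct Ctx *next_sibling;
} Ctx;

/* Recursively sum the bytes of every context owned by the given tag,
 * counting one report row per match.  Unowned contexts are still
 * descended into, since owned contexts may sit below them. */
static size_t
report_owned(const Ctx *c, const char *tag, int *rows)
{
    size_t total = 0;

    if (c == NULL)
        return 0;
    if (c->owner_tag != NULL && strcmp(c->owner_tag, tag) == 0)
    {
        total += c->total_bytes;
        (*rows)++;
    }
    total += report_owned(c->first_child, tag, rows);
    total += report_owned(c->next_sibling, tag, rows);
    return total;
}
```

A table-returning function built this way would emit one row per labeled context while still descending through unlabeled ones such as CacheMemoryContext.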

Arguably the proposed owning_object field would be a bit redundant with
the already existing ident/MemoryContextSetIdentifier field, which
e.g. already associates the query string with the contexts used for a
prepared statement. But I'm not convinced that's going to be enough
context in a lot of cases, because e.g. for prepared statements it could
be interesting to have access to both the prepared statement name, and
the statement.
The identifier seems to be more like a category at the moment, because 
it does not seem to hold any relevant information about the object in 
question. So a more specific name would be nice.

The reason I like something like this is that we wouldn't add new
columns to a number of views, and lack views to associate such
information to for some objects. And it'd be disproportional to add all
the information to numerous places anyway.
I understand your argumentation, but things like cursors and portals are 
rather short-lived, while prepared statements seem to be the place where 
memory really builds up.

One counter-argument is that it'd be more expensive to get information
specific to prepared statements (or other object types) that way. I'm
not sure I buy that that's a problem - this isn't something that's
likely going to be used at a high frequency. But if it becomes a
problem, we can add a function that starts that process at a distinct
memory context (e.g. a function that does this just for a single
prepared statement, identified by name) - but I'd not start there.
I also see no problem here, and with Konstantin Knizhnik's autoprepare I 
wouldn't use this very often anyway, more just for monitoring purposes, 
where I don't care if my query is a bit more complex.

While being interesting, I still believe monitoring the memory usage of 
prepared statements is a bit more important than that of other objects, 
because of how they change the memory consumption of the server without 
using any DDL or configuration options, and I am not aware of other 
objects with the same properties - or are there some? And for the other 
volatile objects like tables and indexes and their contents, PostgreSQL 
already has its information functions.

Plenty other objects have that property. E.g. cursors. And for the
catalog/relation/... caches it's even more pernicious - the client might
have closed all its "handles", but we still use memory (and it's
absolutely crucial for performance).


Maybe we can do both? Add a single column to pg_prepared_statements, and 
add another table for the output of MemoryContextStatsDetail? This has 
the advantage that the single real memory indicator useful for end users 
(to the question: how much mem does my sh*t take up?) is in 
pg_prepared_statements, with the more intrinsic information in a detail 
view.


Thinking about the latter I 

Re: Adding column "mem_usage" to view pg_prepared_statements

2019-08-05 Thread Daniel Migowski

Am 05.08.2019 um 18:30 schrieb Konstantin Knizhnik:

On 31.07.2019 1:38, Daniel Migowski wrote:

Am 31.07.2019 um 00:29 schrieb Tom Lane:

Daniel Migowski  writes:
Ok, just have read about the commitfest thing. Is the patch OK for 
that? Or is there generally no love for a mem_usage column here? If 
it was, I really would add some memory monitoring in our app 
regarding this.

You should certainly put it into the next commitfest.  We might or
might not accept it, but if it's not listed in the CF we might
not remember to even review it.  (The CF app is really a to-do
list for patches ...)

OK, added it there. Thanks for your patience :).
The patch does not apply to the most recent sources because an extra 
parameter was added to the CreateTemplateTupleDesc method.

Please rebase it - the fix is trivial.

OK, will do.
I think that including information about the memory used by a statement 
in pg_prepared_statements is very useful.
The CachedPlanMemoryUsage function may be useful not only for this view; 
for example, it is also needed in my autoprepare patch.
I would love to use your work when it's done, and would also love to 
work together here. I am quite a novice in C, though; I might take my 
time to get things right.
I wonder if you would consider going further and not only reporting but 
controlling the memory used by prepared statements?
For example, implementing some LRU replacement discipline on top of the 
prepared statement cache which can evict rarely used prepared statements 
to avoid memory overflow.


THIS! Having some kind of safety net here would finally make sure that 
my precious processes will not grow endlessly until all memory is eaten 
up, even with prepared statement count limits.


While working on this I noticed there are three things stored in a 
CachedPlanSource: the raw query tree (a relatively small thing), the 
query list (the analyzed-and-rewritten query tree), which takes up the 
most memory (at least here, maybe different in your use cases), and 
(often after the 6th call) the CachedPlan, which is more optimized than 
the query list and often needs less memory (half of the query list here).


The query list seems to take the most time to create here, because I hit 
the GEQO engine, but it could be recreated easily (up to 500ms for some 
queries). Creating the CachedPlan afterwards takes 60ms in one use case. 
If we could just invalidate them from time to time, that would be great.


Also, invalidating just the query list or the CachedPlan would not 
invalidate the whole prepared statement (which would break clients' 
expectations) but just make them a bit slower, adding much to the 
stability of the system. I would pay that price, because I don't use 
manually named prepared statements anyway and just autogenerate them as 
performance sugar, without thinking about what really needs to be 
prepared. There is an option in the JDBC driver to use prepared 
statements automatically after you have used them a few times.


We have such a patch for PgPro-EE, but it limits only the number of 
prepared statements, not taking into account the amount of memory used 
by them.
I think that a memory-based limit will be more accurate (although it 
adds more overhead).


Limiting them by number is already done automatically here and would 
really not be of much value, but having a memory limit would be great. 
We could have a combined memory limit for your autoprepared statements 
as well as the manually prepared ones, so clients can know for sure the 
server processes won't eat up more than e.g. 800MB for prepared 
statements. I would also like to have this value spread across all 
client processes, e.g. specifying max_prepared_statement_total_mem=5G 
for the server and maybe max_prepared_statement_mem=1G for client 
processes. Of course we would have to implement cross-client-process 
invalidation here, and I don't know if communicating client processes 
are even intended.


Anyway, a memory limit won't really add that much overhead. At least not 
more than having no prepared statements at all out of fear of server 
OOMs, or having just a small number of those statements. I was even 
thinking about a prepared statement reaper that checks 
pg_prepared_statements every few minutes to clean things up manually, 
but having this in the server would be of great value to me.



If you want, I can be reviewer of your patch.


I'd love to have you as my reviewer.

Regards,
Daniel Migowski





Re: Patch to clean Query after rewrite-and-analyze - reduces memusage up to 50% - increases TPS by up to 50%

2019-08-03 Thread Daniel Migowski

Am 03.08.2019 um 18:38 schrieb Tom Lane:

Daniel Migowski  writes:

While examining the reasons for excessive memory usage in prepared
statements I noticed that RTE_JOIN-kind RTEs contain a bunch of
columnNames and joinaliasvars that are irrelevant after the Query has
been rewritten.

Uh, they're not irrelevant to planning, nor to EXPLAIN.  I don't know how
thoroughly you tested this patch, but it seems certain to break things.

As far as the final plan goes, setrefs.c's add_rte_to_flat_rtable already
drops RTE infrastructure that isn't needed by either the executor or
EXPLAIN.  But we need it up to that point.

OK, I will investigate.

(After thinking a bit, I'm guessing that it seemed not to break because
your tests never actually exercised the generic-plan path, or perhaps
there was always a plancache invalidation before we tried to use the
query_list submitted by PrepareQuery.  I wonder if this is telling us
something about the value of having PrepareQuery do that at all,
rather than just caching the raw parse tree and calling it a day.)


Having PrepareQuery do _what_ exactly? Sorry, I am still learning how 
everything works here.


It seems the patch crashes the postmaster when I use JOINs directly in 
the prepared statement, but not when I just place all the joins in 
views. I will look into this further as well.



A few tips on submitting patches:

* Providing concrete test cases to back up improvement claims is a
good idea.

OK, I will provide.

* Please try to make your code follow established PG style.  Ignoring
project conventions about whitespace and brace layout just makes your
code harder to read.  (A lot of us would just summarily run the code
through pgindent before trying to review it.)
OK. I had just tried to get git diff to stop marking my indentation red, 
but I am also new to git; I will use pgindent now.

* Please don't include random cosmetic changes (eg renaming of unrelated
variables) in a patch that's meant to illustrate some specific functional
change.  It just confuses matters.


It was useful here because I had to declare a ListCell anyway, and 
ListCells in other places were named 'lc', not 'l', while 'l' was 
usually used for lists, so I thought reuse was nice, but OK.







Patch to clean Query after rewrite-and-analyze - reduces memusage up to 50% - increases TPS by up to 50%

2019-08-03 Thread Daniel Migowski

Hello,

While examining the reasons for excessive memory usage in prepared 
statements I noticed that RTE_JOIN-kind RTEs contain a bunch of 
columnNames and joinaliasvars that are irrelevant after the Query has 
been rewritten. I have some queries that join about 20 tables and select 
only a few values, mainly names of objects from those tables.


The attached patch adds a small cleanup function that iterates through 
the query and cleans this up. I may have missed some places that could 
also be cleaned up, but for now the memory requirements for my largest 
statements have dropped from 31.2MB to 10.4MB with this patch.


After the statement has been executed seven times a generic plan is 
stored in the statement, resulting in an extra 8.8MB of memory usage, 
but this still makes a difference of more than 50% in total.


But the most interesting thing was that this patch reduced query 
execution time by 50% (~110ms vs. 55ms) when no generic plan had been 
created yet, and by 35% (7.5ms vs. 5.1ms) when the generic query plan 
had been created.


All tests still pass with my cleanup function, but I am afraid the tests 
might not contain queries that still need that info after statement 
preparation.


Could anyone have a look at it and point me to a situation where this 
might crash later on? Also, would it be possible for someone to run a 
benchmark after applying this patch, to ensure my findings are not 
totally off? I tested on an Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz 
with SSDs, but everything should have been in memory when I ran the test.


Regards,
Daniel Migowski


diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index b945b15..af8dbc9 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
 static ParamListInfo EvaluateParams(PreparedStatement *pstmt, List *params,
 static Datum build_regtype_array(Oid *param_types, int num_params);
 
 /*
+ * cleanUpQuery - Cleanup queries after they have been rewritten.
+ *
+ * This nulls all the joinaliasvars and colnames in RTEs of kind RTE_JOIN
+ * in CMD_SELECT queries after the Query has been analyzed and rewritten.
+ * For some join-heavy queries this results in a 70% memory usage reduction.
+ */
+static void cleanQuery(Query* q) {
+   ListCell* lc2;
+   if( q->commandType == CMD_SELECT ) {
+   foreach(lc2, q->rtable) {
+   RangeTblEntry* rte = (RangeTblEntry*)lfirst(lc2);
+   if( rte->rtekind == RTE_JOIN ) {
+   ListCell* lc3;
+   foreach(lc3, rte->joinaliasvars) {
+   lc3->data.ptr_value = NULL;
+   }
+   if( rte->eref ) {
+   foreach(lc3, rte->eref->colnames) {
+   lc3->data.ptr_value = NULL;
+   }
+   }
+   }
+   if( rte->rtekind == RTE_SUBQUERY ) {
+   cleanQuery(rte->subquery);
+   }
+   }
+   }
+}
+
+/*
  * Implements the 'PREPARE' utility statement.
  */
 void
@@ -64,6 +94,7 @@ PrepareQuery(PrepareStmt *stmt, const char *queryString,
Query  *query;
List   *query_list;
int i;
+   ListCell*   lc;
 
/*
 * Disallow empty-string statement name (conflicts with protocol-level
@@ -99,7 +130,6 @@ PrepareQuery(PrepareStmt *stmt, const char *queryString,
if (nargs)
{
ParseState *pstate;
-   ListCell   *l;
 
/*
 * typenameTypeId wants a ParseState to carry the source query 
string.
@@ -111,9 +141,9 @@ PrepareQuery(PrepareStmt *stmt, const char *queryString,
argtypes = (Oid *) palloc(nargs * sizeof(Oid));
i = 0;
 
-   foreach(l, stmt->argtypes)
+   foreach(lc, stmt->argtypes)
{
-   TypeName   *tn = lfirst(l);
+   TypeName   *tn = lfirst(lc);
			Oid			toid = typenameTypeId(pstate, tn);
 
argtypes[i++] = toid;
@@ -163,6 +193,11 @@ PrepareQuery(PrepareStmt *stmt, const char *queryString,
/* Rewrite the query. The result could be 0, 1, or many queries. */
query_list = QueryRewrite(query);
 
+   /* Clearing joinaliasvars from join rte's now */
+   foreach(lc, query_list) {
+   cleanQuery((Query*)lfirst(lc));
+   }
+
/* Finish filling in the CachedPlanSource */
CompleteCachedPlan(plansource,
   query_list,


Re: [HACKERS] Cached plans and statement generalization

2019-08-02 Thread Daniel Migowski

Am 02.08.2019 um 10:57 schrieb Konstantin Knizhnik:

On 02.08.2019 11:25, Daniel Migowski wrote:


I have two suggestions however:

1. Please allow gathering information about the autoprepared statements 
by returning them in the pg_prepared_statements view. I would love to 
monitor their usage as well as the memory consumption that occurs. I 
suggested a patch to display that in 
https://www.postgresql.org/message-id/41ed3f5450c90f4d8381bc4d8df6bbdcf02e1...@exchangeserver.ikoffice.de



Sorry, but there is a pg_autoprepared_statements view. I think that it 
will be better to distinguish explicitly and implicitly prepared 
statements, will it not?
Do you want to add more information to this view? Right now it shows the 
query string, the types of parameters, and the number of executions of 
this query.
Sorry, I didn't notice it in the patch. If there is another view for 
those, that's perfectly fine.
2. Please not only use an LRU list, but maybe something which would 
prefer statements that get reused at all. We often create ad hoc 
statements with parameters which don't really need to be kept. Maybe I 
can suggest an implementation of an LRU list where reuse of a statement 
not only pulls it to the top of the list but also increases a reuse 
counter. When a statement would then drop off the end of the list, one 
checks whether the reuse count is non-zero, and only really drops it if 
the reuse count is zero. Otherwise the reuse count is decremented (or 
halved) and the element is also placed at the start of the LRU list, so 
it is kept a bit longer.


There are many variations of LRU and alternatives: clock, segmented 
LRU, ...
I agree that classical LRU may not be the best replacement policy for 
autoprepare.
An application either uses static queries or constructs them 
dynamically. It will be better to distinguish queries executed at least 
twice from one-shot queries.
So maybe SLRU will be the best choice. But actually the situation you 
have described (we often create ad hoc statements with parameters which 
don't really need to be kept) is not applicable to autoprepare. 
Autoprepare deals only with statements which are actually executed.


We have another patch which limits the number of stored prepared plans 
to avoid memory overflow (in the case of stored procedures which 
implicitly prepare all issued queries).

Here the choice of replacement policy is more critical.


Alright, I just wanted something better than an LRU, so it is not a step 
back from the implementation in the JDBC driver for us.


Kind regards,
Daniel Migowski





Re: [HACKERS] Cached plans and statement generalization

2019-08-02 Thread Daniel Migowski

Am 01.08.2019 um 18:56 schrieb Konstantin Knizhnik:
I decided to implement your proposal. A much simpler version of the 
autoprepare patch is attached.

At my computer I got the following results:

 pgbench -M simple -S      22495 TPS
 pgbench -M extended -S    47633 TPS
 pgbench -M prepared -S    47683 TPS


So autoprepare speeds up the execution of queries sent using the 
extended protocol more than twofold, and it is almost the same as with 
explicitly prepared statements.
I failed to create a regression test for it because I do not know how 
to force psql to use the extended protocol. Any advice is welcome.
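The core idea of autoprepare described above can be sketched outside the server. The regex-based normalization below is a gross simplification of the parser-level generalization in the actual patch, and all names (`generalize`, `plan_cache`, `execute`) are made up for illustration: constants in a query are replaced by placeholders, and the "plan" for the generalized text is reused across executions.

```python
import re

# Match single-quoted string literals or bare integer literals.
_CONST = re.compile(r"'[^']*'|\b\d+\b")

def generalize(sql):
    """Replace string and numeric literals with $n placeholders."""
    counter = iter(range(1, 1000))
    params = []

    def repl(match):
        params.append(match.group(0))
        return "$%d" % next(counter)

    return _CONST.sub(repl, sql), params

plan_cache = {}

def execute(sql):
    """Look up (or create) a cached 'plan' for the generalized query."""
    key, params = generalize(sql)
    if key not in plan_cache:
        plan_cache[key] = "prepared:" + key   # stand-in for a real plan
    return plan_cache[key], params
```

Two queries that differ only in their literals then map to one cached plan, which is where the reported speedup over the plain extended protocol comes from.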


I am very interested in such a patch, because currently I use the same 
functionality within my JDBC driver and having this directly in 
PostgreSQL would surely speed things up.


I have two suggestions however:

1. Please allow gathering information about the autoprepared statements 
by returning them in the pg_prepared_statements view. I would love to 
monitor their usage as well as the memory consumption that occurs. I 
suggested a patch to display that in 
https://www.postgresql.org/message-id/41ed3f5450c90f4d8381bc4d8df6bbdcf02e1...@exchangeserver.ikoffice.de


2. Please not only use an LRU list, but maybe something which would 
prefer statements that get reused at all. We often create ad hoc 
statements with parameters which don't really need to be kept. Maybe I 
can suggest an implementation of an LRU list where reuse of a statement 
not only pulls it to the top of the list but also increases a reuse 
counter. When a statement would then drop off the end of the list, one 
checks whether the reuse counter is non-zero, and only really drops it 
if the counter is zero. Otherwise the counter is decremented (or halved) 
and the element is placed back at the start of the LRU list, so it is 
kept a bit longer.
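The eviction scheme sketched in words above can be modeled as follows. This is an illustrative sketch, not the JDBC driver's actual implementation: a hit moves the entry to the recent end and bumps its reuse counter; at eviction time an entry with a non-zero counter gets its counter halved and is given another pass through the list instead of being dropped.

```python
from collections import OrderedDict

class CountingLRU:
    """LRU cache whose reuse counter gives frequently reused
    statements a second chance at eviction time."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()   # key -> [value, reuse_count]

    def get(self, key):
        entry = self.entries.get(key)
        if entry is None:
            return None
        self.entries.move_to_end(key)  # pull to the top of the list
        entry[1] += 1                  # and bump the reuse counter
        return entry[0]

    def put(self, key, value):
        self.entries[key] = [value, 0]
        self.entries.move_to_end(key)
        while len(self.entries) > self.capacity:
            old_key, (old_val, count) = self.entries.popitem(last=False)
            if count > 0:
                # non-zero counter: halve it and keep the entry a bit
                # longer by re-inserting it at the recent end
                self.entries[old_key] = [old_val, count // 2]
            # count == 0: the entry is really dropped
```

The halving guarantees that an entry which stops being reused loses its protection after a few eviction rounds, so the loop always terminates.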


Thanks,

Daniel Migowski






Proposal: Clean up RangeTblEntry nodes after query preparation

2019-08-02 Thread Daniel Migowski

Useful cols: "id" "number" "id" "invoice_id" "description"

Useless because completely unreferenced cols: "amount" "customer_id" 
"position" "quantity" "priceperunit" "amount"


I believe one could easily drop the unreferenced names from the RTE as 
well as the Var nodes, which would cut memory usage drastically.


*Final questions:* Is there a reason we don't just null the unused 
values from the RTEs? I would love to implement such a cleanup step. Or 
if null is not possible, just replace stuff with a simpler NOVAR node 
and replace names with empty strings?


I believe this would reduce mem usage for PreparedStatements by >90% at 
least here.
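The proposed cleanup can be illustrated with a toy model. The structure and function names below are made up for illustration and are not the real PostgreSQL RangeTblEntry: an RTE carries all column names of its relation, but a prepared query often references only a few, so nulling the unreferenced names (mirroring the "null the unused values" idea) lets their strings be freed while positional indexes stay valid.

```python
def strip_unreferenced_cols(rte_colnames, referenced):
    """Keep only referenced column names; null out the rest so the
    strings can be freed while column positions remain stable."""
    return [name if name in referenced else None
            for name in rte_colnames]

# Column lists modeled on the example above (illustrative only).
rte = ["id", "number", "invoice_id", "description",
       "amount", "customer_id", "position", "quantity", "priceperunit"]
used = {"id", "number", "invoice_id", "description"}

cleaned = strip_unreferenced_cols(rte, used)
```

Keeping the list length unchanged matters because Var nodes address columns by position, so only the name payload is dropped, not the slot.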


Regards,
Daniel Migowski



Re: Adding column "mem_usage" to view pg_prepared_statements

2019-07-30 Thread Daniel Migowski



On 31.07.2019 at 00:29, Tom Lane wrote:

Daniel Migowski  writes:

Ok, I just read about the commitfest thing. Is the patch OK for that? Or is 
there generally no love for a mem_usage column here? If there is, I would 
really add some memory monitoring in our app regarding this.

You should certainly put it into the next commitfest.  We might or
might not accept it, but if it's not listed in the CF we might
not remember to even review it.  (The CF app is really a to-do
list for patches ...)


OK, added it there. Thanks for your patience :).

Regards,
Daniel Migowski





Re: Adding column "mem_usage" to view pg_prepared_statements

2019-07-30 Thread Daniel Migowski

On 31.07.2019 at 00:17, Tomas Vondra wrote:

FWIW not sure what mail client you're using, but it seems to be breaking
the threads for some reason, splitting it into two - see [1].

Also, please stop top-posting. It makes it way harder to follow the
discussion.


I was using Outlook because it's my company's mail client, but I am 
switching to Thunderbird now for communication with the list, trying to 
be a good citizen.


Regards,
Daniel Migowski





AW: AW: Adding column "mem_usage" to view pg_prepared_statements

2019-07-30 Thread Daniel Migowski
Ok, I just read about the commitfest thing. Is the patch OK for that? Or is 
there generally no love for a mem_usage column here? If there is, I would 
really add some memory monitoring in our app regarding this.

-----Original Message-----
From: Tom Lane 
Sent: Wednesday, 31 July 2019 00:09
To: Daniel Migowski 
Cc: Andres Freund ; pgsql-hackers@lists.postgresql.org
Subject: Re: AW: Adding column "mem_usage" to view pg_prepared_statements

Daniel Migowski  writes:
> Will my patch be considered for 12.0? The calculation of the mem_usage value 
> might be improved later on but because the system catalog is changed I would 
> love to add it before 12.0 becomes stable. 

v12 has been feature-frozen for months, and it's pretty hard to paint this as a 
bug fix, so I'd say the answer is "no".

regards, tom lane




AW: Adding column "mem_usage" to view pg_prepared_statements

2019-07-30 Thread Daniel Migowski
Hello,

Will my patch be considered for 12.0? The calculation of the mem_usage value 
might be improved later on but because the system catalog is changed I would 
love to add it before 12.0 becomes stable. 

Regards,
Daniel Migowski

-----Original Message-----
From: Daniel Migowski 
Sent: Sunday, 28 July 2019 08:21
To: Andres Freund 
Cc: pgsql-hackers@lists.postgresql.org
Subject: Re: Adding column "mem_usage" to view pg_prepared_statements

Hello Andres,

how do you want to generalize it? Are you thinking about a view solely for 
displaying the memory usage of different objects, like functions or views 
(which also have a plan associated with them, when I think about it)? While 
that would be interesting, I still believe monitoring the memory usage of 
prepared statements is a bit more important than that of other objects, 
because of how they change the memory consumption of the server without using 
any DDL or configuration options, and I am not aware of other objects with 
the same properties, or are there some? And for the other volatile objects 
like tables and indexes and their contents, PostgreSQL already has its 
information functions. 

Regardless of that, here is the patch for now. I didn't want to fiddle too 
much with MemoryContexts yet, so it still doesn't recurse into child 
contexts, but I will change that when I try to build a more compact 
MemoryContext implementation and see how that works out.

Thanks for pointing out the relevant information in the statement column of the 
view.

Regards,
Daniel Migowski

-----Original Message-----
From: Andres Freund 
Sent: Saturday, 27 July 2019 21:12
To: Daniel Migowski 
Cc: pgsql-hackers@lists.postgresql.org
Subject: Re: Adding column "mem_usage" to view pg_prepared_statements

Hi,

On 2019-07-27 18:29:23 +0000, Daniel Migowski wrote:
> I just implemented a small change that adds another column "mem_usage"
> to the system view "pg_prepared_statements". It returns the memory 
> usage total of CachedPlanSource.context, 
> CachedPlanSource.query_content and if available 
> CachedPlanSource.gplan.context.

FWIW, it's generally easier to comment if you actually provide the patch, even 
if it's just POC, as that gives a better handle on how much additional 
complexity it introduces.

I think this could be a useful feature. I'm not so sure we want it tied to just 
cached statements however - perhaps we ought to generalize it a bit more.


Regarding the prepared statements specific considerations: I don't think we 
ought to explicitly reference CachedPlanSource.query_content, and 
CachedPlanSource.gplan.context.

In the case of actual prepared statements (rather than oneshot plans) 
CachedPlanSource.query_context IIRC should live under CachedPlanSource.context. 
 I think there are no relevant cases where gplan.context isn't a child of 
CachedPlanSource.context either, but I'm not quite sure.

Then we ought to just include child contexts in the memory computation (cf. 
logic in MemoryContextStatsInternal(), although you obviously wouldn't need all 
that). That way, if the cached statement has child contexts, we're going to 
stay accurate.
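The suggestion above can be sketched in a toy model: a memory context forms a tree, and an accurate mem_usage figure must sum a context's own allocation plus all of its children, as the walk in MemoryContextStatsInternal() does. The class and context names below are stand-ins for illustration, not PostgreSQL's MemoryContext API.

```python
class Ctx:
    """Minimal stand-in for a memory context node in a context tree."""
    def __init__(self, name, allocated, children=()):
        self.name = name
        self.allocated = allocated     # bytes held directly by this context
        self.children = list(children)

def total_bytes(ctx):
    """Recursively sum a context's own allocation plus all children."""
    return ctx.allocated + sum(total_bytes(c) for c in ctx.children)

# Hypothetical context tree for one prepared statement.
plan_source = Ctx("CachedPlanSource", 8192, [
    Ctx("query_context", 4096),
    Ctx("gplan", 16384, [Ctx("subplan", 1024)]),
])
```

With such a recursive walk, a mem_usage column stays accurate even when a cached statement grows additional child contexts later.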


> Also I wonder why the "prepare test as" is part of the statement 
> column. It isn't even part of the real statement that is prepared as 
> far as I would assume. Would prefer to just have the "select *..." in 
> that column.

It's the statement that was executed. Note that you'll not see that in the case 
of protocol level prepared statements.  It will sometimes include relevant 
information, e.g. about the types specified as part of the prepare (as in 
PREPARE foo(int, float, ...) AS ...).

Greetings,

Andres Freund




Re: Adding column "mem_usage" to view pg_prepared_statements

2019-07-28 Thread Daniel Migowski
Hello Andres,

how do you want to generalize it? Are you thinking about a view solely for 
displaying the memory usage of different objects, like functions or views 
(which also have a plan associated with them, when I think about it)? While 
that would be interesting, I still believe monitoring the memory usage of 
prepared statements is a bit more important than that of other objects, 
because of how they change the memory consumption of the server without using 
any DDL or configuration options, and I am not aware of other objects with 
the same properties, or are there some? And for the other volatile objects 
like tables and indexes and their contents, PostgreSQL already has its 
information functions. 

Regardless of that, here is the patch for now. I didn't want to fiddle too 
much with MemoryContexts yet, so it still doesn't recurse into child 
contexts, but I will change that when I try to build a more compact 
MemoryContext implementation and see how that works out.

Thanks for pointing out the relevant information in the statement column of the 
view.

Regards,
Daniel Migowski

-----Original Message-----
From: Andres Freund 
Sent: Saturday, 27 July 2019 21:12
To: Daniel Migowski 
Cc: pgsql-hackers@lists.postgresql.org
Subject: Re: Adding column "mem_usage" to view pg_prepared_statements

Hi,

On 2019-07-27 18:29:23 +0000, Daniel Migowski wrote:
> I just implemented a small change that adds another column "mem_usage"
> to the system view "pg_prepared_statements". It returns the memory 
> usage total of CachedPlanSource.context, 
> CachedPlanSource.query_content and if available 
> CachedPlanSource.gplan.context.

FWIW, it's generally easier to comment if you actually provide the patch, even 
if it's just POC, as that gives a better handle on how much additional 
complexity it introduces.

I think this could be a useful feature. I'm not so sure we want it tied to just 
cached statements however - perhaps we ought to generalize it a bit more.


Regarding the prepared statements specific considerations: I don't think we 
ought to explicitly reference CachedPlanSource.query_content, and 
CachedPlanSource.gplan.context.

In the case of actual prepared statements (rather than oneshot plans) 
CachedPlanSource.query_context IIRC should live under CachedPlanSource.context. 
 I think there are no relevant cases where gplan.context isn't a child of 
CachedPlanSource.context either, but I'm not quite sure.

Then we ought to just include child contexts in the memory computation (cf. 
logic in MemoryContextStatsInternal(), although you obviously wouldn't need all 
that). That way, if the cached statement has child contexts, we're going to 
stay accurate.


> Also I wonder why the "prepare test as" is part of the statement 
> column. It isn't even part of the real statement that is prepared as 
> far as I would assume. Would prefer to just have the "select *..." in 
> that column.

It's the statement that was executed. Note that you'll not see that in the case 
of protocol level prepared statements.  It will sometimes include relevant 
information, e.g. about the types specified as part of the prepare (as in 
PREPARE foo(int, float, ...) AS ...).

Greetings,

Andres Freund


prepared_statements_mem_usage.diff
Description: prepared_statements_mem_usage.diff


Adding column "mem_usage" to view pg_prepared_statements

2019-07-27 Thread Daniel Migowski
Hello,

I just implemented a small change that adds another column "mem_usage" to the 
system view "pg_prepared_statements". It returns the memory usage total of 
CachedPlanSource.context, CachedPlanSource.query_content and if available 
CachedPlanSource.gplan.context.

Looks like this:

IKOffice_Daume=# prepare test as select * from vw_report_salesinvoice where salesinvoice_id = $1;
PREPARE
IKOffice_Daume=# select * from pg_prepared_statements;
 name |                                     statement                                     |         prepare_time         | parameter_types | from_sql | mem_usage
------+-----------------------------------------------------------------------------------+------------------------------+-----------------+----------+-----------
 test | prepare test as select * from vw_report_salesinvoice where salesinvoice_id = $1; | 2019-07-27 20:21:12.63093+02 | {integer}       | t        |  33580232
(1 row)

I did this in preparation for reducing the memory usage of prepared 
statements and believe that it gives client applications an option to 
investigate which prepared statements should be dropped. It also makes it 
possible to directly examine the results of further changes and their 
effectiveness in reducing the memory load of prepared statements.

Is a patch welcome or is this feature not of interest?

Also I wonder why the "prepare test as" is part of the statement column. 
It isn't even part of the real statement that is prepared, as far as I can 
tell. I would prefer to have just the "select *..." in that column.

Kind regards,
Daniel Migowski



AW: Question about MemoryContexts / possible memory leak in CachedPlanSource usage

2019-07-25 Thread Daniel Migowski
Ah, you are right, I looked in fe_memutils.c. Makes sense now, thanks!!


-----Original Message-----
From: Andres Freund 
Sent: Thursday, 25 July 2019 22:31
To: Daniel Migowski 
Cc: pgsql-hack...@postgresql.org
Subject: Re: Question about MemoryContexts / possible memory leak in 
CachedPlanSource usage

Hi,

On 2019-07-25 20:21:06 +, Daniel Migowski wrote:
> When CachedPlanSource instances are created the field query_string is 
> filled with pstrdup(query_string) in CreateCachedPlan, 
> plancache.c:182, which is just a wrapper for strdup. According to the 
> docs the returned pointer should be freed with “free” sometime later.

Note pstrdup is *not* just a wrapper for strdup:

return MemoryContextStrdup(CurrentMemoryContext, in);

i.e. it explicitly allocates memory in the current memory context.

Perhaps you looked at the version of pstrdup() in src/common/fe_memutils.c? 
That's just for "frontend" code (we call code that doesn't run in the server 
frontend, for reasons). There we don't have the whole memory context 
infrastructure... It's only there so we can reuse code that uses pstrdup() 
between frontend and server.


> I believe in DropCachedPlan the free should take place but I don’t see 
> it. Is it just missing or is memory allocated by strdup and friends 
> automatically created in the current MemoryContext?  If so, why do I 
> need to use palloc() instead of malloc()?

We don't intercept malloc itself (doing so would have a *lot* of issues, 
starting from palloc internally using malloc, and ending with having lots of 
problems with libraries because their malloc would suddenly behave differently).
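The ownership rule Andres describes can be modeled in a few lines. The classes below only model lifetimes for illustration and are not the real palloc machinery: pstrdup() copies a string into the *current memory context*, so the copy is released when that context is deleted, and no individual free() is ever needed.

```python
class MemoryContext:
    """Toy model of a memory context that owns its allocations."""

    def __init__(self, name):
        self.name = name
        self.chunks = []        # allocations owned by this context
        self.deleted = False

    def strdup(self, s):
        """Model of pstrdup(): the copy is owned by this context."""
        copy = str(s)
        self.chunks.append(copy)
        return copy

    def delete(self):
        """Model of MemoryContextDelete(): frees everything at once."""
        self.chunks.clear()
        self.deleted = True

# A cached plan's query string lives in the plan's own context, so
# dropping the plan (deleting its context) releases the string too.
plan_ctx = MemoryContext("CachedPlanSource")
query_string = plan_ctx.strdup("select * from t where id = $1")
```

This is why DropCachedPlan needs no explicit free for query_string: deleting the CachedPlanSource context reclaims it along with everything else allocated there.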


Greetings,

Andres Freund


Question about MemoryContexts / possible memory leak in CachedPlanSource usage

2019-07-25 Thread Daniel Migowski
Hello,

I am just starting to get my feet wet with PostgreSQL development and am 
beginning to understand the source, so please be kind. I am working on the 
REL_11_4 tag.

When CachedPlanSource instances are created, the field query_string is filled 
with pstrdup(query_string) in CreateCachedPlan, plancache.c:182, which is just 
a wrapper for strdup. According to the docs the returned pointer should be 
freed with “free” sometime later.

I believe the free should take place in DropCachedPlan, but I don’t see it. Is 
it just missing, or is memory allocated by strdup and friends automatically 
placed in the current MemoryContext? If so, why do I need to use palloc() 
instead of malloc()?

Kind regards,
Daniel Migowski