Proposal: Better generation of values in GENERATED columns.
Hello,

one of the most frustrating things when I started with PostgreSQL was that IDENTITY columns are based on sequences that are completely disconnected from the table contents, so manually imported data will lead to errors like 'duplicate key value violates unique constraint "xyz_pkey"'. I initially had to fight this with an insert trigger that updates the sequence on each insert, or with client-side code that updates the sequence when such an error occurs and then retries the insert. Even Microsoft Access did a better job at autogenerated primary keys, and while I love the elegant design of PostgreSQL in many ways, I believe we can do better here.

I would like to implement a fallback solution that detects such errors and automatically updates the next value of the sequence when that value is already used on insert. I believe this can be implemented without affecting performance negatively, since the extra work happens only in the error case; I wouldn't do table scans when performing the insert initially.

Any reasons why this isn't a good idea to try?

Regards,
Daniel Migowski
Re: Patch: New GUC prepared_statement_limit to limit memory used by prepared statements
On 19.08.2019 05:57, Kyotaro Horiguchi wrote: At Sun, 18 Aug 2019 09:43:09 +0200, Daniel Migowski wrote in <6e25ca12-9484-8994-a1ee-40fdbe6af...@ikoffice.de>: On 17.08.2019 19:10, Ibrar Ahmed wrote: On Sat, Aug 17, 2019 at 6:58 PM Daniel Migowski <dmigow...@ikoffice.de> wrote:

attached you find a patch that adds a new GUC:

Quick questions before looking at the patch.

prepared_statement_limit: - Do we have a consensus about the name of the GUC? I don't think it is the right name for that.

Almost the same thing was proposed [1] as part of the syscache-pruning patch [2], but removed to concentrate on defining how to do that for the first instance - the syscache. We have several mechanisms with the same characteristics - they can bloat, and there are no means to keep them at a certain size. It is better that they are treated the same way, or at least on the same principle.

Correct. However, I don't know the backend well enough to see how to unify this. Also, time-based eviction isn't that important for me, and I'd rather work with the memory used. I agree that memory leaks are all bad, and time-based eviction for some small cache entries might suffice, but CachedPlanSources take up to 45MB EACH here, and looking at the memory directly seems desirable in that case.

[1] https://www.postgresql.org/message-id/20180315.141246.130742928.horiguchi.kyotaro%40lab.ntt.co.jp
[2] https://commitfest.postgresql.org/23/931/

Pruning plancaches by any means is valuable, but we haven't reached a consensus on how to do that. My old patch does it based on the number of entries, because precise memory accounting of memory contexts is too expensive. I didn't look at this patch closely, but it seems to use MemoryContext->methods->stats to count memory usage, which would be too expensive for the purpose. We currently use it only for debug output on critical errors like OOM.

Yes, this looks like a place to improve.
I hadn't looked at the stats function before, and wasn't aware it iterates over all chunks of the context. We really don't need all the memory stats, just the total usage, and that can be calculated solely from the size of the blocks. Maybe we should add this counter (totalmemusage) to the MemoryContexts themselves, so we can start to make decisions based on memory usage quickly. It shouldn't be too slow, because blocks seem to be acquired much less often than chunks, and when they are acquired, an additional add to a variable in memory wouldn't hurt. On the other hand, memory contexts should be as fast as possible given how often they are used in the database, so that might be bad.

Also, one could precompute the memory usage of a CachedPlanSource when it is saved, when the query_list gets calculated (for plans about to be saved), and when the generic plan is stored in it. In combination with a fast stats function that only calculates the total usage, this looks good to me. One could also store the sum in a session-global variable to make checking for the need of a prune run faster.

While time-based eviction also makes sense in a context where the database is not alone on a server, constraining the memory used directly attacks the problem one tries to solve with time-based eviction.
Re: Patch: New GUC prepared_statement_limit to limit memory used by prepared statements
On 17.08.2019 19:10, Ibrar Ahmed wrote: On Sat, Aug 17, 2019 at 6:58 PM Daniel Migowski <dmigow...@ikoffice.de> wrote:

attached you find a patch that adds a new GUC:

Quick questions before looking at the patch.

prepared_statement_limit: - Do we have a consensus about the name of the GUC? I don't think it is the right name for that.

No, it is a proposal. It could also be named plancache_mem or cachedplansource_maxmem or anything else. It was intended to keep prepared statements from using up all my memory, but development has shown that it could also be used for other CachedPlans, as long as it is a saved plan.

- Is this a WIP patch or the final patch? Because I can see TODOs and non-standard comments in the patch.

Definitely work in progress! The current implementation seems to work for me, but might be improved; I wanted some input from the mailing list before I optimize things. The most important question is whether such a feature would find some love here. Personally it is essential for me, because a single prepared statement uses up to 45MB in my application, and there were cases where ORM-generated prepared statements would crash my server after some time.

Then I would like to know whether the current implementation would at least not crash (even if it might be a bit slow), or whether I have to take more care with locking in some places. I believe a backend is a single thread of execution, but there were notes about invalidation messages that seem to run asynchronously to the main thread. Could someone care to explain the threading model of a single backend to me? Does it react to signals, and what would those signals change? Can I assume that only ResetPlanCache, PlanCacheRelCallback and PlanCacheObjectCallback will be called asynchronously?

Specifies the maximum amount of memory used in each session to cache parsed-and-rewritten queries and execution plans. This affects the maximum memory a backend will reserve when many prepared statements are used.
The default value of 0 disables this setting, but it is recommended to set this value a bit lower than the maximum memory a backend worker should reserve permanently.

If the GUC is configured, after each save of a CachedPlanSource, or after creating a CachedPlan from it, the function EnforcePreparedStatementLimit is called. It checks the memory usage of the existing saved CachedPlanSources and invalidates the query_list and the gplan, if available, until the memory limit is met again.

CachedPlanSources are removed and re-added at the tail of the saved_plan_list every time GetCachedPlan is called on them, so the list can be used as an LRU list. This could be a single move-to-tail function, which I would add.

I also reworked ResetPlanCache, PlanCacheRelCallback and PlanCacheObjectCallback a bit, so when a CachedPlanSource is invalidated, the query_list is not only marked as invalid but also fully released to free memory. Because this seems to work, I was able to reuse the new ReleaseQueryList in my implementation.

Regards,
Daniel Migowski

PS @Konstantin: This patch also includes the CachedPlanMemoryUsage function you like; maybe you'd like to review the patch for me?
Patch: New GUC prepared_statement_limit to limit memory used by prepared statements
Hello,

attached you find a patch that adds a new GUC:

prepared_statement_limit: Specifies the maximum amount of memory used in each session to cache parsed-and-rewritten queries and execution plans. This affects the maximum memory a backend will reserve when many prepared statements are used. The default value of 0 disables this setting, but it is recommended to set this value a bit lower than the maximum memory a backend worker should reserve permanently.

If the GUC is configured, after each save of a CachedPlanSource, or after creating a CachedPlan from it, the function EnforcePreparedStatementLimit is called. It checks the memory usage of the existing saved CachedPlanSources and invalidates the query_list and the gplan, if available, until the memory limit is met again. CachedPlanSources are removed and re-added at the tail of the saved_plan_list every time GetCachedPlan is called on them, so the list can be used as an LRU list.

I also reworked ResetPlanCache, PlanCacheRelCallback and PlanCacheObjectCallback a bit, so when a CachedPlanSource is invalidated, the query_list is not only marked as invalid but also fully released to free memory.

Regards,
Daniel Migowski

PS @Konstantin: This patch also includes the CachedPlanMemoryUsage function you like; maybe you'd like to review the patch for me?

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index cdc30fa..2da4ba8 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1702,6 +1702,24 @@ include_dir 'conf.d'
+
+ prepared_statement_limit (integer)
+
+ prepared_statement_limit configuration parameter
+
+
+
+Specifies the maximum amount of memory used in each session to cache
+parsed-and-rewritten queries and execution plans. This affects the maximum memory
+a backend threads will reserve when many prepared statements are used.
+The default value of 0 disables this setting, but it is recommended to set this
+value to a bit lower than the maximum memory a backend worker thread should reserve
+permanently.
+
+
+
 max_stack_depth (integer)

diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index abc3062..dbeb5a2 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -88,6 +88,9 @@
 * those that are in long-lived storage and are examined for sinval events).
 * We use a dlist instead of separate List cells so that we can guarantee
 * to save a CachedPlanSource without error.
+ *
+ * This list is used as a LRU list for prepared statements when a
+ * prepared_statement_limit is configured.
 */
 static dlist_head saved_plan_list = DLIST_STATIC_INIT(saved_plan_list);
@@ -97,6 +100,7 @@ static dlist_head saved_plan_list = DLIST_STATIC_INIT(saved_plan_list);
 static dlist_head cached_expression_list = DLIST_STATIC_INIT(cached_expression_list);
 static void ReleaseGenericPlan(CachedPlanSource *plansource);
+static void ReleaseQueryList(CachedPlanSource *plansource);
 static List *RevalidateCachedQuery(CachedPlanSource *plansource, QueryEnvironment *queryEnv);
 static bool CheckCachedPlan(CachedPlanSource *plansource);
@@ -114,6 +118,7 @@ static TupleDesc PlanCacheComputeResultDesc(List *stmt_list);
 static void PlanCacheRelCallback(Datum arg, Oid relid);
 static void PlanCacheObjectCallback(Datum arg, int cacheid, uint32 hashvalue);
 static void PlanCacheSysCallback(Datum arg, int cacheid, uint32 hashvalue);
+static void EnforcePreparedStatementLimit(CachedPlanSource* ignoreThis);

 /* GUC parameter */
 int		plan_cache_mode;
@@ -482,6 +487,11 @@ SaveCachedPlan(CachedPlanSource *plansource)
 	dlist_push_tail(&saved_plan_list, &plansource->node);

 	plansource->is_saved = true;
+
+	if( prep_statement_limit > 0 ) {
+		/* Clean up statements when mem limit is hit */
+		EnforcePreparedStatementLimit(plansource);
+	}
 }

 /*
@@ -536,6 +546,31 @@
ReleaseGenericPlan(CachedPlanSource *plansource)
 }

 /*
+ * ReleaseQueryList: release a CachedPlanSource's query list, relationOids,
+ * invalItems and search_path.
+ */
+static void
+ReleaseQueryList(CachedPlanSource *plansource)
+{
+	if (plansource->query_list)
+	{
+		plansource->is_valid = false;
+		plansource->query_list = NIL;
+		plansource->relationOids = NIL;
+		plansource->invalItems = NIL;
+		plansource->search_path = NULL;
+
+		if (plansource->query_context)
+		{
+			MemoryContext qcxt = plansource->query_context;
+
+			plansource->query_context = NULL;
+
Re: Adding column "mem_usage" to view pg_prepared_statements
On 05.08.2019 19:16, Andres Freund wrote: On 2019-07-28 06:20:40 +0000, Daniel Migowski wrote:

how do you want to generalize it? Are you thinking about a view solely for the display of the memory usage of different objects?

I'm not quite sure. I'm just not sure that adding separate infrastructure for various objects is a sustainable approach. We'd likely want to have this for prepared statements, for cursors, for the current statement, for various caches, ...

I think an approach would be to add an 'owning_object' field to memory contexts, which has to point to a Node type if set. A table-returning reporting function could recursively walk through the memory contexts, starting at TopMemoryContext. Whenever it encounters a context with owning_object set, it prints a string version of nodeTag(owning_object). For some node types it knows about (e.g. PreparedStatement, Portal, perhaps some of the caches), it prints additional metadata specific to the type (so for a prepared statement it'd be something like 'prepared statement', '[name of prepared statement]'), and prints information about the associated context and all its children.

I understand. So it would be something like the output of MemoryContextStatsInternal, but in table form with some extra columns. I would have loved this extra information already in MemoryContextStatsInternal btw., so it might be a good idea to upgrade it first to find the information and wrap a table function over it afterwards.

The general context information probably should be something like: context_name, context_ident, context_total_bytes, context_total_blocks, context_total_freespace, context_total_freechunks, context_total_used, context_total_children, context_self_bytes, context_self_blocks, context_self_freespace, context_self_freechunks, context_self_used, context_self_children.

It might make sense to have said function return a row for the contexts it encounters that do not have an owner set too (that way we'd e.g.
get CacheMemoryContext handled), but then still recurse.

A nice way to learn about the internals of the server and to analyze the effects of memory-reducing enhancements.

Arguably the proposed owning_object field would be a bit redundant with the already existing ident/MemoryContextSetIdentifier field, which e.g. already associates the query string with the contexts used for a prepared statement. But I'm not convinced that's going to be enough context in a lot of cases, because e.g. for prepared statements it could be interesting to have access to both the prepared statement name and the statement.

The identifier seems to be more like a category at the moment, because it does not seem to hold any relevant information about the object in question. So a more specific name would be nice.

The reason I like something like this is that we wouldn't add new columns to a number of views, and lack views to associate such information to for some objects. And it'd be disproportionate to add all the information to numerous places anyway.

I understand your argumentation, but things like Cursors and Portals are rather short-lived, while prepared statements seem to be the place where memory really builds up.

One counter-argument is that it'd be more expensive to get information specific to prepared statements (or other object types) that way. I'm not sure I buy that that's a problem - this isn't something that's likely going to be used at a high frequency. But if it becomes a problem, we can add a function that starts that process at a distinct memory context (e.g. a function that does this just for a single prepared statement, identified by name) - but I'd not start there.

I also see no problem here, and with Konstantin Knizhnik's autoprepare I wouldn't use this very often anyway, more just for monitoring purposes, where I don't care if my query is a bit more complex.
While being interesting, I still believe monitoring the memory usage of prepared statements is a bit more important than that of other objects, because of how they change the memory consumption of the server without using any DDL or configuration options, and I am not aware of other objects with the same properties - or are there some? And for the other volatile objects like tables and indexes and their contents, PostgreSQL already has its information functions.

Plenty of other objects have that property. E.g. cursors. And for the catalog/relation/... caches it's even more pernicious - the client might have closed all its "handles", but we still use memory (and it's absolutely crucial for performance).

Maybe we can do both? Add a single column to pg_prepared_statements, and add another table for the output of MemoryContextStatsDetail? This has the advantage that the single real memory indicator useful for end users (to the question: How much mem takes my sh*t up?) is in pg_prepared_statements, with the more intrinsic information in a detail view. Thinking about the latter I
Re: Adding column "mem_usage" to view pg_prepared_statements
On 05.08.2019 18:30, Konstantin Knizhnik wrote: On 31.07.2019 1:38, Daniel Migowski wrote: On 31.07.2019 00:29, Tom Lane wrote: Daniel Migowski writes:

Ok, I've just read about the commitfest thing. Is the patch OK for that? Or is there generally no love for a mem_usage column here? If there was, I really would add some memory monitoring in our app regarding this.

You should certainly put it into the next commitfest. We might or might not accept it, but if it's not listed in the CF we might not remember to even review it. (The CF app is really a to-do list for patches ...)

OK, added it there. Thanks for your patience :).

The patch does not apply to the most recent sources because an extra parameter was added to the CreateTemplateTupleDesc method. Please rebase it - the fix is trivial.

OK, will do.

I think that including in pg_prepared_statements information about the memory used by a statement is very useful. The CachedPlanMemoryUsage function may be useful not only for this view; for example, it is also needed in my autoprepare patch.

I would love to use your work when it's done, and would also love to work together here. I am quite a novice in C though; I might take my time to get things right.

I wonder if you would consider going further and not only reporting but also controlling the memory used by prepared statements? For example, implementing some LRU replacement discipline on top of the prepared statements cache, which can evict rarely used prepared statements to avoid memory overflow.

THIS! Having some kind of safety net here would finally make sure that my precious processes will not grow endlessly until all memory is eaten up, even with prepared statement count limits.

While working on stuff I noticed there are three things stored in a CachedPlanSource.
The raw query tree (a relatively small thing), the query list (the analyzed-and-rewritten query tree), which takes up the most memory (at least here; this may differ with your use cases), and (often after the 6th call) the CachedPlan, which is more optimized than the query list and often needs less memory (half of the query list here).

The query list seems to take the most time to create here, because I hit the GEQO engine, but it could be recreated easily (up to 500ms for some queries). Creating the CachedPlan afterwards takes 60ms in some use case. If we could just invalidate them from time to time, that would be great. Also, invalidating just the queries or the CachedPlan would not invalidate the whole prepared statement, which would break clients' expectations, but just make them a bit slower, adding much to the stability of the system. I would pay that price, because I just don't use manually named prepared statements anyway and just autogenerate them as performance sugar, without thinking about what really needs to be prepared. There is an option in the JDBC driver to use prepared statements automatically after you have used them a few times.

We have such a patch for PgPro-EE, but it limits only the number of prepared statements, not taking into account the amount of memory used by them. I think that a memory-based limit will be more accurate (although it adds more overhead).

Limiting them by number is already done automatically here and would really not be of much value, but having a memory limit would be great. We could have a combined memory limit for your autoprepared statements as well as the manually prepared ones, so clients can know for sure the server processes won't eat up more than e.g. 800MB for prepared statements. And I would also like to have this value spread across all client processes, e.g. specifying max_prepared_statement_total_mem=5G for the server, and maybe max_prepared_statement_mem=1G for client processes.
Of course we would have to implement cross-client-process invalidation here, and I don't know if communicating client processes are even intended.

Anyway, a memory limit won't really add that much more overhead. At least not more than having no prepared statements at all because of the fear of server OOMs, or having just a small count of those statements. I was even thinking about a prepared statement reaper that checks pg_prepared_statements every few minutes to clean things up manually, but having this in the server would be of great value to me.

If you want, I can be a reviewer of your patch.

I'd love to have you as my reviewer.

Regards,
Daniel Migowski
Re: Patch to clean Query after rewrite-and-analyze - reduces memusage up to 50% - increases TPS by up to 50%
On 03.08.2019 18:38, Tom Lane wrote: Daniel Migowski writes:

While examining the reasons for excessive memory usage in prepared statements I noticed that RTE_JOIN-kind RTEs contain a bunch of columnNames and joinaliasvars that are irrelevant after the Query has been rewritten.

Uh, they're not irrelevant to planning, nor to EXPLAIN. I don't know how thoroughly you tested this patch, but it seems certain to break things. As far as the final plan goes, setrefs.c's add_rte_to_flat_rtable already drops RTE infrastructure that isn't needed by either the executor or EXPLAIN. But we need it up to that point.

OK, I will investigate.

(After thinking a bit, I'm guessing that it seemed not to break because your tests never actually exercised the generic-plan path, or perhaps there was always a plancache invalidation before we tried to use the query_list submitted by PrepareQuery. I wonder if this is telling us something about the value of having PrepareQuery do that at all, rather than just caching the raw parse tree and calling it a day.)

Having PrepareQuery do _what_ exactly? Sorry, I am still learning how everything works here. It seems like the patch crashes the postmaster when I use JOINs directly in the prepared statement, but not when I just place all the joins in views. I will also look into this further.

A few tips on submitting patches:

* Providing concrete test cases to back up improvement claims is a good idea.

OK, I will provide them.

* Please try to make your code follow established PG style. Ignoring project conventions about whitespace and brace layout just makes your code harder to read. (A lot of us would just summarily run the code through pgindent before trying to review it.)

OK. I had just tried to keep git diff from marking my indentation red, but I am also new to git; I will use pgindent now.

* Please don't include random cosmetic changes (eg renaming of unrelated variables) in a patch that's meant to illustrate some specific functional change.
It just confuses matters.

It was useful here because I had to declare a ListCell anyway, and ListCells in other places were named 'lc', not 'l', and 'l' was usually used for lists, so I thought the reuse was nice, but OK.
Patch to clean Query after rewrite-and-analyze - reduces memusage up to 50% - increases TPS by up to 50%
Hello,

While examining the reasons for excessive memory usage in prepared statements I noticed that RTE_JOIN-kind RTEs contain a bunch of columnNames and joinaliasvars that are irrelevant after the Query has been rewritten. I have some queries that join about 20 tables and select only a few values, mainly names of objects from those tables.

The attached patch adds a small cleanup function that iterates through the query and cleans stuff up. I may have missed some places that could also be cleaned up, but for now the memory requirements for my largest statements have dropped from 31.2MB to 10.4MB with this patch. After the statement has been executed seven times, a generic plan is stored in the statement, resulting in an extra 8.8MB of memory usage, but this still makes a difference of more than 50% in total.

But the most interesting thing was that this patch reduced query execution time by 50% (~110ms vs. 55ms) when no generic plan was created yet, and by 35% (7.5ms vs. 5.1ms) when the generic query plan had been created.

All tests still pass with my cleanup command, but I am afraid the tests might not contain queries that still need that info after statement preparation. Could anyone have a look at it and point me to a situation where this might crash later on? Also, would it be possible for someone to run a benchmark after applying this patch, to ensure my findings are not totally off? I tested on an Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz with SSDs, but everything should have been in memory when I ran the test.

Regards,
Daniel Migowski

diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index b945b15..af8dbc9 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -51,6 +51,36 @@ static ParamListInfo EvaluateParams(PreparedStatement *pstmt, List *params,
 static Datum build_regtype_array(Oid *param_types, int num_params);

 /*
+ * cleanQuery - Cleanup queries after they have been rewritten.
+ *
+ * This nulls all the joinaliasvars and colnames in RTEs of kind RTE_JOIN
+ * in CMD_SELECT queries after the Query has been analyzed and rewritten.
+ * For some join-heavy queries this results in a 70% memory usage reduction.
+ */
+static void cleanQuery(Query* q) {
+	ListCell* lc2;
+	if( q->commandType == CMD_SELECT ) {
+		foreach(lc2, q->rtable) {
+			RangeTblEntry* rte = (RangeTblEntry*)lfirst(lc2);
+			if( rte->rtekind == RTE_JOIN ) {
+				ListCell* lc3;
+				foreach(lc3, rte->joinaliasvars) {
+					lc3->data.ptr_value = NULL;
+				}
+				if( rte->eref ) {
+					foreach(lc3, rte->eref->colnames) {
+						lc3->data.ptr_value = NULL;
+					}
+				}
+			}
+			if( rte->rtekind == RTE_SUBQUERY ) {
+				cleanQuery(rte->subquery);
+			}
+		}
+	}
+}
+
+/*
 * Implements the 'PREPARE' utility statement.
 */
void
@@ -64,6 +94,7 @@ PrepareQuery(PrepareStmt *stmt, const char *queryString,
	Query	   *query;
	List	   *query_list;
	int			i;
+	ListCell* lc;

	/*
	 * Disallow empty-string statement name (conflicts with protocol-level
@@ -99,7 +130,6 @@ PrepareQuery(PrepareStmt *stmt, const char *queryString,
	if (nargs)
	{
		ParseState *pstate;
-		ListCell   *l;

		/*
		 * typenameTypeId wants a ParseState to carry the source query string.
@@ -111,9 +141,9 @@ PrepareQuery(PrepareStmt *stmt, const char *queryString,
		argtypes = (Oid *) palloc(nargs * sizeof(Oid));
		i = 0;

-		foreach(l, stmt->argtypes)
+		foreach(lc, stmt->argtypes)
		{
-			TypeName   *tn = lfirst(l);
+			TypeName   *tn = lfirst(lc);
			Oid			toid = typenameTypeId(pstate, tn);

			argtypes[i++] = toid;
@@ -163,6 +193,11 @@ PrepareQuery(PrepareStmt *stmt, const char *queryString,
	/* Rewrite the query. The result could be 0, 1, or many queries. */
	query_list = QueryRewrite(query);

+	/* Clearing joinaliasvars from join rte's now */
+	foreach(lc, query_list) {
+		cleanQuery((Query*)lfirst(lc));
+	}
+
	/* Finish filling in the CachedPlanSource */
	CompleteCachedPlan(plansource, query_list,
Re: [HACKERS] Cached plans and statement generalization
On 02.08.2019 10:57, Konstantin Knizhnik wrote: On 02.08.2019 11:25, Daniel Migowski wrote:

I have two suggestions however:

1. Please allow gathering information about the autoprepared statements by returning them in the pg_prepared_statements view. I would love to monitor their usage as well as the memory consumption that occurs. I suggested a patch to display that in https://www.postgresql.org/message-id/41ed3f5450c90f4d8381bc4d8df6bbdcf02e1...@exchangeserver.ikoffice.de

Sorry, but there is a pg_autoprepared_statements view. I think that it is better to distinguish explicitly and implicitly prepared statements, isn't it? Do you want to add more information to this view? Right now it shows the query string, the types of parameters, and the number of executions of this query.

My apologies, I didn't notice it in the patch. If there is another view for those, that's perfectly fine.

2. Please not only use an LRU list, but maybe something which would prefer statements that get reused at all. We often create ad-hoc statements with parameters which don't really need to be kept. Maybe I can suggest an implementation of an LRU list where reuse of a statement not only pulls it to the top of the list but also increases a reuse counter. When a statement would then drop off the end of the list, one checks if the reuse count is non-zero, and only really drops it if the reuse count is zero. Otherwise the reuse count is decremented (or halved) and the element is also placed at the start of the LRU list, so it is kept a bit longer.

There are many variations of LRU and alternatives: clock, segmented LRU, ... I agree that classical LRU may not be the best replacement policy for autoprepare. An application either uses static queries or constructs them dynamically. It would be better to distinguish queries executed at least twice from one-shot queries. So SLRU may be the best choice.
But actually the situation you have described (we often create ad-hoc statements with parameters which don't really need to be kept) is not applicable to autoprepare. Autoprepare deals only with statements which are actually executed.

We have another patch which limits the number of stored prepared plans to avoid memory overflow (in case of using stored procedures which implicitly prepare all issued queries). Here the choice of replacement policy is more critical.

Alright, I just wanted to have something better than plain LRU, so it is not a step back from the implementation in the JDBC driver for us.

Kind regards,
Daniel Migowski
Re: [HACKERS] Cached plans and statement generalization
On 01.08.2019 18:56, Konstantin Knizhnik wrote:

I decided to implement your proposal. A much simpler version of the autoprepare patch is attached. At my computer I got the following results:

pgbench -M simple -S: 22495 TPS
pgbench -M extended -S: 47633 TPS
pgbench -M prepared -S: 47683 TPS

So autoprepare speeds up execution of queries sent using the extended protocol by more than a factor of two, and it is almost the same as with explicitly prepared statements. I failed to create a regression test for it because I do not know how to force psql to use the extended protocol. Any advice is welcome.

I am very interested in such a patch, because currently I use the same functionality within my JDBC driver, and having this directly in PostgreSQL would surely speed things up. I have two suggestions however:

1. Please allow gathering information about the autoprepared statements by returning them in the pg_prepared_statements view. I would love to monitor their usage as well as the memory consumption that occurs. I suggested a patch to display that in https://www.postgresql.org/message-id/41ed3f5450c90f4d8381bc4d8df6bbdcf02e1...@exchangeserver.ikoffice.de

2. Please not only use an LRU list, but maybe something which would prefer statements that get reused at all. We often create ad-hoc statements with parameters which don't really need to be kept. Maybe I can suggest an implementation of an LRU list where reuse of a statement not only pulls it to the top of the list but also increases a reuse counter. When a statement would then drop off the end of the list, one checks if the reuse count is non-zero, and only really drops it if the reuse count is zero. Otherwise the reuse count is decremented (or halved) and the element is also placed at the start of the LRU list, so it is kept a bit longer.

Thanks,
Daniel Migowski
Proposal: Clean up RangeTblEntry nodes after query preparation
Useful cols: "id", "number", "id", "invoice_id", "description". Useless because completely unreferenced cols: "amount", "customer_id", "position", "quantity", "priceperunit", "amount". I believe one could easily drop the unreferenced names from the RTE as well as the Var nodes, which would cut memory usage drastically. *Final questions:* Is there a reason we don't just null out the unused values in the RTEs? I would love to implement such a cleanup step. Or, if null is not possible, just replace them with a simpler NOVAR node and replace the names with empty strings? I believe this would reduce memory usage for PreparedStatements by >90%, at least here. Regards, Daniel Migowski
Re: Adding column "mem_usage" to view pg_prepared_statements
On 31.07.2019 at 00:29, Tom Lane wrote:
> Daniel Migowski writes:
>> Ok, I have just read about the commitfest thing. Is the patch OK for
>> that? Or is there generally no love for a mem_usage column here? If it
>> was, I really would add some memory monitoring in our app regarding
>> this.
> You should certainly put it into the next commitfest. We might or might
> not accept it, but if it's not listed in the CF we might not remember to
> even review it. (The CF app is really a to-do list for patches ...)

OK, added it there. Thanks for your patience :).

Regards,
Daniel Migowski
Re: Adding column "mem_usage" to view pg_prepared_statements
On 31.07.2019 at 00:17, Tomas Vondra wrote:
> FWIW not sure what mail client you're using, but it seems to be breaking
> the threads for some reason, splitting it into two - see [1]. Also,
> please stop top-posting. It makes it way harder to follow the
> discussion.

I was using Outlook because it's my company's mail client, but I am switching to Thunderbird now for communication with the list, trying to be a good citizen.

Regards,
Daniel Migowski
AW: AW: Adding column "mem_usage" to view pg_prepared_statements
Ok, I have just read about the commitfest thing. Is the patch OK for that? Or is there generally no love for a mem_usage column here? If it was, I really would add some memory monitoring in our app regarding this.

-----Original Message-----
From: Tom Lane
Sent: Wednesday, 31 July 2019 00:09
To: Daniel Migowski
Cc: Andres Freund; pgsql-hackers@lists.postgresql.org
Subject: Re: AW: Adding column "mem_usage" to view pg_prepared_statements

Daniel Migowski writes:
> Will my patch be considered for 12.0? The calculation of the mem_usage
> value might be improved later on, but because the system catalog is
> changed I would love to add it before 12.0 becomes stable.

v12 has been feature-frozen for months, and it's pretty hard to paint this as a bug fix, so I'd say the answer is "no".

regards, tom lane
AW: Adding column "mem_usage" to view pg_prepared_statements
Hello,

Will my patch be considered for 12.0? The calculation of the mem_usage value might be improved later on, but because the system catalog is changed I would love to add it before 12.0 becomes stable.

Regards,
Daniel Migowski

-----Original Message-----
From: Daniel Migowski
Sent: Sunday, 28 July 2019 08:21
To: Andres Freund
Cc: pgsql-hackers@lists.postgresql.org
Subject: Re: Adding column "mem_usage" to view pg_prepared_statements

Hello Andres,

how do you want to generalize it? Are you thinking about a view solely for the display of the memory usage of different objects? Like functions or views (which also have a plan associated with them, when I think about it)? While that would be interesting, I still believe monitoring the memory usage of prepared statements is a bit more important than that of other objects, because of how they change the memory consumption of the server without using any DDL or configuration options, and I am not aware of other objects with the same properties, or are there some? And for the other volatile objects like tables and indexes and their contents, PostgreSQL already has its information functions.

Regardless of that, here is the patch for now. I didn't want to fiddle too much with MemoryContexts yet, so it still doesn't recurse into child contexts, but I will change that also when I try to build a more compact MemoryContext implementation and see how that works out.

Thanks for pointing out the relevant information in the statement column of the view.

Regards,
Daniel Migowski

-----Original Message-----
From: Andres Freund
Sent: Saturday, 27 July 2019 21:12
To: Daniel Migowski
Cc: pgsql-hackers@lists.postgresql.org
Subject: Re: Adding column "mem_usage" to view pg_prepared_statements

Hi,

On 2019-07-27 18:29:23 +0000, Daniel Migowski wrote:
> I just implemented a small change that adds another column "mem_usage"
> to the system view "pg_prepared_statements". It returns the memory
> usage total of CachedPlanSource.context, CachedPlanSource.query_context
> and, if available, CachedPlanSource.gplan.context.

FWIW, it's generally easier to comment if you actually provide the patch, even if it's just a POC, as that gives a better handle on how much additional complexity it introduces.

I think this could be a useful feature. I'm not so sure we want it tied to just cached statements however - perhaps we ought to generalize it a bit more.

Regarding the prepared-statement-specific considerations: I don't think we ought to explicitly reference CachedPlanSource.query_context and CachedPlanSource.gplan.context. In the case of actual prepared statements (rather than oneshot plans) CachedPlanSource.query_context IIRC should live under CachedPlanSource.context. I think there are no relevant cases where gplan.context isn't a child of CachedPlanSource.context either, but I'm not quite sure. Then we ought to just include child contexts in the memory computation (cf. the logic in MemoryContextStatsInternal(), although you obviously wouldn't need all of that). That way, if the cached statement has child contexts, we're going to stay accurate.

> Also I wonder why the "prepare test as" is part of the statement
> column. It isn't even part of the real statement that is prepared, as
> far as I would assume. I would prefer to just have the "select *..." in
> that column.

It's the statement that was executed. Note that you'll not see that in the case of protocol-level prepared statements. It will sometimes include relevant information, e.g. about the types specified as part of the prepare (as in PREPARE foo(int, float, ...) AS ...).

Greetings,

Andres Freund
Re: Adding column "mem_usage" to view pg_prepared_statements
Hello Andres,

how do you want to generalize it? Are you thinking about a view solely for the display of the memory usage of different objects? Like functions or views (which also have a plan associated with them, when I think about it)? While that would be interesting, I still believe monitoring the memory usage of prepared statements is a bit more important than that of other objects, because of how they change the memory consumption of the server without using any DDL or configuration options, and I am not aware of other objects with the same properties, or are there some? And for the other volatile objects like tables and indexes and their contents, PostgreSQL already has its information functions.

Regardless of that, here is the patch for now. I didn't want to fiddle too much with MemoryContexts yet, so it still doesn't recurse into child contexts, but I will change that also when I try to build a more compact MemoryContext implementation and see how that works out.

Thanks for pointing out the relevant information in the statement column of the view.

Regards,
Daniel Migowski

-----Original Message-----
From: Andres Freund
Sent: Saturday, 27 July 2019 21:12
To: Daniel Migowski
Cc: pgsql-hackers@lists.postgresql.org
Subject: Re: Adding column "mem_usage" to view pg_prepared_statements

Hi,

On 2019-07-27 18:29:23 +0000, Daniel Migowski wrote:
> I just implemented a small change that adds another column "mem_usage"
> to the system view "pg_prepared_statements". It returns the memory
> usage total of CachedPlanSource.context, CachedPlanSource.query_context
> and, if available, CachedPlanSource.gplan.context.

FWIW, it's generally easier to comment if you actually provide the patch, even if it's just a POC, as that gives a better handle on how much additional complexity it introduces.

I think this could be a useful feature. I'm not so sure we want it tied to just cached statements however - perhaps we ought to generalize it a bit more.

Regarding the prepared-statement-specific considerations: I don't think we ought to explicitly reference CachedPlanSource.query_context and CachedPlanSource.gplan.context. In the case of actual prepared statements (rather than oneshot plans) CachedPlanSource.query_context IIRC should live under CachedPlanSource.context. I think there are no relevant cases where gplan.context isn't a child of CachedPlanSource.context either, but I'm not quite sure. Then we ought to just include child contexts in the memory computation (cf. the logic in MemoryContextStatsInternal(), although you obviously wouldn't need all of that). That way, if the cached statement has child contexts, we're going to stay accurate.

> Also I wonder why the "prepare test as" is part of the statement
> column. It isn't even part of the real statement that is prepared, as
> far as I would assume. I would prefer to just have the "select *..." in
> that column.

It's the statement that was executed. Note that you'll not see that in the case of protocol-level prepared statements. It will sometimes include relevant information, e.g. about the types specified as part of the prepare (as in PREPARE foo(int, float, ...) AS ...).

Greetings,

Andres Freund

[Attachment: prepared_statements_mem_usage.diff]
Adding column "mem_usage" to view pg_prepared_statements
Hello,

I just implemented a small change that adds another column "mem_usage" to the system view "pg_prepared_statements". It returns the memory usage total of CachedPlanSource.context, CachedPlanSource.query_context and, if available, CachedPlanSource.gplan.context. Looks like this:

IKOffice_Daume=# prepare test as select * from vw_report_salesinvoice where salesinvoice_id = $1;
PREPARE
IKOffice_Daume=# select * from pg_prepared_statements;
 name | statement                                                                         | prepare_time                 | parameter_types | from_sql | mem_usage
------+-----------------------------------------------------------------------------------+------------------------------+-----------------+----------+-----------
 test | prepare test as select * from vw_report_salesinvoice where salesinvoice_id = $1; | 2019-07-27 20:21:12.63093+02 | {integer}       | t        |  33580232
(1 row)

I did this in preparation for reducing the memory usage of prepared statements and believe this gives client applications an option to investigate which prepared statements should be dropped. It also makes it possible to directly examine the results of further changes and their effectiveness at reducing the memory load of prepared statements.

Is a patch welcome or is this feature not of interest?

Also I wonder why the "prepare test as" is part of the statement column. It isn't even part of the real statement that is prepared, as far as I would assume. I would prefer to just have the "select *..." in that column.

Kind regards,
Daniel Migowski
AW: Question about MemoryContexts / possible memory leak in CachedPlanSource usage
Ah, you are right, I looked in fe_memutils.c. Makes sense now, thanks!!

-----Original Message-----
From: Andres Freund
Sent: Thursday, 25 July 2019 22:31
To: Daniel Migowski
Cc: pgsql-hack...@postgresql.org
Subject: Re: Question about MemoryContexts / possible memory leak in CachedPlanSource usage

Hi,

On 2019-07-25 20:21:06 +, Daniel Migowski wrote:
> When CachedPlanSource instances are created, the field query_string is
> filled with pstrdup(query_string) in CreateCachedPlan, plancache.c:182,
> which is just a wrapper for strdup. According to the docs the returned
> pointer should be freed with "free" sometime later.

Note pstrdup is *not* just a wrapper for strdup:

    return MemoryContextStrdup(CurrentMemoryContext, in);

i.e. it explicitly allocates memory in the current memory context. Perhaps you looked at the version of pstrdup() in src/common/fe_memutils.c? That's just for "frontend" code (we call code that doesn't run in the server frontend, for reasons). There we don't have the whole memory context infrastructure... It's only there so we can reuse code that uses pstrdup() between frontend and server.

> I believe in DropCachedPlan the free should take place but I don't see
> it. Is it just missing, or is memory allocated by strdup and friends
> automatically created in the current MemoryContext? If so, why do I
> need to use palloc() instead of malloc()?

We don't intercept malloc itself (doing so would have a *lot* of issues, starting from palloc internally using malloc, and ending with having lots of problems with libraries, because their malloc would suddenly behave differently).

Greetings,

Andres Freund
Question about MemoryContexts / possible memory leak in CachedPlanSource usage
Hello,

I am just starting to get my feet wet with PostgreSQL development and am starting to understand the source, so please be kind. I am working on the REL_11_4 tag.

When CachedPlanSource instances are created, the field query_string is filled with pstrdup(query_string) in CreateCachedPlan, plancache.c:182, which is just a wrapper for strdup. According to the docs the returned pointer should be freed with "free" sometime later. I believe in DropCachedPlan the free should take place but I don't see it. Is it just missing, or is memory allocated by strdup and friends automatically created in the current MemoryContext? If so, why do I need to use palloc() instead of malloc()?

Kind regards,
Daniel Migowski