Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
On Tue, Aug 11, 2015 at 2:58 AM, Jeff Janes wrote:
> On Thu, Oct 30, 2014 at 5:30 AM, Fujii Masao wrote:
>> On Thu, Oct 30, 2014 at 7:30 PM, Etsuro Fujita wrote:
>>> + {
>>> + {"pending_list_cleanup_size", PGC_USERSET, CLIENT_CONN_STATEMENT,
>>> + gettext_noop("Sets the maximum size of the pending list for GIN index."),
>>> + NULL,
>>> + GUC_UNIT_KB
>>> + },
>>> + &pending_list_cleanup_size,
>>> + 4096, 0, MAX_KILOBYTES,
>>> + NULL, NULL, NULL
>>> + },
>>>
>>> ISTM it'd be better to use RESOURCES_MEM, not CLIENT_CONN_STATEMENT. No?
>>
>> Yes if the pending list always exists in the memory. But not, IIUC.
>> Thought?
>>
>>> Also why not set min to 64, not to 0, in accordance with that of work_mem?
>>
>> I'm OK to use 64. But I just chose 0 because I could not think of any
>> reasonable reason why 64kB is suitable as the minimum size of the pending
>> list. IOW, I have no idea about whether it's reasonable to use the min
>> value of work_mem as the min size of the pending list.
>
> I know I am late to the party here, but I would like to have the minimum be
> 0, not 64 -- as long as zero means it doesn't insert anything into the
> pending list, rather than inserting and promptly cleaning it up.
>
> The reason for this is that if I am trying to decide what
> pending_list_cleanup_size I want to set for the index in the index's
> storage parameters, the way to find that out is to try a bunch of different
> ones through the GUC while the index is still at the default of no
> overriding storage parameter. It would be nice to try the fastupdate=off
> alternative (i.e. the same as pending_list_cleanup_size=0) without having
> to change the index itself and change the syntax used in the testing.

Sounds OK to me. So we should change the minimum values of both the
gin_pending_list_limit GUC and the storage parameter to 0? Or only the GUC?
Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
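Jeff's tuning workflow above can be sketched as a session. This is a hedged illustration, not anything posted in the thread: the table and index names are invented, and the parameter is spelled gin_pending_list_limit per the rename settled elsewhere in the thread.

```sql
-- Hypothetical names; the index starts with no overriding storage
-- parameter, so the session GUC controls the pending-list threshold.
CREATE INDEX docs_body_gin ON docs USING gin (to_tsvector('english', body));

-- Try candidate thresholds per session; the GUC is in kB (GUC_UNIT_KB).
SET gin_pending_list_limit = 1024;   -- 1MB
-- ... run the insert workload and measure ...
SET gin_pending_list_limit = 8192;   -- 8MB
-- ... measure again ...

-- With a minimum of 0 (the proposal under discussion), this would mimic
-- fastupdate = off without touching the index definition:
-- SET gin_pending_list_limit = 0;

-- Once a value is chosen, pin it on the index itself:
ALTER INDEX docs_body_gin SET (gin_pending_list_limit = 2048);
```

Once the reloption is set it overrides the GUC for that index, which is why the zero case is only convenient to test through the GUC alone.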
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
On Thu, Oct 30, 2014 at 5:30 AM, Fujii Masao masao.fu...@gmail.com wrote:
> On Thu, Oct 30, 2014 at 7:30 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote:
>> + {
>> + {"pending_list_cleanup_size", PGC_USERSET, CLIENT_CONN_STATEMENT,
>> + gettext_noop("Sets the maximum size of the pending list for GIN index."),
>> + NULL,
>> + GUC_UNIT_KB
>> + },
>> + &pending_list_cleanup_size,
>> + 4096, 0, MAX_KILOBYTES,
>> + NULL, NULL, NULL
>> + },
>>
>> ISTM it'd be better to use RESOURCES_MEM, not CLIENT_CONN_STATEMENT. No?
> Yes if the pending list always exists in the memory. But not, IIUC.
> Thought?
>
>> Also why not set min to 64, not to 0, in accordance with that of work_mem?
> I'm OK to use 64. But I just chose 0 because I could not think of any
> reasonable reason why 64kB is suitable as the minimum size of the pending
> list. IOW, I have no idea about whether it's reasonable to use the min
> value of work_mem as the min size of the pending list.

I know I am late to the party here, but I would like to have the minimum be
0, not 64 -- as long as zero means it doesn't insert anything into the
pending list, rather than inserting and promptly cleaning it up.

The reason for this is that if I am trying to decide what
pending_list_cleanup_size I want to set for the index in the index's storage
parameters, the way to find that out is to try a bunch of different ones
through the GUC while the index is still at the default of no overriding
storage parameter. It would be nice to try the fastupdate=off alternative
(i.e. the same as pending_list_cleanup_size=0) without having to change the
index itself and change the syntax used in the testing.

Cheers,

Jeff
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
On Wed, Nov 12, 2014 at 12:40 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> Fujii Masao masao.fu...@gmail.com writes:
>> On Tue, Nov 11, 2014 at 10:14 PM, Robert Haas robertmh...@gmail.com wrote:
>>> Not to kibitz too much after-the-fact, but wouldn't it be better to
>>> give this a name that has GIN in it somewhere?
>> Maybe. gin_pending_list_cleanup_size? gin_pending_list_limit? Better name?
> gin_pending_list_limit sounds good to me.

OK, barring any objection, I will rename the reloption and GUC to
gin_pending_list_limit.

Regards,

-- 
Fujii Masao
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
On Wed, Nov 12, 2014 at 9:30 PM, Fujii Masao masao.fu...@gmail.com wrote:
> On Wed, Nov 12, 2014 at 12:40 AM, Tom Lane t...@sss.pgh.pa.us wrote:
>> Fujii Masao masao.fu...@gmail.com writes:
>>> On Tue, Nov 11, 2014 at 10:14 PM, Robert Haas robertmh...@gmail.com wrote:
>>>> Not to kibitz too much after-the-fact, but wouldn't it be better to
>>>> give this a name that has GIN in it somewhere?
>>> Maybe. gin_pending_list_cleanup_size? gin_pending_list_limit? Better name?
>> gin_pending_list_limit sounds good to me.
> OK, barring any objection, I will rename the reloption and GUC to
> gin_pending_list_limit.

Done.

Regards,

-- 
Fujii Masao
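With the rename done, the GUC and the reloption share one spelling. A hedged sketch of the resulting knobs (the values and index names are invented, not taken from the patch):

```sql
-- Session- or cluster-wide setting (formerly pending_list_cleanup_size):
SET gin_pending_list_limit = '4MB';

-- Per-index override through the reloption of the same name (value in kB):
CREATE INDEX items_tags_gin ON items USING gin (tags)
  WITH (gin_pending_list_limit = 512);
```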
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
(2014/11/11 2:31), Fujii Masao wrote:
> On Mon, Nov 10, 2014 at 4:15 PM, Etsuro Fujita wrote:
>> The patch looks good to me except for the following point:
>>
>> *** a/src/backend/access/gin/ginfast.c
>> --- b/src/backend/access/gin/ginfast.c
>> ***************
>> *** 25,30 ****
>> --- 25,32 ----
>>   #include "utils/memutils.h"
>>   #include "utils/rel.h"
>> + /* GUC parameter */
>> + int pending_list_cleanup_size = 0;
>>
>> I think we need to initialize the GUC to boot_val, 4096 in this case.
> No, IIUC basically the variable for GUC doesn't need to be initialized to
> its default value. OTOH, it's also harmless to initialize it to the
> default. I like the current code a bit because we don't need to change the
> initial value again when we decide to change the default value of the GUC.
> I have no strong opinion about this, though.

OK, so if there are no objections from others, I'll mark this as Ready for
Committer.

Thanks,

Best regards,
Etsuro Fujita
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
On Tue, Nov 11, 2014 at 7:01 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote:
> (2014/11/11 2:31), Fujii Masao wrote:
>> On Mon, Nov 10, 2014 at 4:15 PM, Etsuro Fujita wrote:
>>> The patch looks good to me except for the following point:
>>>
>>> *** a/src/backend/access/gin/ginfast.c
>>> --- b/src/backend/access/gin/ginfast.c
>>> ***************
>>> *** 25,30 ****
>>> --- 25,32 ----
>>>   #include "utils/memutils.h"
>>>   #include "utils/rel.h"
>>> + /* GUC parameter */
>>> + int pending_list_cleanup_size = 0;
>>>
>>> I think we need to initialize the GUC to boot_val, 4096 in this case.
>> No, IIUC basically the variable for GUC doesn't need to be initialized to
>> its default value. OTOH, it's also harmless to initialize it to the
>> default. I like the current code a bit because we don't need to change
>> the initial value again when we decide to change the default value of
>> the GUC. I have no strong opinion about this, though.
> OK, so if there are no objections from others, I'll mark this as Ready for
> Committer.

I just pushed this. Thanks!

Regards,

-- 
Fujii Masao
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
On Tue, Nov 11, 2014 at 7:11 AM, Fujii Masao masao.fu...@gmail.com wrote:
>> OK, so if there are no objections from others, I'll mark this as Ready
>> for Committer.
> I just pushed this. Thanks!

Not to kibitz too much after-the-fact, but wouldn't it be better to give
this a name that has GIN in it somewhere?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
On Tue, Nov 11, 2014 at 10:14 PM, Robert Haas robertmh...@gmail.com wrote:
> On Tue, Nov 11, 2014 at 7:11 AM, Fujii Masao masao.fu...@gmail.com wrote:
>>> OK, so if there are no objections from others, I'll mark this as Ready
>>> for Committer.
>> I just pushed this. Thanks!
> Not to kibitz too much after-the-fact, but wouldn't it be better to give
> this a name that has GIN in it somewhere?

Maybe. gin_pending_list_cleanup_size? gin_pending_list_limit? Better name?

Regards,

-- 
Fujii Masao
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
Fujii Masao masao.fu...@gmail.com writes:
> On Tue, Nov 11, 2014 at 10:14 PM, Robert Haas robertmh...@gmail.com wrote:
>> Not to kibitz too much after-the-fact, but wouldn't it be better to give
>> this a name that has GIN in it somewhere?
> Maybe. gin_pending_list_cleanup_size? gin_pending_list_limit? Better name?

gin_pending_list_limit sounds good to me.

regards, tom lane
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
On Mon, Nov 10, 2014 at 4:15 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote:
> (2014/11/06 23:38), Fujii Masao wrote:
>> On Tue, Nov 4, 2014 at 12:04 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote:
>>> IIUC, I think that min = 0 disables fast update, so ISTM that it'd be
>>> appropriate to set min to some positive value. And ISTM that the idea of
>>> using the min value of work_mem is not so bad.
>> OK. I changed the min value to 64kB.
>>
>>> *** 356,361 CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ <replaceable class="parameter">name</>
>>> --- 356,372
>>>   </listitem>
>>>   </varlistentry>
>>>   </variablelist>
>>> + <variablelist>
>>> + <varlistentry>
>>> +  <term><literal>PENDING_LIST_CLEANUP_SIZE</></term>
>>>
>>> The above is still in uppercase.
>> Fixed. Attached is the updated version of the patch. Thanks for the review!
> Thanks for updating the patch! The patch looks good to me except for the
> following point:

Thanks for the review again!

> *** a/src/backend/access/gin/ginfast.c
> --- b/src/backend/access/gin/ginfast.c
> ***************
> *** 25,30 ****
> --- 25,32 ----
>   #include "utils/memutils.h"
>   #include "utils/rel.h"
> + /* GUC parameter */
> + int pending_list_cleanup_size = 0;
>
> I think we need to initialize the GUC to boot_val, 4096 in this case.

No, IIUC basically the variable for GUC doesn't need to be initialized to
its default value. OTOH, it's also harmless to initialize it to the default.
I like the current code a bit because we don't need to change the initial
value again when we decide to change the default value of the GUC. I have no
strong opinion about this, though.

Regards,

-- 
Fujii Masao
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
(2014/11/06 23:38), Fujii Masao wrote:
> On Tue, Nov 4, 2014 at 12:04 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote:
>> IIUC, I think that min = 0 disables fast update, so ISTM that it'd be
>> appropriate to set min to some positive value. And ISTM that the idea of
>> using the min value of work_mem is not so bad.
> OK. I changed the min value to 64kB.
>
>> *** 356,361 CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ <replaceable class="parameter">name</>
>> --- 356,372
>>   </listitem>
>>   </varlistentry>
>>   </variablelist>
>> + <variablelist>
>> + <varlistentry>
>> +  <term><literal>PENDING_LIST_CLEANUP_SIZE</></term>
>>
>> The above is still in uppercase.
> Fixed. Attached is the updated version of the patch. Thanks for the review!

Thanks for updating the patch! The patch looks good to me except for the
following point:

*** a/src/backend/access/gin/ginfast.c
--- b/src/backend/access/gin/ginfast.c
***************
*** 25,30 ****
--- 25,32 ----
  #include "utils/memutils.h"
  #include "utils/rel.h"
+ /* GUC parameter */
+ int pending_list_cleanup_size = 0;

I think we need to initialize the GUC to boot_val, 4096 in this case.

I asked the opinions of others about the config_group of the GUC. But there
seem to be no opinions, so I agree with Fujii-san's idea of assigning the
GUC CLIENT_CONN_STATEMENT.

Thanks,

Best regards,
Etsuro Fujita
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
On Tue, Nov 4, 2014 at 12:04 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: IIUC, I think that min = 0 disables fast update, so ISTM that it'd be appropriate to set min to some positive value. And ISTM that the idea of using the min value of work_mem is not so bad. OK. I changed the min value to 64kB. *** 356,361 CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ replaceable class=parametername/ --- 356,372 /listitem /varlistentry /variablelist +variablelist +varlistentry + termliteralPENDING_LIST_CLEANUP_SIZE//term The above is still in uppercse. Fixed. Attached is the updated version of the patch. Thanks for the review! Regards, -- Fujii Masao *** a/doc/src/sgml/config.sgml --- b/doc/src/sgml/config.sgml *** *** 5911,5916 SET XML OPTION { DOCUMENT | CONTENT }; --- 5911,5937 /listitem /varlistentry + varlistentry id=guc-pending-list-cleanup-size xreflabel=pending_list_cleanup_size + termvarnamepending_list_cleanup_size/varname (typeinteger/type) + indexterm +primaryvarnamepending_list_cleanup_size/ configuration parameter/primary + /indexterm + /term + listitem +para + Sets the maximum size of the GIN pending list which is used + when literalfastupdate/ is enabled. If the list grows + larger than this maximum size, it is cleaned up by moving + the entries in it to the main GIN data structure in bulk. + The default is four megabytes (literal4MB/). This setting + can be overridden for individual GIN indexes by changing + storage parameters. + See xref linkend=gin-fast-update and xref linkend=gin-tips + for more information. +/para + /listitem + /varlistentry + /variablelist /sect2 sect2 id=runtime-config-client-format *** a/doc/src/sgml/gin.sgml --- b/doc/src/sgml/gin.sgml *** *** 728,735 from the indexed item). As of productnamePostgreSQL/productname 8.4, acronymGIN/ is capable of postponing much of this work by inserting new tuples into a temporary, unsorted list of pending entries. 
!When the table is vacuumed, or if the pending list becomes too large !(larger than xref linkend=guc-work-mem), the entries are moved to the main acronymGIN/acronym data structure using the same bulk insert techniques used during initial index creation. This greatly improves acronymGIN/acronym index update speed, even counting the additional --- 728,735 from the indexed item). As of productnamePostgreSQL/productname 8.4, acronymGIN/ is capable of postponing much of this work by inserting new tuples into a temporary, unsorted list of pending entries. !When the table is vacuumed, or if the pending list becomes larger than !xref linkend=guc-pending-list-cleanup-size, the entries are moved to the main acronymGIN/acronym data structure using the same bulk insert techniques used during initial index creation. This greatly improves acronymGIN/acronym index update speed, even counting the additional *** *** 750,756 para If consistent response time is more important than update speed, use of pending entries can be disabled by turning off the !literalFASTUPDATE/literal storage parameter for a acronymGIN/acronym index. See xref linkend=sql-createindex for details. /para --- 750,756 para If consistent response time is more important than update speed, use of pending entries can be disabled by turning off the !literalfastupdate/literal storage parameter for a acronymGIN/acronym index. See xref linkend=sql-createindex for details. /para *** *** 812,829 /varlistentry varlistentry !termxref linkend=guc-work-mem/term listitem para During a series of insertions into an existing acronymGIN/acronym ! index that has literalFASTUPDATE/ enabled, the system will clean up the pending-entry list whenever the list grows larger than ! varnamework_mem/. To avoid fluctuations in observed response time, ! it's desirable to have pending-list cleanup occur in the background ! (i.e., via autovacuum). Foreground cleanup operations can be avoided by ! 
increasing varnamework_mem/ or making autovacuum more aggressive. ! However, enlarging varnamework_mem/ means that if a foreground ! cleanup does occur, it will take even longer. /para /listitem /varlistentry --- 812,837 /varlistentry varlistentry !termxref linkend=guc-pending-list-cleanup-size/term listitem para During a series of insertions into an existing acronymGIN/acronym ! index that has literalfastupdate/ enabled, the system will clean up the pending-entry list whenever the list grows larger than !
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
(2014/10/30 21:30), Fujii Masao wrote:
> On Thu, Oct 30, 2014 at 7:30 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote:
>> Here are my review comments.
>>
>> * The patch applies cleanly and make and make check run successfully. I
>> think that the patch is mostly good.
> Thanks! Attached is the updated version of the patch.

Thank you for updating the patch!

>> * In src/backend/utils/misc/guc.c:
>>
>> + {
>> + {"pending_list_cleanup_size", PGC_USERSET, CLIENT_CONN_STATEMENT,
>> + gettext_noop("Sets the maximum size of the pending list for GIN index."),
>> + NULL,
>> + GUC_UNIT_KB
>> + },
>> + &pending_list_cleanup_size,
>> + 4096, 0, MAX_KILOBYTES,
>> + NULL, NULL, NULL
>> + },
>>
>> ISTM it'd be better to use RESOURCES_MEM, not CLIENT_CONN_STATEMENT. No?
> Yes if the pending list always exists in the memory. But not, IIUC.
> Thought?

Exactly. But I think we can expect that in many cases, since I think that
the users would often set the GUC to a small value to the extent that most
of the pending list pages would be cached by shared buffers, to maintain
*search* performance. I'd like to hear the opinions of others about the
category for the GUC.

>> Also why not set min to 64, not to 0, in accordance with that of work_mem?
> I'm OK to use 64. But I just chose 0 because I could not think of any
> reasonable reason why 64kB is suitable as the minimum size of the pending
> list. IOW, I have no idea about whether it's reasonable to use the min
> value of work_mem as the min size of the pending list.

IIUC, I think that min = 0 disables fast update, so ISTM that it'd be
appropriate to set min to some positive value. And ISTM that the idea of
using the min value of work_mem is not so bad.

>> * In doc/src/sgml/ref/create_index.sgml:
>>
>> + <term><literal>PENDING_LIST_CLEANUP_SIZE</></term>
>>
>> IMHO, it seems to me better for this variable to be in lowercase in
>> accordance with the GUC version.
> Using lowercase only for pending_list_cleanup_size and uppercase for other
> options looks strange to me. What about using lowercase for all the
> storage options?
+1

> I changed the document in that way.
>
> *** 356,361 CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ <replaceable class="parameter">name</>
> --- 356,372
>   </listitem>
>   </varlistentry>
>   </variablelist>
> + <variablelist>
> + <varlistentry>
> +  <term><literal>PENDING_LIST_CLEANUP_SIZE</></term>

The above is still in uppercase.

Thanks,

Best regards,
Etsuro Fujita
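The two review points here, lowercase storage-parameter spelling and a 64kB floor matching work_mem's minimum, would look roughly like this in use. This is a sketch with invented names, using the pre-rename parameter spelling from this point in the thread:

```sql
-- Lowercase reloption spelling, consistent with the GUC:
CREATE INDEX t_gin ON t USING gin (v)
  WITH (fastupdate = on, pending_list_cleanup_size = 64);  -- kB; the proposed minimum

-- With min = 64 rather than 0, disabling the pending list entirely is done
-- via fastupdate, not via a zero-size list:
ALTER INDEX t_gin SET (fastupdate = off);
```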
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
(2014/10/09 11:49), Etsuro Fujita wrote:
> (2014/10/08 22:51), Fujii Masao wrote:
>> On Wed, Sep 24, 2014 at 11:10 AM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote:
>>> On Wed, Sep 10, 2014 at 10:37 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
>>>> Fujii Masao wrote:
>>>>> On Wed, Sep 10, 2014 at 12:15 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote:
>>>>>> PENDING_LIST_CLEANUP_SIZE and work_mem, for this setting. Wouldn't it
>>>>>> be easy-to-use to have only one parameter, PENDING_LIST_CLEANUP_SIZE?
>>>>>> How about setting PENDING_LIST_CLEANUP_SIZE to work_mem as the
>>>>>> default value when running the CREATE INDEX command?
>>>>> So what about introducing pending_list_cleanup_size also as a GUC?
>>>>> That is, users can set the threshold by using either the reloption or
>>>>> the GUC.
>>>> Yes, I think having both a GUC and a reloption makes sense -- the GUC
>>>> applies to all indexes, and can be tweaked for individual indexes using
>>>> the reloption.
>>> OK, I'd vote for your idea of having both the GUC and the reloption. So,
>>> I think the patch needs to be updated. Fujii-san, what plan do you have
>>> about the patch?
>> Please see the attached patch. In this patch, I introduced the GUC
>> parameter, pending_list_cleanup_size. I chose 4MB as the default value of
>> the parameter. But do you have any better idea about that default value?
> It seems reasonable to me that the GUC has the same default value as
> work_mem. So, +1 for the default value of 4MB.
>> BTW, I moved the CommitFest entry of this patch to next CF 2014-10.
> OK, I'll review the patch in the CF.

Thank you for updating the patch! Here are my review comments.

* The patch applies cleanly and make and make check run successfully. I
think that the patch is mostly good.
* In src/backend/utils/misc/guc.c:

+ {
+ {"pending_list_cleanup_size", PGC_USERSET, CLIENT_CONN_STATEMENT,
+ gettext_noop("Sets the maximum size of the pending list for GIN index."),
+ NULL,
+ GUC_UNIT_KB
+ },
+ &pending_list_cleanup_size,
+ 4096, 0, MAX_KILOBYTES,
+ NULL, NULL, NULL
+ },

ISTM it'd be better to use RESOURCES_MEM, not CLIENT_CONN_STATEMENT. No?
Also why not set min to 64, not to 0, in accordance with that of work_mem?

* In src/backend/utils/misc/postgresql.conf.sample:

Likewise, why not put this variable in the section of RESOURCE USAGE, not in
that of CLIENT CONNECTION DEFAULTS.

* In src/backend/access/common/reloptions.c:

+ {
+ {
+ "pending_list_cleanup_size",
+ "Maximum size of the pending list for this GIN index, in kilobytes.",
+ RELOPT_KIND_GIN
+ },
+ -1, 0, MAX_KILOBYTES
+ },

As in guc.c, why not set min to 64, not to 0?

* In src/include/access/gin.h:

/* GUC parameter */
extern PGDLLIMPORT int GinFuzzySearchLimit;
+ extern int pending_list_cleanup_size;

The comment should be "GUC parameters".

* In src/backend/access/gin/ginfast.c:

+ /* GUC parameter */
+ int pending_list_cleanup_size = 0;

Do we need to substitute 0 for pending_list_cleanup_size?

* In doc/src/sgml/config.sgml:

+ <varlistentry id="guc-pending-list-cleanup-size" xreflabel="pending_list_cleanup_size">
+ <term><varname>pending_list_cleanup_size</varname> (<type>integer</type>)

As in postgresql.conf.sample, ISTM it'd be better to explain this variable
in the section of "Resource Consumption" (maybe in "Memory"), not in that of
"Client Connection Defaults".

* In doc/src/sgml/gin.sgml:

! <literal>pending_list_cleanup_size</>. To avoid fluctuations in observed

ISTM it'd be better to use <varname> for pending_list_cleanup_size, not
<literal>, here.

* In doc/src/sgml/ref/create_index.sgml:

+ <term><literal>PENDING_LIST_CLEANUP_SIZE</></term>

IMHO, it seems to me better for this variable to be in lowercase in
accordance with the GUC version.
Also, I think that the words "GIN indexes accept a different parameter:" in
the section of "Index Storage Parameters" in this reference page would be
"GIN indexes accept different parameters:".

Sorry for the delay in reviewing the patch.

Best regards,
Etsuro Fujita
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
On Thu, Oct 30, 2014 at 7:30 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote:
> (2014/10/09 11:49), Etsuro Fujita wrote:
>> (2014/10/08 22:51), Fujii Masao wrote:
>>> On Wed, Sep 24, 2014 at 11:10 AM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote:
>>>> On Wed, Sep 10, 2014 at 10:37 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
>>>>> Fujii Masao wrote:
>>>>>> On Wed, Sep 10, 2014 at 12:15 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote:
>>>>>>> PENDING_LIST_CLEANUP_SIZE and work_mem, for this setting. Wouldn't
>>>>>>> it be easy-to-use to have only one parameter,
>>>>>>> PENDING_LIST_CLEANUP_SIZE? How about setting
>>>>>>> PENDING_LIST_CLEANUP_SIZE to work_mem as the default value when
>>>>>>> running the CREATE INDEX command?
>>>>>> So what about introducing pending_list_cleanup_size also as a GUC?
>>>>>> That is, users can set the threshold by using either the reloption
>>>>>> or the GUC.
>>>>> Yes, I think having both a GUC and a reloption makes sense -- the GUC
>>>>> applies to all indexes, and can be tweaked for individual indexes
>>>>> using the reloption.
>>>> OK, I'd vote for your idea of having both the GUC and the reloption.
>>>> So, I think the patch needs to be updated. Fujii-san, what plan do you
>>>> have about the patch?
>>> Please see the attached patch. In this patch, I introduced the GUC
>>> parameter, pending_list_cleanup_size. I chose 4MB as the default value
>>> of the parameter. But do you have any better idea about that default
>>> value?
>> It seems reasonable to me that the GUC has the same default value as
>> work_mem. So, +1 for the default value of 4MB.
>>> BTW, I moved the CommitFest entry of this patch to next CF 2014-10.
>> OK, I'll review the patch in the CF.
> Thank you for updating the patch! Here are my review comments.
>
> * The patch applies cleanly and make and make check run successfully. I
> think that the patch is mostly good.

Thanks! Attached is the updated version of the patch.
> * In src/backend/utils/misc/guc.c:
>
> + {
> + {"pending_list_cleanup_size", PGC_USERSET, CLIENT_CONN_STATEMENT,
> + gettext_noop("Sets the maximum size of the pending list for GIN index."),
> + NULL,
> + GUC_UNIT_KB
> + },
> + &pending_list_cleanup_size,
> + 4096, 0, MAX_KILOBYTES,
> + NULL, NULL, NULL
> + },
>
> ISTM it'd be better to use RESOURCES_MEM, not CLIENT_CONN_STATEMENT. No?

Yes if the pending list always exists in the memory. But not, IIUC. Thought?

> Also why not set min to 64, not to 0, in accordance with that of work_mem?

I'm OK to use 64. But I just chose 0 because I could not think of any
reasonable reason why 64kB is suitable as the minimum size of the pending
list. IOW, I have no idea about whether it's reasonable to use the min value
of work_mem as the min size of the pending list.

> * In src/backend/utils/misc/postgresql.conf.sample:
>
> Likewise, why not put this variable in the section of RESOURCE USAGE, not
> in that of CLIENT CONNECTION DEFAULTS.

Same as above.

> * In src/backend/access/common/reloptions.c:
>
> + {
> + {
> + "pending_list_cleanup_size",
> + "Maximum size of the pending list for this GIN index, in kilobytes.",
> + RELOPT_KIND_GIN
> + },
> + -1, 0, MAX_KILOBYTES
> + },
>
> As in guc.c, why not set min to 64, not to 0?

Same as above.

> * In src/include/access/gin.h:
>
> /* GUC parameter */
> extern PGDLLIMPORT int GinFuzzySearchLimit;
> + extern int pending_list_cleanup_size;
>
> The comment should be "GUC parameters".

Yes, fixed.

> * In src/backend/access/gin/ginfast.c:
>
> + /* GUC parameter */
> + int pending_list_cleanup_size = 0;
>
> Do we need to substitute 0 for pending_list_cleanup_size?

Same as above.

> * In doc/src/sgml/config.sgml:
>
> + <varlistentry id="guc-pending-list-cleanup-size" xreflabel="pending_list_cleanup_size">
> + <term><varname>pending_list_cleanup_size</varname> (<type>integer</type>)
>
> As in postgresql.conf.sample, ISTM it'd be better to explain this variable
> in the section of "Resource Consumption" (maybe in "Memory"), not in that
> of "Client Connection Defaults".

Same as above.

> * In doc/src/sgml/gin.sgml:
>
> ! <literal>pending_list_cleanup_size</>. To avoid fluctuations in observed
>
> ISTM it'd be better to use <varname> for pending_list_cleanup_size, not
> <literal>, here.

Yes, fixed.

> * In doc/src/sgml/ref/create_index.sgml:
>
> + <term><literal>PENDING_LIST_CLEANUP_SIZE</></term>
>
> IMHO, it seems to me better for this variable to be in lowercase in
> accordance with the GUC version.

Using lowercase only for pending_list_cleanup_size and uppercase for other
options looks strange to me. What about using lowercase for all the storage
options? I changed the document in that way.

> Also, I think that the words "GIN indexes accept a different parameter:"
> in the section of "Index Storage Parameters" in this reference page would
> be "GIN indexes accept different parameters:".

Yes,
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
On Wed, Sep 24, 2014 at 11:10 AM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote:
> (2014/09/13 2:42), Fujii Masao wrote:
>> On Wed, Sep 10, 2014 at 10:37 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
>>> Fujii Masao wrote:
>>>> On Wed, Sep 10, 2014 at 12:15 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote:
>>>>> PENDING_LIST_CLEANUP_SIZE and work_mem, for this setting. Wouldn't it
>>>>> be easy-to-use to have only one parameter, PENDING_LIST_CLEANUP_SIZE?
>>>>> How about setting PENDING_LIST_CLEANUP_SIZE to work_mem as the default
>>>>> value when running the CREATE INDEX command?
>>>> That's an idea. But there might be some users who want to change the
>>>> cleanup size per session like they can do by setting work_mem, and your
>>>> idea would prevent them from doing that... So what about introducing
>>>> pending_list_cleanup_size also as a GUC? That is, users can set the
>>>> threshold by using either the reloption or the GUC.
>>> Yes, I think having both a GUC and a reloption makes sense -- the GUC
>>> applies to all indexes, and can be tweaked for individual indexes using
>>> the reloption.
>> Agreed.
>>> I'm not sure about the idea of being able to change it per session,
>>> though. Do you mean that you would like insert processes to use a very
>>> large value so that they can just append new values to the pending
>>> list, and have vacuum use a small value so that it cleans up as soon as
>>> it runs? Two things: 1. we could have an autovacuum_ reloption which
>>> only changes what autovacuum does; 2. we could have autovacuum run
>>> index cleanup actions separately from actual vacuuming.
>> Yes, I was thinking something like that. But if autovacuum has already
>> been able to handle that, it's nice. Anyway, as you pointed out, it's
>> better to have both a GUC and a reloption for the cleanup size of the
>> pending list.
> OK, I'd vote for your idea of having both the GUC and the reloption. So, I
> think the patch needs to be updated. Fujii-san, what plan do you have
> about the patch?

Please see the attached patch. In this patch, I introduced the GUC
parameter, pending_list_cleanup_size.
I chose 4MB as the default value of the parameter. But do you have any better idea about that default value? BTW, I moved the CommitFest entry of this patch to next CF 2014-10. Regards, -- Fujii Masao *** a/doc/src/sgml/config.sgml --- b/doc/src/sgml/config.sgml *** *** 5907,5912 SET XML OPTION { DOCUMENT | CONTENT }; --- 5907,5933 /listitem /varlistentry + varlistentry id=guc-pending-list-cleanup-size xreflabel=pending_list_cleanup_size + termvarnamepending_list_cleanup_size/varname (typeinteger/type) + indexterm +primaryvarnamepending_list_cleanup_size/ configuration parameter/primary + /indexterm + /term + listitem +para + Sets the maximum size of the GIN pending list which is used + when literalFASTUPDATE/ is enabled. If the list grows + larger than this maximum size, it is cleaned up by moving + the entries in it to the main GIN data structure in bulk. + The default is four megabytes (literal4MB/). This setting + can be overridden for individual GIN indexes by changing + storage parameters. + See xref linkend=gin-fast-update and xref linkend=gin-tips + for more information. +/para + /listitem + /varlistentry + /variablelist /sect2 sect2 id=runtime-config-client-format *** a/doc/src/sgml/gin.sgml --- b/doc/src/sgml/gin.sgml *** *** 728,735 from the indexed item). As of productnamePostgreSQL/productname 8.4, acronymGIN/ is capable of postponing much of this work by inserting new tuples into a temporary, unsorted list of pending entries. !When the table is vacuumed, or if the pending list becomes too large !(larger than xref linkend=guc-work-mem), the entries are moved to the main acronymGIN/acronym data structure using the same bulk insert techniques used during initial index creation. This greatly improves acronymGIN/acronym index update speed, even counting the additional --- 728,735 from the indexed item). 
As of productnamePostgreSQL/productname 8.4, acronymGIN/ is capable of postponing much of this work by inserting new tuples into a temporary, unsorted list of pending entries. !When the table is vacuumed, or if the pending list becomes larger than !xref linkend=guc-pending-list-cleanup-size, the entries are moved to the main acronymGIN/acronym data structure using the same bulk insert techniques used during initial index creation. This greatly improves acronymGIN/acronym index update speed, even counting the additional *** *** 812,829 /varlistentry varlistentry !termxref linkend=guc-work-mem/term listitem para During a series of insertions into an existing acronymGIN/acronym index that has literalFASTUPDATE/ enabled, the
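For readers following along, the knobs proposed in the patch above would be used roughly like this. This is an illustrative sketch only: the table and index names are hypothetical, and the GUC discussed here was ultimately committed (in 9.5) under the name gin_pending_list_limit, as mentioned elsewhere in the thread.

```sql
-- Session-wide threshold via the proposed GUC (GUC_UNIT_KB, so a
-- unit suffix is accepted):
SET pending_list_cleanup_size = '8MB';

-- Per-index override via the storage parameter; the raw value is
-- in kilobytes:
CREATE INDEX docs_tsv_gin ON documents USING gin (tsv)
    WITH (pending_list_cleanup_size = 16384);
```

The reloption, when set, takes precedence over the GUC for that index, which is the "GUC applies to all indexes, tweaked per index by the reloption" scheme agreed on downthread.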
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
(2014/10/08 22:51), Fujii Masao wrote: On Wed, Sep 24, 2014 at 11:10 AM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: On Wed, Sep 10, 2014 at 10:37 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Fujii Masao wrote: On Wed, Sep 10, 2014 at 12:15 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: PENDING_LIST_CLEANUP_SIZE and work_mem, for this setting. Wouldn't it be easy-to-use to have only one parameter, PENDING_LIST_CLEANUP_SIZE? How about setting PENDING_LIST_CLEANUP_SIZE to work_mem as the default value when running the CREATE INDEX command? So what about introducing pending_list_cleanup_size also as a GUC? That is, users can set the threshold by using either the reloption or the GUC. Yes, I think having both a GUC and a reloption makes sense -- the GUC applies to all indexes, and can be tweaked for individual indexes using the reloption. OK, I'd vote for your idea of having both the GUC and the reloption. So, I think the patch needs to be updated. Fujii-san, what plan do you have about the patch? Please see the attached patch. In this patch, I introduced the GUC parameter, pending_list_cleanup_size. I chose 4MB as the default value of the parameter. But do you have any better idea about that default value? It seems reasonable to me that the GUC has the same default value as work_mem. So, +1 for the default value of 4MB. BTW, I moved the CommitFest entry of this patch to next CF 2014-10. OK, I'll review the patch in the CF. Thanks, Best regards, Etsuro Fujita -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
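When evaluating whether a 4MB default is appropriate for a given workload, one way to observe the pending list directly is the pgstattuple contrib module (not mentioned in the thread itself, so treat this as a side note; the index name is hypothetical):

```sql
-- pgstatginindex reports the current state of a GIN index's
-- pending list.
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- Returns version, pending_pages, and pending_tuples:
SELECT * FROM pgstatginindex('docs_tsv_gin');
```

Watching pending_pages grow between vacuums gives a concrete feel for how often cleanup fires at a given threshold.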
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
(2014/09/13 2:42), Fujii Masao wrote: On Wed, Sep 10, 2014 at 10:37 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Fujii Masao wrote: On Wed, Sep 10, 2014 at 12:15 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: PENDING_LIST_CLEANUP_SIZE and work_mem, for this setting. Wouldn't it be easy-to-use to have only one parameter, PENDING_LIST_CLEANUP_SIZE? How about setting PENDING_LIST_CLEANUP_SIZE to work_mem as the default value when running the CREATE INDEX command? That's an idea. But there might be some users who want to change the cleanup size per session like they can do by setting work_mem, and your idea would prevent them from doing that... So what about introduing pending_list_cleanup_size also as GUC? That is, users can set the threshold by using either the reloption or GUC. Yes, I think having both a GUC and a reloption makes sense -- the GUC applies to all indexes, and can be tweaked for individual indexes using the reloption. Agreed. I'm not sure about the idea of being able to change it per session, though. Do you mean that you would like insert processes use a very large value so that they can just append new values to the pending list, and have vacuum use a small value so that it cleans up as soon as it runs? Two things: 1. we could have an autovacuum_ reloption which only changes what autovacuum does; 2. we could have autovacuum run index cleanup actions separately from actual vacuuming. Yes, I was thinking something like that. But if autovacuum has already been able to handle that, it's nice. Anyway, as you pointed out, it's better to have both GUC and reloption for the cleanup size of pending list. OK, I'd vote for your idea of having both the GUC and the reloption. So, I think the patch needs to be updated. Fujii-san, what plan do you have about the patch? Sorry for the delay. 
Thanks, Best regards, Etsuro Fujita
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
On Wed, Sep 10, 2014 at 10:37 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Fujii Masao wrote: On Wed, Sep 10, 2014 at 12:15 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: PENDING_LIST_CLEANUP_SIZE and work_mem, for this setting. Wouldn't it be easy-to-use to have only one parameter, PENDING_LIST_CLEANUP_SIZE? How about setting PENDING_LIST_CLEANUP_SIZE to work_mem as the default value when running the CREATE INDEX command? That's an idea. But there might be some users who want to change the cleanup size per session like they can do by setting work_mem, and your idea would prevent them from doing that... So what about introduing pending_list_cleanup_size also as GUC? That is, users can set the threshold by using either the reloption or GUC. Yes, I think having both a GUC and a reloption makes sense -- the GUC applies to all indexes, and can be tweaked for individual indexes using the reloption. Agreed. I'm not sure about the idea of being able to change it per session, though. Do you mean that you would like insert processes use a very large value so that they can just append new values to the pending list, and have vacuum use a small value so that it cleans up as soon as it runs? Two things: 1. we could have an autovacuum_ reloption which only changes what autovacuum does; 2. we could have autovacuum run index cleanup actions separately from actual vacuuming. Yes, I was thinking something like that. But if autovacuum has already been able to handle that, it's nice. Anyway, as you pointed out, it's better to have both GUC and reloption for the cleanup size of pending list. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
(2014/09/10 12:31), Fujii Masao wrote: On Wed, Sep 10, 2014 at 12:15 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: (2014/09/09 22:17), Fujii Masao wrote: Attached is the updated version of the patch. I took a quick review on the patch. It looks good to me, but one thing I'm concerned about is You wrote: The attached patch introduces the GIN index storage parameter PENDING_LIST_CLEANUP_SIZE which specifies the maximum size of GIN pending list. If it's not set, work_mem is used as that maximum size, instead. So this patch doesn't break the existing application which currently uses work_mem as the threshold of cleanup operation of the pending list. It has only not to set PENDING_LIST_CLEANUP_SIZE. As you mentioned, I think it's important to consider for the existing applications, but I'm wondering if it would be a bit confusing users to have two parameters, Yep. PENDING_LIST_CLEANUP_SIZE and work_mem, for this setting. Wouldn't it be easy-to-use to have only one parameter, PENDING_LIST_CLEANUP_SIZE? How about setting PENDING_LIST_CLEANUP_SIZE to work_mem as the default value when running the CREATE INDEX command? That's an idea. But there might be some users who want to change the cleanup size per session like they can do by setting work_mem, and your idea would prevent them from doing that... Why not use ALTER INDEX ... SET (PENDING_LIST_CLEANUP_SIZE= ...)? Maybe I'm missing something, though. So what about introduing pending_list_cleanup_size also as GUC? That is, users can set the threshold by using either the reloption or GUC. Yeah, that's an idea. So, I'd like to hear the opinions of others. Thanks, Best regards, Etsuro Fujita -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
On Wed, Sep 10, 2014 at 5:35 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: (2014/09/10 12:31), Fujii Masao wrote: On Wed, Sep 10, 2014 at 12:15 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: (2014/09/09 22:17), Fujii Masao wrote: Attached is the updated version of the patch. I took a quick review on the patch. It looks good to me, but one thing I'm concerned about is You wrote: The attached patch introduces the GIN index storage parameter PENDING_LIST_CLEANUP_SIZE which specifies the maximum size of GIN pending list. If it's not set, work_mem is used as that maximum size, instead. So this patch doesn't break the existing application which currently uses work_mem as the threshold of cleanup operation of the pending list. It has only not to set PENDING_LIST_CLEANUP_SIZE. As you mentioned, I think it's important to consider for the existing applications, but I'm wondering if it would be a bit confusing users to have two parameters, Yep. PENDING_LIST_CLEANUP_SIZE and work_mem, for this setting. Wouldn't it be easy-to-use to have only one parameter, PENDING_LIST_CLEANUP_SIZE? How about setting PENDING_LIST_CLEANUP_SIZE to work_mem as the default value when running the CREATE INDEX command? That's an idea. But there might be some users who want to change the cleanup size per session like they can do by setting work_mem, and your idea would prevent them from doing that... Why not use ALTER INDEX ... SET (PENDING_LIST_CLEANUP_SIZE= ...)? Maybe I'm missing something, though. It takes an AccessExclusive lock and has an effect on every session (not only the session that issued it). Regards, -- Fujii Masao
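The distinction Fujii is making can be sketched as follows (index name hypothetical; the lock level reflects what ALTER INDEX ... SET took at the time of this thread):

```sql
-- Per-session: visible only to the current backend, takes no lock
-- on any index.
SET pending_list_cleanup_size = '32MB';

-- Per-index: rewrites the index's reloptions, takes a heavyweight
-- (AccessExclusive) lock on the index, and changes the threshold
-- for every session at once.
ALTER INDEX docs_tsv_gin SET (pending_list_cleanup_size = 32768);
```

So the reloption alone cannot serve the "experiment cheaply in my own session" use case; that is the argument for also having the GUC.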
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
Fujii Masao wrote: On Wed, Sep 10, 2014 at 12:15 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: PENDING_LIST_CLEANUP_SIZE and work_mem, for this setting. Wouldn't it be easy-to-use to have only one parameter, PENDING_LIST_CLEANUP_SIZE? How about setting PENDING_LIST_CLEANUP_SIZE to work_mem as the default value when running the CREATE INDEX command? That's an idea. But there might be some users who want to change the cleanup size per session like they can do by setting work_mem, and your idea would prevent them from doing that... So what about introduing pending_list_cleanup_size also as GUC? That is, users can set the threshold by using either the reloption or GUC. Yes, I think having both a GUC and a reloption makes sense -- the GUC applies to all indexes, and can be tweaked for individual indexes using the reloption. I'm not sure about the idea of being able to change it per session, though. Do you mean that you would like insert processes use a very large value so that they can just append new values to the pending list, and have vacuum use a small value so that it cleans up as soon as it runs? Two things: 1. we could have an autovacuum_ reloption which only changes what autovacuum does; 2. we could have autovacuum run index cleanup actions separately from actual vacuuming. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
On Tue, Sep 9, 2014 at 1:28 AM, Jeff Janes jeff.ja...@gmail.com wrote: On Sun, Aug 17, 2014 at 7:46 PM, Fujii Masao masao.fu...@gmail.com wrote: Thanks for reviewing the patch! ISTM that I failed to make the patch from my git repository... Attached is the rebased version. I get some compiler warnings on v2 of this patch: reloptions.c:219: warning: excess elements in struct initializer reloptions.c:219: warning: (near initialization for 'intRelOpts[15]') Thanks for testing the patch! Attached is the updated version of the patch. Previously the patch depended on another infrastructure patch (which allows a user to specify the unit in reloption (*1)). But that infrastructure patch has serious problem and it's not easy to fix the problem. So I changed the patch so that it doesn't depend on that infrastructure patch at all. Even without the infrastructure patch, the feature that this patch introduces is useful. Also I added the regression test into the patch. (*1) http://www.postgresql.org/message-id/CAHGQGwEanQ_e8WLHL25=bm_8z5zkyzw0k0yir+kdmv2hgne...@mail.gmail.com Regards, -- Fujii Masao *** a/doc/src/sgml/gin.sgml --- b/doc/src/sgml/gin.sgml *** *** 728,735 from the indexed item). As of productnamePostgreSQL/productname 8.4, acronymGIN/ is capable of postponing much of this work by inserting new tuples into a temporary, unsorted list of pending entries. !When the table is vacuumed, or if the pending list becomes too large !(larger than xref linkend=guc-work-mem), the entries are moved to the main acronymGIN/acronym data structure using the same bulk insert techniques used during initial index creation. This greatly improves acronymGIN/acronym index update speed, even counting the additional --- 728,736 from the indexed item). As of productnamePostgreSQL/productname 8.4, acronymGIN/ is capable of postponing much of this work by inserting new tuples into a temporary, unsorted list of pending entries. 
!When the table is vacuumed, or if the pending list becomes larger than !literalPENDING_LIST_CLEANUP_SIZE/literal (or !xref linkend=guc-work-mem if not set), the entries are moved to the main acronymGIN/acronym data structure using the same bulk insert techniques used during initial index creation. This greatly improves acronymGIN/acronym index update speed, even counting the additional *** *** 812,829 /varlistentry varlistentry !termxref linkend=guc-work-mem/term listitem para During a series of insertions into an existing acronymGIN/acronym index that has literalFASTUPDATE/ enabled, the system will clean up the pending-entry list whenever the list grows larger than ! varnamework_mem/. To avoid fluctuations in observed response time, ! it's desirable to have pending-list cleanup occur in the background ! (i.e., via autovacuum). Foreground cleanup operations can be avoided by ! increasing varnamework_mem/ or making autovacuum more aggressive. ! However, enlarging varnamework_mem/ means that if a foreground ! cleanup does occur, it will take even longer. /para /listitem /varlistentry --- 813,839 /varlistentry varlistentry !termliteralPENDING_LIST_CLEANUP_SIZE/ and !xref linkend=guc-work-mem/term listitem para During a series of insertions into an existing acronymGIN/acronym index that has literalFASTUPDATE/ enabled, the system will clean up the pending-entry list whenever the list grows larger than ! literalPENDING_LIST_CLEANUP_SIZE/ (if not set, varnamework_mem/ ! is used as that threshold, instead). To avoid fluctuations in observed ! response time, it's desirable to have pending-list cleanup occur in the ! background (i.e., via autovacuum). Foreground cleanup operations ! can be avoided by increasing literalPENDING_LIST_CLEANUP_SIZE/ ! (or varnamework_mem/) or making autovacuum more aggressive. ! However, enlarging the threshold of the cleanup operation means that ! if a foreground cleanup does occur, it will take even longer. ! /para ! para ! 
literalPENDING_LIST_CLEANUP_SIZE/ is an index storage parameter, ! and allows each GIN index to have its own cleanup threshold. ! For example, it's possible to increase the threshold only for the GIN ! index which can be updated heavily, and decrease it otherwise. /para /listitem /varlistentry *** a/doc/src/sgml/ref/create_index.sgml --- b/doc/src/sgml/ref/create_index.sgml *** *** 356,361 CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ replaceable class=parametername/ --- 356,377 /listitem /varlistentry /variablelist +variablelist +varlistentry + termliteralPENDING_LIST_CLEANUP_SIZE//term + listitem + para + This setting specifies the maximum size
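The per-index tuning described in the doc text above might look like this in practice (a sketch with hypothetical table and index names):

```sql
-- A heavily updated index gets a large threshold so foreground
-- cleanups are infrequent (value in kilobytes):
CREATE INDEX events_payload_gin ON events USING gin (payload)
    WITH (pending_list_cleanup_size = 65536);   -- 64MB

-- A mostly read-only index can skip the pending list entirely,
-- which is roughly what a threshold of 0 would mean:
CREATE INDEX archive_payload_gin ON archive USING gin (payload)
    WITH (fastupdate = off);
```

This is the scenario motivating a reloption in addition to a GUC: one global value cannot suit both indexes.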
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
(2014/09/09 22:17), Fujii Masao wrote: On Tue, Sep 9, 2014 at 1:28 AM, Jeff Janes jeff.ja...@gmail.com wrote: I get some compiler warnings on v2 of this patch: reloptions.c:219: warning: excess elements in struct initializer reloptions.c:219: warning: (near initialization for 'intRelOpts[15]') Attached is the updated version of the patch. Thank you for updating the patch! I took a quick review on the patch. It looks good to me, but one thing I'm concerned about is You wrote: The attached patch introduces the GIN index storage parameter PENDING_LIST_CLEANUP_SIZE which specifies the maximum size of GIN pending list. If it's not set, work_mem is used as that maximum size, instead. So this patch doesn't break the existing application which currently uses work_mem as the threshold of cleanup operation of the pending list. It has only not to set PENDING_LIST_CLEANUP_SIZE. As you mentioned, I think it's important to consider for the existing applications, but I'm wondering if it would be a bit confusing users to have two parameters, PENDING_LIST_CLEANUP_SIZE and work_mem, for this setting. Wouldn't it be easy-to-use to have only one parameter, PENDING_LIST_CLEANUP_SIZE? How about setting PENDING_LIST_CLEANUP_SIZE to work_mem as the default value when running the CREATE INDEX command? Sorry for the delay. Best regards, Etsuro Fujita -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
On Wed, Sep 10, 2014 at 12:15 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: (2014/09/09 22:17), Fujii Masao wrote: On Tue, Sep 9, 2014 at 1:28 AM, Jeff Janes jeff.ja...@gmail.com wrote: I get some compiler warnings on v2 of this patch: reloptions.c:219: warning: excess elements in struct initializer reloptions.c:219: warning: (near initialization for 'intRelOpts[15]') Attached is the updated version of the patch. Thank you for updating the patch! I took a quick review on the patch. It looks good to me, Thanks for reviewing the patch! but one thing I'm concerned about is You wrote: The attached patch introduces the GIN index storage parameter PENDING_LIST_CLEANUP_SIZE which specifies the maximum size of GIN pending list. If it's not set, work_mem is used as that maximum size, instead. So this patch doesn't break the existing application which currently uses work_mem as the threshold of cleanup operation of the pending list. It has only not to set PENDING_LIST_CLEANUP_SIZE. As you mentioned, I think it's important to consider for the existing applications, but I'm wondering if it would be a bit confusing users to have two parameters, Yep. PENDING_LIST_CLEANUP_SIZE and work_mem, for this setting. Wouldn't it be easy-to-use to have only one parameter, PENDING_LIST_CLEANUP_SIZE? How about setting PENDING_LIST_CLEANUP_SIZE to work_mem as the default value when running the CREATE INDEX command? That's an idea. But there might be some users who want to change the cleanup size per session like they can do by setting work_mem, and your idea would prevent them from doing that... So what about introduing pending_list_cleanup_size also as GUC? That is, users can set the threshold by using either the reloption or GUC. Sorry for the delay. No problem. Thanks! Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
On Sun, Aug 17, 2014 at 7:46 PM, Fujii Masao masao.fu...@gmail.com wrote: Thanks for reviewing the patch! ISTM that I failed to make the patch from my git repository... Attached is the rebased version. I get some compiler warnings on v2 of this patch: reloptions.c:219: warning: excess elements in struct initializer reloptions.c:219: warning: (near initialization for 'intRelOpts[15]') Cheers, Jeff
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
On Sat, Aug 16, 2014 at 4:23 PM, Sawada Masahiko sawada.m...@gmail.com wrote: On Fri, Aug 8, 2014 at 11:45 PM, Fujii Masao masao.fu...@gmail.com wrote: On Fri, Aug 8, 2014 at 11:43 PM, Fujii Masao masao.fu...@gmail.com wrote: On Mon, Apr 14, 2014 at 11:40 PM, Tom Lane t...@sss.pgh.pa.us wrote: Fujii Masao masao.fu...@gmail.com writes: On Tue, Apr 1, 2014 at 1:41 AM, Robert Haas robertmh...@gmail.com wrote: Should we try to install some hack around fastupdate for 9.4? I fear the divergence between reasonable values of work_mem and reasonable sizes for that list is only going to continue to get bigger. I'm sure there's somebody out there who has work_mem = 16GB, and stuff like 263865a48973767ce8ed7b7788059a38a24a9f37 is only going to increase the appeal of large values. Controlling the threshold of the size of pending list only by GUC doesn't seem reasonable. Users may want to increase the threshold only for the GIN index which can be updated heavily, and decrease it otherwise. So I think that it's better to add new storage parameter for GIN index to control the threshold, or both storage parameter and GUC. Yeah, -1 for a GUC. A GIN-index-specific storage parameter seems more appropriate. The attached patch introduces the GIN index storage parameter PENDING_LIST_CLEANUP_SIZE which specifies the maximum size of GIN pending list. If it's not set, work_mem is used as that maximum size, instead. So this patch doesn't break the existing application which currently uses work_mem as the threshold of cleanup operation of the pending list. It has only not to set PENDING_LIST_CLEANUP_SIZE. This is an index storage parameter, and allows us to specify each threshold per GIN index. So, for example, we can increase the threshold only for the GIN index which can be updated heavily, and decrease it otherwise. This patch uses another patch that I proposed (*1) as an infrastructure. Please apply that infrastructure patch first if you apply this patch. 
(*1) http://www.postgresql.org/message-id/CAHGQGwEanQ_e8WLHL25=bm_8z5zkyzw0k0yir+kdmv2hgne...@mail.gmail.com Regards, -- Fujii Masao Sorry, I forgot to attached the patch This time, attached. I think that this patch should be rebased. It contains garbage code. Thanks for reviewing the patch! ISTM that I failed to make the patch from my git repository... Attached is the rebased version. Regards, -- Fujii Masao diff --git a/doc/src/sgml/gin.sgml b/doc/src/sgml/gin.sgml index 80a578d..24c8dc1 100644 --- a/doc/src/sgml/gin.sgml +++ b/doc/src/sgml/gin.sgml @@ -728,8 +728,9 @@ from the indexed item). As of productnamePostgreSQL/productname 8.4, acronymGIN/ is capable of postponing much of this work by inserting new tuples into a temporary, unsorted list of pending entries. - When the table is vacuumed, or if the pending list becomes too large - (larger than xref linkend=guc-work-mem), the entries are moved to the + When the table is vacuumed, or if the pending list becomes larger than + literalPENDING_LIST_CLEANUP_SIZE/literal (or + xref linkend=guc-work-mem if not set), the entries are moved to the main acronymGIN/acronym data structure using the same bulk insert techniques used during initial index creation. This greatly improves acronymGIN/acronym index update speed, even counting the additional @@ -812,18 +813,27 @@ /varlistentry varlistentry - termxref linkend=guc-work-mem/term + termliteralPENDING_LIST_CLEANUP_SIZE/ and + xref linkend=guc-work-mem/term listitem para During a series of insertions into an existing acronymGIN/acronym index that has literalFASTUPDATE/ enabled, the system will clean up the pending-entry list whenever the list grows larger than - varnamework_mem/. To avoid fluctuations in observed response time, - it's desirable to have pending-list cleanup occur in the background - (i.e., via autovacuum). Foreground cleanup operations can be avoided by - increasing varnamework_mem/ or making autovacuum more aggressive. 
- However, enlarging varnamework_mem/ means that if a foreground - cleanup does occur, it will take even longer. + literalPENDING_LIST_CLEANUP_SIZE/ (if not set, varnamework_mem/ + is used as that threshold, instead). To avoid fluctuations in observed + response time, it's desirable to have pending-list cleanup occur in the + background (i.e., via autovacuum). Foreground cleanup operations + can be avoided by increasing literalPENDING_LIST_CLEANUP_SIZE/ + (or varnamework_mem/) or making autovacuum more aggressive. + However, enlarging the threshold of the cleanup operation means that + if a foreground cleanup does occur, it will take even longer. +/para +para + literalPENDING_LIST_CLEANUP_SIZE/ is an index storage parameter, + and allows each GIN index to have its own cleanup threshold. + For example, it's possible to increase the
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
On Fri, Aug 8, 2014 at 11:45 PM, Fujii Masao masao.fu...@gmail.com wrote: On Fri, Aug 8, 2014 at 11:43 PM, Fujii Masao masao.fu...@gmail.com wrote: On Mon, Apr 14, 2014 at 11:40 PM, Tom Lane t...@sss.pgh.pa.us wrote: Fujii Masao masao.fu...@gmail.com writes: On Tue, Apr 1, 2014 at 1:41 AM, Robert Haas robertmh...@gmail.com wrote: Should we try to install some hack around fastupdate for 9.4? I fear the divergence between reasonable values of work_mem and reasonable sizes for that list is only going to continue to get bigger. I'm sure there's somebody out there who has work_mem = 16GB, and stuff like 263865a48973767ce8ed7b7788059a38a24a9f37 is only going to increase the appeal of large values. Controlling the threshold of the size of pending list only by GUC doesn't seem reasonable. Users may want to increase the threshold only for the GIN index which can be updated heavily, and decrease it otherwise. So I think that it's better to add new storage parameter for GIN index to control the threshold, or both storage parameter and GUC. Yeah, -1 for a GUC. A GIN-index-specific storage parameter seems more appropriate. The attached patch introduces the GIN index storage parameter PENDING_LIST_CLEANUP_SIZE which specifies the maximum size of GIN pending list. If it's not set, work_mem is used as that maximum size, instead. So this patch doesn't break the existing application which currently uses work_mem as the threshold of cleanup operation of the pending list. It has only not to set PENDING_LIST_CLEANUP_SIZE. This is an index storage parameter, and allows us to specify each threshold per GIN index. So, for example, we can increase the threshold only for the GIN index which can be updated heavily, and decrease it otherwise. This patch uses another patch that I proposed (*1) as an infrastructure. Please apply that infrastructure patch first if you apply this patch. 
(*1) http://www.postgresql.org/message-id/CAHGQGwEanQ_e8WLHL25=bm_8z5zkyzw0k0yir+kdmv2hgne...@mail.gmail.com Regards, -- Fujii Masao Sorry, I forgot to attach the patch. This time, attached. I think that this patch should be rebased. It contains garbage code. Regards, --- Sawada Masahiko
PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
On Mon, Apr 14, 2014 at 11:40 PM, Tom Lane t...@sss.pgh.pa.us wrote: Fujii Masao masao.fu...@gmail.com writes: On Tue, Apr 1, 2014 at 1:41 AM, Robert Haas robertmh...@gmail.com wrote: Should we try to install some hack around fastupdate for 9.4? I fear the divergence between reasonable values of work_mem and reasonable sizes for that list is only going to continue to get bigger. I'm sure there's somebody out there who has work_mem = 16GB, and stuff like 263865a48973767ce8ed7b7788059a38a24a9f37 is only going to increase the appeal of large values. Controlling the threshold of the size of pending list only by GUC doesn't seem reasonable. Users may want to increase the threshold only for the GIN index which can be updated heavily, and decrease it otherwise. So I think that it's better to add new storage parameter for GIN index to control the threshold, or both storage parameter and GUC. Yeah, -1 for a GUC. A GIN-index-specific storage parameter seems more appropriate. The attached patch introduces the GIN index storage parameter PENDING_LIST_CLEANUP_SIZE which specifies the maximum size of GIN pending list. If it's not set, work_mem is used as that maximum size, instead. So this patch doesn't break the existing application which currently uses work_mem as the threshold of cleanup operation of the pending list. It has only not to set PENDING_LIST_CLEANUP_SIZE. This is an index storage parameter, and allows us to specify each threshold per GIN index. So, for example, we can increase the threshold only for the GIN index which can be updated heavily, and decrease it otherwise. This patch uses another patch that I proposed (*1) as an infrastructure. Please apply that infrastructure patch first if you apply this patch. 
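Under this first version of the patch (reloption only, falling back to work_mem), the backward-compatible behavior described above would play out like this (hypothetical names; a sketch of the patch's stated semantics, not of what was eventually committed):

```sql
-- Index without the storage parameter: keeps the old behavior,
-- the pending list is cleaned up when it exceeds work_mem.
CREATE INDEX t_doc_gin_a ON t USING gin (doc);

-- Index with the parameter: fixed 4MB threshold, independent of
-- work_mem (value in kilobytes).
CREATE INDEX t_doc_gin_b ON t USING gin (doc)
    WITH (pending_list_cleanup_size = 4096);

SET work_mem = '64MB';  -- affects the first index's threshold only
```

This is exactly the divergence that later drove the request for a separate GUC: using work_mem as the fallback couples an unrelated memory setting to cleanup frequency.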
(*1) http://www.postgresql.org/message-id/CAHGQGwEanQ_e8WLHL25=bm_8z5zkyzw0k0yir+kdmv2hgne...@mail.gmail.com Regards, -- Fujii Masao
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
On Fri, Aug 8, 2014 at 11:43 PM, Fujii Masao masao.fu...@gmail.com wrote: On Mon, Apr 14, 2014 at 11:40 PM, Tom Lane t...@sss.pgh.pa.us wrote: Fujii Masao masao.fu...@gmail.com writes: On Tue, Apr 1, 2014 at 1:41 AM, Robert Haas robertmh...@gmail.com wrote: Should we try to install some hack around fastupdate for 9.4? I fear the divergence between reasonable values of work_mem and reasonable sizes for that list is only going to continue to get bigger. I'm sure there's somebody out there who has work_mem = 16GB, and stuff like 263865a48973767ce8ed7b7788059a38a24a9f37 is only going to increase the appeal of large values. Controlling the threshold of the size of pending list only by GUC doesn't seem reasonable. Users may want to increase the threshold only for the GIN index which can be updated heavily, and decrease it otherwise. So I think that it's better to add new storage parameter for GIN index to control the threshold, or both storage parameter and GUC. Yeah, -1 for a GUC. A GIN-index-specific storage parameter seems more appropriate. The attached patch introduces the GIN index storage parameter PENDING_LIST_CLEANUP_SIZE which specifies the maximum size of GIN pending list. If it's not set, work_mem is used as that maximum size, instead. So this patch doesn't break the existing application which currently uses work_mem as the threshold of cleanup operation of the pending list. It has only not to set PENDING_LIST_CLEANUP_SIZE. This is an index storage parameter, and allows us to specify each threshold per GIN index. So, for example, we can increase the threshold only for the GIN index which can be updated heavily, and decrease it otherwise. This patch uses another patch that I proposed (*1) as an infrastructure. Please apply that infrastructure patch first if you apply this patch. 
(*1) http://www.postgresql.org/message-id/CAHGQGwEanQ_e8WLHL25=bm_8z5zkyzw0k0yir+kdmv2hgne...@mail.gmail.com Regards, -- Fujii Masao Sorry, I forgot to attached the patch This time, attached. Regards, -- Fujii Masao *** a/doc/src/sgml/gin.sgml --- b/doc/src/sgml/gin.sgml *** *** 728,735 from the indexed item). As of productnamePostgreSQL/productname 8.4, acronymGIN/ is capable of postponing much of this work by inserting new tuples into a temporary, unsorted list of pending entries. !When the table is vacuumed, or if the pending list becomes too large !(larger than xref linkend=guc-work-mem), the entries are moved to the main acronymGIN/acronym data structure using the same bulk insert techniques used during initial index creation. This greatly improves acronymGIN/acronym index update speed, even counting the additional --- 728,736 from the indexed item). As of productnamePostgreSQL/productname 8.4, acronymGIN/ is capable of postponing much of this work by inserting new tuples into a temporary, unsorted list of pending entries. !When the table is vacuumed, or if the pending list becomes larger than !literalPENDING_LIST_CLEANUP_SIZE/literal (or !xref linkend=guc-work-mem if not set), the entries are moved to the main acronymGIN/acronym data structure using the same bulk insert techniques used during initial index creation. This greatly improves acronymGIN/acronym index update speed, even counting the additional *** *** 812,829 /varlistentry varlistentry !termxref linkend=guc-work-mem/term listitem para During a series of insertions into an existing acronymGIN/acronym index that has literalFASTUPDATE/ enabled, the system will clean up the pending-entry list whenever the list grows larger than ! varnamework_mem/. To avoid fluctuations in observed response time, ! it's desirable to have pending-list cleanup occur in the background ! (i.e., via autovacuum). Foreground cleanup operations can be avoided by ! 
increasing varnamework_mem/ or making autovacuum more aggressive. ! However, enlarging varnamework_mem/ means that if a foreground ! cleanup does occur, it will take even longer. /para /listitem /varlistentry --- 813,839 /varlistentry varlistentry !termliteralPENDING_LIST_CLEANUP_SIZE/ and !xref linkend=guc-work-mem/term listitem para During a series of insertions into an existing acronymGIN/acronym index that has literalFASTUPDATE/ enabled, the system will clean up the pending-entry list whenever the list grows larger than ! literalPENDING_LIST_CLEANUP_SIZE/ (if not set, varnamework_mem/ ! is used as that threshold, instead). To avoid fluctuations in observed ! response time, it's desirable to have pending-list cleanup occur in the ! background (i.e., via autovacuum). Foreground cleanup operations ! can be avoided by increasing literalPENDING_LIST_CLEANUP_SIZE/ ! (or
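The fallback behavior the patch documents (use the index's PENDING_LIST_CLEANUP_SIZE storage parameter if set, otherwise work_mem) can be sketched in C. This is an illustrative model only, not the actual PostgreSQL code; the function names and the `THRESHOLD_UNSET` sentinel are hypothetical:

```c
#include <stdint.h>

/*
 * Illustrative sketch of the proposed threshold logic: if the GIN index's
 * PENDING_LIST_CLEANUP_SIZE storage parameter is set, use it; otherwise
 * fall back to work_mem. All names here are hypothetical stand-ins.
 */
#define THRESHOLD_UNSET (-1)

static int
pending_list_cleanup_threshold_kb(int storage_param_kb, int work_mem_kb)
{
    /* storage parameter wins when set; work_mem is only the fallback */
    return (storage_param_kb != THRESHOLD_UNSET) ? storage_param_kb
                                                 : work_mem_kb;
}

static int
pending_list_needs_cleanup(long pending_bytes,
                           int storage_param_kb, int work_mem_kb)
{
    long threshold_bytes =
        (long) pending_list_cleanup_threshold_kb(storage_param_kb,
                                                 work_mem_kb) * 1024L;

    /* cleanup is triggered once the pending list outgrows the threshold */
    return pending_bytes > threshold_bytes;
}
```

With the parameter unset, an existing application keeps its work_mem-driven behavior; setting it on one heavily updated index overrides the threshold only there, which is the per-index control the thread argues for.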
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
On Fri, Aug 8, 2014 at 11:43 PM, Fujii Masao masao.fu...@gmail.com wrote: The attached patch introduces... A patch perhaps? :) -- Michael
Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
On Tue, Apr 1, 2014 at 1:41 AM, Robert Haas robertmh...@gmail.com wrote: On Thu, Mar 20, 2014 at 1:12 PM, Jesper Krogh jes...@krogh.cc wrote: On 15/03/14 20:27, Heikki Linnakangas wrote: That said, I didn't expect the difference to be quite that big when you're appending to the end of the table. When the new entries go to the end of the posting lists, you only need to recompress and WAL-log the last posting list, which is max 256 bytes long. But I guess that's still a lot more WAL than in the old format. That could be optimized, but I figured we can live with it, thanks to the fastupdate feature. Fastupdate allows amortizing that cost over several insertions. But of course, you explicitly disabled that... In a concurrent update environment, fastupdate as it is in 9.2 is not really useful. It may be that you can bulk up insertion, but you have no control over who ends up paying the debt. Doubling the amount of wal from gin-indexing would be pretty tough for us, in 9.2 we generate roughly 1TB wal / day, keeping it for some weeks to be able to do PITR. The wal are mainly due to gin-index updates as new data is added and needs to be searchable by users. We do run gzip that cuts it down to 25-30% before keeping the for too long, but doubling this is going to be a migration challenge. If fast-update could be made to work in an environment where we both have users searching the index and manually updating it and 4+ backend processes updating the index concurrently then it would be a good benefit to gain. the gin index currently contains 70+ million records with and average tsvector of 124 terms. Should we try to install some hack around fastupdate for 9.4? I fear the divergence between reasonable values of work_mem and reasonable sizes for that list is only going to continue to get bigger. I'm sure there's somebody out there who has work_mem = 16GB, and stuff like 263865a48973767ce8ed7b7788059a38a24a9f37 is only going to increase the appeal of large values. 
Controlling the threshold of the size of pending list only by GUC doesn't seem reasonable. Users may want to increase the threshold only for the GIN index which can be updated heavily, and decrease it otherwise. So I think that it's better to add new storage parameter for GIN index to control the threshold, or both storage parameter and GUC. Regards, -- Fujii Masao
Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
Fujii Masao masao.fu...@gmail.com writes: On Tue, Apr 1, 2014 at 1:41 AM, Robert Haas robertmh...@gmail.com wrote: Should we try to install some hack around fastupdate for 9.4? I fear the divergence between reasonable values of work_mem and reasonable sizes for that list is only going to continue to get bigger. I'm sure there's somebody out there who has work_mem = 16GB, and stuff like 263865a48973767ce8ed7b7788059a38a24a9f37 is only going to increase the appeal of large values. Controlling the threshold of the size of pending list only by GUC doesn't seem reasonable. Users may want to increase the threshold only for the GIN index which can be updated heavily, and decrease it otherwise. So I think that it's better to add new storage parameter for GIN index to control the threshold, or both storage parameter and GUC. Yeah, -1 for a GUC. A GIN-index-specific storage parameter seems more appropriate. Or we could just hard-wire some maximum limit. Is it really likely that users would take the trouble to set such a parameter if it existed? regards, tom lane
Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
On Thu, Mar 20, 2014 at 1:12 PM, Jesper Krogh jes...@krogh.cc wrote: On 15/03/14 20:27, Heikki Linnakangas wrote: That said, I didn't expect the difference to be quite that big when you're appending to the end of the table. When the new entries go to the end of the posting lists, you only need to recompress and WAL-log the last posting list, which is max 256 bytes long. But I guess that's still a lot more WAL than in the old format. That could be optimized, but I figured we can live with it, thanks to the fastupdate feature. Fastupdate allows amortizing that cost over several insertions. But of course, you explicitly disabled that... In a concurrent update environment, fastupdate as it is in 9.2 is not really useful. It may be that you can bulk up insertion, but you have no control over who ends up paying the debt. Doubling the amount of wal from gin-indexing would be pretty tough for us, in 9.2 we generate roughly 1TB wal / day, keeping it for some weeks to be able to do PITR. The wal are mainly due to gin-index updates as new data is added and needs to be searchable by users. We do run gzip that cuts it down to 25-30% before keeping the for too long, but doubling this is going to be a migration challenge. If fast-update could be made to work in an environment where we both have users searching the index and manually updating it and 4+ backend processes updating the index concurrently then it would be a good benefit to gain. the gin index currently contains 70+ million records with and average tsvector of 124 terms. Should we try to install some hack around fastupdate for 9.4? I fear the divergence between reasonable values of work_mem and reasonable sizes for that list is only going to continue to get bigger. I'm sure there's somebody out there who has work_mem = 16GB, and stuff like 263865a48973767ce8ed7b7788059a38a24a9f37 is only going to increase the appeal of large values. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
I came up with the attached patch, to reduce the WAL volume of GIN insertions. It become fairly large, but I guess that's not too surprising as the old WAL-logging method was basically to dump the whole page to WAL record. This is now a lot more fine-grained and smarter. I separated constructing the WAL record from copying the changes back to the disk page, which IMHO is a readability improvement even though it's more code. There are two parts to this patch: * leafRepackItems has been rewritten. The previous coding basically searched for the first modified item, and decoded and re-encoded everything on the page that after that. Now it tries harder to avoid re-encoding segments that are still reasonably sized (between 128 and 384 bytes, with the target for new segments being 256 bytes). This ought to make random updates faster as a bonus, but I didn't performance test that. * Track more carefully which segments on the page have been modified. The in-memory structure used to manipulate a page now keeps an action code for each segment, indicating if the segment is completely new, deleted, or replaced with new content, or if just some new items have been added to it. These same actions are WAL-logged, and replayed in the redo routine. This brings the WAL volume back to the same ballpark as 9.3. Or better, depending on the operation. Fujii, Alexander, how does this look to you? - Heikki diff --git a/src/backend/access/gin/gindatapage.c b/src/backend/access/gin/gindatapage.c index 313d53c..56b456e 100644 --- a/src/backend/access/gin/gindatapage.c +++ b/src/backend/access/gin/gindatapage.c @@ -22,17 +22,17 @@ #include utils/rel.h /* - * Size of the posting lists stored on leaf pages, in bytes. The code can - * deal with any size, but random access is more efficient when a number of - * smaller lists are stored, rather than one big list. 
- */ -#define GinPostingListSegmentMaxSize 256 - -/* - * Existing posting lists smaller than this are recompressed, when inserting - * new items to page. + * Min, Max and Target size of posting lists stored on leaf pages, in bytes. + * + * The code can deal with any size, but random access is more efficient when + * a number of smaller lists are stored, rather than one big list. If a + * posting list would become larger than Max size as a result of insertions, + * it is split into two. If a posting list would be smaller than minimum + * size */ -#define GinPostingListSegmentMinSize 192 +#define GinPostingListSegmentMaxSize 384 +#define GinPostingListSegmentTargetSize 256 +#define GinPostingListSegmentMinSize 128 /* * At least this many items fit in a GinPostingListSegmentMaxSize-bytes @@ -55,12 +55,32 @@ typedef struct dlist_node *lastleft; /* last segment on left page */ int lsize; /* total size on left page */ int rsize; /* total size on right page */ + + bool oldformat; /* page is in pre-9.4 on disk */ } disassembledLeaf; typedef struct { dlist_node node; /* linked list pointers */ + /*- + * 'action' indicates the status of this in-memory segment, compared to + * what's on disk. It is one of the GIN_SEGMENT_* action codes: + * + * UNMODIFIED no changes + * DELETED the segment is to be removed. 'seg' and 'items' are + *ignored + * INSERT this is a completely new segment + * REPLACE this replaces an existing segment with new content + * ADDITEMS like REPLACE, but we track in detail what items have + *been added to this segment, in 'modifieditems' + *- + */ + char action; + + ItemPointerData *modifieditems; + int nmodifieditems; + /* * The following fields represent the items in this segment. * If 'items' is not NULL, it contains a palloc'd array of the items @@ -72,8 +92,6 @@ typedef struct GinPostingList *seg; ItemPointer items; int nitems; /* # of items in 'items', if items != NULL */ - - bool modified; /* is this segment on page already? 
*/ } leafSegmentInfo; static ItemPointer dataLeafPageGetUncompressed(Page page, int *nitems); @@ -87,9 +105,9 @@ static bool leafRepackItems(disassembledLeaf *leaf, ItemPointer remaining); static bool addItemsToLeaf(disassembledLeaf *leaf, ItemPointer newItems, int nNewItems); -static void dataPlaceToPageLeafRecompress(Buffer buf, - disassembledLeaf *leaf, - XLogRecData **prdata); +static XLogRecData *constructLeafRecompressWALData(Buffer buf, + disassembledLeaf *leaf); +static void dataPlaceToPageLeafRecompress(Buffer buf, disassembledLeaf *leaf); static void dataPlaceToPageLeafSplit(Buffer buf, disassembledLeaf *leaf, ItemPointerData lbound, ItemPointerData rbound, @@ -563,15 +581,21 @@ dataPlaceToPageLeaf(GinBtree btree, Buffer buf, GinBtreeStack *stack, if (!needsplit) { /* - * Great, all the items fit on a single page. Write the segments to - * the page, and WAL-log appropriately. + * Great, all the items fit on a single page. Construct a
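The min/target/max constants in the patch hunk above (128, 256, and 384 bytes) imply a simple re-encoding policy: leave reasonably sized segments alone, split oversized ones aiming at the target, and recompress undersized ones. The following is an illustrative sketch of that decision only, with hypothetical names; it is not the actual leafRepackItems code:

```c
/*
 * Illustrative sketch of the segment-size policy described in the patch:
 * segments between the min and max sizes are left as-is; a segment that
 * would grow past the max is split (re-encoded into roughly target-sized
 * pieces); a segment below the min is recompressed together with its
 * neighbors. Constants mirror the patch; everything else is hypothetical.
 */
#define SEG_MIN_SIZE    128
#define SEG_TARGET_SIZE 256
#define SEG_MAX_SIZE    384

typedef enum
{
    SEG_KEEP,       /* segment untouched, nothing to WAL-log */
    SEG_REENCODE,   /* too small: merge/recompress with neighbors */
    SEG_SPLIT       /* too large: split into ~SEG_TARGET_SIZE pieces */
} seg_decision;

static seg_decision
decide_segment_action(int oldsize, int newsize)
{
    if (newsize > SEG_MAX_SIZE)
        return SEG_SPLIT;
    if (oldsize < SEG_MIN_SIZE)
        return SEG_REENCODE;
    return SEG_KEEP;
}
```

The payoff of the KEEP case is exactly the WAL saving the patch is after: untouched segments no longer need to be re-encoded or logged.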
Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
On 15/03/14 20:27, Heikki Linnakangas wrote: That said, I didn't expect the difference to be quite that big when you're appending to the end of the table. When the new entries go to the end of the posting lists, you only need to recompress and WAL-log the last posting list, which is max 256 bytes long. But I guess that's still a lot more WAL than in the old format. That could be optimized, but I figured we can live with it, thanks to the fastupdate feature. Fastupdate allows amortizing that cost over several insertions. But of course, you explicitly disabled that... In a concurrent update environment, fastupdate as it is in 9.2 is not really useful. It may be that you can bulk up insertion, but you have no control over who ends up paying the debt. Doubling the amount of WAL from gin-indexing would be pretty tough for us; in 9.2 we generate roughly 1TB of WAL / day, keeping it for some weeks to be able to do PITR. The WAL is mainly due to gin-index updates as new data is added and needs to be searchable by users. We do run gzip, which cuts it down to 25-30% before keeping it for the retention period, but doubling this is going to be a migration challenge. If fast-update could be made to work in an environment where we both have users searching the index and manually updating it and 4+ backend processes updating the index concurrently, then it would be a good benefit to gain. The gin index currently contains 70+ million records with an average tsvector of 124 terms. -- Jesper .. trying to add some real-world info. - Heikki
Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
On Mon, Mar 17, 2014 at 10:44 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Fujii Masao wrote: On Sun, Mar 16, 2014 at 7:15 AM, Alexander Korotkov aekorot...@gmail.com wrote: That could be optimized, but I figured we can live with it, thanks to the fastupdate feature. Fastupdate allows amortizing that cost over several insertions. But of course, you explicitly disabled that... Let me know if you want me to write a patch addressing this issue. Yeah, I really want you to address this problem! That's definitely useful for every user who disables the FASTUPDATE option for some reason. Users that disable FASTUPDATE, in my experience, do so because their stock work_mem is way too high and GIN searches become too slow due to having to scan too large a list. Yes. Another reason that I've heard from users so far is that the size of a GIN index with FASTUPDATE=off is likely to be smaller than that with FASTUPDATE=on. I think it might make sense to invest a modest amount of time in getting FASTUPDATE to be sized completely differently from today -- perhaps base it on a hardcoded factor of BLCKSZ, rather than work_mem. Or, if we really need to make it configurable, then let it have its own parameter. I prefer to have the parameter. When users create multiple GIN indexes for various uses, they might want to use a different pending-list threshold for each index. So a GIN index storage parameter might be better than a GUC. Regards, -- Fujii Masao
Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
On Sun, Mar 16, 2014 at 7:15 AM, Alexander Korotkov aekorot...@gmail.com wrote: On Sat, Mar 15, 2014 at 11:27 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 03/15/2014 08:40 PM, Fujii Masao wrote: Hi, I executed the following statements in HEAD and 9.3, and compared the size of WAL which were generated by data insertion in GIN index. - CREATE EXTENSION pg_trgm; CREATE TABLE hoge (col1 text); CREATE INDEX hogeidx ON hoge USING gin (col1 gin_trgm_ops) WITH (FASTUPDATE = off); CHECKPOINT; SELECT pg_switch_xlog(); SELECT pg_switch_xlog(); SELECT pg_current_xlog_location(); INSERT INTO hoge SELECT 'POSTGRESQL' FROM generate_series(1, 100); SELECT pg_current_xlog_location(); - The results of WAL size are 960 MB (9.3) 2113 MB (HEAD) The WAL size in HEAD was more than two times bigger than that in 9.3. Recently the source code of GIN index has been changed dramatically. Is the increase in GIN-related WAL intentional or a bug? It was somewhat expected. Updating individual items on the new-format GIN pages requires decompressing and recompressing the page, and the recompressed posting lists need to be WAL-logged. Which generates much larger WAL records. That said, I didn't expect the difference to be quite that big when you're appending to the end of the table. When the new entries go to the end of the posting lists, you only need to recompress and WAL-log the last posting list, which is max 256 bytes long. But I guess that's still a lot more WAL than in the old format. I ran pg_xlogdump | grep Gin and checked the size of GIN-related WAL, and then found its max seems more than 256B. Am I missing something? What I observed is [In HEAD] At first, the size of GIN-related WAL is gradually increasing up to about 1400B. 
rmgr: Gin len (rec/tot): 48/80, tx: 1813, lsn: 0/020020D8, prev 0/0270, bkp: , desc: Insert item, node: 1663/12945/16441 blkno: 1 isdata: F isleaf: T isdelete: F rmgr: Gin len (rec/tot): 56/88, tx: 1813, lsn: 0/02002440, prev 0/020023F8, bkp: , desc: Insert item, node: 1663/12945/16441 blkno: 1 isdata: F isleaf: T isdelete: T rmgr: Gin len (rec/tot): 64/96, tx: 1813, lsn: 0/020044D8, prev 0/02004490, bkp: , desc: Insert item, node: 1663/12945/16441 blkno: 1 isdata: F isleaf: T isdelete: T ... rmgr: Gin len (rec/tot): 1376/ 1408, tx: 1813, lsn: 0/02A7EE90, prev 0/02A7E910, bkp: , desc: Insert item, node: 1663/12945/16441 blkno: 2 isdata: F isleaf: T isdelete: T rmgr: Gin len (rec/tot): 1392/ 1424, tx: 1813, lsn: 0/02A7F458, prev 0/02A7F410, bkp: , desc: Create posting tree, node: 1663/12945/16441 blkno: 4 Then the size decreases to about 100B and is gradually increasing again up to 320B. rmgr: Gin len (rec/tot):116/ 148, tx: 1813, lsn: 0/02A7F9E8, prev 0/02A7F458, bkp: , desc: Insert item, node: 1663/12945/16441 blkno: 4 isdata: T isleaf: T unmodified: 1280 length: 1372 (compressed) rmgr: Gin len (rec/tot): 40/72, tx: 1813, lsn: 0/02A7FA80, prev 0/02A7F9E8, bkp: , desc: Insert item, node: 1663/12945/16441 blkno: 3 isdata: F isleaf: T isdelete: T ... rmgr: Gin len (rec/tot):118/ 150, tx: 1813, lsn: 0/02A83BA0, prev 0/02A83B58, bkp: , desc: Insert item, node: 1663/12945/16441 blkno: 4 isdata: T isleaf: T unmodified: 1280 length: 1374 (compressed) ... rmgr: Gin len (rec/tot):288/ 320, tx: 1813, lsn: 0/02AEDE28, prev 0/02AEDCE8, bkp: , desc: Insert item, node: 1663/12945/16441 blkno: 14 isdata: T isleaf: T unmodified: 1280 length: 1544 (compressed) Then the size decreases to 66B and is gradually increasing again up to 320B. This increase and decrease of WAL size seems to continue. [In 9.3] At first, the size of GIN-related WAL is gradually increasing up to about 2700B. 
rmgr: Gin len (rec/tot): 52/84, tx: 1812, lsn: 0/02000430, prev 0/020003D8, bkp: , desc: Insert item, node: 1663/12896/16441 blkno: 1 offset: 11 nitem: 1 isdata: F isleaf T isdelete F updateBlkno:4294967295 rmgr: Gin len (rec/tot): 60/92, tx: 1812, lsn: 0/020004D0, prev 0/02000488, bkp: , desc: Insert item, node: 1663/12896/16441 blkno: 1 offset: 1 nitem: 1 isdata: F isleaf T isdelete T updateBlkno:4294967295 ... rmgr: Gin len (rec/tot): 2740/ 2772, tx: 1812, lsn: 0/026D1670, prev 0/026D0B98, bkp: , desc: Insert item, node: 1663/12896/16441 blkno: 5 offset: 2 nitem: 1 isdata: F isleaf T isdelete T updateBlkno:4294967295 rmgr: Gin len (rec/tot): 2714/ 2746, tx: 1812, lsn: 0/026D21A8, prev 0/026D2160, bkp: , desc: Create posting tree, node: 1663/12896/16441 blkno: 6 The size decreases to 66B and then is never changed. rmgr: Gin len (rec/tot):
Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
Fujii Masao wrote: On Sun, Mar 16, 2014 at 7:15 AM, Alexander Korotkov aekorot...@gmail.com wrote: That could be optimized, but I figured we can live with it, thanks to the fastupdate feature. Fastupdate allows amortizing that cost over several insertions. But of course, you explicitly disabled that... Let me know if you want me to write a patch addressing this issue. Yeah, I really want you to address this problem! That's definitely useful for every user who disables the FASTUPDATE option for some reason. Users that disable FASTUPDATE, in my experience, do so because their stock work_mem is way too high and GIN searches become too slow due to having to scan too large a list. I think it might make sense to invest a modest amount of time in getting FASTUPDATE to be sized completely differently from today -- perhaps base it on a hardcoded factor of BLCKSZ, rather than work_mem. Or, if we really need to make it configurable, then let it have its own parameter. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
On 03/17/2014 03:20 PM, Fujii Masao wrote: On Sun, Mar 16, 2014 at 7:15 AM, Alexander Korotkov aekorot...@gmail.com wrote: On Sat, Mar 15, 2014 at 11:27 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: I ran pg_xlogdump | grep Gin and checked the size of GIN-related WAL, and then found its max seems more than 256B. Am I missing something? What I observed is [In HEAD] At first, the size of GIN-related WAL is gradually increasing up to about 1400B. rmgr: Gin len (rec/tot): 48/80, tx: 1813, lsn: 0/020020D8, prev 0/0270, bkp: , desc: Insert item, node: 1663/12945/16441 blkno: 1 isdata: F isleaf: T isdelete: F rmgr: Gin len (rec/tot): 56/88, tx: 1813, lsn: 0/02002440, prev 0/020023F8, bkp: , desc: Insert item, node: 1663/12945/16441 blkno: 1 isdata: F isleaf: T isdelete: T rmgr: Gin len (rec/tot): 64/96, tx: 1813, lsn: 0/020044D8, prev 0/02004490, bkp: , desc: Insert item, node: 1663/12945/16441 blkno: 1 isdata: F isleaf: T isdelete: T ... rmgr: Gin len (rec/tot): 1376/ 1408, tx: 1813, lsn: 0/02A7EE90, prev 0/02A7E910, bkp: , desc: Insert item, node: 1663/12945/16441 blkno: 2 isdata: F isleaf: T isdelete: T rmgr: Gin len (rec/tot): 1392/ 1424, tx: 1813, lsn: 0/02A7F458, prev 0/02A7F410, bkp: , desc: Create posting tree, node: 1663/12945/16441 blkno: 4 This corresponds to the stage where the items are stored in-line in the entry-tree. After it reaches a certain size, a posting tree is created. Then the size decreases to about 100B and is gradually increasing again up to 320B. rmgr: Gin len (rec/tot):116/ 148, tx: 1813, lsn: 0/02A7F9E8, prev 0/02A7F458, bkp: , desc: Insert item, node: 1663/12945/16441 blkno: 4 isdata: T isleaf: T unmodified: 1280 length: 1372 (compressed) rmgr: Gin len (rec/tot): 40/72, tx: 1813, lsn: 0/02A7FA80, prev 0/02A7F9E8, bkp: , desc: Insert item, node: 1663/12945/16441 blkno: 3 isdata: F isleaf: T isdelete: T ... 
rmgr: Gin len (rec/tot):118/ 150, tx: 1813, lsn: 0/02A83BA0, prev 0/02A83B58, bkp: , desc: Insert item, node: 1663/12945/16441 blkno: 4 isdata: T isleaf: T unmodified: 1280 length: 1374 (compressed) ... rmgr: Gin len (rec/tot):288/ 320, tx: 1813, lsn: 0/02AEDE28, prev 0/02AEDCE8, bkp: , desc: Insert item, node: 1663/12945/16441 blkno: 14 isdata: T isleaf: T unmodified: 1280 length: 1544 (compressed) Then the size decreases to 66B and is gradually increasing again up to 320B. This increase and decrease of WAL size seems to continue. Here the new items are appended to posting tree pages. This is where the maximum of 256 bytes I mentioned applies. 256 bytes is the max size of one compressed posting list, the WAL record containing it includes some other stuff too, which adds up to that 320 bytes. [In 9.3] At first, the size of GIN-related WAL is gradually increasing up to about 2700B. rmgr: Gin len (rec/tot): 52/84, tx: 1812, lsn: 0/02000430, prev 0/020003D8, bkp: , desc: Insert item, node: 1663/12896/16441 blkno: 1 offset: 11 nitem: 1 isdata: F isleaf T isdelete F updateBlkno:4294967295 rmgr: Gin len (rec/tot): 60/92, tx: 1812, lsn: 0/020004D0, prev 0/02000488, bkp: , desc: Insert item, node: 1663/12896/16441 blkno: 1 offset: 1 nitem: 1 isdata: F isleaf T isdelete T updateBlkno:4294967295 ... rmgr: Gin len (rec/tot): 2740/ 2772, tx: 1812, lsn: 0/026D1670, prev 0/026D0B98, bkp: , desc: Insert item, node: 1663/12896/16441 blkno: 5 offset: 2 nitem: 1 isdata: F isleaf T isdelete T updateBlkno:4294967295 rmgr: Gin len (rec/tot): 2714/ 2746, tx: 1812, lsn: 0/026D21A8, prev 0/026D2160, bkp: , desc: Create posting tree, node: 1663/12896/16441 blkno: 6 The size decreases to 66B and then is never changed. Same mechanism on 9.3, but the insertions to the posting tree pages are constant size. That could be optimized, but I figured we can live with it, thanks to the fastupdate feature. Fastupdate allows amortizing that cost over several insertions. 
But of course, you explicitly disabled that... Let me know if you want me to write patch addressing this issue. Yeah, I really want you to address this problem! That's definitely useful for every users disabling FASTUPDATE option for some reasons. Ok, let's think about it a little bit. I think there are three fairly simple ways to address this: 1. The GIN data leaf recompress record contains an offset called unmodifiedlength, and the data that comes after that offset. Currently, the record is written so that unmodifiedlength points to the end of the last compressed posting list stored on the page that was not modified, followed by all the modified ones. The straightforward way to cut down the WAL record
Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
Heikki Linnakangas hlinnakan...@vmware.com writes: 2. Instead of storing the new compressed posting list in the WAL record, store only the new item pointers added to the page. WAL replay would then have to duplicate the work done in the main insertion code path: find the right posting lists to insert to, decode them, add the new items, and re-encode. That sounds fairly dangerous ... is any user-defined code involved in those decisions? This record format would be higher-level, in the sense that we would not store the physical copy of the compressed posting list as it was formed originally. The same work would be done at WAL replay. As the code stands, it will produce exactly the same result, but that's not guaranteed if we make bugfixes to the code later, and a master and standby are running different minor versions. There's not necessarily anything wrong with that, but it's something to keep in mind. Version skew would be a hazard too, all right. I think it's important that WAL replay be a pretty mechanical, predictable process. regards, tom lane
Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
On 03/17/2014 04:33 PM, Tom Lane wrote: Heikki Linnakangas hlinnakan...@vmware.com writes: 2. Instead of storing the new compressed posting list in the WAL record, store only the new item pointers added to the page. WAL replay would then have to duplicate the work done in the main insertion code path: find the right posting lists to insert to, decode them, add the new items, and re-encode. That sounds fairly dangerous ... is any user-defined code involved in those decisions? No. This record format would be higher-level, in the sense that we would not store the physical copy of the compressed posting list as it was formed originally. The same work would be done at WAL replay. As the code stands, it will produce exactly the same result, but that's not guaranteed if we make bugfixes to the code later, and a master and standby are running different minor version. There's not necessarily anything wrong with that, but it's something to keep in mind. Version skew would be a hazard too, all right. I think it's important that WAL replay be a pretty mechanical, predictable process. Yeah. One particular point to note is that if in one place we do the more high level thing and have WAL replay re-encode the page as it sees fit, then we can *not* rely on the page being byte-by-byte identical in other places. Like, in vacuum, where items are deleted. Heap and B-tree WAL records also rely on PageAddItem etc. to reconstruct the page, instead of making a physical copy of the modified parts. And _bt_restore_page even inserts the items physically in different order than the normal codepath does. So for good or bad, there is some precedence for this. The imminent danger I see is if we change the logic on how the items are divided into posting lists, and end up in a situation where a master server adds an item to a page, and it just fits, but with the compression logic the standby version has, it cannot make it fit. 
As an escape hatch for that, we could have the WAL replay code try the compression again, with a larger max. posting list size, if it doesn't fit at first. And/or always leave something like 10 bytes of free space on every data page to make up for small differences in the logic. - Heikki
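Heikki's second suggestion above (always reserving a few bytes of slack on every data page, so a replay-side re-encoding that comes out slightly larger still fits) can be modeled with a trivial check. The constants and names below are hypothetical illustrations, not PostgreSQL's actual page-management code:

```c
/*
 * Illustrative sketch of reserving fixed slack on GIN data pages so that
 * small differences between master and standby compression logic cannot
 * cause a replay-time overflow. PAGE_SIZE matches the default BLCKSZ;
 * REPLAY_SLACK is the ~10 bytes of headroom suggested in the thread.
 */
#define PAGE_SIZE    8192
#define REPLAY_SLACK 10     /* bytes always kept free for replay skew */

static int
can_place_on_page(int used_bytes, int add_bytes)
{
    /* refuse the insert unless the slack survives it */
    return used_bytes + add_bytes + REPLAY_SLACK <= PAGE_SIZE;
}
```

The downside, as the follow-up replies note, is that this only papers over nondeterminism rather than removing it, which is why the thread leans toward making replay fully deterministic instead.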
Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
On Mon, Mar 17, 2014 at 10:54 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Heap and B-tree WAL records also rely on PageAddItem etc. to reconstruct the page, instead of making a physical copy of the modified parts. And _bt_restore_page even inserts the items physically in different order than the normal codepath does. So for good or bad, there is some precedence for this. Yikes. The imminent danger I see is if we change the logic on how the items are divided into posting lists, and end up in a situation where a master server adds an item to a page, and it just fits, but with the compression logic the standby version has, it cannot make it fit. As an escape hatch for that, we could have the WAL replay code try the compression again, with a larger max. posting list size, if it doesn't fit at first. And/or always leave something like 10 bytes of free space on every data page to make up for small differences in the logic. That scares the crap out of me. I don't see any intrinsic problem with relying on the existing page contents to figure out how to roll forward, as PageAddItem does; after all, we do FPIs precisely so that the page is in a known good state when we start. However, I really think we ought to try hard to make this deterministic in terms of what the resulting state of the page is; anything else seems like it's playing with fire, and I bet we'll get burned sooner rather than later. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
Robert Haas robertmh...@gmail.com writes:
> On Mon, Mar 17, 2014 at 10:54 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote:
>> Heap and B-tree WAL records also rely on PageAddItem etc. to reconstruct the page, instead of making a physical copy of the modified parts. And _bt_restore_page even inserts the items physically in different order than the normal codepath does. So for good or bad, there is some precedent for this.
>
> Yikes.

Yeah. I think it's arguably a bug that _bt_restore_page works like that, even though we've not been burnt up to now.

>> The imminent danger I see is if we change the logic on how the items are divided into posting lists, and end up in a situation where a master server adds an item to a page, and it just fits, but with the compression logic the standby version has, it cannot make it fit. As an escape hatch for that, we could have the WAL replay code try the compression again, with a larger max. posting list size, if it doesn't fit at first. And/or always leave something like 10 bytes of free space on every data page to make up for small differences in the logic.
>
> That scares the crap out of me.

Likewise. Saving some WAL space is *not* worth this kind of risk.

regards, tom lane
Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
On 03/17/2014 05:35 PM, Tom Lane wrote:
> Robert Haas robertmh...@gmail.com writes:
>> On Mon, Mar 17, 2014 at 10:54 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote:
>>> The imminent danger I see is if we change the logic on how the items are divided into posting lists, and end up in a situation where a master server adds an item to a page, and it just fits, but with the compression logic the standby version has, it cannot make it fit. As an escape hatch for that, we could have the WAL replay code try the compression again, with a larger max. posting list size, if it doesn't fit at first. And/or always leave something like 10 bytes of free space on every data page to make up for small differences in the logic.
>>
>> That scares the crap out of me.
>
> Likewise. Saving some WAL space is *not* worth this kind of risk.

One fairly good compromise would be to include only the new items, not the whole modified posting lists, and let the replay logic do the re-encoding of the posting lists. But also include the cutoff points of each posting list in the WAL record. That way the replay code would have no freedom in how it decides to split the items into compressed lists; that would be fully specified by the WAL record.

Here's a refresher for those who didn't follow the development of the new page format:

The data page basically contains a list of ItemPointers. The items are compressed, to save disk space. However, to make random access faster, the items on the page are not all compressed as one big list. Instead, the big array of items is split into roughly equal chunks, and each chunk is compressed separately. The chunks are stored on the page one after another. (The chunks are called posting lists in the code; the struct is called GinPostingListData.)

The compression is completely deterministic (each item is stored as a varbyte-encoded delta from the previous item), but there are no hard rules on how the items on the page ought to be divided into the posting lists. Currently, the code tries to maintain a max size of 256 bytes per list, but it will cope with any size it finds on disk.

This is where the danger lies: we could end up with a different physical page after WAL replay, if we just include the new items in the WAL record. The WAL replay might decide to split the items into posting lists differently than was originally done. (As the code stands, it would always make the same decision, completely deterministically, but that might change in a minor version if we're not careful.)

We can tie WAL replay's hands about that if we include in the WAL record the list of items that form the posting lists. That adds some bloat, compared to including only the new items, but not too much. (And we still only need to do that for posting lists following the first modified one.)

Alexander, would you like to give that a shot, or will I?

- Heikki
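The varbyte delta encoding and ~256-byte chunking described above can be sketched as follows. This is a simplified illustration, not the actual PostgreSQL implementation: it uses plain integers in place of ItemPointers, and the 7-bit continuation-bit scheme is the general technique, not a byte-for-byte match of the real encoder.

```python
def encode_posting_list(items):
    """Encode a sorted list of integers (stand-ins for ItemPointers) as
    varbyte-encoded deltas from the previous item: 7 payload bits per
    byte, high bit set when more bytes follow."""
    out = bytearray()
    prev = 0
    for item in items:
        delta = item - prev
        assert delta > 0, "items must be strictly increasing and positive"
        prev = item
        while True:
            byte = delta & 0x7F
            delta >>= 7
            if delta:
                out.append(byte | 0x80)  # continuation: more bytes follow
            else:
                out.append(byte)
                break
    return bytes(out)

def decode_posting_list(data):
    """Inverse of encode_posting_list: accumulate 7-bit groups into a
    delta, add it to the running value when the high bit is clear."""
    items, prev, delta, shift = [], 0, 0, 0
    for byte in data:
        delta |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            prev += delta
            items.append(prev)
            delta, shift = 0, 0
    return items

def split_into_segments(items, max_bytes=256):
    """Split the item array into separately-compressed chunks of at most
    max_bytes each, mirroring the ~256-byte posting lists on a data page.
    Each chunk is encoded independently (its first delta is absolute)."""
    segments, current = [], []
    for item in items:
        candidate = current + [item]
        if current and len(encode_posting_list(candidate)) > max_bytes:
            segments.append(encode_posting_list(current))
            current = [item]
        else:
            current = candidate
    if current:
        segments.append(encode_posting_list(current))
    return segments
```

The determinism worry in the thread is visible here: the split points chosen by split_into_segments are a policy, not part of the encoding, so master and standby must agree on them (or have them spelled out in the WAL record) to produce byte-identical pages.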
[HACKERS] HEAD seems to generate larger WAL regarding GIN index
Hi,

I executed the following statements in HEAD and 9.3, and compared the size of the WAL generated by data insertion into a GIN index.

----------------------------------------
CREATE EXTENSION pg_trgm;
CREATE TABLE hoge (col1 text);
CREATE INDEX hogeidx ON hoge USING gin (col1 gin_trgm_ops) WITH (FASTUPDATE = off);
CHECKPOINT;
SELECT pg_switch_xlog();
SELECT pg_switch_xlog();
SELECT pg_current_xlog_location();
INSERT INTO hoge SELECT 'POSTGRESQL' FROM generate_series(1, 100);
SELECT pg_current_xlog_location();
----------------------------------------

The resulting WAL sizes are:

960 MB (9.3)
2113 MB (HEAD)

The WAL size in HEAD was more than twice that in 9.3. Recently the source code of the GIN index has changed dramatically. Is the increase in GIN-related WAL intentional, or a bug?

Regards,

--
Fujii Masao
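The reported WAL sizes are the difference between the two pg_current_xlog_location() readings. The arithmetic can be sketched as follows (function names are mine; since 9.3 the high half of a text LSN like '1F/8A3C0D20' advances in 4 GB units):

```python
def lsn_to_bytes(lsn):
    """Convert a text LSN of the form 'hi/lo' (both hex) to an absolute
    byte position: hi counts 4 GB (2**32-byte) units, lo is the offset."""
    hi, lo = lsn.split('/')
    return (int(hi, 16) << 32) | int(lo, 16)

def wal_generated(start_lsn, end_lsn):
    """Bytes of WAL written between two pg_current_xlog_location() readings."""
    return lsn_to_bytes(end_lsn) - lsn_to_bytes(start_lsn)
```

(The server can compute the same difference directly with pg_xlog_location_diff().)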
Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
On 03/15/2014 08:40 PM, Fujii Masao wrote:
> I executed the following statements in HEAD and 9.3, and compared the size of the WAL generated by data insertion into a GIN index.
>
> CREATE EXTENSION pg_trgm;
> CREATE TABLE hoge (col1 text);
> CREATE INDEX hogeidx ON hoge USING gin (col1 gin_trgm_ops) WITH (FASTUPDATE = off);
> CHECKPOINT;
> SELECT pg_switch_xlog();
> SELECT pg_switch_xlog();
> SELECT pg_current_xlog_location();
> INSERT INTO hoge SELECT 'POSTGRESQL' FROM generate_series(1, 100);
> SELECT pg_current_xlog_location();
>
> The resulting WAL sizes are:
> 960 MB (9.3)
> 2113 MB (HEAD)
>
> The WAL size in HEAD was more than twice that in 9.3. Recently the source code of the GIN index has changed dramatically. Is the increase in GIN-related WAL intentional, or a bug?

It was somewhat expected. Updating individual items on the new-format GIN pages requires decompressing and recompressing the page, and the recompressed posting lists need to be WAL-logged, which generates much larger WAL records.

That said, I didn't expect the difference to be quite that big when you're appending to the end of the table. When the new entries go to the end of the posting lists, you only need to recompress and WAL-log the last posting list, which is at most 256 bytes long. But I guess that's still a lot more WAL than in the old format.

That could be optimized, but I figured we can live with it, thanks to the fastupdate feature. Fastupdate allows amortizing that cost over several insertions. But of course, you explicitly disabled that...

- Heikki
Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
On Sat, Mar 15, 2014 at 11:27 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote:
> On 03/15/2014 08:40 PM, Fujii Masao wrote:
>> I executed the following statements in HEAD and 9.3, and compared the size of the WAL generated by data insertion into a GIN index.
>>
>> CREATE EXTENSION pg_trgm;
>> CREATE TABLE hoge (col1 text);
>> CREATE INDEX hogeidx ON hoge USING gin (col1 gin_trgm_ops) WITH (FASTUPDATE = off);
>> CHECKPOINT;
>> SELECT pg_switch_xlog();
>> SELECT pg_switch_xlog();
>> SELECT pg_current_xlog_location();
>> INSERT INTO hoge SELECT 'POSTGRESQL' FROM generate_series(1, 100);
>> SELECT pg_current_xlog_location();
>>
>> The resulting WAL sizes are:
>> 960 MB (9.3)
>> 2113 MB (HEAD)
>>
>> The WAL size in HEAD was more than twice that in 9.3. Recently the source code of the GIN index has changed dramatically. Is the increase in GIN-related WAL intentional, or a bug?
>
> It was somewhat expected. Updating individual items on the new-format GIN pages requires decompressing and recompressing the page, and the recompressed posting lists need to be WAL-logged, which generates much larger WAL records.
>
> That said, I didn't expect the difference to be quite that big when you're appending to the end of the table. When the new entries go to the end of the posting lists, you only need to recompress and WAL-log the last posting list, which is at most 256 bytes long. But I guess that's still a lot more WAL than in the old format.
>
> That could be optimized, but I figured we can live with it, thanks to the fastupdate feature. Fastupdate allows amortizing that cost over several insertions. But of course, you explicitly disabled that...

Let me know if you want me to write a patch addressing this issue.

--
With best regards,
Alexander Korotkov.