Re: [HACKERS] GIN pending list clean up exposure to SQL
On Wed, Jan 27, 2016 at 12:54 PM, Julien Rouhaudwrote: > On 27/01/2016 10:27, Fujii Masao wrote: >> >> Thanks for updating the patch! It looks good to me. >> >> Based on your patch, I just improved the doc. For example, I added >> the following note into the doc. >> >> +These functions cannot be executed during recovery. +Use >> of these functions is restricted to superusers and the owner + >> of the given index. >> >> If there is no problem, I'm thinking to commit this version. >> > > Just a detail: > > +Note that the cleanup does not happen and the return value is 0 > +if the argument is the GIN index built with fastupdate > +option disabled because it doesn't have a pending list. > > It should be "if the argument is *a* GIN index" > > I find this sentence a little confusing, maybe rephrase like this would > be better: > > -Note that the cleanup does not happen and the return value is 0 > -if the argument is the GIN index built with fastupdate > -option disabled because it doesn't have a pending list. > +Note that if the argument is a GIN index built with > fastupdate > +option disabled, the cleanup does not happen and the return value > is 0 > +because the index doesn't have a pending list. > > Otherwise, I don't see any problem on this version. This is a corner case that probably does not need to be in the docs, but I wanted to clarify it here in case you disagree: If the index ever had fastupdate turned on, when it is turned off the index will keep whatever pending_list it had until something cleans it up one last time. Thanks for the review and changes and commit. Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GIN pending list clean up exposure to SQL
On Mon, Feb 8, 2016 at 9:53 AM, Jeff Janeswrote: > On Wed, Jan 27, 2016 at 12:54 PM, Julien Rouhaud > wrote: >> On 27/01/2016 10:27, Fujii Masao wrote: >>> >>> Thanks for updating the patch! It looks good to me. >>> >>> Based on your patch, I just improved the doc. For example, I added >>> the following note into the doc. >>> >>> +These functions cannot be executed during recovery. +Use >>> of these functions is restricted to superusers and the owner + >>> of the given index. >>> >>> If there is no problem, I'm thinking to commit this version. >>> >> >> Just a detail: >> >> +Note that the cleanup does not happen and the return value is 0 >> +if the argument is the GIN index built with fastupdate >> +option disabled because it doesn't have a pending list. >> >> It should be "if the argument is *a* GIN index" >> >> I find this sentence a little confusing, maybe rephrase like this would >> be better: >> >> -Note that the cleanup does not happen and the return value is 0 >> -if the argument is the GIN index built with fastupdate >> -option disabled because it doesn't have a pending list. >> +Note that if the argument is a GIN index built with >> fastupdate >> +option disabled, the cleanup does not happen and the return value >> is 0 >> +because the index doesn't have a pending list. >> >> Otherwise, I don't see any problem on this version. > > This is a corner case that probably does not need to be in the docs, > but I wanted to clarify it here in case you disagree: If the index > ever had fastupdate turned on, when it is turned off the index will > keep whatever pending_list it had until something cleans it up one > last time. I agree to add that note to the doc. Or we should remove the above description that I added? Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GIN pending list clean up exposure to SQL
On Thu, Jan 28, 2016 at 5:54 AM, Julien Rouhaudwrote: > On 27/01/2016 10:27, Fujii Masao wrote: >> On Mon, Jan 25, 2016 at 3:54 PM, Jeff Janes >> wrote: >>> On Wed, Jan 20, 2016 at 6:17 AM, Fujii Masao >>> wrote: On Sat, Jan 16, 2016 at 7:42 AM, Julien Rouhaud wrote: > On 15/01/2016 22:59, Jeff Janes wrote: >> On Sun, Jan 10, 2016 at 4:24 AM, Julien Rouhaud >> wrote: > > All looks fine to me, I flag it as ready for committer. When I compiled the PostgreSQL with the patch, I got the following error. ISTM that the inclusion of pg_am.h header file is missing in ginfast.c. >>> >>> Thanks. Fixed. >>> gin_clean_pending_list() must check whether the server is in recovery or not. If it's in recovery, the function must exit with an error. That is, IMO, something like the following check must be added. if (RecoveryInProgress()) ereport(ERROR, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg("recovery is in progress"), errhint("GIN pending list cannot be cleaned up during recovery."))); It's better to make gin_clean_pending_list() check whether the target index is temporary index of other sessions or not, like pgstatginindex() does. >>> >>> I've added both of these checks. Sorry I overlooked your early >>> email in this thread about those. >>> +RelationindexRel = index_open(indexoid, AccessShareLock); ISTM that AccessShareLock is not safe when updating the pending list and GIN index main structure. Probaby we should use RowExclusiveLock? >>> >>> Other callers of the ginInsertCleanup function also do so while >>> holding only the AccessShareLock on the index. It turns out >>> that there is a bug around this, as discussed in "Potential GIN >>> vacuum bug" >>> (http://www.postgresql.org/message-id/flat/CAMkU=1xalflhuuohfp5v33rzedlvb5aknnujceum9knbkrb...@mail.gmail.com) >>> >>> >>> > But, that bug has to be solved at a deeper level than this patch. >>> >>> I've also cleaned up some other conflicts, and chose a more >>> suitable OID for the new catalog function. >>> >>> The number of new header includes needed to implement this makes >>> me wonder if I put this code in the correct file, but I don't see >>> a better location for it. >>> >>> New version attached. >> >> Thanks for updating the patch! It looks good to me. >> >> Based on your patch, I just improved the doc. For example, I added >> the following note into the doc. >> >> +These functions cannot be executed during recovery. +Use >> of these functions is restricted to superusers and the owner + >> of the given index. >> >> If there is no problem, I'm thinking to commit this version. >> > > Just a detail: > > +Note that the cleanup does not happen and the return value is 0 > +if the argument is the GIN index built with fastupdate > +option disabled because it doesn't have a pending list. > > It should be "if the argument is *a* GIN index" > > I find this sentence a little confusing, maybe rephrase like this would > be better: > > -Note that the cleanup does not happen and the return value is 0 > -if the argument is the GIN index built with fastupdate > -option disabled because it doesn't have a pending list. > +Note that if the argument is a GIN index built with > fastupdate > +option disabled, the cleanup does not happen and the return value > is 0 > +because the index doesn't have a pending list. > > Otherwise, I don't see any problem on this version. Thanks for the review! I adopted the description you suggested. Just pushed the patch. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GIN pending list clean up exposure to SQL
On 27/01/2016 10:27, Fujii Masao wrote: > On Mon, Jan 25, 2016 at 3:54 PM, Jeff Janes> wrote: >> On Wed, Jan 20, 2016 at 6:17 AM, Fujii Masao >> wrote: >>> On Sat, Jan 16, 2016 at 7:42 AM, Julien Rouhaud >>> wrote: On 15/01/2016 22:59, Jeff Janes wrote: > On Sun, Jan 10, 2016 at 4:24 AM, Julien Rouhaud > wrote: All looks fine to me, I flag it as ready for committer. >>> >>> When I compiled the PostgreSQL with the patch, I got the >>> following error. ISTM that the inclusion of pg_am.h header file >>> is missing in ginfast.c. >> >> Thanks. Fixed. >> >>> gin_clean_pending_list() must check whether the server is in >>> recovery or not. If it's in recovery, the function must exit >>> with an error. That is, IMO, something like the following check >>> must be added. >>> >>> if (RecoveryInProgress()) ereport(ERROR, >>> >>> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), >>> errmsg("recovery is in progress"), errhint("GIN pending list >>> cannot be cleaned up during recovery."))); >>> >>> It's better to make gin_clean_pending_list() check whether the >>> target index is temporary index of other sessions or not, like >>> pgstatginindex() does. >> >> I've added both of these checks. Sorry I overlooked your early >> email in this thread about those. >> >>> >>> +RelationindexRel = index_open(indexoid, >>> AccessShareLock); >>> >>> ISTM that AccessShareLock is not safe when updating the pending >>> list and GIN index main structure. Probaby we should use >>> RowExclusiveLock? >> >> Other callers of the ginInsertCleanup function also do so while >> holding only the AccessShareLock on the index. It turns out >> that there is a bug around this, as discussed in "Potential GIN >> vacuum bug" >> (http://www.postgresql.org/message-id/flat/CAMkU=1xalflhuuohfp5v33rzedlvb5aknnujceum9knbkrb...@mail.gmail.com) >> >> >> But, that bug has to be solved at a deeper level than this patch. >> >> I've also cleaned up some other conflicts, and chose a more >> suitable OID for the new catalog function. >> >> The number of new header includes needed to implement this makes >> me wonder if I put this code in the correct file, but I don't see >> a better location for it. >> >> New version attached. > > Thanks for updating the patch! It looks good to me. > > Based on your patch, I just improved the doc. For example, I added > the following note into the doc. > > +These functions cannot be executed during recovery. +Use > of these functions is restricted to superusers and the owner + > of the given index. > > If there is no problem, I'm thinking to commit this version. > Just a detail: +Note that the cleanup does not happen and the return value is 0 +if the argument is the GIN index built with fastupdate +option disabled because it doesn't have a pending list. It should be "if the argument is *a* GIN index" I find this sentence a little confusing, maybe rephrase like this would be better: -Note that the cleanup does not happen and the return value is 0 -if the argument is the GIN index built with fastupdate -option disabled because it doesn't have a pending list. +Note that if the argument is a GIN index built with fastupdate +option disabled, the cleanup does not happen and the return value is 0 +because the index doesn't have a pending list. Otherwise, I don't see any problem on this version. Regards. > Regards, > > > > -- Julien Rouhaud http://dalibo.com - http://dalibo.org -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GIN pending list clean up exposure to SQL
On Mon, Jan 25, 2016 at 3:54 PM, Jeff Janeswrote: > On Wed, Jan 20, 2016 at 6:17 AM, Fujii Masao wrote: >> On Sat, Jan 16, 2016 at 7:42 AM, Julien Rouhaud >> wrote: >>> On 15/01/2016 22:59, Jeff Janes wrote: On Sun, Jan 10, 2016 at 4:24 AM, Julien Rouhaud wrote: >>> >>> All looks fine to me, I flag it as ready for committer. >> >> When I compiled the PostgreSQL with the patch, I got the following error. >> ISTM that the inclusion of pg_am.h header file is missing in ginfast.c. > > Thanks. Fixed. > >> gin_clean_pending_list() must check whether the server is in recovery or not. >> If it's in recovery, the function must exit with an error. That is, IMO, >> something like the following check must be added. >> >> if (RecoveryInProgress()) >> ereport(ERROR, >> >> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), >> errmsg("recovery is in progress"), >> errhint("GIN pending list cannot be >> cleaned up during recovery."))); >> >> It's better to make gin_clean_pending_list() check whether the target index >> is temporary index of other sessions or not, like pgstatginindex() does. > > I've added both of these checks. Sorry I overlooked your early email > in this thread about those. > >> >> +RelationindexRel = index_open(indexoid, AccessShareLock); >> >> ISTM that AccessShareLock is not safe when updating the pending list and >> GIN index main structure. Probaby we should use RowExclusiveLock? > > Other callers of the ginInsertCleanup function also do so while > holding only the AccessShareLock on the index. It turns out that > there is a bug around this, as discussed in "Potential GIN vacuum bug" > (http://www.postgresql.org/message-id/flat/CAMkU=1xalflhuuohfp5v33rzedlvb5aknnujceum9knbkrb...@mail.gmail.com) > > But, that bug has to be solved at a deeper level than this patch. > > I've also cleaned up some other conflicts, and chose a more suitable > OID for the new catalog function. > > The number of new header includes needed to implement this makes me > wonder if I put this code in the correct file, but I don't see a > better location for it. > > New version attached. Thanks for updating the patch! It looks good to me. Based on your patch, I just improved the doc. For example, I added the following note into the doc. +These functions cannot be executed during recovery. +Use of these functions is restricted to superusers and the owner +of the given index. If there is no problem, I'm thinking to commit this version. Regards, -- Fujii Masao *** a/doc/src/sgml/func.sgml --- b/doc/src/sgml/func.sgml *** *** 18036,18044 postgres=# SELECT * FROM pg_xlogfile_name_offset(pg_stop_backup()); --- 18036,18051 brin_summarize_new_values + + gin_clean_pending_list + + shows the functions available for index maintenance tasks. + These functions cannot be executed during recovery. + Use of these functions is restricted to superusers and the owner + of the given index. *** *** 18056,18061 postgres=# SELECT * FROM pg_xlogfile_name_offset(pg_stop_backup()); --- 18063,18075 integer summarize page ranges not already summarized + + + gin_clean_pending_list(index regclass) + +bigint +move GIN pending list entries into main index structure + *** *** 18069,18074 postgres=# SELECT * FROM pg_xlogfile_name_offset(pg_stop_backup()); --- 18083,18100 into the index. + + gin_clean_pending_list accepts the OID or name of + a GIN index and cleans up the pending list of the specified GIN index + by moving entries in it to the main GIN data structure in bulk. + It returns the number of pages cleaned up from the pending list. + Note that the cleanup does not happen and the return value is 0 + if the argument is the GIN index built with fastupdate + option disabled because it doesn't have a pending list. + Please see and + for details of the pending list and fastupdate option. + + *** a/doc/src/sgml/gin.sgml --- b/doc/src/sgml/gin.sgml *** *** 734,740 from the indexed item). As of PostgreSQL 8.4, GIN is capable of postponing much of this work by inserting new tuples into a temporary, unsorted list of pending entries. !When the table is vacuumed, or if the pending list becomes larger than , the entries are moved to the main GIN data structure using the same bulk insert techniques used during initial index creation. This greatly improves --- 734,742 from the indexed item). As of PostgreSQL 8.4, GIN is capable of postponing much of this work
Re: [HACKERS] GIN pending list clean up exposure to SQL
On Wed, Jan 20, 2016 at 6:17 AM, Fujii Masaowrote: > On Sat, Jan 16, 2016 at 7:42 AM, Julien Rouhaud > wrote: >> On 15/01/2016 22:59, Jeff Janes wrote: >>> On Sun, Jan 10, 2016 at 4:24 AM, Julien Rouhaud >>> wrote: >> >> All looks fine to me, I flag it as ready for committer. > > When I compiled the PostgreSQL with the patch, I got the following error. > ISTM that the inclusion of pg_am.h header file is missing in ginfast.c. Thanks. Fixed. > gin_clean_pending_list() must check whether the server is in recovery or not. > If it's in recovery, the function must exit with an error. That is, IMO, > something like the following check must be added. > > if (RecoveryInProgress()) > ereport(ERROR, > > (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > errmsg("recovery is in progress"), > errhint("GIN pending list cannot be > cleaned up during recovery."))); > > It's better to make gin_clean_pending_list() check whether the target index > is temporary index of other sessions or not, like pgstatginindex() does. I've added both of these checks. Sorry I overlooked your early email in this thread about those. > > +RelationindexRel = index_open(indexoid, AccessShareLock); > > ISTM that AccessShareLock is not safe when updating the pending list and > GIN index main structure. Probaby we should use RowExclusiveLock? Other callers of the ginInsertCleanup function also do so while holding only the AccessShareLock on the index. It turns out that there is a bug around this, as discussed in "Potential GIN vacuum bug" (http://www.postgresql.org/message-id/flat/CAMkU=1xalflhuuohfp5v33rzedlvb5aknnujceum9knbkrb...@mail.gmail.com) But, that bug has to be solved at a deeper level than this patch. I've also cleaned up some other conflicts, and chose a more suitable OID for the new catalog function. The number of new header includes needed to implement this makes me wonder if I put this code in the correct file, but I don't see a better location for it. New version attached. Thanks, Jeff diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml new file mode 100644 index 9c143b2..ada228f *** a/doc/src/sgml/func.sgml --- b/doc/src/sgml/func.sgml *** postgres=# SELECT * FROM pg_xlogfile_nam *** 18036,18041 --- 18036,18045 brin_summarize_new_values + + gin_clean_pending_list + + shows the functions available for index maintenance tasks. *** postgres=# SELECT * FROM pg_xlogfile_nam *** 18056,18061 --- 18060,18072 integer summarize page ranges not already summarized + + + gin_clean_pending_list(index_oid regclass) + +bigint +move fast-update pending list entries into main index structure + *** postgres=# SELECT * FROM pg_xlogfile_nam *** 18069,18074 --- 18080,18092 into the index. + +gin_clean_pending_list accepts a GIN index OID argument + and moves the fast-update entries from the pending list into the + main index structure for that index. It returns the number of pages + removed from the pending list. + + diff --git a/doc/src/sgml/gin.sgml b/doc/src/sgml/gin.sgml new file mode 100644 index 9eb0b5a..17a5087 *** a/doc/src/sgml/gin.sgml --- b/doc/src/sgml/gin.sgml *** *** 734,740 from the indexed item). As of PostgreSQL 8.4, GIN is capable of postponing much of this work by inserting new tuples into a temporary, unsorted list of pending entries. !When the table is vacuumed, or if the pending list becomes larger than , the entries are moved to the main GIN data structure using the same bulk insert techniques used during initial index creation. This greatly improves --- 734,742 from the indexed item). As of PostgreSQL 8.4, GIN is capable of postponing much of this work by inserting new tuples into a temporary, unsorted list of pending entries. !When the table is vacuumed or autoanalyzed, or when !gin_clean_pending_list(regclass) is called, or if the !pending list becomes larger than , the entries are moved to the main GIN data structure using the same bulk insert techniques used during initial index creation. This greatly improves diff --git a/src/backend/access/gin/ginfast.c b/src/backend/access/gin/ginfast.c new file mode 100644 index 681ce09..dfdc45f *** a/src/backend/access/gin/ginfast.c --- b/src/backend/access/gin/ginfast.c *** *** 20,29 --- 20,32 #include "access/gin_private.h" #include "access/xloginsert.h" + #include "access/xlog.h" #include "commands/vacuum.h" + #include "catalog/pg_am.h" #include "miscadmin.h" #include
Re: [HACKERS] GIN pending list clean up exposure to SQL
On 20/01/2016 15:17, Fujii Masao wrote: > > When I compiled the PostgreSQL with the patch, I got the following error. > ISTM that the inclusion of pg_am.h header file is missing in ginfast.c. > > ginfast.c: In function 'gin_clean_pending_list': > ginfast.c:980: error: 'GIN_AM_OID' undeclared (first use in this function) > ginfast.c:980: error: (Each undeclared identifier is reported only once > ginfast.c:980: error: for each function it appears in.) > > gin_clean_pending_list() must check whether the server is in recovery or not. > If it's in recovery, the function must exit with an error. That is, IMO, > something like the following check must be added. > > if (RecoveryInProgress()) > ereport(ERROR, > > (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > errmsg("recovery is in progress"), > errhint("GIN pending list cannot be > cleaned up during recovery."))); > > It's better to make gin_clean_pending_list() check whether the target index > is temporary index of other sessions or not, like pgstatginindex() does. > > +RelationindexRel = index_open(indexoid, AccessShareLock); > > ISTM that AccessShareLock is not safe when updating the pending list and > GIN index main structure. Probaby we should use RowExclusiveLock? > This lock should be safe, as pointed by Alvaro and Jaime earlier in this thread (http://www.postgresql.org/message-id/20151119161846.GK614468@alvherre.pgsql), as ginInsertCleanup() can be called concurrently. Also, since 38710a374ea the ginInsertCleanup() call must be fixed: - ginInsertCleanup(, false, true, ); + ginInsertCleanup(, true, ); Regards. -- Julien Rouhaud http://dalibo.com - http://dalibo.org -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GIN pending list clean up exposure to SQL
On Sat, Jan 16, 2016 at 7:42 AM, Julien Rouhaudwrote: > On 15/01/2016 22:59, Jeff Janes wrote: >> On Sun, Jan 10, 2016 at 4:24 AM, Julien Rouhaud >> wrote: >>> On 29/12/2015 00:30, Jeff Janes wrote: On Wed, Nov 25, 2015 at 9:29 AM, Jeff Janes wrote: > > I'll prepare a patch for core for the January commitfest, and see if > it flies. If not, there is always the extension to fall back to. Here is an updated patch. I've added type and permission checks, docs, and some regression tests. >>> >>> I just reviewed it. Patch applies cleanly, everything works as intended, >>> including regression tests. >>> >>> I think the function should be declared as strict. >> >> OK. I see that brin_summarize_new_values, which I modeled this on, >> was recently changed to be strict. So I've changed this the same way. >> >>> >>> Also, there are some trailing whitespaces in the documentation diff. >> >> Fixed. > > Thanks! > >> I also added the DESC to the pg_proc entry, which I somehow >> missed before. >> > > Good catch, I also missed it. > > All looks fine to me, I flag it as ready for committer. When I compiled the PostgreSQL with the patch, I got the following error. ISTM that the inclusion of pg_am.h header file is missing in ginfast.c. ginfast.c: In function 'gin_clean_pending_list': ginfast.c:980: error: 'GIN_AM_OID' undeclared (first use in this function) ginfast.c:980: error: (Each undeclared identifier is reported only once ginfast.c:980: error: for each function it appears in.) gin_clean_pending_list() must check whether the server is in recovery or not. If it's in recovery, the function must exit with an error. That is, IMO, something like the following check must be added. if (RecoveryInProgress()) ereport(ERROR, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg("recovery is in progress"), errhint("GIN pending list cannot be cleaned up during recovery."))); It's better to make gin_clean_pending_list() check whether the target index is temporary index of other sessions or not, like pgstatginindex() does. +RelationindexRel = index_open(indexoid, AccessShareLock); ISTM that AccessShareLock is not safe when updating the pending list and GIN index main structure. Probaby we should use RowExclusiveLock? Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GIN pending list clean up exposure to SQL
On Sun, Jan 10, 2016 at 4:24 AM, Julien Rouhaudwrote: > On 29/12/2015 00:30, Jeff Janes wrote: >> On Wed, Nov 25, 2015 at 9:29 AM, Jeff Janes wrote: >>> >>> I'll prepare a patch for core for the January commitfest, and see if >>> it flies. If not, there is always the extension to fall back to. >> >> Here is an updated patch. I've added type and permission checks, >> docs, and some regression tests. >> > > I just reviewed it. Patch applies cleanly, everything works as intended, > including regression tests. > > I think the function should be declared as strict. OK. I see that brin_summarize_new_values, which I modeled this on, was recently changed to be strict. So I've changed this the same way. > > Also, there are some trailing whitespaces in the documentation diff. Fixed. I also added the DESC to the pg_proc entry, which I somehow missed before. Thanks, Jeff gin_clean_pending_user_v003.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GIN pending list clean up exposure to SQL
On 15/01/2016 22:59, Jeff Janes wrote: > On Sun, Jan 10, 2016 at 4:24 AM, Julien Rouhaud >wrote: >> On 29/12/2015 00:30, Jeff Janes wrote: >>> On Wed, Nov 25, 2015 at 9:29 AM, Jeff Janes wrote: I'll prepare a patch for core for the January commitfest, and see if it flies. If not, there is always the extension to fall back to. >>> >>> Here is an updated patch. I've added type and permission checks, >>> docs, and some regression tests. >>> >> >> I just reviewed it. Patch applies cleanly, everything works as intended, >> including regression tests. >> >> I think the function should be declared as strict. > > OK. I see that brin_summarize_new_values, which I modeled this on, > was recently changed to be strict. So I've changed this the same way. > >> >> Also, there are some trailing whitespaces in the documentation diff. > > Fixed. Thanks! > I also added the DESC to the pg_proc entry, which I somehow > missed before. > Good catch, I also missed it. All looks fine to me, I flag it as ready for committer. -- Julien Rouhaud http://dalibo.com - http://dalibo.org -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GIN pending list clean up exposure to SQL
On 29/12/2015 00:30, Jeff Janes wrote: > On Wed, Nov 25, 2015 at 9:29 AM, Jeff Janeswrote: >> >> I'll prepare a patch for core for the January commitfest, and see if >> it flies. If not, there is always the extension to fall back to. > > Here is an updated patch. I've added type and permission checks, > docs, and some regression tests. > I just reviewed it. Patch applies cleanly, everything works as intended, including regression tests. I think the function should be declared as strict. Also, there are some trailing whitespaces in the documentation diff. Regards. > Cheers, > > Jeff > > > > -- Julien Rouhaud http://dalibo.com - http://dalibo.org -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GIN pending list clean up exposure to SQL
On Wed, Nov 25, 2015 at 9:29 AM, Jeff Janeswrote: > > I'll prepare a patch for core for the January commitfest, and see if > it flies. If not, there is always the extension to fall back to. Here is an updated patch. I've added type and permission checks, docs, and some regression tests. Cheers, Jeff diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml new file mode 100644 index 8ef9fce..6cbe11b *** a/doc/src/sgml/func.sgml --- b/doc/src/sgml/func.sgml *** postgres=# SELECT * FROM pg_xlogfile_nam *** 17919,17924 --- 17919,17928 brin_summarize_new_values + + gin_clean_pending_list + + shows the functions available for index maintenance tasks. *** postgres=# SELECT * FROM pg_xlogfile_nam *** 17939,17944 --- 17943,17955 integer summarize page ranges not already summarized + + + gin_clean_pending_list(index_oid regclass) + +bigint +move fast-update pending list entries into main index structure + *** postgres=# SELECT * FROM pg_xlogfile_nam *** 17952,17957 --- 17963,17975 into the index. + +gin_clean_pending_list accepts a GIN index OID argument + and moves the fast-update entries from the pending list into the + main index structure for that index. It returns the number of pages + removed from the pending list. + + diff --git a/doc/src/sgml/gin.sgml b/doc/src/sgml/gin.sgml new file mode 100644 index 262f1e4..c6f19de *** a/doc/src/sgml/gin.sgml --- b/doc/src/sgml/gin.sgml *** *** 728,734 from the indexed item). As of PostgreSQL 8.4, GIN is capable of postponing much of this work by inserting new tuples into a temporary, unsorted list of pending entries. !When the table is vacuumed, or if the pending list becomes larger than , the entries are moved to the main GIN data structure using the same bulk insert techniques used during initial index creation. This greatly improves --- 728,736 from the indexed item). As of PostgreSQL 8.4, GIN is capable of postponing much of this work by inserting new tuples into a temporary, unsorted list of pending entries. !When the table is vacuumed or autoanalyzed, or when !gin_clean_pending_list(regclass) is called, or if the !pending list becomes larger than , the entries are moved to the main GIN data structure using the same bulk insert techniques used during initial index creation. This greatly improves diff --git a/src/backend/access/gin/ginfast.c b/src/backend/access/gin/ginfast.c new file mode 100644 index 4bb22fe..373b52d *** a/src/backend/access/gin/ginfast.c --- b/src/backend/access/gin/ginfast.c *** *** 24,29 --- 24,30 #include "miscadmin.h" #include "utils/memutils.h" #include "utils/rel.h" + #include "utils/acl.h" #include "storage/indexfsm.h" /* GUC parameter */ *** ginInsertCleanup(GinState *ginstate, *** 962,964 --- 963,999 MemoryContextSwitchTo(oldCtx); MemoryContextDelete(opCtx); } + + /* + * SQL-callable function to clean the insert pending list + */ + Datum + gin_clean_pending_list(PG_FUNCTION_ARGS) + { + Oid indexoid = PG_GETARG_OID(0); + Relation indexRel = index_open(indexoid, AccessShareLock); + IndexBulkDeleteResult stats; + GinState ginstate; + + /* Must be a GIN index */ + if (indexRel->rd_rel->relkind != RELKIND_INDEX || + indexRel->rd_rel->relam != GIN_AM_OID) + ereport(ERROR, + (errcode(ERRCODE_WRONG_OBJECT_TYPE), + errmsg("\"%s\" is not a GIN index", + RelationGetRelationName(indexRel; + + /* User must own the index (comparable to privileges needed for VACUUM) */ + if (!pg_class_ownercheck(indexoid, GetUserId())) + aclcheck_error(ACLCHECK_NOT_OWNER, ACL_KIND_CLASS, + RelationGetRelationName(indexRel)); + + + memset(, 0, sizeof(stats)); + initGinState(, indexRel); + ginInsertCleanup(, false, true, ); + + index_close(indexRel, AccessShareLock); + + PG_RETURN_INT64((int64) stats.pages_deleted); + } diff --git a/src/include/access/gin_private.h b/src/include/access/gin_private.h new file mode 100644 index 5021887..9373ef5 *** a/src/include/access/gin_private.h --- b/src/include/access/gin_private.h *** extern void ginFreeScanKeys(GinScanOpaqu *** 878,883 --- 878,886 /* ginget.c */ extern Datum gingetbitmap(PG_FUNCTION_ARGS); + /* ginfast.c */ + extern Datum gin_clean_pending_list(PG_FUNCTION_ARGS); + /* ginlogic.c */ extern void ginInitConsistentFunction(GinState *ginstate, GinScanKey key); diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h new file mode 100644 index d8640db..4f76b84 *** a/src/include/catalog/pg_proc.h --- b/src/include/catalog/pg_proc.h ***
Re: [HACKERS] GIN pending list clean up exposure to SQL
On Thu, Nov 19, 2015 at 8:37 AM, Jaime Casanovawrote: > On 12 August 2015 at 20:19, Jeff Janes wrote: >> >> But where does this belong? Core? Its own separate extension? >> > > I will say core. Gin indexes are in core and i don't see why this > function shouldn't. > FWIW, brin indexes has a similar function brin_summarize_new_values() in core I was holding off on doing more on this until after the bug http://www.postgresql.org/message-id/CAMkU=1xalflhuuohfp5v33rzedlvb5aknnujceum9knbkrb...@mail.gmail.com was resolved, as I think it might change the way things work enough to make a difference, at least to the documentation if not the functioning. I will look at brin_summarize_new_values for guidance. But now we have one vote for core and one for 'gevel' extension, so I'm not sure where to go. The nice thing about core is that is always there (in the future) and maintained by a group of people, and as you point out there is precedence with the BRIN index. But the nice thing with an extension is that it easier to adapt for existing installations, so you don't have to wait for 9.6 to get access to it. I'll prepare a patch for core for the January commitfest, and see if it flies. If not, there is always the extension to fall back to. Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GIN pending list clean up exposure to SQL
On Sun, Nov 22, 2015 at 1:15 PM, Jaime Casanovawrote: > On 21 November 2015 at 03:54, Jim Nasby wrote: >> On 11/19/15 10:47 AM, Jaime Casanova wrote: >>> >>> - only superusers? >> >> >> I would think the owner of the table (index?) should also be able to run >> this. > > agreed, that makes sense Also the function should ensure that the server is running in normal mode (not recovery mode) and the specified relation is NOT other session's temporary table. I added the similar function into pg_bigm extension (pg_gin_pending_cleanup function in https://osdn.jp/projects/pgbigm/scm/git/pg_bigm/blobs/master/bigm_gin.c). It might help us improve the patch. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GIN pending list clean up exposure to SQL
On 21 November 2015 at 03:54, Jim Nasbywrote: > On 11/19/15 10:47 AM, Jaime Casanova wrote: >> >> - only superusers? > > > I would think the owner of the table (index?) should also be able to run > this. agreed, that makes sense -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte 24x7 y capacitación -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GIN pending list clean up exposure to SQL
On 19 November 2015 at 14:57, Jaime Casanovawrote: > On 19 November 2015 at 14:47, Jaime Casanova > wrote: >> On 19 November 2015 at 14:18, Alvaro Herrera >> wrote: >>> Alvaro Herrera wrote: Jeff Janes wrote: > I've written a function which allows users to clean up the pending list. > It takes the index name and returns the number of pending list pages > deleted. I just noticed that your patch uses AccessShareLock on the index. Is that okay? I would have assumed that you'd need ShareUpdateExclusive (same as vacuum uses), but I don't really know. Was that a carefully thought-out choice? >>> >>> After reading gitPendingCleanup it becomes clear that there's no need >>> for a stronger lock than what you've chosen. Jaime Casanova just >>> pointed this out to me. >>> >> >> But it should do some checks, no? >> - only superusers? >> - what i received as parameter is a GIN index? >> > > I just notice this: > > + ginInsertCleanup(, true, ); > > ginInsertCleanup() now has four parameters, so you should update the call > Btw, this is not in the commitfest and seems like a useful thing to have -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte 24x7 y capacitación -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GIN pending list clean up exposure to SQL
On 11/19/15 10:47 AM, Jaime Casanova wrote: - only superusers? I would think the owner of the table (index?) should also be able to run this. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GIN pending list clean up exposure to SQL
Alvaro Herrera wrote: > Jeff Janes wrote: > > I've written a function which allows users to clean up the pending list. > > It takes the index name and returns the number of pending list pages > > deleted. > > I just noticed that your patch uses AccessShareLock on the index. Is > that okay? I would have assumed that you'd need ShareUpdateExclusive > (same as vacuum uses), but I don't really know. Was that a carefully > thought-out choice? After reading gitPendingCleanup it becomes clear that there's no need for a stronger lock than what you've chosen. Jaime Casanova just pointed this out to me. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GIN pending list clean up exposure to SQL
On 12 August 2015 at 20:19, Jeff Janeswrote: > > But where does this belong? Core? Its own separate extension? > I will say core. Gin indexes are in core and i don't see why this function shouldn't. FWIW, brin indexes has a similar function brin_summarize_new_values() in core -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte 24x7 y capacitación -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GIN pending list clean up exposure to SQL
Jeff Janes wrote: > I've written a function which allows users to clean up the pending list. > It takes the index name and returns the number of pending list pages > deleted. I just noticed that your patch uses AccessShareLock on the index. Is that okay? I would have assumed that you'd need ShareUpdateExclusive (same as vacuum uses), but I don't really know. Was that a carefully thought-out choice? -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GIN pending list clean up exposure to SQL
On 19 November 2015 at 14:18, Alvaro Herrerawrote: > Alvaro Herrera wrote: >> Jeff Janes wrote: >> > I've written a function which allows users to clean up the pending list. >> > It takes the index name and returns the number of pending list pages >> > deleted. >> >> I just noticed that your patch uses AccessShareLock on the index. Is >> that okay? I would have assumed that you'd need ShareUpdateExclusive >> (same as vacuum uses), but I don't really know. Was that a carefully >> thought-out choice? > > After reading gitPendingCleanup it becomes clear that there's no need > for a stronger lock than what you've chosen. Jaime Casanova just > pointed this out to me. > But it should do some checks, no? - only superusers? - what i received as parameter is a GIN index? -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte 24x7 y capacitación -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GIN pending list clean up exposure to SQL
On 19 November 2015 at 14:47, Jaime Casanovawrote: > On 19 November 2015 at 14:18, Alvaro Herrera wrote: >> Alvaro Herrera wrote: >>> Jeff Janes wrote: >>> > I've written a function which allows users to clean up the pending list. >>> > It takes the index name and returns the number of pending list pages >>> > deleted. >>> >>> I just noticed that your patch uses AccessShareLock on the index. Is >>> that okay? I would have assumed that you'd need ShareUpdateExclusive >>> (same as vacuum uses), but I don't really know. Was that a carefully >>> thought-out choice? >> >> After reading gitPendingCleanup it becomes clear that there's no need >> for a stronger lock than what you've chosen. Jaime Casanova just >> pointed this out to me. >> > > But it should do some checks, no? > - only superusers? > - what i received as parameter is a GIN index? > I just notice this: + ginInsertCleanup(, true, ); ginInsertCleanup() now has four parameters, so you should update the call -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte 24x7 y capacitación -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] GIN pending list clean up exposure to SQL
I've written a function which allows users to clean up the pending list. It takes the index name and returns the number of pending list pages deleted. # select * from gin_clean_pending_list('foo_text_array_idx'); gin_clean_pending_list 278 (1 row) Time: 31994.880 ms This is needed because there needs to be a way to offload this duty from the user backends, and the only other way to intentionaly clean up the list is by vacuum (and the rest of a vacuum can take days to run on a large table). Autoanalyze will also do it, but it hard to arrange for those to occur at need, and unless you drop default_statistics_target very low they can also take a long time. And if you do lower the target, it screws up your statistics, of course. I've currently crammed it into pageinspect, simply because that is where I found the existing code which I used as an exemplar for writing this code. But where does this belong? Core? Its own separate extension? Cheers, Jeff gin_clean_pending_user_v001.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GIN pending list clean up exposure to SQL
On Thu, Aug 13, 2015 at 2:19 AM, Jeff Janes jeff.ja...@gmail.com wrote: I've written a function which allows users to clean up the pending list. It takes the index name and returns the number of pending list pages deleted. # select * from gin_clean_pending_list('foo_text_array_idx'); gin_clean_pending_list 278 (1 row) Time: 31994.880 ms This is needed because there needs to be a way to offload this duty from the user backends, and the only other way to intentionaly clean up the list is by vacuum (and the rest of a vacuum can take days to run on a large table). Autoanalyze will also do it, but it hard to arrange for those to occur at need, and unless you drop default_statistics_target very low they can also take a long time. And if you do lower the target, it screws up your statistics, of course. I've currently crammed it into pageinspect, simply because that is where I found the existing code which I used as an exemplar for writing this code. But where does this belong? Core? Its own separate extension? For a long time we have gevel extension ( http://www.sigaev.ru/git/gitweb.cgi?p=gevel.git;a=summary) , which was originally developed to help us debugging indexes, but it's also contains user's functions. Your function could be there, but I have no idea about gevel itself. Probably, it should be restructurized and come to contrib. Do you have time to look into ? Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers