Re: [HACKERS] hash index on unlogged tables doesn't behave as expected
At Thu, 21 Sep 2017 20:35:01 -0400, Robert Haas wrote in > On Thu, Sep 21, 2017 at 8:16 PM, Kyotaro HORIGUCHI > wrote: > > Though I don't see it's bug and agree that the message is not > > proper, currently we can create hash indexes without no warning > > on unlogged tables and it causes a problem with replication. > > That's true, but I don't believe it's a sufficient reason to make a change. > > Before 84aa8ba128a08e6fdebb2497c7a79ebf18093e12 (2014), we didn't > issue a warning about hash indexes in any case whatsoever; we relied > on people reading the documentation to find out about the limitations > of hash indexes. They can still do that in any cases that the warning > doesn't adequately cover. I really don't think it's worth kibitzing > the cases where this message is emitted in released branches, or the > text of the message, just as we didn't back-patch the message itself > into older releases that are still supported. We need a compelling > reason to change things in stable branches, and the fact that a > warning message added in 2014 doesn't cover every limitation of a > pre-1996 hash index implementation is not an emergency. Let's save > back-patching for actual bugs, or we'll forever be fiddling with > things in stable branches that would be better left alone. Sorry for annoying you and thank you. I agree with that after just knowing the reason is not precisely (3) (we already have WARNING for the problematic ops). regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hash index on unlogged tables doesn't behave as expected
On Thu, Sep 21, 2017 at 8:16 PM, Kyotaro HORIGUCHI wrote: > Though I don't see it's bug and agree that the message is not > proper, currently we can create hash indexes without no warning > on unlogged tables and it causes a problem with replication. That's true, but I don't believe it's a sufficient reason to make a change. Before 84aa8ba128a08e6fdebb2497c7a79ebf18093e12 (2014), we didn't issue a warning about hash indexes in any case whatsoever; we relied on people reading the documentation to find out about the limitations of hash indexes. They can still do that in any cases that the warning doesn't adequately cover. I really don't think it's worth kibitzing the cases where this message is emitted in released branches, or the text of the message, just as we didn't back-patch the message itself into older releases that are still supported. We need a compelling reason to change things in stable branches, and the fact that a warning message added in 2014 doesn't cover every limitation of a pre-1996 hash index implementation is not an emergency. Let's save back-patching for actual bugs, or we'll forever be fiddling with things in stable branches that would be better left alone. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hash index on unlogged tables doesn't behave as expected
At Thu, 21 Sep 2017 16:19:27 -0400, Robert Haas wrote in <694cb417-ef2c-4760-863b-aec4530c2...@gmail.com> > On Sep 21, 2017, at 8:59 AM, Amit Kapila wrote:. > > I think giving an error message like "hash indexes are not WAL-logged > > and .." for unlogged tables doesn't seem like a good behavior. > > +1. This seems like deliberate behavior, not a bug. Though I don't see it's bug and agree that the message is not proper, currently we can create hash indexes without no warning on unlogged tables and it causes a problem with replication. The point here is that if we leave the behavior (failure on the standby) for the reason that we see a warning on index creation, a similar messages ought to be for unlogged tables. Otherwise, our decision will be another option. (4) Though we won't not see a warning on hash index creation on unlogged tables, it seems to have been a problem and won't mind. regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hash index on unlogged tables doesn't behave as expected
On Sep 21, 2017, at 8:59 AM, Amit Kapila wrote:. > I think giving an error message like "hash indexes are not WAL-logged > and .." for unlogged tables doesn't seem like a good behavior. +1. This seems like deliberate behavior, not a bug. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hash index on unlogged tables doesn't behave as expected
On Thu, Sep 21, 2017 at 4:14 PM, Kyotaro HORIGUCHI wrote: > > postgres=# create table test (id int primary key, v text); > postgres=# create index on test using hash (id); > WARNING: hash indexes are not WAL-logged and their use is discouraged > > But not for for unlogged tables. > > postgres=# create unlogged table test (id int primary key, v text); > postgres=# create index on test using hash (id); > postgres=# (no warning) > > And fails on promotion in the same way. > > postgres=# select * from test; > ERROR: could not open file "base/13324/16446": No such file or directory > > indexcmds.c@965:503 >> if (strcmp(accessMethodName, "hash") == 0 && >> RelationNeedsWAL(rel)) >> ereport(WARNING, >> (errmsg("hash indexes are not WAL-logged and their use is >> discouraged"))); > > Using !RelationUsesLocalBuffers instead fixes that and the > attached patch is for 9.6. I'm a bit unconfident on the usage of > logical meaning of the macro but what it does fits there. > I think giving an error message like "hash indexes are not WAL-logged and .." for unlogged tables doesn't seem like a good behavior. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hash index on unlogged tables doesn't behave as expected
Hello, Following a bit older thread. At Tue, 18 Jul 2017 08:33:07 +0200, Michael Paquier wrote in > On Tue, Jul 18, 2017 at 4:18 AM, Amit Kapila wrote: > > Thanks. Do you have any suggestion for back-branches? As of now, it > > fails badly with below kind of error: > > > > test=> SELECT * FROM t_u_hash; > > ERROR: could not open file "base/16384/16392": No such file or directory > > > > It is explained in another thread [3] where it has been found that the > > reason for such an error is that hash indexes are not WAL logged prior > > to 10. Now, we can claim that we don't recommend hash indexes to be > > used prior to 10 in production, so such an error is okay even if there > > is no crash has happened in the system. > > There are a couple of approaches: > 1) Marking such indexes as invalid at recovery and log information > about the switch done. > 2) Error at creation of hash indexes on unlogged tables. > 3) Leave it as-is, because there is already a WARNING at creation. > I don't mind seeing 3) per the amount of work done lately to support > WAL on hash indexes. I overlooked that but (3) is true as long as the table is *logged* one. postgres=# create table test (id int primary key, v text); postgres=# create index on test using hash (id); WARNING: hash indexes are not WAL-logged and their use is discouraged But not for for unlogged tables. postgres=# create unlogged table test (id int primary key, v text); postgres=# create index on test using hash (id); postgres=# (no warning) And fails on promotion in the same way. postgres=# select * from test; ERROR: could not open file "base/13324/16446": No such file or directory indexcmds.c@965:503 > if (strcmp(accessMethodName, "hash") == 0 && > RelationNeedsWAL(rel)) > ereport(WARNING, > (errmsg("hash indexes are not WAL-logged and their use is > discouraged"))); Using !RelationUsesLocalBuffers instead fixes that and the attached patch is for 9.6. I'm a bit unconfident on the usage of logical meaning of the macro but what it does fits there. regards, -- Kyotaro Horiguchi NTT Open Source Software Center *** a/src/backend/commands/indexcmds.c --- b/src/backend/commands/indexcmds.c *** *** 501,507 DefineIndex(Oid relationId, amRoutine = GetIndexAmRoutine(accessMethodForm->amhandler); if (strcmp(accessMethodName, "hash") == 0 && ! RelationNeedsWAL(rel)) ereport(WARNING, (errmsg("hash indexes are not WAL-logged and their use is discouraged"))); --- 501,507 amRoutine = GetIndexAmRoutine(accessMethodForm->amhandler); if (strcmp(accessMethodName, "hash") == 0 && ! !RelationUsesLocalBuffers(rel)) ereport(WARNING, (errmsg("hash indexes are not WAL-logged and their use is discouraged"))); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hash index on unlogged tables doesn't behave as expected
On Tue, Jul 18, 2017 at 4:18 AM, Amit Kapila wrote: > Thanks. Do you have any suggestion for back-branches? As of now, it > fails badly with below kind of error: > > test=> SELECT * FROM t_u_hash; > ERROR: could not open file "base/16384/16392": No such file or directory > > It is explained in another thread [3] where it has been found that the > reason for such an error is that hash indexes are not WAL logged prior > to 10. Now, we can claim that we don't recommend hash indexes to be > used prior to 10 in production, so such an error is okay even if there > is no crash has happened in the system. There are a couple of approaches: 1) Marking such indexes as invalid at recovery and log information about the switch done. 2) Error at creation of hash indexes on unlogged tables. 3) Leave it as-is, because there is already a WARNING at creation. I don't mind seeing 3) per the amount of work done lately to support WAL on hash indexes. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hash index on unlogged tables doesn't behave as expected
On Mon, Jul 17, 2017 at 10:21 PM, Michael Paquier wrote: > On Mon, Jul 17, 2017 at 6:02 PM, Robert Haas wrote: >> It seems to me that this area might benefit from a broader rethink. >> It's not very nice to impose a restriction that init forks can only be >> constructed using log_newpage(); on the other hand, it is also not >> very nice that Amit's patch needs to recapitulate the mysterious >> incantation used by XLogReadBufferForRedoExtended in several more >> places. The basic problem here is that it would be really easy for >> the next person who adds an index AM to make the exact same mistake >> that Amit made here and that I failed to spot during code review. It >> would be nice to come up with some more generic solution to the >> problem rather than relying on each AM to do insert this >> FlushOneBuffer() incantation in each place where it's needed. I think >> there are probably several ways to do that; one idea might be to flush >> all init-fork buffers in bulk at the end of recovery. > > Things are centralized in XLogReadBufferForRedoExtended() for init > fork flushes, which is not something bad in itself as it is the unique > place doing this work normally for other AMs. And I have to admit that > the current coding for hash index WAL has the advantage to create less > WAL quantity as the bitmap and metadata pages get initialized using > the data of the records themselves. One idea I can think of would be > to create a README in src/backend/access for index AMs that summarizes > a basic set of recommendations for each routine used. > We already have quite a decent amount of information about indexes in our docs [1][2]. We might want to extend that as well. >> However, it doesn't really seem urgent to tinker with that; while I >> think the fact that existing AMs use log_newpage() to populate the >> init fork is mostly just luck, it might well be 10 or 20 years before >> somebody adds another AM that happens to use any other technique. >> Moreover, at the moment, we're trying to get a release out the door, >> and to me that argues for keeping structural changes to a minimum. >> Amit's patch seems like a pretty surgical fix to this problem with >> minimal chance of breaking anything new, and that seems like the right >> kind of fix for July. So I plan to commit what Amit proposed (with a >> few grammar corrections). > Thanks. Do you have any suggestion for back-branches? As of now, it fails badly with below kind of error: test=> SELECT * FROM t_u_hash; ERROR: could not open file "base/16384/16392": No such file or directory It is explained in another thread [3] where it has been found that the reason for such an error is that hash indexes are not WAL logged prior to 10. Now, we can claim that we don't recommend hash indexes to be used prior to 10 in production, so such an error is okay even if there is no crash has happened in the system. [1] - https://www.postgresql.org/docs/devel/static/indexam.html [2] - https://www.postgresql.org/docs/devel/static/index-functions.html [3] - https://www.postgresql.org/message-id/20170630005634.GA4448%40momjian.us -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hash index on unlogged tables doesn't behave as expected
On Mon, Jul 17, 2017 at 6:02 PM, Robert Haas wrote: > It seems to me that this area might benefit from a broader rethink. > It's not very nice to impose a restriction that init forks can only be > constructed using log_newpage(); on the other hand, it is also not > very nice that Amit's patch needs to recapitulate the mysterious > incantation used by XLogReadBufferForRedoExtended in several more > places. The basic problem here is that it would be really easy for > the next person who adds an index AM to make the exact same mistake > that Amit made here and that I failed to spot during code review. It > would be nice to come up with some more generic solution to the > problem rather than relying on each AM to do insert this > FlushOneBuffer() incantation in each place where it's needed. I think > there are probably several ways to do that; one idea might be to flush > all init-fork buffers in bulk at the end of recovery. Things are centralized in XLogReadBufferForRedoExtended() for init fork flushes, which is not something bad in itself as it is the unique place doing this work normally for other AMs. And I have to admit that the current coding for hash index WAL has the advantage to create less WAL quantity as the bitmap and metadata pages get initialized using the data of the records themselves. One idea I can think of would be to create a README in src/backend/access for index AMs that summarizes a basic set of recommendations for each routine used. > However, it doesn't really seem urgent to tinker with that; while I > think the fact that existing AMs use log_newpage() to populate the > init fork is mostly just luck, it might well be 10 or 20 years before > somebody adds another AM that happens to use any other technique. > Moreover, at the moment, we're trying to get a release out the door, > and to me that argues for keeping structural changes to a minimum. > Amit's patch seems like a pretty surgical fix to this problem with > minimal chance of breaking anything new, and that seems like the right > kind of fix for July. So I plan to commit what Amit proposed (with a > few grammar corrections). No objections to that. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hash index on unlogged tables doesn't behave as expected
On Sat, Jul 15, 2017 at 4:25 AM, Michael Paquier wrote: > On Sat, Jul 15, 2017 at 6:27 AM, Ashutosh Sharma > wrote: >> I do agree with Amit. I think hash index is slightly trickier (in >> terms of how it is organised) than other indexes and that could be the >> reason for maintaining common code for hashbuild and hashbuildempty. > > Well, you both and Robert worked more on this code for PG10 than I > did, so I am fine to rely on your judgement for the final result. > Still I find this special handling quite surprising. All other AMs > just always log FPWs for the init fork pages so I'd rather not break > this treaty, but that's one against the world as things stand > currently on this thread ;) It seems to me that this area might benefit from a broader rethink. It's not very nice to impose a restriction that init forks can only be constructed using log_newpage(); on the other hand, it is also not very nice that Amit's patch needs to recapitulate the mysterious incantation used by XLogReadBufferForRedoExtended in several more places. The basic problem here is that it would be really easy for the next person who adds an index AM to make the exact same mistake that Amit made here and that I failed to spot during code review. It would be nice to come up with some more generic solution to the problem rather than relying on each AM to do insert this FlushOneBuffer() incantation in each place where it's needed. I think there are probably several ways to do that; one idea might be to flush all init-fork buffers in bulk at the end of recovery. However, it doesn't really seem urgent to tinker with that; while I think the fact that existing AMs use log_newpage() to populate the init fork is mostly just luck, it might well be 10 or 20 years before somebody adds another AM that happens to use any other technique. Moreover, at the moment, we're trying to get a release out the door, and to me that argues for keeping structural changes to a minimum. Amit's patch seems like a pretty surgical fix to this problem with minimal chance of breaking anything new, and that seems like the right kind of fix for July. So I plan to commit what Amit proposed (with a few grammar corrections). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hash index on unlogged tables doesn't behave as expected
On Sat, Jul 15, 2017 at 6:27 AM, Ashutosh Sharma wrote: > I do agree with Amit. I think hash index is slightly trickier (in > terms of how it is organised) than other indexes and that could be the > reason for maintaining common code for hashbuild and hashbuildempty. Well, you both and Robert worked more on this code for PG10 than I did, so I am fine to rely on your judgement for the final result. Still I find this special handling quite surprising. All other AMs just always log FPWs for the init fork pages so I'd rather not break this treaty, but that's one against the world as things stand currently on this thread ;) -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hash index on unlogged tables doesn't behave as expected
On Sat, Jul 15, 2017 at 8:20 AM, Amit Kapila wrote: > On Fri, Jul 14, 2017 at 6:02 PM, Michael Paquier > wrote: >> (catching up finally with this thread) >> >> On Mon, Jul 10, 2017 at 11:57 AM, Kyotaro HORIGUCHI >> wrote: >>> At Mon, 10 Jul 2017 14:58:13 +0530, Amit Kapila >>> wrote in >>> I am also not 100% comfortable with adding flush at two new places, but that makes the code for fix simpler and fundamentally there is no problem in doing so. >>> >>> I agree that the patch would be simpler. Ok, I am satisfied with >>> an additional comment for _hash_init and hash_xlog_init_meta_page >>> that describes the reason that _hash_init doesn't/can't use >>> log_newpage and thus requires explicit flushes. Something like >>> the description in [1] would be enough. >> >> It seems to me that we should really consider logging a full page of >> the bitmap and meta pages for init forks as this is the common >> practice used by all the other AMs. >> > > I think if we really want to go in that direction then it is better to > write code separately for hashbuildempty, rather than adding special > purpose logic in _hash_init for init forks. As to the comparison > with other indexes, the logic of hash index is slightly tricky (as I > have already explained in email up thread) as compared to other > indexes, so we should not force ourselves to do that way if it can be > integrated with logged index creation path. I am not against doing > that way as it has some merit, but the advantage seems to be thin. > Let's not argue endlessly on this point because it is not that I have > not considered it doing the way you are saying (in fact I have > mentioned that in my first e-mail). Let us wait for an opinion from > others and or from committers. > I do agree with Amit. I think hash index is slightly trickier (in terms of how it is organised) than other indexes and that could be the reason for maintaining common code for hashbuild and hashbuildempty. -- With Regards, Ashutosh Sharma EnterpriseDB:http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hash index on unlogged tables doesn't behave as expected
On Fri, Jul 14, 2017 at 6:02 PM, Michael Paquier wrote: > (catching up finally with this thread) > > On Mon, Jul 10, 2017 at 11:57 AM, Kyotaro HORIGUCHI > wrote: >> At Mon, 10 Jul 2017 14:58:13 +0530, Amit Kapila >> wrote in >>> I am also not 100% comfortable with adding flush at two new places, >>> but that makes the code for fix simpler and fundamentally there is no >>> problem in doing so. >> >> I agree that the patch would be simpler. Ok, I am satisfied with >> an additional comment for _hash_init and hash_xlog_init_meta_page >> that describes the reason that _hash_init doesn't/can't use >> log_newpage and thus requires explicit flushes. Something like >> the description in [1] would be enough. > > It seems to me that we should really consider logging a full page of > the bitmap and meta pages for init forks as this is the common > practice used by all the other AMs. > I think if we really want to go in that direction then it is better to write code separately for hashbuildempty, rather than adding special purpose logic in _hash_init for init forks. As to the comparison with other indexes, the logic of hash index is slightly tricky (as I have already explained in email up thread) as compared to other indexes, so we should not force ourselves to do that way if it can be integrated with logged index creation path. I am not against doing that way as it has some merit, but the advantage seems to be thin. Let's not argue endlessly on this point because it is not that I have not considered it doing the way you are saying (in fact I have mentioned that in my first e-mail). Let us wait for an opinion from others and or from committers. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hash index on unlogged tables doesn't behave as expected
(catching up finally with this thread) On Mon, Jul 10, 2017 at 11:57 AM, Kyotaro HORIGUCHI wrote: > At Mon, 10 Jul 2017 14:58:13 +0530, Amit Kapila > wrote in >> I am also not 100% comfortable with adding flush at two new places, >> but that makes the code for fix simpler and fundamentally there is no >> problem in doing so. > > I agree that the patch would be simpler. Ok, I am satisfied with > an additional comment for _hash_init and hash_xlog_init_meta_page > that describes the reason that _hash_init doesn't/can't use > log_newpage and thus requires explicit flushes. Something like > the description in [1] would be enough. It seems to me that we should really consider logging a full page of the bitmap and meta pages for init forks as this is the common practice used by all the other AMs. I would go as far as spreading a method similar to ginbuildempty() to all the AMs as delayed checkpoints guarantee that buffers are correctly flushed when marked as dirty. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hash index on unlogged tables doesn't behave as expected
At Mon, 10 Jul 2017 18:37:34 +0530, Amit Kapila wrote in > On Mon, Jul 10, 2017 at 3:27 PM, Kyotaro HORIGUCHI > wrote: > > Hi, > > > > At Mon, 10 Jul 2017 14:58:13 +0530, Amit Kapila > > wrote in > > > > > >> I am also not 100% comfortable with adding flush at two new places, > >> but that makes the code for fix simpler and fundamentally there is no > >> problem in doing so. > > > > I agree that the patch would be simpler. Ok, I am satisfied with > > an additional comment for _hash_init and hash_xlog_init_meta_page > > that describes the reason that _hash_init doesn't/can't use > > log_newpage and thus requires explicit flushes. Something like > > the description in [1] would be enough. > > > > I have modified the comment in hash_xlog_init_meta_page and a > corresponding function for bitmap page. However, I think adding > anything about not using log_newpage in _hash_init doesn't sound good > to me. I think you have suggested it so that we don't forget the > reason for not using log_newpage, but I think that is overkill. Let > me know if you have any other concerns? The modified comment looks perfect. Thanks! regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hash index on unlogged tables doesn't behave as expected
On Mon, Jul 10, 2017 at 3:27 PM, Kyotaro HORIGUCHI wrote: > Hi, > > At Mon, 10 Jul 2017 14:58:13 +0530, Amit Kapila > wrote in > >> I am also not 100% comfortable with adding flush at two new places, >> but that makes the code for fix simpler and fundamentally there is no >> problem in doing so. > > I agree that the patch would be simpler. Ok, I am satisfied with > an additional comment for _hash_init and hash_xlog_init_meta_page > that describes the reason that _hash_init doesn't/can't use > log_newpage and thus requires explicit flushes. Something like > the description in [1] would be enough. > I have modified the comment in hash_xlog_init_meta_page and a corresponding function for bitmap page. However, I think adding anything about not using log_newpage in _hash_init doesn't sound good to me. I think you have suggested it so that we don't forget the reason for not using log_newpage, but I think that is overkill. Let me know if you have any other concerns? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com fix_unlogged_hash_index_issue_v3.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hash index on unlogged tables doesn't behave as expected
Hi, At Mon, 10 Jul 2017 14:58:13 +0530, Amit Kapila wrote in > On Mon, Jul 10, 2017 at 10:39 AM, Kyotaro HORIGUCHI > wrote: > > At Sat, 8 Jul 2017 16:41:27 +0530, Amit Kapila > > wrote in > > > >> On Sat, Jul 8, 2017 at 9:08 AM, Robert Haas wrote: > >> > On Fri, Jul 7, 2017 at 11:36 PM, Amit Kapila > >> > wrote: > >> >> I think we should do that as a separate patch (I can write the same as > >> >> well) because that is not new behavior introduced by this patch, ... > >> > > >> > True, although maybe hash indexes are the only way that happens today? > >> > > >> > >> No, I think it will happen for all other indexes as well. Basically, > >> we log new page for init forks of other indexes and then while > >> restoring that full page image, it will fall through the same path. > > > > (Sorry, I didn't noticed that hash metapage is not using log_newpgae) > > > > The bitmappage is also not using log_newpage. > > > For example, (bt|gin|gist|spgist|brin)buildempty are using > > log_newpage to log INIT_FORK metapages. I believe that the xlog > > flow from log_newpage to XLogReadBufferForRedoExtended is > > developed so that pages in init forks are surely flushed during > > recovery, so we should fix it instead adding other flushes if the > > path doesn't work. Am I looking wrong place? (I think it is > > working.) > > > > You are looking at right place. > > > If I'm understanding correctly here, hashbild and hashbuildempty > > should be refactored using the same structure with other *build > > and *buildempty functions. Specifically metapages initialization > > subroutines (_hash_init or _bt_initmetapage or SpGistInitMetapage > > or...) does only on-memory initialization and does not emit WAL, > > then *build and *buildempty emits WAL in their required way. > > > > I have considered this way of doing as well, read my first e-mail [1] > in this thread "Another approach to fix the issue ". It is not > that we can't change the code of hashbuildempty to make it log new > pages for all kind of pages (metapage, bitmappage and datapages), but > I am not sure if there is a value in going in that direction. If we > have to go in that direction, we need to hard code some of the values > like hashm_nmaps and hashm_mapp in metapage rather than initializing > them after bitmappage creation. Before going in that direction, I > think we should also take opinion from other people especially some > committer as we might need to maintain two different ways of doing > almost the same thing. Thanks for the explanation and the pointer (to the start of this thread.. sorry). > I am also not 100% comfortable with adding flush at two new places, > but that makes the code for fix simpler and fundamentally there is no > problem in doing so. I agree that the patch would be simpler. Ok, I am satisfied with an additional comment for _hash_init and hash_xlog_init_meta_page that describes the reason that _hash_init doesn't/can't use log_newpage and thus requires explicit flushes. Something like the description in [1] would be enough. > There is a small downside to always logging new page which is that it > will always log full page image in WAL which has the potential to > increase WAL volume if there are many unlogged tables and indexes. > Now considering the init fork generally has very less pages, it is not > a big concern, but still avoiding full page image has some value. Perhaps more effective mechanism will be developed at the time. > >> >>> By the way the comment of the function ReadBufferWithoutRelcache > >> >>> has the following sentense. > >> >>> > >> >>> | * NB: At present, this function may only be used on permanent > >> >>> relations, which > >> >>> | * is OK, because we only use it during XLOG replay. If in the > >> >>> future we > >> >>> | * want to use it on temporary or unlogged relations, we could pass > >> >>> additional > >> >>> | * parameters. > >> >>> > >> >>> and does > >> >>> > >> >>> | return ReadBuffer_common(smgr, RELPERSISTENCE_PERMANENT, forkNum, > >> >>> blockNum, > >> >>> mode, > >> >>> strategy, &hit); > >> >>> > >> >>> This surely works since BufferAlloc recognizes INIT_FORK. But > >> >>> some adjustment may be needed around here. > >> >> > >> >> Yeah, it should probably mention that the init fork of an unlogged > >> >> relation is also OK. > >> >> > >> > > >> > I think we should do that as a separate patch (I can write the same as > >> > well) because that is not new behavior introduced by this patch, but > >> > let me know if you think that we should add such a comment in this > >> > patch itself. > >> > > >> > >> Attached a separate patch to adjust the comment atop > >> ReadBufferWithoutRelcache. > > > > + * NB: At present, this function may only be used on permanent relations > > and > > + * init fork of an unlogged relation, which is OK, because we only use it > > + * during XLOG replay. If in the future we want t
Re: [HACKERS] hash index on unlogged tables doesn't behave as expected
On Mon, Jul 10, 2017 at 10:39 AM, Kyotaro HORIGUCHI wrote: > At Sat, 8 Jul 2017 16:41:27 +0530, Amit Kapila > wrote in >> On Sat, Jul 8, 2017 at 9:08 AM, Robert Haas wrote: >> > On Fri, Jul 7, 2017 at 11:36 PM, Amit Kapila >> > wrote: >> >> I think we should do that as a separate patch (I can write the same as >> >> well) because that is not new behavior introduced by this patch, ... >> > >> > True, although maybe hash indexes are the only way that happens today? >> > >> >> No, I think it will happen for all other indexes as well. Basically, >> we log new page for init forks of other indexes and then while >> restoring that full page image, it will fall through the same path. > > (Sorry, I didn't noticed that hash metapage is not using log_newpgae) > The bitmappage is also not using log_newpage. > For example, (bt|gin|gist|spgist|brin)buildempty are using > log_newpage to log INIT_FORK metapages. I believe that the xlog > flow from log_newpage to XLogReadBufferForRedoExtended is > developed so that pages in init forks are surely flushed during > recovery, so we should fix it instead adding other flushes if the > path doesn't work. Am I looking wrong place? (I think it is > working.) > You are looking at right place. > If I'm understanding correctly here, hashbild and hashbuildempty > should be refactored using the same structure with other *build > and *buildempty functions. Specifically metapages initialization > subroutines (_hash_init or _bt_initmetapage or SpGistInitMetapage > or...) does only on-memory initialization and does not emit WAL, > then *build and *buildempty emits WAL in their required way. > I have considered this way of doing as well, read my first e-mail [1] in this thread "Another approach to fix the issue ". It is not that we can't change the code of hashbuildempty to make it log new pages for all kind of pages (metapage, bitmappage and datapages), but I am not sure if there is a value in going in that direction. If we have to go in that direction, we need to hard code some of the values like hashm_nmaps and hashm_mapp in metapage rather than initializing them after bitmappage creation. Before going in that direction, I think we should also take opinion from other people especially some committer as we might need to maintain two different ways of doing almost the same thing. I am also not 100% comfortable with adding flush at two new places, but that makes the code for fix simpler and fundamentally there is no problem in doing so. There is a small downside to always logging new page which is that it will always log full page image in WAL which has the potential to increase WAL volume if there are many unlogged tables and indexes. Now considering the init fork generally has very less pages, it is not a big concern, but still avoiding full page image has some value. > >> >>> By the way the comment of the function ReadBufferWithoutRelcache >> >>> has the following sentense. >> >>> >> >>> | * NB: At present, this function may only be used on permanent >> >>> relations, which >> >>> | * is OK, because we only use it during XLOG replay. If in the future >> >>> we >> >>> | * want to use it on temporary or unlogged relations, we could pass >> >>> additional >> >>> | * parameters. >> >>> >> >>> and does >> >>> >> >>> | return ReadBuffer_common(smgr, RELPERSISTENCE_PERMANENT, forkNum, >> >>> blockNum, >> >>> mode, strategy, >> >>> &hit); >> >>> >> >>> This surely works since BufferAlloc recognizes INIT_FORK. But >> >>> some adjustment may be needed around here. >> >> >> >> Yeah, it should probably mention that the init fork of an unlogged >> >> relation is also OK. >> >> >> > >> > I think we should do that as a separate patch (I can write the same as >> > well) because that is not new behavior introduced by this patch, but >> > let me know if you think that we should add such a comment in this >> > patch itself. >> > >> >> Attached a separate patch to adjust the comment atop >> ReadBufferWithoutRelcache. > > + * NB: At present, this function may only be used on permanent relations and > + * init fork of an unlogged relation, which is OK, because we only use it > + * during XLOG replay. If in the future we want to use it on temporary or > + * unlogged relations, we could pass additional parameters. > > *I* think the target of the funcion is permanent relations and > init forks, not unlogged relations. And I'd like to have an > additional comment about RELPERSISTENCE_PERMANENT. Something like > the attached. > Okay, let's leave this for committer to decide. [1] - https://www.postgresql.org/message-id/CAA4eK1JpcMsEtOL_J7WODumeEfyrPi7FPYHeVdS7fyyrCrgp4w%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hash index on unlogged tables doesn't behave as expected
At Sat, 8 Jul 2017 16:41:27 +0530, Amit Kapila wrote in > On Sat, Jul 8, 2017 at 9:06 AM, Amit Kapila wrote: > > On Sat, Jul 8, 2017 at 8:52 AM, Robert Haas wrote: > >> On Thu, Jul 6, 2017 at 5:54 AM, Kyotaro HORIGUCHI > >> wrote: > >>> Check for INIT_FORKNUM appears both accompanied and not > >>> accompanied by check for RELPER.._UNLOGGED, so I'm not sure which > >>> is the convention here. > >> > >> Checking only for INIT_FORKNUM seems logical to me. Also checking for > >> RELPERSISTENCE_UNLOGGED just makes the code longer to no benefit. I > >> guess Amit copied the test from ATExecSetTablespace, which does it as > >> he did, but it seems unnecessarily long-winded. > >> > > > > Okay. If you and Michael feel the check that way is better, I will > > change and submit the patch. > > > > Changed as per suggestion. > On Sat, Jul 8, 2017 at 9:08 AM, Robert Haas wrote: > > On Fri, Jul 7, 2017 at 11:36 PM, Amit Kapila > > wrote: > >> I think we should do that as a separate patch (I can write the same as > >> well) because that is not new behavior introduced by this patch, ... > > > > True, although maybe hash indexes are the only way that happens today? > > > > No, I think it will happen for all other indexes as well. Basically, > we log new page for init forks of other indexes and then while > restoring that full page image, it will fall through the same path. (Sorry, I didn't noticed that hash metapage is not using log_newpgae) For example, (bt|gin|gist|spgist|brin)buildempty are using log_newpage to log INIT_FORK metapages. I believe that the xlog flow from log_newpage to XLogReadBufferForRedoExtended is developed so that pages in init forks are surely flushed during recovery, so we should fix it instead adding other flushes if the path doesn't work. Am I looking wrong place? (I think it is working.) If I'm understanding correctly here, hashbild and hashbuildempty should be refactored using the same structure with other *build and *buildempty functions. Specifically metapages initialization subroutines (_hash_init or _bt_initmetapage or SpGistInitMetapage or...) does only on-memory initialization and does not emit WAL, then *build and *buildempty emits WAL in their required way. > >>> By the way the comment of the function ReadBufferWithoutRelcache > >>> has the following sentense. > >>> > >>> | * NB: At present, this function may only be used on permanent > >>> relations, which > >>> | * is OK, because we only use it during XLOG replay. If in the future we > >>> | * want to use it on temporary or unlogged relations, we could pass > >>> additional > >>> | * parameters. > >>> > >>> and does > >>> > >>> | return ReadBuffer_common(smgr, RELPERSISTENCE_PERMANENT, forkNum, > >>> blockNum, > >>> mode, strategy, > >>> &hit); > >>> > >>> This surely works since BufferAlloc recognizes INIT_FORK. But > >>> some adjustment may be needed around here. > >> > >> Yeah, it should probably mention that the init fork of an unlogged > >> relation is also OK. > >> > > > > I think we should do that as a separate patch (I can write the same as > > well) because that is not new behavior introduced by this patch, but > > let me know if you think that we should add such a comment in this > > patch itself. > > > > Attached a separate patch to adjust the comment atop > ReadBufferWithoutRelcache. + * NB: At present, this function may only be used on permanent relations and + * init fork of an unlogged relation, which is OK, because we only use it + * during XLOG replay. If in the future we want to use it on temporary or + * unlogged relations, we could pass additional parameters. *I* think the target of the funcion is permanent relations and init forks, not unlogged relations. And I'd like to have an additional comment about RELPERSISTENCE_PERMANENT. Something like the attached. reagards, -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c index 15795b0..cc3980c 100644 --- a/src/backend/storage/buffer/bufmgr.c +++ b/src/backend/storage/buffer/bufmgr.c @@ -673,10 +673,10 @@ ReadBufferExtended(Relation reln, ForkNumber forkNum, BlockNumber blockNum, * ReadBufferWithoutRelcache -- like ReadBufferExtended, but doesn't require * a relcache entry for the relation. * - * NB: At present, this function may only be used on permanent relations, which - * is OK, because we only use it during XLOG replay. If in the future we - * want to use it on temporary or unlogged relations, we could pass additional - * parameters. + * NB: At present, this function may only be used on main fork pages of + * permanent relations and init fork pages, which is OK, because we only use + * it during XLOG replay. If in the future we want to use it on temporary or + * unlogged relations, we could pass additional parameters. */ Buffer ReadBufferWithoutRelcache(RelFileNode rnode,
Re: [HACKERS] hash index on unlogged tables doesn't behave as expected
On Sat, Jul 8, 2017 at 9:06 AM, Amit Kapila wrote: > On Sat, Jul 8, 2017 at 8:52 AM, Robert Haas wrote: >> On Thu, Jul 6, 2017 at 5:54 AM, Kyotaro HORIGUCHI >> wrote: >>> Check for INIT_FORKNUM appears both accompanied and not >>> accompanied by check for RELPER.._UNLOGGED, so I'm not sure which >>> is the convention here. >> >> Checking only for INIT_FORKNUM seems logical to me. Also checking for >> RELPERSISTENCE_UNLOGGED just makes the code longer to no benefit. I >> guess Amit copied the test from ATExecSetTablespace, which does it as >> he did, but it seems unnecessarily long-winded. >> > > Okay. If you and Michael feel the check that way is better, I will > change and submit the patch. > Changed as per suggestion. >>> By the way the comment of the function ReadBufferWithoutRelcache >>> has the following sentense. >>> >>> | * NB: At present, this function may only be used on permanent relations, >>> which >>> | * is OK, because we only use it during XLOG replay. If in the future we >>> | * want to use it on temporary or unlogged relations, we could pass >>> additional >>> | * parameters. >>> >>> and does >>> >>> | return ReadBuffer_common(smgr, RELPERSISTENCE_PERMANENT, forkNum, >>> blockNum, >>> mode, strategy, >>> &hit); >>> >>> This surely works since BufferAlloc recognizes INIT_FORK. But >>> some adjustment may be needed around here. >> >> Yeah, it should probably mention that the init fork of an unlogged >> relation is also OK. >> > > I think we should do that as a separate patch (I can write the same as > well) because that is not new behavior introduced by this patch, but > let me know if you think that we should add such a comment in this > patch itself. > Attached a separate patch to adjust the comment atop ReadBufferWithoutRelcache. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com fix_unlogged_hash_index_issue_v2.patch Description: Binary data fix_comment_atop_ReadBufferWithoutRelcache_v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hash index on unlogged tables doesn't behave as expected
On Sat, Jul 8, 2017 at 9:08 AM, Robert Haas wrote: > On Fri, Jul 7, 2017 at 11:36 PM, Amit Kapila wrote: >> I think we should do that as a separate patch (I can write the same as >> well) because that is not new behavior introduced by this patch, ... > > True, although maybe hash indexes are the only way that happens today? > No, I think it will happen for all other indexes as well. Basically, we log new page for init forks of other indexes and then while restoring that full page image, it will fall through the same path. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hash index on unlogged tables doesn't behave as expected
On Fri, Jul 7, 2017 at 11:36 PM, Amit Kapila wrote: > I think we should do that as a separate patch (I can write the same as > well) because that is not new behavior introduced by this patch, ... True, although maybe hash indexes are the only way that happens today? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hash index on unlogged tables doesn't behave as expected
On Sat, Jul 8, 2017 at 8:52 AM, Robert Haas wrote: > On Thu, Jul 6, 2017 at 5:54 AM, Kyotaro HORIGUCHI > wrote: >> Check for INIT_FORKNUM appears both accompanied and not >> accompanied by check for RELPER.._UNLOGGED, so I'm not sure which >> is the convention here. > > Checking only for INIT_FORKNUM seems logical to me. Also checking for > RELPERSISTENCE_UNLOGGED just makes the code longer to no benefit. I > guess Amit copied the test from ATExecSetTablespace, which does it as > he did, but it seems unnecessarily long-winded. > Okay. If you and Michael feel the check that way is better, I will change and submit the patch. >> The difference of the two is an init fork of TEMP >> relations. However I believe that init fork is by definition a >> part of an unlogged relation, it seems to me that it ought not to >> be wal-logged if we had it. From this viewpoint, the middle line >> makes sense. > > Actually, the init fork of an unlogged relation *must* be WAL-logged. > All other forks are removed on a crash (and the main fork is copied > anew from the init fork). But the init fork itself must survive - > therefore it, and only it, must be WAL-logged and thus durable. > >> By the way the comment of the function ReadBufferWithoutRelcache >> has the following sentense. >> >> | * NB: At present, this function may only be used on permanent relations, >> which >> | * is OK, because we only use it during XLOG replay. If in the future we >> | * want to use it on temporary or unlogged relations, we could pass >> additional >> | * parameters. >> >> and does >> >> | return ReadBuffer_common(smgr, RELPERSISTENCE_PERMANENT, forkNum, blockNum, >> mode, strategy, >> &hit); >> >> This surely works since BufferAlloc recognizes INIT_FORK. But >> some adjustment may be needed around here. > > Yeah, it should probably mention that the init fork of an unlogged > relation is also OK. > I think we should do that as a separate patch (I can write the same as well) because that is not new behavior introduced by this patch, but let me know if you think that we should add such a comment in this patch itself. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hash index on unlogged tables doesn't behave as expected
On Thu, Jul 6, 2017 at 5:54 AM, Kyotaro HORIGUCHI wrote: > Check for INIT_FORKNUM appears both accompanied and not > accompanied by check for RELPER.._UNLOGGED, so I'm not sure which > is the convention here. Checking only for INIT_FORKNUM seems logical to me. Also checking for RELPERSISTENCE_UNLOGGED just makes the code longer to no benefit. I guess Amit copied the test from ATExecSetTablespace, which does it as he did, but it seems unnecessarily long-winded. > The difference of the two is an init fork of TEMP > relations. However I believe that init fork is by definition a > part of an unlogged relation, it seems to me that it ought not to > be wal-logged if we had it. From this viewpoint, the middle line > makes sense. Actually, the init fork of an unlogged relation *must* be WAL-logged. All other forks are removed on a crash (and the main fork is copied anew from the init fork). But the init fork itself must survive - therefore it, and only it, must be WAL-logged and thus durable. > By the way the comment of the function ReadBufferWithoutRelcache > has the following sentense. > > | * NB: At present, this function may only be used on permanent relations, > which > | * is OK, because we only use it during XLOG replay. If in the future we > | * want to use it on temporary or unlogged relations, we could pass > additional > | * parameters. > > and does > > | return ReadBuffer_common(smgr, RELPERSISTENCE_PERMANENT, forkNum, blockNum, > mode, strategy, > &hit); > > This surely works since BufferAlloc recognizes INIT_FORK. But > some adjustment may be needed around here. Yeah, it should probably mention that the init fork of an unlogged relation is also OK. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hash index on unlogged tables doesn't behave as expected
On Wed, Jul 5, 2017 at 11:02 PM, Noah Misch wrote: > [Action required within three days. This is a generic notification.] > > The above-described topic is currently a PostgreSQL 10 open item. Robert, > since you committed the patch believed to have created it, you own this open > item. If some other commit is more relevant or if this does not belong as a > v10 open item, please let us know. Otherwise, please observe the policy on > open item ownership[1] and send a status update within three calendar days of > this message. Include a date for your subsequent status update. Testers may > discover new open items at any time, and I want to plan to get them all fixed > well in advance of shipping v10. Consequently, I will appreciate your efforts > toward speedy resolution. Thanks. I don't intend to rush this in before the beta2 wrap, although some other committer is welcome to do so if they feel confident in the fix. I will update again by July 17th. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hash index on unlogged tables doesn't behave as expected
On Thu, Jul 6, 2017 at 3:24 PM, Kyotaro HORIGUCHI wrote: > Hello, > > At Tue, 4 Jul 2017 14:49:26 +0530, Amit Kapila > wrote in >> On Tue, Jul 4, 2017 at 1:23 PM, Michael Paquier >> wrote: >> > On Mon, Jul 3, 2017 at 7:40 PM, Amit Kapila >> > wrote: >> > >> > It seems to me that this qualifies as an open item for 10. WAL-logging >> > is new for hash tables. Amit, could you add one? >> > >> >> Added. >> >> > + use_wal = RelationNeedsWAL(rel) || >> > + (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED && >> > + forkNum == INIT_FORKNUM); >> > In access AMs, empty() routines are always only called for unlogged >> > relations, so you could relax that to check for INIT_FORKNUM only. >> > >> >> Yeah, but _hash_init() is an exposed API, so I am not sure it is a >> good idea to make such an assumption at that level of code. Now, as >> there is no current user which wants to use it any other way, we can >> make such an assumption, but does it serve any purpose? > > Check for INIT_FORKNUM appears both accompanied and not > accompanied by check for RELPER.._UNLOGGED, so I'm not sure which > is the convention here. > > The difference of the two is an init fork of TEMP > relations. However I believe that init fork is by definition a > part of an unlogged relation, it seems to me that it ought not to > be wal-logged if we had it. From this viewpoint, the middle line > makes sense. > What is the middle line? Are you okay with a current check or do you see any problem with it or do you prefer to write it differently? > > >> > Looking at the patch, I think that you are right to do the logging >> > directly in _hash_init() and not separately in hashbuildempty(). >> > Compared to other relations the init data is more dynamic. >> > >> > + /* >> > +* Force the on-disk state of init forks to always be in sync with >> > the >> > +* state in shared buffers. See XLogReadBufferForRedoExtended. >> > +*/ >> > + XLogRecGetBlockTag(record, 0, NULL, &forknum, NULL); >> > + if (forknum == INIT_FORKNUM) >> > + FlushOneBuffer(metabuf); >> > I find those forced syncs a bit surprising. The buffer is already >> > marked as dirty, so even if there is a concurrent checkpoint they >> > would be flushed. >> > >> >> What if there is no concurrent checkpoint activity and the server is >> shutdown? Remember this replay happens on standby and we will just >> try to perform Restartpoint and there is no guarantee that it will >> flush the buffers. If the buffers are not flushed at shutdown, then >> the Init fork will be empty on restart and the hash index will be >> corrupt. > > XLogReadBufferForRedoExtended() automatically force the flush for > a XLOG record on init fork that having FPW imaeges. And > _hash_init seems to have emitted such a XLOG record using > log_newpage. > Sure, but log_newpage is not used for metapage and bitmappage. >> > If recovery comes again here, then they would just >> > be replayed. I am unclear why XLogReadBufferForRedoExtended is not >> > enough to replay a FPW of that. >> > >> >> Where does FPW come into the picture for a replay of metapage or bitmappage? > > Since required FPW seems to be emitted and the flush should be > done automatically, the issuer side (_hash_init) may attach wrong > FPW images if hash_xlog_init_meta_page requires additional > flushes. But I don't see such a flaw there. I think Michael wants > to say such a kind of thing. > I am not able to understand what you want to say, but I think you don't see any problem with the patch as such. > > > By the way the comment of the function ReadBufferWithoutRelcache > has the following sentense. > I don't think we need anything with respect to this patch because ReadBufferWithoutRelcache is used in case of FPW replay of INIT_FORK pages as well. Do you have something else in mind which I am not able to see? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hash index on unlogged tables doesn't behave as expected
FWIW.. At Thu, 06 Jul 2017 18:54:47 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI wrote in <20170706.185447.256482539.horiguchi.kyot...@lab.ntt.co.jp> > > > + /* > > > +* Force the on-disk state of init forks to always be in sync > > > with the > > > +* state in shared buffers. See XLogReadBufferForRedoExtended. > > > +*/ > > > + XLogRecGetBlockTag(record, 0, NULL, &forknum, NULL); > > > + if (forknum == INIT_FORKNUM) > > > + FlushOneBuffer(metabuf); > > > I find those forced syncs a bit surprising. The buffer is already > > > marked as dirty, so even if there is a concurrent checkpoint they > > > would be flushed. > > > > > > > What if there is no concurrent checkpoint activity and the server is > > shutdown? Remember this replay happens on standby and we will just > > try to perform Restartpoint and there is no guarantee that it will > > flush the buffers. If the buffers are not flushed at shutdown, then > > the Init fork will be empty on restart and the hash index will be > > corrupt. > > XLogReadBufferForRedoExtended() automatically force the flush for > a XLOG record on init fork that having FPW imaeges. And > _hash_init seems to have emitted such a XLOG record using > log_newpage. > > > > If recovery comes again here, then they would just > > > be replayed. I am unclear why XLogReadBufferForRedoExtended is not > > > enough to replay a FPW of that. > > > > > > > Where does FPW come into the picture for a replay of metapage or bitmappage? > > Since required FPW seems to be emitted and the flush should be > done automatically, the issuer side (_hash_init) may attach wrong "may attach" -> "may have attached" > FPW images if hash_xlog_init_meta_page requires additional > flushes. But I don't see such a flaw there. I think Michael wants > to say such a kind of thing. regards. -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hash index on unlogged tables doesn't behave as expected
Hello, At Tue, 4 Jul 2017 14:49:26 +0530, Amit Kapila wrote in > On Tue, Jul 4, 2017 at 1:23 PM, Michael Paquier > wrote: > > On Mon, Jul 3, 2017 at 7:40 PM, Amit Kapila wrote: > > > > It seems to me that this qualifies as an open item for 10. WAL-logging > > is new for hash tables. Amit, could you add one? > > > > Added. > > > + use_wal = RelationNeedsWAL(rel) || > > + (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED && > > + forkNum == INIT_FORKNUM); > > In access AMs, empty() routines are always only called for unlogged > > relations, so you could relax that to check for INIT_FORKNUM only. > > > > Yeah, but _hash_init() is an exposed API, so I am not sure it is a > good idea to make such an assumption at that level of code. Now, as > there is no current user which wants to use it any other way, we can > make such an assumption, but does it serve any purpose? Check for INIT_FORKNUM appears both accompanied and not accompanied by check for RELPER.._UNLOGGED, so I'm not sure which is the convention here. The difference of the two is an init fork of TEMP relations. However I believe that init fork is by definition a part of an unlogged relation, it seems to me that it ought not to be wal-logged if we had it. From this viewpoint, the middle line makes sense. > > Looking at the patch, I think that you are right to do the logging > > directly in _hash_init() and not separately in hashbuildempty(). > > Compared to other relations the init data is more dynamic. > > > > + /* > > +* Force the on-disk state of init forks to always be in sync with > > the > > +* state in shared buffers. See XLogReadBufferForRedoExtended. > > +*/ > > + XLogRecGetBlockTag(record, 0, NULL, &forknum, NULL); > > + if (forknum == INIT_FORKNUM) > > + FlushOneBuffer(metabuf); > > I find those forced syncs a bit surprising. The buffer is already > > marked as dirty, so even if there is a concurrent checkpoint they > > would be flushed. > > > > What if there is no concurrent checkpoint activity and the server is > shutdown? Remember this replay happens on standby and we will just > try to perform Restartpoint and there is no guarantee that it will > flush the buffers. If the buffers are not flushed at shutdown, then > the Init fork will be empty on restart and the hash index will be > corrupt. XLogReadBufferForRedoExtended() automatically force the flush for a XLOG record on init fork that having FPW imaeges. And _hash_init seems to have emitted such a XLOG record using log_newpage. > > If recovery comes again here, then they would just > > be replayed. I am unclear why XLogReadBufferForRedoExtended is not > > enough to replay a FPW of that. > > > > Where does FPW come into the picture for a replay of metapage or bitmappage? Since required FPW seems to be emitted and the flush should be done automatically, the issuer side (_hash_init) may attach wrong FPW images if hash_xlog_init_meta_page requires additional flushes. But I don't see such a flaw there. I think Michael wants to say such a kind of thing. By the way the comment of the function ReadBufferWithoutRelcache has the following sentense. | * NB: At present, this function may only be used on permanent relations, which | * is OK, because we only use it during XLOG replay. If in the future we | * want to use it on temporary or unlogged relations, we could pass additional | * parameters. and does | return ReadBuffer_common(smgr, RELPERSISTENCE_PERMANENT, forkNum, blockNum, mode, strategy, &hit); This surely works since BufferAlloc recognizes INIT_FORK. But some adjustment may be needed around here. regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hash index on unlogged tables doesn't behave as expected
On Mon, Jul 03, 2017 at 09:12:01AM +0530, Amit Kapila wrote: > While discussing the behavior of hash indexes with Bruce in the nearby > thread [1], it has been noticed that hash index on unlogged tables > doesn't behave as expected. Prior to 10, it has the different set of > problems (mainly because hash indexes are not WAL-logged) which were > discussed on that thread [1], however when I checked, it doesn't work > even for 10. Below are steps to reproduce the problem. > > 1. Setup master and standby > 2. On the master, create unlogged table and hash index. >2A. Create unlogged table t1(c1 int); >2B. Create hash index idx_t1_hash on t1 using hash(c1); > 3. On Standby, try selecting data, >select * from t1; >ERROR: cannot access temporary or unlogged relations during recovery > ---Till here everything works as expected > 4. Promote standby to master (I have just stopped the standby and > master and removed recovery.conf file from the standby database > location) and try starting the new master, it > gives below error and doesn't get started. > FATAL: could not create file "base/12700/16387": File exists > > The basic issue was that the WAL logging for Create Index operation > was oblivion of the fact that for unlogged tables only INIT forks need > to be logged. Another point which we need to consider is that while > replaying the WAL for the create index operation, we need to flush the > buffer if it is for init fork. This needs to be done only for pages > that can be part of init fork file (like metapage, bitmappage). > Attached patch fixes the issue. > > Another approach to fix the issue could be that always log complete > pages for the create index operation on unlogged tables (in > hashbuildempty). However, the logic for initial hash index pages > needs us to perform certain other actions (like update metapage after > the creation of bitmappage) which make it difficult to follow that > approach. > > I think this should be considered a PostgreSQL 10 open item. > > > [1] - https://www.postgresql.org/message-id/20170630005634.GA4448%40momjian.us [Action required within three days. This is a generic notification.] The above-described topic is currently a PostgreSQL 10 open item. Robert, since you committed the patch believed to have created it, you own this open item. If some other commit is more relevant or if this does not belong as a v10 open item, please let us know. Otherwise, please observe the policy on open item ownership[1] and send a status update within three calendar days of this message. Include a date for your subsequent status update. Testers may discover new open items at any time, and I want to plan to get them all fixed well in advance of shipping v10. Consequently, I will appreciate your efforts toward speedy resolution. Thanks. [1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hash index on unlogged tables doesn't behave as expected
On Tue, Jul 4, 2017 at 1:23 PM, Michael Paquier wrote: > On Mon, Jul 3, 2017 at 7:40 PM, Amit Kapila wrote: > > It seems to me that this qualifies as an open item for 10. WAL-logging > is new for hash tables. Amit, could you add one? > Added. > + use_wal = RelationNeedsWAL(rel) || > + (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED && > + forkNum == INIT_FORKNUM); > In access AMs, empty() routines are always only called for unlogged > relations, so you could relax that to check for INIT_FORKNUM only. > Yeah, but _hash_init() is an exposed API, so I am not sure it is a good idea to make such an assumption at that level of code. Now, as there is no current user which wants to use it any other way, we can make such an assumption, but does it serve any purpose? > Looking at the patch, I think that you are right to do the logging > directly in _hash_init() and not separately in hashbuildempty(). > Compared to other relations the init data is more dynamic. > > + /* > +* Force the on-disk state of init forks to always be in sync with the > +* state in shared buffers. See XLogReadBufferForRedoExtended. > +*/ > + XLogRecGetBlockTag(record, 0, NULL, &forknum, NULL); > + if (forknum == INIT_FORKNUM) > + FlushOneBuffer(metabuf); > I find those forced syncs a bit surprising. The buffer is already > marked as dirty, so even if there is a concurrent checkpoint they > would be flushed. > What if there is no concurrent checkpoint activity and the server is shutdown? Remember this replay happens on standby and we will just try to perform Restartpoint and there is no guarantee that it will flush the buffers. If the buffers are not flushed at shutdown, then the Init fork will be empty on restart and the hash index will be corrupt. > If recovery comes again here, then they would just > be replayed. I am unclear why XLogReadBufferForRedoExtended is not > enough to replay a FPW of that. > Where does FPW come into the picture for a replay of metapage or bitmappage? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hash index on unlogged tables doesn't behave as expected
On Mon, Jul 3, 2017 at 7:40 PM, Amit Kapila wrote: > There is already such a test, see create_index.sql. > CREATE UNLOGGED TABLE unlogged_hash_table (id int4); > CREATE INDEX unlogged_hash_index ON unlogged_hash_table USING hash (id > int4_ops); > > Do you have something else in mind? > > I think the problem mentioned in this thread can be caught only if we > promote the standby and restart it. We might be able to write it > using TAP framework, but I think such a test should exist for other > index types as well or in general for unlogged tables. I am not > opposed to writing such a test if you and or others feel so. There have been many requests for something like that. This overlaps a lot with the existing make check though, so a buildfarm module using wal_consistency_checking may be a better idea than something in core. >> Apart from that i have tested the patch and the patch seems to be working >> fine. > > Thanks. It seems to me that this qualifies as an open item for 10. WAL-logging is new for hash tables. Amit, could you add one? + use_wal = RelationNeedsWAL(rel) || + (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED && + forkNum == INIT_FORKNUM); In access AMs, empty() routines are always only called for unlogged relations, so you could relax that to check for INIT_FORKNUM only. Looking at the patch, I think that you are right to do the logging directly in _hash_init() and not separately in hashbuildempty(). Compared to other relations the init data is more dynamic. + /* +* Force the on-disk state of init forks to always be in sync with the +* state in shared buffers. See XLogReadBufferForRedoExtended. +*/ + XLogRecGetBlockTag(record, 0, NULL, &forknum, NULL); + if (forknum == INIT_FORKNUM) + FlushOneBuffer(metabuf); I find those forced syncs a bit surprising. The buffer is already marked as dirty, so even if there is a concurrent checkpoint they would be flushed. If recovery comes again here, then they would just be replayed. I am unclear why XLogReadBufferForRedoExtended is not enough to replay a FPW of that. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hash index on unlogged tables doesn't behave as expected
Hi, On Mon, Jul 3, 2017 at 4:10 PM, Amit Kapila wrote: > > On Mon, Jul 3, 2017 at 3:24 PM, Ashutosh Sharma wrote: > > On Mon, Jul 3, 2017 at 9:12 AM, Amit Kapila wrote: > >> The basic issue was that the WAL logging for Create Index operation > >> was oblivion of the fact that for unlogged tables only INIT forks need > >> to be logged. Another point which we need to consider is that while > >> replaying the WAL for the create index operation, we need to flush the > >> buffer if it is for init fork. This needs to be done only for pages > >> that can be part of init fork file (like metapage, bitmappage). > >> Attached patch fixes the issue. > >> > >> Another approach to fix the issue could be that always log complete > >> pages for the create index operation on unlogged tables (in > >> hashbuildempty). However, the logic for initial hash index pages > >> needs us to perform certain other actions (like update metapage after > >> the creation of bitmappage) which make it difficult to follow that > >> approach. > >> > > > > Thanks for sharing the steps to reproduce the issue in detail. I could > > easily reproduce this issue. I also had a quick look into the patch and the > > fix looks fine to me. However, it would be good if we can add at least one > > test for unlogged table with hash index in the regression test suite. > > > > There is already such a test, see create_index.sql. > CREATE UNLOGGED TABLE unlogged_hash_table (id int4); > CREATE INDEX unlogged_hash_index ON unlogged_hash_table USING hash (id > int4_ops); > > Do you have something else in mind? Yes, that's what i had in my mind. But, good to know that we already have a test-case with hash index created on an unlogged table. > > > I think the problem mentioned in this thread can be caught only if we > promote the standby and restart it. We might be able to write it > using TAP framework, but I think such a test should exist for other > index types as well or in general for unlogged tables. I am not > opposed to writing such a test if you and or others feel so. I think, it would be a good thing to add such test cases using TAP framework but as you said, it would be good to take others opinion as well on this. Thanks. > > > Apart from that i have tested the patch and the patch seems to be working > > fine. > > Thanks. > > -- > With Regards, > Amit Kapila. > EnterpriseDB: http://www.enterprisedb.com -- With Regards, Ashutosh Sharma EnterpriseDB:http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hash index on unlogged tables doesn't behave as expected
On Mon, Jul 3, 2017 at 3:24 PM, Ashutosh Sharma wrote: > On Mon, Jul 3, 2017 at 9:12 AM, Amit Kapila wrote: >> The basic issue was that the WAL logging for Create Index operation >> was oblivion of the fact that for unlogged tables only INIT forks need >> to be logged. Another point which we need to consider is that while >> replaying the WAL for the create index operation, we need to flush the >> buffer if it is for init fork. This needs to be done only for pages >> that can be part of init fork file (like metapage, bitmappage). >> Attached patch fixes the issue. >> >> Another approach to fix the issue could be that always log complete >> pages for the create index operation on unlogged tables (in >> hashbuildempty). However, the logic for initial hash index pages >> needs us to perform certain other actions (like update metapage after >> the creation of bitmappage) which make it difficult to follow that >> approach. >> > > Thanks for sharing the steps to reproduce the issue in detail. I could > easily reproduce this issue. I also had a quick look into the patch and the > fix looks fine to me. However, it would be good if we can add at least one > test for unlogged table with hash index in the regression test suite. > There is already such a test, see create_index.sql. CREATE UNLOGGED TABLE unlogged_hash_table (id int4); CREATE INDEX unlogged_hash_index ON unlogged_hash_table USING hash (id int4_ops); Do you have something else in mind? I think the problem mentioned in this thread can be caught only if we promote the standby and restart it. We might be able to write it using TAP framework, but I think such a test should exist for other index types as well or in general for unlogged tables. I am not opposed to writing such a test if you and or others feel so. > Apart from that i have tested the patch and the patch seems to be working > fine. Thanks. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hash index on unlogged tables doesn't behave as expected
Hi, On Mon, Jul 3, 2017 at 9:12 AM, Amit Kapila wrote: > While discussing the behavior of hash indexes with Bruce in the nearby > thread [1], it has been noticed that hash index on unlogged tables > doesn't behave as expected. Prior to 10, it has the different set of > problems (mainly because hash indexes are not WAL-logged) which were > discussed on that thread [1], however when I checked, it doesn't work > even for 10. Below are steps to reproduce the problem. > > 1. Setup master and standby > 2. On the master, create unlogged table and hash index. >2A. Create unlogged table t1(c1 int); >2B. Create hash index idx_t1_hash on t1 using hash(c1); > 3. On Standby, try selecting data, >select * from t1; >ERROR: cannot access temporary or unlogged relations during recovery > ---Till here everything works as expected > 4. Promote standby to master (I have just stopped the standby and > master and removed recovery.conf file from the standby database > location) and try starting the new master, it > gives below error and doesn't get started. > FATAL: could not create file "base/12700/16387": File exists > > The basic issue was that the WAL logging for Create Index operation > was oblivion of the fact that for unlogged tables only INIT forks need > to be logged. Another point which we need to consider is that while > replaying the WAL for the create index operation, we need to flush the > buffer if it is for init fork. This needs to be done only for pages > that can be part of init fork file (like metapage, bitmappage). > Attached patch fixes the issue. > > Another approach to fix the issue could be that always log complete > pages for the create index operation on unlogged tables (in > hashbuildempty). However, the logic for initial hash index pages > needs us to perform certain other actions (like update metapage after > the creation of bitmappage) which make it difficult to follow that > approach. > Thanks for sharing the steps to reproduce the issue in detail. I could easily reproduce this issue. I also had a quick look into the patch and the fix looks fine to me. However, it would be good if we can add at least one test for unlogged table with hash index in the regression test suite. Apart from that i have tested the patch and the patch seems to be working fine. Here are the steps that i have followed to verify the fix, With Patch: postgres=# SELECT pg_relation_filepath('t1'); pg_relation_filepath -- base/14915/16384 (1 row) postgres=# SELECT pg_relation_filepath('idx_t1_hash'); pg_relation_filepath -- base/14915/16387 (1 row) Master: = [ashu@localhost bin]$ ls -l ../master/base/14915/1638* -rw---. 1 ashu ashu 18128896 Jul 3 14:29 ../master/base/14915/16384 -rw---. 1 ashu ashu24576 Jul 3 14:29 ../master/base/14915/16384_fsm -rw---. 1 ashu ashu0 Jul 3 14:28 ../master/base/14915/16384_init -rw---. 1 ashu ashu 22339584 Jul 3 14:29 ../master/base/14915/16387 -rw---. 1 ashu ashu32768 Jul 3 14:28 ../master/base/14915/16387_init Standby: == [ashu@localhost bin]$ ls -l ../standby/base/14915/1638* -rw---. 1 ashu ashu 0 Jul 3 14:28 ../standby/base/14915/16384_init -rw---. 1 ashu ashu 32768 Jul 3 14:28 ../standby/base/14915/16387_init Without patch: == postgres=# SELECT pg_relation_filepath('t1'); pg_relation_filepath -- base/14915/16384 (1 row) postgres=# SELECT pg_relation_filepath('idx_t1_hash'); pg_relation_filepath -- base/14915/16387 (1 row) Master: = [ashu@localhost bin]$ ls -l ../master/base/14915/1638* -rw---. 1 ashu ashu 18128896 Jul 3 14:36 ../master/base/14915/16384 -rw---. 1 ashu ashu24576 Jul 3 14:36 ../master/base/14915/16384_fsm -rw---. 1 ashu ashu0 Jul 3 14:35 ../master/base/14915/16384_init -rw---. 1 ashu ashu 22339584 Jul 3 14:36 ../master/base/14915/16387 -rw---. 1 ashu ashu32768 Jul 3 14:35 ../master/base/14915/16387_init Standby: == [ashu@localhost bin]$ ls -l ../standby/base/14915/1638* -rw---. 1 ashu ashu 0 Jul 3 14:35 ../standby/base/14915/16384_init *-rw---. 1 ashu ashu 73728 Jul 3 14:35 ../standby/base/14915/16387* -rw---. 1 ashu ashu 24576 Jul 3 14:35 ../standby/base/14915/16387_init As seen from the test results above, it is clear that without patch, both main fork and init fork were getting WAL logged as the create hash index WAL logging was being done without checking forkNum type. _hash_init() { log_newpage(&rel->rd_node, forkNum, blkno, BufferGetPage(buf), true); _hash_relbuf(rel, buf); } I have also ran the WAL consistency check tool to verify the things and didn't find any issues. Thanks. -- With Regards, Ashutosh Sharma EnterpriseDB:http://w