On Thu, Mar 16, 2017 at 1:02 PM, Ashutosh Sharma <ashu.coe...@gmail.com> wrote:
> On Thu, Mar 16, 2017 at 11:11 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
>> On Wed, Mar 15, 2017 at 9:23 PM, Ashutosh Sharma <ashu.coe...@gmail.com>
>> wrote:
>>>
>>>>
>>>> Few other comments:
>>>> 1.
>>>> + if (ndeletable > 0)
>>>> + {
>>>> +     /* No ereport(ERROR) until changes are logged */
>>>> +     START_CRIT_SECTION();
>>>> +
>>>> +     PageIndexMultiDelete(page, deletable, ndeletable);
>>>> +
>>>> +     pageopaque = (HashPageOpaque) PageGetSpecialPointer(page);
>>>> +     pageopaque->hasho_flag &= ~LH_PAGE_HAS_DEAD_TUPLES;
>>>>
>>>> You clearing this flag while logging the action, but same is not taken
>>>> care during replay. Any reasons?
>>>
>>> That's because we conditionally WAL Log this flag status and when we
>>> do so, we take a it's FPI.
>>>
>>
>> Sure, but we are not clearing in conditionally. I am not sure, how
>> after recovery it will be cleared it gets set during normal operation.
>> Moreover, btree already clears similar flag during replay (refer
>> btree_xlog_delete).
>
> You were right. In case datachecksum is enabled or wal_log_hint is set
> to true, 'LH_PAGE_HAS_DEAD_TUPLES' will get wal logged and therefore
> needs to be cleared on the standby as well.
>
I was thinking about what bad can happen if we don't clear this flag
during replay. The main thing that comes to mind is that, after crash
recovery, if the flag is still set, an insert into that page might need
to traverse all the tuples on the page once the page is full, even if
the page contains no dead tuples. The flag can be cleared later, when
the page really does have dead tuples and we delete them, but I don't
think skipping the clear during replay is worth that price.

> Attached is the patch that
> clears this flag on standby during replay.
>

Don't you think we should also clear it during the replay of
XLOG_HASH_DELETE? We might want to log the clearing of the flag along
with the WAL record for XLOG_HASH_DELETE.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers