Re: [HACKERS] Write Ahead Logging for Hash Indexes

Amit Kapila Mon, 13 Mar 2017 06:01:01 -0700

On Sun, Mar 12, 2017 at 8:06 AM, Robert Haas <[email protected]> wrote:
> On Sat, Mar 11, 2017 at 12:20 AM, Amit Kapila <[email protected]> wrote:
>>>                  /*
>>> +                 * Change the shared buffer state in critical section,
>>> +                 * otherwise any error could make it unrecoverable after
>>> +                 * recovery.
>>> +                 */
>>> +                START_CRIT_SECTION();
>>> +
>>> +                /*
>>>                   * Insert tuple on new page, using _hash_pgaddtup to ensure
>>>                   * correct ordering by hashkey.  This is a tad inefficient
>>>                   * since we may have to shuffle itempointers repeatedly.
>>>                   * Possible future improvement: accumulate all the items 
>>> for
>>>                   * the new page and qsort them before insertion.
>>>                   */
>>>                  (void) _hash_pgaddtup(rel, nbuf, itemsz, new_itup);
>>>
>>> +                END_CRIT_SECTION();
>>>
>>> No way.  You have to start the critical section before making any page
>>> modifications and keep it alive until all changes have been logged.
>>>
>>
>> I think what we need to do here is to accumulate all the tuples that
>> need to be added to new bucket page till either that page has no more
>> space or there are no more tuples remaining in an old bucket.  Then in
>> a critical section, add them to the page using _hash_pgaddmultitup and
>> log the entire new bucket page contents as is currently done in patch
>> log_split_page().
>
> I agree.
>


Okay, I have changed like that in the patch.

>> Now, here we can choose to log the individual
>> tuples as well instead of a complete page, however not sure if there
>> is any benefit for doing the same because XLogRecordAssemble() will
>> anyway remove the empty space from the page.  Let me know if you have
>> something else in mind.
>
> Well, if you have two pages that are 75% full, and you move a third of
> the tuples from one of them into the other, it's going to be about
> four times more efficient to log only the moved tuples than the whole
> page.
>

I don't see how this could happen during split?  I mean if you are
moving 25% tuples from old bucket page to a new bucket page which is
75% full, it will log the new bucket page only after it gets full.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Write Ahead Logging for Hash Indexes

Reply via email to