Re: [HACKERS] [POC] A better way to expand hash indexes.
On Tue, Apr 4, 2017 at 6:33 AM, Mithun Cy wrote:
> On Tue, Apr 4, 2017 at 9:18 AM, Robert Haas wrote:
>> Committed.
>
> Thanks Robert,
>
> And also sorry; one unfortunate thing happened in the last patch: while
> fixing one of the review comments, a variable disappeared from the
> equation in _hash_spareindex.
>
>      splitpoint_phases +=
> -        (((num_bucket - 1) >> (HASH_SPLITPOINT_PHASE_BITS + 1)) &
> +        (((num_bucket - 1) >>
> +          (splitpoint_group - (HASH_SPLITPOINT_PHASE_BITS + 1))) &
>           HASH_SPLITPOINT_PHASE_MASK);   /* to 0-based value. */
>
> I wanted the most significant 3 bits, and while fixing comments in patch 11
> I unknowingly removed splitpoint_group from the equation. Extremely sorry
> for the mistake. Thanks to Ashutosh Sharma for pointing out the mistake.

Ugh, OK, committed that also. Please try to be more careful in the future.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Tue, Apr 4, 2017 at 9:18 AM, Robert Haas wrote:
> Committed.

Thanks Robert,

And also sorry; one unfortunate thing happened in the last patch: while
fixing one of the review comments, a variable disappeared from the
equation in _hash_spareindex.

     splitpoint_phases +=
-        (((num_bucket - 1) >> (HASH_SPLITPOINT_PHASE_BITS + 1)) &
+        (((num_bucket - 1) >>
+          (splitpoint_group - (HASH_SPLITPOINT_PHASE_BITS + 1))) &
          HASH_SPLITPOINT_PHASE_MASK);   /* to 0-based value. */

I wanted the most significant 3 bits, and while fixing comments in patch 11
I unknowingly removed splitpoint_group from the equation. Extremely sorry
for the mistake. Thanks to Ashutosh Sharma for pointing out the mistake.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachment: _hash_spareindex_defect.patch
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Sat, Apr 1, 2017 at 3:29 AM, Mithun Cy wrote:
> On Sat, Apr 1, 2017 at 12:31 PM, Mithun Cy wrote:
>> Also adding a patch which implements the 2nd way.
>
> Sorry, I forgot to add the sortbuild_hash patch, which also needs
> similar changes for the hash_mask.

Committed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Sat, Apr 1, 2017 at 12:31 PM, Mithun Cy wrote:
> Also adding a patch which implements the 2nd way.

Sorry, I forgot to add the sortbuild_hash patch, which also needs similar
changes for the hash_mask.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachments: sortbuild_hash_A_2.patch,
yet_another_expand_hashbucket_efficiently_15.patch
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Sat, Apr 1, 2017 at 7:05 AM, Robert Haas wrote:
> Hmm, don't the changes to contrib/pageinspect/expected/hash.out
> indicate that you've broken something? The hash index has only 4
> buckets, so the new code shouldn't be doing anything differently, but
> you've got highmask and lowmask changing for some reason.

The highmask calculation has changed a bit to accommodate num_buckets
values that are not powers of two. If num_buckets is not a power of two,
highmask should be the next ((2^x) - 1) above it, and lowmask should be
(highmask >> 1) to cover the first half of the buckets. Generalizing this
has changed the masks for power-of-two num_buckets relative to the older
implementation.

+    /* set highmask, which should be sufficient to cover num_buckets. */
+    metap->hashm_highmask = (1 << (_hash_log2(num_buckets))) - 1;

But this has no adverse effect: the high and low masks are still
sufficient to produce the same mod. If we add one more bucket, then in
_hash_expandtable we immediately make the masks bigger:

    if (new_bucket > metap->hashm_highmask)
    {
        /* Starting a new doubling */
        metap->hashm_lowmask = metap->hashm_highmask;
        metap->hashm_highmask = new_bucket | metap->hashm_lowmask;
    }

The state (metap->hashm_highmask == metap->hashm_maxbucket) occurs
naturally while the hash index is growing, just before a doubling.

Another choice I could have made is to bump the argument by one, so that
power-of-two num_buckets get the same highmask as the old code, while
non-power-of-two num_buckets still get the next ((2^x) - 1):

+    /* set highmask, which should be sufficient to cover num_buckets. */
+    metap->hashm_highmask = (1 << (_hash_log2(num_buckets + 1))) - 1;

It was just a personal preference that I chose the first; it appeared
consistent with the running state of hash index expansion. Also adding a
patch which implements the 2nd way.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachment: yet_another_expand_hashbucket_efficiently_15.patch
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Fri, Mar 31, 2017 at 1:15 AM, Mithun Cy wrote:
> Thanks, I have tried to fix all of the comments.

Thanks.

Hmm, don't the changes to contrib/pageinspect/expected/hash.out
indicate that you've broken something? The hash index has only 4
buckets, so the new code shouldn't be doing anything differently, but
you've got highmask and lowmask changing for some reason.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [POC] A better way to expand hash indexes.
Thanks, I have tried to fix all of the comments.

On Fri, Mar 31, 2017 at 8:10 AM, Robert Haas wrote:
> On Thu, Mar 30, 2017 at 2:36 PM, Mithun Cy wrote:
>> Thanks Robert, I have tried to fix the comments given as below.
>
> Thanks.
>
> Since this changes the on-disk format of hash indexes in an
> incompatible way, I think it should bump HASH_VERSION. (Hopefully
> that doesn't preclude REINDEX?) pg_upgrade should probably also be
> patched to issue a warning if upgrading from a version < 10 to a
> version >= 10 whenever hash indexes are present; I thought we had
> similar cases already, but I don't see them at the moment. Maybe we
> can get Bruce or someone to give us some advice on exactly what should
> be done here.

As of now, increasing the version asks us to REINDEX (the metapage
access verifies whether we are on the right version):

postgres=# set enable_seqscan = off;
SET
postgres=# select * from t1 where i = 10;
ERROR:  index "hash2" has wrong hash version
HINT:  Please REINDEX it.
postgres=# insert into t1 values(10);
ERROR:  index "hash2" has wrong hash version
HINT:  Please REINDEX it.
postgres=# REINDEX INDEX hash2;
REINDEX
postgres=# select * from t1 where i = 10;
 i
----
 10
(1 row)

The last time we changed this version, from 1 to 2
(4adc2f72a4ccd6e55e594aca837f09130a6af62b), I see no upgrade-specific
changes in the logs. Hi Bruce, can you please advise us what should be
done here?

> In a couple of places, you say that a splitpoint group has 4 slots
> rather than 4 phases.

-- Fixed.

> I think that in _hash_get_totalbuckets(), you should use blah &
> HASH_SPLITPOINT_PHASE_MASK rather than blah %
> HASH_SPLITPOINT_PHASES_PER_GRP for consistency with _hash_spareindex
> and, perhaps, speed. Similarly, instead of blah /
> HASH_SPLITPOINT_PHASES_PER_GRP, use blah >>
> HASH_SPLITPOINT_PHASE_BITS.

-- Fixed.

> buckets_toadd is punctuated oddly. buckets_to_add? Instead of
> hand-calculating this, how about calculating it as
> _hash_get_totalbuckets(spare_ndx) - _hash_get_totalbuckets(spare_ndx - 1)?

I think this should do that, considering new_bucket is nothing but the
1-based max_buckets:

buckets_to_add = _hash_get_totalbuckets(spare_ndx) - new_bucket;

That lets me do away with

+#define HASH_SPLITPOINT_PHASE_TO_SPLITPOINT_GRP(sp_phase) \
+    (((sp_phase) < HASH_SPLITPOINT_GROUPS_WITH_ONE_PHASE) ? \
+     (sp_phase) : \
+     ((((sp_phase) - HASH_SPLITPOINT_GROUPS_WITH_ONE_PHASE) >> \
+       HASH_SPLITPOINT_PHASE_BITS) + \
+      HASH_SPLITPOINT_GROUPS_WITH_ONE_PHASE))

as it is now used in only one place, _hash_get_totalbuckets. I also think
the comments below can be removed now, as we have removed the code
related to multi-phase allocation there:

+     * The number of buckets in the new splitpoint group is equal to the
+     * total number already in existence. But we do not allocate them at
+     * once. Each splitpoint group will have 4 phases, we distribute the
+     * buckets equally among them. So we allocate only one fourth of total
+     * buckets in new splitpoint group at a time to consume one phase after
+     * another.

> +uint32 splitpoint_group = 0;
>
> Don't need the = 0 here; the next reference to this variable is an
> unconditional initialization.

-- Fixed; with the new code splitpoint_group is not needed.

> + */
> +
> +splitpoint_group = HASH_SPLITPOINT_PHASE_TO_SPLITPOINT_GRP(spare_ndx);
>
> I would delete the blank line.

-- Fixed.

> - * should start from ((2 ^ i) + metap->hashm_spares[i - 1] + 1).
> + * should start from
> + * (_hash_get_totalbuckets(i) + metap->hashm_spares[i - 1] + 1).
>
> Won't survive pgindent.

-- Fixed as pgindent suggested.

> - * The number of buckets in the new splitpoint is equal to the total
> - * number already in existence, i.e. new_bucket. Currently this maps
> - * one-to-one to blocks required, but someday we may need a more
> - * complicated calculation here. We treat allocation of buckets as a
> - * separate WAL-logged action. Even if we fail after this operation,
> - * won't leak bucket pages; rather, the next split will consume this
> - * space. In any case, even without failure we don't use all the space
> - * in one split operation.
> + * The number of buckets in the new splitpoint group is equal to the
> + * total number already in existence. But we do not allocate them at
> + * once. Each splitpoint group will have 4 slots, we distribute the
> + * buckets equally among them. So we allocate only one fourth of total
> + * buckets in new splitpoint group at a time to consume one phase after
> + * another. We treat allocation of buckets as a separate WAL-logged
> + * action. Even if we fail after this operation, won't leak bucket
> + * pages; rather, the next split will consume this space. In any case,
> + * even without failure we don't use all the space in one split
> + * operation.
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Thu, Mar 30, 2017 at 2:36 PM, Mithun Cy wrote:
> Thanks Robert, I have tried to fix the comments given as below.

Thanks.

Since this changes the on-disk format of hash indexes in an
incompatible way, I think it should bump HASH_VERSION. (Hopefully
that doesn't preclude REINDEX?) pg_upgrade should probably also be
patched to issue a warning if upgrading from a version < 10 to a
version >= 10 whenever hash indexes are present; I thought we had
similar cases already, but I don't see them at the moment. Maybe we
can get Bruce or someone to give us some advice on exactly what should
be done here.

In a couple of places, you say that a splitpoint group has 4 slots
rather than 4 phases.

I think that in _hash_get_totalbuckets(), you should use blah &
HASH_SPLITPOINT_PHASE_MASK rather than blah %
HASH_SPLITPOINT_PHASES_PER_GRP for consistency with _hash_spareindex
and, perhaps, speed. Similarly, instead of blah /
HASH_SPLITPOINT_PHASES_PER_GRP, use blah >>
HASH_SPLITPOINT_PHASE_BITS.

buckets_toadd is punctuated oddly. buckets_to_add? Instead of
hand-calculating this, how about calculating it as
_hash_get_totalbuckets(spare_ndx) - _hash_get_totalbuckets(spare_ndx -
1)? That way you reuse the existing logic instead of writing a slightly
different thing in a new place and maybe making a mistake. If you're
going to calculate it, use & and >> rather than % and /, as above, and
drop the parentheses around new_bucket -- this isn't a macro definition.

+uint32 splitpoint_group = 0;

Don't need the = 0 here; the next reference to this variable is an
unconditional initialization.

+ */
+
+splitpoint_group = HASH_SPLITPOINT_PHASE_TO_SPLITPOINT_GRP(spare_ndx);

I would delete the blank line.

- * should start from ((2 ^ i) + metap->hashm_spares[i - 1] + 1).
+ * should start from
+ * (_hash_get_totalbuckets(i) + metap->hashm_spares[i - 1] + 1).

Won't survive pgindent.

- * The number of buckets in the new splitpoint is equal to the total
- * number already in existence, i.e. new_bucket. Currently this maps
- * one-to-one to blocks required, but someday we may need a more
- * complicated calculation here. We treat allocation of buckets as a
- * separate WAL-logged action. Even if we fail after this operation,
- * won't leak bucket pages; rather, the next split will consume this
- * space. In any case, even without failure we don't use all the space
- * in one split operation.
+ * The number of buckets in the new splitpoint group is equal to the
+ * total number already in existence. But we do not allocate them at
+ * once. Each splitpoint group will have 4 slots, we distribute the
+ * buckets equally among them. So we allocate only one fourth of total
+ * buckets in new splitpoint group at a time to consume one phase after
+ * another. We treat allocation of buckets as a separate WAL-logged
+ * action. Even if we fail after this operation, won't leak bucket
+ * pages; rather, the next split will consume this space. In any case,
+ * even without failure we don't use all the space in one split
+ * operation.

I think here you should break this into two paragraphs -- start a new
paragraph with the sentence that begins "We treat..."

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [POC] A better way to expand hash indexes.
Thanks Robert, I have tried to fix the comments given as below.

On Thu, Mar 30, 2017 at 9:19 PM, Robert Haas wrote:
> I think that the macros in hash.h need some more work:
>
> - Pretty much any time you use the argument of a macro, you need to
> parenthesize it in the macro definition to avoid surprises if the
> macro is called using an expression. That isn't done consistently
> here.

-- I have tried to fix the same in the latest patch.

> - The macros make extensive use of magic numbers like 1, 2, and 3. I
> suggest something like:
>
> #define SPLITPOINT_PHASE_BITS 2
> #define SPLITPOINT_PHASES_PER_GROUP (1 << SPLITPOINT_PHASE_BITS)
>
> And then use SPLITPOINT_PHASE_BITS any place where you're currently
> saying 2. The reference to 3 is really SPLITPOINT_PHASE_BITS + 1.

-- Taken; modified the same in the latest patch.

> - Many of these macros are only used in one place. Maybe just move
> the computation to that place and get rid of the macro. For example,
> _hash_spareindex() could be written like this:
>
> if (splitpoint_group < SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE)
>     return splitpoint_group;
>
> /* account for single-phase groups */
> splitpoint = SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE;
>
> /* account for completed groups */
> splitpoint += (splitpoint_group -
>     SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE) << SPLITPOINT_PHASE_BITS;
>
> /* account for phases within current group */
> splitpoint += (bucket_num >> (SPLITPOINT_PHASE_BITS + 1)) &
>     SPLITPOINT_PHASE_MASK;
>
> return splitpoint;
>
> That eliminates the only use of two complicated macros and is in my
> opinion more clear than what you've currently got.

-- Taken; also rewrote _hash_get_totalbuckets along similar lines.

With that, we will end up with only 2 macros which have some computing
code:

+/* defines max number of splitpoint phases a hash index can have */
+#define HASH_MAX_SPLITPOINT_GROUP 32
+#define HASH_MAX_SPLITPOINTS \
+    (((HASH_MAX_SPLITPOINT_GROUP - HASH_SPLITPOINT_GROUPS_WITH_ONE_PHASE) * \
+      HASH_SPLITPOINT_PHASES_PER_GRP) + \
+     HASH_SPLITPOINT_GROUPS_WITH_ONE_PHASE)
+
+/* given a splitpoint phase get its group */
+#define HASH_SPLITPOINT_PHASE_TO_SPLITPOINT_GRP(sp_phase) \
+    (((sp_phase) < HASH_SPLITPOINT_GROUPS_WITH_ONE_PHASE) ? \
+     (sp_phase) : \
+     ((((sp_phase) - HASH_SPLITPOINT_GROUPS_WITH_ONE_PHASE) >> \
+       HASH_SPLITPOINT_PHASE_BITS) + \
+      HASH_SPLITPOINT_GROUPS_WITH_ONE_PHASE))

> - Some of these macros lack clear comments explaining their purpose.

-- I have written some comments to explain the use of the macros.

> - Some of them don't include HASH anywhere in the name, which is
> essential for a header that may easily be included by non-hash index
> code.

-- Fixed; all macros are prefixed with HASH.

> - The names don't all follow a consistent format. Maybe that's too
> much to hope for at some level, but I think they could be more
> consistent than they are.

-- Fixed; apart from the old HASH_MAX_SPLITPOINTS, all now have the
prefix HASH_SPLITPOINT.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachment: yet_another_expand_hashbucket_efficiently_12.patch
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Wed, Mar 29, 2017 at 8:03 AM, Mithun Cy wrote:
> Thanks, Amit for a detailed review.

I think that the macros in hash.h need some more work:

- Pretty much any time you use the argument of a macro, you need to
parenthesize it in the macro definition to avoid surprises if the
macro is called using an expression. That isn't done consistently
here.

- The macros make extensive use of magic numbers like 1, 2, and 3. I
suggest something like:

#define SPLITPOINT_PHASE_BITS 2
#define SPLITPOINT_PHASES_PER_GROUP (1 << SPLITPOINT_PHASE_BITS)

And then use SPLITPOINT_PHASE_BITS any place where you're currently
saying 2. The reference to 3 is really SPLITPOINT_PHASE_BITS + 1.

- Many of these macros are only used in one place. Maybe just move
the computation to that place and get rid of the macro. For example,
_hash_spareindex() could be written like this:

if (splitpoint_group < SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE)
    return splitpoint_group;

/* account for single-phase groups */
splitpoint = SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE;

/* account for completed groups */
splitpoint += (splitpoint_group -
    SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE) << SPLITPOINT_PHASE_BITS;

/* account for phases within current group */
splitpoint += (bucket_num >> (SPLITPOINT_PHASE_BITS + 1)) &
    SPLITPOINT_PHASE_MASK;

return splitpoint;

That eliminates the only use of two complicated macros and is in my
opinion more clear than what you've currently got.

- Some of these macros lack clear comments explaining their purpose.

- Some of them don't include HASH anywhere in the name, which is
essential for a header that may easily be included by non-hash index
code.

- The names don't all follow a consistent format. Maybe that's too
much to hope for at some level, but I think they could be more
consistent than they are.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Thu, Mar 30, 2017 at 7:23 PM, Robert Haas wrote:
> I think approach B is incorrect. Suppose we have 1536 buckets and
> hash values 2048, 2049, 4096, 4097, 6144, 6145, 8192, and 8193. If I
> understand correctly, each of these values should be mapped either to
> bucket 0 or to bucket 1, and the goal of the sort is to put all of the
> bucket 0 tuples before all of the bucket 1 tuples, so that we get
> physical locality when inserting. With approach A, the sort keys will
> match the bucket numbers -- we'll be sorting the list 0, 1, 0, 1, 0,
> 1, 0, 1 -- and we will end up doing all of the inserts to bucket 0
> before any of the inserts to bucket 1. With approach B, we'll be
> sorting 512, 513, 1024, 1025, 0, 1, 512, 513 and will end up
> alternating inserts to bucket 0 with inserts to bucket 1.

Oops, sorry; yes, the 2 denominators are different (one used at insert
time and another used for the sort keys), so we end up with different
bucket numbers. In patch B, I should instead have taken the next power
of two above 1536 as the denominator and taken the mod; if the mod
value is >= 1536, reduce the denominator by half and retake the mod to
get a bucket within 1536. That is effectively what patch A is doing.
Approach B is a blunder; I apologize for that mistake. I think patch A
should be considered. If adding the members to struct Tuplesortstate is
a concern, I will rewrite patch B as said above.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Tue, Mar 28, 2017 at 1:13 AM, Mithun Cy wrote:
> B. In tuple sort we can use hash function bucket = hash_key %
> num_buckets instead of existing one which does bitwise "and" to
> determine the bucket of hash key. This way we will not wrongly assign
> buckets beyond max_buckets and sorted hash keys will be in sync with
> actual insertion order of _hash_doinsert.

I think approach B is incorrect. Suppose we have 1536 buckets and
hash values 2048, 2049, 4096, 4097, 6144, 6145, 8192, and 8193. If I
understand correctly, each of these values should be mapped either to
bucket 0 or to bucket 1, and the goal of the sort is to put all of the
bucket 0 tuples before all of the bucket 1 tuples, so that we get
physical locality when inserting. With approach A, the sort keys will
match the bucket numbers -- we'll be sorting the list 0, 1, 0, 1, 0,
1, 0, 1 -- and we will end up doing all of the inserts to bucket 0
before any of the inserts to bucket 1. With approach B, we'll be
sorting 512, 513, 1024, 1025, 0, 1, 512, 513 and will end up
alternating inserts to bucket 0 with inserts to bucket 1.

To put that another way, see this comment at the top of hashsort.c:

 * When building a very large hash index, we pre-sort the tuples by bucket
 * number to improve locality of access to the index, and thereby avoid
 * thrashing. We use tuplesort.c to sort the given index tuples into order.

So, you can't just decide to sort on a random number, which is what
approach B effectively does. Or, you can, but it completely misses the
point of sorting in the first place.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [POC] A better way to expand hash indexes.
Thanks, Amit for a detailed review.

On Wed, Mar 29, 2017 at 4:09 PM, Amit Kapila wrote:
> Okay, your current patch looks good to me apart from minor comments,
> so marked as Ready For Committer. Please either merge the
> sort_hash_b_2.patch with main patch or submit it along with next
> revision for easier reference.

I will keep it separate, just in case the committer likes
sortbuild_hash_A.patch. We can use either of the sortbuild_hash_*.patch
files on top of the main patch.

> Few minor comments:
> 1.
> +splitpoint phase S. The hashm_spares[0] is always 0, so that buckets 0 and 1
> +(which belong to splitpoint group 0's phase 1 and phase 2 respectively) always
> +appear at block numbers 1 and 2, just after the meta page.
>
> This explanation doesn't seem to be right as with current patch we
> start phased allocation only after splitpoint group 9.

Again a mistake; removed the sentence in parentheses.

> 2.
> -#define HASH_MAX_SPLITPOINTS 32
>  #define HASH_MAX_BITMAPS 128
>
> +#define SPLITPOINT_PHASES_PER_GRP 4
> +#define SPLITPOINT_PHASE_MASK (SPLITPOINT_PHASES_PER_GRP - 1)
> +#define SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE 10
> +#define HASH_MAX_SPLITPOINTS \
> +    ((32 - SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE) * \
> +     SPLITPOINT_PHASES_PER_GRP) + \
> +    SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE
>
> You have changed the value of HASH_MAX_SPLITPOINTS, but the comments
> explaining that value are still unchanged. Refer below text.
> "The limitation on the size of spares[] comes from the fact that there's
> * no point in having more than 2^32 buckets with only uint32 hashcodes."

The limitation is still indirectly imposed by the fact that we can have
only 2^32 buckets. But I have also added a note that HASH_MAX_SPLITPOINTS
accounts for the fact that, after SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE,
bucket allocation is done in multiple (exactly 4) phases.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachments: yet_another_expand_hashbucket_efficiently_11.patch,
sortbuild_hash_B_2.patch, sortbuild_hash_A.patch
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Mon, Mar 27, 2017 at 11:56 PM, Jesper Pedersen wrote:
> I ran some performance scenarios on the patch to see if the increased
> 'spares' allocation had an impact. I haven't found any regressions in
> that regard.

Thanks Jesper for testing the patch.

> Attached patch contains some small fixes, mainly to the documentation -
> on top of v7.

I have taken some of the grammatical and spell-check fixes you
mentioned, e.g. "That means at every new split point we double the
existing number of buckets. Allocating huge chunks ...". One major thing
I left as-is is the term "splitpoint", which you have tried to change in
many places to "split point". The term "splitpoint" was not introduced
by me; it was already used in many places, so I think it is acceptable
to keep it. I think I shall not add changes which are not part of the
core issue; another patch on top of this one should be okay.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Wed, Mar 29, 2017 at 12:51 PM, Mithun Cy wrote:
> On Wed, Mar 29, 2017 at 10:12 AM, Amit Kapila wrote:
>> Few other comments:
>> +/*
>> + * This is just a trick to save a division operation. If you look into the
>> + * bitmap of the 0-based bucket_num, the 2nd and 3rd most significant bits
>> + * indicate which phase of allocation the bucket_num belongs to within the
>> + * group. This is because at every splitpoint group we allocate (2 ^ x)
>> + * buckets and we have divided the allocation process into 4 equal phases.
>> + * This macro returns a value from 0 to 3.
>> + */
>> +#define SPLITPOINT_PHASES_WITHIN_GROUP(sp_g, bucket_num) \
>> +    (((bucket_num) >> (sp_g - SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE)) & \
>> +     SPLITPOINT_PHASE_MASK)
>>
>> This won't work if we change SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE to a
>> number other than 3. I think you should change it so that it can work
>> with any value of SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE.
>
> Fixed; using SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE was accidental. All
> I need is the most significant 3 bits, hence it should be subtracted by
> 3 always.

Okay, your current patch looks good to me apart from minor comments,
so marked as Ready For Committer. Please either merge the
sort_hash_b_2.patch with the main patch or submit it along with the
next revision for easier reference.

Few minor comments:

1.
+splitpoint phase S. The hashm_spares[0] is always 0, so that buckets 0 and 1
+(which belong to splitpoint group 0's phase 1 and phase 2 respectively) always
+appear at block numbers 1 and 2, just after the meta page.

This explanation doesn't seem to be right, as with the current patch we
start phased allocation only after splitpoint group 9.

2.
-#define HASH_MAX_SPLITPOINTS 32
 #define HASH_MAX_BITMAPS 128

+#define SPLITPOINT_PHASES_PER_GRP 4
+#define SPLITPOINT_PHASE_MASK (SPLITPOINT_PHASES_PER_GRP - 1)
+#define SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE 10
+#define HASH_MAX_SPLITPOINTS \
+    ((32 - SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE) * \
+     SPLITPOINT_PHASES_PER_GRP) + \
+    SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE

You have changed the value of HASH_MAX_SPLITPOINTS, but the comments
explaining that value are still unchanged. Refer to the text below.

"The limitation on the size of spares[] comes from the fact that there's
 * no point in having more than 2^32 buckets with only uint32 hashcodes."

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Wed, Mar 29, 2017 at 10:12 AM, Amit Kapila wrote:
>> I wonder if we should consider increasing
>> SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE somewhat. For example, split
>> point 4 is responsible for allocating only 16 new buckets = 128kB;
>> doing those in four groups of two (16kB) seems fairly pointless.
>> Suppose we start applying this technique beginning around splitpoint 9
>> or 10. Breaking 1024 new buckets * 8kB = 8MB of index growth into 4
>> phases might save enough to be worthwhile.
>
> 10 sounds a better point to start allocating in phases.

+1. At splitpoint group 10 we will allocate (2 ^ 9) buckets = 4MB in
total, and each phase will allocate 2 ^ 7 buckets = 128 * 8kB = 1MB.

> Few other comments:
> +/*
> + * This is just a trick to save a division operation. If you look into the
> + * bitmap of the 0-based bucket_num, the 2nd and 3rd most significant bits
> + * indicate which phase of allocation the bucket_num belongs to within the
> + * group. This is because at every splitpoint group we allocate (2 ^ x)
> + * buckets and we have divided the allocation process into 4 equal phases.
> + * This macro returns a value from 0 to 3.
> + */
> +#define SPLITPOINT_PHASES_WITHIN_GROUP(sp_g, bucket_num) \
> +    (((bucket_num) >> (sp_g - SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE)) & \
> +     SPLITPOINT_PHASE_MASK)
>
> This won't work if we change SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE to a
> number other than 3. I think you should change it so that it can work
> with any value of SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE.

Fixed; using SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE was accidental. All
I need is the most significant 3 bits, hence it should be subtracted by
3 always.

> I think you should name this define SPLITPOINT_PHASE_WITHIN_GROUP, as
> this refers to only one particular phase within the group.

Fixed.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachment: yet_another_expand_hashbucket_efficiently_10.patch
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Tue, Mar 28, 2017 at 8:56 PM, Robert Haas wrote: > On Tue, Mar 28, 2017 at 5:00 AM, Mithun Cy wrote: >>> This will go wrong for split point group zero. In general, I feel if >>> you handle computation for split groups lesser than >>> SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE in the caller, then all your >>> macros will become much simpler and less error prone. >> >> Fixed, apart from SPLITPOINT_PHASE_TO_SPLITPOINT_GRP rest all macros >> only handle multi phase group. The SPLITPOINT_PHASE_TO_SPLITPOINT_GRP >> is used in one more place at expand index so thought kepeping it as it >> is is better. > > I wonder if we should consider increasing > SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE somewhat. For example, split > point 4 is responsible for allocating only 16 new buckets = 128kB; > doing those in four groups of two (16kB) seems fairly pointless. > Suppose we start applying this technique beginning around splitpoint 9 > or 10. Breaking 1024 new buckets * 8kB = 8MB of index growth into 4 > phases might save enough to be worthwhile. > 10 sounds better point to start allocating in phases. > Of course, even if we decide to apply this even for very small > splitpoints, it probably doesn't cost us anything other than some > space in the metapage. But maybe saving space in the metapage isn't > such a bad idea anyway. > Yeah metapage space is scarce, so lets try to save it as much possible. Few other comments: +/* + * This is just a trick to save a division operation. If you look into the + * bitmap of 0-based bucket_num 2nd and 3rd most significant bit will indicate + * which phase of allocation the bucket_num belongs to with in the group. This + * is because at every splitpoint group we allocate (2 ^ x) buckets and we have + * divided the allocation process into 4 equal phases. This macro returns value + * from 0 to 3. 
+ */ +#define SPLITPOINT_PHASES_WITHIN_GROUP(sp_g, bucket_num) \ + (((bucket_num) >> (sp_g - SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE)) & \ + SPLITPOINT_PHASE_MASK) This won't work if we change SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE to a number other than 3. I think you should change it so that it can work with any value of SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE. I also think you should name this define SPLITPOINT_PHASE_WITHIN_GROUP, as it refers to only one particular phase within a group. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Tue, Mar 28, 2017 at 5:00 AM, Mithun Cy wrote: >> This will go wrong for split point group zero. In general, I feel if >> you handle computation for split groups lesser than >> SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE in the caller, then all your >> macros will become much simpler and less error prone. > > Fixed, apart from SPLITPOINT_PHASE_TO_SPLITPOINT_GRP rest all macros > only handle multi phase group. The SPLITPOINT_PHASE_TO_SPLITPOINT_GRP > is used in one more place at expand index so thought kepeping it as it > is is better. I wonder if we should consider increasing SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE somewhat. For example, split point 4 is responsible for allocating only 16 new buckets = 128kB; doing those in four groups of two (16kB) seems fairly pointless. Suppose we start applying this technique beginning around splitpoint 9 or 10. Breaking 1024 new buckets * 8kB = 8MB of index growth into 4 phases might save enough to be worthwhile. Of course, even if we decide to apply this even for very small splitpoints, it probably doesn't cost us anything other than some space in the metapage. But maybe saving space in the metapage isn't such a bad idea anyway. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Tue, Mar 28, 2017 at 12:21 PM, Amit Kapila wrote: > I think both way it can work. I feel there is no hard pressed need > that we should make the computation in sorting same as what you do > _hash_doinsert. In patch_B, I don't think new naming for variables is > good. > > Assert(!a->isnull1); > - hash1 = DatumGetUInt32(a->datum1) & state->hash_mask; > + bucket1 = DatumGetUInt32(a->datum1) % state->num_buckets; > > Can we use hash_mod instead of num_buckets and retain hash1 in above > code and similar other places? Yes, done; renamed it to hash_mod. > > Few comments: > 1. > +#define BUCKETS_WITHIN_SP_GRP(sp_g, nphase) \ > + ((sp_g < SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE) ? \ > + (1 << (sp_g - 1)) : \ > + ((nphase) * ((1 << (sp_g - 1)) >> 2))) > > This will go wrong for split point group zero. In general, I feel if > you handle computation for split groups lesser than > SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE in the caller, then all your > macros will become much simpler and less error prone. Fixed. Apart from SPLITPOINT_PHASE_TO_SPLITPOINT_GRP, the rest of the macros now only handle multi-phase groups. SPLITPOINT_PHASE_TO_SPLITPOINT_GRP is used in one more place, at index expansion, so I thought keeping it as it is would be better. > 2. > +#define BUCKET_TO_SPLITPOINT_GRP(num_bucket) (_hash_log2(num_bucket)) > > What is the use of such a define, can't we directly use _hash_log2 in > the caller? No real reason, just that a named macro appeared more readable than bare _hash_log2. I have now removed it. > -- > With Regards, > Amit Kapila. > EnterpriseDB: http://www.enterprisedb.com -- Thanks and Regards Mithun C Y EnterpriseDB: http://www.enterprisedb.com yet_another_expand_hashbucket_efficiently_09.patch Description: Binary data sortbuild_hash_B_2.patch Description: Binary data
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Tue, Mar 28, 2017 at 10:43 AM, Mithun Cy wrote: > On Mon, Mar 27, 2017 at 11:21 AM, Amit Kapila wrote: > > > As you have said we can solve it if we allocate buckets always in > power-of-2 when we do hash index meta page init. But on other > occasions, when we try to double the existing buckets we can do the > allocation in 4 equal phases. > > But I think there are 2 more ways to solve same, > > A. Why not pass all 3 parameters high_mask, low_mask, max-buckets to > tuplesort and let them use _hash_hashkey2bucket to figure out which > key belong to which bucket. And then sort them. I think this way we > make both sorting and insertion to hash index both consistent with > each other. > > B. In tuple sort we can use hash function bucket = hash_key % > num_buckets instead of existing one which does bitwise "and" to > determine the bucket of hash key. This way we will not wrongly assign > buckets beyond max_buckets and sorted hash keys will be in sync with > actual insertion order of _hash_doinsert. > > > I am adding both the patches Patch_A and Patch_B. My preference is > Patch_A and I am open for suggestion. > I think it can work both ways. I feel there is no hard-pressed need to make the computation in sorting the same as what you do in _hash_doinsert. In patch_B, I don't think the new naming for variables is good. Assert(!a->isnull1); - hash1 = DatumGetUInt32(a->datum1) & state->hash_mask; + bucket1 = DatumGetUInt32(a->datum1) % state->num_buckets; Can we use hash_mod instead of num_buckets and retain hash1 in the above code and other similar places? >>+#define SPLITPOINT_PHASES_PER_GRP 4 >>+#define SPLITPOINT_PHASE_MASK (SPLITPOINT_PHASES_PER_GRP - 1) >>+#define Buckets_First_Split_Group 4 > Fixed. > >>In the above computation +2 and -2 still bothers me. 
I think you need >>to do this because you have defined split group zero to have four >>buckets, how about if you don't force that and rather define to have >>split point phases only from split point which has four or more >>buckets? > > Okay as suggested instead of group zero having 4 phases of 1 bucket > each, I have recalculated the spare mapping as below. > Few comments: 1. +#define BUCKETS_WITHIN_SP_GRP(sp_g, nphase) \ + ((sp_g < SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE) ? \ + (1 << (sp_g - 1)) : \ + ((nphase) * ((1 << (sp_g - 1)) >> 2))) This will go wrong for split point group zero. In general, I feel if you handle computation for split groups lesser than SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE in the caller, then all your macros will become much simpler and less error prone. 2. +#define BUCKET_TO_SPLITPOINT_GRP(num_bucket) (_hash_log2(num_bucket)) What is the use of such a define, can't we directly use _hash_log2 in the caller? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Mon, Mar 27, 2017 at 11:21 AM, Amit Kapila wrote: > I think we can't change the number of buckets to be created or lowmask > and highmask calculation here without modifying _h_spoolinit() because > it sorts the data to be inserted based on hashkey which in turn > depends on the number of buckets that we are going to create during > create index operation. We either need to allow create index > operation to still always create buckets in power-of-two fashion or we > need to update _h_spoolinit according to new computation. One minor > drawback of using power-of-two scheme for creation of buckets during > create index is that it can lead to wastage of space and will be > inconsistent with what the patch does during split operation. Yes, this was a miss. Now the number of buckets allocated during meta page init is not always a power of two. The hash build, which uses just the hash_mask to decide which bucket a hashkey belongs to, is no longer sufficient. It can yield buckets beyond max_buckets, and sorting of index values based on those buckets will be out of order. When we then actually insert them into the hash index we lose the advantage of the spatial locality that existed before, and hence index-build performance can suffer. As you have said, we can solve this if we always allocate buckets in power-of-2 counts when we do hash index meta page init. But on other occasions, when we try to double the existing buckets, we can do the allocation in 4 equal phases. But I think there are 2 more ways to solve the same: A. Why not pass all 3 parameters high_mask, low_mask, max_buckets to tuplesort and let it use _hash_hashkey2bucket to figure out which key belongs to which bucket, and then sort them. This way both sorting and insertion into the hash index stay consistent with each other. B. In tuple sort we can use the function bucket = hash_key % num_buckets instead of the existing one, which does a bitwise "and" to determine the bucket of a hash key. 
This way we will not wrongly assign buckets beyond max_buckets, and the sorted hash keys will be in sync with the actual insertion order of _hash_doinsert. I am attaching both patches, Patch_A and Patch_B. My preference is Patch_A, and I am open to suggestions. >+#define SPLITPOINT_PHASES_PER_GRP 4 >+#define SPLITPOINT_PHASE_MASK (SPLITPOINT_PHASES_PER_GRP - 1) >+#define Buckets_First_Split_Group 4 Fixed. >In the above computation +2 and -2 still bothers me. I think you need >to do this because you have defined split group zero to have four >buckets, how about if you don't force that and rather define to have >split point phases only from split point which has four or more >buckets? Okay, as suggested, instead of group zero having 4 phases of 1 bucket each, I have recalculated the spare mapping as below. Allocating huge chunks of bucket pages all at once isn't optimal, and we would take ages to consume them. To avoid this exponential growth of index size, we use a trick to break up the allocation of buckets at a splitpoint into 4 equal phases. If (2 ^ x) is the total number of buckets to be allocated at a splitpoint (from now on we shall call this a splitpoint group), then we allocate 1/4th (2 ^ (x - 2)) of the total buckets at each phase of the splitpoint group. The next quarter of the allocation only happens once the buckets of the previous phase have been consumed. Since allocations of fewer than 4 buckets cannot be divided into multiple phases, the first 3 groups have only one phase of allocation: groups 0, 1, 2 allocate 1, 1, 2 buckets respectively at once in one phase. For groups > 2, where we allocate 4 or more buckets, the allocation process is distributed among four equal phases. At group 3 we allocate 4 buckets in 4 different phases {1, 1, 1, 1}, where the numbers in curly braces indicate the number of buckets allocated within each phase of splitpoint group 3. And for splitpoint groups 4 and 5 the allocation phases will be {2, 2, 2, 2} = 16 buckets in total and {4, 4, 4, 4} = 32 buckets in total. We can see that at each splitpoint group we double the total number of buckets from the previous group, but incrementally. The bucket pages allocated within one phase of a splitpoint group will appear consecutively in the index. The sortbuild_hash_*.patch files can be applied independently on top of either of the expand_hashbucket_efficiently_08 patches. -- Thanks and Regards Mithun C Y EnterpriseDB: http://www.enterprisedb.com sortbuild_hash_B.patch Description: Binary data sortbuild_hash_A.patch Description: Binary data yet_another_expand_hashbucket_efficiently_08.patch Description: Binary data expand_hashbucket_efficiently_08.patch Description: Binary data
Re: [HACKERS] [POC] A better way to expand hash indexes.
Hi Mithun, On 03/26/2017 01:56 AM, Mithun Cy wrote: Thanks, Amit for the review. On Sat, Mar 25, 2017 at 7:03 PM, Amit Kapila wrote: I think one-dimensional patch has fewer places to touch, so that looks better to me. However, I think there is still hard coding and assumptions in code which we should try to improve. Great!, I will continue with spares 1-dimensional improvement. I ran some performance scenarios on the patch to see if the increased 'spares' allocation had an impact. I haven't found any regressions in that regard. Attached patch contains some small fixes, mainly to the documentation - on top of v7. Best regards, Jesper >From 5545e48ab7136f17b3d471e0ee679a6db6040865 Mon Sep 17 00:00:00 2001 From: jesperpedersen Date: Mon, 27 Mar 2017 14:15:00 -0400 Subject: [PATCH] Small fixes --- src/backend/access/hash/README | 50 +++--- src/backend/access/hash/hashpage.c | 26 ++-- src/backend/access/hash/hashutil.c | 24 +- 3 files changed, 50 insertions(+), 50 deletions(-) diff --git a/src/backend/access/hash/README b/src/backend/access/hash/README index 6721ee1..ca46de7 100644 --- a/src/backend/access/hash/README +++ b/src/backend/access/hash/README @@ -58,50 +58,50 @@ rules to support a variable number of overflow pages while not having to move primary bucket pages around after they are created. Primary bucket pages (henceforth just "bucket pages") are allocated in -power-of-2 groups, called "split points" in the code. That means on every new -split point we double the existing number of buckets. And, it seems bad to -allocate huge chunks of bucket pages all at once and we take ages to consume it. -To avoid this exponential growth of index size, we did a trick to breakup -allocation of buckets at splitpoint into 4 equal phases. If 2^x is the total -buckets need to be allocated at a splitpoint (from now on we shall call this -as splitpoint group), then we allocate 1/4th (2^(x - 2)) of total buckets at -each phase of splitpoint group. 
Next quarter of allocation will only happen if +power-of-2 groups, called "split points" in the code. That means at every new +split point we double the existing number of buckets. Allocating huge chucks +of bucket pages all at once isn't optimal and we will take ages to consume those. +To avoid this exponential growth of index size, we did use a trick to breakup +allocation of buckets at the split points into 4 equal phases. If 2^x is the total +buckets needed to be allocated at a split point (from now on we shall call this +a split point group), then we allocate 1/4th (2^(x - 2)) of total buckets at +each phase of the split point group. Next quarter of allocation will only happen if buckets of previous phase has been already consumed. Since for buckets number -< 4 we cannot further divide it in to multiple phases, the first splitpoint +< 4 we cannot further divide it in to multiple phases, the first split point group 0's allocation is done as follows {1, 1, 1, 1} = 4 buckets in total, the numbers in curly braces indicate number of buckets allocated within each phase -of splitpoint group 0. In next splitpoint group 1 the allocation phases will -be as follow {1, 1, 1, 1} = 8 buckets in total. And, for splitpoint group 2 +of split point group 0. In next split point group 1 the allocation phases will +be as follow {1, 1, 1, 1} = 8 buckets in total. And, for split point group 2 and 3 allocation phase will be {2, 2, 2, 2} = 16 buckets in total and -{4, 4, 4, 4} = 32 buckets in total. We can see that at each splitpoint group -we double the total number of buckets from previous group but in a incremental -phase. The bucket pages allocated within one phase of a splitpoint group will +{4, 4, 4, 4} = 32 buckets in total. We can see that at each split point group +we double the total number of buckets from previous group but in an incremental +phase. The bucket pages allocated within one phase of a split point group will appear consecutively in the index. 
This addressing scheme allows the physical location of a bucket page to be computed from the bucket number relatively easily, using only a small amount of control information. If we look at the -function _hash_spareindex for a given bucket number we first compute splitpoint -group it belongs to and then the phase with in it to which the bucket belongs -to. Adding them we get the global splitpoint phase number S to which the +function _hash_spareindex for a given bucket number we first compute the split point +group it belongs to and then the phase to which the bucket belongs +to. Adding them we get the global split point phase number S to which the bucket belongs and then simply add "hashm_spares[S] + 1" (where hashm_spares[] is an array stored in the metapage) with given bucket number to compute its physical address. The hashm_spares[S] can be interpreted as the total number of overflow pages that have been
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Mon, Mar 27, 2017 at 11:21 AM, Amit Kapila wrote: > On Sun, Mar 26, 2017 at 11:26 AM, Mithun Cy > wrote: >> Thanks, Amit for the review. >> On Sat, Mar 25, 2017 at 7:03 PM, Amit Kapila wrote: >>> >>> I think one-dimensional patch has fewer places to touch, so that looks >>> better to me. However, I think there is still hard coding and >>> assumptions in code which we should try to improve. >> >> Great!, I will continue with spares 1-dimensional improvement. >> > > @@ -563,18 +563,20 @@ _hash_init_metabuffer(Buffer buf, double > num_tuples, RegProcedure procid,\ > { > .. > else > - num_buckets = ((uint32) 1) << _hash_log2((uint32) dnumbuckets); > + num_buckets = _hash_get_totalbuckets(_hash_spareindex(dnumbuckets)); > .. > .. > - metap->hashm_maxbucket = metap->hashm_lowmask = num_buckets - 1; > - metap->hashm_highmask = (num_buckets << 1) - 1; > + metap->hashm_maxbucket = num_buckets - 1; > + > + /* set hishmask, which should be sufficient to cover num_buckets. */ > + metap->hashm_highmask = (1 << (_hash_log2(num_buckets))) - 1; > + metap->hashm_lowmask = (metap->hashm_highmask >> 1); > } > > I think we can't change the number of buckets to be created or lowmask > and highmask calculation here without modifying _h_spoolinit() because > it sorts the data to be inserted based on hashkey which in turn > depends on the number of buckets that we are going to create during > create index operation. We either need to allow create index > operation to still always create buckets in power-of-two fashion or we > need to update _h_spoolinit according to new computation. One minor > drawback of using power-of-two scheme for creation of buckets during > create index is that it can lead to wastage of space and will be > inconsistent with what the patch does during split operation. > Few more comments: 1. 
@@ -149,6 +149,84 @@ _hash_log2(uint32 num) return i; } +#define SPLITPOINT_PHASES_PER_GRP 4 +#define SPLITPOINT_PHASE_MASK (SPLITPOINT_PHASES_PER_GRP - 1) +#define Buckets_First_Split_Group 4 The defines should be at the beginning of the file. 2. +/* + * At splitpoint group 0 we have 2 ^ (0 + 2) = 4 buckets, then at splitpoint + * group 1 we have 2 ^ (1 + 2) = 8 total buckets. As the doubling continues at + * splitpoint group "x" we will have 2 ^ (x + 2) total buckets. Total buckets + * before x splitpoint group will be 2 ^ (x + 1). At each phase of allocation + * within splitpoint group we add 2 ^ (x - 1) buckets, as we have to divide the + * task of allocation of 2 ^ (x + 1) buckets among 4 phases. + * + * Also, splitpoint group of a given bucket can be taken as + * (_hash_log2(bucket) - 2). Subtracted by 2 because each group have + * 2 ^ (x + 2) buckets. + */ .. +#define BUCKET_TO_SPLITPOINT_GRP(num_bucket) \ + ((num_bucket <= Buckets_First_Split_Group) ? 0 : \ + (_hash_log2(num_bucket) - 2)) In the above computation the +2 and -2 still bother me. I think you need to do this because you have defined split group zero to have four buckets; how about if you don't force that and rather define to have split point phases only from the split point which has four or more buckets? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Sun, Mar 26, 2017 at 11:26 AM, Mithun Cy wrote: > Thanks, Amit for the review. > On Sat, Mar 25, 2017 at 7:03 PM, Amit Kapila wrote: >> >> I think one-dimensional patch has fewer places to touch, so that looks >> better to me. However, I think there is still hard coding and >> assumptions in code which we should try to improve. > > Great!, I will continue with spares 1-dimensional improvement. > @@ -563,18 +563,20 @@ _hash_init_metabuffer(Buffer buf, double num_tuples, RegProcedure procid,\ { .. else - num_buckets = ((uint32) 1) << _hash_log2((uint32) dnumbuckets); + num_buckets = _hash_get_totalbuckets(_hash_spareindex(dnumbuckets)); .. .. - metap->hashm_maxbucket = metap->hashm_lowmask = num_buckets - 1; - metap->hashm_highmask = (num_buckets << 1) - 1; + metap->hashm_maxbucket = num_buckets - 1; + + /* set hishmask, which should be sufficient to cover num_buckets. */ + metap->hashm_highmask = (1 << (_hash_log2(num_buckets))) - 1; + metap->hashm_lowmask = (metap->hashm_highmask >> 1); } I think we can't change the number of buckets to be created or lowmask and highmask calculation here without modifying _h_spoolinit() because it sorts the data to be inserted based on hashkey which in turn depends on the number of buckets that we are going to create during create index operation. We either need to allow create index operation to still always create buckets in power-of-two fashion or we need to update _h_spoolinit according to new computation. One minor drawback of using power-of-two scheme for creation of buckets during create index is that it can lead to wastage of space and will be inconsistent with what the patch does during split operation. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [POC] A better way to expand hash indexes.
Thanks, Amit for the review. On Sat, Mar 25, 2017 at 7:03 PM, Amit Kapila wrote: > > I think one-dimensional patch has fewer places to touch, so that looks > better to me. However, I think there is still hard coding and > assumptions in code which we should try to improve. Great! I will continue with the 1-dimensional spares improvement. > > 1. > + /* > + * The first 4 bucket belongs to first splitpoint group 0. And since group > + * 0 have 4 = 2^2 buckets, we double them in group 1. So total buckets > + * after group 1 is 8 = 2^3. Then again at group 2 we add another 2^3 > + * buckets to double the total number of buckets, which will become 2^4. I > + * think by this time we can see a pattern which say if num_bucket > 4 > + * splitpoint group = log2(num_bucket) - 2 > + */ > > + if (num_bucket <= 4) > + splitpoint_group = 0; /* converted to base 0. */ > + else > + splitpoint_group = _hash_log2(num_bucket) - 2; > > This patch defines split point group zero has four buckets and based > on that above calculation is done. I feel you can define it like > #define Buckets_First_Split_Group 4 and then use it in above code. > Also, in else part number 2 looks awkward, can we define it as > log2_buckets_first_group = _hash_log2(Buckets_First_Split_Group) or > some thing like that. I think that way code will look neat. I don't > like the way above comment is worded even though it is helpful to > understand the calculation. If you want, then you can add such a > comment in file header, here it looks out of place. I have removed the comments and defined a new macro which maps a bucket to its splitpoint group: #define BUCKET_TO_SPLITPOINT_GRP(num_bucket) \ ((num_bucket <= Buckets_First_Split_Group) ? 0 : \ (_hash_log2(num_bucket) - 2)) I could not find a better way to explain the minus 2 than: "The splitpoint group of a given bucket can be taken as (_hash_log2(bucket) - 2). We subtract 2 because each group has 2 ^ (x + 2) buckets." 
I have now added that to the existing comments, which I think makes it a little better. Adding comments about the spares array in hashutil.c's file header did not seem right to me; the README already has some details about it. > 2. > +power-of-2 groups, called "split points" in the code. That means on every > new > +split points we double the existing number of buckets. > > split points/split point Fixed. > 3. > + * which phase of allocation the bucket_num belogs to with in the group. > > /belogs/belongs Fixed -- Thanks and Regards Mithun C Y EnterpriseDB: http://www.enterprisedb.com expand_hashbucket_efficiently_07.patch Description: Binary data
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Sat, Mar 25, 2017 at 10:13 AM, Mithun Cy wrote: > On Tue, Mar 21, 2017 at 8:16 AM, Amit Kapila wrote: >> Sure, I was telling you based on that. If you are implicitly treating >> it as 2-dimensional array, it might be easier to compute the array >>offsets. > > I think calculating spares offset will not become anyway much simpler > we still need to calculate split group and split phase separately. I > have added a patch to show how a 2-dimensional spares code looks like > and where all we need changes. > I think one-dimensional patch has fewer places to touch, so that looks better to me. However, I think there is still hard coding and assumptions in code which we should try to improve. 1. + /* + * The first 4 bucket belongs to first splitpoint group 0. And since group + * 0 have 4 = 2^2 buckets, we double them in group 1. So total buckets + * after group 1 is 8 = 2^3. Then again at group 2 we add another 2^3 + * buckets to double the total number of buckets, which will become 2^4. I + * think by this time we can see a pattern which say if num_bucket > 4 + * splitpoint group = log2(num_bucket) - 2 + */ + if (num_bucket <= 4) + splitpoint_group = 0; /* converted to base 0. */ + else + splitpoint_group = _hash_log2(num_bucket) - 2; This patch defines split point group zero as having four buckets, and the above calculation is based on that. I feel you can define it like #define Buckets_First_Split_Group 4 and then use it in the above code. Also, in the else part the number 2 looks awkward; can we define it as log2_buckets_first_group = _hash_log2(Buckets_First_Split_Group) or something like that? I think that way the code will look neat. I don't like the way the above comment is worded even though it is helpful to understand the calculation. If you want, you can add such a comment in the file header; here it looks out of place. 2. +power-of-2 groups, called "split points" in the code. That means on every new +split points we double the existing number of buckets. split points/split point 3. 
+ * which phase of allocation the bucket_num belogs to with in the group. /belogs/belongs I have still not completely reviewed the patch as I have run out of time for today. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Tue, Mar 21, 2017 at 8:16 AM, Amit Kapila wrote: > Sure, I was telling you based on that. If you are implicitly treating > it as 2-dimensional array, it might be easier to compute the array >offsets. I think calculating the spares offset will not become much simpler either way; we still need to calculate the split group and split phase separately. I have added a patch to show how the 2-dimensional spares code looks and where all we would need changes. -- Thanks and Regards Mithun C Y EnterpriseDB: http://www.enterprisedb.com expand_hashbucket_efficiently_06_spares_2dimesion.patch Description: Binary data expand_hashbucket_efficiently_06_spares_1dimension.patch Description: Binary data
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Fri, Mar 24, 2017 at 1:22 AM, Mithun Cy wrote: > Hi Amit please find the new patch pageinspect.sgml has an example which shows the output of "hash_metapage_info()". Since we increase the spares array and eventually ovflpoint, I have updated the example with the corresponding values. -- Thanks and Regards Mithun C Y EnterpriseDB: http://www.enterprisedb.com expand_hashbucket_efficiently_05.patch Description: Binary data
Re: [HACKERS] [POC] A better way to expand hash indexes.
Hi Amit, please find the new patch. On Tue, Mar 21, 2017 at 8:16 AM, Amit Kapila wrote: > It is not only about above calculation, but also what the patch is > doing in function _hash_get_tbuckets(). By the way function name also > seems unclear (mainly *tbuckets* in the name). Fixed. I have introduced some macros for readability and added more comments to explain why some calculations are made the way they are. Please let me know if you think more improvements are needed. >spelling. >/forth/fourth >/at time/at a time Thanks, fixed. -- Thanks and Regards Mithun C Y EnterpriseDB: http://www.enterprisedb.com expand_hashbucket_efficiently_04.patch Description: Binary data
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Tue, Mar 21, 2017 at 8:16 AM, Amit Kapila wrote: > On Mon, Mar 20, 2017 at 8:58 PM, Mithun Cy wrote: >> Hi Amit, Thanks for the review, >> >> On Mon, Mar 20, 2017 at 5:17 PM, Amit Kapila wrote: >>> idea could be to make hashm_spares a two-dimensional array >>> hashm_spares[32][4] where the first dimension will indicate the split >>> point and second will indicate the sub-split number. I am not sure >>> whether it will be simpler or complex than the method used in the >>> proposed patch, but I think we should think a bit more to see if we >>> can come up with some simple technique to solve this problem. >> >> I think making it a 2-dimensional array will not be any useful in fact >> we really treat the given array 2-dimensional elements now. >> > > Sure, I was telling you based on that. If you are implicitly treating > it as 2-dimensional array, it might be easier to compute the array > offsets. > The above sentence looks incomplete. If you are implicitly treating it as a 2-dimensional array, it might be easier to compute the array offsets if you explicitly also treat it as a 2-dimensional array. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Mon, Mar 20, 2017 at 8:58 PM, Mithun Cy wrote: > Hi Amit, Thanks for the review, > > On Mon, Mar 20, 2017 at 5:17 PM, Amit Kapila wrote: >> idea could be to make hashm_spares a two-dimensional array >> hashm_spares[32][4] where the first dimension will indicate the split >> point and second will indicate the sub-split number. I am not sure >> whether it will be simpler or complex than the method used in the >> proposed patch, but I think we should think a bit more to see if we >> can come up with some simple technique to solve this problem. > > I think making it a 2-dimensional array will not be any useful in fact > we really treat the given array 2-dimensional elements now. > Sure, I was telling you based on that. If you are implicitly treating it as 2-dimensional array, it might be easier to compute the array offsets. > The main concern of yours I think is the calculation steps to find the > phase of the splitpoint group the bucket belongs to. > + tbuckets = (1 << (splitpoint_group + 2)); > + phases_beyond_bucket = > + (tbuckets - num_bucket) / (1 << (splitpoint_group - 1)); > + return (((splitpoint_group + 1) << 2) - phases_beyond_bucket) - 1; > It is not only about the above calculation, but also what the patch is doing in the function _hash_get_tbuckets(). By the way, the function name also seems unclear (mainly *tbuckets* in the name). -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [POC] A better way to expand hash indexes.
Hi Amit, Thanks for the review,

On Mon, Mar 20, 2017 at 5:17 PM, Amit Kapila wrote:
> idea could be to make hashm_spares a two-dimensional array
> hashm_spares[32][4] where the first dimension will indicate the split
> point and second will indicate the sub-split number. I am not sure
> whether it will be simpler or complex than the method used in the
> proposed patch, but I think we should think a bit more to see if we
> can come up with some simple technique to solve this problem.

I think making it a 2-dimensional array will not be very useful; in
fact, we already treat the given array's elements as 2-dimensional.

The main concern of yours, I think, is the calculation steps to find
the phase of the splitpoint group the bucket belongs to.

+ tbuckets = (1 << (splitpoint_group + 2));
+ phases_beyond_bucket =
+     (tbuckets - num_bucket) / (1 << (splitpoint_group - 1));
+ return (((splitpoint_group + 1) << 2) - phases_beyond_bucket) - 1;

Thinking about it further: we allocate 2^x buckets in each phase of a
splitpoint group, and we have 4 such phases in each group (at each
splitpoint group we again allocate 2^y buckets in total). So bucket
numbers have a pattern in their bitmap representation from which we
can find out which phase of allocation they belong to. As below:

=== Group 0 -- bits 0, 1 define which phase of the group each bucket
belongs to.
 0 -- 0000
 1 -- 0001
 2 -- 0010
 3 -- 0011
=== Group 1 -- bits 0, 1 define which phase of the group each bucket
belongs to.
 4 -- 0100
 5 -- 0101
 6 -- 0110
 7 -- 0111
=== Group 2 -- bits 1, 2 define which phase of the group each bucket
belongs to.
 8 -- 1000
 9 -- 1001
10 -- 1010
11 -- 1011
12 -- 1100
13 -- 1101
14 -- 1110
15 -- 1111
=== Group 3 -- bits 2, 3 define which phase of the group each bucket
belongs to.
16 -- 00010000
17 -- 00010001
18 -- 00010010
19 -- 00010011
20 -- 00010100
21 -- 00010101
22 -- 00010110
23 -- 00010111
24 -- 00011000
25 -- 00011001
26 -- 00011010
27 -- 00011011
28 -- 00011100
29 -- 00011101
30 -- 00011110
31 -- 00011111

So we can say that, given a bucket x of group n > 0, bits (n-1, n)
define which phase of the group it belongs to. I see an opportunity
here to completely simplify the above calculation into a simple
bitwise AND of the bucket number with a bitmask over bits (n-1, n) to
get the phase of allocation of bucket x. The formula can be:

(x >> (splitpoint_group - 1)) & 0x3 = phase of bucket within its group

Does this satisfy your concern?

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Sat, Mar 18, 2017 at 10:59 PM, Mithun Cy wrote:
> On Thu, Mar 16, 2017 at 10:55 PM, David Steele wrote:
>> It does apply with fuzz on 2b32ac2, so it looks like c11453c and
>> subsequent commits are the cause. They represent a fairly substantial
>> change to hash indexes by introducing WAL logging so I think you should
>> reevaluate your patches to be sure they still function as expected.
>
> Thanks, David, here is the new, improved patch. I have also corrected
> pageinspect's test output and added notes in the README regarding the
> new way of adding bucket pages efficiently in hash indexes. I also did
> some more tests, pgbench read-only and read-write;
>

To make this work, I think the calculations you have introduced are
not so easy to understand. For example, it took me quite some time to
understand how the below function works to compute the location in
hash spares.

+uint32
+_hash_spareindex(uint32 num_bucket)
+{
..
+ /*
+  * The first 4 bucket belongs to corresponding first 4 splitpoint phases.
+  */
+ if (num_bucket <= 4)
+     return (num_bucket - 1); /* converted to base 0. */
+ splitpoint_group = _hash_log2(num_bucket) - 2; /* The are 4 buckets in
..
+ /*
+  * bucket's global splitpoint phase = total number of split point phases
+  * until its splitpoint group - splitpoint phase within this splitpoint
+  * group but after buckets own splitpoint phase.
+  */
+ tbuckets = (1 << (splitpoint_group + 2));
+ phases_beyond_bucket =
+     (tbuckets - num_bucket) / (1 << (splitpoint_group - 1));
+ return (((splitpoint_group + 1) << 2) - phases_beyond_bucket) - 1;
+}

I am not sure if it is just a matter of better comments to explain it
in a simple way, or maybe we can try to find some simpler mechanism to
group the splits into four (or more) equal parts. It would be helpful
if someone else could read it and share their opinion.
Another idea could be to make hashm_spares a two-dimensional array
hashm_spares[32][4] where the first dimension will indicate the split
point and the second will indicate the sub-split number. I am not sure
whether it will be simpler or more complex than the method used in the
proposed patch, but I think we should think a bit more to see if we
can come up with some simple technique to solve this problem.

+ * allocate them at once. Each splitpoint group will have 4 slots, we
+ * distribute the buckets equally among them. So we allocate only one
+ * forth of total buckets in new splitpoint group at time to consume
+ * one phase after another.

Spelling: /forth/fourth, /at time/at a time.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Thu, Mar 16, 2017 at 10:55 PM, David Steele wrote:
> It does apply with fuzz on 2b32ac2, so it looks like c11453c and
> subsequent commits are the cause. They represent a fairly substantial
> change to hash indexes by introducing WAL logging so I think you should
> reevaluate your patches to be sure they still function as expected.

Thanks, David. Here is the new, improved patch. I have also corrected
pageinspect's test output, and added notes in the README regarding the
new way of adding bucket pages efficiently in hash indexes. I also did
some more tests, pgbench read-only and read-write: there is no
performance impact due to the patch, and the growth of the index size
has become much more efficient, matching the numbers posted in the
initial proposal mail.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

expand_hashbucket_efficiently_03.patch
Description: Binary data
Re: [HACKERS] [POC] A better way to expand hash indexes.
On 2/21/17 4:58 AM, Mithun Cy wrote:
> Thanks, Amit
>
> On Mon, Feb 20, 2017 at 9:51 PM, Amit Kapila wrote:
>> How will high and lowmask calculations work in this new strategy?
>> Till now they always work on doubling strategy and I don't see you
>> have changed anything related to that code. Check below places.
>
> It is important that the mask has to be (2^x) - 1 if we have to retain
> the same hash map function. So mask variables will take the same
> values as before. The only place I think we need a change is
> _hash_metapinit(); unfortunately, I did not test the case where we
> build the hash index on already existing tuples. Now I have fixed it
> in the latest patch.
>
>> Till now, we have worked hard for not changing the page format in a
>> backward incompatible way, so it will be better if we could find some
>> way of doing this without changing the meta page format in a backward
>> incompatible way.
>
> We are not adding any new variables or deleting any; we increase the
> size of hashm_spares and hence the mapping functions should be
> adjusted. The problem is the block allocation, and its management is
> based on the fact that all of the buckets (2^x in number) belonging to
> a particular split-point are allocated at once and together. The
> hashm_spares is used to record those allocations and that will be used
> further by the map functions to reach a particular block in the file.
> If we want to change the way we allocate the buckets then hashm_spares
> will change and hence the mapping functions. So I do not think we can
> avoid the incompatibility issue.
>
> One thing I can think of is if we can increase the hashm_version of
> the hash index; then for old indexes, we can continue to use the
> doubling method and its mapping. For new indexes, we can use the new
> way as above.
>
>> Have you considered storing some information in
>> shared memory based on which we can decide how much percentage of
>> buckets are allocated in the current table half? I think we might be
>> able to construct this information after crash recovery as well.
>
> I think all of the above data has to be persistent. I am not able to
> understand what should be/can be stored in shared buffers. Can you
> please correct me if I am wrong?

This patch does not apply at cccbdde:

$ patch -p1 < ../other/expand_hashbucket_efficiently_02
patching file src/backend/access/hash/hashovfl.c
Hunk #1 succeeded at 49 (offset 1 line).
Hunk #2 succeeded at 67 (offset 1 line).
patching file src/backend/access/hash/hashpage.c
Hunk #1 succeeded at 502 with fuzz 1 (offset 187 lines).
Hunk #2 succeeded at 518 with fuzz 2 (offset 168 lines).
Hunk #3 succeeded at 562 (offset 163 lines).
Hunk #4 succeeded at 744 (offset 124 lines).
Hunk #5 FAILED at 774.
Hunk #6 succeeded at 869 (offset 19 lines).
Hunk #7 succeeded at 1450 (offset 242 lines).
Hunk #8 succeeded at 1464 (offset 242 lines).
Hunk #9 succeeded at 1505 (offset 242 lines).
1 out of 9 hunks FAILED -- saving rejects to file
src/backend/access/hash/hashpage.c.rej
patching file src/backend/access/hash/hashutil.c
Hunk #1 succeeded at 150 (offset 1 line).
patching file src/include/access/hash.h
Hunk #2 succeeded at 180 (offset 12 lines).
Hunk #3 succeeded at 382 (offset 18 lines).

It does apply with fuzz on 2b32ac2, so it looks like c11453c and
subsequent commits are the cause. They represent a fairly substantial
change to hash indexes by introducing WAL logging, so I think you
should reevaluate your patches to be sure they still function as
expected.

Marked "Waiting on Author".

--
-David
da...@pgmasters.net
Re: [HACKERS] [POC] A better way to expand hash indexes.
Thanks, Amit

On Mon, Feb 20, 2017 at 9:51 PM, Amit Kapila wrote:
> How will high and lowmask calculations work in this new strategy?
> Till now they always work on doubling strategy and I don't see you
> have changed anything related to that code. Check below places.

It is important that the mask has to be (2^x) - 1 if we have to retain
the same hash map function. So mask variables will take the same
values as before. The only place I think we need a change is
_hash_metapinit(); unfortunately, I did not test the case where we
build the hash index on already existing tuples. Now I have fixed it
in the latest patch.

> Till now, we have worked hard for not changing the page format in a
> backward incompatible way, so it will be better if we could find some
> way of doing this without changing the meta page format in a backward
> incompatible way.

We are not adding any new variables or deleting any; we increase the
size of hashm_spares and hence the mapping functions should be
adjusted. The problem is the block allocation, and its management is
based on the fact that all of the buckets (2^x in number) belonging to
a particular split-point are allocated at once and together. The
hashm_spares is used to record those allocations and that will be used
further by the map functions to reach a particular block in the file.
If we want to change the way we allocate the buckets then hashm_spares
will change and hence the mapping functions. So I do not think we can
avoid the incompatibility issue.

One thing I can think of is if we can increase the hashm_version of
the hash index; then for old indexes, we can continue to use the
doubling method and its mapping. For new indexes, we can use the new
way as above.

> Have you considered storing some information in
> shared memory based on which we can decide how much percentage of
> buckets are allocated in the current table half? I think we might be
> able to construct this information after crash recovery as well.

I think all of the above data has to be persistent. I am not able to
understand what should be/can be stored in shared buffers. Can you
please correct me if I am wrong?

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

expand_hashbucket_efficiently_02
Description: Binary data
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Fri, Feb 17, 2017 at 7:21 PM, Mithun Cy wrote:
>
> To implement such an incremental addition of bucket blocks I have to
> increase the size of the array hashm_spares in the meta page by four
> times. Also, the mapping methods which map a total number of buckets
> to a split-point position of the hashm_spares array need to be
> changed. These changes create backward compatibility issues.
>

How will the high and low mask calculations work in this new strategy?
Till now they have always worked on the doubling strategy, and I don't
see that you have changed anything related to that code. Check the
below places.

_hash_metapinit()
{
..
    /*
     * We initialize the index with N buckets, 0 .. N-1, occupying physical
     * blocks 1 to N. The first freespace bitmap page is in block N+1. Since
     * N is a power of 2, we can set the masks this way:
     */
    metap->hashm_maxbucket = metap->hashm_lowmask = num_buckets - 1;
    metap->hashm_highmask = (num_buckets << 1) - 1;
..
}

_hash_expandtable()
{
..
    if (new_bucket > metap->hashm_highmask)
    {
        /* Starting a new doubling */
        metap->hashm_lowmask = metap->hashm_highmask;
        metap->hashm_highmask = new_bucket | metap->hashm_lowmask;
    }
..
}

Till now, we have worked hard at not changing the page format in a
backward-incompatible way, so it will be better if we could find some
way of doing this without changing the meta page format in a
backward-incompatible way. Have you considered storing some
information in shared memory based on which we can decide what
percentage of buckets are allocated in the current table half? I think
we might be able to construct this information after crash recovery as
well.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com