Re: [HACKERS] [POC] A better way to expand hash indexes.
On Tue, Apr 4, 2017 at 6:33 AM, Mithun Cy wrote:
> On Tue, Apr 4, 2017 at 9:18 AM, Robert Haas wrote:
>> Committed.
>
> Thanks Robert,
>
> And also sorry; one unfortunate thing happened in the last patch: while
> fixing one of the review comments, a variable disappeared from the
> equation in _hash_spareindex.
>
>      splitpoint_phases +=
> -        (((num_bucket - 1) >> (HASH_SPLITPOINT_PHASE_BITS + 1)) &
> +        (((num_bucket - 1) >>
> +          (splitpoint_group - (HASH_SPLITPOINT_PHASE_BITS + 1))) &
>           HASH_SPLITPOINT_PHASE_MASK);   /* to 0-based value. */
>
> I wanted the most significant 3 bits, and while fixing comments in patch 11
> I unknowingly removed splitpoint_group from the equation. Extremely sorry
> for the mistake. Thanks to Ashutosh Sharma for pointing out the mistake.

Ugh, OK, committed that also. Please try to be more careful in the future.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Tue, Apr 4, 2017 at 9:18 AM, Robert Haas wrote:
> Committed.

Thanks Robert,

And also sorry; one unfortunate thing happened in the last patch: while
fixing one of the review comments, a variable disappeared from the
equation in _hash_spareindex.

     splitpoint_phases +=
-        (((num_bucket - 1) >> (HASH_SPLITPOINT_PHASE_BITS + 1)) &
+        (((num_bucket - 1) >>
+          (splitpoint_group - (HASH_SPLITPOINT_PHASE_BITS + 1))) &
          HASH_SPLITPOINT_PHASE_MASK);   /* to 0-based value. */

I wanted the most significant 3 bits, and while fixing comments in patch 11
I unknowingly removed splitpoint_group from the equation. Extremely sorry
for the mistake. Thanks to Ashutosh Sharma for pointing out the mistake.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachment: _hash_spareindex_defect.patch
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Sat, Apr 1, 2017 at 3:29 AM, Mithun Cy wrote:
> On Sat, Apr 1, 2017 at 12:31 PM, Mithun Cy wrote:
>> Also adding a patch which implements the 2nd way.
>
> Sorry, I forgot to add the sortbuild_hash patch, which also needs
> similar changes for the hash_mask.

Committed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Sat, Apr 1, 2017 at 12:31 PM, Mithun Cy wrote:
> Also adding a patch which implements the 2nd way.

Sorry, I forgot to add the sortbuild_hash patch, which also needs similar
changes for the hash_mask.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachments: sortbuild_hash_A_2.patch,
yet_another_expand_hashbucket_efficiently_15.patch
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Sat, Apr 1, 2017 at 7:05 AM, Robert Haas wrote:
> Hmm, don't the changes to contrib/pageinspect/expected/hash.out
> indicate that you've broken something? The hash index has only 4
> buckets, so the new code shouldn't be doing anything differently, but
> you've got highmask and lowmask changing for some reason.

The highmask calculation has changed a bit to accommodate num_buckets
values that are not powers of two. If num_buckets is not a power of two,
highmask should be the next ((2^x) - 1) above it, and lowmask should be
(highmask >> 1) to cover the first half of the buckets. Generalizing this
has changed the masks for power-of-two num_buckets relative to the older
implementation.

+    /* set highmask, which should be sufficient to cover num_buckets. */
+    metap->hashm_highmask = (1 << (_hash_log2(num_buckets))) - 1;

But this has no adverse effect: the high and low masks are still
sufficient to produce the same mod. If we add one more bucket, then in
_hash_expandtable we immediately make the masks bigger:

    if (new_bucket > metap->hashm_highmask)
    {
        /* Starting a new doubling */
        metap->hashm_lowmask = metap->hashm_highmask;
        metap->hashm_highmask = new_bucket | metap->hashm_lowmask;
    }

The state (metap->hashm_highmask == metap->hashm_maxbucket) occurs
naturally while the hash index is growing, just before a doubling.

Another choice I could have made is to bump the argument by one, so that
power-of-two num_buckets get the same highmask as the old code, while
non-power-of-two num_buckets still get the next ((2^x) - 1):

+    /* set highmask, which should be sufficient to cover num_buckets. */
+    metap->hashm_highmask = (1 << (_hash_log2(num_buckets + 1))) - 1;

It was just a personal preference that I chose the first; it appeared
consistent with the running state of hash index expansion. Also adding a
patch which implements the 2nd way.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachment: yet_another_expand_hashbucket_efficiently_15.patch
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Fri, Mar 31, 2017 at 1:15 AM, Mithun Cy wrote:
> Thanks, I have tried to fix all of the comments.

Thanks.

Hmm, don't the changes to contrib/pageinspect/expected/hash.out
indicate that you've broken something? The hash index has only 4
buckets, so the new code shouldn't be doing anything differently, but
you've got highmask and lowmask changing for some reason.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [POC] A better way to expand hash indexes.
Thanks, I have tried to fix all of the comments.

On Fri, Mar 31, 2017 at 8:10 AM, Robert Haas wrote:
> On Thu, Mar 30, 2017 at 2:36 PM, Mithun Cy wrote:
>> Thanks Robert, I have tried to fix the comments given as below.
>
> Thanks.
>
> Since this changes the on-disk format of hash indexes in an
> incompatible way, I think it should bump HASH_VERSION. (Hopefully
> that doesn't preclude REINDEX?) pg_upgrade should probably also be
> patched to issue a warning if upgrading from a version < 10 to a
> version >= 10 whenever hash indexes are present; I thought we had
> similar cases already, but I don't see them at the moment. Maybe we
> can get Bruce or someone to give us some advice on exactly what should
> be done here.

As of now, increasing the version asks us to REINDEX (the metapage
access verifies whether we are on the right version):

postgres=# set enable_seqscan = off;
SET
postgres=# select * from t1 where i = 10;
ERROR:  index "hash2" has wrong hash version
HINT:  Please REINDEX it.
postgres=# insert into t1 values(10);
ERROR:  index "hash2" has wrong hash version
HINT:  Please REINDEX it.
postgres=# REINDEX INDEX hash2;
REINDEX
postgres=# select * from t1 where i = 10;
 i
----
 10
(1 row)

The last time we changed this version, from 1 to 2
(4adc2f72a4ccd6e55e594aca837f09130a6af62b), I see no upgrade-specific
changes in the logs. Hi Bruce, can you please advise us what should be
done here?

> In a couple of places, you say that a splitpoint group has 4 slots
> rather than 4 phases.

-- Fixed.

> I think that in _hash_get_totalbuckets(), you should use blah &
> HASH_SPLITPOINT_PHASE_MASK rather than blah %
> HASH_SPLITPOINT_PHASES_PER_GRP for consistency with _hash_spareindex
> and, perhaps, speed. Similarly, instead of blah /
> HASH_SPLITPOINT_PHASES_PER_GRP, use blah >>
> HASH_SPLITPOINT_PHASE_BITS.

-- Fixed.

> buckets_toadd is punctuated oddly. buckets_to_add? Instead of
> hand-calculating this, how about calculating it as
> _hash_get_totalbuckets(spare_ndx) - _hash_get_totalbuckets(spare_ndx - 1)?

I think this should do that, considering new_bucket is nothing but the
1-based max_buckets:

buckets_to_add = _hash_get_totalbuckets(spare_ndx) - new_bucket;

That lets me do away with

+#define HASH_SPLITPOINT_PHASE_TO_SPLITPOINT_GRP(sp_phase) \
+    (((sp_phase) < HASH_SPLITPOINT_GROUPS_WITH_ONE_PHASE) ? \
+     (sp_phase) : \
+     ((((sp_phase) - HASH_SPLITPOINT_GROUPS_WITH_ONE_PHASE) >> \
+       HASH_SPLITPOINT_PHASE_BITS) + \
+      HASH_SPLITPOINT_GROUPS_WITH_ONE_PHASE))

as it is now used in only one place, _hash_get_totalbuckets. I also think
the comments below can be removed now, as we have removed the code
related to multi-phase allocation there:

+     * The number of buckets in the new splitpoint group is equal to the
+     * total number already in existence. But we do not allocate them at
+     * once. Each splitpoint group will have 4 phases, we distribute the
+     * buckets equally among them. So we allocate only one fourth of total
+     * buckets in new splitpoint group at a time to consume one phase after
+     * another.

> +uint32 splitpoint_group = 0;
>
> Don't need the = 0 here; the next reference to this variable is an
> unconditional initialization.

-- Fixed; with the new code splitpoint_group is not needed.

> + */
> +
> +splitpoint_group = HASH_SPLITPOINT_PHASE_TO_SPLITPOINT_GRP(spare_ndx);
>
> I would delete the blank line.

-- Fixed.

> - * should start from ((2 ^ i) + metap->hashm_spares[i - 1] + 1).
> + * should start from
> + * (_hash_get_totalbuckets(i) + metap->hashm_spares[i - 1] + 1).
>
> Won't survive pgindent.

-- Fixed as pgindent suggested.

> - * The number of buckets in the new splitpoint is equal to the total
> - * number already in existence, i.e. new_bucket. Currently this maps
> - * one-to-one to blocks required, but someday we may need a more
> - * complicated calculation here. We treat allocation of buckets as a
> - * separate WAL-logged action. Even if we fail after this operation,
> - * won't leak bucket pages; rather, the next split will consume this
> - * space. In any case, even without failure we don't use all the space
> - * in one split operation.
> + * The number of buckets in the new splitpoint group is equal to the
> + * total number already in existence. But we do not allocate them at
> + * once. Each splitpoint group will have 4 slots, we distribute the
> + * buckets equally among them. So we allocate only one fourth of total
> + * buckets in new splitpoint group at a time to consume one phase after
> + * another. We treat allocation of buckets as a separate WAL-logged
> + * action. Even if we fail after this operation, won't leak bucket
> + * pages; rather, the next split will consume this space. In any case,
> + * even without failure we don't use all the space in one split
> + * operation.
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Thu, Mar 30, 2017 at 2:36 PM, Mithun Cy wrote:
> Thanks Robert, I have tried to fix the comments given as below.

Thanks.

Since this changes the on-disk format of hash indexes in an
incompatible way, I think it should bump HASH_VERSION. (Hopefully
that doesn't preclude REINDEX?) pg_upgrade should probably also be
patched to issue a warning if upgrading from a version < 10 to a
version >= 10 whenever hash indexes are present; I thought we had
similar cases already, but I don't see them at the moment. Maybe we
can get Bruce or someone to give us some advice on exactly what should
be done here.

In a couple of places, you say that a splitpoint group has 4 slots
rather than 4 phases.

I think that in _hash_get_totalbuckets(), you should use blah &
HASH_SPLITPOINT_PHASE_MASK rather than blah %
HASH_SPLITPOINT_PHASES_PER_GRP for consistency with _hash_spareindex
and, perhaps, speed. Similarly, instead of blah /
HASH_SPLITPOINT_PHASES_PER_GRP, use blah >>
HASH_SPLITPOINT_PHASE_BITS.

buckets_toadd is punctuated oddly. buckets_to_add? Instead of
hand-calculating this, how about calculating it as
_hash_get_totalbuckets(spare_ndx) - _hash_get_totalbuckets(spare_ndx -
1)? That way you reuse the existing logic instead of writing a slightly
different thing in a new place and maybe making a mistake. If you're
going to calculate it, use & and >> rather than % and /, as above, and
drop the parentheses around new_bucket -- this isn't a macro definition.

+uint32 splitpoint_group = 0;

Don't need the = 0 here; the next reference to this variable is an
unconditional initialization.

+ */
+
+splitpoint_group = HASH_SPLITPOINT_PHASE_TO_SPLITPOINT_GRP(spare_ndx);

I would delete the blank line.

- * should start from ((2 ^ i) + metap->hashm_spares[i - 1] + 1).
+ * should start from
+ * (_hash_get_totalbuckets(i) + metap->hashm_spares[i - 1] + 1).

Won't survive pgindent.

- * The number of buckets in the new splitpoint is equal to the total
- * number already in existence, i.e. new_bucket. Currently this maps
- * one-to-one to blocks required, but someday we may need a more
- * complicated calculation here. We treat allocation of buckets as a
- * separate WAL-logged action. Even if we fail after this operation,
- * won't leak bucket pages; rather, the next split will consume this
- * space. In any case, even without failure we don't use all the space
- * in one split operation.
+ * The number of buckets in the new splitpoint group is equal to the
+ * total number already in existence. But we do not allocate them at
+ * once. Each splitpoint group will have 4 slots, we distribute the
+ * buckets equally among them. So we allocate only one fourth of total
+ * buckets in new splitpoint group at a time to consume one phase after
+ * another. We treat allocation of buckets as a separate WAL-logged
+ * action. Even if we fail after this operation, won't leak bucket
+ * pages; rather, the next split will consume this space. In any case,
+ * even without failure we don't use all the space in one split
+ * operation.

I think here you should break this into two paragraphs -- start a new
paragraph with the sentence that begins "We treat..."

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [POC] A better way to expand hash indexes.
Thanks Robert, I have tried to fix the comments given as below.

On Thu, Mar 30, 2017 at 9:19 PM, Robert Haas wrote:
> I think that the macros in hash.h need some more work:
>
> - Pretty much any time you use the argument of a macro, you need to
> parenthesize it in the macro definition to avoid surprises if the
> macro is called using an expression. That isn't done consistently
> here.

-- I have tried to fix the same in the latest patch.

> - The macros make extensive use of magic numbers like 1, 2, and 3. I
> suggest something like:
>
> #define SPLITPOINT_PHASE_BITS 2
> #define SPLITPOINT_PHASES_PER_GROUP (1 << SPLITPOINT_PHASE_BITS)
>
> And then use SPLITPOINT_PHASE_BITS any place where you're currently
> saying 2. The reference to 3 is really SPLITPOINT_PHASE_BITS + 1.

-- Taken; modified the same in the latest patch.

> - Many of these macros are only used in one place. Maybe just move
> the computation to that place and get rid of the macro. For example,
> _hash_spareindex() could be written like this:
>
> if (splitpoint_group < SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE)
>     return splitpoint_group;
>
> /* account for single-phase groups */
> splitpoint = SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE;
>
> /* account for completed groups */
> splitpoint += (splitpoint_group -
>     SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE) << SPLITPOINT_PHASE_BITS;
>
> /* account for phases within current group */
> splitpoint += (bucket_num >> (SPLITPOINT_PHASE_BITS + 1)) &
>     SPLITPOINT_PHASE_MASK;
>
> return splitpoint;
>
> That eliminates the only use of two complicated macros and is in my
> opinion more clear than what you've currently got.

-- Taken; also rewrote _hash_get_totalbuckets along similar lines.

With that, we will end up with only 2 macros which have some computing
code:

+/* defines max number of splitpoint phases a hash index can have */
+#define HASH_MAX_SPLITPOINT_GROUP 32
+#define HASH_MAX_SPLITPOINTS \
+    (((HASH_MAX_SPLITPOINT_GROUP - HASH_SPLITPOINT_GROUPS_WITH_ONE_PHASE) * \
+      HASH_SPLITPOINT_PHASES_PER_GRP) + \
+     HASH_SPLITPOINT_GROUPS_WITH_ONE_PHASE)
+
+/* given a splitpoint phase get its group */
+#define HASH_SPLITPOINT_PHASE_TO_SPLITPOINT_GRP(sp_phase) \
+    (((sp_phase) < HASH_SPLITPOINT_GROUPS_WITH_ONE_PHASE) ? \
+     (sp_phase) : \
+     ((((sp_phase) - HASH_SPLITPOINT_GROUPS_WITH_ONE_PHASE) >> \
+       HASH_SPLITPOINT_PHASE_BITS) + \
+      HASH_SPLITPOINT_GROUPS_WITH_ONE_PHASE))

> - Some of these macros lack clear comments explaining their purpose.

-- I have written some comments to explain the use of the macros.

> - Some of them don't include HASH anywhere in the name, which is
> essential for a header that may easily be included by non-hash index
> code.

-- Fixed; all macros are prefixed with HASH.

> - The names don't all follow a consistent format. Maybe that's too
> much to hope for at some level, but I think they could be more
> consistent than they are.

-- Fixed; apart from the old HASH_MAX_SPLITPOINTS, all now have the
prefix HASH_SPLITPOINT.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachment: yet_another_expand_hashbucket_efficiently_12.patch
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Wed, Mar 29, 2017 at 8:03 AM, Mithun Cy wrote:
> Thanks, Amit for a detailed review.

I think that the macros in hash.h need some more work:

- Pretty much any time you use the argument of a macro, you need to
parenthesize it in the macro definition to avoid surprises if the
macro is called using an expression. That isn't done consistently
here.

- The macros make extensive use of magic numbers like 1, 2, and 3. I
suggest something like:

#define SPLITPOINT_PHASE_BITS 2
#define SPLITPOINT_PHASES_PER_GROUP (1 << SPLITPOINT_PHASE_BITS)

And then use SPLITPOINT_PHASE_BITS any place where you're currently
saying 2. The reference to 3 is really SPLITPOINT_PHASE_BITS + 1.

- Many of these macros are only used in one place. Maybe just move
the computation to that place and get rid of the macro. For example,
_hash_spareindex() could be written like this:

if (splitpoint_group < SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE)
    return splitpoint_group;

/* account for single-phase groups */
splitpoint = SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE;

/* account for completed groups */
splitpoint += (splitpoint_group -
    SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE) << SPLITPOINT_PHASE_BITS;

/* account for phases within current group */
splitpoint += (bucket_num >> (SPLITPOINT_PHASE_BITS + 1)) &
    SPLITPOINT_PHASE_MASK;

return splitpoint;

That eliminates the only use of two complicated macros and is in my
opinion more clear than what you've currently got.

- Some of these macros lack clear comments explaining their purpose.

- Some of them don't include HASH anywhere in the name, which is
essential for a header that may easily be included by non-hash index
code.

- The names don't all follow a consistent format. Maybe that's too
much to hope for at some level, but I think they could be more
consistent than they are.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Thu, Mar 30, 2017 at 7:23 PM, Robert Haas wrote:
> I think approach B is incorrect. Suppose we have 1536 buckets and
> hash values 2048, 2049, 4096, 4097, 6144, 6145, 8192, and 8193. If I
> understand correctly, each of these values should be mapped either to
> bucket 0 or to bucket 1, and the goal of the sort is to put all of the
> bucket 0 tuples before all of the bucket 1 tuples, so that we get
> physical locality when inserting. With approach A, the sort keys will
> match the bucket numbers -- we'll be sorting the list 0, 1, 0, 1, 0,
> 1, 0, 1 -- and we will end up doing all of the inserts to bucket 0
> before any of the inserts to bucket 1. With approach B, we'll be
> sorting 512, 513, 1024, 1025, 0, 1, 512, 513 and will end up
> alternating inserts to bucket 0 with inserts to bucket 1.

Oops, sorry; yes, the 2 denominators are different (one used at insert
time and another used for the sort keys), so we end up with different
bucket numbers. In patch B, I should instead have taken the next power
of two above 1536 as the denominator and taken the mod; if the mod
value is >= 1536, reduce the denominator by half and retake the mod to
get a bucket within 1536. That is effectively what patch A is doing.
Approach B is a blunder; I apologize for that mistake. I think patch A
should be considered. If adding the members to struct Tuplesortstate is
a concern, I will rewrite patch B as said above.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Tue, Mar 28, 2017 at 1:13 AM, Mithun Cy wrote:
> B. In tuple sort we can use hash function bucket = hash_key %
> num_buckets instead of existing one which does bitwise "and" to
> determine the bucket of hash key. This way we will not wrongly assign
> buckets beyond max_buckets and sorted hash keys will be in sync with
> actual insertion order of _hash_doinsert.

I think approach B is incorrect. Suppose we have 1536 buckets and
hash values 2048, 2049, 4096, 4097, 6144, 6145, 8192, and 8193. If I
understand correctly, each of these values should be mapped either to
bucket 0 or to bucket 1, and the goal of the sort is to put all of the
bucket 0 tuples before all of the bucket 1 tuples, so that we get
physical locality when inserting. With approach A, the sort keys will
match the bucket numbers -- we'll be sorting the list 0, 1, 0, 1, 0,
1, 0, 1 -- and we will end up doing all of the inserts to bucket 0
before any of the inserts to bucket 1. With approach B, we'll be
sorting 512, 513, 1024, 1025, 0, 1, 512, 513 and will end up
alternating inserts to bucket 0 with inserts to bucket 1.

To put that another way, see this comment at the top of hashsort.c:

 * When building a very large hash index, we pre-sort the tuples by bucket
 * number to improve locality of access to the index, and thereby avoid
 * thrashing. We use tuplesort.c to sort the given index tuples into order.

So, you can't just decide to sort on a random number, which is what
approach B effectively does. Or, you can, but it completely misses the
point of sorting in the first place.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [POC] A better way to expand hash indexes.
Thanks, Amit for a detailed review.

On Wed, Mar 29, 2017 at 4:09 PM, Amit Kapila wrote:
> Okay, your current patch looks good to me apart from minor comments,
> so marked as Ready For Committer. Please either merge the
> sort_hash_b_2.patch with main patch or submit it along with next
> revision for easier reference.

I will keep it separate, just in case the committer likes
sortbuild_hash_A.patch. We can use either of the sortbuild_hash_*.patch
files on top of the main patch.

> Few minor comments:
> 1.
> +splitpoint phase S. The hashm_spares[0] is always 0, so that buckets 0 and 1
> +(which belong to splitpoint group 0's phase 1 and phase 2 respectively) always
> +appear at block numbers 1 and 2, just after the meta page.
>
> This explanation doesn't seem to be right as with current patch we
> start phased allocation only after splitpoint group 9.

Again a mistake; removed the sentence in parentheses.

> 2.
> -#define HASH_MAX_SPLITPOINTS 32
>  #define HASH_MAX_BITMAPS 128
>
> +#define SPLITPOINT_PHASES_PER_GRP 4
> +#define SPLITPOINT_PHASE_MASK (SPLITPOINT_PHASES_PER_GRP - 1)
> +#define SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE 10
> +#define HASH_MAX_SPLITPOINTS \
> +    ((32 - SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE) * \
> +     SPLITPOINT_PHASES_PER_GRP) + \
> +    SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE
>
> You have changed the value of HASH_MAX_SPLITPOINTS, but the comments
> explaining that value are still unchanged. Refer below text.
> "The limitation on the size of spares[] comes from the fact that there's
> * no point in having more than 2^32 buckets with only uint32 hashcodes."

The limitation is still indirectly imposed by the fact that we can have
only 2^32 buckets. But I have also added a note that HASH_MAX_SPLITPOINTS
accounts for the fact that, after SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE,
bucket allocation is done in multiple (exactly 4) phases.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachments: yet_another_expand_hashbucket_efficiently_11.patch,
sortbuild_hash_B_2.patch, sortbuild_hash_A.patch
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Mon, Mar 27, 2017 at 11:56 PM, Jesper Pedersen wrote:
> I ran some performance scenarios on the patch to see if the increased
> 'spares' allocation had an impact. I haven't found any regressions in
> that regard.

Thanks Jesper for testing the patch.

> Attached patch contains some small fixes, mainly to the documentation -
> on top of v7.

I have taken some of the grammatical and spell-check fixes you
mentioned, e.g. "That means at every new split point we double the
existing number of buckets. Allocating huge chunks ...". One major thing
I left as-is is the term "splitpoint", which you have tried to change in
many places to "split point". The term "splitpoint" was not introduced
by me; it was already used in many places, so I think it is acceptable
to keep it. I think I shall not add changes which are not part of the
core issue; another patch on top of this one should be okay.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Wed, Mar 29, 2017 at 12:51 PM, Mithun Cy wrote:
> On Wed, Mar 29, 2017 at 10:12 AM, Amit Kapila wrote:
>> Few other comments:
>> +/*
>> + * This is just a trick to save a division operation. If you look into the
>> + * bitmap of the 0-based bucket_num, the 2nd and 3rd most significant bits
>> + * indicate which phase of allocation the bucket_num belongs to within the
>> + * group. This is because at every splitpoint group we allocate (2 ^ x)
>> + * buckets and we have divided the allocation process into 4 equal phases.
>> + * This macro returns a value from 0 to 3.
>> + */
>> +#define SPLITPOINT_PHASES_WITHIN_GROUP(sp_g, bucket_num) \
>> +    (((bucket_num) >> (sp_g - SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE)) & \
>> +     SPLITPOINT_PHASE_MASK)
>>
>> This won't work if we change SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE to a
>> number other than 3. I think you should change it so that it can work
>> with any value of SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE.
>
> Fixed; using SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE was accidental. All
> I need is the most significant 3 bits, hence it should be subtracted by
> 3 always.

Okay, your current patch looks good to me apart from minor comments,
so marked as Ready For Committer. Please either merge the
sort_hash_b_2.patch with the main patch or submit it along with the
next revision for easier reference.

Few minor comments:

1.
+splitpoint phase S. The hashm_spares[0] is always 0, so that buckets 0 and 1
+(which belong to splitpoint group 0's phase 1 and phase 2 respectively) always
+appear at block numbers 1 and 2, just after the meta page.

This explanation doesn't seem to be right, as with the current patch we
start phased allocation only after splitpoint group 9.

2.
-#define HASH_MAX_SPLITPOINTS 32
 #define HASH_MAX_BITMAPS 128

+#define SPLITPOINT_PHASES_PER_GRP 4
+#define SPLITPOINT_PHASE_MASK (SPLITPOINT_PHASES_PER_GRP - 1)
+#define SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE 10
+#define HASH_MAX_SPLITPOINTS \
+    ((32 - SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE) * \
+     SPLITPOINT_PHASES_PER_GRP) + \
+    SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE

You have changed the value of HASH_MAX_SPLITPOINTS, but the comments
explaining that value are still unchanged. Refer to the text below.

"The limitation on the size of spares[] comes from the fact that there's
 * no point in having more than 2^32 buckets with only uint32 hashcodes."

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Wed, Mar 29, 2017 at 10:12 AM, Amit Kapila wrote:
>> I wonder if we should consider increasing
>> SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE somewhat. For example, split
>> point 4 is responsible for allocating only 16 new buckets = 128kB;
>> doing those in four groups of two (16kB) seems fairly pointless.
>> Suppose we start applying this technique beginning around splitpoint 9
>> or 10. Breaking 1024 new buckets * 8kB = 8MB of index growth into 4
>> phases might save enough to be worthwhile.
>
> 10 sounds a better point to start allocating in phases.

+1. At splitpoint group 10 we will allocate (2 ^ 9) buckets = 4MB in
total, and each phase will allocate 2 ^ 7 buckets = 128 * 8kB = 1MB.

> Few other comments:
> +/*
> + * This is just a trick to save a division operation. If you look into the
> + * bitmap of the 0-based bucket_num, the 2nd and 3rd most significant bits
> + * indicate which phase of allocation the bucket_num belongs to within the
> + * group. This is because at every splitpoint group we allocate (2 ^ x)
> + * buckets and we have divided the allocation process into 4 equal phases.
> + * This macro returns a value from 0 to 3.
> + */
> +#define SPLITPOINT_PHASES_WITHIN_GROUP(sp_g, bucket_num) \
> +    (((bucket_num) >> (sp_g - SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE)) & \
> +     SPLITPOINT_PHASE_MASK)
>
> This won't work if we change SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE to a
> number other than 3. I think you should change it so that it can work
> with any value of SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE.

Fixed; using SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE was accidental. All
I need is the most significant 3 bits, hence it should be subtracted by
3 always.

> I think you should name this define SPLITPOINT_PHASE_WITHIN_GROUP, as
> this refers to only one particular phase within the group.

Fixed.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachment: yet_another_expand_hashbucket_efficiently_10.patch
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Tue, Mar 28, 2017 at 8:56 PM, Robert Haas wrote: > On Tue, Mar 28, 2017 at 5:00 AM, Mithun Cy wrote: >>> This will go wrong for split point group zero. In general, I feel if >>> you handle computation for split groups lesser than >>> SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE in the caller, then all your >>> macros will become much simpler and less error prone. >> >> Fixed, apart from SPLITPOINT_PHASE_TO_SPLITPOINT_GRP rest all macros >> only handle multi phase group. The SPLITPOINT_PHASE_TO_SPLITPOINT_GRP >> is used in one more place at expand index so thought kepeping it as it >> is is better. > > I wonder if we should consider increasing > SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE somewhat. For example, split > point 4 is responsible for allocating only 16 new buckets = 128kB; > doing those in four groups of two (16kB) seems fairly pointless. > Suppose we start applying this technique beginning around splitpoint 9 > or 10. Breaking 1024 new buckets * 8kB = 8MB of index growth into 4 > phases might save enough to be worthwhile. > 10 sounds better point to start allocating in phases. > Of course, even if we decide to apply this even for very small > splitpoints, it probably doesn't cost us anything other than some > space in the metapage. But maybe saving space in the metapage isn't > such a bad idea anyway. > Yeah metapage space is scarce, so lets try to save it as much possible. Few other comments: +/* + * This is just a trick to save a division operation. If you look into the + * bitmap of 0-based bucket_num 2nd and 3rd most significant bit will indicate + * which phase of allocation the bucket_num belongs to with in the group. This + * is because at every splitpoint group we allocate (2 ^ x) buckets and we have + * divided the allocation process into 4 equal phases. This macro returns value + * from 0 to 3. 
+ */ +#define SPLITPOINT_PHASES_WITHIN_GROUP(sp_g, bucket_num) \ + (((bucket_num) >> (sp_g - SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE)) & \ + SPLITPOINT_PHASE_MASK) This won't work if we change SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE to a number other than 3. I think you should change it so that it can work with any value of SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE. I also think you should name this define SPLITPOINT_PHASE_WITHIN_GROUP, as it refers to only one particular phase within a group. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Tue, Mar 28, 2017 at 5:00 AM, Mithun Cy wrote: >> This will go wrong for split point group zero. In general, I feel if >> you handle computation for split groups lesser than >> SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE in the caller, then all your >> macros will become much simpler and less error prone. > > Fixed, apart from SPLITPOINT_PHASE_TO_SPLITPOINT_GRP rest all macros > only handle multi phase group. The SPLITPOINT_PHASE_TO_SPLITPOINT_GRP > is used in one more place at expand index so thought kepeping it as it > is is better. I wonder if we should consider increasing SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE somewhat. For example, split point 4 is responsible for allocating only 16 new buckets = 128kB; doing those in four groups of two (16kB) seems fairly pointless. Suppose we start applying this technique beginning around splitpoint 9 or 10. Breaking 1024 new buckets * 8kB = 8MB of index growth into 4 phases might save enough to be worthwhile. Of course, even if we decide to apply this even for very small splitpoints, it probably doesn't cost us anything other than some space in the metapage. But maybe saving space in the metapage isn't such a bad idea anyway. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Tue, Mar 28, 2017 at 12:21 PM, Amit Kapila wrote: > I think both way it can work. I feel there is no hard pressed need > that we should make the computation in sorting same as what you do > _hash_doinsert. In patch_B, I don't think new naming for variables is > good. > > Assert(!a->isnull1); > - hash1 = DatumGetUInt32(a->datum1) & state->hash_mask; > + bucket1 = DatumGetUInt32(a->datum1) % state->num_buckets; > > Can we use hash_mod instead of num_buckets and retain hash1 in above > code and similar other places? Yes, done; renamed it to hash_mod. > > Few comments: > 1. > +#define BUCKETS_WITHIN_SP_GRP(sp_g, nphase) \ > + ((sp_g < SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE) ? \ > + (1 << (sp_g - 1)) : \ > + ((nphase) * ((1 << (sp_g - 1)) >> 2))) > > This will go wrong for split point group zero. In general, I feel if > you handle computation for split groups lesser than > SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE in the caller, then all your > macros will become much simpler and less error prone. Fixed. Apart from SPLITPOINT_PHASE_TO_SPLITPOINT_GRP, the rest of the macros now only handle multi-phase groups. SPLITPOINT_PHASE_TO_SPLITPOINT_GRP is used in one more place, at index expansion, so I thought keeping it as it is would be better. > 2. > +#define BUCKET_TO_SPLITPOINT_GRP(num_bucket) (_hash_log2(num_bucket)) > > What is the use of such a define, can't we directly use _hash_log2 in > the caller? No real reason, just that a named macro appeared more readable than bare _hash_log2. I have now removed it. > -- > With Regards, > Amit Kapila. > EnterpriseDB: http://www.enterprisedb.com -- Thanks and Regards Mithun C Y EnterpriseDB: http://www.enterprisedb.com yet_another_expand_hashbucket_efficiently_09.patch Description: Binary data sortbuild_hash_B_2.patch Description: Binary data
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Tue, Mar 28, 2017 at 10:43 AM, Mithun Cy wrote: > On Mon, Mar 27, 2017 at 11:21 AM, Amit Kapila wrote: > > > As you have said we can solve it if we allocate buckets always in > power-of-2 when we do hash index meta page init. But on other > occasions, when we try to double the existing buckets we can do the > allocation in 4 equal phases. > > But I think there are 2 more ways to solve same, > > A. Why not pass all 3 parameters high_mask, low_mask, max-buckets to > tuplesort and let them use _hash_hashkey2bucket to figure out which > key belong to which bucket. And then sort them. I think this way we > make both sorting and insertion to hash index both consistent with > each other. > > B. In tuple sort we can use hash function bucket = hash_key % > num_buckets instead of existing one which does bitwise "and" to > determine the bucket of hash key. This way we will not wrongly assign > buckets beyond max_buckets and sorted hash keys will be in sync with > actual insertion order of _hash_doinsert. > > > I am adding both the patches Patch_A and Patch_B. My preference is > Patch_A and I am open for suggestion. > I think it can work both ways. I feel there is no hard-pressed need to make the computation in sorting the same as what you do in _hash_doinsert. In patch_B, I don't think the new naming for variables is good. Assert(!a->isnull1); - hash1 = DatumGetUInt32(a->datum1) & state->hash_mask; + bucket1 = DatumGetUInt32(a->datum1) % state->num_buckets; Can we use hash_mod instead of num_buckets and retain hash1 in the above code and other similar places? >>+#define SPLITPOINT_PHASES_PER_GRP 4 >>+#define SPLITPOINT_PHASE_MASK (SPLITPOINT_PHASES_PER_GRP - 1) >>+#define Buckets_First_Split_Group 4 > Fixed. > >>In the above computation +2 and -2 still bothers me. 
I think you need >>to do this because you have defined split group zero to have four >>buckets, how about if you don't force that and rather define to have >>split point phases only from split point which has four or more >>buckets? > > Okay as suggested instead of group zero having 4 phases of 1 bucket > each, I have recalculated the spare mapping as below. > Few comments: 1. +#define BUCKETS_WITHIN_SP_GRP(sp_g, nphase) \ + ((sp_g < SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE) ? \ + (1 << (sp_g - 1)) : \ + ((nphase) * ((1 << (sp_g - 1)) >> 2))) This will go wrong for split point group zero. In general, I feel if you handle computation for split groups lesser than SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE in the caller, then all your macros will become much simpler and less error prone. 2. +#define BUCKET_TO_SPLITPOINT_GRP(num_bucket) (_hash_log2(num_bucket)) What is the use of such a define, can't we directly use _hash_log2 in the caller? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Mon, Mar 27, 2017 at 11:21 AM, Amit Kapila wrote: > I think we can't change the number of buckets to be created or lowmask > and highmask calculation here without modifying _h_spoolinit() because > it sorts the data to be inserted based on hashkey which in turn > depends on the number of buckets that we are going to create during > create index operation. We either need to allow create index > operation to still always create buckets in power-of-two fashion or we > need to update _h_spoolinit according to new computation. One minor > drawback of using power-of-two scheme for creation of buckets during > create index is that it can lead to wastage of space and will be > inconsistent with what the patch does during split operation. Yes, this was a miss. Now the number of buckets allocated during meta page init is not always a power of two. The hash build, which uses just the hash_mask to decide which bucket a hashkey belongs to, is no longer sufficient. It can yield buckets beyond max_buckets, and sorting of index values based on those buckets will be out of order. When we then actually insert them into the hash index we lose the advantage of the spatial locality that existed before, and hence index-build performance can suffer. As you have said, we can solve this if we always allocate buckets in power-of-2 counts when we do hash index meta page init. But on other occasions, when we try to double the existing buckets, we can do the allocation in 4 equal phases. But I think there are 2 more ways to solve the same: A. Why not pass all 3 parameters high_mask, low_mask, max_buckets to tuplesort and let it use _hash_hashkey2bucket to figure out which key belongs to which bucket, and then sort them. This way both sorting and insertion into the hash index stay consistent with each other. B. In tuple sort we can use the function bucket = hash_key % num_buckets instead of the existing one, which does a bitwise "and" to determine the bucket of a hash key. 
This way we will not wrongly assign buckets beyond max_buckets, and the sorted hash keys will be in sync with the actual insertion order of _hash_doinsert. I am attaching both patches, Patch_A and Patch_B. My preference is Patch_A, and I am open to suggestions. >+#define SPLITPOINT_PHASES_PER_GRP 4 >+#define SPLITPOINT_PHASE_MASK (SPLITPOINT_PHASES_PER_GRP - 1) >+#define Buckets_First_Split_Group 4 Fixed. >In the above computation +2 and -2 still bothers me. I think you need >to do this because you have defined split group zero to have four >buckets, how about if you don't force that and rather define to have >split point phases only from split point which has four or more >buckets? Okay, as suggested, instead of group zero having 4 phases of 1 bucket each, I have recalculated the spare mapping as below. Allocating huge chunks of bucket pages all at once isn't optimal, and we would take ages to consume them. To avoid this exponential growth of index size, we use a trick to break up the allocation of buckets at a splitpoint into 4 equal phases. If (2 ^ x) is the total number of buckets to be allocated at a splitpoint (from now on we shall call this a splitpoint group), then we allocate 1/4th (2 ^ (x - 2)) of the total buckets at each phase of the splitpoint group. The next quarter of the allocation only happens once the buckets of the previous phase have been consumed. Since allocations of fewer than 4 buckets cannot be divided into multiple phases, the first 3 groups have only one phase of allocation: groups 0, 1, 2 allocate 1, 1, 2 buckets respectively at once in one phase. For groups > 2, where we allocate 4 or more buckets, the allocation process is distributed among four equal phases. At group 3 we allocate 4 buckets in 4 different phases {1, 1, 1, 1}, where the numbers in curly braces indicate the number of buckets allocated within each phase of splitpoint group 3. And for splitpoint groups 4 and 5 the allocation phases will be {2, 2, 2, 2} = 16 buckets in total and {4, 4, 4, 4} = 32 buckets in total. We can see that at each splitpoint group we double the total number of buckets from the previous group, but incrementally. The bucket pages allocated within one phase of a splitpoint group will appear consecutively in the index. The sortbuild_hash_*.patch files can be applied independently on top of either of the expand_hashbucket_efficiently_08 patches. -- Thanks and Regards Mithun C Y EnterpriseDB: http://www.enterprisedb.com sortbuild_hash_B.patch Description: Binary data sortbuild_hash_A.patch Description: Binary data yet_another_expand_hashbucket_efficiently_08.patch Description: Binary data expand_hashbucket_efficiently_08.patch Description: Binary data
Re: [HACKERS] [POC] A better way to expand hash indexes.
Hi Mithun, On 03/26/2017 01:56 AM, Mithun Cy wrote: Thanks, Amit for the review. On Sat, Mar 25, 2017 at 7:03 PM, Amit Kapila wrote: I think one-dimensional patch has fewer places to touch, so that looks better to me. However, I think there is still hard coding and assumptions in code which we should try to improve. Great!, I will continue with spares 1-dimensional improvement. I ran some performance scenarios on the patch to see if the increased 'spares' allocation had an impact. I haven't found any regressions in that regard. Attached patch contains some small fixes, mainly to the documentation - on top of v7. Best regards, Jesper >From 5545e48ab7136f17b3d471e0ee679a6db6040865 Mon Sep 17 00:00:00 2001 From: jesperpedersen Date: Mon, 27 Mar 2017 14:15:00 -0400 Subject: [PATCH] Small fixes --- src/backend/access/hash/README | 50 +++--- src/backend/access/hash/hashpage.c | 26 ++-- src/backend/access/hash/hashutil.c | 24 +- 3 files changed, 50 insertions(+), 50 deletions(-) diff --git a/src/backend/access/hash/README b/src/backend/access/hash/README index 6721ee1..ca46de7 100644 --- a/src/backend/access/hash/README +++ b/src/backend/access/hash/README @@ -58,50 +58,50 @@ rules to support a variable number of overflow pages while not having to move primary bucket pages around after they are created. Primary bucket pages (henceforth just "bucket pages") are allocated in -power-of-2 groups, called "split points" in the code. That means on every new -split point we double the existing number of buckets. And, it seems bad to -allocate huge chunks of bucket pages all at once and we take ages to consume it. -To avoid this exponential growth of index size, we did a trick to breakup -allocation of buckets at splitpoint into 4 equal phases. If 2^x is the total -buckets need to be allocated at a splitpoint (from now on we shall call this -as splitpoint group), then we allocate 1/4th (2^(x - 2)) of total buckets at -each phase of splitpoint group. 
Next quarter of allocation will only happen if +power-of-2 groups, called "split points" in the code. That means at every new +split point we double the existing number of buckets. Allocating huge chucks +of bucket pages all at once isn't optimal and we will take ages to consume those. +To avoid this exponential growth of index size, we did use a trick to breakup +allocation of buckets at the split points into 4 equal phases. If 2^x is the total +buckets needed to be allocated at a split point (from now on we shall call this +a split point group), then we allocate 1/4th (2^(x - 2)) of total buckets at +each phase of the split point group. Next quarter of allocation will only happen if buckets of previous phase has been already consumed. Since for buckets number -< 4 we cannot further divide it in to multiple phases, the first splitpoint +< 4 we cannot further divide it in to multiple phases, the first split point group 0's allocation is done as follows {1, 1, 1, 1} = 4 buckets in total, the numbers in curly braces indicate number of buckets allocated within each phase -of splitpoint group 0. In next splitpoint group 1 the allocation phases will -be as follow {1, 1, 1, 1} = 8 buckets in total. And, for splitpoint group 2 +of split point group 0. In next split point group 1 the allocation phases will +be as follow {1, 1, 1, 1} = 8 buckets in total. And, for split point group 2 and 3 allocation phase will be {2, 2, 2, 2} = 16 buckets in total and -{4, 4, 4, 4} = 32 buckets in total. We can see that at each splitpoint group -we double the total number of buckets from previous group but in a incremental -phase. The bucket pages allocated within one phase of a splitpoint group will +{4, 4, 4, 4} = 32 buckets in total. We can see that at each split point group +we double the total number of buckets from previous group but in an incremental +phase. The bucket pages allocated within one phase of a split point group will appear consecutively in the index. 
This addressing scheme allows the physical location of a bucket page to be computed from the bucket number relatively easily, using only a small amount of control information. If we look at the -function _hash_spareindex for a given bucket number we first compute splitpoint -group it belongs to and then the phase with in it to which the bucket belongs -to. Adding them we get the global splitpoint phase number S to which the +function _hash_spareindex for a given bucket number we first compute the split point +group it belongs to and then the phase to which the bucket belongs +to. Adding them we get the global split point phase number S to which the bucket belongs and then simply add "hashm_spares[S] + 1" (where hashm_spares[] is an array stored in the metapage) with given bucket number to compute its physical address. The hashm_spares[S] can be interpreted as the total number of overflow pages that have been
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Mon, Mar 27, 2017 at 11:21 AM, Amit Kapila wrote: > On Sun, Mar 26, 2017 at 11:26 AM, Mithun Cy > wrote: >> Thanks, Amit for the review. >> On Sat, Mar 25, 2017 at 7:03 PM, Amit Kapila wrote: >>> >>> I think one-dimensional patch has fewer places to touch, so that looks >>> better to me. However, I think there is still hard coding and >>> assumptions in code which we should try to improve. >> >> Great!, I will continue with spares 1-dimensional improvement. >> > > @@ -563,18 +563,20 @@ _hash_init_metabuffer(Buffer buf, double > num_tuples, RegProcedure procid,\ > { > .. > else > - num_buckets = ((uint32) 1) << _hash_log2((uint32) dnumbuckets); > + num_buckets = _hash_get_totalbuckets(_hash_spareindex(dnumbuckets)); > .. > .. > - metap->hashm_maxbucket = metap->hashm_lowmask = num_buckets - 1; > - metap->hashm_highmask = (num_buckets << 1) - 1; > + metap->hashm_maxbucket = num_buckets - 1; > + > + /* set hishmask, which should be sufficient to cover num_buckets. */ > + metap->hashm_highmask = (1 << (_hash_log2(num_buckets))) - 1; > + metap->hashm_lowmask = (metap->hashm_highmask >> 1); > } > > I think we can't change the number of buckets to be created or lowmask > and highmask calculation here without modifying _h_spoolinit() because > it sorts the data to be inserted based on hashkey which in turn > depends on the number of buckets that we are going to create during > create index operation. We either need to allow create index > operation to still always create buckets in power-of-two fashion or we > need to update _h_spoolinit according to new computation. One minor > drawback of using power-of-two scheme for creation of buckets during > create index is that it can lead to wastage of space and will be > inconsistent with what the patch does during split operation. > Few more comments: 1. 
@@ -149,6 +149,84 @@ _hash_log2(uint32 num) return i; } +#define SPLITPOINT_PHASES_PER_GRP 4 +#define SPLITPOINT_PHASE_MASK (SPLITPOINT_PHASES_PER_GRP - 1) +#define Buckets_First_Split_Group 4 The defines should be at the beginning of the file. 2. +/* + * At splitpoint group 0 we have 2 ^ (0 + 2) = 4 buckets, then at splitpoint + * group 1 we have 2 ^ (1 + 2) = 8 total buckets. As the doubling continues at + * splitpoint group "x" we will have 2 ^ (x + 2) total buckets. Total buckets + * before x splitpoint group will be 2 ^ (x + 1). At each phase of allocation + * within splitpoint group we add 2 ^ (x - 1) buckets, as we have to divide the + * task of allocation of 2 ^ (x + 1) buckets among 4 phases. + * + * Also, splitpoint group of a given bucket can be taken as + * (_hash_log2(bucket) - 2). Subtracted by 2 because each group have + * 2 ^ (x + 2) buckets. + */ .. +#define BUCKET_TO_SPLITPOINT_GRP(num_bucket) \ + ((num_bucket <= Buckets_First_Split_Group) ? 0 : \ + (_hash_log2(num_bucket) - 2)) In the above computation the +2 and -2 still bother me. I think you need to do this because you have defined split group zero to have four buckets; how about if you don't force that and rather define to have split point phases only from the split point which has four or more buckets? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Sun, Mar 26, 2017 at 11:26 AM, Mithun Cy wrote: > Thanks, Amit for the review. > On Sat, Mar 25, 2017 at 7:03 PM, Amit Kapila wrote: >> >> I think one-dimensional patch has fewer places to touch, so that looks >> better to me. However, I think there is still hard coding and >> assumptions in code which we should try to improve. > > Great!, I will continue with spares 1-dimensional improvement. > @@ -563,18 +563,20 @@ _hash_init_metabuffer(Buffer buf, double num_tuples, RegProcedure procid,\ { .. else - num_buckets = ((uint32) 1) << _hash_log2((uint32) dnumbuckets); + num_buckets = _hash_get_totalbuckets(_hash_spareindex(dnumbuckets)); .. .. - metap->hashm_maxbucket = metap->hashm_lowmask = num_buckets - 1; - metap->hashm_highmask = (num_buckets << 1) - 1; + metap->hashm_maxbucket = num_buckets - 1; + + /* set hishmask, which should be sufficient to cover num_buckets. */ + metap->hashm_highmask = (1 << (_hash_log2(num_buckets))) - 1; + metap->hashm_lowmask = (metap->hashm_highmask >> 1); } I think we can't change the number of buckets to be created or lowmask and highmask calculation here without modifying _h_spoolinit() because it sorts the data to be inserted based on hashkey which in turn depends on the number of buckets that we are going to create during create index operation. We either need to allow create index operation to still always create buckets in power-of-two fashion or we need to update _h_spoolinit according to new computation. One minor drawback of using power-of-two scheme for creation of buckets during create index is that it can lead to wastage of space and will be inconsistent with what the patch does during split operation. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [POC] A better way to expand hash indexes.
Thanks, Amit for the review. On Sat, Mar 25, 2017 at 7:03 PM, Amit Kapila wrote: > > I think one-dimensional patch has fewer places to touch, so that looks > better to me. However, I think there is still hard coding and > assumptions in code which we should try to improve. Great! I will continue with the 1-dimensional spares improvement. > > 1. > + /* > + * The first 4 bucket belongs to first splitpoint group 0. And since group > + * 0 have 4 = 2^2 buckets, we double them in group 1. So total buckets > + * after group 1 is 8 = 2^3. Then again at group 2 we add another 2^3 > + * buckets to double the total number of buckets, which will become 2^4. I > + * think by this time we can see a pattern which say if num_bucket > 4 > + * splitpoint group = log2(num_bucket) - 2 > + */ > > + if (num_bucket <= 4) > + splitpoint_group = 0; /* converted to base 0. */ > + else > + splitpoint_group = _hash_log2(num_bucket) - 2; > > This patch defines split point group zero has four buckets and based > on that above calculation is done. I feel you can define it like > #define Buckets_First_Split_Group 4 and then use it in above code. > Also, in else part number 2 looks awkward, can we define it as > log2_buckets_first_group = _hash_log2(Buckets_First_Split_Group) or > some thing like that. I think that way code will look neat. I don't > like the way above comment is worded even though it is helpful to > understand the calculation. If you want, then you can add such a > comment in file header, here it looks out of place. I have removed the comments and defined a new macro which maps a bucket to its splitpoint group: #define BUCKET_TO_SPLITPOINT_GRP(num_bucket) \ ((num_bucket <= Buckets_First_Split_Group) ? 0 : \ (_hash_log2(num_bucket) - 2)) I could not find a better way to explain the minus 2 than: "The splitpoint group of a given bucket can be taken as (_hash_log2(bucket) - 2). We subtract 2 because each group has 2 ^ (x + 2) buckets." 
I have now added that to the existing comments, which I think makes it a little better. Adding comments about the spares array in hashutil.c's file header did not seem right to me; the README already has some details about it. > 2. > +power-of-2 groups, called "split points" in the code. That means on every > new > +split points we double the existing number of buckets. > > split points/split point Fixed. > 3. > + * which phase of allocation the bucket_num belogs to with in the group. > > /belogs/belongs Fixed -- Thanks and Regards Mithun C Y EnterpriseDB: http://www.enterprisedb.com expand_hashbucket_efficiently_07.patch Description: Binary data
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Sat, Mar 25, 2017 at 10:13 AM, Mithun Cy wrote: > On Tue, Mar 21, 2017 at 8:16 AM, Amit Kapila wrote: >> Sure, I was telling you based on that. If you are implicitly treating >> it as 2-dimensional array, it might be easier to compute the array >>offsets. > > I think calculating spares offset will not become anyway much simpler > we still need to calculate split group and split phase separately. I > have added a patch to show how a 2-dimensional spares code looks like > and where all we need changes. > I think one-dimensional patch has fewer places to touch, so that looks better to me. However, I think there is still hard coding and assumptions in code which we should try to improve. 1. + /* + * The first 4 bucket belongs to first splitpoint group 0. And since group + * 0 have 4 = 2^2 buckets, we double them in group 1. So total buckets + * after group 1 is 8 = 2^3. Then again at group 2 we add another 2^3 + * buckets to double the total number of buckets, which will become 2^4. I + * think by this time we can see a pattern which say if num_bucket > 4 + * splitpoint group = log2(num_bucket) - 2 + */ + if (num_bucket <= 4) + splitpoint_group = 0; /* converted to base 0. */ + else + splitpoint_group = _hash_log2(num_bucket) - 2; This patch defines split point group zero as having four buckets, and the above calculation is based on that. I feel you can define it like #define Buckets_First_Split_Group 4 and then use it in the above code. Also, in the else part the number 2 looks awkward; can we define it as log2_buckets_first_group = _hash_log2(Buckets_First_Split_Group) or something like that? I think that way the code will look neat. I don't like the way the above comment is worded even though it is helpful to understand the calculation. If you want, you can add such a comment in the file header; here it looks out of place. 2. +power-of-2 groups, called "split points" in the code. That means on every new +split points we double the existing number of buckets. split points/split point 3. 
+ * which phase of allocation the bucket_num belogs to with in the group. /belogs/belongs I have still not completely reviewed the patch as I have run out of time for today. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Tue, Mar 21, 2017 at 8:16 AM, Amit Kapila wrote: > Sure, I was telling you based on that. If you are implicitly treating > it as 2-dimensional array, it might be easier to compute the array >offsets. I think calculating the spares offset will not become much simpler either way; we still need to calculate the split group and split phase separately. I have added a patch to show how the 2-dimensional spares code looks and where all we would need changes. -- Thanks and Regards Mithun C Y EnterpriseDB: http://www.enterprisedb.com expand_hashbucket_efficiently_06_spares_2dimesion.patch Description: Binary data expand_hashbucket_efficiently_06_spares_1dimension.patch Description: Binary data
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Fri, Mar 24, 2017 at 1:22 AM, Mithun Cy wrote: > Hi Amit please find the new patch pageinspect.sgml has an example which shows the output of "hash_metapage_info()". Since we increase the spares array and eventually ovflpoint, I have updated the example with the corresponding values. -- Thanks and Regards Mithun C Y EnterpriseDB: http://www.enterprisedb.com expand_hashbucket_efficiently_05.patch Description: Binary data
Re: [HACKERS] [POC] A better way to expand hash indexes.
Hi Amit, please find the new patch. On Tue, Mar 21, 2017 at 8:16 AM, Amit Kapila wrote: > It is not only about above calculation, but also what the patch is > doing in function _hash_get_tbuckets(). By the way function name also > seems unclear (mainly *tbuckets* in the name). Fixed. I have introduced some macros for readability and added more comments to explain why some calculations are made the way they are. Please let me know if you think more improvements are needed. >spelling. >/forth/fourth >/at time/at a time Thanks, fixed. -- Thanks and Regards Mithun C Y EnterpriseDB: http://www.enterprisedb.com expand_hashbucket_efficiently_04.patch Description: Binary data
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Tue, Mar 21, 2017 at 8:16 AM, Amit Kapila wrote: > On Mon, Mar 20, 2017 at 8:58 PM, Mithun Cy wrote: >> Hi Amit, Thanks for the review, >> >> On Mon, Mar 20, 2017 at 5:17 PM, Amit Kapila wrote: >>> idea could be to make hashm_spares a two-dimensional array >>> hashm_spares[32][4] where the first dimension will indicate the split >>> point and second will indicate the sub-split number. I am not sure >>> whether it will be simpler or complex than the method used in the >>> proposed patch, but I think we should think a bit more to see if we >>> can come up with some simple technique to solve this problem. >> >> I think making it a 2-dimensional array will not be any useful in fact >> we really treat the given array 2-dimensional elements now. >> > > Sure, I was telling you based on that. If you are implicitly treating > it as 2-dimensional array, it might be easier to compute the array > offsets. > The above sentence looks incomplete. If you are implicitly treating it as a 2-dimensional array, it might be easier to compute the array offsets if you explicitly also treat it as a 2-dimensional array. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Mon, Mar 20, 2017 at 8:58 PM, Mithun Cy wrote: > Hi Amit, Thanks for the review, > > On Mon, Mar 20, 2017 at 5:17 PM, Amit Kapila wrote: >> idea could be to make hashm_spares a two-dimensional array >> hashm_spares[32][4] where the first dimension will indicate the split >> point and second will indicate the sub-split number. I am not sure >> whether it will be simpler or complex than the method used in the >> proposed patch, but I think we should think a bit more to see if we >> can come up with some simple technique to solve this problem. > > I think making it a 2-dimensional array will not be any useful in fact > we really treat the given array 2-dimensional elements now. > Sure, I was telling you based on that. If you are implicitly treating it as 2-dimensional array, it might be easier to compute the array offsets. > The main concern of yours I think is the calculation steps to find the > phase of the splitpoint group the bucket belongs to. > + tbuckets = (1 << (splitpoint_group + 2)); > + phases_beyond_bucket = > + (tbuckets - num_bucket) / (1 << (splitpoint_group - 1)); > + return (((splitpoint_group + 1) << 2) - phases_beyond_bucket) - 1; > It is not only about the above calculation, but also what the patch is doing in the function _hash_get_tbuckets(). By the way, the function name also seems unclear (mainly *tbuckets* in the name). -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [POC] A better way to expand hash indexes.
Hi Amit, Thanks for the review,

On Mon, Mar 20, 2017 at 5:17 PM, Amit Kapila wrote:
> idea could be to make hashm_spares a two-dimensional array
> hashm_spares[32][4] where the first dimension will indicate the split
> point and second will indicate the sub-split number. I am not sure
> whether it will be simpler or complex than the method used in the
> proposed patch, but I think we should think a bit more to see if we
> can come up with some simple technique to solve this problem.

I think making it a 2-dimensional array will not be very useful; in
fact, we already treat the given array's elements as 2-dimensional.

The main concern of yours, I think, is the calculation steps to find
the phase of the splitpoint group the bucket belongs to.

+ tbuckets = (1 << (splitpoint_group + 2));
+ phases_beyond_bucket =
+     (tbuckets - num_bucket) / (1 << (splitpoint_group - 1));
+ return (((splitpoint_group + 1) << 2) - phases_beyond_bucket) - 1;

Thinking about it further: we allocate 2^x buckets in each phase of a
splitpoint group, and we have 4 such phases in each group (at each
splitpoint group we again allocate 2^y buckets in total). So bucket
numbers have a pattern in their bitmap representation from which we
can find out which phase of allocation they belong to. As below:

=== Group 0 -- bits 0, 1 define which phase of the group each bucket
belongs to.
 0 -- 0000
 1 -- 0001
 2 -- 0010
 3 -- 0011
=== Group 1 -- bits 0, 1 define which phase of the group each bucket
belongs to.
 4 -- 0100
 5 -- 0101
 6 -- 0110
 7 -- 0111
=== Group 2 -- bits 1, 2 define which phase of the group each bucket
belongs to.
 8 -- 1000
 9 -- 1001
10 -- 1010
11 -- 1011
12 -- 1100
13 -- 1101
14 -- 1110
15 -- 1111
=== Group 3 -- bits 2, 3 define which phase of the group each bucket
belongs to.
16 -- 00010000
17 -- 00010001
18 -- 00010010
19 -- 00010011
20 -- 00010100
21 -- 00010101
22 -- 00010110
23 -- 00010111
24 -- 00011000
25 -- 00011001
26 -- 00011010
27 -- 00011011
28 -- 00011100
29 -- 00011101
30 -- 00011110
31 -- 00011111

So we can say that, given a bucket x of group n > 0, bits (n-1, n)
define which phase of the group it belongs to. I see an opportunity
here to completely simplify the above calculation into a simple
bitwise AND of the bucket number with a bitmask over bits (n-1, n) to
get the phase of allocation of bucket x. The formula can be:

(x >> (splitpoint_group - 1)) & 0x3 = phase of bucket within its group

Does this satisfy your concern?

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Sat, Mar 18, 2017 at 10:59 PM, Mithun Cy wrote:
> On Thu, Mar 16, 2017 at 10:55 PM, David Steele wrote:
>> It does apply with fuzz on 2b32ac2, so it looks like c11453c and
>> subsequent commits are the cause. They represent a fairly substantial
>> change to hash indexes by introducing WAL logging so I think you should
>> reevaluate your patches to be sure they still function as expected.
>
> Thanks, David, here is the new, improved patch. I have also corrected
> pageinspect's test output and added notes in the README regarding the
> new way of adding bucket pages efficiently in hash indexes. I also did
> some more tests, pgbench read-only and read-write;
>

To make this work, I think the calculations you have introduced are
not so easy to understand. For example, it took me quite some time to
understand how the below function works to compute the location in
hash spares.

+uint32
+_hash_spareindex(uint32 num_bucket)
+{
..
+ /*
+  * The first 4 bucket belongs to corresponding first 4 splitpoint phases.
+  */
+ if (num_bucket <= 4)
+     return (num_bucket - 1); /* converted to base 0. */
+ splitpoint_group = _hash_log2(num_bucket) - 2; /* The are 4 buckets in
..
+ /*
+  * bucket's global splitpoint phase = total number of split point phases
+  * until its splitpoint group - splitpoint phase within this splitpoint
+  * group but after buckets own splitpoint phase.
+  */
+ tbuckets = (1 << (splitpoint_group + 2));
+ phases_beyond_bucket =
+     (tbuckets - num_bucket) / (1 << (splitpoint_group - 1));
+ return (((splitpoint_group + 1) << 2) - phases_beyond_bucket) - 1;
+}

I am not sure if it is just a matter of better comments to explain it
in a simple way, or maybe we can try to find some simpler mechanism to
group the splits into four (or more) equal parts. It would be helpful
if someone else could read it and share their opinion.
Another idea could be to make hashm_spares a two-dimensional array
hashm_spares[32][4] where the first dimension will indicate the split
point and the second will indicate the sub-split number. I am not sure
whether it will be simpler or more complex than the method used in the
proposed patch, but I think we should think a bit more to see if we
can come up with some simple technique to solve this problem.

+ * allocate them at once. Each splitpoint group will have 4 slots, we
+ * distribute the buckets equally among them. So we allocate only one
+ * forth of total buckets in new splitpoint group at time to consume
+ * one phase after another.

Spelling: /forth/fourth, /at time/at a time.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Thu, Mar 16, 2017 at 10:55 PM, David Steele wrote:
> It does apply with fuzz on 2b32ac2, so it looks like c11453c and
> subsequent commits are the cause. They represent a fairly substantial
> change to hash indexes by introducing WAL logging so I think you should
> reevaluate your patches to be sure they still function as expected.

Thanks, David. Here is the new, improved patch. I have also corrected
pageinspect's test output, and added notes in the README regarding the
new way of adding bucket pages efficiently in hash indexes. I also did
some more tests, pgbench read-only and read-write: there is no
performance impact due to the patch, and the growth of the index size
has become much more efficient, matching the numbers posted in the
initial proposal mail.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

expand_hashbucket_efficiently_03.patch
Description: Binary data
Re: [HACKERS] [POC] A better way to expand hash indexes.
On 2/21/17 4:58 AM, Mithun Cy wrote:
> Thanks, Amit
>
> On Mon, Feb 20, 2017 at 9:51 PM, Amit Kapila wrote:
>> How will high and lowmask calculations work in this new strategy?
>> Till now they always work on doubling strategy and I don't see you
>> have changed anything related to that code. Check below places.
>
> It is important that the mask has to be (2^x) - 1 if we have to retain
> the same hash map function. So mask variables will take the same
> values as before. The only place I think we need a change is
> _hash_metapinit(); unfortunately, I did not test the case where we
> build the hash index on already existing tuples. Now I have fixed it
> in the latest patch.
>
>> Till now, we have worked hard for not changing the page format in a
>> backward incompatible way, so it will be better if we could find some
>> way of doing this without changing the meta page format in a backward
>> incompatible way.
>
> We are not adding any new variables or deleting any; we increase the
> size of hashm_spares and hence the mapping functions should be
> adjusted. The problem is the block allocation, and its management is
> based on the fact that all of the buckets (2^x in number) belonging to
> a particular split-point are allocated at once and together. The
> hashm_spares is used to record those allocations and that will be used
> further by the map functions to reach a particular block in the file.
> If we want to change the way we allocate the buckets then hashm_spares
> will change and hence the mapping functions. So I do not think we can
> avoid the incompatibility issue.
>
> One thing I can think of is if we can increase the hashm_version of
> the hash index; then for old indexes, we can continue to use the
> doubling method and its mapping. For new indexes, we can use the new
> way as above.
>
>> Have you considered storing some information in
>> shared memory based on which we can decide how much percentage of
>> buckets are allocated in the current table half? I think we might be
>> able to construct this information after crash recovery as well.
>
> I think all of the above data has to be persistent. I am not able to
> understand what should be/can be stored in shared buffers. Can you
> please correct me if I am wrong?

This patch does not apply at cccbdde:

$ patch -p1 < ../other/expand_hashbucket_efficiently_02
patching file src/backend/access/hash/hashovfl.c
Hunk #1 succeeded at 49 (offset 1 line).
Hunk #2 succeeded at 67 (offset 1 line).
patching file src/backend/access/hash/hashpage.c
Hunk #1 succeeded at 502 with fuzz 1 (offset 187 lines).
Hunk #2 succeeded at 518 with fuzz 2 (offset 168 lines).
Hunk #3 succeeded at 562 (offset 163 lines).
Hunk #4 succeeded at 744 (offset 124 lines).
Hunk #5 FAILED at 774.
Hunk #6 succeeded at 869 (offset 19 lines).
Hunk #7 succeeded at 1450 (offset 242 lines).
Hunk #8 succeeded at 1464 (offset 242 lines).
Hunk #9 succeeded at 1505 (offset 242 lines).
1 out of 9 hunks FAILED -- saving rejects to file
src/backend/access/hash/hashpage.c.rej
patching file src/backend/access/hash/hashutil.c
Hunk #1 succeeded at 150 (offset 1 line).
patching file src/include/access/hash.h
Hunk #2 succeeded at 180 (offset 12 lines).
Hunk #3 succeeded at 382 (offset 18 lines).

It does apply with fuzz on 2b32ac2, so it looks like c11453c and
subsequent commits are the cause. They represent a fairly substantial
change to hash indexes by introducing WAL logging, so I think you
should reevaluate your patches to be sure they still function as
expected.

Marked "Waiting on Author".

--
-David
da...@pgmasters.net
Re: [HACKERS] [POC] A better way to expand hash indexes.
Thanks, Amit

On Mon, Feb 20, 2017 at 9:51 PM, Amit Kapila wrote:
> How will high and lowmask calculations work in this new strategy?
> Till now they always work on doubling strategy and I don't see you
> have changed anything related to that code. Check below places.

It is important that the mask has to be (2^x) - 1 if we have to retain
the same hash map function. So mask variables will take the same
values as before. The only place I think we need a change is
_hash_metapinit(); unfortunately, I did not test the case where we
build the hash index on already existing tuples. Now I have fixed it
in the latest patch.

> Till now, we have worked hard for not changing the page format in a
> backward incompatible way, so it will be better if we could find some
> way of doing this without changing the meta page format in a backward
> incompatible way.

We are not adding any new variables or deleting any; we increase the
size of hashm_spares and hence the mapping functions should be
adjusted. The problem is the block allocation, and its management is
based on the fact that all of the buckets (2^x in number) belonging to
a particular split-point are allocated at once and together. The
hashm_spares is used to record those allocations and that will be used
further by the map functions to reach a particular block in the file.
If we want to change the way we allocate the buckets then hashm_spares
will change and hence the mapping functions. So I do not think we can
avoid the incompatibility issue.

One thing I can think of is if we can increase the hashm_version of
the hash index; then for old indexes, we can continue to use the
doubling method and its mapping. For new indexes, we can use the new
way as above.

> Have you considered storing some information in
> shared memory based on which we can decide how much percentage of
> buckets are allocated in the current table half? I think we might be
> able to construct this information after crash recovery as well.

I think all of the above data has to be persistent. I am not able to
understand what should be/can be stored in shared buffers. Can you
please correct me if I am wrong?

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

expand_hashbucket_efficiently_02
Description: Binary data
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Fri, Feb 17, 2017 at 7:21 PM, Mithun Cy wrote:
>
> To implement such an incremental addition of bucket blocks I have to
> increase the size of the array hashm_spares in the meta page by four
> times. Also, the mapping methods which map a total number of buckets
> to a split-point position of the hashm_spares array need to be
> changed. These changes create backward compatibility issues.
>

How will the high and low mask calculations work in this new strategy?
Till now they have always worked on the doubling strategy, and I don't
see that you have changed anything related to that code. Check the
below places.

_hash_metapinit()
{
..
    /*
     * We initialize the index with N buckets, 0 .. N-1, occupying physical
     * blocks 1 to N. The first freespace bitmap page is in block N+1. Since
     * N is a power of 2, we can set the masks this way:
     */
    metap->hashm_maxbucket = metap->hashm_lowmask = num_buckets - 1;
    metap->hashm_highmask = (num_buckets << 1) - 1;
..
}

_hash_expandtable()
{
..
    if (new_bucket > metap->hashm_highmask)
    {
        /* Starting a new doubling */
        metap->hashm_lowmask = metap->hashm_highmask;
        metap->hashm_highmask = new_bucket | metap->hashm_lowmask;
    }
..
}

Till now, we have worked hard at not changing the page format in a
backward-incompatible way, so it will be better if we could find some
way of doing this without changing the meta page format in a
backward-incompatible way. Have you considered storing some
information in shared memory based on which we can decide what
percentage of buckets are allocated in the current table half? I think
we might be able to construct this information after crash recovery as
well.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com