Re: [HACKERS] [POC] hash partitioning

Ashutosh Bapat Wed, 17 May 2017 06:25:37 -0700

On Wed, May 17, 2017 at 2:07 PM, amul sul <sula...@gmail.com> wrote:

>
>> In partition_bounds_equal(), please add comments explaining why is it safe to
>> check just the indexes? May be we should add code under assertion to make 
>> sure
>> that the datums are equal as well.
>
> Added assert in the attached version.
>
>> The comment could be something
>> like, "If two partitioned tables have different greatest moduli, their
>> partition schemes don't match. If they have same greatest moduli, and
>> all remainders have different indexes, they all have same modulus
>> specified and the partitions are ordered by remainders, thus indexes
>> array will be an identity i.e. index[i] = i. If the partition
>> corresponding to a given remainder exists, it will have same index
>> entry for both partitioned tables or if it's missing it will be -1.
>> Thus if indexes array matches, corresponding datums array matches. If
>> there are multiple remainders corresponding to a given partition,
>> their partitions are ordered by the lowest of the remainders, thus if
>> indexes array matches, both of the tables have same indexes arrays, in
>> both the tables remainders corresponding to multiple partitions all
>> have same indexes and thus same modulus. Thus again if the indexes are
>> same, datums are same.".
>>
>
> Thanks, added with minor modification.


I have reworded this slightly better. See the attached patch as diff of 0002.

>
>> In the same function
>>     if (key->strategy == PARTITION_STRATEGY_HASH)
>>     {
>>         int            greatest_modulus;
>>
>>         /*
>>          * Compare greatest modulus of hash partition bound which
>>          * is the last element of datums array.
>>          */
>>         if (b1->datums[b1->ndatums - 1][0] != b2->datums[b2->ndatums - 1][0])
>>             return false;
>>
>>         /* Compare indexes */
>>         greatest_modulus = DatumGetInt32(b1->datums[b1->ndatums - 1][0]);
>>         for (i = 0; i < greatest_modulus; i++)
>>             if (b1->indexes[i] != b2->indexes[i])
>>                 return false;
>>     }
>> if we return true from where this block ends, we will save one indenation 
>> level
>> for rest of the code and also FWIW extra diffs in this patch because of this
>> indentation change.
>>
>
> I still do believe having this code in the IF - ELSE block will be
> better for longterm, rather having code clutter to avoid diff that
> unpleasant for now.

Ok, I will leave it to the committer to judge.


Comments on the tests
+#ifdef USE_ASSERT_CHECKING
+        {
+            /*
+             * Hash partition bound stores modulus and remainder at
+             * b1->datums[i][0] and b1->datums[i][1] position respectively.
+             */
+            for (i = 0; i < b1->ndatums; i++)
+                Assert((b1->datums[i][0] == b2->datums[i][0] &&
+                        b1->datums[i][1] == b2->datums[i][1]));
+        }
+#endif
Why do we need extra {} here?

Comments on testcases
+CREATE TABLE hpart_1 PARTITION OF hash_parted FOR VALUES WITH
(modulus 8, remainder 0);
+CREATE TABLE fail_part (LIKE hpart_1 INCLUDING CONSTRAINTS);
+ALTER TABLE hash_parted ATTACH PARTITION fail_part FOR VALUES WITH
(modulus 4, remainder 0);
Probably you should also test the other-way round case i.e. create modulus 4,
remainder 0 partition and then try to add partitions with modulus 8, remainder
4 and modulus 8, remainder 0. That should fail.

Why to create two tables hash_parted and hash_parted2, you should be able to
test with only a single table.

+INSERT INTO hpart_2 VALUES (3, 'a');
+DELETE FROM hpart_2;
+INSERT INTO hpart_5_a (a, b) VALUES (6, 'a');
This is slightly tricky. On different platforms the row may map to different
partitions depending upon how the values are hashed. So, this test may not be
portable on all the platforms. Probably you should add such testcases with a
custom hash operator class which is identity function as suggested by Robert.
This also applies to the tests in insert.sql and update.sql for partitioned
table without custom opclass.

+-- delete the faulting row and also add a constraint to skip the scan
+ALTER TABLE hpart_5 ADD CONSTRAINT hcheck_a CHECK (a IN (5)), ALTER a
SET NOT NULL;
The constraint is not same as the implicit constraint added for that partition.
I am not sure whether it's really going to avoid the scan. Did you verify it?
If yes, then how?

+ALTER TABLE hash_parted2 ATTACH PARTITION fail_part FOR VALUES WITH
(modulus 3, remainder 2);
+ERROR:  every hash partition modulus must be a factor of the next
larger modulus
We should add this test with at least two partitions in there so that we can
check lower and upper modulus. Also, testing with some interesting
bounds discussed earlier
in this mail e.g. adding modulus 15 when 5, 10, 60 exist will be better than
testing with 3, 4 and 8.

+ERROR:  cannot use collation for hash partition key column "a"
This seems to indicate that we can not specify collation for hash partition key
column, which isn't true. Column a here can have its collation. What's not
allowed is specifying collation in PARTITION BY clause.
May be reword the error as "cannot use collation for hash partitioning". or
plain "cannot use collation in PARTITION BY clause for hash partitioning".

+ERROR:  invalid bound specification for a list partition
+LINE 1: CREATE TABLE fail_part PARTITION OF list_parted FOR VALUES W...
+                                                        ^
Should the location for this error be that of WITH clause like in case of range
and list partitioned table.

+select tableoid::regclass as part, a from hash_parted order by part;
May be add a % 4 to show clearly that the data really goes to the partitioning
with that remainder.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 5b201d6..15d6170 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -711,32 +711,27 @@ partition_bounds_equal(PartitionKey key,
 		int			greatest_modulus;
 
 		/*
-		 * Compare greatest modulus of hash partition bound which
-		 * is the last element of datums array.
+		 * If two hash partitioned tables have different greatest moduli or
+		 * same moduli with different number of partitions, their partition
+		 * schemes don't match. For hash partitioned table, the greatest
+		 * modulus is given by the last datum and number of partitions is given
+		 * by ndatums.
 		 */
 		if (b1->datums[b1->ndatums - 1][0] != b2->datums[b2->ndatums - 1][0])
 			return false;
 
-		/* Compare number of partitions */
 		if (b1->ndatums != b2->ndatums)
 			return false;
 
 		/*
-		 * If two hash partitioned tables have different greatest moduli or
-		 * same moduli with different number of partitions, their partition
-		 * schemes don't match.  If they have same greatest moduli, and number
-		 * of partitions, they all have same hash partition bound i.e. modulus
-		 * and remainder, and the partitions are ordered by modulus, then by
-		 * remainders, thus indexes array will be an identity i.e. index[i] = i.
-		 * If the partition corresponding to a given remainder exists, it will
-		 * have same index entry for both partitioned tables or if it's missing
-		 * it will be -1.  Thus if indexes array matches, corresponding datums
-		 * array matches. If there are multiple remainders corresponding to a
-		 * given partition, their partitions are ordered by the hash partition
-		 * bound, thus if indexes array matches, both of the tables have same
-		 * indexes arrays, in both the tables remainders corresponding to
-		 * multiple partitions all have same hash partition bound and thus same
-		 * modulus. Thus again if the indexes are same, datums are same.
+		 * We arrange the partitions in the ascending order of their modulus
+		 * and remainders. Also every modulus is factor of next larger modulus.
+		 * This means that the index of a given partition is same as the
+		 * remainder of that partition. Also entries at (remainder + N *
+		 * modulus) positions in indexes array are all same for (modulus,
+		 * remainder) specification for any partition. Thus datums array from
+		 * both the given bounds are same, if and only if their indexes array
+		 * will be same. So, it suffices to compare indexes array.
 		 */
 		greatest_modulus = DatumGetInt32(b1->datums[b1->ndatums - 1][0]);
 		for (i = 0; i < greatest_modulus; i++)
@@ -745,9 +740,12 @@ partition_bounds_equal(PartitionKey key,
 
 #ifdef USE_ASSERT_CHECKING
 		{
+
 			/*
-			 * Hash partition bound stores modulus and remainder at
-			 * b1->datums[i][0] and b1->datums[i][1] position respectively.
+			 * Nonetheless make sure that the bounds are indeed same when the
+			 * indexes match.  Hash partition bound stores modulus and
+			 * remainder at b1->datums[i][0] and b1->datums[i][1] position
+			 * respectively.
 			 */
 			for (i = 0; i < b1->ndatums; i++)
 				Assert((b1->datums[i][0] == b2->datums[i][0] &&
diff --git a/src/test/regress/sql/create_table.sql b/src/test/regress/sql/create_table.sql
index 834cba2..06ee5f0 100644
--- a/src/test/regress/sql/create_table.sql
+++ b/src/test/regress/sql/create_table.sql
@@ -320,9 +320,8 @@ CREATE TABLE partitioned (
 	a int
 ) PARTITION BY RANGE (a, a);
 
--- cannot have collation for hash partition key column (although grammar allows).
--- Since hash opclasses provide only equality, not ordering, so that collation
--- is irrelevant here.
+-- Since hash opclasses provide only equality and not ordering, collation
+-- is irrelevant for hash partitioning.
 CREATE TABLE partitioned (
 	a text
 ) PARTITION BY HASH (a collate "C");

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [POC] hash partitioning

Reply via email to