On Wed, Mar 1, 2017 at 3:50 PM, Yugo Nagata <nag...@sraoss.co.jp> wrote:
> [....] > > I Agree that it is unavoidable partitions number in modulo hashing, > > but we can do in other hashing technique. Have you had thought about > > Linear hashing[1] or Consistent hashing[2]? This will allow us to > > add/drop > > partition with minimal row moment. > > Thank you for your information of hash technique. I'll see them > and try to allowing the number of partitions to be changed. > > Thanks for showing interest, I was also talking about this with Robert Haas and hacking on this, here is what we came up with this. If we want to introduce hash partitioning without syntax contort and minimal movement while changing hash partitions (ADD-DROP/ATTACH-DETACH operation), at start I thought we could pick up linear hashing, because of in both the hashing we might need to move approx tot_num_of_tuple/tot_num_of_partitions tuples at adding new partition and no row moment required at dropping partitioning. With further thinking and talking through the idea of using linear hashing with my team, we realized that has some problems specially during pg_dump and pg_upgrade. Both a regular pg_dump and the binary-upgrade version of pg_dump which is used by pg_restore need to maintain the identity of the partitions. We can't rely on things like OID order which may be unstable, or name order which might not match the order in which partitions were added. So somehow the partition position would need to be specified explicitly. So later we came up with some syntax like this (just fyi, this doesn't add any new keywords): create table foo (a integer, b text) partition by hash (a); create table foo1 partition of foo with (modulus 4, remainder 0); create table foo2 partition of foo with (modulus 8, remainder 1); -- legal, modulus doesn't need to match create table foo3 partition of foo with (modulus 8, remainder 4); -- illegal, overlaps foo1 Here we need to enforce a rule that every modulus must be a factor of the next larger modulus. So, for example, if you have a bunch of partitions that all have modulus 5, you can add a new partition with modulus 10 or a new partition with modulus 15, but you cannot add both a partition with modulus 10 and a partition with modulus 15, because 10 is not a factor of 15. However, you could simultaneously use modulus 4, modulus 8, modulus 16, and modulus 32 if you wished, because each modulus is a factor of the next larger one. You could also use modulus 10, modulus 20, and modulus 60. But you could not use modulus 10, modulus 15, and modulus 60, because while both of the smaller module are factors of 60, it is not true that each is a factor of the next. Other advantages with this rule are: 1. Dropping (or detaching) and adding (or attaching) a partition can never cause the rule to be violated. 2. We can easily build a tuple-routing data structure based on the largest modulus. For example: If the user has partition 1 with (modulus 2, remainder 1), partition 2 with (modulus 4, remainder 2), partition 3 with (modulus 8, remainder 0) and partition 4 with (modulus 8, remainder 4), then we can build the following tuple routing array in the relcache: == lookup table for hashvalue % 8 == 0 => p3 1 => p1 2 => p2 3 => p1 4 => p4 5 => p1 6 => p2 7 => p1 3. It's also quite easy to test with a proposed new partition overlaps with any existing partition. Just build the mapping array and see if you ever end up trying to assign a partition to a slot that's already been assigned to some other partition. We can still work on the proposed syntax - and I am open for suggestions. One more thought is to use FOR VALUES HAVING like: CREATE TABLE foo1 PARTITION OF foo FOR VALUES HAVING (modulus 2, remainder 1); But still more thoughts/inputs welcome here. Attached patch implements former syntax, here is quick demonstration: 1.CREATE : create table foo (a integer, b text) partition by hash (a); create table foo1 partition of foo with (modulus 2, remainder 1); create table foo2 partition of foo with (modulus 4, remainder 2); create table foo3 partition of foo with (modulus 8, remainder 0); create table foo4 partition of foo with (modulus 8, remainder 4); 2. Display parent table info: postgres=# \d+ foo Table "public.foo" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description --------+---------+-----------+----------+---------+----------+--------------+------------- a | integer | | | | plain | | b | text | | | | extended | | Partition key: HASH (a) Partitions: foo1 WITH (modulus 2, remainder 1), foo2 WITH (modulus 4, remainder 2), foo3 WITH (modulus 8, remainder 0), foo4 WITH (modulus 8, remainder 4) 3. Display child table info: postgres=# \d+ foo1 Table "public.foo1" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description --------+---------+-----------+----------+---------+----------+--------------+------------- a | integer | | | | plain | | b | text | | | | extended | | Partition of: foo WITH (modulus 2, remainder 1) 4. INSERT: postgres=# insert into foo select i, 'abc' from generate_series(1,10) i; INSERT 0 10 postgres=# select tableoid::regclass as part, * from foo; part | a | b ------+----+----- foo1 | 3 | abc foo1 | 4 | abc foo1 | 7 | abc foo1 | 10 | abc foo2 | 1 | abc foo2 | 2 | abc foo2 | 9 | abc foo3 | 6 | abc foo4 | 5 | abc foo4 | 8 | abc (10 rows) TODOs. 1. Maybe need some work in the CREATE TABLE .. PARTITION OF .. syntax. 2. Trim regression tests (if require). 3. Documentation Thoughts/Comments?
hash-partitioning_another_design-v1.patch
Description: Binary data
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers