Re: [HACKERS] [POC] hash partitioning

2017-11-09 Thread amul sul
On Fri, Nov 10, 2017 at 4:41 AM, Robert Haas  wrote:
> On Wed, Nov 1, 2017 at 6:16 AM, amul sul  wrote:
>> Fixed in the 0003 patch.
>
> I have committed this patch set with the attached adjustments.
>

Thanks a lot for your support & a ton of thanks to all reviewers.

Regards,
Amul




Re: [HACKERS] [POC] hash partitioning

2017-11-09 Thread Robert Haas
On Wed, Nov 1, 2017 at 6:16 AM, amul sul  wrote:
> Fixed in the 0003 patch.

I have committed this patch set with the attached adjustments.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


hash-adjustments.patch
Description: Binary data



Re: [HACKERS] [POC] hash partitioning

2017-11-02 Thread Robert Haas
On Thu, Nov 2, 2017 at 1:52 PM, Ashutosh Bapat
 wrote:
> Right now partition keys are immutable, but we don't have much code
> written with that assumption. All the code usually keeps a lock on the
> parent for as long as it uses the information in the partition key. In
> a distant future, which may never come, we may support ALTER TABLE ...
> PARTITION BY to change partition keys (albeit at a huge cost of data
> movement). If we do that, we will have to remember this one-off
> instance of code which assumes that the partition keys are immutable.

I am pretty sure this is by no means the only piece of code which assumes that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [POC] hash partitioning

2017-11-02 Thread Robert Haas
On Thu, Nov 2, 2017 at 1:45 PM, amul sul  wrote:
> Yes, you are correct, the columns involved in the partitioning are immutable.
>
> I was just worried about any change in the partition key column that
> might change the selected hash function.

Any such change, even if it were allowed, would have to take
AccessExclusiveLock on the child.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [POC] hash partitioning

2017-11-02 Thread Ashutosh Bapat
On Thu, Nov 2, 2017 at 1:35 PM, Robert Haas  wrote:
> On Wed, Nov 1, 2017 at 3:46 PM, amul sul  wrote:
>> Although partition constraints become simpler, there isn't any performance
>> gain with the 0005 patch.  Also, I am a little skeptical about the logic in
>> 0005 where we copied extended hash function info from the partition key;
>> what if the parent is changed while we are using it?  Do we need to keep a
>> lock on the parent until commit in satisfies_hash_partition?
>
> I don't think it should be possible for the parent to be changed.  I
> mean, the partition key is altogether immutable -- it can't be changed
> after creation time.  The partition bounds can be changed for
> individual partitions but that would require a lock on the partition.
>
> Can you give an example of the kind of scenario about which you are concerned?

Right now partition keys are immutable, but we don't have much code
written with that assumption. All the code usually keeps a lock on the
parent for as long as it uses the information in the partition key. In
a distant future, which may never come, we may support ALTER TABLE ...
PARTITION BY to change partition keys (albeit at a huge cost of data
movement). If we do that, we will have to remember this one-off
instance of code which assumes that the partition keys are immutable.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] [POC] hash partitioning

2017-11-02 Thread amul sul
On Thu, Nov 2, 2017 at 1:35 PM, Robert Haas  wrote:
> On Wed, Nov 1, 2017 at 3:46 PM, amul sul  wrote:
>> Although partition constraints become simpler, there isn't any performance
>> gain with the 0005 patch.  Also, I am a little skeptical about the logic in
>> 0005 where we copied extended hash function info from the partition key;
>> what if the parent is changed while we are using it?  Do we need to keep a
>> lock on the parent until commit in satisfies_hash_partition?
>
> I don't think it should be possible for the parent to be changed.  I
> mean, the partition key is altogether immutable -- it can't be changed
> after creation time.  The partition bounds can be changed for
> individual partitions but that would require a lock on the partition.
>
> Can you give an example of the kind of scenario about which you are concerned?
>

Yes, you are correct, the columns involved in the partitioning are immutable.

I was just worried about any change in the partition key column that
might change the selected hash function.

Regards,
Amul




Re: [HACKERS] [POC] hash partitioning

2017-11-02 Thread Robert Haas
On Wed, Nov 1, 2017 at 3:46 PM, amul sul  wrote:
> Although partition constraints become simpler, there isn't any performance
> gain with the 0005 patch.  Also, I am a little skeptical about the logic in
> 0005 where we copied extended hash function info from the partition key;
> what if the parent is changed while we are using it?  Do we need to keep a
> lock on the parent until commit in satisfies_hash_partition?

I don't think it should be possible for the parent to be changed.  I
mean, the partition key is altogether immutable -- it can't be changed
after creation time.  The partition bounds can be changed for
individual partitions but that would require a lock on the partition.

Can you give an example of the kind of scenario about which you are concerned?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [POC] hash partitioning

2017-10-30 Thread amul sul
On Tue, Oct 31, 2017 at 9:54 AM, Robert Haas  wrote:
> On Mon, Oct 30, 2017 at 5:52 PM, amul sul  wrote:
>> Actually, int4[] is also an inappropriate type as we have started using a
>> 64-bit hash function.  We need something like int8[], which is not
>> available, so I have used ANYARRAYOID in the attached patch (0004).
>
> I don't know why you think int8[] is not available.
>
> rhaas=# select 'int8[]'::regtype;
>  regtype
> ----------
>  bigint[]
> (1 row)
>

I missed _int8; I was searching for INT8ARRAYOID in pg_type.h, my bad.

>>> I wrote the following query
>>> to detect problems of this type, and I think we might want to just go
>>> ahead and add this to the regression test suite, verifying that it
>>> returns no rows:
>>>
>>> select oid::regprocedure, provariadic::regtype, proargtypes::regtype[]
>>> from pg_proc where provariadic != 0
>>> and case proargtypes[array_length(proargtypes, 1)-1]
>>> when 2276 then 2276 -- any -> any
>>> when 2277 then 2283 -- anyarray -> anyelement
>>> else (select t.oid from pg_type t where t.typarray =
>>> proargtypes[array_length(proargtypes, 1)-1]) end
>>> != provariadic;
>>>
>>
>> Added in 0001 patch.
>
> Committed.
>

Thanks !

>> One advantage of the current implementation is that we can see which hash
>> functions are used for each partitioning column, and also we don't need to
>> worry about user-specified opclasses and different input types.
>>
>> I tried something similar in my initial patch version[1], but I had missed
>> user-specified opclass handling for each partitioning column.  Do you want me
>> to handle the opclass using a RelabelType node?  I am afraid that would make
>> the \d+ output more horrible than the current one if a non-default opclass is
>> used.
>
> Maybe we should just pass the OID of the partition (or both the
> partition and the parent, so we can get the lock ordering right?)
> instead.
>
Okay, will try this.


Regards,
Amul




Re: [HACKERS] [POC] hash partitioning

2017-10-30 Thread Robert Haas
On Mon, Oct 30, 2017 at 5:52 PM, amul sul  wrote:
> Actually, int4[] is also an inappropriate type as we have started using a
> 64-bit hash function.  We need something like int8[], which is not
> available, so I have used ANYARRAYOID in the attached patch (0004).

I don't know why you think int8[] is not available.

rhaas=# select 'int8[]'::regtype;
 regtype
----------
 bigint[]
(1 row)

>> I wrote the following query
>> to detect problems of this type, and I think we might want to just go
>> ahead and add this to the regression test suite, verifying that it
>> returns no rows:
>>
>> select oid::regprocedure, provariadic::regtype, proargtypes::regtype[]
>> from pg_proc where provariadic != 0
>> and case proargtypes[array_length(proargtypes, 1)-1]
>> when 2276 then 2276 -- any -> any
>> when 2277 then 2283 -- anyarray -> anyelement
>> else (select t.oid from pg_type t where t.typarray =
>> proargtypes[array_length(proargtypes, 1)-1]) end
>> != provariadic;
>>
>
> Added in 0001 patch.

Committed.

> One advantage of the current implementation is that we can see which hash
> functions are used for each partitioning column, and also we don't need to
> worry about user-specified opclasses and different input types.
>
> I tried something similar in my initial patch version[1], but I had missed
> user-specified opclass handling for each partitioning column.  Do you want me
> to handle the opclass using a RelabelType node?  I am afraid that would make
> the \d+ output more horrible than the current one if a non-default opclass is
> used.

Maybe we should just pass the OID of the partition (or both the
partition and the parent, so we can get the lock ordering right?)
instead.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [POC] hash partitioning

2017-10-29 Thread Robert Haas
On Tue, Oct 24, 2017 at 1:21 PM, amul sul  wrote:
> Updated patch attached.

This patch needs a rebase.

It appears that satisfies_hash_func is declared incorrectly in
pg_proc.h.  ProcedureCreate seems to think that provariadic should be
ANYOID if the type of the last element is ANYOID, ANYELEMENTOID if the
type of the last element is ANYARRAYOID, and otherwise the element
type corresponding to the array type.   But here you have the last
element as int4[] but provariadic is any.  I wrote the following query
to detect problems of this type, and I think we might want to just go
ahead and add this to the regression test suite, verifying that it
returns no rows:

select oid::regprocedure, provariadic::regtype, proargtypes::regtype[]
from pg_proc where provariadic != 0
and case proargtypes[array_length(proargtypes, 1)-1]
when 2276 then 2276 -- any -> any
when 2277 then 2283 -- anyarray -> anyelement
else (select t.oid from pg_type t where t.typarray =
proargtypes[array_length(proargtypes, 1)-1]) end
!= provariadic;
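
When the catalog is consistent this query is expected to return nothing,
roughly:

 oid | provariadic | proargtypes
-----+-------------+-------------
(0 rows)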

The simple fix is to change provariadic to int4 and call it good.  It's
tempting to go the other way and actually make it
satisfies_hash_partition(int4, int4, variadic "any"), passing the
column values directly and letting satisfies_hash_partition doing the
hashing itself.  Any arguments that had a partition key type different
from the column type would have a RelabelType node placed on top of
the column, so that get_fn_expr_argtype would return the partition key
type.  Then, the function could look up the hash function for that
type and call it directly on the value.  That way, we'd be doing only
one function call instead of many, and the partition constraint would
look nicer in \d+ output, too.  :-)  On the other hand, that would
also mean that we'd have to look up the extended hash function every
time through this function, though maybe that could be prevented by
using fn_extra to cache FmgrInfos for all the hash functions on the
first time through.  I'm not sure how that would compare in terms of
speed with what you have now, but maybe it's worth trying.
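
To make the fn_extra idea concrete, here is a rough, hypothetical sketch; the
HashPartCache name and layout are invented for illustration and are not from
the patch, and the actual hashing and lookup steps are elided:

typedef struct HashPartCache
{
    int         nkeys;
    FmgrInfo    hashfn[PARTITION_MAX_KEYS];
} HashPartCache;

Datum
satisfies_hash_partition(PG_FUNCTION_ARGS)
{
    HashPartCache *cache = (HashPartCache *) fcinfo->flinfo->fn_extra;

    if (cache == NULL)
    {
        /* First call: allocate in fn_mcxt so the cache survives across rows. */
        cache = MemoryContextAllocZero(fcinfo->flinfo->fn_mcxt,
                                       sizeof(HashPartCache));
        /*
         * ... look up the extended hash function for each argument's type
         * (via get_fn_expr_argtype()) and fmgr_info() it into hashfn[] ...
         */
        fcinfo->flinfo->fn_extra = (void *) cache;
    }

    /* ... hash each column, combine, and test modulus/remainder ... */
    PG_RETURN_BOOL(true);     /* placeholder result in this sketch */
}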

The second paragraph of the CREATE TABLE documentation for PARTITION
OF needs to be updated like this: "The form with IN
is used for list partitioning, the form with FROM
and TO is used for range partitioning, and the form
with WITH is used for hash partitioning."

The CREATE TABLE documentation says "When using range partitioning,
the partition key can include multiple columns or expressions (up to
32,"; this should be changed to say "When using range or hash
partitioning".

-  expression.  If no B-tree operator class is specified when creating a
-  partitioned table, the default B-tree operator class for the datatype will
-  be used.  If there is none, an error will be reported.
+  expression.  If no operator class is specified when creating a partitioned
+  table, the default operator class of the appropriate type (btree for list
+  and range partitioning, hash for hash partitioning) will be used.  If
+  there is none, an error will be reported.
+ 
+
+ 
+  Since hash operator class provides only equality, not ordering, collation
+  is not relevant for hash partitioning. The behaviour will be unaffected
+  if a collation is specified.
+ 
+
+ 
+  Hash partitioning will use support function 2 routines from the operator
+  class. If there is none, an error will be reported.  See  for details of operator class support
+  functions.

I think we should rework this a little more heavily.  I suggest the
following, starting after "a single column or expression":


Range and list partitioning require a btree operator class, while hash
partitioning requires a hash operator class.  If no operator class is
specified explicitly, the default operator class of the appropriate
type will be used; if no default operator class exists, an error will
be raised.  When hash partitioning is used, the operator class used
must implement support function 2 (see 
for details).


I think we can leave out the part about collations.  It's possibly
worth a longer explanation here at some point: for range partitioning,
collation can affect which rows go into which partitions; for list
partitioning, it can't, but it can affect the order in which
partitions are expanded (which is a can of worms I'm not quite ready
to try to explain in user-facing documentation); for hash
partitioning, it makes no difference at all.  Although at some point
we may want to document this, I think it's a job for a separate patch,
since (1) the existing documentation doesn't document the precise
import of collations on existing partitioning types and (2) I'm not
sure that CREATE TABLE is really the best place to explain this.

The example commands for creating a hash-partitioned table are missing
spaces between WITH and the parenthesis which follows.
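
For reference, a corrected pair of example commands might look like this
(the table and column names here are made up, not taken from the patch's docs):

CREATE TABLE measurement (city_id int, logdate date) PARTITION BY HASH (city_id);

CREATE TABLE measurement_p0 PARTITION OF measurement
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE measurement_p1 PARTITION OF measurement
    FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE measurement_p2 PARTITION OF measurement
    FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE measurement_p3 PARTITION OF measurement
    FOR VALUES WITH (MODULUS 4, REMAINDER 3);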

In 0003, the changes to partition_bounds_copy 

Re: [HACKERS] [POC] hash partitioning

2017-10-24 Thread amul sul
On Tue, Oct 24, 2017 at 5:00 PM, Andres Freund  wrote:
> On 2017-10-24 12:43:12 +0530, amul sul wrote:
>> I tried to get the suggested SMHasher[1] test results for the 32-bit and
>> 64-bit versions of hash_combine.
>>
>> SMHasher works on hash keys of the form {0}, {0,1}, {0,1,2}... up to
>> N=255, using 256-N as the seed.  For the hash_combine testing we needed
>> two hash values to be combined; for that, I've generated 64- and 128-bit
>> hashes using the cityhash functions[2] for the given SMHasher key and then
>> split them into two parts to test the 32-bit and 64-bit hash_combine
>> functions respectively.  Attached are a patch with the SMHasher code
>> changes and the output of the 32-bit and 64-bit hash_combine testing.
>> Note that I have skipped the speed test, which is irrelevant here.
>>
>> By comparing with other hash function results [3], we can see that the
>> hash_combine test results are not bad either.
>>
>> Do let me know if the current testing is not good enough or if you want
>> me to do more testing, thanks.
>
> This looks very good! Both the tests you did, and the results for
> hash_combineXX. I therefore think we can go ahead with that formulation
> of hash_combine64?
>

Thanks, Andres. Yes we can; I've added your suggested hash_combine64 in
the latest patch[1].

Regards,
Amul

1] 
https://postgr.es/m/CAAJ_b97R2rJinGPAVmZZzpNV%3D-5BgYFxDfY9HYdM1bCYJFGmQw%40mail.gmail.com




Re: [HACKERS] [POC] hash partitioning

2017-10-24 Thread Andres Freund
On 2017-10-24 12:43:12 +0530, amul sul wrote:
> I tried to get the suggested SMHasher[1] test results for the 32-bit and
> 64-bit versions of hash_combine.
> 
> SMHasher works on hash keys of the form {0}, {0,1}, {0,1,2}... up to
> N=255, using 256-N as the seed.  For the hash_combine testing we needed
> two hash values to be combined; for that, I've generated 64- and 128-bit
> hashes using the cityhash functions[2] for the given SMHasher key and then
> split them into two parts to test the 32-bit and 64-bit hash_combine
> functions respectively.  Attached are a patch with the SMHasher code
> changes and the output of the 32-bit and 64-bit hash_combine testing.
> Note that I have skipped the speed test, which is irrelevant here.
> 
> By comparing with other hash function results [3], we can see that the
> hash_combine test results are not bad either.
> 
> Do let me know if the current testing is not good enough or if you want
> me to do more testing, thanks.

This looks very good! Both the tests you did, and the results for
hash_combineXX. I therefore think we can go ahead with that formulation
of hash_combine64?

Greetings,

Andres Freund




Re: [HACKERS] [POC] hash partitioning

2017-10-24 Thread amul sul
On Fri, Oct 13, 2017 at 3:00 AM, Andres Freund  wrote:
> On 2017-10-12 17:27:52 -0400, Robert Haas wrote:
>> On Thu, Oct 12, 2017 at 4:20 PM, Andres Freund  wrote:
>> >> In other words, it's not utterly fixed in stone --- we invented
>> >> --load-via-partition-root primarily to cope with circumstances that
>> >> could change hash values --- but we sure don't want to be changing it
>> >> with any regularity, or for a less-than-excellent reason.
>> >
>> > Yea, that's what I expected. It'd probably be good for somebody to run
>> > smhasher or such on the output of the combine function (or even better,
>> > on both the 32 and 64 bit variants) in that case.
>>
>> Not sure how that test suite works exactly, but presumably the
>> characteristics in practice will depend on the behavior of the hash
>> functions used as input to the combine function - so the behavior could
>> be good for an (int, int) key but bad for a (text, date) key, or
>> whatever.
>
> I don't think that's true, unless you have really bad hash functions on
> the component hashes. A hash combine function can't really do
> anything about badly hashed input; what you want is that it doesn't
> *reduce* the quality of the hash by combining.
>

I tried to get the suggested SMHasher[1] test results for the 32-bit and
64-bit versions of hash_combine.

SMHasher works on hash keys of the form {0}, {0,1}, {0,1,2}... up to
N=255, using 256-N as the seed.  For the hash_combine testing we needed
two hash values to be combined; for that, I've generated 64- and 128-bit
hashes using the cityhash functions[2] for the given SMHasher key and then
split them into two parts to test the 32-bit and 64-bit hash_combine
functions respectively.  Attached are a patch with the SMHasher code
changes and the output of the 32-bit and 64-bit hash_combine testing.
Note that I have skipped the speed test, which is irrelevant here.

By comparing with other hash function results [3], we can see that the
hash_combine test results are not bad either.

Do let me know if the current testing is not good enough or if you want
me to do more testing, thanks.

1] https://github.com/aappleby/smhasher
2] https://github.com/aappleby/smhasher/blob/master/src/CityTest.cpp
3] https://github.com/rurban/smhasher/tree/master/doc

Regards,
Amul
[amul.sul@power2 src]$ ./SMHasher hash_combine64
---
--- Testing hash_combine64 (test hash combine 64 pg function.)

[[[ Sanity Tests ]]]   

Verification value 0x3B439A64 : Passed!
Running sanity check 1..PASS
Running sanity check 2..PASS

[[[ Differential Tests ]]]

Testing 8303632 up-to-5-bit differentials in 64-bit keys -> 64 bit hashes.
1000 reps, 8303632000 total tests, expecting 0.00 random collisions..
0 total collisions, of which 0 single collisions were ignored

Testing 11017632 up-to-4-bit differentials in 128-bit keys -> 64 bit hashes.
1000 reps, 11017632000 total tests, expecting 0.00 random collisions..
0 total collisions, of which 0 single collisions were ignored

Testing 2796416 up-to-3-bit differentials in 256-bit keys -> 64 bit hashes.
1000 reps, 2796416000 total tests, expecting 0.00 random collisions..
0 total collisions, of which 0 single collisions were ignored


[[[ Avalanche Tests ]]]

Testing  32-bit keys ->  64-bit hashes,   30 reps.. worst bias is 0.742667%
Testing  40-bit keys ->  64-bit hashes,   30 reps.. worst bias is 0.684667%
Testing  48-bit keys ->  64-bit hashes,   30 reps.. worst bias is 0.570667%
Testing  56-bit keys ->  64-bit hashes,   30 reps.. worst bias is 0.726667%
Testing  64-bit keys ->  64-bit hashes,   30 reps.. worst bias is 0.67%
Testing  72-bit keys ->  64-bit hashes,   30 reps.. worst bias is 0.659333%
Testing  80-bit keys ->  64-bit hashes,   30 reps.. worst bias is 0.666000%
Testing  88-bit keys ->  64-bit hashes,   30 reps.. worst bias is 0.712000%
Testing  96-bit keys ->  64-bit hashes,   30 reps.. worst bias is 0.632667%
Testing 104-bit keys ->  64-bit hashes,   30 reps.. worst bias is 0.646000%
Testing 112-bit keys ->  64-bit hashes,   30 reps.. worst bias is 0.746000%
Testing 120-bit keys ->  64-bit hashes,   30 reps.. worst bias is 0.692667%
Testing 128-bit keys ->  64-bit hashes,   30 reps.. worst bias is 0.684000%
Testing 136-bit keys ->  64-bit hashes,   30 reps.. worst bias is 0.758667%
Testing 144-bit keys ->  64-bit hashes,   30 reps.. worst bias is 0.725333%
Testing 152-bit keys ->  64-bit hashes,   30 reps.. worst bias is 0.776000%

[[[ Keyset 'Cyclic' Tests ]]]

Keyset 'Cyclic' - 8 cycles of 8 bytes - 1000 keys
Testing collisions   - Expected 0.00, actual 0.00 ( 0.00x)
Testing distribution - Worst bias is the  20-bit window at bit   6 - 0.039%

Keyset 'Cyclic' - 8 cycles of 9 bytes 

Re: [HACKERS] [POC] hash partitioning

2017-10-16 Thread Ashutosh Bapat
On Mon, Oct 16, 2017 at 2:36 PM, Ashutosh Bapat
 wrote:

>
> Probably we should move changes to partition_bounds_copy() in 0003 to
> 0001, whose name suggests so.
>

We can't do this; the hash partition strategy is introduced by 0002. Sorry
for the noise.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] [POC] hash partitioning

2017-10-16 Thread Ashutosh Bapat
On Tue, Oct 10, 2017 at 4:37 PM, amul sul  wrote:
> On Tue, Oct 10, 2017 at 3:42 PM, Ashutosh Bapat
>  wrote:
>> On Tue, Oct 10, 2017 at 3:32 PM, amul sul  wrote:
>>
 +hash_part? true : key->parttypbyval[j],
 +key->parttyplen[j]);
 parttyplen is the length of partition key attribute, whereas what you want here
 is the length of type of modulus and remainder. Is that correct? Probably we
 need some special handling wherever parttyplen and parttypbyval is used e.g. in
 call to partition_bounds_equal() from build_joinrel_partition_info().

>>>
>>> Unless I am missing something, I don't think we should worry about parttyplen
>>> because in the datumCopy() when the datatype is pass-by-value then typelen
>>> is ignored.
>>
>> That's true, but it's ugly, passing typbyvalue of one type and len of the other.
>>
>
> How about the attached patch(0003)?
> Also, the dim variable is renamed to natts.

Probably we should move changes to partition_bounds_copy() in 0003 to
0001, whose name suggests so.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] [POC] hash partitioning

2017-10-12 Thread Andres Freund
On 2017-10-12 17:27:52 -0400, Robert Haas wrote:
> On Thu, Oct 12, 2017 at 4:20 PM, Andres Freund  wrote:
> >> In other words, it's not utterly fixed in stone --- we invented
> >> --load-via-partition-root primarily to cope with circumstances that
> >> could change hash values --- but we sure don't want to be changing it
> >> with any regularity, or for a less-than-excellent reason.
> >
> > Yea, that's what I expected. It'd probably be good for somebody to run
> > smhasher or such on the output of the combine function (or even better,
> > on both the 32 and 64 bit variants) in that case.
> 
> Not sure how that test suite works exactly, but presumably the
> characteristics in practice will depend on the behavior of the hash
> functions used as input to the combine function - so the behavior could
> be good for an (int, int) key but bad for a (text, date) key, or
> whatever.

I don't think that's true, unless you have really bad hash functions on
the component hashes. A hash combine function can't really do
anything about badly hashed input; what you want is that it doesn't
*reduce* the quality of the hash by combining.

Greetings,

Andres Freund




Re: [HACKERS] [POC] hash partitioning

2017-10-12 Thread Robert Haas
On Thu, Oct 12, 2017 at 4:20 PM, Andres Freund  wrote:
>> In other words, it's not utterly fixed in stone --- we invented
>> --load-via-partition-root primarily to cope with circumstances that
>> could change hash values --- but we sure don't want to be changing it
>> with any regularity, or for a less-than-excellent reason.
>
> Yea, that's what I expected. It'd probably be good for somebody to run
> smhasher or such on the output of the combine function (or even better,
> on both the 32 and 64 bit variants) in that case.

Not sure how that test suite works exactly, but presumably the
characteristics in practice will depend on the behavior of the hash
functions used as input to the combine function - so the behavior could
be good for an (int, int) key but bad for a (text, date) key, or
whatever.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [POC] hash partitioning

2017-10-12 Thread Andres Freund
On 2017-10-12 16:06:11 -0400, Robert Haas wrote:
> On Thu, Oct 12, 2017 at 3:43 PM, Andres Freund  wrote:
> > Are we going to rely on the the combine function to stay the same
> > forever after?
> 
> If we change them, it will be a pg_upgrade compatibility break for
> anyone using hash-partitioned tables with more than one partitioning
> column.  Dump and reload will also break unless
> --load-via-partition-root is used.
> 
> In other words, it's not utterly fixed in stone --- we invented
> --load-via-partition-root primarily to cope with circumstances that
> could change hash values --- but we sure don't want to be changing it
> with any regularity, or for a less-than-excellent reason.

Yea, that's what I expected. It'd probably be good for somebody to run
smhasher or such on the output of the combine function (or even better,
on both the 32 and 64 bit variants) in that case.

Greetings,

Andres Freund




Re: [HACKERS] [POC] hash partitioning

2017-10-12 Thread Robert Haas
On Thu, Oct 12, 2017 at 3:43 PM, Andres Freund  wrote:
> Are we going to rely on the combine function to stay the same
> forever after?

If we change them, it will be a pg_upgrade compatibility break for
anyone using hash-partitioned tables with more than one partitioning
column.  Dump and reload will also break unless
--load-via-partition-root is used.

In other words, it's not utterly fixed in stone --- we invented
--load-via-partition-root primarily to cope with circumstances that
could change hash values --- but we sure don't want to be changing it
with any regularity, or for a less-than-excellent reason.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [POC] hash partitioning

2017-10-12 Thread Andres Freund
On 2017-10-12 10:05:26 -0400, Robert Haas wrote:
> On Thu, Oct 12, 2017 at 9:08 AM, amul sul  wrote:
> > How about combining high 32 bits and the low 32 bits separately as shown below?
> >
> > static inline uint64
> > hash_combine64(uint64 a, uint64 b)
> > {
> > return (((uint64) hash_combine((uint32) a >> 32, (uint32) b >> 32) << 32)
> > | hash_combine((unit32) a, (unit32) b));
> > }
> 
> I doubt that's the best approach, but I don't have something specific
> to recommend.

Yea, that doesn't look great. There's basically no intermixing between
low and high 32 bits going on.  We probably should just expand the
concept of the 32 bit function:

static inline uint32
hash_combine32(uint32 a, uint32 b)
{
/* 0x9e3779b9 is the golden ratio reciprocal */
a ^= b + 0x9e3779b9 + (a << 6) + (a >> 2);
return a;
}

to something roughly like:

static inline uint64
hash_combine64(uint64 a, uint64 b)
{
/* 0x49A0F4DD15E5A8E3 is 64bit random data */
a ^= b + 0x49A0F4DD15E5A8E3 + (a << 54) + (a >> 7);
return a;
}

In contrast to the 32 bit version's fancy use of the golden ratio
reciprocal as a constant, I went brute force and just used 64 bits of
/dev/random. From my understanding, the important property is that the
bits are independent from each other, nothing else.

The shift widths are fairly random, but they should bring in enough bit
perturbation when mixing in only 32 bits of hash value (i.e.
0x).
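
For anyone who wants to experiment with these outside the tree, here is a
self-contained C sketch of the two combine functions exactly as written above;
the main() and its sample inputs are purely illustrative:

#include <stdint.h>
#include <stdio.h>

static inline uint32_t
hash_combine32(uint32_t a, uint32_t b)
{
    /* 0x9e3779b9 is the golden ratio reciprocal */
    a ^= b + 0x9e3779b9 + (a << 6) + (a >> 2);
    return a;
}

static inline uint64_t
hash_combine64(uint64_t a, uint64_t b)
{
    /* 0x49A0F4DD15E5A8E3 is 64 bits of random data */
    a ^= b + UINT64_C(0x49A0F4DD15E5A8E3) + (a << 54) + (a >> 7);
    return a;
}

int
main(void)
{
    /* Fold two arbitrary per-column hash values into one row hash. */
    uint64_t h = hash_combine64(0, UINT64_C(0xDEADBEEFCAFEBABE));

    h = hash_combine64(h, UINT64_C(0x0123456789ABCDEF));
    printf("combine32: %08x\n", hash_combine32(0xDEADBEEF, 0xCAFEBABE));
    printf("combine64: %016llx\n", (unsigned long long) h);
    return 0;
}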

Are we going to rely on the combine function to stay the same
forever after?

Greetings,

Andres Freund




Re: [HACKERS] [POC] hash partitioning

2017-10-12 Thread Robert Haas
On Thu, Oct 12, 2017 at 9:08 AM, amul sul  wrote:
> How about combining high 32 bits and the low 32 bits separately as shown below?
>
> static inline uint64
> hash_combine64(uint64 a, uint64 b)
> {
> return (((uint64) hash_combine((uint32) a >> 32, (uint32) b >> 32) << 32)
> | hash_combine((unit32) a, (unit32) b));
> }

I doubt that's the best approach, but I don't have something specific
to recommend.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [POC] hash partitioning

2017-10-12 Thread amul sul
On Thu, Oct 12, 2017 at 6:31 AM, Robert Haas  wrote:
> On Tue, Oct 10, 2017 at 7:07 AM, amul sul  wrote:
>> How about the attached patch(0003)?
>> Also, the dim variable is renamed to natts.
>
> I'm not sure I believe this comment:
>
> +/*
> + * We arrange the partitions in the ascending order of their modulus
> + * and remainders.  Also every modulus is factor of next larger
> + * modulus.  This means that the index of a given partition is same as
> + * the remainder of that partition.  Also entries at (remainder + N *
> + * modulus) positions in indexes array are all same for (modulus,
> + * remainder) specification for any partition.  Thus datums array from
> + * both the given bounds are same, if and only if their indexes array
> + * will be same.  So, it suffices to compare indexes array.
> + */
>
> I am particularly not sure that I believe that the index of a
> partition must be the same as the remainder.  It doesn't seem like
> that would be true when there is more than one modulus or when some
> partitions are missing.
>

Looks like the explanation in the comment is not good enough; will think on this.

Here are the links for the previous discussion:
1] 
https://postgr.es/m/CAFjFpRfHqSGBjNgJV2p%2BC4Yr5Qxvwygdsg4G_VQ6q9NTB-i3MA%40mail.gmail.com
2] 
https://postgr.es/m/CAFjFpRdeESKFkVGgmOdYvmD3d56-58c5VCBK0zDRjHrkq_VcNg%40mail.gmail.com


> +if (offset < 0)
> +{
> +next_modulus = DatumGetInt32(datums[0][0]);
> +valid_modulus = (next_modulus % spec->modulus) == 0;
> +}
> +else
> +{
> +prev_modulus = DatumGetInt32(datums[offset][0]);
> +valid_modulus = (spec->modulus % prev_modulus) == 0;
> +
> +if (valid_modulus && (offset + 1) < ndatums)
> +{
> +next_modulus = DatumGetInt32(datums[offset + 1][0]);
> +valid_modulus = (next_modulus % spec->modulus) == 0;
> +}
> +}
>
> I don't think this is quite right.  It checks the new modulus against
> prev_modulus whenever prev_modulus is defined, which is correct, but
> it doesn't check the new modulus against the next_modulus except when
> offset < 0.  But actually that check needs to be performed, I think,
> whenever the new modulus is less than the greatest modulus seen so
> far.
>
It does. See the "if (valid_modulus && (offset + 1) < ndatums)" block in the
else part of the snippet that you are referring to.

For example, new moduli 25 and 150 are not accepted for a hash partition bound
with existing moduli 10, 50, 200. Will cover this test as well.
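
To illustrate that factor rule with a rough SQL sketch (the table names are
made up, and the exact error text is not shown):

CREATE TABLE t (a int) PARTITION BY HASH (a);
CREATE TABLE t_10  PARTITION OF t FOR VALUES WITH (MODULUS 10,  REMAINDER 0);
CREATE TABLE t_50  PARTITION OF t FOR VALUES WITH (MODULUS 50,  REMAINDER 1);
CREATE TABLE t_200 PARTITION OF t FOR VALUES WITH (MODULUS 200, REMAINDER 2);

-- rejected: the existing modulus 10 is not a factor of 25
CREATE TABLE t_25  PARTITION OF t FOR VALUES WITH (MODULUS 25,  REMAINDER 3);
-- rejected: 150 is not a factor of the existing larger modulus 200
CREATE TABLE t_150 PARTITION OF t FOR VALUES WITH (MODULUS 150, REMAINDER 3);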

> + * For a partitioned table defined as:
> + *CREATE TABLE simple_hash (a int, b char(10)) PARTITION BY HASH (a, b);
> + *
> + * CREATE TABLE p_p1 PARTITION OF simple_hash
> + *FOR VALUES WITH (MODULUS 2, REMAINDER 1);
> + * CREATE TABLE p_p2 PARTITION OF simple_hash
> + *FOR VALUES WITH (MODULUS 4, REMAINDER 2);
> + * CREATE TABLE p_p3 PARTITION OF simple_hash
> + *FOR VALUES WITH (MODULUS 8, REMAINDER 0);
> + * CREATE TABLE p_p4 PARTITION OF simple_hash
> + *FOR VALUES WITH (MODULUS 8, REMAINDER 4);
> + *
> + * This function will return one of the following in the form of an
> + * expression:
> + *
> + * for p_p1: satisfies_hash_partition(2, 1, hash_fn_1_extended(a, HASH_SEED),
> + *                                          hash_fn_2_extended(b, HASH_SEED))
> + * for p_p2: satisfies_hash_partition(4, 2, hash_fn_1_extended(a, HASH_SEED),
> + *                                          hash_fn_2_extended(b, HASH_SEED))
> + * for p_p3: satisfies_hash_partition(8, 0, hash_fn_1_extended(a, HASH_SEED),
> + *                                          hash_fn_2_extended(b, HASH_SEED))
> + * for p_p4: satisfies_hash_partition(8, 4, hash_fn_1_extended(a, HASH_SEED),
> + *                                          hash_fn_2_extended(b, HASH_SEED))
>
> I think instead of this lengthy example you should try to explain the
> general rule.  Maybe something like: the partition constraint for a
> hash partition is always a call to the built-in function
> satisfies_hash_partition().  The first two arguments are the modulus
> and remainder for the partition; the remaining arguments are the hash
> values computed for each column of the partition key using the
> extended hash function from the appropriate opclass.
>
Okay, will add this.

> +static uint64
> +mix_hash_value(int nkeys, Datum *hash_array, bool *isnull)
>
How about combining high 32 bits and the low 32 bits separately as shown below?

static inline uint64
hash_combine64(uint64 a, uint64 b)
{
return (((uint64) hash_combine((uint32) a >> 32, (uint32) b >> 32) << 32)
| 

Re: [HACKERS] [POC] hash partitioning

2017-10-11 Thread Amit Langote
On 2017/09/30 1:53, Robert Haas wrote:
> On Thu, Sep 28, 2017 at 1:54 AM, Amit Langote
>  wrote:
>> I looked into how satisfies_hash_partition() works and came up with an
>> idea that I think will make constraint exclusion work.  What if we emitted
>> the hash partition constraint in the following form instead:
>>
>> hash_partition_mod(hash_partition_hash(key1-exthash, key2-exthash),
>> <modulus>) = <remainder>
>>
>> With that form, constraint exclusion seems to work as illustrated below:
>>
>> \d+ p0
>> <...>
>> Partition constraint:
>> (hash_partition_modulus(hash_partition_hash(hashint4extended(a,
>> '8816678312871386367'::bigint)), 4) = 0)
>>
>> -- note only p0 is scanned
>> explain select * from p where
>> hash_partition_modulus(hash_partition_hash(hashint4extended(a,
>> '8816678312871386367'::bigint)), 4) = 0;
> 
> What we actually want constraint exclusion to cover is SELECT * FROM p
> WHERE a = 525600;

I agree.

> As Amul says, nobody's going to enter a query in the form you have it
> here.  Life is too short to take time to put queries into bizarre
> forms.

Here too.  I was falsely thinking that satisfies_hash_partition() is
intended to be used for more than just enforcing the partition constraint
when data is directly inserted into a hash partition, or more precisely to
be used in the CHECK constraint of the table that is to be attached as a
hash partition.  Now, we ask users to add such a constraint to avoid the
constraint validation scan, because the system knows how to infer from the
constraint that the partition constraint is satisfied.  I observed however
that, unlike range and list partitioning, the hash partition's constraint
could only ever be implied because of structural equality (equal()'ness)
of the existing constraint expression and the partition constraint
expression.  For example, a more restrictive range or list qual implies
the partition constraint, but it requires performing btree operator based
proof.  The proof is impossible with the chosen structure of hash
partitioning constraint, but it seems that that's OK.  That is, it's OK to
ask users to add the exact constraint (matching modulus and remainder
values in the call to satisfies_hash_partition() specified in the CHECK
constraint) to avoid the validation scan.
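
As an aside, that workflow could look roughly like the following, reusing the
p/p0 example and the constraint shape shown elsewhere in this thread (the name
of the attached table and the seed literal are only illustrative):

ALTER TABLE p0_candidate ADD CONSTRAINT p0_candidate_check
    CHECK (satisfies_hash_partition(4, 0,
           hashint4extended(a, '8816678312871386367'::bigint)));

ALTER TABLE p ATTACH PARTITION p0_candidate
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);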

Thanks,
Amit





Re: [HACKERS] [POC] hash partitioning

2017-10-11 Thread Robert Haas
On Tue, Oct 10, 2017 at 7:07 AM, amul sul  wrote:
> How about the attached patch(0003)?
> Also, the dim variable is renamed to natts.

I'm not sure I believe this comment:

+/*
+ * We arrange the partitions in the ascending order of their modulus
+ * and remainders.  Also every modulus is factor of next larger
+ * modulus.  This means that the index of a given partition is same as
+ * the remainder of that partition.  Also entries at (remainder + N *
+ * modulus) positions in indexes array are all same for (modulus,
+ * remainder) specification for any partition.  Thus datums array from
+ * both the given bounds are same, if and only if their indexes array
+ * will be same.  So, it suffices to compare indexes array.
+ */

I am particularly not sure that I believe that the index of a
partition must be the same as the remainder.  It doesn't seem like
that would be true when there is more than one modulus or when some
partitions are missing.

+if (offset < 0)
+{
+next_modulus = DatumGetInt32(datums[0][0]);
+valid_modulus = (next_modulus % spec->modulus) == 0;
+}
+else
+{
+prev_modulus = DatumGetInt32(datums[offset][0]);
+valid_modulus = (spec->modulus % prev_modulus) == 0;
+
+if (valid_modulus && (offset + 1) < ndatums)
+{
+next_modulus = DatumGetInt32(datums[offset + 1][0]);
+valid_modulus = (next_modulus % spec->modulus) == 0;
+}
+}

I don't think this is quite right.  It checks the new modulus against
prev_modulus whenever prev_modulus is defined, which is correct, but
it doesn't check the new modulus against the next_modulus except when
offset < 0.  But actually that check needs to be performed, I think,
whenever the new modulus is less than the greatest modulus seen so
far.

+ * For a partitioned table defined as:
+ *CREATE TABLE simple_hash (a int, b char(10)) PARTITION BY HASH (a, b);
+ *
+ * CREATE TABLE p_p1 PARTITION OF simple_hash
+ *FOR VALUES WITH (MODULUS 2, REMAINDER 1);
+ * CREATE TABLE p_p2 PARTITION OF simple_hash
+ *FOR VALUES WITH (MODULUS 4, REMAINDER 2);
+ * CREATE TABLE p_p3 PARTITION OF simple_hash
+ *FOR VALUES WITH (MODULUS 8, REMAINDER 0);
+ * CREATE TABLE p_p4 PARTITION OF simple_hash
+ *FOR VALUES WITH (MODULUS 8, REMAINDER 4);
+ *
+ * This function will return one of the following in the form of an
+ * expression:
+ *
+ * for p_p1: satisfies_hash_partition(2, 1, hash_fn_1_extended(a, HASH_SEED),
+ *                                          hash_fn_2_extended(b, HASH_SEED))
+ * for p_p2: satisfies_hash_partition(4, 2, hash_fn_1_extended(a, HASH_SEED),
+ *                                          hash_fn_2_extended(b, HASH_SEED))
+ * for p_p3: satisfies_hash_partition(8, 0, hash_fn_1_extended(a, HASH_SEED),
+ *                                          hash_fn_2_extended(b, HASH_SEED))
+ * for p_p4: satisfies_hash_partition(8, 4, hash_fn_1_extended(a, HASH_SEED),
+ *                                          hash_fn_2_extended(b, HASH_SEED))

I think instead of this lengthy example you should try to explain the
general rule.  Maybe something like: the partition constraint for a
hash partition is always a call to the built-in function
satisfies_hash_partition().  The first two arguments are the modulus
and remainder for the partition; the remaining arguments are the hash
values computed for each column of the partition key using the
extended hash function from the appropriate opclass.
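
(For illustration, with a single int column such a constraint would display in
\d+ roughly as in the example quoted elsewhere in this thread:
satisfies_hash_partition(4, 0, hashint4extended(a, '8816678312871386367'::bigint)).)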

+static uint64
+mix_hash_value(int nkeys, Datum *hash_array, bool *isnull)

It would be nice to use the hash_combine() facility Andres recently
added for this rather than having a way to do it that is specific to
hash partitioning, but that function only works for 32-bit hash
values.  Maybe we can persuade Andres to add a hash_combine64...
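
Purely as a sketch of what that could look like with a 64-bit combiner (the
NULL handling and names below are assumptions for illustration, not
necessarily what the patch does):

static uint64
mix_hash_value(int nkeys, Datum *hash_array, bool *isnull)
{
    uint64      rowHash = 0;
    int         i;

    for (i = 0; i < nkeys; i++)
    {
        if (isnull[i])
            continue;           /* assumption: NULL columns contribute nothing */
        rowHash = hash_combine64(rowHash, DatumGetUInt64(hash_array[i]));
    }

    return rowHash;
}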

+ * a hash operator class

Missing period at end.

+if (strategy == PARTITION_STRATEGY_HASH)
+ereport(ERROR,
+(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
+ errmsg("default hash partition is not supported")));

Maybe errmsg("a hash-partitioned table may not have a default partition")?

+/* Seed for the extended hash function */
+#define HASH_SEED UINT64CONST(0x7A5B22367996DCFF)

I suggest HASH_PARTITION_SEED -- this is too generic.

Have you checked how well the tests you've added cover the code you've
added?  What code is not covered by the tests, and is there any way to
cover it?

Thanks,

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [POC] hash partitioning

2017-10-10 Thread amul sul
On Tue, Oct 10, 2017 at 3:42 PM, Ashutosh Bapat
 wrote:
> On Tue, Oct 10, 2017 at 3:32 PM, amul sul  wrote:
>
>>> +hash_part? true : key->parttypbyval[j],
>>> +key->parttyplen[j]);
>>> parttyplen is the length of partition key attribute, whereas what you want here
>>> is the length of type of modulus and remainder. Is that correct? Probably we
>>> need some special handling wherever parttyplen and parttypbyval is used e.g. in
>>> call to partition_bounds_equal() from build_joinrel_partition_info().
>>
>> Unless I am missing something, I don't think we should worry about parttyplen
>> because in the datumCopy() when the datatype is pass-by-value then typelen
>> is ignored.
>
> That's true, but it's ugly, passing typbyvalue of one type and len of the other.
>

How about the attached patch(0003)?
Also, the dim variable is renamed to natts.

Regards,
Amul


0001-partition_bounds_copy-code-refactoring-v1.patch
Description: Binary data


0002-hash-partitioning_another_design-v24.patch
Description: Binary data


0003-Enable-partition-wise-join-support-v3.patch
Description: Binary data



Re: [HACKERS] [POC] hash partitioning

2017-10-10 Thread Ashutosh Bapat
On Tue, Oct 10, 2017 at 3:40 PM, amul sul  wrote:
>>
>> natts represents the number of attributes, but for the hash partition bound we
>> are not dealing with the attributes, so I have used the short form of dimension;
>> thoughts?
>
> Okay, I think the dimension (dim) is also unfit here.  Any suggestions?
>


I think natts is ok, since we are dealing with the number of
attributes in the pack of datums; esp. when ndatums is already taken.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] [POC] hash partitioning

2017-10-10 Thread Ashutosh Bapat
On Tue, Oct 10, 2017 at 3:32 PM, amul sul  wrote:

>> +hash_part? true : key->parttypbyval[j],
>> +key->parttyplen[j]);
>> parttyplen is the length of partition key attribute, whereas what you want here
>> is the length of type of modulus and remainder. Is that correct? Probably we
>> need some special handling wherever parttyplen and parttypbyval is used e.g. in
>> call to partition_bounds_equal() from build_joinrel_partition_info().
>>
>
> Unless I am missing something, I don't think we should worry about parttyplen
> because in the datumCopy() when the datatype is pass-by-value then typelen
> is ignored.

That's true, but it's ugly, passing typbyvalue of one type and len of the other.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] [POC] hash partitioning

2017-10-10 Thread amul sul
On Tue, Oct 10, 2017 at 3:32 PM, amul sul  wrote:
> On Mon, Oct 9, 2017 at 5:51 PM, Ashutosh Bapat
>  wrote:
>> On Mon, Oct 9, 2017 at 4:44 PM, amul sul  wrote:
>>
>
> Thanks Ashutosh for your review, please find my comments inline.
>
>>
>>> 0002 few changes in partition-wise join code to support
>>> hash-partitioned table as well & regression tests.
>>
>> +switch (key->strategy)
>> +{
>> +case PARTITION_STRATEGY_HASH:
>> +/*
>> + * Indexes array is same as the greatest modulus.
>> + * See partition_bounds_equal() for more explanation.
>> + */
>> +num_indexes = DatumGetInt32(src->datums[ndatums - 1][0]);
>> +break;
>> This logic is duplicated at multiple places.  I think it's time we consolidate
>> these changes in a function/macro and call it from the places where we have to
>> calculate number of indexes based on the information in partition descriptor.
>> Refactoring existing code might be a separate patch and then add hash
>> partitioning case in hash partitioning patch.
>>
>
> Makes sense; added get_partition_bound_num_indexes() to get the number of index
> elements in 0001 & get_greatest_modulus(), as the name suggests, to get the
> greatest modulus of the hash partition bound in 0002.
>
>> +intdim = hash_part? 2 : partnatts;
>> Call the variable as natts_per_datum or just natts?
>>
>
> natts represents the number of attributes, but for the hash partition bound we
> are not dealing with the attributes, so I have used the short form of dimension;
> thoughts?

Okay, I think the dimension (dim) is also unfit here.  Any suggestions?

>
>> +hash_part? true : key->parttypbyval[j],
>> +key->parttyplen[j]);
>> parttyplen is the length of partition key attribute, whereas what you want here
>> is the length of type of modulus and remainder. Is that correct? Probably we
>> need some special handling wherever parttyplen and parttypbyval is used e.g. in
>> call to partition_bounds_equal() from build_joinrel_partition_info().
>>
>
> Unless I am missing something, I don't think we should worry about parttyplen
> because in the datumCopy() when the datatype is pass-by-value then typelen
> is ignored.
>
> Regards,
> Amul




Re: [HACKERS] [POC] hash partitioning

2017-10-10 Thread amul sul
On Mon, Oct 9, 2017 at 5:51 PM, Ashutosh Bapat
 wrote:
> On Mon, Oct 9, 2017 at 4:44 PM, amul sul  wrote:
>

Thanks Ashutosh for your review, please find my comments inline.

>
>> 0002 few changes in partition-wise join code to support
>> hash-partitioned table as well & regression tests.
>
> +switch (key->strategy)
> +{
> +case PARTITION_STRATEGY_HASH:
> +/*
> + * Indexes array is same as the greatest modulus.
> + * See partition_bounds_equal() for more explanation.
> + */
> +num_indexes = DatumGetInt32(src->datums[ndatums - 1][0]);
> +break;
> This logic is duplicated at multiple places.  I think it's time we consolidate
> these changes in a function/macro and call it from the places where we have to
> calculate number of indexes based on the information in partition descriptor.
> Refactoring existing code might be a separate patch and then add hash
> partitioning case in hash partitioning patch.
>

Makes sense; added get_partition_bound_num_indexes() to get the number of index
elements in 0001 & get_greatest_modulus(), as the name suggests, to get the
greatest modulus of the hash partition bound in 0002.
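
A minimal sketch of what that helper boils down to, going by the quoted code
above (field names assume the existing PartitionBoundInfoData layout):

static int32
get_greatest_modulus(PartitionBoundInfo bound)
{
    /*
     * Hash bounds are kept sorted by (modulus, remainder), so the last
     * datum's modulus is the greatest one.
     */
    return DatumGetInt32(bound->datums[bound->ndatums - 1][0]);
}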

> +intdim = hash_part? 2 : partnatts;
> Call the variable as natts_per_datum or just natts?
>

natts represents the number of attributes, but for the hash partition bound we
are not dealing with the attributes, so I have used the short form of dimension;
thoughts?

> +hash_part? true : key->parttypbyval[j],
> +key->parttyplen[j]);
> parttyplen is the length of partition key attribute, whereas what you want here
> is the length of type of modulus and remainder. Is that correct? Probably we
> need some special handling wherever parttyplen and parttypbyval is used e.g. in
> call to partition_bounds_equal() from build_joinrel_partition_info().
>

Unless I am missing something, I don't think we should worry about parttyplen
because in the datumCopy() when the datatype is pass-by-value then typelen
is ignored.
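
For reference, the datumCopy() behaviour being relied on here, abbreviated
(the pass-by-reference branch is elided in this sketch):

Datum
datumCopy(Datum value, bool typByVal, int typLen)
{
    if (typByVal)
        return value;       /* typLen is never consulted for by-value types */

    /* pass-by-reference: the real function pallocs and memcpys using typLen
     * (or datumGetSize() for varlena); omitted in this sketch */
    return value;
}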

Regards,
Amul


0001-partition_bounds_copy-code-refactoring-v1.patch
Description: Binary data


0002-hash-partitioning_another_design-v24.patch
Description: Binary data


0003-Enable-partition-wise-join-support-v2.patch
Description: Binary data



Re: [HACKERS] [POC] hash partitioning

2017-10-09 Thread Ashutosh Bapat
On Mon, Oct 9, 2017 at 4:44 PM, amul sul  wrote:


> 0002 few changes in partition-wise join code to support
> hash-partitioned table as well & regression tests.

+switch (key->strategy)
+{
+case PARTITION_STRATEGY_HASH:
+/*
+ * Indexes array is same as the greatest modulus.
+ * See partition_bounds_equal() for more explanation.
+ */
+num_indexes = DatumGetInt32(src->datums[ndatums - 1][0]);
+break;
This logic is duplicated at multiple places.  I think it's time we consolidate
these changes in a function/macro and call it from the places where we have to
calculate number of indexes based on the information in partition descriptor.
Refactoring existing code might be a separate patch and then add hash
partitioning case in hash partitioning patch.

+intdim = hash_part? 2 : partnatts;
Call the variable as natts_per_datum or just natts?

+hash_part? true : key->parttypbyval[j],
+key->parttyplen[j]);
parttyplen is the length of partition key attribute, whereas what you want here
is the length of type of modulus and remainder. Is that correct? Probably we
need some special handling wherever parttyplen and parttypbyval is used e.g. in
call to partition_bounds_equal() from build_joinrel_partition_info().

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] [POC] hash partitioning

2017-10-09 Thread amul sul
On Sat, Oct 7, 2017 at 5:22 PM, amul sul  wrote:
> On Fri, Oct 6, 2017 at 5:35 PM, Jesper Pedersen
>  wrote:
>> Hi Amul,
>>
>> Could you rebase on latest master ?
>>
>
> Sure will post that soon, but before that, I need to test hash partitioning
> with recent partition-wise join commit (f49842d1ee), thanks.
>

Updated patch attached.

0001 is the rebased of the previous patch, no new change.
0002 few changes in partition-wise join code to support
hash-partitioned table as well & regression tests.

Thanks & Regards,
Amul


0001-hash-partitioning_another_design-v23.patch
Description: Binary data


0002-Enable-partition-wise-join-support-v1.patch
Description: Binary data



Re: [HACKERS] [POC] hash partitioning

2017-10-07 Thread amul sul
On Fri, Oct 6, 2017 at 5:35 PM, Jesper Pedersen
 wrote:
> Hi Amul,
>
> Could you rebase on latest master ?
>

Sure will post that soon, but before that, I need to test hash partitioning
with recent partition-wise join commit (f49842d1ee), thanks.

Regards,
Amul




Re: [HACKERS] [POC] hash partitioning

2017-10-06 Thread Jesper Pedersen

Hi Amul,

On 09/28/2017 05:56 AM, amul sul wrote:

It does not really do partition pruning via constraint exclusion, and I don't
think anyone is going to use the remainder in the WHERE condition to fetch
data; hash partitioning is not meant for that.

But I am sure that we could solve this problem using your and Beena's work
toward faster partition pruning[1] and Runtime Partition Pruning[2].

Will think on these changes if they are required for the pruning feature.



Could you rebase on latest master ?

Best regards,
 Jesper




Re: [HACKERS] [POC] hash partitioning

2017-09-29 Thread Robert Haas
On Thu, Sep 28, 2017 at 1:54 AM, Amit Langote
 wrote:
> I looked into how satisfies_hash_partition() works and came up with an
> idea that I think will make constraint exclusion work.  What if we emitted
> the hash partition constraint in the following form instead:
>
> hash_partition_mod(hash_partition_hash(key1-exthash, key2-exthash),
> <modulus>) = <remainder>
>
> With that form, constraint exclusion seems to work as illustrated below:
>
> \d+ p0
> <...>
> Partition constraint:
> (hash_partition_modulus(hash_partition_hash(hashint4extended(a,
> '8816678312871386367'::bigint)), 4) = 0)
>
> -- note only p0 is scanned
> explain select * from p where
> hash_partition_modulus(hash_partition_hash(hashint4extended(a,
> '8816678312871386367'::bigint)), 4) = 0;

What we actually want constraint exclusion to cover is SELECT * FROM p
WHERE a = 525600;

As Amul says, nobody's going to enter a query in the form you have it
here.  Life is too short to take time to put queries into bizarre
forms.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [POC] hash partitioning

2017-09-28 Thread amul sul
On Thu, Sep 28, 2017 at 11:24 AM, Amit Langote
 wrote:
> On 2017/09/27 22:41, Jesper Pedersen wrote:
>> On 09/27/2017 03:05 AM, amul sul wrote:
> Attached rebased patch, thanks.
>
>
 While reading through the patch I thought it would be better to keep
 MODULUS and REMAINDER in caps, if CREATE TABLE was in caps too in order to
 highlight that these are "keywords" for hash partition.

 Also updated some of the documentation.


>>> Thanks a lot for the patch, included in the attached version.
>>>
>>
>> Thank you.
>>
>> Based on [1] I have moved the patch to "Ready for Committer".
>
> Thanks a lot Amul for working on this.  Like Jesper said, the patch looks
> pretty good overall.  I was looking at the latest version with intent to
> study certain things about hash partitioning the way patch implements it,
> during which I noticed some things.
>

Thanks Amit for looking at the patch.

> +  The modulus must be a positive integer, and the remainder must a
>
> must be a
>

Fixed in the attached version.

> +  suppose you have a hash-partitioned table with 8 children, each of
> which
> +  has modulus 8, but find it necessary to increase the number of
> partitions
> +  to 16.
>

Fixed in the attached version.

> Might it be a good idea to say 8 "partitions" instead of "children" in the
> first sentence?
>
> +  each modulus-8 partition until none remain.  While this may still
> involve
> +  a large amount of data movement at each step, it is still better than
> +  having to create a whole new table and move all the data at once.
> + 
> +
>

Fixed in the attached version.

> I read the paragraph that ends with the above text and started wondering
> if the example to redistribute data in hash partitions by detaching and
> attaching with new modulus/remainder could be illustrated with an example?
> Maybe in the examples section of the ALTER TABLE page?
>

I think the hint in the documentation is more than enough. There are any
number of ways to redistribute the data; the document is not meant to explain
all of them.
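For illustration, a minimal sketch of one such redistribution step (all
table names below are made up) could look roughly like:

BEGIN;
ALTER TABLE orders DETACH PARTITION orders_p0;   -- old (MODULUS 8, REMAINDER 0)
CREATE TABLE orders_p0_16 PARTITION OF orders
    FOR VALUES WITH (MODULUS 16, REMAINDER 0);
CREATE TABLE orders_p8_16 PARTITION OF orders
    FOR VALUES WITH (MODULUS 16, REMAINDER 8);
INSERT INTO orders SELECT * FROM orders_p0;      -- rows re-route to the new partitions
DROP TABLE orders_p0;
COMMIT;

A row whose hash is 0 modulo 8 is necessarily 0 or 8 modulo 16, so the two
new partitions together cover exactly what the detached one did.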

> +  Since hash operator class provide only equality, not ordering,
> collation
>
> Either "Since hash operator classes provide" or "Since hash operator class
> provides"
>

Fixed in the attached version.

> Other than the above points, patch looks good.
>
>
> By the way, I noticed a couple of things about hash partition constraints:
>
> 1. In get_qual_for_hash(), using
> get_fn_expr_rettype(&key->partsupfunc[i]), which returns InvalidOid for
> the lack of fn_expr being set to non-NULL value, causes funcrettype of the
> FuncExpr being generated for hashing partition key columns to be set to
> InvalidOid, which I think is wrong.  That is, the following if condition
> in get_fn_expr_rettype() is always satisfied:
>
> if (!flinfo || !flinfo->fn_expr)
> return InvalidOid;
>
> I think we could use get_func_rettype(key->partsupfunc[i].fn_oid)
> instead.  Attached patch
> hash-v21-set-funcexpr-funcrettype-correctly.patch, which applies on top
> v21 of your patch.
>

Thanks for the patch, included in the attached version.
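For anyone following along, the suggested change amounts to roughly the
following in get_qual_for_hash() (the variable name here is illustrative,
not the literal patch):

    /*
     * fn_expr is never initialized on partsupfunc, so
     * get_fn_expr_rettype() would return InvalidOid here; look up the
     * declared return type from the catalogs instead.
     */
    funcrettype = get_func_rettype(key->partsupfunc[i].fn_oid);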

> 2. It seems that the reason constraint exclusion doesn't work with hash
> partitions as implemented by the patch is that predtest.c:
> operator_predicate_proof() returns false even without looking into the
> hash partition constraint, which is of the following form:
>
> satisfies_hash_partition(modulus, remainder, hash-value-of-key1, ...)
>
> because the above constraint expression doesn't translate into a binary
> opclause (an OpExpr), which operator_predicate_proof() knows how to work
> with.  So, false is returned at the beginning of that function by the
> following code:
>
> if (!is_opclause(predicate))
> return false;
>
> For example,
>
> create table p (a int) partition by hash (a);
> create table p0 partition of p for values with (modulus 4, remainder 0);
> create table p1 partition of p for values with (modulus 4, remainder 1);
> \d+ p0
> <...>
> Partition constraint: satisfies_hash_partition(4, 0, hashint4extended(a,
> '8816678312871386367'::bigint))
>
> -- both p0 and p1 scanned
> explain select * from p where satisfies_hash_partition(4, 0,
> hashint4extended(a, '8816678312871386367'::bigint));
>  QUERY PLAN
>
> 
>  Append  (cost=0.00..96.50 rows=1700 width=4)
>->  Seq Scan on p0  (cost=0.00..48.25 rows=850 width=4)
>  Filter: satisfies_hash_partition(4, 0, hashint4extended(a,
> '8816678312871386367'::bigint))
>->  Seq Scan on p1  (cost=0.00..48.25 rows=850 width=4)
>  Filter: satisfies_hash_partition(4, 0, hashint4extended(a,
> '8816678312871386367'::bigint))
> (5 rows)
>
> -- both p0 and p1 scanned
> explain select * from p where satisfies_hash_partition(4, 1,
> hashint4extended(a, '8816678312871386367'::bigint));
>   

Re: [HACKERS] [POC] hash partitioning

2017-09-27 Thread Amit Langote
On 2017/09/27 22:41, Jesper Pedersen wrote:
> On 09/27/2017 03:05 AM, amul sul wrote:
 Attached rebased patch, thanks.


>>> While reading through the patch I thought it would be better to keep
>>> MODULUS and REMAINDER in caps, if CREATE TABLE was in caps too in order to
>>> highlight that these are "keywords" for hash partition.
>>>
>>> Also updated some of the documentation.
>>>
>>>
>> Thanks a lot for the patch, included in the attached version.
>>
> 
> Thank you.
> 
> Based on [1] I have moved the patch to "Ready for Committer".

Thanks a lot Amul for working on this.  Like Jesper said, the patch looks
pretty good overall.  I was looking at the latest version with intent to
study certain things about hash partitioning the way patch implements it,
during which I noticed some things.

+  The modulus must be a positive integer, and the remainder must a

must be a

+  suppose you have a hash-partitioned table with 8 children, each of
which
+  has modulus 8, but find it necessary to increase the number of
partitions
+  to 16.

Might it be a good idea to say 8 "partitions" instead of "children" in the
first sentence?

+  each modulus-8 partition until none remain.  While this may still
involve
+  a large amount of data movement at each step, it is still better than
+  having to create a whole new table and move all the data at once.
+ 
+

I read the paragraph that ends with the above text and started wondering
if the example to redistribute data in hash partitions by detaching and
attaching with new modulus/remainder could be illustrated with an example?
Maybe in the examples section of the ALTER TABLE page?

+  Since hash operator class provide only equality, not ordering,
collation

Either "Since hash operator classes provide" or "Since hash operator class
provides"

Other than the above points, patch looks good.


By the way, I noticed a couple of things about hash partition constraints:

1. In get_qual_for_hash(), using
get_fn_expr_rettype(&key->partsupfunc[i]), which returns InvalidOid for
the lack of fn_expr being set to non-NULL value, causes funcrettype of the
FuncExpr being generated for hashing partition key columns to be set to
InvalidOid, which I think is wrong.  That is, the following if condition
in get_fn_expr_rettype() is always satisfied:

if (!flinfo || !flinfo->fn_expr)
return InvalidOid;

I think we could use get_func_rettype(key->partsupfunc[i].fn_oid)
instead.  Attached patch
hash-v21-set-funcexpr-funcrettype-correctly.patch, which applies on top
v21 of your patch.

2. It seems that the reason constraint exclusion doesn't work with hash
partitions as implemented by the patch is that predtest.c:
operator_predicate_proof() returns false even without looking into the
hash partition constraint, which is of the following form:

satisfies_hash_partition(modulus, remainder, hash-value-of-key1, ...)

because the above constraint expression doesn't translate into a binary
opclause (an OpExpr), which operator_predicate_proof() knows how to work
with.  So, false is returned at the beginning of that function by the
following code:

if (!is_opclause(predicate))
return false;

For example,

create table p (a int) partition by hash (a);
create table p0 partition of p for values with (modulus 4, remainder 0);
create table p1 partition of p for values with (modulus 4, remainder 1);
\d+ p0
<...>
Partition constraint: satisfies_hash_partition(4, 0, hashint4extended(a,
'8816678312871386367'::bigint))

-- both p0 and p1 scanned
explain select * from p where satisfies_hash_partition(4, 0,
hashint4extended(a, '8816678312871386367'::bigint));
 QUERY PLAN


 Append  (cost=0.00..96.50 rows=1700 width=4)
   ->  Seq Scan on p0  (cost=0.00..48.25 rows=850 width=4)
 Filter: satisfies_hash_partition(4, 0, hashint4extended(a,
'8816678312871386367'::bigint))
   ->  Seq Scan on p1  (cost=0.00..48.25 rows=850 width=4)
 Filter: satisfies_hash_partition(4, 0, hashint4extended(a,
'8816678312871386367'::bigint))
(5 rows)

-- both p0 and p1 scanned
explain select * from p where satisfies_hash_partition(4, 1,
hashint4extended(a, '8816678312871386367'::bigint));
 QUERY PLAN


 Append  (cost=0.00..96.50 rows=1700 width=4)
   ->  Seq Scan on p0  (cost=0.00..48.25 rows=850 width=4)
 Filter: satisfies_hash_partition(4, 1, hashint4extended(a,
'8816678312871386367'::bigint))
   ->  Seq Scan on p1  (cost=0.00..48.25 rows=850 width=4)
 Filter: satisfies_hash_partition(4, 1, hashint4extended(a,
'8816678312871386367'::bigint))
(5 rows)


I looked into how satisfies_hash_partition() works and came up with an
idea that I think will make constraint exclusion work.  What if we emitted
the hash 

Re: [HACKERS] [POC] hash partitioning

2017-09-27 Thread Jesper Pedersen

On 09/27/2017 03:05 AM, amul sul wrote:

Attached rebased patch, thanks.



While reading through the patch I thought it would be better to keep
MODULUS and REMAINDER in caps, if CREATE TABLE was in caps too in order to
highlight that these are "keywords" for hash partition.

Also updated some of the documentation.



Thanks a lot for the patch, included in the attached version.



Thank you.

Based on [1] I have moved the patch to "Ready for Committer".

[1] 
https://www.postgresql.org/message-id/CA%2BTgmoYsw3pusDen4_A44c7od%2BbEAST0eYo%2BjODtyofR0W2soQ%40mail.gmail.com


Best regards,
 Jesper



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-09-27 Thread amul sul
On Mon, Sep 18, 2017 at 8:55 PM, Jesper Pedersen  wrote:

> On 09/15/2017 02:30 AM, amul sul wrote:
>
>> Attached rebased patch, thanks.
>>
>>
> While reading through the patch I thought it would be better to keep
> MODULUS and REMAINDER in caps, if CREATE TABLE was in caps too in order to
> highlight that these are "keywords" for hash partition.
>
> Also updated some of the documentation.
>
>
Thanks a lot for the patch, included in the attached version.


> V20 patch passes make check-world, and my testing (typical 64 partitions,
> and various ATTACH/DETACH scenarios).
>

Nice, thanks again.

Regards,
Amul


0001-hash-partitioning_another_design-v21.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-09-18 Thread Jesper Pedersen

On 09/15/2017 02:30 AM, amul sul wrote:

Attached rebased patch, thanks.



While reading through the patch I thought it would be better to keep 
MODULUS and REMAINDER in caps, if CREATE TABLE was in caps too in order 
to highlight that these are "keywords" for hash partition.


Also updated some of the documentation.

V20 patch passes make check-world, and my testing (typical 64 
partitions, and various ATTACH/DETACH scenarios).


Thanks for working on this !

Best regards,
 Jesper
>From 189a40a5ca6c7a1bc79b750cbc95584b3061fda5 Mon Sep 17 00:00:00 2001
From: jesperpedersen 
Date: Mon, 18 Sep 2017 11:13:54 -0400
Subject: [PATCH] * Documentation updates * Use caps for MODULUS / REMAINDER
 when CREATE TABLE is so too

---
 doc/src/sgml/ddl.sgml  | 10 -
 doc/src/sgml/ref/alter_table.sgml  |  2 +-
 src/backend/catalog/partition.c|  8 +++
 src/test/regress/expected/alter_table.out  | 20 +-
 src/test/regress/expected/create_table.out | 34 +++---
 src/test/regress/sql/alter_table.sql   | 20 +-
 src/test/regress/sql/create_table.sql  | 28 
 7 files changed, 61 insertions(+), 61 deletions(-)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 24b36caad3..e38d8fc0a0 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -2881,11 +2881,11 @@ VALUES ('Albany', NULL, NULL, 'NY');
 

 
- The table is partitioned by specifying modulus and remainder for each
- partition. Each partition holds rows for which the hash value of
- partition keys when divided by specified modulus produces specified
- remainder. For more clarification on modulus and remainder please refer
- .
+ The table is partitioned by specifying a modulus and a remainder for each
+ partition. Each partition will hold the rows for which the hash value of
+ the partition key divided by the specified modulus will produce the specified
+ remainder. Refer to 
+ for additional clarification on modulus and remainder.
 

   
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index a6eefb8564..b5fb93edac 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -1424,7 +1424,7 @@ ALTER TABLE cities
Attach a partition to hash partitioned table:
 
 ALTER TABLE orders
-ATTACH PARTITION orders_p4 FOR VALUES WITH (modulus 4, remainder 3);
+ATTACH PARTITION orders_p4 FOR VALUES WITH (MODULUS 4, REMAINDER 3);
 
 
   
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 5c11c7ecea..3696b9a711 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -1694,13 +1694,13 @@ make_partition_op_expr(PartitionKey key, int keynum,
  *	CREATE TABLE simple_hash (a int, b char(10)) PARTITION BY HASH (a, b);
  *
  * CREATE TABLE p_p1 PARTITION OF simple_hash
- *	FOR VALUES WITH (modulus 2, remainder 1);
+ *	FOR VALUES WITH (MODULUS 2, REMAINDER 1);
  * CREATE TABLE p_p2 PARTITION OF simple_hash
- *	FOR VALUES WITH (modulus 4, remainder 2);
+ *	FOR VALUES WITH (MODULUS 4, REMAINDER 2);
  * CREATE TABLE p_p3 PARTITION OF simple_hash
- *	FOR VALUES WITH (modulus 8, remainder 0);
+ *	FOR VALUES WITH (MODULUS 8, REMAINDER 0);
  * CREATE TABLE p_p4 PARTITION OF simple_hash
- *	FOR VALUES WITH (modulus 8, remainder 4);
+ *	FOR VALUES WITH (MODULUS 8, REMAINDER 4);
  *
  * This function will return one of the following in the form of an
  * expression:
diff --git a/src/test/regress/expected/alter_table.out b/src/test/regress/expected/alter_table.out
index 124cbe483c..304fb97291 100644
--- a/src/test/regress/expected/alter_table.out
+++ b/src/test/regress/expected/alter_table.out
@@ -3490,22 +3490,22 @@ CREATE TABLE hash_parted (
 	a int,
 	b int
 ) PARTITION BY HASH (a custom_opclass);
-CREATE TABLE hpart_1 PARTITION OF hash_parted FOR VALUES WITH (modulus 4, remainder 0);
+CREATE TABLE hpart_1 PARTITION OF hash_parted FOR VALUES WITH (MODULUS 4, REMAINDER 0);
 CREATE TABLE fail_part (LIKE hpart_1);
-ALTER TABLE hash_parted ATTACH PARTITION fail_part FOR VALUES WITH (modulus 8, remainder 4);
+ALTER TABLE hash_parted ATTACH PARTITION fail_part FOR VALUES WITH (MODULUS 8, REMAINDER 4);
 ERROR:  partition "fail_part" would overlap partition "hpart_1"
-ALTER TABLE hash_parted ATTACH PARTITION fail_part FOR VALUES WITH (modulus 8, remainder 0);
+ALTER TABLE hash_parted ATTACH PARTITION fail_part FOR VALUES WITH (MODULUS 8, REMAINDER 0);
 ERROR:  partition "fail_part" would overlap partition "hpart_1"
 DROP TABLE fail_part;
 -- check validation when attaching hash partitions
 -- check that violating rows are correctly reported
 CREATE TABLE hpart_2 (LIKE hash_parted);
 INSERT INTO hpart_2 VALUES (3, 0);
-ALTER TABLE hash_parted ATTACH PARTITION hpart_2 FOR VALUES 

Re: [HACKERS] [POC] hash partitioning

2017-09-15 Thread amul sul
On Fri, Sep 15, 2017 at 4:30 AM, Thom Brown  wrote:

> On 14 September 2017 at 09:58, amul sul  wrote:
> > On Wed, Sep 13, 2017 at 7:43 PM, Jesper Pedersen
> >  wrote:
> >>
> >> Hi Amul,
> >>
> >> On 09/08/2017 08:40 AM, amul sul wrote:
> >>>
> >>> Rebased 0002 against this commit & renamed to 0001, PFA.
> >>>
> >>
> >> This patch needs a rebase.
> >>
> >
> > Thanks for your note.
> > Attached is the patch rebased on the latest master head.
> > Also added error on creating default partition for the hash partitioned
> > table, and updated document & test script for the same.
>
> Sorry, but this needs another rebase as it's broken by commit
> 77b6b5e9ceca04dbd6f0f6cd3fc881519acc8714.
>
Attached rebased patch, thanks.

Regards,
Amul


0001-hash-partitioning_another_design-v20.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-09-14 Thread Thom Brown
On 14 September 2017 at 09:58, amul sul  wrote:
> On Wed, Sep 13, 2017 at 7:43 PM, Jesper Pedersen
>  wrote:
>>
>> Hi Amul,
>>
>> On 09/08/2017 08:40 AM, amul sul wrote:
>>>
>>> Rebased 0002 against this commit & renamed to 0001, PFA.
>>>
>>
>> This patch needs a rebase.
>>
>
> Thanks for your note.
> Attached is the patch rebased on the latest master head.
> Also added error on creating default partition for the hash partitioned
> table, and updated document & test script for the same.

Sorry, but this needs another rebase as it's broken by commit
77b6b5e9ceca04dbd6f0f6cd3fc881519acc8714.

Thom


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-09-14 Thread Robert Haas
On Thu, Sep 14, 2017 at 2:05 PM, Jesper Pedersen
 wrote:
> However, it is a little bit difficult to follow the dependencies between
> different partition patches, so I may not always provide sane feedback, as
> seen in [1].
>
> [1]
> https://www.postgresql.org/message-id/579077fd-8f07-aff7-39bc-b92c855cdb70%40redhat.com

Yeah, no issues.  I knew about the dependency between those patches,
but I'm pretty sure there wasn't any terribly explicit discussion
about it, even if the issue probably came up parenthetically someplace
or other.  Oops.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-09-14 Thread Jesper Pedersen

On 09/14/2017 01:52 PM, Robert Haas wrote:

On Thu, Sep 14, 2017 at 1:07 PM, Jesper Pedersen
 wrote:

Yeah, it would be nice to have a syntax like

) PARTITION BY HASH (col) WITH (AUTO_CREATE = 64);

But then there also needs to be a way to create the 64 associated indexes
too for everything to be easy.


Well, for that, there's this proposal:

http://postgr.es/m/c8fe4f6b-ff46-aae0-89e3-e936a35f0...@postgrespro.ru

As several people have rightly pointed out, there's a lot of work to be
done on partitioning to get it to where we want it to be.  Even in
v10, it's got significant benefits, such as much faster bulk-loading,
but I don't hear anybody disputing the notion that a lot more work is
needed.  The good news is that a lot of that work is already in
progress; the bad news is that a lot of that work is not done yet.

But I think that's OK.  We can't solve every problem at once, and I
think we're moving things along here at a reasonably brisk pace.  That
didn't stop me from complaining bitterly to someone just yesterday
that we aren't moving faster still, but unfortunately EnterpriseDB has
only been able to get 12 developers to do any work at all on
partitioning this release cycle, and 3 of those have so far helped
only with review and benchmarking.  It's a pity we can't do more, but
considering how many community projects are 1-person efforts I think
it's pretty good.

To be clear, I know you're not (or at least I assume you're not)
trying to beat me up about this, just raising a concern, and I'm not
trying to beat you up either, just let you know that it is definitely
on the radar screen but not there yet.



Definitely not a complain about the work being done.

I think the scope of Amul's and others work on hash partition support is 
where it needs to be. Improvements can always follow in future release.


My point was that it is easy to script the definition of the partitions and
their associated indexes (see the sketch below), so it is more important to
focus on the core functionality with the developer / review resources
available.
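As a concrete illustration (reusing the mytab/idx_pNN names from the earlier
example; this is only a sketch, not part of the patch):

DO $$
BEGIN
  FOR i IN 0..63 LOOP
    EXECUTE format(
      'CREATE TABLE mytab_p%s PARTITION OF mytab FOR VALUES WITH (MODULUS 64, REMAINDER %s)',
      lpad(i::text, 2, '0'), i);
    EXECUTE format(
      'CREATE INDEX idx_p%s ON mytab_p%s USING btree (b, a)',
      lpad(i::text, 2, '0'), lpad(i::text, 2, '0'));
  END LOOP;
END $$;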


However, it is a little bit difficult to follow the dependencies between 
different partition patches, so I may not always provide sane feedback, 
as seen in [1].


[1] 
https://www.postgresql.org/message-id/579077fd-8f07-aff7-39bc-b92c855cdb70%40redhat.com


Best regards,
 Jesper


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-09-14 Thread Robert Haas
On Thu, Sep 14, 2017 at 1:07 PM, Jesper Pedersen
 wrote:
> Yeah, it would be nice to have a syntax like
>
> ) PARTITION BY HASH (col) WITH (AUTO_CREATE = 64);
>
> But then there also needs to be a way to create the 64 associated indexes
> too for everything to be easy.

Well, for that, there's this proposal:

http://postgr.es/m/c8fe4f6b-ff46-aae0-89e3-e936a35f0...@postgrespro.ru

As several people have rightly pointed out, there's a lot of work to be
done on partitioning to get it to where we want it to be.  Even in
v10, it's got significant benefits, such as much faster bulk-loading,
but I don't hear anybody disputing the notion that a lot more work is
needed.  The good news is that a lot of that work is already in
progress; the bad news is that a lot of that work is not done yet.

But I think that's OK.  We can't solve every problem at once, and I
think we're moving things along here at a reasonably brisk pace.  That
didn't stop me from complaining bitterly to someone just yesterday
that we aren't moving faster still, but unfortunately EnterpriseDB has
only been able to get 12 developers to do any work at all on
partitioning this release cycle, and 3 of those have so far helped
only with review and benchmarking.  It's a pity we can't do more, but
considering how many community projects are 1-person efforts I think
it's pretty good.

To be clear, I know you're not (or at least I assume you're not)
trying to beat me up about this, just raising a concern, and I'm not
trying to beat you up either, just let you know that it is definitely
on the radar screen but not there yet.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-09-14 Thread Jesper Pedersen

On 09/14/2017 12:56 PM, Robert Haas wrote:

On Thu, Sep 14, 2017 at 12:54 PM, David Fetter  wrote:

Should we be pointing the gun away from people's feet by making hash
partitions that cover the space automagically when the partitioning
scheme[1] is specified?  In other words, do we have a good reason to have
only some of the hash partitions so defined by default?


Sure, we can add some convenience syntax for that, but I'd like to get
the basic stuff working before doing that kind of polishing.

If nothing else, I assume Keith Fiske's pg_partman will provide a way
to magically DTRT about an hour after this goes in.  But probably we
can do better in core easily enough.



Yeah, it would be nice to have a syntax like

) PARTITION BY HASH (col) WITH (AUTO_CREATE = 64);

But then there also needs to be a way to create the 64 associated 
indexes too for everything to be easy.


Best regards,
 Jesper


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-09-14 Thread Robert Haas
On Thu, Sep 14, 2017 at 12:54 PM, David Fetter  wrote:
> Should we be pointing the gun away from people's feet by making hash
> partitions that cover the space automagically when the partitioning
> scheme[1] is specified?  In other words, do we have a good reason to have
> only some of the hash partitions so defined by default?

Sure, we can add some convenience syntax for that, but I'd like to get
the basic stuff working before doing that kind of polishing.

If nothing else, I assume Keith Fiske's pg_partman will provide a way
to magically DTRT about an hour after this goes in.  But probably we
can do better in core easily enough.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-09-14 Thread David Fetter
On Mon, Sep 11, 2017 at 07:43:29AM -0400, Robert Haas wrote:
> On Mon, Sep 11, 2017 at 4:17 AM, Ashutosh Bapat
>  wrote:
> >> Rebased 0002 against this commit & renamed to 0001, PFA.
> >
> > Given that we have default partition support now, I am wondering
> > whether hash partitioned tables also should have default
> > partitions.  The way we have structured hash partitioning syntax,
> > there can be "holes" in partitions. Default partition would help
> > plug those holes.
> 
> Yeah, I was thinking about that, too.  On the one hand, it seems
> like it's solving the problem the wrong way: if you've set up hash
> partitioning properly, you shouldn't have any holes.

Should we be pointing the gun away from people's feet by making hash
partitions that cover the space automagically when the partitioning
scheme[1] is specified?  In other words, do we have a good reason to have
only some of the hash partitions so defined by default?

Best,
David.

[1] For now, that's just the modulus, but the PoC included specifying
hashing functions, so I assume other ways to specify the partitioning
scheme could eventually be proposed.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-09-14 Thread Jesper Pedersen

Hi,

On 09/14/2017 12:05 PM, Robert Haas wrote:

On Thu, Sep 14, 2017 at 11:39 AM, Jesper Pedersen
 wrote:

When I do

CREATE TABLE mytab (
   a integer NOT NULL,
   b integer NOT NULL,
   c integer,
   d integer
) PARTITION BY HASH (b);

and create 64 partitions;

CREATE TABLE mytab_p00 PARTITION OF mytab FOR VALUES WITH (MODULUS 64,
REMAINDER 0);
...
CREATE TABLE mytab_p63 PARTITION OF mytab FOR VALUES WITH (MODULUS 64,
REMAINDER 63);

and associated indexes

CREATE INDEX idx_p00 ON mytab_p00 USING btree (b, a);
...
CREATE INDEX idx_p63 ON mytab_p63 USING btree (b, a);

Populate the database, and do ANALYZE.

Given

EXPLAIN (ANALYZE, VERBOSE, BUFFERS ON) SELECT a, b, c, d FROM mytab WHERE b
= 42

gives

Append
   -> Index Scan using idx_p00 (cost rows=7) (actual rows=0)
   ...
   -> Index Scan using idx_p63 (cost rows=7) (actual rows=0)

E.g. all partitions are being scanned. Of course one partition will contain
the rows I'm looking for.


Yeah, we need Amit Langote's work in
http://postgr.es/m/098b9c71-1915-1a2a-8d52-1a7a50ce7...@lab.ntt.co.jp
to land and this patch to be adapted to make use of it.  I think
that's the major thing still standing in the way of this. Concerns
were also raised about not having a way to see the hash function, but
we fixed that in 81c5e46c490e2426db243eada186995da5bb0ba7 and
hopefully this patch has been updated to use a seed (I haven't looked
yet).  And there was a concern about hash functions not being
portable, but the conclusion of that was basically that most people
think --load-via-partition-root will be a satisfactory workaround for
cases where that becomes a problem (cf. commit
23d7680d04b958de327be96ffdde8f024140d50e).  So this is the major
remaining issue that I know about.



Thanks for the information, Robert !

Best regards,
 Jesper


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-09-14 Thread Robert Haas
On Thu, Sep 14, 2017 at 11:39 AM, Jesper Pedersen
 wrote:
> When I do
>
> CREATE TABLE mytab (
>   a integer NOT NULL,
>   b integer NOT NULL,
>   c integer,
>   d integer
> ) PARTITION BY HASH (b);
>
> and create 64 partitions;
>
> CREATE TABLE mytab_p00 PARTITION OF mytab FOR VALUES WITH (MODULUS 64,
> REMAINDER 0);
> ...
> CREATE TABLE mytab_p63 PARTITION OF mytab FOR VALUES WITH (MODULUS 64,
> REMAINDER 63);
>
> and associated indexes
>
> CREATE INDEX idx_p00 ON mytab_p00 USING btree (b, a);
> ...
> CREATE INDEX idx_p63 ON mytab_p63 USING btree (b, a);
>
> Populate the database, and do ANALYZE.
>
> Given
>
> EXPLAIN (ANALYZE, VERBOSE, BUFFERS ON) SELECT a, b, c, d FROM mytab WHERE b
> = 42
>
> gives
>
> Append
>   -> Index Scan using idx_p00 (cost rows=7) (actual rows=0)
>   ...
>   -> Index Scan using idx_p63 (cost rows=7) (actual rows=0)
>
> E.g. all partitions are being scanned. Of course one partition will contain
> the rows I'm looking for.

Yeah, we need Amit Langote's work in
http://postgr.es/m/098b9c71-1915-1a2a-8d52-1a7a50ce7...@lab.ntt.co.jp
to land and this patch to be adapted to make use of it.  I think
that's the major thing still standing in the way of this. Concerns
were also raised about not having a way to see the hash function, but
we fixed that in 81c5e46c490e2426db243eada186995da5bb0ba7 and
hopefully this patch has been updated to use a seed (I haven't looked
yet).  And there was a concern about hash functions not being
portable, but the conclusion of that was basically that most people
think --load-via-partition-root will be a satisfactory workaround for
cases where that becomes a problem (cf. commit
23d7680d04b958de327be96ffdde8f024140d50e).  So this is the major
remaining issue that I know about.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-09-14 Thread Jesper Pedersen

Hi Amul,

On 09/14/2017 04:58 AM, amul sul wrote:

On Wed, Sep 13, 2017 at 7:43 PM, Jesper Pedersen wrote:
<...>

Append
  -> Index Scan using idx_p00 (cost rows=7) (actual rows=0)
  ...
  -> Index Scan using idx_p63 (cost rows=7) (actual rows=0)

E.g. all partitions are being scanned. Of course one partition will 
contain the rows I'm looking for.


Best regards,
 Jesper


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-09-14 Thread amul sul
On Wed, Sep 13, 2017 at 7:43 PM, Jesper Pedersen  wrote:

> Hi Amul,
>
> On 09/08/2017 08:40 AM, amul sul wrote:
>
>> Rebased 0002 against this commit & renamed to 0001, PFA.
>>
>>
> This patch needs a rebase.
>
>
Thanks for your note.
Attached is the patch rebased on the latest master head.
Also added error on creating default partition for the hash partitioned
table, and updated document & test script for the same.

Regards,
Amul


0001-hash-partitioning_another_design-v19.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-09-13 Thread Jesper Pedersen

Hi Amul,

On 09/08/2017 08:40 AM, amul sul wrote:

Rebased 0002 against this commit & renamed to 0001, PFA.



This patch needs a rebase.

Best regards,
 Jesper



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-09-11 Thread amul sul
On Mon, Sep 11, 2017 at 5:30 PM, Alvaro Herrera 
wrote:

> Robert Haas wrote:
> > On Mon, Sep 11, 2017 at 4:17 AM, Ashutosh Bapat
> >  wrote:
> > >> Rebased 0002 against this commit & renamed to 0001, PFA.
> > >
> > > Given that we have default partition support now, I am wondering
> > > whether hash partitioned tables also should have default partitions.
> > > The way we have structured hash partitioning syntax, there can be
> > > "holes" in partitions. Default partition would help plug those holes.
> >
> > Yeah, I was thinking about that, too.  On the one hand, it seems like
> > it's solving the problem the wrong way: if you've set up hash
> > partitioning properly, you shouldn't have any holes.  On the other
> > hand, supporting it probably wouldn't cost anything noticeable and
> > might make things seem more consistent.  I'm not sure which way to
> > jump on this one.
>
> How difficult/tedious/troublesome would be to install the missing
> partitions if you set hash partitioning with a default partition and
> only later on notice that some partitions are missing?  I think if the
> answer is that you need to exclusive-lock something for a long time and
> this causes a disruption in production systems, then it's better not to
> allow a default partition at all and just force all the hash partitions
> to be there from the start.
>
>
I am also leaning toward not supporting a default partition for a hash
partitioned table.

The major drawback I can see is the constraint that gets created on the
default partition table.  IIUC, the constraint on the default partition table
is just the negation of the partition constraints on all its sibling
partitions.

Consider a hash partitioned table having partitions with hash bounds
(modulus 64, remainder 0), ..., (modulus 64, remainder 62), and partition
columns col1, col2, and so on; then the constraint for the default partition
will be:

NOT( satisfies_hash_partition(64, 0, hash_fn1(col1), hash_fn2(col2), ...)
 && ... &&
 satisfies_hash_partition(64, 62, hash_fn1(col1), hash_fn2(col2), ...) )

This will hurt performance much more than any other partitioning strategy,
because it calculates the hash for the same partitioning key multiple times.
We could overcome this by having another SQL function (e.g.
satisfies_default_hash_partition) which calculates the hash value once and
checks the remainders, but that would be a different path from the current
default partition framework.
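Just to illustrate the hash-once idea, a rough sketch of what such a helper
could look like (nothing like this exists in the patch; names and types are
made up):

#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: hash the partition key once, then test the value
 * against every (modulus, remainder) pair claimed by a sibling partition.
 * The row belongs in the default partition only if no sibling accepts it.
 */
static bool
satisfies_default_hash_partition(uint64_t hash, int nsiblings,
                                 const int *moduli, const int *remainders)
{
    for (int i = 0; i < nsiblings; i++)
    {
        if (hash % (uint64_t) moduli[i] == (uint64_t) remainders[i])
            return false;       /* some sibling partition accepts this row */
    }
    return true;                /* no sibling matched: row goes to the default */
}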

Regards,
Amul


Re: [HACKERS] [POC] hash partitioning

2017-09-11 Thread Robert Haas
On Mon, Sep 11, 2017 at 8:00 AM, Alvaro Herrera  wrote:
> How difficult/tedious/troublesome would be to install the missing
> partitions if you set hash partitioning with a default partition and
> only later on notice that some partitions are missing?  I think if the
> answer is that you need to exclusive-lock something for a long time and
> this causes a disruption in production systems, then it's better not to
> allow a default partition at all and just force all the hash partitions
> to be there from the start.
>
> On the other hand, if you can get tuples out of the default partition
> into their intended regular partitions without causing any disruption,
> then it seems okay to allow default partitions in hash partitioning
> setups.

I think there's no real use case for default partitioning, and yeah,
you do need exclusive locks to repartition things (whether hash
partitioning or otherwise).  It would be nice to fix that eventually,
but it's hard, because the executor has to cope with the floor moving
under it, and as of today, it really can't cope with that at all - not
because of partitioning specifically, but because of existing design
decisions that will require a lot of work (and probably arguing) to
revisit.

I think the way to get around the usability issues for hash
partitioning is to eventually add some syntax that does things like
(1) automatically create the table with N properly-configured
partitions, (2) automatically split an existing partition into N
pieces, and (3) automatically rewrite the whole table using a
different partition count.

People seem to find the hash partitioning stuff a little arcane.  I
don't want to discount that confusion with some sort of high-handed "I
know better" attitude, I think the interface that users will actually
see can end up being pretty straightforward.  The complexity that is
there in the syntax is to allow pg_upgrade and pg_dump/restore to work
properly.  But users don't necessarily have to use the same syntax
that pg_dump does, just as you can say CREATE INDEX ON a (b) and let
the system specify the index name, but at dump time the index name is
specified explicitly.

> (I, like many others, was unable to follow the default partition stuff
> as closely as I would have liked.)

Uh, sorry about that.  Would it help if I wrote a blog post on it or
something?  The general idea is simple: any tuples that don't route to
any other partition get routed to the default partition.
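A tiny illustration with list partitioning (the table names here are
invented):

CREATE TABLE cities (name text, state text) PARTITION BY LIST (state);
CREATE TABLE cities_ny    PARTITION OF cities FOR VALUES IN ('NY');
CREATE TABLE cities_other PARTITION OF cities DEFAULT;

INSERT INTO cities VALUES ('Albany', 'NY');    -- routed to cities_ny
INSERT INTO cities VALUES ('Portland', 'OR');  -- no match, lands in cities_other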

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-09-11 Thread Alvaro Herrera
Robert Haas wrote:
> On Mon, Sep 11, 2017 at 4:17 AM, Ashutosh Bapat
>  wrote:
> >> Rebased 0002 against this commit & renamed to 0001, PFA.
> >
> > Given that we have default partition support now, I am wondering
> > whether hash partitioned tables also should have default partitions.
> > The way we have structured hash partitioning syntax, there can be
> > "holes" in partitions. Default partition would help plug those holes.
> 
> Yeah, I was thinking about that, too.  On the one hand, it seems like
> it's solving the problem the wrong way: if you've set up hash
> partitioning properly, you shouldn't have any holes.  On the other
> hand, supporting it probably wouldn't cost anything noticeable and
> might make things seem more consistent.  I'm not sure which way to
> jump on this one.

How difficult/tedious/troublesome would be to install the missing
partitions if you set hash partitioning with a default partition and
only later on notice that some partitions are missing?  I think if the
answer is that you need to exclusive-lock something for a long time and
this causes a disruption in production systems, then it's better not to
allow a default partition at all and just force all the hash partitions
to be there from the start.

On the other hand, if you can get tuples out of the default partition
into their intended regular partitions without causing any disruption,
then it seems okay to allow default partitions in hash partitioning
setups.

(I, like many others, was unable to follow the default partition stuff
as closely as I would have liked.)

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-09-11 Thread Robert Haas
On Mon, Sep 11, 2017 at 4:17 AM, Ashutosh Bapat
 wrote:
>> Rebased 0002 against this commit & renamed to 0001, PFA.
>
> Given that we have default partition support now, I am wondering
> whether hash partitioned tables also should have default partitions.
> The way we have structured hash partitioning syntax, there can be
> "holes" in partitions. Default partition would help plug those holes.

Yeah, I was thinking about that, too.  On the one hand, it seems like
it's solving the problem the wrong way: if you've set up hash
partitioning properly, you shouldn't have any holes.  On the other
hand, supporting it probably wouldn't cost anything noticeable and
might make things seem more consistent.  I'm not sure which way to
jump on this one.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-09-11 Thread Ashutosh Bapat
On Fri, Sep 8, 2017 at 6:10 PM, amul sul  wrote:
> On Fri, Sep 8, 2017 at 6:45 AM, Robert Haas  wrote:
>>
>> On Mon, Sep 4, 2017 at 6:38 AM, amul sul  wrote:
>> > I've updated patch to use an extended hash function (Commit #
>> > 81c5e46c490e2426db243eada186995da5bb0ba7) for the partitioning.
>>
>> Committed 0001 after noticing that Jeevan Ladhe also found that change
>> convenient for default partitioning.  I made a few minor cleanups;
>> hopefully I didn't break anything.
>
>
> Thank you.
>
> Rebased 0002 against this commit & renamed to 0001, PFA.

Given that we have default partition support now, I am wondering
whether hash partitioned tables also should have default partitions.
The way we have structured hash partitioning syntax, there can be
"holes" in partitions. Default partition would help plug those holes.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-09-08 Thread amul sul
On Fri, Sep 8, 2017 at 6:45 AM, Robert Haas  wrote:

> On Mon, Sep 4, 2017 at 6:38 AM, amul sul  wrote:
> > I've updated patch to use an extended hash function (Commit #
> > 81c5e46c490e2426db243eada186995da5bb0ba7) for the partitioning.
>
> Committed 0001 after noticing that Jeevan Ladhe also found that change
> convenient for default partitioning.  I made a few minor cleanups;
> hopefully I didn't break anything.
>

Thank you.

Rebased 0002 against this commit & renamed to 0001, PFA.

Regards,
Amul​


0001-hash-partitioning_another_design-v18.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-09-07 Thread Robert Haas
On Mon, Sep 4, 2017 at 6:38 AM, amul sul  wrote:
> I've updated patch to use an extended hash function (Commit #
> 81c5e46c490e2426db243eada186995da5bb0ba7) for the partitioning.

Committed 0001 after noticing that Jeevan Ladhe also found that change
convenient for default partitioning.  I made a few minor cleanups;
hopefully I didn't break anything.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-09-05 Thread Rajkumar Raghuwanshi
On Mon, Sep 4, 2017 at 4:08 PM, amul sul  wrote:

> I've updated patch to use an extended hash function (Commit #
> 81c5e46c490e2426db243eada186995da5bb0ba7) for the partitioning.
>
I have done some testing with these patches, everything looks fine,
attaching sql and out file for reference.

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation


hash_partition_test.out
Description: Binary data
--Testing CREATE TABLE...PARTITION BY HASH syntax
--
--basic syntax --should pass
CREATE TABLE hp_tbl (a int) PARTITION BY HASH (a);
drop table hp_tbl;
--partition with more than one column --should pass
CREATE TABLE hp_tbl (a int, b int) PARTITION BY HASH (a,b);
drop table hp_tbl;
--partition with expression --should pass
CREATE TABLE hp_tbl (a int, b int) PARTITION BY HASH (abs(a));
drop table hp_tbl;
--partition with arithmetic expression --should pass
CREATE TABLE hp_tbl (a int, b int) PARTITION BY HASH ((a+b));
drop table hp_tbl;
--partition with other data type --should pass
CREATE TABLE hp_tbl (a text, b Date) PARTITION BY HASH (a);
drop table hp_tbl;
CREATE TABLE hp_tbl (a text, b Date) PARTITION BY HASH (b);
drop table hp_tbl;

--Testing CREATE TABLE...PARTITION OF syntax
--
CREATE TABLE hp_tbl (a int, b text) PARTITION BY HASH (a);
--basic partition of syntax --should pass
CREATE TABLE hp_tbl_p1 PARTITION OF hp_tbl FOR VALUES WITH (modulus 20, remainder 0);
--trying to attach same partition again --should fail
CREATE TABLE hp_tbl_p1 PARTITION OF hp_tbl FOR VALUES WITH (modulus 20, remainder 0);
--trying to attach with same modulus different remainder --should pass
CREATE TABLE hp_tbl_p2 PARTITION OF hp_tbl FOR VALUES WITH (modulus 20, remainder 1);
--trying to attach with different modulus different remainder --should pass
CREATE TABLE hp_tbl_p3 PARTITION OF hp_tbl FOR VALUES WITH (modulus 40, remainder 2);
--trying to create with a value not factor of previous value --should fail
CREATE TABLE hp_tbl_p4 PARTITION OF hp_tbl FOR VALUES WITH (modulus 45, remainder 0);
-- trying to create with modulus equal to zero --should fail
CREATE TABLE hp_tbl_p4 PARTITION OF hp_tbl FOR VALUES WITH (modulus 0, remainder 1);
-- trying to create with remainder greater or equal than modulus --should fail
CREATE TABLE hp_tbl_p4 PARTITION OF hp_tbl FOR VALUES WITH (modulus 60, remainder 60);
--trying to create like list partition --should fail
CREATE TABLE hp_tbl_p4 PARTITION OF hp_tbl FOR VALUES IN (10);
--trying to create like range partition --should fail
CREATE TABLE hp_tbl_p4 PARTITION OF hp_tbl FOR VALUES FROM (0) TO (10);
DROP TABLE hp_tbl;
--trying to create for list partition --should fail
CREATE TABLE hp_tbl (a int, b text) PARTITION BY LIST (a);
CREATE TABLE hp_tbl_p1 PARTITION OF hp_tbl FOR VALUES WITH (modulus 10, remainder 0);
DROP TABLE hp_tbl;
--trying to create for range partition --should fail
CREATE TABLE hp_tbl (a int, b text) PARTITION BY RANGE (a);
CREATE TABLE hp_tbl_p1 PARTITION OF hp_tbl FOR VALUES WITH (modulus 10, remainder 0);
DROP TABLE hp_tbl;

--check for table description
--
CREATE TABLE hp_tbl (a int, b text) PARTITION BY HASH (a);
\d+ hp_tbl
CREATE TABLE hp_tbl_p1 PARTITION OF hp_tbl FOR VALUES WITH (modulus 20, remainder 0);
\d+ hp_tbl;\d+ hp_tbl_p1
CREATE TABLE hp_tbl_p2 PARTITION OF hp_tbl FOR VALUES WITH (modulus 20, remainder 1);
\d+ hp_tbl;\d+ hp_tbl_p1;\d+ hp_tbl_p2
CREATE TABLE hp_tbl_p3 (like hp_tbl);
ALTER TABLE hp_tbl ATTACH PARTITION hp_tbl_p3 FOR VALUES WITH (modulus 20, remainder 2);
\d+ hp_tbl;\d+ hp_tbl_p1;\d+ hp_tbl_p2;\d+ hp_tbl_p3
ALTER TABLE hp_tbl DETACH PARTITION hp_tbl_p3;
\d+ hp_tbl;\d+ hp_tbl_p1;\d+ hp_tbl_p2
DROP TABLE hp_tbl_p3;
DROP TABLE hp_tbl;

--testing TEMP-NESS of Hash partition
--
--trying to add temp partition to permanent partition --should pass
CREATE TABLE hp_tbl (a int, b text) PARTITION BY HASH (a);
CREATE TEMP TABLE hp_tbl_p1 PARTITION OF hp_tbl FOR VALUES WITH (modulus 20, remainder 0);
DROP TABLE hp_tbl;
--trying to add permanent partition to temp partition --should fail
CREATE TEMP TABLE hp_tbl (a int, b text) PARTITION BY HASH (a);
CREATE TABLE hp_tbl_p1 PARTITION OF hp_tbl FOR VALUES WITH (modulus 20, remainder 0);
DROP TABLE hp_tbl;
--trying to add temp partition to temp partition --should pass
CREATE TEMP TABLE hp_tbl (a int, b text) PARTITION BY HASH (a);
CREATE TEMP TABLE hp_tbl_p1 PARTITION OF hp_tbl FOR VALUES WITH (modulus 20, remainder 0);
DROP TABLE hp_tbl;

--testing with/without oids for Hash partition
--
--when root partition have oid
create table hp_tbl (a int) partition by hash (a) with oids;
--partition can be created with oids --should pass
create table hp_tbl_p1 partition of hp_tbl FOR VALUES WITH (modulus 20, remainder 0) with oids;
\d+ hp_tbl_p1
drop table 

Re: [HACKERS] [POC] hash partitioning

2017-09-04 Thread amul sul
I've updated patch to use an extended hash function (Commit #
81c5e46c490e2426db243eada186995da5bb0ba7) for the partitioning.

Regards,
Amul


On Thu, Jul 27, 2017 at 5:11 PM, amul sul  wrote:

> Attaching newer patches rebased against the latest master head. Thanks !
>
> Regards,
> Amul
>


0001-Cleanup_v6.patch
Description: Binary data


0002-hash-partitioning_another_design-v17.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-08-29 Thread yangjie

Hi,

This is my patch; I forgot to add the attachments before. It is also discussed at the following address:
https://www.postgresql.org/message-id/2017082612390093777512%40highgo.com

---
young
HighGo Database: http://www.highgo.com

On 8/28/2017 16:28, Yugo Nagata wrote:


Hi young,

On Mon, 28 Aug 2017 15:33:46 +0800
"yang...@highgo.com"  wrote:

> Hello
> 
> Looking at your hash partitioning syntax, I implemented a hash partition in a more concise way, with no need to determine the number of sub-tables, and dynamically add partitions.

I think it is great work, but the current consensus about hash partitioning
supports Amul's patch[1], in which the syntax is different from my original
proposal. So, you will have to read Amul's patch and start a discussion if
you still want to propose your implementation.

Regards,

[1] https://www.postgresql.org/message-id/CAAJ_b965A2oog=6efuhelexl3rmgfssb3g7lwkva1bw0wuj...@mail.gmail.com


> 
> Description
> 
> The hash partition's implement is on the basis of the original range / list partition,and using similar syntax.
> 
> To create a partitioned table ,use:
> 
> CREATE TABLE h (id int) PARTITION BY HASH(id);
> 
> The partitioning key supports only one value, and I think the partition key can support multiple values, 
> which may be difficult to implement when querying, but it is not impossible.
> 
> A partition table can be create as bellow:
> 
>  CREATE TABLE h1 PARTITION OF h;
>  CREATE TABLE h2 PARTITION OF h;
>  CREATE TABLE h3 PARTITION OF h;
>  
> FOR VALUES clause cannot be used, and the partition bound is calclulated automatically as partition index of single integer value.
> 
> An inserted record is stored in a partition whose index equals 
> DatumGetUInt32(OidFunctionCall1(lookup_type_cache(key->parttypid[0], TYPECACHE_HASH_PROC)->hash_proc, values[0])) % nparts/* Number of partitions */
> ;
> In the above example, this is DatumGetUInt32(OidFunctionCall1(lookup_type_cache(key->parttypid[0], TYPECACHE_HASH_PROC)->hash_proc, id)) % 3;
> 
> postgres=# insert into h select generate_series(1,20);
> INSERT 0 20
> postgres=# select tableoid::regclass,* from h;
>  tableoid | id 
> --+
>  h1   |  3
>  h1   |  5
>  h1   | 17
>  h1   | 19
>  h2   |  2
>  h2   |  6
>  h2   |  7
>  h2   | 11
>  h2   | 12
>  h2   | 14
>  h2   | 15
>  h2   | 18
>  h2   | 20
>  h3   |  1
>  h3   |  4
>  h3   |  8
>  h3   |  9
>  h3   | 10
>  h3   | 13
>  h3   | 16
> (20 rows)
> 
> The number of partitions here can be dynamically added, and if a new partition is created, the number of partitions changes, the calculated target partitions will change, and the same data is not reasonable in different partitions,So you need to re-calculate the existing data and insert the target partition when you create a new partition.
> 
> postgres=# create table h4 partition of h;
> CREATE TABLE
> postgres=# select tableoid::regclass,* from h;
>  tableoid | id 
> --+
>  h1   |  5
>  h1   | 17
>  h1   | 19
>  h1   |  6
>  h1   | 12
>  h1   |  8
>  h1   | 13
>  h2   | 11
>  h2   | 14
>  h3   |  1
>  h3   |  9
>  h3   |  2
>  h3   | 15
>  h4   |  3
>  h4   |  7
>  h4   | 18
>  h4   | 20
>  h4   |  4
>  h4   | 10
>  h4   | 16
> (20 rows)
> 
> When querying the data, the hash partition uses the same algorithm as the insertion, and filters out the table that does not need to be scanned.
> 
> postgres=# explain analyze select * from h where id = 1;
>  QUERY PLAN 
> 
>  Append  (cost=0.00..41.88 rows=13 width=4) (actual time=0.020..0.023 rows=1 loops=1)
>->  Seq Scan on h3  (cost=0.00..41.88 rows=13 width=4) (actual time=0.013..0.016 rows=1 loops=1)
>  Filter: (id = 1)
>  Rows Removed by Filter: 3
>  Planning time: 0.346 ms
>  Execution time: 0.061 ms
> (6 rows)
> 
> postgres=# explain analyze select * from h where id in (1,5);;
>  QUERY PLAN 
> 
>  Append  (cost=0.00..83.75 rows=52 width=4) (actual time=0.016..0.028 rows=2 loops=1)
>->  Seq Scan on h1  (cost=0.00..41.88 rows=26 width=4) (actual time=0.015..0.018 rows=1 loops=1)
>  Filter: (id = ANY ('{1,5}'::integer[]))
>  Rows Removed by Filter: 6
>->  Seq Scan on h3  (cost=0.00..41.88 rows=26 width=4) (actual time=0.005..0.007 rows=1 loops=1)
> 

Re: [HACKERS] [POC] hash partitioning

2017-08-28 Thread yang...@highgo.com
Hello

Looking at your hash partitioning syntax, I implemented a hash partition in a
more concise way, with no need to determine the number of sub-tables in
advance, and with the ability to dynamically add partitions.

Description

The hash partition implementation is built on the basis of the original
range / list partitioning, and uses similar syntax.

To create a partitioned table, use:

CREATE TABLE h (id int) PARTITION BY HASH(id);

The partitioning key supports only one value; I think the partition key could
support multiple values, which may be difficult to implement for querying,
but it is not impossible.

A partition table can be created as below:

 CREATE TABLE h1 PARTITION OF h;
 CREATE TABLE h2 PARTITION OF h;
 CREATE TABLE h3 PARTITION OF h;
 
The FOR VALUES clause cannot be used, and the partition bound is calculated
automatically as a partition index of a single integer value.

An inserted record is stored in a partition whose index equals 
DatumGetUInt32(OidFunctionCall1(lookup_type_cache(key->parttypid[0], 
TYPECACHE_HASH_PROC)->hash_proc, values[0])) % nparts/* Number of partitions */
;
In the above example, this is 
DatumGetUInt32(OidFunctionCall1(lookup_type_cache(key->parttypid[0], 
TYPECACHE_HASH_PROC)->hash_proc, id)) % 3;

postgres=# insert into h select generate_series(1,20);
INSERT 0 20
postgres=# select tableoid::regclass,* from h;
 tableoid | id 
--+
 h1   |  3
 h1   |  5
 h1   | 17
 h1   | 19
 h2   |  2
 h2   |  6
 h2   |  7
 h2   | 11
 h2   | 12
 h2   | 14
 h2   | 15
 h2   | 18
 h2   | 20
 h3   |  1
 h3   |  4
 h3   |  8
 h3   |  9
 h3   | 10
 h3   | 13
 h3   | 16
(20 rows)

Partitions can be added dynamically here; if a new partition is created, the
number of partitions changes, so the calculated target partition changes, and
the same data would otherwise end up spread inconsistently across partitions.
So the existing data needs to be re-calculated and inserted into its new
target partition when a new partition is created.

postgres=# create table h4 partition of h;
CREATE TABLE
postgres=# select tableoid::regclass,* from h;
 tableoid | id 
--+
 h1   |  5
 h1   | 17
 h1   | 19
 h1   |  6
 h1   | 12
 h1   |  8
 h1   | 13
 h2   | 11
 h2   | 14
 h3   |  1
 h3   |  9
 h3   |  2
 h3   | 15
 h4   |  3
 h4   |  7
 h4   | 18
 h4   | 20
 h4   |  4
 h4   | 10
 h4   | 16
(20 rows)

When querying the data, the hash partition uses the same algorithm as the
insertion, and filters out the tables that do not need to be scanned.

postgres=# explain analyze select * from h where id = 1;
 QUERY PLAN 


 Append  (cost=0.00..41.88 rows=13 width=4) (actual time=0.020..0.023 rows=1 
loops=1)
   ->  Seq Scan on h3  (cost=0.00..41.88 rows=13 width=4) (actual 
time=0.013..0.016 rows=1 loops=1)
 Filter: (id = 1)
 Rows Removed by Filter: 3
 Planning time: 0.346 ms
 Execution time: 0.061 ms
(6 rows)

postgres=# explain analyze select * from h where id in (1,5);;
 QUERY PLAN 


 Append  (cost=0.00..83.75 rows=52 width=4) (actual time=0.016..0.028 rows=2 
loops=1)
   ->  Seq Scan on h1  (cost=0.00..41.88 rows=26 width=4) (actual 
time=0.015..0.018 rows=1 loops=1)
 Filter: (id = ANY ('{1,5}'::integer[]))
 Rows Removed by Filter: 6
   ->  Seq Scan on h3  (cost=0.00..41.88 rows=26 width=4) (actual 
time=0.005..0.007 rows=1 loops=1)
 Filter: (id = ANY ('{1,5}'::integer[]))
 Rows Removed by Filter: 3
 Planning time: 0.720 ms
 Execution time: 0.074 ms
(9 rows)

postgres=# explain analyze select * from h where id = 1 or id = 5;;
 QUERY PLAN 


 Append  (cost=0.00..96.50 rows=50 width=4) (actual time=0.017..0.078 rows=2 
loops=1)
   ->  Seq Scan on h1  (cost=0.00..48.25 rows=25 width=4) (actual 
time=0.015..0.019 rows=1 loops=1)
 Filter: ((id = 1) OR (id = 5))
 Rows Removed by Filter: 6
   ->  Seq Scan on h3  (cost=0.00..48.25 rows=25 width=4) (actual 
time=0.005..0.010 rows=1 loops=1)
 Filter: ((id = 1) OR (id = 5))
 Rows Removed by Filter: 3
 Planning time: 0.396 ms
 Execution time: 0.139 ms
(9 rows)

Partitions cannot be detached / attached / dropped.

Best regards,
young


yonj1e.github.io
yang...@highgo.com


Re: [HACKERS] [POC] hash partitioning

2017-08-28 Thread Yugo Nagata
Hi young,

On Mon, 28 Aug 2017 15:33:46 +0800
"yang...@highgo.com"  wrote:

> Hello
> 
> Looking at your hash partitioning syntax, I implemented a hash partition in a 
> more concise way, with no need to determine the number of sub-tables, and 
> dynamically add partitions.

I think it is great work, but the current consensus about hash partitioning 
supports Amul's patch[1], in which the syntax is different from my original 
proposal. So, you will have to read Amul's patch and start a discussion if you 
still want to propose your implementation.

Regards,

[1] 
https://www.postgresql.org/message-id/CAAJ_b965A2oog=6efuhelexl3rmgfssb3g7lwkva1bw0wuj...@mail.gmail.com


> 
> Description
> 
> The hash partition's implement is on the basis of the original range / list 
> partition,and using similar syntax.
> 
> To create a partitioned table ,use:
> 
> CREATE TABLE h (id int) PARTITION BY HASH(id);
> 
> The partitioning key supports only one value, and I think the partition key 
> can support multiple values, 
> which may be difficult to implement when querying, but it is not impossible.
> 
> A partition table can be create as bellow:
> 
>  CREATE TABLE h1 PARTITION OF h;
>  CREATE TABLE h2 PARTITION OF h;
>  CREATE TABLE h3 PARTITION OF h;
>  
> FOR VALUES clause cannot be used, and the partition bound is calclulated 
> automatically as partition index of single integer value.
> 
> An inserted record is stored in a partition whose index equals 
> DatumGetUInt32(OidFunctionCall1(lookup_type_cache(key->parttypid[0], 
> TYPECACHE_HASH_PROC)->hash_proc, values[0])) % nparts/* Number of partitions 
> */
> ;
> In the above example, this is 
> DatumGetUInt32(OidFunctionCall1(lookup_type_cache(key->parttypid[0], 
> TYPECACHE_HASH_PROC)->hash_proc, id)) % 3;
> 
> postgres=# insert into h select generate_series(1,20);
> INSERT 0 20
> postgres=# select tableoid::regclass,* from h;
>  tableoid | id 
> --+
>  h1   |  3
>  h1   |  5
>  h1   | 17
>  h1   | 19
>  h2   |  2
>  h2   |  6
>  h2   |  7
>  h2   | 11
>  h2   | 12
>  h2   | 14
>  h2   | 15
>  h2   | 18
>  h2   | 20
>  h3   |  1
>  h3   |  4
>  h3   |  8
>  h3   |  9
>  h3   | 10
>  h3   | 13
>  h3   | 16
> (20 rows)
> 
> The number of partitions here can be dynamically added, and if a new 
> partition is created, the number of partitions changes, the calculated target 
> partitions will change, and the same data is not reasonable in different 
> partitions,So you need to re-calculate the existing data and insert the 
> target partition when you create a new partition.
> 
> postgres=# create table h4 partition of h;
> CREATE TABLE
> postgres=# select tableoid::regclass,* from h;
>  tableoid | id 
> --+
>  h1   |  5
>  h1   | 17
>  h1   | 19
>  h1   |  6
>  h1   | 12
>  h1   |  8
>  h1   | 13
>  h2   | 11
>  h2   | 14
>  h3   |  1
>  h3   |  9
>  h3   |  2
>  h3   | 15
>  h4   |  3
>  h4   |  7
>  h4   | 18
>  h4   | 20
>  h4   |  4
>  h4   | 10
>  h4   | 16
> (20 rows)
> 
> When querying the data, the hash partition uses the same algorithm as the 
> insertion, and filters out the table that does not need to be scanned.
> 
> postgres=# explain analyze select * from h where id = 1;
>  QUERY PLAN   
>   
> 
>  Append  (cost=0.00..41.88 rows=13 width=4) (actual time=0.020..0.023 rows=1 
> loops=1)
>->  Seq Scan on h3  (cost=0.00..41.88 rows=13 width=4) (actual 
> time=0.013..0.016 rows=1 loops=1)
>  Filter: (id = 1)
>  Rows Removed by Filter: 3
>  Planning time: 0.346 ms
>  Execution time: 0.061 ms
> (6 rows)
> 
> postgres=# explain analyze select * from h where id in (1,5);;
>  QUERY PLAN   
>   
> 
>  Append  (cost=0.00..83.75 rows=52 width=4) (actual time=0.016..0.028 rows=2 
> loops=1)
>->  Seq Scan on h1  (cost=0.00..41.88 rows=26 width=4) (actual 
> time=0.015..0.018 rows=1 loops=1)
>  Filter: (id = ANY ('{1,5}'::integer[]))
>  Rows Removed by Filter: 6
>->  Seq Scan on h3  (cost=0.00..41.88 rows=26 width=4) (actual 
> time=0.005..0.007 rows=1 loops=1)
>  Filter: (id = ANY ('{1,5}'::integer[]))
>  Rows Removed by Filter: 3
>  Planning time: 0.720 ms
>  Execution time: 0.074 ms
> (9 rows)
> 
> postgres=# explain analyze select * from h where id = 1 or id = 5;;
>  QUERY PLAN   
>   
> 

Re: [HACKERS] [POC] hash partitioning

2017-07-27 Thread amul sul
Attaching newer patches rebased against the latest master head. Thanks !

Regards,
Amul


0001-Cleanup_v6.patch
Description: Binary data


0002-hash-partitioning_another_design-v16.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-07-05 Thread amul sul
On Wed, Jul 5, 2017 at 4:50 PM, Dilip Kumar  wrote:
> On Mon, Jul 3, 2017 at 4:39 PM, amul sul  wrote:
>> Thanks to catching this, fixed in the attached version.
>
> Few comments on the latest version.
>

Thanks for your review; please find my comments inline:

> 0001 looks fine, for 0002 I have some comments.
>
> 1.
> + hbounds = (PartitionHashBound * *) palloc(nparts *
> +  sizeof(PartitionHashBound *));
>
> /s/(PartitionHashBound * *)/(PartitionHashBound **)/g
>

Fixed in the attached version.

> 2.
> RelationBuildPartitionDesc
> {
>  
>
>
> * catalog scan that retrieved them, whereas that in the latter is
> * defined by canonicalized representation of the list values or the
> * range bounds.
> */
> for (i = 0; i < nparts; i++)
> result->oids[mapping[i]] = oids[i];
>
> Should this comments mention about hash as well?
>

Instead, I have generalised this comment in the attached patch.

> 3.
>
> if (b1->datums[b1->ndatums - 1][0] != b2->datums[b2->ndatums - 1][0])
> return false;
>
> if (b1->ndatums != b2->ndatums)
> return false;
>
> If ndatums itself is different then no need to access datum memory, so
> better to check ndatum first.
>

You are correct; we are already doing this in
partition_bounds_equal().   This code is redundant, so it has been removed in the
attached version.

> 4.
> + * next larger modulus.  For example, if you have a bunch
> + * of partitions that all have modulus 5, you can add a
> + * new new partition with modulus 10 or a new partition
>
> Typo, "new new partition"  -> "new partition"
>

Fixed in the attached version.

Regards,
Amul


0001-Cleanup_v6.patch
Description: Binary data


0002-hash-partitioning_another_design-v15.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-07-05 Thread Dilip Kumar
On Mon, Jul 3, 2017 at 4:39 PM, amul sul  wrote:
> Thanks to catching this, fixed in the attached version.

Few comments on the latest version.

0001 looks fine, for 0002 I have some comments.

1.
+ hbounds = (PartitionHashBound * *) palloc(nparts *
+  sizeof(PartitionHashBound *));

/s/(PartitionHashBound * *)/(PartitionHashBound **)/g

2.
RelationBuildPartitionDesc
{
 


* catalog scan that retrieved them, whereas that in the latter is
* defined by canonicalized representation of the list values or the
* range bounds.
*/
for (i = 0; i < nparts; i++)
result->oids[mapping[i]] = oids[i];

Should this comments mention about hash as well?

3.

if (b1->datums[b1->ndatums - 1][0] != b2->datums[b2->ndatums - 1][0])
return false;

if (b1->ndatums != b2->ndatums)
return false;

If ndatums itself is different then no need to access datum memory, so
better to check ndatum first.

4.
+ * next larger modulus.  For example, if you have a bunch
+ * of partitions that all have modulus 5, you can add a
+ * new new partition with modulus 10 or a new partition

Typo, "new new partition"  -> "new partition"


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-07-03 Thread amul sul
On Fri, Jun 23, 2017 at 11:19 AM, Yugo Nagata  wrote:
> On Fri, 23 Jun 2017 13:41:15 +0900
> Yugo Nagata  wrote:
>
>> On Tue, 6 Jun 2017 13:03:58 +0530
>> amul sul  wrote:
>>
>>
>> > Updated patch attached.
>>
>> I looked into the latest patch (v13) and have some comments
>> althogh they might be trivial.
>
> One more comment:
>
> +   if (spec->remainder < 0)
> +   ereport(ERROR,
> +   (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
> +errmsg("remainder for hash partition must be a 
> non-negative integer")));
>
> The value of remainder is defined as Iconst in gram.y, so it never be 
> negative.
> Hence, I think this check is not necessary or Assert is enough.
>
Makes sense; fixed this as well in the v14 patch. Thanks again.

Regards,
Amul


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-07-03 Thread amul sul
On Fri, Jun 23, 2017 at 10:11 AM, Yugo Nagata  wrote:
> On Tue, 6 Jun 2017 13:03:58 +0530
> amul sul  wrote:
>
>
>> Updated patch attached.
>
> I looked into the latest patch (v13) and have some comments
> althogh they might be trivial.
>
Thanks for your review.

> First, I couldn't apply this patch to the latest HEAD due to
> a documentation fix and pgintend updates. It needes rebase.
>
> $ git apply /tmp/0002-hash-partitioning_another_design-v13.patch
> error: patch failed: doc/src/sgml/ref/create_table.sgml:87
> error: doc/src/sgml/ref/create_table.sgml: patch does not apply
> error: patch failed: src/backend/catalog/partition.c:76
> error: src/backend/catalog/partition.c: patch does not apply
> error: patch failed: src/backend/commands/tablecmds.c:13371
> error: src/backend/commands/tablecmds.c: patch does not apply
>
Fixed.

>
>
> +   Hash Partitioning
> +
> +   
> +
> + The table is partitioned by specifying modulus and remainder for 
> each
> + partition. Each partition holds rows for which the hash value of
> + partition keys when divided by specified modulus produces specified
> + remainder. For more clarification on modulus and remainder please 
> refer
> + .
> +
> +   
> +  
> +
> +  
> Range Partitioning
>
> I think this section should be inserted after List Partitioning section 
> because
> the order of the descriptions is Range, List, then Hash in other places of
> the documentation. At least,
>
Fixed in the attached version.

>
> -partition bounds.  Currently supported
> -partitioning methods include range and list, where each partition is
> -assigned a range of keys and a list of keys, respectively.
> +partition bounds.  The currently supported
> +partitioning methods are list, range, and hash.
> 
>
> Also in this hunk. I think "The currently supported partitioning methods are
> range, list, and hash." is better. We don't need to change the order of
> the original description.
>
Fixed in the attached version.

>
>
> 
> -Declarative partitioning only supports list and range partitioning,
> -whereas table inheritance allows data to be divided in a manner of
> -the user's choosing.  (Note, however, that if constraint exclusion is
> -unable to prune partitions effectively, query performance will be 
> very
> -poor.)
> +Declarative partitioning only supports hash, list and range
> +partitioning, whereas table inheritance allows data to be divided in 
> a
> +manner of the user's choosing.  (Note, however, that if constraint
> +exclusion is unable to prune partitions effectively, query 
> performance
> +will be very poor.)
>
> Similarly, I think "Declarative partitioning only supports range, list and 
> hash
> partitioning," is better.
>
Fixed in the attached version.

>
> +
> +  
> +   Create a hash partitioned table:
> +
> +CREATE TABLE orders (
> +order_id bigint not null,
> +cust_id  bigint not null,
> +status   text
> +) PARTITION BY HASH (order_id);
> +
> +
>
> This paragraph should be inserted between "Create a list partitioned table:"
> paragraph and "Ceate partition of a range partitioned table:" paragraph
> as well as range and list.
>
Fixed in the attached version.

>
> *strategy = PARTITION_STRATEGY_LIST;
> else if (pg_strcasecmp(partspec->strategy, "range") == 0)
> *strategy = PARTITION_STRATEGY_RANGE;
> +   else if (pg_strcasecmp(partspec->strategy, "hash") == 0)
> +   *strategy = PARTITION_STRATEGY_HASH;
> else
> ereport(ERROR,
>
> In the most of codes, the order is hash, range, then list, but only
> in transformPartitionSpec(), the order is list, range, then hash,
> as above. Maybe it is better to be uniform.
>
Makes sense; fixed in the attached version.

>
> +   {
> +   if (strategy == PARTITION_STRATEGY_HASH)
> +   ereport(ERROR,
> +   
> (errcode(ERRCODE_UNDEFINED_OBJECT),
> +errmsg("data type %s 
> has no default hash operator class",
> +   
> format_type_be(atttype)),
> +errhint("You must 
> specify a hash operator class or define a default hash operator class for the 
> data type.")));
> +   else
> +   ereport(ERROR,
> +   
> (errcode(ERRCODE_UNDEFINED_OBJECT),
> +errmsg("data type %s 
> has no default btree operator class",
> + 

Re: [HACKERS] [POC] hash partitioning

2017-06-22 Thread Yugo Nagata
On Fri, 23 Jun 2017 13:41:15 +0900
Yugo Nagata  wrote:

> On Tue, 6 Jun 2017 13:03:58 +0530
> amul sul  wrote:
> 
> 
> > Updated patch attached.
> 
> I looked into the latest patch (v13) and have some comments
> althogh they might be trivial.

One more comment:

+   if (spec->remainder < 0)
+   ereport(ERROR,
+   (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
+errmsg("remainder for hash partition must be a 
non-negative integer")));

The value of remainder is defined as Iconst in gram.y, so it never be negative.
Hence, I think this check is not necessary or Assert is enough.

> 
> First, I couldn't apply this patch to the latest HEAD due to
> a documentation fix and pgintend updates. It needes rebase.
> 
> $ git apply /tmp/0002-hash-partitioning_another_design-v13.patch  
> error: patch failed: doc/src/sgml/ref/create_table.sgml:87
> error: doc/src/sgml/ref/create_table.sgml: patch does not apply
> error: patch failed: src/backend/catalog/partition.c:76
> error: src/backend/catalog/partition.c: patch does not apply
> error: patch failed: src/backend/commands/tablecmds.c:13371
> error: src/backend/commands/tablecmds.c: patch does not apply
> 
> 
>
> +   Hash Partitioning
> +
> +   
> +
> + The table is partitioned by specifying modulus and remainder for 
> each
> + partition. Each partition holds rows for which the hash value of
> + partition keys when divided by specified modulus produces specified
> + remainder. For more clarification on modulus and remainder please 
> refer
> + .
> +
> +   
> +  
> +
> +  
> Range Partitioning
> 
> I think this section should be inserted after List Partitioning section 
> because
> the order of the descriptions is Range, List, then Hash in other places of
> the documentation. At least, 
> 
> 
> -partition bounds.  Currently supported
> -partitioning methods include range and list, where each partition is
> -assigned a range of keys and a list of keys, respectively.
> +partition bounds.  The currently supported
> +partitioning methods are list, range, and hash.
> 
> 
> Also in this hunk. I think "The currently supported partitioning methods are
> range, list, and hash." is better. We don't need to change the order of
> the original description.
>  
> 
>
> 
> -Declarative partitioning only supports list and range partitioning,
> -whereas table inheritance allows data to be divided in a manner of
> -the user's choosing.  (Note, however, that if constraint exclusion is
> -unable to prune partitions effectively, query performance will be 
> very
> -poor.)
> +Declarative partitioning only supports hash, list and range
> +partitioning, whereas table inheritance allows data to be divided in 
> a
> +manner of the user's choosing.  (Note, however, that if constraint
> +exclusion is unable to prune partitions effectively, query 
> performance
> +will be very poor.)
> 
> Similarly, I think "Declarative partitioning only supports range, list and 
> hash
> partitioning," is better.
> 
> 
> +
> +  
> +   Create a hash partitioned table:
> +
> +CREATE TABLE orders (
> +order_id bigint not null,
> +cust_id  bigint not null,
> +status   text
> +) PARTITION BY HASH (order_id);
> +
> +
> 
> This paragraph should be inserted between "Create a list partitioned table:"
> paragraph and "Ceate partition of a range partitioned table:" paragraph
> as well as range and list.
> 
> 
>   *strategy = PARTITION_STRATEGY_LIST;
>   else if (pg_strcasecmp(partspec->strategy, "range") == 0)
>   *strategy = PARTITION_STRATEGY_RANGE;
> + else if (pg_strcasecmp(partspec->strategy, "hash") == 0)
> + *strategy = PARTITION_STRATEGY_HASH;
>   else
>   ereport(ERROR,
> 
> In the most of codes, the order is hash, range, then list, but only
> in transformPartitionSpec(), the order is list, range, then hash,
> as above. Maybe it is better to be uniform.
> 
> 
> + {
> + if (strategy == PARTITION_STRATEGY_HASH)
> + ereport(ERROR,
> + 
> (errcode(ERRCODE_UNDEFINED_OBJECT),
> +  errmsg("data type %s 
> has no default hash operator class",
> + 
> format_type_be(atttype)),
> +  errhint("You must 
> specify a hash operator class or define a default hash operator class for the 
> data type.")));
> + else
> + ereport(ERROR,
> + 
> 

Re: [HACKERS] [POC] hash partitioning

2017-06-22 Thread Yugo Nagata
On Tue, 6 Jun 2017 13:03:58 +0530
amul sul  wrote:


> Updated patch attached.

I looked into the latest patch (v13) and have some comments
althogh they might be trivial.

First, I couldn't apply this patch to the latest HEAD due to
a documentation fix and pgintend updates. It needes rebase.

$ git apply /tmp/0002-hash-partitioning_another_design-v13.patch  
error: patch failed: doc/src/sgml/ref/create_table.sgml:87
error: doc/src/sgml/ref/create_table.sgml: patch does not apply
error: patch failed: src/backend/catalog/partition.c:76
error: src/backend/catalog/partition.c: patch does not apply
error: patch failed: src/backend/commands/tablecmds.c:13371
error: src/backend/commands/tablecmds.c: patch does not apply


   
+   Hash Partitioning
+
+   
+
+ The table is partitioned by specifying modulus and remainder for each
+ partition. Each partition holds rows for which the hash value of
+ partition keys when divided by specified modulus produces specified
+ remainder. For more clarification on modulus and remainder please 
refer
+ .
+
+   
+  
+
+  
Range Partitioning

I think this section should be inserted after List Partitioning section because
the order of the descriptions is Range, List, then Hash in other places of
the documentation. At least, 


-partition bounds.  Currently supported
-partitioning methods include range and list, where each partition is
-assigned a range of keys and a list of keys, respectively.
+partition bounds.  The currently supported
+partitioning methods are list, range, and hash.


Also in this hunk. I think "The currently supported partitioning methods are
range, list, and hash." is better. We don't need to change the order of
the original description.
 

   

-Declarative partitioning only supports list and range partitioning,
-whereas table inheritance allows data to be divided in a manner of
-the user's choosing.  (Note, however, that if constraint exclusion is
-unable to prune partitions effectively, query performance will be very
-poor.)
+Declarative partitioning only supports hash, list and range
+partitioning, whereas table inheritance allows data to be divided in a
+manner of the user's choosing.  (Note, however, that if constraint
+exclusion is unable to prune partitions effectively, query performance
+will be very poor.)

Similarly, I think "Declarative partitioning only supports range, list and hash
partitioning," is better.


+
+  
+   Create a hash partitioned table:
+
+CREATE TABLE orders (
+order_id bigint not null,
+cust_id  bigint not null,
+status   text
+) PARTITION BY HASH (order_id);
+
+

This paragraph should be inserted between "Create a list partitioned table:"
paragraph and "Ceate partition of a range partitioned table:" paragraph
as well as range and list.


*strategy = PARTITION_STRATEGY_LIST;
else if (pg_strcasecmp(partspec->strategy, "range") == 0)
*strategy = PARTITION_STRATEGY_RANGE;
+   else if (pg_strcasecmp(partspec->strategy, "hash") == 0)
+   *strategy = PARTITION_STRATEGY_HASH;
else
ereport(ERROR,

In the most of codes, the order is hash, range, then list, but only
in transformPartitionSpec(), the order is list, range, then hash,
as above. Maybe it is better to be uniform.


+   {
+   if (strategy == PARTITION_STRATEGY_HASH)
+   ereport(ERROR,
+   
(errcode(ERRCODE_UNDEFINED_OBJECT),
+errmsg("data type %s 
has no default hash operator class",
+   
format_type_be(atttype)),
+errhint("You must 
specify a hash operator class or define a default hash operator class for the 
data type.")));
+   else
+   ereport(ERROR,
+   
(errcode(ERRCODE_UNDEFINED_OBJECT),
+errmsg("data type %s 
has no default btree operator class",
+   
format_type_be(atttype)),
+errhint("You must 
specify a btree operator class or define a default btree operator class for the 
data type.")));
+
+


   atttype,
-   
   "btree",
- 

Re: [HACKERS] [POC] hash partitioning

2017-06-06 Thread Dilip Kumar
On Tue, Jun 6, 2017 at 2:41 PM, Amit Langote
 wrote:
> Consider an example using the partition hierarchy:
>
> root (a int, b char, c int) partition by range (a)
>
>  -> level1 from (1) to (10) partition by list (b)
>
>  -> level2 in ('a') parition by range (c)
>
>  -> leaf from (1) to (10)
>
> Inserting (1, 'b', 1) into level1 will fail, because tuple can't be routed
> at level1 (no partition defined for b = 'b').
>
> Inserting (1, 'a', 10) into level1 will fail, because tuple can't be
> routed at level2 (no partition defined for c >= 10).
>
> Inserting (10, 'a', 1) into level1 will fail, because, although it was
> able to get through level1 and level2 into leaf, a = 10 falls out of
> level1's defined range.  We don't check that 1 <= a < 10 before starting
> the tuple-routing.
>
> I wonder if we should...  Since we don't allow BR triggers on partitioned
> tables, there should not be any harm in doing it just before calling
> ExecFindPartition().  Perhaps, topic for a new thread.

Yeah, correct.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-06-06 Thread Amit Langote
On 2017/06/06 17:50, Dilip Kumar wrote:
> On Tue, Jun 6, 2017 at 1:03 PM, amul sul  wrote:
>> May I ask you, how you sure about 8 is an unfit value for t1 relation?
>> And what if the value other than 8, for e.g. 7?
> 
> Well, First I created t1 as a leaf relation like below, and I tested
> insert into t1 with value 8 and it was violating the partition
> constraint of t1, however, 7 was fine.
> 
> create table t (a int) partition by hash(a);
> create table t1 partition of t for values with (modulus 2, remainder 1);
> 
> Later I dropped this t1 and created 2 level partition with the leaf as a 
> range.
> 
> drop table t1;
> create table t1 partition of t for values with (modulus 2, remainder
> 1) partition by range(a);
> create table t1_1 partition of t1 for values from (8) to (10);
> 
> So now, I am sure that t1_1 can accept the value 8 and its parent t1 can't.
> 
> So I think this can only happen in the case of partitioned by hash
> that a value is legal for the child but illegal for the parent?  Isn't
> it a good idea that if a user is inserting in the top level relation
> he should know for which partition exactly the constraint got
> violated?

It's how the original partitioning code around ExecInsert/CopyFrom works,
not something that only affects hash partitioning.  So, I think that
Amul's patch is fine and if we want to change something here, it should be
done by an independent patch.  See the explanation below:

If we insert into a partition directly, we must check its partition
constraint.  If the partition happens to be itself a partitioned table,
the constraint will be checked *after* tuple-routing and ExecConstraints()
is passed the leaf partition's ResultRelInfo, so if an error occurs there
we will use the leaf partition's name in the message.  Since we combine
the leaf partition's own constraint with all of the ancestors' into a
single expression that is passed to ExecCheck(), it is hard to say exactly
which ancestor's constraint is violated.  However, if the partition
constraint of some intervening ancestor had been violated, we wouldn't be
in ExecConstraints() at all; tuple-routing itself would have failed.  So
it seems that we need worry (if at all) only about partition constraints
of the table mentioned in the insert statement.

Consider an example using the partition hierarchy:

root (a int, b char, c int) partition by range (a)

 -> level1 from (1) to (10) partition by list (b)

 -> level2 in ('a') parition by range (c)

 -> leaf from (1) to (10)

Inserting (1, 'b', 1) into level1 will fail, because tuple can't be routed
at level1 (no partition defined for b = 'b').

Inserting (1, 'a', 10) into level1 will fail, because tuple can't be
routed at level2 (no partition defined for c >= 10).

Inserting (10, 'a', 1) into level1 will fail, because, although it was
able to get through level1 and level2 into leaf, a = 10 falls out of
level1's defined range.  We don't check that 1 <= a < 10 before starting
the tuple-routing.

I wonder if we should...  Since we don't allow BR triggers on partitioned
tables, there should not be any harm in doing it just before calling
ExecFindPartition().  Perhaps, topic for a new thread.
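
Spelled out in SQL, the example hierarchy above looks roughly like this (a
sketch; the table names follow the description):

CREATE TABLE root (a int, b char, c int) PARTITION BY RANGE (a);
CREATE TABLE level1 PARTITION OF root
    FOR VALUES FROM (1) TO (10) PARTITION BY LIST (b);
CREATE TABLE level2 PARTITION OF level1
    FOR VALUES IN ('a') PARTITION BY RANGE (c);
CREATE TABLE leaf PARTITION OF level2
    FOR VALUES FROM (1) TO (10);

INSERT INTO level1 VALUES (1, 'b', 1);   -- fails during routing: no partition of level1 for b = 'b'
INSERT INTO level1 VALUES (1, 'a', 10);  -- fails during routing: no partition of level2 for c >= 10
INSERT INTO level1 VALUES (10, 'a', 1);  -- routed to leaf, then fails the combined partition
                                         -- constraint check, since a = 10 is outside level1's range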

Thanks,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-06-06 Thread Dilip Kumar
On Tue, Jun 6, 2017 at 1:03 PM, amul sul  wrote:
> May I ask you, how you sure about 8 is an unfit value for t1 relation?
> And what if the value other than 8, for e.g. 7?

Well, first I created t1 as a leaf relation like below, and I tested
inserting the value 8 into t1; it violated t1's partition constraint,
whereas 7 was fine.

create table t (a int) partition by hash(a);
create table t1 partition of t for values with (modulus 2, remainder 1);

Later I dropped this t1 and created a two-level partition hierarchy with the leaf as a range partition.

drop table t1;
create table t1 partition of t for values with (modulus 2, remainder
1) partition by range(a);
create table t1_1 partition of t1 for values from (8) to (10);

So now, I am sure that t1_1 can accept the value 8 and its parent t1 can't.

So I think it can only happen with hash partitioning that a value is legal
for the child but illegal for the parent?  Isn't it a good idea that, if a
user is inserting into the top-level relation, he should know exactly which
partition's constraint got violated?

>
> Updated patch attached.

Thanks.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-06-06 Thread amul sul
Hi Dilip,

Thanks for review.

On Sat, Jun 3, 2017 at 6:54 PM, Dilip Kumar  wrote:
> On Thu, May 25, 2017 at 9:59 AM, amul sul  wrote:
>> On Mon, May 22, 2017 at 2:23 PM, amul sul  wrote:
>>>
>>> Updated patch attached. Thanks a lot for review.
>>>
>> Minor fix in the document, PFA.
>
> Patch need rebase
>

Done.

> ---
> Function header is not consistent with other neighbouring functions
> (some function contains function name in the header but others don't)
> +/*
> + * Compute the hash value for given not null partition key values.
> + */
>
Done.

> --
> postgres=# create table t1 partition of t for values with (modulus 2,
> remainder 1) partition by range(a);
> CREATE TABLE
> postgres=# create table t1_1 partition of t1 for values from (8) to (10);
> CREATE TABLE
> postgres=# insert into t1 values(8);
> 2017-06-03 18:41:46.067 IST [5433] ERROR:  new row for relation "t1_1"
> violates partition constraint
> 2017-06-03 18:41:46.067 IST [5433] DETAIL:  Failing row contains (8).
> 2017-06-03 18:41:46.067 IST [5433] STATEMENT:  insert into t1 values(8);
> ERROR:  new row for relation "t1_1" violates partition constraint
> DETAIL:  Failing row contains (8).
>
> The value 8 is violating the partition constraint of the t1 and we are
> trying to insert to value in t1,
> still, the error is coming from the leaf level table t1_1, that may be
> fine but from error, it appears that
> it's violating the constraint of t1_1 whereas it's actually violating
> the constraint of t1.
>
> From Implementation, it appears that based on the key are identifying
> the leaf partition and it's only failing during ExecInsert while
> checking the partition constraint.
>
May I ask how you are sure that 8 is an unfit value for relation t1?
And what about a value other than 8, e.g. 7?

Updated patch attached.

Regards,
Amul Sul


0001-Cleanup_v5.patch
Description: Binary data


0002-hash-partitioning_another_design-v13.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-06-03 Thread Dilip Kumar
On Thu, May 25, 2017 at 9:59 AM, amul sul  wrote:
> On Mon, May 22, 2017 at 2:23 PM, amul sul  wrote:
>>
>> Updated patch attached. Thanks a lot for review.
>>
> Minor fix in the document, PFA.

Patch need rebase

---
The function header is not consistent with other neighbouring functions
(some functions contain the function name in the header but others don't).
+/*
+ * Compute the hash value for given not null partition key values.
+ */

--
postgres=# create table t1 partition of t for values with (modulus 2,
remainder 1) partition by range(a);
CREATE TABLE
postgres=# create table t1_1 partition of t1 for values from (8) to (10);
CREATE TABLE
postgres=# insert into t1 values(8);
2017-06-03 18:41:46.067 IST [5433] ERROR:  new row for relation "t1_1"
violates partition constraint
2017-06-03 18:41:46.067 IST [5433] DETAIL:  Failing row contains (8).
2017-06-03 18:41:46.067 IST [5433] STATEMENT:  insert into t1 values(8);
ERROR:  new row for relation "t1_1" violates partition constraint
DETAIL:  Failing row contains (8).

The value 8 violates the partition constraint of t1, and we are trying to
insert the value into t1; still, the error comes from the leaf-level table
t1_1. That may be fine, but from the error it appears that the row violates
the constraint of t1_1, whereas it actually violates the constraint of t1.

From the implementation, it appears that the leaf partition is identified
based on the key, and it only fails during ExecInsert while checking the
partition constraint.

Other than that, patch looks fine to me.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-05-25 Thread Robert Haas
On Mon, May 22, 2017 at 1:49 AM, Ashutosh Bapat
 wrote:
> The prologue is arranged as one paragraph (with a new line) per
> member. Within each paragraph explanation for each partitioning
> strategy starts on its own line. One paragraph per member is more
> readable than separate paragraphs for each member and strategy.

The point is that you can either make it a separate paragraph or you
can make it a single paragraph, but you can't leave it halfway in
between.  If it's one paragraph, every line should end at around the
80 character mark, without any short lines.  If it's multiple
paragraphs, they should be separated by blank lines.  The only line of
a paragraph that can be short is the last one.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-05-24 Thread amul sul
On Mon, May 22, 2017 at 2:23 PM, amul sul  wrote:
>
> Updated patch attached. Thanks a lot for review.
>
Minor fix in the document, PFA.

Regards,
Amul


0002-hash-partitioning_another_design-v12.patch
Description: Binary data


0001-Cleanup_v4.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-05-22 Thread amul sul
On Fri, May 19, 2017 at 10:35 PM, Robert Haas  wrote:
> On Fri, May 19, 2017 at 5:32 AM, amul sul  wrote:
>> Updated patch attached.  0001-patch rebased against latest head.
>> 0002-patch also incorporates code comments and error message changes
>> as per Robert's & your suggestions. Thanks !
>
> -if (spec->strategy != PARTITION_STRATEGY_LIST)
> -elog(ERROR, "invalid strategy in partition bound spec");
> +Assert(spec->strategy == PARTITION_STRATEGY_LIST);
>
> Let's just drop these hunks.  I realize this is a response to a review
> comment I made, but I take it back.  If the existing code is already
> doing it this way, there's no real need to revise it.  The patch
> doesn't even make it consistent anyway, since elsewhere you elog() for
> a similar case.  Perhaps elog() is best anyway.
>
Done.

> -partitioning methods include range and list, where each partition is
> -assigned a range of keys and a list of keys, respectively.
> +partitioning methods include hash, range and list, where each partition 
> is
> +assigned a modulus and remainder of keys, a range of keys and a list of
> +keys, respectively.
>
> I think this sentence has become too long and unwieldy, and is more
> unclear than helpful.  I'd just write "The currently supported
> partitioning methods are list, range, and hash."  The use of the word
> include is actually wrong here, because it implies that there are more
> not mentioned here, which is false.
>
Done.

> -  expression.  If no btree operator class is specified when creating a
> -  partitioned table, the default btree operator class for the datatype 
> will
> -  be used.  If there is none, an error will be reported.
> +  expression.  List and range partitioning uses only btree operator 
> class.
> +  Hash partitioning uses only hash operator class. If no operator class 
> is
> +  specified when creating a partitioned table, the default operator class
> +  for the datatype will be used.  If there is none, an error will be
> +  reported.
> + 
>
> I suggest: If no operator class is specified when creating a
> partitioned table, the default operator class of the appropriate type
> (btree for list and range partitioning, hash for hash partitioning)
> will be used.  If there is none, an error will be reported.
>
Done.

> + 
> +  Since hash partitiong operator class, provide only equality,
> not ordering,
> +  collation is not relevant in hash partition key column. An error will 
> be
> +  reported if collation is specified.
>
> partitiong -> partitioning.  Also, remove the comma after "operator
> class" and change "not relevant in hash partition key column" to "not
> relevant for hash partitioning".  Also change "if collation is
> specified" to "if a collation is specified".
>
Done.

> +   Create a hash partitioned table:
> +
> +CREATE TABLE orders (
> +order_id bigint not null,
> +cust_id  bigint not null,
> +status   text
> +) PARTITION BY HASH (order_id);
> +
>
> Move this down so it's just above the example of creating partitions.
>
Done.

> + * For range and list partitioned tables, datums is an array of datum-tuples
> + * with key->partnatts datums each.
> + * For hash partitioned tables, it is an array of datum-tuples with 2 datums,
> + * modulus and remainder, corresponding to a given partition.
>
> Second line is very short; reflow as one paragraph.
>
Done

>   * In case of range partitioning, it stores one entry per distinct range
>   * datum, which is the index of the partition for which a given datum
>   * is an upper bound.
> + * In the case of hash partitioning, the number of the entries in the indexes
> + * array is same as the greatest modulus amongst all partitions. For a given
> + * partition key datum-tuple, the index of the partition which would
> accept that
> + * datum-tuple would be given by the entry pointed by remainder produced when
> + * hash value of the datum-tuple is divided by the greatest modulus.
>
> Insert line break before the new text as a paragraph break.

Will wait for more inputs on Ashutosh's explanation upthread.

>
> +charstrategy;/* hash, list or range bounds? */
>
> Might be clearer to just write /* hash, list, or range? */ or /*
> bounds for hash, list, or range? */
>

Done, used "hash, list, or range?"

>
> +static uint32 compute_hash_value(PartitionKey key, Datum *values,
> bool *isnull);
> +
>
> I think there should be a blank line after this but not before it.
>

Done.

> I don't really see why hash partitioning needs to touch
> partition_bounds_equal() at all.  Why can't the existing logic work
> for hash partitioning without change?
>

Unlike list and range partitioning, ndatums does not represent the size of
the indexes array; also, the dimension of the datums array is not the same
as key->partnatts.

> +valid_bound 

Re: [HACKERS] [POC] hash partitioning

2017-05-21 Thread Ashutosh Bapat
On Fri, May 19, 2017 at 10:35 PM, Robert Haas  wrote:
>
> + * For range and list partitioned tables, datums is an array of datum-tuples
> + * with key->partnatts datums each.
> + * For hash partitioned tables, it is an array of datum-tuples with 2 datums,
> + * modulus and remainder, corresponding to a given partition.
>
> Second line is very short; reflow as one paragraph.
>
>   * In case of range partitioning, it stores one entry per distinct range
>   * datum, which is the index of the partition for which a given datum
>   * is an upper bound.
> + * In the case of hash partitioning, the number of the entries in the indexes
> + * array is same as the greatest modulus amongst all partitions. For a given
> + * partition key datum-tuple, the index of the partition which would
> accept that
> + * datum-tuple would be given by the entry pointed by remainder produced when
> + * hash value of the datum-tuple is divided by the greatest modulus.
>
> Insert line break before the new text as a paragraph break.

The prologue is arranged as one paragraph (with a new line) per
member. Within each paragraph, the explanation for each partitioning
strategy starts on its own line. One paragraph per member is more
readable than separate paragraphs for each member and strategy.

>
> I don't really see why hash partitioning needs to touch
> partition_bounds_equal() at all.  Why can't the existing logic work
> for hash partitioning without change?

Right now, it compares partnatts datum values for list and range. For
hash, it needs to compare 2 datums, modulus and remainder. That's the
difference.
Further, I suggested that we use the fact that equality of the indexes
arrays implies equality of bounds for hash partitioning.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-05-19 Thread Robert Haas
On Fri, May 19, 2017 at 5:32 AM, amul sul  wrote:
> Updated patch attached.  0001-patch rebased against latest head.
> 0002-patch also incorporates code comments and error message changes
> as per Robert's & your suggestions. Thanks !

-if (spec->strategy != PARTITION_STRATEGY_LIST)
-elog(ERROR, "invalid strategy in partition bound spec");
+Assert(spec->strategy == PARTITION_STRATEGY_LIST);

Let's just drop these hunks.  I realize this is a response to a review
comment I made, but I take it back.  If the existing code is already
doing it this way, there's no real need to revise it.  The patch
doesn't even make it consistent anyway, since elsewhere you elog() for
a similar case.  Perhaps elog() is best anyway.

-partitioning methods include range and list, where each partition is
-assigned a range of keys and a list of keys, respectively.
+partitioning methods include hash, range and list, where each partition is
+assigned a modulus and remainder of keys, a range of keys and a list of
+keys, respectively.

I think this sentence has become too long and unwieldy, and is more
unclear than helpful.  I'd just write "The currently supported
partitioning methods are list, range, and hash."  The use of the word
include is actually wrong here, because it implies that there are more
not mentioned here, which is false.

-  expression.  If no btree operator class is specified when creating a
-  partitioned table, the default btree operator class for the datatype will
-  be used.  If there is none, an error will be reported.
+  expression.  List and range partitioning uses only btree operator class.
+  Hash partitioning uses only hash operator class. If no operator class is
+  specified when creating a partitioned table, the default operator class
+  for the datatype will be used.  If there is none, an error will be
+  reported.
+ 

I suggest: If no operator class is specified when creating a
partitioned table, the default operator class of the appropriate type
(btree for list and range partitioning, hash for hash partitioning)
will be used.  If there is none, an error will be reported.

+ 
+  Since hash partitiong operator class, provide only equality,
not ordering,
+  collation is not relevant in hash partition key column. An error will be
+  reported if collation is specified.

partitiong -> partitioning.  Also, remove the comma after "operator
class" and change "not relevant in hash partition key column" to "not
relevant for hash partitioning".  Also change "if collation is
specified" to "if a collation is specified".

+   Create a hash partitioned table:
+
+CREATE TABLE orders (
+order_id bigint not null,
+cust_id  bigint not null,
+status   text
+) PARTITION BY HASH (order_id);
+

Move this down so it's just above the example of creating partitions.

+ * For range and list partitioned tables, datums is an array of datum-tuples
+ * with key->partnatts datums each.
+ * For hash partitioned tables, it is an array of datum-tuples with 2 datums,
+ * modulus and remainder, corresponding to a given partition.

Second line is very short; reflow as one paragraph.

  * In case of range partitioning, it stores one entry per distinct range
  * datum, which is the index of the partition for which a given datum
  * is an upper bound.
+ * In the case of hash partitioning, the number of the entries in the indexes
+ * array is same as the greatest modulus amongst all partitions. For a given
+ * partition key datum-tuple, the index of the partition which would
accept that
+ * datum-tuple would be given by the entry pointed by remainder produced when
+ * hash value of the datum-tuple is divided by the greatest modulus.

Insert line break before the new text as a paragraph break.

+charstrategy;/* hash, list or range bounds? */

Might be clearer to just write /* hash, list, or range? */ or /*
bounds for hash, list, or range? */


+static uint32 compute_hash_value(PartitionKey key, Datum *values,
bool *isnull);
+

I think there should be a blank line after this but not before it.

I don't really see why hash partitioning needs to touch
partition_bounds_equal() at all.  Why can't the existing logic work
for hash partitioning without change?

+valid_bound = true;

valid_modulus, maybe?

-   errmsg("data type %s has no default btree operator class",
-  format_type_be(atttype)),
- errhint("You must specify a btree operator
class or define a default btree operator class for the data type.")));
+  errmsg("data type %s has no default %s operator class",
+ format_type_be(atttype), am_method),
+ errhint("You must specify a %s operator
class or define a default %s operator class for the data 

Re: [HACKERS] [POC] hash partitioning

2017-05-19 Thread amul sul
On Wed, May 17, 2017 at 6:54 PM, Ashutosh Bapat
 wrote:
[...]

>
> Comments on the tests
> +#ifdef USE_ASSERT_CHECKING
> +{
> +/*
> + * Hash partition bound stores modulus and remainder at
> + * b1->datums[i][0] and b1->datums[i][1] position respectively.
> + */
> +for (i = 0; i < b1->ndatums; i++)
> +Assert((b1->datums[i][0] == b2->datums[i][0] &&
> +b1->datums[i][1] == b2->datums[i][1]));
> +}
> +#endif
> Why do we need extra {} here?
>

Okay, removed in the attached version.

> Comments on testcases
> +CREATE TABLE hpart_1 PARTITION OF hash_parted FOR VALUES WITH
> (modulus 8, remainder 0);
> +CREATE TABLE fail_part (LIKE hpart_1 INCLUDING CONSTRAINTS);
> +ALTER TABLE hash_parted ATTACH PARTITION fail_part FOR VALUES WITH
> (modulus 4, remainder 0);
> Probably you should also test the other-way round case i.e. create modulus 4,
> remainder 0 partition and then try to add partitions with modulus 8, remainder
> 4 and modulus 8, remainder 0. That should fail.
>

Fixed.

> Why to create two tables hash_parted and hash_parted2, you should be able to
> test with only a single table.
>

Fixed.

> +INSERT INTO hpart_2 VALUES (3, 'a');
> +DELETE FROM hpart_2;
> +INSERT INTO hpart_5_a (a, b) VALUES (6, 'a');
> This is slightly tricky. On different platforms the row may map to different
> partitions depending upon how the values are hashed. So, this test may not be
> portable on all the platforms. Probably you should add such testcases with a
> custom hash operator class which is identity function as suggested by Robert.
> This also applies to the tests in insert.sql and update.sql for partitioned
> table without custom opclass.
>

Yes, you are correct. Fixed in the attached version.

> +-- delete the faulting row and also add a constraint to skip the scan
> +ALTER TABLE hpart_5 ADD CONSTRAINT hcheck_a CHECK (a IN (5)), ALTER a
> SET NOT NULL;
> The constraint is not same as the implicit constraint added for that 
> partition.
> I am not sure whether it's really going to avoid the scan. Did you verify it?
> If yes, then how?
>

I hadn't tested that; maybe I copied it blindly, sorry about that.
I don't think this test is needed again for hash partitioning, so I removed it.

> +ALTER TABLE hash_parted2 ATTACH PARTITION fail_part FOR VALUES WITH
> (modulus 3, remainder 2);
> +ERROR:  every hash partition modulus must be a factor of the next
> larger modulus
> We should add this test with at least two partitions in there so that we can
> check lower and upper modulus. Also, testing with some interesting
> bounds discussed earlier
> in this mail e.g. adding modulus 15 when 5, 10, 60 exist will be better than
> testing with 3, 4 and 8.
>
A similar test already exists in the create_table.sql file.
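
For readers following along, the rule exercised there is roughly the following
(a sketch; the table names are made up):

CREATE TABLE hp (a int) PARTITION BY HASH (a);
CREATE TABLE hp_4_0 PARTITION OF hp FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE hp_8_1 PARTITION OF hp FOR VALUES WITH (MODULUS 8, REMAINDER 1);  -- ok: 4 divides 8
CREATE TABLE hp_3_2 PARTITION OF hp FOR VALUES WITH (MODULUS 3, REMAINDER 2);
-- the last one fails: 3 is not a factor of the next larger modulus (4)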

> +ERROR:  cannot use collation for hash partition key column "a"
> This seems to indicate that we can not specify collation for hash partition 
> key
> column, which isn't true. Column a here can have its collation. What's not
> allowed is specifying collation in PARTITION BY clause.
> May be reword the error as "cannot use collation for hash partitioning". or
> plain "cannot use collation in PARTITION BY clause for hash partitioning".
>
> +ERROR:  invalid bound specification for a list partition
> +LINE 1: CREATE TABLE fail_part PARTITION OF list_parted FOR VALUES W...
> +^
> Should the location for this error be that of WITH clause like in case of 
> range
> and list partitioned table.
>

Fixed.

> +select tableoid::regclass as part, a from hash_parted order by part;
> May be add a % 4 to show clearly that the data really goes to the partitioning
> with that remainder.
>

Fixed.

Updated patch attached.  0001-patch rebased against latest head.
0002-patch also incorporates code comments and error message changes
as per Robert's & your suggestions. Thanks !

Regards,
Amul


0001-Cleanup_v3.patch
Description: Binary data


0002-hash-partitioning_another_design-v10.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-05-18 Thread Amit Langote
On 2017/05/19 1:09, Dilip Kumar wrote:
> On Wed, May 17, 2017 at 2:07 PM, amul sul  wrote:
>>> I would suggest "non-zero positive", since that's what we are using in
>>> the documentation.
>>>
>>
>> Understood, Fixed in the attached version.
> 
> Why non-zero positive?  We do support zero for the remainder right?

Using "non-negative integers" (for remainders) was suggested upthread.

Thanks,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-05-18 Thread Dilip Kumar
On Wed, May 17, 2017 at 2:07 PM, amul sul  wrote:
>> I would suggest "non-zero positive", since that's what we are using in
>> the documentation.
>>
>
> Understood, Fixed in the attached version.

Why non-zero positive?  We do support zero for the remainder right?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-05-17 Thread Ashutosh Bapat
On Wed, May 17, 2017 at 11:51 PM, Robert Haas  wrote:
> On Wed, May 17, 2017 at 1:41 AM, Ashutosh Bapat
>  wrote:
>>> Fixed in the attached version;  used "hash partition remainder must be
>>> greater than or equal to 0" instead.
>>
>> I would suggest "non-zero positive", since that's what we are using in
>> the documentation.
>
> Well, that's not very good terminology, because zero is not a positive
> number.  Existing error messages seem to use phrasing such as "THING
> must be a positive integer" when zero is not allowed or "THING must be
> a non-negative integer" when zero is allowed.  For examples, do git
> grep errmsg.*positive or git grep errmsg.*negative.

Ok. We need to change all the usages in the documentation and in the
comments to non-negative. The point is to use the same phrase
consistently.

>
>> In partition_bounds_equal(), please add comments explaining why is it safe to
>> check just the indexes? May be we should add code under assertion to make 
>> sure
>> that the datums are equal as well. The comment could be something
>> like, "If two partitioned tables have different greatest moduli, their
>> partition schemes don't match. If they have same greatest moduli, and
>> all remainders have different indexes, they all have same modulus
>> specified and the partitions are ordered by remainders, thus indexes
>> array will be an identity i.e. index[i] = i. If the partition
>> corresponding to a given remainder exists, it will have same index
>> entry for both partitioned tables or if it's missing it will be -1.
>> Thus if indexes array matches, corresponding datums array matches. If
>> there are multiple remainders corresponding to a given partition,
>> their partitions are ordered by the lowest of the remainders, thus if
>> indexes array matches, both of the tables have same indexes arrays, in
>> both the tables remainders corresponding to multiple partitions all
>> have same indexes and thus same modulus. Thus again if the indexes are
>> same, datums are same.".
>
> That seems quite long.

I have shared a patch containing a denser explanation with my last set
of comments.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-05-17 Thread Robert Haas
On Wed, May 17, 2017 at 1:41 AM, Ashutosh Bapat
 wrote:
>> Fixed in the attached version;  used "hash partition remainder must be
>> greater than or equal to 0" instead.
>
> I would suggest "non-zero positive", since that's what we are using in
> the documentation.

Well, that's not very good terminology, because zero is not a positive
number.  Existing error messages seem to use phrasing such as "THING
must be a positive integer" when zero is not allowed or "THING must be
a non-negative integer" when zero is allowed.  For examples, do git
grep errmsg.*positive or git grep errmsg.*negative.

> In partition_bounds_equal(), please add comments explaining why is it safe to
> check just the indexes? May be we should add code under assertion to make sure
> that the datums are equal as well. The comment could be something
> like, "If two partitioned tables have different greatest moduli, their
> partition schemes don't match. If they have same greatest moduli, and
> all remainders have different indexes, they all have same modulus
> specified and the partitions are ordered by remainders, thus indexes
> array will be an identity i.e. index[i] = i. If the partition
> corresponding to a given remainder exists, it will have same index
> entry for both partitioned tables or if it's missing it will be -1.
> Thus if indexes array matches, corresponding datums array matches. If
> there are multiple remainders corresponding to a given partition,
> their partitions are ordered by the lowest of the remainders, thus if
> indexes array matches, both of the tables have same indexes arrays, in
> both the tables remainders corresponding to multiple partitions all
> have same indexes and thus same modulus. Thus again if the indexes are
> same, datums are same.".

That seems quite long.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-05-17 Thread Ashutosh Bapat
On Wed, May 17, 2017 at 2:07 PM, amul sul  wrote:

>
>> In partition_bounds_equal(), please add comments explaining why is it safe to
>> check just the indexes? May be we should add code under assertion to make 
>> sure
>> that the datums are equal as well.
>
> Added assert in the attached version.
>
>> The comment could be something
>> like, "If two partitioned tables have different greatest moduli, their
>> partition schemes don't match. If they have same greatest moduli, and
>> all remainders have different indexes, they all have same modulus
>> specified and the partitions are ordered by remainders, thus indexes
>> array will be an identity i.e. index[i] = i. If the partition
>> corresponding to a given remainder exists, it will have same index
>> entry for both partitioned tables or if it's missing it will be -1.
>> Thus if indexes array matches, corresponding datums array matches. If
>> there are multiple remainders corresponding to a given partition,
>> their partitions are ordered by the lowest of the remainders, thus if
>> indexes array matches, both of the tables have same indexes arrays, in
>> both the tables remainders corresponding to multiple partitions all
>> have same indexes and thus same modulus. Thus again if the indexes are
>> same, datums are same.".
>>
>
> Thanks, added with minor modification.

I have reworded this slightly better. See the attached patch as diff of 0002.

>
>> In the same function
>> if (key->strategy == PARTITION_STRATEGY_HASH)
>> {
>> intgreatest_modulus;
>>
>> /*
>>  * Compare greatest modulus of hash partition bound which
>>  * is the last element of datums array.
>>  */
>> if (b1->datums[b1->ndatums - 1][0] != b2->datums[b2->ndatums - 1][0])
>> return false;
>>
>> /* Compare indexes */
>> greatest_modulus = DatumGetInt32(b1->datums[b1->ndatums - 1][0]);
>> for (i = 0; i < greatest_modulus; i++)
>> if (b1->indexes[i] != b2->indexes[i])
>> return false;
>> }
>> if we return true from where this block ends, we will save one indentation
>> level for the rest of the code and also, FWIW, extra diffs in this patch
>> because of this indentation change.
>>
>
> I still do believe having this code in the IF - ELSE block will be
> better for the long term, rather than cluttering the code just to avoid
> a diff that looks unpleasant for now.

Ok, I will leave it to the committer to judge.


Comments on the tests
+#ifdef USE_ASSERT_CHECKING
+{
+/*
+ * Hash partition bound stores modulus and remainder at
+ * b1->datums[i][0] and b1->datums[i][1] position respectively.
+ */
+for (i = 0; i < b1->ndatums; i++)
+Assert((b1->datums[i][0] == b2->datums[i][0] &&
+b1->datums[i][1] == b2->datums[i][1]));
+}
+#endif
Why do we need extra {} here?

Comments on testcases
+CREATE TABLE hpart_1 PARTITION OF hash_parted FOR VALUES WITH
(modulus 8, remainder 0);
+CREATE TABLE fail_part (LIKE hpart_1 INCLUDING CONSTRAINTS);
+ALTER TABLE hash_parted ATTACH PARTITION fail_part FOR VALUES WITH
(modulus 4, remainder 0);
Probably you should also test the other-way round case i.e. create modulus 4,
remainder 0 partition and then try to add partitions with modulus 8, remainder
4 and modulus 8, remainder 0. That should fail.
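
A rough sketch of that reverse test case (table and partition names are made up
here, and the exact error text depends on the patch):

    CREATE TABLE hp (a int) PARTITION BY HASH (a);
    CREATE TABLE hp_m4_r0 PARTITION OF hp FOR VALUES WITH (modulus 4, remainder 0);
    -- Both of these should be rejected: every row they could accept
    -- (hash % 8 = 0 or hash % 8 = 4 implies hash % 4 = 0) already belongs to hp_m4_r0.
    CREATE TABLE hp_m8_r0 PARTITION OF hp FOR VALUES WITH (modulus 8, remainder 0);
    CREATE TABLE hp_m8_r4 PARTITION OF hp FOR VALUES WITH (modulus 8, remainder 4);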

Why create two tables, hash_parted and hash_parted2? You should be able to
test with only a single table.

+INSERT INTO hpart_2 VALUES (3, 'a');
+DELETE FROM hpart_2;
+INSERT INTO hpart_5_a (a, b) VALUES (6, 'a');
This is slightly tricky. On different platforms the row may map to different
partitions depending upon how the values are hashed, so this test may not be
portable across all platforms. Probably you should add such testcases with a
custom hash operator class that is an identity function, as suggested by Robert.
This also applies to the tests in insert.sql and update.sql for a partitioned
table without a custom opclass.
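
For illustration, a minimal sketch of such a test-only operator class could look
like this (the function, opclass and table names are hypothetical, and the
two-argument extended hash support function shown here is an assumption about
the interface the patch ends up using):

    -- Test-only "hash" function that just echoes its input plus the seed,
    -- making the row-to-partition mapping predictable.
    CREATE FUNCTION dummy_hashint4(a int4, seed int8) RETURNS int8
        AS $$ SELECT (a + seed)::int8 $$ LANGUAGE sql IMMUTABLE;

    CREATE OPERATOR CLASS custom_opclass FOR TYPE int4 USING hash AS
        OPERATOR 1 = (int4, int4),
        FUNCTION 2 dummy_hashint4(int4, int8);

    CREATE TABLE mchash (a int, b text)
        PARTITION BY HASH (a custom_opclass);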

+-- delete the faulting row and also add a constraint to skip the scan
+ALTER TABLE hpart_5 ADD CONSTRAINT hcheck_a CHECK (a IN (5)), ALTER a
SET NOT NULL;
The constraint is not the same as the implicit constraint added for that partition.
I am not sure whether it's really going to avoid the scan. Did you verify it?
If yes, then how?

+ALTER TABLE hash_parted2 ATTACH PARTITION fail_part FOR VALUES WITH
(modulus 3, remainder 2);
+ERROR:  every hash partition modulus must be a factor of the next
larger modulus
We should add this test with at least two partitions in there so that we can
check the lower and upper modulus. Also, testing with some of the interesting
bounds discussed earlier in this mail, e.g. adding modulus 15 when 5, 10 and 60
exist, will be better than testing with 3, 4 and 8.
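
For instance, a sketch of that "interesting bounds" case (hypothetical names;
the expected failure is the factor rule quoted above):

    CREATE TABLE t (a int) PARTITION BY HASH (a);
    CREATE TABLE t_m5_r0  PARTITION OF t FOR VALUES WITH (modulus 5, remainder 0);
    CREATE TABLE t_m10_r1 PARTITION OF t FOR VALUES WITH (modulus 10, remainder 1);
    CREATE TABLE t_m60_r7 PARTITION OF t FOR VALUES WITH (modulus 60, remainder 7);
    -- Should fail: with moduli 5, 10, 15, 60 the modulus 10 is not a factor of 15.
    CREATE TABLE t_m15_r2 PARTITION OF t FOR VALUES WITH (modulus 15, remainder 2);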

+ERROR:  cannot use collation for hash partition key column "a"
This seems to indicate that we cannot specify collation for a hash partition key

Re: [HACKERS] [POC] hash partitioning

2017-05-17 Thread amul sul
On Wed, May 17, 2017 at 11:11 AM, Ashutosh Bapat
 wrote:
> On Wed, May 17, 2017 at 12:04 AM, amul sul  wrote:
>> On Tue, May 16, 2017 at 10:00 PM, Dilip Kumar  wrote:
>>> On Tue, May 16, 2017 at 4:22 PM, amul sul  wrote:
 v6 patch has bug in partition oid mapping and indexing, fixed in the
 attached version.

 Now partition oids will be arranged in the ascending order of hash
 partition bound  (i.e. modulus and remainder sorting order)
>>>
>>> Thanks for the update patch. I have some more comments.
>>>
>>> 
>>> + if (spec->remainder < 0)
>>> + ereport(ERROR,
>>> + (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
>>> +  errmsg("hash partition remainder must be less than modulus")));
>>>
>>> I think this error message is not correct, you might want to change it
>>> to "hash partition remainder must be non-negative integer"
>>>
>>
>> Fixed in the attached version;  used "hash partition remainder must be
>> greater than or equal to 0" instead.
>
> I would suggest "non-zero positive", since that's what we are using in
> the documentation.
>

Understood, Fixed in the attached version.

>>
>>> ---
>>>
>>> + The table is partitioned by specifying remainder and modulus for 
>>> each
>>> + partition. Each partition holds rows for which the hash value of
>>>
>>> Wouldn't it be better to say "modulus and remainder" instead of
>>> "remainder and modulus" then it will be consistent?
>>>
>>
>> You are correct, fixed in the attached version.
>>
>>> ---
>>> +   An UPDATE that causes a row to move from one partition 
>>> to
>>> +   another fails, because
>>>
>>> fails, because -> fails because
>>>
>>
>> This hunk no longer exists in the attached patch; it was mistakenly
>> copied, sorry about that.
>>
>>> ---
>>>
>>> Wouldn't it be a good idea to document how to increase the number of
>>> hash partitions, I think we can document it somewhere with an example,
>>> something like Robert explained upthread?
>>>
>>> create table foo (a integer, b text) partition by hash (a);
>>> create table foo1 partition of foo with (modulus 2, remainder 0);
>>> create table foo2 partition of foo with (modulus 2, remainder 1);
>>>
>>> You can detach foo1, create two new partitions with modulus 4 and
>>> remainders 0 and 2, and move the data over from the old partition
>>>
>>> I think it will be good information for a user to have? or it's
>>> already documented and I missed it?
>>>
>
> This is already part of documentation contained in the patch.
>
> Here are some more comments
> @@ -3296,6 +3311,14 @@ ALTER TABLE measurement ATTACH PARTITION
> measurement_y2008m02
> not the partitioned table.
>
>   
> +
> + 
> +  
> +   An UPDATE that causes a row to move from one partition to
> +   another fails, because the new value of the row fails to satisfy the
> +   implicit partition constraint of the original partition.
> +  
> + 
>  
>  
>  
> The description in this chunk is applicable to all the kinds of partitioning.
> Why should it be part of a patch implementing hash partitioning?
>

This was already addressed in the previous patch (v8).

> +Declarative partitioning only supports hash, list and range
> +partitioning, whereas table inheritance allows data to be
> +divided in a manner of the user's choosing.  (Note, however,
> +that if constraint exclusion is unable to prune partitions
> +effectively, query performance will be very poor.)
> Looks like the line width is less than 80 characters.
>

Fixed in the attached version.

> In partition_bounds_equal(), please add comments explaining why it is safe to
> check just the indexes. Maybe we should add code under assertion to make sure
> that the datums are equal as well.

Added assert in the attached version.

> The comment could be something
> like, "If two partitioned tables have different greatest moduli, their
> partition schemes don't match. If they have same greatest moduli, and
> all remainders have different indexes, they all have same modulus
> specified and the partitions are ordered by remainders, thus indexes
> array will be an identity i.e. index[i] = i. If the partition
> corresponding to a given remainder exists, it will have same index
> entry for both partitioned tables or if it's missing it will be -1.
> Thus if indexes array matches, corresponding datums array matches. If
> there are multiple remainders corresponding to a given partition,
> their partitions are ordered by the lowest of the remainders, thus if
> indexes array matches, both of the tables have same indexes arrays, in
> both the tables remainders corresponding to multiple partitions all
> have same indexes and thus same modulus. Thus again if the indexes are
> same, datums are same.".
>

Thanks, added with minor modification.

> In the same function
> if 

Re: [HACKERS] [POC] hash partitioning

2017-05-16 Thread Ashutosh Bapat
On Wed, May 17, 2017 at 12:04 AM, amul sul  wrote:
> On Tue, May 16, 2017 at 10:00 PM, Dilip Kumar  wrote:
>> On Tue, May 16, 2017 at 4:22 PM, amul sul  wrote:
>>> v6 patch has bug in partition oid mapping and indexing, fixed in the
>>> attached version.
>>>
>>> Now partition oids will be arranged in the ascending order of hash
>>> partition bound  (i.e. modulus and remainder sorting order)
>>
>> Thanks for the update patch. I have some more comments.
>>
>> 
>> + if (spec->remainder < 0)
>> + ereport(ERROR,
>> + (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
>> +  errmsg("hash partition remainder must be less than modulus")));
>>
>> I think this error message is not correct, you might want to change it
>> to "hash partition remainder must be non-negative integer"
>>
>
> Fixed in the attached version;  used "hash partition remainder must be
> greater than or equal to 0" instead.

I would suggest "non-zero positive", since that's what we are using in
the documentation.

>
>> ---
>>
>> + The table is partitioned by specifying remainder and modulus for 
>> each
>> + partition. Each partition holds rows for which the hash value of
>>
>> Wouldn't it be better to say "modulus and remainder" instead of
>> "remainder and modulus" then it will be consistent?
>>
>
> You are correct, fixed in the attached version.
>
>> ---
>> +   An UPDATE that causes a row to move from one partition to
>> +   another fails, because
>>
>> fails, because -> fails because
>>
>
> This hunk no longer exists in the attached patch; it was mistakenly
> copied, sorry about that.
>
>> ---
>>
>> Wouldn't it be a good idea to document how to increase the number of
>> hash partitions, I think we can document it somewhere with an example,
>> something like Robert explained upthread?
>>
>> create table foo (a integer, b text) partition by hash (a);
>> create table foo1 partition of foo with (modulus 2, remainder 0);
>> create table foo2 partition of foo with (modulus 2, remainder 1);
>>
>> You can detach foo1, create two new partitions with modulus 4 and
>> remainders 0 and 2, and move the data over from the old partition
>>
>> I think it will be good information for a user to have? or it's
>> already documented and I missed it?
>>

This is already part of documentation contained in the patch.

Here are some more comments
@@ -3296,6 +3311,14 @@ ALTER TABLE measurement ATTACH PARTITION
measurement_y2008m02
not the partitioned table.
   
  
+
+ 
+  
+   An UPDATE that causes a row to move from one partition to
+   another fails, because the new value of the row fails to satisfy the
+   implicit partition constraint of the original partition.
+  
+ 
 
 
 
The description in this chunk is applicable to all the kinds of partitioning.
Why should it be part of a patch implementing hash partitioning?

+Declarative partitioning only supports hash, list and range
+partitioning, whereas table inheritance allows data to be
+divided in a manner of the user's choosing.  (Note, however,
+that if constraint exclusion is unable to prune partitions
+effectively, query performance will be very poor.)
Looks like the line width is less than 80 characters.

In partition_bounds_equal(), please add comments explaining why it is safe to
check just the indexes. Maybe we should add code under assertion to make sure
that the datums are equal as well. The comment could be something
like, "If two partitioned tables have different greatest moduli, their
partition schemes don't match. If they have same greatest moduli, and
all remainders have different indexes, they all have same modulus
specified and the partitions are ordered by remainders, thus indexes
array will be an identity i.e. index[i] = i. If the partition
corresponding to a given remainder exists, it will have same index
entry for both partitioned tables or if it's missing it will be -1.
Thus if indexes array matches, corresponding datums array matches. If
there are multiple remainders corresponding to a given partition,
their partitions are ordered by the lowest of the remainders, thus if
indexes array matches, both of the tables have same indexes arrays, in
both the tables remainders corresponding to multiple partitions all
have same indexes and thus same modulus. Thus again if the indexes are
same, datums are same.".
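
As a concrete illustration of that layout (hypothetical bounds, following the
datums/indexes description above):

    partitions : (modulus 2, remainder 0), (modulus 4, remainder 1), (modulus 4, remainder 3)
    datums     : { {2, 0}, {4, 1}, {4, 3} }   -- ordered by (modulus, remainder); greatest modulus = 4
    indexes    : { 0, 1, 0, 2 }               -- slot i holds the partition for hashvalue % 4 == i;
                                              -- a remainder with no partition would hold -1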

In the same function
if (key->strategy == PARTITION_STRATEGY_HASH)
{
intgreatest_modulus;

/*
 * Compare greatest modulus of hash partition bound which
 * is the last element of datums array.
 */
if (b1->datums[b1->ndatums - 1][0] != b2->datums[b2->ndatums - 1][0])
return false;

/* Compare indexes */
greatest_modulus = DatumGetInt32(b1->datums[b1->ndatums - 1][0]);
for (i = 0; i < 

Re: [HACKERS] [POC] hash partitioning

2017-05-16 Thread Ashutosh Bapat
On Wed, May 17, 2017 at 9:38 AM, Tom Lane  wrote:
> Ashutosh Bapat  writes:
>> On Tue, May 16, 2017 at 6:50 PM, Peter Eisentraut
>>  wrote:
>>> On 5/15/17 23:45, Ashutosh Bapat wrote:
 +1. We should throw an error and add a line in documentation that
 collation should not be specified for hash partitioned table.
>
>>> Why is it even allowed in the parser then?
>
>> That grammar is common to all the partitioning strategies. It looks
>> like it's easier to handle collation for hash partitions in
>> transformation than in grammar. But, if we could handle it in grammar,
>> I don't have any objection to it.
>
> If you disallow something in the grammar, the error message is unlikely to
> be better than "syntax error".  That's not very desirable.

Right +1.


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-05-16 Thread Tom Lane
Ashutosh Bapat  writes:
> On Tue, May 16, 2017 at 6:50 PM, Peter Eisentraut
>  wrote:
>> On 5/15/17 23:45, Ashutosh Bapat wrote:
>>> +1. We should throw an error and add a line in documentation that
>>> collation should not be specified for hash partitioned table.

>> Why is it even allowed in the parser then?

> That grammar is common to all the partitioning strategies. It looks
> like it's easier to handle collation for hash partitions in
> transformation than in grammar. But, if we could handle it in grammar,
> I don't have any objection to it.

If you disallow something in the grammar, the error message is unlikely to
be better than "syntax error".  That's not very desirable.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-05-16 Thread Ashutosh Bapat
On Tue, May 16, 2017 at 6:50 PM, Peter Eisentraut
 wrote:
> On 5/15/17 23:45, Ashutosh Bapat wrote:
>> +1. We should throw an error and add a line in documentation that
>> collation should not be specified for hash partitioned table.
>
> Why is it even allowed in the parser then?

That grammar is common to all the partitioning strategies. It looks
like it's easier to handle collation for hash partitions in
transformation than in grammar. But, if we could handle it in grammar,
I don't have any objection to it.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-05-16 Thread amul sul
On Tue, May 16, 2017 at 10:00 PM, Dilip Kumar  wrote:
> On Tue, May 16, 2017 at 4:22 PM, amul sul  wrote:
>> v6 patch has bug in partition oid mapping and indexing, fixed in the
>> attached version.
>>
>> Now partition oids will be arranged in the ascending order of hash
>> partition bound  (i.e. modulus and remainder sorting order)
>
> Thanks for the update patch. I have some more comments.
>
> 
> + if (spec->remainder < 0)
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
> +  errmsg("hash partition remainder must be less than modulus")));
>
> I think this error message is not correct, you might want to change it
> to "hash partition remainder must be non-negative integer"
>

Fixed in the attached version;  used "hash partition remainder must be
greater than or equal to 0" instead.

> ---
>
> + The table is partitioned by specifying remainder and modulus for 
> each
> + partition. Each partition holds rows for which the hash value of
>
> Wouldn't it be better to say "modulus and remainder" instead of
> "remainder and modulus" then it will be consistent?
>

You are correct, fixed in the attached version.

> ---
> +   An UPDATE that causes a row to move from one partition to
> +   another fails, because
>
> fails, because -> fails because
>

This hunk no longer exists in the attached patch; it was mistakenly
copied, sorry about that.

> ---
>
> Wouldn't it be a good idea to document how to increase the number of
> hash partitions, I think we can document it somewhere with an example,
> something like Robert explained upthread?
>
> create table foo (a integer, b text) partition by hash (a);
> create table foo1 partition of foo with (modulus 2, remainder 0);
> create table foo2 partition of foo with (modulus 2, remainder 1);
>
> You can detach foo1, create two new partitions with modulus 4 and
> remainders 0 and 2, and move the data over from the old partition
>
> I think it will be good information for a user to have? or it's
> already documented and I missed it?
>

I think we should, but I am not sure about it.

Regards,
Amul


0001-Cleanup_v2.patch
Description: Binary data


0002-hash-partitioning_another_design-v8.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-05-16 Thread Robert Haas
On Tue, May 16, 2017 at 3:19 AM, Ashutosh Bapat
 wrote:
> While earlier I thought the same, I am wondering whether this is
> true. Don't different collations deem different strings equal, e.g. one
> collation may deem 'aa' and 'AA' the same but another may not?

No, that's not allowed.  This has been discussed many times on this
mailing list.  See varstr_cmp(), which you will notice refuses to
return 0 unless the strings are bytewise identical.
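
Roughly, the relevant part of varstr_cmp() looks like this (paraphrased, not the
exact source):

    result = strcoll(a1p, a2p);

    /*
     * Some locales claim that distinct strings are equal; break the tie
     * bytewise so that only identical strings ever compare as equal.
     */
    if (result == 0)
        result = strcmp(a1p, a2p);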

> Or is that
> the encoding problem being discussed in the hash functions thread?

No, that's something else entirely.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-05-16 Thread Dilip Kumar
On Tue, May 16, 2017 at 4:22 PM, amul sul  wrote:
> v6 patch has bug in partition oid mapping and indexing, fixed in the
> attached version.
>
> Now partition oids will be arranged in the ascending order of hash
> partition bound  (i.e. modulus and remainder sorting order)

Thanks for the update patch. I have some more comments.


+ if (spec->remainder < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
+  errmsg("hash partition remainder must be less than modulus")));

I think this error message is not correct, you might want to change it
to "hash partition remainder must be non-negative integer"

---

+ The table is partitioned by specifying remainder and modulus for each
+ partition. Each partition holds rows for which the hash value of

Wouldn't it be better to say "modulus and remainder" instead of
"remainder and modulus" then it will be consistent?

---
+   An UPDATE that causes a row to move from one partition to
+   another fails, because

fails, because -> fails because

---

Wouldn't it be a good idea to document how to increase the number of
hash partitions, I think we can document it somewhere with an example,
something like Robert explained upthread?

create table foo (a integer, b text) partition by hash (a);
create table foo1 partition of foo with (modulus 2, remainder 0);
create table foo2 partition of foo with (modulus 2, remainder 1);

You can detach foo1, create two new partitions with modulus 4 and
remainders 0 and 2, and move the data over from the old partition
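
For what it's worth, that workflow might look roughly like this (just a sketch,
using the names from the example above, not text taken from the patch's
documentation):

    -- Split the modulus-2, remainder-0 partition into two modulus-4 partitions.
    BEGIN;
    ALTER TABLE foo DETACH PARTITION foo1;
    CREATE TABLE foo3 PARTITION OF foo FOR VALUES WITH (modulus 4, remainder 0);
    CREATE TABLE foo4 PARTITION OF foo FOR VALUES WITH (modulus 4, remainder 2);
    INSERT INTO foo SELECT * FROM foo1;   -- rows are re-routed to foo3 or foo4
    DROP TABLE foo1;
    COMMIT;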

I think it will be good information for a user to have? or it's
already documented and I missed it?



-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-05-16 Thread Peter Eisentraut
On 5/16/17 03:19, Ashutosh Bapat wrote:
> On Tue, May 16, 2017 at 10:03 AM, amul sul  wrote:
>> On Mon, May 15, 2017 at 9:13 PM, Robert Haas  wrote:
> Collation is only relevant for ordering, not equality.
> 
> While earlier I thought the same, I am wondering whether this is
> true. Don't different collations deem different strings equal, e.g. one
> collation may deem 'aa' and 'AA' the same but another may not? Or is that
> the encoding problem being discussed in the hash functions thread?

The collations we currently support don't do that, unless someone made a
custom one.  However, we might want to support that in the future.

Also, text/varchar comparisons always use strcmp() as a tie-breaker.
Again, this might be something to review at some point.

But you currently have the citext type that would indeed consider 'aa'
and 'AA' equal.  But citext also has a hash function in the hash
operator class that handles that.  So you could look into using that
approach.
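
For example (assuming the citext extension is installed; if I recall correctly
its hash support function is citext_hash):

    CREATE EXTENSION citext;
    SELECT 'aa'::citext = 'AA'::citext;                          -- true
    SELECT citext_hash('aa'::citext) = citext_hash('AA'::citext); -- also true, so hashing agrees with equality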

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-05-16 Thread Peter Eisentraut
On 5/15/17 23:45, Ashutosh Bapat wrote:
> +1. We should throw an error and add a line in documentation that
> collation should not be specified for hash partitioned table.

Why is it even allowed in the parser then?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-05-16 Thread amul sul
On Tue, May 16, 2017 at 3:30 PM, amul sul  wrote:
> On Tue, May 16, 2017 at 1:02 PM, Ashutosh Bapat
>  wrote:
>  [...]

 +if (key->strategy == PARTITION_STRATEGY_HASH)
 +{
 +ndatums = nparts;
 +hbounds = (PartitionHashBound **) palloc(nparts *
 +
 sizeof(PartitionHashBound *));
 +i = 0;
 +foreach (cell, boundspecs)
 +{
 +PartitionBoundSpec *spec = lfirst(cell);
 +
 [ clipped ]
 +hbounds[i]->index = i;
 +i++;
 +}
 For list and range partitioned table we order the bounds so that two
 partitioned tables have them in the same order irrespective of order in 
 which
 they are specified by the user or hence stored in the catalogs. The 
 partitions
then get indexes according to the order in which their bounds appear in 
 ordered
 arrays of bounds. Thus any two partitioned tables with same partition
 specification always have same PartitionBoundInfoData. This helps in
 partition-wise join to match partition bounds of two given tables.  Above 
 code
 assigns the indexes to the partitions as they appear in the catalogs. This
 means that two partitioned tables with same partition specification but
 different order for partition bound specification will have different
PartitionBoundInfoData representation.

 If we do that, probably partition_bounds_equal() would reduce to just 
 matching
 indexes and the last element of datums array i.e. the greatest modulus 
 datum.
 If ordered datums array of two partitioned table do not match exactly, the
 mismatch can be because missing datums or different datums. If it's a 
 missing
 datum it will change the greatest modulus or have corresponding entry in
 indexes array as -1. If the entry differs it will cause mismatching 
 indexes in
 the index arrays.

>>> Make sense, will fix this.
>>
>> I don't see this being addressed in the patches attached in the reply to 
>> Dilip.
>>
>
> Fixed in the attached version.
>

v6 patch has bug in partition oid mapping and indexing, fixed in the
attached version.

Now partition oids will be arranged in the ascending order of hash
partition bound  (i.e. modulus and remainder sorting order)
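
For example (hypothetical), partitions declared in the order (modulus 8, remainder 2),
(modulus 4, remainder 1), (modulus 8, remainder 0) would now be stored sorted as
(4, 1), (8, 0), (8, 2), so the resulting bound info no longer depends on the
declaration order.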

Regards,
Amul


0001-Cleanup_v2.patch
Description: Binary data


0002-hash-partitioning_another_design-v7.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-05-16 Thread amul sul
On Tue, May 16, 2017 at 1:17 PM, Ashutosh Bapat
 wrote:
> Hi,
> Here's patch with some cosmetic fixes to 0002, to be applied on top of 0002.
>

Thank you, included in v6 patch.

Regards,
Amul


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-05-16 Thread amul sul
On Tue, May 16, 2017 at 1:02 PM, Ashutosh Bapat
 wrote:
 [...]
>>>
>>> +if (key->strategy == PARTITION_STRATEGY_HASH)
>>> +{
>>> +ndatums = nparts;
>>> +hbounds = (PartitionHashBound **) palloc(nparts *
>>> +
>>> sizeof(PartitionHashBound *));
>>> +i = 0;
>>> +foreach (cell, boundspecs)
>>> +{
>>> +PartitionBoundSpec *spec = lfirst(cell);
>>> +
>>> [ clipped ]
>>> +hbounds[i]->index = i;
>>> +i++;
>>> +}
>>> For list and range partitioned table we order the bounds so that two
>>> partitioned tables have them in the same order irrespective of order in 
>>> which
>>> they are specified by the user or hence stored in the catalogs. The 
>>> partitions
>>> then get indexes according to the order in which their bounds appear in ordered
>>> arrays of bounds. Thus any two partitioned tables with same partition
>>> specification always have same PartitionBoundInfoData. This helps in
>>> partition-wise join to match partition bounds of two given tables.  Above 
>>> code
>>> assigns the indexes to the partitions as they appear in the catalogs. This
>>> means that two partitioned tables with same partition specification but
>>> different order for partition bound specification will have different
>>> PartitionBoundInfoData representation.
>>>
>>> If we do that, probably partition_bounds_equal() would reduce to just 
>>> matching
>>> indexes and the last element of datums array i.e. the greatest modulus 
>>> datum.
>>> If ordered datums array of two partitioned table do not match exactly, the
>>> mismatch can be because missing datums or different datums. If it's a 
>>> missing
>>> datum it will change the greatest modulus or have corresponding entry in
>>> indexes array as -1. If the entry differs it will cause mismatching indexes 
>>> in
>>> the index arrays.
>>>
>> Make sense, will fix this.
>
> I don't see this being addressed in the patches attached in the reply to 
> Dilip.
>

Fixed in the attached version.

>>
>>>
>>> +if (key->partattrs[i] != 0)
>>> +{
>>> +keyCol = (Node *) makeVar(1,
>>> +  key->partattrs[i],
>>> +  key->parttypid[i],
>>> +  key->parttypmod[i],
>>> +  key->parttypcoll[i],
>>> +  0);
>>> +
>>> +/* Form hash_fn(value) expression */
>>> +keyCol = (Node *) makeFuncExpr(key->partsupfunc[i].fn_oid,
>>> +
>>> get_fn_expr_rettype(&key->partsupfunc[i]),
>>> +list_make1(keyCol),
>>> +InvalidOid,
>>> +InvalidOid,
>>> +COERCE_EXPLICIT_CALL);
>>> +}
>>> +else
>>> +{
>>> +keyCol = (Node *) copyObject(lfirst(partexprs_item));
>>> +partexprs_item = lnext(partexprs_item);
>>> +}
>>> I think we should add FuncExpr for column Vars as well as expressions.
>>>
>> Okay, will fix this.
>
> Here, please add a check similar to get_quals_for_range()
> 1840 if (partexprs_item == NULL)
> 1841 elog(ERROR, "wrong number of partition key expressions");
>
>

Fixed in the attached version.

>>
>>> I think we need more comments for compute_hash_value(), mix_hash_value() and
>>> satisfies_hash_partition() as to what each of them accepts and what it
>>> computes.
>>>
>>> +/* key's hash values start from third argument of function. */
>>> +if (!PG_ARGISNULL(i + 2))
>>> +{
>>> +values[i] = PG_GETARG_DATUM(i + 2);
>>> +isnull[i] = false;
>>> +}
>>> +else
>>> +isnull[i] = true;
>>> You could write this as
>>> isnull[i] = PG_ARGISNULL(i + 2);
>>> if (isnull[i])
>>> values[i] = PG_GETARG_DATUM(i + 2);
>>>
>> Okay.
>
> If we have used this technique somewhere else in PG code, please
> mention that function/place.
> /*
>  * Rotate hash left 1 bit before mixing in the next column.  This
>  * prevents equal values in different keys from cancelling each other.
>  */
>

Fixed in the attached version.

>
>>
>>> +foreach (lc, $5)
>>> +{
>>> +DefElem*opt = (DefElem *) lfirst(lc);
>>> A search on WITH in gram.y shows that we do not handle WITH options in 
>>> gram.y.
>>> Usually they are handled at the transformation stage. Why is this an 
>>> exception?
>>> If you do that, we can have all the error handling in
>>> transformPartitionBound().
>>>
>> If so, ForValues would need to return a list for hash and a PartitionBoundSpec
>> for the other two; wouldn't that break code consistency? And such
>> validation is not new in gram.y, see 
