Re: [HACKERS] Runtime Partition Pruning

2017-11-10 Thread amul sul
On Thu, Nov 9, 2017 at 4:48 PM, Beena Emerson  wrote:
> Hello all,
>
> Here is the updated patch which is rebased over v10 of Amit Langote's
> path towards faster pruning patch [1]. It modifies the PartScanKeyInfo
> struct to hold expressions which is then evaluated by the executor to
> fetch the correct partitions using the function.
>

Hi Beena,

I have started looking into your patch; here are a few initial comments
on your 0001 patch:

1.
 351 + * Evaluate and store the ooutput of ExecInitExpr for each
of the keys.

Typo: ooutput

2.
 822 +   if (IsA(constexpr, Const) &_runtime)
 823 +   continue;
 824 +
 825 +   if (IsA(constexpr, Param) &&!is_runtime)
 826 +   continue;
 827 +

 Add space after '&&'

 3.
 1095 +* Generally for appendrel we don't fetch the clause from the the

Typo: Double 'the'

4.
 272 -/*-
 273 + /*-

 Unnecessary hunk.

5.
 313 +   Node   *n =
eval_const_expressions_from_list(estate->es_param_list_info, val);
 314 +

Crosses the 80-column limit.  Same at lines 323 & 325.

6.
 315 +   keys->eqkeys_datums[i++] = ((Const *) n)->constvalue;

Don’t we need an IsA(n, Const) check, or at least an assert, here?
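For illustration, the assert form could be as simple as this (a sketch only;
n, keys and i are the variables from the snippet above):

    Assert(IsA(n, Const));
    keys->eqkeys_datums[i++] = ((Const *) n)->constvalue;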

7.
1011 +   if (prmList)
1012 +   context.boundParams = prmList;  /* bound Params */
1013 +   else
1014 +   context.boundParams = NULL;

No need for the prmList NULL check; context.boundParams = prmList; is enough.
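That is, the whole if/else block can simply become:

    context.boundParams = prmList;  /* prmList may be NULL, which is fine */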

8.  It would be nice if you created a separate patch for moving
PartScanKeyInfo and exporting the function declarations.

9.  Could you please add a few regression tests? That would help in
review & testing.

10. Could you please rebase your patch against the latest "path toward faster
partition pruning" patch by Amit?

Regards,
Amul




Re: [HACKERS] [POC] hash partitioning

2017-11-09 Thread amul sul
On Fri, Nov 10, 2017 at 4:41 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Wed, Nov 1, 2017 at 6:16 AM, amul sul <sula...@gmail.com> wrote:
>> Fixed in the 0003 patch.
>
> I have committed this patch set with the attached adjustments.
>

Thanks a lot for your support, and a ton of thanks to all the reviewers.

Regards,
Amul




Re: [HACKERS] path toward faster partition pruning

2017-11-09 Thread amul sul
On Mon, Nov 6, 2017 at 3:31 PM, Amit Langote
 wrote:
> On 2017/11/06 14:32, David Rowley wrote:
>> On 6 November 2017 at 17:30, Amit Langote wrote:
>>> On 2017/11/03 13:32, David Rowley wrote:
 On 31 October 2017 at 21:43, Amit Langote wrote:
[]
>
> Attached updated set of patches, including the fix to make the new pruning
> code handle Boolean partitioning.
>

I am getting the following warnings on macOS:

partition.c:1800:24: warning: comparison of constant -1 with
expression of type 'NullTestType'
  (aka 'enum NullTestType') is always false
[-Wtautological-constant-out-of-range-compare]
if (keynullness[i] == -1)
~~ ^  ~~
partition.c:1932:25: warning: comparison of constant -1 with
expression of type 'NullTestType'
  (aka 'enum NullTestType') is always false
[-Wtautological-constant-out-of-range-compare]
if (keynullness[i] == -1)
~~ ^  ~~
2 warnings generated.


Comment for 0004 patch:
270 +   /* -1 represents an invalid value of NullTestType. */
 271 +   memset(keynullness, -1, PARTITION_MAX_KEYS * sizeof(NullTestType));

I think we should not use memset to set a value other than 0 or true/false.
Setting -1 this way works on systems where values are stored in two's
complement, but I am not sure about other architectures.
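For illustration, an explicit loop avoids depending on the byte pattern of -1
altogether (a sketch; keynullness and PARTITION_MAX_KEYS are taken from the
hunk above):

    int     i;

    /* mark every key's nullness as not-yet-set without relying on memset */
    for (i = 0; i < PARTITION_MAX_KEYS; i++)
        keynullness[i] = (NullTestType) -1;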

Regards,
Amul




Re: [HACKERS] Proposal: Improve bitmap costing for lossy pages

2017-11-09 Thread amul sul
Hi Dilip,

v6 patch:
 42 +   /*
 43 +* Estimate number of hashtable entries we can have within
maxbytes. This
 44 +* estimates the hash cost as sizeof(PagetableEntry).
 45 +*/
 46 +   nbuckets = maxbytes /
 47 +   (sizeof(PagetableEntry) + sizeof(Pointer) + sizeof(Pointer));

It took me a little while to understand this calculation.  You have moved this
code from tbm_create(), but I think you should move the following
comment as well:

tidbitmap.c:
 276 /*
 277  * Estimate number of hashtable entries we can have within
maxbytes. This
 278  * estimates the hash cost as sizeof(PagetableEntry), which
is good enough
 279  * for our purpose.  Also count an extra Pointer per entry
for the arrays
 280  * created during iteration readout.
 281  */
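As a rough worked example (the sizes here are assumptions, purely for
illustration): if sizeof(PagetableEntry) were 112 bytes and a Pointer is 8
bytes, then maxbytes = 4MB gives nbuckets = 4194304 / (112 + 8 + 8) = 32768
hashtable entries.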

Regards,
Amul




Re: [HACKERS] [POC] hash partitioning

2017-11-02 Thread amul sul
On Thu, Nov 2, 2017 at 1:35 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Wed, Nov 1, 2017 at 3:46 PM, amul sul <sula...@gmail.com> wrote:
>> Although partition constraints become more simple, there isn't any 
>> performance
>> gain with 0005 patch. Also I am little skeptic about logic in 0005 where we
>> copied extended hash function info from the partition key, what if parent is
>> changed while we are using it? Do we need to keep lock on parent until 
>> commit in
>> satisfies_hash_partition?
>
> I don't think it should be possible for the parent to be changed.  I
> mean, the partition key is altogether immutable -- it can't be changed
> after creation time.  The partition bounds can be changed for
> individual partitions but that would require a lock on the partition.
>
> Can you give an example of the kind of scenario about which you are concerned?
>

Yes, you are correct; the columns involved in the partitioning are immutable.

I was just worried about any change in a partition key column that
might change the selected hash function.

Regards,
Amul




Re: [HACKERS] [POC] hash partitioning

2017-10-30 Thread amul sul
On Tue, Oct 31, 2017 at 9:54 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Mon, Oct 30, 2017 at 5:52 PM, amul sul <sula...@gmail.com> wrote:
>> Actually, int4[] is also inappropriate type as we have started using a 64bit
>> hash function.  We need something int8[] which is not available, so that I
>> have used ANYARRAYOID in the attached patch(0004).
>
> I don't know why you think int8[] is not available.
>
> rhaas=# select 'int8[]'::regtype;
>  regtype
> --
>  bigint[]
> (1 row)
>

I missed _int8; I was searching for INT8ARRAYOID in pg_type.h, my bad.

>>> I wrote the following query
>>> to detect problems of this type, and I think we might want to just go
>>> ahead and add this to the regression test suite, verifying that it
>>> returns no rows:
>>>
>>> select oid::regprocedure, provariadic::regtype, proargtypes::regtype[]
>>> from pg_proc where provariadic != 0
>>> and case proargtypes[array_length(proargtypes, 1)-1]
>>> when 2276 then 2276 -- any -> any
>>> when 2277 then 2283 -- anyarray -> anyelement
>>> else (select t.oid from pg_type t where t.typarray =
>>> proargtypes[array_length(proargtypes, 1)-1]) end
>>> != provariadic;
>>>
>>
>> Added in 0001 patch.
>
> Committed.
>

Thanks !

>> One advantage of current implementation is that we can see which hash
>> function are used for the each partitioning column and also we don't need to
>> worry about user specified opclass and different input types.
>>
>> Something similar I've tried in my initial patch version[1], but I have 
>> missed
>> user specified opclass handling for each partitioning column.  Do you want me
>> to handle opclass using RelabelType node? I am afraid that, that would make
>> the \d+ output more horrible than the current one if non-default opclass 
>> used.
>
> Maybe we should just pass the OID of the partition (or both the
> partition and the parent, so we can get the lock ordering right?)
> instead.
>
Okay, will try this.


Regards,
Amul




Re: [HACKERS] [POC] hash partitioning

2017-10-24 Thread amul sul
On Tue, Oct 24, 2017 at 5:00 PM, Andres Freund <and...@anarazel.de> wrote:
> On 2017-10-24 12:43:12 +0530, amul sul wrote:
>> I tried to get suggested SMHasher[1] test result for the hash_combine
>> for 32-bit and 64-bit version.
>>
>> SMHasher works on hash keys of the form {0}, {0,1}, {0,1,2}... up to
>> N=255, using 256-N as the seed, for the hash_combine testing we
>> needed two hash value to be combined, for that, I've generated 64
>> and 128-bit hash using cityhash functions[2] for the given smhasher
>> key then split in two part to test 32-bit and 64-bit hash_combine
>> function respectively.   Attached patch for SMHasher code changes &
>> output of 32-bit and 64-bit hash_combine testing. Note that I have
>> skipped speed test this test which is irrelevant here.
>>
>> By referring other hash function results [3], we can see that hash_combine
>> test results are not bad either.
>>
>> Do let me know if current testing is not good enough or if you want me to do
>> more testing, thanks.
>
> This looks very good! Both the tests you did, and the results for
> hash_combineXX. I therefore think we can go ahead with that formulation
> of hash_combine64?
>

Thanks, Andres. Yes, we can; I've added your suggested hash_combine64 in
the latest patch[1].
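For readers following along, such a 64-bit combiner has the Boost-style shape
below (a sketch only; the shift amounts and the constant are illustrative and
may differ from what the patch actually uses):

    static inline uint64
    hash_combine64(uint64 a, uint64 b)
    {
        /* fold b into a, widened Boost hash_combine style */
        a ^= b + UINT64CONST(0x49a0f4dd15e5a8e3) + (a << 54) + (a >> 7);
        return a;
    }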

Regards,
Amul

1] 
https://postgr.es/m/CAAJ_b97R2rJinGPAVmZZzpNV%3D-5BgYFxDfY9HYdM1bCYJFGmQw%40mail.gmail.com




Re: [HACKERS] [POC] hash partitioning

2017-10-24 Thread amul sul
On Fri, Oct 13, 2017 at 3:00 AM, Andres Freund  wrote:
> On 2017-10-12 17:27:52 -0400, Robert Haas wrote:
>> On Thu, Oct 12, 2017 at 4:20 PM, Andres Freund  wrote:
>> >> In other words, it's not utterly fixed in stone --- we invented
>> >> --load-via-partition-root primarily to cope with circumstances that
>> >> could change hash values --- but we sure don't want to be changing it
>> >> with any regularity, or for a less-than-excellent reason.
>> >
>> > Yea, that's what I expected. It'd probably good for somebody to run
>> > smhasher or such on the output of the combine function (or even better,
>> > on both the 32 and 64 bit variants) in that case.
>>
>> Not sure how that test suite works exactly, but presumably the
>> characteristics in practice will depend the behavior of the hash
>> functions used as input the combine function - so the behavior could
>> be good for an (int, int) key but bad for a (text, date) key, or
>> whatever.
>
> I don't think that's true, unless you have really bad hash functions on
> the the component hashes. A hash combine function can't really do
> anything about badly hashed input, what you want is that it doesn't
> *reduce* the quality of the hash by combining.
>

I tried to get the suggested SMHasher[1] test results for the 32-bit and
64-bit versions of hash_combine.

SMHasher works on hash keys of the form {0}, {0,1}, {0,1,2}... up to N=255,
using 256-N as the seed. For the hash_combine testing we need two hash values
to combine, so for each SMHasher key I generated a 64-bit and a 128-bit hash
using the cityhash functions[2] and then split it into two parts to test the
32-bit and 64-bit hash_combine functions respectively.  Attached are the
SMHasher code changes and the output of the 32-bit and 64-bit hash_combine
testing. Note that I have skipped the speed tests, which are irrelevant here.

Comparing with other hash function results [3], we can see that the
hash_combine test results are not bad either.

Do let me know if the current testing is not good enough or if you want me to
do more testing, thanks.

1] https://github.com/aappleby/smhasher
2] https://github.com/aappleby/smhasher/blob/master/src/CityTest.cpp
3] https://github.com/rurban/smhasher/tree/master/doc

Regards,
Amul
[amul.sul@power2 src]$ ./SMHasher hash_combine64
---
--- Testing hash_combine64 (test hash combine 64 pg function.)

[[[ Sanity Tests ]]]   

Verification value 0x3B439A64 : Passed!
Running sanity check 1..PASS
Running sanity check 2..PASS

[[[ Differential Tests ]]]

Testing 8303632 up-to-5-bit differentials in 64-bit keys -> 64 bit hashes.
1000 reps, 8303632000 total tests, expecting 0.00 random collisions..
0 total collisions, of which 0 single collisions were ignored

Testing 11017632 up-to-4-bit differentials in 128-bit keys -> 64 bit hashes.
1000 reps, 11017632000 total tests, expecting 0.00 random collisions..
0 total collisions, of which 0 single collisions were ignored

Testing 2796416 up-to-3-bit differentials in 256-bit keys -> 64 bit hashes.
1000 reps, 2796416000 total tests, expecting 0.00 random collisions..
0 total collisions, of which 0 single collisions were ignored


[[[ Avalanche Tests ]]]

Testing  32-bit keys ->  64-bit hashes,   30 reps.. worst bias is 0.742667%
Testing  40-bit keys ->  64-bit hashes,   30 reps.. worst bias is 0.684667%
Testing  48-bit keys ->  64-bit hashes,   30 reps.. worst bias is 0.570667%
Testing  56-bit keys ->  64-bit hashes,   30 reps.. worst bias is 0.726667%
Testing  64-bit keys ->  64-bit hashes,   30 reps.. worst bias is 0.67%
Testing  72-bit keys ->  64-bit hashes,   30 reps.. worst bias is 0.659333%
Testing  80-bit keys ->  64-bit hashes,   30 reps.. worst bias is 0.666000%
Testing  88-bit keys ->  64-bit hashes,   30 reps.. worst bias is 0.712000%
Testing  96-bit keys ->  64-bit hashes,   30 reps.. worst bias is 0.632667%
Testing 104-bit keys ->  64-bit hashes,   30 reps.. worst bias is 0.646000%
Testing 112-bit keys ->  64-bit hashes,   30 reps.. worst bias is 0.746000%
Testing 120-bit keys ->  64-bit hashes,   30 reps.. worst bias is 0.692667%
Testing 128-bit keys ->  64-bit hashes,   30 reps.. worst bias is 0.684000%
Testing 136-bit keys ->  64-bit hashes,   30 reps.. worst bias is 0.758667%
Testing 144-bit keys ->  64-bit hashes,   30 reps.. worst bias is 0.725333%
Testing 152-bit keys ->  64-bit hashes,   30 reps.. worst bias is 0.776000%

[[[ Keyset 'Cyclic' Tests ]]]

Keyset 'Cyclic' - 8 cycles of 8 bytes - 1000 keys
Testing collisions   - Expected 0.00, actual 0.00 ( 0.00x)
Testing distribution - Worst bias is the  20-bit window at bit   6 - 0.039%

Keyset 'Cyclic' - 8 cycles of 9 bytes 

Re: [HACKERS] [POC] hash partitioning

2017-10-12 Thread amul sul
On Thu, Oct 12, 2017 at 6:31 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Tue, Oct 10, 2017 at 7:07 AM, amul sul <sula...@gmail.com> wrote:
>> How about the attached patch(0003)?
>> Also, the dim variable is renamed to natts.
>
> I'm not sure I believe this comment:
>
> +/*
> + * We arrange the partitions in the ascending order of their modulus
> + * and remainders.  Also every modulus is factor of next larger
> + * modulus.  This means that the index of a given partition is same 
> as
> + * the remainder of that partition.  Also entries at (remainder + N *
> + * modulus) positions in indexes array are all same for (modulus,
> + * remainder) specification for any partition.  Thus datums array 
> from
> + * both the given bounds are same, if and only if their indexes array
> + * will be same.  So, it suffices to compare indexes array.
> + */
>
> I am particularly not sure that I believe that the index of a
> partition must be the same as the remainder.  It doesn't seem like
> that would be true when there is more than one modulus or when some
> partitions are missing.
>

Looks like the explanation in the comment is not good enough; I will think about this.

Here are the links for the previous discussion:
1] 
https://postgr.es/m/CAFjFpRfHqSGBjNgJV2p%2BC4Yr5Qxvwygdsg4G_VQ6q9NTB-i3MA%40mail.gmail.com
2] 
https://postgr.es/m/CAFjFpRdeESKFkVGgmOdYvmD3d56-58c5VCBK0zDRjHrkq_VcNg%40mail.gmail.com


> +if (offset < 0)
> +{
> +next_modulus = DatumGetInt32(datums[0][0]);
> +valid_modulus = (next_modulus % spec->modulus) == 0;
> +}
> +else
> +{
> +prev_modulus = DatumGetInt32(datums[offset][0]);
> +valid_modulus = (spec->modulus % prev_modulus) == 0;
> +
> +if (valid_modulus && (offset + 1) < ndatums)
> +{
> +next_modulus =
> DatumGetInt32(datums[offset + 1][0]);
> +valid_modulus = (next_modulus %
> spec->modulus) == 0;
> +}
> +}
>
> I don't think this is quite right.  It checks the new modulus against
> prev_modulus whenever prev_modulus is defined, which is correct, but
> it doesn't check the new modulus against the next_modulus except when
> offset < 0.  But actually that check needs to be performed, I think,
> whenever the new modulus is less than the greatest modulus seen so
> far.
>
It does. See the "if (valid_modulus && (offset + 1) < ndatums)" block in the
else part of the snippet that you are referring to.

For example, new moduli 25 & 150 are not accepted for a hash partition bound
with moduli 10, 50, 200: 25 is rejected because the lower modulus 10 is not a
factor of it, and 150 is rejected because it is not a factor of the next
larger modulus 200. Will cover this with a test as well.

> + * For a partitioned table defined as:
> + *CREATE TABLE simple_hash (a int, b char(10)) PARTITION BY HASH (a, b);
> + *
> + * CREATE TABLE p_p1 PARTITION OF simple_hash
> + *FOR VALUES WITH (MODULUS 2, REMAINDER 1);
> + * CREATE TABLE p_p2 PARTITION OF simple_hash
> + *FOR VALUES WITH (MODULUS 4, REMAINDER 2);
> + * CREATE TABLE p_p3 PARTITION OF simple_hash
> + *FOR VALUES WITH (MODULUS 8, REMAINDER 0);
> + * CREATE TABLE p_p4 PARTITION OF simple_hash
> + *FOR VALUES WITH (MODULUS 8, REMAINDER 4);
> + *
> + * This function will return one of the following in the form of an
> + * expression:
> + *
> + * for p_p1: satisfies_hash_partition(2, 1, hash_fn_1_extended(a, HASH_SEED),
> + * hash_fn_2_extended(b,
> HASH_SEED))
> + * for p_p2: satisfies_hash_partition(4, 2, hash_fn_1_extended(a, HASH_SEED),
> + * hash_fn_2_extended(b,
> HASH_SEED))
> + * for p_p3: satisfies_hash_partition(8, 0, hash_fn_1_extended(a, HASH_SEED),
> + * hash_fn_2_extended(b,
> HASH_SEED))
> + * for p_p4: satisfies_hash_partition(8, 4, hash_fn_1_extended(a, HASH_SEED),
> + * hash_fn_2_extended(b,
> HASH_SEED))
>
> I think instead of this lengthy example you should try to explain the
> general rule.  Maybe something like: the partition constraint for a
> hash partition is always a call to the built-in function
> satisfies_hash_partition().  The first two arguments are the modulus
> and remainder for the partition; the remaining arguments are the hash
> values computed for each column of the partition key using the
> extended hash function from the appropriate opcl

Re: [HACKERS] [POC] hash partitioning

2017-10-10 Thread amul sul
On Tue, Oct 10, 2017 at 3:42 PM, Ashutosh Bapat
<ashutosh.ba...@enterprisedb.com> wrote:
> On Tue, Oct 10, 2017 at 3:32 PM, amul sul <sula...@gmail.com> wrote:
>
>>> +hash_part? true : key->parttypbyval[j],
>>> +key->parttyplen[j]);
>>> parttyplen is the length of partition key attribute, whereas what you want 
>>> here
>>> is the length of type of modulus and remainder. Is that correct? Probably we
>>> need some special handling wherever parttyplen and parttypbyval is used 
>>> e.g. in
>>> call to partition_bounds_equal() from build_joinrel_partition_info().
>>>
>>
>> Unless I am missing something, I don't think we should worry about parttyplen
>> because in the datumCopy() when the datatype is pass-by-value then typelen
>> is ignored.
>
> That's true, but it's ugly, passing typbyvalue of one type and len of other.
>

How about the attached patch(0003)?
Also, the dim variable is renamed to natts.

Regards,
Amul


0001-partition_bounds_copy-code-refactoring-v1.patch
Description: Binary data


0002-hash-partitioning_another_design-v24.patch
Description: Binary data


0003-Enable-partition-wise-join-support-v3.patch
Description: Binary data



Re: [HACKERS] [POC] hash partitioning

2017-10-10 Thread amul sul
On Tue, Oct 10, 2017 at 3:32 PM, amul sul <sula...@gmail.com> wrote:
> On Mon, Oct 9, 2017 at 5:51 PM, Ashutosh Bapat
> <ashutosh.ba...@enterprisedb.com> wrote:
>> On Mon, Oct 9, 2017 at 4:44 PM, amul sul <sula...@gmail.com> wrote:
>>
>
> Thanks Ashutosh for your review, please find my comment inline.
>
>>
>>> 0002 few changes in partition-wise join code to support
>>> hash-partitioned table as well & regression tests.
>>
>> +switch (key->strategy)
>> +{
>> +case PARTITION_STRATEGY_HASH:
>> +/*
>> + * Indexes array is same as the greatest modulus.
>> + * See partition_bounds_equal() for more explanation.
>> + */
>> +num_indexes = DatumGetInt32(src->datums[ndatums - 1][0]);
>> +break;
>> This logic is duplicated at multiple places.  I think it's time we 
>> consolidate
>> these changes in a function/macro and call it from the places where we have 
>> to
>> calculate number of indexes based on the information in partition descriptor.
>> Refactoring existing code might be a separate patch and then add hash
>> partitioning case in hash partitioning patch.
>>
>
> Make sense, added get_partition_bound_num_indexes() to get number of index
> elements in 0001 & get_greatest_modulus() as name suggested to get the 
> greatest
> modulus of the hash partition bound in 0002.
>
>> +intdim = hash_part? 2 : partnatts;
>> Call the variable as natts_per_datum or just natts?
>>
>
> natts represents the number of attributes, but for the hash partition bound we
> are not dealing with the attribute so that I have used short-form of 
> dimension,
> thoughts?

Okay, I think the dimension(dim) is also unfit here.  Any suggestions?

>
>> +hash_part? true : key->parttypbyval[j],
>> +key->parttyplen[j]);
>> parttyplen is the length of partition key attribute, whereas what you want 
>> here
>> is the length of type of modulus and remainder. Is that correct? Probably we
>> need some special handling wherever parttyplen and parttypbyval is used e.g. 
>> in
>> call to partition_bounds_equal() from build_joinrel_partition_info().
>>
>
> Unless I am missing something, I don't think we should worry about parttyplen
> because in the datumCopy() when the datatype is pass-by-value then typelen
> is ignored.
>
> Regards,
> Amul




Re: [HACKERS] [POC] hash partitioning

2017-10-10 Thread amul sul
On Mon, Oct 9, 2017 at 5:51 PM, Ashutosh Bapat
<ashutosh.ba...@enterprisedb.com> wrote:
> On Mon, Oct 9, 2017 at 4:44 PM, amul sul <sula...@gmail.com> wrote:
>

Thanks, Ashutosh, for your review; please find my comments inline.

>
>> 0002 few changes in partition-wise join code to support
>> hash-partitioned table as well & regression tests.
>
> +switch (key->strategy)
> +{
> +case PARTITION_STRATEGY_HASH:
> +/*
> + * Indexes array is same as the greatest modulus.
> + * See partition_bounds_equal() for more explanation.
> + */
> +num_indexes = DatumGetInt32(src->datums[ndatums - 1][0]);
> +break;
> This logic is duplicated at multiple places.  I think it's time we consolidate
> these changes in a function/macro and call it from the places where we have to
> calculate number of indexes based on the information in partition descriptor.
> Refactoring existing code might be a separate patch and then add hash
> partitioning case in hash partitioning patch.
>

Makes sense; I added get_partition_bound_num_indexes() to get the number of
index elements in 0001, and get_greatest_modulus(), as the name suggests, to
get the greatest modulus of the hash partition bound in 0002.

> +intdim = hash_part? 2 : partnatts;
> Call the variable as natts_per_datum or just natts?
>

natts represents the number of attributes, but for the hash partition bound we
are not dealing with attributes, so I used the short form of dimension.
Thoughts?

> +hash_part? true : key->parttypbyval[j],
> +key->parttyplen[j]);
> parttyplen is the length of partition key attribute, whereas what you want 
> here
> is the length of type of modulus and remainder. Is that correct? Probably we
> need some special handling wherever parttyplen and parttypbyval is used e.g. 
> in
> call to partition_bounds_equal() from build_joinrel_partition_info().
>

Unless I am missing something, I don't think we should worry about parttyplen,
because in datumCopy(), when the datatype is pass-by-value, the typlen
is ignored.

Regards,
Amul


0001-partition_bounds_copy-code-refactoring-v1.patch
Description: Binary data


0002-hash-partitioning_another_design-v24.patch
Description: Binary data


0003-Enable-partition-wise-join-support-v2.patch
Description: Binary data



Re: [HACKERS] [POC] hash partitioning

2017-10-09 Thread amul sul
On Sat, Oct 7, 2017 at 5:22 PM, amul sul <sula...@gmail.com> wrote:
> On Fri, Oct 6, 2017 at 5:35 PM, Jesper Pedersen
> <jesper.peder...@redhat.com> wrote:
>> Hi Amul,
>>
>> Could you rebase on latest master ?
>>
>
> Sure will post that soon, but before that, I need to test hash partitioning
> with recent partition-wise join commit (f49842d1ee), thanks.
>

Updated patch attached.

0001 is a rebase of the previous patch, with no new changes.
0002 has a few changes in the partition-wise join code to support
hash-partitioned tables as well, plus regression tests.

Thanks & Regards,
Amul


0001-hash-partitioning_another_design-v23.patch
Description: Binary data


0002-Enable-partition-wise-join-support-v1.patch
Description: Binary data



Re: [HACKERS] [POC] hash partitioning

2017-10-07 Thread amul sul
On Fri, Oct 6, 2017 at 5:35 PM, Jesper Pedersen
 wrote:
> Hi Amul,
>
> Could you rebase on latest master ?
>

Sure, will post that soon, but before that I need to test hash partitioning
with the recent partition-wise join commit (f49842d1ee), thanks.

Regards,
Amul




Re: [HACKERS] UPDATE of partition key

2017-09-29 Thread amul sul
On Wed, Sep 13, 2017 at 4:24 PM, amul sul <sula...@gmail.com> wrote:
>
>
> On Sun, Sep 10, 2017 at 8:47 AM, Amit Kapila <amit.kapil...@gmail.com>
> wrote:
>>
>> On Fri, Sep 8, 2017 at 4:51 PM, amul sul <sula...@gmail.com> wrote:
>> > On Thu, May 18, 2017 at 9:13 AM, Amit Kapila <amit.kapil...@gmail.com>
>> > wrote:
>> >>
>> >>  On Wed, May 17, 2017 at 5:17 PM, Robert Haas <robertmh...@gmail.com>
>> >> wrote:
>> >> > On Wed, May 17, 2017 at 6:29 AM, Amit Kapila
>> >> > <amit.kapil...@gmail.com>
>> >> > wrote:
>> >> >> I think we can do this even without using an additional infomask
>> >> >> bit.
>> >> >> As suggested by Greg up thread, we can set InvalidBlockId in ctid to
>> >> >> indicate such an update.
>> >> >
>> >> > Hmm.  How would that work?
>> >> >
>> >>
>> >> We can pass a flag say row_moved (or require_row_movement) to
>> >> heap_delete which will in turn set InvalidBlockId in ctid instead of
>> >> setting it to self. Then the ExecUpdate needs to check for the same
>> >> and return an error when heap_update is not successful (result !=
>> >> HeapTupleMayBeUpdated).  Can you explain what difficulty are you
>> >> envisioning?
>> >>
>> >
>> > Attaching WIP patch incorporates the above logic, although I am yet to
>> > check
>> > all the code for places which might be using ip_blkid.  I have got a
>> > small
>> > query here,
>> > do we need an error on HeapTupleSelfUpdated case as well?
>> >
>>
>> No, because that case is anyway a no-op (or error depending on whether
>> is updated/deleted by same command or later command).  Basically, even
>> if the row wouldn't have been moved to another partition, we would not
>> have allowed the command to proceed with the update.  This handling is
>> to make commands fail rather than a no-op where otherwise (when the
>> tuple is not moved to another partition) the command would have
>> succeeded.
>>
> Thank you.
>
> I've rebased patch against  Amit Khandekar's latest patch (v17_rebased_2).
> Also, added ip_blkid validation check in heap_get_latest_tid(), 
> rewrite_heap_tuple()
> & rewrite_heap_tuple() function, because only ItemPointerEquals() check is no
> longer sufficient after this patch.

FYI, I have posted this patch in a separate thread :
https://postgr.es/m/caaj_b95pkwojoyfz0bzxu8ookctvgzn6vygcnvuukeudrnf...@mail.gmail.com

Regards,
Amul




Re: [HACKERS] [POC] hash partitioning

2017-09-28 Thread amul sul
On Thu, Sep 28, 2017 at 11:24 AM, Amit Langote
<langote_amit...@lab.ntt.co.jp> wrote:
> On 2017/09/27 22:41, Jesper Pedersen wrote:
>> On 09/27/2017 03:05 AM, amul sul wrote:
>>>>> Attached rebased patch, thanks.
>>>>>
>>>>>
>>>> While reading through the patch I thought it would be better to keep
>>>> MODULUS and REMAINDER in caps, if CREATE TABLE was in caps too in order to
>>>> highlight that these are "keywords" for hash partition.
>>>>
>>>> Also updated some of the documentation.
>>>>
>>>>
>>> Thanks a lot for the patch, included in the attached version.
>>>
>>
>> Thank you.
>>
>> Based on [1] I have moved the patch to "Ready for Committer".
>
> Thanks a lot Amul for working on this.  Like Jesper said, the patch looks
> pretty good overall.  I was looking at the latest version with intent to
> study certain things about hash partitioning the way patch implements it,
> during which I noticed some things.
>

Thanks Amit for looking at the patch.

> +  The modulus must be a positive integer, and the remainder must a
>
> must be a
>

Fixed in the attached version.

> +  suppose you have a hash-partitioned table with 8 children, each of
> which
> +  has modulus 8, but find it necessary to increase the number of
> partitions
> +  to 16.
>

Fixed in the attached version.

> Might it be a good idea to say 8 "partitions" instead of "children" in the
> first sentence?
>
> +  each modulus-8 partition until none remain.  While this may still
> involve
> +  a large amount of data movement at each step, it is still better than
> +  having to create a whole new table and move all the data at once.
> + 
> +
>

Fixed in the attached version.

> I read the paragraph that ends with the above text and started wondering
> if the example to redistribute data in hash partitions by detaching and
> attaching with new modulus/remainder could be illustrated with an example?
> Maybe in the examples section of the ALTER TABLE page?
>

I think the hint in the documentation is more than enough. There are any number
of ways to redistribute data; the document is not meant to explain all of them.

> +  Since hash operator class provide only equality, not ordering,
> collation
>
> Either "Since hash operator classes provide" or "Since hash operator class
> provides"
>

Fixed in the attached version.

> Other than the above points, patch looks good.
>
>
> By the way, I noticed a couple of things about hash partition constraints:
>
> 1. In get_qual_for_hash(), using
> get_fn_expr_rettype(&key->partsupfunc[i]), which returns InvalidOid for
> the lack of fn_expr being set to non-NULL value, causes funcrettype of the
> FuncExpr being generated for hashing partition key columns to be set to
> InvalidOid, which I think is wrong.  That is, the following if condition
> in get_fn_expr_rettype() is always satisfied:
>
> if (!flinfo || !flinfo->fn_expr)
> return InvalidOid;
>
> I think we could use get_func_rettype(key->partsupfunc[i].fn_oid)
> instead.  Attached patch
> hash-v21-set-funcexpr-funcrettype-correctly.patch, which applies on top
> v21 of your patch.
>

Thanks for the patch, included in the attached version.

> 2. It seems that the reason constraint exclusion doesn't work with hash
> partitions as implemented by the patch is that predtest.c:
> operator_predicate_proof() returns false even without looking into the
> hash partition constraint, which is of the following form:
>
> satisfies_hash_partition(<modulus>, <remainder>, <hash value>, ...)
>
> because the above constraint expression doesn't translate into a binary
> opclause (an OpExpr), which operator_predicate_proof() knows how to work
> with.  So, false is returned at the beginning of that function by the
> following code:
>
> if (!is_opclause(predicate))
> return false;
>
> For example,
>
> create table p (a int) partition by hash (a);
> create table p0 partition of p for values with (modulus 4, remainder 0);
> create table p1 partition of p for values with (modulus 4, remainder 1);
> \d+ p0
> <...>
> Partition constraint: satisfies_hash_partition(4, 0, hashint4extended(a,
> '8816678312871386367'::bigint))
>
> -- both p0 and p1 scanned
> explain select * from p where satisfies_hash_partition(4, 0,
> hashint4extended(a, '8816678312871386367'::bigint));
>  QUERY PLAN
>
> 
>  Ap

[HACKERS] Restrict concurrent update/delete with UPDATE of partition key

2017-09-27 Thread amul sul
Hi All,

Attaching a POC patch that throws an error in the case of a concurrent update
of a tuple already deleted due to an UPDATE of the partition key[1].

In a normal update, the new tuple is linked to the old one via ctid, forming
a chain of tuple versions, but an UPDATE of the partition key[1] moves the
tuple from one partition to another partition table, which breaks that chain.

Consider the following[2] concurrent update case, where one session tries to
update a row that is locked for a concurrent update by another session, and
that concurrent update moves the tuple to another partition.

create table foo (a int2, b text) partition by list (a);
create table foo1 partition of foo for values IN (1);
create table foo2 partition of foo for values IN (2);
insert into foo values(1, 'ABC');

 --- session 1 ---
postgres=# begin;
BEGIN
postgres=# update foo set a=2 where a=1;
UPDATE 1


 --- session 2  ---
postgres=# update foo set b='EFG' where a=1;

 ….. wait state ……

--- session 1 ---
postgres=# commit;
COMMIT

 --- session 2  ---
UPDATE 0

This UPDATE 0 is the problematic part; see Greg Stark's post[3], which explains
why we need an error.

To throw an error we need an indicator that the targeted row has already been
moved to another partition, and for that, setting ctid.ip_blkid to
InvalidBlockId looks like a viable option for now.

The attached patch incorporates the following logic suggested by Amit
Kapila[4]:

"We can pass a flag say row_moved (or require_row_movement) to heap_delete
which will in turn set InvalidBlockId in ctid instead of setting it to
self. Then the
ExecUpdate needs to check for the same and return an error when heap_update is
not successful (result != HeapTupleMayBeUpdated)."
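A rough sketch of that idea, purely for illustration (the variable names here
are hypothetical, not taken from the attached patch):

    /*
     * In heap_delete(), when the row is being moved to another partition,
     * stamp the old version's ctid with an invalid block number instead of
     * pointing it at itself.
     */
    BlockIdSet(&(tp.t_data->t_ctid.ip_blkid), InvalidBlockNumber);

    /* A concurrent updater that chased the update chain can then detect it. */
    if (!BlockNumberIsValid(BlockIdGetBlockNumber(&(tuple->t_data->t_ctid.ip_blkid))))
        ereport(ERROR,
                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                 errmsg("tuple to be updated was moved to another partition")));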

1] 
https://postgr.es/m/CAJ3gD9do9o2ccQ7j7%2BtSgiE1REY65XRiMb%3DyJO3u3QhyP8EEPQ%40mail.gmail.com
2] With 
https://postgr.es/m/CAJ3gD9fzD4jBpv+zXqZYnW=h9JXUFG9E7NGdA9gR_JJbOj=q...@mail.gmail.com
patch applied.
3] 
https://postgr.es/m/CAM-w4HPis7rbnwi%2BoXjnouqMSRAC5DeVcMdxEXTMfDos1kaYPQ%40mail.gmail.com
4] 
https://postgr.es/m/CAA4eK1KEZQ%2BCyXbBzfn1jFHoEfa_OemDLhLyy7xfD1QUZLo1DQ%40mail.gmail.com

Regards,
Amul


0001-POC-Invalidate-ip_blkid-v1.patch
Description: Binary data



Re: [HACKERS] [POC] hash partitioning

2017-09-27 Thread amul sul
On Mon, Sep 18, 2017 at 8:55 PM, Jesper Pedersen <jesper.peder...@redhat.com
> wrote:

> On 09/15/2017 02:30 AM, amul sul wrote:
>
>> Attached rebased patch, thanks.
>>
>>
> While reading through the patch I thought it would be better to keep
> MODULUS and REMAINDER in caps, if CREATE TABLE was in caps too in order to
> highlight that these are "keywords" for hash partition.
>
> Also updated some of the documentation.
>
>
Thanks a lot for the patch; it is included in the attached version.


> V20 patch passes make check-world, and my testing (typical 64 partitions,
> and various ATTACH/DETACH scenarios).
>

Nice, thanks again.

Regards,
Amul


0001-hash-partitioning_another_design-v21.patch
Description: Binary data



Re: [HACKERS] path toward faster partition pruning

2017-09-27 Thread amul sul
On Wed, Sep 27, 2017 at 6:09 AM, Amit Langote  wrote:

> On 2017/09/27 1:51, Robert Haas wrote:
> > On Tue, Sep 26, 2017 at 10:57 AM, Jesper Pedersen
> >  wrote:
> >> One could advocate (*cough*) that the hash partition patch [1] should be
> >> merged first in order to find other instances of where other CommitFest
> >> entries doesn't account for hash partitions at the moment in their
> method
> >> signatures; Beena noted something similar in [2]. I know that you said
> >> otherwise [3], but this is CommitFest 1, so there is time for a revert
> >> later, and hash partitions are already useful in internal testing.
> >
> > Well, that's a fair point.  I was assuming that committing things in
> > that order would cause me to win the "least popular committer" award
> > at least for that day, but maybe not.  It's certainly not ideal to
> > have to juggle that patch along and keep rebasing it over other
> > changes when it's basically done, and just waiting on other
> > improvements to land.  Anybody else wish to express an opinion?
>
> FWIW, I tend to agree that it would be nice to get the hash partitioning
> patch in, even with old constraint exclusion based partition-pruning not
> working for hash partitions.  That way, it might be more clear what we
> need to do in the partition-pruning patches to account for hash partitions.
>
+1

Regards,
Amul


Re: [HACKERS] Improve catcache/syscache performance.

2017-09-26 Thread amul sul
On Fri, Sep 22, 2017 at 11:47 AM, Andres Freund <and...@anarazel.de> wrote:

> Hi,
>
> On 2017-09-20 18:26:50 +0530, amul sul wrote:
> > Patch 0007:
>

> Other than these concern, patch looks pretty reasonable to me.
>
> I'd appreciate if you could have a look at the new version as well.
>

I have reviewed the newer version; it looks good to me.

Here are a few cosmetic comments for the 0003 patch:

1.
 982 +   ct->tuple.t_data = (HeapTupleHeader)
 983 +   MAXALIGN(((char *) ct) + sizeof(CatCTup));

It would be nice if you added a comment for this address alignment.

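Something along these lines would do (only a suggested wording):

    /*
     * The tuple is stored inline, directly after the CatCTup header;
     * MAXALIGN the address so that t_data gets the alignment a heap
     * tuple header requires.
     */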

2.
 947 /*
 948 -* If there are any out-of-line toasted fields in the tuple,
expand them
 949 -* in-line.  This saves cycles during later use of the catcache
entry, and
 950 -* also protects us against the possibility of the toast tuples
being
 951 -* freed before we attempt to fetch them, in case of something
using a
 952 -* slightly stale catcache entry.
 953  */

An empty comment block is left behind in CatalogCacheCreateEntry().

3.
411 +/*
 412 + * CatalogCacheCompareTuple
 413 + *
 414 + * Compare a tuple to the passed arguments.
 415 + */
 416 +static inline bool
 417 +CatalogCacheCompareTuple(const CatCache *cache, int nkeys,
 418 +const Datum *cachekeys,
 419 +const Datum *searchkeys)

Need to adjust the prolog comment.

Regards,
Amul


Re: [HACKERS] UPDATE of partition key

2017-09-21 Thread amul sul
On Wed, Sep 20, 2017 at 9:27 PM, Amit Khandekar 
wrote:

> On 20 September 2017 at 00:06, Robert Haas  wrote:
> > On Fri, Sep 15, 2017 at 7:25 AM, Amit Khandekar 
> wrote:
> >> [ new patch ]
>

  86 -   (event == TRIGGER_EVENT_UPDATE &&
!trigdesc->trig_update_after_row))
  87 +   (event == TRIGGER_EVENT_UPDATE &&
!trigdesc->trig_update_after_row) ||
  88 +   (event == TRIGGER_EVENT_UPDATE && (oldtup == NULL ||
newtup == NULL)))
  89 return;
  90 }


Only one of oldtup or newtup will be valid at a time, and vice versa.  Can we
improve this check accordingly?

For example:
(event == TRIGGER_EVENT_UPDATE &&
 (HeapTupleIsValid(oldtup) ^ ItemPointerIsValid(newtup)))


 247
 248 +   /*
 249 +* EDB: In case this is part of update tuple routing, put this row
into the
 250 +* transition NEW TABLE if we are capturing transition tables. We
need to
 251 +* do this separately for DELETE and INSERT because they happen on
 252 +* different tables.
 253 +*/
 254 +   if (mtstate->operation == CMD_UPDATE && mtstate->mt_transition_
capture)
 255 +   ExecARUpdateTriggers(estate, resultRelInfo, NULL,
 256 +NULL,
 257 +tuple,
 258 +NULL,
 259 +mtstate->mt_transition_capture);
 260 +
 261 list_free(recheckIndexes);

 267
 268 +   /*
 269 +* EDB: In case this is part of update tuple routing, put this row
into the
 270 +* transition OLD TABLE if we are capturing transition tables. We
need to
 271 +* do this separately for DELETE and INSERT because they happen on
 272 +* different tables.
 273 +*/
 274 +   if (mtstate->operation == CMD_UPDATE && mtstate->mt_transition_
capture)
 275 +   ExecARUpdateTriggers(estate, resultRelInfo, tupleid,
 276 +oldtuple,
 277 +NULL,
 278 +NULL,
 279 +mtstate->mt_transition_capture);
 280 +

Initially, I wondered why we can't have the above code right after
ExecInsert() & ExecDelete() in ExecUpdate(), respectively.

We can do that for ExecDelete(), but not easily in the ExecInsert() case,
because ExecInsert() internally searches for the correct partition's
resultRelInfo for the insert, and before returning to ExecUpdate the
resultRelInfo is restored to the old one.  That's why the current logic
seems reasonable for now.  Is there anything that we can do?

Regards,
Amul


Re: [HACKERS] Improve catcache/syscache performance.

2017-09-20 Thread amul sul
Patch 0007:

1:
400 +   /*
401 +* XXX: might be worthwhile to only handle oid sysattr, to
reduce
402 +* overhead - it's the most common key.
403 +*/

IMHO, let's fix that as well. I tested this fix (see the attached patch) but
did not find much gain on my local CentOS VM (of course, the pgbench load
wasn't big enough).

2:  How about wrapping the following condition in SearchCatCacheMiss() in
unlikely():

if (IsBootstrapProcessingMode())
return NULL;
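
i.e. something along the lines of:

    if (unlikely(IsBootstrapProcessingMode()))
        return NULL;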

3: Can we have the following assert in SearchCatCacheN() instead of
SearchSysCacheN(), so that direct SearchCatCacheN() calls are asserted as
well?

Assert(SysCache[cacheId]->cc_nkeys == );

Other than these concerns, the patch looks pretty reasonable to me.

Regards,
Amul


0007_ex_handle_oid.patch
Description: Binary data



Re: [HACKERS] [POC] hash partitioning

2017-09-15 Thread amul sul
On Fri, Sep 15, 2017 at 4:30 AM, Thom Brown <t...@linux.com> wrote:

> On 14 September 2017 at 09:58, amul sul <sula...@gmail.com> wrote:
> > On Wed, Sep 13, 2017 at 7:43 PM, Jesper Pedersen
> > <jesper.peder...@redhat.com> wrote:
> >>
> >> Hi Amul,
> >>
> >> On 09/08/2017 08:40 AM, amul sul wrote:
> >>>
> >>> Rebased 0002 against this commit & renamed to 0001, PFA.
> >>>
> >>
> >> This patch needs a rebase.
> >>
> >
> > Thanks for your note.
> > Attached is the patch rebased on the latest master head.
> > Also added error on
> > creating
> > d
> > efault partition
> > for the hash partitioned table
> > ,
> > and updated document &
> > test script for the same.
>
> Sorry, but this needs another rebase as it's broken by commit
> 77b6b5e9ceca04dbd6f0f6cd3fc881519acc8714.
>
Attached rebased patch, thanks.

Regards,
Amul


0001-hash-partitioning_another_design-v20.patch
Description: Binary data



Re: [HACKERS] [POC] hash partitioning

2017-09-14 Thread amul sul
On Wed, Sep 13, 2017 at 7:43 PM, Jesper Pedersen <jesper.peder...@redhat.com
> wrote:

> Hi Amul,
>
> On 09/08/2017 08:40 AM, amul sul wrote:
>
>> Rebased 0002 against this commit & renamed to 0001, PFA.
>>
>>
> This patch needs a rebase.
>
>
Thanks for your note.
Attached is the patch rebased on the latest master head.
Also added an error on creating a default partition for the hash partitioned
table, and updated the document & test script for the same.

Regards,
Amul


0001-hash-partitioning_another_design-v19.patch
Description: Binary data



Re: [HACKERS] UPDATE of partition key

2017-09-13 Thread amul sul
On Sun, Sep 10, 2017 at 8:47 AM, Amit Kapila <amit.kapil...@gmail.com>
wrote:

> On Fri, Sep 8, 2017 at 4:51 PM, amul sul <sula...@gmail.com> wrote:
> > On Thu, May 18, 2017 at 9:13 AM, Amit Kapila <amit.kapil...@gmail.com>
> > wrote:
> >>
> >>  On Wed, May 17, 2017 at 5:17 PM, Robert Haas <robertmh...@gmail.com>
> >> wrote:
> >> > On Wed, May 17, 2017 at 6:29 AM, Amit Kapila <amit.kapil...@gmail.com
> >
> >> > wrote:
> >> >> I think we can do this even without using an additional infomask bit.
> >> >> As suggested by Greg up thread, we can set InvalidBlockId in ctid to
> >> >> indicate such an update.
> >> >
> >> > Hmm.  How would that work?
> >> >
> >>
> >> We can pass a flag say row_moved (or require_row_movement) to
> >> heap_delete which will in turn set InvalidBlockId in ctid instead of
> >> setting it to self. Then the ExecUpdate needs to check for the same
> >> and return an error when heap_update is not successful (result !=
> >> HeapTupleMayBeUpdated).  Can you explain what difficulty are you
> >> envisioning?
> >>
> >
> > Attaching WIP patch incorporates the above logic, although I am yet to
> check
> > all the code for places which might be using ip_blkid.  I have got a
> small
> > query here,
> > do we need an error on HeapTupleSelfUpdated case as well?
> >
>
> No, because that case is anyway a no-op (or error depending on whether
> is updated/deleted by same command or later command).  Basically, even
> if the row wouldn't have been moved to another partition, we would not
> have allowed the command to proceed with the update.  This handling is
> to make commands fail rather than a no-op where otherwise (when the
> tuple is not moved to another partition) the command would have
> succeeded.
>
Thank you.

I've rebased the patch against Amit Khandekar's latest patch (v17_rebased_2).
Also added an ip_blkid validation check in heap_get_latest_tid() and
rewrite_heap_tuple(), because an ItemPointerEquals() check alone is no longer
sufficient after this patch.

Regards,
Amul


0002-invalidate_ctid-ip_blkid-WIP_2.patch
Description: Binary data



Re: [HACKERS] [POC] hash partitioning

2017-09-11 Thread amul sul
On Mon, Sep 11, 2017 at 5:30 PM, Alvaro Herrera 
wrote:

> Robert Haas wrote:
> > On Mon, Sep 11, 2017 at 4:17 AM, Ashutosh Bapat
> >  wrote:
> > >> Rebased 0002 against this commit & renamed to 0001, PFA.
> > >
> > > Given that we have default partition support now, I am wondering
> > > whether hash partitioned tables also should have default partitions.
> > > The way we have structured hash partitioning syntax, there can be
> > > "holes" in partitions. Default partition would help plug those holes.
> >
> > Yeah, I was thinking about that, too.  On the one hand, it seems like
> > it's solving the problem the wrong way: if you've set up hash
> > partitioning properly, you shouldn't have any holes.  On the other
> > hand, supporting it probably wouldn't cost anything noticeable and
> > might make things seem more consistent.  I'm not sure which way to
> > jump on this one.
>
> How difficult/tedious/troublesome would be to install the missing
> partitions if you set hash partitioning with a default partition and
> only later on notice that some partitions are missing?  I think if the
> answer is that you need to exclusive-lock something for a long time and
> this causes a disruption in production systems, then it's better not to
> allow a default partition at all and just force all the hash partitions
> to be there from the start.
>
>
I am also leaning toward not supporting a default partition for a hash
partitioned table.

The major drawback I can see is the constraint that gets created on the default
partition table.  IIUC, the constraint on the default partition table is just
the negation of the partition constraints of all its sibling partitions.

Consider a hash partitioned table having partitions with hash bounds
(modulus 64, remainder 0), ..., (modulus 64, remainder 62) and partition
columns col1, col2, ..., and so on; then the constraint for the default
partition will be:

NOT( (satisfies_hash_partition(64, 0, hash_fn1(col1), hash_fn2(col2), ...)
&& ... &&
  satisfies_hash_partition(64, 62, hash_fn1(col1),hash_fn2(col2), ...))

This will be much more harmful to performance than any other partitioning
strategy, because it calculates a hash for the same partitioning key multiple
times.  We could overcome this by having another SQL function (e.g.
satisfies_default_hash_partition) which calculates the hash value once and
checks the remainder, but that would be a different path from the current
default partition framework.

Regards,
Amul


Re: [HACKERS] [POC] hash partitioning

2017-09-08 Thread amul sul
On Fri, Sep 8, 2017 at 6:45 AM, Robert Haas <robertmh...@gmail.com> wrote:

> On Mon, Sep 4, 2017 at 6:38 AM, amul sul <sula...@gmail.com> wrote:
> > I've updated patch to use an extended hash function (Commit #
> > 81c5e46c490e2426db243eada186995da5bb0ba7) for the partitioning.
>
> Committed 0001 after noticing that Jeevan Ladhe also found that change
> convenient for default partitioning.  I made a few minor cleanups;
> hopefully I didn't break anything.
>

Thank you.

Rebased 0002 against this commit & renamed to 0001, PFA.

Regards,
Amul


0001-hash-partitioning_another_design-v18.patch
Description: Binary data



Re: [HACKERS] UPDATE of partition key

2017-09-08 Thread amul sul
On Thu, May 18, 2017 at 9:13 AM, Amit Kapila 
wrote:

>  On Wed, May 17, 2017 at 5:17 PM, Robert Haas 
> wrote:
> > On Wed, May 17, 2017 at 6:29 AM, Amit Kapila 
> wrote:
> >> I think we can do this even without using an additional infomask bit.
> >> As suggested by Greg up thread, we can set InvalidBlockId in ctid to
> >> indicate such an update.
> >
> > Hmm.  How would that work?
> >
>
> We can pass a flag say row_moved (or require_row_movement) to
> heap_delete which will in turn set InvalidBlockId in ctid instead of
> setting it to self. Then the ExecUpdate needs to check for the same
> and return an error when heap_update is not successful (result !=
> HeapTupleMayBeUpdated).  Can you explain what difficulty are you
> envisioning?
>
>
Attaching a WIP patch that incorporates the above logic, although I am yet to
check all the code for places which might be using ip_blkid.  I have a small
query here: do we need an error in the HeapTupleSelfUpdated case as well?

Note that the patch should be applied on top of Amit Khandekar's latest
patch (v17_rebased).

Regards,
Amul


0002-invalidate-ctid.ip_blkid-WIP.patch
Description: Binary data



Re: [HACKERS] Hash Functions

2017-09-08 Thread amul sul
On Fri, Sep 1, 2017 at 8:01 AM, Robert Haas <robertmh...@gmail.com> wrote:

> On Thu, Aug 31, 2017 at 8:40 AM, amul sul <sula...@gmail.com> wrote:
> > Fixed in the attached version.
>
> I fixed these up a bit and committed them.  Thanks.
>
> I think this takes care of adding not only the infrastructure but
> support for all the core data types, but I'm not quite sure how to
> handle upgrading types in contrib.  It looks like citext, hstore, and
> several data types provided by isn have hash opclasses, and I think
> that there's no syntax for adding a support function to an existing
> opclass.  We could add that, but I'm not sure how safe it would be.
>
> TBH, I really don't care much about fixing isn, but it seems like
> fixing citext and hstore would be worthwhile.
>

The attached patch proposes the fix for the citext and hstore contribs.

To make it easy to understand, I've split these patches into two parts.  0001
adds a new file for the contrib upgrade & renames an existing file to the
higher version, and 0002 is the actual implementation of the extended hash
function for that contrib's data type.

Regards,
Amul


0001-hstore-File-renaming-v1.patch
Description: Binary data


0002-hstore-add-extended-hash-function-v1.patch
Description: Binary data


0001-citext-File-renaming-v1.patch
Description: Binary data


0002-citext-add-extended-hash-function-v1.patch
Description: Binary data



Re: [HACKERS] [POC] hash partitioning

2017-09-04 Thread amul sul
I've updated patch to use an extended hash function (​Commit #
81c5e46c490e2426db243eada186995da5bb0ba7) for the partitioning.

Regards,
Amul


On Thu, Jul 27, 2017 at 5:11 PM, amul sul <sula...@gmail.com> wrote:

> Attaching newer patches rebased against the latest master head. Thanks !
>
> Regards,
> Amul
>


0001-Cleanup_v6.patch
Description: Binary data


0002-hash-partitioning_another_design-v17.patch
Description: Binary data



Re: [HACKERS] Hash Functions

2017-08-31 Thread amul sul
On Wed, Aug 30, 2017 at 9:05 PM, Robert Haas <robertmh...@gmail.com> wrote:

> On Wed, Aug 30, 2017 at 10:43 AM, amul sul <sula...@gmail.com> wrote:
> > Thanks for the suggestion, I have updated 0002-patch accordingly.
> > Using this I found some strange behaviours as follow:
> >
> > 1) standard and extended0 output for the jsonb_hash case is not same.
> > 2) standard and extended0 output for the hash_range case is not same when
> > input
> >is int4range(550274, 1550274)  other case in the patch are fine. This
> can
> > be
> >reproducible with other values as well, for e.g. int8range(1570275,
> > 208112489).
> >
> > Will look into this tomorrow.
>
> Those sound like bugs in your patch.  Specifically:
>
> +/* Same approach as hash_range */
> +result = hash_uint32_extended((uint32) flags, seed);
> +result ^= lower_hash;
> +result = (result << 1) | (result >> 63);
> +result ^= upper_hash;
>

Yes, you are correct.


>
> That doesn't give compatible results.  The easiest thing to do might
> be to rotate the high 32 bits and the low 32 bits separately.
> JsonbHashScalarValueExtended has the same problem.  Maybe create a
> helper function that does something like this (untested):
>
> ((x << 1) & UINT64COUNT(0xfffefffe)) | ((x >> 31) &
> UINT64CONST(0x10001))
>

This is working as expected; I also tested it by executing the following SQL
multiple times:

SELECT v as value, hash_range(v)::bit(32) as standard,
   hash_range_extended(v, 0)::bit(32) as extended0,
   hash_range_extended(v, 1)::bit(32) as extended1
FROM   (VALUES (int8range(floor(random() * 100)::int8, floor(random() *
1000)::int8)),
(int8range(floor(random() * 1000)::int8, floor(random() *
1)::int8)),
(int8range(floor(random() * 1)::int8, floor(random() *
10)::int8)),
 (int8range(floor(random() * 1000)::int8, floor(random() *
1)::int8)),
 (int8range(floor(random() * 1)::int8, floor(random() *
10)::int8))) x(v)
WHERE  hash_range(v)::bit(32) != hash_range_extended(v, 0)::bit(32)
   OR hash_range(v)::bit(32) = hash_range_extended(v, 1)::bit(32);
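For reference, a helper that rotates the high and low 32-bit halves
independently, as suggested above, could look like this with the masks spelled
out (a sketch under that reading; the helper name is made up and the actual
patch may differ):

    static inline uint64
    rotate_halves_left_1(uint64 x)
    {
        /* rotate bits 0..31 and bits 32..63 left by one, each within its half */
        return ((x << 1) & UINT64CONST(0xFFFFFFFEFFFFFFFE)) |
               ((x >> 31) & UINT64CONST(0x0000000100000001));
    }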



> > Another case which I want to discuss is, extended and standard version of
> > hashfloat4, hashfloat8 & hash_numeric function will have the same output
> for
> > zero
> > value irrespective of seed value. Do you think we need a fix for this?
>
> Yes, I think you should return the seed rather than 0 in the cases
> where the current code hard-codes a 0 return.  That will give the same
> behavior in the seed == 0 case while cheaply getting us something a
> bit different when there is a seed.
>
Fixed in the attached version.



> BTW, you should probably invent and use a PG_RETURN_UINT64 macro in this
> patch.
>

Added in the attached version.

Thanks for all your suggestions.

Regards,
Amul


0001-add-optional-second-hash-proc-v4.patch
Description: Binary data


0002-test-Hash_functions_v3_wip.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hash Functions

2017-08-30 Thread amul sul
On Tue, Aug 29, 2017 at 11:48 PM, Robert Haas <robertmh...@gmail.com> wrote:

> On Tue, Aug 22, 2017 at 8:14 AM, amul sul <sula...@gmail.com> wrote:
> > Attaching patch 0002 for the reviewer's testing.
>
> I think that this 0002 is not something we can think of committing
> because there's no guarantee that hash functions will return the same
> results on all platforms.  However, what we could and, I think, should
> do is hash some representative values of each data type and verify
> that hash(x) and hashextended(x, 0) come out the same at least as to
> the low-order 32 bits -- and maybe that hashextend(x, 1) comes out
> different as to the low-order 32 bits.  The easiest way to do this
> seems to be to cast to bit(32).  For example:
>
> SELECT v, hashint4(v)::bit(32) as standard,
> hashint4extended(v, 0)::bit(32) as extended0,
> hashint4extended(v, 1)::bit(32) as extended1
> FROM (VALUES (0), (1), (17), (42), (550273), (207112489)) x(v)
> WHERE hashint4(v)::bit(32) != hashint4extended(v, 0)::bit(32)
>OR hashint4(v)::bit(32) = hashint4extended(v, 1)::bit(32);
>
> I suggest adding a version of this for each data type.
>

​Thanks for the suggestion, I have updated 0002-patch accordingly.
Using this I found some strange behaviours, as follows:

1) standard and extended0 output for the jsonb_hash case is not the same.
2) standard and extended0 output for the hash_range case is not the same when
   the input is int4range(550274, 1550274); other cases in the patch are fine.
   This can be reproduced with other values as well, e.g. int8range(1570275,
   208112489).

Will look into this tomorrow.

Another case which I want to discuss: the extended and standard versions of
the hashfloat4, hashfloat8 & hash_numeric functions will have the same output
for a zero value irrespective of the seed value. Do you think we need a fix
for this?


> From your latest version of 0001, I get:
>
> rangetypes.c:1297:8: error: unused variable 'rngtypid'
>   [-Werror,-Wunused-variable]
> Oid rngtypid = RangeTypeGetOid(r);
>
Fixed in the attached version.


> I suggest not duplicating the comments from each regular function into
> the extended function, but just writing /* Same approach as hashfloat8
> */ when the implementation is non-trivial (no need for this if the
> extended function is a single line or the original function had no
> comments anyway).
>
> ​
Fixed in the attached version.​


> hash_aclitem seems embarrassingly stupid.  I suggest that we make the
> extended version slightly less stupid -- i.e. if the seed is non-zero,
> actually call hash_any_extended on the sum and pass the seed through.
>
>   * Reset info about hash function whenever we pick up new info
> about
>   * equality operator.  This is so we can ensure that the hash
> function
>   * matches the operator.
>   */
>  typentry->flags &= ~(TCFLAGS_CHECKED_HASH_PROC);
> +typentry->flags &= ~(TCFLAGS_CHECKED_HASH_EXTENDED_PROC);
>
> Adjust comment: "about hash function" -> "about hash functions", "hash
> functions matches" -> "hash functions match".
>
> ​
Fixed in the attached version.​


> +extern Datum
> +hash_any_extended(register const unsigned char *k, register int
> +  keylen, uint64 seed);
>
> Formatting.
> ​
>

​Fixed in the attached version.​

​Thanks !

Regards,
Amul​


0001-add-optional-second-hash-proc-v3.patch
Description: Binary data


0002-test-Hash_functions_v2_wip.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hash Functions

2017-08-29 Thread amul sul
On Tue, Aug 22, 2017 at 5:44 PM, amul sul <sula...@gmail.com> wrote:

> On Fri, Aug 18, 2017 at 11:01 PM, Robert Haas <robertmh...@gmail.com>
> wrote:
>
>> On Fri, Aug 18, 2017 at 1:12 PM, amul sul <sula...@gmail.com> wrote:
>> > I have a small query,  what if I want a cache entry with extended hash
>> > function instead standard one, I might require that while adding
>> > hash_array_extended function? Do you think we need to extend
>> > lookup_type_cache() as well?
>>
>> Hmm, I thought you had changed the hash partitioning stuff so that it
>> didn't rely on lookup_type_cache().  You have to look up the function
>> using the opclass provided in the partition key definition;
>> lookup_type_cache() will give you the default one for the datatype.
>> Maybe just use get_opfamily_proc?
>>
>>
> Yes, we can do that for
> ​the ​
> partitioning code, but my concern is a little bit
> different.  I apologize, I wasn't clear enough.
>
> I am trying to extend hash_array & hash_range function. The hash_array and
> hash_range function calculates hash by using the respective hash function
> for
> the given argument type (i.e. array/range element type), and those hash
> functions are made available in the TypeCacheEntry via
> lookup_type_cache(). But
> in the hash_array & hash_range extended version requires a respective
> extended
> hash function for those element type.
>
> I have added hash_array_extended & hash_range_extended function in the
> attached
> patch 0001, which maintains a local copy of TypeCacheEntry with extended
> hash
> functions. But I am a little bit skeptic about this logic, any
> ​ ​
> advice/suggestions will be
> greatly appreciated.
>
>

Instead, in the attached patch, I have modified lookup_type_cache() so that it
can be asked to provide the extended hash function in the TypeCacheEntry.

For that, I've introduced new flags as TYPECACHE_HASH_EXTENDED_PROC,
TYPECACHE_HASH_EXTENDED_PROC_FINFO & TCFLAGS_CHECKED_HASH_EXTENDED_PROC, and
additional variables in TypeCacheEntry structure to hold extended hash proc
information.
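
For illustration, the intended usage is along these lines (just a sketch; the
flag names follow the description above, while the hash_extended_proc field
name is my assumption for the new TypeCacheEntry member, not necessarily the
exact code in the attached patch):

#include "postgres.h"
#include "utils/typcache.h"

/* Fetch the extended (64-bit, seeded) hash support function for an
 * array/range element type via the type cache. */
static Oid
get_element_hash_extended_proc(Oid element_type)
{
    TypeCacheEntry *typentry;

    typentry = lookup_type_cache(element_type,
                                 TYPECACHE_HASH_EXTENDED_PROC |
                                 TYPECACHE_HASH_EXTENDED_PROC_FINFO);

    if (!OidIsValid(typentry->hash_extended_proc))
        elog(ERROR, "could not find extended hash function for type %u",
             element_type);

    return typentry->hash_extended_proc;
}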


> The logic in the rest of the extended hash functions is same as the
> standard
> one.
>

The same applies to the hash_array_extended() & hash_range_extended() functions.

​Regards,
Amul​


0001-add-optional-second-hash-proc-v2.patch
Description: Binary data


0002-test-Hash_functions_wip.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hash Functions

2017-08-22 Thread amul sul
On Fri, Aug 18, 2017 at 11:01 PM, Robert Haas <robertmh...@gmail.com> wrote:

> On Fri, Aug 18, 2017 at 1:12 PM, amul sul <sula...@gmail.com> wrote:
> > I have a small query,  what if I want a cache entry with extended hash
> > function instead standard one, I might require that while adding
> > hash_array_extended function? Do you think we need to extend
> > lookup_type_cache() as well?
>
> Hmm, I thought you had changed the hash partitioning stuff so that it
> didn't rely on lookup_type_cache().  You have to look up the function
> using the opclass provided in the partition key definition;
> lookup_type_cache() will give you the default one for the datatype.
> Maybe just use get_opfamily_proc?
>
>
Yes, we can do that for the partitioning code, but my concern is a little bit
different.  I apologize, I wasn't clear enough.

I am trying to extend hash_array & hash_range function. The hash_array and
hash_range function calculates hash by using the respective hash function
for
the given argument type (i.e. array/range element type), and those hash
functions are made available in the TypeCacheEntry via lookup_type_cache().
But
in the hash_array & hash_range extended version requires a respective
extended
hash function for those element type.

I have added hash_array_extended & hash_range_extended function in the
attached
patch 0001, which maintains a local copy of TypeCacheEntry with extended
hash
functions. But I am a little bit skeptical about this logic; any
advice/suggestions will be greatly appreciated.

The logic in the rest of the extended hash functions is same as the standard
one.

​Attaching patch 0002​ for the reviewer's testing.

​Regards,
Amul​


0001-add-optional-second-hash-proc-v2-wip.patch
Description: Binary data


0002-test-Hash_functions.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hash Functions

2017-08-18 Thread amul sul
On Fri, Aug 18, 2017 at 8:49 AM, Robert Haas  wrote:

> On Wed, Aug 16, 2017 at 5:34 PM, Robert Haas 
> wrote:
> > Attached is a quick sketch of how this could perhaps be done (ignoring
> > for the moment the relatively-boring opclass pushups).
>
> Here it is with some relatively-boring opclass pushups added.  I just
> did the int4 bit; the same thing will need to be done, uh, 35 more
> times.  But you get the gist.  No, not that kind of gist.
>
> ​
I will work on this.

I have a small query: what if I want a cache entry with the extended hash
function instead of the standard one? I might require that while adding the
hash_array_extended function. Do you think we need to extend
lookup_type_cache() as well?

Regards,
Amul


Re: [HACKERS] reload-through-the-top-parent switch the partition table

2017-08-08 Thread amul sul
On Tue, Aug 8, 2017 at 6:18 PM, Rushabh Lathia  wrote:
> Thanks Rajkumar for testing and reporting this.
>
>
> It seems like with we set the numParents and parents only for the
> dumpable objects (flagInhTables()). Current patch relies on the numParents
> and parents to get the root partition TableInfo, but when --schema is been
> specified - it doesn't load those information for the object which is not
> dumpable.
>
> Now one options is:
>
> 1) restrict the --load-via-partition-root to be used with
> the --schema or may be some other options - where we restrict the
> objects.
>
> Consider this, partition root is in schema 'a' and the partition table is in
> schema 'b', if someone specify the --schema b with
> --load-via-partition-root,
> I think we should not do "INSERT INTO a.tab" to load the data (because
> user specified --schema b).
>

+1, this looks cleaner to me.

> 2) fix flagInhTables() to load the numParents and the parents information
> for the partition table (can be checked using ispartition), irrespective of
> whether object is dumpable is true or not.
>
> May be something like:
>
> @@ -322,7 +322,11 @@ flagInhTables(TableInfo *tblinfo, int numTables,
>
> /* Don't bother computing anything for non-target tables, either */
> if (!tblinfo[i].dobj.dump)
> +   {
> +   if (tblinfo[i].ispartition)
> +   findParentsByOid([i], inhinfo, numInherits);
> continue;
> +   }
>
> I am still looking into this, meanwhile any inputs are welcome.
>

See the note given in the pg_dump documentation[1]:

"When -n is specified, pg_dump makes no attempt to dump any other
database objects that the selected schema(s) might depend upon.
Therefore, there is no guarantee that the results of a specific-schema
dump can be successfully restored by themselves into a clean
database."

If we want to follow the same trend then we could simply dump all
partition(ed) tables belonging to the specified schema. In Rajkumar's
example, we need to dump the partitions belonging to schema 'a' (i.e. t1_p1
and t1_p2_p1), and the target root for the INSERT query will be the same
table or the parent belonging to the same schema.

Regards,
Amul

[1] https://www.postgresql.org/docs/devel/static/app-pgdump.html


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-07-27 Thread amul sul
Attaching newer patches rebased against the latest master head. Thanks !

Regards,
Amul


0001-Cleanup_v6.patch
Description: Binary data


0002-hash-partitioning_another_design-v16.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-07-05 Thread amul sul
On Wed, Jul 5, 2017 at 4:50 PM, Dilip Kumar <dilipbal...@gmail.com> wrote:
> On Mon, Jul 3, 2017 at 4:39 PM, amul sul <sula...@gmail.com> wrote:
>> Thanks to catching this, fixed in the attached version.
>
> Few comments on the latest version.
>

Thanks for your review; please find my comments inline:

> 0001 looks fine, for 0002 I have some comments.
>
> 1.
> + hbounds = (PartitionHashBound * *) palloc(nparts *
> +  sizeof(PartitionHashBound *));
>
> /s/(PartitionHashBound * *)/(PartitionHashBound **)/g
>

Fixed in the attached version.

> 2.
> RelationBuildPartitionDesc
> {
>  
>
>
> * catalog scan that retrieved them, whereas that in the latter is
> * defined by canonicalized representation of the list values or the
> * range bounds.
> */
> for (i = 0; i < nparts; i++)
> result->oids[mapping[i]] = oids[i];
>
> Should this comments mention about hash as well?
>

Instead, I have generalised this comment in the attached patch

> 3.
>
> if (b1->datums[b1->ndatums - 1][0] != b2->datums[b2->ndatums - 1][0])
> return false;
>
> if (b1->ndatums != b2->ndatums)
> return false;
>
> If ndatums itself is different then no need to access datum memory, so
> better to check ndatum first.
>

You are correct; we are already doing this in partition_bounds_equal().
This is redundant code; removed in the attached version.

> 4.
> + * next larger modulus.  For example, if you have a bunch
> + * of partitions that all have modulus 5, you can add a
> + * new new partition with modulus 10 or a new partition
>
> Typo, "new new partition"  -> "new partition"
>

Fixed in the attached version.
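
As an aside, the rule that comment describes ("every modulus must be a factor
of the next larger modulus") boils down to a check roughly like this sketch
(the names and the exact shape are mine, not the patch's):

#include <stdbool.h>

/*
 * A new modulus m is acceptable if every existing modulus that is <= m
 * divides m, and m divides every existing modulus that is >= m.  For
 * example, with existing moduli {5, 10, 60}, m = 20 is acceptable but
 * m = 15 is not, because 10 does not divide 15.
 */
static bool
modulus_is_compatible(const int *existing_moduli, int nmoduli, int m)
{
    for (int i = 0; i < nmoduli; i++)
    {
        if (existing_moduli[i] <= m && m % existing_moduli[i] != 0)
            return false;
        if (existing_moduli[i] >= m && existing_moduli[i] % m != 0)
            return false;
    }
    return true;
}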

Regards,
Amul


0001-Cleanup_v6.patch
Description: Binary data


0002-hash-partitioning_another_design-v15.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-07-03 Thread amul sul
On Fri, Jun 23, 2017 at 11:19 AM, Yugo Nagata <nag...@sraoss.co.jp> wrote:
> On Fri, 23 Jun 2017 13:41:15 +0900
> Yugo Nagata <nag...@sraoss.co.jp> wrote:
>
>> On Tue, 6 Jun 2017 13:03:58 +0530
>> amul sul <sula...@gmail.com> wrote:
>>
>>
>> > Updated patch attached.
>>
>> I looked into the latest patch (v13) and have some comments
>> althogh they might be trivial.
>
> One more comment:
>
> +   if (spec->remainder < 0)
> +   ereport(ERROR,
> +   (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
> +errmsg("remainder for hash partition must be a 
> non-negative integer")));
>
> The value of remainder is defined as Iconst in gram.y, so it never be 
> negative.
> Hence, I think this check is not necessary or Assert is enough.
>
Makes sense; fixed this as well in the v14 patch. Thanks again.

Regards,
Amul


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-07-03 Thread amul sul
On Fri, Jun 23, 2017 at 10:11 AM, Yugo Nagata <nag...@sraoss.co.jp> wrote:
> On Tue, 6 Jun 2017 13:03:58 +0530
> amul sul <sula...@gmail.com> wrote:
>
>
>> Updated patch attached.
>
> I looked into the latest patch (v13) and have some comments
> althogh they might be trivial.
>
Thanks for your review.

> First, I couldn't apply this patch to the latest HEAD due to
> a documentation fix and pgintend updates. It needes rebase.
>
> $ git apply /tmp/0002-hash-partitioning_another_design-v13.patch
> error: patch failed: doc/src/sgml/ref/create_table.sgml:87
> error: doc/src/sgml/ref/create_table.sgml: patch does not apply
> error: patch failed: src/backend/catalog/partition.c:76
> error: src/backend/catalog/partition.c: patch does not apply
> error: patch failed: src/backend/commands/tablecmds.c:13371
> error: src/backend/commands/tablecmds.c: patch does not apply
>
Fixed.

>
>
> +   Hash Partitioning
> +
> +   
> +
> + The table is partitioned by specifying modulus and remainder for 
> each
> + partition. Each partition holds rows for which the hash value of
> + partition keys when divided by specified modulus produces specified
> + remainder. For more clarification on modulus and remainder please 
> refer
> + .
> +
> +   
> +  
> +
> +  
> Range Partitioning
>
> I think this section should be inserted after List Partitioning section 
> because
> the order of the descriptions is Range, List, then Hash in other places of
> the documentation. At least,
>
Fixed in the attached version.

>
> -partition bounds.  Currently supported
> -partitioning methods include range and list, where each partition is
> -assigned a range of keys and a list of keys, respectively.
> +partition bounds.  The currently supported
> +partitioning methods are list, range, and hash.
> 
>
> Also in this hunk. I think "The currently supported partitioning methods are
> range, list, and hash." is better. We don't need to change the order of
> the original description.
>
Fixed in the attached version.

>
>
> 
> -Declarative partitioning only supports list and range partitioning,
> -whereas table inheritance allows data to be divided in a manner of
> -the user's choosing.  (Note, however, that if constraint exclusion is
> -unable to prune partitions effectively, query performance will be 
> very
> -poor.)
> +Declarative partitioning only supports hash, list and range
> +partitioning, whereas table inheritance allows data to be divided in 
> a
> +manner of the user's choosing.  (Note, however, that if constraint
> +exclusion is unable to prune partitions effectively, query 
> performance
> +will be very poor.)
>
> Similarly, I think "Declarative partitioning only supports range, list and 
> hash
> partitioning," is better.
>
Fixed in the attached version.

>
> +
> +  
> +   Create a hash partitioned table:
> +
> +CREATE TABLE orders (
> +order_id bigint not null,
> +cust_id  bigint not null,
> +status   text
> +) PARTITION BY HASH (order_id);
> +
> +
>
> This paragraph should be inserted between "Create a list partitioned table:"
> paragraph and "Ceate partition of a range partitioned table:" paragraph
> as well as range and list.
>
Fixed in the attached version.

>
> *strategy = PARTITION_STRATEGY_LIST;
> else if (pg_strcasecmp(partspec->strategy, "range") == 0)
> *strategy = PARTITION_STRATEGY_RANGE;
> +   else if (pg_strcasecmp(partspec->strategy, "hash") == 0)
> +   *strategy = PARTITION_STRATEGY_HASH;
> else
> ereport(ERROR,
>
> In the most of codes, the order is hash, range, then list, but only
> in transformPartitionSpec(), the order is list, range, then hash,
> as above. Maybe it is better to be uniform.
>
Makes sense; fixed in the attached version.

>
> +   {
> +   if (strategy == PARTITION_STRATEGY_HASH)
> +   ereport(ERROR,
> +   
> (errcode(ERRCODE_UNDEFINED_OBJECT),
> +errmsg("data type %s 
> has no default hash operator class",
> +   
> format_type_be(atttype)),
> + 

Re: [HACKERS] Multi column range partition table

2017-06-22 Thread amul sul
On Fri, Jun 23, 2017 at 6:58 AM, Amit Langote
<langote_amit...@lab.ntt.co.jp> wrote:
> On 2017/06/22 20:48, amul sul wrote:
>> Hi,
>>
>> While working on the another patch, I came across the case where
>> I need an auto generated partition for a mutil-column range partitioned
>> table having following range bound:
>>
>> PARTITION p1 FROM  (UNBOUNDED, UNBOUNDED) TO (10, 10)
>> PARTITION p2 FROM  (10, 10)  TO (10, UNBOUNDED)
>> PARTITION p3 FROM  (10, UNBOUNDED) TO (20, 10)
>> PARTITION p4 FROM (20, 10) TO (20, UNBOUNDED)
>> PARTITION p5 FROM (20, UNBOUNDED) TO (UNBOUNDED, UNBOUNDED)
>>
>> In this, a lower bound of the partition is an upper bound of the
>> previous partition.
>>
>> While trying to create p3 partition with (10, UNBOUNDED) to (20, 10) bound,
>> got an overlap partition error.
>>
>> Here is the SQL to reproduced this error:
>>
>> CREATE TABLE range_parted ( i1 int,  i2 int ) PARTITION BY RANGE (i1, i2);
>> CREATE TABLE p1 PARTITION OF range_parted FOR VALUES FROM (UNBOUNDED,
>> UNBOUNDED) TO (10, 10);
>> CREATE TABLE p2 PARTITION OF range_parted FOR VALUES FROM (10, 10) TO
>> (10, UNBOUNDED);
>> CREATE TABLE p3   PARTITION OF tab1 FOR VALUES FROM (10, UNBOUNDED) TO (20, 
>> 10);
>>
>> ERROR:  partition "p3" would overlap partition "tab1_p_10_10"
>>
>> This happened because of UNBOUNDED handling, where it is a negative infinite
>> if it is in FROM clause.  Wondering can't we explicitly treat this as
>> a positive infinite value, can we?
>
> No, we cannot.  What would be greater than (or equal to) +infinite?
> Nothing.  So, even if you will want p3 to accept (10, 9890148), it won't
> because 9890148 is not >= +infinite.  It will accept only the rows where
> the first column is > 10 (second column is not checked in that case).
>
> You will have to define p3 as follows:
>
> CREATE TABLE p3 PARTITION OF tab1 FOR VALUES FROM (11, UNBOUNDED) TO (20, 10);
>
What if the partition key column is FLOAT ?

Regards,
Amul


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Multi column range partition table

2017-06-22 Thread amul sul
Hi,

While working on another patch, I came across a case where
I need auto-generated partitions for a multi-column range partitioned
table having the following range bounds:

PARTITION p1 FROM  (UNBOUNDED, UNBOUNDED) TO (10, 10)
PARTITION p2 FROM  (10, 10)  TO (10, UNBOUNDED)
PARTITION p3 FROM  (10, UNBOUNDED) TO (20, 10)
PARTITION p4 FROM (20, 10) TO (20, UNBOUNDED)
PARTITION p5 FROM (20, UNBOUNDED) TO (UNBOUNDED, UNBOUNDED)

In this, a lower bound of the partition is an upper bound of the
previous partition.

While trying to create the p3 partition with the (10, UNBOUNDED) to (20, 10)
bounds, I got a partition overlap error.

Here is the SQL to reproduce this error:

CREATE TABLE range_parted ( i1 int,  i2 int ) PARTITION BY RANGE (i1, i2);
CREATE TABLE p1 PARTITION OF range_parted FOR VALUES FROM (UNBOUNDED,
UNBOUNDED) TO (10, 10);
CREATE TABLE p2 PARTITION OF range_parted FOR VALUES FROM (10, 10) TO
(10, UNBOUNDED);
CREATE TABLE p3   PARTITION OF tab1 FOR VALUES FROM (10, UNBOUNDED) TO (20, 10);

ERROR:  partition "p3" would overlap partition "tab1_p_10_10"

This happened because of the UNBOUNDED handling, where it means negative
infinity when it appears in the FROM clause.  I am wondering, can't we
explicitly treat it as a positive infinite value here?

Thoughts/Comments?

Regards,
Amul


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] remove unnecessary flag has_null from PartitionBoundInfoData

2017-06-12 Thread amul sul
On Wed, May 17, 2017 at 10:22 PM, Robert Haas  wrote:
[...]
> I committed this with fixes for those issues, plus I renamed the macro
> to partition_bound_accepts_nulls, which I think is more clear.
>
partition_bound_accepts_nulls() will always yield true for the range
partitioning case, because in RelationBuildPartitionDesc we forgot to
set boundinfo->null_index to -1.

The attached patch fixes that.


Regards,
Amul


set_boundinfo-null_index.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Adding support for Default partition in partitioning

2017-06-07 Thread amul sul
On Wed, Jun 7, 2017 at 10:30 AM, Jeevan Ladhe
 wrote:
>
>
>> IIUC, the default partition constraint is simply NOT IN (<bounds of other sibling partitions>).
>> If constraint on the default partition refutes the new partition's
>> constraints that means we have overlapping partition, and perhaps
>> error.
>
>
> You are correct Amul, but this error will be thrown before we try to
> check for the default partition data. So, in such cases I think we really
> do not need to have logic to check if default partition refutes the new
> partition contraints.
>

But Ashutosh's suggestion makes sense; we might have constraints other
than the partitioning constraint on the default partition.  If those
constraints refute the new partition's constraints, we should skip
the scan.

Regards,
Amul


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Adding support for Default partition in partitioning

2017-06-06 Thread amul sul
On Wed, Jun 7, 2017 at 2:08 AM, Jeevan Ladhe
 wrote:
[...]
>>
>> The code in check_default_allows_bound() to check whether the default
>> partition
>> has any rows that would fit new partition looks quite similar to the code
>> in
>> ATExecAttachPartition() checking whether all rows in the table being
>> attached
>> as a partition fit the partition bounds. One thing that
>> check_default_allows_bound() misses is, if there's already a constraint on
>> the
>> default partition refutes the partition constraint on the new partition,
>> we can
>> skip the scan of the default partition since it can not have rows that
>> would
>> fit the new partition. ATExecAttachPartition() has code to deal with a
>> similar
>> case i.e. the table being attached has a constraint which implies the
>> partition
>> constraint. There may be more cases which check_default_allows_bound()
>> does not
>> handle but ATExecAttachPartition() handles. So, I am wondering whether
>> it's
>> better to somehow take out the common code into a function and use it. We
>> will
>> have to deal with a difference through. The first one would throw an error
>> when
>> finding a row that satisfies partition constraints whereas the second one
>> would
>> throw an error when it doesn't find such a row. But this difference can be
>> handled through a flag or by negating the constraint. This would also take
>> care
>> of Amit Langote's complaint about foreign partitions. There's also another
>> difference that the ATExecAttachPartition() queues the table for scan and
>> the
>> actual scan takes place in ATRewriteTable(), but there is not such queue
>> while
>> creating a table as a partition. But we should check if we can reuse the
>> code to
>> scan the heap for checking a constraint.
>>
>> In case of ATTACH PARTITION, probably we should schedule scan of default
>> partition in the alter table's work queue like what
>> ATExecAttachPartition() is
>> doing for the table being attached. That would fit in the way alter table
>> works.
>
>
> I am still working on this.
> But, about your comment here:
> "if there's already a constraint on the default partition refutes the
> partition
> constraint on the new partition, we can skip the scan":
> I am so far not able to imagine such a case, since default partition
> constraint
> can be imagined something like "minus infinity to positive infinity with
> some finite set elimination", and any new non-default partition being added
> would simply be a set of finite values(at-least in case of list, but I think
> range
> should not also differ here). Hence one cannot imply the other here.
> Possibly,
> I might be missing something that you had visioned when you raised the flag,
> please correct me if I am missing something.
>

IIUC, the default partition constraint is simply NOT IN (<bounds of other
sibling partitions>).  If a constraint on the default partition refutes the
new partition's constraints, that means we have an overlapping partition,
and perhaps an error.


Regards,
Amul


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-06-06 Thread amul sul
Hi Dilip,

Thanks for review.

On Sat, Jun 3, 2017 at 6:54 PM, Dilip Kumar <dilipbal...@gmail.com> wrote:
> On Thu, May 25, 2017 at 9:59 AM, amul sul <sula...@gmail.com> wrote:
>> On Mon, May 22, 2017 at 2:23 PM, amul sul <sula...@gmail.com> wrote:
>>>
>>> Updated patch attached. Thanks a lot for review.
>>>
>> Minor fix in the document, PFA.
>
> Patch need rebase
>

Done.

> ---
> Function header is not consistent with other neighbouring functions
> (some function contains function name in the header but others don't)
> +/*
> + * Compute the hash value for given not null partition key values.
> + */
>
Done.

> --
> postgres=# create table t1 partition of t for values with (modulus 2,
> remainder 1) partition by range(a);
> CREATE TABLE
> postgres=# create table t1_1 partition of t1 for values from (8) to (10);
> CREATE TABLE
> postgres=# insert into t1 values(8);
> 2017-06-03 18:41:46.067 IST [5433] ERROR:  new row for relation "t1_1"
> violates partition constraint
> 2017-06-03 18:41:46.067 IST [5433] DETAIL:  Failing row contains (8).
> 2017-06-03 18:41:46.067 IST [5433] STATEMENT:  insert into t1 values(8);
> ERROR:  new row for relation "t1_1" violates partition constraint
> DETAIL:  Failing row contains (8).
>
> The value 8 is violating the partition constraint of the t1 and we are
> trying to insert to value in t1,
> still, the error is coming from the leaf level table t1_1, that may be
> fine but from error, it appears that
> it's violating the constraint of t1_1 whereas it's actually violating
> the constraint of t1.
>
> From Implementation, it appears that based on the key are identifying
> the leaf partition and it's only failing during ExecInsert while
> checking the partition constraint.
>
May I ask, how are you sure that 8 is an unfit value for the t1 relation?
And what about a value other than 8, e.g. 7?

Updated patch attached.

Regards,
Amul Sul


0001-Cleanup_v5.patch
Description: Binary data


0002-hash-partitioning_another_design-v13.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-05-24 Thread amul sul
On Mon, May 22, 2017 at 2:23 PM, amul sul <sula...@gmail.com> wrote:
>
> Updated patch attached. Thanks a lot for review.
>
Minor fix in the document, PFA.

Regards,
Amul


0002-hash-partitioning_another_design-v12.patch
Description: Binary data


0001-Cleanup_v4.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-05-22 Thread amul sul
On Fri, May 19, 2017 at 10:35 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Fri, May 19, 2017 at 5:32 AM, amul sul <sula...@gmail.com> wrote:
>> Updated patch attached.  0001-patch rebased against latest head.
>> 0002-patch also incorporates code comments and error message changes
>> as per Robert's & your suggestions. Thanks !
>
> -if (spec->strategy != PARTITION_STRATEGY_LIST)
> -elog(ERROR, "invalid strategy in partition bound spec");
> +Assert(spec->strategy == PARTITION_STRATEGY_LIST);
>
> Let's just drop these hunks.  I realize this is a response to a review
> comment I made, but I take it back.  If the existing code is already
> doing it this way, there's no real need to revise it.  The patch
> doesn't even make it consistent anyway, since elsewhere you elog() for
> a similar case.  Perhaps elog() is best anyway.
>
Done.

> -partitioning methods include range and list, where each partition is
> -assigned a range of keys and a list of keys, respectively.
> +partitioning methods include hash, range and list, where each partition 
> is
> +assigned a modulus and remainder of keys, a range of keys and a list of
> +keys, respectively.
>
> I think this sentence has become too long and unwieldy, and is more
> unclear than helpful.  I'd just write "The currently supported
> partitioning methods are list, range, and hash."  The use of the word
> include is actually wrong here, because it implies that there are more
> not mentioned here, which is false.
>
Done.

> -  expression.  If no btree operator class is specified when creating a
> -  partitioned table, the default btree operator class for the datatype 
> will
> -  be used.  If there is none, an error will be reported.
> +  expression.  List and range partitioning uses only btree operator 
> class.
> +  Hash partitioning uses only hash operator class. If no operator class 
> is
> +  specified when creating a partitioned table, the default operator class
> +  for the datatype will be used.  If there is none, an error will be
> +  reported.
> + 
>
> I suggest: If no operator class is specified when creating a
> partitioned table, the default operator class of the appropriate type
> (btree for list and range partitioning, hash for hash partitioning)
> will be used.  If there is none, an error will be reported.
>
Done.

> + 
> +  Since hash partitiong operator class, provide only equality,
> not ordering,
> +  collation is not relevant in hash partition key column. An error will 
> be
> +  reported if collation is specified.
>
> partitiong -> partitioning.  Also, remove the comma after "operator
> class" and change "not relevant in hash partition key column" to "not
> relevant for hash partitioning".  Also change "if collation is
> specified" to "if a collation is specified".
>
Done.

> +   Create a hash partitioned table:
> +
> +CREATE TABLE orders (
> +order_id bigint not null,
> +cust_id  bigint not null,
> +status   text
> +) PARTITION BY HASH (order_id);
> +
>
> Move this down so it's just above the example of creating partitions.
>
Done.

> + * For range and list partitioned tables, datums is an array of datum-tuples
> + * with key->partnatts datums each.
> + * For hash partitioned tables, it is an array of datum-tuples with 2 datums,
> + * modulus and remainder, corresponding to a given partition.
>
> Second line is very short; reflow as one paragraph.
>
Done

>   * In case of range partitioning, it stores one entry per distinct range
>   * datum, which is the index of the partition for which a given datum
>   * is an upper bound.
> + * In the case of hash partitioning, the number of the entries in the indexes
> + * array is same as the greatest modulus amongst all partitions. For a given
> + * partition key datum-tuple, the index of the partition which would
> accept that
> + * datum-tuple would be given by the entry pointed by remainder produced when
> + * hash value of the datum-tuple is divided by the greatest modulus.
>
> Insert line break before the new text as a paragraph break.

Will wait for more inputs on Ashutosh's explanation upthread.
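
For anyone following along, the lookup described in that comment is
essentially the following (a sketch with made-up names, not the actual patch
code):

#include <stdint.h>

/*
 * With hash partitioning, the indexes[] array has as many entries as the
 * greatest modulus among all partitions.  A row is routed by taking the
 * hash of its partition key modulo that greatest modulus and using the
 * result as an index into indexes[]; a partition created with
 * (modulus m, remainder r) owns every slot i for which i % m == r.
 */
static int
hash_partition_index(const int *indexes, uint64_t greatest_modulus,
                     uint64_t hashval)
{
    return indexes[hashval % greatest_modulus];
}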

>
> +charstrategy;/* hash, list or range bounds? */
>
> Might be clearer to just write /* hash, list, or range? */ or /*
> bounds for hash, list, or range? */
>

Done, used "hash, list, or range?"

>
> +static uint32 compute_hash_value(PartitionKey key, Datum *values,
> bool *isnull);
> +
>
> I think there should be a blank line a

Re: [HACKERS] [POC] hash partitioning

2017-05-19 Thread amul sul
On Wed, May 17, 2017 at 6:54 PM, Ashutosh Bapat
 wrote:
[...]

>
> Comments on the tests
> +#ifdef USE_ASSERT_CHECKING
> +{
> +/*
> + * Hash partition bound stores modulus and remainder at
> + * b1->datums[i][0] and b1->datums[i][1] position respectively.
> + */
> +for (i = 0; i < b1->ndatums; i++)
> +Assert((b1->datums[i][0] == b2->datums[i][0] &&
> +b1->datums[i][1] == b2->datums[i][1]));
> +}
> +#endif
> Why do we need extra {} here?
>

Okay, removed in the attached version.

> Comments on testcases
> +CREATE TABLE hpart_1 PARTITION OF hash_parted FOR VALUES WITH
> (modulus 8, remainder 0);
> +CREATE TABLE fail_part (LIKE hpart_1 INCLUDING CONSTRAINTS);
> +ALTER TABLE hash_parted ATTACH PARTITION fail_part FOR VALUES WITH
> (modulus 4, remainder 0);
> Probably you should also test the other-way round case i.e. create modulus 4,
> remainder 0 partition and then try to add partitions with modulus 8, remainder
> 4 and modulus 8, remainder 0. That should fail.
>

Fixed.

> Why to create two tables hash_parted and hash_parted2, you should be able to
> test with only a single table.
>

Fixed.

> +INSERT INTO hpart_2 VALUES (3, 'a');
> +DELETE FROM hpart_2;
> +INSERT INTO hpart_5_a (a, b) VALUES (6, 'a');
> This is slightly tricky. On different platforms the row may map to different
> partitions depending upon how the values are hashed. So, this test may not be
> portable on all the platforms. Probably you should add such testcases with a
> custom hash operator class which is identity function as suggested by Robert.
> This also applies to the tests in insert.sql and update.sql for partitioned
> table without custom opclass.
>

Yes, you are correct. Fixed in the attached version.

> +-- delete the faulting row and also add a constraint to skip the scan
> +ALTER TABLE hpart_5 ADD CONSTRAINT hcheck_a CHECK (a IN (5)), ALTER a
> SET NOT NULL;
> The constraint is not same as the implicit constraint added for that 
> partition.
> I am not sure whether it's really going to avoid the scan. Did you verify it?
> If yes, then how?
>

I haven't tested that; maybe I copied it blindly, sorry about that.
I don't think this test is needed again for hash partitioning, so I removed it.

> +ALTER TABLE hash_parted2 ATTACH PARTITION fail_part FOR VALUES WITH
> (modulus 3, remainder 2);
> +ERROR:  every hash partition modulus must be a factor of the next
> larger modulus
> We should add this test with at least two partitions in there so that we can
> check lower and upper modulus. Also, testing with some interesting
> bounds discussed earlier
> in this mail e.g. adding modulus 15 when 5, 10, 60 exist will be better than
> testing with 3, 4 and 8.
>
A similar test already exists in the create_table.sql file.

> +ERROR:  cannot use collation for hash partition key column "a"
> This seems to indicate that we can not specify collation for hash partition 
> key
> column, which isn't true. Column a here can have its collation. What's not
> allowed is specifying collation in PARTITION BY clause.
> May be reword the error as "cannot use collation for hash partitioning". or
> plain "cannot use collation in PARTITION BY clause for hash partitioning".
>
> +ERROR:  invalid bound specification for a list partition
> +LINE 1: CREATE TABLE fail_part PARTITION OF list_parted FOR VALUES W...
> +^
> Should the location for this error be that of WITH clause like in case of 
> range
> and list partitioned table.
>

Fixed.

> +select tableoid::regclass as part, a from hash_parted order by part;
> May be add a % 4 to show clearly that the data really goes to the partitioning
> with that remainder.
>

Fixed.

Updated patch attached.  0001-patch rebased against latest head.
0002-patch also incorporates code comments and error message changes
as per Robert's & your suggestions. Thanks !

Regards,
Amul


0001-Cleanup_v3.patch
Description: Binary data


0002-hash-partitioning_another_design-v10.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-05-17 Thread amul sul
On Wed, May 17, 2017 at 11:11 AM, Ashutosh Bapat
<ashutosh.ba...@enterprisedb.com> wrote:
> On Wed, May 17, 2017 at 12:04 AM, amul sul <sula...@gmail.com> wrote:
>> On Tue, May 16, 2017 at 10:00 PM, Dilip Kumar <dilipbal...@gmail.com> wrote:
>>> On Tue, May 16, 2017 at 4:22 PM, amul sul <sula...@gmail.com> wrote:
>>>> v6 patch has bug in partition oid mapping and indexing, fixed in the
>>>> attached version.
>>>>
>>>> Now partition oids will be arranged in the ascending order of hash
>>>> partition bound  (i.e. modulus and remainder sorting order)
>>>
>>> Thanks for the update patch. I have some more comments.
>>>
>>> 
>>> + if (spec->remainder < 0)
>>> + ereport(ERROR,
>>> + (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
>>> +  errmsg("hash partition remainder must be less than modulus")));
>>>
>>> I think this error message is not correct, you might want to change it
>>> to "hash partition remainder must be non-negative integer"
>>>
>>
>> Fixed in the attached version;  used "hash partition remainder must be
>> greater than or equal to 0" instead.
>
> I would suggest "non-zero positive", since that's what we are using in
> the documentation.
>

Understood, fixed in the attached version.

>>
>>> ---
>>>
>>> + The table is partitioned by specifying remainder and modulus for 
>>> each
>>> + partition. Each partition holds rows for which the hash value of
>>>
>>> Wouldn't it be better to say "modulus and remainder" instead of
>>> "remainder and modulus" then it will be consistent?
>>>
>>
>> You are correct, fixed in the attached version.
>>
>>> ---
>>> +   An UPDATE that causes a row to move from one partition 
>>> to
>>> +   another fails, because
>>>
>>> fails, because -> fails because
>>>
>>
>> This hunk is no longer exists in the attached patch, that was mistaken
>> copied, sorry about that.
>>
>>> ---
>>>
>>> Wouldn't it be a good idea to document how to increase the number of
>>> hash partitions, I think we can document it somewhere with an example,
>>> something like Robert explained upthread?
>>>
>>> create table foo (a integer, b text) partition by hash (a);
>>> create table foo1 partition of foo with (modulus 2, remainder 0);
>>> create table foo2 partition of foo with (modulus 2, remainder 1);
>>>
>>> You can detach foo1, create two new partitions with modulus 4 and
>>> remainders 0 and 2, and move the data over from the old partition
>>>
>>> I think it will be good information for a user to have? or it's
>>> already documented and I missed it?
>>>
>
> This is already part of documentation contained in the patch.
>
> Here are some more comments
> @@ -3296,6 +3311,14 @@ ALTER TABLE measurement ATTACH PARTITION
> measurement_y2008m02
> not the partitioned table.
>
>   
> +
> + 
> +  
> +   An UPDATE that causes a row to move from one partition to
> +   another fails, because the new value of the row fails to satisfy the
> +   implicit partition constraint of the original partition.
> +  
> + 
>  
>  
>  
> The description in this chunk is applicable to all the kinds of partitioning.
> Why should it be part of a patch implementing hash partitioning?
>

This was already addressed in the previous patch(v8).

> +Declarative partitioning only supports hash, list and range
> +partitioning, whereas table inheritance allows data to be
> +divided in a manner of the user's choosing.  (Note, however,
> +that if constraint exclusion is unable to prune partitions
> +effectively, query performance will be very poor.)
> Looks like the line width is less than 80 characters.
>

Fixed in the attached version.

> In partition_bounds_equal(), please add comments explaining why is it safe to
> check just the indexes? May be we should add code under assertion to make sure
> that the datums are equal as well.

Added assert in the attached version.

> The comment could be something
> like, "If two partitioned tables have different greatest moduli, their
> partition schemes don't match. If they have same greatest moduli, and
> all remainders have different indexes, they all have same modulus
> specified and the partitions

Re: [HACKERS] [POC] hash partitioning

2017-05-16 Thread amul sul
On Tue, May 16, 2017 at 10:00 PM, Dilip Kumar <dilipbal...@gmail.com> wrote:
> On Tue, May 16, 2017 at 4:22 PM, amul sul <sula...@gmail.com> wrote:
>> v6 patch has bug in partition oid mapping and indexing, fixed in the
>> attached version.
>>
>> Now partition oids will be arranged in the ascending order of hash
>> partition bound  (i.e. modulus and remainder sorting order)
>
> Thanks for the update patch. I have some more comments.
>
> 
> + if (spec->remainder < 0)
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
> +  errmsg("hash partition remainder must be less than modulus")));
>
> I think this error message is not correct, you might want to change it
> to "hash partition remainder must be non-negative integer"
>

Fixed in the attached version;  used "hash partition remainder must be
greater than or equal to 0" instead.

> ---
>
> + The table is partitioned by specifying remainder and modulus for 
> each
> + partition. Each partition holds rows for which the hash value of
>
> Wouldn't it be better to say "modulus and remainder" instead of
> "remainder and modulus" then it will be consistent?
>

You are correct, fixed in the attached version.

> ---
> +   An UPDATE that causes a row to move from one partition to
> +   another fails, because
>
> fails, because -> fails because
>

This hunk no longer exists in the attached patch; it was mistakenly
copied, sorry about that.

> ---
>
> Wouldn't it be a good idea to document how to increase the number of
> hash partitions, I think we can document it somewhere with an example,
> something like Robert explained upthread?
>
> create table foo (a integer, b text) partition by hash (a);
> create table foo1 partition of foo with (modulus 2, remainder 0);
> create table foo2 partition of foo with (modulus 2, remainder 1);
>
> You can detach foo1, create two new partitions with modulus 4 and
> remainders 0 and 2, and move the data over from the old partition
>
> I think it will be good information for a user to have? or it's
> already documented and I missed it?
>

I think we should, but I am not sure about it.

Regards,
Amul


0001-Cleanup_v2.patch
Description: Binary data


0002-hash-partitioning_another_design-v8.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-05-16 Thread amul sul
On Tue, May 16, 2017 at 3:30 PM, amul sul <sula...@gmail.com> wrote:
> On Tue, May 16, 2017 at 1:02 PM, Ashutosh Bapat
> <ashutosh.ba...@enterprisedb.com> wrote:
>  [...]
>>>>
>>>> +if (key->strategy == PARTITION_STRATEGY_HASH)
>>>> +{
>>>> +ndatums = nparts;
>>>> +hbounds = (PartitionHashBound **) palloc(nparts *
>>>> +
>>>> sizeof(PartitionHashBound *));
>>>> +i = 0;
>>>> +foreach (cell, boundspecs)
>>>> +{
>>>> +PartitionBoundSpec *spec = lfirst(cell);
>>>> +
>>>> [ clipped ]
>>>> +hbounds[i]->index = i;
>>>> +i++;
>>>> +}
>>>> For list and range partitioned table we order the bounds so that two
>>>> partitioned tables have them in the same order irrespective of order in 
>>>> which
>>>> they are specified by the user or hence stored in the catalogs. The 
>>>> partitions
>>>> then get indexes according the order in which their bounds appear in 
>>>> ordered
>>>> arrays of bounds. Thus any two partitioned tables with same partition
>>>> specification always have same PartitionBoundInfoData. This helps in
>>>> partition-wise join to match partition bounds of two given tables.  Above 
>>>> code
>>>> assigns the indexes to the partitions as they appear in the catalogs. This
>>>> means that two partitioned tables with same partition specification but
>>>> different order for partition bound specification will have different
>>>> PartitionBoundInfoData represenation.
>>>>
>>>> If we do that, probably partition_bounds_equal() would reduce to just 
>>>> matching
>>>> indexes and the last element of datums array i.e. the greatest modulus 
>>>> datum.
>>>> If ordered datums array of two partitioned table do not match exactly, the
>>>> mismatch can be because missing datums or different datums. If it's a 
>>>> missing
>>>> datum it will change the greatest modulus or have corresponding entry in
>>>> indexes array as -1. If the entry differs it will cause mismatching 
>>>> indexes in
>>>> the index arrays.
>>>>
>>> Make sense, will fix this.
>>
>> I don't see this being addressed in the patches attached in the reply to 
>> Dilip.
>>
>
> Fixed in the attached version.
>

The v6 patch has a bug in partition OID mapping and indexing; it is fixed in
the attached version.

Now partition OIDs will be arranged in the ascending order of the hash
partition bounds (i.e. modulus and remainder sorting order).

Regards,
Amul


0001-Cleanup_v2.patch
Description: Binary data


0002-hash-partitioning_another_design-v7.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-05-16 Thread amul sul
On Tue, May 16, 2017 at 1:17 PM, Ashutosh Bapat
 wrote:
> Hi,
> Here's patch with some cosmetic fixes to 0002, to be applied on top of 0002.
>

Thank you, included in v6 patch.

Regards,
Amul


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-05-16 Thread amul sul
On Tue, May 16, 2017 at 1:02 PM, Ashutosh Bapat
 wrote:
 [...]
>>>
>>> +if (key->strategy == PARTITION_STRATEGY_HASH)
>>> +{
>>> +ndatums = nparts;
>>> +hbounds = (PartitionHashBound **) palloc(nparts *
>>> +
>>> sizeof(PartitionHashBound *));
>>> +i = 0;
>>> +foreach (cell, boundspecs)
>>> +{
>>> +PartitionBoundSpec *spec = lfirst(cell);
>>> +
>>> [ clipped ]
>>> +hbounds[i]->index = i;
>>> +i++;
>>> +}
>>> For list and range partitioned table we order the bounds so that two
>>> partitioned tables have them in the same order irrespective of order in 
>>> which
>>> they are specified by the user or hence stored in the catalogs. The 
>>> partitions
>>> then get indexes according the order in which their bounds appear in ordered
>>> arrays of bounds. Thus any two partitioned tables with same partition
>>> specification always have same PartitionBoundInfoData. This helps in
>>> partition-wise join to match partition bounds of two given tables.  Above 
>>> code
>>> assigns the indexes to the partitions as they appear in the catalogs. This
>>> means that two partitioned tables with same partition specification but
>>> different order for partition bound specification will have different
>>> PartitionBoundInfoData represenation.
>>>
>>> If we do that, probably partition_bounds_equal() would reduce to just 
>>> matching
>>> indexes and the last element of datums array i.e. the greatest modulus 
>>> datum.
>>> If ordered datums array of two partitioned table do not match exactly, the
>>> mismatch can be because missing datums or different datums. If it's a 
>>> missing
>>> datum it will change the greatest modulus or have corresponding entry in
>>> indexes array as -1. If the entry differs it will cause mismatching indexes 
>>> in
>>> the index arrays.
>>>
>> Make sense, will fix this.
>
> I don't see this being addressed in the patches attached in the reply to 
> Dilip.
>

Fixed in the attached version.

>>
>>>
>>> +if (key->partattrs[i] != 0)
>>> +{
>>> +keyCol = (Node *) makeVar(1,
>>> +  key->partattrs[i],
>>> +  key->parttypid[i],
>>> +  key->parttypmod[i],
>>> +  key->parttypcoll[i],
>>> +  0);
>>> +
>>> +/* Form hash_fn(value) expression */
>>> +keyCol = (Node *) makeFuncExpr(key->partsupfunc[i].fn_oid,
>>> +
>>> get_fn_expr_rettype(>partsupfunc[i]),
>>> +list_make1(keyCol),
>>> +InvalidOid,
>>> +InvalidOid,
>>> +COERCE_EXPLICIT_CALL);
>>> +}
>>> +else
>>> +{
>>> +keyCol = (Node *) copyObject(lfirst(partexprs_item));
>>> +partexprs_item = lnext(partexprs_item);
>>> +}
>>> I think we should add FuncExpr for column Vars as well as expressions.
>>>
>> Okay, will fix this.
>
> Here, please add a check similar to get_quals_for_range()
> 1840 if (partexprs_item == NULL)
> 1841 elog(ERROR, "wrong number of partition key expressions");
>
>

Fixed in the attached version.

>>
>>> I think we need more comments for compute_hash_value(), mix_hash_value() and
>>> satisfies_hash_partition() as to what each of them accepts and what it
>>> computes.
>>>
>>> +/* key's hash values start from third argument of function. */
>>> +if (!PG_ARGISNULL(i + 2))
>>> +{
>>> +values[i] = PG_GETARG_DATUM(i + 2);
>>> +isnull[i] = false;
>>> +}
>>> +else
>>> +isnull[i] = true;
>>> You could write this as
>>> isnull[i] = PG_ARGISNULL(i + 2);
>>> if (isnull[i])
>>> values[i] = PG_GETARG_DATUM(i + 2);
>>>
>> Okay.
>
> If we have used this technique somewhere else in PG code, please
> mention that function/place.
> /*
>  * Rotate hash left 1 bit before mixing in the next column.  This
>  * prevents equal values in different keys from cancelling each other.
>  */
>

Fixed in the attached version.

>
>>
>>> +foreach (lc, $5)
>>> +{
>>> +DefElem*opt = (DefElem *) lfirst(lc);
>>> A search on WITH in gram.y shows that we do not handle WITH options in 
>>> gram.y.
>>> Usually they are handled at the transformation stage. Why is this an 
>>> exception?
>>> If you do that, we can have all the error handling in
>>> transformPartitionBound().
>>>
>> If so, ForValues need to return list for hash and PartitionBoundSpec
>> for other two; wouldn't  that break code consistency? And such
>> validation is not new in gram.y see 

Re: [HACKERS] [POC] hash partitioning

2017-05-15 Thread amul sul
On Mon, May 15, 2017 at 9:13 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Mon, May 15, 2017 at 6:57 AM, amul sul <sula...@gmail.com> wrote:
>>> Collation is only relevant for ordering, not equality.  Since hash
>>> opclasses provide only equality, not ordering, it's not relevant here.
>>> I'm not sure whether we should error out if it's specified or just
>>> silently ignore it.  Maybe an ERROR is a good idea?  But not sure.
>>>
>> IMHO, we could simply have a WARNING, and ignore collation, thoughts?
>>
>> Updated patches attached.
>
> I think that WARNING is rarely a good compromise between ERROR and
> nothing.  I think we should just decide whether this is legal (and
> then allow it without a WARNING) or not legal (and then ERROR).
> Telling the user that it's allowed but we don't like it doesn't really
> help much.

Understood, will throw an ERROR instead.

Thank you.

Regards,
Amul


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-05-15 Thread amul sul
On Wed, May 10, 2017 at 10:13 PM, Robert Haas  wrote:
> On Wed, May 10, 2017 at 8:34 AM, Ashutosh Bapat
>  wrote:
>> Hash partitioning will partition the data based on the hash value of the
>> partition key. Does that require collation? Should we throw an error/warning 
>> if
>> collation is specified in PARTITION BY clause?
>
> Collation is only relevant for ordering, not equality.  Since hash
> opclasses provide only equality, not ordering, it's not relevant here.
> I'm not sure whether we should error out if it's specified or just
> silently ignore it.  Maybe an ERROR is a good idea?  But not sure.
>
IMHO, we could simply emit a WARNING and ignore the collation; thoughts?

Updated patches attached.

Regards,
Amul


0001-Cleanup_v2.patch
Description: Binary data


0002-hash-partitioning_another_design-v5.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-05-14 Thread amul sul
On Sat, May 13, 2017 at 12:11 PM, Dilip Kumar <dilipbal...@gmail.com> wrote:
> On Fri, May 12, 2017 at 6:08 PM, amul sul <sula...@gmail.com> wrote:
>> Hi,
>>
>> Please find the following updated patches attached:
>
> I have done some testing with the new patch, most of the cases worked
> as per the expectation except below
>
> I expect the planner to select only "Seq Scan on t1" whereas it's
> scanning both the partitions?
>
> create table t (a int, b varchar) partition by hash(a);
> create table t1 partition of t for values with (modulus 8, remainder 0);
> create table t2 partition of t for values with (modulus 8, remainder 1);
>
> postgres=# explain select * from t where a=8;
> QUERY PLAN
> --
>  Append  (cost=0.00..51.75 rows=12 width=36)
>->  Seq Scan on t1  (cost=0.00..25.88 rows=6 width=36)
>  Filter: (a = 8)
>->  Seq Scan on t2  (cost=0.00..25.88 rows=6 width=36)
>  Filter: (a = 8)
> (5 rows)
>
You are correct.  As of now, constraint exclusion doesn't work on a
partition constraint that involves a function call[1], and the hash
partition constraint does have a satisfies_hash_partition() function call.

>
> Some cosmetic comments.
> ---
> + RangeVar   *rv = makeRangeVarFromNameList(castNode(List, nameEl->arg));
> +
>
> Useless Hunk.
>
>  /*
> - * Build a CREATE SEQUENCE command to create the sequence object, and
> - * add it to the list of things to be done before this CREATE/ALTER
> - * TABLE.
> + * Build a CREATE SEQUENCE command to create the sequence object, and add
> + * it to the list of things to be done before this CREATE/ALTER TABLE.
>   */
>
> Seems like, in src/backend/parser/parse_utilcmd.c, you have changed
> the existing code with
> pgindent.  I think it's not a good idea to mix pgindent changes with your 
> patch.
>
Oops, my silly mistake, sorry about that. Fixed in attached version.

Regards,
Amul

1] 
https://www.postgresql.org/message-id/CA%2BTgmoaE9NZ_RiqZQLp2aJXPO4E78QxkQYL-FR2zCDop96Ahdg%40mail.gmail.com


0001-Cleanup_v2.patch
Description: Binary data


0002-hash-partitioning_another_design-v4.patch
Description: Binary data



Re: [HACKERS] [POC] hash partitioning

2017-05-14 Thread amul sul
On Fri, May 12, 2017 at 10:39 PM, Ashutosh Bapat
<ashutosh.ba...@enterprisedb.com> wrote:
> On Fri, May 12, 2017 at 6:08 PM, amul sul <sula...@gmail.com> wrote:
>> Hi,
>>
>> Please find the following updated patches attached:
>>
>> 0001-Cleanup.patch : Does some cleanup and code refactoring required
>> for hash partition patch. Otherwise, there will be unnecessary diff in
>> 0002 patch
>
> Thanks for splitting the patch.
>
> +if (isnull[0])
> +cur_index = partdesc->boundinfo->null_index;
> This code assumes that null_index will be set to -1 when has_null is false. 
> Per
> RelationBuildPartitionDesc() this is true. But probably we should write this
> code as
> if (isnull[0])
> {
> if (partdesc->boundinfo->has_null)
> cur_index = partdesc->boundinfo->null_index;
> }
> That way we are certain that when has_null is false, cur_index = -1 similar to
> the original code.
>
Okay, will add this.  I still don't understand the point of having the has_null
variable; if no null-accepting partition exists then null_index is
always set to -1 in RelationBuildPartitionDesc.  Anyway, let's not change
the original code.

> Additional arguement to ComputePartitionAttrs() isn't used anywhere in this
> patch, so may be this better be part of 0002. If we do this the only change
> that will remain in patch is the refactoring of RelationBuildPartitionDesc(),
> so we may consider merging it into 0002, unless we find that some more
> refactoring is needed. But for now, having it as a separate patch helps.
>
Okay.

> Here's some more comments on 0002
>
> + * In the case of hash partitioning, datums is a 2-D array, stores modulus 
> and
> + * remainder values at datums[x][0] and datums[x][1] respectively for each
> + * partition in the ascending order.
>
> This comment about datums should appear in a paragraph of itself and may be
> rephrased as in the attached patch. May be we could also add a line about
> ndatums for hash partitioned tables as in the attached patch.
>
Thanks, looks good to me; will include this.

[...]
>
> +if (key->strategy == PARTITION_STRATEGY_HASH)
> +{
> +ndatums = nparts;
> +hbounds = (PartitionHashBound **) palloc(nparts *
> +
> sizeof(PartitionHashBound *));
> +i = 0;
> +foreach (cell, boundspecs)
> +{
> +PartitionBoundSpec *spec = lfirst(cell);
> +
> [ clipped ]
> +hbounds[i]->index = i;
> +i++;
> +}
> For list and range partitioned table we order the bounds so that two
> partitioned tables have them in the same order irrespective of order in which
> they are specified by the user or hence stored in the catalogs. The partitions
> then get indexes according the order in which their bounds appear in ordered
> arrays of bounds. Thus any two partitioned tables with same partition
> specification always have same PartitionBoundInfoData. This helps in
> partition-wise join to match partition bounds of two given tables.  Above code
> assigns the indexes to the partitions as they appear in the catalogs. This
> means that two partitioned tables with same partition specification but
> different order for partition bound specification will have different
> PartitionBoundInfoData represenation.
>
> If we do that, probably partition_bounds_equal() would reduce to just matching
> indexes and the last element of datums array i.e. the greatest modulus datum.
> If ordered datums array of two partitioned table do not match exactly, the
> mismatch can be because missing datums or different datums. If it's a missing
> datum it will change the greatest modulus or have corresponding entry in
> indexes array as -1. If the entry differs it will cause mismatching indexes in
> the index arrays.
>
Makes sense, will fix this.

[...]
>
> +if (offset < 0)
> +{
> +nmod = DatumGetInt32(datums[0][0]);
> +valid_bound = (nmod % spec->modulus) == 0;
> +}
> +else
> +{
> +pmod = DatumGetInt32(datums[offset][0]);
> +valid_bound = (spec->modulus % pmod) == 0;
> +
> +if (valid_bound && (offset + 1) < ndatums)
> +{
> +nmod = DatumGetInt32(datums[offset + 1][0]);
> +valid_bound = (nmod % spec->modulus) == 0;
> +}
> +}
> May be name the variables as prev_mod(ulus) and next_mod

Re: [HACKERS] [POC] hash partitioning

2017-05-12 Thread amul sul
On Thu, May 11, 2017 at 9:32 PM, Dilip Kumar <dilipbal...@gmail.com> wrote:
> On Wed, May 3, 2017 at 6:39 PM, amul sul <sula...@gmail.com> wrote:
>> On Thu, Apr 27, 2017 at 1:42 AM, Robert Haas <robertmh...@gmail.com> wrote:
>>
>>>I spent some time today looking at these patches.  It seems like there
>>>is some more work still needed here to produce something committable
>>>regardless of which way we go, but I am inclined to think that Amul's
>>>patch is a better basis for work going forward than Nagata-san's
>>>patch. Here are some general comments on the two patches:
>>
>> Thanks for your time.
>>
>> [...]
>>
>>> - Neither patch contains any documentation updates, which is bad.
>>
>> Fixed in the attached version.
>
> I have done an intial review of the patch and I have some comments.  I
> will continue the review
> and testing and report the results soon
>
> -
> Patch need to be rebased
>
> 
>
> if (key->strategy == PARTITION_STRATEGY_RANGE)
> {
> /* Disallow nulls in the range partition key of the tuple */
> for (i = 0; i < key->partnatts; i++)
> if (isnull[i])
> ereport(ERROR,
> (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
> errmsg("range partition key of row contains null")));
> }
>
> We need to add PARTITION_STRATEGY_HASH as well, we don't support NULL
> for hash also, right?
> 
We do.

>
> RangeDatumContent **content;/* what's contained in each range bound datum?
>   * (see the above enum); NULL for list
>   * partitioned tables */
>
> This will be NULL for hash as well we need to change the comments.
> -
Fixed in previously posted patch(v3).

>
>   bool has_null; /* Is there a null-accepting partition? false
>   * for range partitioned tables */
>   int null_index; /* Index of the null-accepting partition; -1
>
> Comments needs to be changed for these two members as well
> 
Fixed in previously posted patch(v3).

>
> +/* One bound of a hash partition */
> +typedef struct PartitionHashBound
> +{
> + int modulus;
> + int remainder;
> + int index;
> +} PartitionHashBound;
>
> It will good to add some comments to explain the structure members
>
I think we don't really need that; the variable names are enough to explain
their purpose.

Regards,
Amul




Re: [HACKERS] [POC] hash partitioning

2017-05-12 Thread amul sul
On Wed, May 10, 2017 at 6:04 PM, Ashutosh Bapat
<ashutosh.ba...@enterprisedb.com> wrote:
> On Wed, May 3, 2017 at 6:39 PM, amul sul <sula...@gmail.com> wrote:
>> On Thu, Apr 27, 2017 at 1:42 AM, Robert Haas <robertmh...@gmail.com> wrote:
>>
>>>
>>> This is not yet a detailed review - I may be missing things, and
>>> review and commentary from others is welcome.  If there is no major
>>> disagreement with the idea of moving forward using Amul's patch as a
>>> base, then I will do a more detailed review of that patch (or,
>>> hopefully, an updated version that addresses the above comments).
>>
>
> I agree that Amul's approach makes dump/restore feasible whereas
> Nagata-san's approach makes that difficult. That is a major plus point
> about Amul's patch. Also, it makes it possible to implement
> Nagata-san's syntax, which is more user-friendly in future.
>
> Here are some review comments after my initial reading of Amul's patch:
>
> Hash partitioning will partition the data based on the hash value of the
> partition key. Does that require collation? Should we throw an error/warning 
> if
> collation is specified in PARTITION BY clause?
>
> +int   *indexes;/* Partition indexes; in case of hash
> + * partitioned table array length will be
> + * value of largest modulus, and for others
> + * one entry per member of the datums array
> + * (plus one if range partitioned table) */
> This may be rewritten as "Partition indexes: For hash partitioned table the
> number of indexes will be same as the largest modulus. For list partitioned
> table the number of indexes will be same as the number of datums. For range
> partitioned table the number of indexes will be number of datums plus one.".
> You may be able to reword it to a shorter version, but essentially we will 
> have
> separate description for each strategy.
>
Okay, will fix this.

> I guess, we need to change the comments for the other members too. For example
> "datums" does not contain tuples with key->partnatts attributes for hash
> partitions. It contains a tuple with two attributes, modulus and remainder. We
> may not want to track null_index separately since rows with NULL partition key
> will fit in the partition corresponding to the hash value of NULL. OR may be 
> we
> want to set null_index to partition which contains NULL values, if there is a
> partition created for corresponding remainder, modulus pair and set has_null
> accordingly. Accordingly we will need to update the comments.
>
> cal_hash_value() may be renamed as calc_has_value() or compute_hash_value()?
>
Okay, will rename to compute_hash_value().

> Should we change the if .. else if .. construct in 
> RelationBuildPartitionDesc()
> to a switch case? There's very less chance that we will support a fourth
> partitioning strategy, so if .. else if .. may be fine.
>
> +intmod = hbounds[i]->modulus,
> +place = hbounds[i]->remainder;
> Although there are places in the code where we separate variable declaration
> with same type by comma, most of the code declares each variable with the data
> type on separate line. Should variable "place" be renamed as "remainder" since
> that's what it is ultimately?
>
Okay, will rename "place" to "remainder".

> RelationBuildPartitionDesc() fills up mapping array but never uses it. In this

Agreed, the mapping array is not that useful, but it is not useless either; it
is required at the end of RelationBuildPartitionDesc() while assigning
OIDs to result->oids, see the for-loop just before the mapping memory is
released.

> code the index into mapping array itself is the mapping so it doesn't need to
> be maintained separately like list partiioning case. Similary next_index usage
> looks unnecessary, although that probably improves readability, so may be 
> fine.
>
Anyway, will remove uses of "next_index".

> + *   for p_p1: satisfies_hash_partition(2, 1, pkey, value)
> + *   for p_p2: satisfies_hash_partition(4, 2, pkey, value)
> + *   for p_p3: satisfies_hash_partition(8, 0, pkey, value)
> + *   for p_p4: satisfies_hash_partition(8, 4, pkey, value)
> What the function builds is satisfies_hash_partition(2, 1, pkey). I don't see
> code to add value as an argument to the function. Is that correct?
>
Sorry for the confusion, "pkey" & "value" are the columns of the table in the
given example.
Renamed those columns to "a" & "b".

> + 

Re: [HACKERS] Bug in pg_dump --table and --exclude-table for declarative partition table handling.

2017-05-09 Thread amul sul
On Tue, May 9, 2017 at 3:51 PM, Ashutosh Bapat
<ashutosh.ba...@enterprisedb.com> wrote:
> On Tue, May 9, 2017 at 2:59 PM, Amit Langote
> <langote_amit...@lab.ntt.co.jp> wrote:
>> Hi Amul,
>>
>> On 2017/05/09 16:13, amul sul wrote:
>>> Hi,
>>>
>>> Current pg_dump --exclude-table option excludes partitioned relation
>>> and dumps all its child relations and vice versa for --table option, which
>>> I think is incorrect.
>>
>> I agree that `pg_dump -t partitioned_table` should dump all of its
>> partitions too and that `pg_dump -T partitioned_table` should exclude
>> partitions.  Your patch achieves that.  Some comments:
>>
>> I think we should update doc/src/sgml/ref/pg_dump.sgml to mention this
>> behavior.
>>
>> +static void
>> +find_partition_by_relid(Archive *fout, Oid parentId, SimpleOidList *oids)
>>
>> Is expand_partitioned_table() a slightly better name?
>>
>>
>> +   appendPQExpBuffer(query, "SELECT relkind "
>> + "FROM pg_catalog.pg_class "
>> + "WHERE oid = %u", partid);
>> +
>> +   res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
>> +
>> +   if (PQntuples(res) == 0)
>> +   exit_horribly(NULL, "no matching partition tables 
>> were ");
>> +
>> +   relkind = PQgetvalue(res, 0, 0);
>> +
>> +   if (relkind[0] == RELKIND_PARTITIONED_TABLE)
>> +   find_partition_by_relid(fout, partid, oids);
>>
>> Instead of recursing like this requiring to send one query for every
>> partition, why not issue just one query such that all the partitions in
>> the inheritance tree are returned.  For example, as follows:
>>
>> +appendPQExpBuffer(query, "WITH RECURSIVE partitions (partoid) AS ("
>> + "   SELECT inhrelid"
>> + "   FROM pg_inherits"
>> + "   WHERE inhparent = %u"
>> + " UNION ALL"
>> + "   SELECT inhrelid"
>> + "   FROM pg_inherits, partitions"
>> + "   WHERE inhparent = partitions.partoid )"
>> + " SELECT partoid FROM partitions",
>> + parentId);
>>
>> I included your patch with the above modifications in the attached 0001
>> patch, which also contains the documentation updates.  Please take a look.
>
> I think this patch too has the same problem I described in my reply to
> Amul; it fires a query to server for every partitioned table it
> encounters, that's not very efficient.
>
Agreed with Ashutosh; if such information is already available then we need to
leverage it.

>>
>> When testing the patch, I found that if --table specifies the parent and
>> --exclude specifies one of its partitions, the latter won't be forcefully
>> be included due to the partitioned table expansion, which seems like an
>> acceptable behavior.
>
> unless the partition is default partition or a hash partition.
>
>> On the other hand, if --table specifies a partition
>> and --exclude specifies the partition's parent, the latter makes partition
>> to be excluded as well as part of the expansion, which seems fine too.
>>
>
> I am not sure if it's worth considering partitions and partitioned
> table as different entities as far as dump is concerned. We should
> probably dump the whole partitioned table or none of it. There's merit
> in dumping just a single partition to transfer that data to some other
> server, but I am not sure how much the feature would be used.
>
But it won't be useless.

> In order to avoid any such potential anomalies, we should copy dump
> flag for a partition from that of the parent as I have described in my
> reply to Amul.
>
>> One more thing I am wondering about is whether `pg_dump -t partition`
>> should be dumped as a partition definition (that is, as CREATE TABLE
>> PARTITION OF) which is what happens currently or as a regular table (just
>> CREATE TABLE)?  I'm thinking the latter would be better, but would like to
>> confirm before writing any code for that.
>
> If we go about dumping a partition without its parent table, we should
> dump CREATE TABLE with proper list of columns and constraints without
> PARTITION OF clause. But I am not sure whether we should go that
> route.

IMO, it's worth dumping CREATE TABLE without the PARTITION OF clause.

Regards,
Amul




[HACKERS] Bug in pg_dump --table and --exclude-table for declarative partition table handling.

2017-05-09 Thread amul sul
Hi,

Currently the pg_dump --exclude-table option excludes the partitioned relation
but still dumps all its child relations, and vice versa for the --table option,
which I think is incorrect.

In this case we might need to explore all partitions and exclude or include
them from the dump according to the given pg_dump option; attaching a WIP patch
proposing that fix (a minimal reproduction is sketched below).  Thoughts/Comments?
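
A minimal reproduction along these lines (table and database names are made up
for the example; the pg_dump invocations are shown as comments):

CREATE TABLE parent (a int) PARTITION BY RANGE (a);
CREATE TABLE child PARTITION OF parent FOR VALUES FROM (1) TO (100);
-- Expectation: excluding the parent should also exclude its partitions,
-- but with the current behaviour "child" is still dumped:
--   pg_dump --exclude-table=parent postgres
-- Likewise, dumping only the parent should pull its partitions in:
--   pg_dump --table=parent postgres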

Regards,
Amul


pg_dump_fix_for_partitioned_rel_wip.patch
Description: Binary data



Re: [HACKERS] Adding support for Default partition in partitioning

2017-05-04 Thread amul sul
On Tue, May 2, 2017 at 9:33 PM, Rahila Syed  wrote:
> Please find attached updated patch with review comments by Robert and Jeevan
> implemented.
>
Patch v8 applies cleanly on the latest head, but the server crashes on data
insert in the following test:

-- Create test table
CREATE TABLE test ( a int, b date) PARTITION BY LIST (a);
CREATE TABLE p1 PARTITION OF test FOR VALUES IN  (DEFAULT) PARTITION BY LIST(b);
CREATE TABLE p11 PARTITION OF p1 FOR VALUES IN (DEFAULT);

-- crash
INSERT INTO test VALUES (210,'1/1/2002');

Regards,
Amul




Re: [HACKERS] [POC] hash partitioning

2017-05-03 Thread amul sul
On Thu, Apr 27, 2017 at 1:42 AM, Robert Haas  wrote:

>I spent some time today looking at these patches.  It seems like there
>is some more work still needed here to produce something committable
>regardless of which way we go, but I am inclined to think that Amul's
>patch is a better basis for work going forward than Nagata-san's
>patch. Here are some general comments on the two patches:

Thanks for your time.

[...]

> - Neither patch contains any documentation updates, which is bad.

Fixed in the attached version.

>
> Nagata-san's patch also contains no regression tests.  Amul's patch
> does, but they need to be rebased, since they no longer apply, and I
> think some other improvements are possible as well.  It's probably not
> necessary to re-test things like whether temp and non-temp tables can
> be mixed within a partitioning hierarchy, but there should be tests
> that tuple routing actually works.  The case where it fails because no
> matching partition exists should be tested as well.  Also, the tests
> should validate not only that FOR VALUES isn't accept when creating a
> hash partition (which they do) but also that WITH (...) isn't accepted
> for a range or list partition (which they do not).
>

Fixed in the attached version.

[...]
> - Amul's patch should perhaps update tab completion support:  create
> table foo1 partition of foo  completes with "for values", but now
> "with" will be another option.
>

Fixed in the attached version.

>
> - Amul's patch probably needs to validate the WITH () clause more
> thoroughly.  I bet you get a not-very-great error message if you leave
> out "modulus" and no error at all if you leave out "remainder".
>

That's not true, there will be a syntax error if you leave out modulus or
remainder, see this:

postgres=# CREATE TABLE hpart_2 PARTITION OF hash_parted  WITH(modulus 4);
ERROR:  syntax error at or near ")"
LINE 1: ...hpart_2 PARTITION OF hash_parted WITH(modulus 4);

>
> This is not yet a detailed review - I may be missing things, and
> review and commentary from others is welcome.  If there is no major
> disagreement with the idea of moving forward using Amul's patch as a
> base, then I will do a more detailed review of that patch (or,
> hopefully, an updated version that addresses the above comments).
>

I have made a small change in the earlier proposed syntax to create a
partition, to align it with the current range and list partition syntax;
the new syntax will be as follows:

CREATE TABLE p1 PARTITION OF hash_parted FOR VALUES WITH (modulus 10,
remainder 1);

Regards,
Amul


hash-partitioning_another_design-v2.patch
Description: Binary data



Re: [HACKERS] [POC] hash partitioning

2017-03-03 Thread amul sul
On Fri, Mar 3, 2017 at 5:00 PM, Greg Stark <st...@mit.edu> wrote:

> On 2 March 2017 at 13:03, amul sul <sula...@gmail.com> wrote:
> > create table foo (a integer, b text) partition by hash (a);
> > create table foo1 partition of foo with (modulus 4, remainder 0);
> > create table foo2 partition of foo with (modulus 8, remainder 1);  --
> legal,
> > modulus doesn't need to match
> > create table foo3 partition of foo with (modulus 8, remainder 4);  --
> > illegal, overlaps foo1
>
>
> Instead of using modulus, why not just divide up the range of hash
> keys using ranges? That should be just as good for a good hash
> function (effectively using the high bits instead of the low bits of
> the hash value). And it would mean you could reuse the machinery for
> list partitioning for partition exclusion.
>
> It also has the advantage that it's easier to see how to add more
> partitions. You just split all the ranges (and migrate the
> data...). There's even the possibility of having uneven partitions if
> you have a data distribution skew -- which can happen even if you have
> a good hash function. In a degenerate case you could have a partition
> for a single hash of a particularly common value then a reasonable
> number of partitions for the remaining hash ranges.
>

Initially we had a somewhat similar thought: to make a range of hash values
for each partition, using the same half-open interval syntax we use in
general:

create table foo (a integer, b text) partition by hash (a);
create table foo1 partition of foo for values from (0) to (1073741824);
create table foo2 partition of foo for values from (1073741824) to
(-2147483648);
create table foo3 partition of foo for values from (-2147483648) to
(-1073741824);
create table foo4 partition of foo for values from (-1073741824) to (0);

That's really nice for the system, but not so much for the users. The system
can now generate each partition constraint correctly immediately upon seeing
the SQL statement for the corresponding table, which is very desirable.
However, users are not likely to know that the magic numbers to distribute
keys equally across four partitions are 1073741824, -2147483648, and
-1073741824.  So it's pretty user-unfriendly.

Regards,
Amul


Re: [HACKERS] [POC] hash partitioning

2017-03-02 Thread amul sul
On Wed, Mar 1, 2017 at 3:50 PM, Yugo Nagata  wrote:

> [...]
>
> > I Agree that it is unavoidable partitions number in modulo hashing,
> > but we can do in other hashing technique.  Have you had thought about
> > Linear hashing[1] or Consistent hashing[2]?  This will allow us to
> > add/drop partition with minimal row moment.
>
> Thank you for your information of hash technique. I'll see them
> and try to allowing the number of partitions to be changed.
>
Thanks for showing interest. I was also talking about this with Robert Haas
and hacking on this; here is what we came up with.

If we want to introduce hash partitioning without syntax contortions and with
minimal row movement while changing hash partitions (ADD-DROP/ATTACH-DETACH
operations), at first I thought we could pick linear hashing, because with it
we would only need to move approximately tot_num_of_tuples/tot_num_of_partitions
tuples when adding a new partition, and no row movement is required when
dropping a partition.

With further thinking and talking through the idea of using linear hashing
with my team, we realized that it has some problems, especially during pg_dump
and pg_upgrade. Both a regular pg_dump and the binary-upgrade version of
pg_dump which is used by pg_restore need to maintain the identity of the
partitions. We can't rely on things like OID order which may be unstable, or
name order which might not match the order in which partitions were added. So
somehow the partition position would need to be specified explicitly.

So later we came up with some syntax like this (just fyi, this doesn't add
any new keywords):

create table foo (a integer, b text) partition by hash (a);
create table foo1 partition of foo with (modulus 4, remainder 0);
create table foo2 partition of foo with (modulus 8, remainder 1);  --
legal, modulus doesn't need to match
create table foo3 partition of foo with (modulus 8, remainder 4);  --
illegal, overlaps foo1

Here we need to enforce a rule that every modulus must be a factor of the next
larger modulus. So, for example, if you have a bunch of partitions that all
have modulus 5, you can add a new partition with modulus 10 or a new partition
with modulus 15, but you cannot add both a partition with modulus 10 and a
partition with modulus 15, because 10 is not a factor of 15. However, you could
simultaneously use modulus 4, modulus 8, modulus 16, and modulus 32 if you
wished, because each modulus is a factor of the next larger one. You could
also use modulus 10, modulus 20, and modulus 60. But you could not use modulus
10, modulus 15, and modulus 60, because while both of the smaller moduli are
factors of 60, it is not true that each is a factor of the next.

Other advantages with this rule are:

1. Dropping (or detaching) and adding (or attaching) a partition can never
cause the rule to be violated.

2. We can easily build a tuple-routing data structure based on the largest
modulus.

For example: If the user has
partition 1 with (modulus 2, remainder 1),
partition 2 with (modulus 4, remainder 2),
partition 3 with (modulus 8, remainder 0) and
partition 4 with (modulus 8, remainder 4),

then we can build the following tuple routing array in the relcache:

== lookup table for hashvalue % 8 ==
0 => p3
1 => p1
2 => p2
3 => p1
4 => p4
5 => p1
6 => p2
7 => p1

3. It's also quite easy to test whether a proposed new partition overlaps with
any existing partition. Just build the mapping array and see if you ever end
up trying to assign a partition to a slot that's already been assigned to some
other partition (see the sketch below).
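
For illustration only, here is a small SQL sketch (not the patch code) that
rebuilds that routing table from the example's (modulus, remainder) pairs and
reports any overlapping slots:

WITH parts(name, modulus, remainder) AS (
    VALUES ('p1', 2, 1), ('p2', 4, 2), ('p3', 8, 0), ('p4', 8, 4)
)
SELECT s.slot, p.name                   -- routing table for hashvalue % 8
FROM generate_series(0, 7) AS s(slot)
LEFT JOIN parts p ON s.slot % p.modulus = p.remainder
ORDER BY s.slot;

WITH parts(name, modulus, remainder) AS (
    VALUES ('p1', 2, 1), ('p2', 4, 2), ('p3', 8, 0), ('p4', 8, 4)
)
SELECT s.slot, count(*) AS claimed_by   -- overlap check: no rows means valid
FROM generate_series(0, 7) AS s(slot)
JOIN parts p ON s.slot % p.modulus = p.remainder
GROUP BY s.slot
HAVING count(*) > 1;

With the example partitions the first query reproduces the lookup table above
and the second returns no rows; adding, say, ('p5', 4, 0) would make the
overlap check report slots 0 and 4.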

We can still work on the proposed syntax - and I am open to suggestions. One
more thought is to use FOR VALUES HAVING like:
CREATE TABLE foo1 PARTITION OF foo FOR VALUES HAVING (modulus 2, remainder 1);

But still more thoughts/inputs welcome here.

Attached patch implements former syntax, here is quick demonstration:

1.CREATE :
create table foo (a integer, b text) partition by hash (a);
create table foo1 partition of foo with (modulus 2, remainder 1);
create table foo2 partition of foo with (modulus 4, remainder 2);
create table foo3 partition of foo with (modulus 8, remainder 0);
create table foo4 partition of foo with (modulus 8, remainder 4);

2. Display parent table info:
postgres=# \d+ foo
Table "public.foo"
 Column |  Type   | Collation | Nullable | Default | Storage  | Stats
target | Description
+-+---+--+-+--+--+-
 a  | integer |   |  | | plain|
 |
 b  | text|   |  | | extended |
 |
Partition key: HASH (a)
Partitions: foo1 WITH (modulus 2, remainder 1),
foo2 WITH (modulus 4, remainder 2),
foo3 WITH (modulus 8, remainder 0),
foo4 WITH (modulus 8, remainder 4)

3. Display child table info:
postgres=# \d+ foo1
Table "public.foo1"
 Column |  Type 

Re: [HACKERS] [POC] hash partitioning

2017-02-28 Thread amul sul
On Tue, Feb 28, 2017 at 8:03 PM, Yugo Nagata  wrote:
> Hi all,
>
> Now we have a declarative partitioning, but hash partitioning is not
> implemented yet. Attached is a POC patch to add the hash partitioning
> feature. I know we will need more discussions about the syntax and other
> specifications before going ahead the project, but I think this runnable
> code might help to discuss what and how we implement this.
>

Great.

> * Description
>
> In this patch, the hash partitioning implementation is basically based
> on the list partitioning mechanism. However, partition bounds cannot be
> specified explicitly, but this is used internally as hash partition
> index, which is calculated when a partition is created or attached.
>
> The tentative syntax to create a partitioned table is as bellow;
>
>  CREATE TABLE h (i int) PARTITION BY HASH(i) PARTITIONS 3 USING hashint4;
>
> The number of partitions is specified by PARTITIONS, which is currently
> constant and cannot be changed, but I think this is needed to be changed
in
> some manner. A hash function is specified by USING. Maybe, specifying hash
> function may be ommitted, and in this case, a default hash function
> corresponding to key type will be used.
>
> A partition table can be create as bellow;
>
>  CREATE TABLE h1 PARTITION OF h;
>  CREATE TABLE h2 PARTITION OF h;
>  CREATE TABLE h3 PARTITION OF h;
>
> FOR VALUES clause cannot be used, and the partition bound is
> calclulated automatically as partition index of single integer value.
>
> When trying create partitions more than the number specified
> by PARTITIONS, it gets an error.
>
> postgres=# create table h4 partition of h;
> ERROR:  cannot create hash partition more than 3 for h
>
> An inserted record is stored in a partition whose index equals
> abs(hashfunc(key)) % <number of partitions>. In the above
> example, this is abs(hashint4(i))%3.
>
> postgres=# insert into h (select generate_series(0,20));
> INSERT 0 21
>
> postgres=# select *,tableoid::regclass from h;
>  i  | tableoid
> +--
>   0 | h1
>   1 | h1
>   2 | h1
>   4 | h1
>   8 | h1
>  10 | h1
>  11 | h1
>  14 | h1
>  15 | h1
>  17 | h1
>  20 | h1
>   5 | h2
>  12 | h2
>  13 | h2
>  16 | h2
>  19 | h2
>   3 | h3
>   6 | h3
>   7 | h3
>   9 | h3
>  18 | h3
> (21 rows)
>
> * Todo / discussions
>
> In this patch, we cannot change the number of partitions specified
> by PARTITIONS. If we can change this, the partitioning rule
> (<partition index> = abs(hashfunc(key)) % <number of partitions>)
> is also changed and then we need reallocating records between
> partitions.
>
> In this patch, user can specify a hash function USING. However,
> we migth need default hash functions which are useful and
> proper for hash partitioning.
>
IMHO, we should try to keep the create partition syntax simple and aligned
with the other partition strategies. For e.g.:
CREATE TABLE h (i int) PARTITION BY HASH(i);

I agree that a fixed number of partitions is unavoidable with modulo hashing,
but we can do better with another hashing technique.  Have you thought about
Linear hashing[1] or Consistent hashing[2]?  This would allow us to add/drop
a partition with minimal row movement (see the rough illustration below).
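
As a rough, standalone illustration of why plain modulo hashing forces a lot
of row movement when the partition count changes (the numbers are made up and
this is not the patch's code), compare the bucket assignment for 3 vs. 4
partitions:

-- how many of 10000 keys land in a different partition when going
-- from 3 modulo partitions to 4?
SELECT count(*) FILTER (WHERE i % 3 <> i % 4) AS moved,
       count(*) AS total
FROM generate_series(0, 9999) AS g(i);

Roughly three quarters of the keys move, which is why techniques that split
only one bucket (or hash range) at a time look attractive here.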

​+1 for the default hash function corresponding to partitioning key type.​

Regards,
Amul
​

[1] https://en.wikipedia.org/wiki/Linear_hashing
[2] https://en.wikipedia.org/wiki/Consistent_hashing


Re: [HACKERS] Bug in to_timestamp().

2017-02-15 Thread Amul Sul
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:tested, passed

Review of v7 patch: 
- Patch applies to the top of master HEAD cleanly & make check-world also fine.
- Tom Lane's review comments are fixed.



The new status of this patch is: Ready for Committer



Re: [HACKERS] Declarative partitioning - another take

2017-02-08 Thread amul sul
About 0001-Check-partition-strategy-in-ATExecDropNotNull.patch, the
following test is already covered in alter_table.sql @ Line # 1918;
instead of this, kindly add a test for list partitioning:

 77 +-- cannot drop NOT NULL constraint of a range partition key column
 78 +ALTER TABLE range_parted ALTER a DROP NOT NULL;
 79 +

Regards,
Amul




Re: [HACKERS] Declarative partitioning - another take

2017-02-08 Thread amul sul
Hi Amit,

Regarding the following code in the ATExecDropNotNull function, I don't see
any special check for RANGE partitioned tables; is it intended to have the
same restriction for LIST partitioned tables too? I guess not.

  /*
 * If the table is a range partitioned table, check that the column is not
 * in the partition key.
 */
if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
{
PartitionKey key = RelationGetPartitionKey(rel);
int partnatts = get_partition_natts(key),
i;

for (i = 0; i < partnatts; i++)
{
AttrNumber  partattnum = get_partition_col_attnum(key, i);

if (partattnum == attnum)
ereport(ERROR,
(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
 errmsg("column \"%s\" is in range partition key",
colName)));
}
}

Regards,
Amul




Re: [HACKERS] Constraint exclusion failed to prune partition in case of partition expression involves function call

2017-02-04 Thread amul sul
I see, thanks Amit.

Regards,
Amul

Sent from a mobile device. Please excuse brevity and typos.


[HACKERS] Constraint exclusion failed to prune partition in case of partition expression involves function call

2017-02-02 Thread amul sul
Hi,

In the following case, constraint exclusion is not able to prune partitions
(even if the function is immutable); is this known behaviour?

--CASE 1 :  create table & insert data
create table foo_list (a integer, b text) partition by list (abs(a));
create table foo_list1 partition of foo_list for values in (0);
create table foo_list2 partition of foo_list for values in (1);
create table foo_list3 partition of foo_list for values in (2);
create table foo_list4 partition of foo_list for values in (3);
insert into foo_list values(0),(1),(-1),(2),(-2),(3),(-3);

--Explain plan
postgres=# explain select * from foo_list where a = 2;
   QUERY PLAN
-
 Append  (cost=0.00..103.50 rows=25 width=36)
   ->  Seq Scan on foo_list  (cost=0.00..0.00 rows=1 width=36)
 Filter: (a = 2)
   ->  Seq Scan on foo_list1  (cost=0.00..25.88 rows=6 width=36)
 Filter: (a = 2)
   ->  Seq Scan on foo_list2  (cost=0.00..25.88 rows=6 width=36)
 Filter: (a = 2)
   ->  Seq Scan on foo_list3  (cost=0.00..25.88 rows=6 width=36)
 Filter: (a = 2)
   ->  Seq Scan on foo_list4  (cost=0.00..25.88 rows=6 width=36)
 Filter: (a = 2)
(11 rows)

AFAIU, constraint exclusion should prune all the above tables other than
foo_list3, as happens in the following case:

-- CASE 2: create table & insert data
create table bar_list (a integer, b text) partition by list (a);
create table bar_list1 partition of bar_list for values in (0);
create table bar_list2 partition of bar_list for values in (1);
create table bar_list3 partition of bar_list for values in (2);
create table bar_list4 partition of bar_list for values in (3);
insert into bar_list values(0),(1),(2),(3);

--- Explain plan
postgres=# explain select * from bar_list where a = 2;
   QUERY PLAN
-
 Append  (cost=0.00..25.88 rows=7 width=36)
   ->  Seq Scan on bar_list  (cost=0.00..0.00 rows=1 width=36)
 Filter: (a = 2)
   ->  Seq Scan on bar_list3  (cost=0.00..25.88 rows=6 width=36)
 Filter: (a = 2)
(5 rows)

Thanks & Regards,
Amul




Re: [HACKERS] pg_background contrib module proposal

2017-01-12 Thread amul sul
On Tue, Jan 10, 2017 at 7:09 AM, Jim Nasby <jim.na...@bluetreble.com> wrote:
> On 1/9/17 7:22 AM, amul sul wrote:
>>
>> On Sun, Jan 8, 2017 at 8:50 AM, Jim Nasby <jim.na...@bluetreble.com>
>> wrote:
>>>
>>>
>> [skipped...]
>>>
>>>
>>> Oh, hmm. So I guess if you do that when the background process is idle
>>> it's
>>> the same as a close?
>>>
>>> I think we need some way to safeguard against accidental forkbombs for
>>> cases
>>> where users aren't intending to leave something running in the
>>> background.
>>> There's other reasons to use this besides spawning long running
>>> processes,
>>> and I'd certainly want to be able to ensure the calling function wasn't
>>> accidentally leaving things running that it didn't mean to. (Maybe the
>>> patch
>>> already does this...)
>>>
>>
>> Current pg_background patch built to the top of BackgroundSession code
>> take care of that;
>> user need to call pg_background_close() to gracefully close previously
>> forked background
>> worker.  Even though if user session who forked this worker exited
>> without calling
>> pg_background_close(), this background worked force to exit with following
>> log:
>>
>> ERROR:  could not read from message queue: Other process has detached
>> queue
>> LOG:  could not send on message queue: Other process has detached queue
>> LOG:  worker process: background session by PID 61242 (PID 61358)
>> exited with exit code 1
>>
>> Does this make sense to you?
>
>
> I'm specifically thinking of autonomous transactions, where you do NOT want
> to run the command in the background, but you do want to run it in another
> connection.
>
> The other example is running a command in another database. Not the same as
> the autonomous transaction use case, but once again you likely don't want
> that command running in the background.
>
> Put another way, when you launch a new process in a shell script, you have
> to do something specific for that process to run in the background.
>
> My concern here is that AIUI the only way to launch a bgworker is with it
> already backgrounded. That makes it much more likely that a bug in your code
> produces an unintended fork-bomb (connection-bomb?). That would be
> especially true if the function was re-entrant.
>
> Since one of the major points of bgworkers is exactly to run stuff in the
> background, while the parent connection is doing something else, I can
> certainly understand the default case being to background something, but I
> think it'd be good to have a simple way to start a bgworker, have it do
> something, and wait until it's done.
>
> Make sense?
>
I am not sure that I've understood this completely, but note that in the
current pg_background design we simply launch the worker once and feed it
subsequent queries using the pg_background_run() API.

Regards,
Amul




Re: [HACKERS] pg_background contrib module proposal

2017-01-09 Thread amul sul
On Sun, Jan 8, 2017 at 8:50 AM, Jim Nasby  wrote:
>
[skipped...]
>
> Oh, hmm. So I guess if you do that when the background process is idle it's
> the same as a close?
>
> I think we need some way to safeguard against accidental forkbombs for cases
> where users aren't intending to leave something running in the background.
> There's other reasons to use this besides spawning long running processes,
> and I'd certainly want to be able to ensure the calling function wasn't
> accidentally leaving things running that it didn't mean to. (Maybe the patch
> already does this...)
>

The current pg_background patch, built on top of the BackgroundSession code,
takes care of that; the user needs to call pg_background_close() to gracefully
close a previously forked background worker.  Even if the user session that
forked this worker exits without calling pg_background_close(), the background
worker is forced to exit with the following log:

ERROR:  could not read from message queue: Other process has detached queue
LOG:  could not send on message queue: Other process has detached queue
LOG:  worker process: background session by PID 61242 (PID 61358)
exited with exit code 1

Does this make sense to you?


Regards,
Amul




Re: [HACKERS] Declarative partitioning - another take

2017-01-09 Thread amul sul
Hi,

I got a server crash due to an assert failure when ATTACHing an overlapping
range partition; here is a test case to reproduce this:

CREATE TABLE test_parent(a int) PARTITION BY RANGE (a);
CREATE TABLE test_parent_part2 PARTITION OF test_parent FOR VALUES
FROM(100) TO(200);
CREATE TABLE test_parent_part1(a int NOT NULL);
ALTER TABLE test_parent ATTACH PARTITION test_parent_part1 FOR VALUES
FROM(1) TO(200);

I think, this bug exists in the following code of check_new_partition_bound():

 767 if (equal || off1 != off2)
 768 {
 769 overlap = true;
 770 with = boundinfo->indexes[off2 + 1];
 771 }

When equal is true, the array index should not be 'off2 + 1'.

While reading the code related to this, I wondered why
partition_bound_bsearch does not return immediately when cmpval == 0?

Apologies if this has already been reported.

Regards,
Amul




Re: [HACKERS] pg_background contrib module proposal

2017-01-07 Thread amul sul
On Sat, Jan 7, 2017 at 2:06 PM, Andrew Borodin <boro...@octonica.com> wrote:
> Hi!
>
> 2017-01-07 11:44 GMT+05:00 amul sul <sula...@gmail.com>:
>
>> Changes:
>> 1. pg_background_launch renamed to pg_background_start
>> 2. pg_background_detach renamed to pg_background_close
>> 3. Added new api to display previously launch background worker & its
>> result count waiting to be read.
> Great work!
>
Thanks.
>
>> Note that attached patches need to apply to the top of the Peter
>> Eisentraut's v2 patch[1].
> I've looked a bit into that patch. I thought you would switch
> MemoryContext before calling BackgroundSessionStart() from
> pg_background? Have I got something wrong?
>
Hmm, understood your point; let the pg_background extension do the
required context setup. The attached version of the patch does that change.
Thanks.

Regards,
Amul


0001-Changes-to-Peter-Eisentraut-s-bgsession-v2-patch.patch
Description: Binary data


0002-pg_background-as-client-of-BackgroundSession-v2.patch
Description: Binary data



Re: [HACKERS] pg_background contrib module proposal

2017-01-06 Thread amul sul
Hi all,

Attaching the latest pg_background patch for review as per the design proposed
on 22 Dec '16, with the following minor changes in the API.

Changes:
1. pg_background_launch renamed to pg_background_start
2. pg_background_detach renamed to pg_background_close
3. Added a new API to display previously launched background workers & the
number of results waiting to be read.

Todo:
1. Documentation.

Thanks to Andrew Borodin & David Fetter for the regression test script,
added in the attached version of the patch.

Note that the attached patches need to be applied on top of Peter
Eisentraut's v2 patch[1].
Patch 0001 does some changes in the BackgroundSession code required by
pg_background, which we are currently discussing on the BackgroundSession
thread.

References:
[1] 
https://www.postgresql.org/message-id/attachment/48403/v2-0001-Add-background-sessions.patch

Regards,
Amul


0001-Changes-to-Peter-Eisentraut-s-bgsession-v2-patch.patch
Description: Binary data


0002-pg_background-as-client-of-BackgroundSession-v1.patch
Description: Binary data



Re: [HACKERS] background sessions

2017-01-04 Thread amul sul
On Wed, Jan 4, 2017 at 2:57 PM, Andrew Borodin <boro...@octonica.com> wrote:
> 2017-01-04 10:23 GMT+05:00 amul sul <sula...@gmail.com>:
>> One more query, can we modify
>> BackgroundSessionStart()/BackgroundSession struct to get background
>> worker PID as well?
> I think since session always has a PID it's absoultley reasonable to return 
> PID.
>
>> I can understand this requirement could be sound useless for now,
>> because it only for the benefit of pg_background contrib module only.
> As far as i can unserstand BackgroundSession is not just a feature
> itself, it's the API. So PID would benefit to pg_background and all
> API use cases we didn't implement yet. I do not think that one PID in
> structure will waste huge amount of memory, cycles, dev time,
> readbility of docs, clearness of API etc. AFAIK the only reason may be
> if the PID is not always there.
>

+1, but to make the BackgroundSession members accessible outside of
bgsession.c, we might need to move the BackgroundSession definition to
bgsession.h.

Regards,
Amul Sul




Re: [HACKERS] background sessions

2017-01-03 Thread amul sul
On Tue, Jan 3, 2017 at 11:36 PM, Andrew Borodin <boro...@octonica.com> wrote:
> 2017-01-03 19:39 GMT+05:00 Peter Eisentraut
> <peter.eisentr...@2ndquadrant.com>:
>>
>> On 1/3/17 1:26 AM, amul sul wrote:
>> > One more requirement for pg_background is session, command_qh,
>> > response_qh and worker_handle should be last longer than current
>> > memory context, for that we might need to allocate these in
>> > TopMemoryContext.  Please find attach patch does the same change in
>> > BackgroundSessionStart().
>>
>> I had pondered this issue extensively.  The standard coding convention
>> in postgres is that the caller sets the memory context.  See the dblink
>> and plpython patches that make this happen in their own way.
>>
>> I agree it would make sense that you either pass in a memory context or
>> always use TopMemoryContext.  I'm open to these ideas, but they did not
>> seem to match any existing usage.
>
> +1
> Please excuse me if I'm not getting something obvious, but seems like
> BackgroundSessionNew() caller from pg_background_launch() can pick up
> TopMemoryCtx. I think that context setup should be done by extension, not by
> bg_session API.
>

Agreed, will do these changes for pg_background.

One more query: can we modify the
BackgroundSessionStart()/BackgroundSession struct to expose the background
worker PID as well?
I am asking because BackgroundSessionNew() only returns the session pointer,
but pg_background_launch() needs this PID to pass to the user, where it will
further be used as a session identifier when fetching results and/or
executing further queries, as well as to send a query cancellation signal to
the background worker.

I understand this requirement could sound useless for now, because it is only
for the benefit of the pg_background contrib module.
Thoughts?

Thanks & Regards,
Amul




Re: [HACKERS] background sessions

2017-01-02 Thread amul sul
On Fri, Dec 30, 2016 at 3:48 AM, Peter Eisentraut
 wrote:
> On 12/16/16 10:38 AM, Andrew Borodin wrote:
>> 2016-12-16 20:17 GMT+05:00 Peter Eisentraut 
>> :
 And one more thing... Can we have BackgroundSessionExecute() splitted
 into two parts: start query and wait for results?
 It would allow pg_background to reuse bgsession's code.
>>>
>>> Yes, I will look into that.
>>
>> Thank you. I'm marking both patches as "Waiting for author", keeping
>> in mind that pg_background is waiting for bgsessions.
>> After updates I'll review these patches.
>
> New patch, mainly with the function split as you requested above, not
> much else changed.
>

Thanks for your v2 patch, this is really helpful.

One more requirement for pg_background is that session, command_qh,
response_qh and worker_handle should last longer than the current
memory context; for that we might need to allocate these in
TopMemoryContext.  Please find attached a patch that does this change in
BackgroundSessionStart().

Do let me know if you have any other thoughts/suggestions, thank you.

Regards,
Amul


BackgroundSessionStart.patch
Description: Binary data



Re: [HACKERS] pg_background contrib module proposal

2016-12-22 Thread amul sul
Hi all,

As we have discussed previously, we need to rework this patch as a client of
Peter Eisentraut's background sessions code[1].

Attaching a trial version patch to discuss the possible design and API.
We could have the following APIs:

• pg_background_launch : This function starts and stores a new background
session, and returns the process id of the background worker.

• pg_background_run : This API takes the process id and an SQL command as
input parameters. Using this process id, the stored worker's session is
retrieved and the given SQL command is executed under it.

• pg_background_result : This API takes the process id as input
parameter and returns the result of the command executed through the
background worker session.  Same as it was before, but now results can
be fetched in LIFO order, i.e. the result of the last query executed using
pg_background_run will be fetched first.

• pg_background_detach : This API takes the process id and detaches the
background process. The stored worker's session is not dropped until this
is called.

• TBC : API to discard the result of the last query, or discard them altogether?

• TBC : How about having one more API to see all existing sessions?


Kindly share your thoughts/suggestions.  Note that the attached patch is a WIP
version; code, comments & behaviour could be vague.

--
Quick demo:
--
Apply attach patch to the top of Peter Eisentraut's
0001-Add-background-sessions.patch[1]

postgres=# select pg_background_launch();
 pg_background_launch
--
21004
(1 row)

postgres=# select pg_background_run(21004, 'vacuum verbose foo');
 pg_background_run
---

(1 row)

postgres=# select * from pg_background_result(21004) as (x text);
INFO:  vacuuming "public.foo"
INFO:  "foo": found 0 removable, 5 nonremovable row versions in 1 out of 1 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
   x

 VACUUM
(1 row)

postgres=# select pg_background_run(21004, 'select * from foo');
 pg_background_run
---

(1 row)

postgres=# select * from pg_background_result(21004) as (x int);
 x
---
 1
 2
 3
 4
 5
(5 rows)

postgres=# select pg_background_detach(21004);
 pg_background_detach
--

(1 row)


References :
[1] 
https://www.postgresql.org/message-id/e1c2d331-ee6a-432d-e9f5-dcf85cffaf29%402ndquadrant.com.


Regards,
Amul Sul


0002-pg_background_worker_as_client_of_bgsession_trial.patch
Description: Binary data



Re: [HACKERS] pg_background contrib module proposal

2016-12-19 Thread amul sul
On Tue, Dec 20, 2016 at 12:21 AM, David Fetter <da...@fetter.org> wrote:
> On Thu, Nov 24, 2016 at 09:16:53AM +0530, amul sul wrote:
>> Hi All,
>>
>> I would like to take over pg_background patch and repost for
>> discussion and review.
>
> This looks great.
>
> Sadly, it now breaks initdb when applied atop
> dd728826c538f000220af98de025c00114ad0631 with:
>
> performing post-bootstrap initialization ... TRAP: 
> FailedAssertion("!(const Node*)(rinfo))->type) == T_RestrictInfo))", 
> File: "costsize.c", Line: 660)
> sh: line 1:  2840 Aborted (core dumped) 
> "/home/shackle/pggit/postgresql/tmp_install/home/shackle/10/bin/postgres" 
> --single -F -O -j -c search_path=pg_catalog -c exit_on_error=true template1 > 
> /dev/null
>

I've compiled the binary with the --enable-cassert flag and the pg_background
patch on top of the described commit as well as on latest HEAD, but I
am not able to reproduce this crash, see this:

[VM postgresql]$ /home/amul/Public/pg_inst/pg-master/bin/postgres
--single -F -O -j -c search_path=pg_catalog -c exit_on_error=true
template1

PostgreSQL stand-alone backend 10devel
backend> CREATE EXTENSION pg_background;

backend> select * from pg_extension;

1: extname (typeid = 19, len = 64, typmod = -1, byval = f)
2: extowner (typeid = 26, len = 4, typmod = -1, byval = t)
3: extnamespace (typeid = 26, len = 4, typmod = -1, byval = t)
4: extrelocatable (typeid = 16, len = 1, typmod = -1, byval = t)
5: extversion (typeid = 25, len = -1, typmod = -1, byval = f)
6: extconfig (typeid = 1028, len = -1, typmod = -1, byval = f)
7: extcondition (typeid = 1009, len = -1, typmod = -1, byval = f)

1: extname = "plpgsql" (typeid = 19, len = 64, typmod = -1, byval = f)
2: extowner = "10" (typeid = 26, len = 4, typmod = -1, byval = t)
3: extnamespace = "11" (typeid = 26, len = 4, typmod = -1, byval = t)
4: extrelocatable = "f" (typeid = 16, len = 1, typmod = -1, byval = t)
5: extversion = "1.0" (typeid = 25, len = -1, typmod = -1, byval = f)

1: extname = "pg_background" (typeid = 19, len = 64, typmod = -1, byval = f)
2: extowner = "10" (typeid = 26, len = 4, typmod = -1, byval = t)
3: extnamespace = "11" (typeid = 26, len = 4, typmod = -1, byval = t)
4: extrelocatable = "t" (typeid = 16, len = 1, typmod = -1, byval = t)
5: extversion = "1.0" (typeid = 25, len = -1, typmod = -1, byval = f)

backend>

I hope I am not missing anything. Thanks!

Regards,
Amul




Re: [HACKERS] background sessions

2016-12-14 Thread amul sul
On Thu, Dec 15, 2016 at 12:24 PM, Andrew Borodin  wrote:
> 2016-12-15 0:30 GMT+05:00 Peter Eisentraut :
> TryBeginSession()?

 What exactly would that do?
>>> Return status (success\failure) and session object, if a function succeeded.
>>>
>>> If there is max_connections exceeded, then (false,null).
>>>
>>> I'm not sure whether this idiom is common for Python.
>>
>> You can catch PostgreSQL exceptions in PL/Python, so this can be handled
>> in user code.
>>
>> Some better connection management or pooling can probably be built on
>> top of the primitives later, I'd say.
>
> Agree, doing this in Python is the better option.
>
> And one more thing... Can we have BackgroundSessionExecute() splitted
> into two parts: start query and wait for results?
> It would allow pg_background to reuse bgsession's code.
>
+1

Regards,
Amul




Re: [HACKERS] pg_background contrib module proposal

2016-12-13 Thread amul sul
On Tue, Dec 13, 2016 at 2:05 PM, Andrew Borodin <boro...@octonica.com> wrote:
> 2016-12-13 12:55 GMT+05:00 amul sul <sula...@gmail.com>:
>> I think background-session code is not that much deviated from
>> pg_background code,
> It is not derived, though it is not much deviated. background sessions
> code do not have detailed exceptions and code comments, but it is
> doing somewhat similar things.
>>IIUC background-session patch we can manage to
>> reuse it, as suggested by Robert.  This will allow us to maintain
>> session in long run, we could reuse this session to run subsequent
>> queries instead of maintaining separate worker pool.  Thoughts?
>
> One API to deal with both features would be much better, sure.
> "Object" like sessions pool are much easier to implement on top of
> session "object" then on top of worker process, PIDs etc.
>
>>> 4. Query as a future (actually it is implemented already by
>>> pg_background_result)
>>> 5. Promised result. Declare that you are waiting for data of specific
>>> format, execute a query, someone (from another backend) will
>>> eventually place a data to promised result and your query continues
>>> execution.
>>
>> Could you please elaborate more?
>> Do you mean two way communication between foreground & background process?
>
> It is from C++11 threading: future, promise and async - these are
> related but different kinds of aquiring result between threads.
> Feature - is when caller Cl start thread T(or dequeue thread from
> pool) and Cl can wait until T finishes and provides result.
> Here waiting the result is just like selecting from pg_background_result().
> Promise - is when you declare a variable and caller do not know which
> thread will put the data to this variable. But any thread reading
> promise will wait until other thread put a data to promise.
> There are more parallelism patterns there, like async, deffered, lazy,
> but futures and promises from my point of view are most used.
>
Nice, thanks for the detailed explanation.

We can use the shm_mq infrastructure to share any kind of message between
two processes, but perhaps we would end up overestimating what pg_background
was originally meant for: the user backend launches workers, gives them an
initial set of instructions, and then results stream back from the workers
to the user backend.

>>> 6. Cancelation: a way to signal to background query that it's time to
>>> quit gracefully.
>> Let me check this too.
> I think Craig is right: any background query must be ready to be shut
> down. That's what databases are about, you can safely pull the plug at
> any moment.

SIGTERM is handled in the current pg_background patch; a user can terminate
the background worker's execution using pg_cancel_backend() or
pg_terminate_backend(), as shown below:

postgres=> select pg_background_launch('insert into foo
values(generate_series(1,1))');
 pg_background_launch
--
67069
(1 row)

postgres=> select pg_terminate_backend(67069);
 pg_terminate_backend
--
 t
(1 row)

postgres=> select * from pg_background_result(67069) as (x text);
ERROR:  terminating connection due to administrator command
CONTEXT:  background worker, pid 67069
postgres=>

Thanks & Regards,
Amul




Re: [HACKERS] pg_background contrib module proposal

2016-12-12 Thread amul sul
On Mon, Dec 12, 2016 at 10:47 PM, Andrew Borodin  wrote:
> Hi!
>

Thanks a lot for your review.

> Just in case you'd like to include sleepsort as a test, here it is
> wrapped as a regression test(see attachment). But it has serious
> downside: it runs no less than 5 seconds.
>
> Also I'll list here every parallelism feature I managed to imagine. It
> is not to say that I suggest having some of these features at v1. We
> certainly have to start somewhere, this list is just for information
> purposes.
> 1. Poolable workers. Just to be sure you can reliably queue your task
> somewhere without having "increase max_connections" hint.
> 2. Inside one pool, you can have task chaining. After competition of
> task A (query A) start task B, in case of failure start task C. Task C
> is followed by task D.

I think the background-session code does not deviate that much from the
pg_background code; IIUC, we can manage to reuse the background-session
patch, as suggested by Robert.  This will allow us to maintain a session in
the long run, and we could reuse that session to run subsequent queries
instead of maintaining a separate worker pool.  Thoughts?

> 3. Reliably read task status: running\awaiting\completed\errored\in a
> lock. Get a query plan of a task? (I know, there are reasons why it is
> not possible now)
+1, Let me check this.

> 4. Query as a future (actually it is implemented already by
> pg_background_result)
> 5. Promised result. Declare that you are waiting for data of specific
> format, execute a query, someone (from another backend) will
> eventually place a data to promised result and your query continues
> execution.

Could you please elaborate?
Do you mean two-way communication between the foreground and background
processes?

> 6. Cancelation: a way to signal to background query that it's time to
> quit gracefully.
>
Let me check this too.

Thanks & Regards,
Amul




[HACKERS] pg_background contrib module proposal

2016-11-23 Thread amul sul
Hi All,

I would like to take over the pg_background patch and repost it for
discussion and review.

Robert Haas originally shared this as a parallelism demonstration[1] and
later abandoned it, leaving a summary of the open issues[2] in the
pg_background patch that need to be fixed.  Most of them seem to have been
addressed in core, except for the handling of types that exist without
binary send/recv functions, and the documentation.  I have added handling
for types that don't have binary send/recv functions in the attached patch
and will work on the documentation at the end.
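
Just to illustrate what "no binary send/recv" means, the affected types can
be listed from the catalogs.  This query is only an illustration and is not
part of the patch:

-- types whose binary receive/send function is absent; values of such
-- types have to be shipped back in text form instead of binary
SELECT typname
FROM pg_type
WHERE typreceive::oid = 0 OR typsend::oid = 0
ORDER BY typname;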

One concern with this patch is code duplication with exec_simple_query();
we could consider Jim Nasby’s patch[3] to overcome this, but we would
certainly end up complicating exec_simple_query() to make pg_background
happy.

As discussed previously[1], pg_background is a contrib module that lets you
launch arbitrary commands in a background worker.

• VACUUM in the background
• An autonomous-transaction implementation better than the dblink approach
  (i.e. no separate authentication required)
• Performing tasks like CREATE INDEX CONCURRENTLY from a procedural language

This module comes with the following SQL APIs:

• pg_background_launch : takes the SQL command the user wants to execute
  and the size of the queue buffer, and returns the process id of the
  background worker.
• pg_background_result : takes the process id as an input parameter and
  returns the result of the command executed through the background worker.
• pg_background_detach : takes the process id and detaches the background
  process that is waiting for the user to read its results (a short usage
  sketch follows below).
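
For completeness, a short usage sketch of the detach API (run from psql,
using the foo table from the example below; this assumes the module is
installed and is only illustrative):

-- launch a command whose result we never intend to read, then detach so
-- the worker is not kept around waiting for us to fetch it
SELECT pg_background_launch('VACUUM foo') AS pid \gset
SELECT pg_background_detach(:pid);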


Here's an example of running VACUUM and then fetching the results.  Notice
that messages from the background worker are propagated to our session; if
an error had occurred, it would be re-thrown locally when we try to read
the results (a sketch of the error case follows the example).

postgres=# create table foo (a int);
CREATE TABLE
postgres=# insert into foo values(generate_series(1,5));
INSERT 0 5

postgres=# select pg_background_launch('vacuum verbose foo');
pg_background_launch
--
  65427
(1 row)

postgres=# select * from pg_background_result(65427) as (x text);
INFO:  vacuuming "public.foo"
INFO:  "foo": found 0 removable, 5 nonremovable row versions in 1 out of 1 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
 x

VACUUM
(1 row)
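
And a sketch of the error case mentioned above (pid and output are
illustrative, not from an actual run):

postgres=# select pg_background_launch('select 1/0');
 pg_background_launch
----------------------
                65428
(1 row)

postgres=# select * from pg_background_result(65428) as (x int);
ERROR:  division by zero
CONTEXT:  background worker, pid 65428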


Thanks to Vibhor Kumar, Rushabh Lathia, and Robert Haas for their feedback.

Please let me know your thoughts, and thanks for reading.

[1]. 
https://www.postgresql.org/message-id/CA%2BTgmoam66dTzCP8N2cRcS6S6dBMFX%2BJMba%2BmDf68H%3DKAkNjPQ%40mail.gmail.com
[2]. 
https://www.postgresql.org/message-id/CA%2BTgmobPiT_3Qgjeh3_v%2B8Cq2nMczkPyAYernF_7_W9a-6T1PA%40mail.gmail.com
[3]. https://www.postgresql.org/message-id/54541779.1010906%40BlueTreble.com

Regards,
Amul


0001-pg_background_v7.patch
Description: Binary data



Re: [HACKERS] Exclude pg_largeobject form pg_dump

2016-11-16 Thread Amul Sul
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:tested, passed

Patch v6 looks good to me, passing to committer.

Thanks !

The new status of this patch is: Ready for Committer



Re: [HACKERS] Remove the comment on the countereffectiveness of large shared_buffers on Windows

2016-11-06 Thread amul sul
On Mon, Nov 7, 2016 at 10:46 AM, Tsunakawa, Takayuki
<tsunakawa.ta...@jp.fujitsu.com> wrote:
>
> The third paragraph may be redundant, I'm a bit inclined to leave it for 
> kindness and completeness.  The attached revised patch just correct the 
> existing typo (large -> larger).
>

I do not agree with having this paragraph either; I'll leave the
decision to the committer.

> I'll change the status to needs review.

The new status of this patch is: Ready for Committer

Regards,
Amul Sul




Re: [HACKERS] Remove the comment on the countereffectiveness of large shared_buffers on Windows

2016-11-03 Thread amul sul
Hi Takayuki-san,

IMHO, we could remove the third paragraph completely and generalise the
start of the second paragraph, somewhat as follows:


-If you have a dedicated database server with 1GB or more of RAM, a
-reasonable starting value for shared_buffers is 25%
-of the memory in your system.  There are some workloads where even
+A reasonable starting value for
shared_buffers is 25%
+   of the RAM in your system.  There are some workloads where even
 large settings for shared_buffers are effective, but
 because PostgreSQL also relies on the
 operating system cache, it is unlikely that an allocation of more than

I may be wrong here; I would like to know your and/or the community's
thoughts on this. Thanks.

Regards,
Amul Sul


The new status of this patch is: Waiting on Author




Re: [HACKERS] Exclude pg_largeobject form pg_dump

2016-11-03 Thread amul sul
Kindly ignore this; I've added this note to the original thread.

Sorry for the noise.

Regards,
Amul




Re: [HACKERS] Exclude pg_largeobject form pg_dump

2016-11-03 Thread amul sul
Hi Guillaume,

With your v2 patch, the -B option works as expected, but the --no-blobs
option is still unrecognized; this happens because you forgot to add an
entry for 'no-blobs' to the long_options[] array.

Apart from this concern the patch looks good to me. Thanks
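
For illustration, here is a minimal, self-contained sketch (not your actual
patch, and the option letters are only assumed to match it) showing why the
table entry matters: getopt_long() recognizes a long option only if it
appears in the array passed to it.

#include <stdio.h>
#include <getopt.h>

/*
 * Without the "no-blobs" entry below, getopt_long() would report
 * --no-blobs as an unrecognized option even though -B is handled.
 */
int
main(int argc, char **argv)
{
    static struct option long_options[] = {
        {"blobs", no_argument, NULL, 'b'},
        {"no-blobs", no_argument, NULL, 'B'},   /* the missing entry */
        {NULL, 0, NULL, 0}
    };
    int         c;

    while ((c = getopt_long(argc, argv, "bB", long_options, NULL)) != -1)
    {
        switch (c)
        {
            case 'b':
                printf("dump large objects\n");
                break;
            case 'B':
                printf("skip large objects\n");
                break;
            default:
                return 1;       /* getopt_long already printed a message */
        }
    }
    return 0;
}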

Regards,
Amul




Re: [HACKERS] Query regarding selectDumpableExtension()

2016-10-31 Thread amul sul
On 31 Oct 2016 6:48 pm, "Tom Lane" <t...@sss.pgh.pa.us> wrote:
>
> amul sul <sula...@gmail.com> writes:
> > On Fri, Oct 28, 2016 at 6:22 PM, Robert Haas <robertmh...@gmail.com>
wrote:
> >> There's a comment in dumpExtension() that explains it.
>
> > Let me explain the case I'm trying to tackle. I have two old dump
> > data, each of them have couple objects depend on plpgsql. I have
> > restored first dump and trying restore second dump using 'pg_restore
> > -c' command, it is failing with following error:
> > ERROR:  cannot drop extension plpgsql because other objects depend on it
>
> This is hardly specific to extensions.  If you try a restore with -c into
> a database that has other random objects besides what's in the dump, you
> could get errors from
> * dropping tables that are referenced by foreign keys from tables not
>   known in the dump
> * dropping functions that are used in views not known in the dump
> * dropping operators or opclasses used by indexes not known in the dump
> etc etc.
>
> > Works well without '-c' option, but that what not a general solution,
IMHO.
>
> The general solution is either don't restore into a database containing
> unrelated objects, or be prepared to ignore errors from the DROP commands.
> The extension case actually works more smoothly than most of the others.
>

Thanks for your explanation. I agree that this is not the only scenario
where we need special care, but my question still stands: why do we really
need to dump built-in extensions?

Of course you could ask, why not? And my answer will be the same: "to
placate pg_restore, at least in the case I've explained before" :)

Regards,
Amul

Sent from a mobile device. Please excuse brevity and tpyos.


Re: [HACKERS] Query regarding selectDumpableExtension()

2016-10-31 Thread amul sul
On Fri, Oct 28, 2016 at 6:22 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Thu, Oct 27, 2016 at 2:11 AM, amul sul <sula...@gmail.com> wrote:
>> selectDumpableExtension() function skip
>> s dump of
>> built-in extensions in case of binary-upgrade  only,
>> why not in normal
>> dump
>> ?
>> Can't we assume those will already be installed in the target database
>> at restore
>> ?
>
Apologies for the delayed response.

> There's a comment in dumpExtension() that explains it.
>
> /*
>  * In a regular dump, we use IF NOT EXISTS so that there isn't a
>  * problem if the extension already exists in the target database;
>  * this is essential for installed-by-default extensions such as
>  * plpgsql.
>  *

Understood.

>  * In binary-upgrade mode, that doesn't work well, so instead we skip
>  * built-in extensions based on their OIDs; see
>  * selectDumpableExtension.
>  */
>

I don't see the necessity of dumping it in normal mode, unless I'm
missing something.

Let me explain the case I'm trying to tackle. I have two old dumps, each of
which has a couple of objects that depend on plpgsql. I have restored the
first dump and am trying to restore the second dump using the 'pg_restore
-c' command; it is failing with the following error:

ERROR:  cannot drop extension plpgsql because other objects depend on it

It works fine without the '-c' option, but that is not a general solution, IMHO.
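
The underlying dependency problem is easy to reproduce outside pg_restore;
the function name below is made up and the output is abridged:

postgres=# CREATE FUNCTION add_one(i int) RETURNS int LANGUAGE plpgsql AS $$ BEGIN RETURN i + 1; END $$;
CREATE FUNCTION
postgres=# DROP EXTENSION plpgsql;
ERROR:  cannot drop extension plpgsql because other objects depend on it

Any such plpgsql object in the target database blocks the DROP EXTENSION
that a '-c' restore issues for the built-in extension.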

Regards,
Amul




[HACKERS] Query regarding selectDumpableExtension()

2016-10-27 Thread amul sul
Hi,

selectDumpableExtension() skips the dump of built-in extensions in the case
of binary upgrade only; why not in a normal dump?  Can't we assume those
will already be installed in the target database at restore?

Thanks & Regards,
Amul


Re: [HACKERS] Bug in to_timestamp().

2016-09-29 Thread Amul Sul
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:tested, passed

I appreciate your support.
I've tested the v6 patch again and it looks good to me; the code in v6 does
not differ much from the v4 patch. Ready for committer review. Thanks!

Regards,
Amul Sul

The new status of this patch is: Ready for Committer



Re: [HACKERS] Bug in to_timestamp().

2016-09-29 Thread amul sul
On Thu, Sep 29, 2016 at 2:48 AM, Tom Lane  wrote:
> Artur Zakirov  writes:
>> - now DCH_cache_getnew() is called after parse_format(). Because now
>> parse_format() can raise an error and in the next attempt
>> DCH_cache_search() could return broken cache entry.
>
> I started looking at your 0001-to-timestamp-format-checking-v4.patch
> and this point immediately jumped out at me.  Currently the code relies
> ... without any documentation ... on no elog being thrown out of
> parse_format().  That's at the very least trouble waiting to happen.
> There's a hack to deal with errors from within the NUMDesc_prepare
> subroutine, but it's a pretty ugly and underdocumented hack.  And what
> you had here was randomly different from that solution, too.
>
> After a bit of thought it seemed to me that a much cleaner fix is to add
> a "valid" flag to the cache entries, which we can leave clear until we
> have finished parsing the new format string.  That avoids adding extra
> data copying as you suggested, removes the need for PG_TRY, and just
> generally seems cleaner and more bulletproof.
>
> I've pushed a patch that does it that way.  The 0001 patch will need
> to be rebased over that (might just require removal of some hunks,
> not sure).
>
> I also pushed 0002-to-timestamp-validation-v2.patch with some revisions
> (it'd broken acceptance of BC dates, among other things, but I think
> I fixed everything).
>
> Since you told us earlier that you'd be on vacation through the end of
> September, I'm assuming that nothing more will happen on this patch during
> this commitfest, so I will mark the CF entry Returned With Feedback.

On behalf of Artur, I've rebased the patch and removed the hunk that dealt
with broken cache entries by copying them, which is no longer required
after commit 83bed06.

I left the commitfest status untouched; I am not sure how to deal with a
"Returned With Feedback" patch. Is there any chance that this can be
considered again for committer review?

Thanks !

Regards,
Amul


0001-to-timestamp-format-checking-v5.patch
Description: Binary data



Re: [HACKERS] Bug in to_timestamp().

2016-09-28 Thread amul sul
On Fri, Sep 16, 2016 at 10:01 PM, Artur Zakirov
<a.zaki...@postgrespro.ru> wrote:
> On 25.08.2016 13:26, amul sul wrote:
>>>
>>> Thanks. I've created the entry in
>>> https://commitfest.postgresql.org/10/713/
>>> . You can add yourself as a reviewer.
>>>
>>
>> Done, added myself as reviewer & changed status to "Ready for
>> Committer". Thanks !
>>
>> Regards,
>> Amul Sul
>>
>>
>
> It seems there is no need to rebase patches. But I will not be able to fix
> patches for two weeks because of travel if someone will have a note or
> remark.
>
> So it would be nice if someone will be able to fix possible issues for above
> reasone.

Sure, I'll do that if required.

Regards,
Amul




Re: [HACKERS] Fix some corner cases that cube_in rejects

2016-09-27 Thread Amul Sul
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:not tested

Note for committer: there are unnecessary files (cube_1.out, cube_2.out &
cube_3.out) in the contrib directory that need to be removed at code
check-in; other than this concern, I think this patch looks pretty
reasonable. Thanks.

Regards,
Amul

The new status of this patch is: Ready for Committer



Re: [HACKERS] Bug in to_timestamp().

2016-08-25 Thread amul sul
On Thu, Aug 25, 2016 at 3:41 PM, Artur Zakirov <a.zaki...@postgrespro.ru> wrote:
>>> You are right. I assigned to prev_type NODE_TYPE_SPACE to be able to
>>> execute such query:
>>>
>>>
>>> SELECT to_timestamp('---2000JUN', ' MON');
>>>
>>>
>>> Will be it a proper behaviour?
>>
>>
>>
>> Looks good to me, no one will complain if something working on PG but not
>> on Oracle.
>
>
> Thanks. I've created the entry in https://commitfest.postgresql.org/10/713/
> . You can add yourself as a reviewer.
>

Done, I added myself as a reviewer & changed the status to "Ready for
Committer". Thanks!

Regards,
Amul Sul




Re: [HACKERS] Bug in to_timestamp().

2016-08-25 Thread amul sul
On Thursday, August 25, 2016 1:56 PM, Artur Zakirov <a.zaki...@postgrespro.ru> 
wrote:
>> #2. Warning at compilation;
>>
>> formatting.c: In function ‘do_to_timestamp’:
>> formatting.c:3049:37: warning: ‘prev_type’ may be used uninitialized in this 
>> function [-Wmaybe-uninitialized]
>> if (prev_type == NODE_TYPE_SPACE || prev_type == NODE_TYPE_SEPARATOR)
>> ^
>> formatting.c:2988:5: note: ‘prev_type’ was declared here
>> prev_type;
>> ^
>>
>> You can avoid this by assigning  zero (or introduce NODE_TYPE_INVAL ) to 
>> prev_type at following line:
>>
>> 256 +   prev_type;
>
>
>You are right. I assigned to prev_type NODE_TYPE_SPACE to be able to 
>execute such query:
>
>
>SELECT to_timestamp('---2000JUN', ' MON');
>
>
>Will be it a proper behaviour?


Looks good to me; no one will complain if something works on PG but not on
Oracle.


Thanks & Regards,
Amul Sul




  1   2   >