On Tue, May 16, 2017 at 1:02 PM, Ashutosh Bapat <[email protected]> wrote: [...] >>> >>> + if (key->strategy == PARTITION_STRATEGY_HASH) >>> + { >>> + ndatums = nparts; >>> + hbounds = (PartitionHashBound **) palloc(nparts * >>> + >>> sizeof(PartitionHashBound *)); >>> + i = 0; >>> + foreach (cell, boundspecs) >>> + { >>> + PartitionBoundSpec *spec = lfirst(cell); >>> + >>> [ clipped ] >>> + hbounds[i]->index = i; >>> + i++; >>> + } >>> For list and range partitioned table we order the bounds so that two >>> partitioned tables have them in the same order irrespective of order in >>> which >>> they are specified by the user or hence stored in the catalogs. The >>> partitions >>> then get indexes according the order in which their bounds appear in ordered >>> arrays of bounds. Thus any two partitioned tables with same partition >>> specification always have same PartitionBoundInfoData. This helps in >>> partition-wise join to match partition bounds of two given tables. Above >>> code >>> assigns the indexes to the partitions as they appear in the catalogs. This >>> means that two partitioned tables with same partition specification but >>> different order for partition bound specification will have different >>> PartitionBoundInfoData represenation. >>> >>> If we do that, probably partition_bounds_equal() would reduce to just >>> matching >>> indexes and the last element of datums array i.e. the greatest modulus >>> datum. >>> If ordered datums array of two partitioned table do not match exactly, the >>> mismatch can be because missing datums or different datums. If it's a >>> missing >>> datum it will change the greatest modulus or have corresponding entry in >>> indexes array as -1. If the entry differs it will cause mismatching indexes >>> in >>> the index arrays. >>> >> Make sense, will fix this. > > I don't see this being addressed in the patches attached in the reply to > Dilip. >
Fixed in the attached version.
>>
>>>
>>> + if (key->partattrs[i] != 0)
>>> + {
>>> + keyCol = (Node *) makeVar(1,
>>> + key->partattrs[i],
>>> + key->parttypid[i],
>>> + key->parttypmod[i],
>>> + key->parttypcoll[i],
>>> + 0);
>>> +
>>> + /* Form hash_fn(value) expression */
>>> + keyCol = (Node *) makeFuncExpr(key->partsupfunc[i].fn_oid,
>>> +
>>> get_fn_expr_rettype(&key->partsupfunc[i]),
>>> + list_make1(keyCol),
>>> + InvalidOid,
>>> + InvalidOid,
>>> + COERCE_EXPLICIT_CALL);
>>> + }
>>> + else
>>> + {
>>> + keyCol = (Node *) copyObject(lfirst(partexprs_item));
>>> + partexprs_item = lnext(partexprs_item);
>>> + }
>>> I think we should add FuncExpr for column Vars as well as expressions.
>>>
>> Okay, will fix this.
>
> Here, please add a check similar to get_quals_for_range()
> 1840 if (partexprs_item == NULL)
> 1841 elog(ERROR, "wrong number of partition key expressions");
>
>
Fixed in the attached version.
>>
>>> I think we need more comments for compute_hash_value(), mix_hash_value() and
>>> satisfies_hash_partition() as to what each of them accepts and what it
>>> computes.
>>>
>>> + /* key's hash values start from third argument of function. */
>>> + if (!PG_ARGISNULL(i + 2))
>>> + {
>>> + values[i] = PG_GETARG_DATUM(i + 2);
>>> + isnull[i] = false;
>>> + }
>>> + else
>>> + isnull[i] = true;
>>> You could write this as
>>> isnull[i] = PG_ARGISNULL(i + 2);
>>> if (isnull[i])
>>> values[i] = PG_GETARG_DATUM(i + 2);
>>>
>> Okay.
>
> If we have used this technique somewhere else in PG code, please
> mention that function/place.
> /*
> * Rotate hash left 1 bit before mixing in the next column. This
> * prevents equal values in different keys from cancelling each other.
> */
>
Fixed in the attached version.
>
>>
>>> + foreach (lc, $5)
>>> + {
>>> + DefElem *opt = (DefElem *) lfirst(lc);
>>> A search on WITH in gram.y shows that we do not handle WITH options in
>>> gram.y.
>>> Usually they are handled at the transformation stage. Why is this an
>>> exception?
>>> If you do that, we can have all the error handling in
>>> transformPartitionBound().
>>>
>> If so, ForValues need to return list for hash and PartitionBoundSpec
>> for other two; wouldn't that break code consistency? And such
>> validation is not new in gram.y see xmltable_column_el.
>
> Thanks for pointing that out. Ok, then may be leave it in gram.y. But
> may be we should move the error handling in transform function.
>
IMO, let it be there for readability. It will be easier to understand
why do we have set -1 for modulus and remainder.
>
>>
>>> +DATA(insert OID = 5028 ( satisfies_hash_partition PGNSP PGUID 12 1 0
>>> 2276 0 f f f f f f i s 3 0 16 "23 23 2276" _null_ _null_ _null_ _null_
>>> _null_ satisfies_hash_partition _null_ _null_ _null_ ));
>>> Why is third argument to this function ANY? Shouldn't it be INT4ARRAY
>>> (variadic
>>> INT4)?
>>>
>> Will use INT4ARRAY in next patch, but I am little sceptical of it. we
>> need an unsigned int32, but unfortunately there is not variadic uint32
>> support. How about INT8ARRAY?
>
> Hmm, I think as long as the binary representation of given unsigned
> integer doesn't change in the function call, we could cast an INT32
> datums into unsigned int32, so spending extra 4 bytes per partition
> key doesn't look like worth the effort.
>
> A related question is, all hash functions have return type as
> "integer" but internally they return uint32. Why not to do the same
> for this function as well?
I see. IIUC, there is no harm to use INT4ARRAY, thanks for explanation.
Regards,
Amul
0001-Cleanup_v2.patch
Description: Binary data
0002-hash-partitioning_another_design-v6.patch
Description: Binary data
-- Sent via pgsql-hackers mailing list ([email protected]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
