Hi,
When loading some data into a partitioned table for testing purposes, I found that even if I specify a constant value for the partition key[1], tuple routing is still done for each row.

[1]
---------------------
UPDATE partitioned set part_key = 2 , …
INSERT into partitioned(part_key, ...) select 1, …
---------------------

I have seen such SQL generated automatically by some programs, so I think it would be better to skip the tuple routing in this case.

IMO, we can use the following steps to skip the tuple routing:
1) collect the columns that have a constant value in the targetList.
2) compare the constant columns with the columns used in the partition key.
3) if all the columns used in the key are constant, cache the routed partition and do not do the tuple routing again.

With this approach, I did some simple and basic performance tests:

----For a plain single-column partition key (partition by range(col)/list(a)...)
When loading 100000000 rows into the table, I can see about a 5-7% performance gain for both cross-partition UPDATE and INSERT if a constant is specified for the partition key.

----For a more complicated expression partition key (partition by range(UDF_func(col)+x)…)
When loading 100000000 rows into the table, the gain is larger: more than 20%.

Besides, I did not see noticeable performance degradation in other cases (small data sets).

Attaching a POC patch for this improvement.

Thoughts?

Best regards,
houzj
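
P.S. For illustration only, the three steps above could be modelled roughly as below. This is a Python sketch of the idea, not the code in the attached patch: `Const`, `Router`, `route_by_list` and the dict-shaped target list are made-up names, and it assumes the key columns are plain column references whose constant value is the same for every row.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional, Sequence, Set, Tuple


@dataclass(frozen=True)
class Const:
    """Hypothetical target-list entry whose value is identical for every row."""
    value: Any


def constant_columns(target_list: Dict[str, Any]) -> Set[str]:
    # Step 1: collect the columns that are assigned a constant.
    return {col for col, expr in target_list.items() if isinstance(expr, Const)}


def key_is_constant(target_list: Dict[str, Any], key_columns: Sequence[str]) -> bool:
    # Step 2: every column used by the partition key must be constant.
    return set(key_columns) <= constant_columns(target_list)


class Router:
    """Routes rows to partitions, caching the result when the key is constant."""

    def __init__(self, route: Callable[[Tuple[Any, ...]], str],
                 key_columns: Sequence[str], target_list: Dict[str, Any]) -> None:
        self.route = route                      # the (expensive) routing function
        self.key_columns = list(key_columns)
        self.can_cache = key_is_constant(target_list, key_columns)
        self.cached: Optional[str] = None
        self.route_calls = 0                    # counts real routing work

    def partition_for(self, row: Dict[str, Any]) -> str:
        # Step 3: reuse the cached partition instead of routing again.
        if self.can_cache and self.cached is not None:
            return self.cached
        self.route_calls += 1
        part = self.route(tuple(row[c] for c in self.key_columns))
        if self.can_cache:
            self.cached = part
        return part


def route_by_list(key: Tuple[Any, ...]) -> str:
    # Stand-in for list partitioning: one partition per key value.
    return f"part_{key[0]}"


# INSERT ... SELECT 1, ... : the partition key column gets a constant.
target_list = {"part_key": Const(1), "payload": "src.payload"}
router = Router(route_by_list, ["part_key"], target_list)
rows = [{"part_key": 1, "payload": i} for i in range(1000)]
parts = [router.partition_for(r) for r in rows]
```

In this model the routing function runs once for the whole load when the key is constant, and once per row otherwise, which is where the saving would come from.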
0001-skip-tuple-routing-for-constant-partition-key.patch