Hi,


While loading some data into a partitioned table for testing purposes,

I found that even if I specify a constant value for the partition key[1], it still does

the tuple routing for each row.



[1]---------------------

UPDATE partitioned set part_key = 2 , …

INSERT into partitioned(part_key, ...) select 1, …

---------------------



I have seen such SQL statements generated automatically by some programs,

so, personally, I think it would be better to skip the tuple routing in this case.



IMO, we can use the following steps to skip the tuple routing:

1) Collect the columns that have a constant value in the targetList.

2) Compare the constant columns with the columns used in the partition key.

3) If all the columns used in the key are constant, cache the routed partition

 and do not do the tuple routing again.
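
To make the three steps concrete, here is a toy model in Python. It is only a sketch of the idea, not the code in the patch: the targetList is modelled as a dict of column name to ("const", value) or ("var", source), the table is range-partitioned on a single column, and all names (constant_columns, can_skip_routing, Router, insert_rows) are hypothetical.

```python
from bisect import bisect_right


def constant_columns(target_list):
    """Step 1: collect the target columns that are assigned a constant."""
    return {col for col, (kind, _) in target_list.items() if kind == "const"}


def can_skip_routing(target_list, partition_key_columns):
    """Step 2: check that every partition key column is a constant."""
    return set(partition_key_columns) <= constant_columns(target_list)


class Router:
    """Toy range router: bounds are sorted, exclusive upper bounds."""

    def __init__(self, bounds):
        self.bounds = bounds
        self.lookups = 0  # how many times tuple routing actually ran

    def route(self, key):
        self.lookups += 1
        return bisect_right(self.bounds, key)


def insert_rows(rows, target_list, key_col, router):
    """Step 3: route the first row, then reuse the cached partition."""
    skip = can_skip_routing(target_list, [key_col])
    cached = None
    placed = []
    for row in rows:
        if skip and cached is not None:
            part = cached  # key is constant, so the partition cannot change
        else:
            part = router.route(row[key_col])
            if skip:
                cached = part
        placed.append(part)
    return placed
```

With a constant key the router is consulted once for the whole statement; with a non-constant key it is still consulted once per row, so the existing behaviour is unchanged in that case.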



With this approach, I did some simple and basic performance tests:



---- For a plain single-column partition key (partition by range(col)/list(a)...)

When loading 100000000 rows into the table, I can see about a 5-7% performance gain

for both cross-partition UPDATE and INSERT when a constant is specified for the partition key.



---- For a more complicated expression partition key (partition by range(UDF_func(col)+x)…)

When loading 100000000 rows into the table, it brings a larger performance gain:

more than 20%.
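
One plausible reason the gain is larger here is that the key expression itself has to be evaluated for every row during routing. The toy Python sketch below counts those evaluations with and without a cached partition. It assumes the key expression is deterministic, and key_expr is a made-up stand-in for UDF_func(col)+x, not anything taken from the patch.

```python
def make_counting_key_expr():
    """Return a hypothetical key expression plus a counter of its calls."""
    calls = {"n": 0}

    def key_expr(col):
        calls["n"] += 1
        return (col * col) % 97 + 3  # stand-in for UDF_func(col) + x

    return key_expr, calls


def route_all(values, key_expr, cache_constant):
    """Route every value; optionally reuse the first routed partition."""
    cached = None
    out = []
    for v in values:
        if cache_constant and cached is not None:
            out.append(cached)  # no expression evaluation, no lookup
            continue
        part = key_expr(v) // 25  # toy range partitions of width 25
        if cache_constant:
            cached = part
        out.append(part)
    return out
```

For 1000 rows with the same constant input, the uncached path evaluates key_expr 1000 times and the cached path evaluates it once, while both place every row in the same partition.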



Besides, I did not see any noticeable performance degradation in other cases (small data sets).



Attaching a POC patch for this improvement.

Thoughts?



Best regards,

houzj


Attachment: 0001-skip-tuple-routing-for-constant-partition-key.patch
