[GitHub] [spark] pan3793 edited a comment on pull request #35696: [SPARK-38361][SQL] Factory method getConnection should take Partition as optional parameter.

GitBox Tue, 01 Mar 2022 20:58:36 -0800


pan3793 edited a comment on pull request #35696:
URL: https://github.com/apache/spark/pull/35696#issuecomment-1056221700



   @srowen Let me give some background on how ClickHouse sharding works.
   
   A `distributed table` in ClickHouse is something like a "remote 
view": a logical union of the `local table`s on all cluster nodes. 
Generally, every node has the same `distributed table`.
   
   When the SQL `select * from distribute_table` is submitted to one ClickHouse 
node, that node collects records from all nodes and sends them back to the JDBC 
client. If partition information is passed to the JDBC driver, the driver can 
use it to determine which node (shard) has the best data locality, which can 
significantly reduce network traffic.
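
   To make the locality argument concrete, here is a toy model in Python. It is 
only a sketch and does not use the Spark or ClickHouse APIs: the node names, the 
row counts, `pick_node`, and `rows_moved_between_nodes` are all hypothetical, and 
it assumes one Spark partition per shard, which may not match a real deployment.

```python
from typing import Dict, List, Optional

# Toy cluster (hypothetical data): each node holds the rows of its own local table.
LOCAL_TABLES: Dict[str, List[int]] = {
    "node-0": [1, 2, 3],
    "node-1": [4, 5],
    "node-2": [6, 7, 8, 9],
}
NODES = sorted(LOCAL_TABLES)


def pick_node(partition_index: Optional[int] = None) -> str:
    """Hypothetical connection-factory hook: map a partition to a node."""
    if partition_index is None:
        # No partition information: fall back to a fixed entry node.
        return NODES[0]
    # Assumption: partition i reads the shard stored on node i.
    return NODES[partition_index % len(NODES)]


def rows_moved_between_nodes(entry_node: str, shards_read: List[str]) -> int:
    """Count rows the entry node must fetch from other nodes to answer a query."""
    return sum(len(LOCAL_TABLES[s]) for s in shards_read if s != entry_node)


# Without partition info: one node answers for the whole distributed table,
# so it has to pull every other node's rows over the network.
without_hint = rows_moved_between_nodes(pick_node(), NODES)

# With partition info: each partition connects to the node that owns its shard,
# so every read is served from that node's local table.
with_hint = sum(
    rows_moved_between_nodes(pick_node(i), [NODES[i]]) for i in range(len(NODES))
)

print(without_hint, with_hint)
```

   In this model the only difference between the two cases is whether the 
connection factory sees the partition, which is the point of making `Partition` 
an optional parameter of `getConnection`.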


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



