[GitHub] [spark] pan3793 commented on pull request #25536: [SPARK-28837][SQL] CTAS/RTAS should use nullable schema

via GitHub Tue, 31 Jan 2023 05:45:04 -0800


pan3793 commented on PR #25536:
URL: https://github.com/apache/spark/pull/25536#issuecomment-1410384142


   I am stuck in the SQL case. 
   
   The background: ClickHouse is extremely fast on OLAP queries over a single 
wide table, and some data engineers want to use Spark for heavy data 
preparation and then save the result as a temp table in ClickHouse through 
pure SQL. The result set is usually quite large, and the table sees no 
append/update/delete during its lifecycle.
   
   Since NULL is not good for performance, ClickHouse has quite strict 
restrictions on table schemas, e.g. sorting keys are not allowed to be 
nullable.
   
   > Using Nullable almost always negatively affects performance, keep this in 
mind when designing your databases.
   
   https://clickhouse.com/docs/en/sql-reference/data-types/nullable
   
   In the above use case, things become simple if the connector knows and 
respects the nullability of the DataFrame's schema on CTAS.
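
   To illustrate why the nullability flags matter to a connector, here is a 
minimal sketch in plain Python (no Spark dependency). `Field` and 
`render_ctas_ddl` are hypothetical names made up for this example, not part of 
Spark or any ClickHouse connector; the sketch only assumes that each column 
carries a `nullable` flag like `StructField.nullable` and that the target table 
uses a MergeTree engine.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Field:
    name: str
    ch_type: str    # ClickHouse type name, e.g. "Int64" or "String"
    nullable: bool  # mirrors StructField.nullable on the Spark side


def render_ctas_ddl(table: str, fields: List[Field], order_by: List[str]) -> str:
    """Hypothetical helper: build a ClickHouse CREATE TABLE from a schema."""
    by_name = {f.name: f for f in fields}
    for key in order_by:
        if by_name[key].nullable:
            # The restriction described above: sorting keys must not be nullable.
            raise ValueError(f"sorting key {key!r} is nullable")
    # Wrap only the nullable columns in Nullable(...); keep the rest plain.
    cols = ", ".join(
        f"{f.name} Nullable({f.ch_type})" if f.nullable else f"{f.name} {f.ch_type}"
        for f in fields
    )
    return (
        f"CREATE TABLE {table} ({cols}) "
        f"ENGINE = MergeTree ORDER BY ({', '.join(order_by)})"
    )


schema = [Field("id", "Int64", False), Field("note", "String", True)]
ddl = render_ctas_ddl("tmp_result", schema, ["id"])
print(ddl)
```

   If every column arrives marked nullable, as happens when CTAS forces a 
nullable schema, the same call fails on the sorting key `id`, which is the 
situation I am stuck in.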
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

