pan3793 commented on PR #25536: URL: https://github.com/apache/spark/pull/25536#issuecomment-1410384142
I am stuck on the SQL case.

The background: ClickHouse is extremely fast at OLAP queries over a single wide table, so some data engineers want to use Spark for the heavy data preparation and then save the result into ClickHouse as a temporary table using pure SQL. The result set is usually quite large, and the table sees no appends, updates, or deletes during its lifecycle.

Because NULL hurts performance, ClickHouse places quite strict restrictions on table schemas; for example, sorting keys are not allowed to be nullable. From the ClickHouse documentation:

> Using Nullable almost always negatively affects performance, keep this in mind when designing your databases.

https://clickhouse.com/docs/en/sql-reference/data-types/nullable

In this use case, things become simple if the connector knows and respects the nullability of the DataFrame's schema on CTAS.
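To make the CTAS point concrete, here is a minimal sketch of how a connector could map per-column nullability onto ClickHouse DDL. It is plain Python and not the actual connector or Spark code: `Field`, `column_ddl`, and `create_table_ddl` are hypothetical names, and the table and column names are made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for one column of the query's output schema."""
    name: str
    ch_type: str    # ClickHouse type name, e.g. "Int64" or "String"
    nullable: bool  # nullability as reported by the DataFrame schema


def column_ddl(field: Field) -> str:
    # Wrap the type in Nullable(...) only when the schema says the column is nullable.
    ch_type = f"Nullable({field.ch_type})" if field.nullable else field.ch_type
    return f"`{field.name}` {ch_type}"


def create_table_ddl(table: str, fields: List[Field], order_by: List[str]) -> str:
    by_name = {f.name: f for f in fields}
    # Reject nullable sorting keys up front, mirroring the restriction described above.
    for key in order_by:
        if by_name[key].nullable:
            raise ValueError(f"sorting key `{key}` must not be nullable")
    cols = ", ".join(column_ddl(f) for f in fields)
    keys = ", ".join(f"`{k}`" for k in order_by)
    return f"CREATE TABLE {table} ({cols}) ENGINE = MergeTree ORDER BY ({keys})"


if __name__ == "__main__":
    schema = [
        Field("id", "Int64", nullable=False),
        Field("note", "String", nullable=True),
    ]
    print(create_table_ddl("tmp_result", schema, order_by=["id"]))
```

In this sketch, a schema in which every column arrives marked nullable would fail the sorting-key check, which is why the connector needs the real nullability from the DataFrame's schema.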
