BOOTMGR commented on code in PR #49678:
URL: https://github.com/apache/spark/pull/49678#discussion_r1966485874
##########
sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala:
##########
@@ -2721,6 +2721,25 @@ class DataFrameSuite extends QueryTest
parameters = Map("name" -> ".whatever")
)
}
+
+ test("SPARK-50994: RDD conversion is performed with execution context") {
+ withSQLConf(SQLConf.CASE_SENSITIVE.key -> "true") {
Review Comment:
@cloud-fan I took a close look at
https://github.com/apache/spark/pull/48325 and I see that it takes a stab at a
bigger problem: `SQLConf` values are not propagated when the RDD is actually
executed (when the iterator is consumed), because that is triggered on demand
by the user. This PR only ensures that the correct `SQLConf` is in effect when
the RDD is computed, not during iterator traversal.
I followed the conversation there and I agree with you that all `SQLConf`
accesses should happen during RDD computation (by storing configs locally),
not when the iterator is called. I also agree with @bersprockets's view that
fixing it everywhere would be troublesome and that there is no guarantee for
future additions. I believe that change needs broader consideration, such as
how we see interoperability between Dataset and RDD. I am ready to volunteer
there.
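To illustrate the distinction between reading a config while the iterator is
being built and reading it during traversal, here is a minimal plain-Scala
sketch. It does not use Spark at all: `ActiveConf` is a hypothetical stand-in
for a session-scoped config such as `SQLConf`, and `lazyRead` / `eagerCapture`
are made-up names for the two styles.

```scala
// Hypothetical stand-in for a session-scoped config like SQLConf (not a Spark API).
object ActiveConf {
  private var caseSensitive: Boolean = false

  def get: Boolean = caseSensitive

  // Run `body` with the flag set to `value`, then restore the previous value.
  def withCaseSensitive[T](value: Boolean)(body: => T): T = {
    val old = caseSensitive
    caseSensitive = value
    try body finally caseSensitive = old
  }
}

object ConfCapture {
  // Lazy read: the config is looked up each time an element is pulled,
  // so it reflects whatever is active when the caller traverses the iterator.
  def lazyRead(data: Seq[String]): Iterator[String] =
    data.iterator.map(s => if (ActiveConf.get) s else s.toLowerCase)

  // Eager capture: the config is read once, while the iterator is built,
  // and the stored value is used during traversal.
  def eagerCapture(data: Seq[String]): Iterator[String] = {
    val caseSensitive = ActiveConf.get
    data.iterator.map(s => if (caseSensitive) s else s.toLowerCase)
  }
}
```

If both iterators are built inside `ActiveConf.withCaseSensitive(true) { ... }`
but consumed afterwards, only the `eagerCapture` one still behaves as
case-sensitive; the `lazyRead` one falls back to the default seen at traversal
time.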
However, I feel this change should ship independently because:
1. We need to have the correct configs set when RDD computation happens. This
is needed regardless of https://github.com/apache/spark/pull/48325 , which we
can wait for separately.
2. We need tracking on the Spark UI for stages submitted during RDD
computation. For example, Snowflake's official Spark connector internally
converts a DataFrame to an RDD to serialise it into CSV format. Because of
this, none of the dependent stages are shown on the Spark UI.
Let me know what you think.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]