Re: [PR] [SPARK-48229][SQL] Add collation support for inputFile expressions [spark]

via GitHub Sun, 12 May 2024 23:20:49 -0700


uros-db commented on code in PR #46503:
URL: https://github.com/apache/spark/pull/46503#discussion_r1597930015



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/inputFileBlock.scala:
##########
@@ -39,7 +40,7 @@ case class InputFileName() extends LeafExpression with Nondeterministic {
 
   override def nullable: Boolean = false
 
-  override def dataType: DataType = StringType
+  override def dataType: DataType = SQLConf.get.defaultStringType

Review Comment:
   I agree it's dangerous, but there appears to be no obvious way around it. Here's how we're mitigating the risk for now:
   
   1. manually adding collation support and testing collation-awareness for each function (we've already covered around 50% of Spark expressions)
   2. working on RQG (random query generator) tests for collation, which should catch anything we missed along the way (this is also in progress within the collation effort)
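
   The per-expression check in step 1 could be sketched roughly as follows. This is a toy model, not Spark code: `StrType`, `SessionConf`, `Expr`, and both expression objects are hypothetical stand-ins, and the collation names are used only as labels. It shows the property being tested: an expression that hard-codes its string type stops matching the session default as soon as that default is not the binary collation.

```scala
// Toy model only: every type here is a hypothetical stand-in, not a Spark class.
object CollationSketch {
  // A string type tagged with a collation name (the name is just a label here).
  final case class StrType(collation: String)

  // Stand-in for a session configuration that carries a default string type.
  final case class SessionConf(defaultStringType: StrType)

  trait Expr { def dataType(conf: SessionConf): StrType }

  // Collation-unaware: always reports the binary string type.
  object HardCodedFileName extends Expr {
    def dataType(conf: SessionConf): StrType = StrType("UTF8_BINARY")
  }

  // Collation-aware: defers to the session's default string type.
  object ConfAwareFileName extends Expr {
    def dataType(conf: SessionConf): StrType = conf.defaultStringType
  }

  // The check: under every listed collation, the expression's result type
  // must equal the session default.
  def followsSessionDefault(e: Expr, collations: Seq[String]): Boolean =
    collations.forall { c =>
      val conf = SessionConf(StrType(c))
      e.dataType(conf) == conf.defaultStringType
    }

  def main(args: Array[String]): Unit = {
    val collations = Seq("UTF8_BINARY", "UNICODE")
    println(followsSessionDefault(ConfAwareFileName, collations)) // true
    println(followsSessionDefault(HardCodedFileName, collations)) // false
  }
}
```

   In the real test suites the same idea would be expressed against actual Spark expressions and session settings; the model above only illustrates the shape of the assertion.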
   
   
   And here are two scenarios of what happens if we don't implement this for an expression:



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


