cloud-fan commented on a change in pull request #27817: [SPARK-31060][SQL] Handle column names containing `dots` in data source `Filter`
URL: https://github.com/apache/spark/pull/27817#discussion_r389526013
 
 

 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
 ##########
 @@ -652,10 +652,11 @@ object DataSourceStrategy {
  */
 object PushableColumn {
   def unapply(e: Expression): Option[String] = {
+    import org.apache.spark.sql.connector.catalog.CatalogV2Implicits._
     def helper(e: Expression) = e match {
       case a: Attribute => Some(a.name)
       case _ => None
     }
-    helper(e)
+    helper(e).map(quoteIfNeeded)
 
 Review comment:
   Is this really needed to support column names containing dots?
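   For context, here is a rough sketch of what the `quoteIfNeeded` helper from `CatalogV2Implicits` does (the exact implementation in Spark may differ slightly):

   ```scala
   // Back-tick-quote a name part only when it contains characters that
   // would make it ambiguous inside a dotted column path.
   def quoteIfNeeded(part: String): String = {
     if (part.contains(".") || part.contains("`")) {
       // Escape embedded back-ticks by doubling them, then wrap the name.
       s"`${part.replace("`", "``")}`"
     } else {
       part
     }
   }

   quoteIfNeeded("price") // "price"
   quoteIfNeeded("a.b")   // "`a.b`" -- the pushed-down name changes for dotted columns
   ```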
   
   And a few thoughts on rolling this out more smoothly:
   1. if we refer to a top-level column, don't quote it (so there is no breaking change)
   2. if we refer to a nested field, use a special encoding (like a JSON array? see the sketch after this list)
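   A minimal sketch of that idea (the `encodeColumn` helper and the JSON-array encoding are hypothetical, just to make the two cases concrete):

   ```scala
   // Top-level columns keep their raw name, so existing v1 sources see no
   // change; nested fields are encoded as a JSON array of name parts,
   // which stays unambiguous even when a part contains a dot.
   def encodeColumn(nameParts: Seq[String]): String = nameParts match {
     case Seq(topLevel) => topLevel
     case parts =>
       parts.map(p => "\"" + p.replace("\\", "\\\\").replace("\"", "\\\"") + "\"")
         .mkString("[", ",", "]")
   }

   encodeColumn(Seq("price"))  // "price"
   encodeColumn(Seq("a", "b")) // ["a","b"]  (nested field b inside struct a)
   encodeColumn(Seq("a.b"))    // "a.b"      (top-level column whose name has a dot)
   ```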
   
   If users use a v1 source with Spark 3.0 and Spark pushes down a filter that references nested fields, the source would either fail or ignore the filter, as it doesn't recognize the JSON-array-encoded column name.
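   To illustrate the "ignore" path: a v1 relation can report filters it cannot evaluate via `BaseRelation.unhandledFilters`, so a filter whose attribute is an unrecognized JSON-array-encoded name would simply stay on the Spark side. A sketch, assuming a source that only knows its own top-level column names:

   ```scala
   import org.apache.spark.sql.sources.{EqualTo, Filter}

   // Hypothetical column set of the v1 source.
   val knownColumns = Set("price", "name")

   // Anything referencing an unknown attribute is reported back as
   // unhandled, and Spark evaluates it after the scan instead.
   def unhandledFilters(filters: Array[Filter]): Array[Filter] =
     filters.filterNot(_.references.forall(knownColumns.contains))

   unhandledFilters(Array(EqualTo("price", 10)))         // empty -- pushed down
   unhandledFilters(Array(EqualTo("""["a","b"]""", 10))) // returned as unhandled
   ```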
   
   However, if we use dots to separate field names, then a v1 source can break when a pushed-down filter points to a top-level column whose name contains a dot.
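   Concretely, with dots as separators a single pushed-down v1 filter has two readings (hypothetical schema, for illustration):

   ```scala
   import org.apache.spark.sql.sources.EqualTo

   // Reading 1: nested field `b` inside a struct column `a`.
   // Reading 2: a top-level column literally named "a.b".
   // A v1 source that assumes reading 1 silently breaks tables with reading 2.
   val ambiguous = EqualTo("a.b", 1)
   ```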
