HyukjinKwon commented on a change in pull request #27817: [SPARK-31060][SQL] 
Handle column names containing `dots` in data source `Filter`
URL: https://github.com/apache/spark/pull/27817#discussion_r390067601
 
 

 ##########
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
 ##########
 @@ -652,10 +652,11 @@ object DataSourceStrategy {
  */
 object PushableColumn {
   def unapply(e: Expression): Option[String] = {
+    import org.apache.spark.sql.connector.catalog.CatalogV2Implicits._
     def helper(e: Expression) = e match {
       case a: Attribute => Some(a.name)
       case _ => None
     }
-    helper(e)
+    helper(e).map(quoteIfNeeded)
 
 Review comment:
   @dbtsai, I thought Parquet also supports dots in column names, per 
https://github.com/apache/spark/pull/27780.
   Also, the problem is that there are multiple external sources, and dots can 
be used to express namespace-like annotations, just like our SQL configuration 
keys.
   
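   To make the ambiguity concrete, here is an illustrative sketch (not code from 
this PR): a pushed-down `Filter` only carries a string attribute name, so a 
source cannot tell a top-level column literally named `a.b` from field `b` 
nested inside a struct column `a`:
   
   ```scala
   import org.apache.spark.sql.sources.EqualTo
   
   // Both references collapse into the same source Filter string.
   val onDottedColumn = EqualTo("a.b", 1)  // column literally named "a.b", i.e. df("`a.b`") === 1
   val onNestedField  = EqualTo("a.b", 1)  // field b of struct column a, i.e. df("a.b") === 1
   ```
   
   With the change above, the dotted column name would be pushed down as 
`` `a.b` `` (backquoted), which only helps sources that understand backquotes.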
   A JSON array might not be an option either, because a column name can itself 
be a JSON array:
   
   ```scala
   scala> sql("""select 'a' as `["a", "'b"]`""")
   res1: org.apache.spark.sql.DataFrame = [["a", "'b"]: string]
   ```
   
   It's unlikely, but it is a possible breaking change.
   
   The only option I can think of now is to have a SQL conf that lists the data 
sources for which dots in column names should be backquoted (and not backquote 
when the name refers to nested column access).
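   
   As a rough sketch of that idea (the conf key, default, and helper name below 
are made up for illustration and are not part of this PR):
   
   ```scala
   import org.apache.spark.sql.connector.catalog.CatalogV2Implicits._
   import org.apache.spark.sql.internal.SQLConf
   
   // Hypothetical conf: comma-separated list of sources that understand
   // backquoted column names containing dots.
   val quotingSources: Set[String] = SQLConf.get
     .getConfString("spark.sql.sources.pushdown.quoteDottedColumnNames", "")
     .split(",").map(_.trim).filter(_.nonEmpty).toSet
   
   // Backquote "a.b" as `a.b` only for sources opted in via the conf; other
   // sources keep the raw name so nested column access is left untouched.
   def pushableName(source: String, name: String): String =
     if (quotingSources.contains(source)) quoteIfNeeded(name) else name
   ```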
