HyukjinKwon commented on a change in pull request #27817: [SPARK-31060][SQL]
Handle column names containing `dots` in data source `Filter`
URL: https://github.com/apache/spark/pull/27817#discussion_r390067601
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
##########
@@ -652,10 +652,11 @@ object DataSourceStrategy {
*/
object PushableColumn {
def unapply(e: Expression): Option[String] = {
+ import org.apache.spark.sql.connector.catalog.CatalogV2Implicits._
def helper(e: Expression) = e match {
case a: Attribute => Some(a.name)
case _ => None
}
- helper(e)
+ helper(e).map(quoteIfNeeded)
Review comment:
@dbtsai, I thought Parquet also supports dots in column names, per
https://github.com/apache/spark/pull/27780.
Also, the problem is that there are multiple external sources: dots can be
used to express namespace-like annotations, just like in our SQL configurations.
A JSON array might not be an option either, because a column name can itself
be a JSON array:
```scala
scala> sql("""select 'a' as `["a", "'b"]`""")
res1: org.apache.spark.sql.DataFrame = [["a", "'b"]: string]
```
It's unlikely, but it would be a possible breaking change.
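To make the ambiguity concrete, here is a rough sketch of what the
`quoteIfNeeded` call in this patch does to a pushed column name; this only
approximates the behavior of `CatalogV2Implicits.quoteIfNeeded` as I understand
it, it is not the actual implementation:
```scala
// Sketch only: approximates what quoteIfNeeded does to a pushed column name.
// A name containing a dot or a backtick gets wrapped in backticks (with
// embedded backticks escaped), so a column literally named "a.b" is no longer
// indistinguishable from nested field access a.b.
def quoteIfNeededSketch(part: String): String = {
  if (part.contains(".") || part.contains("`")) {
    "`" + part.replace("`", "``") + "`"
  } else {
    part
  }
}

println(quoteIfNeededSketch("a.b"))  // prints: `a.b`
println(quoteIfNeededSketch("a"))    // prints: a
```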
The only option I can think of now is to have a SQL conf that lists the
data sources for which we backquote dots in pushed column names (and don't
backquote for nested column access); see the sketch below.
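Very roughly, something like the following; the conf key and the helper are
hypothetical and made up just to illustrate the shape of the proposal, nothing
here is an existing API:
```scala
// Hypothetical sketch of the conf-gated quoting idea; the conf key and this
// helper do not exist in Spark, they only illustrate the proposal.
object DottedColumnQuoting {
  // e.g. spark.sql.sources.backquoteDotsInPushedColumnNames=parquet,orc
  val hypotheticalConfKey = "spark.sql.sources.backquoteDotsInPushedColumnNames"

  // Sources listed in the conf get the literal-dot interpretation (backquoted);
  // everything else keeps the current nested-column-access interpretation.
  def pushedColumnName(
      name: String,
      sourceName: String,
      sourcesThatBackquote: Set[String]): String = {
    if (sourcesThatBackquote.contains(sourceName.toLowerCase) && name.contains(".")) {
      "`" + name.replace("`", "``") + "`"
    } else {
      name
    }
  }
}

// For a listed source, "a.b" is pushed as a single column `a.b`;
// for an unlisted one, it is left as-is and read as nested access.
println(DottedColumnQuoting.pushedColumnName("a.b", "parquet", Set("parquet")))  // `a.b`
println(DottedColumnQuoting.pushedColumnName("a.b", "jdbc", Set("parquet")))     // a.b
```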