dbtsai commented on issue #27155: [SPARK-17636][SPARK-25557][SQL] Parquet and ORC predicate pushdown in nested fields
URL: https://github.com/apache/spark/pull/27155#issuecomment-606881584

In this PR, you also use dots to build the source filter API, but this doesn't handle column names that themselves contain dots, because they are not quoted properly. Since we have a proper multipart-identifier parser that is proven and used everywhere, it's much easier to use dots in the source filter APIs. The implementation in each data source can be different: for simplicity, I chose to use the key as a dot-separated string in Parquet, but you can always resolve it against the schema instead.

```scala
private def translateLeafNodeFilter(predicate: Expression): Option[Filter] = {
  // Recursively try to find an attribute name from the top level that can be pushed down.
  def attrName(e: Expression): Option[String] = e match {
    // A top-level, non-struct attribute: push down its name directly.
    case a: Attribute if !a.dataType.isInstanceOf[StructType] => Some(a.name)
    // A non-struct field extracted from a struct: append the field name to the parent path.
    case s: GetStructField if !s.childSchema(s.ordinal).dataType.isInstanceOf[StructType] =>
      attrName(s.child).map(_ + s".${s.childSchema(s.ordinal).name}")
    case _ => None
  }
```
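To make the ambiguity concrete, here is a minimal, self-contained sketch (not Spark's actual API; `quotePart` is a hypothetical helper) of why a plain dot-separated key cannot distinguish a column literally named `a.b` from field `b` inside a struct column `a`, and how backtick quoting of the individual parts resolves it:

```scala
object DottedKeySketch {
  // A column literally named "a.b" and the path "a" -> "b" inside a struct
  // serialize to the same dotted key, so the source cannot tell them apart.
  val flatKey: String   = "a.b"                     // top-level column named "a.b"
  val nestedKey: String = Seq("a", "b").mkString(".") // field b of struct a

  // Hypothetical quoting convention: wrap any part containing a dot in
  // backticks, escaping embedded backticks by doubling them. A multipart
  // parser can then recover the original path unambiguously.
  def quotePart(p: String): String =
    if (p.contains(".")) s"`${p.replace("`", "``")}`" else p

  def main(args: Array[String]): Unit = {
    assert(flatKey == nestedKey) // ambiguous without quoting

    val unambiguousFlat   = quotePart("a.b")                        // "`a.b`"
    val unambiguousNested = Seq("a", "b").map(quotePart).mkString(".") // "a.b"
    assert(unambiguousFlat != unambiguousNested) // distinguishable after quoting

    println(s"flat: $unambiguousFlat, nested: $unambiguousNested")
  }
}
```

This is why parsing the filter key with the existing multipart-identifier parser is safer than splitting on dots: the parser already understands the quoting rules, so each data source can reconstruct the intended field path regardless of dots inside individual names.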