dbtsai commented on issue #27155: [SPARK-17636][SPARK-25557][SQL] Parquet and ORC predicate pushdown in nested fields
URL: https://github.com/apache/spark/pull/27155#issuecomment-606881584
 
 
   In this PR, you also use `dots` to build the source filter API, but that doesn't handle column names that themselves contain `dots` unless they are quoted properly. Since we already have a proper parser for multipart identifiers that is proven and used everywhere, it's much easier to use `dots` in the source filter APIs and let each data source parse the name back into its parts.
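
   As a minimal sketch of that parsing step (assuming Spark 3.x's Catalyst parser is on the classpath), backtick quoting is what distinguishes a nested field `a.b` from a single top-level column literally named `a.b`:

   ```
   import org.apache.spark.sql.catalyst.parser.CatalystSqlParser

   // Nested field reference: top-level struct column `a`, inner field `b`.
   CatalystSqlParser.parseMultipartIdentifier("a.b")     // Seq("a", "b")

   // A single top-level column whose name literally contains a dot, quoted with backticks.
   CatalystSqlParser.parseMultipartIdentifier("`a.b`")   // Seq("a.b")
   ```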
   
   The implementation in each data source can be different. I chose to use the key as a string containing `dots` in Parquet for simplicity, but you can always resolve the name against the schema instead (see the sketch below).
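
   As one illustration of the schema route, here is a hedged sketch (the helper `dottedNames` is hypothetical, not part of Spark) that flattens a nested `StructType` into dot-separated keys a data source could use to look up a pushed-down filter column:

   ```
   import org.apache.spark.sql.types.{DataType, StructType}

   // Hypothetical helper: flatten a nested schema into dot-separated field names,
   // so a data source can look up a pushed-down filter column by its dotted key.
   def dottedNames(schema: StructType, prefix: String = ""): Map[String, DataType] = {
     schema.fields.flatMap { f =>
       val name = if (prefix.isEmpty) f.name else s"$prefix.${f.name}"
       f.dataType match {
         case st: StructType => dottedNames(st, name)
         case dt => Seq(name -> dt)
       }
     }.toMap
   }
   ```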
   
   ```
     private def translateLeafNodeFilter(predicate: Expression): Option[Filter] = {
       // Recursively try to build a dot-separated attribute name from the top level
       // that can be pushed down. Stop (return None) once we hit anything that is
       // still a struct or is not a simple attribute/field access.
       def attrName(e: Expression): Option[String] = e match {
         case a: Attribute if !a.dataType.isInstanceOf[StructType] =>
           Some(a.name)
         case s: GetStructField if !s.childSchema(s.ordinal).dataType.isInstanceOf[StructType] =>
           attrName(s.child).map(_ + s".${s.childSchema(s.ordinal).name}")
         case _ =>
           None
       }
   ```
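
   A hedged usage sketch of where such a name ends up, assuming the standard V1 filters in `org.apache.spark.sql.sources`: a predicate on a nested field such as `a.b = 5` would be translated into a filter keyed by the dotted name, which the Parquet/ORC source later maps back onto its own schema:

   ```
   import org.apache.spark.sql.sources

   // The translated attribute name becomes the filter's string key.
   val pushed: sources.Filter = sources.EqualTo("a.b", 5)
   ```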
