[GitHub] [spark] dbtsai opened a new pull request #27817: [SPARK-31060] Handle column names containing `dots` in data source `Filter`

GitBox Thu, 05 Mar 2020 15:45:28 -0800

dbtsai opened a new pull request #27817: [SPARK-31060] Handle column names 
containing `dots` in data source `Filter`
URL: https://github.com/apache/spark/pull/27817
 
 
   ### What changes were proposed in this pull request?
   Currently, a column name that contains `dots` is not quoted in the data 
source `Filter`. This causes a couple of issues.
   
   1. It is hard to extend `Filter` to support nested column predicate pushdown, 
because many data sources, such as Parquet and ORC, use `dots` as separators for 
nested columns, so an unquoted name containing a dot is ambiguous. This PR 
addresses that by properly quoting names that contain `dots`.
   
   2. Because of the above issue, we currently handle the quoting in each data 
source implementation, before converting the predicates into the specific form 
for a particular data source. We should handle it in the data source filter 
instead, to make it consistent.
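
To illustrate the ambiguity and the quoting idea described above, here is a minimal, language-neutral sketch in Python. It is not the code in this PR and not Spark's API: the helper names `quote_if_needed`, `to_filter_name`, and `parse_filter_name` are hypothetical, and the backtick quoting rule (wrap a part that contains a dot, double any embedded backtick) is an assumption made for the example.

```python
from typing import List


def quote_if_needed(part: str) -> str:
    """Wrap one name part in backticks if it contains a dot or a backtick.

    Hypothetical helper; embedded backticks are escaped by doubling them.
    """
    if "." in part or "`" in part:
        return "`" + part.replace("`", "``") + "`"
    return part


def to_filter_name(parts: List[str]) -> str:
    """Join the parts of a (possibly nested) column path into one dotted name."""
    return ".".join(quote_if_needed(p) for p in parts)


def parse_filter_name(name: str) -> List[str]:
    """Split a dotted name back into its parts, honouring backtick quoting."""
    parts: List[str] = []
    buf: List[str] = []
    in_quotes = False
    i = 0
    while i < len(name):
        ch = name[i]
        if in_quotes:
            if ch == "`":
                if i + 1 < len(name) and name[i + 1] == "`":
                    buf.append("`")  # doubled backtick is a literal backtick
                    i += 1
                else:
                    in_quotes = False  # closing quote
            else:
                buf.append(ch)
        elif ch == "`":
            in_quotes = True  # opening quote
        elif ch == ".":
            parts.append("".join(buf))  # unquoted dot separates nested fields
            buf = []
        else:
            buf.append(ch)
        i += 1
    parts.append("".join(buf))
    return parts


# A top-level column literally named "a.b" versus field "b" nested in "a":
flat = to_filter_name(["a.b"])      # quoted, so the dot is part of the name
nested = to_filter_name(["a", "b"])  # unquoted, so the dot is a separator
assert parse_filter_name(flat) == ["a.b"]
assert parse_filter_name(nested) == ["a", "b"]
```

With a scheme like this, the two cases produce different strings, so a data source can tell them apart without doing its own quoting first.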
   
   ### Why are the changes needed?
   To handle column names containing `dots` more consistently. 
   
   ### Does this PR introduce any user-facing change?
   No.
   
   ### How was this patch tested?
   Existing UTs and new UTs.



