cloud-fan commented on a change in pull request #33030:
URL: https://github.com/apache/spark/pull/33030#discussion_r656478363



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilters.scala
##########
@@ -142,7 +142,7 @@ private[sql] object OrcFilters extends OrcFiltersBase {
     case BooleanType => PredicateLeaf.Type.BOOLEAN
     case ByteType | ShortType | IntegerType | LongType => PredicateLeaf.Type.LONG
     case FloatType | DoubleType => PredicateLeaf.Type.FLOAT
-    case StringType | _: CharType | _: VarcharType => PredicateLeaf.Type.STRING
+    case StringType => PredicateLeaf.Type.STRING

Review comment:
       The test added by @yaooqinn should be good enough. AFAIK ORC is the only popular file format that supports CHAR/VARCHAR natively, and the only file format for which Spark uses a parser to convert the file schema to a Catalyst schema.
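
   For reference, the parser-based round trip I mean looks roughly like this (a simplified sketch, not a verbatim excerpt of the ORC read path; the `SchemaRoundTrip` object is just a placeholder to keep the snippet self-contained):

```scala
import org.apache.orc.TypeDescription

import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
import org.apache.spark.sql.types.StructType

object SchemaRoundTrip {
  // The ORC schema is rendered as a type string such as "struct<c:char(5),d:int>"
  // and then re-parsed by the SQL data type parser into a Catalyst StructType.
  def toCatalystSchema(schema: TypeDescription): StructType =
    CatalystSqlParser.parseDataType(schema.toString).asInstanceOf[StructType]
}
```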
   
   For long-term maintainability, I think we should follow 
`ParquetToSparkSchemaConverter` and write a true schema converter for ORC, 
instead of using the parser.
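
   To make that concrete, such a converter could walk the ORC `TypeDescription` tree directly, roughly along these lines (everything below, including the `OrcToSparkSchemaConverter` name, is only a sketch, not code that exists in Spark today):

```scala
import scala.collection.JavaConverters._

import org.apache.orc.TypeDescription
import org.apache.orc.TypeDescription.Category

import org.apache.spark.sql.types._

// Sketch of an ORC analogue of ParquetToSparkSchemaConverter: it converts the
// TypeDescription tree node by node instead of round-tripping through a string
// representation and the SQL parser.
object OrcToSparkSchemaConverter {

  // Convert a top-level ORC struct schema into a Catalyst StructType.
  def convert(schema: TypeDescription): StructType = {
    require(schema.getCategory == Category.STRUCT, "top-level ORC schema must be a struct")
    toStructType(schema)
  }

  private def toStructType(struct: TypeDescription): StructType = {
    val names = struct.getFieldNames.asScala
    val children = struct.getChildren.asScala
    StructType(names.zip(children).map { case (name, child) =>
      StructField(name, convertType(child), nullable = true)
    }.toSeq)
  }

  // Map each ORC category to the corresponding Catalyst type. CHAR/VARCHAR are
  // kept as-is here; the caller decides where to replace them with StringType.
  private def convertType(ts: TypeDescription): DataType = ts.getCategory match {
    case Category.BOOLEAN => BooleanType
    case Category.BYTE => ByteType
    case Category.SHORT => ShortType
    case Category.INT => IntegerType
    case Category.LONG => LongType
    case Category.FLOAT => FloatType
    case Category.DOUBLE => DoubleType
    case Category.STRING => StringType
    case Category.CHAR => CharType(ts.getMaxLength)
    case Category.VARCHAR => VarcharType(ts.getMaxLength)
    case Category.BINARY => BinaryType
    case Category.DATE => DateType
    case Category.TIMESTAMP => TimestampType
    case Category.DECIMAL => DecimalType(ts.getPrecision, ts.getScale)
    case Category.LIST =>
      ArrayType(convertType(ts.getChildren.get(0)), containsNull = true)
    case Category.MAP =>
      MapType(convertType(ts.getChildren.get(0)), convertType(ts.getChildren.get(1)),
        valueContainsNull = true)
    case Category.STRUCT => toStructType(ts)
    case other =>
      throw new UnsupportedOperationException(s"Unsupported ORC type category: $other")
  }
}
```

   That would also keep the CHAR/VARCHAR length information in one obvious place instead of relying on the parser to reconstruct it from a schema string.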



