cloud-fan commented on a change in pull request #33030:
URL: https://github.com/apache/spark/pull/33030#discussion_r656478363



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilters.scala
##########
@@ -142,7 +142,7 @@ private[sql] object OrcFilters extends OrcFiltersBase {
     case BooleanType => PredicateLeaf.Type.BOOLEAN
     case ByteType | ShortType | IntegerType | LongType => PredicateLeaf.Type.LONG
     case FloatType | DoubleType => PredicateLeaf.Type.FLOAT
-    case StringType | _: CharType | _: VarcharType => PredicateLeaf.Type.STRING
+    case StringType => PredicateLeaf.Type.STRING

Review comment:
       The test added by @yaooqinn should be good enough. AFAIK ORC is the only popular file format that supports CHAR/VARCHAR natively, and the only file format for which Spark uses a parser to convert the file schema to a Catalyst schema.
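
   For reference, the parser-based round trip I mean looks roughly like this (a simplified sketch, not a verbatim excerpt of the ORC read path; the `SchemaRoundTrip` object is just a placeholder to keep the snippet self-contained):

```scala
import org.apache.orc.TypeDescription

import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
import org.apache.spark.sql.types.StructType

object SchemaRoundTrip {
  // The ORC schema is rendered as a type string such as "struct<c:char(5),d:int>"
  // and then re-parsed by the SQL data type parser into a Catalyst StructType.
  def toCatalystSchema(schema: TypeDescription): StructType =
    CatalystSqlParser.parseDataType(schema.toString).asInstanceOf[StructType]
}
```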
   
   For long-term maintainability, I think we should follow 
`ParquetToSparkSchemaConverter` and write a true schema converter for ORC, 
instead of using the parser.
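
   To make that concrete, such a converter could walk the ORC `TypeDescription` tree directly, roughly along these lines (everything below, including the `OrcToSparkSchemaConverter` name, is only a sketch, not code that exists in Spark today):

```scala
import scala.collection.JavaConverters._

import org.apache.orc.TypeDescription
import org.apache.orc.TypeDescription.Category

import org.apache.spark.sql.types._

// Sketch of an ORC analogue of ParquetToSparkSchemaConverter: it converts the
// TypeDescription tree node by node instead of round-tripping through a string
// representation and the SQL parser.
object OrcToSparkSchemaConverter {

  // Convert a top-level ORC struct schema into a Catalyst StructType.
  def convert(schema: TypeDescription): StructType = {
    require(schema.getCategory == Category.STRUCT, "top-level ORC schema must be a struct")
    toStructType(schema)
  }

  private def toStructType(struct: TypeDescription): StructType = {
    val names = struct.getFieldNames.asScala
    val children = struct.getChildren.asScala
    StructType(names.zip(children).map { case (name, child) =>
      StructField(name, convertType(child), nullable = true)
    }.toSeq)
  }

  // Map each ORC category to the corresponding Catalyst type. CHAR/VARCHAR are
  // kept as-is here; the caller decides where to replace them with StringType.
  private def convertType(ts: TypeDescription): DataType = ts.getCategory match {
    case Category.BOOLEAN => BooleanType
    case Category.BYTE => ByteType
    case Category.SHORT => ShortType
    case Category.INT => IntegerType
    case Category.LONG => LongType
    case Category.FLOAT => FloatType
    case Category.DOUBLE => DoubleType
    case Category.STRING => StringType
    case Category.CHAR => CharType(ts.getMaxLength)
    case Category.VARCHAR => VarcharType(ts.getMaxLength)
    case Category.BINARY => BinaryType
    case Category.DATE => DateType
    case Category.TIMESTAMP => TimestampType
    case Category.DECIMAL => DecimalType(ts.getPrecision, ts.getScale)
    case Category.LIST =>
      ArrayType(convertType(ts.getChildren.get(0)), containsNull = true)
    case Category.MAP =>
      MapType(convertType(ts.getChildren.get(0)), convertType(ts.getChildren.get(1)),
        valueContainsNull = true)
    case Category.STRUCT => toStructType(ts)
    case other =>
      throw new UnsupportedOperationException(s"Unsupported ORC type category: $other")
  }
}
```

   That would also keep the CHAR/VARCHAR length information in one obvious place instead of relying on the parser to reconstruct it from a schema string.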



