GitHub user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/19943#discussion_r159627680
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcUtils.scala
---
@@ -110,4 +107,22 @@ object OrcUtils extends Logging {
}
}
}
+
+ /**
+ * Return a fixed ORC schema with data schema information, if needed.
+ * The schema inside old ORC files might consist of invalid column names like '_col0'.
+ */
+ def getFixedTypeDescription(
+ schema: TypeDescription,
+ dataSchema: StructType): TypeDescription = {
+ if (schema.getFieldNames.asScala.forall(_.startsWith("_col"))) {
+ var schemaString = schema.toString
+ dataSchema.zipWithIndex.foreach { case (field: StructField, index: Int) =>
--- End diff --
Yep. I added the condition into `if`.
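
For readers following the thread, here is a minimal sketch of the idea in the quoted doc comment: when every physical field name is a placeholder such as `_col0`, the names are taken positionally from the data schema. It is not the code from this PR. `FixedSchemaSketch` and `fixFieldNames` are hypothetical names, the sketch works on plain name lists rather than `TypeDescription` and `StructType`, and the non-empty and length guards are assumptions added for illustration; they are not necessarily the condition mentioned above.

```scala
object FixedSchemaSketch {
  // Hypothetical helper, not part of Spark or ORC.
  // physicalNames: field names read from the ORC file footer.
  // dataSchemaNames: field names from the table's data schema, in order.
  def fixFieldNames(
      physicalNames: Seq[String],
      dataSchemaNames: Seq[String]): Seq[String] = {
    // forall is vacuously true on an empty Seq, so check nonEmpty explicitly.
    val allPlaceholders =
      physicalNames.nonEmpty && physicalNames.forall(_.startsWith("_col"))
    // Assumed guard: only rename when the data schema has enough names.
    if (allPlaceholders && physicalNames.length <= dataSchemaNames.length) {
      // Map placeholders to real names by position.
      dataSchemaNames.take(physicalNames.length)
    } else {
      // Names are already meaningful, or cannot be mapped safely: keep as-is.
      physicalNames
    }
  }

  def main(args: Array[String]): Unit = {
    println(fixFieldNames(Seq("_col0", "_col1"), Seq("id", "name")))
    println(fixFieldNames(Seq("id", "_col1"), Seq("id", "name")))
  }
}
```

The second call leaves the names unchanged because not every field starts with `_col`, which mirrors the `forall` check in the diff.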
---