Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/2701#discussion_r18867403
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/dataTypes.scala ---
@@ -377,24 +378,37 @@ case class ArrayType(elementType: DataType, containsNull: Boolean) extends DataT
 * @param name The name of this field.
 * @param dataType The data type of this field.
 * @param nullable Indicates if values of this field can be `null` values.
+ * @param metadata The metadata of this field, which is a map from string to simple type that can be
+ *                 serialized to JSON automatically. The metadata should be preserved during
+ *                 transformation if the content of the column is not modified, e.g, in selection.
 */
-case class StructField(name: String, dataType: DataType, nullable: Boolean) {
+case class StructField(
+    name: String,
+    dataType: DataType,
+    nullable: Boolean,
+    metadata: Map[String, Any] = Map.empty) {

  private[sql] def buildFormattedString(prefix: String, builder: StringBuilder): Unit = {
    builder.append(s"$prefix-- $name: ${dataType.typeName} (nullable = $nullable)\n")
    DataType.buildFormattedString(dataType, s"$prefix |", builder)
  }

+  override def toString: String = {
+    // Do not add metadata to be consistent with CaseClassStringParser.
--- End diff --
Yes, for newer versions, the data type information written into Parquet files is in JSON format only.
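
For illustration, here is a minimal standalone sketch (not Spark code; the DataType stand-ins and the metadataJson helper are hypothetical) showing how a per-field metadata map restricted to simple types can be rendered as JSON and carried along with a field:

// Hypothetical stand-in for the DataType hierarchy, just enough for the example.
sealed trait DataType { def typeName: String }
case object StringType extends DataType { def typeName = "string" }
case object IntegerType extends DataType { def typeName = "integer" }

// Mirrors the StructField shape proposed in this diff.
case class StructField(
    name: String,
    dataType: DataType,
    nullable: Boolean,
    metadata: Map[String, Any] = Map.empty) {

  // Naive JSON rendering of the metadata map; only simple types are accepted,
  // which is the restriction described in the Scaladoc above.
  def metadataJson: String =
    metadata.map {
      case (k, v: String)  => s"\"$k\": \"$v\""
      case (k, v: Boolean) => s"\"$k\": $v"
      case (k, v: Int)     => s"\"$k\": $v"
      case (k, v: Long)    => s"\"$k\": $v"
      case (k, v: Double)  => s"\"$k\": $v"
      case (k, v)          => sys.error(s"Unsupported metadata value for key $k: $v")
    }.mkString("{", ", ", "}")
}

object MetadataDemo extends App {
  val field = StructField(
    "age", IntegerType, nullable = true,
    metadata = Map("comment" -> "age in years", "min" -> 0))

  // A selection that does not modify the column would carry this map along unchanged.
  println(field.metadataJson) // {"comment": "age in years", "min": 0}
}

Because every metadata value is a plain JSON scalar, the whole schema, including this per-field map, can be embedded as a single JSON string, which is what the comment above about Parquet refers to.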