Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/20929#discussion_r186584474
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/TypePlaceholder.scala ---
@@ -0,0 +1,23 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.types
+
+/**
+ * An internal type that is not yet available and will be replaced by an
+ * actual type later.
+ */
+case object TypePlaceholder extends StringType
--- End diff --
In the first attempt, I used the new type instead of `NullType` because
some `Sink`s (e.g., `FileStreamSink`) could not handle `NullType`:
```
// parquet
java.lang.RuntimeException: Unsupported data type NullType.
  at scala.sys.package$.error(package.scala:27)
  at org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport.org$apache$spark$sql$execution$datasources$parquet$ParquetWriteSupport$$makeWriter(ParquetWriteSupport.scala:206)
  at org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport$$anonfun$init$2.apply(ParquetWriteSupport.scala:93)
  at org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport$$anonfun$init$2.apply(ParquetWriteSupport.scala:93)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)

// orc
java.lang.IllegalArgumentException: Can't parse category at 'struct<c0:bigint,c1:null^,c2:array<null>>'
  at org.apache.orc.TypeDescription.parseCategory(TypeDescription.java:223)
  at org.apache.orc.TypeDescription.parseType(TypeDescription.java:332)
  at org.apache.orc.TypeDescription.parseStruct(TypeDescription.java:327)
  at org.apache.orc.TypeDescription.parseType(TypeDescription.java:385)
  at org.apache.orc.TypeDescription.fromString(TypeDescription.java:406)

// csv
java.lang.UnsupportedOperationException: CSV data source does not support null data type.
  at org.apache.spark.sql.execution.datasources.csv.CSVUtils$.org$apache$spark$sql$execution$datasources$csv$CSVUtils$$verifyType$1(CSVUtils.scala:130)
  at org.apache.spark.sql.execution.datasources.csv.CSVUtils$$anonfun$verifySchema$1.apply(CSVUtils.scala:134)
  at org.apache.spark.sql.execution.datasources.csv.CSVUtils$$anonfun$verifySchema$1.apply(CSVUtils.scala:134)
  at scala.collection.Iterator$class.foreach(Iterator.scala:893)
```
So, in the previous fix, I added `TypePlaceholder`, which inherits from
`StringType`; all the `Sink`s could handle this type correctly, but the
approach was too tricky.
In the suggested approach, does `NullType, ArrayType(NullType), etc should be dropped`
mean that we need to handle an inferred schema as follows? e.g.,
```
Inferred schema: "StructType<IntegerType, NullType, ArrayType(NullType)>"
-> Schema used in FileStreamSource: "StructType<IntegerType>"
```
Is this right?
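If I read the suggestion correctly, the dropping step could be sketched like this. This is a self-contained toy model, not Spark's actual `org.apache.spark.sql.types` classes, and the helper names (`isNullLike`, `dropNullFields`) are illustrative, not from the PR:

```scala
// Minimal stand-ins for Spark's type hierarchy, just to illustrate the idea.
sealed trait DataType
case object NullType extends DataType
case object IntegerType extends DataType
case class ArrayType(elementType: DataType) extends DataType
case class StructField(name: String, dataType: DataType)
case class StructType(fields: Seq[StructField]) extends DataType

// A type is "null-like" if it is NullType or an array that bottoms out in NullType.
def isNullLike(dt: DataType): Boolean = dt match {
  case NullType      => true
  case ArrayType(et) => isNullLike(et)
  case _             => false
}

// Drop the top-level fields whose type is null-like, keeping the rest.
def dropNullFields(schema: StructType): StructType =
  StructType(schema.fields.filterNot(f => isNullLike(f.dataType)))

// StructType<IntegerType, NullType, ArrayType(NullType)> keeps only c0.
val inferred = StructType(Seq(
  StructField("c0", IntegerType),
  StructField("c1", NullType),
  StructField("c2", ArrayType(NullType))))
val used = dropNullFields(inferred)
```

(Whether nested null-typed struct fields should also be pruned is a separate question this sketch does not answer.)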
---