srowen commented on a change in pull request #23417: [SPARK-26374][TEST][SQL] Enable TimestampFormatter in HadoopFsRelationTest
URL: https://github.com/apache/spark/pull/23417#discussion_r244594290
##########
File path: sql/hive/src/test/scala/org/apache/spark/sql/sources/HadoopFsRelationTest.scala
##########
@@ -126,61 +126,60 @@ abstract class HadoopFsRelationTest extends QueryTest with SQLTestUtils with Tes
} else {
Seq(false)
}
- // TODO: Support new parser too, see SPARK-26374.
- withSQLConf(SQLConf.LEGACY_TIME_PARSER_ENABLED.key -> "true") {
- for (dataType <- supportedDataTypes) {
- for (parquetDictionaryEncodingEnabled <- parquetDictionaryEncodingEnabledConfs) {
- val extraMessage = if (isParquetDataSource) {
- s" with parquet.enable.dictionary = $parquetDictionaryEncodingEnabled"
- } else {
- ""
- }
- logInfo(s"Testing $dataType data type$extraMessage")
-
- val extraOptions = Map[String, String](
- "parquet.enable.dictionary" -> parquetDictionaryEncodingEnabled.toString
- )
-
- withTempPath { file =>
- val path = file.getCanonicalPath
-
- val seed = System.nanoTime()
- withClue(s"Random data generated with the seed: ${seed}") {
- val dataGenerator = RandomDataGenerator.forType(
- dataType = dataType,
- nullable = true,
- new Random(seed)
- ).getOrElse {
- fail(s"Failed to create data generator for schema $dataType")
- }
-
- // Create a DF for the schema with random data. The index field is used to sort the
- // DataFrame. This is a workaround for SPARK-10591.
- val schema = new StructType()
- .add("index", IntegerType, nullable = false)
- .add("col", dataType, nullable = true)
- val rdd =
- spark.sparkContext.parallelize((1 to 10).map(i => Row(i, dataGenerator())))
- val df = spark.createDataFrame(rdd, schema).orderBy("index").coalesce(1)
-
- df.write
- .mode("overwrite")
- .format(dataSourceName)
- .option("dataSchema", df.schema.json)
- .options(extraOptions)
- .save(path)
-
- val loadedDF = spark
- .read
- .format(dataSourceName)
- .option("dataSchema", df.schema.json)
- .schema(df.schema)
- .options(extraOptions)
- .load(path)
- .orderBy("index")
-
- checkAnswer(loadedDF, df)
+
+ for (dataType <- supportedDataTypes) {
+ for (parquetDictionaryEncodingEnabled <- parquetDictionaryEncodingEnabledConfs) {
+ val extraMessage = if (isParquetDataSource) {
+ s" with parquet.enable.dictionary = $parquetDictionaryEncodingEnabled"
+ } else {
+ ""
+ }
+ logInfo(s"Testing $dataType data type$extraMessage")
+
+ val extraOptions = Map[String, String](
+ "parquet.enable.dictionary" -> parquetDictionaryEncodingEnabled.toString,
+ "timestampFormat" -> "yyyy-MM-dd'T'HH:mm:ss.SSSXXXXX"
Review comment:
Looks like `CSVOptions` and `JSONOptions` both use a default of ...
```
parameters.getOrElse("timestampFormat", "yyyy-MM-dd'T'HH:mm:ss.SSSXXX")
```
Interesting: `XXX` uses the ISO 8601 style, and ISO 8601 has no support for seconds in a time zone offset, so strictly speaking you can't specify `XXXXX`. But all the formatter classes allow it, and it causes them to emit a time zone offset with seconds.
It does seem most important, above all, to emit the correct time zone; this is an example of the corner case that needs it. But then I'm not clear on how it gets parsed correctly: what code can use this pattern for parsing, do you know?
I agree that if we start emitting time zone offsets like "-8:00:00" in the normal case, that is bad. However, I tried formatting `ZonedDateTime.now()` with this pattern (5 Xs) and it yields "2018-12-31T09:24:10.128-06:00", which is fine.
So...
1. Yes, but it seems to emit an ISO 8601 TZ in the normal case, and in the rare pre-1582 case it's better to be correct above all.
2. We will remove the legacy parser anyway, and this test doesn't use it now, right?
3. Isn't this the case we're trying to fix?
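For reference, the `XXX` vs. `XXXXX` behavior described above can be reproduced with a minimal Java sketch (not part of this PR; the class name `OffsetPatternDemo` and the sub-minute offset are made up for illustration). `XXXXX` only appends the seconds component of the offset when it is non-zero, so for ordinary whole-minute offsets both patterns produce the same ISO 8601 string:

```java
import java.time.ZonedDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class OffsetPatternDemo {
    public static void main(String[] args) {
        // A fixed instant matching the comment's example, at offset -06:00.
        ZonedDateTime zdt = ZonedDateTime.of(
            2018, 12, 31, 9, 24, 10, 128_000_000, ZoneOffset.ofHours(-6));

        DateTimeFormatter threeX =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSXXX");
        DateTimeFormatter fiveX =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSXXXXX");

        // XXX prints hour:minute offsets; XXXXX adds seconds only when
        // non-zero, so both yield the same string for a whole-minute offset.
        System.out.println(zdt.format(threeX)); // 2018-12-31T09:24:10.128-06:00
        System.out.println(zdt.format(fiveX));  // 2018-12-31T09:24:10.128-06:00

        // A hypothetical offset with a seconds component, as in old
        // local-mean-time zones: only XXXXX can represent it exactly.
        ZonedDateTime odd = zdt.withZoneSameLocal(
            ZoneOffset.ofHoursMinutesSeconds(-6, 0, -30));
        System.out.println(odd.format(fiveX)); // 2018-12-31T09:24:10.128-06:00:30
    }
}
```

So the seconds only ever surface for offsets that actually have them, which matches the "-06:00" output observed above.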
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]