[GitHub] MaxGekk commented on a change in pull request #23417: [SPARK-26374][TEST][SQL] Enable TimestampFormatter in HadoopFsRelationTest

GitBox Mon, 31 Dec 2018 14:09:28 -0800

MaxGekk commented on a change in pull request #23417: [SPARK-26374][TEST][SQL] Enable TimestampFormatter in HadoopFsRelationTest
URL: https://github.com/apache/spark/pull/23417#discussion_r244620744


 ##########
 File path: sql/hive/src/test/scala/org/apache/spark/sql/sources/HadoopFsRelationTest.scala
 ##########
 @@ -126,61 +126,60 @@ abstract class HadoopFsRelationTest extends QueryTest with SQLTestUtils with Tes
     } else {
       Seq(false)
     }
-    // TODO: Support new parser too, see SPARK-26374.
-    withSQLConf(SQLConf.LEGACY_TIME_PARSER_ENABLED.key -> "true") {
-      for (dataType <- supportedDataTypes) {
-        for (parquetDictionaryEncodingEnabled <- parquetDictionaryEncodingEnabledConfs) {
-          val extraMessage = if (isParquetDataSource) {
-            s" with parquet.enable.dictionary = $parquetDictionaryEncodingEnabled"
-          } else {
-            ""
-          }
-          logInfo(s"Testing $dataType data type$extraMessage")
-
-          val extraOptions = Map[String, String](
-            "parquet.enable.dictionary" -> parquetDictionaryEncodingEnabled.toString
-          )
-
-          withTempPath { file =>
-            val path = file.getCanonicalPath
-
-            val seed = System.nanoTime()
-            withClue(s"Random data generated with the seed: ${seed}") {
-              val dataGenerator = RandomDataGenerator.forType(
-                dataType = dataType,
-                nullable = true,
-                new Random(seed)
-              ).getOrElse {
-                fail(s"Failed to create data generator for schema $dataType")
-              }
-
-              // Create a DF for the schema with random data. The index field is used to sort the
-              // DataFrame.  This is a workaround for SPARK-10591.
-              val schema = new StructType()
-                .add("index", IntegerType, nullable = false)
-                .add("col", dataType, nullable = true)
-              val rdd =
-                spark.sparkContext.parallelize((1 to 10).map(i => Row(i, dataGenerator())))
-              val df = spark.createDataFrame(rdd, schema).orderBy("index").coalesce(1)
-
-              df.write
-                .mode("overwrite")
-                .format(dataSourceName)
-                .option("dataSchema", df.schema.json)
-                .options(extraOptions)
-                .save(path)
-
-              val loadedDF = spark
-                .read
-                .format(dataSourceName)
-                .option("dataSchema", df.schema.json)
-                .schema(df.schema)
-                .options(extraOptions)
-                .load(path)
-                .orderBy("index")
-
-              checkAnswer(loadedDF, df)
+
+    for (dataType <- supportedDataTypes) {
+      for (parquetDictionaryEncodingEnabled <- parquetDictionaryEncodingEnabledConfs) {
+        val extraMessage = if (isParquetDataSource) {
+          s" with parquet.enable.dictionary = $parquetDictionaryEncodingEnabled"
+        } else {
+          ""
+        }
+        logInfo(s"Testing $dataType data type$extraMessage")
+
+        val extraOptions = Map[String, String](
+          "parquet.enable.dictionary" -> parquetDictionaryEncodingEnabled.toString,
+          "timestampFormat" -> "yyyy-MM-dd'T'HH:mm:ss.SSSXXXXX"
 
 Review comment:
   > what code can use this pattern to parse -- do you know?
   > I'm still kinda curious where the parsing logic is and how it could use this format...
   
   I guess it is in `DateTimeFormatterBuilder.java`, in the `parse` method of the offset parser (`OffsetIdPrinterParser`), which handles the `XXXXX` part of the pattern:
   ```java
               char sign = text.charAt(position);  // IOOBE if invalid position
               if (sign == '+' || sign == '-') {
                   // starts
                   int negative = (sign == '-' ? -1 : 1);
                   int[] array = new int[4];
                   array[0] = position + 1;
                   if ((parseNumber(array, 1, text, true) ||
                           parseNumber(array, 2, text, type >=3) ||
                           parseNumber(array, 3, text, false)) == false) {
                       // success
                       long offsetSecs = negative * (array[1] * 3600L + array[2] * 60L + array[3]);
                       return context.setParsedField(OFFSET_SECONDS, offsetSecs, position, array[0]);
                   }
               }
   ```
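
To check how the pattern from the diff behaves, here is a minimal sketch, assuming the `timestampFormat` value is eventually handed to `java.time.format.DateTimeFormatter.ofPattern` (that is my assumption about the new parser path; the snippet above does not show it). The class `OffsetPatternDemo` and its `parse`/`format` helpers are hypothetical names used only for illustration. In `java.time`, five `X` letters mean an offset of the form `+HH:MM:ss`, where the seconds are optional and `Z` stands for a zero offset.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical demo class: not part of Spark or of the JDK.
class OffsetPatternDemo {
    // Same pattern string as the "timestampFormat" option in the diff.
    static final DateTimeFormatter FORMATTER =
        DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSXXXXX");

    // Parses a timestamp whose offset is "Z", "+HH:MM" or "+HH:MM:ss".
    static OffsetDateTime parse(String text) {
        return OffsetDateTime.parse(text, FORMATTER);
    }

    // Formats a timestamp back with the same pattern.
    static String format(OffsetDateTime ts) {
        return FORMATTER.format(ts);
    }

    public static void main(String[] args) {
        String[] samples = {
            "2018-12-31T14:09:28.123-08:00",    // hours and minutes only
            "2018-12-31T14:09:28.123+05:30:15", // hours, minutes and seconds
            "2018-12-31T14:09:28.123Z"          // zero offset
        };
        for (String sample : samples) {
            OffsetDateTime ts = parse(sample);
            // Total offset in seconds, i.e. what the quoted JDK code stores
            // in the OFFSET_SECONDS field.
            System.out.println(sample + " -> offset seconds = "
                + ts.getOffset().getTotalSeconds());
        }
    }
}
```

The three sample strings exercise the branches of the quoted JDK code: hours and minutes are mandatory for this pattern, and seconds are picked up only when present.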

