MaxGekk commented on a change in pull request #28705:
URL: https://github.com/apache/spark/pull/28705#discussion_r434578570
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/HiveResult.scala
##########
@@ -37,30 +37,45 @@ object HiveResult {
    * Returns the result as a hive compatible sequence of strings. This is used in tests and
    * `SparkSQLDriver` for CLI applications.
    */
-  def hiveResultString(executedPlan: SparkPlan): Seq[String] = executedPlan match {
-    case ExecutedCommandExec(_: DescribeCommandBase) =>
-      formatDescribeTableOutput(executedPlan.executeCollectPublic())
-    case _: DescribeTableExec =>
-      formatDescribeTableOutput(executedPlan.executeCollectPublic())
-    // SHOW TABLES in Hive only output table names while our v1 command outputs
-    // database, table name, isTemp.
-    case command @ ExecutedCommandExec(s: ShowTablesCommand) if !s.isExtended =>
-      command.executeCollect().map(_.getString(1))
-    // SHOW TABLES in Hive only output table names while our v2 command outputs
-    // namespace and table name.
-    case command : ShowTablesExec =>
-      command.executeCollect().map(_.getString(1))
-    // SHOW VIEWS in Hive only outputs view names while our v1 command outputs
-    // namespace, viewName, and isTemporary.
-    case command @ ExecutedCommandExec(_: ShowViewsCommand) =>
-      command.executeCollect().map(_.getString(1))
-    case other =>
-      val result: Seq[Seq[Any]] = other.executeCollectPublic().map(_.toSeq).toSeq
-      // We need the types so we can output struct field names
-      val types = executedPlan.output.map(_.dataType)
-      // Reformat to match hive tab delimited output.
-      result.map(_.zip(types).map(e => toHiveString(e)))
Review comment:
- The date literal '2020-06-03' (and `make_date(2020, 6, 3)`) is converted to the number of days since the epoch '1970-01-01'. The result is 18416, and it does not depend on the time zone. You get the same via the Java 8 API:
```scala
scala> import java.time.LocalDate
scala> println(LocalDate.of(2020, 6, 3).toEpochDay)
18416
```
Spark stores this number as the date value internally.
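In the same REPL session, the reverse conversion recovers the date, again independently of the time zone:
```scala
scala> println(LocalDate.ofEpochDay(18416))
2020-06-03
```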
- To print it out, we have to collect the value and convert it to a string. The following steps describe the case when the Java 8 API is off (`spark.sql.datetime.java8API.enabled` is `false`):
1. The days are converted to `java.sql.Date` by `toJavaDate()`, which is called from:
   https://github.com/apache/spark/blob/b917a6593dc969b9b766259eb8cbbd6e90e0dc53/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala#L306-L309
2. `toJavaDate()` has to create an instance of `java.sql.Date` from milliseconds since the epoch 1970-01-01 00:00:00Z in the UTC time zone. It converts the 18416 days to milliseconds via 18416 * 86400000 and gets 1591142400000.
3. 1591142400000 is interpreted as local milliseconds in the JVM time zone `Europe/Moscow`, which has a wall-clock offset of 10800000 millis (3 hours). So, 1591142400000 is shifted back by 10800000 to get the "UTC timestamp". The result is 1591131600000, which is:
   - `2020-06-02T21:00:00` in UTC
   - `2020-06-03T00:00:00` in Europe/Moscow
   - `2020-06-02T14:00:00` in America/Los_Angeles
4. `new Date(1591131600000)` is collected and formatted in `toHiveString` by the legacy date formatter. Currently, the legacy date formatter ignores the Spark session time zone `America/Los_Angeles` and uses the JVM time zone `Europe/Moscow`. In this way, it converts `new Date(1591131600000)` = `2020-06-03T00:00:00` in Europe/Moscow to `2020-06-03`. That looks fine, but after the PR https://github.com/apache/spark/pull/28709, the formatter takes `America/Los_Angeles` and converts `2020-06-02T14:00:00 America/Los_Angeles` to `2020-06-02` (see the sketch after this list).
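To make the shifting concrete, here is a small Scala sketch of steps 2 and 3. `legacyDaysToJavaDate` is a hypothetical helper that mirrors the behavior described above, not the actual `toJavaDate()` source:
```scala
import java.sql.Date
import java.util.TimeZone

// Interpret epoch days as local milliseconds, then shift them by the JVM
// time zone offset, as the legacy conversion described above does.
// (Resolving the offset at localMillis is a simplification.)
def legacyDaysToJavaDate(days: Int): Date = {
  val localMillis = days.toLong * 86400000L               // 18416 -> 1591142400000
  val offset = TimeZone.getDefault.getOffset(localMillis) // Europe/Moscow -> 10800000
  new Date(localMillis - offset)                          // -> 1591131600000
}

// With -Duser.timezone=Europe/Moscow:
println(legacyDaysToJavaDate(18416).getTime) // 1591131600000
println(legacyDaysToJavaDate(18416))         // 2020-06-03 (toString uses the JVM tz)
```
Formatting the same instant with `America/Los_Angeles` instead of the JVM time zone gives `2020-06-02`, which is exactly the regression in step 4.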
So, the problem is in `toJavaDate()`, which still uses the default JVM time zone.
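If `toJavaDate()` built the `java.sql.Date` from calendar fields instead of shifted milliseconds, the result would not depend on the JVM time zone. A minimal sketch of such a conversion (an illustration, not necessarily the fix Spark adopts):
```scala
import java.sql.Date
import java.time.LocalDate

// Build the java.sql.Date from the local date components, so it prints
// as 2020-06-03 regardless of the JVM time zone.
def daysToJavaDate(days: Int): Date = Date.valueOf(LocalDate.ofEpochDay(days))

println(daysToJavaDate(18416)) // 2020-06-03
```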