[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-05-09 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16781 Did we conduct any performance tests on this patch? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-05-08 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/16781 great! thanks @ueshin --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-05-07 Thread ueshin
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/16781 Thanks! Merging to master. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-05-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16781 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76556/ Test PASSed. ---

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-05-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16781 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-05-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16781 **[Test build #76556 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76556/testReport)** for PR 16781 at commit

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-05-07 Thread ueshin
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/16781 LGTM, pending Jenkins. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-05-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16781 **[Test build #76556 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76556/testReport)** for PR 16781 at commit

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-05-07 Thread ueshin
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/16781 Jenkins, retest this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-05-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16781 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-05-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16781 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76419/ Test PASSed. ---

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-05-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16781 **[Test build #76419 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76419/testReport)** for PR 16781 at commit

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-05-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16781 **[Test build #76419 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76419/testReport)** for PR 16781 at commit

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-05-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16781 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-05-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16781 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76391/ Test PASSed. ---

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-05-02 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16781 **[Test build #76391 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76391/testReport)** for PR 16781 at commit

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-05-02 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/16781 @ueshin sorry it took me a while to figure out how a table partitioned by timestamps work (I didn't even realize that was possible, I don't think it is in hive?) and I was traveling. The

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-05-02 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16781 **[Test build #76391 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76391/testReport)** for PR 16781 at commit

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16781 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76141/ Test PASSed. ---

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16781 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-04-25 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16781 **[Test build #76141 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76141/testReport)** for PR 16781 at commit

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-04-25 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/16781 @ueshin updated per your feedback. I should have explained that the last update *did* handle partition tables (it added the second call to `getStorageTzOptions` in `HiveMetastoreCatalog`),

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-04-25 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16781 **[Test build #76141 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76141/testReport)** for PR 16781 at commit

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-04-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16781 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75966/ Test PASSed. ---

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-04-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16781 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-04-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16781 **[Test build #75966 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75966/testReport)** for PR 16781 at commit

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-04-19 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/16781 @ueshin I've pushed an update which addresses your comments. I also realized that partitioned tables weren't handled correctly! I fixed that as well. --- If your project is set up for it, you

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-04-19 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16781 **[Test build #75966 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75966/testReport)** for PR 16781 at commit

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-04-18 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/16781 @ueshin thanks for taking a look. Yes, that understanding is correct. Another way to think about it is to compare those same operations with different file formats, eg. textfile. Those work more

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-04-18 Thread ueshin
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/16781 @squito I'm sorry for late reply. I was reviewing this pr again and current Hive behavior deeply. It seems the both behaviors are the same when the table property exists, but this is a

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-04-11 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/16781 @ueshin thanks for taking a look earlier, sorry it has taken me some time to update this. Things to note since last time: 1) Hive has seen been updated in

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-04-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16781 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-04-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16781 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75686/ Test PASSed. ---

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-04-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16781 **[Test build #75686 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75686/testReport)** for PR 16781 at commit

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-04-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16781 **[Test build #75686 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75686/testReport)** for PR 16781 at commit

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-03-29 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/16781 thanks @ueshin ... I am going to chat w/ some folks involved in that hive patch, that was not my understanding conceptually of their patch. I heard that there is a bug they need to fix so maybe its

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-03-27 Thread ueshin
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/16781 Btw, I noticed that if we decided to use `SQLConf.PARQUET_TABLE_INCLUDE_TIMEZONE`, the default `PARQUET_TIMEZONE_TABLE_PROPERTY` value should be `GMT` instead of session local timezone as Hive

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-03-27 Thread ueshin
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/16781 I looked over this pr and I noticed that you're using the `TimeZone.getDefault()` as the "local" timezone. Since the timestamp values are represented as "the number of micros since epoch"

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-03-27 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/16781 @ueshin I've updated this to remove the conf entirely. So updating your previous description, the new behavior can be described as: when creating table: * if a property

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16781 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75177/ Test PASSed. ---

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16781 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-03-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16781 **[Test build #75177 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75177/testReport)** for PR 16781 at commit

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-03-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16781 **[Test build #75177 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75177/testReport)** for PR 16781 at commit

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-03-23 Thread ueshin
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/16781 Ah, I see, it looks good to me for now. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-03-23 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/16781 > Do you mean the default SQLConf.PARQUET_TABLE_INCLUDE_TIMEZONE should be false for now? If so, I agree with it. I was proposing something more drastic -- eliminating that conf

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16781 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75101/ Test PASSed. ---

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16781 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-03-23 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16781 **[Test build #75101 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75101/testReport)** for PR 16781 at commit

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-03-23 Thread ueshin
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/16781 Do you mean the default `SQLConf.PARQUET_TABLE_INCLUDE_TIMEZONE` should be `false` for now? If so, I agree with it. --- If your project is set up for it, you can reply to this email and have

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-03-23 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16781 **[Test build #75101 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75101/testReport)** for PR 16781 at commit

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-03-23 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/16781 @ueshin thanks for taking a look. Yes, I think your summary is exactly correct. Your suggestion of using the session timezone makes a lot of sense. I can update the pr accordingly. But I

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-03-21 Thread ueshin
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/16781 I checked HIVE-12767 and reviewed this pr roughly. And I collected my thoughts about the desired behavior of this issue, please correct me if I'm wrong. when creating table: -

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-03-17 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/16781 pinging some potential reviewers: @liancheng @yhuai for prior work in hive compatibility @ueshin for work on timezone support Note that the timezone support from SPARK-18350 is to

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-03-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16781 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-03-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16781 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74688/ Test PASSed. ---

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-03-16 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16781 **[Test build #74688 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74688/testReport)** for PR 16781 at commit

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-03-16 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16781 **[Test build #74688 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74688/testReport)** for PR 16781 at commit

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-03-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16781 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74611/ Test PASSed. ---

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-03-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16781 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-03-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16781 **[Test build #74611 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74611/testReport)** for PR 16781 at commit