[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/3820#discussion_r30359968

--- Diff: sql/core/pom.xml ---
@@ -69,6 +69,11 @@
       <version>2.3.0</version>
     </dependency>
     <dependency>
+      <groupId>org.jodd</groupId>
+      <artifactId>jodd-core</artifactId>
+      <version>${jodd.version}</version>
+    </dependency>
--- End diff --

FYI most systems don't support leap seconds. I'm not sure why we'd want to support them here...

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/3820#discussion_r30381187

--- Diff: sql/core/pom.xml ---
@@ -69,6 +69,11 @@
       <version>2.3.0</version>
     </dependency>
     <dependency>
+      <groupId>org.jodd</groupId>
+      <artifactId>jodd-core</artifactId>
+      <version>${jodd.version}</version>
+    </dependency>
--- End diff --

This mainly aims to make sure we are compatible with Hive.
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/3820#discussion_r30381621

--- Diff: sql/core/pom.xml ---
@@ -69,6 +69,11 @@
       <version>2.3.0</version>
     </dependency>
     <dependency>
+      <groupId>org.jodd</groupId>
+      <artifactId>jodd-core</artifactId>
+      <version>${jodd.version}</version>
+    </dependency>
--- End diff --

Does Hive support leap seconds? I looked into the implementation of jodd -- I don't think it supports that when doing date timestamp conversion. I could be wrong though.
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/3820#discussion_r30382532

--- Diff: sql/core/pom.xml ---
@@ -69,6 +69,11 @@
       <version>2.3.0</version>
     </dependency>
     <dependency>
+      <groupId>org.jodd</groupId>
+      <artifactId>jodd-core</artifactId>
+      <version>${jodd.version}</version>
+    </dependency>
--- End diff --

If we are going to do this conversion ourselves, I think it is fine...
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-72611319

[Test build #26622 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26622/consoleFull) for PR 3820 at commit [`b1e2a0d`](https://github.com/apache/spark/commit/b1e2a0d8b40f6651a0a2b36cdc9070e67e9d6bf3).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3820
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-72724023 Thanks! Merged to master.
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-72573565

I tried the newly uploaded Parquet data in https://issues.apache.org/jira/browse/SPARK-4768 (with my timezone set to UTC). For one row, I got
```
[test row 5,2015-01-02 20:54:10.000456789]
```
But the data was generated by
```
insert into string_timestamp (dummy,timestamp1) values('test row 5', '2015-01-02 20:54:10.123456789');
```
Can you take a look?
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-72573027 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26568/ Test FAILed.
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-72573025

[Test build #26568 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26568/consoleFull) for PR 3820 at commit [`5152f2a`](https://github.com/apache/spark/commit/5152f2ab02a4d27be0a5e540f73aab166774fe2e).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-72572992 I just rebased my code and upgraded jodd to 3.6.3.
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/3820#discussion_r23972062

--- Diff: pom.xml ---
@@ -149,6 +149,7 @@
     <scala.binary.version>2.10</scala.binary.version>
     <jline.version>${scala.version}</jline.version>
     <jline.groupid>org.scala-lang</jline.groupid>
+    <jodd.version>3.5.2</jodd.version>
--- End diff --

Any reason not to use the latest version (3.6.3)?
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-72567388 Dependency looks fine to me, thanks for running it by.
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-72572873

[Test build #26568 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26568/consoleFull) for PR 3820 at commit [`5152f2a`](https://github.com/apache/spark/commit/5152f2ab02a4d27be0a5e540f73aab166774fe2e).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/3820#discussion_r23972716

--- Diff: pom.xml ---
@@ -149,6 +149,7 @@
     <scala.binary.version>2.10</scala.binary.version>
     <jline.version>${scala.version}</jline.version>
     <jline.groupid>org.scala-lang</jline.groupid>
+    <jodd.version>3.5.2</jodd.version>
--- End diff --

To stay aligned with Hive 0.14.0; it is not strictly necessary, though.
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-72594576 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26597/ Test FAILed.
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-72588533

[Test build #26583 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26583/consoleFull) for PR 3820 at commit [`b1e2a0d`](https://github.com/apache/spark/commit/b1e2a0d8b40f6651a0a2b36cdc9070e67e9d6bf3).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class LogisticGradient(numClasses: Int) extends Gradient `
  * `case class HiveScriptIOSchema (`
  * ` val trimed_class = serdeClassName.split(')(1)`
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-72588539 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26583/ Test FAILed.
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-72594570

[Test build #26597 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26597/consoleFull) for PR 3820 at commit [`b1e2a0d`](https://github.com/apache/spark/commit/b1e2a0d8b40f6651a0a2b36cdc9070e67e9d6bf3).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-72582056

[Test build #26583 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26583/consoleFull) for PR 3820 at commit [`b1e2a0d`](https://github.com/apache/spark/commit/b1e2a0d8b40f6651a0a2b36cdc9070e67e9d6bf3).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-72581952 @yhuai Sorry, I misunderstood the setNanos API; it works OK now.
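For readers following the `setNanos` remark: the thread does not show the faulty code, so the following is only a sketch of one way to produce the reported symptom (`.000456789` instead of `.123456789`). `java.sql.Timestamp.setNanos` replaces the entire fractional second rather than adding to the milliseconds already stored, so passing only the sub-millisecond remainder drops the `123` milliseconds. The class and method names here are hypothetical and are not taken from the PR.

```java
import java.sql.Timestamp;

// Hypothetical illustration; not the code from PR 3820.
final class SetNanosSketch {

    // Mistaken variant: assumes setNanos adds sub-millisecond nanos
    // on top of the milliseconds passed to the constructor.
    static Timestamp fromMillisAndRemainder(long epochMillis, int subMilliNanos) {
        Timestamp ts = new Timestamp(epochMillis);
        ts.setNanos(subMilliNanos); // overwrites the whole fractional second
        return ts;
    }

    // Working variant: construct from whole seconds, then set the full
    // nanosecond-of-second value (0..999999999).
    static Timestamp fromSecondsAndNanos(long epochSeconds, int nanosOfSecond) {
        Timestamp ts = new Timestamp(epochSeconds * 1000L);
        ts.setNanos(nanosOfSecond);
        return ts;
    }

    public static void main(String[] args) {
        // 123 ms plus a 456789 ns remainder, handled the mistaken way
        System.out.println(fromMillisAndRemainder(123L, 456_789).getNanos());
        // The same instant built from the full nanosecond field
        System.out.println(fromSecondsAndNanos(0L, 123_456_789).getNanos());
    }
}
```

The comparison uses `getNanos()` rather than `toString()` so the result does not depend on the JVM's default time zone.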
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-72601905 Jenkins, test this please
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-72588631 Jenkins, test this please
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-72588700

[Test build #26597 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26597/consoleFull) for PR 3820 at commit [`b1e2a0d`](https://github.com/apache/spark/commit/b1e2a0d8b40f6651a0a2b36cdc9070e67e9d6bf3).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-72602402

[Test build #26622 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26622/consoleFull) for PR 3820 at commit [`b1e2a0d`](https://github.com/apache/spark/commit/b1e2a0d8b40f6651a0a2b36cdc9070e67e9d6bf3).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-71797016 ping
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-71004690 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25963/ Test PASSed.
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-71004681

[Test build #25963 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25963/consoleFull) for PR 3820 at commit [`5d1eeed`](https://github.com/apache/spark/commit/5d1eeedd43d60d9ef9c5dcc0e97fff829ccea8ed).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-70992950 I have fixed the bug. This is quite embarrassing: I forgot to make those factors (NANOS_PER_SECOND, SECONDS_PER_MINUTE, MINUTES_PER_HOUR) Long when dividing, so the computation overflowed... I have tested it, and now it works fine.
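The overflow described in this message can be sketched in isolation. The constant names come from the comment above; the surrounding methods are hypothetical, since the PR's actual conversion code is not quoted in the thread. When all three factors are 32-bit integers, their product (3.6e12) wraps around before the division happens, even though the dividend is a 64-bit value.

```java
// Hypothetical illustration of the Int-vs-Long overflow; not the PR's code.
final class NanosOverflowSketch {
    static final int NANOS_PER_SECOND = 1_000_000_000;
    static final int SECONDS_PER_MINUTE = 60;
    static final int MINUTES_PER_HOUR = 60;

    // Buggy: the divisor is computed in 32-bit arithmetic and wraps around.
    static long hourOfDayBuggy(long nanosOfDay) {
        return nanosOfDay / (NANOS_PER_SECOND * SECONDS_PER_MINUTE * MINUTES_PER_HOUR);
    }

    // Fixed: widening the first factor makes the whole product 64-bit.
    static long hourOfDayFixed(long nanosOfDay) {
        return nanosOfDay / ((long) NANOS_PER_SECOND * SECONDS_PER_MINUTE * MINUTES_PER_HOUR);
    }

    public static void main(String[] args) {
        // 20:54:10 expressed as nanoseconds since midnight
        long nanosOfDay = (20L * 3600 + 54 * 60 + 10) * 1_000_000_000L;
        System.out.println("buggy hour: " + hourOfDayBuggy(nanosOfDay));
        System.out.println("fixed hour: " + hourOfDayFixed(nanosOfDay));
    }
}
```

Declaring the constants as `long` in the first place would avoid the cast entirely, which appears to be the fix the comment describes.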
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-70992996

[Test build #25963 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25963/consoleFull) for PR 3820 at commit [`5d1eeed`](https://github.com/apache/spark/commit/5d1eeedd43d60d9ef9c5dcc0e97fff829ccea8ed).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user felixcheung commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-70904105

Good to hear. Here's how I create my test data. I run this in Hive and then take the data from HDFS directly; Spark is able to read/parse the data file (with the issue above):

    Set parquet.compression = SNAPPY;
    DROP TABLE testdata;
    CREATE TABLE testdata STORED AS PARQUET AS
      SELECT a.*, from_utc_timestamp('1970-01-01 08:00:00','PST') as timestamp
      FROM sample_07 AS a;

I have looked into this a fair bit and attempted a fix. Thanks for working on this fix, and let me know if I can help in any way.
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/3820#discussion_r23207875

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/parquet/ParquetQuerySuite.scala ---
@@ -294,14 +294,14 @@ class ParquetQuerySuite extends QueryTest with FunSuiteLike with BeforeAndAfterA
     // Check to make sure that the attributes from either side of the join have unique expression
     // ids.
     query.queryExecution.analyzed.output.filter(_.name == "myint") match {
-      case Seq(i1, i2) if(i1.exprId == i2.exprId) =>
+      case Seq(i1, i2) if i1.exprId == i2.exprId =>
--- End diff --

Just a reminder: `ParquetQuerySuite` is considered deprecated, and I'm going to remove it in #4116. I should have added a comment to make this clear. I didn't remove it at first because several Parquet PRs were open and removing it might have caused unwanted merge conflicts.
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/3820#discussion_r23207993

--- Diff: docs/sql-programming-guide.md ---
@@ -581,6 +581,15 @@ Configuration of Parquet can be done using the `setConf` method on SQLContext or
   </td>
 </tr>
 <tr>
+  <td><code>spark.sql.parquet.int96AsTimestamp</code></td>
+  <td>true</td>
+  <td>
+    Some Parquet-producing systems, in particular Impala, store Timestamp into INT96. Spark would also
+    store Timestamp as INT96 because we need to avoid precision lost of the nanoseconds field. This
+    flag tells Spark SQL to interpret INT96 data as a timestamp to provide compatibility with these systems.
+  </td>
+</tr>
--- End diff --

I'm probably OK with this property. Just to ask: doesn't Impala store the original type information (TIMESTAMP) alongside the data in the Parquet metadata?
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/3820#discussion_r23256480

--- Diff: docs/sql-programming-guide.md ---
@@ -581,6 +581,15 @@ Configuration of Parquet can be done using the `setConf` method on SQLContext or
   </td>
 </tr>
 <tr>
+  <td><code>spark.sql.parquet.int96AsTimestamp</code></td>
+  <td>true</td>
+  <td>
+    Some Parquet-producing systems, in particular Impala, store Timestamp into INT96. Spark would also
+    store Timestamp as INT96 because we need to avoid precision lost of the nanoseconds field. This
+    flag tells Spark SQL to interpret INT96 data as a timestamp to provide compatibility with these systems.
+  </td>
+</tr>
--- End diff --

From my digging, only parquet-format 2.2 has the TIMESTAMP and TIMESTAMP_MILLIS types. Cloudera is still on 1.5.0. Hive/Impala have been writing this INT96 nanosecond format, which is different.

--- Original Message ---
From: Michael Armbrust notificati...@github.com
Sent: January 20, 2015 11:25 AM
To: apache/spark sp...@noreply.github.com
Cc: Felix Cheung felixcheun...@hotmail.com
Subject: Re: [spark] [SPARK-4987] [SQL] parquet timestamp type support (#3820)

Yeah, I agree that it's weird though. Perhaps we should ask the parquet list why they don't support the int 96 version.

On Jan 20, 2015 11:21 AM, Cheng Lian notificati...@github.com wrote:

In docs/sql-programming-guide.md https://github.com/apache/spark/pull/3820#discussion-diff-23247492:

Oh, I see the difference here. Double checked, Parquet only provides TIMESTAMP and TIMESTAMP_MILLIS
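As background for the INT96 discussion above, here is a minimal sketch of the layout that Hive and Impala are generally understood to use: 12 little-endian bytes, with the first 8 holding nanoseconds within the day and the last 4 holding a Julian day number. That layout, the epoch constant 2440588 (the Julian day number conventionally assigned to 1970-01-01), and all names below are assumptions for illustration; none of them is stated in this thread, and this is not the PR's implementation.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch of an INT96 timestamp codec under the assumptions stated above.
final class Int96TimestampSketch {
    static final long JULIAN_DAY_OF_EPOCH = 2_440_588L; // assumed value for 1970-01-01
    static final long SECONDS_PER_DAY = 86_400L;
    static final long NANOS_PER_SECOND = 1_000_000_000L;

    // Packs (epochSeconds, nanosOfSecond) into 12 little-endian bytes.
    static byte[] encode(long epochSeconds, int nanosOfSecond) {
        long julianDay = Math.floorDiv(epochSeconds, SECONDS_PER_DAY) + JULIAN_DAY_OF_EPOCH;
        long nanosOfDay = Math.floorMod(epochSeconds, SECONDS_PER_DAY) * NANOS_PER_SECOND
                + nanosOfSecond;
        return ByteBuffer.allocate(12)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putLong(nanosOfDay)      // bytes 0..7: nanoseconds since midnight
                .putInt((int) julianDay)  // bytes 8..11: Julian day number
                .array();
    }

    // Unpacks 12 bytes into {epochSeconds, nanosOfSecond}.
    static long[] decode(byte[] int96) {
        ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong();
        long julianDay = buf.getInt();
        long epochSeconds = (julianDay - JULIAN_DAY_OF_EPOCH) * SECONDS_PER_DAY
                + nanosOfDay / NANOS_PER_SECOND;
        return new long[] { epochSeconds, nanosOfDay % NANOS_PER_SECOND };
    }

    public static void main(String[] args) {
        // 2015-01-02 20:54:10.123456789 UTC, the value quoted elsewhere in this thread
        long[] ts = decode(encode(1_420_232_050L, 123_456_789));
        System.out.println(ts[0] + " s, " + ts[1] + " ns");
    }
}
```

Keeping the nanosecond field in a 64-bit value throughout is what preserves all nine fractional digits, which is the precision concern the documentation text in the diff refers to.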
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-70791893 @adrian-wang Can you try the parquet file uploaded in https://issues.apache.org/jira/browse/SPARK-4768? Thanks!
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-70789269 @felixcheung I have tested my code with the Hive 0.14.0 release. I first tried to generate data with Hive, but Spark needs metadata to read Hive's data. So I generated data using Spark, saved it as a Parquet file, copied the saved file from my HDFS to my Hive warehouse dir, and executed create table in the Hive CLI. Then I used select to read the data, and it worked fine. How did you read data from Hive? Can you show me how to reproduce this?
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-70795739 @yhuai Thanks for the link. I'm afraid the attachment contains only the last row, in which the timestamp field is null. Anyway, I have reproduced the bug @felixcheung mentioned by reading my Hive-generated Parquet records. It's strange that Hive can read this PR's output correctly but this PR cannot read Hive's output correctly... I'll look into that.
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/3820#discussion_r23247492 --- Diff: docs/sql-programming-guide.md --- @@ -581,6 +581,15 @@ Configuration of Parquet can be done using the `setConf` method on SQLContext or </td> </tr> <tr> + <td><code>spark.sql.parquet.int96AsTimestamp</code></td> + <td>true</td> + <td> +Some Parquet-producing systems, in particular Impala, store Timestamp into INT96. Spark would also +store Timestamp as INT96 because we need to avoid precision lost of the nanoseconds field. This +flag tells Spark SQL to interpret INT96 data as a timestamp to provide compatibility with these systems. + </td> +</tr> --- End diff -- Oh, I see the difference here. Double checked, Parquet only provides TIMESTAMP and TIMESTAMP_MILLIS.
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/3820#discussion_r23246274 --- Diff: docs/sql-programming-guide.md --- @@ -581,6 +581,15 @@ Configuration of Parquet can be done using the `setConf` method on SQLContext or </td> </tr> <tr> + <td><code>spark.sql.parquet.int96AsTimestamp</code></td> + <td>true</td> + <td> +Some Parquet-producing systems, in particular Impala, store Timestamp into INT96. Spark would also +store Timestamp as INT96 because we need to avoid precision lost of the nanoseconds field. This +flag tells Spark SQL to interpret INT96 data as a timestamp to provide compatibility with these systems. + </td> +</tr> --- End diff -- Do they? As far as I can tell, the Parquet spec does not have a nanosecond-precision timestamp type.
On Jan 20, 2015 12:26 AM, Cheng Lian notificati...@github.com wrote: In docs/sql-programming-guide.md https://github.com/apache/spark/pull/3820#discussion-diff-23207993: I'm probably OK with this property. Just wanna ask: doesn't Impala store the original type information (TIMESTAMP) together in the Parquet metainfo?
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/3820#discussion_r23247815 --- Diff: docs/sql-programming-guide.md --- @@ -581,6 +581,15 @@ Configuration of Parquet can be done using the `setConf` method on SQLContext or </td> </tr> <tr> + <td><code>spark.sql.parquet.int96AsTimestamp</code></td> + <td>true</td> + <td> +Some Parquet-producing systems, in particular Impala, store Timestamp into INT96. Spark would also +store Timestamp as INT96 because we need to avoid precision lost of the nanoseconds field. This +flag tells Spark SQL to interpret INT96 data as a timestamp to provide compatibility with these systems. + </td> +</tr> --- End diff -- Yeah, I agree that it's weird though. Perhaps we should ask the parquet list why they don't support the int 96 version.
On Jan 20, 2015 11:21 AM, Cheng Lian notificati...@github.com wrote: In docs/sql-programming-guide.md https://github.com/apache/spark/pull/3820#discussion-diff-23247492: Oh, I see the difference here. Double checked, Parquet only provides TIMESTAMP and TIMESTAMP_MILLIS.
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user felixcheung commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-70450284
I've tested this PR but the result seems to be off. Parquet generated from Hive with timestamp values set by 'from_utc_timestamp('1970-01-01 08:00:00','PST')'. What I see with this PR:
scala> t.take(10).foreach(println(_))
...
15/01/18 22:06:41 INFO NewHadoopRDD: Input split: ParquetInputSplit{part: file:/users/x/parquetwithtimestamp start: 0 end: 25448 length: 25448 hosts: [] requestedSchema: message root { optional binary code (UTF8); optional binary description (UTF8); optional int32 total_emp; optional int32 salary; optional int96 timestamp; } readSupportMetadata: {org.apache.spark.sql.parquet.row.metadata={type:struct,fields:[{name:code,type:string,nullable:true,metadata:{}},{name:description,type:string,nullable:true,metadata:{}},{name:total_emp,type:integer,nullable:true,metadata:{}},{name:salary,type:integer,nullable:true,metadata:{}},{name:timestamp,type:timestamp,nullable:true,metadata:{}}]}, org.apache.spark.sql.parquet.row.requested_schema={type:struct,fields:[{name:code,type:string,nullable:true,metadata:{}},{name:description,type:string,nullable:true,metadata:{}},{name:total_emp,type:integer,nullable:true,metadata:{}},{name:salary,type:integer,nullable:true,metadata:{}},{name:timestamp,type:timestamp,nullable:true,metadata:{}}]}}}
15/01/18 22:06:41 WARN ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
15/01/18 22:06:41 INFO InternalParquetRecordReader: RecordReader initialized will read a total of 823 records.
15/01/18 22:06:41 INFO InternalParquetRecordReader: at row 0. reading next block
15/01/18 22:06:41 INFO CodecPool: Got brand-new decompressor [.snappy]
15/01/18 22:06:41 INFO InternalParquetRecordReader: block read in memory in 27 ms. row count = 823
[00-,All Occupations,134354250,40690,1974-01-07 17:58:00.08896]
[11-,Management occupations,6003930,96150,1974-01-07 17:58:00.08896]
Expect: 1970-01-01 08:00:00
Actual: 1974-01-07 17:58:00.08896
Any idea?
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-70052102 [Test build #25598 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25598/consoleFull) for PR 3820 at commit [`a309058`](https://github.com/apache/spark/commit/a309058a618feb86813b4ac0df087e5447b37485). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-70053453 Thanks Yin, I have fixed that.
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-70061688 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25598/
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-70061677 [Test build #25598 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25598/consoleFull) for PR 3820 at commit [`a309058`](https://github.com/apache/spark/commit/a309058a618feb86813b4ac0df087e5447b37485). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class ExperimentalMethods protected[sql](sqlContext: SQLContext) `
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-69703883 Can you add unit tests? I tried it and got
```
java.lang.RuntimeException: unable to convert datatype TimestampType in CatalystConverter
```
I think you need to update `org.apache.spark.sql.parquet.CatalystConverter#createConverter`.
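To illustrate the kind of failure reported above, here is a minimal sketch of a type-dispatching factory that falls through to an error when a data type has no branch. It is not Spark's actual `CatalystConverter`: the `DataType` hierarchy, the `timestampSupported` switch, and the string return values are all hypothetical stand-ins; only the error message is taken from the comment.

```scala
object ConverterSketch {
  // Hypothetical, simplified stand-ins for Spark SQL data types.
  sealed trait DataType
  case object StringType extends DataType
  case object IntegerType extends DataType
  case object TimestampType extends DataType

  // Returns a description of the converter that would be created.
  // `timestampSupported` models whether the TimestampType branch has been added.
  def createConverter(dataType: DataType, timestampSupported: Boolean): String =
    dataType match {
      case StringType  => "string converter"
      case IntegerType => "int converter"
      case TimestampType if timestampSupported => "INT96 timestamp converter"
      // Any type without a dedicated branch ends up here.
      case other =>
        throw new RuntimeException(
          s"unable to convert datatype $other in CatalystConverter")
    }
}
```

With the timestamp branch missing, a `TimestampType` column reaches the fallback case and produces the message quoted in the comment, which is why a unit test that reads a timestamp column back would have caught it.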
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/3820#discussion_r22773648 --- Diff: sql/core/pom.xml --- @@ -69,6 +69,11 @@ <version>2.3.0</version> </dependency> <dependency> + <groupId>org.jodd</groupId> + <artifactId>jodd-core</artifactId> + <version>${jodd.version}</version> +</dependency> --- End diff -- I have read the Joda-Time documentation. The toJulianDayNumber API is only available since 2.2, but Spark uses 2.1.
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/3820#discussion_r22762553 --- Diff: sql/core/pom.xml --- @@ -69,6 +69,11 @@ <version>2.3.0</version> </dependency> <dependency> + <groupId>org.jodd</groupId> + <artifactId>jodd-core</artifactId> + <version>${jodd.version}</version> +</dependency> --- End diff -- Okay, I talked to @pwendell and we think it would be better to use Joda time if possible since spark already depends on that library in other subprojects. What do you think?
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/3820#discussion_r22743372 --- Diff: sql/core/pom.xml --- @@ -69,6 +69,11 @@ <version>2.3.0</version> </dependency> <dependency> + <groupId>org.jodd</groupId> + <artifactId>jodd-core</artifactId> + <version>${jodd.version}</version> +</dependency> --- End diff -- Okay, you are right that it's a bad idea to do this by hand. Are there any dependencies that Spark SQL already has that could be used instead?
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-69150651 [Test build #25211 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25211/consoleFull) for PR 3820 at commit [`8526a33`](https://github.com/apache/spark/commit/8526a33d254fc4181c37b7e4b7976d1d48b8eaa7). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/3820#discussion_r22640517 --- Diff: sql/core/pom.xml --- @@ -69,6 +69,11 @@ <version>2.3.0</version> </dependency> <dependency> + <groupId>org.jodd</groupId> + <artifactId>jodd-core</artifactId> + <version>${jodd.version}</version> +</dependency> --- End diff -- Writing it by hand may be dangerous because of leap seconds [https://en.wikipedia.org/wiki/Leap_second], as the specific date could be inaccurate. Checking the pom [http://repo1.maven.org/maven2/org/jodd/jodd-core/3.5.2/jodd-core-3.5.2.pom] of jodd-core, there are no additional dependencies, so the impact is comparatively small. Using jodd to convert also keeps everything consistent with Hive, so we can be 100% compatible with data generated by Hive. So I'd prefer to keep this.
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-69151953 The [NanoTime in Hive-0.14](https://github.com/apache/hive/blob/f64d707e4d59aafbcf7e9b82ed199f6781d97946/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTime.java) is a little bit different from NanoTime in parquet-examples, and here are some [related discussions](https://issues.apache.org/jira/browse/HIVE-7263?focusedCommentId=14038502&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14038502). So I just rewrote Hive's `NanoTime` in Scala instead of using NanoTime from parquet-examples.
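For readers unfamiliar with the INT96 representation being ported here, the following is a rough sketch of what a Scala `NanoTime` might look like. It is not the code from this PR. It assumes the layout commonly described for Hive and Impala: 12 little-endian bytes, with 8 bytes of nanoseconds within the day followed by 4 bytes of Julian day. The object and method names are hypothetical.

```scala
import java.nio.{ByteBuffer, ByteOrder}

object NanoTimeSketch {
  // Hypothetical stand-in for Hive's NanoTime: a Julian day plus nanoseconds within that day.
  final case class NanoTime(julianDay: Int, timeOfDayNanos: Long)

  // Assumed layout: 8 bytes nanos-of-day, then 4 bytes Julian day, both little-endian.
  def toBinary(t: NanoTime): Array[Byte] = {
    val buf = ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN)
    buf.putLong(t.timeOfDayNanos)
    buf.putInt(t.julianDay)
    buf.array()
  }

  // Reads the two fields back in the same order they were written.
  def fromBinary(bytes: Array[Byte]): NanoTime = {
    require(bytes.length == 12, s"INT96 needs 12 bytes, got ${bytes.length}")
    val buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN)
    val nanos = buf.getLong()
    val day = buf.getInt()
    NanoTime(day, nanos)
  }
}
```

A reader and a writer that disagree on byte order or field order would still round-trip their own data, which is one reason cross-checking against files written by Hive itself matters.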
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-69160063 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25211/
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-69160056 [Test build #25211 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25211/consoleFull) for PR 3820 at commit [`8526a33`](https://github.com/apache/spark/commit/8526a33d254fc4181c37b7e4b7976d1d48b8eaa7). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-69003778 [Test build #25161 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25161/consoleFull) for PR 3820 at commit [`5cb8f97`](https://github.com/apache/spark/commit/5cb8f97da9317e5106a0f6fb4214a902f952a1ae). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-69013582 [Test build #25161 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25161/consoleFull) for PR 3820 at commit [`5cb8f97`](https://github.com/apache/spark/commit/5cb8f97da9317e5106a0f6fb4214a902f952a1ae). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-69013592 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25161/
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/3820#discussion_r22636915 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala --- @@ -138,7 +139,13 @@ private[sql] trait SQLConf { * When set to true, we always treat byte arrays in Parquet files as strings. */ private[spark] def isParquetBinaryAsString: Boolean = -getConf(PARQUET_BINARY_AS_STRING, false).toBoolean +getConf(PARQUET_BINARY_AS_STRING, true).toBoolean --- End diff -- Oh, this is a mistake... I'll fix it in the next commit. What I actually meant to change is the value that follows it. Thanks!
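The effect of the accidental change is easier to see in a stripped-down model of the configuration trait. The sketch below is a hypothetical `MiniSqlConf`, not Spark's `SQLConf`; it assumes the keys `spark.sql.parquet.binaryAsString` and `spark.sql.parquet.int96AsTimestamp` and uses the second argument of `getConf` as the default for unset keys, as in the diff.

```scala
import scala.collection.mutable

// Hypothetical, simplified model of a string-keyed SQL configuration.
class MiniSqlConf {
  private val settings = mutable.Map.empty[String, String]

  def setConf(key: String, value: String): Unit = settings(key) = value

  // Returns the stored value, or `default` when the key was never set.
  def getConf(key: String, default: String): String =
    settings.getOrElse(key, default)

  // The default stays "false": flipping it would silently change how every
  // existing binary column is read when the user has set nothing.
  def isParquetBinaryAsString: Boolean =
    getConf("spark.sql.parquet.binaryAsString", "false").toBoolean

  // The new flag from this PR, which the docs diff lists with a default of true.
  def isParquetInt96AsTimestamp: Boolean =
    getConf("spark.sql.parquet.int96AsTimestamp", "true").toBoolean
}
```

Because the default only applies to unset keys, a one-word change in the getter alters behavior for all users who never touched the setting, which is why the reviewer's question was warranted.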
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/3820#discussion_r22636339 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala --- @@ -138,7 +139,13 @@ private[sql] trait SQLConf { * When set to true, we always treat byte arrays in Parquet files as strings. */ private[spark] def isParquetBinaryAsString: Boolean = -getConf(PARQUET_BINARY_AS_STRING, false).toBoolean +getConf(PARQUET_BINARY_AS_STRING, true).toBoolean --- End diff -- why this change?
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/3820#discussion_r22624670 --- Diff: sql/core/pom.xml --- @@ -69,6 +69,11 @@ <version>2.3.0</version> </dependency> <dependency> + <groupId>org.jodd</groupId> + <artifactId>jodd-core</artifactId> + <version>${jodd.version}</version> +</dependency> --- End diff -- I'm pretty hesitant to add a dependency here as they are very high cost for a project as big as Spark. Is there any way to do this without adding this?
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/3820#discussion_r22630330 --- Diff: sql/core/pom.xml --- @@ -69,6 +69,11 @@ <version>2.3.0</version> </dependency> <dependency> + <groupId>org.jodd</groupId> + <artifactId>jodd-core</artifactId> + <version>${jodd.version}</version> +</dependency> --- End diff -- We can also convert to/from Julian by ourselves... I'll draft it.
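As a rough idea of what converting to and from Julian by hand could involve, here is a sketch that is not the draft mentioned above. It assumes that the Julian day number of 1970-01-01 is 2440588, that the time-of-day part counts from midnight UTC, and that input is epoch milliseconds; `JulianSketch` and its method names are hypothetical. Epoch time does not count leap seconds, so every day is treated as exactly 86,400,000 ms.

```scala
object JulianSketch {
  // Assumed constant: Julian day number of the Unix epoch (1970-01-01).
  val JulianDayOfEpoch = 2440588L
  val MillisPerDay = 86400000L
  val NanosPerMilli = 1000000L

  // Splits epoch milliseconds into (Julian day, nanoseconds within that day).
  // floorDiv/floorMod keep pre-1970 instants on the correct day.
  def toJulian(epochMillis: Long): (Int, Long) = {
    val daysSinceEpoch = Math.floorDiv(epochMillis, MillisPerDay)
    val millisOfDay = Math.floorMod(epochMillis, MillisPerDay)
    ((daysSinceEpoch + JulianDayOfEpoch).toInt, millisOfDay * NanosPerMilli)
  }

  // Inverse of toJulian; sub-millisecond nanoseconds are truncated.
  def fromJulian(julianDay: Int, nanosOfDay: Long): Long =
    (julianDay - JulianDayOfEpoch) * MillisPerDay + nanosOfDay / NanosPerMilli
}
```

Under these assumptions, 1970-01-01 08:00:00 UTC (28,800,000 ms) maps to Julian day 2440588 with 28,800,000,000,000 nanoseconds of day. Whether this matches jodd's results for every date is exactly the compatibility question raised in the thread, so a hand-written version would need to be checked against Hive-generated files.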
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-68847248 [Test build #25094 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25094/consoleFull) for PR 3820 at commit [`d4dbc8a`](https://github.com/apache/spark/commit/d4dbc8a36dee0c708a275d6c23865e5ae9dc8bb4). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-68847251 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25094/
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-68839674 [Test build #25094 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25094/consoleFull) for PR 3820 at commit [`d4dbc8a`](https://github.com/apache/spark/commit/d4dbc8a36dee0c708a275d6c23865e5ae9dc8bb4). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-68839779 Oh sorry, I just checked Impala's configuration, and I think it differs from what we have here. I'll change my code to conform to that.
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-68834477 Thanks for doing this, I've been getting a ton of requests for this feature! Can you also add this to the sql programming guide?
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3820#discussion_r22511350

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala ---
    @@ -141,6 +142,12 @@ private[sql] trait SQLConf {
         getConf(PARQUET_BINARY_AS_STRING, false).toBoolean

       /**
    +   * When set to true, we always treat INT96Values in Parquet files as timestamp.
    +   */
    +  private[spark] def isParquetINT96AsTimestamp: Boolean =
    +    getConf(PARQUET_INT96_AS_TIMESTAMP, false).toBoolean
    --- End diff --

    We don't really use INT96 for anything else (and I don't think other systems do either?) so maybe this should be true by default?
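To make the effect of the default concrete, here is a small standalone sketch of a string-backed setting lookup. It only mimics the `getConf(key, default).toBoolean` pattern in the diff and is not the real `SQLConf` trait. The key string and the names `ConfDefaultSketch` and `isInt96AsTimestamp` are placeholders chosen for illustration.

```scala
object ConfDefaultSketch {
  // Placeholder key; the diff only shows the constant PARQUET_INT96_AS_TIMESTAMP.
  val ParquetInt96AsTimestamp: String = "spark.sql.parquet.int96AsTimestamp"

  /** Returns the stored value for `key`, or `default` when the key is unset. */
  def getConf(settings: Map[String, String], key: String, default: String): String =
    settings.getOrElse(key, default)

  /** Reads the flag; `default` only matters when the user has not set the key. */
  def isInt96AsTimestamp(settings: Map[String, String], default: Boolean): Boolean =
    getConf(settings, ParquetInt96AsTimestamp, default.toString).toBoolean
}
```

With a default of `true`, users who never touch the setting would read INT96 columns as timestamps, while an explicit `"false"` entry would still opt out.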
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-68624719 [Test build #25027 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25027/consoleFull) for PR 3820 at commit [`44d3ab1`](https://github.com/apache/spark/commit/44d3ab1f5bf1bea69a0e49921c0e3a295a387d67). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-68626346 [Test build #25027 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25027/consoleFull) for PR 3820 at commit [`44d3ab1`](https://github.com/apache/spark/commit/44d3ab1f5bf1bea69a0e49921c0e3a295a387d67). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-68626348 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25027/ Test PASSed.
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-68346947 retest this please.
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-68347060 [Test build #24889 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24889/consoleFull) for PR 3820 at commit [`dc6eaba`](https://github.com/apache/spark/commit/dc6eaba7db957eb9038532c7c57282c040e870d4). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-68351142 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24889/ Test PASSed.
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-68351139 [Test build #24889 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24889/consoleFull) for PR 3820 at commit [`dc6eaba`](https://github.com/apache/spark/commit/dc6eaba7db957eb9038532c7c57282c040e870d4). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3820#discussion_r22360301

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableSupport.scala ---
    @@ -84,7 +86,8 @@ private[parquet] class RowReadSupport extends ReadSupport[Row] with Logging {
         // TODO: Why it can be null?
         if (schema == null) {
           log.debug("falling back to Parquet read schema")
    -      schema = ParquetTypesConverter.convertToAttributes(parquetSchema, false)
    +      schema = ParquetTypesConverter.convertToAttributes(
    +        parquetSchema, new SQLContext(new SparkContext))
    --- End diff --

    I don't think it's safe to instantiate a SparkContext here, as that's a pretty expensive operation and will throw exceptions if there is more than one in a single JVM. We can try to refactor this in the future, but I'd just pass two options here (using named parameters for the booleans).
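The suggestion to pass two booleans with named parameters can be sketched as follows. This is a simplified illustration, not the real `ParquetTypesConverter.convertToAttributes` signature. The object, the method `describeConversion` and the string results are hypothetical stand-ins for the actual schema conversion.

```scala
object NamedFlagsSketch {
  /**
   * Hypothetical stand-in for a schema-conversion helper: it receives the two
   * settings directly instead of a SQLContext that would have to be created.
   */
  def describeConversion(
      parquetTypeName: String,
      isBinaryAsString: Boolean,
      isInt96AsTimestamp: Boolean): String = parquetTypeName match {
    case "BINARY" if isBinaryAsString  => "StringType"
    case "BINARY"                      => "BinaryType"
    case "INT96" if isInt96AsTimestamp => "TimestampType"
    case other                         => s"unsupported: $other"
  }

  // Named arguments keep the call site readable even with two adjacent booleans.
  val example: String =
    describeConversion("INT96", isBinaryAsString = false, isInt96AsTimestamp = true)
}
```

Passing plain flags keeps the fallback path free of any context construction, and the named arguments guard against silently swapping the two booleans.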
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-68241728 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24855/ Test PASSed.
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-68241724 [Test build #24855 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24855/consoleFull) for PR 3820 at commit [`d44831a`](https://github.com/apache/spark/commit/d44831a2462b2c049b0222fbb7b8e08023d1f67c). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-68333255 [Test build #24884 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24884/consoleFull) for PR 3820 at commit [`dc6eaba`](https://github.com/apache/spark/commit/dc6eaba7db957eb9038532c7c57282c040e870d4). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user adrian-wang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3820#discussion_r22339701

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableSupport.scala ---
    @@ -84,7 +86,8 @@ private[parquet] class RowReadSupport extends ReadSupport[Row] with Logging {
         // TODO: Why it can be null?
         if (schema == null) {
           log.debug("falling back to Parquet read schema")
    -      schema = ParquetTypesConverter.convertToAttributes(parquetSchema, false)
    +      schema = ParquetTypesConverter.convertToAttributes(
    +        parquetSchema, new SQLContext(new SparkContext))
    --- End diff --

    The only things used from this SQLContext here are `isParquetBinaryAsString` and `isParquetINT96AsTimestamp`. I'll add a comment here, if necessary, to point this out clearly.
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-68335232 [Test build #24884 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24884/consoleFull) for PR 3820 at commit [`dc6eaba`](https://github.com/apache/spark/commit/dc6eaba7db957eb9038532c7c57282c040e870d4). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-68335236 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24884/ Test FAILed.
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
GitHub user adrian-wang opened a pull request:

    https://github.com/apache/spark/pull/3820

    [SPARK-4987] [SQL] parquet timestamp type support

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/adrian-wang/spark parquettimestamp

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3820.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #3820

commit d44831a2462b2c049b0222fbb7b8e08023d1f67c
Author: Daoyuan Wang daoyuan.w...@intel.com
Date:   2014-12-29T07:41:13Z

    parquet timestamp type support
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-68238365 [Test build #24855 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24855/consoleFull) for PR 3820 at commit [`d44831a`](https://github.com/apache/spark/commit/d44831a2462b2c049b0222fbb7b8e08023d1f67c). * This patch merges cleanly.